
SR PT01 Form 1 WAIVER

THE UNIVERSITY OF NEW SOUTH WALES

DECLARATION RELATING TO DISPOSITION OF PROJECT REPORT /THESIS

This is to certify that I, Qing Zhong, being a candidate for the degree of Master of Science, am fully aware of the policy of the University relating to the retention and use of higher degree project reports and theses, namely that the University retains the copies submitted for examination and is free to allow them to be consulted or borrowed. Subject to the provisions of the Copyright Act, 1968, the University may issue a project report or thesis in whole or in part, in photostat or microfilm or other copying medium.

In the light of these provisions I grant the University Librarian permission to publish, or to authorize the publication of, my project report/thesis, in whole or in part. I also authorize the publication by University Microfilms of a 350 word abstract in Dissertation Abstracts International (applicable to doctorates only).

Signature

Witness

Date

THE DEVELOPMENT OF A HIGH-PERFORMANCE NAPLPS DECODER

A THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY COLLEGE THE UNIVERSITY OF NEW SOUTH WALES AUSTRALIAN DEFENCE FORCE ACADEMY FOR THE DEGREE OF MASTER OF SCIENCE

By Qing Zhong April 1988


CERTIFICATE OF ORIGINALITY

I hereby declare that this thesis is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of a university or other institute of higher learning, except where due acknowledgement is made in the text of the thesis.

(Signed)

© Copyright 1988 by Qing Zhong

Acknowledgments

I would like to thank my supervisor Dr. G. W. Gerrity for his encouragement, patience, and invaluable assistance, and for being such a good friend. I would also like to extend my thanks to Dr. C. Lokan and Dr. A. P. Seneviratne for their helpful comments.

I want to express my thanks to the following people who have helped me in many areas and given me advice on various topics: Dr. C. W. Johnson, Mr. L. P. Brown, Mr. W. K. Ung, Mr. C. Vance, Mrs. W. A. Nelowkin, Mr. G. Collin, Mr. P. O'Keeffe, Mr. A. Lagos, Mr. C. Stevens, Mr. P. Tang.

I gratefully acknowledge the financial assistance of a University Scholarship.

Abstract

This thesis reports the development of a high-performance NAPLPS videotex protocol decoder. Various measurements were made to quantify the performance of the decoder and to identify the performance bottlenecks. The results show conclusively that the bottlenecks are in the system software, and that much of the complexity of the software is in fact the direct result of the poor match between the graphics features supported by the graphics processor used and those required to implement NAPLPS efficiently. The thesis suggests a number of techniques, such as optimised hardware support, which can be applied to remove these bottlenecks. A number of implementation issues, such as the scope, interpretation, and verification of the NAPLPS standard, are discussed.

The limitations and suitability of the NAPLPS standard with respect to user interaction and animation are investigated. Some short-term and long-term solutions for providing better user interaction and animation are suggested.

Contents

Acknowledgments
Abstract
1 Background
2 Overview of NAPLPS
  2.1 Videotex Service and NAPLPS
  2.2 Background
  2.3 Basic Features
  2.4 NAPLPS
    2.4.1 Code-Extension
    2.4.2 The Unit Screen and Fixed-Point Binary Format
    2.4.3 Text conversion
    2.4.4 Graphics
    2.4.5 The Logical Pel
    2.4.6 Colour Control
    2.4.7 Control
    2.4.8 User Input
    2.4.9 Macros
  2.5 Conformation and Service Reference Model
3 System Hardware
  3.1 Introduction
    3.1.1 Graphics Display
  3.2 Raster Graphics System
    3.2.1 Colour Lookup Table
    3.2.2 Graphics Functions
  3.3 Graphics Processor
    3.3.1 ACRTC
    3.3.2 System Hardware Design
4 Software
  4.1 Basic Principles of Implementing NAPLPS features
    4.1.1 Macros
    4.1.2 Character Sets
    4.1.3 DRCS
    4.1.4 Filled Graphic Primitives
    4.1.5 Logical Pel
5 Performance Analysis
  5.1 Graphics Execution Speed Measurements
    5.1.1 Maximum character-Parsing speed
    5.1.2 Character-Display Speed
    5.1.3 Point-Drawing Speed
    5.1.4 Rectangle-Drawing Speed
    5.1.5 Polygon-Drawing Speed
    5.1.6 Line-Drawing Speed
    5.1.7 Arc-Drawing Speed
  5.2 Analysis and Discussion
    5.2.1 Bottlenecks and Possible Improvements
    5.2.2 Cost/Performance Tradeoffs
6 Extensions to NAPLPS standard
  6.1 Introduction
  6.2 Animation Techniques
    6.2.1 Animation techniques
  6.3 User Interface
  6.4 Temporary Solution
    6.4.1 Technical Details
7 Discussion
  7.1 NAPLPS standard
    7.1.1 Scope of the standard
    7.1.2 Protocol Error Sequences
    7.1.3 Interpretation and Verification of the NAPLPS standard
  7.2 Decoding Algorithms and Optimised Hardware Support
    7.2.1 Logical Pel
    7.2.2 Filling Polygons
  7.3 Test Data
  7.4 Cost
    7.4.1 Hardware-Component Cost
    7.4.2 Software Cost
  7.5 Improvements
    7.5.1 Software Performance
    7.5.2 Error Checking
    7.5.3 ACRTC Register Set-up Values
  7.6 Use Better Graphics Controllers
8 Conclusion
A Listings

List of Figures

1.1 PAL System
2.1 Videotex Service
2.2 The Seven Functionally Separate Layers of the OSI Model
2.3 7-Bit In-Use Table
2.4 Code Extension in a 7-Bit Environment
2.5 The Unit Screen
2.6 The Fixed-Point Binary Format
2.7 Two-Dimensional Mode
2.8 PDI Command Format
2.9 PDI Drawing Primitives
2.10 A Figure Drawn by Using Incremental Lines
2.11 The Logical Pel
2.12 A Filled Rectangle
2.13 NAPLPS Colour Encoding
2.14 Textured Lines Drawn in Two Different Colour Modes
3.1 Mapping of Frame Buffer to Display Screen. (a) A Monochrome Display. (b) A Colour Display.
3.2 Arrangement of a Typical DRAM
3.3 Read Cycle Timing
3.4 Write Cycle Timing
3.5 Page Mode Operation
3.6 Nibble Mode Operation
3.7 Static-Column Decoding Operation
3.8 Ripple Mode Operation
3.9 Packed-Pixel Architecture
3.10 Planar Architecture
3.11 Colour Expansion Operation
3.12 Translation
3.13 Pixel and Planar Architecture
3.14 A Control Processor Driven Graphics System
3.15 A Graphics Processor Driven Graphics System
3.16 Dual-port Memory
3.17 A Graphics Processor Driven Graphics System (Note the frame buffer has a dedicated port to the graphics processor.)
3.18 A Colour Graphics Display System with Colour Lookup Table
3.19 Texas Instruments TMS4161 Dual-port RAM
3.20 ACRTC Function Block Diagram
3.21 ACRTC Interleaved Access Mode (**display data read from frame buffer)
3.22 Early-Write Cycle
3.23 Early-Write and Read Cycles
3.24 RAS-CAS Generation Circuitry
3.25 Block Diagram of a NAPLPS Decoder
4.1 Software Structure. (a) Code Generation (b) Code In Use
4.2 ACRTC PATTERN Command
4.3 ACRTC PAINT Command
4.4 Direct Fill
4.5 Effect of line width Generated by Drawing Multiple Lines
6.1 Colour Cycling
6.2 Simple Colour Table Animation
6.3 Complex Colour Table Animation
6.4 QingBox
7.1 Draw a Line By Using Fill Method
7.2 An Arc
7.3 Start and End Pels Overlay
7.4 Line Texture and Dot Space
7.5 Line Drawn In Colour Mode 2
7.6 A Textured Arc
7.7 Isolated Pixels Due to Quantization Errors
7.8 A New Scheme For Filling Primitives
7.9 Direct Fill
7.10 Clipping Arc

List of Tables

5.1 Character Display Speeds
5.2 Point Drawing Speed
5.3 Rectangle (outlined) Drawing Speed
5.4 Rectangle (filled) Drawing Speed
5.5 Polygon (144 vertices) Fill Speed
5.6 Polygon (256 vertices) Fill Speed
5.7 Line Drawing Speed
5.8 Arc (outlined) Drawing Speed
5.9 Arc (filled) Drawing Speed
5.10 Arc (outlined) Drawing Speed
5.11 Arc (filled) Drawing Speed

Chapter 1

Background

In conjunction with a computer-aided instruction project, the primary objective of this master's thesis is to investigate the possibility of providing a low-cost, yet high-performance, graphics terminal to be used for presenting lesson materials.

Most commercially-available computer-aided instruction, authoring and presentation systems have serious deficiencies which have inhibited them from gaining wide acceptance. For commercial reasons, they are generally highly machine-dependent and use the unique features of the target system. This implies that the languages, file structures, and tools used to prepare the lesson materials are determined by the operating system, and also that the quality of presentation frames is limited by the computer terminals provided by the manufacturer and the capability of the graphics software. The consequences of hardware dependencies are that courseware written for one system cannot be used on another system, and that the cost of hardware is very high due to the limited market and the marketing strategy used by the manufacturer. This high cost is generally beyond the budgets of most primary and secondary schools. Another problem is that many authoring languages have limited functionality and are difficult to use. Since writing courseware requires considerable intellectual effort, it is very expensive and time-consuming; therefore, there is little quality courseware available on the market [1] [2].

Since June 1984, a team in the Department of Computer Science at the University College of the University of New South Wales, at the Australian Defence Force Academy, has undertaken the development of a Portable Authoring Language (PAL) system which, it is hoped, will be adopted by academic institutions as a standard. The project has several specific objectives: portability, provision of versatile and powerful editing and presentation features, and low entry-level cost [1]. Several strategies have been used to enhance the portability of the authoring language. First, the PAL system is written in the widely-available high-level systems programming language C. Second, the system will provide graphics interfaces for the three standard graphics systems: the ISO standard Graphical Kernel System (GKS) [3], the North American Presentation Level Protocol Syntax (NAPLPS) standard [4], and the Computer Graphics Interface (CGI) [5].

[Figure: block diagram with the labels Screen-based Editor, Graphics Editor, Frame Sequence Language, PAL Graphics, GKS/CGI Generator, ASCII, NAPLPS Decoder, Colour Monitor and Colour TV Monitor.]

Figure 1.1: PAL System

Figure 1.1 shows the components of the PAL system. The system provides support for an author to develop lesson materials as a series of graphic frames by using a powerful screen-based graphics editor. The lesson presentation is controlled through a sophisticated procedural frame sequencing language. The three graphic interfaces provided at its lower level will enable a wide range of terminals to be used.

As indicated above, the desire to enhance the portability of the authoring system has motivated a research study into the possibility of providing a NAPLPS decoder at reasonable cost which is capable of remote connection to a host computer. The availability of this decoder would allow the use of a television monitor as an economical terminal if the user has no access to a suitable computer. A medium-resolution NAPLPS protocol decoder has been designed and constructed. The prototype has been used:

• To investigate the suitability and limitations of the NAPLPS protocol with respect to:

  1. its use in an interactive environment,
  2. its use for describing animated scenes;

• To investigate efficient decoding algorithms for NAPLPS protocols;

• To study and quantify the cost-performance tradeoffs.

Good user interaction and animation may be vital to the successful realisation of the PAL project. However, at the early stage of the PAL project, some issues were not known or clear: what interaction and animation features would be required by the PAL system, and how could user interaction and animation be supported by NAPLPS? Therefore, it was necessary to study, in parallel with the PAL project, the suitability and limitations of NAPLPS with respect to user interaction and animation, and to determine what effects these limitations would impose on the implementation of the PAL system. At the end of the thesis, we will offer some temporary solutions for a better user interface and animation and make some recommendations to the ANSC Task Group (Videotex) and the Canadian Videotex Consultative Committee/Working Group (CVCC/CSA/WG) for a long-term solution.

Currently-available NAPLPS decoders only provide output for the NTSC colour television system as opposed to the PAL colour television system used in Australia. Therefore, if the NAPLPS decoder is to be used with a PAL television monitor, then a PAL encoder must be provided as part of the package.

There are some factors which may have partially contributed to the failure of NAPLPS to achieve the quick success that technologists predicted [6] [7]. The cost of dedicated decoders is high and the implementation of NAPLPS in existing decoders is also poor. The high cost is due to the limited market and the poor implementation is partly due to the complexity of the NAPLPS standard. For instance, Norpak Corp., a pioneer in videotex technology, has produced a NAPLPS Service Reference Model (SRM) [4] display generator for the IBM PC based on the Rockwell R6549 VLSI Colour Video Display Generator (CVDG), which was claimed to be designed specially for implementing NAPLPS [8]. But the decoder does not support line drawings with different line textures as specified by the NAPLPS standard, and it will not fill certain polygons correctly. This is typical of the limitations of the existing NAPLPS decoders.

Most existing decoders have limited resolution and a limited number of simultaneously displayable colours (16 colours is typical, which meets the colour implementation requirement specified by the SRM) [7]. This is due to the fact that the initial emphasis of the development of videotex technology was on domestic and business markets, which may not require high resolution and many colours. As is indicated later in the thesis, bitmap memory has contributed the most to the cost of a decoder. In the past, increasing the display resolution and number of colours would have considerably increased the total cost of NAPLPS decoders, rendering them uneconomical for the projected consumer and business market. The domestic TV set has always been considered to be the principal display vehicle for videotex since the early stage of the development of videotex technology. But the limited bandwidth of the domestic TV set is another factor which determines the maximum resolution. However, as indicated above, there is now an appreciation that better display resolution and colour choice play a very important role in user interaction. As the costs of memory chips continue to decline, one can afford to increase resolution and number of colours.

As a matter of fact, as will be shown later, increasing the display resolution and the number of colours results in significant performance degradation and higher production cost. To improve the performance, better hardware support is needed.

It is interesting to note that a number of semiconductor manufacturers have now produced graphics processors (or controllers), and that many claim that these graphics processors support the NAPLPS standard [9] (see footnotes 1 to 5). Their claims are based on the fact that these graphics processors support graphics drawing primitives very close to those which NAPLPS supports. But to decode NAPLPS faithfully, e.g. with proper line width or fill with different textures, the host CPU still has to perform a considerable amount of work.

1 See Data Sheet: DP8500 Raster Graphics Processor, Advance Information, September 1985, National Semiconductor Corporation.
2 See Data Sheet: 82786 CHMOS Graphics Coprocessor, Advance Information (Order Number: 231676-002), November 1986, Intel Corporation.
3 See HD63484 ACRTC Advanced CRT Controller User's Manual, second edition, July 1985, Hitachi, Ltd.
4 See TMS34010 User's Guide, 1986, Texas Instruments Inc.
5 See Data Sheet: µPD7220/GDC Graphics Display Controller, NEC Electronics U.S.A. Inc.

Once the factors which affect performance and cost have been identified and quantified, other implementors will be able to assess those factors and manufacture new economical and/or high performance NAPLPS decoders according to different users' requirements.

The outline of the thesis is as follows. This chapter has described the background and motivation of the project, and specified the goals to be achieved. Chapter 2 provides an introduction to videotex technology and briefly describes the key features of the NAPLPS standard. Chapter 3 provides the background material on raster technology. Each component of a typical raster display system is described, and the key components which directly affect the performance of a raster display system are discussed. The graphics controller used for the decoder is described, and the design of the decoder hardware is given. Chapter 4 is concerned with the design of the decoder's software. The software tools and development environment are described. It also briefly explains how various NAPLPS features are implemented.

Chapter 5 contains the details of measurements which have been made to quantify the performance of the system software and the graphics hardware, and to compare the decoder's performance with that of a commercially-available decoder. From the measurements, the performance bottlenecks are identified, and a number of techniques which can be applied to remove these bottlenecks are suggested.

Chapter 6 discusses the limitations and suitability of the NAPLPS standard with respect to user interaction and animation and suggests some temporary and long-term solutions. During the discussion, a number of animation techniques are also given.

Chapter 7 discusses implementation issues and problems encountered, such as the scope, interpretation, and verification of the NAPLPS standard. Based on the results from Chapter 5, it includes a further discussion on decoding algorithms and optimised hardware support. It also contains a discussion on cost considerations and suggests a few areas for future development. Chapter 8 summarises the research.

Chapter 2

Overview of NAPLPS

2.1 Videotex Service and NAPLPS

Videotex is an interactive two-way information system which provides users with access to frames composed of text and graphic information from a remote host computer. Figure 2.1 shows a typical videotex system. A videotex terminal is attached to a standard TV set. A keyboard is used for user input. The user accesses frames by sending requests back to the host computer. The communication channel can be fixed wires, public telephone networks, local area networks, or other communication networks. Videotex has three main characteristics. The first is the emphasis on graphics. The second is that it is page oriented, which implies that information is accessed as a series of frames (or pages) of text and graphic primitives. The last is that the frames are structured in a hierarchy, and the user accesses the frames through the hierarchy by sending search requests as numbers or keywords. Although the initial emphasis has been purely on information retrieval, videotex has found many other applications, such as games and education.

NAPLPS, the cacophonic acronym for the North American Presentation-Level Protocol Syntax, is a standard for the sixth level of the seven-level model for Open System Interconnection (OSI) promoted by the International Organisation for Standardisation (ISO) (Figure 2.2). The standard was developed jointly by the ANSI (American National Standards Institute) X3L2.1 Task Group on Videotex/Coded Character Sets, and the Canadian Videotex Consultative Committee/Canadian Standards Association/Working Group on Videotex (CVCC/CSA/WG). The standard is designated X3.110-1983 by ANSI and T500-1983 by CSA. It was officially approved by the International Telegraph and Telephone Consultative Committee (CCITT) in 1985 (see "... gains valuable status as international videotex standard", The Australian, January 29, 1985).

[Figure: a HOST COMPUTER connected to a TERMINAL.]

Figure 2.1: Videotex Service

2.2 Background

Europe pioneered the videotex technology. In 1979 the United Kingdom introduced Viewdata, a service based on the Prestel coding scheme. The service provides customers with alphamosaic graphics: pictures built from character-size blocks. France's telecommunications agency also recognised the potential future of videotex services and developed its own system, called Antiope. In addition to alphamosaics, the service uses semigraphics characters as well. The Antiope system produces better graphics than the Prestel system. However, the Conference of European Posts and Telecommunications (CEPT) has endorsed the British technique, with the result that similar systems in other parts of Europe are derived from it.

In 1981, the Conference of European Posts and Telecommunications Administrations developed the CEPT PLPS as a compromise videotex standard to be adopted by some European countries. In fact, it is a composite standard of various videotex systems existing in Europe, such as the British Prestel and the French Antiope national videotex systems.

Japan developed its own videotex standard, the Character and Pattern Telephone Access Information Network (CAPTAIN), primarily to accommodate the special requirements of Japanese text.

[Figure: the OSI layers, from the end-user function (application process) at the top down to the physical medium.]
Layer 7, Application: provides appropriate service for the application process
Layer 6, Presentation: provides data formatting
Layer 5, Session: provides service facilities to the application
Layer 4, Transport: provides end-to-end data transmission integrity
Layer 3, Network: switches and routes information units
Layer 2, Data-link: provides transfer functions for information units to the other end of the physical link
Layer 1, Physical: transmits bit stream to the physical medium

Figure 2.2: The Seven Functionally Separate Layers of the OSI Model

The Canadian Department of Communications, after watching developments in Europe with interest, came up with its own system, called Telidon. In 1981, AT&T published a Presentation Level Protocol (PLP) to help standardise the manner in which videotex data would be sent. Two years later, the Canadian Standards Association and the American National Standards Institute jointly issued the Videotex/Teletext Presentation Level Protocol Syntax, also called NAPLPS. NAPLPS is now gradually gaining widespread acceptance in North America.

NAPLPS, CAPTAIN and CEPT have now become the three major international coding standards for the presentation level protocol syntax. (These three standards are contained in the CCITT's recommendations T.101 and F.300 on videotex) [10].

2.3 Basic Features

NAPLPS specifies the data syntax for the encoding of alphanumeric and graphic information in videotex and teletext services originating in North America. The encoding scheme uses only the American National Standard Code for Information Interchange (ASCII), which is widely used in computing, to maintain compatibility with existing equipment and standards.

The protocol encodes graphic information in terms of graphic primitives. By adopting a compression technique, very few bytes are required to describe each primitive, allowing the use of low-bandwidth channels such as public telephone lines.

A major advantage over other encoding schemes, such as that of Telecom Australia's Viatel service, which is based on the Prestel coding scheme, is that the alphanumeric and graphic information are encoded in a machine-independent manner; therefore, terminals with different configurations can be used. Furthermore, NAPLPS also provides techniques that allow extensions to be added to the standard in the future without affecting the existing features.

The basic features of NAPLPS are the coding of text in ASCII and the coding of geometric images as Picture Description Instructions (PDI). The PDI specifies graphic information as points, straight lines, circular arcs, rectangles, polygons and incremental primitives. In addition to normal ASCII characters, a few other character sets are also provided: the Supplementary Character Set, the Mosaic Set and the Dynamically Redefinable Character Set (DRCS). Since the NAPLPS graphics characters have various attributes such as stroke width, texture, colour, size and rotation, the encoding scheme has the ability to represent informative and attractive frames.

2.4 NAPLPS

2.4.1 Code-Extension

The method of code extension used by NAPLPS is based on the extension techniques specified in ISO 2022-1982. A large table of codes, the in-use table (128 entries for the 7-bit environment or 256 for the 8-bit environment), is used, and the table is divided into smaller sets of codes (or small tables) that can be swapped in and out of the large table. (The following descriptions are for the 7-bit environment only. The 8-bit environment is similar, but slightly more complex.)

Each smaller code set has a standard name (therefore new sets can be added as long as the chosen name is unique) and includes codes which have similar characteristics. A standard mechanism is used to control the swapping process.

The large 128-character in-use table is divided into two smaller tables as illustrated in Figure 2.3. These tables are known as the control set (C0) and the graphic set (G set).

Figure 2.4 shows how the NAPLPS code-extension technique works in a 7-bit environment. The swapping mechanism allows a variety of code sets to be swapped into and out of the G set. The code-set swapping is done with the 96-character G-sets.

Before a G-set is swapped into the in-use table, it is selected from a repertory (currently there are six selections available) and then placed into one of the four designated sets. Then one of these designated sets is placed into G. Codes are then interpreted based on the current G set in the in-use table.

The arrows and labels indicate swapping code sequences which are used to control the swapping process. For example, the code required to swap the code set in designated set G0 into the G set is SI.
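The swapping mechanism described above can be pictured as a small state machine. The following Python sketch is illustrative only and is not taken from the decoder: the class and set names are hypothetical, only the SI and SO shift codes of the 7-bit environment are modelled (LS2, LS3 and the single shifts are omitted), and the default designations are assumed to be those labelled in Figure 2.4.

```python
SI = 0x0F  # ASCII Shift In: invoke designated set G0 into the G set
SO = 0x0E  # ASCII Shift Out: invoke designated set G1 into the G set


class InUseTable:
    """Toy model of the 7-bit in-use table (all names are illustrative)."""

    def __init__(self):
        # Default designations, assumed from the labels in Figure 2.4.
        self.designated = {
            "G0": "primary",
            "G1": "pdi",
            "G2": "supplementary",
            "G3": "mosaic",
        }
        self.invoked = "G0"  # by default G0 occupies the G set

    def designate(self, slot, set_name):
        """Move a G-set from the repertory into one designated set."""
        self.designated[slot] = set_name

    def feed(self, byte):
        """Classify one 7-bit code against the current table state."""
        byte &= 0x7F
        if byte == SI:
            self.invoked = "G0"
            return ("shift", "G0")
        if byte == SO:
            self.invoked = "G1"
            return ("shift", "G1")
        if byte < 0x20:
            return ("C0", byte)  # columns 0 and 1: control set
        # Columns 2 to 7: interpreted through the currently invoked G-set.
        return (self.designated[self.invoked], byte)
```

With this model, the same code value is interpreted differently before and after a shift: `feed(0x41)` is classified as a primary-set character by default and as a PDI code after `feed(SO)`.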

[Figure: a code table of 16 rows by 8 columns. Bits b7, b6 and b5 select the column (0 to 7) and bits b4, b3, b2 and b1 select the row (0 to 15). Columns 0 and 1 hold the C0 set; columns 2 to 7 hold the G set.]

Figure 2.3: 7-Bit In-Use Table

[Figure: the 7-bit in-use table holds the C0 set (default C0) and the G set (default: G0). Labels on the control side read: designation and invocation of the C0 set, designation and invocation of the C1 set, and invocation of a single C1 character. The four designated sets are invoked into the G set by SI (default: Primary Character Set), SO (default: PDI Set), LS2 or SS2 (default: Supplementary Character Set) and LS3 or SS3 (default: Mosaic Character Set). G-sets are designated from the G-set repertory; for the Primary Character Set the F character is 4/2.]

Figure 2.4: Code Extension in a 7-Bit Environment

[Figure: the unit screen, with corners (0,0), (1,0), (0,1) and (1,1). The display area of the physical display screen runs from (0,0) to (1,0.75) and is surrounded by a border.]

Figure 2.5: The Unit Screen

To move a G-set from the repertory to one of the designated sets, a three-character sequence is used. The third character (represented by (F)) is the name of the G-set. For example, to move the Primary Character G-set from the repertory to the G0 designated set, the sequence ESC, 2/8, 4/2 would be used.
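As a concrete illustration of the column/row notation used in this example, the short Python sketch below converts a column/row pair into a 7-bit code value and assembles the sequence ESC, 2/8, 4/2. The function names are hypothetical, and only the G0 case given in the text is covered.

```python
ESC = 0x1B  # ASCII escape character


def code(column, row):
    """Convert column/row notation (e.g. 4/2) to a 7-bit code value."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row must be 0-15")
    return column * 16 + row


def designate_to_g0(final):
    """Build the three-character sequence ESC, 2/8, (F) for the G0 set."""
    return bytes([ESC, code(2, 8), final])


# Primary Character Set: the (F) character is 4/2.
primary_to_g0 = designate_to_g0(code(4, 2))
```

Here 2/8 is the code value 0x28 and 4/2 is 0x42, so the sequence is the three bytes 0x1B, 0x28, 0x42.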

2.4.2 The Unit Screen and Fixed-Point Binary Format

NAPLPS uses the Unit Screen concept as shown in Figure 2.5. The unit screen is an abstract rectangular area which maps to the physical display screen. The coordinates are expressed in binary fractions of normalised units of 0 to 1, rather than in physical dimensions or grids such as rows and columns used by other standards.

The advantage of such a unit screen approach is that the pictures are displayed independently of any particular hardware configuration. Another advantage is that the relative positions of objects in pictures with respect to each other are the same, regardless of the resolution.

A fixed-point binary format is used to specify the unit coordinates as illustrated in Figure 2.6. The format is the same as a typical integer format, except that the binary point is on the left and between the sign bit and the data bits. The main difference between a fixed-point binary format and a typical integer format is that as more bits of precision are added, they are added on the right instead of the left.

If the number of bits sent to specify the coordinates is greater than the actual number of bits required internally for the resolution of the terminal, then the terminal can truncate the last few least significant bits. Alternatively, if the number of bits sent is less than the actual number of bits required, then the terminal can add trailing zeros.

[Figure: a fixed-point binary value with the sign bit on the left, followed by the binary point and then the data bits.]

Figure 2.6: The Fixed-Point Binary Format

[Figure: data bytes with bits B8 to B1, each carrying a three-bit X field and a three-bit Y field. The first byte holds the sign and most significant bits, and later bytes run down to the least significant bits.]

Figure 2.7: Two-Dimensional Mode

To display the pictures, the unit coordinates must first be mapped to a physical display. Thus, such a format makes the task of handling the coordinates easier for integer-oriented microprocessors, as compared to floating-point representations.

When coordinates are encoded, 6 bits of data are packed into a single byte. Figure 2.7 shows the standard two-dimensional format, in which 3 bits are used for x and 3 bits for y.
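The coordinate handling described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the decoder's code: the function names are hypothetical, the X field is assumed to occupy the upper three data bits and the Y field the lower three of each data byte (as Figure 2.7 suggests), and the sign is assumed to be two's complement.

```python
def unpack_xy(data_bytes):
    """Concatenate the 3-bit X and 3-bit Y fields of successive data bytes.

    Assumes X is in bits b6..b4 and Y in bits b3..b1 of every byte.
    Returns the raw X and Y bit patterns and their width in bits.
    """
    x = y = 0
    for b in data_bytes:
        x = (x << 3) | ((b >> 3) & 0b111)
        y = (y << 3) | (b & 0b111)
    return x, y, 3 * len(data_bytes)


def to_fraction(raw, nbits):
    """Interpret raw as a signed fraction with the binary point placed
    immediately after the sign bit (two's complement assumed)."""
    if raw & (1 << (nbits - 1)):
        raw -= 1 << nbits
    return raw / (1 << (nbits - 1))


def resize(raw, nbits, target_bits):
    """Truncate least significant bits or add trailing zeros."""
    if nbits > target_bits:
        return raw >> (nbits - target_bits)
    return raw << (target_bits - nbits)


def to_pixel(fraction, size):
    """Map a unit-screen fraction to a pixel index on an axis of 'size' pixels."""
    return int(fraction * size)
```

For example, the two data bytes 0b1010001 and 0b1100000 give the six-bit patterns 010100 for X and 001000 for Y, that is 0.625 and 0.25 on the unit screen. On a hypothetical display 256 pixels wide, X = 0.625 maps to pixel column 160. Because extra precision is appended on the right, truncating or zero-padding with `resize` changes the precision but not the scale of the value.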

2.4.3 Text conversion

Text is treated as a subset of graphics. Each character can be scaled to any size and positioned anywhere on the unit screen. NAPLPS currently supports three fixed character sets and one redefinable character set. The default character set is the Primary Character Set, which contains the normal ASCII characters. The Supplementary Character Set contains accents, diacritical marks, and special characters for Latin-based alphabets. The Mosaic Character Set contains 64 different 2 x 3 block mosaic characters which can be combined to form chunky graphics shapes.

[Figure: a PDI command shown as a sequence of bytes with bits B8 to B1. The first byte has bit 7 = 0 and carries the op code; the following bytes have bit 7 = 1 and carry the operands. Bit 8 is marked X.]

Figure 2.8: PDI Command Format

The Dynamically Redefinable Character Set (DRCS) allows each entry to be defined to create some special symbols. Once a character is defined, it can be used in the same way as any other characters.

2.4.4 Graphics

The Picture-Description Instruction (PDI) G-set is used for graphic drawings. Figure 2.8 illustrates the structure of a typical PDI command. Bit 7 is used to distinguish between the op codes and the data within the PDI G-set. Bit 7 = 0 indicates that the byte is an op code; otherwise it is a data byte.
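The op code and data distinction can be illustrated with a short Python sketch that groups a PDI byte stream into commands. It is a simplified illustration with hypothetical names: bits are assumed to be numbered from b1 (least significant) as in Figure 2.8, so bit 7 corresponds to the mask 0x40, and the byte values used in the usage note are arbitrary examples rather than specific NAPLPS op codes.

```python
BIT7 = 0x40  # bit 7 when bits are numbered b1 (LSB) to b8, as in Figure 2.8


def split_pdi(stream):
    """Group a PDI byte stream into (op_code, [data bytes]) pairs."""
    commands = []
    for byte in stream:
        byte &= 0x7F  # 7-bit environment: ignore bit 8
        if (byte & BIT7) == 0:
            # Bit 7 = 0: this byte starts a new command.
            commands.append((byte, []))
        elif commands:
            # Bit 7 = 1: operand data for the most recent op code.
            commands[-1][1].append(byte)
        # Data bytes seen before any op code are ignored in this sketch.
    return commands
```

For instance, the stream 0x24, 0x51, 0x60, 0x28, 0x48 is split into two commands: op code 0x24 with two data bytes, and op code 0x28 with one.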

The basic graphic primitives are points, lines, rectangles, arcs, polygons and incremental primitives (Figure 2.9). The incremental-primitive feature provides a very efficient way to specify figures with irregular edges. With this feature, some line segments can be specified with just 2 bits of information. Figure 2.10 shows a figure drawn by using incremental lines.

Figure 2.9: PDI Drawing Primitives (point, line, arc, rectangle, polygon, incremental)

Figure 2.10: A Figure Drawn by Using Incremental Lines

Figure 2.11: The Logical Pel (a pel of width dx and height dy, traced from its origin along an arc, a line, and a point)

2.4.5 The Logical Pel

In NAPLPS, the drawing point is a rectangular brush which is called the logical pel. The size of the logical pel is specified by the PDI DOMAIN command. A primitive is drawn by tracing the pel's origin along the primitive, while drawing all pixels under the logical pel as the pel is mapped to the display screen (Figure 2.11).
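The tracing rule can be made concrete with a minimal Python sketch. It is not the decoder's drawing routine: the function name is hypothetical, the pel dimensions are assumed to be positive whole numbers of pixels, and the origin is assumed to be the lower-left corner of the brush.

```python
def draw_with_pel(path, pel_w, pel_h):
    """Return the set of pixels covered when a pel_w x pel_h brush is traced along path.

    path is a list of (x, y) pixel positions visited by the pel's origin.
    """
    painted = set()
    for ox, oy in path:
        for dx in range(pel_w):
            for dy in range(pel_h):
                painted.add((ox + dx, oy + dy))
    return painted
```

Tracing a horizontal run of four origin positions with a 2 x 3 pel covers a 5 x 3 block of pixels, which shows why a line drawn with a large pel is wider and longer than the ideal line.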

For a filled figure, the area enclosed by the outline (including the region of the outline traced by the logical pel along the outline) is filled in the current colour(s) with the texture pattern specified in the PDI TEXTURE command (Figure 2.12). As will be illustrated, the above two features require considerable effort to implement properly.

2.4.6 Colour Control

NAPLPS supports three colour modes to suit many different applications. Colour mode 0 is simple and is compatible with almost all colour display screens. In NAPLPS, colour is specified in terms of red, green, and blue primaries. Figure 2.13 illustrates how colour information is encoded.

Figure 2.12: A Filled Rectangle (the filled area includes the region of the outline traced by the logical pel along the outline)

Figure 2.13: NAPLPS Colour Encoding (bits B6 to B1 of each data byte carry G R B G R B, most significant bits first)
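A short Python sketch of the de-interleaving implied by Figure 2.13 follows. The function name is hypothetical, and the bit order within each byte (G R B G R B from B6 down to B1, most significant colour bits first) is read from the figure labels and should be treated as an assumption.

```python
def decode_colour(data_bytes):
    """De-interleave G, R, B bits from the low six bits of each data byte.

    Returns (r, g, b); each component has 2 bits per data byte received.
    """
    g = r = b = 0
    for byte in data_bytes:
        for shift in (3, 0):                 # upper G R B triple, then lower
            triple = (byte >> shift) & 0b111
            g = (g << 1) | ((triple >> 2) & 1)
            r = (r << 1) | ((triple >> 1) & 1)
            b = (b << 1) | (triple & 1)
    return r, g, b
```

With one data byte each primary has 2 bits of precision; each further byte adds 2 more bits, so a terminal can apply the same truncate-or-pad rule used for coordinates.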

In colour mode 0, the PDI SET COLOR instruction is used to specify the desired colour value, which is used directly for subsequent drawings and applied to the foreground pixels only, ie. only the pixels that comprise the character pattern. In colour mode 1, the colour is selected from a colour look-up table, and applied only to the foreground pixels. Colour mode 2 allows both the foreground and background colours to be selected from a colour look-up table and applied to the foreground and the background pixels respectively. Figure 2.14 illustrates two textured lines drawn in two different colour modes. For the line drawn in colour mode 0 or 1, the inter-dot spacing is not drawn, while for the line drawn in colour mode 2, the inter-dot spacing is drawn in the background colour.

Figure 2.14: Textured Lines Drawn in Two Different Colour Modes (in colour mode 0 or 1 only the foreground dots are drawn; in colour mode 2 the inter-dot spacing is drawn in the background colour)

Colour modes 1 and 2 can be used to create some dynamic visual effects (colour-table animation). Special hardware is required to support these two modes.

2.4.7 Control

The C0 control set contains the codes needed to perform G-set and C-set swapping and character formatting. The C1 control set is used to control cursor attributes and text format, and also to create macros, DRCS, etc.

2.4.8 User Input

Certain fields of the screen can be specified as unprotected fields for user input. The user can enter and edit information in an unprotected field. Information entered in these fields is stored as NAPLPS data and can be sent to the host when the user requires.

2.4.9 Macros

Frequently used byte strings can be represented as a single-character macro. Once a macro has been defined, it can be invoked by its single-character name. Therefore, the amount of information that is transmitted from the host to the terminal may be reduced. The contents of macros (transmit macros) can also be transmitted back to the host computer. Currently, nesting of macros is allowed, and 96 macro names are available.
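Nested macro expansion can be sketched as a small recursive routine. This is an illustration only: the function name, the dictionary representation, and the depth guard are all hypothetical, and the way NAPLPS actually defines and invokes macros through its control sets is not modelled.

```python
def expand(stream, macros, depth=0, max_depth=16):
    """Expand single-byte macro names in stream, allowing nested macros.

    macros maps a macro name (one byte value) to its stored byte string.
    max_depth is a guard invented for this sketch against self-referencing macros.
    """
    if depth > max_depth:
        raise RecursionError("macro nesting too deep")
    out = bytearray()
    for b in stream:
        if b in macros:
            out += expand(macros[b], macros, depth + 1, max_depth)
        else:
            out.append(b)
    return bytes(out)
```

A two-byte transmission that names a macro whose body names another macro expands at the terminal into the full byte string, which is where the saving in transmitted data comes from.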

2.5 Conformance and Service Reference Model

The NAPLPS standard includes two major parts: one is the Conformance Section, and the other is the Service Reference Model (SRM). The Conformance Section describes what functions should be implemented to conform to NAPLPS. The SRM is a set of guidelines which defines the implementation requirements for videotex and teletext services. It specifies the maximum functionalities that the videotex system information providers should assume when encoding frames and the minimum functionalities that videotex terminals should implement. For instance, it specifies that at least 16 simultaneous colours out of a palette of 512 shall be available [4] [11] [12] [13] [14].

Chapter 3

System Hardware

3.1 Introduction

3.1.1 Graphics Display

A typical graphics system includes the following components: a control processor, interactive devices, the graphics processor, and video output circuitry. The control processor controls the overall operation of the system. The interactive devices provide the interface between the graphics system and other systems, and the interface between the user and the graphics system. The graphics processor (or display processor or graphics controller) creates and manipulates graphics images or performs various graphics functions.

A videotex terminal is a basic graphics system. The key component of a videotex terminal is its graphics display. Today, display devices based on raster graphics technology, as used in television display systems, have become the most widely used type of display device. In a raster display, an image is conceived to be a two-dimensional array of graphics primitives or picture elements (pixels), each pixel having its own colour and intensity properties. This image is displayed on a device by painting it row by row, from top to bottom and from left to right (ie. in a raster pattern). Usually, the device is a Cathode Ray Tube (CRT) in which a phosphor on the screen is excited by an electron beam. The light output of the phosphor decays rapidly after excitation (typically in about 1.5 ms), and so the image is not continuous. However, if this flickering image is repainted (refreshed) continuously at a rate greater than about 30 times per second, the human eye will perceive a continuous image and cannot detect the flicker. As will be seen, this minimum refresh rate required to suppress the perception of image flicker is a critical determinant for required memory access time.

The pixels are stored in a logically two-dimensional refresh buffer (also commonly called a bit map, a bit-map memory, or a frame buffer). For a monochrome (black and white) display, only one bit of memory is required to define a pixel, and conceptually the frame buffer is organised as a single bit plane (see Figure 3.1(a)). On the other hand, multiple bits are required to define a pixel for a colour (or grey scale) display, and the frame buffer is conceptually organised as multiple bit planes. A common configuration, illustrated in Figure 3.1(b), is four bits per pixel, in which three bits are used to control the three primary colours and the remaining bit is used to control the intensity. In general, if the number of planes is n, then 2^n different colours can be displayed. The number of bits per pixel is called the colour resolution, or pixel depth.

The frame buffer has a one-to-one relationship with the display screen. This means that each particular coordinate (x, y) (or pixel) on the screen is mapped from a corresponding memory location in the frame-buffer memory. The corresponding memory location is called the physical address of the pixel in the frame-buffer memory. The mapping between the x, y coordinates of the display and the physical address is the most frequently used graphics function in a graphics system, and must be performed efficiently.
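As an illustration of the coordinate-to-address mapping, the following Python sketch computes a word index and bit offset for a pixel. The function name and defaults are hypothetical; it assumes a linear buffer stored row by row with no padding between rows, and a 16-bit word.

```python
def pixel_address(x, y, width, bits_per_pixel=1, word_bits=16):
    """Map screen coordinate (x, y) to (word index, bit offset within the word).

    Assumes a linear buffer stored row by row with no padding between rows.
    """
    bit_index = (y * width + x) * bits_per_pixel
    return bit_index // word_bits, bit_index % word_bits
```

When the width and the word size are powers of two, the multiplication and division reduce to shifts and masks, which is one reason such dimensions are popular in frame-buffer designs.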

Frame-buffer Memory

A frame buffer normally consists of inexpensive solid-state random-access semiconductor memories (RAMs), namely dynamic random-access memories (DRAMs), or static random-access memories (SRAMs). Static RAMs are the best on the basis of circuit convenience for fast graphic applications, since they do not require complex interface circuitry and they can operate faster than DRAMs of equivalent size. However, the high cost and high power consumption of static RAMs are drawbacks. For simple applications, the high cost and power consumption are not justified by the extra performance.

DRAMs provide lower cost per bit, higher density, moderate speeds, and lower power consumption compared to static RAMs. As the density of DRAMs continues to increase and the costs continue to decline, it is now favourable to use DRAMs in many graphic applications. The present competition between US and Japanese manufacturers has resulted in DRAM prices dropping at a very rapid rate (see "Anger in the Valley as Japan routs US", The Australian, June 25, 1985).

Figure 3.1(a): A Monochrome Display

Figure 3.1(b): A Colour Display

Figure 3.1: Mapping of Frame Buffer to Display Screen.

Figure 3.2: Arrangement of a Typical DRAM (row and column address latches, row and column select, memory arrays, and column I/O circuits)

A standard DRAM is organised as a two-dimensional array of rows and columns of memory cells (see Figure 3.2). Each cell is accessed by giving its column and row addresses. Each storage cell is basically a charge-storage capacitor with a driver transistor. The presence or absence of the electrical charge in a capacitor is interpreted by the RAM's sense line as a logical 1 or 0. Due to leakage of the capacitor, the storage cell will gradually lose its electrical charge. Thus, to maintain the stored data, each cell needs to be refreshed within a fixed time interval, typically 2 msec.

To reduce the pin count, a standard DRAM uses a multiplexed addressing scheme, which requires a sequence of events to access each storage cell in the memory. This sequence is: establish row address; assert row address strobe (RAS) (normally the high to low transition of the RAS strobes the row address into the internal latch); maintain row address for some minimum hold time; establish column address; assert column address strobe (CAS); maintain column address for minimum hold time. After the row and column addresses have been latched, the memory can be read or written. The sequence ends with both RAS and CAS being negated (see Figures 3.3 and 3.4).
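The multiplexed access sequence can be written down as a small Python sketch. The function name is hypothetical, and the choice of the high-order address bits for the row and the low-order bits for the column is an assumption; hold times are not modelled.

```python
def access_sequence(address, row_bits=8, col_bits=8):
    """List the steps of one multiplexed access to a 2**(row_bits+col_bits) x 1 DRAM."""
    row = (address >> col_bits) & ((1 << row_bits) - 1)
    col = address & ((1 << col_bits) - 1)
    return [
        ("address pins", row),        # establish row address
        ("assert RAS", None),         # row address latched
        ("address pins", col),        # establish column address
        ("assert CAS", None),         # column address latched
        ("read or write data", None),
        ("negate RAS and CAS", None),
    ]
```

With 8 row bits and 8 column bits, a 64K x 1 device needs only 8 address pins instead of 16, at the price of the two-step sequence shown.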

Figure 3.3: Read Cycle Timing (RAS, CAS, WRITE, and DATA OUT waveforms)

Figure 3.4: Write Cycle Timing (RAS, CAS, WRITE, and DATA IN waveforms)

Figure 3.5: Page Mode Operation (RAS, CAS, and DATA OUT waveforms)

Instead of using the entire RAS-CAS cycle for every single read or write memory cycle, several DRAMs offer operating modes which combine multiple-bit accesses into a single RAS-CAS cycle to give increased bandwidth. These modes include the page, ripple, nibble, and static column modes.

Page Mode

Page mode, supported by most DRAMs, allows random access to any bit in a selected row. The RAS remains asserted during a page-mode access cycle, and a new column address is latched on each CAS cycle, so an individual bit can be accessed. The advantage of page mode is that for multiple-bit accesses, the average access time is shorter than the access time of the normal RAS-controlled cycle, thus offering increased bandwidth (see Figure 3.5).

Nibble Mode

Nibble mode, used in the TMS4275, permits the serial access of a sequence of four bits using an end-around rotation. A dynamic RAM having a typical RAS cycle time of 300 ns can have a nibble cycle time of 75 ns. No available chip is capable of both page and nibble modes, because both are implemented by toggling CAS with RAS asserted [15] (see Figure 3.6).

Figure 3.6: Nibble Mode Operation (RAS, CAS, and DATA OUT waveforms)

Static column mode

The static column mode operation, eg. supported by the Intel 51C259 (64K x 4), permits all bits within a selected row to be accessed at a high data rate. The cycle begins by asserting RAS. Once the row is selected, read, write and read-modify-write cycles can be performed randomly or sequentially within the selected row. The CAS portion of the cycle is similar to a static-RAM cycle: column addresses are not latched by CAS, and access is from valid address, not CAS. All the bits within a selected row can be accessed merely by changing the column addresses. Note that CAS is best regarded as the output enable. For a C-MOS DRAM, a data rate of 25 MHz (or average cycle time after RAS setup of 40 ns) is feasible [16] (see Figure 3.7).

Ripple Mode

Ripple mode is supported by Intel 51C256 CHMOS-D III DRAMs. This mode is compatible with page mode in other DRAMs. CAS serves both as address latch and output enable. When CAS is negated, addresses are allowed to flow through the memory, thus data access starts at valid column addresses, not CAS. This mode provides an additional 15% increase in the data rate compared to the static column mode [17] [18] (see Figure 3.8).

The Intel 2186 video RAM (sometimes called Integrated RAM) combines the best features of the SRAM and DRAM. It contains the entire dynamic RAM system on a single chip, including memory cells, automatic internal refresh circuitry, and arbitration and control logic. By integrating all the DRAM system components on a single chip, it provides the cost, power and density advantages of DRAMs with the ease of use of SRAMs [19].

Figure 3.7: Static-Column Decoding Operation (RAS, CAS, ADDRESSES, and DATA OUT waveforms)

Figure 3.8: Ripple Mode Operation (RAS, CAS, and ADDRESSES waveforms)

Wide-width RAMs, eg. the Mostek MK4856 (32K x 8), allow multiple bits to be accessed in one memory cycle for video output, thus increasing the amount of bandwidth available to the processor to update the video buffer (see section 3.2.2 Graphics Functions, part: Image Update and Frame Buffer Memory Design).

Frame Buffer Configuration

There are basically three major ways in which pixels may be mapped to the frame buffer memory: the planar architecture, the packed-pixel arrangement, and a combination of both.

In a packed-pixel arrangement, the frame buffer is a single linear address space and pixels are defined as fields within a word. For example, the HD63484 Advanced CRT Controller (ACRTC) from Hitachi supports this type of architecture. Its word size is 16 bits, and a word can be defined as four 4-bit pixels, or two 8-bit pixels, or one 16-bit pixel, etc. (see Figure 3.9). All the bits in a pixel can be modified with a single access. This architecture is suitable for applications where intensive pixel value computation, such as shading, is required.
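The field-within-a-word idea can be sketched in Python as follows. The function names are hypothetical, and placing pixel 0 in the low-order bits of the word is an assumption made for the illustration; a real controller may order the fields differently.

```python
def get_pixel(word, index, depth=4):
    """Read pixel number index from a packed word (pixel 0 in the low-order bits)."""
    mask = (1 << depth) - 1
    return (word >> (index * depth)) & mask


def set_pixel(word, index, value, depth=4, word_bits=16):
    """Return word with pixel number index replaced by value; one word access suffices."""
    mask = (1 << depth) - 1
    shift = index * depth
    cleared = word & ~(mask << shift) & ((1 << word_bits) - 1)
    return cleared | ((value & mask) << shift)
```

The same 16-bit word can be read as four 4-bit pixels or two 8-bit pixels simply by changing the depth parameter.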

The packed-pixel approach is attractive for 32-bit architectures because it offers fast pixel processing for a specific pixel depth. It also offers a lower pin and package count than offered by a planar approach. But the rate at which read/modify/write operations can be performed on pixel data varies with the number of bits defined per pixel.

On the other hand, in the planar architecture, each word (normally 16 bits) in the frame buffer maps to multiple bits within a single bit plane, and the processor addresses a word in each plane (see Figure 3.10). Therefore, to change a single bit in a pixel the entire word containing the bit must be accessed, and to update one single pixel requires accessing n words (for an n-bit pixel), either sequentially or in parallel.
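The cost of a single-pixel update in a planar buffer can be shown with a short Python sketch. The function name and the list-of-lists representation of the planes are hypothetical; the planes are visited sequentially here, whereas hardware may access them in parallel.

```python
def set_pixel_planar(planes, word_index, bit, value):
    """Write an n-bit pixel value into n bit planes, one word access per plane.

    planes is a list of lists of words, planes[k] holding bit k of every pixel.
    Returns the number of word accesses performed.
    """
    accesses = 0
    for k, plane in enumerate(planes):
        if (value >> k) & 1:
            plane[word_index] |= 1 << bit
        else:
            plane[word_index] &= ~(1 << bit)
        accesses += 1
    return accesses
```

Updating one 4-bit pixel therefore touches four words, one in each plane, while the other 15 pixels sharing those words are left unchanged.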

The Am95C60 Quad Pixel Dataflow Manager (QPDM) from Advanced Micro Devices supports the planar approach. It addresses four bit-planes in parallel, and a 16-bit word in each bit-plane. Thus a single QPDM can read/modify/write to 16 4-bit pixels simultaneously. It is possible to add several QPDMs to work in parallel to increase the pixel depth without reducing the speed. Furthermore, a planar-oriented frame buffer has an efficiency advantage in handling text. The bit patterns for characters can be stored as templates in a single plane of an off-screen frame-buffer area. When they are transferred to the display portion of the frame buffer with a bitblt operation, they also can be given attributes that fill in the other bits in the remaining planes of the affected pixels to give them the desired colour.

On the other hand, to display text in a packed-pixel scheme, so-called colour expansion is used. The character fonts are stored as templates in a single plane of an off-screen area (as used by the TMS 34010 Graphics System Processor (GSP)) or in the internal pattern RAM (as used by the ACRTC). When a character is transferred to the display area, the single plane is transferred into multiple planes to give the desired colours (see Figure 3.11). This requires processing each bit in the single plane one at a time, and this process can be slow. TI calls this the colour-expand operation. Because of the difference, TI refers to its pixel moves as pixblts rather than bitblts.

Figure 3.9: Packed-Pixel Architecture (pixel addresses in physical and logical space; example of 4 bits/pixel)

Figure 3.10: Planar Architecture (one word in each of planes 1 to N, in physical and logical space)
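The colour-expand step can be sketched in Python as below. The function name is hypothetical, and the representation of a template row as an integer whose most significant bit is the leftmost pixel is an assumption; fg and bg stand for the contents of the '1' and '0' colour registers of Figure 3.11.

```python
def colour_expand(template_rows, width, fg, bg):
    """Expand a 1-bit-per-pixel character template into pixel values.

    A 1 bit takes the '1' colour register value (fg), a 0 bit the '0' colour
    register value (bg). Each template bit is examined one at a time.
    """
    return [
        [fg if (row >> (width - 1 - col)) & 1 else bg for col in range(width)]
        for row in template_rows
    ]
```

The inner loop visits every template bit individually, which is the per-bit processing that makes the operation slow when it is not supported in hardware.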

The planar architecture is suitable for engineering and business applications since these applications require intensive data creation and movement but fewer pixel operations. The scheme is cheap to implement, and for large frame buffers, its performance is good.

One problem associated with this architecture is that due to the word boundary, data movements are restricted within the boundary: ie. the relative pixel locations of the source must be the same as those of the destination. If they are not the same, then during the transfer, the source pixels must be shifted. Since the overhead to provide such manipulation is significant, a specialised fast hardware shifter (a barrel shifter) is required to provide a fast arbitrary move down to pixel level (see Figure 3.12).
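The shifting needed for a move that is not word-aligned can be imitated in software, as in the following Python sketch. It is a model of what a barrel shifter achieves, not a description of any particular device; the function name is hypothetical, one bit per pixel is assumed, and bit 15 of each word is taken as the leftmost pixel.

```python
WORD = 16
MASK = (1 << WORD) - 1


def shift_row(src_words, shift):
    """Shift a row of 16-bit words right by shift pixels (0 <= shift < 16).

    Bits shifted out of one word are carried into the next word. One extra
    word is appended to hold the spill-over from the last source word.
    """
    out = []
    carry = 0
    for w in src_words:
        out.append(((w >> shift) | carry) & MASK)
        carry = (w << (WORD - shift)) & MASK if shift else 0
    out.append(carry)
    return out
```

Done in software, this costs a shift, a mask, and an OR per word moved; a hardware barrel shifter performs the same alignment in a single step.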

The combined architecture provides the capability of accessing the frame buffer at either word width or pixel depth (see Figure 3.13). Because of the high costs, this architecture in the past has been implemented only in the more expensive graphic systems.

Figure 3.11: Colour Expansion Operation (a template in the off-screen area is expanded, using the '1' and '0' colour registers, into the display area)

Figure 3.12: Translation (source-to-destination moves with and without word-boundary alignment; a barrel shifter handles the non-aligned case)

Figure 3.13: Pixel and Planar Architecture

Figure 3.14: A Control Processor Driven Graphics System

3.2 Raster Graphics System

A typical raster graphics system includes the following components: a control processor, interactive devices, the graphics processor, the frame buffer, video refresh controller, and video output circuitry.

In fact, the display processor does not have to be a dedicated processor: it is very common for the control processor to perform all the graphics functions. The graphics processor can read from and write to the frame buffer. There are a number of arrangements by which the graphics processor can access the frame buffer. The simplest case is where the frame buffer is on the system bus and the graphics functions are performed by the control processor (see Figure 3.14).

Alternatively, a graphics processor can be added to the system bus: thus, the frame buffer can be directly accessed by either the control processor or the graphics processor (see Figure 3.15).

Figure 3.15: A Graphics Processor Driven Graphics System

Often, the frame buffer memories are dual-port memories which have two data paths on their input/output. This allows the video refresh controller to use one port and the graphics processor to use the other (see Figure 3.16).

Another possibility is for the frame buffer to have a dedicated port to the graphics processor and for all data movements between the control processor and the frame buffer to pass through the graphics processor (see Figure 3.17).

The video refresh controller reads the contents of the frame buffer in the proper format and timing, and then feeds it to the video output circuitry. The video output circuitry includes a monitor, digital-to-analog converter(s) (DACs), and optionally a colour lookup table (LUT). The main function of the video output circuitry is to convert the data values into colour levels which are then passed to the display monitor.

3.2.1 Colour Lookup Table

A very economical way of offering selection from a large range of colours without using many bits per pixel is to use a video colour lookup table (or colour table, colour map, colour palette). Instead of connecting the outputs of the frame buffer directly to the intensity digital-to-analogue converters, the output is used as an index into the table, and the table entries, which generally have more bits than the index value, are connected to the DACs. For example, if there are 8 bits per pixel and each table entry is 12 bits long, then 256 (2^8) colours out of a palette of 4096 (2^12) can be simultaneously displayed (see Figure 3.18).

Figure 3.16: Dual-port Memory

Figure 3.17: A Graphics Processor Driven Graphics System (Note the frame buffer has a dedicated port to the graphics processor.)

Figure 3.18: A Colour Graphics Display System with Colour Lookup Table
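The lookup step can be illustrated with a few lines of Python. The names are hypothetical, and the split of each 12-bit entry into 4 bits each of red, green, and blue (red in the high-order bits) is an assumption made for the example.

```python
INDEX_BITS = 8     # bits per pixel in the frame buffer
ENTRY_BITS = 12    # bits per table entry

lut = [0] * (2 ** INDEX_BITS)   # 256 entries, one per possible pixel value
lut[5] = 0xF80                  # pixel value 5 shows as full red, half green


def lookup(table, pixel_value):
    """Return the (r, g, b) levels passed to the DACs for one frame-buffer value."""
    entry = table[pixel_value]
    return (entry >> 8) & 0xF, (entry >> 4) & 0xF, entry & 0xF
```

Rewriting a single table entry changes the colour of every pixel holding that index without touching the frame buffer, which is the basis of colour-table animation.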

3.2.2 Graphics Functions

A graphics system must perform the following basic functions: image creation and manipulation, video refresh, timing for the display, and, if DRAMs are used for the frame buffer, memory refresh. A good design must perform all the functions very efficiently. One major problem is the conflict between the screen refresh and image update. Memory bandwidth is the limiting factor, and the frame buffer cannot be accessed in an arbitrary fashion to perform all the functions. Compromise and arbitration are required.

Screen Refresh

To refresh the screen, the entire image is scanned out sequentially, one raster line at a time, from top to bottom. The amount of time required to read a word of memory is called the memory cycle time. The rate at which pixels must be supplied to the display is called the video rate. However, most DRAMs have longer cycle times than the pixel access time (pixel access time = 1/video rate). To solve the problem, multiple pixels must be read out in one memory cycle and loaded into shift registers to be shifted out one at a time. For example, a typical display has the following specifications: 512 x 512 pixels and a 30 Hz frame refresh rate. Taking into consideration both the horizontal and vertical retrace intervals, the above specification yields a minimum access time of 100 ns per pixel. This means that a pixel must be supplied to the display monitor every 100 ns. But a typical DRAM will have a memory cycle time of 260 ns. To use these slow memories, a typical frame buffer would normally consist of multiple memory chips, with the video refresh controller accessing several chips simultaneously. For instance, in theory, if three successive pixels can be read out in one memory cycle, and one pixel is shifted out every 100 ns, then memories with a 260 ns cycle time can meet the refresh rate requirement. But in practice, the number of pixels read out in one memory cycle is constrained by other factors, such as the frame buffer configuration, word size, and pixel depth.
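The arithmetic behind this example can be checked with a short Python sketch. The function names are hypothetical, and the active fraction of 0.79 (the share of each frame period not spent in retrace) is an assumed value chosen only because it reproduces the 100 ns figure quoted above.

```python
import math


def pixel_time_ns(width, height, frame_rate_hz, active_fraction):
    """Time available per displayed pixel, in nanoseconds.

    active_fraction is the share of the frame period outside the horizontal
    and vertical retrace intervals (an assumed parameter of this sketch).
    """
    frame_ns = 1e9 / frame_rate_hz
    return frame_ns * active_fraction / (width * height)


def pixels_per_cycle(memory_cycle_ns, pixel_ns):
    """Minimum number of pixels to fetch per memory cycle to keep up with the display."""
    return math.ceil(memory_cycle_ns / pixel_ns)
```

For a 512 x 512 display at 30 Hz the pixel time comes out close to 100 ns, and a 260 ns memory must therefore deliver at least three pixels per cycle.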

Memory Refresh

As mentioned before, if a frame buffer consists of DRAMs, then these DRAMs need to be refreshed within a fixed interval. This refresh requirement must compete with the video refresh and update of frame buffer for available memory access cycles.

Image Update and Frame Buffer Memory Design

The system performance is affected by the different components of a raster graphics system: the speed of the graphics processor, the architecture of the frame buffer memory, and the methods by which the graphics processor and the video refresh controller access the frame buffer. Each one of the components can be the bottleneck in a graphics system. Often a compromise must be made if the available bandwidth is limited for a given design of the frame buffer. For instance, if the graphics processor is assigned a higher priority than the video refresh controller, it can perform drawings during the video refresh period. However, a viewer may find the flashing effect, the result of the drawing and video refresh contention, very unpleasant. On the other hand, if the video refresh has a higher priority, the bandwidth available to the graphics processor can be very limited. The bottleneck is moved to the input mechanism, used by the graphics processor to access the frame buffer, if the graphics processor can produce pixels at a great rate and the frame buffer cannot keep up with it. Obviously, the speed at which the graphics processor can prepare pixels is also an important factor. Furthermore, these three components are inter-related and cannot be separated when designing a graphics system.

Figure 3.19: Texas Instruments TMS4161 Dual-port RAM (a standard 64K x 1 port, and a serial port served by a 256-bit shift register)

The architecture of the frame buffer plays a very significant role in a graphics system [20]. There exist various techniques to reduce the memory contention while also attempting to increase the bandwidth available to the graphics processor and thus increase the overall performance. The best solution to the problem of contention between video refresh and image update is to remove the contention by using dual-port RAMs. Texas Instruments was the first manufacturer to introduce a special dual-port RAM for graphics applications (64K-RAM TMS4161). This is a dual-port device, in which one port operates as a standard 64K x 1-bit memory and the other as a 256 x 1-bit shift register. 256 bits can be loaded in parallel into the internal shift register, and then shifted out sequentially to the video output circuitry. The standard port is used by the graphics processor and the other port is used by the video refresh controller. This technique will give the graphics processor almost 100 per cent of the available bandwidth for updating images (see Figure 3.19).

Before the introduction of dual-port RAMs, effort was mainly expended on increasing the graphics processor's access to the frame buffer. The simplest image update scheme is to allow frame buffer update only during the horizontal and vertical blanking intervals of the raster scan. Since the blanking intervals total only 25% to 35% of the frame time, update access to the frame buffer is very limited [21]. Another method of increasing the graphics processor's access to the frame buffer is to use cache memory. The cache is a small but very fast memory which sits between the graphics processor and the frame buffer. When the graphics processor has created one pixel datum, it is temporarily stored in the cache. When the cache is full, the entire contents are loaded into the frame buffer in parallel in one memory cycle. However, the timing-control circuitry, which is required to coordinate the graphics processor, the cache memory, and the frame buffer, is complex and expensive.

Another technique is to use double-buffering: one frame buffer, which is non-visible, is used to update the image, while the other one, which is visible, is used for display. When an image update is completed, the roles of the buffers are swapped, with the update buffer being displayed, and the buffer previously used for display becoming the update buffer. If each of the buffers consists of an entirely separate set of memory chips, then the graphics processor and the video refresh controller have complete access to their respective buffers, and because there is no memory contention between the video refresh and image update, this technique provides a much better performance than updating only during blanking intervals. The tradeoff is that each pixel often has to be drawn twice and that the memory requirement is twice that of a single frame buffer. However, if only one set of memory chips is shared by both the graphics processor and the video refresh controller, then this technique cannot gain any access time, although it does provide smooth image transition [16] [21].

Another common technique is to interleave the video refresh and image update cycles. At every video refresh cycle, the video refresh controller reads out as many pixels as possible so that between every video refresh cycle, there will be some spare memory cycles available to the graphics processor to access the frame buffer. Taking the above example again, if six pixels can be read out in one memory cycle, then there is one spare memory cycle available for every video refresh cycle (100 x 6 - 260 = 340 ns, which is longer than a memory cycle). The general rule is that the greater the number of successive pixels that can be read out in one video refresh cycle, the more cycles will be available to the graphics processor. (One point to note is that although a high density n x 1-bit DRAM can have the same capacity as the many small m x 1-bit DRAMs which constitute a frame buffer, it cannot be used alone; it must be used with others in parallel to give enough bandwidth. Take a 256K x 1-bit DRAM with a cycle time of 260 ns as an example. It can contain an image of 256 x 256 with 16 colours (4 bits per pixel), but it is impossible to access the DRAM every 100 ns to update the display.)
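The interleaving rule can be expressed as a one-line calculation, sketched here in Python. The function name is hypothetical; the default timings are the 100 ns pixel time and 260 ns memory cycle time used in the example.

```python
def spare_cycles(pixels_per_fetch, pixel_ns=100, memory_cycle_ns=260):
    """Memory cycles left for image update between two consecutive video refresh fetches."""
    idle_ns = pixels_per_fetch * pixel_ns - memory_cycle_ns
    return max(idle_ns, 0) // memory_cycle_ns
```

Fetching six pixels at a time leaves 340 ns idle, enough for one update cycle, whereas fetching three leaves only 40 ns and no usable cycle.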

Now, with the new powerful graphics processors and fast arithmetic processors (or math processors) becoming commonplace, the input mechanism of the frame buffer is usually the bottleneck. Some designs have concentrated on the ability to manipulate multiple pixels in parallel. The image creation operations are often not dependent on scan lines like video refresh, but rather on locality. That is, if a particular pixel of an image is updated, its nearby pixels are also likely to be updated (this is also called the coherence property of graphical images). Based on this principle, some work has been done on arranging the frame buffer organisation to allow the graphics processor to update square areas for improved performance [22] [23] [24].

Some techniques have also been explored for designing systems with cache memories. By using a cache memory, the graphics processor can calculate pixels at full speed [20] [22]. Goris [22] has also examined the possibility of a configurable cache memory. The organisation of the cache memory can be 16 x 1 or 4 x 4, depending on the type of operation. For example, the 16 x 1 arrangement is best for drawing horizontal lines, while the 4 x 4 arrangement is suitable for lines of any other slope. In a similar fashion, Fairchild Semiconductor Corp. has devised a reconfigurable frame-buffer architecture. The frame buffer can be dynamically configured into various array sizes, such as 64 x 1, 4 x 16, 16 x 4, and 8 x 8 organisations [23].

3.3 Graphics Processor

In older raster graphics systems, the control processor had to perform the various functions mentioned above. This would leave only a small amount of time for other tasks. To off-load the basic tasks from the control processor, attempts have been made to produce intelligent graphics processors which perform most of the graphics functions.

These graphics processors offer many advantages over conventional systems. The hardware design becomes much simpler. For instance, the complex timing circuitry for video, memory control, CPU and others, which previously required discrete components, has been integrated into one or a few chips. The new graphics controllers attempt to provide most of the timing signals, which may also be software configurable. Some of them also provide high-level instruction sets. For instance, the algorithms for drawing the most commonly used primitives, such as lines, arcs and polygons, are implemented on the chip, and these primitives are close to the graphics primitives specified by standard graphics interfaces such as GKS, NAPLPS, or CGI.

The chief differences among the graphics processors are how they manage the frame buffer and how they are programmed; both factors affect performance. Some graphics processors support a packed-pixel architecture while others support a planar architecture.

There are basically two major approaches to designing a graphics processor. One emphasises programmability, while the other emphasises speed by means of on-chip hardwired functions. The main difference between these two approaches is the tradeoff between generality and performance. The chips designed to optimise programmability, such as the TMS34010 GSP, are general-purpose processors with very powerful instruction sets. All the graphics drawing algorithms must be implemented by the user. This gives the implementor a great degree of freedom to implement any graphics drawing algorithm, and to control or tune an algorithm to optimise performance or picture quality. However, the cost of the software development may be high. In the case of the TMS34010, although a library of graphics routines is available, its $US 10,000 price tag cannot be justified unless volume production is expected [25].

The second approach is to provide fixed features while optimising performance by means of on-chip parallel processing. This approach generally yields higher performance than the first if the desired features match those provided by a particular chip. The programming models for these chips are very simple, with only high-level graphics commands being required, which makes software development easy and productive. The drawback is that since the graphics capabilities are fixed, a mismatch between the functionality provided and that required may result in poor performance and/or great complexity in the algorithms.

Advances in technology have made low-cost, high-performance raster graphics systems more feasible and attractive. The availability of a new generation of graphics processors, video RAMs, DRAMs with fast operation modes and arithmetic processors, together with the decline in DRAM prices, has made it possible for designers to apply their ingenuity to a great variety of graphics systems.

[Block diagram: the MPU interface, CRT interface, drawing processor, display processor and timing processor, together with the DMA control and interrupt control units and the frame buffer address/data bus.]

Figure 3.20: ACRTC Function Block Diagram

3.3.1 ACRTC

When the project started, only the NEC 7220 Graphics Display Controller (GDC), the TMS9929A (TI) Video Display Processor, and the EF9367 (Thomson-CSF) Graphics Display Processor were readily available locally, so the choice was limited to these three. The TMS9929A was not suitable because of its limited maximum spatial and colour resolutions (256 x 210 and 16 respectively). The EF9367 was designed for video applications: it supports only fixed formats and can draw only vector lines. So the NEC 7220 was selected for the first prototype.

The 7220 is software-programmable to generate different types of video display format. It is capable of drawing lines, arcs/circles, rectangles, and graphic characters at a rate of 1.25 million pixels per second.

After the first prototype, the HD63484 ACRTC was introduced by Hitachi (see Figure 3.20). The HD63484 is a new-generation graphics chip which has all the features of the 7220 and is also much more powerful. The ACRTC consists of five major functional blocks: the microprocessor unit (MPU) interface, the CRT interface, the drawing processor, the display processor and the timing processor. The MPU interface provides the interface to the control processor. The CRT interface controls the frame-buffer bus and the CRT timing signals. The drawing processor interprets commands issued by the control processor and performs the drawing operations on the frame-buffer memory. The display processor manages the frame-buffer refresh addressing based on the given specification of screen organisation. Finally, the timing processor provides the CRT synchronisation signals and other internal signals for the ACRTC.

An outstanding feature of the ACRTC is its Interleaved Access Mode. In this mode, display cycles (video refresh cycles) and drawing cycles are interleaved. A display/drawing cycle is defined as four system clock cycles. During the first clock cycle, the frame buffer display address is presented. During the second clock cycle, the display data is read from the frame buffer and passed to the video output circuitry to refresh the display. During the third cycle, the frame buffer drawing address is presented. During the fourth cycle, the drawing data is read or written. Since there is no contention between the display and drawing cycles, a flashless display is obtained while maintaining full drawing speed (see Figure 3.21).

Another feature of the HD63484 is that drawing speed is the same for monochrome and colour displays. In contrast, the 7220 has to draw a figure several times for colour displays. The HD63484 also has more powerful drawing commands than the 7220. The 7220 uses the same drawing command to draw different types of figure, the only difference being that the parameters which follow the command are calculated differently. The number of parameters needed varies from zero to 11, and these calculations are rather complex. The drawing commands of the HD63484, on the other hand, are much easier to use. For example, to draw a circle, the cursor is first positioned and then the drawing command Circle (CRCL), followed by its parameter, is sent to the device. This draws a circle, with the radius specified in pixels by the 16-bit parameter, centred at the point indicated by the cursor. Since the HD63484 went into production, several other manufacturers have introduced new graphics processors whose performance exceeds that of the HD63484 to various degrees.

One point to emphasise is that the control processor has no direct access to the frame buffer: all data movement passes through the ACRTC. Another point is that the ACRTC seems to be designed to work only with standard static RAMs or DRAMs, and it cannot take advantage of the other operation modes, such as page, nibble and static column modes, offered by new DRAMs to speed up graphics execution.

[Timing diagrams: a display cycle followed by a drawing cycle (read), and a display cycle followed by a drawing cycle (write), each showing the system clock, the address strobe and the frame buffer address/data bus over one display/drawing cycle.]

Figure 3.21: ACRTC Interleaved Access Mode (**display data read from frame buffer)

[Timing diagram: RAS, CAS, WRITE and DATA IN signals.]

Figure 3.22: Early-Write Cycle

To minimise board space, new designs of graphics systems with higher spatial and colour resolution use high-density DRAM modules in Single-In-Line (SIP) packages. But these DRAM modules share common I/O pins and use the so-called early-write cycle to prevent contention between input and output.¹ In a standard write cycle, the row address and column address are latched into the memory chip first, and then the write-enable signal is asserted to strobe the data into the memory. In contrast, for an early-write cycle, the write-enable signal is asserted prior to CAS and the data is strobed in by CAS (see Figure 3.22).

¹ See data sheet: TM4164FL8 65,536 by 8-bit Dynamic RAM Module, November, 1983, Texas Instruments Inc.

[Timing diagram: a drawing cycle (write) followed by a display cycle, showing the system clock, the address strobe and the ACRTC frame buffer address/data bus, together with the DRAM RAS, CAS and data-out signals.]

Figure 3.23: Early-Write and Read Cycles

An ACRTC frame-buffer-memory write cycle consists of two system clock cycles. During the first cycle, the frame buffer drawing address is presented. During the second cycle, the drawing data is written. Since the ACRTC frame buffer memory address and data bus are time multiplexed, the data from the ACRTC will not be valid until the second cycle. Thus, CAS must be delayed until the second cycle. The timing requirements of both the DRAMs and the ACRTC must be studied in great detail to ensure that a successful interface between the two can be designed. Apart from other timing requirements, two important factors have created timing problems. First, the CAS signal must be asserted for a certain minimum period. Second, when a memory cycle is completed (i.e. when both RAS and CAS are negated), RAS must remain negated for a certain minimum period (called the pre-charge time) before the next memory cycle starts. Thus, after a write cycle, a read cycle (which always follows a write cycle) cannot start immediately: it must be delayed to meet the pre-charge time. The net result is that the effective memory cycle of this particular type of memory, when used with the ACRTC, is longer than that of standard memory; in other words, the cycle time of the memory is not matched to the ACRTC's frame-memory cycle time. This implies that to interface the ACRTC and the DRAM, either the ACRTC's system clock must be slowed down or faster DRAMs must be used. The DRAMs presently used have a cycle time of 260 ns, for an effective cycle time of 317 ns. This cycle time is 22% longer than the standard cycle time. As a consequence, graphics drawing is slowed by a corresponding amount (see Figure 3.23).
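The size of the early-write penalty can be verified with a short calculation. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes that drawing is entirely limited by the frame-buffer memory cycle, which the text does not state explicitly. It distinguishes the increase in time per memory access from the corresponding loss in access rate.

```python
def cycle_penalty(standard_ns, effective_ns):
    """Return (fractional increase in cycle time, fractional loss in access rate)."""
    extra_time = effective_ns / standard_ns - 1.0
    rate_loss = 1.0 - standard_ns / effective_ns
    return extra_time, rate_loss


if __name__ == "__main__":
    extra_time, rate_loss = cycle_penalty(260, 317)
    print(f"cycle time increase: {extra_time:.1%}")
    print(f"access rate loss:    {rate_loss:.1%}")
```

Under this assumption, a cycle that is about 22% longer corresponds to roughly 18% fewer memory accesses per unit time.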

A rather complex timing control logic is required to support the early-write mode. The scheme presently used is a design optimised to give the shortest effective cycle time. The circuitry includes two CAS signal generators, one for the read cycle and the other for the write cycle, and according to the type of memory access, the circuitry selects one of them to drive the DRAMs (see Figure 3.24).

[Block diagram: the address strobe drives a RAS signal generator and two CAS signal generators (#1 for the write cycle, #2 for the read cycle); a select signal chooses which CAS output drives the DRAMs.]

Figure 3.24: RAS-CAS Generation Circuitry

3.3.2 System Hardware Design

The control processor for the decoder is a Motorola MC68000 microprocessor unit running at an 8 MHz clock speed (see Figure 3.25). An MC6850 Asynchronous Communications Interface Adapter (ACIA) handles the communication between the host computer and the NAPLPS decoder. A second MC6850 ACIA is used to accept user input from the keyboard. (This may be changed later depending on the type of keyboard which will be used.) The MC6850 ACIA is simple to use, inexpensive, and readily available, so it was selected for the current prototype. It may be replaced in the future by a new device which provides several I/O channels on a single chip. An MC6840 Programmable Timer Module (PTM) provides the timing references for the system. For instance, decoding the PDI BLINK command requires a reference interval of 1/10 of a second. NAPLPS decoding routines will eventually be stored in Read-Only Memory (ROM). Static random access memory provides the system software's working area: because the software's working space is not very large, the increased circuit complexity required by DRAMs is not offset by their lower cost and power consumption.

The prototype currently has 512 Kbytes of frame-buffer memory to store the bit-mapped image. This is enough capacity for the required 640 x 480 x 8 frame buffer. (However, it may not be sufficient when all the NAPLPS sets, such as the DRCS, are decoded.) The frame buffer consists of eight 64K by 8-bit DRAMs. The frame buffer memory is controlled by the ACRTC.
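The capacity claim can be confirmed by a simple calculation. The following Python sketch is illustrative only (the function name is hypothetical); it compares the storage needed for a 640 x 480 image at 8 bits per pixel with the eight 64K by 8-bit DRAM modules fitted to the prototype.

```python
def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for a bit-mapped image of the given dimensions."""
    return width * height * bits_per_pixel // 8


required = frame_buffer_bytes(640, 480, 8)   # displayed image
installed = 8 * 64 * 1024                    # eight 64K x 8-bit modules
spare = installed - required                 # left for off-screen use

if __name__ == "__main__":
    print(required, installed, spare)
```

The displayed image occupies 300 Kbytes, leaving 212 Kbytes of the 512 Kbytes for any off-screen storage.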

The MPU decodes and interprets all incoming NAPLPS codes and processes user input. When required, the MPU sends commands to the ACRTC to update the frame buffer. Under the MPU's control, the ACRTC also handles the screen control functions needed to incorporate other components of the video display circuit. For instance, it provides programmable synchronisation signals for the display monitor, and several programmable hardware cursors.

A single Brooktree Bt453 Colour Palette RAMDAC (Video RAM-Digital-to-Analog Converter) is used to provide all the functions of the video output circuitry. The device integrates a 256-colour-by-24-bit colour-lookup table, three 8-bit digital-to-analog converters, and three video buffers on a single chip. It provides RS-343-A-compatible Red-Green-Blue (RGB) analog video output, and supports up to 256 simultaneous colours from a palette of 16.8 million colours. This device is well suited to the system because of its low cost, availability, high integration, circuit convenience and functionality.

The system will provide several video output formats to suit different types of monitors available to the user or to suit the user's requirements. The MC1377 Colour Television RGB to PAL/NTSC Encoder converts RGB output format into PAL video format, so that users with PAL input monitors can use this output format; the MC1374 TV Modulator Circuit provides a modulated PAL signal to drive a standard television set.

All the processes are interrupt driven, except the main routine, which is a tight loop whose function is to poll the Lex input buffer and to generate code for the ACRTC. Since there is a limited number of peripheral devices on the system bus, and all are MC6800 compatible, autovector interrupts are used. Vectoring simplifies the interrupt-handler routines, and autovectoring simplifies the circuitry required to generate the interrupt vectors. ACRTC commands are sent from the control processor to the ACRTC via an 8-word (or 16-byte) first-in-first-out (FIFO) memory on the ACRTC. Measurements have indicated that if the commands are sent as blocks of words, execution is twice as fast as when each word is sent as soon as the control processor has made it available. This is probably due to the considerable overhead of the interrupt handling routine. Thus, to gain the maximum performance, ACRTC commands are sent to the device as blocks of words, and only when the FIFO is empty.
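The block-transfer policy can be sketched as follows. This is a simulation in Python, not the interrupt-driven MC68000 code used in the prototype: the class `FakeAcrtcFifo` and the function names are hypothetical, and the only figure taken from the text is the FIFO depth of eight words.

```python
FIFO_DEPTH_WORDS = 8  # depth of the ACRTC command FIFO, as stated in the text


class FakeAcrtcFifo:
    """Hypothetical stand-in for the ACRTC command FIFO."""

    def __init__(self, depth=FIFO_DEPTH_WORDS):
        self.depth = depth
        self.words = []      # words waiting in the FIFO
        self.received = []   # words the device has consumed

    def is_empty(self):
        return not self.words

    def write_block(self, block):
        if not self.is_empty():
            raise RuntimeError("block written while FIFO not empty")
        if len(block) > self.depth:
            raise ValueError("block larger than FIFO")
        self.words = list(block)

    def drain(self):
        """Simulate the device consuming everything in the FIFO."""
        self.received.extend(self.words)
        self.words = []


def send_commands(fifo, words):
    """Send command words in FIFO-sized blocks, one block per empty FIFO."""
    transfers = 0
    for start in range(0, len(words), fifo.depth):
        if not fifo.is_empty():
            fifo.drain()  # stands in for waiting until the FIFO is empty
        fifo.write_block(words[start:start + fifo.depth])
        transfers += 1
    fifo.drain()
    return transfers
```

In this sketch, twenty command words are delivered in three transfers rather than twenty, which is where the reduction in per-word overhead would come from.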

[Figure 3.25: decoder hardware block diagram]

Chapter 4

Software

A cursory study of the NAPLPS protocol confirms its structural similarity to a programming language, if each element in the stream is regarded as a lexical token. (This observation is confirmed by Stevens' [26] study as well.) Thus, well-known compiler techniques can be used to recognise the NAPLPS protocol and to generate the ACRTC command sequences required to present the graphics pictures described by the protocol. In particular, since the NAPLPS protocol can be described by an LL(1) grammar, a predictive parser (e.g. recursive-descent, without backtracking) could be written. Alternatively, since LL(1) grammars are in practice also LALR(1), an LALR parser-generator (or "compiler-compiler") could be used to generate a parser. One of the best-known LALR parser generators is Yacc (Yet Another Compiler-Compiler [27]): because of its flexibility and availability, Yacc was used to generate the decoder code in this project.

To use Yacc, the NAPLPS protocol must be expressed as a Yacc specification. This specification includes a low-level routine (lexical analyser) to do the basic input, production rules (or grammar rules) describing the NAPLPS input structure, and user-supplied code (or actions or semantic functions) to be invoked when each rule is recognised. From the specification, Yacc builds a parser (called yyparse) to control the decoding. This is a standard LR parser, consisting of an input, an output, a stack to hold parsing states, a transition matrix to derive a new parsing state for each possible combination of current state and next input token, a table of user-defined actions which are to be invoked when certain structures have been recognised, and an interpreter to control execution. The parser repeatedly calls the lexical analyser to recognise the basic elements (or tokens) from the NAPLPS input stream. These tokens are parsed according to the input structure rules (production rules or grammar rules): when one of these rules has been recognised, the user-defined code associated with the rule is invoked to perform the desired actions, such as drawing lines and arcs or displaying characters (see Figure 4.1).

The lexical analyser was generated by Lex [28], a UNIX program often used in conjunction with Yacc. The tokens of the NAPLPS protocol were expressed as a Lex specification, in which the tokens are described by regular expressions (or patterns), and a user-defined program segment is supplied for each pattern. From the specification Lex constructs a lexical analyser (a table-driven C function called yylex), which reads an input stream and partitions it up into tokens, executing the program segment when each token is found. In this case, the program segments for the most part simply pass tokens on to yyparse and perform a few simple record-keeping tasks.

NAPLPS specifies all primitives, including characters, as graphics, and all the character sets use the same text attributes [4] (except font patterns). Thus, implementation is simplified if all the character sets are handled in a uniform fashion. With the exception of the PDI set, which accepts a variable-length operand, all character sets accept a single-byte operand. For these reasons, the decoder is designed to have only two main decoder states: the G-set state and the PDI state.

The lexical analyser recognises all the escape sequences as well as a few NAPLPS control sequences, and maintains the state information according to the escape sequences. The state information includes the machine state of the parser and the NAPLPS code-extension structure. By passing a few control tokens to yyparse, the lexical analyser also directs it to enter the proper state whenever an escape sequence causes a state change.

[Block diagrams. (a) Lex and Yacc generate the lexical analyser (yylex) and the parser (yyparse, with its driver routine, stack and parsing table) from the lexical rules and the grammar rules, including user-supplied code such as the NAPLPS decoding routines. (b) In use, yylex converts the NAPLPS input stream into a sequence of tokens for yyparse, which invokes the NAPLPS decoding routines.]

Figure 4.1: Software Structure. (a) Code Generation (b) Code In Use

All other tokens are passed to yyparse for further analysis. Since the operand length of a PDI command varies and an unlimited number of operands may follow a command, preparing rules for the parser to decode the PDI commands presents a problem. For example, a typical PDI command format is a single-byte command followed by a block of operands, the length of this block varying between one and eight bytes according to the multi-value operand length parameter [4]. Another example is the polygon command, which may accept an unlimited number of vertices. To deal with such problems, PDI operands were specified by recursive grammar rules in Yacc, and a simple routine is called whenever a single operand byte is recognised. This routine knows which PDI command is currently in use and its operand length. When the routine recognises that enough bytes (all belonging to the same operand block) have been received, it calls the appropriate graphic drawing routine.
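The operand-collecting routine described above can be sketched as follows. This is a minimal Python illustration, not the C code generated for the decoder: the class, its method names and the command name "POINT" are hypothetical, and the decoding of an operand block into coordinates is omitted.

```python
class OperandAccumulator:
    """Collects operand bytes and dispatches complete operand blocks."""

    def __init__(self, operand_length, handlers):
        if not 1 <= operand_length <= 8:
            raise ValueError("operand length must be between 1 and 8 bytes")
        self.operand_length = operand_length  # multi-value operand length
        self.handlers = handlers              # command name -> drawing routine
        self.command = None                   # PDI command currently in use
        self.pending = bytearray()            # bytes of the incomplete block

    def start_command(self, command):
        """Called when a PDI command byte is recognised."""
        self.command = command
        self.pending.clear()

    def operand_byte(self, byte):
        """Called by the parser action for every single operand byte."""
        if self.command is None:
            return  # operand with no command in force: ignored in this sketch
        self.pending.append(byte)
        if len(self.pending) == self.operand_length:
            block = bytes(self.pending)
            self.pending.clear()
            self.handlers[self.command](block)  # enough bytes: draw
```

Because the routine is invoked once per byte, the same recursive grammar rule serves both a fixed number of operand blocks and an open-ended list such as the vertices of a polygon.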

The software was designed in a top-down fashion. The top level consists of the lexical analyser and the parser routines. Below these are the UNIX-like I/O and the actual hardware-dependent graphic drawing routines. The lowest level is the utility library. This library includes certain routines required by the I/O routines for manipulating I/O buffers and handling interrupts. NAPLPS has only a few basic graphic primitives (characters, point, rectangle, arc, and polygon), so decoding the PDI commands requires only a few basic drawing operations. The routines to perform those basic operations are also included in the library.

Only one general character-handling routine is used to implement all character sets. The routine handles all text attributes and updates the appropriate data structures. Each G-set has one function associated with it, and the function knows which character pattern should be passed to the general character-handling routine. The parser has no knowledge of which G-set is in use: a pointer to a function is used to refer to the appropriate G-set function. The pointer is maintained by the lexical analyser, and the parser uses it to invoke the appropriate G-set function.
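The function-pointer arrangement can be illustrated with a small sketch. The Python below is a stand-in for the C function pointer used in the decoder; all names (`draw_character`, `ascii_gset`, `mosaic_gset`, `LexState`) are hypothetical, and the character handler simply returns a string instead of drawing.

```python
def draw_character(pattern):
    """Stand-in for the single general character-handling routine."""
    return f"draw:{pattern}"


def ascii_gset(code):
    """G-set function: selects the pattern for a code in one character set."""
    return draw_character(f"ascii-{code:02x}")


def mosaic_gset(code):
    """G-set function for a second, hypothetical character set."""
    return draw_character(f"mosaic-{code:02x}")


class LexState:
    """State owned by the lexical analyser."""

    def __init__(self):
        # Stored per instance so that it stays a plain function reference.
        self.current_gset = ascii_gset


def lexer_select_gset(state, gset_function):
    """Called by the lexical analyser when an escape sequence changes G-set."""
    state.current_gset = gset_function


def parser_on_character(state, code):
    """Parser action: invokes whichever G-set function is current."""
    return state.current_gset(code)
```

The parser action never tests which set is active, so adding a character set requires only a new G-set function and the corresponding escape-sequence handling in the lexical analyser.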

In fact, the Lex and Yacc specifications were revised three times before they stabilised. There were a number of reasons for revising them. First, the scope of the standard is so large that structuring the lexical analyser and parser was a difficult task. Second, there were major changes whenever a new interpretation of the NAPLPS standard arose, or whenever a new detail was discovered. Third, it was difficult to establish dependencies and connections between various NAPLPS features in order to code them using Lex and Yacc. For example, macros, DRCS, etc. make the I/O routines quite complex. On the other hand, it would have required much more effort to hand-code the decoding software, were Lex and Yacc not available.

As will be shown, for all the NAPLPS graphics primitives except filled rectangles, the bottleneck of the overall system is the lexical analyser and the parser. Thus at the final stage, the lexical analyser, the parser and the parser table may be replaced by more efficient and/or more compact programs. The programs generated by Lex and Yacc execute quickly, but any parser has a large number of states to pass through, hence the overall parser may be slow, and the tables generated for the lexical analyser and the parser are very large and occupy the greatest fraction of the required ROM space. Various existing table compression techniques can be used to reduce the table space [29] [30].

When specifying the Lex and Yacc rules, a great amount of work has been put into arranging these rules into logical groups according to NAPLPS features so that the rules can be easily understood. Once all the NAPLPS features have been implemented and tested, some of the rules can be combined into fewer rules to improve the execution speed. To improve the execution speed of the parser further, it may be modified so that several parser actions are combined into one step [29].

Also, since the grammar is LL(1), a fast parser can be implemented by hand without too much effort [31]. By applying some of these techniques, the amount of ROM and RAM required by the decoder, and hence its total cost, can be further reduced, and the execution speed can be improved. However, during the development stage, Lex and Yacc provided a clear general structure for the overall decoder, considerably simplifying development. Also, if any interpretation of the NAPLPS standard is not appropriate or new features are to be added, then only a few Lex or Yacc rules need to be altered to reflect the change. Another advantage of using Lex and Yacc is that the software can be easily ported to a wide range of hardware, including PCs and other graphics processors.

4.1 Basic Principles of Implementing NAPLPS features

4.1.1 Macros

As specified by the SRM, a total of three kilobytes of memory was allocated as a heap for the storage of macro definitions, DRCS characters, and unprotected fields. Although the SRM does not explicitly specify the use of a dynamic storage allocation technique, the assumption made when the SRM specifies the storage space makes it quite obvious that such a technique is the best implementation choice [32]. All the macros are stored in the heap as a contiguous block in sequential order (not in the order in which the macros are defined). An array of pointers is used to access the start of each macro. Initially, all the pointers are undefined and are set to NIL. At the start of a macro definition, a block of space within the heap is allocated to accumulate the macro. After the definition is complete, the contents are moved from the accumulation area to the proper place in the storage space, and the macro's pointer is set to the beginning address. When a macro is to be undefined, its contents are removed from the storage, and its pointer is reset to NIL. Another routine is provided to retrieve the contents of a macro when it is to be expanded. Since the macro entries are stored in sequential order, the size of each macro can easily be computed from its own beginning address and that of the next non-empty macro entry. The lexical analyser input routine was modified to allow the option of accepting either a macro or the standard input as input. The DRCS and unprotected fields are handled in the same way as macros.

Figure 4.2: ACRTC PATTERN Command
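The macro storage scheme of Section 4.1.1 can be sketched as follows. This Python illustration makes several assumptions that are not taken from the text: the class and method names are hypothetical, the number of macro slots (96) is assumed, NIL is represented by `None`, and a macro body is passed in complete rather than being built up in an accumulation area first.

```python
HEAP_SIZE = 3 * 1024   # three kilobytes, as specified by the SRM
NUM_MACROS = 96        # assumed number of macro slots


class MacroStore:
    """Macros kept contiguously, in slot order, with one start pointer each."""

    def __init__(self, heap_size=HEAP_SIZE, num_macros=NUM_MACROS):
        self.heap_size = heap_size
        self.heap = bytearray()            # used part of the heap
        self.start = [None] * num_macros   # None plays the role of NIL

    def _next_defined(self, index):
        for j in range(index + 1, len(self.start)):
            if self.start[j] is not None:
                return j
        return None

    def size(self, index):
        """Size from own start address and the next non-empty entry."""
        begin = self.start[index]
        if begin is None:
            return 0
        nxt = self._next_defined(index)
        end = len(self.heap) if nxt is None else self.start[nxt]
        return end - begin

    def _shift_later(self, index, delta):
        for j in range(index + 1, len(self.start)):
            if self.start[j] is not None:
                self.start[j] += delta

    def undefine(self, index):
        begin = self.start[index]
        if begin is None:
            return
        length = self.size(index)
        del self.heap[begin:begin + length]
        self.start[index] = None
        self._shift_later(index, -length)

    def define(self, index, body):
        if len(self.heap) - self.size(index) + len(body) > self.heap_size:
            raise MemoryError("macro heap full")
        self.undefine(index)
        nxt = self._next_defined(index)
        begin = len(self.heap) if nxt is None else self.start[nxt]
        self.heap[begin:begin] = body      # insert at the proper place
        self.start[index] = begin
        self._shift_later(index, len(body))

    def expand(self, index):
        begin = self.start[index]
        if begin is None:
            return b""
        return bytes(self.heap[begin:begin + self.size(index)])
```

Keeping the entries in slot order is what allows a size to be derived from two pointers; the cost is that defining or undefining a macro moves every later entry.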

4.1.2 Character Sets

The ACRTC's Pattern (PTN) command is currently used to display characters. All standard character set patterns will be stored in the system read-only memory. When a character is to be displayed, its pattern must be loaded into the ACRTC's pattern RAM, and then the PTN command is executed to display the character (see Figure 4.2). One problem in using the PTN command is that the maximum size of a character is limited to 256 by 256 pixels.

4.1.3 DRCS

The ACRTC allows division of the frame buffer into four separate logical screens, and each screen is programmable: for instance, it can be visible or non-visible. We intend to use this feature to decode the DRCS. Only one screen will be used for the unit screen; another screen will be used to generate DRCS patterns. At the start of a DRCS definition, the drawing pointer of the ACRTC will be set to the DRCS-screen. During the definition process, the DRCS-screen will be updated. Since the DRCS-screen is identical to the display screen, all the drawing routines may be used without reference to which screen is being accessed.

[Diagram: an area bounded by the edge colour, with a seed point inside it; the drawing pointer is set to the seed point.]

Figure 4.3: ACRTC PAINT Command

When a DRCS definition sequence is to be terminated, the bit map in the DRCS-screen is read, packed and stored in the same way as other character patterns are stored. Thus, DRCSs will be displayed in the same way as all other character sets. We also intend to use the same technique to implement the four programmable texture masks.

4.1.4 Filled Graphic Primitives

Generally, filling is a non-trivial task. There exist many algorithms in the literature. NAPLPS specifies that for a filled graphic primitive, the area enclosed by the outline, including the region of the outline traced by the logical pel, is to be filled in the current colour(s) with the texture pattern specified in the TEXTURE command. Thus, as will be shown, filling such figures supported by NAPLPS requires considerable computational effort.

To simplify the decoding effort and achieve a good fill speed, a simple fill method is currently used as a compromise. In this method, a special colour (referred to as the edge colour) is reserved for implementing fill. This method utilises the ACRTC's PAINT command, which paints the enclosed area surrounded by the edge colour (see Figure 4.3).

To fill a polygon, its outline is first drawn in the edge colour. Then an interior point is found by tracing the outline, starting at the top of the polygon. The current pointer is set to this interior point, and the PAINT command is executed to fill the figure in the edge colour with solid texture. The current pointer is then set to the interior point again and the figure is filled in the proper colour(s) with the texture set by the PDI TEXTURE command; for this second fill, the edge is defined as any colour other than the edge colour held in the EDG colour register. After the fill, the enclosed area is traced by the logical pel to give the correct dimensions (see Figure 4.4). This method gives the correct result only if the texture is solid. Since the background cannot be recovered, figures cannot be filled correctly in colour mode 0 or 1 when the texture is not solid.

The following method, which is not yet implemented, is an alternative to the one described above. It gives the correct results with any texture pattern, provided that the colour mode is 2.

• Draw the outline in the reserve colour.

• Fill the figure in the reserve colour with solid texture.

• In the reserve colour, draw the enclosed area traced by the logical pel.

• Fill the figure in proper colour(s) with texture set by the PDI TEXTURE command.
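The steps listed above can be sketched on a small pixel grid. This Python illustration is a software model, not ACRTC code: the `paint` function is a hypothetical 4-connected flood fill standing in for the PAINT command, the colour values and the checkerboard texture are arbitrary, a rectangle is used in place of a general polygon, and a 1 x 1 logical pel is assumed so that the third step adds nothing to the outline.

```python
RESERVE = 15                   # reserved colour (arbitrary value)
FOREGROUND, BACKGROUND = 3, 1  # colours used by the texture (arbitrary)


def paint(grid, seed, is_edge, colour_at):
    """Flood fill from seed; stops at pixels for which is_edge(colour) holds."""
    height, width = len(grid), len(grid[0])
    stack, visited = [seed], set()
    while stack:
        x, y = stack.pop()
        if not (0 <= x < width and 0 <= y < height) or (x, y) in visited:
            continue
        if is_edge(grid[y][x]):
            continue
        visited.add((x, y))
        grid[y][x] = colour_at(x, y)
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return len(visited)


def draw_rectangle_outline(grid, x0, y0, x1, y1, colour):
    for x in range(x0, x1 + 1):
        grid[y0][x] = colour
        grid[y1][x] = colour
    for y in range(y0, y1 + 1):
        grid[y][x0] = colour
        grid[y][x1] = colour


def textured(x, y):
    """Checkerboard texture standing in for the PDI TEXTURE pattern."""
    return FOREGROUND if (x + y) % 2 == 0 else BACKGROUND


def fill_rectangle(grid, x0, y0, x1, y1):
    # Step 1: draw the outline in the reserve colour.
    draw_rectangle_outline(grid, x0, y0, x1, y1, RESERVE)
    seed = (x0 + 1, y0 + 1)
    # Step 2: fill the interior in the reserve colour with solid texture.
    interior = paint(grid, seed, lambda c: c == RESERVE, lambda x, y: RESERVE)
    # Step 3 (logical pel trace) is omitted: a 1 x 1 pel is assumed.
    # Step 4: repaint everything in the reserve colour with the texture;
    # the edge is now any colour other than the reserve colour.
    total = paint(grid, seed, lambda c: c != RESERVE, textured)
    return interior, total
```

The second pass relies on the reserve colour appearing nowhere else on the screen; any stray pixel of that colour touching the figure would be repainted as well.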

4.1.5 Logical Pel

The result of drawing a line by the logical pel is a solid rectangular prism (or a series of solid rectangular prisms, if the line texture is not solid). The method currently used to generate the effect of line width is to draw only the four side faces of the prism, and each side face is generated by drawing a family of the same primitive for a given N APLPS PDI command. During the drawing process, the start positions of these primitives in the family are set to the pixel positions which define the the four edges of the first logical pel which is imaginarily drawn at the start position of the primitive specified by the NAPLPS PDI command. CH.-!PTER 4. SOFT\.V.-!RE 59

[Figure 4.4 panels: Step 1, draw outline in reserve colour; Step 2, fill in reserve colour; Step 3, fill in drawing colour; Step 4, draw logical pel in drawing colour.]

Figure 4.4: Direct Fill

Figure 4.5: Effect of Line Width Generated by Drawing Multiple Lines

Chapter 5

Performance Analysis

5.1 Graphics Execution Speed Measurements

The overall system performance is a function of the performance of both the control processor and the graphics section. A number of measurements were made to quantify the performance of the system software and the graphics hardware, and to compare it with that of a commercially available decoder.¹ The Canadian Department of Communications has sponsored the development of a NAPLPS Verification Test Package (see §7.1.3). A number of test frames from the package were selected and modified to be used as the test frames for these measurements.

5.1.1 Maximum Character-Parsing Speed

This measurement attempted to determine the maximum speed at which the lexical analyser and the parser can parse the input stream. Decoding character sets is the simplest task they have to perform, so the rate at which plain characters are parsed gives this maximum. The measurement was made as follows: the character-display function was replaced by a dummy function, and then a stream of characters was sent to the decoder at 960 characters/sec. The results show that the decoder can parse input at a maximum rate of 747 characters per second, which is less than the line speed of 960 characters/sec. An analysis indicates that the lexical analyser spends an average of 652 µsec recognising the input tokens, while the parser spends an average of [value illegible] µsec parsing them.²

¹ Vanilla decoder Version 1.65 is a PC-based software decoder to be used in conjunction with either a Norpak PCD6 NAPLPS SRM display generator card or an IBM colour graphic card. Three displays were measured for each test: the Vanilla decoder with a PCD6 card, the Vanilla decoder with an EGA card, and the decoder being developed. The Vanilla decoder was run on a standard IBM-AT machine running at 6 MHz. The spatial resolution of the PCD6 is around 256 x 200, and its colour resolution is 16. The colour resolution of the EGA card is 16, and its spatial resolution is less than 320 x 200 since the whole screen is not used by the decoding software.

Text Size        Characters/second (normalised)
                 PCD6          EGA           Qing
normal           935 (3.81)    787 (3.21)    245 (1.00)
small            925 (3.77)    929 (3.78)    245 (1.00)
medium           765 (3.11)    517 (2.11)    245 (1.00)
double height    696 (2.83)    712 (2.90)    244 (0.99)
double size      431 (1.76)    275 (1.12)    241 (0.98)

Table 5.1: Character Display Speeds

5.1.2 Character-Display Speed

The speeds of displaying five standard text sizes were measured: normal, small, medium, double height and double size. The test frame was created as follows. A small file was created first, containing 94 different characters which represent all the code positions in a 94-code-position G-set. It was then duplicated many times. A prologue and an epilogue were then added to the beginning and end of the input stream, respectively. The prologue first disables scrolling so that the decoders can display characters at maximum speed, and then sets the text size and sets the colour to green. Because green differs from the default colour of white, the changes in character colour mark the start and end of a measurement. The epilogue makes sure that the decoder returns to its default state (see Appendix A for the listings of the prologue and epilogue).

² The Software Performance Analyser (Model 64310A) for the Hewlett-Packard HP64000 Logic Development System was used to measure the performance of the system software and the ACRTC. The "Measuring Module Duration mode" was used to measure the duration of the function yylex (from entry to exit) to give the average time the lexical analyser spends on analysing the input. Since the parser continuously calls the lexical analyser, the time between successive accesses to yylex (from exit to entry) indirectly gives the average time the parser spends on parsing the tokens. The "Measuring Module Usage Mode" was used to make the measurement.

The results are summarised in Table 5.1. Since the ACRTC is capable of displaying 1421 characters per second,³ the results show that it is not the factor limiting the performance. This is also verified by the measurement now described. The system has a circular input buffer 512 words long assigned to the ACRTC. During the decoding process, the ACRTC instructions prepared by the host CPU are first stored in a short buffer, and then moved to the circular buffer by the function crtwrite. The ACRTC interrupt service routine moves as many instructions as the ACRTC can accept from the circular buffer to the ACRTC's FIFO. The first instruction in the function crtwrite executes a wait until there is enough space in the circular buffer to store all the instructions from the short buffer. Thus, the number of entries to crtwrite and the number of times the branching instruction is executed in the wait loop give a good indication of the performance of the system software and the ACRTC. The measurements indicate that when displaying characters, the control processor never enters the wait loop.⁴
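The buffering scheme and the two counters used as a performance indicator can be modelled with a short sketch. This is a toy single-threaded model, not the decoder's code: only the name `crtwrite` comes from the text, while the class, the `drain` callback standing in for the interrupt service routine, and the other names are assumptions for illustration.

```python
class CommandRing:
    """Toy model of a circular command buffer between host CPU and graphics chip."""

    def __init__(self, capacity=512):
        self.slots = [None] * capacity
        self.head = 0              # index of the oldest queued command
        self.count = 0             # number of queued commands
        self.calls = 0             # entries to crtwrite
        self.wait_iterations = 0   # passes through the wait loop

    def free(self):
        return len(self.slots) - self.count

    def crtwrite(self, commands, drain):
        """Queue commands, calling drain(self) while there is too little space.

        drain must eventually free space, otherwise this loop never ends.
        """
        if len(commands) > len(self.slots):
            raise ValueError("batch larger than the circular buffer")
        self.calls += 1
        while self.free() < len(commands):
            self.wait_iterations += 1
            drain(self)            # stands in for the interrupt service routine
        for command in commands:
            tail = (self.head + self.count) % len(self.slots)
            self.slots[tail] = command
            self.count += 1

    def service_interrupt(self, fifo_space):
        """Move up to fifo_space commands out of the ring, oldest first."""
        moved = []
        while self.count and len(moved) < fifo_space:
            moved.append(self.slots[self.head])
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return moved
```

In this model, a ratio of `wait_iterations` to `calls` near zero means the consumer always keeps up with the producer, which corresponds to the case where the graphics chip is not the limiting factor.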

Since NAPLPS character sets have a number of attributes, such as size, rotation, path, etc., the character-display function currently written is rather complex. Thus, if required, the function could be modified so that it handles the default attributes separately to optimise performance.

5.1.3 Point-Drawing Speed

Each slide of the NAPLPS Verification Test Package begins with a standard prologue and ends with a standard epilogue. From a cursory look at some of the slides, it seems that the first 16 bytes are the prologue and the last 6 bytes are the epilogue. For the remaining tests, a few slides were selected from the test package and modified to be used as the test frames. The modification involves removing a number of bytes from the front and end of the test slides, and/or changing the NAPLPS PDI commands in the slide. (Note that the total number of bytes removed can be more than the total number of bytes of the prologue and epilogue, so that parts of a test frame which are not relevant for a particular test are not displayed. For instance, the frame number at the end of each test frame is removed.) For each test, a modified slide was duplicated a number of times and then fed into the decoders with the standard prologue and epilogue attached. A few bytes of NAPLPS code were also inserted right after the prologue. The code, as listed in Appendix A, sets the logical pel size, the line texture and texture pattern. Three different logical pel sizes were used: 0 x 0, 1/128 x 1/128, and 1/32 x 1/32. Note that the logical pel size of 0 x 0 maps to one display pixel.

³ A simple character display routine was written to measure the maximum number of characters the ACRTC is capable of displaying. The routine was a very tight loop which first loaded a character font pattern into the pattern RAM of the ACRTC and then executed the PTN command to display the character. The character size was 22 x 16, which is the same size as the normal text.

⁴ The "Measuring Real-time Program Activity mode" was used to measure the number of times the first machine instruction of the function crtwrite was encountered and the number of times the branching instruction in the wait loop was executed.

Pel Size         Display   NAPLPS Code        Throughput      Normalised
                           (Characters/sec)   (Points/sec)
0 x 0            PCD6      948                236             1.66
                 EGA       948                236             1.66
                 Qing      572                143             1.00
1/128 x 1/128    PCD6      937                233             1.64
                 EGA       933                233             1.63
                 Qing      564                141             0.99
1/32 x 1/32      PCD6      937                233             1.64
                 EGA       930                232             1.63
                 Qing      564                141             0.99

Table 5.2: Point Drawing Speed

To measure the performance of drawing points, a small file was created first (see Appendix A). It contained a NAPLPS POINT (Absolute, Visible) command. It was then duplicated many times to be used for the test.

The results are summarised in Table 5.2. Normalising the figures in the column "NAPLPS Code" gives the same values as normalising those in the column "Throughput", so the normalised results are listed in a single separate column. For each measurement, a test file was created first, and its contents were then sent to the decoders at 960 characters/sec. Thus, the figures listed under "NAPLPS Code, Characters/sec" are the throughputs (or effective transfer rates) of the serial lines, and the figures listed under "Throughput" indicate the actual number of a particular NAPLPS primitive (or the number of test frames) the decoders can decode and display within one second. Measurements also indicate that during the test, the control processor never enters the wait loop in crtwrite.
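The normalised column can be reproduced from the measured rates. The sketch below assumes, as the tables suggest but the text does not state outright, that every rate is divided by the rate of the decoder under development at a pel size of 0 x 0; the function and variable names are hypothetical.

```python
def normalise(rates, baseline):
    """Express each rate as a multiple of the baseline, to two decimal places."""
    return {name: round(rate / baseline, 2) for name, rate in rates.items()}


# Characters/sec taken from Table 5.2.
pel_0 = {"PCD6": 948, "EGA": 948, "Qing": 572}
pel_128 = {"PCD6": 937, "EGA": 933, "Qing": 564}

# Assumed baseline: the decoder under development at pel size 0 x 0.
baseline = pel_0["Qing"]
```

With these inputs, `normalise(pel_0, baseline)` and `normalise(pel_128, baseline)` give the same values as the first two blocks of the "Normalised" column in Table 5.2.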

Pel Size         Display   NAPLPS Code        Throughput      Normalised
                           (Characters/sec)   (Frames/sec)
0 x 0            PCD6      185                0.1169          0.45
                 EGA       186                0.1176          0.45
                 Qing      414                0.2616          1.00
1/128 x 1/128    PCD6      141                0.0893          0.34
                 EGA       136                0.0858          0.33
                 Qing      332                0.2094          0.80
1/32 x 1/32      PCD6      73                 0.0462          0.18
                 EGA       63                 0.0396          0.15
                 Qing      211                0.1335          0.51

Table 5.3: Rectangle (outlined) Drawing Speed

5.1.4 Rectangle-Drawing Speed

The test frame "Series A Test 4.2.1" (frame number 2000; file name: a4x2x1.at7) was selected and modified to measure the performance of drawing rectangles.

The first test was to measure the speed of drawing outlined rectangles. The first 16 and last 57 bytes were removed from the test frame (frame number 2000). The results are given in Table 5.3. Since the line drawing speed of the ACRTC is independent of the line texture, and the Vanilla decoder does not implement line textures at all, no comparisons were made for different line textures. For the three measurements with different pel sizes, the total number of times the branching instruction was executed in the wait loop varied between 1 and 2 percent of the total number of times the function crtwrite was called.

The second test was to measure the speed of drawing filled rectangles. The first 16 and last 57 bytes were removed from the test frame (frame number 2000) and all the NAPLPS SET & RECTANGLE (outlined) commands were changed to NAPLPS SET & RECTANGLE (filled) commands. The results are listed in Table 5.4. The ratio of the total number of times the branching instruction was executed in the wait loop to the total number of times the function crtwrite was called is 68:1. Thus, the ACRTC was running at full rate most of the time. The test also shows that the drawing speed of the ACRTC is independent of the fill pattern.

Texture Pattern                          Display   NAPLPS Code        Throughput      Normalised
                                                   (Characters/sec)   (Frames/sec)
Solid                                    PCD6      159                0.0998          0.74
                                         EGA       155                0.0974          0.73
                                         Qing      214                0.1342          1.00
Vertical Hatching                        PCD6      77                 0.0487          0.36
                                         EGA       122                0.0767          0.57
                                         Qing      214                0.1342          1.00
Horizontal Hatching                      PCD6      76                 0.0480          0.36
                                         EGA       119                0.0749          0.56
                                         Qing      214                0.1342          1.00
Vertical and Horizontal Cross-Hatching   PCD6      70                 0.0440          0.33
                                         EGA       119                0.0748          0.56
                                         Qing      214                0.1342          1.00

Table 5.4: Rectangle (filled) Drawing Speed

5.1.5 Polygon-Drawing Speed

The first test was to measure the drawing speeds of filled polygons with 144 vertices. The test frame "Series A Test 5.3.1" (frame number 1042; file name: a5x3x1.at7), with the first 16 and last 57 bytes removed, was used as the test frame. The results are given in Table 5.5. The test shows that the fill speed of the ACRTC is independent of the fill pattern. Also, the control processor never enters the wait loop in crtwrite.

The second test was to measure the drawing speeds of filled polygons with 256 vertices. The test frame "Series A Test 5.3.2" (frame number 1270; file name: a5x3x2.at7), with the first 16 and last 839 bytes removed, was used as the test frame. The results are given in Table 5.6. The total number of times the branching instruction was executed in the wait loop varied between 2 and 6 percent of the total number of times the function crtwrite was called.

5.1.6 Line-Drawing Speed

The test frame "Series A Test 2.2.1" (frame number 1009; file name: a2x2x1.at7), with the first 16 and last 57 bytes removed, was used as the test frame. The results are given in Table 5.7. The total number of times the branching instruction was executed in the wait loop was only 1 percent of the total number of times the function crtwrite was called.

Texture Pattern                          Display   NAPLPS Code        Throughput       Normalised
                                                   (Characters/sec)   (Polygons/sec)
Solid                                    PCD6      733                1.6411           3.80
                                         EGA       675                1.5121           3.51
                                         Qing      193                0.4313           1.00
Vertical Hatching                        PCD6      550                1.2315           2.86
                                         EGA       719                1.6094           3.73
                                         Qing      193                0.4313           1.00
Horizontal Hatching                      PCD6      542                1.2146           2.82
                                         EGA       711                1.5924           3.69
                                         Qing      193                0.4313           1.00
Vertical and Horizontal Cross-Hatching   PCD6      507                1.1364           2.63
                                         EGA       710                1.5890           3.68
                                         Qing      193                0.4313           1.00

Table 5.5: Polygon (144 vertices) Fill Speed

Texture Pattern                          Display   NAPLPS Code        Throughput       Normalised
                                                   (Characters/sec)   (Polygons/sec)
Solid                                    PCD6      939                1.2156           3.32
                                         EGA       932                1.2058           3.29
                                         Qing      283                0.3660           1.00
Vertical Hatching                        PCD6      935                1.2107           3.31
                                         EGA       929                1.2029           3.29
                                         Qing      283                0.3660           1.00
Horizontal Hatching                      PCD6      935                1.2097           3.31
                                         EGA       930                1.2039           3.30
                                         Qing      283                0.3660           1.00
Vertical and Horizontal Cross-Hatching   PCD6      941                1.2175           3.33
                                         EGA       933                1.2077           3.30
                                         Qing      283                0.3660           1.00

Table 5.6: Polygon (256 vertices) Fill Speed

Pel Size         Display   NAPLPS Code        Throughput      Normalised
                           (Characters/sec)   (Frames/sec)
0 x 0            PCD6      915                0.5535          2.12
                 EGA       898                0.5435          2.08
                 Qing      431                0.2610          1.00
1/128 x 1/128    PCD6      565                0.3417          1.31
                 EGA       483                0.2924          1.12
                 Qing      237                0.1431          0.55
1/32 x 1/32      PCD6      149                0.0902          0.35
                 EGA       116                0.0703          0.27
                 Qing      102                0.0619          0.24

Table 5.7: Line Drawing Speed

The test frame "Series A Test 2.5.1" (frame number 1015; file name: a2x5x1.at7), with the first 437 and last 146 bytes removed, was used as a second test frame. This test attempted to measure the performance of drawing lines of various lengths and slopes. The results show that the performance is identical to that of the first test, which indicates that the ACRTC is idle for much of the time.

5.1.7 Arc-Drawing Speed

The test frame "Series A Test 3.2.1" (frame number 1021; file name: a3x2x1.at7), with the first 260 and last 64 bytes removed, was used to measure the performance of drawing very small outlined arcs. The results are given in Table 5.8. The ratio of the total number of times the branching instruction was executed in the wait loop to the total number of times the function crtwrite was called is 0:100 for a pel size of 0 x 0, 7:100 for a pel size of 1/128 x 1/128, and 3:100 for a pel size of 1/32 x 1/32.

The test frame "Series A Test 3.2.1" (frame number 1021; file name: a3x2x1.at7) was selected and modified to measure the performance of drawing very small filled arcs. The first 260 and last 64 bytes were removed from the test frame (frame number 1021), and all the NAPLPS ARC (outlined) commands were changed to NAPLPS ARC (filled) commands. The results are given in Table 5.9. Note that during the test, the host CPU never entered the wait loop in crtwrite.

Pel Size         Display   NAPLPS Code        Throughput      Normalised
                           (Characters/sec)   (Frames/sec)
0 x 0            PCD6      623                2.9412          1.24
                 EGA       636                3.0000          1.26
                 Qing      505                2.3810          1.00
1/128 x 1/128    PCD6      501                2.3622          0.99
                 EGA       505                2.3810          1.00
                 Qing      312                1.4706          0.62
1/32 x 1/32      PCD6      288                1.3575          0.57
                 EGA       258                1.2195          0.51
                 Qing      148                0.6964          0.29

Table 5.8: Arc (outlined) Drawing Speed

Texture Pattern                          Display   NAPLPS Code        Throughput      Normalised
                                                   (Characters/sec)   (Frames/sec)
Solid                                    PCD6      467                2.2059          1.63
                                         EGA       471                2.2222          1.64
                                         Qing      286                1.3514          1.00
Vertical Hatching                        PCD6      424                2.0000          1.48
                                         EGA       448                2.1127          1.56
                                         Qing      286                1.3514          1.00
Horizontal Hatching                      PCD6      408                1.9231          1.42
                                         EGA       441                2.0833          1.54
                                         Qing      286                1.3514          1.00
Vertical and Horizontal Cross-Hatching   PCD6      397                1.8750          1.39
                                         EGA       445                2.0979          1.55
                                         Qing      286                1.3514          1.00

Table 5.9: Arc (filled) Drawing Speed

Pel Size         Display   NAPLPS Code        Throughput      Normalised
                           (Characters/sec)   (Frames/sec)
0 x 0            PCD6      845                2.3810          1.40
                 EGA       866                2.4390          1.44
                 Qing      602                1.6949          1.00
1/128 x 1/128    PCD6      845                2.3810          1.40
                 EGA       845                2.3810          1.40
                 Qing      493                1.3889          0.82
1/32 x 1/32      PCD6      845                2.3810          1.40
                 EGA       832                2.3438          1.38
                 Qing      333                0.9381          0.55

Table 5.10: Arc (outlined) Drawing Speed

The test frame "Series A Test 3.6.1" (frame number 1459; file name: a3x6x1.at7), with the first 256 and last 64 bytes removed, was used to measure the performance of drawing outlined arcs of various sizes. The results are given in Table 5.10. The ratio of the total number of times the branching instruction was executed in the wait loop to the total number of times the function crtwrite was called is 0:100 for a pel size of 0 x 0, 0:100 for a pel size of 1/128 x 1/128, and 5:100 for a pel size of 1/32 x 1/32.

The test frame "Series A Test 3.6.1" (frame number 1459; file name: a3x6x1.at7) was selected and modified to measure the performance of drawing filled arcs of various sizes. The first 256 and last 64 bytes were removed from the test frame (frame number 1459), and all the NAPLPS ARC (outlined) commands were changed to NAPLPS ARC (filled) commands. The results are given in Table 5.11. During the test, the host CPU never entered the wait loop in crtwrite.

5.2 Analysis and Discussion

5.2.1 Bottlenecks and Possible Improvements

To measure the software throughput, all the measurements described above were repeated, except that the ACRTC interrupt service routine was modified so that instead of sending commands to the ACRTC, it sent them to an internal register of the control processor. The results were all identical, except for drawing filled rectangles.⁵ This shows conclusively that the bottlenecks are in the lexical analyser, the parser, and/or the decoding routines, and that the ACRTC is idle most of the time except when drawing filled rectangles. However, much of the complexity of the decoding routines is the direct result of the poor match between the graphics features supported by the ACRTC and those required to implement NAPLPS efficiently. A performance improvement is expected if some of the routines are fine-tuned or rewritten in assembler, or if some function calls are replaced by C macros. The ACRTC interrupt service routine and crtwrite were initially written in C; the assembler listings of these two routines were later optimised and fine-tuned. This improved the performance of the decoder by a factor of 2 to 3 for all graphics drawings. Before optimisation, the decoder could display only 78 characters per second; afterwards, it could display 245 characters per second. This is about the maximum speed at which a viewer can read text.

Texture Pattern                          Display   NAPLPS Code        Throughput      Normalised
                                                   (Characters/sec)   (Frames/sec)
Solid                                    PCD6      852                2.4000          2.21
                                         EGA       832                2.3438          2.16
                                         Qing      386                1.0870          1.00
Vertical Hatching                        PCD6      866                2.4390          2.24
                                         EGA       852                2.4000          2.21
                                         Qing      386                1.0870          1.00
Horizontal Hatching                      PCD6      852                2.4000          2.21
                                         EGA       845                2.3810          2.19
                                         Qing      386                1.0870          1.00
Vertical and Horizontal Cross-Hatching   PCD6      852                2.4000          2.21
                                         EGA       852                2.4000          2.21
                                         Qing      386                1.0870          1.00

Table 5.11: Arc (filled) Drawing Speed

Fill

The current implementation requires that most filled primitives be filled twice; if a frame (or page) consists of many filled primitives, the speed can be quite slow. Because a proper fill cannot be implemented efficiently using the ACRTC, as will be discussed in §7.2.2, the extra effort to implement it would not be justified. (The Vanilla decoder does not fill polygons with the correct patterns if it runs with an EGA card and the texture patterns are not solid.)

⁵ The results were: 537 characters/sec and 0.3371 frames/sec.

Logical Pel

The speed of drawing the outline of a primitive by the current method is a function of the logical pel size. However, a large logical pel size is not expected to be used very frequently. In §6.2.1, some proposals are made to improve the speed by means of optimised hardware and more efficient decoding algorithms.

5.2.2 Cost/Performance Tradeoffs

The system performance can be further increased if the frame buffer memories currently used (with a cycle time of 260 ns) are replaced by faster DRAMs. For instance, if DRAMs with a memory cycle time of 230 ns were used, the performance would be expected to increase by around 10 per cent. However, the cost of the frame buffer memory is also expected to increase by about the same proportion, and since the ACRTC is idle most of the time, the increased cost cannot be justified unless the software performance is first improved.

Chapter 6

Extensions to NAPLPS standard

6.1 Introduction

Animation and the user interface have created many implementation difficulties for the PAL project. Animation is a very effective means of presenting motion, relationships, changes, etc. A good user interface provides a very productive and friendly environment for both the author and the user. But there are no simple solutions to the animation and user interaction issues. Colour table animation as used in NAPLPS is rather limited, and the standard does not explicitly specify user interaction, apart from a few code sequences and conventions for transferring user input back to the host computer. Furthermore, NAPLPS does not support the capability of grouping and naming collections of graphic primitives (segmentation); as will be shown later, this presents some difficulties for implementing animation and echoing input devices.

Although the prime concern during the development of the NAPLPS standard was to ensure hardware independence, some implementation cost issues were also considered. For example, one of the design considerations was the balance between rich features and implementation cost. As a result of this consideration, orientation, perspective transformations, and other viewing operations commonly used in computer graphics were considered too complex to implement inexpensively, and were omitted [33] [34]. This may also have been the reason for the decision to include only simple colour table animation rather than animation by directly manipulating the frame buffer.


There are several reasons why NAPLPS (or any other Videotex/Teletext standard) does not specify the user interface explicitly. One is that the initial emphasis was on applications such as information retrieval, which do not require heavy interaction: a user request is only a few keystrokes. Even in CAI applications, the most frequently used interaction styles are multiple choice and menu-driven, which also require only a few keystrokes. In fact, one of the original motivations behind the British Prestel service was to improve the utilisation of the telephone network in off-peak hours [35]. A heavy user interface requires a large bandwidth. But the telephone network is a low-speed communication medium, and at the early stage of videotex development, the cost of high-speed modems was too high for the average consumer (or the technology was not available for producing high-speed modems for the normal telephone networks). Furthermore, if the user interaction is mostly handled by the host computer, then the delays in other communication networks, e.g. a packet-switching network, present another problem: the lag between the actual movements of an input device and its echo can be too long to allow any workable user interaction. Another possible reason was the lack of study of user interfaces at the time when NAPLPS was first developed. The final point is that the present standard is only the first stage and is intended to specify only the presentation features. Since then, a lot of work has been done on user interfaces, and several interaction models have been proposed that are appropriate for NAPLPS. The technology has also advanced. Thus, now is the time to specify the user interface explicitly, or at least to provide some guidelines.

Fortunately, NAPLPS is open to future extensions: graphics animation, telesoftware, photographic coding and other features have been proposed, and some have been accepted and approved [36] [10] [37]. Canada and Japan have been studying timed geometric drawing techniques for animation. As will be described later, another research project was conducted to introduce a new set of animation primitives to be integrated into the NAPLPS standard [38].

This chapter is divided into three sections. The first section describes animation techniques in NAPLPS and the limitations of each technique, and presents further improvements. The second section discusses user interaction issues. The last section suggests an idea for implementing the suggestions given in the first two sections.

[Figure 6.1 shows the colour table at successive times T0, T1, T2 and T3 along a TIME axis.]

Figure 6.1: Colour Cycling

6.2 Animation Techniques

The NAPLPS specification does not contain time-descriptor instructions, and the PDI WAIT command is the only one that can be used for time control. NAPLPS uses colour map animation but, as will be shown later, because of the limitations associated with this type of animation and the limited colour palette size (16) specified by the SRM, only simple animation can be achieved.

6.2.1 Animation Techniques

There exist a number of techniques to implement animation: colour table animation, direct updating of the frame buffer, and object-oriented animation.

Colour Table Animation

The colour table can be used to create very simple, economical, but limited animation effects. The simplest case is called colour cycling. Every entry in the colour table is shifted by one pixel value in a cyclic fashion to create a smooth fluid-flow-like effect (see Figure 6.1).
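Colour cycling amounts to rotating the colour table while the pixel values in the frame buffer stay unchanged. A minimal sketch, assuming the table is a plain list indexed by pixel value (the function name and the direction of rotation are choices made for the example):

```python
def cycle(colour_table, steps=1):
    """Rotate the colour table so each colour moves up by `steps` pixel values."""
    steps %= len(colour_table)
    return colour_table[-steps:] + colour_table[:-steps]
```

Calling `cycle` once per time step on a table such as `["red", "green", "blue", "white"]` moves each colour to the next pixel value, with the last colour wrapping round to the first entry, so bands drawn with consecutive pixel values appear to flow.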


Figure 6.2: Simple Colour Table Animation

Figure 6.3: Complex Colour Table Animation

Another variation is alternate animation. Each successive position of an object is drawn with a different colour. Initially, all the entries in the colour map are set to the background colour, so that in effect all the positions are invisible. The animation effect is achieved by changing the colour table entries from background colour to another colour at each step. Several objects can be animated by using this method (see Figure 6.2).
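The sequence of colour maps used for alternate animation can be sketched as below. The sketch assumes that pixel value 0 is the background, that position k of the object is drawn with pixel value k + 1, and that each earlier position is hidden again when the next one is shown; these conventions and the function name are illustrative assumptions, not part of NAPLPS.

```python
def alternate_animation_maps(positions, background, object_colour):
    """Return one colour map per step; at step k only pixel value k + 1 is visible."""
    maps = []
    for step in range(positions):
        colour_map = [background] * (positions + 1)  # entry 0 is the background
        colour_map[step + 1] = object_colour
        maps.append(colour_map)
    return maps
```

Loading these maps one after another makes the object appear to move through its pre-drawn positions without redrawing any pixels.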

There are several limitations associated with colour table animation. First, this method is easy to use only as long as the background is a single colour. Second, if successive pictures of the object(s) overlap, then extra colours must be used for the areas of intersection. To handle the more general case where an object is moved over a complex background, or where deformation and overlap occur at each successive step, the following approach can be applied. All the successive positions are combined into one composite picture, and each unique area due to overlap and intersection is drawn with a different colour. Then, at each animation step, the colour definitions of a particular step are changed to be visible (see Figure 6.3).

In any case, the number of steps available with colour table animation is quite limited (it depends on the number of pixel-planes). Specifically, since the SRM allows the presenter to use only up to 16 different colours, the maximum number of steps should be less than 16. For the general case, the relationship between the number of colours ($i$) in an object, the number of steps ($n$) and the maximum number of distinct areas ($A$) in the composite picture is given by $A = i^n$ [39]. (Each pixel may have any of the $i$ values in each of the $n$ steps, thus the total number of pixel values used is $i^n$.) The following table gives the possible values of $i$, $n$ and $A$ for a colour palette size of 16:

Colours (i)    Steps (n)    Max. no. of areas (A)
2              4            16
3              2            9
4              2            16
5              1            5
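The table follows from the relationship between colours, steps and areas: for each number of colours, the number of steps is the largest value for which the number of areas still fits in the palette. A short sketch that reproduces it (the function name is hypothetical):

```python
def max_steps(colours, palette_size=16):
    """Largest n such that colours ** n does not exceed the palette size."""
    if colours < 2:
        raise ValueError("need at least two colours")
    steps = 0
    while colours ** (steps + 1) <= palette_size:
        steps += 1
    return steps


# (colours, steps, areas) rows for a 16-entry palette
rows = [(i, max_steps(i), i ** max_steps(i)) for i in range(2, 6)]
```

Here `rows` evaluates to `[(2, 4, 16), (3, 2, 9), (4, 2, 16), (5, 1, 5)]`, matching the table.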

Direct Updating of Frame Buffer

There are various ways to update the frame buffer. One is to update the frame buffer directly: images can be drawn, erased, and redrawn to create an animation effect. Another technique is to have a large frame buffer space. Only a portion of the frame buffer is used to refresh the display, and the rest is used to store successive frames. Once all the frames have been created, they are then displayed sequentially. This method is simple but requires a large amount of memory and time to generate.

In the general case, such as for applications where complex animation is required, off-line or non-real-time generation of the images is often necessary. But this approach is impractical for real-time interactive animation. Despite the advances in graphics hardware and the decline in memory cost, the cost of providing such animation has always been high due to the large bandwidth required for updating images. The high cost has prohibited the use of real-time interactive animation in many applications, except for those where the high cost can be justified.

Object-Oriented Animation

In this approach, collections of graphic primitives are grouped as objects (segments), and the objects are moved around the screen as units by selectively alternating and erasing the objects. If a third dimension is used (e.g. a z-dimension can be specified in a NAPLPS operand), then this is very easy to achieve by allocating some planes for the background and others for animation.

Pabouctsidis [38] suggests extensions for animation to be integrated into the NAPLPS standard. He proposes three additional G-sets and a few control codes. The three G-sets are: the Object Set, the Animation Description Instructions Set (ADIs), and the Animation Macro Set.

The object set defines shapes of objects in a very similar way to how the DRCS are defined, except that the objects are in colour.

The ADIs set is used to define animation parameters. Its structure is very similar to the PDI set. Some of the parameters are: move an object to a position; move an object along a line, a polygon or a circle; wait to generate a pause; motion speed encoded as relative distance travelled per second; etc.

The animation macro set is used to group ADIs as macros, and the macros can be called to start animation sequences. The new control commands, which are similar to DEFINE DRCS, are DEFINE OBJECTS, DEFINE ANIMATION MACROS, ANIMATION STOP, END, and SYNCHRONIZE.

In conclusion, colour-table animation as used in NAPLPS is rather limited and is sufficient only for simple animations. Section 7.4 will present a temporary solution for better animation.

6.3 User Interface

Although some window systems could potentially be adopted for NAPLPS, there are a few technical problems that need to be solved first. The main weakness of interactive videotex is the low communication speed: the commonly used protocol runs at 1200 bit/s from the host to the user, and 75 bit/s from the user to the host. Because the amount of information that can be sent from the user terminal to the host is very limited, only simple user-interaction styles are possible. So far, most applications use a standard keyboard or numeric keypad, which is fine for entering a few characters. To manipulate graphic information, numeric input devices, such as mice or tablets, are desirable. But one problem which arises is that, due to the lack of an input specification, the method of incorporating an input device is implementation dependent. This is contrary to the main objective of the PAL project: portability.
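To give a feel for the asymmetry between the two directions, the line rates can be converted into character rates. The sketch below assumes 10 bits on the line per character (for example one start bit, eight data bits and one stop bit); that framing is an assumption made for the example and is not stated in the text, and the function names are hypothetical.

```python
BITS_PER_CHAR = 10  # assumed asynchronous framing: start + 8 data + stop


def chars_per_second(bits_per_second, bits_per_char=BITS_PER_CHAR):
    """Character rate carried by a serial line of the given bit rate."""
    return bits_per_second / bits_per_char


def transfer_seconds(n_chars, bits_per_second, bits_per_char=BITS_PER_CHAR):
    """Time needed to send n_chars characters over the line."""
    return n_chars * bits_per_char / bits_per_second
```

Under this assumption the host-to-user direction carries 120 characters per second and the user-to-host direction only 7.5, so even a 15-character report from an input device would occupy the return channel for about two seconds.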

The last problem is the difficulty of providing echo for a numeric input device, due to the lack of segmentation. An echo is usually a graphical symbol, such as an arrow, which is created by a collection of graphic drawing commands (or a segment). In response to the movement of the input device, the segment is selected and erased, and then re-drawn at the new position. However, the best that can be achieved by using the existing NAPLPS features is to erase the echo by drawing it with the background colour, and re-draw it at the new position. This may fail if the background is complex and, because of the overhead required, it may also result in an unacceptably slow interaction time.

Of course, any proposal for solving animation and user interface problems requires time to be approved and actually implemented. In the meantime, temporary solutions may be used. One possibility is to add some local intelligence to the user terminal. For instance, a pointing device such as a mouse can be incorporated into the terminal. The NAPLPS Field capability may be used by the application layer for user interaction. Since any legal NAPLPS stream can be placed in a field, a user is able to enter and edit the information in an unprotected field before its contents are eventually sent to the host. For instance, mouse movements can be coded as a series of NAPLPS PDI move drawing-pointer commands. To offload the host, the tasks of keeping track of the coordinates of the pointing device and providing visual feedback are done locally. To minimise the communication traffic and keep application performance high, the host computer uses only poll mode (an enquire function), and the terminal sends information to the host only when the host has made a request or when the user has pressed a button on the keyboard or mouse. More details are given in the following section.
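The division of labour between terminal and host described above can be sketched as follows. This is only an illustration: the class `Terminal`, its method names, and the tuple used as a report are hypothetical, and the actual NAPLPS encoding of the pointer position is not modelled.

```python
class Terminal:
    """Hypothetical terminal with local pointer tracking (illustrative only)."""

    def __init__(self):
        self.x = 0
        self.y = 0
        self.outbox = []          # reports actually sent to the host

    def mouse_moved(self, dx, dy):
        # Tracking and visual feedback stay local: nothing is transmitted.
        self.x += dx
        self.y += dy

    def button_pressed(self):
        # A button press is one of the two events that cause transmission.
        self.outbox.append(("button", self.x, self.y))

    def host_enquiry(self):
        # Poll mode: the host asks, the terminal answers with its current state.
        self.outbox.append(("position", self.x, self.y))


term = Terminal()
for _ in range(100):              # one hundred small movements
    term.mouse_moved(1, 2)
term.button_pressed()
term.host_enquiry()
```

In this sketch one hundred movements produce no traffic at all on the slow return channel; only the button press and the enquiry do.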

Narrow-band telephone networks present problems at the moment, but they are not the only medium for delivering videotex services. ISDN, fast packet switching networks, and cable TV networks are gaining wide acceptance, and they are likely to become the principal media in the future. Even if we continue to use the telephone network for delivering videotex services, advances in digital signal processing, VLSI design, coding techniques, modulation schemes, and other areas have made it possible to produce fast modems (greater than 9600 Baud) to be used over normal telephone networks.

Figure 6.4: QingBox (block diagram; labels: Host, I/O, Logic, Bit-map, Animation Box, Mouse, Standard NAPLPS Decoder)

6.4 Temporary Solution

Before any of those proposals mentioned in this chapter can be officially accepted, we may be able to implement some of them by adding extra hardware to existing NAPLPS decoders as shown in Figure 6.4. The advantage of such a configuration is that the extensions are transparent to existing decoders.

The basic idea is to add an extra image plane to cover an underlying image which is created by a standard NAPLPS decoder. The underlying image is the background image (or static image), and the foreground, which is a dynamic image, is drawn on the extra plane. The foreground has higher priority than the background, i.e. anything drawn on the foreground will cover what is behind it.

NAPLPS is an obvious choice as the format of information interchange between the host and the combined hardware configuration. The background and foreground information can be distinguished by using the device control techniques based on the ISO reference model [40]. The animation effect is achieved by manipulating the colour map. However, a possible disadvantage of this approach is slow response, due to the host overheads required to encode animation sequences and to the low data transfer rate if a low-speed communication channel is used. This approach would also have the same disadvantages as mentioned previously when encoding complex animations.

Another format may be to use a high-level description language similar to that proposed by Pabouctsidis. The box should have a high degree of local intelligence. The host only downloads object specifications to the box, and leaves the box to handle the animation.

This approach offers several advantages. First, since the high-level PAL routines do not have to know the low-level details, the implementation of high-level PAL is technology independent. Second, the implementor can take full advantage of state-of-the-art graphics hardware.

To extend Pabouctsidis' idea further, the specification of an object can be structured as a tree. The elements of the object are ordered by importance in such a way that if less important information is ignored by the box, the resulting image is still identifiable.

For example, a human body can be specified as:

Level 1: Body: Head Arms Legs etc.

Level 2: Head: head_outline eyes nose etc

Level 2: Arms: arm_outline Hands etc etc

Level 3: eyes: eye_outline eye_texture

(Significance decreases with increasing level number.)
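The tree above can be written down as a nested structure and walked in order of importance. The sketch below is a minimal illustration, not a proposed format: the dictionary encoding and the function names `by_importance` and `elements_up_to` are assumptions, and only the elements named in the example are included.

```python
from collections import deque

# Hypothetical encoding of the example: each name maps to its sub-elements.
BODY = {
    "Head": {"head_outline": {},
             "eyes": {"eye_outline": {}, "eye_texture": {}},
             "nose": {}},
    "Arms": {"arm_outline": {}, "Hands": {}},
    "Legs": {},
}


def by_importance(tree):
    """Yield (level, name) breadth-first, so that level 1 elements come first."""
    queue = deque((1, name, sub) for name, sub in tree.items())
    while queue:
        level, name, sub = queue.popleft()
        yield level, name
        queue.extend((level + 1, child, grand) for child, grand in sub.items())


def elements_up_to(tree, max_level):
    """Names the box would still draw if it ignored everything below max_level."""
    return [name for level, name in by_importance(tree) if level <= max_level]
```

Calling `elements_up_to(BODY, 1)` keeps only the head, arms, and legs, while a larger `max_level` adds progressively finer detail.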

By displaying an object in the order of importance, we can display dynamic images in "real time". One possible motion attribute may be a time constraint. For instance, the host is able to specify the number of steps of a moving object from the start position to the end position with no time constraint. In this case, the decoder will draw the complete frame before it starts to draw the next one. If, on the other hand, the host does specify the time duration for a moving object (the host may specify the number of steps as well), then the decoder has to conform to the time duration. If the box is powerful enough, it will draw a complete frame at each step. If it is not powerful enough, then it will ignore the less important information to conform to the time constraint.

Another advantage of such a scheme is that the implementation of user interaction will be significantly simplified. The foreground can be used for user interaction.

The disadvantages are, obviously, the incompatibility of the protocol with standard NAPLPS, and the extra hardware cost. But the low cost of the PAL package may offset this cost, which is only around a few hundred dollars. A "total" solution may be to supply different sets of firmware to suit individual users' needs.

6.4.1 Technical Details

One possible way to create an animation effect is first to clear the foreground plane to a transparent colour, so that the background is visible, and then to draw the moving objects on the foreground. To move an object, it is redrawn in the transparent colour to erase it, and it is then drawn at a different position in a visible colour. Of course, colour map animation is another possibility.
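The erase-and-redraw step on a separate foreground plane can be sketched as below. The sketch is illustrative only: planes are modelled as lists of rows, `TRANSPARENT` is an assumed marker value, and the function names are hypothetical. It shows why the background survives a move even when it is not a single flat colour.

```python
TRANSPARENT = None   # assumed marker for "show the background here"


def make_plane(width, height, fill):
    return [[fill] * width for _ in range(height)]


def draw_rect(plane, x, y, w, h, colour):
    for row in range(y, y + h):
        for col in range(x, x + w):
            plane[row][col] = colour


def composite(background, foreground):
    """The foreground has priority wherever it is not transparent."""
    return [[bg if fg is TRANSPARENT else fg for bg, fg in zip(bg_row, fg_row)]
            for bg_row, fg_row in zip(background, foreground)]


def move_rect(foreground, old, new, w, h, colour):
    draw_rect(foreground, old[0], old[1], w, h, TRANSPARENT)  # erase
    draw_rect(foreground, new[0], new[1], w, h, colour)       # redraw


background = make_plane(8, 4, "B")
draw_rect(background, 0, 0, 8, 2, "S")            # a two-tone background
foreground = make_plane(8, 4, TRANSPARENT)
draw_rect(foreground, 1, 1, 2, 2, "O")            # the moving object
move_rect(foreground, (1, 1), (5, 1), 2, 2, "O")
screen = composite(background, foreground)
```

Because the object is erased with the transparent value rather than with a background colour, both tones of the background reappear at the old position.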

To achieve "real time" animation, the following suggestion is proposed. To conform to the time constraint, the decoder may use a "self-analysis" procedure. For example, after the host has specified the number of frames to be displayed for an object over a given duration, the box determines the maximum allowable duration which the graphic processor can spend on drawing a single frame, and sets a clock to mark that duration. After this measurement, the time the graphic processor spends on drawing each subsequent frame will not exceed the maximum allowable duration.
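One way to read the "self-analysis" procedure is as a per-frame time budget. The sketch below assumes that the drawing time of each element has already been measured on an earlier frame and is given in milliseconds; the element names, the costs, and the functions `frame_budget` and `plan_frame` are hypothetical.

```python
def frame_budget(duration_ms, frames):
    """Maximum time the graphic processor may spend on one frame."""
    return duration_ms / frames


def plan_frame(elements, budget_ms):
    """Pick elements in order of importance until the budget is used up.

    `elements` is a list of (name, cost_ms) pairs, most important first;
    the costs stand in for drawing times measured beforehand.
    """
    chosen, spent = [], 0
    for name, cost in elements:
        if spent + cost > budget_ms:
            break                 # everything less important is ignored
        chosen.append(name)
        spent += cost
    return chosen


measured = [("body_outline", 20), ("head_outline", 10),
            ("eyes", 8), ("eye_texture", 15)]
budget = frame_budget(duration_ms=2000, frames=50)    # 40 ms per frame
```

With these assumed figures the first three elements fit in the 40 ms budget and the eye texture is dropped; a more powerful box, or a longer duration, would draw the complete frame.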

The foreground plane will also provide a moving cursor on the screen. The reason for using the foreground for user interaction is that, since the foreground is provided by the new decoder, its resolution is known and each individual pixel of the foreground can be counted accurately for picking purposes. On the other hand, since the background is provided by a standard decoder, and the video output is most likely an analog signal, its resolution is unknown. Even if it were known, alignment between the cursor position and the background pixels behind the cursor could not always be guaranteed. (For many applications, this may not be a problem, since only regions, rather than each individual pixel, are needed for picking purposes.)

Some related issues may be further explored:

• acceptable minimum number of frames per second

• synchronisation of multiple objects

• what is the minimum acceptable object if some details are removed

Chapter 7

Discussion

In our experience with the project, designing a NAPLPS decoder has two parts. One is the study of the standard itself, such as the features and interpretations of the standard; the other is the study of the basic decoding algorithms. It is interesting to note that although the number of basic graphic primitives NAPLPS supports is very small, decoding every one of them has taken a large portion of the development time.

7.1 NAPLPS standard

7.1.1 Scope of the standard

As discussed previously, due to the large scope of the standard, the decoding software needed several revisions in order to implement all the NAPLPS features correctly. The complexity of the NAPLPS standard has created much confusion for implementors [41]. Compiler-development tools such as Lex and Yacc, and other tools available in the UNIX environment, such as make, have proved to provide a very productive environment for software development.

Initially, the software was compiled on a Hewlett-Packard 64000 Logic Development System (LDS). It soon became clear that the C-compiler available on the LDS had too many limitations to be productive. For instance, one serious problem is that the compiler cannot initialise a structure while it is being declared; a structure must be declared first, and then initialised by hand coding. Another deficiency is the slow compilation and linking speed. Furthermore, the compiler cannot choose the correct addressing mode according to the effective address of a variable. It uses the default addressing mode only, unless controlled explicitly. This means that when the default addressing mode cannot produce the correct effective address, the compiler gives a warning, and the programmer must insert a compiler directive in the source file, in front of the point where the warning was given, to change the addressing mode. This is a trial-and-error process and takes considerable time if the program is large. Consequently, it was decided to compile and link all the modules by using cross-compiler tools available in the UNIX environment, the object files being down-loaded into the prototype's memory directly. Only small programs, which were used to measure the efficiencies of the graphics decoding routines, were compiled by the C-compiler on the LDS. This was necessary because the performance-analysis tools available on the LDS require certain program information such as symbols and program source line numbers.

7.1.2 Protocol Error Sequences

A very desirable feature of the software is the ability to recover from errors. There are two kinds of possible errors: protocol errors made by the presenter, and transmission errors. Certain protocol errors have been specified by Lex and Yacc rules so that they can be detected and corrected. For instance, when the PDI set is currently in use, a possible protocol error is that operands are received without a preceding op-code. When the parser cannot recover from a very serious protocol error, a re-transmission request is the only answer. Also, it is difficult (perhaps impossible) to find all the possible protocol error sequences and specify them as Lex and Yacc rules. However, since NAPLPS codes are normally machine-generated, the chance of getting protocol errors is expected to be very small.
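The "operands without a preceding op-code" case can be illustrated with a small hand-written scanner. This is not the Lex and Yacc specification used in the decoder; it is a sketch in Python that assumes the 7-bit form of the PDI set, with op-codes taken to lie in 0x20 to 0x3F and numeric data in 0x40 to 0x7F, and that simply discards orphaned operands as one possible recovery. The function names are hypothetical.

```python
def is_opcode(byte):
    return 0x20 <= byte <= 0x3F      # assumed PDI op-code range (7-bit form)


def is_operand(byte):
    return 0x40 <= byte <= 0x7F      # assumed PDI numeric-data range


def group_pdis(stream):
    """Split a PDI byte stream into (opcode, operands) commands.

    Operand bytes that arrive before any op-code are a protocol error;
    here they are discarded and counted. Bytes outside both ranges
    (for example control codes) are skipped by this sketch.
    """
    commands, errors = [], 0
    current = None
    for byte in stream:
        if is_opcode(byte):
            current = (byte, [])
            commands.append(current)
        elif is_operand(byte):
            if current is None:
                errors += 1          # operand without a preceding op-code
            else:
                current[1].append(byte)
    return commands, errors
```

For example, `group_pdis([0x41, 0x42, 0x24, 0x48, 0x50])` drops the two leading operand bytes, reports two errors, and keeps the command that follows.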

Ideally, transmission errors should not be a concern here, because NAPLPS is at layer six of the ISO communication model, and specifies information interchange under the assumption that the transmission is error free. The transmission error detection and correction are supposedly dealt with by the lower four layers. But a NAPLPS decoder is usually regarded as a dumb terminal (NAPLPS was initially specified for such a terminal) and no NAPLPS decoder implements all the lower layers. Thus, transmission errors may be detected by the parser as protocol errors. Fortunately, from our experience, if a decoder is used with a standard 1200/75 Baud modem, transmission errors do not occur very frequently (two or three a day).

7.1.3 Interpretation and Verification of the NAPLPS standard

The interpretation of the NAPLPS standard presents another problem. The same NAPLPS stream may produce different pictures when different decoders are used because of differing interpretations of the standard.

For example, a common question is what action a decoder should take in the case of missing PDI operands. The standard gives a very general rule on this issue. Section 5.3.2.2.5 of the standard, Operand Length, states, in part:

"If an operand following an opcode is shorter than the length previously specified by the DOMAIN command (or the implicit length in the fixed format case), trailing zero bits are supplied by the receiving presentation process, unless otherwise indicated in the definition of the command".

Certain PDI commands have several operands, and the question here is whether this statement includes the case where operands are completely missing. The following example is adapted from Mehra's paper [41]:

If a SET and POLY command is received with no operands, then there exist several interpretations:

1. The PDI command is ignored, with the drawing point remaining in its current position.

2. The drawing point is set to 0,0 and nothing is drawn.

3. The drawing point is set to 0,0 and a point is drawn.

4. The drawing point is left at its current position and a point is drawn.

The joint ANSC X3L2.1 and CVCC/CSA/WG committees recommended that if such a case occurs and no action has been stated in the standard, missing operands are regarded as zeros.
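The rule quoted from the standard and the committees' recommendation can be combined in a short sketch. An operand is modelled here simply as a list of data units; the bit-level layout of NAPLPS operands is not modelled, and the function names are hypothetical.

```python
def complete_operand(received, length):
    """Pad a short operand with trailing zeros up to `length` units."""
    if len(received) > length:
        raise ValueError("operand longer than the DOMAIN length")
    return list(received) + [0] * (length - len(received))


def complete_operands(received, count, length):
    """Return `count` operands, treating entirely missing ones as zeros."""
    operands = [complete_operand(op, length) for op in received[:count]]
    operands += [[0] * length for _ in range(count - len(operands))]
    return operands
```

Under this reading, a command received with no operands at all behaves as if every operand were zero, which corresponds to the recommendation rather than to interpretation 1 or 4 in the list above.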

After the second version of the software was completed, Mehra's paper was obtained. In his paper, he describes the procedures involved in maintaining the NAPLPS standard, and discusses some technical issues. From the discussions given in the paper, it was found that some of our interpretations were different from the recommendation. Therefore, the software has been modified to comply with the recommendation.

The Canadian Department of Communications sponsored the development of a NAPLPS Verification Test Package, and the package has been reviewed and approved by the joint committees. The aim of the test package is to provide "an uniform means of evaluating and verifying conformance and ensuring consistent implementations" [41]. Although it is available in North America, it could not be obtained outside North America until quite recently. At the time of writing, the package had just been received and the decoder has not yet been exhaustively tested with it.

7.2 Decoding Algorithms and Optimised Hardware Support

The efficiencies of graphic-primitive decoding algorithms affect the overall system performance. Despite the claim that the ACRTC supports NAPLPS, this study has shown that it does not support the basic NAPLPS graphic primitives well: the control processor still needs to perform complex code-generation algorithms.

On studying the features of most common new graphics processors, it seems that none of them provides all the necessary features and hardware support on a single chip to implement NAPLPS primitives efficiently, while at the same time providing a high-level instruction set to simplify software development.

7.2.1 Logical Pel

Two methods have been investigated to draw outlined primitives. One is to compute and draw the final shape of the resulting figure, traced by the logical pel, and then fill it. The other, which is the method currently used, is to draw a family of primitives to give the effect of line width.

Fill Method

In this method, the outline of the final shape of the resulting figure must be filled twice: first in a reserve colour and then in the foreground colour. This requires the following sequences (see Figure 7.1):

• draw the outline of the enclosed area in a reserve colour

• find an interior point

• fill it in the reserve colour with solid texture

• fill it in the desired drawing colour with solid texture

Figure 7.1: Draw a Line By Using Fill Method (panels: Step 1, draw outline in reserve colour; Step 2, fill in reserve colour; Step 3, fill in drawing colour)

Figure 7.2: An Arc
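The sequence above can be sketched on a small pixel grid. The sketch assumes a simple 4-connected seed fill and a rectangular outline, and it takes the interior point as given; the names `RESERVE`, `DRAW`, `fill`, and `fill_method` are hypothetical. It is one plausible reading of why two fills are used: the first makes the outline and interior a single uniform colour regardless of what the background contained, and the second recolours exactly that region.

```python
RESERVE, DRAW = "r", "d"     # hypothetical colour values


def fill(grid, seed, new, stop):
    """4-connected seed fill: paint `new` from `seed`, halting where stop(pixel) holds."""
    stack = [seed]
    while stack:
        x, y = stack.pop()
        if not (0 <= y < len(grid) and 0 <= x < len(grid[0])):
            continue
        if grid[y][x] == new or stop(grid[y][x]):
            continue
        grid[y][x] = new
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])


def fill_method(grid, outline, interior_point):
    for x, y in outline:                        # outline in the reserve colour
        grid[y][x] = RESERVE
    # first fill: bounded by the reserve-coloured outline
    fill(grid, interior_point, RESERVE, lambda p: p == RESERVE)
    # second fill: everything in the reserve colour becomes the drawing colour
    fill(grid, interior_point, DRAW, lambda p: p != RESERVE)


# A checkered background, so that the interior is not one flat colour.
grid = [["a" if (x + y) % 2 else "b" for x in range(7)] for y in range(7)]
outline = [(x, y) for x in range(1, 6) for y in range(1, 6)
           if x in (1, 5) or y in (1, 5)]
fill_method(grid, outline, (3, 3))
```

Every pixel of the filled area is written at least twice in this sketch, which is consistent with the observation later in this section that the fill method is the slower of the two.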

The line texture must be implemented by dividing a primitive into segments first, and then drawing each one individually. This is rather complex and time-consuming, especially when drawing arcs. For example, a NAPLPS arc command can result in drawing a circle if the start and end points are coincident. The resultant outline of the area traced by the pel is not a circle at all, but segments of circles and horizontal and vertical lines (see Figure 7.2).


Figure 7.3: Start and End Pels Overlay

To draw the area traced by the logical pel along the path of an arc with solid texture, the outline has to be divided into segments first according to quadrants, and then each segment is drawn individually. On the other hand, it could be simpler if the outline of the entire enclosed area is drawn first, and then filled. However, one problem associated with this approach is that if the first and last pels overlap, then the outline will not be properly filled by the hardware fill algorithm (see Figure 7.3). Accordingly, the steps to draw an arc with texture can be even more complex.

Multiple Draw Method

The current implementation requires several steps:

• load ACRTC's colour registers according to NAPLPS colour mode.

• load ACRTC's pattern RAM according to the line texture specified by the PDI TEXTURE command.

• draw two pels at both the start and end positions by drawing rectangles.

• Repeat until all four side faces are drawn:

1. move to the start position.

2. draw a line to the end position.

3. reset pattern RAM's parameters.

Figure 7.4: Line Texture and Dot Space (panels: dot spacing > diagonal; dot spacing = diagonal; dot spacing < diagonal)
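For a solid texture, the idea of drawing a family of lines can be sketched as follows. This is one reading of the method, not the ACRTC code: it assumes that each side face is covered by one line per pixel on the corresponding pel edge, that the pel's origin is its corner at the drawing point, and that lines are generated by an integer Bresenham routine. All function names are hypothetical. The second function stamps the whole pel at every point of the path, for comparison.

```python
def line_points(x0, y0, x1, y1):
    """Integer Bresenham line, end points included."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy


def pel(x, y, w, h):
    """Pixels covered by a w-by-h logical pel placed at (x, y)."""
    return {(x + i, y + j) for i in range(w) for j in range(h)}


def thick_line_family(start, end, w, h):
    path = line_points(*start, *end)
    pixels = pel(*start, w, h) | pel(*end, w, h)       # two end pels as rectangles
    perimeter = [(i, j) for i in range(w) for j in range(h)
                 if i in (0, w - 1) or j in (0, h - 1)]
    for i, j in perimeter:                             # one line per perimeter pixel
        pixels |= {(x + i, y + j) for x, y in path}
    return pixels


def thick_line_stamped(start, end, w, h):
    pixels = set()
    for x, y in line_points(*start, *end):             # whole pel at every point
        pixels |= pel(x, y, w, h)
    return pixels


family = thick_line_family((0, 0), (30, 17), 5, 4)
stamped = thick_line_stamped((0, 0), (30, 17), 5, 4)
```

For a solid line the two sets of pixels coincide, while the family needs only one line per perimeter pixel instead of one per pixel of the pel. With a dotted texture the interior of each dot would no longer be covered in this way, which is the open-ended box effect discussed next.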

NAPLPS states that the outline drawing algorithms are implementation dependent. However, since in colour mode 0 or 1 the inter-dot spacings are not drawn, this method leaves all four side faces visible. Thus, the size of the dot should be greater than or equal to the diagonal of the logical pel to ensure that the result is correct. Otherwise, each dot segment does not appear as a solid rectangular prism, but as an open-ended box (see Figure 7.4). It is possible to avoid this problem by increasing the number of primitives in the family to the number of pixels required to define the entire area of the logical pel. The disadvantage is a long execution time and possible redundancy, because one pixel may be updated many times. (Note that horizontal and vertical lines do not have this problem.) Colour mode 2 does not have this problem either. Since the inter-dot spacings are also drawn, only two faces are visible and the open ends are 'covered' by the inter-dot spacings (see Figure 7.5). In fact, the line drawing algorithm currently used takes advantage of this to double the performance: it draws only the two visible faces when in colour mode 2.

Figure 7.5: Line Drawn In Colour Mode 2 (labels: Drawn in Foreground Colour; Drawn in Background Colour)

This technique is applicable to drawing arcs only if the texture is solid. This is due to the fact that which faces are visible depends on the quadrant. Again, to draw an arc properly, it must be divided into segments (see Figure 7.6). To divide an arc into segments requires the calculation of the radius and centre of the arc. These calculations require addition, multiplication, division, and square root arithmetic.

A simple test was conducted to measure the performances of these two algorithms. The test was to draw a line from (0, 0) to (630, 500) with solid texture, and pel size of 25 by 18. For this particular line, the second method was twice as fast as the first one. This is as expected, since the first method requires the filled area to be visited several times and at each visit of a pixel, several clock cycles are required to examine its contents and update it.

Obviously, the execution times of both algorithms depend on the size of the logical pel. As will be discussed, if an optimised hardware architecture were to be used, then the execution time would be considerably reduced.

If a programmable graphics processor is available, e.g. the TMS34010 GSP, then instead of updating only one pixel at each incremental point, the drawing algorithms may update all the pixels under the logical pel. This should give some improvement in performance, since the pattern RAM parameters need not be reset and the algorithms need to be executed only once.

Figure 7.6: A Textured Arc

Since a frame buffer normally consists of multiple memory chips, and at each memory access all the chips are accessed, another possible improvement is to use a suitable frame buffer arrangement to provide the capability of updating multiple pixels in one memory cycle. Several existing schemes may be modified to implement it. Sproull [24] developed an "8 by 8 display" which is capable of accessing any 8 by 8 square of pixels in a single memory cycle. This scheme uses shift registers and special memory addressing circuitry to provide the capability of addressing an arbitrary area independent of word boundaries. The general rule is that the more chips used for a frame buffer, the larger the area that can be accessed in parallel. However, the cost is also considerably higher.

Fairchild Semiconductor Corp. has devised a chip set based on a reconfigurable frame-buffer architecture. The chip set is called the "Rasteriser", and can dynamically reconfigure the frame buffer RAMs to the most efficient structure for each type of line or shape [23].

Some work has been done in an effort to find better internal RAM architectures to increase the performance. Whelan [42] has proposed a "rectangular area filling architecture", which is a simple scheme allowing the update of a rectangle of memory cells in a single memory cycle. The architecture is much the same as a conventional DRAM with a few additional circuits. One such circuit is a "banded address decoder", which provides the address range specifying the rectangular area to be updated.

The architecture of dual-port RAMs could also be modified to implement the logical pel very efficiently. It is not expected to require too much effort to incorporate a register into a dual-port RAM to store the width of the logical pel. Since the whole row is selected even when accessing a single bit, multiple horizontal pixels can be modified simultaneously according to the contents of the register. To extend it further, intelligence may be added to the memory to modify multiple pixels both horizontally and vertically. This architecture may also speed up rectangle and polygon drawing. For instance, drawing a rectangle would require the drawing of only one vertical line. Thus, what we may be able to achieve here is to give dual-port RAMs more intelligence by moving some graphics functions from the graphics processor to the frame-buffer memory.

7.2.2 Filling Polygons

Introduction

Filling polygons is a non-trivial task, and many algorithms are known. They generally divide into two broad categories: scan conversion and seed fill. Scan conversion algorithms determine whether a point is inside a polygon or not. They normally proceed in scan-line order, from the top of a polygon to the bottom. The ordered edge list, edge fill, edge flag fill, and parity check algorithms (and their variants [43]) fall into this category [44].
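The inside test that underlies scan conversion can be illustrated with the generic even-odd (parity) rule. The sketch below is not one of the named algorithms as implemented on any particular hardware; it tests pixel centres one at a time, which is simple but slow, and the function names are hypothetical.

```python
def inside(px, py, polygon):
    """Even-odd (parity) test for the point (px, py)."""
    crossings = 0
    n = len(polygon)
    for k in range(n):
        x0, y0 = polygon[k]
        x1, y1 = polygon[(k + 1) % n]
        if (y0 > py) != (y1 > py):                    # edge spans the scan line
            x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_at > px:                             # crossing lies to the right
                crossings += 1
    return crossings % 2 == 1


def scan_convert(polygon, width, height):
    """Visit pixels in scan-line order, top to bottom, testing pixel centres."""
    return [(x, y) for y in range(height) for x in range(width)
            if inside(x + 0.5, y + 0.5, polygon)]


square = [(1, 1), (4, 1), (4, 4), (1, 4)]
pixels = scan_convert(square, 6, 6)
```

A practical scan conversion algorithm avoids the per-pixel test by computing, for each scan line, the sorted edge crossings and filling between alternate pairs.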

Seed fill algorithms generally require the outline of a closed polygon to be drawn first and then proceed from an interior point of the polygon (this point is also called the seed point). The interior point can be supplied interactively by the user, or by algorithms which can trace the outline of the polygon to find the seed point. After the seed point is given, the algorithms then search for points adjacent to the seed point. If a point has the same colour as the outline colour, then it is considered as the boundary and not drawn. If the colour is not the same as the outline colour, then the point is drawn and becomes the new seed point. The process continues until all the points have been visited. One problem associated with this approach is that if a memory read error occurs during the fill process, then leakage may occur and the entire display area may be painted. More seriously, with the ACRTC the current drawing pointer may be set to an incorrect position and the subsequent drawings will not be done at the correct screen positions or will not be done correctly. Another common problem is how to deal with isolated pixels caused by quantization (see Figure 7.7). If an isolated pixel is used as the seed point, then the fill process stops after the isolated pixel is filled, and the rest of the pixels are left unfilled. Alternatively, isolated pixels may not be filled if other pixels are filled first.
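The isolated-pixel problem can be reproduced with a few lines of code. The sketch assumes a 4-connected boundary fill of the kind described above; the picture, the colour characters, and the function name `seed_fill` are made up for illustration.

```python
def seed_fill(grid, seed, edge, new):
    """4-connected boundary fill; returns the number of pixels painted."""
    painted = 0
    stack = [seed]
    while stack:
        x, y = stack.pop()
        if not (0 <= y < len(grid) and 0 <= x < len(grid[0])):
            continue
        if grid[y][x] in (edge, new):
            continue
        grid[y][x] = new
        painted += 1
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return painted


# '#' is the outline colour and '.' is unfilled interior. The single '.'
# in row 1 touches the rest of the interior only diagonally.
picture = ["#######",
           "#.#####",
           "##...##",
           "##...##",
           "#######"]

from_isolated = [list(row) for row in picture]
count_a = seed_fill(from_isolated, (1, 1), "#", "f")

from_main = [list(row) for row in picture]
count_b = seed_fill(from_main, (3, 2), "#", "f")
```

Starting from the isolated pixel paints only that pixel, and starting from the main area leaves the isolated pixel unfilled, which are the two failure cases described in the text.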

Figure 7.7: Isolated Pixels Due to Quantization Errors

Often it is simpler to decompose a polygon into simpler geometries first, such as trapezoids or triangles, and then fill each one individually [45] [46] [47]. Filling trapezoids or rectangles is much simpler. However, the decomposition process can be complex and requires additional system memory space for storing the lists required by the decomposition process.

In a graphics display using an ACRTC, every access to the frame buffer must pass through the ACRTC. Thus, for such a case, the speeds of edge fill, edge flag fill, and parity check algorithms will be too slow to be practical. This is especially true in our case: because of the poor hardware support of the ACRTC, the decoding routines are rather inefficient, and the control processor does not have enough power to perform the computation required. The ordered edge list algorithm is a general algorithm which can be used on different hardware; however, it requires a considerable amount of programming effort.

A Proposed Fill Method

From the programmer's point of view, a seed fill would be ideal if the seed point could be found without much effort (and/or if the speed is not a major concern). If we attempt to fill a NAPLPS primitive by using the seed fill method, then two problems must first be solved: how to find the seed point, and how to get the correct final shape of a primitive. The following is a proposed scheme based on seed fill for filling NAPLPS primitives. The solution to obtain the correct final shape, and to avoid erasing the background information during the process of creating the correct final shape, is to create a textured mask in an off-screen area, and then use BITBLT operations to fill the figure. This scheme also allows the use of the outside area of a figure as the edge during the fill process, so that any point on the figure can be the seed point and isolated pixels can be filled correctly. To create a mask, the following steps are required (see Figure 7.8):

In an off-screen area

1. Clear the area with colour 0.

2. Draw the outline of the figure with the pel size in colour 1, and find its bounding-box.

3. In colour 1, with a single pixel draw the outline of a new bounding-box which is slightly larger than the bounding-box.

4. Use a point in the area between the outline of the figure and the new bounding-box as the seed point, and using colour 1 as the edge colour, fill the area between the new bounding box and the figure in colour 2.

5. To fill the area enclosed by the outline, including the region of the outline traced by the logical pel, use any point on the outline of the figure as the seed point and colour 2 as the edge colour, and fill the figure in colour 3 with solid texture. (Since colour 2 is the edge colour, and pixels painted in colours 0 and 1 exist in the enclosed area, a different colour is required. Otherwise, if colour 0 or 1 were used to paint the enclosed area, the fill would stop when the graphics processor detected a pixel already painted in colour 0 or 1, and the remaining pixels would not be painted.)

6. Use any point on the figure as the seed point and colour 2 as the edge colour, and fill the figure in colour 1 and/or 0 with the texture pattern.
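The mask construction can be sketched in simplified form. Steps 1 to 4 follow the list above; the two seed fills of steps 5 and 6 are replaced here by a single pass over the mask that textures every pixel not marked as outside, which is a simplification and not the proposed procedure itself. Further assumptions: the outline is given as a list of pixels, `texture(x, y)` is a hypothetical function returning true for foreground dots, the helper box is drawn two pixels outside the bounding-box so that a one-pixel gap always surrounds the figure, and the helper box itself is marked as outside at the end.

```python
def build_mask(outline, texture, margin=2):
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    left, top = min(xs) - margin, min(ys) - margin
    width = max(xs) - min(xs) + 1 + 2 * margin
    height = max(ys) - min(ys) + 1 + 2 * margin

    mask = [[0] * width for _ in range(height)]               # step 1: clear
    for x, y in outline:                                      # step 2: outline
        mask[y - top][x - left] = 1
    for col in range(width):                                  # step 3: larger box
        mask[0][col] = mask[height - 1][col] = 1
    for row in range(height):
        mask[row][0] = mask[row][width - 1] = 1

    stack = [(1, 1)]                                          # step 4: fill the gap
    while stack:
        col, row = stack.pop()
        if mask[row][col] != 0:                               # colour 1 is the edge
            continue
        mask[row][col] = 2
        stack.extend([(col + 1, row), (col - 1, row),
                      (col, row + 1), (col, row - 1)])

    for row in range(height):                                 # steps 5-6, simplified
        for col in range(width):
            if row in (0, height - 1) or col in (0, width - 1):
                mask[row][col] = 2                            # helper box is outside
            elif mask[row][col] != 2:
                mask[row][col] = 1 if texture(col + left, row + top) else 0
    return mask, (left, top)


ring = [(x, y) for x in range(10, 15) for y in range(20, 25)
        if x in (10, 14) or y in (20, 24)]
mask, origin = build_mask(ring, lambda x, y: (x + y) % 2 == 0)
```

Because the outside is identified first, the figure is whatever remains, so no interior seed point has to be found.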

During the BITBLT operation, the mask pattern will determine what is to be written back to the frame buffer:

• If the mask is colour 2, then no drawing is done.

• If the mask is colour 1, then draw a pixel in the foreground colour.

• If the mask is colour 0 and colour mode is 0 or 1, then no drawing.

• If the mask is colour 0 and colour mode is 2, then draw a pixel in the background colour.
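The four rules can be written as a single masked transfer. The sketch models the frame buffer and the mask as lists of rows; the function name `blt_with_mask` and its parameters are hypothetical, and a real BITBLT would operate on words rather than on individual pixels.

```python
def blt_with_mask(frame, mask, origin, foreground, background, colour_mode):
    """Write the masked figure into `frame` with its top-left corner at `origin`."""
    left, top = origin
    for row, mask_row in enumerate(mask):
        for col, m in enumerate(mask_row):
            if m == 1:
                frame[top + row][left + col] = foreground
            elif m == 0 and colour_mode == 2:
                frame[top + row][left + col] = background
            # m == 2, or m == 0 in colour mode 0 or 1: the pixel is left alone


frame_mode_1 = [["x", "x", "x"]]
blt_with_mask(frame_mode_1, [[2, 1, 0]], (0, 0), "F", "B", colour_mode=1)

frame_mode_2 = [["x", "x", "x"]]
blt_with_mask(frame_mode_2, [[2, 1, 0]], (0, 0), "F", "B", colour_mode=2)
```

In colour mode 1 the inter-dot pixel keeps its old contents, while in colour mode 2 it is overwritten with the background colour.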

In fact, since only four colours are needed, the mask can be created with only two bit-planes to minimise the memory usage. If more colours are available, step 5 is not required; the textured mask can be created with another two colours which have not been used, colours 3 and 4 for instance. The extra cost is the cost of one bit-plane. Another way to increase the speed is to combine step 6 with the BITBLT operation, i.e. as soon as a point on the mask is ready, its corresponding point on the display area is drawn.

Figure 7.8: A New Scheme For Filling Primitives (panels: Step 1, clear screen in colour 0; Step 2, draw outline in colour 1 and find bounding-box; Step 3, draw a bigger bounding-box in colour 1; Step 4, fill gap in colour 2 from the seed point; Step 5, fill polygon in colour 3; Step 6, fill polygon in colour 1 and/or 0)

Figure 7.9: Direct Fill (panels: Step 1, draw outline in reserve colour; Step 2, fill in reserve colour; Step 3, fill in drawing colour; Step 4, draw pel in drawing colour)

The disadvantage is obviously the low graphic drawing speed, since the frame buffer needs to be accessed many times. But there are advantages as well. The first is that it can fill any type of polygon, including those with () and self-crossing topologies. (Although NAPLPS allows only polygons with enclosed areas, a legitimate polygon specified by NAPLPS codes may still result in a 0-shaped polygon due to the finite resolution of the physical frame buffer.) The second advantage is that seed points can be found very easily. This algorithm could be implemented by using the ACRTC. But, because all data movements need to pass through the ACRTC, the performance would be very poor.

On the other hand, the current fill method, which fills a figure directly on the display screen with seed fill, cannot implement colour modes 0 and 1 correctly if the texture is not solid (see Figure 7.9).

To summarise, the ideal architecture for a high-performance NAPLPS decoder should have the characteristics of both the packed-pixel and planar arrangements. The plane-oriented characteristic is suitable for displaying text, both of the characteristics are suitable for displaying PDI primitives, and the packed-pixel arrangement is suitable for creating textured masks for filling. Optimised hardware support will reduce the control processor's overhead and increase graphics execution performance.

7.3 Test Data

Generation of test data to test graphics primitive-drawing algorithms is another problem. The nature of the bugs depends on the algorithms used, and some bugs may be detected only by chance. For instance, as previously mentioned, to draw the area traced by the logical pel along the path of an arc, the first attempt was to draw the outline of the entire enclosed area first, and then fill it. It was later discovered that if the first and last pels overlap, then the outline cannot be filled properly.

7.4 Cost

In general, several factors affect the overall cost of a product: development, support (upgrade/debugging), production, testing costs, etc. The cost considerations of the NAPLPS decoder have been divided into two major areas: the hardware-component cost and the software cost.

7.4.1 Hardware-Component Cost

Production cost is generally directly proportional to the component count. A general rule is to use VLSI chips, whenever possible, to minimise the component count and thus the production cost. However, some VLSI chips may require a considerable amount of complex interface and support circuitry, which contributes quite a large portion of the total component count. For instance, the ACRTC requires many support functional blocks for the frame-buffer memory control and the video-attribute control, and if they are implemented using discrete components, the count can be high. The current prototype requires 10 I/O buffers for the frame-buffer memory control, 8 shift registers for the video-attribute control, and several miscellaneous gates. Hitachi has since released two support chips for the ACRTC: the HD63486 Graphic Video Attribute Controller (GVAC)¹ and the HD63485 Graphic Memory Interface Controller (GMIC).² These two chips can significantly simplify the interface circuitry.

A change in technologies is also a key to optimising a design and cutting down the cost. For example, the ACRTC is much more powerful than the NEC 7220, and apart from the speed aspect, programming the chip is much simpler than programming the NEC 7220 because of its high-level graphic commands. Thus, a reduction in software development cost is expected.

Hardware Cost Estimate

The total hardware cost is estimated as follows:

Glue chips:        37.50
I/O chips:          6.90
Timer:              4.20
CPU:               13.90
CRT Controller:    69.00
Bitmap Memory:    160.00
Keyboard:          20.80
RAMDAC:            62.50
EPROMs:            12.50
RAMs:               9.70
PC Board:          13.90
Power Supply:      34.70
Case:               4.40

Total: 450.00

¹ See data sheet: HD63486 Graphic Video Attribute Controller (GVAC), Preliminary, September 1986, Hitachi, Ltd.
² See data sheet: HD63485 Graphic Memory Interface Controller (GMIC), Preliminary, September 1986, Hitachi, Ltd.

This figure is the one-off cost. The cost for volume production is expected to be much lower. One point to note is that the frame-buffer memory has always been a major portion of the total cost of a NAPLPS decoder, ever since alphanumeric videotex decoders were first developed [48]. As a tradeoff, the colour resolution may be reduced to cut down the cost. Currently, the cost per bit plane is $US20.
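The arithmetic behind the estimate can be checked with a short sketch. The component figures are transcribed from the estimate above; treating them as being in the same currency as the $US20 per bit plane is an assumption, since the table does not state its currency, and the helper names are hypothetical.

```python
# Component costs transcribed from the hardware cost estimate.
HARDWARE_COSTS = {
    "Glue chips": 37.50, "I/O chips": 6.90, "Timer": 4.20, "CPU": 13.90,
    "CRT Controller": 69.00, "Bitmap Memory": 160.00, "Keyboard": 20.80,
    "RAMDAC": 62.50, "EPROMs": 12.50, "RAMs": 9.70, "PC Board": 13.90,
    "Power Supply": 34.70, "Case": 4.40,
}
COST_PER_BIT_PLANE = 20.00  # $US per bit plane, as quoted in the text

def total_cost(costs):
    """Sum of all component costs, rounded to cents."""
    return round(sum(costs.values()), 2)

def share(costs, item):
    """Fraction of the total contributed by one component."""
    return costs[item] / sum(costs.values())

def cost_without_planes(costs, planes_removed, per_plane=COST_PER_BIT_PLANE):
    """Total after trading colour resolution for cost (assumed linear)."""
    return round(sum(costs.values()) - planes_removed * per_plane, 2)

if __name__ == "__main__":
    print(total_cost(HARDWARE_COSTS))
    print(round(share(HARDWARE_COSTS, "Bitmap Memory"), 3))
    print(cost_without_planes(HARDWARE_COSTS, 2))
```

The components sum to the stated total, and the bitmap memory accounts for a little over one third of it. Under the same-currency assumption, each bit plane given up would lower the one-off estimate by $US20.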

7.4.2 Software Cost

It is very important to use good software development tools to cut down the software-maintenance cost. (The development cost is not a major concern here.) The current software structure offers a number of advantages. First, it is portable to other environments: PCs, or different graphics processors. Second, it provides a good environment for the maintenance of the NAPLPS standard. Because the standard is so large, complex and vague in certain areas, it is unavoidable that certain clauses in the NAPLPS standard can have different interpretations. Also, the standard is open to extensions: foreign character sets, photographic/speech coding, and animation are all possible future extensions. Finally, software bugs need to be fixed. Since the software was designed in a top-down fashion, changes can be made at appropriate levels without greatly affecting other unrelated levels.

Hopefully, this software will contribute partially to the future successful realisation of the PAL project.

7.5 Improvements

A number of areas can be explored further to simplify future development and, more importantly, to improve the performance.

7.5.1 Software Performance

Various techniques can be applied to improve the performance of the lexical analyser and the parser. Some of the techniques were already discussed in Chapter 4: applying various existing table compression techniques to reduce the table space; combining some of the production rules into fewer rules; and hand-coding the parser. Another possible area for future work is to study the suitability and quantify the performance of a few other less well-known or similar lexical analysers and parser generators, such as REX (a scanner generator) and Bison (a parser generator), which have been claimed to have better performance than Lex and Yacc.

Figure 7.10: Clipping Arc (labels: End Point, Unit Screen)

Various existing decoding routines can be modified to improve the performance further. For example, when drawing outlined primitives, if the width or height of the logical pel is zero, then only certain side faces need to be drawn, which reduces the execution time. Another example is that if the line texture is solid, then there is no need to reset the pattern RAM parameters for each member in a family when drawing an outlined primitive.

7.5.2 Error Checking

NAPLPS does not specify what action is to be taken "If a coordinate specification or a drawing operation would cause the drawing point or any portion of the resulting drawing to be outside the unit screen" (section 5.3.1.1 of the NAPLPS standard). The action taken is implementation-dependent. The drawing may be executed and clipped within the unit screen, or ignored.

The process of checking for bounding errors is simple except for arcs. What is required to test all the drawing primitives, including characters, is simply to compute the bounding box first, and then to check if the bounding box is inside the unit screen. But the bounding box of an arc cannot be easily computed given the start, intermediate and end points: although the start, intermediate and end points of an arc may be inside the unit screen, portions of the arc can still be outside the unit screen (see Figure 7.10).
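A minimal sketch of the bounding-box test, and of why three in-bounds points do not guarantee an in-bounds arc, is given here. It assumes a unit screen spanning 0 to 1 on both axes (the visible height is a parameter, since real displays may show less) and uses the circle through the three points as a stand-in for the arc. All names are hypothetical.

```python
import math

def inside_unit_screen(points, x_max=1.0, y_max=1.0):
    """Bounding-box test: True if every point lies within the screen."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs) >= 0.0 and max(xs) <= x_max and \
           min(ys) >= 0.0 and max(ys) <= y_max

def circumcircle(a, b, c):
    """Centre and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        raise ValueError("points are collinear; no circle passes through them")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)

if __name__ == "__main__":
    # Start, intermediate and end points of an arc, all inside the screen.
    arc = [(0.94, 0.83), (0.83, 0.94), (0.17, 0.94)]
    ux, uy, r = circumcircle(*arc)
    print(inside_unit_screen(arc), uy + r > 1.0)
```

In this example the arc runs from the start point over the top of the circle to the end point, so its highest point lies above the screen even though all three defining points pass the bounding-box test. Testing the whole circle's box instead would be conservative: it can reject arcs that in fact stay inside.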

However, we may make an exception for arcs and utilise the ACRTC's hardware-clipping features to deal with them. Before drawing any arc, it suffices to check first whether the three points are inside the unit screen: if any of them is outside, the arc is rejected; otherwise it is drawn. Before it is drawn, however, the option is set so that drawing is executed only while the drawing pointer is inside the unit screen: the graphics controller will generate an interrupt when the drawing pointer attempts to move outside the screen area. The interrupt service routine will abort the drawing command and set the drawing pointer to the start point.

On the other hand, clipping may be a better choice since this is usually the presenter's intention. Most outlined primitives can be clipped by using the hardware clipping features of the ACRTC. But clipping filled primitives requires a major computational effort except for filled rectangles. To clip filled polygons and arcs, the points of intersection with the unit screen edges need to be calculated. Again, using hardware clipping may be a good choice since when an intersection occurs, the drawing pointer contains the intersection point and can be read by software. So there is no need to calculate the intersection point by software at all.

Of course, hardware clipping is hardware dependent. The clipping routine has to be modified to port the software to other graphics controllers. And if a new graphics controller does not support hardware clipping features similar to those supported by the ACRTC, then the intersecting points have to be computed by software. However, the new generation graphics controllers all tend to support hardware clipping.

7.5.3 ACRTC Register Set-up Values

The internal registers of the ACRTC need to be set according to the specifications given. These include timing specifications of the monitor to be driven, the design specifications such as the spatial and colour resolutions, the bit-map memory configuration, etc. Considerable effort has been expended so that changing most of the specifications will not require major modifications to the existing program modules. This is accomplished by assigning the parameters of the ACRTC as constants and including the constants in a header file: at run time these constants are loaded into the registers. Since certain parameters are functions of others, one possible improvement is to use functional definitions for those constants in the header file rather than pre-calculating them by hand.

It may also be possible to calculate these constants by other means. One possibility is to write a computer program to calculate these constants for a given set of specifications and include the results in the header file. If any one specification is changed, the program is run, and the decoder's software is re-compiled to accommodate the change. Another is to include these specifications in a header file, and let the decoder's software calculate the parameters at run time.
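The idea of deriving dependent constants from a small set of specifications, rather than pre-calculating them by hand, can be sketched as follows. Every name and value here is hypothetical: the fields are not actual ACRTC register names, the 16-bit word width is an assumption, and the example resolution is illustrative rather than the prototype's.

```python
from dataclasses import dataclass
from math import ceil

@dataclass(frozen=True)
class DisplaySpec:
    """Independent specifications; everything else is derived from them."""
    width_pixels: int
    height_lines: int
    bits_per_pixel: int
    word_bits: int = 16  # assumed frame-buffer word width

    @property
    def colours(self):
        """Number of simultaneously selectable colour values."""
        return 2 ** self.bits_per_pixel

    @property
    def pixels_per_word(self):
        return self.word_bits // self.bits_per_pixel

    @property
    def words_per_line(self):
        """Memory width of one scan line, in words."""
        return ceil(self.width_pixels / self.pixels_per_word)

    @property
    def frame_buffer_words(self):
        return self.words_per_line * self.height_lines

if __name__ == "__main__":
    spec = DisplaySpec(width_pixels=256, height_lines=200, bits_per_pixel=4)
    print(spec.colours, spec.words_per_line, spec.frame_buffer_words)
```

Changing one specification, such as the bits per pixel, then updates every dependent value consistently. The same effect could be had with functional definitions in the header file or with a generator program that writes the header, as the text suggests.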

7.6 Use Better Graphics Controllers

The design of the ACRTC does not take advantage of fast operation modes supported by new DRAMs, and the system clock speed is constrained by the frame buffer memory. Another drawback is that the ACRTC does not support a proper read-modify-write (RMW) cycle in one memory access cycle: it requires at least two complete memory cycles to perform a RMW cycle. Since commencing the project, a number of manufacturers have introduced new graphics controllers, and some improvements have been designed into those chips. Some of the features may, to some degree, be more suitable for implementing NAPLPS than the existing features of the ACRTC. For example, some provide proper RMW cycles and support the fast operation modes of the new DRAMs. Another example is the TMS 34010 GSP. Its graphics primitives are at the microprogram level, and the implementor can write his own high-level algorithms. Thus, it may be interesting to port the software to suit these new graphics chips, in order to study the decoding performance of certain NAPLPS features which are not implemented efficiently by the ACRTC, such as fill and the logical pel.

Chapter 8

Conclusion

The research was successful in achieving its objectives, but there are several areas in which it could be improved.

It has been suggested that the key factor which will determine the widespread acceptance of videotex is the availability of affordable decoders to average users. At the same time, it has been recognised that the cost of decoders will decline through volume production [7] [6]. Thus, if we distribute the software package at a nominal cost, then it is very feasible to provide a wide range of low-cost NAPLPS decoders. Since the software is organised in a hierarchical structure, it can be easily distributed and ported to PCs, or other graphics controllers to improve the performance. Although the bitmap memory at present contributes one third of the total cost, it is expected to decline continuously. US and Japanese competition may further reduce the cost of DRAMs. As an interesting observation, the rate of price reduction seems to follow the so-called π rule [49]. The observation is that the initial price of a new DRAM is high, but it declines at a rate much larger than $US3 a year. The rate approaches $US1 a year at a price level of about $US3, which is very close to the value of π. "This price level is of critical importance because it corresponds to the peak volume of DRAM shipment as well as the maximum return on investment of the DRAM project" [49]. After this point, the price will eventually settle at about $US π/2. So in the long run, the cost of a 256K x 1-bit DRAM may be around $US1.57, which would reduce the total cost of the NAPLPS decoder by another $US15.

The advantage of a stand-alone terminal seems to be worth the cost because of the simplicity and ease of use. However, since commencing the project, the cost of PCs has been decreasing very rapidly. The cost may soon be low enough that many schools can afford access to them. If users do not consider ease of use to be an important issue, then PCs may take the place of dedicated NAPLPS terminals. Even if this is the case, the whole software package can still be used as a NAPLPS terminal emulation program.

As shown in the thesis, the NAPLPS standard does not support user interaction and animation very effectively. The thesis has suggested some temporary and long-term solutions.

There are many areas which can be explored further. One is to specify more clearly the interface between the higher levels and machine-dependent level to make the software more versatile (or portable), so that it can be widely distributed and ported without much effort.

This thesis has identified the bottlenecks in decoding the NAPLPS protocols, and has investigated several techniques which may improve the performance. If there is a commercial interest in the development of high-performance NAPLPS decoders, and we believe there is, then some of the techniques suggested are very likely to be implemented.

Another area for further study is the performance of other graphics controllers available on the market with respect to the execution speeds of implementing the logical pel and fill, in order to find a more suitable graphics controller.

Appendix A

Listings

The prologue and epilogue which were added to the input stream during the measurement of character-display speed are listed below.

Prologue:

Code in octal    Comments
033              ESC
130              C1 Control Set: scroll off
033              ESC
11x              C1 Control Set: set text size (x depends on the text size)
016              SO: invocation of PDI set
074              PDI command: SET COLOUR
144 144 144      colour = green
017              SI: invocation of Primary Character Set

Epilogue:

Code in octal    Comments
016              SO: invocation of PDI set
074              PDI command: SET COLOUR
177 177 177      colour = white
017              SI: invocation of Primary Character Set
033              ESC
127              C1 Control Set: scroll on
033              ESC
114              C1 Control Set: normal text
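The colour operands in the prologue and epilogue can be unpacked with a short sketch. It assumes that each operand byte carries six data bits holding two green, red, blue triples, most significant bits first, which is consistent with 144 144 144 being green and 177 177 177 being white. The function name is hypothetical and this is not the decoder's own routine.

```python
def decode_colour(operand_bytes):
    """Unpack SET COLOUR operand bytes into (green, red, blue) integers.

    Assumed layout: the low six bits of each byte are G R B G R B,
    so every byte contributes two bits to each colour component.
    """
    g = r = b = 0
    for byte in operand_bytes:
        data = byte & 0o77              # keep the six data bits
        for shift in (3, 0):            # high triple first, then low triple
            triple = (data >> shift) & 0b111
            g = (g << 1) | ((triple >> 2) & 1)
            r = (r << 1) | ((triple >> 1) & 1)
            b = (b << 1) | (triple & 1)
    return g, r, b

if __name__ == "__main__":
    print(decode_colour([0o144, 0o144, 0o144]))
    print(decode_colour([0o177, 0o177, 0o177]))
```

With three operand bytes each component is six bits wide, so full intensity corresponds to the value 63.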

The code which was inserted in the NAPLPS test frames to set the logical pel size, the line texture and the texture pattern is listed below.

Code in octal    Comments
041              NAPLPS DOMAIN command
110              default settings
1xy 1xy 1xy      x, y depend on the logical pel size
043              NAPLPS TEXTURE command
1xx              xx depends on line texture and texture pattern
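Multi-value operands such as the logical pel size above, or the position in the point-drawing file in this appendix, can be decoded with a sketch like the following. It assumes that each byte carries six data bits, three for x followed by three for y, that the first bit of each component is a sign bit, and that the bits form a two's complement binary fraction. The function name is hypothetical.

```python
def decode_xy(operand_bytes):
    """Decode a 2-D multi-value operand into an (x, y) pair of fractions.

    Assumed layout: in every byte, data bits 6-4 belong to x and bits 3-1
    to y. The first bit of each component (in the first byte) is the sign.
    """
    x_bits = y_bits = 0
    for byte in operand_bytes:
        data = byte & 0o77                  # strip the flag bit
        x_bits = (x_bits << 3) | (data >> 3)
        y_bits = (y_bits << 3) | (data & 0o7)
    n = 3 * len(operand_bytes)              # bits per component, sign included

    def to_fraction(bits):
        if bits >= 1 << (n - 1):            # sign bit set: negative value
            bits -= 1 << n
        return bits / (1 << (n - 1))

    return to_fraction(x_bits), to_fraction(y_bits)

if __name__ == "__main__":
    print(decode_xy([0o111, 0o140, 0o100]))
```

For the three bytes 111 140 100 this yields the position (0.375, 0.25) given in the point-drawing listing. Because conversion is only a matter of shifting and scaling, the format suits integer-oriented processors.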

The small file, which was used to measure the point-drawing speed, is as follows:

Code in octal    Comments
046              NAPLPS POINT (Absolute, Visible)
111 140 100      position = (0.375, 0.25)

Bibliography

[1] G. W. Gerrity, "Integrated Educational Computer Systems - The Role of Standards: An Example," in Micro Plus: Educational Peripherals, (S. Wills and R. Lewis, eds.), pp. 47-54, Amsterdam: North-Holland, 1988.

[2] V. X. Gledhill et al., "A Portable Authoring Language," in Proceedings of the 1985 Conference on Computer-Aided Learning in Tertiary Education (CALITE 85), pp. 433-43, 1-4 Dec. 1985.

[3] G. Enderle, K. Kansy, and G. Pfaff, Computer Graphics Programming GKS - The Graphics Standard. Berlin: Springer-Verlag, 1984.

[4] American National Standards Institute, Videotex/Teletext Presentation Level Protocol Syntax (North American PLPS), ANSI X3.110-1983. New York, NY: American National Standards Institute, December 1983.

[5] American National Standards Institute, Computer Graphics Virtual Device Interface, document X3H3/85-41. New York, NY: American National Standards Institute, 1985.

[6] -, "Selective Update: Future of videotex in US depends on terminal availability," IEEE Computer Graphics and Applications, vol. 7, no. 3, p. 68, March 1987.

[7] K. Y. Chang, "Microcomputer Graphics and Applications with NAPLPS Videotex," IEEE Computer Graphics and Applications, vol. 5, no. 6, pp. 21-33, June 1985.

[8] W. D. Fretts, "Videotex Terminal Hardware Architecture," Videotex World, vol. 1, no. 3, pp. 32-39, March 1985.

[9] G. Carasso, R. Goettsch, and N. Syrimis, "Controller chip puts text and graphics on the same bit map," Electronic Design, pp. 119-126, 27 June 1985.


[10] H. Cai, C. D. O'Brien, and J. S. Riordan, "A Proposed Scheme for Chinese Videotex Standard," Computer Processing of Chinese & Oriental Languages, vol. 2, no. 4, pp. 181-197, October 1986.

[11] J. Fleming and W. Frezza, "NAPLPS: A New Standard for Text and Graphics, Part 1: Introduction, History and Structure," Byte, pp. 203-254, February 1983.

[12] J. Fleming, "NAPLPS: A New Standard for Text and Graphics, Part 2: Basic Features," Byte, pp. 152-185, March 1983.

[13] J. Fleming, "NAPLPS: A New Standard for Text and Graphics, Part 3: Advanced Features," Byte, pp. 190-206, April 1983.

[14] J. Fleming, "NAPLPS: A New Standard for Text and Graphics, Part 4: More Advanced Features and Conclusions," Byte, pp. 272-284, May 1983.

[15] P. Linder, R. Norwood, and N.H. Hong, "Designers Weight Options for 256k Dynamic RAM Processes," Electronics, pp. 104-107, 12 July 1984.

[16] L. S. White, Jr., G. J. Armstrong, and G. R. M. Rao, "1MB Memories Demand New Design Choices," Electronics Week, pp. 123-126, 23 July 1984.

[17] J. P. Altnether, "System Implications of CHMOS Dynamic RAMs," Tech. Rep. AR-311, Intel Corporation, Hillsboro, Oregon, November 1983.

[18] W. H. Righter, "Special Report on Semiconductor Memories: CMOS 256kbit RAMs Are Fast and Use Less Power," Computer Design, pp. 133-140, August 1984.

[19] J. J. Fallin and W. H. Righter, "Designing Memory Systems with the 8k X 8 iRAM," Tech. Rep. AP-132, Intel Corporation, Hillsboro, Oregon, June 1982.

[20] M. C. Whitton, "Memory Design for Raster Graphics Displays," IEEE Computer Graphics and Applications, vol. 4, no. 3, pp. 48-65, March 1984.

[21] W. H. Righter, "Static Column Architecture in CHMOS Dynamic RAMs- A Graphics Memory Solution," Tech. Rep. AP-312, Intel Corporation, Hillsboro, Oregon, November 1983.

[22] A. Goris, B. Fredrickson, and H. L. Baeverstad, Jr., "A Configurable Pixel Cache for Fast Image Generation," IEEE Computer Graphics and Applications, vol. 7, no. 3, pp. 24-32, March 1987.

[23] S. Weber, "Technology to Watch: Fairchild Aims to Move High-end Graphics to PCs," Electronics, pp. 57-59, 23 July 1987.

[24] R. F. Sproull, I. E. Sutherland, A. Thompson, S. Gupta, and C. Minter, "The 8 by 8 Display," ACM Transactions on Graphics, vol. 2, no. 1, pp. 32-56, January 1983.

[25] T. Williams, "Graphics ICs Offer New Solutions for Speed and Flexibility," Computer Design, pp. 40-46, 1 June 1987.

[26] B. G. Stevens, "NAPLPS Decoding in Forth," The Journal of Forth Application and Research, vol. 3, no. 2, pp. 221-224, 1985.

[27] S. C. Johnson, "Yacc: Yet Another Compiler-Compiler," in Unix Programmer's Manual, ch. 19, Murray Hill, NJ: Bell Laboratories, 1979.

[28] M. E. Lesk and E. Schmidt, "Lex- A Lexical Analyzer Generator," in Unix Programmer's Manual, ch. 20, Murray Hill, NJ: Bell Laboratories, 1979.

[29] K. Groening and C. Ohsendoth, "NEMO - A Nicely Modified YACC," SIGPLAN Notices, vol. 21, no. 4, pp. 58-66, April 1986.

[30] A. M. M. Al-Hussaini and R. G. Stone, "Yet Another Storage Technique for LR Parsing Tables," Software - Practice and Experience, vol. 16, no. 4, pp. 389-401, April 1986.

[31] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers, Principles, Techniques and Tools. Reading, Mass: Addison-Wesley, 1986.

[32] D. E. Knuth, The Art of Computer Programming. Vol. 1 / Fundamental Algorithms, Reading, Mass: Addison-Wesley, 2nd. ed., 1973.

[33] J. D. Wetherington, "The Story of PLP," IEEE Journal on Selected Areas in Communications, vol. SAC-1, no. 2, pp. 267-277, February 1983.

[34] W. H. Ninke, "Design Considerations of NAPLPS, the Data Syntax for VIDEOTEX and TELETEXT in North America," Proceedings of the IEEE, vol. 73, no. 4, pp. 740-753, April 1985.

[35] J. Gecsei, The Architecture of Videotex Systems. Englewood Cliffs, NJ: Prentice-Hall, 1983.

[36] C. D. O'Brien and H. G. Bown, "A Perspective on the Development of Videotex in North America," IEEE Journal on Selected Areas in Communications, vol. SAC-1, no. 2, pp. 260-266, February 1983.

[37] J. A. Meads, "The Standards Pipeline," Computer Graphics, vol. 20, no. 3, pp. 164-166, July 1986.

[38] C. Pabouctsidis, "The Coding of Graphics Animation in a Videotex Terminal," IEEE Transactions on Consumer Electronics, vol. CE-30, no. 3, pp. 421-428, August 1984.

[39] R. G. Shoup, "Color Table Animation," in SIGGRAPH '79 Proceedings, pp. 8-13, August 1979. Published as Computer Graphics, vol. 13, no. 2.

[40] M. Tonomura, R. Suga, and R. Orsini, "Control of Peripheral Devices in NAPLPS Applications," in Interactive System Design, pp. 221-230, New York, NY: On-line, 1985.

[41] K. L. Mehra, "NAPLPS Technical Issues- a Progress Report," in Interactive System Design, pp. 231-237, New York, NY: On-line, 1985.

[42] D. S. Whelan, "A Rectangular Area Filling Display System Architecture," Computer Graphics, vol. 16, no. 3, pp. 147-153, July 1982.

[43] M. Chlamtac and I. Harary, "The Shift X Parity Watch Algorithm for Raster Scan Displays," IEEE Transactions on Computers, vol. C-34, no. 7, pp. 666-673, July 1985.

[44] D. F. Rogers, Procedural Elements for Computer Graphics. New York, NY: McGraw-Hill, 1985.

[45] B. Schachter, "Decomposition of Polygons into Convex Sets," IEEE Transactions on Computers, vol. C-27, no. 11, pp. 1078-1082, November 1978.

[46] T. Pavlidis, "Filling Algorithms for Raster Graphics," Computer Graphics and Image Processing, vol. 10, pp. 126-141, 1979.

[47] W. D. Little and R. Heuft, "An Area Shading Graphics Display System," IEEE Transactions on Computers, vol. C-28, no. 7, pp. 528-531, July 1979.

[48] H. G. Bown, C. D. O'Brien, W. Sawchuk, J. Storey, and R. Marsh, "Comparative Terminal Realizations with Alpha-Geometric Coding," IEEE Transactions on Consumer Electronics, vol. CE-26, pp. 605-617, August 1980.

[49] M. P. Lepselter and S. M. Sze, "DRAM Pricing Trends - The π Rule," IEEE Circuits and Devices, vol. 1, no. 1, pp. 53-54, January 1985.

Addenda and Errata to

The Development of a High-Performance NAPLPS Videotex Decoder

1. On page 3, para 4, line 2, replace "PAL colour television system" with "Phase Alternation Line colour television system".
2. On page 4, para 4, line 1, replace "manufactures" with "manufacturers".
3. On page 5, para 3, line 7, replace "the decoder" with "a prototype NAPLPS decoder".
4. On page 6, para 1, line 2, replace "composite with" with "composed of".
5. On page 7, para 2, line 7, replace "Presetel" with "Prestel".
6. On page 7, para 4, line 1, replace "it" with "its".
7. On page 8, figure 2.2, replace "swithches" with "switches".
8. On page 8, figure 2.2, replace "information units of physical link's other end" with "information units to the other end of the physical link".
9. On page 9, para 3, line 3, delete "National".
10. On page 12, figure 2.4, upper right, replace "cahracter" with "character".
11. On page 14, para 2, line 2, replace the last sentence to read "Such a format makes the task of handling the coordinates easier for integer-oriented microprocessors, as compared to floating-point representations, since conversion involves simple scaling.".
12. On page 14, para 3, append the sentence "Note that the sign bits are included only in the first byte".
13. On page 18, figure 2.12, replace "alone" with "along".
14. On page 20, para 2, line 1, replace "Comformace" with "Conformance".
15. On page 20, para 2, line 2, replace "comformance" with "conformance".
16. On page 22, para 3, line 5, replace "x, y" with "(x, y)".
17. On page 23, figure 3.1, the caption should be appended with "The numbers at top and left indicate pixel coordinates.".
18. On page 26, para 3, line 1, replace "TMS4275" with "the Texas Instruments TMS4275 Video RAM".
19. On page 29, para 4, line 3, after "Controller", insert "(ACRTC)".
20. On page 31, para 1, line 10, in front of "TMS", insert "Texas Instruments (TI)".
21. On page 37, para 1, line 2, replace "can be simultaneously displayed" with "are available for selection".
22. On page 38, para 1, line 7, delete the sentence beginning with "Taking". Replace the sentence following with "Taking into consideration both the horizontal and vertical retrace intervals, the above specification requires that a pixel be supplied to the monitor every 100 ns.".
23. On page 41, para 1, line 2, replace "colour a" with "a colour".
24. On page 43, para 2, line 1, insert at the beginning the sentence "The first prototype used the NEC 7220 GDC.".
25. On page 44, para 3, line 9, delete "to" at the end of the line.
26. On page 44, para 3, line 12, replace "manufactures" with "manufacturers".
27. On page 47, para 1, line 14, replace "subsequence" with "consequence".
28. On page 47, para 2, line 2, replace "presently used" with "I currently use".
29. On page 48, para 1, line 1, replace "an" with "a".
30. On page 49, para 4, line 1, replace "will" with "will in future".
31. On page 49, para 5, replace the first sentence with "All the MPU processes are interrupt driven, with the exception of the main loop, which polls the input buffer and calls functions to generate code for the ACRTC.".
32. On page 52, para 4, line 2, insert after "state information" the words "of the NAPLPS decoder". Replace the following sentence with "This state information includes the NAPLPS code extension structure.".
33. On page 58, para 2, line 1, replace "alternation" with "alternative".
34. On page 67, para 3, line 5, replace "give" with "given".
35. On page 70, para 1, line 7, delete the word "were".
36. On page 72, para 1, line 2, replace "mean" with "means".
37. On page 72, para 1, line 3, insert after the word "interface" the words "for a CAI system".
38. On page 72, para 1, line 6, replace "specifies" with "specify".
39. On page 73, para 1, line 8, replace "off-peaks" with "off-peak", "interface" with "interaction".
40. On page 74, para 2, line 1, replace "exist number" with "exist a number".
41. On page 76, para 2, line 7, delete "to generate".
42. On page 77, para 6, line 2, replace "7.4" with "6.4".
43. On page 78, para 2, line 5, append "that" to end of line.
44. On page 78, para 3, line 15, replace "mouse button" with "button".
45. On page 79, figure 6.4, replace the caption with "Animation Box Extension".
46. On page 80, para 2, line 2, replace "box" with "animation box".
47. On page 80, para 4, line 3, replace "such way" with "such a way".
48. On page 80, last line of example, replace "Significant" with "Significance".
49. On page 81, para 2, line 4, replace "suite" with "suit".
50. On page 91, para 1, line 3, replace "considerable" with "considerably".
51. On page 97, para 1, line 2, replace "packaged-pixel" with "packed-pixel".
52. On page 97, para 2, line 4, replace "alone" with "along".
53. On page 97, para 4, line 5, replace "ACRCT" with "ACRTC".
54. On page 99, para 5, line 7, replace both instances of "parser" with "parser generator".
55. On page 102, para 2, line 5, delete the apostrophe before "memory".
56. On page 103, para 2, line 11, replace "so called the" with "the so-called".
57. On page 103, para 3, line 2, replace "easy" with "ease".
58. On page 110, ref [32], replace "Seminumerical" with "Fundamental".