KEK Report 89-26 March 1990 D

PROCEEDINGS of SYMPOSIUM on

Data Acquisition and Processing for Next Generation Experiments

9 -10 March 1989 KEK, Tsukuba

Edited

by

H. FUJII, J. CHIBA and Y. WATASE

NATIONAL LABORATORY FOR HIGH ENERGY PHYSICS PROCEEDINGS of SYMPOSIUM on

Data Acquisition and Processing for Next Generation Experiments

9 - 10 March 1989 KEK, Tsukuba

Edited

H. Fiflii, J. Chiba andY. Watase i National Laboratory for High Energy Physics, 1990 KEK Reports are available from: Technical Infonnation&Libraiy National Laboratory for High Energy Physics 1-1 Oho, Tsukuba-shi Ibaraki-ken, 305 JAPAN Phone: 0298-64-1171 Telex: 3652-534 (Domestic) (0)3652-534 (International) Fax: 0298-64-4604 Cable: KEKOHO Foreword

This symposium has been organized to foresee the next generation of data acquisition and processing system in high energy physics and nuclear physics experiments. The recent revolutionary progress in the semiconductor and computer technologies is giving us an oppotunity to extend our idea on the experiments. The high density electronics of LSI technology provides an ideal front-end electronics such as readout circuits for silicon strip detector and multi-anode phototubes as well as wire chambers. The VLSI technology has advantages over the obsolite discrete one in the various aspects ; reduction of noise, small propagation delay, lower power dissipation, small space for the installation, improvement of the system reliability and maintenability. The small sized front-end electronics will be mounted just on the detector and the digital data might be transfered off the detector to the computer room with optical fiber data transmission lines. Then, a monster of bandies of signal cables might disappear from the experimental area.

The another topics is dramatic change of CPU power of a micro­ computer. A tiny CPU chip is giving almost the similar ability of computation with a large mainframe computer. This revolutionary change will give us large shock waves to our society and culture. It is also the case in the experiment, too. Usually we analyze the experimental data after the data taking has been finished. In the case of the TRISTAN experiment, the analysis is almost following the data taking but with a delay of a month. In the future experiments, however, all the data might be analized just after the data taking run with extremely large CPU power provided by so called computer farm. This changes style of experiments. We can run an experiment by watching the physics results on a graphic screen of the not hit wire distribution or angular distribution, but the cross section or sphericity of the collider events. In any case, a software of the system operation and the application in the data acquisition and computing is only vital treasure for the future . In this symposium we discussed current and future data acquisition system with various people from universities as well as excellent companies related to our field. I would like to conclude this symposium with a handfull of results for the future ability in our field. This symposium was partly supported by Grant-in Aid for Scientific Research (General Research B) for " High Speed Data Acquisition System with Computer Array" in the FY 1989 and 1990.

YoshiyuM Watase Computer Center KEK SYMPOSIUM PROGRAM

Thursday, 9 March

Opening address Kasuke Takahashi (KEK)

Session 1. Application of VME for experiments Chairman: Junsei Chiba (KEK)

Online data taking in KEK Hirofumi Fujii (KEK) Frontend electronics Tokio K. Ohsuka (KEK) Activity reports - High power VME system Takesi Murakami (KEK) - Data transfer to mainframe Eiji Inoue (KEK) - Mass storage with SCSI Hideo Kodama (KEK) - Mac-II interface for VME Susumu Inaba (KEK) Discussions

Lunch

Session 2. VME in the industry Chairman; Ryugo Hayano (Univ. Tokyo)

VME and parallel computers in US J. Fiddler (Wind River System) VME and software support Ryuji Mutou (Internic)

Coffee Break

Session 3. Examples of data acquisition with VME Chairman: Ichiro Arai (Univ. Tsukuba)

Data acquisition in FANCY spectrometer at PS Atsushi Manabe (U. Tsukuba) VME data acquisition in balloon experiment Tadahisa Tamura (U. Tokyo) Session 4. Intelligent trigger system for large experiments Chairman: Kouji Ueno (Rochester U.)

Trigger systen of TOPAZ detector Masanori Yamauchi (KEK) Trigger system with processor array Hiroshi Sakamoto (KEK) Upgrade of AMY CDC trigger Sergei Lusin(U. South Carolina)

-Party-

Friday, 10 March

Session 5. Future and data transmission Chairman: Hiroshi Ogata (RCNP, Osaka Univ.)

Bus system in future Masaharu Nomachi (EEK) Data transfer with TRANSPUTER link Yasushi Nagasaka (U. Tsukuba) Data transmission with optical fiber Susumu Inaba (KEK) Advanced technologies in optoelectronics Tsuyoshi Sugawa (Sumitomo )

-Coffee Break-

Session 6. TRON Chairman: Shinkichi Shibata (KEK)

TRON Project Ken Sakamura (U. Tokyo) ITRON Nobuhiko Nishio (U. Tokyo) Parallel computer with TRON Tadayuki Takahashi (U. Tokyo)

—Lunch-

Session 7. Parallel computer system Chairman: Hajime Yoshida (Fukui U.)

Operating system for parallel computer Kazuya Tago (U. Tokyo) QCD-PAX system Tomonori Shirakawa (U. Tsukuba) CAP system Hiroaki Ishihata (Fujitsu) ACP and ACP-H Yosbiji Yasu (KEK) Data acquisition using parallel processor Ryosuke Itoh (KEK) Application of IDRIS Eichiro Hagiwara (U. Tsukuba)

Coffee Break

Session 8. Data acquisition in SSC Chairman: Shigeki Mori (Univ. Tsukuba)

Report on SSC workshops Yoshihide Sakai (KEK) Data acquisition in SSC experiments Yasuo Arai (KEK)

Closing remarks Yoshiyuki Watase (KEK) TABLE of CONTENTS

1. Application of VME for experiments

The Data Acquisition System at KEK in Near Future Hirofumi FUJII (KEK) 1

VME as a Frontend Electronics System in HEP Tokio K Ohsuka (KEK) 7

High Power VME System Takeshi Murakami (KEK) 23

Mass Storage with SCSI (in Japanese) Hideo Kodama (KEK) 29

Macintosh-H / CAMAC / VME Susumu Inaba (KEK) 35

2. VME in the industry

Real-Time Market and VxWorks

Jerry Fiddler (Wind River Systems) 44

3. Examples of data acquisition system with VME

Data Taking System for the FANCY Atsushi Manabe (Univ. Tsukuba) 84 Balloon-borne Experiment Tadahisa Tamura (Univ. Tokyo) 92

4. Intelligent trigger system for large experiments

Development of Second Level Trigger System Based on a Microprocessor Array Hiroshi Sakamoto (KEK) 100

Upgrade of the AMY Trigger System Sergei Lusin (Univ. South Carolina) 115 5. Future bus and data transmission

Bus System and Data Acquisition Architecture Masaharu Nomachi (KEK) 125

Data Transfer with TRANSPUTER Link Yasusi Nagasaka (Univ. Tsukuba) 131

Fiber Optic Data Link Susumu Inaba (KEK) 137

Optoelectronics Technology for Communication System Tsuyoshi Sugawa (Sumitomo Electronics)... 149

6.TRON

ITKON: Industrial-The Realtime Nucleus

Nobuhiko Nishio (Univ. Tokyo) 155

7. Parallel computer system

Hardware and Software Architecture for General Purpose Parallel Processor Kazuya Tago (Univ. Tokyo) 163

Cellular Array Processor CAP Hiroaki Ishihata, Hiroyuki Sato Morio Ikesaka, Kouichi Murakami, and Mitsuo Ishii (Fujitsu Laboratories Ltd.) 167

Advanced Computer Program(ACP) system at KEK Yoshiji Yasu (KEK) 183

TOPAZ Data Pre-Processor Ryosuke Itoh (KEK) 201

- u - 8. Data acquisition in SSC

Report of Workshop on Triggering and DAQ for Experiment at SSC Yoshihide Sakai (KEK) 206

Data Acquisition System at the SSC Yasuo Arai (KEK) 217

-in- The data aquisition system at KEK in near future

Hirofumi Fujii (KEK Online group) March 9, 1989

1 Introduction

We have two main accelerators at KEK, e+e- colliding machine (TRISTAN) and proton-synchrotron (PS). Although the experiments are diffrent in each other, the concepts axe the same. The current situation of the main parts of the data acqusition system are as follows:

• Front-end electronics, which digitize the signals from the detectors. The sig­ nals are collected in some module units, and processed suitable for computer system. In TRISTAN experiments, we are using FASTBUS, TKO (TRIS­ TAN Online, KEK) and CAMAC modules. In PS experiments, we are using TKO and CAMAC systems.

• Front-end computer, which reads digitized signals from the front-end elec­ tronics and builds event data. Some kind of event selections may be done at this level. In TRISTAN experiments, the front-end computers are VAX- ll/780s, and in PS experiments these are micro-VAXes. • Data storage, which stores all the raw data and parameters necessary in of­ fline analysis. In TRISTAN experiments the storage is casette tape library (CTL) which is controlled by Fujitsu main frame, FACOM. For the data transfer between VAX and FACOM, we are using DACU (IBM-channel to VAX interface) and optically linked remote channel interface. In PS experi­ ments, the data are stored on 6250bpi magnetic tapes.

2 Current Problems

There are following problems for the current data aquisition system. In PS exper­ iments,

— 1 — • micorVAX is not fast enough for handling. • The capacity of the 6250bpi magnetic tape is too small.

And in TRISTAN experiments,

• VAX 11/780 is not fast enough for data processing. We need more CPU power to improve the signal/noise ratio.

• The data transfer speed between VAX and FACOM is not good enough (200KBytes/sec). Each expriment has a plan to add new detectors and we need much more speed if we want to add these signals.

Therefore, we are planning

• We will use tape library not only for TRISTAN but also in PS experiments. • The data transfer speed between front-end computer and tape library should be more than 400 Kbytes/sec, i.e., the speed of the 6250bpi tape on VAX. • We will use parallel processor to improve the ability of pre-analysis and event elections.

3 Data Transfer System

If we plan to separete the storage and front-end computer, we have to consider the data transfer system between them. We are planing to use VME-IBMchannel interface which is commercially available. This interface emulates a magnetic tape on main frame. The interface is also user controllable. We have already tested it between VME/68020 and FACOM, and observed that the transfer speed is 600700 KBytes/sec. To replace the current system, we need VME-VAX interface. These are also commercially available (VME-DR11 interface). The transfer capability of the interface is quite enough for this purpose. Therefore, it is not necessary to develop a new hardware. The main problem is the software. Because the hardware development is so fast, it is quite important to build a device-independent software interface. At present, we are developing user inter­ faces which can be easily replaced by the current program. The interface looks like DACU interface at user level.

— 2 — 4 Front-end processor

We are planning to use VME system to collect and process the data. The cost performance of the CPU/VME module is very high. However, it is rather difficult to write programs especially for interrupt handling. We have already developed a CAMAC interface which can handle many CAMAC instruction at once. The CAMAC driver itself has memories and program counters so that user can control the CAMAC action without interruping the VAX. In this method, we get enough speed to get data for PS experiments. The same mathod can be applied to the VME systems. The VME system can be controlled through some comunication line. The user can send a maessage or the list for data aquisition without the detail knowledge about the VME system. In this way, we can collect/monitor /analyse the data on any system.

5 Mass-storage

We think that the main mass storage is the tape library. However, recently, even a small experiment requires a GByte order storage. In this case, SCSI devices are good for this purpose. For example, at present time, the following SCSI interfaces are comercially available,

• Magnetic disks

• Optical disks • Magneto-optical disks • 8mm Video tapes • Digital audio tapes

There also available VME-SCSI interface. We have already tested these devices and a part of them are used for non-accelarator experiments.

6 Conclusion

We are planning to develop a data aquisition system using VME systems. The mass storage will be prepared as a tape library and the user can write the data on them from the front-end computer more than 500 Kbytes/sec. We are also developing a software interface. The user can communicate and control these data aquisition system without knowing the VME system itself.

— 3 — DETECTOR DA-SYSTEM STORAGE

EVENT SELECTION DATA FORMATTING DATA MONITORING DATA TRANSFER DETECTOR MONITOR

DETECTOR DA-SYSTEM STORAGE

FASUBUS VAX11/780 FACOM VHS TAPE 4 Mbytes/sec 200 Kbytes/sec

CAMAC/TKO /LtVAX MT/8mmVIDE0 1.2 Mbytes/sec 200-400 Kbytes/sec I

( HITAC ) \7\ A7\ 17\ DETECTOR VJXLCJ VM£i VJVLCi STORAGE

Ol

1' 1 ' '

VAX and/or

WORK STATION 171 VMIi

DETECTOR VME STORAGE

VME

1 '

VAX and/or

WORK STATION

•\n 171 171 DETECTOR VJULEi VMJEJ VJUfi STORAGE

' ' 1 1

VAX and/or

WORK STATION VME AS A FRONT-END ELECTRONICS SYSTEM IN HIGH ENERGY PHYSICS EXPERIMENTS

Tokio K. Ohska, KEK Electronics Group March. 9, 1989

I would like to talk about what I think of a VME system for the front-end elec­ tronics in High Energy Physics experiments. One may ask why use a VME system when we already have CAMAC, FASTBUS and so forth?

Table 1 shows a number of VME-related products commercially available in Japan as of April 1, 1988. Table 2 summarizes number of commercially available prod­ ucts related to CAMAC, FASTBUS, NIM and TKO available from major vendors for High Energy Physics field in Japan. We notice that the number of available VME products is overwhelmingly large when compared with any other system in the Table 2.

It is only a few years after the VME became a standard system, yet the VME system is already so much more popular than other systems. Think about the CAMAC system. They officially started almost 20 years ago and never became as popular as the VME system. The reason for the growth of the VME becomes obvious when one takes a look at the type of VME products listed in Table 3. The VME system was developed for the industrial applications and not for the scientific research for which the CAMAC, FASTBUS and others were developed. Major difference is the size of the market. Hence we can expect the VME prod­ ucts to be much less expensive and there will be many VME-related manufacturers.

The situation is somewhat similar to the personal computers. Computers never became as inexpensive as today until the personal computers became available. The magic word was the size of the market. For a huge market where manufactur­ ers can sell millions and millions of them, anything is possible. What we should keep in mind is that high energy physics field is a tiny market when compared with industrial markets.

— 7 — When we take a look at the Table 2, there is practically no products for data acquisition in VME format. However, we can use most of the products if the data from detectors are already compressed through front-end electronics so that the resultant data looks the same as any data in the industrial applications. From the Table 1, we can expect that there already are quite a few VME modules that can be used for data memory and for data processing in a rear-end of a data acquisition system. How about a possibility to use VME system for a front-end electronics? We must fist understand what is the requirement for a front-end electronics before going to see the adequacy of the VME system for it.

I shall list the requirements for front-end electronics and then see how well can a VME module do in each aspect.

(l)REAL ESTATE

A front-end electronics for a data acquisition system is coupled to detectors and analog signals are sent in from them. The input signals are likely to be small signals that require a tender loving care. There will be a lot of channels to take care as well. Figure 1 as well as Table 3 shows board sizes of various standard modules including VME.

The size of a VME board is astonishingly small compared with even a CAMAC which we are having a hard time to fit connectors to feed in a moderate number of input signals. Knowing that the number of channels per experiment is really growing to be large, using a VME module for a front-end is going to be an uphill battle just to feed in detector signals. For an example, DO experiment at Fermilab is using non- standard board size (see Figure 1) in an effort to use a VME system as their front-end system. They judged that the great varieties and the low cost of commercially available VME modules is worth such effort. However, non-standard is non-standard. It only tells you that they also think the VME board size is too small.

When we designed the RABBIT system for the CDF experiment in Fermilab, we did survey over many data acquisition modules that are commonly used. The re­ sult indicated that nominal board area required for a front-end circuit was approx­ imately 6 square inches or 40 square centimeters per channel*function including nominal protocol section regardless of basic board size. Here, the special unit, channel^function means that number of functions each circuit performs ( amplifi­ cation, charge-to-amplitude conversion, time-to-amplitude conversion, amplitude- to-digital conversion etc.) has to be counted for when the required real estate is to be estimated. If a board size is small, number of channels contained is less and the overhead of protocol area is large. On the contrary, if the board size is too large,

— 8 — per channel cost is reduced, but it will get handling problems (warped board after soldering, thermal expansions of a board, etc.) and a reliability problem. Here, the reliability problems means that a failure in one channel Villa a module in operation so that when a number of channels per board gets to be too large, the failure rate becomes too large to do a reasonable operation.

Table 4 compares ratio of the area required for protocol and the total board area in CAMAC, TKO and FASTBUS modules designed by a same circuit designer. It is only an example but seems to indicate the different characters of the three systems. Here, the CAMAC system contains 11.5% of protocol area which is quite reasonable for its size. Both TKO board required less than 4% of the total area and most of the board area can be used for anlog data processing. The overhead is quite small.

I should like to emphasize that what the TKO system can do is no less than that of CAMAC. In fact a TKO system should perform better than CAMAC. It is not only the board size, but also the way protocol is designed that economizes the occupied area for the protocol.

On the contrary, PASTBUS is the most powerful one in terms of what it can do about DIGITAL data. It is the rear-end function that a FASTBUS system is good at. Then the overhead for the protocol is large as indicated in Table 4.

The designers of the RABBIT system by then judged that 32 channels per board is a reasonable maximum when each circuit is simple, and 16 channels per board (or 32 channel*function's) is also a reasonable maximum when the required circuit per channel is moderately complex. Thus the RABBIT board size was determined to be close to 200 square inches, or 130 square centimeters. History has proven that the choice was correct. In fact the DO-quasi-VME, FASTBUS and TKO have similar board areas.

Obviously the VME is not the most desirable in this respect. Figure 2 shows a double-size standard VME board in VME specification. One can see the area for the connectors appears to be big. It tells you that if one wants to put as many channels as possible by utilizing all the area for connectors, there is not much left for electronics. If one subtracts area for protocol circuits, available analog circuit area could be so small, one could possibly fit a few channel*nmctions per board.

One may argue that the electronic components will get smaller in the future and one can also use custom-made monolithic ICs or hybrid-ICs. However, the situa­ tion would not improve as much as one would hope, since the cooling problem will start to show.

— 9 — Table 5 shows comparison between several data acquisition modules. There are modules that contain as much as 130 channel*functions (units) per module. It is achieved using monolithic ICs so that above arguments seems to hold. But when one take a look at the processing speed per module, their performance is not im­ pressive. In fact, one must also consider the speed and power ratio per board as a guide line. Then the number of units (channel*functions) per board is not so great when one requires performance as well. Only if one can put so much power and/or require slower speed, one can achieve high density packing.

The high power consumption will also give a limit to the number of units (chan- nel*functions) per slot. If one wants reasonable mean-time-before-failure (MTBF) to each unit, one can not put too much power unless one has a great cooling system.

It then would become too expensive or impossible too cool when too much power is dissipated. The spacing between slots are larger in VME. This only makes the number of channels per crate to be less in the VME system and do not help too much for cooling. It is because that there is an optimum spacing between mod­ ules that gives optimum cooling. If the space is much wider, the cooling air flow becomes a laminar flow so that the cooling air does not pick up as much heat as one expects.

It is interesting to note that there is a recognition of this board size problem of VME inside the VME community itself. They are proposing VXI system that has much larger board size (close to that of TKO and others). They even claim that VXI system is for the instrumentation, practically declaring by themselves that the VME is not suitable for instrumentation.

In Proceedings of an International Conference (1988) on VMEbus In Research, many authors states the same arguments as above that the VME is not for instru­ mentation.

(2) Since the input to the front-end electronics is an analog signal and we ex­ pect the output to be a digital data, there must be an analog-to-digital conversion circuit inside a front-end electronics. Table 6 lists common ADCs (integrated cir­ cuits) that axe likely to be used in front-end modules. Table 7 shows a VME bus that shows what kind of voltage is available to power the module. The TrmxiTniiTTi voltage the VME can give is plus and minus 12 volts. Most ADCs require plus and minus 15 volts to power them. None of them were designed to use 12 volts except 3 of them claim that they could function with plus and minus 12 volts while they were designed for plus and minus 15 volts. This is a serious fault of the VME system.

— 10 — The requirement of ADC power supplies for the plus and minus 15 volts instead of plus and minus 12 volts stems from an optimization between a good linearity and a power requirement in ADC design. The designers for ADCs would have gone for even higher voltages ( say plus and minus 24 volts ) if they do not have to worry about the power problems but wants highest possible linearity and/or resolution. One can see the same in Table 8 where commonly used modules that contains ADC are listed. Clearly, designers do not like to use plus and minus 12 volt lines it they could choose.

I suppose that the VME bus power was chosen because the originators did not expect the users of the system to require as high a resolution as users in the sci­ entific fields do. I even suspect that the VME was designed mainly for digital applications only.

The VXI system talks about the applications for data acquisition. Table 9 shows the VXI bus. However the power specification is only better than VME in having other power supply voltages. Table 10 shows the difference between the VME and the VXI. I do not consider the choices of 24 volts in VXI is good either. One may argue that one could use voltage regulator to generate the plus and minus 15 volts from the plus and minus 24 volts power supply on the VXI bus. I would think that the loss of 9 volts in each regulator is far from reasonable in power point of view when other modules are already hogging for power and there will be cooling problems even without for the ADC power supplies.

(3) The industrial application generally assumes that when the system is set up to work and is proven to work, they would not be modified for the duration of use. They do not expect the connections between modules to be changed afterwards. Thus the VME modules are secured by screws to a crate showing that they are to be left there. On the other hand, CAMAC, for example, has hand screws to secure the module in the crate so that replacing modules are easy to do. Situation is similar in FASTBUS, RABBIT, TKO and NIM. It tell you that there will be much more changes in configurations during an experiment period than in indus­ trial setups.

In all, I conclude that the VME system would be a good one for a rear-end system, but would not be a good candidate for a front-end electronics in physics experi­ ments. I might add that I would worry more about the VXI bus to become popular in this field of instrumentation since the VXI system is backed up by major sup­ pliers of instrumentation in the high energy physics field. (HP, Tektronics and others). I feel that the VXI is not an adequate system for front-end electronics, yet advertised to be one. It would be worse to see the VXI system to become a

— 11 — standard system for high energy physics instrumentation than the VME system to be one. This is because that I think the VME system to have a good potential for the rear-end system and that the VME system will never be good to be a front-end system. It is the VXI system that would do a mediocre job so that people might be misled to think that the VXI system can be used for the front-end system.

— 12 — CPU/MPO 176 MEMORY 141 I/O 302 PERIPHERAL 138 ADD-ON MEMORY UNIT 66 ,UNIVERSAL BOARD.. .315 CRATE,POWER SUPPLY 259 SYSTEM 81 SOFTWARE 163 VMEBUS SUPPORT CHIP 3 TOTAL 1647 Table 1 from VME products in Japan

MFR FASTBUS CAMAC NIM VME TKO LeCroy 8 51 15 Kinetic 16 112 (1) DrStruck 25 2 9 24 Phillips 4 9 29 AMSC 23 19 21 169 10 Rin-ei 3 6 6 7 Kaizu 46 60 5 Total 79 245 140 193 22 Table 2 Number of products available in Japan from major suppliers to KEK (Note:There are many other manufactures)

— 13 — SYSTEM HEIGHT DEPTH AREA RELATIVE SIZE (cm) (cm) (cm sq.) (CAMAC=1) VME-S 10.0 16.0 160.0 0.26 VME-W 23.335 16.0 373.4 0.61 VME-T(FermiDO) 36.67 28.0 1026.8 1.68 FASTBUS 36.67 40.0 1466.8 2.40 VXI 36.67 34.0 1246.8 2.04 TKO/RABBIT 42.863 31.75 1360.9 2.23 VERSABUS-W 36.83 23.5 865.3 1.42 30.48 17.145 522.6 0.86 Q-BUS-D 26.34 21.4 563.7 0.92 S100 BUS 25.4 13.78 350.0 0.57 STD BUS 11.379 16.51 188.1 0.31 CAMAC 20.0 30.5 610.0 (1) Table 3 Board size of various systems

CAMAC 4 channel FADC module 11.5% (70 sq.cm) TKO 8 channel FADC module 3.8% (52 sq.cm) TKO 32 channel TDC module 3.5% (48 sq.cm) FASTBUS SCAN ADC module 39.3% (576 sq.cm)

Table 4 Proportion of protocol area (%) and are for the protocol section (in bracket)

— 14 — s MODEL FUNCTION UNIT PWR IOD A/U P/U P*IOD/U

F 1875 64CH. 12BIT TDC 130U 50 177m 1.7 0.38 67000 F 1879 96CH. PLINE TDC 96U 72 5.2m 2.4 0.75 3900 F 1882F 96CH.CHARGE ADC 98U 38 0. 25m 2.3 0.38 97 c 2249A 12CH.CHARGE ADC 24U 7.5 0.06m 3.9 0.31 19 c 2249W 12CH.CHARGE ADC 24U 11 0.06m 3.9 0.44 26 C 4291B 32CH.TDC 32U 10 0.42m 3.0 0.32 134 C 4300B 16CH. ADC 1BU 31 4.8/i 5.9 0.32 9.6 C 4413 16CH. UPDT DISCR. 16U 34 18n 5.9 2. 1 0.038 C 4415A 16CH.NUPDT DISCR. 16U 34 22n 5.9 2.1 0.046 C 4504 4CH. FLASH ADC 8U 16 15n 12 2.0 0.030 T RPT04 1 32CH.TDC 34U 16 0.38m 7.4 0.48 184 C RPCOBO 8CH. HR TDC 9U 1.6 TOfi 11 1.6 110

UNIT:NUMBER OF UNIT FUNCTIONS PER BOARD PWR :TOTAL POWER CONSUMPTION PER BOARD IOD :INPUT-TO-OUTPUT DELAY A/U :BOARD AREA / NUMBER OF UNIT FUNTIONS (SQUARE INCHES) P/U :POWER CONSUMPTION PER UNIT FUNCTION P*IOD/U: P/U TIMES IOD (MICROSEC*WATTS)

Table 5 POWER REQUIREMENT FOR ADC ICs (CURRRENT IN mA)

MODEL + 15V + 5V -5.2V -15V ± 12V?

ADC10HT 15 16 30 NO ADC71 15 10 18 NO ADC72 15 10 18 NO ADC76 14 10 17 NO ADC80 5 11 21 91 YES ADC82 20 80 20 NO ADC84 20 10 25 @2 YES ADC574 11 7 21 @2 YES AD368AD/SD 15 20 30 @3 NO AD376 30 55 23 NO AD578 3 100 22 S3 NO AD7572 12 135 94 NO AD7582 7.5 0. 1 7.5 84 NO AD ADC84 25 100 30 15 NO ADC1130 40 250 60 94 NO AK! 140 25 150 25 @6 NO HAS1201 55 195 35 65 @4 NO CAV0920 198 86 1800 260 84 NO CAV1040 375 25 2500 200 94 NO CAV1210 185 120 2100 240 84 NO CAV1220 174 174 2780 157 @4 NO ICL7109 0.7 0.7 @4 -

14.5V~15.5V FOR 15V, -14.5V 15.5V FOR -15V UNLESS OTHERWISE NOTED.

SI: 11.5V~16.5V FOR 15V, -11. 5V 16. 5V FOR -15V @2:11.4V~16.5V FOR 15V, -11.4V 16.5V FOR -15V S3: 13. 5V— 16.5V FOR 15V, -13. 5V~ - 16. 5V FOR -15V 94:14.25V-15.75V FOR 15V, -14.25V~15.75V FOR -15V 95:14V~16V FOR 15V, -14V 16V FOR -15V 96: 14.55V~ 15.45V FOR 15V, -14. 55V~-i5. 45V FOR -15V (FROM DATABOOKS OF BURR-BROWN, ANALOG DEVICES, MAXIM PRODUCTS) 12V?: MEANS IF ±12V CAN BE USED INSTEAD OF ± 15V. POWER REQUIREMENT FOR ADC ICs (CURRRENT IN mA)

Table 6

— 16 — 7.6 VMEbus BACKPLANE CONNECTORS AND BOARD CONNECTORS

7.6.1 Pin Assignments For J1/P1 Connectors 7.6.2 Pin Assignments For The J2/P2 Connector Table 7-1 provides signal names for the J1/P1 connector pins. (The connee consists of three rows of pins labeled rows a, b, and c.) Table 7-2 provides signal names for the J2/P2 connector pins. (The connector consists of three rows of pins labeled rows a, b, and c.)

Table 7-1. J1/P1 Pin Assignments Table 7-2. J2/P2 pin assignments ROWa ROWb ROWc PIN SIGNAL SIGNAL SIGNAL ROWa ROWb ROWc NUMBER MNEMONIC MNEMONIC MNEMONIC PIN SIGNAL SIGNAL SIGNAL NUMBER MNEMONIC MNEMONIC MNEMONIC

1 D00 BBSY* D08 2 D01 BCLR* D09 1 User Defined +5V User Defined 3 D02 ACFAIL* D10 2 User Defined GND User Defined •71 4 D03 BGOIN* D11 User Defined RESERVED User Defined 5 D04 BGOOUT* D12 4 User Defined A24 User Defined 6 D05 BG1IN* D13 5 User Defined A25 User Defined 7 D06 BG10UT* D14 6 User Defined A26 User Defined 8 D07 BG2IN* D15 7 User Defined A27 User Defined 9 GND BG20UT* GND 8 User Defined A28 User Defined 10 SYSCLK G3IN* SYSFAIL* 9 User Defined A29 User Defined 11 GND BG30UT* BERR* 10 User Defined A30 User Defined 12 DS1* BRO* SYSRESET* 11 User Defined A31 User Defined 13 DSO* BR1* LWORD* 12 User Defined GND User Defined 14 WRITE* BR2* AM5 13 User Defined +5V User Defined 15 GND BR3* A23 14 User Defined D16 User Defined 16 DTACK' AMO A22 15 User Defined D17 User Defined 17 GND AM1 A21 16 User Defined D1B User Defined 18 AS* AM2 A20 17 User Defined D19 User Defined 19 GND AM3 A19 18 User Defined D20 User Defined 20 IACK* GND A18 19 User Defined D21 User Defined 21 IACKIN* SERCLK(1) A17 20 User Defined D22 User Defined 22 IACKOUT* SERDAT*(1) A16 21 User Defined D23 User Defined 23 AM4 GND A15 22 User Defined GND User Defined 24 A07 IRQ7' A14 23 User Defined D24 User Defined 25 A06 IRQ6' A13 24 User Defined D25 User Defined 26 A05 IRQ5' A12 25 User Defined D26 User Defined 27 A04 IRQ4' A11 26 User Defined D27 User Defined 28 A03 IRQ3* A10 27 User Defined D28 User Defined 29 A02 IRQ2' A09 28 User Defined D29 User Defined 30 A01 IRQ1* A08 29 User Defined D30 User Defined 31 -12V +5VSTDBY +12V 30 User Defined D31 User Defined 32 +5V +5V +5V 31 User Defined GND User Defined 32 User Defined +5V User Defined Note: (1): See Appendix C for further information on the use of these signals. Table 8

POWER REQUIREMENTS FOR COMMERCIALLY AVAILABLE MODULES

MODULE 24 15 12 6 5 -2 -5 -6 -12 -15 -24

F 1879 PIPELINE TDC * * * * * F 1882 CHARGE ADC * * * * * C 2228A TDC * * C 2229 TDC * * C 2249A/W CHARGE ADC * * C 2259B PK SENC'G ADC * * C 2262 WVFORM DGTZER * * * C 4291B 32CH TDC t C 4300B 16INPUT ADC * C 3512 BFFR'D SPEC. ADC * C 4413 UPDT'G DSCR' R * * C 4415 NONUPDT'G DSCR'R * * C 4504 FLASH ADC * * N 365AL MJRTY LOGIC UNIT * * N 428F LINEAR FANIN/OUT * * * N 429A LOGIC FANIN/OUT * * N 465 4FLD LOGOC UNIT * * * N 612AM PMT AMPLF" R * * * N 622 LOGIC UNIT * * * N 6'.!3B OCTAL DSCR" R * * * N 82! QUAD DSCR" R * * * N 4608 OCTAL DSCR'R * • * N 688AL LEVEL ADAPTER * * * N 4616 ECL/NIM/ECL CVTR * T RPT041 32CH DRIFT TDC * +8 * * -8 * C RPC060 8CH HR TDC * * * * C 7106 16CH DSCR'T * * * * C 7145 LINEAR GATE/MUX * * * * C 7176 PMT PREAMP"R * * C 7194 QUAD GATE/DELAY * * * * F 10C2 32CH BUFFR*D ADC 1 * * * *

— 18 — PIN ROWa ROWb ROWc PIN SIGNAL SIGNAL SIGNAL NUMBER NUMBER MNEMONIC MNEMONIC MNEMONIC 1 ECLTRG2 +24V + 12V 1 2 GND -24V -12V 2 3 ECLTRG3 GND RSV4 3 4 -2V RSV5 +5V 4 5 ECLTRG4 -5.2V RSV6 5 6 GND RSV7 GND 6 7 ECLTRG5 +5V -5.2V 7 8 -2V GND GND 8 9 STARY12+ +5V STARX01+ 9 10 STARY12- STARY01- STARX01- 10 U STARX12+ STARX12- STARY01+ 11 2 STARYU+ GND STARX02+ 12 i3 STARY11- STARY02- STARX02- 13 14 STARX11+ STARX11- STARY02+ 14 15 STARY10+ +5V STARX03+ 15 16 STARY10- STARY03- STARX03- 16 17 STARX10+ STARX10- STARY03+ 17 18 STARY09+ -2V STARX04+ 18 19 STARY09- STARY04- STARX04- 19 20 STARX09+ STARX09- STARY04+ 20 21 STARY08+ GND STARX05+ 21 22 STARY08- STARY05- STARX05- 22 23 STARX08+ STARX08- STARY05+ 23 24 STARY07+ +5V STARX06+ 24 25 STARY07- STARY06- STARX06- 25 26 STARX07+ STARX07- STARY06+ 26 27 GND GND GND 27 28 STARX+ -5.2V STARY+ 28 29 STARX- GND STARY- 29 30 GND -52V -52V 30 -2V SYNC100+ 31 31 CLK100+ 32 32 CLK100- GND SYNC100- 1

Table 9 J3 connector/Pin assignment of VXI bus

- 19 — Table 10

VME POWER SUPPLY

NUMBER OF PINS ON BACKPLANE CONNECTORS (PER SLOT)

VME V X 1

PI P2 P1+P2 PI P2 P3 P1+P2+P3 PWR MAX

GND 8 4 12 8 18 14 40 + 5V 4 3 7 4 4 5 13 1365 + 12V 1 1 1 1 2 504 -12V 1 1 1 1 2 504 -2V 2 4 6 252 -5. 2V 5 5 10 1092 + 24V. 1 1 2 1008 -24V 1 1 1 2 1008

MAXIMUM SUPPLIABLE POWER 5733

PWR MAX : SUPPLIABLE? POWER ASSUMING 21 SLOTS, 1 AMP/PIN.

— 20 — Figure 1 Board Sizes Compared

IZ_ 400MM ^ TKO/RflBBIT

i/ 305MM .\

/ 280MM \

1B0MM \ in 2: r- f OJ ~/1 CO VME-W 06 CD cu CO •st- to CRMRC I i LU u

IS Ln Q VME-S Z) I IO m o H ro en h- cc i s (/) H z: DC s LU cr X cc LU > u \I ^L LJl

— 21 — Figure 2 Double-size VME board

MECHANICAL SPECIFICATIONS

0.012)

2.7 ±0.1 DIAMETER 25(0.09») (0.106*0.004)

NOTES: 1. All dimensions are shown in millimeters. Inch dimensions are shown in parentheses. 2. These grids are provided to help the board designer to align components with the front panel grid. 3. RULE 7.33: Boards MUST be 1.6 ±0.2 mm (0.063 ±0.008 inch) thick in the guide area.

— 22 - HIGH POWER VME SYSTEM KEK T.MURAKAMI March 9,1989

1 Introduction

More and more experiments are requiring ever higher resolving power and/or higher processing speed in many data acquisition systems for high energy physics experiments. Number of channels they have to deal with are also getting very large. As a consequence, required power for data acquisition electronics have in­ creased dramatically. Examples of such modules are listed in Table 1. While the power consumption of CAMAC and VME modules listed in Table 1 is less than 40 watts per module and that of TKO and FASTBUS is much more, ( 83 watts for FASTBUS 32ch ADC), the former group consumes twice as much power per unit surface area as the latter group. This is an expected consequence of having a smaller board area of VME and CAMAC system. We all seem to expect each mod­ ule to do a certain minimum function. The area occupied by components amounts to 60% of total board area in TKO Super Controller Head module (SCH) and to 83% in VME Memory Partner module (MP). Such high packing density results high power density and cooling of such modules becomes an uneasy task. Despite of all these, we expect the popularity of such electronics systems with very high packing density. We have been studying various aspect of high power systems so that we can cope with them when we will use them. The studies include cooling, routing power, and resistance against fire. Some of the results are shown below.

2 Forced Air Cooling

Figure 1 shows an ambient temperature inside a crate as a function of power con­ sumption per module when only a natural convection cooling is used. The mea­ surement was done on a TKO system filled with a special module (HEB ACHLHigh Intensity Basic Cooling-Heating Instrument) which consumes specified power (ad­ justable). The ambient temperature exceeds 70 degrees centigrade when power consumption per module exceeds 14 watts. Knowing that most components must be operated under 70 degrees centigrade ambient, it is clear that the natural

— 23 — convection cooling is totally inadequate for high power applications. We have de­ veloped a forced air cooling unit for TKO systems. Figure 2 shows temperature change inside a crate. Within 25 minutes after power is turned on, the ambient temperature reached 100 degrees centigrade if the cooling unit is turned off. It could go much higher, but the measurement was stopped since card guide and modules would not survive. When the cooling unit is turned on, the temperature came down to 40.degrees centigrade within 5 minutes, showing the effectiveness of the cooling unit. Details of this work is published elsewhere.].) One thing we learned was that it was possible to have very high air flow rate for cooling, but it is not easy to get an uniform air flow, even though we succeeded. This fact indicates that high power system requires better temperature compensation than other systems since one can not hope to achieve uniform ambient temperature distribution over the board surface. It might even prohibit high resolution circuit to function to its full capability.

3 High Power Backplane

When modules consume large power, such power has to be delivered to them. We have been studying a backplane that can supply much more current than a con­ ventional one. Supply current is limited due to the temperature rise at the power connector that couples the backplane to the power supply. We have developed a special power connector that can supply up to 50 ampares each. The temperature rise at the connecting point to the backplane was measured. It is better, in a purist point of view, if we can connect a power line from a power supply to inner layer of a backplane directly. However, it may not be as practical since the backplane inner layer must be exposed to outside. One must connect the power line carefully so that one does not destroy the backplane. To evaluate the performance of the power connector, we compared the power connector to the direct coupling. We measured the temperature rise at the connecting point. Figure 3 shows the result. The backplane used here was developed by us in cooperation with AICA Denshi Corp.. The backplane has 0.5mm thick copper sheets sandwiched between the layers of a printed circuit board. While normal multilayer printed circuit boards have typically 0.075mm thick ( special cases of up to 0.23mm thick were done be­ fore) copper layers, this one has 0.5mm thick copper layers inside. This backplane should be able to handle much more current than a regular one. The result shows that the power connector can handle the required current even though the temper­ ature rise was higher than when the cable was directly connected to the inner layer of the backplane. It is interesting to show the result from an additional measure­ ment when this special board was compared with a standard TKO backplane and a typical VME backplane. Their size were not exactly matched so that the result is not conclusive, but the difference in thickness of inner layer copper did show

— 24 — up. One should note that the temperature rise with the VME backplane was more than 25 degrees centigrade above ambient when 70 ampares were sent through the connector. This indicates the possible difficulty when a regular VME backplane was used with high power modules. It also indicates the superior performance of the new backplane.

4 Mechanical Structure That Survives Higher Temperature

We tested some of the card guides used in VME and other crates. They become soft when the temperature reached around 80 degrees centigrade since most of them are made of Nylon. A VME card guide exposed to a 100 degrees centigrade air flow under no-stress condition got a permanent deformation of 4 mm so that it could not be used anymore. As indicated in (2), it is quite possible that the card guides would be exposed to such temperature in a high power crates when some malfunction states occurred. We have been developing a metal card guide that would not lose strength to hold modules in place when such high temperature state occurred. The card guide is made of cast aluminum. The difficulty is to keep the precision of the spacing between the slots since the temperature when the casting is done is above 700 degrees centigrade. We can produce usable cast aluminum card guide now. In all, we can report that we now have most of the technology to cope with high power systems. Never-the-less the requirement for high power density is not a easy thing to meet.

-Reference- 1)T.MURAKAMI et al.,IEEE Trans, on Nucl. Sci.36(1989)783 2)T.K.OHSKA et al.,IEEE Trans, on Nucl. Sci.33(1986)98. 3)VXI Specification (Revision: 1.2) June 2x,1988

— 5>K — Table 1 < H i g h Power Modules>

MODULE POWER(W/Module) area (mW/cm}

C 4413 UPDT'G DSCR'R 33.84 70.9 C 4415 NONUPDT'G DSCR'R 33.78 70.8 C 4300B 16INPUT ADC 31.44 65.9 C 7106 16CH DSCR'T 29.52 61.9 C 2262 WVFORH DGTZER 25.29 53.0 C 7145 LINEAR GATE/MUX 24.0 50.3 C 3512 BFFR'D SPEC.ADC 21.96 46.0 C 2229 TDC 11.0 23.0

T SUPER CH 49/33 36.3 T 16CH FLASH ADC 41.6 30.6 T PIPELINE TDC (50) (36.7) T 16CH HI-RES TDC 28.45 20.9 T 32CH DRIFT CH TDC 16.17 11.9

F 10C2 32CH BUFFR'D ADC 83.36 56.4 F 1879 PIPELINE TDC 71.77 48.5

V MEMORY PARTNER 36.0 96.5

-CAMAC Module 4413 UPDATING Discriminator Lecroy power -24V 30mA -6V 4.1A 6V 1.3A 24V 40mA total 34w -TKO Module Super Controller Head(SCH) Mitui power -5.4V 180mA 5.2V 9.3A total 49w IC etc possession area about 6OX -VME Module Memory Partner(MP) Mitui power 5V 7.2A total 36w IC etc possession area about 83% — 26 — C DEG. C 3 UJ100 en

10 20 30 POWER CONSUMPTION PER MODULE CWOTTD

Pig.l

0 30 60 90 120 ELAPSED TIME CMINUTES 3

Fig.2

— 27 — Prototype Backplane (AICA)

I ) ) p f't )i~")i / "-O^CuSSiH m + 38 ft m-»20z 3.18mm \ fii^f) / ~^~f ^gp'-Insulator I ' ^-^- J^ ^v -^ '-y^Q.Bmm Thic k Copper sheet

4 1 a y e r s

A ; Power bridge B ; Copper sheet Measure point© Measure pointC

1* *» \ Measure point

Current—Temperature

•>A

• • ' • i i i I i i i i I i i i—•—L • l 50 60 70 80 90 100

Load Current (A)

F i g . 3

— 28 — SCSI II J:S MASS STRAGE

J!53s1tt [1] ttCfeti

-fX9, Bimafcr^d-r-^, r-f J?*W-r^f-^ (DAT) ^f&y, ^ft&SfS »fflr>'WX©#< ItSC S I Clp|fil,T^*t. SxiSf«DHfJ'i'^a Kn viziSHIftfflr-^iRJ&ctiSWiy DECttSSPDP l l

#fl£|;i:PDP;a>e.VAX;*x-S'3>Cg£&*.&*v* L./ta», fB®S««ft*&i: Ltl

1 i , ^T -^"-??^. ?K^7 -5' im(i800BPIi6ie,1600BPI> $ f, ll6250BPI^£fB®ffi&£_hlf, &**> l*f?jffi<*# < fc-DTtt** Lfc*«, **l-efe6250BPI 24007<-hffl7L-^,t?»!ll60M bytes T? I-. -£, x-iilRfefflcOtt^Stfe^ynv, VMEliiftfflS tiS J: -5 C& 'J . #»ftt5«l«IICfty*t.

T'WxSfXhtSSi:, VflBVAX, PC98, VME±CSCSl3>hD-5S:

[2] SCSI SCSItt, Small Computer Interface 0B&trSASI(Shugart Associates System In terface)£^ — X (H986^HANSI(American National Standard Institute)Mt&£ cStl

DATA LINE 8 + Parity CONTROL LINE 9 (REQ, ACR, MSG, SEL, C/D, I/O, BSY, ATN, RST) IDENTIFIER 0-7 (SCSlA*JU;3V|.n-7-&8&£ T-glgJWtg) CONNECTER 50 tf> *-"7;i/fi S±6m(*¥ffiS) **25>(¥ftH) g^ag l.5Mbytes/sec(*^M=E- K) 4.0Mbytes/sec(|Hl$§:E- K) ^"Pf. Sfcn'VVKCli Test unit ready, Inquiry, Request sense. Mode select, M ode sense, Rewind, Read, Write, Write file mark, Space, Load/Unload, Format unit Rezero unit, Start/Stop unit, Read capacity ,...^)5S£> »J Jt. SCSI^X±tII*#tt8^5g®3 > Nn-5-#gS?T?t *f #, *0*-C-*X hnvt? a.

7£®@#©#^(ID#^)£#^A;x.&J8fflffi5l6gE»7#S:%ig< fc»J*t. ¥ T-9ZMfr&t}1Si-k$:Mlz®ft*®%.lzfr-zfrZi:. &fl)»i:ft'J4t.

— 29 — 1) AxjiJ-SiM. •i-i'x-9-lt$2

2) 7-^H/->3 >«-»oT/<*««:tHtLfc-f-S'x-y:-l4, SEL{|-5§-£ •£ y

L*t. *-yy hf±SEL«-t&ot@5'ffliD#-g-c*fjs:-rsff-*5'f,>*«j|-e, 1/0®%&&V&Z Z t Srlfeffi Lfc^gB*<3J?$tl,'rv>S ; a £BWLBSY«-£fc •by hL*t. W-^x-*-ttBSY^£$ffiLfc&SELft-i|£y-try hL*t„ «±T?-fel/*S/3 V7x-X&»;by> PMlAfi-Vv h±*T?ti^L.. -I'-S'X

3) JS«*nfc*-yy Ktavv K7i-Xi:»flL, nvv h^Dy?4-f=->i

>*\ h5??, Hz?*^&&j61-S©t;fc*-'!fy !•••£•;&.y, -f -S/x-* -l±B8 *a u *-&/^«

4) $->fv Htavi/ K7*ny ?-?*!££ ftfc^-y h^-tcoxf X!?*><3ffi^$n fc^ny^gcffl^-^fc^diU -f-i/x-^-liteSLS-r. ^-ySS-g-ti o;&*e>7f&y:-yy btt8^©3.-5/ h£g«T?t$*. 8£r>rscsi:£#T'{±64& (83 V hn-5-*83---y F)«)T/WXSI!lS-C§St. 5) Tt-*<£i£#*l7Lfce>y:->ry HiX?-!iiX7i-XSiffLt3?v KfflU ffSIKGOOD, Check Condition, Busyl£)£ -f -S/x-* - ll$£ L 3:1". 6) tSi:^-y? Mi "Command Complete" ^ y -fe-^SSHf LT-JI 4)03^-9, 5)ft£-f. til$8<0 «^coaS(3-7> h\ f^- y\ Xf-JX, ^y-fe-S?)(*x C/D«-g-fcMSG«-tr-E8UL*t-AJ. *CDil®ffil±*--!r y l-Wd&y **. &,±l&»*,ffi&teM-C'ti>i. ifflS^K^-f *3*0 K U-feU^K

*i-. *fcscsiffl©yn h3;i/Lsiti#affl%ffl/6Sf y s-r. [3] S C S I :K- K 1) PC9801 H #•?-<•£ nnvt? a - *-&$&£> PSB-9310&&J8 LS Lfc. SCSI^D hrui/LSIi: L •CNCRSSaOfc&c-C^*?-*1* SCSI/W K y V^a^Atf^- K±(OR0MtlS«tl •c^r, * y T0 g #&5&igi-r a iKgti&y **/,,, -eti-ftiffl^x-x^nfffsn iiPC98©us?xii-i:Bff^ffl3- Kfc-ty N try 7 hay a***tf r^y *f. 2) VME h-H-7i-4'ttiaa)TVME-322&MffiL* Lfc„ SCSI7"D S=i;i/LSIfc LTWD33C93S Sffll/ti^t. 7*n ha;i/LSI®Ui?x*-i5VME^x±tiait.,rv^0-C'^n'fc,

— 30 — 3) 7-tJDVAX VMS (VAX® OS) *»£ SCSI* aw Ku^fav (- n -)l>ir S tztbOiX- h'fc Lt, r K^>X l>2/XrAXtfccOSSVAll-C£^fflL-S Lfc. SCSI^n hruHSIi: LTNCR53 86&ffifflUr^Sf*Ml^XJ:Cl,A.S©(±8®©U^X^ (CSR, Command, Status, DMA address,DMA byte counter^) T?S> y , .TCOU$?x£ —£j!3(--fes'l-t'6-i:Hcfc^ TT'n hn;uLSI&fl«itSCSIA7,S3 v N p-;i< Life t. ±IB3»|®nVAXfflC{±VMS©S{Jp©K5-f A-V7 hMJtSx? aU-J-a ViK-FMlfflt StfiD, #«rSCSIffiavv Kl"*;i/T?n> hn-;u£t5(&g#&tt*U;£*ffl;fj#MiJti T?1-» fcifcfcSCSIli, T^WXfflgilSKtfcfe&ofctfD-eta*. S'-'ir i'S' + .n^W Xi:5V^A7^-feXfr-'W7.fei±3VV FfflWRtJinSt U HC5VifAr«-fe

SCSIt-n vv KU^^Tfnv |-n-;i/T?£ftlf, T'W xuiSSv^tt^a ^5 Afflffl*

K«fcS5t^fct#- K±co^n^7A(R0M)A*®)RLrv^S-r.

CJ-nf-S'aV^X-CH;, SCSI&SlP5lffiL.-C^S*)fflA«#<«:orv\S-r L, SCSHpSftffiCAMAC? U-h3>ha-5-feHjr6, 3h;i>&SCSUigStS »£tf 41" * 1-Jf *. 5 4 ffl t ©toil * 1". [4] MAS S STRAGE 3Sl«ja^«fC7L-^Clfc*>y#*fcS*>na7f-'Wxi:lB«x-^«QS*(OH:tt*or **fc*(OJ:dcfty s-f.

%?4 7.>} (5-fVf) 400 Mbytes/S ftiS^-f X t> (54V?) 297 Mbytes/ffi 8mmlfrtf-^ 2.3 Gbytes DAT (f <*****-?* **?-7") 1.2 Gbytes IBM3480*tnhf-7" 200 Mbytes JK^7L-^(6250bpi) 170 Mbytes £ © ft IBM3480*t»l>?-7'* l» < 5*/W X, StflK^-f XiMIO^T, f7l-LfcM*i 0-U @-2i:SLSt. ffllJStt'MJnVAX-efTV^ DECttjEft«#tt, i§ al/-S/a >#-

1) x-cX^f^WX

f H-lttr^?»«)7 /Wxi:-3^T< teSS7ny?1J--fX4:(g&xfcr-F*/\ — K-r-f X2, OPTdiskliJfex-f X*, MODISKttjfefflEa^-f X*T?^1*tlt»SCSI-et. RD54, RA70l±DEC©5-f >f-/\— Kr^ X i?-et. 7f-^<07i'-feX*fettS'-'7->5' + ^-C-t. ft-r-fX?, %&.%?< 7. 9 \Z

— 31 — 5*;Wx* HEft ¥*&#*> 8# HO 7 ?-fe X *-1" h RD54 (159Mbyt) 3600 rpm 8.3 msec 38.3 msec RD70 (280Mbyt) 4000 7.5 27.0 SCSIHD (673Mbyt) 3600 8.3 16 %T 4Z.9 925 33 213 %&%*<*.>? 1800 16.7 66.7

J8TJ& **•*«. f-!/«)ft*)lll:ttfit5i:ltgi!:^7.f73XHOSt1 Smml^x* x-^^DATlI^tJ L*t. *fc, ftIfflA-Fr^i'©3XhA7*-v>Xffl Sl±{i»iVS L<, JfeKft^-rx^fcA- KT. 2) f-^x'WX Butts'—yjR©5*/WX to ^T, (BSI^ns; *1f-f XtfeSIXtf- K®BSffi&SQ SLfct>cr>-ei-„ GCR_800K4:GCR_480Kl£|I|i;6250bpi©X t- U - 5 >^* -f :/©&fCx- ^1?1-*«, ^{ti#HXtr- KfeSEHf a«ffi««*y dfflfSSSffia*, 800Kbytes/sec;6> 480Kbytes/seci6>fflaiOT--r= URirtt, I- 5 ^l/CO^gfesf It U, 480Kbytes/secT? ffiffl L-Cl^a-T. TK50,TK70t£DEC©rt- h U ? VT--?-Z. 8«uaVCRfcfcSCSI, GCR£DATJ± /<-fjiH>#-7i-X1!t. DAT©Xtf-K##5Kbyte&l::Si*>T^5©l£. 17 U - A (READ.WRITE©ft/h*ffcTf«aW C l±5760bytes)#5Kbytes£ $-3 T ^ 3 fee* £ t S*>*l$-r„ 8mn fcf 5* *?-:/*. DAT =fe, X If- KT?tt3SSffiffl LT^S6250BPloa£^'r - ^ (GCR) J: »J t,, «V^*Ji^c:t««*>*y*t, &HDAT©«-&t±¥£&Ti::fto-C^ St. ^-t/ffla-g^fflgStiKSi^n **•»-•< x"©£W££fr£f\ TStt^ny* U" -f X fc £S©08$ £ SOS L fc * ffl T? t. •fuyWJX r - zf & & (Mbytes) Kbytes/block 6250BPI%gCf-7' 8nnfcT*f-7' DAT 1 57 876 238 2 86 1573 474 3 104 2135 714 4 117 2140 952 5 126 2135 1189 10 146 2140 1178 60 168 2137 1179 lKbytes/blockt?f;-*£»< t &»«©»#©- L*>#tt£•&*,,, -| Hi: y-*c 7*-# £#< fc&. -@j:*<:7oy0!&«/J'>3^fcIRG(Inter Record Gap)#Jt*.T, H

t S ± *> ft* -f ^yff-J*'(SlS*i. *o7*n y ? -9--f x*#*£ ^ t Sffli 5 ft

— 32 — fifflgii:ft^TL$^St. *< 9-< $>£"i::J:-3Tl£578Mbyte£l.;6>#W-&vv0!|fc;& USLfc. lKbytes/blockT?*., *o < y £ Ltz9 -i % V^T?»< £ , 2Gbytes£ •efcfcJ- S-rtfl&JlW-efci;* y **/,. DATcDS^fc^ READ,WRITEffl«/J^ffl*«5Kbytet;)&;o T^SfcftT?, lKbyte/blockt?#l^T

£frt£Jt-3Ttri$?-. *-fey h&-fesr hLTREAD¥trftS4T?j:, *j30g>#fc£;ft£ -f U VAX/VMS®, INITIALIZED V > Fi:l70g>, MOUNTnv > KC65#, COPY3 V > Kfcffi-a T, i fT«tf<0 7 7 -f ;u&3 tf--rsffltzi60# **>*>»; st. &*, xtr- K-ettSMitfT**?-^i:*u*-r««, ttv^ts-ett* DAT, M0l)NTi;l3g'i:8Bmtr7J2l-CJ*'«5i:, £v **•¥•< ftoTv^f. £&ft-&9 hfflBtU fflL*>. 8Bmtrxst-«:y 9-f > K tt IB T? & (^ i: T?t S-&/,*«, DAT(iiffflttffi*» 6-e*.*y ffi-ttS-r. DAT©iH|)l^{i^ffl2!feei;$)y*1-. 3i£. DATA/DATfcDDS(Dig ital Data Strage)® 2 »£®7 *- V ? S#ANSlCJS2g£ tl, fS#£-#1-5Jg|&i::fc itl^ti!1. ^BxX h I, fcDATli*© if %«>£*> it?*©•??to *®j&8mm\£5^*0:, -tt«Dafi«ffi-el-*»e>2»tt®jgT?{i^*l-et. fcjeLSHfflfflVAX/VMSxSaU-S/a :•#- Kffl*KttE»ttffl*^fceD#;i&y4l.fcU 8mmtf^^-««i#fflA-S?9 Vfc

[5] fifciJC fetlVME, PC98, 7-OnVAXi;8mmlf7i*x-^'&g5iKLfcS'X5LAa

S.&&*>*%b3if$-et. MaUBFO^-^&Jffi-feV*—C», 8mm fcTf1*^-:/«:&

1) ANSI X3.131-1986 Snail Computer System Interface (SCSI) 2) rSJpAffl*I/F=SCSI(0«F3fiJ , "-f V*-7x-x"(JULY 1987) 3) r?4 iff frt-^J *(&&&*>!) AT£-C)i , T-UVPaX&gm, nn^tfc 4) r/N- K •

— 33 — 1000

800

600

400

200

20 40 BLOCK SIZE (Kbyte/block)

-2 f-^r/Wx T 1 1- T 1 1 1 1 r GCR_800K Write 600

11111111 '' * GCR_800K Read

GCR_480K Write _J

20 40 BLOCK SIZE (Kbyte/block)

— 34 - MACINTOSH 11/ CAMAC/ VMEbus

KEK Susumu Inaba Abstract

This paper describes the CAMAC and the VMEbus data acquisition system with the 68020-based Apple Mackintosh II personal computer. The Machintosh II interface was newly developed together with the CAMAC and the VMEbus controllers. This interface can be provided with direct access to the CAMAC dataway and the VMEbus for data acquisition, experiment monitoring and equipment testing.

1. General International standards such as CAMAC, FASTBUS and VMEbus have been in general use for on-line data acquisition at scientific research laboratories such as KEK for a long time. On the other hand, 32-bit personal computers like the 68020-based Machintosh n (MAC II) or the 80386-based PC-9801 have recently spread far and wide. Software for these systems can be made available at very attractive cost per system. If these popular 32-bit personal computers are utilized for on-line data acquisition with CAMAC and VMEbus systems, it would be very powerful instrumentation for high energy physics experiments.

2. CES and KSC CAMAC/VMEbus controllers 2-1. CES CAMAC/VMEbus system

CES means Creative Electronic Systems in Switzerland. CES has two configurations from the MAC II interface to the CAMAC crate controller. One is through the CBD8210 CAMAC Branch Highway Driver, the other one is through the CPBI8216 VMEbus interface for the KSC3922 CAMAC crate controller. In both configurations there is no direct access from the MAC II interface to the CAMAC crate controller. The block diagram shown in Figure 1 and Figure 2 consists mainly of the CES interface and controller modules. Connection between the MAC7212 MAC E interface and the VBR8212 VMEbus controller is by the CES Vertical bus named the VMV. The VMV is a private data bus designed by CES. The VBR8212 is a VMEbus crate controller. This controller receives the signals coming from the VMEbus and transforms them into VMEbus signals in the crate where they reside.

— 35 - The VBE8213 is a VMEbus slave module. This module converts transfers destined for other VMEbus crates from the VMEbus to the VMV protocol. The CBD8210 in Figure 1 is connected to a standardized crate controller CCA-2 through a CAMAC Branch Highway cable. The CBD8210 is a VMEbus interface for the CAMAC Branch Highway .The CPBI8216 in Figure 2 is a VMEbus interface for the KSC3922 CAMAC crate controller. The CPBI8216 is connected to die KSC3922 on the RS- 485 parallel bus designed by KSC.

2-2. KSC CAMAC/VMEbus system KSC means Kinetic Systems Corporation in U.S.A.. The block diagram shown in Figure 3 consists mainly of KSC interface and controller modules. The KSC2932 MACII interface is directly connected to the KSC3922 CAMAC crate controller through the KSC RS-485 parallel bus. But the KSC2932 is not available yet, it is only an announced product. The KSC2932 can address up to eight KSC3922s en the RS-48S parallel bus.

3. Newly developed MACE Interface and CAMAC/VMEbus Controllers. The system diagram shown in Figure 4 and Figure 5 shows the connection between the newly developed MACQ Interface and CAMAC/VMEbus controllers. The CC-MAC/ADP MACH interface can directly access the CC-MAC CAMAC crate controller and/or the VME- MAC VMEbus controller on the Toyo private data bus. It may coexist with up to sixteen CC-MAC and VME-MAC controllers. A single 100-wire twisted-pair flat cable is connected to the CC- MAC/ADP and looped-through the CC-MAC and/or the VME-MAC. This 100-wire Toyo data bus uses a differential current drive transmission and permits 120 meter cable lengths which is greater than the EUR 4600e CAMAC Branch Highway. The data transfer speed of a 24-bit CAMAC data word on the Toyo bus was achieved in 2.8 microseconds. hi case of the VMEbus operation a 32-bit long-word data is separated two 16-bit words. The two demultiplexed 16-bit words are transferred from the CC-MAC/ADP to the VME-MAC controller as two cycles. Figure 6 shows all the signals on the Toyo private bus. The basic bus protocol between the MACII NuBUS and the CAMAC/VMEbus controllers is indicated in Figure 7.

— 36 — CES: MACII/VME/CAMAC SYSTEM #1 CAMAC Crate #1 o > > TERM a a > WS o o Ml CAMAC Crate #2

H H on a a Oil o o »• CAMAC P2(Rows A &C) Branch Highway VME Crate #1 Up to 7 Crates < < to CD °a 1 r-i 3D m 00 09 M • °M red 1 1 N -roA j . 1B1 TL I1 io•* u ^LJ P2(Rows A &C) CBD8210: < < O CD CD CAMAC Branch Driver m F"l ' m VBR8212: 1 1 • 09 M Vertical Bus Receiver • 1 _roL VBE8213: * rL- li -*M w 3 Vertical Bus Emitter o° VME Crate #2 MAC7212: SL MAC II Interface CD c CO

NuBus Board

Figure 1 : System Diagram consists of CES modules

— 37 — CES: MACII/VME/GAMAC SYSTEM #2 CAMAC Crate #1 ^^ACB

> XSC 3 o o> > o TERM ro o o 09 O ro m CAMAC Crate #2 nAc > * (0 r-. H H o 1 O O o i

o18 0 w 1 o o M I1 RS485 P2(Rows A &C) Parallel Bus VME Crate #1 (Up to 8 Crates) < < a D CPBI a m 09 09 ro ro _i w P2(Rows A &C)

< < CPBI8216: 00 CO CAMAC Parallel Interface n 3D m 09 09 for KSC3922 Controller 1 ro ro fL -i •A KSC3922: 1—i ro u Parallel Bus CAMAC Crate Controller VME Crate #2 ACC2180: Front End Processor

Figure 2 : System Diagram consists of CES modules

— 38 — KSC: MACII /CAMAC SYSTEM #1

CAMAC Crate #1 ACB > X

> > OC21 8 C39 2 o a w o o o 10 s»

CAMAC Crate #2

H o a o o RS485 Parallel Bus CAMAC Crate #3 Up to ft Crates

H H O O O O

RS485 KSC3922: Parallel Bus Controller ACC2180: Front End Processor Parallel Bus Max 90m

KSC2932: MAC II Interface

Figure 3 : System Diagram consists of KSC modules

— 39 — KEK/TOYO MAC II/VME/CAMAC SYSTEM

^:av», NuBus 32-bit Data/Address

TOYO/CC-MAC/ADP MAC II Interface NuBus Board

Private TOYO Bus 50 pairs (100 wires) Data 24-bit/Address 13-bit

CC/M AC CAMAC CAMAC Controller Board

Private TOYO Bus 50 pairs (100 wires) Data 16-bit /Address 16-bit

VME/MAC VMEbus VMEbus Controller Board

Figure 4 : Newly Developed System Block Diagram

— 40 — KEK/TOYO MACII /VME/CAMAC SYSTEM CAMAC Crate #1 ACB

> O > > o O TERM a a o s o o ro -> 00 o § o CAMAC Crate #n TOYO Private Bus 50 Pairs(l00 wires)

H H Up to 16 Controller D a (VME/CAMAC) O o

VME Crate ACC2180(KSC): Front End Processor < T I 3 o> m 1 m CO 3 3 oro O I > o 30 o -^ TOYO Private Bus 50 Pairs(100 wires) Cable 120m

NuBus Board

CO/MAO MACII ADAPTER

MACII/VME/CAMAC Interface

Figure 5 : Newly Developed MAC ll/CAMAC/'VME System

41 TOYO/MACII/CAMAC/VME Bus Signals

CAMAC VME Name CC/MAC VME/MAC

Data XD<1-24>:24 XD<0/16-15/31>:16

Address XC<1-4>:4 XA<0/16-15/31>:16 XN<1-5>:5 XA<1-4>:4

Function XF<1-5>:5

Control EX0:1 R/W:1 XX:1 EX0:1 XQ:1 INT:1 XL:1 DTACK:1 EXEC:1 BUSY.1 XS1:1

Figure 6 : Bus Signals on the TOYO Bus

— 42 — TOYO/MACII/CAMAC/VME Timing Transaction

NuBUS TOYOBus CAMAC CAMAC MACII NuBUS Adapter (50pairs) Controller Dataway

/CLK VME /START Controller /ADDRESS /DATA /MYSLOT /EXECUTE +

/ADD(CNAF) *H /MYCRATE /BUSY | -WNAF /S1 -•/S1 /S2 ' -•/S2 /EXECUTEf /ACK<«- /ACKl

Figure 7 : Basic Timing Transaction REAL-TIME MARKET and VsWORKS

Jerry Fiddler

Wind River Systems KK

Real-Time Computing

Real-Time Market: Evolution — Trends, Directions and Requirements

                 1970s-81     1981-87      1988-91
  CPU Speed      0.1 MIPS     1 MIPS       5-100 MIPS
  CPU Type       CISC         CISC         RISC, CISC
  Memory         4-64 KB      1 MB         4-64 MB
  Language       ASM          C            C, C++, Ada
  Comm.          Serial       Serial
  Processing     Single       Single       Multi-
  Resources      Hardware     50/50        Software

Real-Time Market Trends: Driving Factors
  • Hardware Power Increasing
    - 5-100 MIPS Soon
    - Standard Bus Products
    - Networking
    - Multiprocessing
  • Higher Speeds + Complex Topologies = More Difficult Engineering Problems
  • Availability of Standard Bus Products
  • Availability of Large Memory
  • Availability of Powerful Development Systems as Low-Cost Commodities

Real-Time Market Trends: Results
  • Many End-Users Will Choose Not to Design Their Own Hardware
  • Distributed Systems
  • Applications Will Increasingly Be Software Oriented
    - Powerful Real-Time OS and Development Environment Are Required
  • Customers Will Demand Much More Integration
    - Development Systems
    - Real-Time Hardware
    - Connections
    - Support
    - Software

Real-Time Market Requirements: Overview
  • Unix - NOT as a Real-Time System
    - For Development
    - As a Resource
  • Cross-Development
  • Standards
  • Connectivity
  • Off-the-Shelf Real-Time OS
  • New Architectures
    - RISC
  • CASE
  • Ada
  • World Market

Development Systems
  • Workstations
  • Monitor, Control of Real-Time Environment
  • Intelligent Consoles
  • X-terminals

Why Unix?
  • Thousands of Man-Years
  • Oriented Towards Software Development
  • Standards
  • Multiple Vendors
  • Huge Public Domain Library
  • Large Pool of Engineers
  • Books
  • Training
  • It's the Best

Cross-Development
  • Needs for Host and Target Are Completely Different
  • Both Environments Need to Be Optimized
  • Networking Provides the Ideal Solution
    - VxWorks Allows "Real-Time Servers" As Resources in a Networked Environment

Standards
  • Real and "de facto"
  • Hardware Standards
    - Standard Buses: VMEbus, Multibus 1 and 2
  • Software
    - C, Ada
    - OS: Unix is the de facto standard, becoming a real standard (very confusing!)
    - ABI
    - Window Systems: X11, NeWS
    - RT Kernels: ORKID
    - Others

Connectivity: Market Requirements
  • Real-Time Systems MUST Connect Into a Network
    - Resources: Development, Run-Time
  • Heterogeneous Networks
    - Different Computers
    - Different OS's
    - Different Functions
    - "Black Boxes" - Intelligent Instruments
    - Different (and Multiple) Physical Networks: Ethernet, PROnet most important today; FDDI, others later; custom mechanisms (microwave, satellite, special telco links, etc.)
    - Multiprocessing fully integrated
  • Standard Communications Protocols
    - TCP/IP today
    - OSI (MAP) tomorrow

Costs of a Do-It-Yourself RT OS (or Kernel)
  • Design
  • Implementation
  • Documentation
  • Training
  • Maintenance
    - Bug fixes
  • Optimization
  • Generality (for additional projects)
    - Mobility between projects
  • Improvements
    - Networking
    - Additional drivers
  • Third-Party Software
  • User's Groups
  • New Architectures
  • Support

What an Architecture Needs
  • Reasonable Architecture
  • Excellent Basic Tools
    - Compilers: C, Ada, ???
    - Cross-Tools
  • Wide Range
    - High-Speed
    - Low Cost
    - Technology
    - Special (low power, radiation hard, ASIC)
  • Connectivity
  • Different System Configurations
    - Workstations
    - SBCs
  • Extensive Documentation
  • Effective Marketing
    - Specific to Real-Time Applications
    - Worldwide

RISC
  • Required for Very High Speed Applications
    - CISC Is Close to the End of the Speed Curve
    - RISC Is Just Beginning, Already Faster
  • Register-Intensive Architectures
    - Excellent Opportunities for Real-Time Use
    - Require Re-Thinking at the OS Level (Task Model)
  • Special Real-Time Needs
    - Cache Organization
    - Determinism
  • Support for Sophisticated Applications
    - Powerful Real-Time OS
    - Powerful Development and Debugging Tools

CASE: Why?
  • Absolutely Essential!
  • Current Technology Is Like Cars Were Built Before Henry Ford
  • Applications Growing Huge
    - 100K to 1,000K Lines Becoming Common
  • Large Applications (>100K Lines) That Are 100% Bug-Free Will Not Be Written in Our Lifetime (With Current Technology)

CASE: What?
  • Current Technology
    - Source Code (Version) Control
    - Automated Make
    - Object-Oriented Programming (Ada, C++)
    - Unix
    - Documentation Tools
  • Future
    - More Use of Existing Tools
    - More Automation
    - System specification, creation, documentation, maintenance
    - Verification and Quality Assurance

Ada
  • US Government's Solution to the Software Crisis
  • Now Required for All US Government Contracts
  • Now Moving into Commercial Applications
  • Acceptance Has Been Slow Because of:
    - Lack of Adequate Compilers, Tools - Now Available
    - Lack of Trained Programmers - Becoming Available
    - Lack of Understanding of Benefits - Now Being Understood
    - Lack of a Sophisticated Host-Target Environment - Now Available, With VADSWorks
  • Well Accepted in Europe

Advantages of Ada
  • True Object Orientation
    - Data Hiding
    - Excellent Reusability of Modules
    - Module Interfaces Separate from Code
    - Generics, Overloaded Operators
  • Many OS Functions Defined by the Language
    - Tasking Model
    - I/O
  • Well-Specified Standard
    - Validation Procedure Required
  • Large Public-Domain Library of Functions
  • Large Projects See Significant Savings
    - Implementation
    - Maintenance

World Market
  • World Support Required for Success
    - Architecture
    - OS
  • Large Mass of Tools Required
    - Unix
    - Cross-Development Tools
    - Real-Time Tools
    - Documentation
    - Training
  • Products Developed, Maintained in Different Countries

Wind River's Mission
  • Provide the Most Complete Spectrum of Real-Time Solutions
  • Provide Excellent, Engineer-Oriented Customer Support
    - Domestic Support
    - In-Country International Support
  • Become the Standard by Which Others Are Judged
    - Start With the Best Product on the Market
    - Participate on Standards Committees
  • Provide an Environment in Which People Thrive

Wind River Systems KK. Special Seminar Jerry Fiddler Real-Time Computing Revolution Wind River Systems, Inc. History

• Incorporated in CA, 1983 H • Began As Consulting Company * Video Editing System - Francis Coppola Unix VxWorks * TV Automation System Window EyiMni RMkHnwKMMl Toon Tool* * High-Speed Financial System Comptem MuMpracHiIng s"~unSSID*uw,n9 Symbolic Dtbuoglns - American Express • Company Now Devoted 100% to Real-Time System Products


Wind River Systems KK. Special Seminar Jerry Fiddler WRS Philosophy: Wind River Systems, Inc. A Company, Not Just a Product • Customer Buys a Company • 30 Employees (45 -50 by Year End) • Product • Profitable, No External Debt or Ownership - WRS Provides the Most Sophisticated Products • Domestic & International Distribution • Support • Customer Base. Is "Who's Who" of Real-Time - Engineer-Oriented * Aerospace (Rockwell, TRW, Boeing, - Bug Fixes, Questions, Consulting Lockheed, Thomson CSF) • Information * Medical (GE Medical, Nicolet, Phillips) - Presentations * Lab (Los Alamos, JPL, Sandia) - Newsletters * Communications (Pac Bell, Contel, ROLM) - User Groups * Automation (Caterpillar, Kodak) • Professionalism • Fastest Growing Real-Time Software - Highest Quality Company. - Confidentiality • Future Growth Path - New Features • New Architectures • Partner for Real-Time Needs - Experience for Sale V. • Wind River Systems • ' "' Wind Rhrer SystemsJ

Wind River Systems KK. Special Seminar Jerry Fiddler WRS Philosophy: "\ Employee Orientation Wind River Systems Marketing • Provide Pleasant Working Environment * Building • Domestic and International Distribution - Pleasant (Comfort, Privacy) * USA, Japan, France, Germany, England, - Productive (Networks, Tools) Australia * Informality, frlendllnes ' OEM's and VAR's • Provide Extensive Training * >10 Board Level Manufacturers * Technical and Non-Technical Employees * Integrators * Software Partners • Expect Professional Discipline - Verdlx (Ada) * Demand Excellent Quality Strong Relationship with Major Computer • Growth and "Career Enhancement" Opportunities Manufacturers • Management Participation Strong Presence In Board-Level Market • Happier «•> More Productive Uniquely Positioned In Marketplace • We Have Never Lost an Employee to * Highly Integrated Software Another Job! * Strategic Relationships


Wind River Systems KK. Special Seminar Jerry Fiddler A Wind River Systems, KK. VxWorks Sales Channels

Created to Market and Support WRS Products In > Goal Japan * 10-220% Direct Sales (in U.S.) Created as a Joint Venture * Strong Development and Support of * Wind River systems, Inc. OEM, VAR, and Integrator Channels * ASR Corporation * Nissln Emphasize OEMs and VARs That Offer * Kobe Steel a High Degree of Integration, Added Purpose Value, and End-User Support * Support Wind River's Role * Marketing * Translation * Development of Top-Notch Product * Information * Excellent Support of OEMs, VARs, - Both Directions Integrators, and Large Customers * Training * Training, Information * Localization • Japanese Boards, Architectures Distribution Models * A Japanese Partner for Real-Time Needs * MS-DOS (Microsoft), UNIX (AT&T), Postscript (Adobe) i


Sample VxWorks Projects
  • Keck Telescope
  • Mars Lander Autonomous Vehicle (JPL)
  • Medical Proton Beam (Loma Linda)
  • MRI Scanner (GE Medical)
  • Accelerator Control System (LBL)
  • American Express Network
  • Automated Lumber Mill (McMillan Bloedel)
  • Flight Simulators (Boeing, TRW)

VxWorks Connections
  • Hosts
    - Sun, HP, Integrated Solutions, Motorola, Kurama, NEWS, Toshiba, VAX
  • Targets
    - 680x0 - boards by >10 manufacturers
    - SPARC
    - Others (1989)
  • Networks
    - TCP/IP
    - Ethernet, PROnet, SLIP (soon)
    - Multiprocessing - heterogeneous CPUs, manufacturers, OS's
    - Gateways - subnets

VxWorks Connections: Modes
  • Process to Process
  • Remote Procedure Calls
  • Remote File System
  • Remote Login
  • Remote Debugging

Windows: Views Into a Network of Resources

Wind River Systems KK. Special Seminar Jerry Fiddler ( \ WRS Training Options Wind River User's Group

• Five Day Course • An Independent Organization • Reference Arrangement for Basic Courses • Provide a Flow of Information on Real-Time * C Issues to Users * Real-Time • Services for the User * Unix • Code Exchanges Software Archive • More Coming • Benchmarking • Regular Forum for Users • An Independent Voice to Wind River * Suggestions, Enhancements • First Meeting April 10 (Berkeley, CA) - All Users Invitedl - Japanese Group Will Form Later


Real-Time Market Trends: Wind River's Role
  • VxWorks Is the Most Powerful System for Real-Time Needs
    - High-Level OS Capabilities in a Real-Time Environment
    - Powerful Development Tools
    - Distributed Systems Facilities
    - Connectivity With the Workstation World
    - Off-the-Shelf Solution for Many Different Hardware Platforms
  • Wind River Has Strong Relationships With Major Companies in Boards, Workstations, and Integration
  • VxWorks Is Ideal Software for Integrators
  • Wind River Is Uniquely Positioned in the Real-Time Market

Complex Applications: The Mars Rover

VxWorks: Network-Based Real-Time Computing

Requirements of a Real-Time System
  • Fast, Predictable Behavior
  • Multi-Tasking
  • Pre-Emptive Scheduling
  • Fast, Flexible Intertask Communications
  • Fast, Predictable Interrupt Response
  • Easy Communications Between Tasks and Processes
  • In-Memory, ROM-able
  • Closeness to Hardware

Unix is a Superb Development System
  • Powerful Software Development Tools
  • Programmer Friendly
  • Many Unix Programmers Available
  • Powerful Networking Facilities
  ... but Unix is not a real-time system.

Unix in a Distributed Real-Time System — roles of the Unix host: development; monitor and control; file server; high-level server.

VxWorks is a true real-time operating system designed to complement Unix.

What is VxWorks
  • Kernel
  • Full Network Capabilities
  • Real-Time Partner for Unix
  • Complete Distributed Systems Facilities
  • Powerful Debugging Tools
  • FAST Real-Time System Development

Choice of Kernels
  • Native VxWorks Kernel
  • pSOS (Software Components Group)
  • VRTX-32 (Ready Systems)

VxWorks Kernel Concepts
  • True Kernel (Minimal)
  • Pre-emptive Scheduling
  • Up to 64K Tasks
  • System Call by JSR Instead of Trap
    - No Interface Routine
  • Fast Context Switch, Interrupt Response
  • Designed From the Start for:
    - VxWorks Use
    - Multiprocessing
    - CISC/RISC
    - Ada
    - Embedded Systems

VxWorks Kernel Semaphores
  • Damn Fast Semaphores (DFS)
  • Most "Kernel" Functions Are Performed at Task Level Using DFS
    - Memory Management
    - Signals
    - Pipes
    - Sockets
    - Network
  • Minimal Pre-emption Lockout
    - Task Pre-emption ONLY Locked Out During Context Switch and DFS

VxWorks Kernel Concepts: Minimal Lockout — Work Queue Approach
  • Work Is Added to the Queue if the Kernel Is Already Active (During Interrupts)
  • The Kernel Performs Queued Work Before Exiting
  • Interrupts Are Locked Out ONLY While the Work Queue Is Manipulated
    - Occurs Only When the Kernel Is Interrupted
  • Near-Zero Interrupt Lockout, Near-Zero Interrupt Latency
  • Allows Almost All System Functions from Interrupt Level
  • Applications in Multiprocessing
  (Diagram: comparison of the conventional approach, with pre-emption locked out at task level, and the VxWorks approach.)
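The work-queue idea can be sketched in a few lines of C. Everything below is a schematic illustration of the approach described on the slide, not VxWorks source; the function and variable names, and the (commented) interrupt-lock primitives, are invented for the example.

    #include <stddef.h>
    #include <stdbool.h>

    #define WORK_Q_SIZE 64

    typedef void (*work_fn)(void);

    static work_fn       work_q[WORK_Q_SIZE];
    static volatile int  q_head, q_tail;          /* indices into work_q          */
    static volatile bool kernel_active;

    /* Interrupts would be locked only around these two tiny queue operations.    */
    static void q_put(work_fn fn)
    {
        work_q[q_tail] = fn;
        q_tail = (q_tail + 1) % WORK_Q_SIZE;
    }

    static work_fn q_get(void)
    {
        if (q_head == q_tail)
            return NULL;                          /* queue empty                  */
        work_fn fn = work_q[q_head];
        q_head = (q_head + 1) % WORK_Q_SIZE;
        return fn;
    }

    /* Entry point used from task level or from an interrupt handler.             */
    void kernel_request(work_fn fn)
    {
        if (kernel_active) {                      /* kernel interrupted while     */
            q_put(fn);                            /* active: just queue the work  */
            return;
        }
        kernel_active = true;
        fn();                                     /* do the requested work        */
        for (work_fn w; (w = q_get()) != NULL; )
            w();                                  /* drain work queued by ISRs    */
        kernel_active = false;
    }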


VxWorks Kernel Performance
  • Task Lock/Unlock - 14 µs
  • Task Context Switch - 33 µs
  • Semaphore Give / Take - 35 µs
  (Measured on a 68020, 1 wait state.)

Intertask Communications and Synchronization
  • Shared Memory
  • Semaphores
  • Pipes
  • Sockets
  • Signals - Unix BSD-style
  • RAM-disk

Interrupt Support
  • Low Interrupt Latency
  • Any C Subroutine May Be Connected
  • Interrupt <-> Task Communications
    - Semaphores
    - Pipes
    - Message Logging
    - Signals

Floating Point
  • Floating-Point I/O
  • Coprocessor Support
    - Register Save/Restore
    - Optional on a Per-Task Basis

Memory Management
  • Variable-Size Memory Allocation
  • Unix Compatible
    - malloc, free, realloc, calloc
  • Manage Multiple Memory Partitions

Network
  • Complete Port of the BSD 4.3 Network
  • TCP/IP
  • Inter-Network Facilities
    - Connecting Multiple Networks
    - Different Physical Media: Ethernet, PROnet, Backplane
    - Gateways
    - Transparency
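Since the allocator is advertised as Unix compatible, the ordinary ANSI C allocation calls named on the slide are the natural illustration. The sketch below uses only those standard calls; the partition facility itself is VxWorks-specific and is not shown.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Variable-size allocation with the Unix-compatible calls.        */
        double *buf = malloc(1024 * sizeof *buf);
        int    *cnt = calloc(128, sizeof *cnt);       /* zero-initialised  */
        if (buf == NULL || cnt == NULL)
            return 1;

        double *bigger = realloc(buf, 4096 * sizeof *bigger);
        if (bigger != NULL)
            buf = bigger;                             /* grown or moved    */

        printf("buffers at %p and %p\n", (void *)buf, (void *)cnt);
        free(cnt);
        free(buf);
        return 0;
    }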


Sockets
  • Process-to-Process Communications
  • Transparent Communications
    - Between VxWorks and Unix
    - Across Any Network Medium
    - Between Multiple Networks
  • Unix Source (and Object) Compatible

Socket Communications (diagram: two nodes, each running VxWorks or Unix, connected over a bus or Ethernet)
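Because the socket layer is described as Unix source compatible, a plain BSD-style TCP client is a reasonable illustration. The host address and port below are made up for the example, and error handling is reduced to a minimum.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        /* The same source compiles for a Unix host or, in principle, a   */
        /* VxWorks target, since both expose the BSD socket calls.        */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in peer;
        memset(&peer, 0, sizeof peer);
        peer.sin_family      = AF_INET;
        peer.sin_port        = htons(5000);                 /* arbitrary port    */
        peer.sin_addr.s_addr = inet_addr("192.168.1.10");   /* arbitrary address */

        if (connect(s, (struct sockaddr *)&peer, sizeof peer) == 0) {
            const char msg[] = "hello from the other end of the bus\n";
            write(s, msg, sizeof msg - 1);
        }
        close(s);
        return 0;
    }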


Network File System (NFS)
  (Diagram: NFS alongside remote procedure calls (RPC), the file transfer protocol (FTP) and remote login (rlogin, telnet).)

Additional Network Tools
  • rlogin, telnet (remote login)
  • ftp (file transfer protocol)
  • rsh (remote shell)

Network Components

VxWorks/Unix Network (diagram: UNIX workstations and a mainframe connected over Ethernet, through Ethernet controllers, to several VME chassis running VxWorks — a distributed system architecture).

Backplane Network
  • Allows Packets to Be Passed Between CPUs on a Common Backplane
  • Allows All Higher-Level Protocols to Run Between CPUs on a Common Backplane:
    - TCP/IP
    - Sockets
    - rlogin
    - NFS
    - etc.

Backplane Network Implementation
  • Global ("shared") Memory Accessible to All CPUs on the Backplane
  • Test-and-Set to Interlock Access to Global Data Structures
  • Interprocessor Interrupts to Notify the Destination CPU of the Arrival of a Packet:
    - VME interrupts, or
    - Mailbox interrupts, or
    - Polling if no interrupts possible
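A minimal sketch of the interlock follows, assuming a memory-mapped shared region and using the GCC __sync_lock_test_and_set builtin as a stand-in for the backplane's indivisible test-and-set cycle. The address and structure layout are invented for the illustration.

    #include <stdint.h>

    /* Hypothetical layout of the global memory region on the backplane.  */
    struct shm_ring {
        volatile uint32_t lock;          /* 0 = free, 1 = held            */
        volatile uint32_t head, tail;
        uint8_t           packets[4096];
    };

    /* Assume the region is mapped at some agreed (hypothetical) address. */
    static struct shm_ring *const ring = (struct shm_ring *)0x20000000;

    static void shm_lock(void)
    {
        /* Spin until the indivisible test-and-set reports "was free".    */
        while (__sync_lock_test_and_set(&ring->lock, 1u) != 0u)
            ;                            /* another CPU holds the lock    */
    }

    static void shm_unlock(void)
    {
        __sync_lock_release(&ring->lock);
    }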


Backplane Network Shared Memory / Shared Memory Layout
  (Diagram: the shared-memory region — on a separate memory board or on a CPU board — holds a ready value, a heartbeat, a free packet list, per-CPU input queues and interrupt information for CPU 0 ... CPU n, and the packets themselves.)

VxWorks Shell
  • C Expression Interpreter
  • Interprets Any C Expression, Including Functions and Variables
  • Call Any System or Application Function
  • Access from Terminal or Net (rlogin)
  • Korn-Shell Style History
  • Automatic Symbol Completion
  • Floating-Point I/O
  • Ada Extensions

VxWorks Loader
  • Standard a.out Files
  • Run-Time Linkage
  • Shareable Code
  • Incremental Loading
  • Fast Loading Over the Network

Debugging
  • Symbolic Debugging
  • Breakpoints, Single-Step, Continue
  • Next, Continue-Till-Return
  • Task Trace
  • Examine/Modify Task Registers
  • Check Stack
  • Symbolic Disassembler
  • On-Line Help and Error Messages
  • Remote Source-Language Debugging

dbxWorks (diagram: remote source-level debugging of a VxWorks target from a Unix host)

Exception Handling
  • Default Exception Handling
    - The offending task is suspended, but not deleted
    - All other tasks continue normally
  • User-Defined Exception Handling via Signals

Performance Monitoring
  • Execution Timer
    - Error-bounds analysis for high-speed timing
  • CPU Utilization Percentage
  • Profiling

VADS/Works
  • Joint Product of WRS and VERDIX Corp.
  • VxWorks Run-Time System
  • Remote Source-Language Debugging
  • Fully Integrated Real-Time Ada Environment
  • Packages for All VxWorks Functions
    - Non-Ada Task Control Available
  • Available Now in Interim Form (68K)
  • SPARC Later

VADS/Works Components (diagram: Ada application program, compiler-generated calls, Ada RTS tasking system, VxWorks interface packages, VxWorks)

VADS/Works Designed for Ada
  • Rendezvous Translates Into a Damn Fast Semaphore
  • Other Ada Issues
    - Priority of Tasks
    - Aborting Tasks
    - Exception Handling
    - Terminate Alternative
  • Interface to Other Languages
  • Remote Source-Language Debugging via RPC, ptrace+
  • Packages for All VxWorks Functions
  • Non-Ada Task Control Available

VxWorks Debugger (diagram: source-level debugging of the VxWorks target from a Unix host)

VADS/Works Schedule
  • Phase 1
    - Running; no Ada tasking or I/O packages
    - Source debugger running
    - Available now
  • Phase 2
    - All Ada constructs available
    - Entire Ada environment runs as a single VxWorks task
    - Available now; validation 1Q89
  • Phase 3
    - One-to-one tasking
    - All of VxWorks available via packages
    - Optimization, enhancement
    - 2Q89

RISC / SPARC
  • Register Windows
    - Allocatable
    - One or more register sets can be dedicated to a task, a group of tasks, or an interrupt level
  • Register "Cache" Algorithm
    - Save/restore registers only as needed, rather than on every context switch

VxWorks Version 4.0 Major Enhancements
  • Support for New Kernels:
    - WIND (native VxWorks kernel)
    - pSOS (Software Components Group)
    - VRTX-32
  • NFS - Network File System
  • stdio - UNIX-compatible buffered I/O
  • Signals - UNIX-compatible
  • Performance Improvements
    - Network - 30% faster
    - Memory manager - 2 to 5 times faster

Register Window Strategies
  • Save only used RWs
  • Restore only one RW
  • Use RWs as a cache
  • Dedicate RWs to tasks or interrupt levels
    - Application configurable
  • Use an RW as a traditional task context, instead of a subroutine context
    - Compiler doesn't generate save/restore
    - Pseudo-parallelism

VxWorks Work In Progress
  • Optimizations
    - Kernel
    - TCP/IP
    - Backplane network
  • Tightly Coupled Multiprocessors
    - Remote kernel calls
  • Porting to New CPU Architectures
    - SPARC
    - 80386
    - Others
  • Extended Source-Level Debugging and Performance Monitoring
  • Support for New Peripheral Devices
    - SMD disks
    - SCSI disks
    - D/A and A/D

Data taking system for the FANCY

A. Manabe Institute of Physics, University of Tsukuba Tsukuba, Ibaraki 305, Japan

I would like to introduce a data taking system of the FANCY spectrometer used in the experiments of irAC collaboration (E90, E132, E133, E157, El73 and E187) at the 12 GeV PS. A readout system for a jet-chamber-type cylindrical drift chamber (CDC) in the FANCY is a multiprocessor system (10 CPU) with use of VME bus(VERSA Bus) and 680x0 microprocessors for fast data handling and for effective communication with a host computer.

1 System on VERSA bus [1]

1.1 General Overview

The FANCY spectrometer, which was constructed for the study of hadron-nucleus reactions in the few-GeV region by the E90 collaboration¹ in 1983, is a large-aperture, general-purpose multi-particle spectrometer. The detector system, which consists of two parts, the forward drift chambers and the central drift chamber (CDC), is installed at the π2 beam line of the 12-GeV/c PS. The schematic diagram of the data acquisition system of the FANCY is shown in Fig.1. It is composed of the CDC (Central Drift Chamber) part and the CAMAC part. The data are collected into the CCS-11 (an emulator of the PDP-11 with a CAMAC branch-highway interface) via the DMAM (a direct-memory-access module between the VERSA bus and CAMAC) and the CAMAC branch highway. The data are buffered and formatted in the CCS-11 and then transferred to the on-line computer, a PDP-11/34. We use the multi-CPU readout system for the CDC data taking (Fig.2). The signals from the CDC, after the preamplifiers, are stored in analog form in the FEMs (Front End Modules). The information from the CDC, namely drift times and pulse heights, is collected and digitized by the analog-to-digital conversion module (ADCM). The versatile peripheral controller (VPC), which contains an MC68000 microcomputer, controls this process,

¹ KEK-E90, STUDY OF HIGH ENERGY NUCLEAR REACTIONS WITH LARGE APERTURE MULTIPARTICLE DETECTOR; Tokyo Univ., Tokyo Inst. Tech. collaboration, 1981-1984

subtracts ADC pedestals and stores the data in the common memory module in a predetermined format. The FEM has a pair of shaping-filter amplifiers, discriminators, TACs, analog integrators and analog memories. It has a capacity for storing up to four multiple-hit signals for the same trigger, to accommodate high-multiplicity events. The ADCM has a 12-bit ADC of the successive-approximation type and a scanner circuit which searches for FEMs having hit signals. In this process, data from FEMs without any real hit are suppressed and never A/D converted. The A/D conversion time is 5 µs per analog signal, including settling time, and the scanning speed is 250 ns per FEM. The analog pedestal table used for the pedestal subtraction is kept up to date by frequent monitoring. One ADCM manages about forty FEMs, and one VPC manages one ADCM. Ten VPCs, one common memory module and one DMAM (Direct Memory Access Module between the CCS-11 and the VPCs) are connected with each other on the VERSA bus. The VERSA bus has the same protocol and electrical specifications as the VME bus, but a different board size and card-edge connectors. We used 384 FEMs, 10 ADCMs and 10 VPCs for acquiring the data from 16 x 24 sense wires. The typical trigger rate is 160 triggers/spill (the beam cycle of the 12 GeV PS is 0.3 Hz; a beam spill of 0.5 sec follows an acceleration period of 2.5 sec). The average data flow is about 170 kbyte/sec during the beam spill. An example time table of the data taking is shown in Fig.3. Typical sizes of the data contents are listed in the following table.

                               Maximum data size    Typically acquired
  Cylindrical drift chambers         6784                  400
  Other chambers                      640                   60
  Counter hodoscopes                  160                   60
  Total                         7584 words             ~ 500 words

1.2 Software

Although all 10 VPCs are identical in hardware configuration, in the data-taking mode one of the VPCs is designated as the master, which controls the others and communicates with the host computer, the CCS-11. Communication between the VPCs is carried out with interrupts, which are generated by accessing a register of each VPC, and with flags in the common memory. The interrupt is used as the start signal of data taking; the flags are used to signal the completion of each CPU module's job.
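A short C sketch may help picture this handshake. The addresses, flag layout and function names below are invented for the illustration; the real VPC code is written for the 68000 and differs in detail.

    #include <stdint.h>

    #define N_SLAVES 9

    /* Hypothetical addresses: completion flags in the common memory and   */
    /* one interrupt register (location monitor) per slave VPC.            */
    static volatile uint16_t *const done_flag = (volatile uint16_t *)0x00F00000;
    static volatile uint16_t *irq_reg[N_SLAVES];   /* filled in at start-up */

    /* Master VPC side: start one event and wait until every slave is done. */
    void take_event(void)
    {
        int i, pending;

        for (i = 0; i < N_SLAVES; i++)
            done_flag[i] = 0;                 /* clear completion flags          */
        for (i = 0; i < N_SLAVES; i++)
            *irq_reg[i] = 1;                  /* a write interrupts that slave   */

        do {                                  /* poll the common-memory flags    */
            pending = 0;
            for (i = 0; i < N_SLAVES; i++)
                if (done_flag[i] == 0)
                    pending = 1;
        } while (pending);
    }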

Actual data taking proceeds as follows. When an event trigger comes, the CCS-11 orders data taking from the MVPC (master VPC) using the interrupt, and the MVPC sends interrupts to all SVPCs (slave VPCs). The VPCs accumulate hit counts of the connected FEMs using the scanner, without digitization. The MVPC sums up these counts and reserves space for the data in the common memory. The VPCs then write the digitized data, after pedestal subtraction, to the assigned positions in the common memory. After the data have been read, each SVPC signals its completion to the MVPC, and the MVPC notifies the CCS-11 using the flags in the common memory. Through the use of the multi-CPU system we can closely monitor the stability of the CDC electronics; this is one of the advantages of such a system. Every minute the host computer generates test pulses via CAMAC, 32 times in one monitor cycle. The test pulses are sent to the FEMs to check the time-conversion gains and analog pedestals. From the results of the 32 trials the average and standard deviation are calculated and compared with the values from the previous test cycle. A warning is sent to the host computer when a significant deviation is found. Since outliers are excluded from these calculations, the checks are insensitive to spurious signals such as cosmic rays.
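The per-channel check amounts to a mean and a standard deviation over the 32 test pulses, compared against the previous cycle. The fragment below is only a sketch of that arithmetic; the function name and the tolerance argument are arbitrary.

    #include <math.h>
    #include <stdio.h>

    #define N_PULSES 32

    /* Compare one channel's 32 test-pulse readings with the previous cycle. */
    int check_channel(const double adc[N_PULSES], double prev_mean, double tolerance)
    {
        double sum = 0.0, sum2 = 0.0;
        int i;

        for (i = 0; i < N_PULSES; i++) {
            sum  += adc[i];
            sum2 += adc[i] * adc[i];
        }
        double mean  = sum / N_PULSES;
        double sigma = sqrt(sum2 / N_PULSES - mean * mean);

        if (fabs(mean - prev_mean) > tolerance) {
            printf("warning: pedestal moved by %.1f counts (sigma %.1f)\n",
                   mean - prev_mean, sigma);
            return 1;                          /* tell the host computer */
        }
        return 0;
    }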

2 New system on VME

2.1 General

Our rear-end system of the CCS-11 and PDP-11/34 is now too old, while the CDC data acquisition system using the VPCs is efficient and not so old-fashioned. In 1983 the πAC collaboration had to develop and build the VPCs by themselves, because no commercial module existed which satisfied their requirements for multi-CPU data taking; therefore we now have maintenance difficulties with the VPCs. Through five years of evolution in electronics, VME modules more powerful than the previous VPC have become commercially available, and we decided to rebuild the whole data-taking system on VME. In our application the speed of the VME bus is not the bottleneck of the system, because the A/D conversion speed is not so fast. Fig.4 shows the schematic diagram of the new system. As the new VPCs, Motorola MVME104 boards with a 68010 (8 MHz) CPU are used. This board is cheap and has sufficient features for our use, and it provides an interrupt register (location monitor) and a two-port memory, which are indispensable for a multi-CPU system. Since an interrupt is triggered when another CPU writes a word to the interrupt register over the VME bus, we can use this interrupt as a signal between the CPUs. We use the MVME220 memory module for communication among the VPCs and as a data buffer between the CDC part and the CAMAC part. It has a two-port memory and can be accessed from the VME bus through the J1 connector and from the VSB bus through the J2 connector; this memory module is used to connect the two VME crates.

2.2 Developing Environment

The developing environment is very important in building a software system. In the development of the previous system, all software was developed using a 68000 cross assembler on a CP/M machine (an NEC PC8001). For debugging and for loading the programs onto the target VPCs, a debug monitor and a loader usable in the multi-CPU system were written by the collaboration themselves; the programs for the actual data taking amounted to less than 10 % of all the software developed. Nowadays there are several OSs and monitors that provide real-time response and multi-CPU management mechanisms, for example pSOS, VxWorks, VRTX and so on. However, they often require a UNIX workstation (we have used VMS or RSX-11 as the host of the on-line system) and some training to use. If some multi-CPU functions can be added to a single-CPU OS ordinarily used on VME modules, such as OS-9 or VersaDOS, using such a familiar OS is not a bad solution. As the OS for the 680x0-type CPUs we have used the Idris operating system. Idris is a multi-task, multi-user OS compatible with the UNIX V6 system, and in addition it has features for real-time use. We added an inter-CPU communication mechanism and a shared (RAM) disk mechanism to Idris by modifying its device drivers. Both the inter-CPU communication and the shared disk are implemented using the common memory, a read-modify-write semaphore and the interrupt register (location monitor). Since Idris provides XON/XOFF character flow control not only for output but also for input, the inter-CPU communication is easily realized. The shared disk, on the other hand, is rather difficult: when one CPU changes a file on the shared disk, the other CPUs do not recognize that the file has changed, because Idris uses a disk cache and keeps i-node lists in RAM to speed up file accesses. To avoid this situation, a CPU that wants to write a file mounts the shared file system exclusively with the 'mount' system call, writes, and then unmounts it to release it for other users. When the shared file system is mounted read-only, on the other hand, all CPUs can read files on it simultaneously but cannot change them.

Using Idris command sequences, several multi-CPU functions are realized. The following are examples, where /dev/cpu(n) is a character device for communication with CPU(n) and /dev/md/1 is the shared disk.

  o Remote login to CPU2
      # cu -l /dev/cpu2

  o Output from CPU2 to the console
      # cat

  o Execute a process at CPU2
      # mount /dev/md/1 /x
      # cp program /x
      # mount -u /x
      # echo "mount -r /dev/ram1 /x" >/dev/cpu2
      # echo "/x/program" >/dev/cpu2

These extensions of a single-CPU OS provide only a tiny part of the functions of a real multi-CPU OS; however, with them we could develop and debug the data-taking programs effectively.

2.3 Problems with the VME system

When the VME system is used for a multi-CPU system, some problems exist. One is that the VME standard has no broadcast mechanism, either for data transfer or for interrupts. If broadcasting were supported, we could simultaneously send a start-of-event signal to all CPUs and distribute data among them efficiently. Since VME provides only 7 interrupt levels, they cannot be used for signalling among more than 7 CPUs. Some CPU boards provide a local interrupt register (or location monitor) mechanism, as mentioned previously; if each CPU module has it at a different address, this mechanism allows more than 7 CPUs to be interrupted. Some VME modules cause serious problems when they are used together with modules produced by other makers. In my experience the modules of AVAL Data Co. Ltd. and those of Motorola are not congenial when constructing a multi-CPU system; the reason is that one of these modules is not a VME module in the strict sense. It is very difficult to find out why they do not work, because subtle hardware timing is involved.
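On the software side the location-monitor trick reduces to a single write to an address that differs for every CPU board. The base address and stride in the sketch below are invented for the illustration.

    #include <stdint.h>

    /* Hypothetical mapping: CPU board n watches address LM_BASE + n*LM_STRIDE. */
    #define LM_BASE   0xFFFF0000u
    #define LM_STRIDE 0x100u

    /* Interrupt CPU board 'n' by writing any word to its location monitor.     */
    static void signal_cpu(unsigned n)
    {
        volatile uint16_t *lm = (volatile uint16_t *)(LM_BASE + n * LM_STRIDE);
        *lm = 1;        /* the value is irrelevant; the access itself triggers   */
    }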

Reference: [1] K. Ichimaru et al., Nucl. Instr. and Meth. A237 (1985) 559.

Fig.1: Schematic diagram of the FANCY data acquisition system — the PDP-11 on-line computer, the CAMAC part (ADCs, TDCs, interrupt register), the CCS-11, and the VERSA bus carrying the common memory, the DMA window/DMAM and the VPC/ADC interfaces to the detectors.

Fig.2: CDC readout scheme — fast gate, FEMs connected through 60 m coaxial cables and the ADC bus (with bus arbiter) to the S-ADCM and M-ADCM modules, the P-bus, the VPCs and the common memory (x20 / x20 / x10 module counts).

Fig.3: Data-taking sequence and timing (CDC, 5 tracks) among the CCS-11, the M-VPC and the VPCs — trigger, GO, hit-count check, total hit count, data conversion, DMA, end-of-beam/hodoscope and cathode-chamber readout, with the elapsed times indicated in seconds.

Fig.4: Schematic diagram of the new VME-based system — the host computer and CAMAC, two VME buses connected by interface (IF) boards, the MVPC (master VPC) and SVPCs (slave VPCs), and 16 ADCMs.

Balloon-Borne Experiment

Tadahisa TAMURA

Physics Department, University of Tokyo


Balloon-borne experiment: gamma-rays of 200 keV-3 MeV and 3 MeV-100 MeV; constraints: weight limitation (900 kg), vacuum, heat, power supply, noise; balloon of about 150 m at an altitude of 40 km; detector: a Multiple Compton Telescope.

Detector
  - Si strip detector: 200 ch x 100
  - Scintillation counters: i) CsI(Tl) with PD (100 ch and 16 ch); ii) CsI(Tl) with PMT as active shield (88 ch)
  - Data rate: 32 x 100 Hz x 120 byte, about 2 GB in 12 hours; 4 kbyte/sec sent to the ground by telemetry; the rest recorded on an 8 mm VCR (2 GB) on board
  - Requirements: analog VLSI, a high-speed data acquisition system, compactness
  - The radio signal from the telemetry on the balloon is received on the ground as follows.

(Ground data-acquisition chain: the telemetry signal passes through a bit synchronizer (serial data), a decommutator and an analog data recorder; the 16-bit parallel data go through a divider to two telemetry interfaces on two VME buses, each with a 68020 CPU and 2 Mbyte of DRAM — one with a SCSI interface writing to an 8 mm VCR (2 GByte), the other serving as an online monitor using HBOOK, HPLOT and GKS.)

III. Each VME Module

i) ADC Module
  - Input: 8 (16) channels; peak / sample-and-hold; data width 16 bit
  - CAMAC-like function control
ii) Trigger (Interrupt) Module
  - Discriminator signals, trigger logic, ADC gate
  - Hit-pattern register, mask-pattern register, VME interrupter
  (The boards are wire-wrapped; CMOS devices (74HC series, GAL — a Lattice product) are used to keep the power consumption down, except for the bus-interface devices.)
iii) Communication Module
  - Interface for the PCM unit (telemetry); FIFO (dual-port RAM, 1 kword)
  - Trigger mode / data mode; read request; VME bus interrupter
  - Asynchronous and synchronous operation; 4 kbyte/sec
iv) Control Module
  - Command I/O port (4 bit), relay box to the telemetry
  - SCSI reset, 16-ch ADC gate (16-bit data)

IV. Software
  - OS-9/68000, multi-task; fast operation
  - Interrupt handling, data recording, environment monitor, watchdog
  - Programs written in C and stored in PROM

SNAC-II Experiment

VI. Future Plan

  - GSO + CsI phoswich, up/down (SN1987A)
  - 64 individual trigger sources
  - 2 kHz trigger rate
  - 2-CPU system + 2 VCRs

Development of Second Level Trigger System Based on a Microprocessor Array in the TRISTAN VENUS Experiment

Hiroshi Sakamoto

National Laboratory for High Energy Physics, KEK Oho 1-1, Tsukuba, Ibaraki 305, Japan

A second level trigger system is being developed at VENUS. The system consists of many microprocessors which are connected together to form an array. In the system, hit information from the central drift chamber is processed in parallel and reconstruction of tracks is performed. The present status of the development, especially the evaluation of the hardware performance and the outline of the software, is reported.

1 Introduction

Up to now, data acquisition of the VENUS detector [1] has been triggered by a hard-wired logic trigger circuit based on memory lookup tables [2] and majority logic units. This system is so simple that the processing speed is very fast and the reliability is high, but due to the limited hardware implementation density the resolution of the trigger, i.e. the recognition of tracks in tracking devices or the clustering of hits in calorimeters, is rather poor. To improve the resolution, it is impracticable to extend the hard-wired logic mechanism straightforwardly to the required resolution, even with the present state of the art in electronics. A possible solution is to introduce microprocessors into the trigger mechanism. Recent development in micro-electronics is so remarkable that very high speed microprocessors

are available at reasonable cost. A computing power much higher than that of a mainframe computer is obtained by forming a microprocessor array, so it seems feasible to construct a second level trigger system based on such a technology, covering the step succeeding the existing hard-wired trigger. The performance of the second level trigger mechanism was evaluated on the mainframe using Monte Carlo data and real experimental data of VENUS. An investigation of a microprocessor array using commercially available chips is also ongoing. In the next section the mechanism is described in some detail, and the result of the evaluation of the algorithm is shown in section 3. In section 4 the performance of the microprocessor array is reported, and in the last section we discuss the implementation of the system.

2 Second level trigger mechanism

There may be several steps in triggering a data acquisition system used in an e+e− collider experiment. The first step is to initiate the A/D conversion using a 'clear' signal. The clear signal is distributed preceding every collision unless a trigger condition is met. When a trigger condition is satisfied, the trigger system blocks the clear signal in order to complete the A/D conversion and notifies the data acquisition system to gather the data of the event. In order to avoid dead time, the decision must be made within the beam collision interval, so the trigger system at this level is constructed from hard-wired logic. In the VENUS case the collision interval is 5 microseconds (designed so as to operate at a 2.5 microsecond interval) and the system is designed so that the decision is made in 1.7 microseconds, allowing the front-end electronics 0.8 microseconds to clear themselves. Hereafter we refer to this mechanism as the 'first level trigger'. Once a triggered event occurs, the clear signal is blocked until the A/D conversion is completed and the data are transferred to the data acquisition computer, and then the system is released for the next event. In our case this takes about 10 milliseconds. All the events are transferred to the host without any further filtering. This simple scheme, consisting only

of hard-wired logic circuits, provides reliability and transparency to the system. As far as the capability of the host computer, i.e. the data transfer rate and the data size, allows, this scheme is a reasonable choice. Unfortunately the recent VENUS running conditions are not so quiet, and as the beam current has been increased the dead time has become serious. In order to reduce the dead time it is effective to introduce a second level trigger scheme. Without the second level trigger, once an event occurs the system becomes busy and unable to accommodate the next event until the data have been transferred. If the second level trigger system decides, during the A/D conversion time, whether the event is to be acquired or not, the data acquisition system can be reset and made ready to accept the next event when the event turns out to be of no interest. For example, if the A/D conversion and the event building take 10 milliseconds and the second level trigger decision is completed in 5 milliseconds, the dead time is reduced to 50 % for rejected events. The time allowed for the second level trigger system to compute should be shorter than the A/D conversion and event building time. In this time the system should calculate and make a decision which cannot be made by the hard-wired first level trigger circuit alone. In the VENUS case the allowed time is of the order of 1 millisecond. The latest microprocessors can execute instructions at a rate of the order of 10 MIPS, which indicates that 10000 steps can be executed on each chip. Although there may be some difficulties in making many CPUs cooperate, a CPU power of the order of 100 MIPS can be achieved easily. Once an event record is built on the frontend processor, the data may be transferred to the host or abandoned according to the result of the analysis of these data made by the frontend processor. This process should be referred to as the 'third level' trigger. In this step, though the total amount of experimental data is reduced, the dead time cannot be reduced.
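The dead-time gain quoted above can be made explicit with a one-line estimate. Taking the numbers in the text (10 ms for A/D conversion plus event building, 5 ms for the second-level decision) and a rejection fraction f, the average busy time per trigger is roughly f x 5 ms + (1 - f) x 10 ms. The short program below merely tabulates this simple model; it is an illustration of the argument, not a figure from the VENUS design.

    #include <stdio.h>

    int main(void)
    {
        const double t_full   = 10.0;  /* ms: A/D conversion + event building       */
        const double t_second = 5.0;   /* ms: second-level decision, made earlier   */
        double f;                      /* fraction of events rejected               */

        /* Rejected events tie the system up only for t_second;
         * accepted ones still need the full t_full.                               */
        for (f = 0.0; f <= 1.001; f += 0.25)
            printf("reject %3.0f%% -> mean busy time %4.1f ms\n",
                   100.0 * f, f * t_second + (1.0 - f) * t_full);
        return 0;
    }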

3 Algorithm

In order to test the feasibility of the second level track trigger scheme, an evaluation program was implemented on a FACOM M780 mainframe and

the performance, or the processing speed, was estimated using both Monte Carlo and real experimental data. Though the subject of the evaluation was limited to the CDC (Central Drift Chamber) data, the mechanism can of course be extended straightforwardly to other detector components. The tracking algorithm employed here is as follows.

1. The hit information of the 7104 CDC anodes was used in the tracking; no drift time information was used.
2. The cell size was tuned so as to reduce the inefficiency of the tracking, and a value somewhat larger than the physical one was employed.
3. Tracks were all assumed to come from the collision point.
4. The starting tracking road was calculated using hit wires in the innermost and the second inner layers. The road was represented as a set of two arcs which came from the origin, passed one end of the innermost cell and then passed the opposite end of the next cell.
5. The road was extrapolated to the next layer and hit wires were searched for on the road.
6. If a hit was found, the road parameters were recalculated using the innermost hit and this hit, and the road was then extrapolated to the next layer.
7. These procedures were repeated until no hit wires were found in the next layer or until the search reached the outermost layer.
8. Inefficiency of the anodes was taken into account by extrapolating the road toward the third layer if there was no hit on the next layer. The search was terminated when two successive layers had no hit wires inside the road. (A condensed sketch of steps 4-8 is given after this list.)
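To make the road-following procedure concrete, a very condensed sketch is given below. It keeps only the control flow of steps 4-8; the layer count, the geometry helpers and the "track long enough" criterion are placeholders, not the code that was run on the FACOM M780.

    #include <stdbool.h>

    #define N_LAYERS 20                 /* placeholder for the axial layer count    */

    typedef struct { double c1, c2; } road_t;    /* the two arcs bounding a road    */

    /* Dummy stand-ins for the real geometry code (wire map, arc fitting).          */
    static road_t seed_road(int hit1, int hit2)   { road_t r = { hit1, hit2 }; return r; }
    static road_t refit_road(road_t r, int layer, int hit) { (void)layer; (void)hit; return r; }
    static int    find_hit_on_road(road_t r, int layer)    { (void)r; return (layer % 7) ? layer : -1; }

    /* Follow one seed outward through the layers (steps 4-8 of the list).          */
    bool follow_track(int inner_hit, int second_hit)
    {
        road_t road     = seed_road(inner_hit, second_hit);
        int    missed   = 0;            /* consecutive layers without a hit         */
        int    last_hit = 1;

        for (int layer = 2; layer < N_LAYERS; layer++) {
            int hit = find_hit_on_road(road, layer);
            if (hit < 0) {
                if (++missed >= 2)      /* two successive empty layers: stop        */
                    break;
                continue;               /* allow for single-wire inefficiency       */
            }
            missed   = 0;
            last_hit = layer;
            road     = refit_road(road, layer, hit);   /* recalculate the road      */
        }
        return last_hit >= N_LAYERS - 3;    /* crude "reaches the outer layers" test */
    }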

The search was first performed for the axial wires, which give only r-φ information, and then, if a track was found, a 3-dimensional search was performed using the r-φ data and including the slant wires. The resolution of this algorithm was evaluated using Monte Carlo data on the mainframe. Figure 1 shows the momentum resolution.

Figure 1: Distribution of momenta calculated from the track pattern obtained by the algorithm described in the text. The input events were generated so that each event had only one track of 400 MeV/c. The width of the peak corresponds to the momentum resolution of the algorithm.

A track in this algorithm was characterised by the curvature and the coordinates of the centres of the two arcs which define a road. The curvature corresponds to the momenta of the boundaries of the road; in this figure the lower momenta of the road boundaries are plotted. The input data were generated by a Monte Carlo program, and each event had one track of 400 MeV/c momentum coming from the origin. The width of the peak corresponds to the momentum resolution of this algorithm at 400 MeV/c and is less than 100 MeV/c. In Figure 2 the sharpness of the cut using this algorithm is displayed, compared with the first-level scheme. The input were uniformly distributed momenta, and the yields of the input, the first-level cut and the second-level cut are displayed. As a matter of fact, noise events have an exponential momentum distribution, and it is very important that the momentum cut is sharp just above the region where background events dominate. Real experimental data were fed into the program to evaluate the actual filtering efficiency.

Figure 2: Momentum distribution of tracks of the input data generated by a Monte Carlo program (dashed), of those triggered by the first level (dotted) and by the second level (solid). In this figure the momenta were obtained using the standard VENUS tracking routine.

Figure 3: Momentum distribution of tracks reconstructed by the algorithm described in the text. The input data were taken from real experimental data of VENUS.

In Figure 3 the momentum distribution of sample data chosen at random from recent experimental data is shown. Requiring the track momenta to be greater than a threshold, the sample data were reduced to between 1/5 and 1/10, as listed in Table 1. The CPU time consumed was about 20 milliseconds per event, and triggered events consumed much more CPU time than rejected events. The average CPU time to reconstruct one track was about 4 milliseconds. In distributed computation this value is an important measure for the design of a processor array.

4 Microprocessor array

In the previous section a sample tracking algorithm was introduced and its performance evaluated. In this example a few milliseconds of mainframe CPU power are required to achieve the second level trigger. The next task for us is to construct a microprocessor array capable of such computing power.

Table 1: Reduction ratio of real data using the algorithm mentioned in the text.

  threshold (MeV/c)    input    survived
        400             500        110
        550                         79
        700                         47

There are several kinds of microprocessors which are commercially available and have considerably high performance. Among them, the TRANSPUTER was chosen as the processor array element. The architecture of the TRANSPUTER is shown in Figure 4. The TRANSPUTER is a high speed microprocessor especially designed to form a processor array using links. Four high-speed serial links are integrated into the chip, and by connecting chips directly through these links a processor array can be constructed without any external auxiliary circuits. These links can transfer data at a maximum rate of 20 Mbps. The processor also has some high-speed memory on the chip, accessible in 50 ns, and the IMS T800, the high-end version of the TRANSPUTER, also has a floating point unit on the chip. The processing speed of a single TRANSPUTER [3] was tested first. The result is displayed in Figure 5, where the relative speeds of executing logically identical programs on several computers are plotted in units where the mainframe FACOM M780 equals 1. In this figure the performance of the T800 appears somewhat lower than quoted in the databook [3]. The programs executed on the TRANSPUTER were allocated in the external memory, so access to them takes at least 3 times longer than to the on-chip memory. The optimization of programs may differ from system to system, so this figure represents the worst case of the TRANSPUTER performance. According to this figure, in the worst case we have to prepare 20 TRANSPUTERs to realize the performance of the FACOM M780. The second problem is how many processors can cooperate at the same time within an allowed overhead. The TRANSPUTER communicates with

Figure 4: Architecture of a TRANSPUTER (floating-point unit, 32-bit processor, 4 kbytes of on-chip RAM, four serial link interfaces, timers, event logic, system services and external memory interface).

Figure 5: Comparison of processing speeds among several kinds of computers. Relative speeds compared to the FACOM M780 (= 1) are plotted for various floating-point calculations (x+y, x-y, x*y, x/y, atan(x), cos(x), exp(x), log(x), sqrt(x), sin(x) and their average), including the T800-20 and a VAX/780.

each other through the communication links, and one processor has only 4 links, so messages between distant processors are passed through many processors in between. If this traffic is too crowded, the overhead of managing the links becomes a serious problem for each processor. Though this depends on the application and on the algorithm, we have to get some insight into this problem. A worst-case analysis, which still has some reality, was performed: considering the case that all array members join the calculation and do it independently, 1 kB of data, which corresponds to the number of our CDC wires, is transferred through the links. The computing load on each member is just as much as is done in the allowed time of the second level trigger scheme. The relative performance of the array is plotted in Figure 6 as a function of the number of elements. Due to the overhead, the performance curve saturates at more than 5 or 6 processors. This implies that the maximum size of a processor array is limited by the size or rate of the transferred data relative to the computing load.

5 Implementation

Up to the previous section we have studied the algorithm of the second level trigger mechanism and the technique of forming a microprocessor array. We have to achieve a computing power equal to that of the FACOM M780 for each tracking, and it must be realized using at most a few processors; to do this we have to study how to optimize the algorithm and the program code. As a consequence, a possible answer for the second level trigger system is to make a two-dimensional lattice of processors in which, in one direction, the computing load is distributed so that each row covers one sector of tracking. The average number of tracks in the real experimental data is about 5, and 90 % of the events have fewer than 8 tracks, so in this direction it is enough to prepare 6 or 7 rows. Each row in the processor lattice performs the tracking in one sector of the CDC. In order to make the second level trigger scheme meaningful, the tracking must be done in a few milliseconds. This means that at least one track should be found if it exists in the sector. This might be done much faster than the estimate given in the previous section, because the value mentioned above represents the total time of the tracking no matter whether it failed or not.

Figure 6: Performance of a processor array relative to a single processor, plotted against the number of Transputers.

Figure 7: Conceptual diagram of the second level trigger system. Hit information from six FASTBUS crates is collected by T800 TRANSPUTERs and sent over 20 Mbps TRANSPUTER links to the T800 processor array; the result goes to a 68020 FPI on FASTBUS.

So in one row of the processor lattice 5 or 6 processors are to be involved. Thus the size of the processor array is derived and is, for example, 6 x 6. In Figure 7 a conceptual diagram of the system is drawn. Information on the hit wires of the CDC is first gathered by a TRANSPUTER on each crate through the auxiliary cards mounted on the rear side of the FASTBUS crates. Once collected by the TRANSPUTER, all the data are transferred via the communication links to the processor array. The tracking is done on the array and the result is reported to a 68020 FPI [4], which works as the frontend master and makes the final decision on whether the data are transferred to the host computer. This mechanism of trigger decision on the 68020 FPI has already been implemented and is working [5]. Development of the hardware components is now ongoing. We have to prepare transputer modules mounted on the FASTBUS auxiliary card and an interface between the TRANSPUTER link and the 68020 FPI. In both cases a mini-card carrying a TRANSPUTER with an external bus buffer is introduced, and the external bus interface specific to the installation is now the subject of the design. As one of the interfaces to the processor array, a SCSI interface is to be introduced. The transfer speed of the SCSI bus is of the same order as that of the TRANSPUTER link, and the SCSI interface is available on many types of mini- and micro-computers, so this processor array can also be used as a general-purpose computing engine for such computers. In conclusion, the second level trigger system is now under construction based on a microprocessor array. The feasibility of the mechanism has been studied and a possible form of the system has been derived. More studies to optimize the algorithm for parallel computing are necessary. The know-how obtained in this development may be applied to the new generation of data acquisition in high energy physics and other fields.

References

[1] K. Amako et al., Nucl. Instr. and Meth. A272 (1988) 687.
[2] T. Ohsugi et al., Nucl. Instr. and Meth. A269 (1988) 522.
[3] TRANSPUTER DATABOOK, INMOS Limited, 1988.
[4] Y. Arai and Y. Yasu, IEEE Trans. Nucl. Sci. NS-35, No. 1 (1988) 300 (KEK Preprint 87-112).
[5] S. Uehara and Y. Arai, private communication.

UPGRADE OF THE AMY TRIGGER SYSTEM

Sergei Lusin University of South Carolina Columbia, South Carolina 29208 USA

ABSTRACT A proposed upgrade of the trigger system for the AMY detector at the TRIS­ TAN e+e~ collider is described. The upgrade involves making use of azimuthal information from the central drift chamber to identify radial tracks traversing the chamber.

INTRODUCTION

The AMY detector, at the TRISTAN e+e− storage ring of the Japan National Laboratory for High Energy Physics (KEK), is a compact general-purpose detector optimized for lepton and photon identification, employing a high magnetic field (3 Tesla). The detector has been operating successfully since January 1987. Fig. 1 is a one-quadrant cross-sectional view of the AMY detector showing the Inner Tracking Chamber (ITC), Central Drift Chamber (CDC), and electromagnetic Shower Counter (SHC) inside a 3 Tesla solenoidal magnet coil. The coil is surrounded by a steel flux-return yoke followed by a drift chamber/scintillation


Fig. 1 The AMY Detector

counter muon detection system. The endcap regions contain a Luminosity monitor (LUM), a Pole Tip Shower Counter (PTC), and a Tagging Counter (TC). The AMY trigger system relies mostly on information from the SHC, CDC and ITC. It consists of three nominally independent systems each of which places emphasis on one of the three detector components mentioned above. The three subsystems are the i.) ITC-based tracking triggers, ii.) CDC-based tracking triggers and iii.) SHC total energy trigger. Redundancy in the trigger system serves to enhance overall efficiency and simplifies the calculation of individual detector efficiencies. The AMY trigger system was developed mainly by groups from University of South Carolina, Rutgers University and U.C. Davis. Overall trigger inefficiency for hadrons, bhabhas, dimuons and tau pairs for the fall 88-winter 89 run period has been 1% or less with an overall trigger rate on the order of 2 Hz. Although this represents very good performance it cannot be expected to continue indefinitely. TRISTAN luminosity is expected to increase significantly and the implementation of a micro-beta scheme will raise it further still. The CDC-based 2, 3 and 4-track triggers have shown themselves to be sensitive to beam-related backgrounds and are the major contributors to rate for the CDC-based triggers. Possibilities for further tuning of trigger criteria are nearly exhausted and reductions in trigger rate will have to come at the expense of efficiency. This report deals with a proposed hardware upgrade for the CDC-based tracking trigger system to raise trigger selectivity by using azimuthal information from the CDC. Further discussion will be limited to the CDC-based system alone.

CDC TRIGGER SYSTEM

The AMY CDC is a cylindrical multi-wire drift chamber with hexagonal cell structure consisting of 40 layers of sense wires arranged in 10 bands. There are 6 bands of axial wires, each axial band consisting of 4 layers except the innermost band, which has 5. There are 4 bands of stereo wires, each consisting of 3 layers. Stereo bands are radially alternated with axial bands. Only axial wire hit information is used for triggering purposes. The principle of the CDC tracking trigger involves counting the number of stiff (radial) track segments in each CDC band and using these band tallies along with information from the shower counter to arrive at an event classification. Identification of radial track segments is accomplished by memory mapping in 16K x 1 bit RAMs. Each RAM covers a field of 14 adjacent wires in one band (referred to as a sector) and the pattern of hit CDC wires is mapped to a single bit output signifying the presence of a radial track segment. Track patterns qualifying as being sufficiently radial are zigzags or hooks. Slant tracks and patterns deviating further than that from the radial are not recognized as valid track segments. Track patterns are shown in Fig. 2.

— 116 —

Fig. 2 CDC track patterns (zigzag and hook patterns are valid; slant patterns are invalid)

RAM Sector Boundaries. Fields of adjacent sectors overlap by six wires with each neighbor (Fig. 3). This gives redundancy in track recognition, enhancing efficiency but also introducing the possibility of double-counting. To suppress this, cluster counting schemes are introduced in hardware. For reasons of efficiency, track patterns with valid hits in 3 out of 4 layers are accepted, and exceptions for certain conditions are made to allow for noise on the sense wires. Track pattern evaluation is edge-sensitive to reduce sensitivity to noise and extraneous hits.

Fig.3 CDC RAM sectors
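To make the memory-mapping scheme concrete, the following sketch fills a 16K-entry table that plays the role of one sector RAM and then indexes it with a 14-bit hit pattern. The classifier used to program the table is a deliberately simplified stand-in (it only asks for hits in 3 of 4 assumed layer groups); the real contents encode the zigzag/hook shapes, edge sensitivity and noise exceptions described above.

    #include <stdint.h>
    #include <stdio.h>

    /* A 14-wire sector pattern addresses a 16K x 1 bit RAM whose output
     * says "radial track segment present".  The table below plays the
     * role of that RAM; is_radial_pattern() is a hypothetical classifier
     * standing in for the real zigzag/hook rules described in the text. */

    #define SECTOR_WIRES 14
    #define RAM_ENTRIES  (1 << SECTOR_WIRES)   /* 16384 = 16K */

    static uint8_t segment_ram[RAM_ENTRIES];

    /* Hypothetical stand-in: accept any pattern with hits on at least
     * 3 of 4 layer groups (the real rules are more restrictive). */
    static int is_radial_pattern(unsigned pattern)
    {
        /* Assume (for illustration only) wires 0-3 belong to layer 1,
         * 4-6 to layer 2, 7-10 to layer 3 and 11-13 to layer 4. */
        static const unsigned layer_mask[4] = { 0x000F, 0x0070, 0x0780, 0x3800 };
        int layers_hit = 0;
        for (int l = 0; l < 4; l++)
            if (pattern & layer_mask[l])
                layers_hit++;
        return layers_hit >= 3;
    }

    int main(void)
    {
        /* "Program the RAM": precompute one bit per possible hit pattern. */
        for (unsigned p = 0; p < RAM_ENTRIES; p++)
            segment_ram[p] = (uint8_t)is_radial_pattern(p);

        /* At trigger time a sector's 14 hit bits simply index the table. */
        unsigned hits = 0x0891;            /* example hit pattern */
        printf("pattern 0x%04X -> %s\n", hits,
               segment_ram[hits] ? "radial segment" : "no segment");
        return 0;
    }

The point of the lookup approach is that whatever the qualification rules are, they cost nothing at trigger time: the decision is a single memory access per sector.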

Track recognition RAMs are mounted on printed circuit boards, 8 RAMs per board, spanning a field of 64 wires. The RAM cards are mounted on the auxiliary backplane section of the Fastbus crates holding the TAC modules serving the CDC axial bands. Signal pickoff is accomplished directly from the TACs via the auxiliary backplane connectors. Once identified, track segments are tallied using majority logic units to produce a summed current, followed by a flash ADC stage. Each band is tallied separately. At this point the total CDC information for trigger purposes consists of 6 numbers, the track segment tallies for axial bands 1-6. This information is then used in successive stages of memory lookup along with information from other detectors (Fig. 4). Later stages of memory lookup are beyond the scope of this discussion. The region of interest for this study is at the majority logic unit/flash ADC stage, where azimuthal information is lost.

— 117 —

Fig. 4 First stages of the CDC-based tracking trigger (track segment recognition, summing by band, digitizing and first-stage event classification; 48 summed shower anode signals and shower discriminator (LRS 2372) signals from the inner 16 and outer layers feed further stages of memory lookup)
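A software picture of the tally-and-lookup stage is sketched below. The number of sectors per band, the 3-bit clipping of each tally and the (absent) contents of the downstream classification memory are illustrative assumptions; only the structure, counting segments in each of the six axial bands and then using the six tallies as a lookup address, follows the description above.

    #include <stdint.h>
    #include <stdio.h>

    /* Tally-and-lookup sketch: count recognized segments in each of the
     * six axial bands and pack the six tallies into one lookup address. */

    #define NBANDS   6
    #define NSECTORS 32                       /* hypothetical sectors per band */

    static int tally_band(const uint8_t seg[NSECTORS])
    {
        int n = 0;
        for (int s = 0; s < NSECTORS; s++)
            n += (seg[s] != 0);               /* majority-logic sum, in software */
        return n;
    }

    int main(void)
    {
        uint8_t ram_out[NBANDS][NSECTORS] = {{0}};
        ram_out[0][3] = ram_out[1][3] = ram_out[2][4] = 1;     /* toy event */

        unsigned addr = 0;
        for (int b = 0; b < NBANDS; b++) {
            int t = tally_band(ram_out[b]);
            addr = (addr << 3) | (t > 7 ? 7u : (unsigned)t);   /* clip to 3 bits */
        }
        /* 'addr' would index the first-stage classification memory,
         * together with the shower-counter information described above. */
        printf("lookup address = 0x%05X\n", addr);
        return 0;
    }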

CDC ANGULAR CORRELATOR

The process described above serves to identify radial track segments in one band but does not necessarily find a radial track. An improvement in trigger selectivity could be realized by using the outputs of the RAMs to spot angular correlations between track segments in different bands. This would involve adding a processing stage immediately after the RAMs to supplement the summing and digitization step. This stage (the CDC angular correlator) does not need to be as involved as the first-stage RAMs, where noise and efficiency considerations complicate matters. The simplest approach is to ask that several triggered RAM sectors be lined up in phi, and that this line traverse the CDC. In addition, we can require activity in the outer layers of the shower counter in the same neighborhood in phi.

Segmentation in the CDC. To simplify design, the following adjustments are made: i) There is no need to use all six bands independently. It was decided to use an inner pair and an outer pair of bands (bands 2+3 inner, 5+6 outer). ii) Granularity on the order of the first stage RAM sectors need not be maintained. Since the number of RAM sectors differs from band to band, regions of common phi need to be defined in the CDC; a new unit (termed a macrosector) is needed.

Macrosector definitions. A macrosector consists of one or more adjacent RAM sectors in one band that span a given angular range. Macrosector boundaries are defined relative to the shower counter anode towers. There are 48 macrosectors in bands 2 and 3, whose boundaries are defined by the anode tower bisectors. Bands 5 and 6 have 96 macrosectors, with boundaries in this case defined by both anode tower boundaries and anode tower bisectors. Since the placement of RAM sectors is determined by wire count in the CDC and not by the shower counter, it is not possible to observe the macrosector boundaries exactly, but the fine granularity of the CDC RAM sectors allows a close approximation. Macrosector assignments for the RAMs in each band were done by hand, seeking the best fit to the macrosector boundaries as defined above. An overall view of the segmentation scheme is shown in Fig. 5.

Fig. 5 CDC macrosectors (CDC angular correlator segmentation scheme, including the shower outer layers)

The state of a macrosector is the logical OR of the RAMs belonging to it. Each event is characterized by two 48-bit numbers representing the macrosector states of the inner bands and two 96-bit numbers for the outer. A radial track would appear as a coincidence in like-numbered macrosectors.

— 119 — Coincidence alignments. Requiring a coincidence between like-numbered macrosectors would result in high inefficiencies, since many valid tracks will not be confined to one macrosector corridor and macrosectors do not overlap. To enhance efficiency, the active area in the outer bands corresponding to a given inner macrosector is broadened, but only in one direction or the other. This results in two possible "alignments" for which a coincidence can occur and determines the shower anode tower to be used in forming the coincidence. The two alignments are shown in Fig. 6; they are equivalent up to a reflection.

Fig. 6 Macrosector alignments (Alignment A and Alignment B, showing the shower outer layers and bands 2, 3, 5 and 6)

A radial coincidence for a given inner macrosector is defined as the logical AND of all 4 bands and the shower anode towers corresponding to one of the two alignments. The states of bands 5 and 6 are the logical OR of the macrosectors shown in Fig. 6. In addition, states with 3 out of 4 bands satisfying the above conditions are allowed, with the exception of the states shown in Fig. 7 (termed the skew veto).

Fig. 7 Skew veto (the disallowed 3-out-of-4 band states for alignments A and B)
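The counting rule can be sketched in software, roughly as the emulator of the next section must have done. In the sketch below the inner-to-outer macrosector mapping for the two alignments, the shower-tower indexing and the exact skew-veto states are simplified assumptions; the required shower activity, the 4-band AND, the 3-of-4 allowance and a schematic skew veto follow the text and Figs. 6-7.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define N_INNER 48
    #define N_OUTER 96

    struct event {
        bool band2[N_INNER], band3[N_INNER];   /* inner macrosector states */
        bool band5[N_OUTER], band6[N_OUTER];   /* outer macrosector states */
        bool shower[N_INNER];                  /* outer shower-layer towers */
    };

    /* Assumed mapping: inner macrosector i covers outer macrosectors 2i and
     * 2i+1; alignment A broadens one way (adds 2i-1), alignment B the other
     * (adds 2i+2). */
    static bool outer_state(const bool *band, int i, int align)
    {
        int lo = (align == 0) ? 2 * i - 1 : 2 * i;
        int hi = (align == 0) ? 2 * i + 1 : 2 * i + 2;
        bool s = false;
        for (int m = lo; m <= hi; m++)
            s = s || band[(m + N_OUTER) % N_OUTER];
        return s;
    }

    static int radial_coincidences(const struct event *ev)
    {
        int n = 0;
        for (int i = 0; i < N_INNER; i++) {
            for (int a = 0; a < 2; a++) {
                if (!ev->shower[i])
                    continue;                       /* shower tower required */
                bool b2 = ev->band2[i], b3 = ev->band3[i];
                bool b5 = outer_state(ev->band5, i, a);
                bool b6 = outer_state(ev->band6, i, a);
                int nb = b2 + b3 + b5 + b6;
                /* Schematic skew veto: reject 3-of-4 states whose missing
                 * band is an inner one (the real veto states are Fig. 7's). */
                bool skew = (nb == 3) && (!b2 || !b3);
                if (nb == 4 || (nb == 3 && !skew)) {
                    n++;
                    break;                          /* count each corridor once */
                }
            }
        }
        return n;
    }

    int main(void)
    {
        struct event ev;
        memset(&ev, 0, sizeof ev);
        ev.band2[10] = ev.band3[10] = true;
        ev.band5[20] = ev.band6[21] = true;
        ev.shower[10] = true;
        printf("radial coincidences: %d\n", radial_coincidences(&ev));
        return 0;
    }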

Correlator modeling. A software emulator was written and used to estimate the performance that could be expected from such a design. The emulator routine accepts an event record and outputs the number of radial coincidences using the same segmentation and counting rules described above. Several classes of event data were used as input. The object of the study was to determine what difference, if any, existed in the radial coincidence counts of junk events compared with selected data events. Raw data was used as a reasonable approximation to junk events. Selected event data used as input included hadron, Bhabha, dimuon and tau pair events.

Radial coincidence distributions were accumulated for all event classes, separated by trigger categories as follows: i) Multitrack (hadronic) trigger, ii) Loose 2-track + SHC outer layers trigger, iii) 3 and 4 track trigger and iv) "Perfect" 2-track trigger, a restrictive two-CDC-track trigger with no SHC requirement.

Histogram plots of the results are shown in Fig. 8. Looking at the 2-track + shower distributions, there is only one Bhabha event that has zero radial coincidences, while the raw data has 511 such events. Requiring a minimum of one radial coincidence for triggering would cut the rate by 49% while losing only one Bhabha, which represents only a 0.2% decrease in efficiency. No tau pairs or dimuons from this sample would be lost due to this requirement.

Rejection ratios.

In order to compare results of different segmentation and counting schemes during the development of this design, some performance criterion was needed. The main objective at this stage was to achieve the maximum possible rate reduction at the least cost in efficiency for each trigger and event class. A criterion termed the rejection ratio was obtained by treating the number of coincidences as a continuous value and interpolating to find the number of coincidences corresponding to a 1% decrease in efficiency for a given trigger and event class. The ratio of cut events to the total sample is the rejection ratio. It represents the rate reduction for a particular trigger that would result were we to accept a 1% decrease in efficiency for a given event class. The 1% value was chosen only to get around baseline fluctuations and is on the order of statistical shifts seen regularly.
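A minimal sketch of this rejection-ratio calculation is given below, with invented histograms standing in for the measured coincidence distributions: the cut value at which the signal sample loses 1% of its events is found by linear interpolation, and the rejection ratio is the fraction of the raw-data sample falling below that cut.

    #include <stdio.h>

    #define NBINS 16

    /* Interpolated cut (in "coincidences") at which a fraction 'loss'
     * of the histogram h lies below the cut. */
    static double cut_for_loss(const double *h, int nbins, double loss)
    {
        double total = 0.0, below = 0.0;
        for (int i = 0; i < nbins; i++) total += h[i];
        for (int i = 0; i < nbins; i++) {
            double next = below + h[i];
            if (next >= loss * total)
                return i + (loss * total - below) / h[i];   /* linear interp. */
            below = next;
        }
        return nbins;
    }

    /* Fraction of the histogram h removed by requiring >= cut coincidences. */
    static double fraction_below(const double *h, int nbins, double cut)
    {
        double total = 0.0, below = 0.0;
        for (int i = 0; i < nbins; i++) total += h[i];
        for (int i = 0; i < nbins && i < cut; i++)
            below += (cut - i >= 1.0) ? h[i] : h[i] * (cut - i);
        return below / total;
    }

    int main(void)
    {
        /* Invented coincidence-count distributions (bin i = i coincidences). */
        double signal[NBINS] = { 1, 2, 10, 30, 60, 80, 60, 30, 15, 8, 4, 0 };
        double raw[NBINS]    = { 500, 200, 60, 20, 10, 5, 3, 2, 1, 1, 0, 0 };

        double cut = cut_for_loss(signal, NBINS, 0.01);
        printf("cut = %.2f coincidences, rejection ratio = %.3f\n",
               cut, fraction_below(raw, NBINS, cut));
        return 0;
    }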

The rejection ratios for the correlator described here are shown in Fig. 9. Figures for the multitrack trigger are shown for comparison only. One of the constituent triggers for this group has no shower counter requirement and is not a realistic candidate for inclusion in the correlator system. The inclusive event fraction for this group is on the order of 1% anyway, so rate reduction is not an issue.

— 121 —

Fig. 8 Radial coincidence distributions, separated by trigger category (loose 2-track + shower outer layers trigger, 3 and 4 track trigger, "perfect" 2-track trigger, and multitrack/hadronic trigger) for the Bhabha, dimuon, tau pair, hadron and raw-data event samples

— 122 —

Fig. 9 Rejection ratios

                          Bhabhas   Dimuons   Tau pairs
  Loose 2-track + Shw.    0.605     0.501     0.504
  3 and 4 tracks          0.974               0.903
  "Perfect" 2-track       0.503     0.416     0.413

                          Hadrons
  Multitrack              0.854

SUMMARY

The correlator setup described here would result in rate reductions on the order of 50% for the two-track + shower trigger, 80-90% for the 3 and 4 track trigger and 40% for the perfect 2-track trigger. The first two are of most concern where beam-related backgrounds are involved. It should be kept in mind that we cannot cut on a fractional number of coincidences and that the data used for the study has some bias since it has been passed through the current trigger system. This bias is not very significant, however, since the correlator system can perform all the functions of the current system in addition to processing azimuthal information. There are several features of the correlator that go beyond reductions in rate. Having use of azimuthal information makes possible the creation of additional triggers that would be prohibitive in the current system. In addition, existing triggers can be readjusted to reduce the interdependence of detectors for triggering purposes. This would result in better estimates for detector efficiencies and decrease the trigger system's sensitivity to high voltage problems. Installation of the correlator would leave most of the current trigger system intact, and it can be phased in, operating in parallel with the current system until its performance is proven. The author wishes to thank C. Rosenfeld and Y. Sakai, who have been central in the development of the AMY trigger and data acquisition systems, for their advice and assistance.

REFERENCES

1) 'Major Detectors in Elementary Particle Physics', Particle Data Group, LBL-91, 1985; H. Sagawa et al., Phys. Rev. Lett. 60, 93 (1988).

— 123 — 2) M. Ikeno et al., IEEE Trans. Nucl. Sci. Vol. 33, 779 (1986)

— 124 — BUS SYSTEM and DATA ACQUISITION SYSTEM ARCHITECTURE

Masaharu NOMACHI KEK Online Group

— 125 —

[Presentation slides, M. Nomachi (KEK), 10-MAR-1989]

Data acquisition system: Frontend modules, Read out, Packing, Formatting, Selection, Recording; Host computer(s); Monitor; bus system; CPU architecture; VME; TKO (optimized as a frontend bus).
  Present: VME (good cost performance, up-to-date development).
  Near future: a good solution?
  Future: VXI, SCI, etc. (sufficient performance).

Bus system architecture for a large scale experiment: each crate contains CPUs, memory and a bus interface, and the crates need an inter-crate connection (FASTBUS? a new bus?), i.e. a bus plus an inter-crate protocol. Bus roles: local bus, memory bus, multiprocessor computer bus, I/O bus (SCSI), high speed network (Ethernet).

  VME: small board size; not enough power or cooling; no message protocol; no cache control; no inter-crate protocol.
  VXI: large board size; better E.M. shielding and cooling; message protocol; local bus; no inter-crate protocol.

What are the problems? SCI:

  1. Handshake -> block mode, 16-bit data width
  2. Bandwidth -> large data width, 500 MHz (1 GB/sec/processor)
  3. Skew -> serial bus; unidirectional ECL lines; up to 64K processors
  4. Imperfect transmission -> coaxial lines, switched point-to-point lines; coherency
  5. Arbitration and contention

[From the SCI draft specification (SCI-22Aug88-doc1): Figure 5: The organization of an SCI system. Each node contains a CPU and memory; up to 64K nodes.]

The real world is more complex, including pre-SCI systems built from various bus standards. An important goal of SCI is to make it possible to interface these systems to SCI and thus (indirectly) to each other. There will be some limitations, of course, because the older systems will probably lack some desirable features which are not easily simulated by an interface. Nevertheless, some proposed SCI features are harder to interface to than others, and we weigh these considerations in our architectural decisions. Another practical consideration is the need to configure clusters of SCI systems. Figure 6 shows a typical case, where two SCI systems which were built independently are connected to each other and to several independent subsystems built out of various standard buses.

[Slide: Present: VME + TKO. Near future: message protocol, DMA, cache. Future: SCI.]

Figure 6: SCI is a generalized interconnection which can combine other bus systems (Fastbus, VME, Futurebus) and SCI subsystems.

(Scalable Coherent Interface, August 22, 1988, Rev. 14, page 5)

Data Transfer with Transputer Link

Y. Nagasaka Institute of Physics, University of Tsukuba Tsukuba, Ibaraki 305, Japan March 10, 1989

1 Introduction

Today, when we use CAMAC, we usually use a branch-highway, i.e. a special bus with a maximum speed of 10 Mbits/sec, and a branch-driver, i.e. a special interface, to connect CAMAC crate controllers with a computer. So far, in spite of its troublesomeness, we have used the branch-highway and the branch-driver as a standard device for data transfer. To investigate a more practical data transfer system using modern technology, we studied the Transputer link, which is produced by Inmos Ltd. in England. Consequently, we confirmed an easy connection between the CAMAC crate controller and the computer via the Transputer link, and achieved a data transfer speed comparable with that of the branch-highway.

2 Transputer link

The Transputer link has the following characteristics. 1. The link speed is high; the maximum speed is 20 Mbits/sec. 2. Using two one-directional signal lines, transputers or peripherals with a link interface are connected with each other serially and synchronously. 3. Special link adaptors (imsC011, C012) are provided by Inmos Ltd., which support the framing and protocol of the Transputer link. 4. The Transputer link can be directly connected to the Transputer produced by Inmos Ltd.

2.1 Link speed The Transputer link speed is selected as 10 or 20 Mbits/sec. This selection is made in hardware.

— 131 — 2.2 Signal lines To provide synchronised communication, each message must be acknowledged. Consequently, a link requires at least one signal line in each direction. A link between two objects is implemented by connecting two link interfaces with two one-directional signal lines, along which data is transmitted serially.

2.3 Communications Each message is transmitted as a sequence of single byte communications. Each byte is transmitted as a start bit followed by a one bit followed by the eight data bits followed by a stop bit. After transmitting each byte, the sender waits until an acknowledge is received; this consists of a start bit followed by a zero bit. The acknowledge signifies both that a process was able to receive the acknowledged byte, and that the receiving link is able to receive another byte. The sending link reschedules the sending process only after the acknowledge for the final byte of the message has been received. Data bytes and acknowledges are multiplexed down each signal line. An acknowledge can be transmitted as soon as reception of a data byte starts. Consequently, a transmission may be continuous, with no delays between data bytes.
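The bit-level framing just described can be written down directly. The sketch below only builds the bit sequences (the bit ordering within the data byte is an assumption, since the text does not specify it); in a real system the link hardware or a link adaptor does this.

    #include <stdio.h>

    /* Transputer link framing as described above: a data byte is a start
     * bit (1), a '1' flag bit, the eight data bits and a stop bit (0);
     * an acknowledge is a start bit (1) followed by a '0' bit. */

    static int frame_data_byte(unsigned char b, int bits[11])
    {
        bits[0] = 1;                            /* start bit           */
        bits[1] = 1;                            /* '1' => data byte    */
        for (int i = 0; i < 8; i++)
            bits[2 + i] = (b >> (7 - i)) & 1;   /* assumed MSB-first   */
        bits[10] = 0;                           /* stop bit            */
        return 11;
    }

    static int frame_acknowledge(int bits[2])
    {
        bits[0] = 1;                            /* start bit           */
        bits[1] = 0;                            /* '0' => acknowledge  */
        return 2;
    }

    int main(void)
    {
        int bits[11];
        int n = frame_data_byte(0xA5, bits);
        printf("data 0xA5 -> ");
        for (int i = 0; i < n; i++) printf("%d", bits[i]);
        n = frame_acknowledge(bits);
        printf("\nack       -> ");
        for (int i = 0; i < n; i++) printf("%d", bits[i]);
        printf("\n");
        return 0;
    }

Since the acknowledge may be returned while the data byte is still arriving, the 11-bit data frame sets the upper bound on throughput per byte.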

2.4 Link adaptors To connect Transputer devices, which have a link interface, with non-Transputer devices, which do not, Inmos Ltd. provides two link adaptor chips. One is the imsC011 and the other the imsC012. These two link adaptors are universal high speed system interconnects, providing full duplex communication according to the Transputer link protocol. The link protocol provides synchronised message transmission using handshaken byte streams. Data reception is asynchronous, allowing communication to be independent of clock phase. The link adaptors convert bidirectional serial link data into parallel data streams. The serial links can be operated at each of two different speeds, 10 or 20 Mbits/sec.

2.4.1 The imsC011 link adaptor The imsC011 link adaptor allows the user the flexibility of configuring it in one of two modes, and one of two serial link speeds for each mode. This selection depends on what the 'SeparateIQ' pin of the imsC011 package is wired to. If wired to VCC, mode 1 is selected with a link speed of 10 Mbits/sec. If wired to ClockIn, mode 1 is selected but with a link speed of 20 Mbits/sec. Wiring to GND selects mode 2.

— 132 — In mode 1, the link adaptor converts between the Transputer link and two independent, fully handshaken byte-wide interfaces. One interface is for data coming from the serial link and one for data going to the serial link. Transfers may proceed in both directions at the same time. In mode 2, the link adaptor provides an interface between the Transputer link and a microprocessor, via an 8-bit bi-directional interface. This mode has status/control and data registers for both input and output. Any of these can be accessed by the byte-wide interface at any time. Two interrupt lines are provided, each gated by an interrupt enable flag. One presents an interrupt on output ready, and the other on data present.
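To give an idea of how a microprocessor might drive the mode 2 interface, here is a minimal polling sketch. The register layout and status bit positions are hypothetical placeholders rather than the real imsC011 register map (which is defined in the INMOS data book); only the two polled conditions, "output ready" and "data present", come from the description above. On real hardware the register array would be the adaptor's memory-mapped addresses; here it is a plain array so the sketch runs on a host.

    #include <stdint.h>
    #include <stdio.h>

    enum { IN_DATA, IN_STATUS, OUT_DATA, OUT_STATUS };     /* hypothetical layout */
    #define STAT_DATA_PRESENT 0x01                         /* hypothetical bit    */
    #define STAT_OUTPUT_READY 0x01                         /* hypothetical bit    */

    static volatile uint8_t regs[4];    /* stand-in for memory-mapped registers */

    static void link_put(uint8_t b)
    {
        while (!(regs[OUT_STATUS] & STAT_OUTPUT_READY))
            ;                                   /* wait for "output ready" */
        regs[OUT_DATA] = b;
    }

    static uint8_t link_get(void)
    {
        while (!(regs[IN_STATUS] & STAT_DATA_PRESENT))
            ;                                   /* wait for "data present" */
        return regs[IN_DATA];
    }

    int main(void)
    {
        /* Pretend the adaptor is ready and has already received 0x42. */
        regs[OUT_STATUS] = STAT_OUTPUT_READY;
        regs[IN_STATUS]  = STAT_DATA_PRESENT;
        regs[IN_DATA]    = 0x42;

        link_put(0x55);
        printf("sent 0x55, received 0x%02X\n", link_get());
        return 0;
    }

In a real design the same two conditions can instead raise the two interrupt lines mentioned above, avoiding the busy-wait loops.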

2.4.2 The imsC012 link adaptor The imsC012 link adaptor provides an interface between a Transputer link and a microprocessor system bus, via an 8-bit bi-directional interface, like the imsC011 in mode 2. The imsC012 is the imsC011 with the mode 1 functions removed, as described in 2.4.1. As a result, the package is smaller than that of the imsC011.

2.5 Connecting with the Transputer The Transputer is a microprocessor which is suitable for parallel processing. Inmos Ltd. has produced three kinds of Transputers: the T212, T414 and T800. The first is a 16-bit microprocessor and the other two are 32-bit (the T800 adds a 64-bit floating-point unit). The T212 and the T414 have a processing speed of about 10 MIPS for a 20 MHz version. In addition to the CPU, these two Transputers have an SRAM of 2 kbytes, whose access time is 50 nsec. The performance of the T800 is much better than that of these two Transputers. The most important common characteristic is that all three have four link interfaces supporting the Transputer link protocol. Owing to this fact, we are able to connect the Transputer link with the Transputer directly. Connecting with the Transputer, we are able to calculate and assemble data in parallel. Moreover, we will be able to treat data faster than now. For example, the use of Transputers for intelligent event selection is considered.

3 Application

We developed a system connecting the CAMAC and the Transputer using the Transputer link. We used a CAMAC crate controller having an MC68000 CPU in it. This crate controller already existed. To connect it with the Transputer link, we developed an interface board. This board consists of a link adaptor imsC011 and simple circuits to handshake with the link and to process an interrupt from the link. We used the link adaptor in mode 1. On the other side of the link, we adopted the Transputer. The Transputer was prepared by adding a Transputer board to an NEC PC. In this application, we used a T414 Transputer. For the link, we adopted the RS-485 standard. In addition to the hardware changes, we designed a protocol for the MC68000-Transputer communication, and we developed software for both the MC68000 side and the Transputer side. On the MC68000, we developed a system monitor and a data transmitter. On the Transputer, we developed a server program to communicate with the MC68000 system monitor via the Transputer link, and a program to receive and save the transmitted data. The programs were developed in assembly language for the MC68000 and in the Occam language for the Transputer.

4 Performance test

In order to examine the Transputer link performance, we carried out two tests, a transfer speed test and a reliability test. In these tests we used the developed system connecting the CAMAC crate controller and the Transputer with the Transputer link, as described in section 3. The following are the results. We used a twisted flat cable for the link cable. The length of the cable used is 5 m in transfer speed tests (a) to (c), 30 m in (d), and 30 m in the reliability test. The link speed used was 10 Mbits/sec. The MC68000 CPU used was an 8 MHz version.

1. Transfer speed test (a) Transputer-Transputer case The Transputer-Transputer configuration is considered the most suitable case for the Transputer link. In this sense, this test is thought to measure the fastest speed. The result was about 400 kbytes/sec. The measurement of speed was done by observing the time between one byte and the next byte transferred, with the use of a logic analyzer. The same measuring method was used in the following tests. (b) MC68000-Transputer case (1) This test was done using the system connecting the CAMAC crate controller and the Transputer. The measured speed corresponds to the case without any kind of DMA. The result was about 180 kbytes/sec. (c) MC68000-Transputer case (2)

— 134 — This test represents a software simulation corresponding to the case with a DMA added to the MC68000 CPU. We used the system connecting the CAMAC and the Transputer, and changed the data transmission software on the MC68000 CPU. In this software, data was transmitted directly from a data register on the MC68000. The result was about 250 kbytes/sec. (d) MC68000-Transputer case (3) In the performance tests (a) to (c), we used a cable of 5 m length. In this test, we used a cable of 30 m length. As a result, we achieved a speed of about 175 kbytes/sec. 2. Reliability test We tested the reliability of the transferred data under the same conditions as transfer test (d). We tested by transmitting 1 Mbyte of data. As a result, we did not find any errors in the 1 Mbyte of data transmitted.

5 Conclusion

We developed a data transfer system connecting a CAMAC crate controller and a Transputer via the Transputer link, and tested the performance of the Transputer link. In order to develop the data transfer system, we modified the existing hardware, designed an interface board to connect to the Transputer link, and programmed several pieces of software: an MC68000 system monitor program, a server program on the Transputer to communicate with the MC68000 system monitor via the Transputer link, a data transmission program on the MC68000, and a data management program on the Transputer that receives the data and saves it to disk. The following shows the results of the performance tests.

1. Transfer speed tests

(a) Transputer-Transputer case (with a cable of 5 m length) = about 400 kbytes/sec (b) MC68000-Transputer case (1) (with a cable of 5 m length) = about 180 kbytes/sec (c) MC68000-Transputer case (2) (with a cable of 5 m length) = about 250 kbytes/sec (d) MC68000-Transputer case (3) (with a cable of 30 m length) = about 175 kbytes/sec

— 135 — 2. Reliability test We had no errors in the 1 Mbyte data transmission.

To transfer data at the maximum speed, we are considering the following improvements,

• adding a DMA on the MC68000

• changing the MC68000 from an 8 MHz version to a 16 MHz version • changing the Transputer link speed from 10 Mbits/sec to 20 Mbits/sec

The obtained results confirmed that we can practically use this data transfer system with a Transputer link as an alternative to data transfer systems connecting a CAMAC crate controller and a computer. The usage of the Transputer link is very easy and very simple, and the performance is comparable with that of the branch-highway.

— 136 — FIBER OPTICS DATA LINK

Susumu Inaba KEK Abstract This paper describes the advantages of the FODL (Fiber Optics Data Link) and applications of the FDDI (Fiber Distributed Data Interface) standard. A VMEbus test circuit with the AMD TAXIchips was developed for the FODL. It operates at a data transfer rate of 100 Mbits per second. 1. FDDI Application

The FDDI (Fiber Distributed Data Interface) standard has been designed as a backbone LAN (Local Area Networking) environment. This standard has become a focal point for optical technology applications. Figure 1 shows the FDDI applications and the relation between the FDDI and the IEEE 802.XX standards. The FDDI standard satisfies the needs of backbone and back-end networks, as well as the ever increasing bandwidth requirements of front-end networks. The 100 Mbits/sec token-ring network links up to 500 workstations with a 100 km fiber path length. The FDDI is the result of the American National Standards Committee X3T9. The FDDI grew from the need for high speed interconnection among mainframes, microcomputers and their associated peripherals. This requirement drove the definition of the 100 Mbits/sec data rate LAN. At the time of FDDI's inception, the predicted decline in cost and increased availability of optical components opened the door for a fiber-based LAN.

2. Comparing the FDDI to Ethernet/Cheapernet Figure 2 compares the FDDI standard with the Ethernet specifications. The FDDI is a 100 Mbits/sec high-end network over fiber optic media. In contrast, the Ethernet/Cheapernet is a low-end LAN specifying 10 Mbits/sec transmission over coaxial cable media. The FDDI data transfer speed is ten times faster than Ethernet. The FDDI standard accommodates up to 500 workstations with a total fiber optic path length of up to 100 km. The Ethernet links up to 1024 workstations with a total 2.8 km coaxial cable path length.

3. AMD TAXIchips The Am7968 TAXIchip Transmitter and the Am7969 TAXIchip Receiver chips are the result of the development of the FDDI chipset. These chips are a general purpose interface for very high-speed point-to-point communications over coaxial cable or fiber optic media. Figure 3 shows the connection between the Am7968 Transmitter and the Am7969 Receiver. The TAXIchips emulate a pseudo parallel input/output register. They load data into one side and output it on the other side. The Am7968 Transmitter shown in Figure 4 accepts parallel data, encodes it and transmits the data serially. The Am7969 Receiver shown in Figure 5 receives the serial bit data stream, decodes the data and presents it in parallel data form at the receiving end. The Am7968 and the Am7969 TAXIchips have twelve parallel interface pins which are designated to carry either Command or Data bits. The Data Mode Select (DMS) pin on each chip can be set to select one of three modes of operation: eight Data and four Command bits, nine Data and three Command, or ten Data and two Command. This allows the system designer to select the byte-width which best suits system needs. The speed of a TAXIchip system is adjustable over a range of frequencies, with parallel bus transfer rates of 4 Mbytes/sec at the low end, and up to 12.5 Mbytes/sec at the high end. The TAXIchips' flexible bus interface scheme accepts bytes that are either 8, 9, or 10 bits wide. Multiple TAXIchips can also be cascaded to accommodate a wider data bus. Byte transfers can be Data or Command signalling.

4. Fiber Optic Data Link Protocol Figure 6 shows an idea of the fiber optic data link protocol with TAXIchips. The Transmitter sends the start-of-message command to the Receiver at the beginning of a communication session, and then sends other commands in succession. Last of all, the Transmitter sends the end-of-message command to finish the communication session. Figure 7 shows the list of command codes. The code 2h, the Data Transfer Mode command, is used to transmit data at high speed to a buffer device. The code 3h, the FASTBUS Transfer Mode, is for use in implementing the FASTBUS protocol. A serial string of six bytes is used to identify FASTBUS signals. For the code 7h, the Packet Number, the data byte following this command is used as a packet ID by a Transmitter/Receiver pair. This code identifies all packets which are within a message. The code Fh, the Command Code Expansion, will extend the number of commands.

5. Fiber Optic Transmitter and Receiver circuits The test circuit with TAXIchips consists of two parts, the Am7968 Transmitter and the Am7969 Receiver. Each Transmitter and Receiver pair is connected by a separate high-speed fiber optic serial link. Figure 8 shows the Transmitter circuit with the Fiber Optic Transmitter DM-76TB. The DM-76TB accepts differential ECL input signals. The differential ECL signals are converted to optical signals with an InGaAsP SLED. Figure 9 shows the Receiver circuit with the Fiber Optic Receiver DM-76RB. The DM-76RB receives optical input signals. The optical signals are reconverted to electrical signals with an InGaAs PIN-PD. The DM-76TB and the DM-76RB, manufactured by Sumitomo Electric Corp., are high-speed 1300 nm fiber optic Transmitter and Receiver modules which can operate at up to 220 Mbits/sec. This company also has the DM-58(RB/TB) besides the DM-76(RB/TB). The DM-58 operates at up to 400 Mbits/sec over 500 meters of optical fiber cable. The Transmitter circuit with the Am7968 accepts inputs from a sending VMEbus system using a simple STRB/ACK handshake. Parallel bits previously stored in 74S244 latch flip-flops are saved by the Am7968's input latch on the rising edge of a STRB input signal. The input latch can be updated on every CLK cycle. The Am7969 Receiver accepts differential signals on the SERIN+/SERIN- input pins through the DM-76RB Fiber Optic Receiver. This input information, previously encoded by the Am7968 Transmitter, is loaded into a decoder. The Am7969 Receiver detects the difference between Data and Command patterns. When a new Data pattern is captured by the output latch, DS (Output Data Strobe) is pulsed and the Command information remains unchanged. If a Command pattern is sent to the output latch, CS (Command Data Strobe) is pulsed and the Data outputs remain in their previous state. Noise-induced bit errors can distort transmitted bit patterns. The Am7969 Receiver logic detects most noise-induced transmission errors. Invalid bit patterns are recognized and indicated by the assertion of the violation (VLTN) output pin.

— 139 —

Figure 1 : FDDI Applications (an FDDI back-end network connecting CPUs with tape and disk controllers; an FDDI backbone network connecting gateways, a wiring concentrator, a PBX, an IEEE 802.3 Ethernet/Cheapernet segment and an IEEE 802.4 token bus (MAP) segment; and a front-end network of engineering workstations)

FDDI / Ethernet (LAN) comparison:

                FDDI           Ethernet
  Data Rate     100 Mbit/sec   10 Mbit/sec
  Media         Fiber Optic    Coaxial
  Distance      100 km         2.8 km
  Max. Nodes    500            1024

Figure 2 : Comparison of FDDI with Ethernet

Figure 3 : Connection between the Am7968 Transmitter and the Am7969 Receiver (8, 9 or 10 Data bits and 4, 3 or 2 Command bits in parallel on each side, joined by a fiber optic or coaxial data link)

Figure 4 : Am7968 TAXIchip Transmitter (input latch with Strobe/Acknowledge handshake, Data and Command encoder, oscillator and clock generator, encoder latch, shifter and serial media interface, Data Mode Select, Cascade/Local Select)

Figure 5 : Am7969 TAXIchip Receiver (serial media interface and shifter, PLL clock generator, decoder, decoder latch and byte sync logic, output latch, Data Strobe, Command Strobe and Violation outputs, Data Mode Select)

Fiber Optic Data Link protocol sequence between Transmitter and Receiver: Start Message, Data Transfer Mode, Destination Address, (Acknowledge), Packet ID Number, Number of Bytes to be Transferred, Data Bytes Follow, Data ... Data, CRC Check, End of Packet, (Acknowledge), End of Message.

Figure 6 : Fiber Optic Data Link Protocol

Fiber Optic Data Link Command Code (Draft):

  CODE  CONTENTS
  0h          Not Used
  1h    INQ   Inquire (Request ACK)
  2h    DTM   Data Transfer Mode
  3h    FTM   FASTBUS Transfer Mode
  4h    DA    Destination Address
  5h    SA    Source Address
  6h    DL    Data Length
  7h    PN    Packet Number
  8h    DF    Data Follows
  9h    EOP   End of Packet
  Ah    CRC   CRC or PAR Check
  Bh    EOM   End of Message
  Ch    ACK   Acknowledge Byte
  Dh    T     Reserved for Token
  Eh          Reserved
  Fh          Command Code Expansion

Figure 7 : Command Code for Fiber Optic Data Link
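As an illustration of how these draft command codes might be strung together on the sending side, the sketch below walks through one packet in the sequence of Fig. 6. The send_command()/send_data() routines are hypothetical stand-ins for driving the TAXIchip Command and Data strobes (here they just print), the start-of-message stand-in and the parity byte are assumptions, and a real link would of course also wait for the Acknowledge codes coming back.

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    enum {                      /* command codes from Fig. 7 (draft) */
        CMD_INQ = 0x1, CMD_DTM = 0x2, CMD_FTM = 0x3, CMD_DA  = 0x4,
        CMD_SA  = 0x5, CMD_DL  = 0x6, CMD_PN  = 0x7, CMD_DF  = 0x8,
        CMD_EOP = 0x9, CMD_CRC = 0xA, CMD_EOM = 0xB, CMD_ACK = 0xC
    };

    static void send_command(uint8_t c) { printf("CMD  %Xh\n", c); }
    static void send_data(uint8_t d)    { printf("DATA %02Xh\n", d); }

    static void send_packet(uint8_t dest, uint8_t packet_id,
                            const uint8_t *buf, uint8_t len)
    {
        uint8_t parity = 0;

        send_command(CMD_DTM);                 /* Data Transfer Mode      */
        send_command(CMD_DA);  send_data(dest);
        send_command(CMD_PN);  send_data(packet_id);
        send_command(CMD_DL);  send_data(len);
        send_command(CMD_DF);                  /* data bytes follow       */
        for (size_t i = 0; i < len; i++) {
            send_data(buf[i]);
            parity ^= buf[i];                  /* simple parity stand-in
                                                  for the real CRC        */
        }
        send_command(CMD_CRC); send_data(parity);
        send_command(CMD_EOP);                 /* receiver would ACK here */
    }

    int main(void)
    {
        const uint8_t payload[4] = { 0xDE, 0xAD, 0xBE, 0xEF };
        /* Fig. 6 begins with "Start Message"; the draft table has no
         * explicit code for it, so INQ is used here purely as a stand-in. */
        send_command(CMD_INQ);
        send_packet(0x01, 0x07, payload, 4);
        send_command(CMD_EOM);
        return 0;
    }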

— 146 —

[Figure 8 : Fiber Optic Transmitter with AMD Am7968 (circuit with the DM-76TB fiber optic transmitter)]

— 147 —

Figure 9 : Fiber Optic Receiver with AMD Am7969

OPTOELECTRONICS TECHNOLOGY FOR COMMUNICATION SYSTEMS

T. Sugawa Sumitomo Electric Industries Ltd. 1, Taya-cho, Sakae-ku, Yokohama 244, Japan

ABSTRACT Fiber optic communication technology has been developed mainly in the area of public telecommunication. Recently, however, the application of fiber optic communication is being expanded into new areas such as office automation, factory automation and LANs in general. This new expansion of fiber optic communication systems has been made possible by the successful development of various fiber optic components which are characterized by easy handling by users and high reliability. In this paper, the trend of the recent markets, new technical movements and international standardization trends are briefly explained.

1. INTRODUCTION There are two major classifications of fiber optic components. One classification is fiber optic active components and the other is fiber optic passive components. Typical active components are fiber optic transmitters and receivers. A great deal of progress has been made in the area of fiber optic transmitters, which include the LED (light emitting diode), the LD (laser diode) and associated circuits such as a combination of optical transmitter driving circuits and devices. The main functions of the fiber optic receiver are to detect and amplify weak optical signals. The miniaturization of this component has also been accomplished. A paired fiber optic transmitter and receiver has been made available as a commercial product called a fiber optic data link. On the other hand, fiber optic passive components composed of fiber or waveguide instead of the conventional mirror or prism are being eagerly developed in conjunction with fiber optic data links.

2. FIBER OPTIC DATA LINKS (2.1) Link design An article entitled "Ever Expanding Fiber Optic Data Link Communication" was published by Nikkei Electronics in 1979. The article described fiber optic data link series, 10 Japanese and 12 overseas. At that time, the highest data rate was 32 Mbps and the longest transmission distance was 8 km. Due to the market requirement for low cost data links, LEDs at 0.8 µm have been used as the light source and high-silica multimode fiber as the transmission medium. However, due to an ever-increasing demand for higher data rates and longer transmission distances, the longer wavelength of 1.3 µm has been introduced for LAN applications. Fiber transmission loss is much lower at 1.3 µm than at 0.8 µm, and LEDs which can be operated at the wavelength of 1.3 µm have been developed. Fiber optic data links for such speeds and longer transmission distances became available about two years ago. In optical links utilizing LEDs as light sources, the material dispersion plays the major role in transmission capacity. An optical fiber which is made up of silica glass and contains some amount of Ge in its core has the zero-dispersion wavelength around 1.3 µm. The zero-dispersion wavelength is a weak function of the amount of Ge in the core. In order to maximize the fiber bandwidth, the center wavelength of an LED must be designed to be identical with the zero-dispersion wavelength. On the other hand, generally, the higher speed an LED has, the wider spectrum it has, and therefore the more dispersion is produced in an optical fiber. The spectral width (Full Width Half Maximum) was designed to be 140 nm, which is the typical value for high speed LEDs capable of a few hundred Mb/s modulation. In order to determine the optimum center wavelength of an LED, we took not only the dispersion characteristics but also the spectral loss characteristics of a fiber into account. Fig. 1 shows the result of a computer simulation for a 62.5 µm core, 125 µm cladding, NA=0.29 graded index fiber whose bandwidth is 600 MHz·km with an LD. It is obvious that the peak emission wavelength of an LED must be 1320 nm, and the expected bandwidth is greater than 400 MHz·km with such an LED. For this calculation, the measured shape of the emission spectral distribution of an LED was used.

Fig. 1 Peak emission wavelength dependence of fiber bandwidth with an LED (Δλ = 140 nm); vertical axis: bandwidth (MHz·km), horizontal axis: peak emission wavelength (nm), 1300-1350 nm

— 150 — (2.2) InGaAsP/InP LED

The LEDs are made up of InGaAsP/InP and are of the surface emitting type. They are designed to have center wavelengths of approximately 1320 nm, and have rise times of about 1.8 ns, which are suitable for at least 200 Mb/s. (2.3) InGaAs/InP PIN Photodiode

As photodetectors, InGaAs/InP PIN photodiodes have been developed. They are also packaged in connectors, and have responsive areas of 100 µm diameter. The photosensitive region is high-purity InGaAs grown using specially designed furnaces. The features of the PIN photodiode, i.e., wide spectral range, low dark current and small capacitance, are based on the technology of growing a high purity InGaAs layer. The PIN photodiode covers a wide spectral range of 1.0 to 1.6 µm, which is the most important wavelength range in silica optical fiber communications. The dark current is below 10 nA at room temperature and -5 V bias. The small capacitance makes it possible to detect weak optical signals modulated at bit rates as high as 200 Mb/s with a high S/N ratio. (2.4) Transmission Experiment A transmission experiment at a speed of 300 Mbps was carried out with reels of 1 km and 2 km 62.5/125 fiber and the newly developed transmitter and receiver modules. The optical fibers have a 1115 MHz·km bandwidth with an LD. Figure 2 shows the bit error rate characteristics of this miniaturized data link. The minimum detectable power at a BER of 10^-9 was -23.5 dBm (peak) in the back-to-back operation and -22 dBm (peak) at 3 km. The power penalty caused by the limited bandwidth was only 1.1 dB. The bit error characteristics when high-level optical power is launched into the receiver are also shown in Figure 2. The data was taken with the LD as a light source. The optical receiver has a dynamic range of 15 dB.

Fig. 2 Bit error rate characteristics at 300 Mbps (BER versus received optical power in dBm peak for back-to-back, 1 km, 2 km and 3 km fiber; the receiver dynamic range is 15 dB)

— 151 — (2.5) Transmitter / Receiver Modules The present highest data rate of such fiber optic data links is 200 - 300 Mbps, and such links are already commercially available. Most practical applications can be satisfied by such high speed data links. Most of the remaining development work should be concentrated on miniaturization and cost reduction. Miniaturization is especially important, because most fiber optic data links are to be mounted on printed circuit boards. In the last several years, glass epoxy substrates have been widely used for the mounting of standard ICs and discrete components. This approach is called hybrid IC technology and has the advantage of a short turnaround time and easy refinement of the circuit performance. One major drawback of hybrid IC technology is the cost factor for large volume production. Hybrid IC technology cannot compete in large volume cost with monolithic IC technology in the long run. When fiber optic data links are standardized and the volume becomes large enough, monolithic ICs would be better from the viewpoint of the cost and the size of fiber optic data links. For example, a 200 Mbps fiber optic data link in a 16-pin DIP (dual in-line package) has been realized and can be handled in the same way as a normal IC. This trend can be seen more distinctly in low speed fiber optic links. For example, photodiode and preamplifier circuits are integrated on one monolithic silicon chip. This type of IC has contributed to the cost reduction of such products. There is a distinctive movement toward worldwide standardization as fiber optic data links penetrate into many areas of application. For example, standardization of fiber optic components has been promoted by the IEC (International Electrotechnical Commission). In Japan, standardization of fiber optic links has been discussed as a part of JIS activities. At the moment, the definition of technical terms, measurement methods, items of standardization and similar basic items have been discussed. It may take quite some time before standardization is finalized on fiber optic links. Present activities concentrate rather on general product lines of fiber optic data links. On the other hand, a unified movement is also in progress on specific applications of fiber optic data links. For example, the Japan Electronic Machinery Industry Association has been working on the standardization of fiber optic data links which are intended to be used between digital audio equipment and their interfaces. It is expected that such standardization will be realized shortly. In the field of LANs, ANSI (American National Standards Institute) is now working on such standardization. For example, the ANSI X3T9.5 committee is working on standardization of FDDI (Fiber Distributed Data Interface), which is considered one of the most important fiber optic LANs. FDDI standardization activities aim at high speed fiber optic data links with data rates higher than 100 Mbps. Many makers and users have been participating in such discussions. It is expected to take some time before they reach a conclusion. However, some progress has been accomplished and LAN manufacturers are expected to

— 152 — announce their FDDI standard products in 1989. The line speed for FDDI is 125 Mbps (5/4 times the actual data rate because of coding) and fibers are connected with dual connectors. The dual connectors which are used for FDDI applications are also to be specified by the FDDI standardization. The connection loss of these dual connectors is typically as shown in Fig. 3.

Fig. 3 Connection loss of the FDDI duplex connector (histogram of connection loss from 0.2 to 1.0 dB; N = 72, typical 0.27 dB, standard deviation 0.10 dB)

3. PASSIVE COMPONENTS Typical passive components required for fiber optic communication systems are fiber optic switches, fiber optic couplers and WDM (wavelength division multiplexing) devices. Fiber optic switches are mainly used for the protection of communication systems by switching the transmission line to redundant lines. These components used to be composed of mirrors or prisms, but recently a fiber-moving type was developed, whose mechanism is shown in Fig. 4.

Fig. 4 Fiber moving type optical switch (local loop, fixed stage and movable stage; control signal ON = normal mode, control signal OFF = bypass mode)

— 153 — This fiber optic switch is still of the mechanically moving type, although it can meet the FDDI standard. From the viewpoint of reliability and mass productivity, waveguide type optical switches based on LiNbO3 and GaAs materials are now being eagerly developed. As mentioned above, passive components, including the other components, will shortly be realized by waveguide technology, as shown in Table 1.

Table 1 Technology trend for fiber optic passive components

                         BULK TYPE   FIBER TYPE   WAVEGUIDE TYPE
  EXCESS LOSS (dB)       1.0-3.0     <1.0         <0.5
  PRODUCTIVITY           POOR        GOOD         EXCELLENT

                         BULK TYPE   FIBER TYPE   WAVEGUIDE TYPE
  FIBER OPTIC SWITCH     O           O            DEVELOPMENT
  FIBER OPTIC COUPLER    O           O            DEVELOPMENT
  WDM DEVICE             O           -            DEVELOPMENT

4. CONCLUSION Demand for fiber optic data links is expected to increase dramatically with FDDI standardization. In the area of low speed fiber optic data links, demand is also expected to grow fast as home electronic products employ more and more fiber optic links. The market size for fiber optic data links is expected to grow into the hundreds of millions of dollars within several years. In order to meet such a big market demand, mass production technology development is urgently required.

— 154 — [presentation materials of Research meeting for the next generation data acquisition and processing]

ITRON: Industrial - The Realtime Operating system Nucleus

NISHIO, Nobuhiko

Department of Information Science Faculty of Science, the University of Tokyo

— 155 —

[Presentation slides]

What is ITRON?
  - A multi-tasking OS for industrial embedded computer systems in the TRON project, for the control of and communication among intelligent objects (industrial robots, NC machines and home electronic products), in order to realize the HFDS (Highly Functionally Distributed System).

Fatal characteristics for ITRON
  - Complex and frequent inter-process communication and synchronization for data transfer and mutual exclusion.
  - As a realtime OS it must satisfy tightly conditioned timing constraints; otherwise disastrous events will happen. It must be possible to predict when a task will finish.

Features of ITRON

1. Weak Standardization
  - Virtual approach vs. specific approach; a quest for the best performance together with good standardization; definitions and recommendations for implementation on various processors (ITRON/86, ITRON/68K, ITRON/32, ITRON/MMU286, ITRON/CHIP).

2. High Functionality
  - [Ex.] Synchronization and IPC: event flag, semaphore, sleep and wait, mail box, message buffer, rendez-vous; a wide selection for the optimal mechanism.

3. Quick Response
  - Quick task dispatching and interrupt handling; designating registers for swapping in dispatch; interrupt handling without OS intervention; performance improvement and ease of development.

4. Adaptability
  - According to the condition of the system environment: selection of system calls, omission of error checking, MMU support, file management; a series of ITRON specifications (µITRON, ITRON, ITRON2) for adaptability and standardization.

5. Easy to Learn
  - Standardization of the functionality and naming of system calls, compound/extended system calls, the kinds and naming of system call parameters, and the kinds, naming and values of error codes.
  - Naming convention (creation, deletion, other operations): cre_tsk, del_tsk, sus_tsk for tasks; cre_flg, del_flg, set_flg for event flags; cre_sem, del_sem, sig_sem for semaphores.

Functionalities in ITRON
  - Priority-based task scheduling; well-organized task status transitions; clear system status division (running task part, running quasi-task part, running task-independent part, running non-task part, transitional status).

Realization of Delayed Dispatch
  [Slide: timing diagram with Task A (priority low), Task B (priority high), Interrupt A (priority low) and Interrupt B (priority high); nested interrupt handlers issue wup_tsk B, and the dispatch to Task B is delayed until the nested handlers return (ret_int).]


— 161 —

Our laboratory aims at the development of ITRON/CHIP, a free-ware implementation of the ITRON2 specification, with distributed processing functions (shared memory, mapped I/O): transparent vs. specific extension of system call functionality, optimal resource allocation, and the development of protocols on multi-processor architectures (tightly/loosely-coupled processors, cache memory architecture), toward an HFDS realized on MTRON.

HARDWARE AND SOFTWARE ARCHITECTURE FOR GENERAL PURPOSE PARALLEL PROCESSING

Kazuya Tago

Department of Mathematical Engineering and Information Physics, Faculty of Engineering, University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, JAPAN

ABSTRACT High performance microprocessors make parallel processors economically feasible. Hardware architecture, description languages and operating systems which are considerably different from those of single processors are needed to execute parallel computation. Technologies related to them are outlined.

1. INTRODUCTION

The progress of semiconductor device technology provides powerful and cheap processors. One of the most promising ways to utilize such processors effectively is to apply them to implement parallel computing systems. Technologies required to implement parallel computing systems are: 1) an efficient connecting mechanism which enables co-operation and data sharing among processors, 2) programs which control parallel processing, and 3) operating systems which provide a load balancing mechanism, a debugging environment, an efficient file system and a synchronization mechanism between units of parallel computation.

2. PARALLEL PROCESSORS

Architectures of parallel processors are roughly classified into two classes: the shared memory architecture and the message passing architecture (Figure 1). A shared memory architecture machine is constructed from processors, memory banks and a switch. The switch transfers memory access requests generated by the processors to the memory banks. The switch is implemented by a single bus or a packet transfer network. For example, the VHX bus is used for this purpose. The common bus architecture is used for implementing systems of relatively low parallelism because of the limited bandwidth of the bus. A packet transfer network provides higher bandwidth at the expense of hardware simplicity. The cross-bar switch is the basic model of a packet transfer network. In the usual design, a cache memory is associated with each processor to reduce the traffic through the switch. A message passing architecture system is constructed from processors, local memories and a switch. Each processor can only access its local memory. The message passing mechanism is provided for co-operation between processors. Software invokes communication primitives explicitly to interact with a program on a different processor.

— 163 —

Figure 1 Architecture models of parallel processors (shared memory architecture: processors connected through a switch to memory banks; message passing architecture: processors with local memory connected through a switch)
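As a toy illustration of the message passing paradigm (its shared-memory counterpart appears as a sketch in the operating-systems section below), two host processes joined by pipes can stand in for two processor nodes and their interconnect; the explicit send/receive calls correspond to the communication primitives mentioned above. Real machines of either class use their own interconnect hardware, not these host OS facilities.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int to_child[2], to_parent[2];
        if (pipe(to_child) != 0 || pipe(to_parent) != 0)
            return 1;

        if (fork() == 0) {                          /* the "remote" processor   */
            long msg;
            read(to_child[0], &msg, sizeof msg);    /* receive work             */
            msg *= 2;                               /* local computation        */
            write(to_parent[1], &msg, sizeof msg);  /* send the result back     */
            _exit(0);
        }

        long work = 21, reply = 0;
        write(to_child[1], &work, sizeof work);     /* explicit send            */
        read(to_parent[0], &reply, sizeof reply);   /* explicit receive         */
        printf("result received from the other node: %ld\n", reply);
        return 0;
    }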

3. EXAMPLES OF SYSTEMS[3]

Several commercial multiprocessor systems which are implemented with the common bus architecture have become available. For example, the Balance, the Symmetry, the Encore Multimax and the Apollo DN10000 were developed. The number of processors which can be connected to a single bus is limited to about 30, though elaborate cache management algorithms are used. Systems of this type are frequently used as job and file servers of network systems. Implementation methodologies for shared memory systems with a larger number of processors using packet transfer networks are being studied. Message passing architecture machines with a large number of processors (>100) have been developed. They are used for special purpose processing such as image processing or simulation.

4. DESCRIPTION LANGUAGES

Programs for parallel processors are implemented by using programming languages which can describe concurrency explicitly[2], or by converting existing sequential programs into concurrent programs. The Occam and Ada languages are frequently used for writing concurrent programs. Their description mechanism is based on the message passing paradigm. These programming languages have facilities for describing processes and IPC (inter-process communication). Programmers design parallel execution explicitly by allocating processes. This paradigm can be used for both shared memory architecture and message passing architecture machines. Several manufacturers of multi-processor systems provide compilers which convert FORTRAN programs into object code suited for parallel execution. The basic idea of such conversion is to convert execution loops into parallel execution. The Parafrase[4] is a general purpose conversion program which performs this conversion on source code independently from any specific hardware architecture. The syntax of the output code of the Parafrase is standard FORTRAN except that it includes the "DOALL" statement. The "DOALL" starts multiple execution of the following statements in parallel. This type of program reconstruction can only be used for programs on shared memory architecture machines.

5. OPERATING SYSTEMS

(5.1) Overview

Operating systems of dedicated systems have a relatively simple structure. A dedicated machine executes only one job at a time. The operating system need not manage the sharing of hardware resources between multiple jobs. It only executes file I/O and helps debugging. Synchronization of parallel execution and load balancing are executed by the user program. The operating system of this type is usually executed on a specially allocated front-end machine, and the parallel processor executes only user jobs. When a computer is shared among multiple jobs, synchronization and load balancing must be executed by the operating system. The reduction of the implementation costs of synchronization and scheduling is the most important design target of such operating systems.

(5.2) Operating systems for shared memory architecture machines

Shared memory architecture machines are used in a shared manner. Traditional TSS has been reconstructed to control common bus multiprocessor systems. The UNIX* operating system is frequently used for this purpose. The Mach system[1], which was developed by CMU, is a general purpose operating system for distributed processing and parallel processing. It provides a generalized environment for parallel processing on shared memory architecture machines. Such an environment is realized by multiple light weight processes

* UNIX is a trademark of Bell Lab. Inc.

— 165 — sharing the single logical addressing space of a user process. Multiple processors are allocated to the user process, and each processor executes one of the light weight processes. Mach provides scheduling and synchronization mechanisms for light weight processes. For example, the "DOALL" statement can be implemented by the creation of light weight processes.
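A sketch of this idea is given below, using POSIX threads as a stand-in for Mach light weight processes: the iterations of a DOALL-style loop are split among threads that share the address space of a single process, and the end of the DOALL is the join.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000
    #define NTHREADS 4

    static double a[N], b[N], c[N];       /* shared by all threads */

    struct chunk { int lo, hi; };

    static void *doall_body(void *arg)
    {
        const struct chunk *ck = arg;
        for (int i = ck->lo; i < ck->hi; i++)
            c[i] = a[i] + b[i];           /* the loop body */
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        pthread_t tid[NTHREADS];
        struct chunk ck[NTHREADS];

        /* "DOALL": start one thread per block of iterations ... */
        for (int t = 0; t < NTHREADS; t++) {
            ck[t].lo =  t      * N / NTHREADS;
            ck[t].hi = (t + 1) * N / NTHREADS;
            pthread_create(&tid[t], NULL, doall_body, &ck[t]);
        }
        /* ... and wait for all of them, as the end of the DOALL implies. */
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("c[999] = %.0f\n", c[N - 1]);   /* 999 + 1998 = 2997 */
        return 0;
    }

Compile and link with the threads library (e.g. cc -pthread). The scheduling of the threads onto physical processors is left to the operating system, which is exactly the service the text attributes to Mach.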

6. CONCLUSION

Technologies which are required for implementing parallel computing machines have been overviewed. Significant advances in this area are expected in the near future.

REFERENCES 1) Accetta.M. et. al : Hach: A New Kernel Foundation for UNIX Development, Proc. of USENIX, PP.93-112 (1986).

2) Filman.R.E. and Daniel P.F.: CORDINATED COMPUTING. McGraw- Hill. USA (1984).

3) Gehringer,E.F.= A Survey of Commercial Parallel Processor, Computer Architecture News, PP.75-105 (1988).

4) Leasure.B.: The Parafrase Project's Fortran Analyzer Major Module Documentation, University of Illinois, Center for Supercomputing Research & Development, No.504, RP-85-5 (1985).

— 166 — Cellular Array Processor CAP

Hiroaki Ishihata, Hiroyuki Sato, Mono Ikesaka, Kouichi Murakami and Mitsuo Ishii

Fujitsu Laboratories, Ltd., Kawasaki 1015 Kamikodanaka, Nakahara-ku, Kawasaki 211, Japan

ABSTRACT

The general-purpose, highly parallel cellular array processor (CAP) we developed features multiple-instruction stream multiple- data stream (MTMD) processing and image display. Several hun­ dreds of processor elements can be connected together. The present system uses 256 processors. Each processor element consists of a general-purpose microprocessor, memory, and a special VLSI chip that performs parallel-processing-specific functions such as proces­ sor communication and synchronization. The VLSI has two 2M byte/s independent common bus interfaces for data broadcasting and six 15M bit/s serial communication ports for local data com­ munication. The chip can also process image data in real time for multiple processors. CAP has been successfully applied to visualization applications such as ray tracing and realtime visualization of numerical simula­ tion.

1. INTRODUCTION Parallel processing is a key to speeding up problem solving. Advances in VLSI technology have made available powerful microprocessors and large-capacity RAM chips, enabling us to construct parallel processors consisting of several hundreds of processor elements. Examples include the iPSC[l], CM-1[2] and N-CUBE[3]; some of these are now being marketed. Very few cost-effective uses have been found for such systems, however. Much more research is needed on applications for parallel comput­ ers. The cellular array processor CAP is a parallel computer we developed to study many aspects of parallel processing, from hardware architecture to application software. CAP is based on the following design principles:

— 167 — (1) MMD Processing Using Highly Intelligent Processor Elements For general-purpose use, each cell must provide a multitasking environment to enable the system to be applied easily to the wide range of problems that can be solved using parallel processing. It should also be capable of being used as a tool for developing and evaluating parallel algorithms. (2) Intercell Communication The architecture must allow a system of several hundreds of cells to bs con­ structed. Using shared memory to construct such a system is needlessly complicated, so we used communication ports to communicate between processors. Hypercube con­ nections are effective for random communication between processors, but do not work efficiently for global communication, such as data broadcasting from one processor to all other processors, or for collecting data from all other processors. Hardware support for these types of communication is needed to reduce communication overhead. No single network topology suits all applications, which means that networks of different topologies should be used depending on the application. (3) Realtime Image Display Graphics is essential to a good user interface. We incorporated a realtime image display that makes CAP well suited to applications involving image data, such as com­ puter graphics and image processing. One of CAP's best applications has been in image generation[4][5]. Debugging parallel systems software is a big problem. The behavior of many processors during execution must be easily understood. Direct image display of each processor's status would greatly aid programmers.

2. CAP ARCHITECTURE

2.1. Overview Figure 1 gives the standard CAP-C5[9] hardware configuration of 256 processor cells in a two-dimensional array. Figure 2 shows the CAP-C5 hardware. Each cell is connected to four adjacent cells. Cells at the boundaries are connected to cells at the opposite sides. In addition to intercell connections, the common command bus links all cells to the host computer, and the common video bus links all cells to the video inteiface.

2.2. VLSI Architecture for Parallel Processing The CAP-VLSI chip[6] is a key component for CAP. It operates together with the microprocessor and memory elements. The chip has the following functions: (1) Window controller for fast image-data transfer (2) Two common bus interfaces for global communication (3) Hardware synchronizer

— 168 — (4) Six intercell communication ports configuring an intercell communication network Figure 3 shows the cell hardware configuration. Table 1 lists cell specifications. The CAP-VLSI chip (Figure 4) is fabricated using a channelless CMOS gate array chip[7]. Table 2 lists chip specifications.

2.3. Window Controller Image generation is speeded up by partitioning an image area into noncontinuous subareas and using a large number of cells to generate subimages in parallel. Each cell's window controller maps its subimage data onto a part of the screen. A frame of image data, partitioned and distributed to many cells, is reconstructed as one complete image by cell window controllers. The mapping pattern can be changed as needed. Each cell maps its image data onto dispersed block areas (Figure S). Changing the size and intervals of blocks produces a variety of mapping patterns (Figure 6), and makes it easy to implement different image-generation algorithms. It also helps even out load distributions, as explained later. Each window controller performs such tasks as address generation and arbitration of image memory access from the video bus and MPU. Image data is read via the video bus every refresh cycle (33 ms) for refresh display. Image memory can be used both for full color (8 bits each for red, green, and blue). Images can be input and out­ put simultaneously in realtime by using separate video buses for input and for output. Image data is easily input from a TV camera or other device.

2.4. Common Bus Interface The host computer broadcasts to all cells via the command bus. The CAP-VLSI chip has two command bus interfaces, one for the host and the other for cells, whose use enables a variety of hierarchies to be configured (Figures 7 (a) and (b)). The command bus interface has 16-bit 8-word first in, first out (FIFO) memory. This helps to reduce differences in cell-processing time. A 3-wire handshake protocol is used in 1 to N communication. Any cell can broadcast data to other cells and to the host computer, although arbitration is needed when more than one cell requests to broadcast at the same time. The command bus interface also controls resets, interrupts and bus arbitration. The host computer or host cell can reset or interrupt all slave cells or any one cell. The CAP-VLSI chip uses hardware polling to arbitrate bus requests and to specify individual cells. Each cell has a two dimensional address and polling is executed in 2 phases. The first polling determines the column position of the requesting cell, then the second polling determines the row position of the requesting cell. To poll cells sequentially, the time increases in proportion to the number of cells. Our polling tech­ nique reduced the polling time from 0(N) to 0(-\/N), where N is the number of cells.

— 169 — 2.5. Hardware Synchronization Global synchronization or synchronization among some cells is needed to execute most of the parallel algorithms. Each cell has status registers for synchronization. Outputs of the status register of each cell are wire-ORed, and the line status can be read by the host or by any cell. Cells can set status to indicate completion of a process or other states. The host or any cell can read the logical OR (or AND) statuses of all cells to detect the comple­ tion of processes in a number of cells. Each status line can handle a single independent synchronization request. Con­ ventionally, the more the synchronization requests, the more the status lines. Multi­ plexed synchronization control (MSC) avoids this situation without increasing the number of status lines. MSC uses two control lines to automatically synchronize 16 synchronization requests.

2.6. Intercell Communication Ports Each cell communicates with adjacent cells via six full-duplex serial communica­ tion ports. Networks of different topologies, such as two-dimensional cell arrays, three-dimensional cell arrays, and hypercubes of up to six dimensions, can be configured (Figures 7 (c) and (d)). The CAP-VLSI chip uses a bypass for intercell communication. Any two of six ports can be connected directly by commands from the MPU, enabling fast communi­ cation between distant cells by cell input and output port connection on the route. Paths are set and released dynamically. Data is transferred in 19-bit packets — 16 data bits and 3 header bits to identify die packet type (data, interrupt, nonmaskable interrupt, or acknowledge). Connected cells communicate by a handshake protocol using data and acknowledge packets. Using interrupt packets, a cell can interrupt another cell's MPU whenever the cell is not bypassed. Nonmaskable interrupt packets are used to interrupt bypassed cells.

3. OPERATING SYSTEM Problems executed by CAP are divided into parts that can be processed in paral­ lel, then mapped to cells, which communicate by exchanging messages. Figure 8 shows software configuration.

3.1. CellOS We developed a cell OS that controls task execution in each cell and supports intertask communication. The cell OS has several advantages. First, having program­ ming tasks as elements to be executed in parallel by many processors or concurrentiy in one processor makes it easier to extract parallelism in problems, making programs more efficient Second, application programmers are freed from designing complicated procedure scheduling and can develop programs more easily.

— 170 — Messages are usually sent in packets. Messages from other tasks arc queued and read in the sequence they arrive. Message destinations are specified by cell and task numbers. Messages sent with only a task number are broadcast to destination tasks in all cells. Communication tasks pass messages between tasks. The cell OS supports three-dimensional mesh, hypercube, and hierarchical con­ nections. VLSI bypassing provides application programmers with two types of bypassed communication. One is static, in which the host broadcasts bypass informa­ tion to cells to establish fixed connections between specified cell pairs. The other is dynamic, in which a cell sends a message directly to the destination cell, dynamically bypassing cells on the route.

3.2. Cell Driver The cell driver, which resides in the host, dynamically allocates tasks to cells, and broadcasts data from an application task in the host to cells and collects data from cells.

3.3. Display Manager The display manager, a basic software package for image generation and display, provides standard patterns for mapping cell subimages onto the screen (Figure 6). Two-dimensional primitives can also drawn, and animated display of multiple image frames is very useful in certain applications.

4. APPLICATION TO VISUALIZATION

4.1. Ray tracing The ray tracing algorithm[8] simulates optical phenomena such as reflection, sha­ dows, translucency, and refraction. It generates quality images, although this requires a large amount of calculation.

4.1.1. Parallel Ray Tracing Rays can be traced pixel by pixel. Ray tracing can be speeded up by dividing the screen area and having multiple processors process each small area in parallel (Figure 9). We implemented a ray tracing program into CAP systems[5].

4.1.2. Static Load Distribution To enable performance to improve with the number of processors, it is very important to distribute the calculation load evenly. Because rays can be processed independently, load is evenly distributed using the dot the mode (Figure 6), in which each cell takes charge of a similar proportion of heavy-load and light-load pixels. Each cell has a copy of all model data, together with camera and lighting infor­ mation.

— 171 — 4.1.3. Antialiasing Process Using Inter-cell Communication Because of ray tracing's point-sampling nature, undesirable effects called aliases are generated in synthesized images, examples are the staircase pattern along straight edges, me moire patterns in finely textured areas. To reduce these aliasing effects, antialiasing is performed as needed. To calculate a pixel's antialiased intensity, inten­ sities of the four adjacent pixels must be known. Rays are traced in dot mode, in which adjacent cells process adjacent pixels on the screen, e.g.) adjacent pixels to the right neighbor are in the right-hand cell, and those to the left are in the left-hand cell. Thus, each cell needs only to communicate with the neighboring cells to collect otfier pixels' intensities. The communication time for local data transfer is minimal.

4.1.4. Experimental results of ray tracing Experiments were performed using the 64-processor CAP-C3 and 256-processor CAP-C5. CAP-C3 processors (cells) are similar to those of CAP-C5, except that VLSI chip functions are implemented by discrete hardware. Figures 10 are examples of ray-traced images. The results of experiments showed that cell processing times differed a maximum of 10 percent when 256 cells were used. This evenness of load distribution results in a linear performance improvement that increases with the number of cells. Ray tracing with 256 cells is over tiiree times faster than Fujitsu's M-380 mainframe (Table 3). Image quality is improved by antialiasing, with additional calculation time for adaptive oversampling. During oversampling, multiple rays are traced on a pixel region, and subpixel intensities are averaged (Table 4). Communication overhead for antialiasing is small compared to the total processing time.

4.2. Realtime visualization of numerical simulation As computer power increases, the visualization of computed results becomes more and more important. There are two problems in scientific visualization. One is the huge amount of the data to be handled. The best way to deal with it is realtime visu­ alization of the simulation. No data need to be kept for post processes. The other is the communication bottleneck between computing and visualization systems. If com­ puting and visualization are executed in the same system, the I/O bottleneck is easily eliminated.

4.2.1. Heat flow simulation using the finite difference method We have developed an example program, which performs heat flow simulation in a solid material and visualizes die temperature distribution at me same time. The basic equations are those for heat dissipation (Figure 11). These are solved by the finite difference method. In two-dimensional simulation, a five point stencil is used. The physical space is divided into rectangle areas and processed in parallel. In three-dimensional simulation, a seven point stencil is used. The physical space is

— 172 — divided into cubes and processed in parallel. We used an explicit calculation scheme called red and black SOR (successive over relaxation)[10]. To calculate next iteration step grid temperature values, only nearby gird temperature values are necessary. The temperature values of grids located at cell boundaries are exchanged through intercell communication lines.

4.2.2. Cross section display of volume data In two dimensional simulation, it is easy to display the temperature distribution. In three dimensional simulation, volume rendering techniques are necessary. We implemented a cross section display function. Two stage processing is performed to generate cross-section images of the volume. Each cell handles a cubic volume. In the first stage, each cell renders cross section areas if any. The generated subimage data is then distributed to the cells. The image is divided in the line mode. The cell which handles the shaded cube renders these shaded polygons. In the second stage, that cell sends these subimages to other cells line by line using intercell communica­ tion lines. The z value of each pixel is also sent for the hidden surface removal. The cross section display of a temperature field (Figure 12) took about five seconds when 64 cells were used. It is possible to record the image on the video tape frame by frame to make an animation of the simulation. Other volume rendering tech­ niques such as isovaiue surface display and projection are left for future study.

5. CONCLUSION We have explained the architecture of the cellular array processor (CAP), which used specially designed CAP-VLSI chips to manage several hundreds of processor cells. CAP uses two independent common bus interfaces for data broadcasting and six serial communication ports for local data communication. The chip also has realtime image-data handling capabilities. Using CAP-VLSI chips and general-purpose microprocessors, we constructed a system with 256 processor elements. CAP has been applied to visualization such as ray tracing and visual simulation. In ray tracing, CAP generates high quality images efficiently by evenly distributing the calculation load using the dot mode. We made a prototype program which performs realtime visualization of heat flow simulation. Combining the computing power of a highly parallel system and image generation functions will make it possible to con­ struct an interactive visual simulation system.

6. REFERENCES

[1] Intel Corp., data sheets. [2] W. Daniel Hillis, The Connection Machine, Cambridge, Mass., MIT Press, 1985. [3] John D. Hayes, et al., "Architecture of a Hypercube Supercomputer," Proc. Int'l Conf. on Parallel Processing, pp. 653-660., 1986. [4] H. Sato, M. Ishii, et al., "Fast Image Generation of Constructive Solid Geometry

— 173 — Using a Cellular Array Processor," Computer Graphics, 22(2), July 1985, pp. 95- 102 [5] K. Murakami, H. Sato, et al., "Ray Tracing Using Cellular Array Processor CAP"(in Japanese), Information Processing Technical Report, Vol. 86, No. 43, CAD-22-2, July 1986. [6] H, Ishihata, M. Jshii, et al, "VLSI for the Cellular Array Processor," Proc. Ml Conf. on Computer Design, pp. 320-323., Oct 1987 [7] H. Takahashi, et al., "A 240K Transistor CMOS Array with Flexible Allocation of Memory and Channels," IEEE Journal of Solid State Circuits, Vol. SC-20, No. 5, pp. 1012-1017, Oct. 1985. [8] T. Whitted, "An Improved Rumination Model for Shaded Display," Comm. ACM Vol. 23, No. 6, 1980, pp. 343-349. [9] M. Ishii, H. Sato, et al., "Cellular Array Processor CAP and Applications," Proc. Ml Conf. on Systolic Arrays, pp. 535-544, 1988 [10] L. Adams and J. Ortega, "A Multi-Color SOR Method for Parallel Computation," Proc. Ml Conf. on Parallel Processing, pp. 53-56,1982

— 174 — Table 1 Cell specifications

MPU i80186 + i8087 RAM 2M bytes(with ECC) ROM 64K bytes Image memory 96K bytes Video clock 15 MHz maximum Common bus transfer rate 2M byte/s Intercell transmission speed 15M bit/s

Table 2 Gate array specifications

Design rule CMOS 1.8 Jim Chip size 13 mm square Gate delay 1.5 ns/gate Basic cell configuration 94 x 311 Package 256-pin PGA VO 220 lines, TTL-compatible

Table 3 Results of ray tracing experiments

Model (#Object) Image generation times(s) Performance rate Conditions CAP FACOM M-380 CAP/M-380 BALLQ25) 19 60 3.16 Reflection PISTON(179) 10 33 3.36 CHESS(144) Reflection 40 128 3.20 Shadow

Display resolution: 512 x 384 pixels Number of cells: 256 (CAP-C5) (The FACOM M-380 has the equivalent processing power of the IBM 3081.)

— 175 — Table 4 Timing results of anti-aliasing for the model PIANO

Level 0 Level 1 Test item No anti-aliasing (Averaging four points) (Adaptive oversampling)

Communication time(s) .0 2.0 2.0

Time of slowest cell (s) 56.6 68.1 121.1

Time of fastest cell (s) 55.7 67.3 104.8

Maximum deviation (%) 1.6 1.1 13.5

Maximum deviation = {(Time of slowest cell) - (Time of fastest cell)} / (Time of slowest cell) * 100 Number of cells: 64(CAP-C3)

— 176 — Command bus

Host computer , , 1 , Cell - _ . . _

Video 1 I 1 I bus _ . . _ —

I I | 1 - - — 1 1 1 Color monitor I I 1

- — • • — - 1 1 1 1 256 cells (16 X 16)

Figure 1 CAP hardware configuration

Figure 2 CAP-C5 hardware

— 177 — Command bus (to host or eel I) Video bus (out) NDP MPU RAM 2MB Video RAM ROM 32KB 64KB 18087 180186

CAP-VLSI

MPU interface

Ce11 command bus Window interface inter­ controler Status registers rupt Multiplexed synchro­ control nization control

Ce11 command bus Interce11 commun i cat i on interface interface Status registers Multiplexed synchro­ Bypass control nization control

Command bus V V V V * * (to eel Is) North East Top Video bus (in) West South Bottom

Figure 3 Cell hardware configuration

mnpiffliminii a i a a -* - s - B- 7 p a s 10

Figure 4 CAP-VLSI chip

— 178 — Screen

Dispersed block

Figure 5 Sub image mapping

Block mode Line mode Dot mode Figure 6 Napping patterns mm irinr Cubic Hypercubic

in m m Hierachical Pyramidal

Figure 7 Network configurations

— 179 — Video bus

North * »| South West East Top * » Bottom

Figure 8 Software configuration

Reflection

Cell 0

Transmission

Viewpoint

Figure 9 Parallel processing of ray tracing(block mode)

— 180 — Fugure 10 Ray traced image of a chess board

2D

3D

a* a^2 3/ az2

#' = FH„C( t/fo , C/^ut, t/^. (/^, t, 111^ , t/^, , u^)

Figure 11 Heat flow simulation

181 Figure 12 Cross section display of three dimensional temperature field

-182 — Advanced Computer Program(ACP) at KEK

Yoshiji Yasu

National Laboratory for Sigh Entry Physics Oho 1-1, T3v.kuba.-shi, Ibaraki-ken S05 Japan

Abstract The paper describes what is the ACP[1], the performance and the next generation of ACP. The first generation of ACP is being used at KEK for studying parallel processing and actual job like physics simulations while it's used at many laboratories of high energy physics in the world. Then, the next generation of ACP is being developed[2] for getting more high performance.

— 183 — 1 What is ACP The need for computing in high-energy physics has expanded and it is no longer possible to do all the computing that is necessary on conventional mainframes. On the other hand, more cost effective computing power can be got by the technol­ ogy of high performance microprocessors. Therefore, it's possible to make multi- microprocessor system for getting the cpu power. How does the system should be managed? There is a special characteristics in the high energy physics com­ putation. It's so called "event independency". The concept of ACP is "trivial" parallelism which means that each processor works independently with no inter- processor communication. The parallelism meets the requirement of the event reconstruction problem. The design goal is cost effectiveness, user friendliness and configuration flex­ ibility. The hardware elements of ACP are host computer VAX, node processor Motolora 68020(16Mhz) & AT&T 32100 on VME bus and 32 bit parallel bus - Branch Bus (20Mbytes/sec) connects VME bus with VAX Qbus or UNIBUS. The cpu power is estimated at cost of less than $2500 per VAX equivalent[1]. There are many node processors in VME crates and same node programs are down-loaded into each processor by host VAX. The program is automatically compiled on node processor by the host. When the program starts, the host program sends data into node and gets the results from the node concurrently. The user program doesn't know which node processor it should send data into and get the results from. The user can get the cpu power of twice if node processors increase twice, with no modification of the user's program. The configuration of ACP at KEK is shown in the figure 1.

1.1 How to use ACP The user should provide three types of program file at least. One is PARAMETER FILE, xxxx.UPF. Another is HOST PROGRAM FILE, xxxx.FOR. The other is NODE PROGRAM FILE, xxxx.FORl. The parameter file describes "system" which defines system parameters for the ACP, number of node processor, filename of host program file, filename of node program and so on. Then, a command of ACP called ACP$MULTICOMP should run for compiling host program and node program. It generates confiuration file for execution and executable module of host program and node program. Finally, just execute the programs by typing VAX DCL command "RUN". The operation is very simple. Therefore, user can use ACP with no effort. The figure 2 shows an example of the operation. The details of the programming is described in the manual[3].

— 184 — 1.2 Measurement of Performance of ACP 1.2.1 Compilation Time A whetstone benchmark program is used for the measurement. In case of using ACP, the compilation time on /iVAXII took 4 minutes and 3 seconds while the time on VAX3500 took 1 minutes and 45 seconds. Then, in case of no ACP which means that host program and node program are merged for extracting "ACP code" and compiled by VAX FORTRAN, the time on /uVAXII took 20 seconds while the time on the VAX3500 took 7 seconds. The results show that ACP system takes much time for compiling the programs. Therefore, the primary compilation error should be elminated by host machine and the simulator of ACP should be used before using real node processor, if the program code is big.

1.2.2 Communication overhead of data transfer between host and node The measurement of communication overhead is important to estimate multipro­ cessor system in general. As already described, the communication is only done between host and each node while the node doesn't communicate each other. Therefore, the overhead between host and node should be measured. A subroutine called ACPJ3ENDEVENT is provided for sending data from host to node via Branch Bus. The communication overhead of the subroutine is 40 milliseconds on JJVAXII and 20 milliseconds on VAX3500. And, transfer time of 400Kbytes data on /tVAXII is 650 milliseconds while it's 630 milliseconds on VAX3500. The result shows that data should be sent in large size for the efficiency.

1.2.3 Processing Power of ACP-CPU Whetstone benchmark is done on the CPU. It shows 0.7 MWIPS in single precision while the value on /xVAXII is 0.9 MWTPS.

2 Parallelism of ACP

It's assumed that "L" is CPU time to process "n" events and "M" is commu­ nication time to transfer data from/to host to/from node. (L/M)*n events can be processed in (L+M) time on (L/M) CPUs on ACP system. For example, in case of L=80 seocnds (400 events at 200milliseconds/event) and M=0.6seconds (400 events at lKbytes/event), the ACP system can process 130*400 events in 80 seconds on 130 CPUs. This means that the ACP system has 130 times as much CPU power as single CPU. This is the "trivial" parallelism. Two actual processing for experiments are investigated. One is processing for generation of Data Summary Tape. The other is processing for track recon­ struction. In case of the DST generation, the event size is lKbytes/event and

— 185 — the processing time per event is 250 millseconds on HITAC M280H while it will be estimated to take 4 seconds on ACP-CPU. It assumed that the data of 400 events are sent by one ACPJ3ENDEVENT and lOOKbytes data are received by one ACP-GETEVENT. In this case, the ACP system can be expanded upto 1500 AGP-GPUs with 100% efficiency. In the case of the track reconstruction, the event size is lOKbytes/event and the processing time per event is 1 seconds on FACOM M382 while it will takes 32 seconds on ACP-CPU. It assumed that the data of 20 events are sent by one ACP-SENDEVNET and the data of 400Kbytes are received by one ACP-GETEVENT. In this case, the ACP system can have 500 times as much CPU power as single ACP-CPU. The trivial parallelism is effective! These are not the result from actual measurement but just calculation while I/O of data to mass storage should be considered.

3 CERN Library on ACP

The CERN library is provided on ACP. For first generation of ACP, the library for host (VAX/VMS) and node (ACP-CPU) are installed on VAX. And, the library for MIPS computer can be used for second generation of ACP.

4 Second Generation of ACP 4.1 Hardware of The Second Generation New CPU is MIPS /R3010(25MHz) and shown in figure 3 and the talk [4]. The CPU is RISC(Reduced Instruction Set Computer) and its power is expected to increase well into the 1990s. The figure 4 shows Integrated Device Technology, Inc. (IDT) promises a 160 VAX MIPS version in 1991[1]. The typical system is shown in figure 5. The VBBC is VME bus slave & Branch Bus master interface while the BVI is VME bus master & Branch Bus slave interface. One of the CPUs becomes SERVER for downloading and file services to diskless nodes and controls the DISK Controlller. It also manages Ethernet Controller as a gateway of other machines connected by the ethernet.

4.2 Software of the Second Generation New ACP software is called ACP Cooperative Processes Software, ACP CPS. The OPS supports many of the same concepts as the first generation ACP syste.i. There are three major enhancements. First, processes can all communicate in a symmetric way - no rigid "host-node" topology is imposed. Data and control can pass from class to class in an arbitaxy manner. Second, each process is running under a full operating system(UNIX) and has access to the complete set of op­ erating system serivces. The interprocess communication(IPC) on the ACP CPS is based on UNIX's IPC as shown in figure 6. Third, each process has access to periperal devices. Figure 7 shows a typical software configuration.

4.3 Evaluation of MIPS FORTRAN Compiler The FORTRAN compiler is very important factor because real physics code runs in a high level langauge like FORTRAN. The part of specification is shown below. The MIPS compiler is superset of VAX FORTRAN. The better points :

• Depth of DO Loop can be expanded. The default is 20 which is same as VAX FORTRAN. • Continuation line is greater than 99 which is VAX FORTRAN'S spec. • Length of Symbol name is 32 while the length of VAX FORTRAN is 31. • Binary constant representation can be done like a=b'00010010001111' • Recursive call is supported. • Compile option -dJine and -vms which mean debug flag and VAX FOR­ TRAN compatible flag, repectively, are provided. The bad point : • REAL*16 is not fully compatible with VAX FORTRAN.

4.4 Result from Benchmarks There are three type of benchmarks. One is Whetstone benchmark. Another is "HBOOK" benchmark. The other is FERMILAB's benchmark. Table 1 shows the result from the Whetstone benchmark. ACP/R3000 CPU has almost same CPU power as one of mainframe FACOM M780 on UTS(UNDC). It has about 19 times as much CPU power as /iVAXII while it has about 14 times as much CPU power as VAX11/780. Table 2 shows the result from the HB OOK benchmark. The program includes 1 & 2 dimensional booking and filling. The ratio of the performance as fiVAXII equals 1, is similar to the ratio of the whetstone benchmark. This means that Whetstone benchmark seems to be the candidate of standard benchmark on high energy application. The HBOOK benchmark generates some graphical pictures and then they are shown in figure 8 and 9. Table 3 is the FERMILAB's benchmark result[4]. "RECON1" and "RE- CON2" are typical track reconstruction program, and "QCD" is an QCD program. These results are similar to above results.

— 187 — 4.5 Measurement of Performance 4.5.1 Performance of TCP/IP Communication on MIPS/UMIPS The communication overhead with TCP/IP of the internet process communica­ tion is measured. One process sends message to another process and the receiver retransmits the message to the sender immediately. Two programs runs on same MIPS M1000. The average speed to send or receive 32 bytes message is 1.45 milliseconds while the speed to do 4096 bytes is 5.3 milliseconds. The protocol overhead of the IPC is not so big. It's expected that the communication overhead will reduce in the comparison with one of the first generation of ACP.

4.5.2 Performance of VBBC Device Driver on MIPS/UMD?S The processors on VME bus is interfaced by the VBBC device driver at the view of the software. Therefore, the performance influences the whole of the ACP system. First version of VBBC device driver is written and tested. There are three type of I/O requests. One shows the basic software overhead of IOCTL system call to the VBBC driver. Another measures the software overhead of small message passing using IOCTL system call. The other is for measuring one of large data transfer. MIPS computer, M500 is used for measuring them. The typical overhead of the IOCTL system call is about 100 /usee. The system call does nothing in the device driver except referencing a variable. R>r small message, the IOCTL system call takes about 200 /isec with empty message. For large data transfer, READV or WPJTEV system call are used. It provides a powerful processing like list processing and buffer-chaining. It takes about 250 /isec without data. These results are preliminary and depends the UNDC system situation like how many processes are running. The overhead of the driver is small. This also shows that the communication overhead between processes on the multiprocessors will be expected to be small.

5 Plan For the second generation of ACP, new MIPS machine will be purchased at KEK. The ACP CPS will be installed and tested, and then new CPU board ACP/R3000 will be installed at KEK.

— 188 — References [1] T.Nash, et al., "The Fermilab Advanced Computer Program Multi- Microprocessor Project." conference proceedings, Computing in High Energy Physics, Amsterdam, June 1985 (North Holland) [2] T.Nash, et al., "High Performance Parallel Computers for Sciences: New Developments at the Fermilab Advanced Computer Program", Workshop on Computational Atomic and Nuclear Physics at One Gigaflop, Oak Rigde, TN, April 14-16, 1988 [3] ACP, ACP software user's guide for event oriented processing, June 21,1988 [4] H. Areti, et al., "Plans For The Second Generation Of ACP Multimicroproces- sor systems", Computing Techniques Seminar presented by Joe Biel, Septem­ ber 27th, 1988(FermiLab)

— 189 — Table 1. WHETSTONE BENCHMARK ( Single Precbion )

MACHINE OS MWIPS RATIOfuVA FACOM-M780 UTS 18.4 20.4 MIPS-M2000(R3000) RiacOS 17.3* 19.2 MIPS-M1000(R2000) UMIPS 10.7 11.9 VAX8700 VMS 6.0 6.8 MIPS-M500(R2000) UMIPS 5.6 6.2 VAX8530 VMS 4.2 4.8 SUN4/110 SUNOS 4.2 4.8 VAX3500 VMS 3.1 3.4 VAX11/780 VMS 1.3 1.4 SUN3/260(FPU-68881) SUNOS 1.3 1.4 /iVAXII VMS 0.9 1.0 ACP/MC68020(FPU-68881) LUNI 0.7 0.8 * This is the result from MIPS inc.

Table 2. HBOOK BENCHMARK

MACHINE OS TIME RATIO(fiVAXn=l) MIPS-M1000 UMIPS 1.07 11.03 VAX8700 VMS 1.8 6.56 VAX8530 VMS 2.5 4.72 VAX3500 VMS 3.6 3.28 VAX11/780 VMS 9.7 1.22 /iVAXII VMS 11.8 1.00

— 190 — ACP Benchmark Results And Projections (All Programs In FORTRAN)

REGONl Program Relative Performance

VAX Version 4 1.0 68020 ABSOFT .51 MIPS R2000 (16MHz) 7.9 MIPS R3000 (25MHz) 15.8 (Projection) MIPS R3000 (33MHz) 20.8 (Projection)

RECON2 Program Relative Performance

VAX Version 4 1.0 MIPS R2000 (16MHz) 6.4 MIPS R3000 (25MHz) 12.8 (Projection) MIPS R3000 (33MHz) 16.9 (Projection)

QCD Program Relative Performance

VAX Version 4 1.0 68020 ABSOFT .2 MIPS R2000 (16MHz) 7.4 MIPS R3000 (25MHz) 14.8 (Projection) MIPS R3000 (33MHz) 19.5 (Projection)

Table 3

— 191 — ETHERNET

5 -- cr. > CP U YBB T QBB C >• a

FAN

5 r

> BY I VRB T CP U

FAN

VME Crates

- ^ a> oo-

FAN

FAN

* VBBT VME Branch Bus Terminator * QBBC Qbus Branch Bus Controller * BVI Branch Bus VME Interface *VRM VME Resource Module •CPU MC68020/B8881(16MHz)

Figure 1 0?

— 192 — $ acp$mu whet Reading UPF file WHET.UPF UPP READER FINISHED. UPO FILE IS WHET.ACPTEMPORARYUPO Creating host macro file Assembling ACP host system file WHET.ACPMAR Creating host auxilliary fortran file Compiling ACP host system file WHET.ACPFOR Compiling host file WHET.FOR Creating host link file Linking user host program to file WHET.EXE Creating auxilliary node fortran file for class 1 Compiling auxilliary node fortran file for class 1 Attempting to start remote LUNI process on VAX 0 Remote MicroVax area being used is C"RL_V1_13 R_LUNI"::[.DATAO] 3: Program file complete: 4242 bytes COMPILATION SUCCEEDED FOR WHET.ACPFOR1 Compiling fortran file WHET.FORI for class 1 3: Program file complete: 2850 bytes COMPILATION SUCCEEDED FOR WHET.FORI Linking for class 1 to form WHET.EXE1 MEMORY NEEDED FOR SYSTEM AND EXECUTABLE IS 255272 BYTES Renaming temporary UPO file WHET.ACPTEMPORARYUPO to WHET.UPO; 1 Multicomp completed successfully $ run whet

Figure 2

— 193 — ACP/K3O0O

D-CACHE READ 1 WRITE I-CACHE 32KB BUFFERS 1 BUFFERS 32KB

SYSTEM CSR& CONTROL STARTUP I SY!3TE M 1 1JU S 1 MEM& 8MB PROM VME EXT CTL MEM[OR Y 25 6KB INTERF.

MEM& EXTERNAL VME BUS BUS

Figure 3

— 194 — IDT MIPS Performance Road Map

160'

60

I 20 —

10 —

Figure 4 MJM00044.00 Typical System

Figure 5

— 196 — ACP/R3000 Process Usetr Cod e ACP Support Routines UNIX System Calk /v 7N ~?F: \/_ J^L \/_ TCP NFS UDP -7FT

Internet ACP Messages 7^

JiUL JskL Direct Ethernet ACP Intemode Internet Interface Internet Interface Memory and Driver 7 ST transfers /*\ ACP \s_ N£_ ACP Message Driver

~~7R Direct /N Intemode Memory transfers HL. &- Incoming Message VBBC / VME Driver Interrupt 7TC VME Data transfers \/_ \I>L Ethernet VME Bus

Figure 6 More Sophisticated Reconstruction Topology

Data Flow

Figure 7

— 198- «4 O O o m-a- e a* e'­ o o m -a- o c-o* o < m© c-co ft! O f-^ T-* O C^vo • • CM-a- o r^-»?-rm CM fOO o-m NNO r*-m NrtO h-lH moo f-o O (M *4 OtO vo o> O O *HCO O VOCO wH C*-0 VO P- HIAO VOVO o-* ». *4 (HO vo m oa mi n **•& O VO-T oo »-i rto so m ON I H rtWO vo m I M -1 -t O vo^ O O I H *4 O O vO O 0*0 men • • CO O in eo t*-o IftN 301 o o mvo q mo mm a • -a- o in-a* E mo mm CM O mm «-i O m«i sa =• mo s 00 Ol-tf O -a- c% P K «M mo -4-CO (M (M O -tf E- (M *•< O -tfvO u« CM O O -* m • • *4 ONO -tf-* HCDO -* m »* r*-o -* m •P* vi*J) O -3"*H s. *-i mo -a- o iftftl TH-* O mo> o o fa *4 no moo • • HNO mi> taw *«t «•« O mvo on i-l O O mm OC-l tr m « mo mco (M O m P* n «4 O mvo -J m m m-» O m-a- 2u a9 m no m m B J CM m O m m CM *4 O m w «s<> moo m o u «4 OVO *-l ON , 2 *4CO O *4C0 -M r»o »H r» «*vo o *4V0

— 199 — EXAMFLB HO - 1

EXAMPLE OF A TABLE HBOOK ID • 30 DATE 01/03/89 NO = 3 CHANNELS 10 V 0 1 0 A 1 N 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 V B ABN « ABN OVE • OVE 30 31 32 33 34 35 36 37 38 39 40 4l 42 43 44 45 240 * 30 29 30 31 32 33 34 35 36 37 38 39 4o 41 42 43 44 235 * 29 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 230 • 28 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 225 * 27 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 220 • 26 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 215 • 25 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 210 • 24 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 205 • 23 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 200 * 22 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 195 • 21 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 190 • 20 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 185 * 19 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 180 • 18 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 175 • 17 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 170 • 16 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 165 * 15 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 160 • 14 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 155 • 13 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 150 * 12 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 145 • 11 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 140 • 10 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 135 * 9 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 130 • 8 7 B 9 10 11 12 13 14 15 16 17 18 19 20 21 22 125 * 7 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 120 • 6 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 115 • 5 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 110 • 4 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 105 • 3 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 100 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 95 * 1 UND • UND LOW-EDGE 10 1 1 1 1 1 1 1. 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 • II • ENTRIES - 600 PLOT 1 1 • SATURATION AT» 1023 I 105751 5025 • STATISTICS 1 1 • II

Figure 9 TOPAZ Data Pre-Processor

Ryosuke Itoh Physics Division (TOPAZ) KEK

Contents 1. Introduction 2. Data Preprocessor 3. Requirements 4. Design 5. Schedule / Conclusion

— 201 — * Parallel Proccessing 1. Introduction L*. a realistic solution Todgy'a typical gnqlvsla In HEP Resources

-200MEPS Event Event Event Event Event CPU's 1 2 3 4 S

C Exp.A ) 1) HEP analysis- -+• event by event basis I- Fit to parallel processing nature Terminals 2) Steep increase of microprocessor's perfomance Future experiments ( SSC.B-factorv.JLC ) For example, * event rate, data size > lOx (current exp.) R3000 -20 MIPS MC68030 ~ 5 MIPS We need >2000 MIPS main frames !! We would like to experience Unreallstlcl! "what is Multi-Processing" and accumulate "know-how's" for the future use. 2. TOPAZ data pre-processor 3. Requirements to the preprocessor Background: a) Requirements on Hardware DST production—> 50 % of CPU resource for 0) Use commertially available components TOPAZ (No new design/construction) i 1) directly connected to the data acquisiton Reduce to less than 1/2 system —•VMB 2) No mass storage for data (i.e. processed TOPAZ new data acquisition system data are directly transferred to FACOM) 3) CPU power of at least 30 MIPS is required. oCO u b) Requirements on Software 1) Program codes are fully compatible with Electronics FACOM codes in Fortran level. (TOPAZ Data Preprocessor) 2) A good Fortran Compiler is required. 3) Data handling using "Bank" system •—• Shared Memory System ( 4) TOPAZ software —*• modular structure micro-processors Software Pipeline on parallelI processors?? ? ) ' do a part of DST production ' online filtering 4. Design * Mechanism for software pipeline event driven structure New event Event Processor Collection of Event Processing Elements Start Processing Start Processing TTh \ Event Processors EPEl Get/Putdflfa^ Chain fBuflErt Event into/fttMi Bank • Cumulate EPE2 Processor Histogram • Chain (BuflD ) Histogram Server EPEn j ' i J _J DvZ End of processing DMA Shared VME/Channel l/F Data l/F Histogram Memory Memory Design of Queueing mechanism of "events" L from To U Essential Data FACOM Acquisition via System "channel" ON Connectto FACOM o c • IH n o I 00 Connect to DAS U O U ^ VO Get CPU's c •§ E

J! o CO o CO o o o gn o o u-i O o o 00 e T! •a o ca « 13 >» tS B B ft" >G0 O «S Jfl ! * 5 pc, O Q 6 5 pq g w COP4

— 205 — REPORT OP WORKSHOP ON TRIGGERING AND DAQ FOR EXPERIMENT AT SSC

Yoshihide Sakai National Laboratory for High Energy Physics Tsukuba, Ibarafci, Japan 305

ABSTRACT The workshop on "Triggering and Data Acquisition for Experiments at the Superconducting Super Collider (SSC)" was held on January 16 - 19, 1989 in Toronto, Canada. Some interesting topics discussed there and the summary of the workshop are briefly reported.

INTRODUCTION

Since the last workshop on Triggering and Data Acquisition system for the SSC experiments at Fermilab in 1985 , several workshops on the SSC experi­ ments have been made and provided more understanding and realistic picture on the detector aspects ~~ . Also the considerable progress has been made in tech­ nologies relating to the triggering and data acquisition aspects since then. With above background, the workshop on Triggering and Data Acquisition system for the SSC experiment was held on January 16 - 19,1989 in Toronto, Canada The intent of this workshop was to explore the nature and feasibility of trigger and data acquisition for SSC experiments. The total number of participants to work­ shop was about 80 and mainly came from U.S.A. and Europe. Three people attended from Japan (Watase-san, Arai-san, and myself from KEK). In the fol­ lowing, first the overview of SSC is described briefly both for accelerator and experiments, then some of the interesting topics and summary of workshop is reported.

OVERVIEW OF SSC

SSC Accelerator The SSC accelerator is the double ring proton - proton colliding machine using superconducting magnets with circumference of 83 kilo-meters. The ac­ celerator rings and tunnel are schematically shown in Fig. 1. The design value of beam energy is 20 TeV and it will provides proton - proton collision with center of energy of 40 TeV which is 20 times higher than presently operating

— 206 — accelerator. The ring has oval shape and 6 collision points will be provided for experiments. These collision areas are gathered into two groups (4 in one and 2 in other) and located on each straight part of the ring as shown in Fig. 1. The designed luminosity is 1033cm_2sec-1 with beam life time of about 24 hours. Each ring will be installed with 3,840 superconducting bending magnets and 678 focusing magnets. Bending magnets will be 17 meters long and have magnetic field over 6 Tesla. The final candidate place for SSC site has been recently decided from several offered places. It will be in Texas state and about 35 kilo-meters south from Dallas.

SSC detectors The detector designs for SSC experiments have been studied in series of workshops. Because of high interaction energy, general purpose detector will be naturally quite large in size. As an example, Fig. 2 shows so called 'Large Solenoid Detector' design which is one of the 4TT detectors for high pt physics. The detector components are vertex detector, Tracking chamber, EM/Hadron calorimeters, superconducting solenoid, iron return yoke, muon tracking cham­ bers, and forward detectors which are quite similar to present CDF detector at Fermilab TEVATRON collider. However, the size of the detector is 16 meters high, 16 meters wide, and 30 meters long just for central part and weighs about 1,3000 tons. This size is larger than CDF by about twice in each dimension. The detector is required to be finely segmented from physic side and also from high luminosity SSC environment. This generally results to very large number of readout channels. The summary of design parameters for 'Large Solenoid Detector' mentioned above are listed in Table 1= Approximately 1 million readout channels are necessary.

General Aspects of Trigger and DAQ at SSC The role of trigger and data acquisition system is select candidates of interest­ ing events from the bulk of uninteresting events and send all useful information of selected events from detector front-end electronics to permanent storage device (usually involves conversion of analog information to digital information). For SSC, the following difficulties are encounter; o Very large reduction factor in trigger is required; Total cross-section of proton - proton collision at SSC energy (100 mb) with designed luminosity of 1033cm-2sec-1 gives 108 interactions/sec. Since the rate for writing on permanent storage is limited to 1 to 10 Hz by computing power of off-line analysis etc., the reduction factor of 107 to 10 is necessary. o Interval of beam crossing is 16 nsec for SSC in contrast with a few micro-sec for existing hadron colliding machine. This interval is too short to make

— 207 — even first level trigger. In order to maintain efficiency, at least first level trigger must have pipe-lined structure with 16 nsec step. o Due to the large number of readout channels and high trigger rate at early stage, large band width for data transfer is required.

The general scheme of trigger system for SSC experiments has been discussed at Fermilab workshop and following scheme is commonly accepted; — 1st Level trigger: in this stage the trigger rate will be reduced from 108 interaction /sec to 105 - 104 Hz with the processing time of 1 - 10 /isec. The trigger logic circuit must be pipe-lined with 16 nsec beam crossing step and also all the detector information for each beam crossing must be kept till decision is made in this stage. — 2nd Level trigger: in this stage the trigger rate will be reduced to 103 - 102 Hz with the processing time of order of 1 msec. — 3nd Level trigger: in this stage all detector information for one event is gathered together into the event data structure by event builder and es­ sentially same information used by off-line analysis is available for trigger. The software event filter running in processor farm will reduce trigger rate to 10 - 1 Hz. The events selected by this stage will be stored on mass storage device for off-line analysis.

REPORT OF THE WORKSHOP

Organization of the Workshop The workshop was held for 4 days and it was organized as following; 16-Jan: Plenary Session; This session gave the overview of present status of related things to SSC triggering and DAQ from various aspects, recalled participants what to be worked out, and provided some direction for the workshop. 17,18-Jan: Subgroup Parallel Session; All participants were assigned to one of subgroups (described below) and made presentation and discussion concentrating on particular aspects. 19-Jan: Summary of Subgroup discussion; A chair person of each group reported the summary of discussion made in each subgroup session. The following subgroups were formed for parallel discussion in this workshop; 1. Detector Element Parameters: Only initial report in plenary session was made and no subgroup discussion was made in the workshop.

-208 — 2. Algorithms: 3. Front-end Signal and Trigger Processing: 4. Data Acquisition, Event Building, and On-line Processing: Three people from Japan joined to 3 different subgroups (Group 2: Sakai, Group 3: Aral, Group 4: Watase).

Plenary Session The following presentation was made in the plenary session; 1) Status of SSC; SSC Physics and Trigger Requirements: An introduction to the workshop was made by M.Gilchreise (SSC - CDG). The recent status of SSC was reported first. Then, the trigger requirements from several particular physics (such as Higgs search) were presented. The stress was made on the importance to have enough flexibility and redundancy for trigger system. 2) Initial Report from Group 1 - Detector Element Parameters: General parameters of SSC experiment (channel numbers, data size etc.) are reported to make a common base in the participants for following discussion in the workshop, (see introduction part of this report for numbers.) 3) Triggering and DAQ of SSC Tracking: Result of initial work on track reconstruction for Central Tracking Chamber was reported. The algorithm used was that first track segments in each super-layer was reconstructed and then entire tracks were reconstruct by combining track segments. It was shown that selecting only radial track segments cleans up the chamber hits drastically and gives possibility of quite fast track reconstruction. The importance of having super-layer structure was emphasized. 4) CDF DAQ System - to SSC ?: The specifications of CDF DAQ system at design time were compared with what have been actually achieved in recent CDF runs. The result is listed in Table 2 and one can see the reality is consider­ ably worse than specification in most items. Nevertheless, the point of the talk was that the system routinely took useful data with low dead time at twice the design luminosity. Also, various problems found in the CDF DAQ system (Fast- bus, Eront-end, Event building, Hardware failure etc.) were reported and some suggestions towards SSC system were made based on these CDF experiences. 5) Simulation of DAQ System at SSC: Since the DAQ system for SSC be­ comes large and complicated one, having the simulation of the system is quite important for designing the system without dead-lock or dead-time. As one of such attempts, initial work of simulation using a general purpose simulation lan­ guage GPSS was reported by Watase-san. Schematic diagram of DAQ system in GPSS is shown in Fig. 3. Several example of the simulation results were presented. 6) Progress on Micro Processors: The present status and future develop­ ment for micro-processors which would be used in the SSC was presented. RISC (Reduced Instruction Set Computer) technique to improve computing speed has been making progress recently. 20 VAX CPU is available presently and 100 VAX CPU will be available by 1991. Second generation ACP (Advanced Computer Processor) is now being developed using MIPS R3000 CPU (RISC architecture) for general purpose use. On the other hand DSP (Digital Signal Processor) is being developed for high speed special purpose processing and 100 MFLOPS is available now. The summary of trends of computing power for several types of processors are shown in Fig. 4. 7) Data Transmission via Fiber Optics: Data transmission from detector front-end to next stage electronics using fiber optics is quite attractive for SSC, because fiber optics enables high band width with small number of transfer lines for long distance. Recent developments and status for fiber optics technique and materials were presented. One of the problem in SSC application is radiation damage. The measurements of radiation damage were made for several materials and high OH fiber showed relatively high radiation resistivity. 
8) Project Management of Large System: Since SSC project will involve a huge system, the project management will become a quite important problem in order to proceed the project efficiently. Some general directions for this problem were presented.

Subgroup 2: Algorithm The experience of present hadron collider experiments - CDF, UA1, and UA2 were reported. Also the plan and status of DO and ZEUS experiments were reported. The ZEUS trigger system will be the first trial which is using piped-line technique for entire 1st level trigger (HERA accelerator has 96 nsec beam crossing interval and therefore they also need piped-line system) and will give quite useful information for SSC application in a few years. Another interesting topic was neutral computer. The computer which works with same principle as human brain has been developed recently. The advantages of this computer are in pattern recognition and self-learning aspects. Simple applications for SSC in track pattern recognition and event topology recognition (in simulation level) were reported, though they were too primitive for actual applications at present.

Subgroup 3: Front-end Recent developments on TVC (Time to Voltage Convertor) circuits, TMC (Time Measurement Cell, by Arai-san), and GaAs chips etc. which would be applicable for SSC front-end electronics were reported.

— 210 — The power density problem was discussed and the evaluation showed this would be a serious problem with present technology. Some more detail will be described next Arai-san's talk.

Subgroup 4: DAQ, Event Building This subgroup was further divided into several smaller subgroups and made extensive discussions on more detail. The subjects of the discussion covered the optical fiber data transmission, DAQ architectures, consideration of new bus system, permanent storage, and design methodology / Project management. One of the interesting topics was idea of application of barrel shifter to event builder. The function of event builder is gather data for same event from vari­ ous detector components and put them together into continuous buffer. Usually these event data will be sent to processor farm for further event filtering. The idea was doing these procedures using barrel shifter as shown in Fig. 5. The data from detector components go into barrel shifter in parallel and with suit­ able delay for each components they come out as event data in parallel and are sent to processor array of processor farm. This scheme enables very high band width event building in principle and might give possibility to eliminate 2nd level trigger. Still considerable R&D is needed for practical applications.

Summary In the workshop, many new developments and progresses which were related to trigger and data acquisition system were reported. Also, new experience from CDF experiment and developments for coming experiments such as ZEUS and DO were reported. Some of them are quite interesting and promising in the future but still need further R&D work to make applications to SSC experiments. On the other hand, many problems were also revealed in the workshop. Some of them were quite serious and should be worked out hard to solve them. In the workshop, various aspects were discussed more or less separately and this will be a step towards making overall framework with more detailed picture in the future. Author wishes to express his thanks to Prof. S.Mori for giving opportunity and support to participate the workshop, and also to Prof. Y.Watase and Dr. Y.Arai for their help and giving information in the workshop.

REFERENCES

1) Proceedings of the Workshop on Triggering, Data Acquisition and Computing for High Energy / High Luminosity Hadron-Hadron Colliders, Fermilab, November 1985.

2) Snowmass '86 Proceedings.
3) Experiments, Detectors, and Experimental Areas for the Supercollider, Berkeley, California, 1987.
4) Snowmass '88, High Energy Physics in the 1990's.

Table 1. Summary of design parameters of the Large Solenoid Detector

SOLENOID COIL
  Inner diameter                   8.3 meters
  Length                           16 meters
  Central field                    2 Tesla
  Weight (including flux return)   16,450 metric tons

CENTRAL TRACKING
  Inner radius                     0.40 meters
  Outer radius                     1.6 meters
  Number of superlayers            15
  Number of cells                  122,368
  |η| coverage

Table 2. Comparison of Specification and Reality for the CDF DAQ System

                                               Spec           Reality
Dead-time per Trigger                          ~1 msec        15 msec
Digitization Accuracy                          16 bits        ~13 bits
Readout Rate                                   100 ev./sec    8 ev./sec
Event Rate onto Tape                           ~1 Hz          1 Hz
Event Size                                     100 kB         180 kB
Event Rejection in Level 3 "Processor Farm"    100:1          ~4:1
Multi-user Capability                          ~16 users      ~1-3 users
DAQ System Up-time                             (none)         70%

Fig. 1. Schematic view of the SSC accelerator tunnel, showing the interaction halls, future interaction halls, and the "utility straight" sections of the proton rings.

Fig. 2. Schematic view of the Large Solenoid Detector, showing the vertex detector, forward tracking, forward muon toroids, and the low-beta quadrupoles.


Fig. 4. Trends of computing power versus year announced (1964-1994).

Fig. 5. Schematic diagram of the event processing by barrel shifter

Data Acquisition System at the SSC

Yasuo Arai
KEK, National Laboratory for High Energy Physics, Ibaraki 305, JAPAN

ABSTRACT
Several interesting topics from the workshop at Toronto, the requirements for the SSC data acquisition system, and present R&D activities for the SSC are presented. In particular, VLSI developments being carried out at KEK are described.

INTRODUCTION
The workshop on Triggering and Data Acquisition for Experiments at the Supercollider was held at Toronto, Canada, on January 16-19, 1989. An overall review of the workshop was given by Y. Sakai at this meeting. Here I describe some of the interesting topics from the viewpoint of new technologies.

The data acquisition system for the SSC experiments will be very different from present ones. Because of the large number of channels and the high data rate, much of the electronics must be implemented near the detector, and huge computing power is needed to filter the events. At the SSC we will not use standard modules/buses extensively at the front end; rather, we will develop customized LSIs for the detectors. In Fig. 1 several parameters are shown for both TRISTAN and the SSC. The interaction rates of the two machines are very different, and the beam-crossing interval is extremely short at the SSC. We must therefore elaborate the data acquisition and trigger systems, which need pipeline operation and high-speed circuits. In addition, the electronics inside the detectors must be radiation hard. The radiation from charged particles and gamma rays will be about 1 MRad/year at around 10 cm from the beam line, and a neutron fluence of about 10^13 n/cm^2/year comes from the beam and the calorimeters.

Fig. 2 shows a typical data-flow scheme at the SSC. The primary interaction rate is about 100 MHz. The first-level trigger circuit is expected to reduce the event rate by a factor of 1000, and it will take about 1 us to make a decision. The front-end electronics must hold the event until the first-level decision arrives; this requires pipeline operation in the trigger and data buffering in the front-end electronics. The actual scheme for the second-level trigger is still not clear, but it is expected to give another factor of 100 reduction of the trigger rate. This reduction is required to allow the analog signals to be digitized and the data size to be reduced. However, another scheme has been proposed which skips the second-level trigger: all the data after the first-level trigger are sent to the processor farm. It requires a very high speed (~100 GByte/sec) data transfer technique and a processor farm of huge computing power. This simplifies the total scheme very much but needs more careful study.
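
The numbers quoted above can be checked with a short back-of-the-envelope calculation. The script below is illustrative only; it uses the SSC parameters of Fig. 1 and the reduction factors mentioned in the text.

    crossing_rate_hz = 62.5e6     # SSC bunch-crossing frequency (Fig. 1)
    interaction_rate = 100e6      # primary interaction rate, ~100 MHz
    lvl1_reduction   = 1000       # first-level trigger rejection factor
    lvl1_latency_s   = 1e-6       # ~1 us first-level decision time
    lvl2_reduction   = 100        # optional second-level rejection factor
    event_size_bytes = 1e6        # ~1 MB/event (Fig. 1)

    pipeline_depth = lvl1_latency_s * crossing_rate_hz    # crossings held in the front end
    lvl1_rate      = interaction_rate / lvl1_reduction
    lvl2_rate      = lvl1_rate / lvl2_reduction

    print("front-end pipeline depth : ~%.0f crossings" % pipeline_depth)
    print("rate after level 1       : ~%.0f kHz" % (lvl1_rate / 1e3))
    print("bandwidth without level 2: ~%.0f GByte/sec" % (lvl1_rate * event_size_bytes / 1e9))
    print("rate after level 2       : ~%.0f kHz" % (lvl2_rate / 1e3))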

                        TRISTAN           SSC
Beam Crossing           200 kHz           62.5 MHz
Interaction Rate        0.01 Hz           100 MHz
No. of Channels         10^4              10^6
Data Size/Event         10 kB             1 MB
Radiation (e, γ)        -                 1 k - 1 MRad/year
          (n)           -                 ~10^13 n/cm^2/year

Fig. 1 Parameters relevant to the data acquisition system for TRISTAN and SSC.

The bandwidth of a data transfer line is expected to be 100 Mbit/sec to 1 Gbit/sec per line, so we need more than 1000 lines to reach a speed of ~10 GByte/sec. In addition to the bandwidth problem, event building is an important issue. To process events in the processor farm, the data from one event must be collected into a single buffer. This may be implemented by using dual-port memories or bus switches. As one kind of bus switch, a barrel shifter switch has been proposed by M. Bowden [1]. It switches the input lines sequentially, which simplifies the implementation.

In the processor farm we need 10^4 to 10^6 VAX equivalents of computing power. This may be realized, for example, by a processor farm using 200 MIPS chips, with 4 chips/board, 20 boards/crate, and 50 crates. Events are filtered in the farm and written to mass storage at ~10 Hz.
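
The two estimates above (the number of transfer lines and the farm computing power) follow from simple arithmetic; the sketch below reproduces them. The equivalence of one VAX to roughly 1 MIPS is my own assumption, not a figure from the text.

    link_speed_bps   = 100e6      # lower end of the quoted 100 Mbit/sec - 1 Gbit/sec range
    target_bandwidth = 10e9       # ~10 GByte/sec event-building bandwidth
    lines_needed = target_bandwidth / (link_speed_bps / 8.0)
    print("lines needed at 100 Mbit/sec: ~%.0f (before any protocol overhead)" % lines_needed)

    chip_mips, chips_per_board, boards_per_crate, crates = 200, 4, 20, 50
    farm_mips = chip_mips * chips_per_board * boards_per_crate * crates
    print("farm power: %d MIPS, i.e. roughly %.0e VAX equivalents" % (farm_mips, farm_mips))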

Fig. 2 Typical data-flow diagram at the SSC: detector elements (~1 M channels) feed first-level buffers/triggers, then (optionally) second-level buffers/triggers and ADCs, with transmitters inside the detector sending 1-100 GB/s through a bus switch or dual-port memory to receivers and a processor farm of 10^4-10^6 VAX equivalents, which writes events out at ~10 Hz.

HI-TECHNOLOGIES
To develop the data acquisition system for the SSC, we need many new technologies which should become available in the near future. Fig. 3 shows some of the technologies expected to be used in the data acquisition system at the SSC. Some of the interesting ones are commented on below.

Frontend   Preamp, Shaper, Discriminator, Analog Memory, TDC, ...  (Bipolar/CMOS/GaAs LSI's)
Trigger    Analog Sum, Pattern Recognition, ...  (Sum Amp, DSP, Transputer, Neuro Computer, ...)
Readout    Sequencer, Buffer  (Standard Cell, Gate Array, ...)
Transfer   Driver, Optical Fiber, ...  (Bus Switch, SuperBus, FDDI, ...)
CPU Farm   Processor  (ACP, TRON, RISC, CISC, DSP, ...)
Storage    8mm, DAT, Optical Disk, Optical Tape, ...

Fig. 3 Expected new technologies for the data acquisition system at the SSC.

Preamp/Shaper/Discriminator

A monolithic preamp/shaper/discriminator chip is being developed at KEK [2]. A 64-channel chip [3] is now being tested for use with Si strip detectors. The main features of this chip are listed in Table 1. The total power consumption of the preamp-shaper-discriminator chain is estimated to be 3.3 mW/ch. The process technology used for fabrication is the super self-aligned bipolar technology (SST) developed by the NTT LSI Laboratories. Another advantage of the SST process, in addition to its high current gain (120) and high cut-off frequency (fT = 17 GHz), is its radiation hardness (up to 10^6 rad and 10^14 neutrons/cm^2) [4]. The chip will therefore be able to work at the vertex detector.

Table 1 Characteristics of the preamp/shaper/discriminator chip.
* No. of Channels               64 ch
* Chip Size                     6.82 x 4.9 mm^2
* Shaping Time                  Tm = 15 nsec
* Reference Voltage Stability   ±0.1 mV/°C
* Comparator Input Voltage      16 mV / 25000 electrons
* Noise Value                   1000 electrons (5-10 pF)
* Power Consumption             3.3 mW/ch
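
To see why the power consumption in Table 1 matters for the power-density problem raised at the Toronto workshop, the short sketch below scales the 3.3 mW/ch figure to an assumed SSC-scale channel count of 10^6 (taken from Fig. 1). The channel count and the neglect of digital and driver power are assumptions of this sketch.

    power_per_channel_w = 3.3e-3      # from Table 1
    n_channels          = 1e6         # assumed SSC-scale channel count (Fig. 1)
    total_kw = power_per_channel_w * n_channels / 1e3
    print("analog front-end power for %.0e channels: ~%.1f kW" % (n_channels, total_kw))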

Time Memory Chip

A new CMOS TDC chip, the Time Memory Cell (TMC) [5-8], is now being developed for use in high-rate experiments. The chip has multi-hit capability and a built-in first-level buffer. Through the use of CMOS sub-micron VLSI lithography and a new circuit scheme, the TMC is a very low power device compared with a shift register. The high accuracy of the TMC is achieved with internal feedback circuits referenced to an external clock period. A block diagram of the TMC chip (TMC1004), which is being developed, is shown in Fig. 4. It contains 4 channels per chip and has 1024 bits/channel. The timing resolution will be 1 ns/bit.
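
A toy software model of such a channel is sketched below. The record/readout functions and the 100 ns readout window are invented for illustration and are not the actual TMC interface, but they show how 1024 bins of 1 ns give roughly 1 us of hit history per channel, comparable to the first-level trigger latency.

    BINS_PER_CHANNEL = 1024
    BIN_NS = 1.0                      # 1 ns/bit timing resolution

    def record_hits(hit_times_ns):
        """Write each hit into its 1 ns bin; hits in different bins are all kept (multi-hit)."""
        bins = [0] * BINS_PER_CHANNEL
        for t in hit_times_ns:
            b = int(t / BIN_NS)
            if 0 <= b < BINS_PER_CHANNEL:
                bins[b] = 1
        return bins

    def read_window(bins, trigger_time_ns, window_ns=100.0):
        """Read back only the part of the history inside a window ending at the trigger time."""
        lo = max(0, int((trigger_time_ns - window_ns) / BIN_NS))
        hi = min(BINS_PER_CHANNEL, int(trigger_time_ns / BIN_NS))
        return [i * BIN_NS for i in range(lo, hi) if bins[i]]

    history = record_hits([12.4, 250.0, 251.7, 999.0])    # ~1 us (1024 ns) of hit history
    print("hits near a trigger at t = 300 ns:", read_window(history, 300.0))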

Fig. 4 TMC1004 block diagram (timing inputs TIN0-TIN3, write/read clocks WCLK/RCLK, write/read start signals WSTART/RSTART, and latched data outputs DOUT0-DOUT3).

Neuro Computer
The neuro computer, or neural network, is one of the most exciting areas for computer scientists. We are still not sure of the usefulness of this kind of computer for high energy physics, but it may play an active part in track finding, pattern recognition, and so on. D. Cutts et al. [9] studied the ability of a neural network by using a simulation program on an IBM-PC. They showed two examples: electron/photon separation, and separation of Higgs photons from hadron showers. This indicates that neural networks can learn to recognize features of high energy physics data. It is still too early to judge the usefulness of the neuro computer at the SSC, but we need to continue exploring the possibility.
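
To make the self-learning idea concrete, the following minimal sketch trains a single-layer perceptron on invented shower-shape features. The data, features, and network are illustrative only and are not the network studied in ref. [9].

    import random
    random.seed(1)

    def toy_sample(is_electron):
        # feature 1: lateral shower width, feature 2: energy fraction in the first layers
        if is_electron:
            return [random.gauss(1.0, 0.3), random.gauss(0.8, 0.1)], 1
        return [random.gauss(3.0, 0.8), random.gauss(0.4, 0.15)], 0

    data = [toy_sample(i % 2 == 0) for i in range(400)]
    w, b, lr = [0.0, 0.0], 0.0, 0.05

    for _ in range(20):                         # a few training passes
        for x, target in data:
            out = 1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0
            err = target - out                  # perceptron learning rule
            w = [wi + lr*err*xi for wi, xi in zip(w, x)]
            b += lr * err

    correct = sum((1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0) == t for x, t in data)
    print(f"training accuracy on toy samples: {correct/len(data):.0%}")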

SuperBus

SuperBus, or the Scalable Coherent Interface (SCI), is a new standard now being developed under IEEE project P1596. It aims at a bandwidth of 1 GByte/sec/processor (N GByte/sec for N nodes). This is not really a bus but a set of unidirectional, point-to-point links which connect distributed processors. Many groups, including SLAC, CERN, Apple, NS, HP, Norsk Data, etc., are now working actively to establish the standard. The first draft of the specification, for preview and comments, will appear this July.
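
The scaling claim can be illustrated with a trivial calculation comparing point-to-point links with a conventional shared bus. The 0.1 GByte/sec shared-bus figure below is an assumption chosen only for contrast.

    def aggregate_bandwidth_gbs(nodes, per_link_gbs=1.0, shared_bus_gbs=0.1):
        """SCI-style links add 1 GByte/sec per node; a shared bus stays fixed."""
        return nodes * per_link_gbs, shared_bus_gbs

    for n in (4, 16, 64):
        sci, bus = aggregate_bandwidth_gbs(n)
        print("%3d nodes: point-to-point ~ %5.1f GByte/sec, shared bus ~ %.1f GByte/sec"
              % (n, sci, bus))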

SUMMARY
We need many new technologies to build a data acquisition system for the SSC. It is often said that electronics is a key issue for the SSC. We need to select the most appropriate technologies to build the system.

REFERENCES

[1] E. Barsotti, M. Bowden, and C. Swoboda, "SSC/BCD Data Acquisition System Proposal", Proceedings of the Workshop on Triggering and Data Acquisition for Experiments at the Supercollider, SSC-SR-1039.
[2] H. Ikeda et al., "Monolithic Preamplifier with Bipolar SST for Silicon Strip Readout", IEEE Trans. Nucl. Sci. Vol. 36, No. 1 (1989); KEK Preprint 88-71.
[3] H. Ikeda, "64 Channel Bipolar Amplifier for Silicon Strip Readout", Report for Workshop on Solid State Detector, Hiroshima University, Dec. 1988.
[4] N. Ujiie, talk at Workshop on Solid State Detector, Hiroshima University, Dec. 1988.
[5] Y. Arai and T. Ohsugi, "An Idea of Deadtimeless Readout System by using Time Memory Cell", Proceedings of the 1986 Summer Study on the Physics of the Superconducting Super Collider, p. 455; KEK Preprint 86-64 (1986).
[6] Y. Arai and T. Ohsugi, "TMC: A Low-Power Time to Digital Converter LSI", IEEE NS Symposium, Oct. 1987; KEK Preprint 87-113 (1987).
[7] Y. Arai and T. Baba, "A CMOS Time to Digital Converter VLSI for High-Energy Physics", 1988 Symposium on VLSI Circuits, Aug. 1988, Tokyo, IEEE CAT. No. 88, TH 0227-9, p. 121.
[8] Y. Arai, "Time Measurement System at the SSC", Proceedings of the Workshop on Triggering and Data Acquisition for Experiments at the Supercollider, SSC-SR-1039.
[9] C. Barter, D. Cutts et al., "Neural Networks, DO, and the SSC", Proceedings of the Workshop on Triggering and Data Acquisition for Experiments at the Supercollider, SSC-SR-1039.
