RD24 Status Report 1994

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH

CERN/DRDC/94-23
9 May 1994

RD24 Status Report

Application of the Scalable Coherent Interface to Data Acquisition at LHC

A. Bogaerts1, R. Keyser, H. Müller1, G. Mugnai, P. Ponting, D. Samyn, P. Werner
CERN, Geneva, Switzerland

B. Skaali, E.H. Kristiansen2, H. Golparian, J. Wikne, B. Wu
University of Oslo, Department of Physics, Norway

S. Gjessing
University of Oslo, Department of Informatics, Norway

S. Falciano, F. Cesaroni, G. Medici
INFN Sezione di Roma and University of Rome, La Sapienza, Italy

P. Creti, M. Panareo
INFN Sezione di Lecce, Italy

A. Sytin, A. Ivanov, A. Ekimov
IHEP, Protvino, Russia

E. Sanchis-Peris, V. Gonzalez-Millan, J.M. Lopez-Amengual, A. Sebastia, J. Ferrer-Prieto
IFIC, Valencia, Spain

F.J. Wickens, D.R. Boterill, R.W. Hatley, J.L. Leake, R.P. Middleton
Rutherford Appleton Laboratory, Didcot, UK

R. Hughes-Jones, S. Kolya, R. Marshall, D. Mercer
University of Manchester, UK

K. Løchsen, S.E. Johansen, H. Kohmann, E. Rongved
Dolphin Interconnect Solutions A.S., Oslo, Norway

A. Guglielmi, A. Pastore
Digital Equipment Corporation (DEC), Joint Project at CERN

F-H. Worm, J. Bovier, A. Lounis
Creative Electronic Systems (CES), Geneva, Switzerland

R. Hon, D. North, G. Stone
Apple Computer, Inc., Cupertino, USA

E. Perea
Thomson-TCS Semiconducteurs Specifiques, Orsay, France

1. joint spokesmen


RD24 2nd Phase Activities and Milestones

Previous activities and milestones of the RD24 collaboration are reported in the DRDC reports with the same title, referenced as CERN/DRDC/91-45 (Proposal P33), CERN/DRDC/92-06 (Addendum to Proposal P33) and CERN/DRDC/93-20 (First RD24 Status Report, May 93). These documents and many others are available on the public ftp server rd24.cern.ch in the directories sci/RD24_Info/Status_92, Status_93 and Status_94.

Chapter 1. Motivation for SCI

The goals and motivation of the RD24 first phase remain unchanged. The Scalable Coherent Interface (SCI) IEEE standard [3], after demonstrating an impressive 500 Mbyte/s link operation (Figure 1) during RD24's first phase [1][2], now has low-cost CMOS chips and plug-in boards for SBus and VMEbus available. Technology independence, from low cost CMOS to high performance GaAs implementations all conforming to the same standard, now becomes visible. The options for caching and cache coherency, though not at present available, may be added in the future without affecting present investment in applications.

Figure 1: 1993: SCI packets at 500 Mbyte/s. The SCI link carries 16 data signals plus flag and clock; with a 4 ns clock, 16-bit data gives 1/2 GB/s. SCI packets consist of a header (64-bit address plus command), 0, 16, 64 or 256 bytes of data, and a CRC.

SCI has proven existence and now awaits implementations of further VLSI components and board level products to fill a range of compatible building blocks. These products will provide bus-like services between memories and processors over high speed, optimally terminated point-to-point links. SCI is not designed for wide area communication, where ATM will have its dominant role, nor for very low latency environments like a 1st level trigger, nor for massive channel electronics environments where packaging, power and cooling play an important role. In order to interface SCI to these boundary areas, a variety of general purpose and specialized bridges are needed, some of which, such as SCI-ATM and SCI-VME bridges, are soon expected to become commercially available. Special bridges to the ATLAS 2nd level DSPs and the CMS DPM readout units are being designed or planned in RD24. The most suitable area for SCI is to provide high speed interconnections between front end data acquisition units and event builders, and between event builders and computer farms. This also means simplification by introducing a uniform, scalable and standard system between the application (the physicist) and the data buffers after the 1st level trigger. Simplification is also due to the support SCI provides in transparently accessing memory over SCI: the application software is not concerned whether the memory is local or remote.


Another notable feature is the natural possibility of supporting bi-directional data streams within the same SCI system. The same chips support a data driven architecture to move data from a second level trigger processor as well as a DMA type broadcast to download streams of calibration constants from the processor farms. In addition to scaling, SCI is also adaptable to changes in data acquisition and trigger architectures. SCI bandwidth starts off with a potential of 125 MByte/s link rate, available today across SCSI-2 like cable assemblies or Gbit links. Performance enhancements by factors of 2 to 8 are expected soon for coming SCI components. SCI starts where buses become ineffective due to bandwidth limitations over long distances. SCI, however, is implemented over thin, cheap and flexible links (Fig 2), currently 18 signals for the parallel implementation.

Figure 2: SCI's thin signal path between nodechips. 16 data signals plus one clock and one flag make 18 differential pairs (Pseudo-ECL, i.e. ECL shifted by +5 V); 2 bytes at 62.5 MHz give 125 Mbyte/s over 50 ohm terminated cables with 50 pin connectors.

A ringlet connection between two SCI nodes requires, apart from the node chips, two SCI cable assemblies and termination resistors for one outgoing and one incoming connection. SCI chips with 4 ns clocking (1993) required very high quality cables, whereas the 16 ns clocking of the low cost CMOS chips allows the use of low priced and popular SCSI-2 cable assemblies. Such small and robust 50 pin connectors, proposed by RD24, have been adopted by the first European vendors. AMP, a world leader in cable assemblies, is considering producing such assemblies specialized for SCI. Future SCI chips will probably use the SCI LVDS [4] (Low Voltage Differential Signaling) standard running at a 2 ns clock rate. LVDS, due to its low current and low voltage swing, allows the chip manufacturers to put the link termination inside the link controllers. LVDS has been used by National Semiconductor for QuickRing chips at 300 MHz over 3 m cables.

Chapter 2. CERN RD24 Milestones and activity report

Several activities of RD24's 1st phase [2], in particular the design work on the SCI DMA node, have largely contributed to today's commercial European starter systems using the CMOS node chips from LSI Logic [5]. These chips became available only in early 1994 and then required

revision due to a problem in the bypass FiFo. RD24 had the first corrected chips available on SBUS and VME cards at CERN in May 1994, allowing us to test functionality and performance of a first two-ringlet system in a VMEbus based test setup in the ECP Division at CERN.

Figure 3: RD24's 1994 multi-ringlet SCI test setup at CERN. Two Sun/SPARC workstations with PT-SBS915 SBus interfaces, a FIC 8234 (OS-9) with the SCI 8224 DMA interface on VMEbus, a back-to-back SCI-SCI bridge (16 IDs passed to the far-side ringlet), 3 m SCI cables for single or multiple ringlet tests, an HP 1661 state analyzer and a Lecroy 9450 waveform analyzer.

Our previously reported tests [2] were performed in 1993 using a VME-SBUS connection and software downloaded into the RIO interface, which had a GaAs node chip connected to the R3000 bus. In 1994 the laboratory setup (Fig 3) was extended to include:
• Two SCI-SBus interfaces (Dolphin) with CMOS node chips, for single and multiple ringlet and node tests between two Sun workstations
• A prototype of an SCI-SCI bridge based on two CMOS node chips
• One FIC 8234 VMEbus processor equipped with the CES 8224 SCI-DMA card
• Two 1.2 Gbit/s Glink interfaces (Lasertron) with adapters to the Sun SCI connector, and an optical fiber SCI ringlet (not shown)
• Diagnostic instruments, power supplies and VME crates from the CERN pool
A second laboratory for software developments (OS9, CASCADE) has been equipped with two SUN workstations connected into an SCI ringlet.

2.1 SCI CMOS Node chip tests (CERN)

We measured latencies for all SCI subactions which make up a complete SCI transaction (Fig 4). Using a requester and responder on the same SCI node and a diagnostic software set from Dolphin [6] installed on a SPARC IPC workstation, we measured latencies for read, write and move transactions, both in a closed loop and between two node chips.


Figure 4: Timing measurement concept on SCI subactions. A request is sent at T=0; the request echo returns after ∆t1 (echo latency, about 5 ns/m of cable plus N bypass delays), the response arrives after ∆t2 (access latency) and the response echo after ∆t3 = ∆t1. ∆t1 sets the performance limit for responseless move transactions, ∆t2 the limit for transactions with only one outstanding request capability, and ∆t3 the limit for the responder.

We used an HP 1661 state analyzer which was connected via probes to both the input and output links of the node chip under test. The probe voltage levels were adapted to Pseudo-ECL switching levels, i.e. 3.7 V.

2.1.1 Viewing of packets and performance measurements

First packets (Fig. 5) from a CMOS SCI node chip on a two node ringlet show that data on the link are clocked at every transition of the clock (NCLK), whilst the CCLK of the Cbus application bus works at 1/4 speed. SCI packets such as request, echo or move can easily be identified by their Flag symbol length coding: size-4 for request/response packets and 3 clocks for echo packets. Sync packets (not shown) have a 1 clock Flag bit. The packet symbols contain in sequence the 16 bit target node ID, the SCI command and the source ID, followed by data symbols and a CRC check symbol.

2.1.2 Measurements

Listed in Table 1 are some basic timing parameters measured using the LSI Logic Nodechip L64601, clocked at a frequency of 62.5 MHz, corresponding to 125 MBytes/s bandwidth on the SCI ring. Tests at Dolphin show that selected chips may run at up to 90 MHz (see report by Dolphin). The latencies obtained for responses (T3) are only approximate here, as in the diagnostic packet mode (the only mode we had available) the responses are generated by software and their generation time can be very large. The faster data mode latency measurements are not available at the date of writing. Such latencies are expected to be of the order of hardware latencies of memory subsystems, i.e. estimated at approximately 3-10 us, resulting in transfer rates between 100 kByte/s and 21 MByte/s depending on the number of bytes transferred in a packet (between 1 and 64 for write or read).


Figure 5: First CMOS SCI packet footprints on the analyzer (request, response and echo packets; first symbol: SCI target ID, second symbol: SCI command, third symbol: SCI source ID, followed by data and CRC).

Table 1: Measured Node chip timing parameters

LSI CMOS Nodechip performance @ 62.5 MHz

  parameter                                   latency                       limit (a)
  ∆t1  bypass latency                         224 ns (b)                    -
  ∆t2  access latency                         2 * 800 ns + memory access    -
                                              time
  T1   limit for move                         2.2 us                        29 Mbyte/s
  T3   limit for complete response            3-10 us                       100 kByte/s - 21 Mbyte/s
       transactions on memory nodes

  a. user data transferred between 2 directly connected nodes, 3 m cables
  b. corresponding to 14 link clocks
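The quoted rate limits follow directly from the payload size divided by the limiting latency. The short C sketch below is our own illustration of that arithmetic (not RD24 or vendor software) and reproduces the values in Table 1:

    #include <stdio.h>

    /* Rough throughput limits derived from Table 1 (illustrative sketch only). */
    int main(void)
    {
        double t1_move_s = 2.2e-6;   /* echo-limited time per dmove64 packet */
        double t3_fast_s = 3.0e-6;   /* best-case response latency (3 us)    */
        double t3_slow_s = 10.0e-6;  /* worst-case response latency (10 us)  */

        /* Responseless move: 64 bytes of user data every 2.2 us -> ~29 Mbyte/s. */
        printf("move limit     : %.0f Mbyte/s\n", 64.0 / t1_move_s / 1e6);

        /* Response transactions: 1 byte per 10 us -> ~100 kByte/s,
           64 bytes per 3 us -> ~21 Mbyte/s.                                     */
        printf("response, 1 B  : %.0f kByte/s\n", 1.0  / t3_slow_s / 1e3);
        printf("response, 64 B : %.0f Mbyte/s\n", 64.0 / t3_fast_s / 1e6);
        return 0;
    }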

These parameters will be improved for future, enhanced node chips. The addition of more packet buffers will allow for more outstanding requests, i.e. another packet can be sent before receipt of a response. The bypass delay will decrease proportionally with the chip's clock rate. Also, the speed of the elasticity buffer could be improved in future chips. We expect the bypass latency of future SCI chips to be of the order of 100-150 ns. The latency from the link to the backend interface (currently Cbus) can be reduced by a factor of 2 by more pipelining, whilst the access latency of a typical user memory interface is probably constant and dominant. The limit for move transactions requires thought, as this performance-critical transaction for data moving is dominated by the return time of an echo packet, which in turn depends on the size of the ring and the bypass delays.
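As a rough illustration of this ring-size dependence (our own model, not a measurement: it uses the 5 ns/m cable delay from Figure 4 and the 224 ns bypass delay from Table 1, and assumes 3 m link segments; the measured 2.2 us per move in Table 1 additionally contains the requester's own Cbus overhead), the echo return time grows with the number of nodes roughly as follows:

    #include <stdio.h>

    /* Very rough sketch: growth of the echo return time with ringlet size. */
    int main(void)
    {
        const double bypass_ns = 224.0;  /* per traversed node (Table 1)          */
        const double cable_m   = 3.0;    /* assumed length of each link segment   */

        for (int nodes = 2; nodes <= 16; nodes *= 2) {
            /* Request plus echo together traverse the whole ring once:
               every cable segment and every bypass FIFO except the sender's own. */
            double echo_ns = nodes * cable_m * 5.0 + (nodes - 1) * bypass_ns;
            printf("%2d-node ringlet: echo returns after roughly %4.0f ns\n",
                   nodes, echo_ns);
        }
        return 0;
    }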

2.1.3 Back-to-back bridge functionality

A simple SCI-SCI bridge (Fig 6) has been built from two node chips connected back-to-back, using a special cable between the two Cbuses [21].

Figure 6: A ringlet bridge prototype (two node chips connected back-to-back over their Cbuses, each acting as requester-responder on its own ringlet).

A bridge is possible because the CMOS node chips from LSI Logic can be initialized to recognize a range of 16 SCI destination identifiers whose packets are passed to the Cbus instead of being retransmitted to the ring. The Cbus maintains the SCI packet structure. If two Cbuses are connected with request and response lines crossed, these packets re-appear on the far-side SCI link. On 6.4.94 we successfully transmitted, for the first time, SCI packets without error between two SPARC workstations over such an SCI-SCI bridge, using rsb, wsb and move transactions. The transfer latency for individual SCI packets through the bridge (node chips clocked at 62.5 MHz) was 1492 ns. We propose that a commercial back-to-back bridge with optimized transmission lines between the node chips (our cable was prone to noise) should be promoted quickly, to allow small DAQ architectures with a limited number of ringlets to be built in the near future. Larger DAQ architectures will need bridges with improved routing capabilities and lower latencies. Their functionality is being studied within RD24 for near term implementation (see report by Thomson and Oslo University).

2.2 Serial SCI test using 1.4 Gbit/s Glink chips (CERN, Dolphin)

The node chips from LSI Logic are equipped with a handshake interface for Hewlett Packard's 1.4 Gbit/s Glink chipset (receiver-transmitter pair). These chips can convert 16, 17, 20 or 21 bit parallel data into balanced, serial bit streams of up to 1300 Mbaud, or a 65 MHz I/O rate on a 16 bit wide interface. The parallel link speed of 62.5 MHz is well adapted to this rate. We have purchased two evaluation cards (Calliope cards from Lasertron [7]) which have the RxTx Giga chipset [20] (HDMP1000/1002 from Hewlett Packard) mounted together with a high speed 1300 nm single mode laser, capable of transmitting over more than 1 km. Via an adapter card made in the ECP Division, we equipped two SUN SPARCs via their SCI connector with the Calliope cards.


Figure 7: SCI over fiber test. The 16+1 bit parallel SCI link of the LSI Logic node chip (62.5 MHz) is converted by HP Gigalink chips (HDMP-1002/1004, used in 20 bit mode) on a Calliope card into a bit-serial stream over fiber, with a 1300 nm laser transmitter, a photodiode receiver and a loopback mode.

At the time of writing (7.4.94) we have managed to transmit error-free SCI packets only in loopback mode, using a coaxial cable between the Rx and Tx chips on each SCI node. Communication via two 4 m fiber segments (an SCI fiber ringlet) has not been successful so far, probably due to an initialisation problem which needs further investigation. In loopback mode (and using a 20 bit interface width for SCI data + clock + flag) we observe a transition time of 80 ns between an incoming and an outgoing SCI packet. The phase relation between the Flag and the clock signals remains unchanged. We believe that this is a proof of principle, and that commercial solutions for SCI over serial media (fiber optics or coaxial cables) should be promoted with high priority, because cable distances using parallel copper assemblies are limited to 10 m. Also, the large price drop of Glink chips over the last 12 months should make such implementations cost effective. Users at the CERN SCI meeting in April expressed the wish for SCI boards equipped with serial SCI.

Chapter 3. SCI for Accelerator Controls

The SL division of CERN intends to implement a small accelerator control application using SCI over optical fibre between the Prevessin Control Room and the LEP tunnel, a distance of several kilometers. The intent of the project is to evaluate SCI as a support for distributed shared memory and other inter-process communication, with the aim of producing a simpler control system architecture. This simplicity would result from a design based on a two level control system - operator consoles based on workstations, and equipment input/output controllers - requiring less diversity of hardware to be maintained. The above mentioned functionality of SCI would result in a simpler software structure, whereby an application would access the memory in the equipment IOC without being aware of the distributed nature of the control system. The SL division has standardised on VME but supports neither Sun workstations nor the associated SBus. Dolphin has given priority to the SCI/SBus bridge and does not expect to be able to deliver SCI/VME bridges before July. In preparation, the SL/CO group has designed and had constructed an interface, for use in VME crates, between an SCI/VME bridge and a


Calliope card that transmits and receives data over optical fibre at speeds up to 1.4 Gbit/s [18]. Software for the pilot project is being developed or acquired. Amongst that already available are a multi-source interrupt dispatcher for shared memory and the Self-describing Data Structure (SDS) package. The latter is a development of MOPS, already well known at CERN and further refined at LBL, adapted for distributed shared memory. Its self-describing attributes allow a transparent data conversion when necessary, due to the transfer of data between heterogeneous environments (big endian/little endian).

Chapter 4. SCI bridge to DSP architectures (CERN, IFIC Valencia)

A general approach to bridging SCI node chips to DSP architectures has been studied. We decided not to design for a particular DSP architecture, in order to keep the interface adaptable to coming architectures like the ADSP21060 [8]. Such independence is possible if the SCI packet assembly and disassembly is done in firmware. In addition, we use a key address concept to communicate efficiently between hardware and firmware. Clearly, this is a very specific, non-transparent bridge which requires firmware to share the CPU and DMA resources of a DSP. The bridge needs a speed domain decoupling between the cycle-based address/data bus of a given DSP and the packet oriented backend bus of an SCI node chip.

Figure 8: SCI-DSP bridge concept (the SCI node chip is connected via its 64-bit Cbus and bidirectional FIFOs, with key address decoding, to the 32-bit bus of a DSP such as the C40 with its global and local RAM).

We decided to use bidirectional FIFOs, following the design work for the RIO firmware interface from Phase 1, however optimised for performance and exploitation of DSP internal resources. A Cadence/Verilog design is in progress in the ECP Division by a technical student from IFIC, Valencia. His work should result in a prototype by September, to be tested on a C40 DSP

interface used in ATLAS (TIM modules). Further, we plan to apply such a bridge concept to a DSP for handling memory buffers. An application is targeted for the CMS dual port memory.

Chapter 5. CAE Design laboratory for SCI at IHEP Protvino, Russia (CERN, IHEP)

The experience with CAE design for SCI performed by a Russian engineer is excellent, and we want to enhance this possibility by enabling the IHEP group to participate locally within their Electronics and Automation Department (ca. 100 engineers). RD24 at CERN has helped in building up a CAE cluster, consisting of 3 Sun IPCs and one SPARCstation 10 with peripherals. Major resources were contributed by the SL and ECP divisions. The ESONE chairman was very helpful in establishing important contacts. With the very recent purchase of a Cadence Concept licence "Logic Workbench Designer/SE" together with the Verilog XL logic simulator, RD24 at IHEP now plans to start work on an enhancement of the current DMA design (Fig 9), which in its first form is used on the CES FIC 8234 board [9].


Figure 9: IHEP’s current SCI-DMA executing on Cbus

The IHEP group would like to convert the current design into an autonomous engine to be used with memory buses in front end units of DAQ systems. This requires access from SCI to the control block of the DMA engine in a further design stage. Also, a compression of the current discrete PAL logic into an FPGA (Xilinx or Actel) is desired, to allow its use as a small building block. We are still looking for sponsors to finance FPGA development tools (Xilinx or Actel) as well as memory extensions and peripherals such as printers for the IHEP cluster.

Chapter 6. SCI Software

SCI "simulates a bus but uses point-to-point links to achieve higher speed". It may be considered as a bus or a network. Like buses, it provides a shared memory environment with low latencies and no software involvement for data access. It offers the connectivity of a network, with reliable data transmission over optical fibres. Point-to-point links from the communications industry normally require drivers (and often client and server programs) for both sending and receiving data. In addition, buffer managers are used to de-synchronize the connection between bus based memories or processors and point-to-point links. Shared memory eliminates the need for drivers for data transport.


SCI supports two programming paradigms: message passing (as for point-to-point links) and shared memory (as for buses). Whereas shared memory simplifies the software, message passing is easier to implement in hardware. Message passing may be economical for front-end electronics, where software often plays a minor role. Shared memory becomes attractive for applications which are dominated by complex software, such as control and on-line event reconstruction and filtering (third level trigger). The current generation of hardware supports both paradigms. Message passing is still more efficient for the transfer of large blocks of data [11]. We expect that the efficiency of shared memory will be improved in future interfaces which can make use of burst data transfers over memory buses (generated by the replacement of cache lines) rather than the 4 or 8 byte data transfers of 32 or 64 bit wide I/O buses. Currently, message passing can be made more efficient because I/O buses such as SBus (SPARC) or VMEbus support burst mode for DMA data transfers.

6.1 SCI in a SPARC Workstation / Unix environment (Dolphin, CERN)

The simplest approach to SCI interfacing is Packet Mode, using software to build "Cbus packets" in an intermediate FIFO or dual port memory. It is used for diagnostics (CERN software), as it may generate any SCI packet type. Dolphin has a software package which emulates an SCI cache and memory controller using packet mode and real hardware for the actual data transfers. Both message passing and shared memory use a UNIX driver (/dev/sci) which has the standard UNIX entry points Open, Close, Read, Write, Ioctl and Mmap:
• Open and Close control the access to SCI (a sharable device).
• Read and Write provide DMA transfers (message passing).
• Mmap provides memory mapping of either buffers (in kernel space) for DMA data transfers or "remote SCI memory" (i.e. memory that is physically located on another node) for transparent access to shared memory.
• Ioctl is used for SBus board initialization, setting up connections to remote nodes, selection of the SCI transaction type (responseless dmove64 or write64 with response), diagnostics and ringlet configuration.
The use of message passing is illustrated in Figure 10. A typical application would look like this:

producer application on first node:

    unsigned int data[1024];

    fd = open("/dev/sci0", O_WRONLY);
    ioctl(fd, CONNECT, NodeId);
    write(fd, data, sizeof(data));
    close(fd);

consumer application on second node:

    unsigned int data[1024];

    fd = open("/dev/sci0", O_RDONLY);
    ioctl(fd, CONNECT, NodeId);
    read(fd, data, sizeof(data));
    close(fd);



Figure 10: Message passing. User applications on two nodes call Read, Write and Ioctl; SCI daemons and message queues in kernel space move the data over the SBus-to-SCI bridges and the SCI link.

Figure 11: Shared memory. Buffer memory allocated in kernel space on one node is mapped (Mmap) into user applications on remote nodes; the actual data transfers go directly over the SBus-to-SCI bridges without driver intervention.


In addition, Dolphin provides a TCP/IP network driver, as illustrated in Figure 12.

Figure 12: TCP/IP protocols. N user applications use the standard socket interface; the TCP/IP protocol implementation and message queues in kernel space run over the SBus-to-SCI bridges.

For transparent data access, the driver is only used to set up the shared memory and to define the connections between nodes, as illustrated in Figure 11. No software intervention is needed for the actual data transfers, as illustrated in the example below:

local buffer application:

    fd = open("/dev/sci0", O_LOCAL_MAP);
    ioctl(fd, CONNECT, NodeId);
    ptr = mmap(fd, 1024, ...);
    *ptr = 123;    /* transparent write operation */
    close(fd);

remote buffer application:

    fd = open("/dev/sci0", O_REMOTE_MAP);
    ioctl(fd, CONNECT, NodeId);
    ptr = mmap(fd, 1024, ...);
    /* the next statement will print the value 123 set above */
    printf("Data is : %d\n", *ptr);    /* transparent read operation */
    close(fd);


Chapter 7. The 8224 SCI Block Mover for VMEbus (CERN, IHEP and CES)

As described in [1], a dual port interface to the 68040 bus (Figure 13) was designed and successfully tested at 500 Mbyte/s link speed and 114 Mbyte/s DMA transfer rate. This design has been adapted to the CMOS nodechip and has become a commercial product for the 68040 based FIC processors from CES.

Figure 13: DMA interface to the FIC processor. A 16 Kbyte dual port memory (DPM), an address decoder, Cbus high/low address and byte count registers sit between the MC68040 system bus (040BUS<31:0>) and the Cbus (CBUS<63:0>) of the CMOS node chip; a "start block DMA" register launches the transfer.

The SCI 8225 from CES [9] combines the FIC 8234 (an MC68040 based VME board) with the SCI 8224 mezzanine board, which provides a simple connection between SCI and VME using a fast dual port memory. The software consists of a single-user DMA routine that allows data transfers between the local dual ported memory and remote SCI memories on other nodes. Its main use is moving data between VME based data acquisition hardware and processors for event building. The DMA uses a sequence of fast dmove64 packet protocols, with wsb (write selected byte) synchronization protocols at the start and end of 64 byte data boundaries. The 16 KByte deep dual port memory can also be addressed via SCI as a memory node, mapped into the 68040 address space. The DMA transfer performance achieved is consistent with the bandwidth limit quoted in Section 2.1.2.

Chapter 8. Apple's Macintosh interface and PowerPC project (ATG group, Apple)

The Advanced Technology Group of Apple Computer, Inc. demonstrated, during the September 93 SCI meeting at CERN, a research prototype SCI interface for the Apple Macintosh Quadra series of personal computers. This SCI interface used a GaAs chip from Dolphin and resided in the internal processor direct slot (PDS) of 68040-based systems, providing a transparent bus-bridge capability, mapping selected 68040 bus transactions to and from the appropriate SCI transactions. The PowerPC interface will provide increased SCI

functionality as compared to the Quadra interface.

8.0.1 Status of PowerPC-SCI Bridge

We are still in the design stage, incorporating feedback from the recent RD24 User's Group meeting at CERN in April 94. We are in the process of a technology investigation for choosing the target hardware medium for implementing the bridge. We intend the bridge to be a one chip solution with a RAM for address mapping. We will most likely use a fast turn-around gate array technology. We are currently mapping the 040-SCI bridge to that technology in order to understand it better. We target a working design for the fourth quarter of 94.

8.0.2 Feature Set

The bridge will be implemented as a transparent bus bridge with minimal or no software impact on applications or the operating system. It will map equivalent SCI transactions to/from the '601 bus operations as follows:
• '601 1-8 byte read/write maps to SCI read/write selected byte
• SCI rsb/wsb maps to a '601 sequence
• SCI read/write 64 bytes maps to a double 32 byte read/write sequence
• SCI fetch&add maps to or from a '601 read-modify-write transaction sequence
The bridge will support DMA using 64 byte SCI read and write transactions. The bridge will implement a simple, outgoing address translation mechanism to support many SCI nodes in a shared memory space model.

8.0.3 Implementation issues

The split transaction support on the '601 bus is imperative to avoid deadlock on outgoing SCI transactions. Cache line reads are implemented as 'critical hexlet first'. We map specific '601 lock mechanisms to SCI lock transactions. Multi-block DMA will be supported.

8.0.4 Future directions

The bridge should be adapted to future Apple PowerPC platforms to satisfy the bandwidth demand of real-time multimedia, graphics and sound. We need a backend interface standardisation on the node chips. The integration of SCI communication over optical fiber is desirable but requires cheap fiber interfaces for SCI. Bridge functions to PCI and ATM are needed. An application for Cluster Computing is planned at the University of Santa Clara.

Chapter 9. SCI/Turbochannel Interface (INFN Rome and Lecce)

The TURBOchannel/SCI interface has been designed to implement a bridge between TURBOchannel (TC) and SCI. TC is used by DEC in most workstations based on MIPS and Alpha AXP processors. The interface has to be implemented as a bidirectional bridge, allowing for both I/O operations (started by the system hosting the TC bus) and DMA operations (started by the SCI ring). The first prototype board connects a DECstation 5000/200 (running the ULTRIX operating system) to an SCI node. The board is based on an R3000 processor which implements the firmware and is logically divided in two parts: the SCI and TC environments are decoupled by the use of dual-port FIFOs. Two boards are designed which connect the TC to a specialized I/O card, hosted by the RIO card (RIO/SCI). The RIO/IO card acts as an SCI node

allowing TC to generate SCI transactions. To develop the RISC/TC logic, the RIO/IO card has been replaced by an R3052 evaluation board from IDT (7RS385), based on the IDT79R3052E processor, avoiding the use of VME for the test. On this board all the R3052 signals are available on a connector in a wire wrap area. Here, a 96 pin VME connector has been mounted to simulate the RIO/IO connector. The interface handles read/write operations of one word on both the RISC and TC sides. The board is unable to handle DMA operations. The FIFOs implemented on the board are from MOSEL (MS76542)1: four FIFOs for the RISC/SCI communication, and two further FIFOs are used on the RISC/TC side to handle the communication in the two directions (one for I/O operations, the other for DMA operations). Due to the dimensions of the board, it has not been possible to insert the interface directly into one of the three TC slots inside the DECstation. The connectivity has been realized through a TURBOchannel extender. A complete hardware simulation of the RISC/TC board has been carried out using the CADENCE/Verilog design environment and the Verilog Smart Model libraries. This allowed us to simulate the logic and to predict the performance of the board. In particular we predict the following data transfer rates:
• a) data transfer between R3000 and FIFOs: 25 Mbytes/s
• b) data transfer between DECstation and FIFO: read 12.5 MBytes/s, write 20 Mbytes/s
The value a) has been confirmed by measurement; b) will be verified as soon as the specialized driver required by ULTRIX becomes available. The development of the software is done in collaboration with DEC. The logic for the I/O operations has been designed and completed by INFN Rome and Lecce; the DMA part is under development at RAL.

9.0.1 FUTUREBUS+ SCI Bridge (INFN Rome and Lecce)

The project is considerably delayed because of the problems encountered with the FUTUREBUS+ hardware, as described in the RD24 Status Report CERN/DRDC 93-20 (5 May 1993). The crate had to be replaced and the modules needed repair. The delivery of the new material to INFN is now under way. A preliminary test of this new hardware was performed at CERN and the impression was good, especially for the new crate, which allows for easy testing of newly developed prototypes. We have two Profile-F FUTUREBUS+ Starter Kits consisting of a 14-slot powered crate, an arbiter module, an interface module based on the R3500 processor and a 16 Mbyte memory. We plan to develop a first design of the SCI/FUTUREBUS+ bridge based on the Nanotek NR3000-1 module. The module uses a connector to the R3500 bus, thus allowing a connection to logic similar to the RIO I/O card (SCI node). For a next prototype, we plan to make use of the experience gained with the development of the SCI/TC interface (namely the RISC/TC part). In order to replace the NR3000-1 board with a more specialized one, we are looking for a suitable FUTUREBUS+ chip set to implement the interface, and we plan to evaluate the R4000 processor (P-4000i evaluation board by Algorithmics).

Chapter 10. SCI to DEC processor and SCI-C40 Interface (RAL and Manchester University)

The RAL group has been working towards a Turbo-Channel to SCI interface, based on the earlier CERN design of a RIO-SCI interface together with the more recent Rome work for the compatible RIO to Turbo-Channel interface. The complete interface consists of 3 VME size

1. production has been stopped, to be replaced

boards: a standard CES RIO, a DOLPHIN CMOS SCI mezzanine board and a custom built surface-mount/wire-wrap board. We have just received delivery of the DOLPHIN SCI board, and the RIO has been used for some while now to test the software which needs to run on its R3000 to handle the SCI and TurboChannel protocols.

Figure 14: RAL and Manchester DSP interface. The SCI daughter card carries the CMOS NodeChip, request and response FIFOs (32-bit on the DBUS side, 64-bit on the Cbus side) and CSRs; it appears as memory to the host, and simple logic cards are being built to drive the DBUS from VME, PCI and the C40 global bus.

The custom board is almost finished, with only a few wire-wraps remaining, awaiting clarification on a few control logic changes required to combine the two original circuits (i.e. from CERN and Rome). Once these are done, the few remaining chips will be added and testing will begin at RAL. In parallel, work has started on understanding the current status of the DECstation TurboChannel driver and the changes required to install it as a driver on an Alpha workstation running OSF. The RAL group has also started work on a standard C40 DSP development system (a DBV42 from Loughborough Sound Images) to prepare software for a C40 to SCI interface, using a Manchester SCI daughter card attached to the C40 global bus via some intermediate control logic. We are also in communication with the Valencia group, so that their more general DSP-SCI interface gains from our experience.


10.0.1 C40-SCI Interface at Manchester (Manchester University)

The Manchester group has designed an SCI daughter board containing a CMOS SCI NodeChip, plus FIFOs and control logic, between the NodeChip and the DBeX32 expansion port of the LSI DBV42 card. The transition card (wire wrap) will have short cables (a few cm) to the two DBeX connectors on the DBV42. Most of the required interfacing can be achieved by modifying the PALs on the 'data bus' side of the daughtercard FIFOs. Packetisation is performed in software: basically the FIFOs and registers of the daughtercard are mapped into the DBeX space of the C40. Interrupts will be supported. The first of these daughter boards is now being assembled, and testing will then commence. The design of the daughterboard is such that it can easily be driven from other standard 32 bit buses via simple control logic. Three such control boards are planned, and are currently being designed or built. The first is a simple VME board to be driven from a CES FIC, the second connects to the global bus of a C40, and the third is for connection to the PCI bus of an Alpha evaluation board. As part of a test for a possible ATLAS second level trigger architecture, it is planned to use an SCI ring with all of these interfaces in a test environment at CERN during the latter part of 1994.

10.0.2 Time scales

The daughter cards are being commissioned now with our VME interface, although we still don't have production node chips. The C40 transition board is being built now. Provided we don't have too much trouble with the daughtercard, we expect to have the C40-SCI package ready mid June, although our test beam plans do not require delivery before mid July. We plan to test a three node ring using 'Manchester-only' hardware, and then to test at RAL, adding their Turbochannel interface to the ring (late July).

Figure 15: Block diagram of the CHI/SCI link (University of Oslo). Buffer FIFOs (36 bits x 512 bytes) with mailboxes connect the CERN Host Interface I/O bus to the Cbus of the SCI node chip; an AMD29200 embedded processor with fast PALs provides I/O bus and Cbus transaction control, DMA pointers and control/status registers.


Chapter 11. SCI to Fastbus Interface (University of Oslo, Physics Department)

The CHI/SCI link has been designed to provide a simple bridge between Fastbus, via the CERN Host Interface [22] (manufactured by Struck), and SCI. CHSCI is implemented as a daughter board which is plugged onto the CHI and connected to the I/O port of the module. The first version will use a CMOS NodeChip on a mezzanine card, which results in a triple width Fastbus module. The design has been done in collaboration with Struck. The link is a firmware driven FIFO interface, using the AMD29200 RISC processor card. The processor card contains 256K x 32 bits of DRAM, a DMA controller, an interrupt controller, 16 programmable I/O lines, and PIA and RS232 ports. The processor clock frequency is 16 MHz. The 36 bit wide FIFOs implement a 64 bit wide data path plus mailboxes. The CBus controller is a state machine implemented in 7.5 ns PALs. Fourteen different SCI packet types are recognized, among them selected byte read/write, noncoherent 64 byte read/write, and 64-byte move. The design has been done for the 1 GByte/s GaAs NodeChip; however, the CMOS chip will be used in the first version. Transfer of data to/from the CHI data memory is done via the I/O port. DMA transfers can take place between the CHI data memory and the FIFOs on the CHI side, and between the FIFOs and the 29200 processor using the DMA controller on the processor card. The software in the 29200 controls the data stream between the FIFOs on the CHI side and the SCI packet FIFOs connected to the CBus. The design of the CHI/SCI link is finished, and the PCB is in production. First tests are scheduled for June '94.

Chapter 12. SCI-Switch status (Thomson TCS)

Thomson TCS and Dolphin are the lead partners of the TOPSCI Eureka project EU 834, which aims to develop and produce SCI technology for interconnecting high performance computer systems and in particular to develop a high performance SCI switch. The technology chosen is GaAs. CERN and the University of Oslo (RD24) are associate partners, with terms defined in a collaboration agreement (K 211/ECP).

12.0.1 System Definition

System definition was achieved through a fruitful collaboration between Oslo University, CERN and Dolphin, with support from TCS concerning mostly the feasibility of the different approaches considered. Extensive simulations have been carried out by CERN and Oslo University to assess the performance of different architectures. The conclusion points to the convenience of using Dolphin's concept of the Link Controller (LC) with a B-Link as the back-to-back internal bus interconnection. The switch will be realized on a Silicon multi-layer Multi-Chip Module (MCM) substrate, with four GaAs chips and line termination resistors on a Silicon chip. The routing algorithm is yet to be defined. Table lookup and self-routing are being considered. The amount of memory to be included in the GaAs chips and the design effort will strongly depend on the trade-off made.

12.0.2 GaAs IC Specification and Design

Based on the results of the System Definition activities, the specification of the GaAs chips has started. This activity includes the estimation of the circuit complexities and performance for different architectural trade-offs. The target fabrication process is Vitesse's 0.6 micron H-GaAs III, and the design will be carried out using FX-200K Sea of Gates. IC design

is carried out by Dolphin with TCS' support.

12.0.3 MCM Interconnect Development

Alternative interconnection schemes have been evaluated, including micro-bumping, wire-bonding and TAB. Substrate evaluations have been carried out by TCS, including 4-layer interconnections with micro-bumping and wire-bonding of the chips. A test vehicle substrate has been designed and is currently in fabrication. Test vehicle GaAs and Silicon ICs, aimed at bump development and design rule extraction, have also been designed and are being fabricated.

12.0.4 Interconnection Modelling

Based on TCS' preliminary interconnection design rules, Dolphin (SINTEF) and TCS carried out complete electrical simulations of different interconnection schemes on the Silicon substrate. The effect of the substrate resistivity on attenuation, characteristic impedance and crosstalk was evaluated. The conclusion is that the ground plane will be routed on the MET1 layer, power could be routed on the MET2 layer, and chip to chip signal interconnections could be routed using the MET3 and MET4 layers. In all simulations, 50 ohm termination was assumed, since GaAs ECL I/Os require resistor terminations to Vtt = -2 V.

12.0.5 Cooling

Following evaluations carried out by Dolphin (SINTEF), the initial liquid Fluorocarbon cooling scheme, where the liquid was in direct contact with the GaAs chips, was replaced by a hermetic cooling tower on top of the ceramic package, thus avoiding potential reliability problems coming from liquid-semiconductor interactions. In the new scheme, the liquid is inside an exchanger, and the heat transport is assisted by its phase transition. Detailed thermal modeling of the proposed package is in progress.

Chapter 13. SCI Switches and Architecture Simulations (Univ. of Oslo, CERN)

As an associate partner in the TOPSCI EUREKA project, RD24 (CERN, University of Oslo) participates in the architectural design of a high performance SCI switch. The four ports are GaAs Link Controllers (18-bit parallel at 1 Gbyte/s) on a Silicon substrate, see Figure 16.

Figure 16: A "4-switch" routes traffic between 4 bi-directional SCI rings; each port is a GaAs Link Controller with a 1 GByte/s link.


Simulations have been carried out to investigate three alternative models for a switch architecture, using SCILab [10]. Of these, one is a crossbar switch, one uses an internal ring to connect four SCI switch ports, and one is based on an internal bi-directional bus.

13.0.1 Switch Model

An SCI 4-switch implements SCI protocols on its four ports, each having one SCI input and one SCI output link. The interconnection between the four ports may be a crossbar, a ring or a bus, as illustrated in Figure 17.

Figure 17: Block diagram of an SCI 4-switch. Four ports with SCI links are connected internally by (a) a crossbar, (b) a ring or (c) a bus; the back-to-back internal connections could be 16-bit or 64-bit wide.

13.0.1.1 Simulation of a 4-switch

Figure 18: Simulation parameters. Each SCI node and switch port is modelled with input, output and bypass FIFOs, an address decoder, a multiplexer and a routing table; parameters include the FIFO depth, the bypass delay (15 ns, including cable delay), the routing delay (15 ns), the switch delay (10 ns), the Cbus clock (8 ns, 1/4 of SCI speed), the response and request delays (50 ns), the packet type frequency and the SCI clock (2 ns).

To investigate the performance of a larger network, we have first simulated the behaviour of a single switch element, with each of the four SCI ports connected to an active node. We exclude the possibility of a node sending data to itself. The purpose is to measure the

performance of a saturated switch. The performance metrics of primary concern are throughput and latency. We therefore determine by simulation the raw throughput (indicating link utilisation), the net throughput (useful data going through the switch), the retry throughput (generated by overloading the system) and the latency. We vary the depth of the internal FIFOs and simulate the three switch models. The values of the other simulation parameters are given in Figure 18.
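As an illustration of how such metrics can be accumulated in a simulation, the C fragment below sketches one possible bookkeeping; the structure and field names are our own assumptions and do not reflect the actual SCILab implementation:

    #include <stdio.h>

    /* Hypothetical per-run counters; not the actual SCILab data structures. */
    struct switch_stats {
        double raw_bytes;     /* all symbols carried on the links (link utilisation) */
        double net_bytes;     /* payload of packets delivered through the switch     */
        double retry_bytes;   /* traffic generated by busy retries (overload)        */
        double latency_sum;   /* summed request-to-delivery time in ns               */
        long   delivered;     /* number of packets delivered                         */
    };

    /* Convert the raw counters into the figures quoted in Table 2 (GB/s and ns). */
    static void report(const struct switch_stats *s, double simtime_ns)
    {
        double seconds = simtime_ns * 1e-9;
        printf("raw %.2f  net %.2f  retry %.2f GB/s, mean latency %.0f ns\n",
               s->raw_bytes   / seconds / 1e9,
               s->net_bytes   / seconds / 1e9,
               s->retry_bytes / seconds / 1e9,
               s->latency_sum / s->delivered);
    }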

Table 2 shows the results (the number of internal FIFOs has been normalized to use the same amount of on-chip RAM for each architecture).

Table 2. Performance of a single 4-switch, dmove64, simtime = 100 us

  #FIFOs        CrossSwitch             Internal-bus            Internal-ring
                raw/net/ret/lat         raw/net/ret/lat         raw/net/ret/lat
                (GB/s) (ns)             (GB/s) (ns)             (GB/s) (ns)
  1-FIFO (a)    1.07/0.81/1.78/429      0.83/0.63/1.73/722      0.73/0.56/2.50/763
  3-FIFOs       2.77/2.11/1.02/461      0.93/0.71/3.24/2076     1.50/1.14/2.54/1142

  a. 1-FIFO or 3-FIFOs refer to both SCI nodes and switch ports. For the CrossSwitch and the Internal-ring based switch, 1-FIFO is just one output FIFO, but note that the ring based structure has 8 ports; for the Internal-bus based switch, 1-FIFO means one input FIFO and one output FIFO. The same holds for "3-FIFOs".

Table 3 compares the three models.

Table 3. Evaluation of the CrossSwitch model, Internal-bus switch model and Internal-ring switch model

  Switch Model                          CrossSwitch   Internal-bus (a)   Internal-ring
  Complexity                            o (b)         +                  o
  Throughput                            +             -                  -
  Latency                               +             -                  -
  Suitability for burst traffic         +             +                  o
  Suitability for random traffic        +             o                  o
  Performance scalability vs # ports    +             -                  -
  # FIFOs for a 4-switch                4x1xn (c)     4x2xn              4x2xn

  a. The Internal-bus runs at 1 Gbyte/s; faster and slower bus evaluations have also been done (ref: Several Details of Switches)
  b. +: good; o: average; -: poor
  c. kxmxn, where k is the number of ports, m is the number of FIFOs and n is the depth of the FIFOs

13.0.1.2 8Rx8R DAQ multistage system and its variations

Several topologies for an event builder based on an 8Rx8R multistage network, as illustrated in Figure 19, have been investigated [12][13][14][15]. Routing algorithms have been studied and a flexible method based on table look-up has been proposed [16][17].
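As a rough illustration of the table look-up approach (our own sketch; the array layout and port numbering are assumptions and not the proposal of [16][17]), each switch port could index a routing table with the 16-bit SCI target ID of an incoming packet to obtain the output port:

    #include <stdint.h>

    #define N_PORTS 4

    /* One routing table per 4-switch: maps a 16-bit SCI target ID to an output
       port (0..3). Loaded at configuration time, e.g. via SCI CSR accesses.    */
    static uint8_t route_table[1 << 16];

    /* Select the output port for an incoming request or response packet. */
    static inline unsigned route(uint16_t target_id)
    {
        return route_table[target_id] % N_PORTS;   /* defensive wrap, sketch only */
    }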


Figure 19: 8Rx8R DAQ multistage system, various topologies. Producers and consumers are connected through stages of 4-switches, or through combinations of 4-switches and 2-switches (bridges).

Chapter 14. SCI Interconnect Solutions from Dolphin

After a sales and marketing agreement in 1992 with LSI Logic for distribution of the CMOS NodeChip, Dolphin produced and demonstrated the first 125 Mbyte/s SCI systems using the CMOS chips. Dolphin is a research partner in the TOPSCI Eureka and OMI/HIC ESPRIT programs. The intense collaboration with RD24 at CERN has existed since 1992, based on the formal agreement K/180/ECP. As defined in this agreement, Dolphin provides RD24 partners with SCI technology and design tools and collaborates on common projects. After pioneering the development of the first SCI chips, Dolphin Interconnect Solutions today uses SCI and ATM as the baseline technology for its products. A publicly announced product is the 1 Gbit/s SBUS adapter board for high performance


Cluster Computing. This board is the first product that allows the development of distributed shared memory applications. The SCI protocols implemented on the SBus board act as SBus extensions; thus physically distributed SBus segments behave as one shared SBus. These protocols include hardware support for busy-retry, guaranteed data delivery, error checking and flow control. Applications include bridging to VME and ATM. The SBus-1 product includes driver software for shared memory application development and transparent TCP/IP support for running existing applications. An SCI Evaluation Kit enables developers to test SCI for workstation clustering, distributed shared memory and cache coherent interconnections.

14.0.1 First test results on the CMOS node chips, Friday March 18, 1994

• Two nodes, 40 MHz, 15 hours (about 12 million Cbus transactions): one node was a requester/responder, requesting data from its own memory over the SCI link through the other node; the other node was a requester only, reading the CSR space of the other node. Read/write selected byte requests ==> no errors!
• Five nodes, 50 MHz, about 1/2 hour: three SBus bridges, two in one SPARCstation and one in another; two VME bridges, one passive, just passing link traffic through its bypass FIFO, and one active, controlled by a VME processor. Test A: two SBus nodes requested data from their own memory, one VME node accessed the CSR space of one of the SBus nodes, creating a lot of link traffic. Test B: two SBus nodes requested data from one SBus node's memory, creating millions of retries ==> no errors!
• Two nodes, 50 MHz: all Cbus Exdev -> NodeChip commands tested; all Cbus NodeChip -> Exdev memory commands tested; these tests imply some coherence testing ==> no problems!
• One node, 62.5 MHz, 80 MHz and 90 MHz, loopback: a lot of tests run with no problems!
• Target ID nibble masking tested successfully (ignore the 4 MSBs and/or the 4 LSBs of the 8 LSBs of the target ID, as required for SCI-Cbus-SCI bridge architectures); see the sketch after this list.
• 1 and 3 meter SCI cables used, no problems. Longer cables not tested.
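The nibble masking mentioned in the list above can be pictured as in the following sketch (our own illustration, assuming a simple mask/compare scheme; this is not the documented register layout of the NodeChip):

    #include <stdint.h>

    /* Sketch of target ID nibble masking for an SCI-Cbus-SCI bridge: the bridge
       accepts a packet if the unmasked nibbles of the target ID match its own
       base ID. Ignoring the low nibble, for example, makes a block of 16
       consecutive node IDs match, as used by the back-to-back bridge.          */
    static int bridge_accepts(uint16_t target_id, uint16_t base_id, uint8_t nibble_mask)
    {
        /* nibble_mask: bit 0 set -> ignore the 4 LSBs, bit 1 set -> ignore bits 4-7;
           the upper byte of the ID is assumed always to be compared.              */
        uint16_t mask = 0xFF00;
        if (!(nibble_mask & 0x1)) mask |= 0x000F;
        if (!(nibble_mask & 0x2)) mask |= 0x00F0;
        return (target_id & mask) == (base_id & mask);
    }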

14.0.2 Prototype approval, April 8 1994

Dolphin Interconnect Solutions has approved the CMOS NodeChip Rev. B. LSI Logic is in the process of preparing for commercial availability. The Technical Manual is being polished and will be publicly available within a short time.

Chapter 15. OMI-HIC Project (Department of Physics, University of Oslo)

We have started to look at applications for SCI which could be of future interest in a DAQ system. This has resulted in several studies and developments, mostly based on research work done by graduate students, but also work related to the ESPRIT projects OMI/HIC and OMI/MACRAMÉ.


Based on the framework for making SCI interfaces with the NodeChip, dual-port RAMs and programmable logic, a series of interfaces will be made at the Department of Physics, University of Oslo. The boards will be controllable over SCI and will understand a subset of SCI transactions.

A graduate student started this winter to develop a video interface board connected to SCI. A video camera will be interfaced to the board, and the digitized video signals will be sent via SCI to a workstation connected to SCI. The video interface should be controllable from equipment connected to SCI. Another graduate student has started to develop a sensor interface with a DMA adapter for SCI. The work will show that SCI may be suitable for connecting front-end equipment and sending large amounts of data down to a more centralized part of the system for processing. The destination addresses for the raw data will be set up over SCI.

Two graduate students are looking at how OMI/HIC can be used to interface to SCI. Another graduate student has, since late 1992, been looking at how a bridge between SCI and OMI/HIC could be built. On the OMI/HIC side he assumes a BULLIT chip from BULL, but the rest of the design is a model in Verilog. At the moment the model is running and he has started to send SCI transactions from SCI to OMI/HIC. A further graduate student has, together with Dolphin, started an implementation of a bridge between SCI and OMI/HIC. The design will use both the NodeChip and the BULLIT chip, together with programmable logic and dual ported RAMs. This work started in February. We hope to have a first prototype to start testing in the autumn.

In the OMI/MACRAMÉ project, SINTEF will design and make a sensor module based on sensors generating high rate raw data. The module will be connected to OMI/HIC by using OMI/HIC interface chips and programmable logic (FPGA). The sensors could be radiation sensors with front-end electronics already developed at SINTEF. The sensor module will give an efficient and low power demonstrator module for OMI/HIC. The logic in the FPGA should decode the commands from SCI transactions and make the module controllable over the OMI/HIC links. The project should test whether OMI/HIC is usable as a high speed/high bandwidth interconnect with guaranteed delivery of data from sensors, with a minimum of electronics. The project will also make test tools to send SCI packets over OMI/HIC links. In both the OMI/HIC and the OMI/MACRAMÉ projects, SINTEF will, together with the University of Oslo, study and simulate possible configurations of OMI/HIC networks where the SCI protocols are used. This will be an activity until 1996.

Chapter 16. A VideoRAM Memory for SCI (University of Oslo, Physics Department)

The major objectives of this work at the University of Oslo have been a) to design a memory controller system for SCI by employing the existing interface technology from Dolphin Interconnect Solutions, and b) to employ VRAM for the main memory in order to increase the bandwidth and to provide memory access times suited to the high rate of data transfer offered by SCI.

A specialized machine approach (Figure 20) has been adopted to design the SCI memory controller. The architecture of the system allows further modifications in order to make the system capable of performing 64-byte write and read operations simultaneously (once the next generation of SCI interface chips supports such transaction pipelining). To make this possible,

either read or write operations must be performed on the DRAM port of the VRAM array. Write operations have been selected to be performed through fast page mode cycles on the DRAM port. Through a bottom-up approach, the different modules of the system have been modeled and linked to work together. All the implemented state machines have been simulated, and the entire system has been simulated repeatedly by applying worst case simulation parameters.

The major features of the design are the following:
• Interfacing with both the GaAs and the CMOS versions of the NodeChip™
• Direct interface to Cbus [21]
• Capable of performing the whole set of request commands defined in the Cbus protocol
• Supporting the coherency part of the standard
• VRAM for the main memory storage
• High-speed SRAM for the tag directory memory
• Burst-mode access cycles with two (for 16-byte operations) or eight (for 64-byte operations) consecutive write cycles for performing write operations on the main memory
• Read transfer cycles followed by shifting cycles for performing read operations on the main memory
• 64-bit ALU for performing lock operations
• Employing a programmable VRAM controller in order to guarantee critical VRAM access timing parameters and automatic refresh
• At a 25 MHz operating frequency, the system is capable of providing a peak bandwidth of 200 Mbyte/s
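As a simple cross-check of the last figure, and assuming one 64-bit Cbus data word is transferred per 25 MHz clock cycle (an assumption made here only for the arithmetic): 25 MHz × 8 bytes/cycle = 200 Mbyte/s.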

The design of the PCB layout for the circuit is currently in progress.

Figure 20: The specialized machine of the VideoRAM (block diagram showing the Cbus input FIFO, control logic, address interface and read/write machines, SRAM tag array, error detector module, VRAM array with its controller, and the mask & ALU module)


References

[1] H. Müller, A. Bogaerts, J. Buytaert, R. Divia, A. Ivanov, R. Keyser, F. Lozano-Alemany, G. Mugnai, D. Samyn, B. Skaali, "First Experience with the Scalable Coherent Interface", RT93, Vancouver, June 8-11, IEEE Trans., Vol. 41, No. 1, Feb. 1994, or preprint CERN/ECP 93-15
[2] Application of the Scalable Coherent Interface for Data Acquisition at LHC, RD24 Collaboration, CERN/DRDC 93-20, Status Report May 1993; a postscript copy is on the public server rd24.cern.ch in sci/RD24_Info/Status_93
[3] IEEE Standard for Scalable Coherent Interface (SCI), IEEE Computer Society, IEEE Std 1596-1992. Available in Europe via Infonorme, London, Tel 03 44 23377, Fax 03 44 291194
[4] IEEE Standard for Low-Voltage Differential Signals for SCI (LVDS), Draft 1.00, IEEE Std 1596.3-1994; a postscript copy is on the public server rd24.cern.ch in sci/P1596.3
[5] LSI Logic GmbH, European Headquarters, München, Germany, Tel: 49 89 45836-212, Fax: 49 89 45836-219
[6] SBUS-to-SCI Adapter User's Guide, DI950-10211, Version 1.0, Dolphin Interconnect Solutions A.S., Oslo, Phone: 47 22 62 7000, Fax: 47 22 62 71 80
[7] Lasertron, Inc., Burlington, MA 01803, USA, Tel: 617.272.6462, Fax: 617.273.2694
[8] ADSP-21060 SHARC, Super Harvard Architecture Computer, Oct. 1993. For current information contact Analog Devices at (617) 461-3881
[9] SCI 8224, Scalable Coherent Interface for FIC 8234, User's Manual, Creative Electronic Systems S.A., P.O. Box 107, CH-1213 Petit Lancy 1, Switzerland, Fax: +41 22 795 57 48
[10] A. Bogaerts and Bin Wu, "The SCILab Cook Book", CERN Internal Note, July 1993
[11] A. Bogaerts, R. Keyser, G. Mugnai, H. Müller, P. Werner, B. Wu, B. Skaali, "SCI Data Acquisition Systems: Doing more with less", April 1994, submitted to CHEP 94, San Francisco
[12] Bin Wu, Andre Bogaerts, Roberto Divia, Ernst Kristiansen, Hans Muller, Bernhard Skaali, "Constructing Large Scale SCI-based Processing Systems by Switch Elements", Technical Report UIO/PHYS/93-12, University of Oslo, Norway, May 1993
[13] Bin Wu and Andre Bogaerts, "Several Details of SCI Switch Models", Univ. of Oslo/CERN Internal Report, Version 0.5, Nov. 15, 1993
[14] B. Wu, A. Bogaerts, R. Divia, E. Kristiansen, H. Muller, E. Perea, B. Skaali, "Distributed SCI-based Data Acquisition Systems constructed from SCI bridges and SCI switches", The 10th Int'l Symp. on Problems of Modular Information Systems and Networks, St. Petersburg, Russia, Sept. 13-18, 1993; also as Technical Report UIO/PHYS/94-02, University of Oslo, Norway, Jan. 1994
[15] Bin Wu, Andre Bogaerts, Ernst Kristiansen, Hans Muller, Ernesto Perea, Bernhard Skaali, "Applications of the Scalable Coherent Interface in Multistage Networks", submitted to IEEE TENCON'94, "Frontiers of Computer Technology", Aug. 22-26, 1994, Singapore
[16] B. Wu, A. Bogaerts, E. Kristiansen, B. Skaali, "A Study of Routing Algorithms for SCI-based Multistage Networks", Technical Report UiO/PHYS/94-06, University of Oslo, Norway, March 1994
[17] B. Wu, "Initialization of nodeIds and routing tables in an SCI system", Internal Note, March 13, 1994
[18] R. Keyser and Giuseppe Mugnai, "SCI for Accelerator Controls: Status Report", SL/Note 94-32 (CO)
[19] Applications of the Scalable Coherent Interface to Data Acquisition in LHC, CERN/DRDC/91-45
[20] HDMP-1000 Tx/Rx Pair, Technical Data Sheet, Hewlett Packard
[21] Cbus Specification V 2.0, August 1992, Dolphin Interconnect Solutions, P.O. Box 52, Bogerud, N-0621 Oslo, Norway (on anonymous ftp server rd24.cern.ch)
[22] The CHI, a new Fastbus Interface and Processor, H. Müller et al., IEEE Trans. on Nucl. Sci., Vol. 37, No. 2, April 1990


INDEX

Motivation for SCI 2
CERN RD24 Milestones and activity report 3
Viewing of packets and performance measurements 5
Measurements 5
Back-to-back bridge functionality 7
SCI for Accelerator Controls 8
SCI bridge to DSP architectures (CERN, IFIC Valencia) 9
CAE Design laboratory for SCI at IHEP Protvino Russia (CERN, IHEP) 10
SCI Software 10
The 8224 SCI Block Mover for VMEbus (CERN, IHEP and CES) 14
Apple's Macintosh interface and PowerPC project (ATG group, Apple) 14
Status of PowerPC-SCI Bridge 15
Feature Set 15
Implementation issues 15
Future directions 15
SCI/Turbochannel Interface (INFN Rome and Lecce) 15
Futurebus+ SCI Bridge (INFN Rome and Lecce) 16
SCI to DEC processor and SCI-C40 Interface (RAL and Manchester University) 16
C40-SCI Interface at Manchester (Manchester University) 18
Time scales 18
SCI to Fastbus Interface (University of OSLO, Physics Department) 19
SCI-Switch status (Thomson TCS) 19
System Definition 19
GaAs IC Specification and Design 19
MCM Interconnect Development 20
Interconnection Modelling 20
Cooling 20
SCI Switches and Architecture Simulations (Univ. OSLO, CERN) 20
Switch Model 21
SCI Interconnect Solutions from Dolphin 23
First test results news on CMOS node chips, Friday March 18, 1994 24
Prototype approval 24
OMI-HIC Project (Department of Physics, University of Oslo) 24
A VideoRAM Memory for SCI (University of OSLO, Physics Department) 25
