EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH
CERN DRDC/91-45

Applications of the Scalable Coherent Interface to Data Acquisition at LHC

A. Bogaerts¹, J. Buytaert², R. Divia, H. Müller¹, C. Parkman, P. Ponting
CERN, Geneva, Switzerland

B. Skaali, G. Midttun, D. Wormald, J. Wikne
University of Oslo, Physics Department, Norway

S. Falciano, F. Cesaroni
INFN Sezione di Roma and University of Rome, La Sapienza, Italy

V.I. Vinogradov
Institute of Nuclear Research, Academy of Sciences, Moscow, USSR

E.H. Kristiansen, B. Solberg³
Dolphin Server Technology A.S., Oslo, Norway

A. Guglielmi
Digital Equipment Corporation (DEC), Joint Project at CERN

F-H. Worm, J. Bovier
Creative Electronic Systems (CES), Geneva, Switzerland

C. Davis
Radstone Technology plc, Towcester, UK

¹ Joint spokesmen
² Fellow at CERN
³ Scientific associate at CERN, funded by the Norwegian Research Council for Science and Humanities

We propose to use the Scalable Coherent Interface (SCI) as a very high speed interconnect between LHC detector data buffers and farms of commercial trigger processors. Both the global 2nd and 3rd level triggers can be based on SCI as a reconfigurable and scalable system. SCI is a proposed IEEE standard which uses fast point-to-point links to provide computer bus-like services. It can connect a maximum of 65536 nodes (memories or processors), providing data transfer rates of up to 1 Gbyte/s per link. Scalable data acquisition systems can be built using either simple SCI rings or complex switches. The interconnections may be flat cables, coaxial cables, or optical fibers. SCI protocols have been entirely implemented in VLSI, resulting in a significant simplification of data acquisition software. Novel SCI features allow efficient implementation of both data-driven and processor-driven readout architectures.
In particular, a very efficient implementation of the 3rd level trigger can be achieved by combining SCI's distributed shared memory with the virtual memory of modern RISC processors. This approach avoids complex event building hardware and software. Collaboration with computer and VLSI manufacturers is foreseen to ensure the production of maintainable components. The proposed studies on SCI hardware and software will be carried out in collaboration with other LHC R&D projects to provide a basis for future standard SCI-based data acquisition systems.

Contents
1 LHC Detector Readout
1.1 Introduction
1.2 LHC Detectors
1.3 Size of the SCI Readout System
1.4 SCI Readout Node Implementation
1.5 Data Streams after 2nd Level Trigger
1.6 Interface to 2nd Level Trigger
2 Application of SCI to Global 2nd and 3rd Level Triggers
2.1 Status of the SCI standard
2.2 Impact of SCI on Data Acquisition Systems
2.3 Coherent Caching
2.4 Use of SCI for the 2nd Level Trigger
2.5 Use of SCI for the 3rd Level Trigger
2.6 Demonstration Systems
3 Research and Development Program
3.1 General Purpose SCI Interface
3.2 SCI Ringlet Test System
3.3 Direct SCI-Computer Interface
3.4 SCI Memory
3.5 SCI Bridges and Interfaces
3.6 SCI/VME Single Board Computer
3.7 Intelligent Data Controller
3.8 Diagnostics, using a Protocol Tracer
3.9 Software
3.10 Modelling and Simulation
4 Collaboration with industry
5 Budgets
6 Responsibilities
7 Timescales, Milestones

1 LHC Detector Readout

1.1 Introduction

According to ECFA studies [1], the event rate for a general purpose LHC detector operated at a luminosity of 2 × 10³⁴ cm⁻² s⁻¹ will not exceed 10⁵ Hz after the 1st level trigger.
The data volume generated by such a detector is estimated as:
Inner Tracking: 1 Mbyte per 15 ns bunch, 20 million channels
Calorimeter: 200 kbyte per 15 ns bunch, 200 000 channels
Muon Tracking: negligible amount from up to 10⁶ sparsely filled channels

A possible readout scheme for such a detector, according to the current understanding [1] [2], is illustrated in fig. 1. A first stage of data concentration after the 1st level trigger decision is implemented by electronics located close to the detector. Next, data are carried off the detector by point-to-point links, further concentrated in bus units and stored in data buffers. The data volume of each segment is sufficiently small that these buffers can be implemented using conventional backplane bus systems.

Event data are further filtered by a 2nd level trigger which is implemented in two stages. The first stage consists of local trigger processors, each of which has access to the data of only one detector segment. These produce reformatted, reduced events complemented by trigger data, which are stored in output buffers. The overall trigger decision is taken by global 2nd level trigger processors, which can correlate the pre-processed event and trigger data of the first stage. The event rate at the input stages of the local and global 2nd level trigger is estimated at ≈ 100 kHz, and the output rate after the global 2nd level trigger at ≈ 1 kHz. With an average of up to 1 ms for the global 2nd level decision, such a reduction could be obtained by a farm of 100 processors, since on average 100 kHz × 1 ms = 100 decisions are in progress at any time. Final event rejection is accomplished by a 3rd level trigger, based on complete event data, further reducing the rate to ≈ 100 Hz. Both the 3rd level trigger and the data logger can be implemented by a processor farm.
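The rate figures above can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes the global 2nd level farm size follows Little's law (average number of busy processors = input rate × mean decision time), treats "Mbyte" and "kbyte" as decimal units, and computes an upper bound on raw readout bandwidth as if every 1st level accept carried the full per-bunch volume quoted above, before any local 2nd level reduction. All constant and function names are hypothetical.

```python
"""Back-of-envelope arithmetic for the readout chain of section 1.1 (illustrative sketch)."""

# Rate estimates quoted in the text.
LEVEL1_ACCEPT_HZ = 100e3      # event rate after the 1st level trigger (about 10^5 Hz)
GLOBAL_L2_ACCEPT_HZ = 1e3     # output rate after the global 2nd level trigger (about 1 kHz)
LEVEL3_ACCEPT_HZ = 100.0      # output rate after the 3rd level trigger (about 100 Hz)
GLOBAL_L2_DECISION_S = 1e-3   # average global 2nd level decision time (up to 1 ms)

# Per-bunch data volumes quoted in the text; decimal units are an assumption.
INNER_TRACKING_BYTES = 1e6    # about 1 Mbyte per 15 ns bunch
CALORIMETER_BYTES = 200e3     # about 200 kbyte per 15 ns bunch


def farm_size(input_rate_hz: float, mean_decision_s: float) -> float:
    """Average number of busy processors by Little's law: N = rate * mean time."""
    return input_rate_hz * mean_decision_s


def rejection_factor(rate_in_hz: float, rate_out_hz: float) -> float:
    """Factor by which one trigger level reduces the event rate."""
    return rate_in_hz / rate_out_hz


def raw_bandwidth(bytes_per_event: float, rate_hz: float) -> float:
    """Upper bound in bytes/s if every accepted event were read out in full."""
    return bytes_per_event * rate_hz


if __name__ == "__main__":
    n = farm_size(LEVEL1_ACCEPT_HZ, GLOBAL_L2_DECISION_S)
    print(f"global 2nd level farm: ~{n:.0f} processors")
    print(f"global 2nd level rejection: {rejection_factor(LEVEL1_ACCEPT_HZ, GLOBAL_L2_ACCEPT_HZ):.0f}")
    print(f"3rd level rejection: {rejection_factor(GLOBAL_L2_ACCEPT_HZ, LEVEL3_ACCEPT_HZ):.0f}")
    for name, size in [("inner tracking", INNER_TRACKING_BYTES), ("calorimeter", CALORIMETER_BYTES)]:
        gbytes = raw_bandwidth(size, LEVEL1_ACCEPT_HZ) / 1e9
        print(f"{name}: <= {gbytes:.0f} Gbyte/s raw")
```

With these inputs the sketch gives a farm of about 100 processors, rejection factors of 100 at the global 2nd level and 10 at the 3rd level, and raw upper bounds of about 100 Gbyte/s (inner tracking) and 20 Gbyte/s (calorimeter).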
There are several reasons not to use buses for the readout system after the local 2nd level trigger [3]: the expected data rates exceed the capacity of existing buses, the required connectivity over large distances is very problematic, and event building methods based on buses [4] cannot be scaled to the size of an LHC system. We propose to implement the global 2nd level trigger, the 3rd level trigger and the data logger using a uniform SCI network. Copying of data is avoided: events are stored only in the (local) 2nd level output buffers, from where the data can be accessed by the trigger processors. The trigger processors and the data logger are all implemented as farms of commercially available computers. A significant simplification of both hardware and software can be obtained by using the processor's virtual memory hardware for implicit event building, whilst caches avoid repetitive data access.

Figure 1: LHC Detector Readout based on SCI. (Diagram labels include links from the detector electronics, concentrators, SCI nodes, the SCI interconnect, processor farms, and virtual memory in cache.)

1.2 LHC Detectors

The EAGLE proto-collaboration serves as a model for our application proposal. Leading candidate detectors include a silicon preshower tracker, a liquid argon calorimeter and an inner tracker. Amongst various candidates for inner tracking, the Silicon Strip [5] and Silicon Preshower [6] Detectors are considered here as examples. Roughly 20 × 10⁶ channels need to be processed and compacted for each. For both detectors a large concentration of input channels between the 1st and 2nd level triggers is foreseen.
Preshower Detector: for the Silicon Tracker Preshower (SITP) detector, 1st level data compaction and formatting take place within a VLSI chip. After multiplexing of the silicon-pad inputs at the VLSI level, data are further concentrated via 32 bit bus units, each containing 64 such chips. At a 1st level trigger rate of 50 kHz, the output rate per 64 pads is rather low, at 12.5 kbyte/s: a typical 32 bit bus can combine the outputs of 64 chips into one link, requiring at least 800 kbyte/s (64 × 12.5 kbyte/s) of data throughput to the 2nd level local trigger stage. This stage can be implemented using standard bus units, providing an additional concentration of channels of the order of 10. It will require fast processors and storage of compacted data, to be transferred to the further readout system. From 25 × 10⁶ silicon pads, fewer than 1000 output channels could be connected to an SCI-based readout system. All these SCI nodes need to be capable of transferring the bus-resident 2nd level data to an SCI memory.

Silicon Strip detector: the Si Strip Detector is parameterized for a 20-100 µs 2nd level decision time at a rate of 1 kHz. An analog pipeline feeds an Analog Pulse Shaper Processor whose output is kept in both analog and digital stores during the 2nd level decision time. 128 such channels are contained in one readout block, consisting of both analog and digital buses. Further concentration by a factor of 32 could be achieved via point-to-point links to data concentrating bus units. Local 2nd level trigger processors and buffers in the bus concentrators could be connected to an SCI system with fewer than 1000 nodes, as in the case of the SITP.

Calorimeters: an electromagnetic liquid argon spectrometer will probably provide 200,000 input channels, whilst a hadronic spectrometer (such as SPACAL or liquid argon) will have around 30,000 channels. After the 1st level decision, fewer than 1/4 of these channels have to be read out. Proposals to process, format and compact these data are based on VLSI and multi-chip wafer [17] techniques, which can integrate local processing and storage.
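The SITP channel counts can be cross-checked in the same back-of-envelope way. The Python sketch below assumes, as the 64 × 12.5 kbyte/s = 800 kbyte/s figure implies, that one VLSI chip serves 64 pads; it treats the local 2nd level "concentration of the order of 10" as an exact factor of 10 and rounds each stage up to whole units. For the calorimeters it takes 1/4 of the 230,000 quoted channels as an upper bound. All names are hypothetical.

```python
"""Channel-concentration arithmetic for section 1.2 (illustrative sketch)."""
import math

# SITP figures quoted in the text.
SITP_PADS = 25_000_000        # about 25 x 10^6 silicon pads
PADS_PER_CHIP = 64            # assumption: one VLSI chip multiplexes 64 pads
CHIPS_PER_BUS_UNIT = 64       # one 32 bit bus unit combines 64 chips into one link
CHIP_OUTPUT_BYTES_S = 12.5e3  # 12.5 kbyte/s per 64 pads at a 50 kHz 1st level rate
LOCAL_L2_CONCENTRATION = 10   # "of the order of 10", taken here as exactly 10

# Calorimeter figures quoted in the text.
EM_CHANNELS = 200_000
HADRONIC_CHANNELS = 30_000


def sitp_chain() -> tuple[int, int, int, float]:
    """Return (chips, bus units, SCI nodes, bytes/s per bus-unit link)."""
    chips = math.ceil(SITP_PADS / PADS_PER_CHIP)
    bus_units = math.ceil(chips / CHIPS_PER_BUS_UNIT)
    sci_nodes = math.ceil(bus_units / LOCAL_L2_CONCENTRATION)
    link_bytes_s = CHIPS_PER_BUS_UNIT * CHIP_OUTPUT_BYTES_S
    return chips, bus_units, sci_nodes, link_bytes_s


def calorimeter_readout_bound() -> int:
    """Upper bound on channels read out after the 1st level decision (1/4 of all)."""
    return (EM_CHANNELS + HADRONIC_CHANNELS) // 4


if __name__ == "__main__":
    chips, bus_units, sci_nodes, link = sitp_chain()
    print(f"SITP: {chips} chips -> {bus_units} bus units -> ~{sci_nodes} SCI nodes")
    print(f"per-link throughput: {link / 1e3:.0f} kbyte/s")
    print(f"calorimeter channels read out: < {calorimeter_readout_bound()}")
```

Under these assumptions the SITP chain ends at roughly 600 SCI nodes, consistent with the estimate of fewer than 1000 output channels, and the calorimeters read out fewer than 57,500 channels per event.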