Scalable Coherent Interface and LHC: a good marriage?

R. Divià, A. Bogaerts, J. Buytaert, H. Muller, C. Parkman, P. Ponting, D. Samyn
DRDC RD24 project, CERN, Geneva, Switzerland

The Scalable Coherent Interface (SCI) standard opens new possibilities to large-scale data acquisition systems in HEP. Capable of 1 GByte/s bandwidth per connection, SCI systems can be scaled from small test systems up to very large networks of up to 65535 nodes. The RD24 project is evaluating the possible use of SCI at LHC, where data streams in excess of several 100 MB/s are expected in the event building stage, with special attention to the data moving protocols and the use of caches. Some first ideas on software for scalable SCI systems are presented, with particular focus on the real-time requirements of LHC and the needs of its DAQ system.

Access methods for High Energy Physics busses and links

High Energy Physics data acquisition systems (DAQs) have several interfacing requirements, such as speed, flexibility and reliability, for different uses: fast point-to-point data transfers, broadcast of calibration and system parameters, and slow control messages. Several links or busses are used, with different access methods. The most popular are system libraries and memory-mapped access.

System libraries (also referred to as DMA libraries) were the first approach, mainly used to interface host computers. CAMAC [1], FASTBUS [2] and HIPPI are normally accessed via system calls. Libraries are characterized by long initial latencies followed by fast transfers. They need strong inter-process cooperation at the level of operating system and device control, and must be able to handle system-specific parameters such as virtual memory, interrupts and process management.

Memory-mapped access is the more recent of the two methods. Here the external bus is seen as part of the processor address range, where "windows" are allocated to different access modes and to different target modules. Typical examples of this approach are VMEbus [3], Futurebus+ [4] and SCI [5]. This approach offers low latencies per single transfer but is often inefficient for large blocks of data; often the main processor is assisted by a DMA device to reduce these latencies and keep the CPU free. Being independent of the operating system (except for some initial setup), this approach reduces the software complexity and achieves a high degree of portability. It needs neither run-time libraries nor inter-process cooperation. The I/O bus protocol is hidden in the controller's silicon, together with all the architecture-specific parameters.

One of the novelties appearing in the next generation of DAQ busses is data coherency, where multiple copies of the same data are distributed on remote processors. Coherency, which goes together with caching, is today also supported on external busses such as Futurebus+ and SCI. The number of accesses to the bus is reduced, and sharing of data between remote processors is achieved at no cost for the system software designer. In past implementations this introduced a lack of scalability and robustness; SCI offers new solutions to these problems via an elegant and simple distributed directory scheme.
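As an illustration of the memory-mapped model described above, the following minimal C sketch maps a window of an external bus into a process address space and touches a remote register with plain loads and stores. The device node, window address and register layout are invented for the example and do not correspond to any particular VMEbus or SCI interface.

    /*
     * Minimal sketch of memory-mapped bus access (illustration only).
     * The device node, window base and register offsets are assumptions.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define WINDOW_BASE  0x08000000UL   /* bus address of the remote module (assumed) */
    #define WINDOW_SIZE  0x1000         /* one page mapped into the local address space */

    int main(void)
    {
        /* Open the (hypothetical) device that exposes the external bus. */
        int fd = open("/dev/bus_window", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        /* Map a window of the external bus into the processor address range. */
        volatile uint32_t *win = mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, WINDOW_BASE);
        if (win == MAP_FAILED) { perror("mmap"); return 1; }

        /* Remote registers are now plain memory: no run-time library and
         * no system call per transfer, only a load or a store. */
        win[0] = 0xCAFE;                 /* single-word write to the remote module */
        uint32_t status = win[1];        /* single-word read back */
        printf("status = 0x%08x\n", status);

        munmap((void *)win, WINDOW_SIZE);
        close(fd);
        return 0;
    }

The contrast with the library approach is the absence of a per-transfer call: once the window is set up, each access costs one processor cycle on the bus, which is why DMA assistance is still useful for large blocks.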
A look at SCI

SCI is an IEEE standard [5] which uses fast point-to-point links to provide computer-bus-like services for high-performance, highly parallel multiprocessor systems. SCI can connect up to 65535 nodes in a flexible and reconfigurable fashion, with expected data rates of up to one gigabyte per second over short distances and one gigabit per second over serial copper and optical links. SCI is scalable: adding new nodes increases the capacity of the network. SCI is coherent: remote processors can efficiently share data at a low bandwidth penalty, and the coherency scheme scales well with the number of processors, keeping latencies low.

SCI uses 64-bit physical addresses, where the upper 16 bits specify the node number. This simplifies routing, but yields a sparsely populated physical address space. Special services are available for inter-processor synchronization and cache handling. A 64-bit real-time clock is kept in phase across the whole network.

Most of the work is performed by the SCI node chip. It responds to processor cycles within a predefined window by translating them into SCI packets. If coherency is required, an extra cache memory controller is needed. Remote processors, memories and I/O modules are seen from the CPU as an extension of the local address space. Virtual memory can be used to achieve a contiguous and uniform I/O stream over a sparse data environment. Since all SCI nodes have simultaneous master (requester) and slave (responder) capabilities, the SCI system designer can use both processor-driven and data-driven approaches. Higher performance can be achieved by pipelining multiple requests on a single requester.

SCI as LHC interconnect

Both the second and third level triggers of the LHC experiments can make use of SCI for event and trigger data, calibration parameters and slow control. Uniform SCI networks can interconnect complete experiments, including numerous I/O devices, memories and processors. The expected availability of cheap node chips will make SCI flexible and portable to different devices. The SCI protocols have been designed for highly parallel multiprocessor systems and are therefore well suited to the distributed LHC DAQ architecture. Experiment control, calibration data, even DAQ software, normally distributed and updated at run time to all processors, can now trivially be shared and kept coherent via SCI.

As an extension of the basic SCI protocol, packets destined to the same node can be "combined", reducing contention and optimizing concurrent accesses [6]. This helps all configurations where the data flow towards common sinks, such as event builders, calibration databases and experiment controllers. Other extensions can reduce cache-update latencies from order(N) to order(log N) for N actively sharing processors.

The memory-mapped approach of SCI simplifies the DAQ system software. Arrays, records and objects can be defined and shared at run time. Reconfiguration can be done either via pointers or via MMUs, achieving flexibility and reducing the debugging effort.

An SCI extension project on shared data formats for SCI-based distributed multi-processing environments [7] is currently under IEEE ballot. The standard is intended to support efficient transfers among heterogeneous byte-addressable processors within an SCI-based distributed computing environment. The specification defines integer, floating-point and bit-field entities with different attributes (byte-ordering, address alignment, caching). A specific configuration file can be prepared for any system where an ANSI C [8] compiler is available and IEEE floating-point [9] entities can be handled.
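The byte-ordering attribute is the part of such a specification that matters most for heterogeneous event building. The short C sketch below is not taken from the draft standard, whose actual interfaces are not reproduced here; it merely shows the kind of explicit-endianness accessors that a shared data format definition implies, so that an entity written by one processor can be read correctly by another of different byte order.

    /*
     * Illustration only: explicit big-endian accessors for a value kept
     * in a shared segment, independent of the local processor's byte order.
     */
    #include <stdint.h>
    #include <stdio.h>

    /* Read a 32-bit integer stored big-endian in the shared segment. */
    static uint32_t get_be32(const volatile uint8_t *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
    }

    /* Store a 32-bit integer big-endian into the shared segment. */
    static void put_be32(volatile uint8_t *p, uint32_t v)
    {
        p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >>  8); p[3] = (uint8_t)v;
    }

    int main(void)
    {
        /* Stand-in for a cache line of SCI-shared memory. */
        uint8_t shared[8] = {0};

        put_be32(shared, 123456u);                 /* writer, any endianness */
        printf("value = %u\n", get_be32(shared));  /* reader, any endianness */
        return 0;
    }

A configuration file of the kind mentioned above would in effect select, per entity, which of these access patterns the compiler generates for a given host.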
SCI offers a uniform solution for DAQ data and control streams, ranging from FASTBUS-, VMEbus- and Futurebus+-hosted boards up to large computers. It is compatible with the IEEE Std 1212-1991 CSR architecture for device control and transactions [10], used to communicate between nodes within multiprocessing systems. The SCI design is also compatible with SerialBus [12], a low-cost, high-performance peripheral bus capable of up to 100 megabits per second. The SCI standard will be extended to follow future technology trends, such as CMOS low-voltage-swing node chips for lower data rates (250 MBytes/s or faster) at lower cost [13].

I/O bridges are being defined for the most popular busses currently used in the HEP DAQ world (VMEbus, FASTBUS and Futurebus+). The target is to obtain transparent re-mapping and, if possible, to keep data coherency, with a page-level specification and protection. Special Load and Store instructions are available to bridge "foreign" atomic entities, for sizes of 1, 2, 4, 8 or 16 bytes.

The DRDC RD24 project at CERN

SCI is an IEEE standard that has been strongly influenced by both industry and the academic world. The CERN contribution to the SCI project began in 1990 with the ECFA (European Committee for Future Accelerators) working group on "busses and links standards". The goal of the group was the evaluation of different busses for the LHC DAQ; one of the best candidates was the newcomer (at that time) SCI [11]. A proposal for the study of the application of SCI to data acquisition at LHC [14] was then submitted to the DRDC (Detector Research and Development Committee) board, which approved it in early 1992.

The DRDC Research and Development project 24 (RD24) has extensively simulated the SCI low-level and data coherency protocols. Different configurations have been evaluated for optimized DAQ architectures, as needed by the LHC collaborations. With the first delivery of GaAs SCI node chips, the project will complete its first hardware design: a RISC-based block mover with hardware links to VMEbus and TurboChannel [15]. This board will be used as a test bench to evaluate the suitability of SCI for DAQ systems at LHC. Most of its functionality will be implemented in firmware, to offer enough flexibility and to provide intelligent block-mover-like capabilities between local RAM buffers, VMEbus, TurboChannel and SCI. Other future RD24 developments include bridges to FASTBUS and Futurebus+, adapters to MC68020 and MC68040 busses, and an SCI interface to the Fast Dual Port Memory developed by RD12. On-line documentation and source code are available from the project via anonymous ftp.

Conclusions

LHC data acquisition can take advantage of novel features proposed by SCI