Maximizing vCMTS Data Plane Performance with 3rd Gen Intel® Xeon® Scalable Processor Architecture

White Paper | Processor Architecture Study

Intel platform technologies boost virtualized cable modem termination system (vCMTS) data plane performance by 70%

Authors
Brendan Ryan, System Architect (Intel)
Michael O'Hanlon, Senior Principal Engineer (Intel)
David Coyle, Senior Software Engineer (Intel)
Rory Sexton, Network Software Engineer (Intel)
Subhiksha Ravisundar, Network Software Engineer (Intel)

Table of Contents
Intel vCMTS Reference Data Plane
vCMTS Data Plane Performance Analysis – Single Service Group
Crypto and CRC Processing in the vCMTS Data Plane Pipeline
Using Hyper-threading in the vCMTS Data Plane Pipeline
System Scalability and Server Sizing
Compelling vCMTS Performance on 3rd Gen Intel® Xeon® Scalable Processors
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E

Introduction
The standardization of distributed access architecture (DAA) for DOCSIS and further advancements in the flexible MAC architecture (FMA) standard have enabled the transition to a software-centric cable network infrastructure. This paper explores how Intel technologies can be used to increase vCMTS performance in the deployment scenarios shown in Figure 1. In each scenario, the same DOCSIS MAC software may be deployed as:

• a virtual MAC core (vCore), also known as a virtualized cable modem termination system (vCMTS), on a server in a multiple-system operator (MSO) headend,
• part of a remote-MAC-core (RMC) and remote-PHY deployment on an edge compute node,
• or part of a remote-MAC-PHY device (RMD) deployment on an edge compute node.

Figure 1. Flexible MAC Architecture (FMA) Deployment Scenarios. [Diagram, three panels: "R-PHY and Virtual-MAC-Core" – DOCSIS MAC software runs as vCores on CPUs in the headend/CO, with fiber to RPDs and coax to the premises; "R-PHY and Remote-MAC-Core" – DOCSIS MAC software runs on RMC edge compute nodes in front of RPDs; "R-MAC-PHY" – DOCSIS MAC software runs on RMD edge compute nodes.]

Performance for such a software-centric approach is greatly boosted by technologies such as the Data Plane Development Kit (DPDK) and the Intel Multi-Buffer Crypto library (intel-ipsec-mb), which provide highly optimized packet processing in software that is tightly coupled to the ever-developing Intel® architecture. Intel architecture provides native instructions and features that specifically accelerate data plane packet processing for access network functions such as vCMTS.

Figure 2. Intel® Xeon® CPU Generational Enhancements. [Diagram comparing the 2nd Generation and 3rd Generation CPUs: +50% L1 cache, up to 42% more cores, 2x crypto engines, +25% L2 cache, +45% memory bandwidth, +11% L3 cache per core, and a move from 3x16 PCIe Gen 3 lanes to 4x16 PCIe Gen 4 lanes (+33% lanes at 2x bandwidth per lane).]

Independent software vendors (ISVs) can significantly improve vCMTS performance on 3rd Gen Intel® Xeon® Scalable processor architecture (codename "Ice Lake") by taking advantage of the gen-on-gen CPU architecture enhancements shown in Figure 2, which include bigger cache sizes at each level, a higher core count, more memory channels at higher speed, and greater I/O bandwidth. This paper focuses specifically on how to take advantage of advanced features such as:

• Enhanced Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
• Dual AES encryption engines
• Intel® Vector AES New Instructions (AES-NI)
• Intel® Vector PCLMULQDQ carry-less multiplication instruction
• Intel® QuickAssist Technology
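As a minimal illustration (an assumption for this rewrite, not code from the paper), a data plane can probe for these instruction-set features at startup using DPDK's CPU-flag API from rte_cpuflags.h:

```c
/* Probe the ISA features listed above at startup (DPDK CPU-flag API). */
#include <stdio.h>
#include <rte_cpuflags.h>

int main(void)
{
    static const struct {
        enum rte_cpu_flag_t flag;
        const char *name;
    } feats[] = {
        { RTE_CPUFLAG_AVX512F,    "Intel AVX-512 Foundation" },
        { RTE_CPUFLAG_AES,        "AES-NI" },
        { RTE_CPUFLAG_VAES,       "Vector AES (VAES)" },
        { RTE_CPUFLAG_VPCLMULQDQ, "Vector carry-less multiply (VPCLMULQDQ)" },
    };

    for (size_t i = 0; i < sizeof(feats) / sizeof(feats[0]); i++)
        printf("%-45s %s\n", feats[i].name,
               rte_cpu_get_flag_enabled(feats[i].flag) ? "available" : "missing");
    return 0;
}
```

A data plane built around the multi-buffer crypto path would typically select its AVX-512/VAES code path at initialization based on exactly these flags.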
The paper also provides insights into implementation options and establishes an empirical performance data baseline that can be used to estimate the capability of a vCMTS platform running on industry-standard, high-volume servers based on 3rd Gen Intel Xeon Scalable processor architecture. Actual measurements were taken on a 3rd Gen Intel Xeon Scalable processor-based system (see Appendix D for configuration details).

The data demonstrates how a single 3rd Gen Intel Xeon Scalable processor core running at 2.2 GHz clock speed can support the downstream channel bandwidth of close to five orthogonal frequency-division multiplexing (OFDM) channels in a pure DOCSIS 3.1 configuration for an Internet mix (IMIX) traffic blend. When Intel® QuickAssist Technology (Intel® QAT) acceleration is employed in the system, a single core can satisfy close to the maximum downstream bandwidth of a pure DOCSIS 3.1 configuration. Since vCMTS workloads exhibit good scalability, a typical server blade based on dual 3rd Gen Intel Xeon Scalable processors (e.g., with 64 processor cores in total) can achieve compelling performance density.

It is worth noting that these performance benefits can also be delivered by deploying a general-purpose compute component in the RMC and RMD scenarios shown in Figure 1. In other words, the same benefits of Intel architecture and complementary technologies, such as Intel QAT and the Intel Multi-Buffer Crypto library, can be used to achieve similar performance on an edge compute node.

DOCSIS MAC functionality can be broken down into four categories: downstream data plane, upstream data plane, control plane, and system management. From a network performance perspective, the most compute-intensive workload is data plane processing, which is consequently the focus of this paper.

Intel vCMTS Reference Data Plane
Intel has developed a vCMTS reference data plane that is compliant with DOCSIS 3.1 specifications and based on the DPDK packet processing framework. It is publicly available on the Intel 01.org open-source website at 01.org/access-network-dataplanes. The main objective of this development is to provide a tool that demonstrates the vCMTS data plane packet processing performance of Intel® Xeon® processor-based platforms and assists ISVs and MSOs in deploying a vCMTS.

Figure 3. Intel vCMTS Reference Data Plane. [Diagram of the downstream and upstream pipelines with the DPDK library used at each stage:
Downstream (upper MAC → lower MAC): receive DEPI/L2TP frames (librte_ethdev) → cable modem lookup and subscriber management (librte_hash) → DOCSIS filtering (librte_acl) → DOCSIS QoS classification (librte_acl) → DOCSIS QoS scheduling, service flow and channel (librte_sched) → DOCSIS framing incl. HCS (librte_net_crc) → CRC generation and BPI+ encryption as a combined operation (librte_cryptodev) → DEPI encapsulation (librte_mbuf) → transmit DEPI/L2TP frames (librte_ethdev).
Upstream: receive UEPI/L2TP frames (librte_ethdev) → validate frame (librte_mbuf) → UEPI decapsulation and strip L2TP/IP headers (librte_mbuf) → service ID lookup (librte_hash) → DOCSIS segment reassembly (librte_mbuf) → DOCSIS frame extraction (librte_mbuf) → DOCSIS frame HCS verification (librte_net_crc) → BPI+ decryption and CRC verification as a combined operation (librte_cryptodev) → transmit IP frames (librte_ethdev).]

Figure 3 shows the upstream and downstream packet processing pipelines implemented for the Intel vCMTS Reference Data Plane. The downstream data plane is implemented as a two-stage pipeline whose stages perform upper MAC and lower MAC processing, respectively. The DPDK API used for each significant DOCSIS MAC data plane packet-processing stage is also shown.

A detailed description of the upstream and downstream packet processing stages shown in Figure 3 is provided in Appendix A. Many key innovations and performance optimizations are supported by the Intel vCMTS Reference Data Plane, including the following:

• Optimized multi-buffer implementation of combined AES cryptographic (crypto) and CRC processing based on Intel AES-NI and AVX-512 instructions (see the sketch after this list)
• Optimized multi-buffer implementation of DES crypto processing based on Intel AVX-512 instructions
• Acceleration of AES and DES DOCSIS crypto processing using Intel QuickAssist Technology
• Optimized CRC32 and DOCSIS header check sequence (HCS) calculation based on Intel AVX-512 instructions
• DOCSIS service flow and channel scheduling implementation based on DPDK HQoS
• Packet Streaming Protocol (PSP) fragmentation/reassembly implementation based on the DPDK Mbuf API
• DOCSIS MAC data plane pre-processing using the 100G Intel® Ethernet 800 Series network interface card (NIC)
• Configurable DOCSIS MAC data plane threading options – dual-thread, single-thread, separate or combined upstream and downstream threads

The Intel vCMTS Reference Data Plane runs within …
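The combined crypto-and-CRC operation named in the first bullet above maps to a fused job type in the Intel Multi-Buffer Crypto library. The sketch below is a minimal illustration (not the reference data plane's code) of submitting one fused DOCSIS BPI+ encrypt plus Ethernet CRC32 job through the intel-ipsec-mb job API, using v1.x names; the function name docsis_encrypt_crc and the frame offsets are illustrative assumptions, and a real pipeline would allocate the IMB_MGR once per worker thread rather than per call.

```c
/* Minimal sketch: one fused DOCSIS BPI+ encrypt + Ethernet CRC32 job via
 * the intel-ipsec-mb job API (v1.x names). Offsets are illustrative. */
#include <stdint.h>
#include <intel-ipsec-mb.h>

static int docsis_encrypt_crc(uint8_t *frame, uint32_t frame_len,
                              const uint8_t key[16], const uint8_t iv[16])
{
    DECLARE_ALIGNED(uint32_t enc_keys[15 * 4], 16);
    DECLARE_ALIGNED(uint32_t dec_keys[15 * 4], 16);

    IMB_MGR *mgr = alloc_mb_mgr(0);
    if (mgr == NULL)
        return -1;
    init_mb_mgr_auto(mgr, NULL);          /* picks SSE/AVX2/AVX-512 code path */
    IMB_AES_KEYEXP_128(mgr, key, enc_keys, dec_keys);

    IMB_JOB *job = IMB_GET_NEXT_JOB(mgr);
    job->chain_order      = IMB_ORDER_HASH_CIPHER;     /* CRC first, then encrypt */
    job->cipher_mode      = IMB_CIPHER_DOCSIS_SEC_BPI; /* AES-CBC with CFB tail */
    job->hash_alg         = IMB_AUTH_DOCSIS_CRC32;     /* fused Ethernet CRC32 */
    job->cipher_direction = IMB_DIR_ENCRYPT;
    job->enc_keys         = enc_keys;
    job->dec_keys         = dec_keys;
    job->key_len_in_bytes = 16;                        /* AES-128 */
    job->iv               = iv;
    job->iv_len_in_bytes  = 16;
    job->src              = frame;
    /* Illustrative DOCSIS offsets: CRC over the frame minus its 4-byte CRC
     * field; encryption starts after the 12-byte DA/SA and covers the CRC. */
    job->hash_start_src_offset_in_bytes   = 0;
    job->msg_len_to_hash_in_bytes         = frame_len - 4;
    job->cipher_start_src_offset_in_bytes = 12;
    job->msg_len_to_cipher_in_bytes       = frame_len - 12;
    job->dst              = frame + 12;                /* in-place ciphertext */
    job->auth_tag_output  = frame + frame_len - 4;     /* CRC32 lands here */
    job->auth_tag_output_len_in_bytes = 4;

    job = IMB_SUBMIT_JOB(mgr);            /* may return a completed job ... */
    if (job == NULL)
        job = IMB_FLUSH_JOB(mgr);         /* ... or require a flush */
    int ok = (job != NULL && job->status == IMB_STATUS_COMPLETED) ? 0 : -1;
    free_mb_mgr(mgr);
    return ok;
}
```

Fusing the two steps means a single pass over each packet instead of separate crypto and CRC passes, which is where much of the AES-NI/AVX-512 benefit in the combined operation comes from.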
vCMTS Data Plane Performance Analysis – Single Service Group
Performance tests were run using the Intel vCMTS Reference Data Plane with a channel configuration chosen to demonstrate the maximum throughput capability of a pure DOCSIS 3.1 deployment. Performance with crypto acceleration using both the Intel Multi-Buffer Crypto library and Intel QAT is shown. The performance of the 3rd Gen Intel Xeon Scalable processor running at 2.2 GHz core frequency is also compared to a 2nd Generation Intel® Xeon® Scalable processor (codename "Cascade Lake") running at 2.3 GHz.⁹

The channel configuration is a pure DOCSIS 3.1 deployment with six OFDM channels that produce a theoretical cumulative bandwidth of 11.3 Gbps; however, the effective downstream bandwidth is limited to 10 Gbps by the DOCSIS 3.1 bandwidth limit per service group (SG). Note that this limit is enforced by the vCMTS data plane pipeline with downstream QoS rate-limiting of 10 Gbps per service group.

[Chart: Single Service Group Downstream Throughput (Single CPU Core) – throughput in Gbps measured against the 10 Gbps DOCSIS 3.1 downstream bandwidth limit, with line-rate reference levels of 9.5 Gbps (5x OFDM), 7.6 Gbps (4x OFDM), and 5.7 Gbps (32x SC-QAM + 2x OFDM).]
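In the reference pipeline, this per-SG cap is enforced by the DPDK HQoS (librte_sched) scheduler. As a simplified stand-in rather than the reference implementation (the sg_rate_limit_* names are invented for this sketch), the same 10 Gbps cap can be expressed with DPDK's srTCM meter:

```c
/* Simplified sketch: cap one service group's downstream traffic at 10 Gbps
 * using DPDK's srTCM meter (a stand-in for the HQoS-based rate limiting). */
#include <rte_cycles.h>
#include <rte_meter.h>
#include <rte_mbuf.h>

static struct rte_meter_srtcm_profile sg_profile;
static struct rte_meter_srtcm sg_meter;

int sg_rate_limit_init(void)
{
    struct rte_meter_srtcm_params params = {
        .cir = 10ULL * 1000 * 1000 * 1000 / 8, /* 10 Gbps -> 1.25e9 bytes/s */
        .cbs = 2 * 1024 * 1024,                /* committed burst, bytes */
        .ebs = 2 * 1024 * 1024,                /* excess burst, bytes */
    };

    if (rte_meter_srtcm_profile_config(&sg_profile, &params) != 0)
        return -1;
    return rte_meter_srtcm_config(&sg_meter, &sg_profile);
}

/* Return 0 to forward the packet, -1 to drop it (SG cap exceeded). */
int sg_rate_limit_check(struct rte_mbuf *pkt)
{
    enum rte_color c = rte_meter_srtcm_color_blind_check(
        &sg_meter, &sg_profile, rte_rdtsc(), rte_pktmbuf_pkt_len(pkt));
    return (c == RTE_COLOR_RED) ? -1 : 0;
}
```

In the actual HQoS hierarchy, a natural mapping (an assumption here, consistent with the DPDK HQoS bullet above) is the service group to an rte_sched subport carrying the 10 Gbps rate, with DOCSIS service flows as pipes beneath it.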