TTEthernet Basics, Critical Traffic over TTEthernet, Clock Synchronization Principles, Fault Tolerance, TTEthernet Products Overview
Ensuring Reliable Networks
Theory, Concepts and Applications
ETR 2015, Rennes, August 27th
Jean-Baptiste Chaudron [email protected]
www.tttech.com Copyright © TTTech Computertechnik AG. All rights reserved.

AGENDA
• Introduction
• TTEthernet Basics
• Critical Traffic over TTEthernet
• Clock Synchronization Principles
• Fault Tolerance
• TTEthernet Products Overview

Introduction: Real-Time Computer System
A real-time computer system is a computer system in which the correctness of the system behavior depends not only on the logical results of the computations, but also on the physical time at which these results are produced [Kop97].
The point in time by which a certain action must be finished is called the deadline.
• Soft deadline: the result still has utility after the deadline.
• Hard deadline: missing the deadline can result in a catastrophic event.
Computer systems classification:
• Guaranteed timeliness: real-time (RT) systems
• Best effort: no timing guarantees, not RT systems
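The deadline definitions above can be illustrated with a small sketch. It is not taken from the slides: the names `DeadlineKind` and `classify_result`, the microsecond units and the example values are assumptions chosen purely for illustration. The sketch shows how the same late result is treated differently under a soft deadline (still usable) and a hard deadline (a failure).

```python
from enum import Enum


class DeadlineKind(Enum):
    """Hypothetical labels for the two deadline types named on the slide."""
    SOFT = "soft"
    HARD = "hard"


def classify_result(finish_time_us: int, deadline_us: int, kind: DeadlineKind) -> str:
    """Classify a result by when it was produced relative to its deadline.

    Both times are microseconds since the triggering event (assumed units).
    """
    if finish_time_us <= deadline_us:
        return "on time"
    if kind is DeadlineKind.SOFT:
        # Late, but the result still has some utility.
        return "late but usable"
    # Hard deadline missed: treated as a failure of the system.
    return "failure"


if __name__ == "__main__":
    print(classify_result(400, 500, DeadlineKind.HARD))  # on time
    print(classify_result(600, 500, DeadlineKind.SOFT))  # late but usable
    print(classify_result(600, 500, DeadlineKind.HARD))  # failure
```

A best-effort system in the sense of the classification above would offer no `deadline_us` bound at all, so no such check could be guaranteed to pass.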

Introduction: Distributed Real-Time System
Reasons for distribution:
• Scalability: single computer systems have limited computing resources
• Complexity: handled through smaller, simpler intelligent units
• Safe wiring: from a single computer to different sensors/actuators
• Fault tolerance: avoid a single point of failure
Control loops:
• Periodic operation: sensor, communicate, calculate, actuator
• Low end-to-end communication latency enables implementation of tighter control
• Real-time communication
[Figure: sensor, actuator, node1 and node2 connected by a real-time bus]

Introduction: Latency vs. Deadline
[Figure: timeline along the flow of time. A relevant input/measurement occurs at node A; node A processes the input; the result is communicated to node B; node B acts upon the result from A. Marked intervals: latency of system response (min, max, jitter) and deadline for system response.]

Introduction: End to End Latency (1)
The time interval between the initiation of a transmission at the sending host computer and its arrival at the receiving host computer depends on many factors:
• Communication protocol, media access control (MAC)
• Transmission speed, cable lengths
• Network load
[Figure: two nodes, each with a host computer and a communication controller, over a time axis]

Introduction: End to End Latency (2)
Communication can be delayed by:
• Concurrency of pending transmissions and the media access strategy, e.g., CSMA
[Figure: CSMA access. Cases: free channel; one message in transit (or pending, but with higher priority than own message); two messages in transit / pending; three messages in transit; ... Time axis marked tmin, tmsg, tmsg, tmsg, tmsg]
Communication can also be delayed by:
• Error handling strategy, e.g.
PAR (Positive Acknowledge or Retransmit)
• Bus access delays due to EMI (External Memory Interface): wait for bus idle
[Figure: PAR. Cases: no error; retransmit once; retransmit twice; retransmit three times. Time axis marked tmin, tmsg, tmsg, tmsg, tmsg]

Introduction: Peak Load Handling Challenge (1)
Peak load handling:
• Peak load situation: all nodes on the shared bus require communication services at the same time, send the maximum amount/length of data, and send highest-priority messages
• Problem: find out in which scenario this happens, and what the actual load and the worst-case message delays are at that time
• In event-driven systems this can be very complex
• It is even more complicated if faults that lead to retransmission of messages must be accounted for
• Experiments or approximate scheduling can only offer “probabilities”
• Example: latency for message X is less than 500 µs in 99.96 % of cases, but there is no guaranteed worst-case latency

Introduction: Peak Load Handling Challenge (2)
Thrashing:
• An abrupt decrease in throughput that occurs with an increase of the system load.
Causes of thrashing:
• Retry mechanism in PAR protocols (error handling and time-outs)
• Combined with the waits from the CSMA access
[Figure: throughput versus requested load, with curves for the ideal system (up to 100 %) and the real system, and the thrashing point marked]

Introduction: Deterministic Networks
Features of deterministic networks:
• Known (maximum) end-to-end latency
• Bounded and small jitter
• Message ordering guarantee
• Error detection
• Masquerade protection

Introduction: TT-System vs.
ET-System
Transportation example:
• Cars and taxis are event-triggered: they go whenever they are needed
• Trains are time-triggered: they go according to a fixed schedule
• Advantage of the event-triggered approach: very flexible
• Advantage of the time-triggered approach: very predictable
When would you prefer a time-triggered solution?

Introduction: Why Clock Synchronization? (1)
In RT systems, all ‘layers’ of functionality in the system must meet the ‘quality of service’ requirements defined by the application:
• The application layer must operate timely and predictably: reading the sensors in time, computing correct values, updating actuators reliably, etc.
• The communication layer must meet the specified functionality of transmitting information between the nodes in the system, and must also do this predictably and timely
Timely operation:
• Coordination of the computer nodes in the time domain
• Clock synchronization

Introduction: Why Clock Synchronization? (2)
A local clock is a counter triggered by an oscillator. Oscillators have a nominal rate (e.g. 10 MHz) and a certain drift rate.
• Standard drift rates of oscillators on the market: 10^-3 s/s to 10^-5 s/s
• Oscillators with a small drift rate (~10^-6 s/s) are expensive
• What does a drift rate of 10^-3 s/s mean?
  - 1 microsecond deviation every 1 millisecond,
  - 1 second deviation in 1000 seconds,
  - about 10 minutes deviation per week (604,800 s x 10^-3 = 604.8 s).
The oscillator drift rate can be affected by other factors such as:
• temperature, humidity, ...
Clock synchronization keeps the clocks of distributed computers close to each other.

Introduction: Global Notion of Time
1.
GLOBAL notion of time, built on top of local time
2. Activities triggered on the basis of global time
[Figure: free-running local clocks, each with a local view of global time]

Introduction: Precision Interval (1)
The precision, or precision interval (denoted ), is the upper bound on the difference between the slowest and the fastest non-faulty clock in the system. A “fast” clock and a “slow” clock will never differ by more than one precision interval.
[Figure: Clock 1, Clock 2 and Clock 3, each marked 10:45, 11:00 and 11:15]

Introduction: Precision Interval (2)
The precision interval in a distributed system depends on:
• the hardware properties of each clock (clock drifts, e.g. 100 ppm)
• the resynchronization interval (e.g. 5 ms)
• the resynchronization method used (how efficiently it works)
Smaller precision interval -> smaller timeouts -> more efficient system
[Figure: clock drift offset over the resynchronization interval, showing the precision interval and the average clock precision]

Introduction: Whole System Synchronization (1)
Two different approaches: centralized vs. distributed
[Figure: nodes exchanging the time: "I want to join, what's the time?", "It's 15:00!", "OK, 15:00.", "I see, 15:00.", "Yes, 15:00"; clocks showing 15:00 and 14:59]

Introduction: Whole System Synchronization (2)
[Figure: end system with GPS receiver; clocks showing 15:00]
Synchronization to an external time reference is possible:
• all nodes can apply a bounded correction term to slightly speed up or slow down the local clock
• the precision window is never left, and time never goes backwards
• this mechanism can be used to broadcast a correction value relative to some external time reference (e.g.
GPS time)
• application of this term to the local node is performed by the host CPU

Introduction: Time Triggered System - Summary (1)
Any Time-Triggered System must have two key properties:
1. a global notion of time
• in case of a distributed system: a GLOBAL notion of time,