Tracing and Sampling for Real-Time partially simulated Avionics Systems
Raphaël Beamonte Michel Dagenais
Computer and Software Engineering Department École Polytechnique de Montréal C.P.6079, Station Downtown, Montréal, Québec, Canada, H3C 3A7
DORSAL Project Meeting, December 5, 2012

Introduction
Tracing:
  Studies runtime behavior
  Can be used to measure latency, which is fundamental for RT debugging
Tracer requirements:
  Low overhead
  Consistent maximum latency
Contribution:
  Methodology and tool to measure real-time latencies (NPT)
  Application of NPT to measure LTTng-UST latency
  Improvements to the real-time behavior of LTTng
Raphaël Beamonte – 2012 (CC BY-SA) Tracing and Sampling for RT Systems

Table of Contents
1 Research Context
  Real-Time Operating Systems
  The Linux Tracing Toolkit next-generation, LTTng
2 Test environment
  Presentation of the test environment
  System verification
3 Baseline results
  The Non-Preempt Test tool
  Latency results
4 Improving LTTng added latency
  Identify the source of the latency
  Latency results and comparison

RTOS: Xenomai vs. Linux
Xenomai:
  ADEOS thin-kernel
  Interrupt management: non-RT cannot preempt RT
  Hard Real Time?
  RT and non-RT tasks are independent
  Full Linux platform?
[Figure: thin-kernel architecture. Labels: User space (non-real-time tasks); Linux kernel (non-real-time); Real-time tasks; Thin-kernel; Hardware layer]

Why use the Linux kernel?
Able to do Soft Real Time, and can reach Hard Real Time with:
  BIOS configuration: would you use hyperthreading?
  Kernel configuration: PREEMPT_RT patch, which is increasingly integrated into the standard kernel
  Software configuration: interrupt redirection, CPU shielding, ...
The power of the community

The Linux Tracing Toolkit next-generation, LTTng
Why is LTTng relevant for RT applications?
  Both userspace and kernel tracers (same clock source)
  Statically compiled tracepoints
  External process to consume events
  Arbitrary event types (Common Trace Format)
  Per-CPU ring buffers
  Important tracing variables protected by RCU

How the LTTng-UST consumer works (simplified version)
[Figure: the LTTng instrumented application writes events into the ring buffers; the LTTng consumer reads events from released, filled buffers; sys_write is used to inform the consumer that the current buffer is full]
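
The notification scheme in this figure can be sketched as a toy model. This is only an illustration, not LTTng-UST code: the class name, the sub-buffer size and the use of a pipe as the wake-up channel are all assumptions made for the example. What it shows is that the writer performs a system call each time a sub-buffer fills up.

```python
import os
import select

SUBBUF_SIZE = 4  # hypothetical: number of events per sub-buffer


class NotifyingRingBuffer:
    """Toy model: the writer wakes the consumer through a pipe."""

    def __init__(self):
        self.current = []    # sub-buffer being filled by the application
        self.released = []   # full sub-buffers waiting for the consumer
        self.rfd, self.wfd = os.pipe()

    def write_event(self, event):
        self.current.append(event)
        if len(self.current) == SUBBUF_SIZE:
            self.released.append(self.current)
            self.current = []
            # System call on the tracing path: the synchronization cost
            # is paid by the instrumented application.
            os.write(self.wfd, b"x")

    def consume(self, timeout=0.0):
        """Wait for a wake-up, then return events from released sub-buffers."""
        ready, _, _ = select.select([self.rfd], [], [], timeout)
        if not ready:
            return []
        os.read(self.rfd, 4096)  # drain pending wake-up bytes
        drained, self.released = self.released, []
        return [event for sub in drained for event in sub]

    def close(self):
        os.close(self.rfd)
        os.close(self.wfd)
```

In this model the consumer sees nothing until a sub-buffer is complete, and every completed sub-buffer costs the writer one `os.write` call.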

Test environment
Hardware:
  CPU: Intel® Core™ i7 CPU 920, 2.67 GHz
  RAM: 3 × 2 GiB DDR3 at 1 067 MHz
  Motherboard: Intel DX58SO
Kernels:
  Standard: Debian Linux kernel 3.2.0-3-amd64, package version 3.2.21-3
  RT: Debian Linux kernel 3.2.0-3-rt-amd64, package version 3.2.23-1

System verification
hwlatdetect (hwlat_detector): no hardware latency detected during one hour.

hwlatdetect: test duration 3600 seconds
  parameters:
    Latency threshold: 10us
    Sample window: 1000000us
    Sample width: 500000us
    Non-sampling period: 500000us
    Output File: None
Starting test
test finished
Max Latency: 0us
Samples recorded: 0
Samples exceeding threshold: 0

Why NPT?
What we have with known tools:
  cyclictest: runs periodic tasks and calculates the discrepancy between the desired and real periods
  preempt-test: verifies whether higher-priority tasks can preempt lower-priority ones
What we want:
  A high-priority process that should not stop
  No latency during the run of this process (no preemption)

How NPT works
  Sets CPU affinity
  Sets RT priority
  Locks process memory into RAM to disable swapping
  Disables local IRQs
  Runs non-stop loops to calculate statistics with rdtsc
  Re-enables local IRQs
  Prints computed statistics
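
The first three setup steps can be sketched with standard Linux interfaces. This is a minimal sketch, not NPT's own code: the function name `prepare_realtime` is hypothetical, the `MCL_*` constants are the Linux values from `<sys/mman.h>`, and the choice of `SCHED_FIFO` is an assumption. Disabling local IRQs is not shown, since it needs privileges that plain Python does not expose.

```python
import ctypes
import os

MCL_CURRENT = 1  # Linux values from <sys/mman.h>
MCL_FUTURE = 2


def prepare_realtime(cpu, priority):
    """Apply the setup steps and report which ones the OS accepted."""
    done = {"affinity": False, "rt_priority": False, "mlockall": False}
    try:
        os.sched_setaffinity(0, {cpu})  # pin this process to one CPU
        done["affinity"] = True
    except (AttributeError, OSError):
        pass
    try:
        # Real-time FIFO scheduling; usually needs root or CAP_SYS_NICE.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        done["rt_priority"] = True
    except (AttributeError, OSError):
        pass
    try:
        # Lock current and future pages into RAM so they cannot be swapped.
        libc = ctypes.CDLL(None, use_errno=True)
        done["mlockall"] = libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
    except (AttributeError, OSError):
        pass
    return done
```

A call such as `prepare_realtime(1, 99)` returns a dictionary of booleans; without sufficient privileges, the priority and memory-locking steps are expected to be reported as `False`.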

Algorithm of NPT's main loop

i ← 0
t0 ← read rdtsc
t1 ← t0
while i ≤ cycles_to_do do
    i ← i + 1
    duration ← (t0 − t1) × cpuPeriod
    CalculateStatistics(duration)
    t1 ← t0
    t0 ← read rdtsc
end while

Algorithm of NPT's main loop (with tracepoints)

i ← 0
t0 ← read rdtsc
t1 ← t0
tracepoint nptstart
while i ≤ cycles_to_do do
    i ← i + 1
    duration ← (t0 − t1) × cpuPeriod
    tracepoint nptloop
    CalculateStatistics(duration)
    t1 ← t0
    t0 ← read rdtsc
end while
tracepoint nptstop
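
The measurement loop can be approximated in a few lines. This is a sketch under stated assumptions, not NPT itself: `time.perf_counter_ns` stands in for rdtsc (so no cpuPeriod conversion is needed), the tracepoints are omitted, the clock is read before each duration is computed so that no zero-length first sample is recorded, and the running statistics use Welford's method, which the slides do not specify. The function name `npt_loop` is hypothetical.

```python
import math
import time


def npt_loop(cycles_to_do, clock=time.perf_counter_ns):
    """Time consecutive loop iterations and return statistics in ns."""
    count, mean, m2 = 0, 0.0, 0.0
    lowest, highest = math.inf, 0
    t_prev = clock()
    for _ in range(cycles_to_do):
        t_now = clock()
        duration = t_now - t_prev
        # Running statistics (Welford's method), no per-sample storage.
        count += 1
        delta = duration - mean
        mean += delta / count
        m2 += delta * (duration - mean)
        lowest = min(lowest, duration)
        highest = max(highest, duration)
        t_prev = t_now
    variance = m2 / count if count else 0.0
    return {"min": lowest, "mean": mean, "max": highest, "variance": variance}
```

Calling `npt_loop(100_000)` on an idle machine gives a rough idea of the loop period. Any preemption or interrupt during the run shows up as a larger maximum, which is the quantity the histograms in the following slides track.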

The test procedure
Shield CPUs (cpusets)
Run NPT for 10^8 loops:
  Without tracing
  With LTTng kernel tracing alone
  With LTTng-UST tracing alone
  With LTTng-UST and kernel tracing
Do it on:
  Standard kernel
  PREEMPT_RT patched kernel

Latency results without tracing
[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 10. Y axis: cycles.]

Latency results with LTTng kernel tracing
[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 10. Y axis: cycles.]

Latency results with LTTng-UST tracing
[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 120. Y axis: cycles.]

Latency results with LTTng-UST and kernel tracing
[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 130, plus a "> 150" bin. Y axis: cycles.]

Identify the source of the latency
Problem:
  Latency added by the LTTng-UST tracing synchronization
Proposed solution:
  Remove the synchronization between the instrumented application and the LTTng consumer:
    The consumer now polls to check whether a buffer is full
    Permanent polling (100% CPU use) and usleep-timed polling give the same performance (CPU shielding)
  Remove the LTTng-UST getcpu system call
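
The polling approach can be illustrated with a toy model. This is a sketch only, not the modified LTTng-UST: the class and function names, the sub-buffer size and the polling interval are all hypothetical. The point it illustrates is that the writer no longer performs any wake-up system call; the consumer checks for full sub-buffers on its own schedule.

```python
import time


class PolledRingBuffer:
    """Toy model: the writer never notifies; the consumer polls."""

    SUBBUF_SIZE = 4  # hypothetical: number of events per sub-buffer

    def __init__(self):
        self.current = []    # sub-buffer being filled by the application
        self.released = []   # full sub-buffers waiting for the consumer

    def write_event(self, event):
        # Tracing path: memory operations only, no wake-up system call.
        self.current.append(event)
        if len(self.current) == self.SUBBUF_SIZE:
            self.released.append(self.current)
            self.current = []

    def poll_once(self):
        """Return the events of every sub-buffer released so far."""
        drained, self.released = self.released, []
        return [event for sub in drained for event in sub]


def consume(buf, rounds, interval_s=0.0):
    """Poll `rounds` times, sleeping interval_s between checks."""
    events = []
    for _ in range(rounds):
        events.extend(buf.poll_once())
        if interval_s:
            time.sleep(interval_s)  # timed polling; 0 means busy polling
    return events
```

With `interval_s=0.0` the loop models permanent polling, and with a small positive value it models usleep-timed polling. In both cases the cost of checking is moved from the traced application to the consumer.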

Latency results on the standard kernel
[Figure: histogram generated by NPT for 10^8 cycles with original and modified LTTng-UST. Series: original LTTng-UST; modified LTTng-UST. X axis: latency (µs), 0 to 120. Y axis: cycles.]

Latency results on the RT kernel
[Figure: histogram generated by NPT for 10^8 cycles with original and modified LTTng-UST. Series: original LTTng-UST; modified LTTng-UST. X axis: latency (µs), 0 to 22. Y axis: cycles.]

Latency results with modified LTTng-UST
[Figure: histograms generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 10. Y axis: cycles.]

Numeric comparison
Statistics per cycle, in nanoseconds, generated by NPT on both standard and RT kernels for both the original and modified versions of LTTng-UST

Latencies in ns    Standard kernel          RT kernel
LTTng              Original    Modified     Original    Modified
Minimum            287         198          289         197
Mean               317         220          322         206
Maximum            121 744     5 847        19 837      5 088
Variance           74 778      1 186        1 813       1 027
Deviation          273         34.44        42.58       32.05
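
The rows of this table can be cross-checked with a short calculation. The sketch below assumes that the dot in the Variance row of the original slide is a thousands separator (so 74.778 means 74 778), an assumption that the Deviation row supports, since the deviation should be the square root of the variance. The dictionary keys are labels chosen for this example.

```python
import math

# Values copied from the table; keys are (kernel, LTTng-UST version).
variance = {
    ("standard", "original"): 74778,
    ("standard", "modified"): 1186,
    ("rt", "original"): 1813,
    ("rt", "modified"): 1027,
}
maximum_ns = {
    ("standard", "original"): 121744,
    ("standard", "modified"): 5847,
    ("rt", "original"): 19837,
    ("rt", "modified"): 5088,
}

# Standard deviation recomputed from the variance.
deviation = {key: math.sqrt(value) for key, value in variance.items()}

# Factor by which the maximum latency drops with the modified LTTng-UST.
reduction = {
    kernel: maximum_ns[(kernel, "original")] / maximum_ns[(kernel, "modified")]
    for kernel in ("standard", "rt")
}

for key, value in deviation.items():
    print(key, round(value, 2))
for kernel, factor in reduction.items():
    print(kernel, round(factor, 1))
```

The recomputed deviations agree with the Deviation row, and the maximum latency drops by a factor of roughly 21 on the standard kernel and roughly 4 on the RT kernel.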

Conclusion
Non-Preempt Test tool
Effects of LTTng tracing on both standard and RT kernels
Modified LTTng according to our observations
Maximum latency with the modified LTTng-UST is currently between 5 and 6 µs on both standard and PREEMPT_RT patched kernels

Future Work
Continue the LTTng-UST latency hunt
Integrate our changes in the main branch
Study the real-time behavior in non-shielded environments
Clarify the process of CPU isolation with respect to per-CPU kernel tasks such as RCU reclamation, timer update, ...

Thank you. Any questions?
LTTng www.lttng.org mailing list: [email protected]
NPT: git.dorsal.polymtl.ca/?p=npt.git
Slides: www.dorsal.polymtl.ca/~rbeamonte/dorsal-pm-dec2012.pdf