Tracing and Sampling for Real-Time partially simulated Avionics Systems

Raphaël Beamonte Michel Dagenais

Computer and Software Engineering Department École Polytechnique de Montréal C.P.6079, Station Downtown, Montréal, Québec, Canada, H3C 3A7

DORSAL Project Meeting
December 5, 2012

Introduction

Tracing:
  Study runtime behavior
  Can be used to measure latency, which is fundamental for RT debugging

Tracer requirements:
  Low overhead
  Consistent maximum latency

Contribution:
  Methodology and tool to measure real-time latencies (NPT)
  Application of NPT to measure LTTng-UST latency
  Improvements to the real-time behavior of LTTng

Raphaël Beamonte – 2012 (CC BY-SA) Tracing and Sampling for RT Systems

Table of Contents

1 Research Context
  Real-Time Operating Systems
  The Linux Tracing Toolkit next-generation, LTTng

2 Test environment
  Presentation of the test environment
  System verification

3 Baseline results
  The Non-Preempt Test tool
  Latency results

4 Improving LTTng added latency
  Identify the source of the latency
  Latency results and comparison

RTOS: Xenomai vs. Linux

Xenomai:
  ADEOS thin-kernel management: non-RT tasks cannot preempt RT tasks
  Hard Real Time ?
  RT and non-RT tasks are independent
  Full Linux platform ?

[Figure: layered architecture. Labels, top to bottom: user space (non-real-time tasks); Linux kernel (non-real-time) and real-time tasks; thin kernel; hardware layer.]

RTOS: Xenomai vs. Linux (continued)

Why use the Linux kernel?

Able to do soft real time, and can reach hard real time:
  BIOS configuration: would you use hyperthreading?
  Kernel configuration: the PREEMPT_RT patch, which is increasingly integrated into the standard kernel
  Software configuration: redirection, CPU shielding...

The power of the community

The Linux Tracing Toolkit next-generation, LTTng

Why is LTTng relevant for RT applications?
  Both userspace and kernel tracers (same clock source)
  Statically compiled tracepoints
  External process to consume events
  Arbitrary event types (Common Trace Format)
  Per-CPU ring buffers
  Important tracing variables protected by RCU

How the LTTng-UST consumer works (simplified version)

[Figure: the LTTng instrumented application writes events into the ring buffers; the LTTng consumer reads events from released, filled buffers. The application issues a sys_write to inform the consumer that the current buffer is full.]
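The notification path in the figure can be illustrated with a minimal sketch. It assumes a ring made of fixed-size sub-buffers and a pipe as the wake-up channel; the names `struct ring`, `ring_write`, `SUBBUF_SIZE` and `NUM_SUBBUF` are hypothetical and are not taken from LTTng-UST. The sketch also overwrites old sub-buffers without checking that the consumer has read them, which a real tracer would not do.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define SUBBUF_SIZE 64   /* bytes per sub-buffer (illustrative value) */
#define NUM_SUBBUF  4    /* sub-buffers in the ring (illustrative value) */

/* Hypothetical ring buffer split into sub-buffers. */
struct ring {
    char     data[NUM_SUBBUF][SUBBUF_SIZE];
    size_t   used[NUM_SUBBUF];  /* bytes written in each sub-buffer */
    unsigned cur;               /* sub-buffer currently being written */
    int      wakeup_fd;         /* write end of the notification pipe */
};

/*
 * Append one event. When the current sub-buffer cannot hold it, release
 * that sub-buffer and tell the consumer with a write() system call: this
 * is the synchronization that sits on the tracing path.
 * Returns 0 on success, -1 if the event is too large or the write fails.
 */
static int ring_write(struct ring *r, const void *ev, size_t len)
{
    if (len > SUBBUF_SIZE)
        return -1;

    if (r->used[r->cur] + len > SUBBUF_SIZE) {
        unsigned char idx = (unsigned char)r->cur;

        if (write(r->wakeup_fd, &idx, 1) != 1)   /* notify the consumer */
            return -1;
        r->cur = (r->cur + 1) % NUM_SUBBUF;      /* switch sub-buffer */
        r->used[r->cur] = 0;
    }

    memcpy(&r->data[r->cur][r->used[r->cur]], ev, len);
    r->used[r->cur] += len;
    return 0;
}
```

On the consumer side, a blocking `read()` on the other end of the pipe would return the index of each released sub-buffer. The point of the sketch is that the application pays for a system call every time a sub-buffer fills up.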

Test environment

Hardware:
  CPU: Intel(R) Core(TM) i7 920, 2.67 GHz
  RAM: 3 × 2 GiB DDR3 at 1 067 MHz
  Motherboard: Intel DX58SO

Kernels:
  Standard: Debian Linux kernel 3.2.0-3-amd64, package version 3.2.21-3
  RT: Debian Linux kernel 3.2.0-3-rt-amd64, package version 3.2.23-1

System verification

hwlatdetect (hwlat_detector): no hardware latency detected during one hour.

  hwlatdetect: test duration 3600 seconds
    parameters:
      Latency threshold: 10us
      Sample window: 1000000us
      Sample width: 500000us
      Non-sampling period: 500000us
      Output File: None

  Starting test
  test finished
  Max Latency: 0us
  Samples recorded: 0
  Samples exceeding threshold: 0

Why NPT?

What we have with known tools:
  cyclictest: runs periodic tasks and calculates the discrepancy between the desired and real periods
  preempt-test: verifies whether higher-priority tasks can preempt lower-priority ones

What we want:
  A high-priority process that should not stop
  No latency during the run of this process (no preemption)

How does NPT work?

Sets CPU affinity
Sets RT priority
Locks process memory into RAM to disable swapping
Disables local IRQs
Runs non-stop loops to calculate statistics with rdtsc
Re-enables local IRQs
Prints computed statistics
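The first three setup steps map onto standard Linux calls. The sketch below is an illustration under stated assumptions, not NPT's actual code: the helper names are hypothetical, the CPU is simply the first one the process is allowed to use, and the IRQ disabling step is left out because it is architecture-specific and requires elevated privileges.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mman.h>

/*
 * Pin the calling thread to the first CPU it is currently allowed to run on.
 * Returns the chosen CPU number, or -1 on failure.
 */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&target);
            CPU_SET(cpu, &target);
            if (sched_setaffinity(0, sizeof target, &target) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}

/*
 * Request the SCHED_FIFO real-time policy at the given priority.
 * Usually needs root or CAP_SYS_NICE. Returns 0 on success, -1 on failure.
 */
static int set_rt_priority(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/*
 * Lock current and future pages into RAM so the measurement loop never
 * waits on swap. Returns 0 on success, -1 on failure.
 */
static int lock_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

A measurement program would call the three helpers in this order before entering its loop, and abort if any of them fails, since a run without RT priority or with swappable memory would not measure what the tool intends.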


Algorithm of NPT’s main loop

  i ← 0
  t0 ← read rdtsc
  t1 ← t0
  while i ≤ cycles_to_do do
    i ← i + 1
    duration ← (t0 − t1) × cpuPeriod
    CalculateStatistics(duration)
    t1 ← t0
    t0 ← read rdtsc
  end while
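The factor cpuPeriod converts a tick difference into a duration. A minimal sketch of that conversion follows, assuming the timestamp counter runs at a known, constant frequency; the function name `ticks_to_ns` is hypothetical, and the 2.67 GHz figure used in the usage note is the nominal CPU frequency from the test environment, which may differ from the real counter frequency.

```c
#include <stdint.h>

/*
 * Convert a tick count to nanoseconds.
 * cpu_period_ns plays the role of cpuPeriod: the duration of one tick.
 */
static double ticks_to_ns(uint64_t ticks, double tsc_hz)
{
    double cpu_period_ns = 1e9 / tsc_hz;

    return (double)ticks * cpu_period_ns;
}
```

For example, `ticks_to_ns(2670, 2.67e9)` gives about 1000 ns, so one tick lasts roughly 0.37 ns at that frequency.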

Algorithm of NPT’s main loop (with tracepoints)

  i ← 0
  t0 ← read rdtsc
  t1 ← t0
  tracepoint nptstart
  while i ≤ cycles_to_do do
    i ← i + 1
    duration ← (t0 − t1) × cpuPeriod
    tracepoint nptloop
    CalculateStatistics(duration)
    t1 ← t0
    t0 ← read rdtsc
  end while
  tracepoint nptstop
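The loop can be sketched in C as follows. This is an illustration, not NPT's source: it uses `clock_gettime(CLOCK_MONOTONIC)` as a portable stand-in for rdtsc, reads the new timestamp at the top of each iteration so that every sample covers one full iteration, and keeps running statistics with Welford's online algorithm, which is an assumption about how CalculateStatistics could be implemented. All names are hypothetical, and the tracepoints appear only as comments.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Running statistics over the per-iteration durations, in nanoseconds. */
struct npt_stats {
    uint64_t count;
    double   min, max, mean;
    double   m2;   /* sum of squared deviations from the mean (Welford) */
};

static void stats_init(struct npt_stats *s)
{
    s->count = 0;
    s->min = s->max = s->mean = s->m2 = 0.0;
}

/* Fold one sample into the statistics without storing the samples. */
static void stats_update(struct npt_stats *s, double x)
{
    double delta;

    if (s->count == 0 || x < s->min)
        s->min = x;
    if (s->count == 0 || x > s->max)
        s->max = x;
    s->count++;
    delta = x - s->mean;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance; 0 when fewer than two samples were recorded. */
static double stats_variance(const struct npt_stats *s)
{
    return s->count > 1 ? s->m2 / (double)(s->count - 1) : 0.0;
}

/* Stand-in for "read rdtsc": monotonic clock in nanoseconds. */
static uint64_t read_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Measure the time between consecutive timestamp reads, cycles times. */
static void npt_loop(struct npt_stats *s, uint64_t cycles)
{
    uint64_t t1 = read_ns();

    /* tracepoint nptstart would be emitted here */
    for (uint64_t i = 0; i < cycles; i++) {
        uint64_t t0 = read_ns();

        /* tracepoint nptloop would be emitted here */
        stats_update(s, (double)(t0 - t1));
        t1 = t0;
    }
    /* tracepoint nptstop would be emitted here */
}
```

Because the loop never sleeps or blocks, any unusually long sample indicates that something else (an interrupt, a preemption, or the tracer itself) ran between two reads, which is the quantity the histograms report.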

The test procedure

Shield CPUs (cpusets)

Run NPT for 10^8 loops:
  Without tracing
  With LTTng kernel tracing alone
  With LTTng-UST tracing alone
  With LTTng-UST and kernel tracing

Do it on:
  Standard kernel
  PREEMPT_RT patched kernel

Latency results without tracing

[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 10. Y axis: number of cycles.]

Latency results with LTTng kernel tracing

[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 10. Y axis: number of cycles.]

Latency results with LTTng-UST tracing

[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 120. Y axis: number of cycles.]

Latency results with LTTng-UST and kernel tracing

[Figure: histogram generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 130, plus a "> 150" bin. Y axis: number of cycles.]

Identify the source of the latency

Problem:
  Latency added by the LTTng-UST tracing synchronization

Proposed solution:
  Remove the synchronization between the instrumented application and the LTTng consumer:
    The consumer now polls to check whether a buffer is full
    Permanent polling (100% CPU use) and usleep-timed polling give the same performance (with CPU shielding)
  Remove the LTTng-UST getcpu system call
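The polling approach can be sketched with one atomic flag per sub-buffer. This is an illustration under assumptions, not the modified LTTng-UST code: the names `struct subbuf_state`, `release_subbuf`, `poll_once` and `poll_until` are hypothetical, and the number of sub-buffers is arbitrary. The producer only performs an atomic store, so no system call remains on the tracing path.

```c
#define _DEFAULT_SOURCE
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define NUM_SUBBUF 4   /* illustrative value */

/* One "full" flag per sub-buffer, shared by producer and consumer. */
struct subbuf_state {
    atomic_bool full[NUM_SUBBUF];
};

/* Producer side: mark a sub-buffer as released. No system call involved. */
static void release_subbuf(struct subbuf_state *s, unsigned idx)
{
    atomic_store_explicit(&s->full[idx], true, memory_order_release);
}

/*
 * Consumer side: one polling pass over all sub-buffers.
 * Returns how many full sub-buffers were found and claimed.
 */
static int poll_once(struct subbuf_state *s)
{
    int consumed = 0;

    for (unsigned i = 0; i < NUM_SUBBUF; i++) {
        if (atomic_exchange_explicit(&s->full[i], false,
                                     memory_order_acquire)) {
            /* a real consumer would read the sub-buffer contents here */
            consumed++;
        }
    }
    return consumed;
}

/*
 * usleep-timed polling: repeat passes, sleeping period_us between them,
 * until `wanted` sub-buffers were consumed or max_passes is reached.
 * A period of 0 approximates permanent polling.
 */
static int poll_until(struct subbuf_state *s, int wanted,
                      useconds_t period_us, int max_passes)
{
    int total = 0;

    for (int pass = 0; pass < max_passes && total < wanted; pass++) {
        total += poll_once(s);
        if (total < wanted && period_us > 0)
            usleep(period_us);
    }
    return total;
}
```

The cost moves from the traced application to the consumer: with timed polling the consumer wakes up periodically even when nothing was released, which is acceptable when it runs on a CPU outside the shielded set.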

Latency results on the standard kernel

[Figure: histogram generated by NPT for 10^8 cycles with original and modified LTTng-UST. Series: original LTTng-UST; modified LTTng-UST. X axis: latency (µs), 0 to 120. Y axis: number of cycles.]

Latency results on the RT kernel

[Figure: histogram generated by NPT for 10^8 cycles with original and modified LTTng-UST. Series: original LTTng-UST; modified LTTng-UST. X axis: latency (µs), 0 to 22. Y axis: number of cycles.]

Latency results with modified LTTng-UST

[Figure: histograms generated by NPT for 10^8 cycles on standard and RT kernels. Series: standard Linux kernel; Linux kernel with PREEMPT_RT patch. X axis: latency (µs), 0 to 10. Y axis: number of cycles.]

Numeric comparison

Statistics per cycle, generated by NPT on both standard and RT kernels, for both the original and modified versions of LTTng-UST. Latencies are in ns; the variance is in ns².

  Kernel          standard              RT
  LTTng-UST       original  modified    original  modified
  Minimum              287       198         289       197
  Mean                 317       220         322       206
  Maximum          121 744     5 847      19 837     5 088
  Variance          74 778     1 186       1 813     1 027
  Deviation            273     34.44       42.58     32.05
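The size of the improvement can be read off the table by computing the relative reduction between the original and modified columns. The sketch below copies the mean and maximum values from the table; the structure and function names are hypothetical.

```c
/* Mean and maximum latencies from the table, in nanoseconds. */
struct latency_row {
    const char *kernel;
    double mean_orig, mean_mod;
    double max_orig, max_mod;
};

static const struct latency_row rows[] = {
    { "standard", 317.0, 220.0, 121744.0, 5847.0 },
    { "RT",       322.0, 206.0,  19837.0, 5088.0 },
};

/* Relative reduction from `before` to `after`, in percent. */
static double reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

With these values, the maximum latency drops by about 95% on the standard kernel and about 74% on the RT kernel, while the mean drops by about 31% and 36% respectively.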

Conclusion

Non-Preempt Test tool

Effects of LTTng tracing on both standard and RT kernels

Modified LTTng according to our observations

Maximum latency with the modified LTTng-UST is now under 6 µs on both standard and PREEMPT_RT patched kernels (5 847 ns and 5 088 ns respectively)

Future Work

Continue the LTTng-UST latency hunt
Integrate our changes in the main branch
Study the real-time behavior in non-shielded environments
Clarify the process of CPU isolation with respect to per-CPU kernel tasks such as RCU reclamation, timer update...

Thank you. Any questions?

LTTng www.lttng.org mailing list: [email protected]

NPT: git.dorsal.polymtl.ca/?p=npt.git

Slides: www.dorsal.polymtl.ca/~rbeamonte/dorsal-pm-dec2012.pdf
