Tracing and Sampling for Real-Time Partially Simulated Avionics Systems
Raphaël Beamonte, Michel Dagenais
Computer and Software Engineering Department, École Polytechnique de Montréal
C.P. 6079, Station Downtown, Montréal, Québec, Canada, H3C 3A7
DORSAL Project Meeting, December 5, 2012
Raphaël Beamonte – 2012 (CC BY-SA), Tracing and Sampling for RT Systems

Introduction
- Tracing: study runtime behavior
  - Can be used to measure latency, which is fundamental for real-time (RT) debugging
- Tracer requirements:
  - Low overhead
  - Consistent maximum latency
- Contribution:
  - Methodology and tool to measure real-time latencies (NPT)
  - Application of NPT to measure LTTng-UST latency
  - Improvements to the real-time behavior of LTTng

Table of Contents
1. Research Context
   - Real-Time Operating Systems
   - The Linux Tracing Toolkit next-generation, LTTng
2. Test environment
   - Presentation of the test environment
   - System verification
3. Baseline results
   - The Non-Preempt Test tool
   - Latency results
4. Improving LTTng added latency
   - Identify the source of the latency
   - Latency results and comparison

RTOS: Xenomai vs. Linux
Xenomai:
- ADEOS thin kernel
- Interrupt management: non-RT tasks cannot preempt RT tasks
- Hard real time?
- RT and non-RT tasks are independent
- Full Linux platform?
[Diagram: thin-kernel architecture, with the labels: user space (non-real-time tasks), Linux kernel (non-real-time), real-time tasks, thin kernel, hardware layer]

RTOS: Xenomai vs. Linux (continued)
Why use the Linux kernel?
- Able to do soft real time, and can reach hard real time through:
  - BIOS configuration: would you use hyperthreading?
  - Kernel configuration: the PREEMPT_RT patch, which is increasingly integrated into the standard kernel
  - Software configuration: interrupt redirection, CPU shielding
- The power of the community

The Linux Tracing Toolkit next-generation, LTTng
Why is LTTng pertinent for RT applications?
- Both userspace and kernel tracers (same clock source)
- Statically compiled tracepoints
- External process to consume events
- Arbitrary event types (Common Trace Format)
- Per-CPU ring buffers
- Important tracing variables protected by RCU

How the LTTng-UST consumer works (simplified version)
[Diagram: the LTTng-instrumented application writes events into ring buffers; the LTTng consumer reads events from released, filled buffers; the application uses sys_write to inform the consumer that the current buffer is full]

Test environment
Hardware:
- CPU: Intel® Core™ i7 CPU 920, 2.67 GHz
- RAM: 3 × 2 GiB DDR3 at 1067 MHz
- Motherboard: Intel DX58SO
Kernels:
- Standard: Debian Linux kernel 3.2.0-3-amd64, package version 3.2.21-3
- RT: Debian Linux kernel 3.2.0-3-rt-amd64, package version 3.2.23-1

System verification
hwlatdetect (hwlat_detector): no hardware latency detected during one hour.

    hwlatdetect: test duration 3600 seconds
    parameters:
        Latency threshold: 10us
        Sample window: 1000000us
        Sample width: 500000us
        Non-sampling period: 500000us
        Output File: None
    Starting test
    test finished
    Max Latency: 0us
    Samples recorded: 0
    Samples exceeding threshold: 0

Why NPT?
What we have with known tools:
- cyclictest: runs periodic tasks and calculates the discrepancy between the desired and the real period
- preempt-test: verifies whether higher-priority tasks can preempt lower-priority ones
What we want:
- A high-priority process that should not stop
- No latency during the run of this process (no preemption)

How does NPT work?
- Sets CPU affinity
- Sets RT priority
- Locks process memory into RAM to disable swapping
- Disables local IRQs
- Loops without interruption, computing statistics from rdtsc readings
- Re-enables local IRQs
- Prints the computed statistics

Algorithm of NPT's main loop
(the three tracepoint lines are the LTTng-UST instrumentation added to the base loop)

    i ← 0
    t0 ← read rdtsc
    t1 ← t0
    tracepoint nptstart
    while i ≤ cycles_to_do do
        i ← i + 1
        duration ← (t0 − t1) × cpuPeriod
        tracepoint nptloop
        CalculateStatistics(duration)
        t1 ← t0
        t0 ← read rdtsc
    end while
    tracepoint nptstop

The test procedure
- Shield CPUs (cpusets)
- Run NPT for 10^8 loops:
  - Without tracing
  - With LTTng kernel tracing alone
  - With LTTng-UST tracing alone
  - With LTTng-UST and kernel tracing
- Do it on:
  - Standard kernel
  - PREEMPT_RT patched kernel

Latency results without tracing
[Figure: histogram generated by NPT for 10^8 cycles on the standard Linux kernel and on the Linux kernel with the PREEMPT_RT patch; x-axis: latency (µs), 0 to 10; y-axis: cycles]
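
The NPT main loop (read a timestamp, compute the time elapsed since the previous reading, update the statistics) can be sketched in Python as follows. This is only an illustration under stated assumptions, not NPT's code: Python cannot read rdtsc directly, so `read_clock` stands in for it, and `npt_loop`, `tick_period_us` and `bin_width_us` are hypothetical names. The setup steps (CPU affinity, RT priority, memory locking, IRQ disabling) are left out, and the loop runs exactly `cycles_to_do` iterations. With rdtsc on the 2.67 GHz test CPU, and assuming the TSC ticks at that nominal frequency, cpuPeriod would be about 0.375 ns per cycle.

```python
import time


def npt_loop(cycles_to_do, read_clock=time.perf_counter_ns,
             tick_period_us=1e-3, bin_width_us=1.0):
    """Measure the time between consecutive clock readings.

    read_clock is a stand-in for rdtsc (assumption); tick_period_us
    converts one clock tick to microseconds (1e-3 for a ns clock).
    """
    stats = {"count": 0, "min": float("inf"), "max": 0.0,
             "sum": 0.0, "histogram": {}}
    t0 = read_clock()
    t1 = t0
    for _ in range(cycles_to_do):
        # Time elapsed between the two most recent readings.
        duration = (t0 - t1) * tick_period_us
        # Equivalent of CalculateStatistics(duration).
        stats["count"] += 1
        stats["min"] = min(stats["min"], duration)
        stats["max"] = max(stats["max"], duration)
        stats["sum"] += duration
        bucket = int(duration // bin_width_us)
        stats["histogram"][bucket] = stats["histogram"].get(bucket, 0) + 1
        # Shift the readings and take a new one.
        t1 = t0
        t0 = read_clock()
    return stats


if __name__ == "__main__":
    result = npt_loop(100_000)
    print("max latency (us):", result["max"])
    print("histogram (1 us bins):", sorted(result["histogram"].items()))
```

Any iteration whose duration stands far above the usual loop time indicates that something interrupted the process, which is the latency the histograms report.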

Latency results with LTTng kernel tracing
[Figure: histogram generated by NPT for 10^8 cycles on the standard Linux kernel and on the Linux kernel with the PREEMPT_RT patch; x-axis: latency (µs), 0 to 10; y-axis: cycles]

Latency results with LTTng-UST tracing
[Figure: histogram generated by NPT for 10^8 cycles on the standard Linux kernel and on the Linux kernel with the PREEMPT_RT patch; x-axis: latency (µs), 0 to 120; y-axis: cycles]

Latency results with LTTng-UST and kernel tracing
[Figure: histogram generated by NPT for 10^8 cycles on the standard Linux kernel and on the Linux kernel with the PREEMPT_RT patch; x-axis: latency (µs), 0 to 130 plus a "> 150" bin; y-axis: cycles]

Identify the source of the latency
Problem:
- Latency added by the LTTng-UST tracing synchronization
Proposed solution:
- Removing the synchronization between the instrumented application and the LTTng consumer:
  - The consumer now polls to check whether a buffer is full
  - Permanent polling (100% CPU use) and usleep-timed polling give the same performance (CPU shielding)
- Removing the LTTng-UST getcpu system call

Latency results on the standard kernel
[Figure: histogram generated by NPT for 10^8 cycles with the original and the modified LTTng-UST; x-axis: latency (µs), 0 to 120; y-axis: cycles]
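
The proposed change, in which the traced application no longer notifies the consumer and the consumer polls for full buffers instead, can be illustrated with a toy model. This is a minimal single-threaded sketch, not LTTng's implementation: `SubBufferRing`, `poll_consumer` and all parameters are hypothetical names chosen for illustration. A `period_s` of 0 models permanent polling, and a positive value models usleep-timed polling.

```python
import time
from collections import deque


class SubBufferRing:
    """Toy ring of fixed-size sub-buffers (hypothetical, not LTTng's)."""

    def __init__(self, n_subbuffers, capacity):
        self.n_subbuffers = n_subbuffers
        self.capacity = capacity
        self.current = []    # sub-buffer currently being filled
        self.full = deque()  # filled sub-buffers awaiting the consumer
        self.lost = 0        # events dropped because every sub-buffer was full

    def write(self, event):
        """Producer side: never blocks and never wakes the consumer."""
        if len(self.full) == self.n_subbuffers:
            self.lost += 1
            return False
        self.current.append(event)
        if len(self.current) == self.capacity:
            self.full.append(self.current)  # released, but no notification
            self.current = []
        return True

    def take_full(self):
        """Consumer side: return one filled sub-buffer, or None."""
        return self.full.popleft() if self.full else None


def poll_consumer(ring, sink, period_s=0.0, max_polls=None, sleep=time.sleep):
    """Drain full sub-buffers by polling; return the number of polls done."""
    polls = 0
    while max_polls is None or polls < max_polls:
        sub = ring.take_full()
        while sub is not None:
            sink.extend(sub)
            sub = ring.take_full()
        polls += 1
        if period_s > 0:
            sleep(period_s)  # timed polling; period_s == 0 is busy polling
    return polls
```

In this model the producer's `write` path contains no call that could block or cross into the kernel, which is the property the proposed solution aims for; the cost is that events are dropped if the consumer does not poll often enough.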