AGL Real-time Architecture Options for Critical Services

Automotive Summit Tokyo, June 2018
Fulup Ar Foll, Lead Architect, IoT.bzh Crew

AGL Realtime Architecture Options Jun-2018 2 1St Technical Contributor

What is RealTime in an OS?

● Realtime means “on time”, not “faster”
  – Realtime is about predictability
● Typically, realtime addresses the following concerns
  – 80% => Could the execution of my code be disturbed? If so:
    ● For how long?
    ● How often?
    ● By whom?
  – 20% => How big is my latency?
    ● How much time is lost between an external event and the moment my code can start handling it?
    ● What is the maximum latency fluctuation?
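The two latency questions above can be explored with a small measurement loop, in the spirit of tools such as cyclictest. The sketch below is illustrative only: the names `latency_stats` and `measure_wakeup_latency` are hypothetical, and it assumes a POSIX system that provides `clock_nanosleep` and `CLOCK_MONOTONIC`. It sleeps until absolute deadlines and records how late each wakeup was.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper type: wakeup lateness statistics, in nanoseconds. */
struct latency_stats {
    int64_t min_ns;
    int64_t max_ns;
    int64_t sum_ns;
    long samples;
};

/* Advance *t by ns nanoseconds (0 <= ns < 1 s), keeping tv_nsec normalised. */
void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Return (b - a) in nanoseconds. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * NSEC_PER_SEC
         + (int64_t)(b->tv_nsec - a->tv_nsec);
}

/*
 * Sleep until `loops` absolute deadlines spaced period_ns apart and record
 * how late each wakeup was. Returns 0 on success, an error number otherwise.
 */
int measure_wakeup_latency(long period_ns, long loops, struct latency_stats *st)
{
    struct timespec next, now;

    assert(period_ns > 0 && period_ns < NSEC_PER_SEC);
    st->min_ns = INT64_MAX;
    st->max_ns = 0;
    st->sum_ns = 0;
    st->samples = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return errno;

    for (long i = 0; i < loops; i++) {
        int rc;

        timespec_add_ns(&next, period_ns);
        do {
            /* An absolute deadline avoids accumulating drift between periods. */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return rc;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return errno;

        int64_t late = timespec_diff_ns(&next, &now);
        if (late < st->min_ns) st->min_ns = late;
        if (late > st->max_ns) st->max_ns = late;
        st->sum_ns += late;
        st->samples++;
    }
    return 0;
}
```

Calling `measure_wakeup_latency(1000000L, 10000, &st)` while the system is under load (for example a ping flood) gives a rough idea of the latency and of its fluctuation (`st.max_ns - st.min_ns`). The numbers depend entirely on the kernel, hardware and load.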

Linux & Soft/Hard Realtime

● Soft realtime
  ● Periodic tasks/events of several milliseconds
  ● Some unpredictable delays are acceptable (10 to 100 ms)
  ● Often implemented with resource controls
  ● Latency of a few ms, with exceptional unpredictable fluctuations of ±10 ms
● Hard realtime
  ● Total latency in the order of 10 to 100 µs
  ● Predictable and short delays (< 250ms)
● The current vanilla Linux kernel is soft realtime
● Harder realtime with:
  – Preempt-RT
  – Xenomai

Which Automotive Apps need RT ?

● Soft Realtime
  ● Data Acquisition
  ● Audio/Video
● Hard Realtime
  ● Cluster
  ● Emergency/Safety signal

Linux RT Application Impact

● On standard Linux, a simple “ping flood” makes applications lag.
  ● Linux network IRQs preempt applications too often and for too long, which significantly increases latency.
● PREEMPT_RT reduces latency
  ● Replaces most spinlocks with mutexes
  ● Supports threaded IRQs
  ● Supports high-resolution timers

Turn “ON” Linux RealTime.

● Objectives
  ● Decrease application latencies
  ● Guarantee that high-priority tasks are not disturbed by lower-priority ones.
  ● Make sure interrupts cannot delay your critical apps.
● Soft Realtime (standard kernel)
  ● Container, ...
● Hard Realtime (kernel must be patched)
  ● PREEMPT-RT
  ● IPIPE+Xenomai
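The objective of keeping high-priority tasks undisturbed by lower-priority ones is usually approached with a realtime scheduling policy, which is available on a standard kernel as well as on a patched one. The following is a minimal sketch, assuming Linux and the `SCHED_FIFO` policy; the helper names `clamp_fifo_priority` and `make_caller_fifo` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Clamp a requested priority into the valid SCHED_FIFO range of this system. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    assert(lo >= 0 && lo <= hi);
    if (prio < lo) return lo;
    if (prio > hi) return hi;
    return prio;
}

/*
 * Switch the caller to SCHED_FIFO at the given priority.
 * Returns 0 on success or an errno value. EPERM is the usual failure when
 * the process lacks CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.
 */
int make_caller_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

    /* pid 0 designates the caller; on Linux this is the calling thread. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

A critical service would typically call `make_caller_fifo()` once during initialisation and fall back to normal scheduling, with a logged warning, when it returns `EPERM`. The priority value itself is a design decision of the whole system, not of a single thread.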

Preempt_RT vs Xenomai

● Xenomai
  ● Supports legacy non-POSIX RT applications (eg: VxWorks)
  ● The dual-kernel solution brings better performance when no more than 4 cores run RT threads
  ● More confidence in the whole RT application (eg: /proc/xenomai statistics)
  ● Lacks some critical Unix development tools (eg: Valgrind)
● Preempt_RT
  ● Almost vanilla Linux (no API/ABI changes)
  ● Continuous testing in the OSADL QA farm
  ● No need for extra userspace libraries
  ● Less confidence in the app, harder to debug, needs extra code for RT monitoring

Xenomai Dual Kernel Mode

Preempt-RT Architecture

Internal design of the RTLinux system (Image Courtesy - Linux For U)

Preempt_RT Latency

Threaded IRQs improve latency by removing the “disabling of interrupts”

[Figure: latency (us)] Source: http://www.emlid.com/raspberry-pi-real-time-kernel

Xenomai & Preempt_RT convergence

● The kernel under Xenomai can use Preempt_RT
● Xenomai 3.x offers dual-kernel and native options
● Xenomai latency remains significantly better
● Some options (eg: RTnet) only run on Xenomai

Playing Darts with Kernel Patches

● Far more kernel versions have a matching PREEMPT_RT patch (but the patch is harder to port)

● IPIPE patch (arm) ~= 700 KB (25 000 lines). Better documentation.

● PREEMPT_RT patch ~= 2.8 MB (58 000 lines)

RT Kernel is only a start

● Realtime requires more kernel tuning and clean behaviour on the application side.
  ● Enable CONFIG_PREEMPT_RT_FULL & CONFIG_HIGH_RES_TIMERS to get <1ms precision
  ● Disable CONFIG_CPU_FREQ!
    ● Might conflict with power management
● There are strict rules to follow and actions to take in the application
  ● Stack pre-faulting
  ● Virtual memory locking
  ● Fine tuning of thread priorities
  ● Chasing malloc() and friends, to avoid page faults (can be difficult with some C++ libraries)
  ● No system(), popen(), execve() … at runtime.
  ● Monitoring runaway threads (ie, tight loops in RT contexts) to prevent the system from hanging (and to allow debugging).
  ● clock_nanosleep is your friend for writing periodic tasks
  ● Careful initialization of pthread_mutex attributes: the default ones do not have PTHREAD_PRIO_INHERIT!
  ● Fancy some LTTng sessions? (does not work with IPIPE)
● Last but not least
  ● Not everything can be RT
  ● Giving high priority to some tasks means that the others inherit a low priority
  ● Base your flow on locks (semaphores) and not on thread priorities
  ● Get rid of any spin lock

C issues

● malloc()/realloc() do not always lead to a page fault (through sys_brk() or sys_mmap_pgoff()), because of the internal memory pool of glibc
  ● Thus, an RT ‘leak’ may be hard to reproduce
● Using GDB with a breakpoint on malloc() is usually sufficient
● Another, less intrusive technique is to use the Memory Allocation Hooks of glibc.
● Some companies allow malloc() for initialization, and always forbid free()!
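The “malloc() at initialization, never free()” policy mentioned above is often implemented with a pre-allocated arena. The sketch below is one possible shape for it, not a reference to any AGL component: `struct arena`, `arena_init` and `arena_alloc` are hypothetical names. The arena is allocated and touched once at start-up, and the RT path then only bumps an offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical bump allocator: one malloc() at init, no free() afterwards. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

/* Allocate the arena and touch every page so later use cannot page-fault. */
int arena_init(struct arena *a, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(a != NULL && page > 0);
    a->base = malloc(size);
    if (a->base == NULL)
        return -1;

    volatile unsigned char *p = a->base;
    for (size_t i = 0; i < size; i += (size_t)page)
        p[i] = 0; /* volatile write forces the page to be mapped now */

    a->size = size;
    a->used = 0;
    return 0;
}

/* RT-safe allocation: no system call, no lock, returns NULL when exhausted. */
void *arena_alloc(struct arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);

    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}
```

The arena size has to be chosen from the worst-case needs of the RT path. Combined with memory locking, this keeps the allocation cost constant, at the price of never returning memory during the lifetime of the process.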

C++ specific issues

● In C++, dynamic allocations are not always explicit
  ● Example: std::vector growing
● In some extra libraries (eg, boost), memory allocations may be completely out of control (in addition to alien-only-friendly backtraces)

RT Options inside AGL

[Diagram: “Multi ECU & Cloud Aware Architecture”, with areas labelled “Hard RealTime” and “Soft RealTime”. Three columns (Cluster, Entertainment, Cloud) each sit on a “Transport & ACL” layer. Upper-level services include Head Unit, Navigation, My Car Portal, Maintenance Portal and Payment Service. Below the transport layer are virtual signal services (Cluster, CAN-BUS, Geopositioning) plus Preferences, Log & Analytics and Customisation, backed by engines such as Engine CAN-BUS, LIN-BUS, GPS, ABS, gyro/accelerometer, a MongoDB No-SQL engine, Payment Service, and Statistics & Analytics.]

AGL RT bindings

● The easiest option to enable RT for AGL
  ● No need to rewrite existing bindings
  ● Could easily support Linux with Preempt-RT
  ● A subset of the AGL AppFW could be ported to a lighter OS (eg: Zephyr, VxWorks, …)
● Requires
  ● Creating a portable AppFW RT transport layer
  ● Adding RT definitions to applications/services
  ● More statistics & debug mechanisms
  ● A bridge from AGL to Automotive Safety Services

Few References

● IoT.bzh AGL publications
  ● https://iot.bzh/en/publications
● Practical Linux RT (ELC)
  ● https://elinux.org/images/d/d7/Practical-Real-Time-Linux-ELCE15.pdf
● Video Intro RT Linux
  ● https://www.youtube.com/watch?time_continue=1&v=BKkX9WASfpI
