REAL-TIME & TOOLS FOR THE JETSON TX1
KEN JACKSON

MAY 8, 2017
2017 - CONFIDENTIAL AND PROPRIETARY INFORMATION

REDHAWK Real-Time Linux
RedHawk kernels are based on NVIDIA’s kernel source plus 280 Concurrent-developed real-time feature patches
Approximately 100K lines of Concurrent code added
Time-proven APIs for hard real-time development
High-performance real-time features can be used to improve application performance
Provides many transparent performance enhancements
Does not depend on the open-source PREEMPT_RT patch

REDHAWK Real-Time Linux
Software is installed on a Jetpack-initialized Jetson TX1
RedHawk real-time Linux kernels
• Run-time: absolute highest performance, but rarely used
• Trace: extremely low overhead kernel event tracing
• Debug: mainly used for driver debug
Command-line tools
Libraries and header files
Documentation

REDHAWK CPU Shielding
Best approach for maximizing real-time performance
Shielded CPUs are sheltered from all unnecessary activity
Real-time processes can then be assigned to shielded CPUs for lowest latency and maximum determinism in execution times
Ability to disable some or even all
Automatically moves kernel daemons off of shielded CPUs
Minimizes the effects of cross-processor interrupts
Fully dynamic configuration while system is running

REDHAWK CPU Shielding
The shield command is unique to RedHawk
Used to shield CPUs from one or all of:
• Arbitrary processes
• Arbitrary interrupts
• Local-timer interrupts
Examples
• shield -p 1,3 -i 0 -l 2
• shield -a 1-3
• shield -r

REDHAWK Binding
The run command is unique to RedHawk
Can specify CPUs, scheduling class, priority and time quantum
Use at program startup or to change running processes
Memory lock process’ current & future pages, no source required
Display bindings for one, all, or any subset of processes
Examples
• run -b 1 -s fifo -P 90 --lock=all ./controller
• run -b 2-3 -s rr -q 50ms -n data-logger

REDHAWK Kernel Event Tracing
Fully lockless kernel event tracing implementation
Extremely low-overhead event tracing with per-CPU buffers
Dynamically enable/disable tracing of all or a subset of events
Dynamically enable/disable tracing on all or a subset of CPUs
Many modes of operation, including exhaustive and wrap-around
Events can be viewed live or collected for later analysis
Much more about kernel event tracing in the next section covering the NightStar tools

REDHAWK Real-Time Demo
Runs the Cyclictest real-time benchmark on a CPU
• https://rt.wiki.kernel.org/index.php/Cyclictest
Creates a background load using the simple POSIX Stress utility
• http://people.seas.harvard.edu/~apw/stress/
Continuously samples and graphs results
Interactively start and stop background load
Interactively toggle between shielded and unshielded modes
Keeps track of worst-case result seen, zeroed on mode change
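
The kind of measurement the demo graphs can be illustrated with a small sketch. This is not Cyclictest and it does not use the RedHawk shield or run commands; it is a minimal Python approximation that assumes a Linux host and relies only on the standard os and time modules. The function names (try_bind, measure_latencies, summarize), the CPU number, and the priority are hypothetical choices for illustration. Binding and SCHED_FIFO are attempted on a best-effort basis and are simply skipped without the needed privileges.

```python
import os
import time


def try_bind(cpu, priority):
    """Best-effort: pin this process to one CPU and request SCHED_FIFO.

    Returns (pinned, realtime). Either flag may be False when the call is
    unavailable on this platform or the process lacks the privilege.
    """
    pinned = realtime = False
    try:
        os.sched_setaffinity(0, {cpu})
        pinned = True
    except (AttributeError, OSError):
        pass
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        realtime = True
    except (AttributeError, OSError):
        pass
    return pinned, realtime


def measure_latencies(interval_s=0.001, cycles=1000):
    """Sleep until successive absolute deadlines; return wake-up lateness in µs."""
    latencies = []
    deadline = time.monotonic() + interval_s
    for _ in range(cycles):
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # How late did we wake up relative to the intended deadline?
        latencies.append((time.monotonic() - deadline) * 1e6)
        deadline += interval_s
    return latencies


def summarize(latencies_us):
    """Reduce samples to the min / average / max figures a benchmark reports."""
    return {
        "min": min(latencies_us),
        "avg": sum(latencies_us) / len(latencies_us),
        "max": max(latencies_us),
    }


if __name__ == "__main__":
    # CPU 1 and priority 90 are illustrative values, not recommendations.
    print("pinned, realtime:", try_bind(cpu=1, priority=90))
    print(summarize(measure_latencies(interval_s=0.001, cycles=200)))
```

Running the sketch with and without a background load gives a rough feel for how the worst-case value moves; an interpreted language adds its own jitter, so the numbers are not comparable to Cyclictest results.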

REDHAWK Real-Time CUDA Demo
Measure impact of CUDA activity on real-time performance
Runs the Cyclictest real-time benchmark on a CPU
Creates a background load using the CUDA Reduction example
Continuously samples and graphs results
Interactively start and stop background load
Interactively toggle between shielded and unshielded modes
Keeps track of worst-case result seen, zeroed on mode change

REDHAWK Formal Benchmarking
12-hour Cyclictest results on Jetson TX1
RedHawk with Stress load
• Min 8µs Average 16µs Max 38µs
RedHawk with Stress and CUDA load
• Min 8µs Average 16µs Max 49µs
NVIDIA r24.2.1 kernel
• Unable to run Cyclictest due to lack of voluntary preemption in kernel
• Concurrent’s homegrown benchmarks showed >100 milliseconds worst case

REDHAWK Many More Features
Frequency-based scheduler
High-resolution process accounting
Ptrace extensions
• fast breakpoints
• debugger visibility and control
Optionally receive SIGBUS on page faults
NUMA memory shielding and user/kernel text replication
• Only on x86 today, but coming soon to ARM via the ARMv8.1-A spec

NIGHTSTAR FOR THE JETSON TX1
Debugging & Analysis Tools

NightTrace: Trace & Performance Utility
NightView: Symbolic Application Debugger
NightTune: System Activity & Tuning
NightProbe: Data Monitoring Utility
NightSim: Cyclic Process Scheduler

NIGHTSTAR

Hosted on a variety of Linux distributions
• CentOS, Fedora, Ubuntu, RHEL
• X86 and ARM64 systems
Targets include ARM64 and X86 Linux systems
• Certified ARM64 targets include NVIDIA Jetson TX1™ and Applied Micro Circuits X-C1™
• X86: any 32-bit or 64-bit Intel or AMD64 system
• Host and target system may be the same machine
• Cross-target usage is supported in either direction (X86 or ARM64)

NIGHTSTAR NightTrace
Incredibly powerful method of tracking code activity and system activity in a time-synchronized graphical display.
Presents a cohesive view of the operation of individual threads, processes, CPUs, and the system as a whole.
Invaluable tool for troubleshooting development versions of software and on-site customer problems.
CUDA Tracing: GPU kernels, CUDA API usage.

NIGHTTRACE

Allows programmers to automatically trace CUDA API function calls and examine the values of parameters passed and returned without changing their source code.

Allows users to add trace points into the CUDA kernels that are executed by the GPU.
Provides CUDA-centric display panels for analysis.

NIGHTTRACE Linux Kernel Tracing

Concurrent patches to kernel.org.
Real-time performance (very minimal overhead).
Scalable (operates well on systems with large numbers of CPUs).
Tracepoints are already inserted in the kernel.
While useful to kernel developers, it is aimed at users who need to understand what is happening in the kernel and their application.

NIGHTTRACE

NIGHTTRACE & CUDA

NIGHTTRACE GPU-TRACING

NIGHTSTAR NightView

A complete symbolic debugger supporting Ada (Concurrent’s Ada product), Fortran, and C/C++.
Debugs multiple threads and multiple processes on multiple systems, all from a single interface.
Superior multi-threaded debugging features, especially important for real-time applications.
Designed for the lowest amount of process intrusion.
Includes debugging CUDA user applications on the GPU.

Wipes out “heisenbugs”

NIGHTVIEW

NIGHTVIEW & CUDA

NIGHTTUNE

Provides a graphical and integrated view of a wide range of system metrics, process activities, and real-time performance.
It’s more than just graphical presentation.
A user interface to control RedHawk CPU shielding; a key part of RedHawk’s value-add.
Provides for remote monitoring and management.
All metrics and events it measures can be recorded for off-line analysis.
Provides details about the GPU cards installed in the system and dynamic statistics of GPU activity.

NIGHTTUNE

NIGHTTUNE & CUDA Configuration Panel
Details about each CUDA device
• Kernel Version
• Compute Capability
• Clock Speeds
• Cores, Warps, and Lanes
• Grid & Thread Dimensions
• GPU Details
• Memory

NIGHTTUNE & CUDA Activity Panels

NIGHTSTAR NightProbe

A non-intrusive tool for sampling data from a process
Browse variables in the program
Change values on the fly
Provide lists, tabular and graphical display of data over time
Includes support for synchronizing data capture with a process
Provides an API for locating and describing variables within a program file, the basis for customer-developed applications.

NIGHTSTAR NightSim
Provides a graphical interface to RedHawk’s Frequency-Based Scheduler (FBS).
Schedule threads and processes based on user-defined cycles.
Typically driven from an source or real-time clock.
Monitors thread and process execution on a per-cycle basis.
Supports deadline detection.
Exports scripts for use with the FBS command line interface.
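
The idea of cycle-based scheduling with deadline detection can be sketched generically. The following is not the RedHawk FBS API or its command line interface; it is a minimal Python illustration under the assumption that every task runs once per fixed-length cycle and that a cycle counts as overrun when its tasks finish after the next cycle was due to start. The function name run_cyclic and its parameters are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without real delays.

```python
import time


def run_cyclic(tasks, period_s, cycles, clock=time.monotonic, sleep=time.sleep):
    """Run every task once per cycle; return the cycle numbers that overran.

    Cycle boundaries stay on a fixed grid (start + n * period_s), so one
    late cycle does not shift the schedule of the cycles that follow.
    """
    overruns = []
    next_start = clock()
    for cycle in range(cycles):
        next_start += period_s          # boundary at which this cycle must end
        for task in tasks:
            task()
        remaining = next_start - clock()
        if remaining < 0:
            overruns.append(cycle)      # deadline missed in this cycle
        else:
            sleep(remaining)            # idle until the next cycle boundary
    return overruns


if __name__ == "__main__":
    # Five 10 ms cycles of a trivial task, using the real clock.
    print("overrun cycles:", run_cyclic([lambda: None], 0.01, 5))
```

A real frequency-based scheduler is driven by a hardware timing source and runs at real-time priority; the sketch only shows the bookkeeping behind per-cycle monitoring and overrun detection.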