Real-Time Linux and Open MPI -- An Introduction and Case Study

IBM Linux Technology Center
Real-Time Linux and Open MPI -- An Introduction and Case Study
Nivedita Singhvi
April 15, 2010 : Linux Foundation Collab Summit
© 2010 IBM Corporation

- Introduction to Real-Time Linux
- Running an Open MPI benchmark on Real-Time Linux (ok, not so much on this....)

Real-Time and HPC have a lot in common....
- Somewhat poorly defined terms!
  - mean different things to different people
- Include a broad range of solution space
- Known for being areas of challenging requirements
  - complex problems to solve
  - often tend not to be the default path, require custom solutions
  - want low latency, reduced OS jitter
  - have critical applications that need a lot of processor time

So if not a marriage made in heaven.... we should at least be dating, right?

Well, while not quite the Montagues and the Capulets...
- Opposite ends of the size spectrum, typically
  - HPC: typically massive installations, larger systems
  - Real-Time: typically at the smaller end, embedded
    - still in its infancy in terms of scaling...
- Complex programming models vs simpler, more general needs
- Real-Time is coming out of the embedded and/or custom solution space into general-purpose systems
  - solving problems that would not have been called real-time even a decade ago

So what is this Real-Time Linux?
- Enterprise Real-Time is "softer" real-time
  - The world is not going to explode should you miss a deadline or a latency guarantee
- Typically want "as fast as possible" but, as or more importantly: "deterministically", "predictably", "consistently"

Normal Non-Real-Time Compute Model
- Excellent throughput but with wide variability
- Average times may vary
- No prioritisation
- Favors raw throughput, latencies uncontrolled
- Acceptable non-real-time behavior: 100,000 transactions, 99,999 at 5ms each and 1 at 2 seconds
  - Unacceptable for real-time systems (courtesy Paul McKenney)

What to expect from Real Time Linux?
- Real Time Linux is not necessarily Real Fast Linux!!
- Determinism
  - sacrifice throughput to cap max latencies
- Highest priority real time tasks take precedence
  - they get system resources (priority-driven environment)
- The ability to shoot yourself in the foot!
  - "with great power comes great responsibility"

Real-Time Kernel Goals - Deterministic
- Deterministic
  - maximum latency bound
  - want to guarantee operations will take "at most 30 seconds" no matter what happens (all paths)
  - Better to have 300 operations that take 20ms rather than 299 operations that take 15ms and 1 operation that takes 60ms. The second case has the lower mean (about 15.15ms against 20ms) but three times the worst-case latency.
  - Real-Time applications need bounded latencies
  - Sacrifice throughput, performance for determinism
  - Trade-off benefits high priority tasks at the expense of low priority tasks

Mainline idle/loaded - async-handler
[Figure: scatterplot]

RT idle/loaded - async-handler
[Figure: scatterplot]

Real-Time Kernel Characteristics - Priority Driven
- Priority Driven
  - Traditional OSs are optimized for performance, overall throughput
  - BUT on RT we always execute the N highest priority threads on an N-CPU box
    - latency guarantees can only be met if there are adequate resources!
  - SCHED_FIFO is not limited by timeslices (will not relinquish the CPU!)
  - if a higher priority thread becomes runnable, the lower priority thread is migrated or suspended
    - this can cause some cache thrashing
    - inefficient overall (might yield almost immediately, ...)
    - potential deadlock; need care while programming, administering
  - Sacrifice system throughput for real-time priority ordered hierarchy
  - Trade-off benefits high priority tasks at the expense of low priority tasks

- Why do you need a Real-Time Linux kernel?
  - isn't mainline (POSIX 4) mostly there? (scheduler, timers, async I/O, ...)
- Mainline Linux
  - real-time scheduling policies alongside the default CFS scheduler, high-resolution timers, tools (chrt, ftrace, perf, ...)
- Real-Time Linux (CONFIG_PREEMPT_RT)
  - more preemptive kernel (user space has greater control over kernel)
  - interrupt handling is different (irqs, softirqs are kernel threads)
  - locking is modified (rt mutexes replace most spinlocks)
    - priority inheritance, preemptible
  - whole lot of other underlying infrastructure to support this
    - time, timer infrastructure; preemptive RCU, ....

Real-Time Kernel Features - Scheduling
- CFS Scheduler - default scheduling algorithm
- SCHEDULING POLICIES: SCHED_RR, SCHED_FIFO, SCHED_OTHER
  - SCHED_FIFO is not limited by timeslices
    - first highest priority task that's runnable gets CPU / keeps CPU
    - preempted only by higher priority task
    - real-time priority between 1-99
  - SCHED_RR has priority but is timesliced amongst same priority level
    - round robin: once timeslice completes, goes to back of runqueue
  - SCHED_OTHER does not have a real-time priority (time-sliced, priority 0)
- SCHEDULING PRIORITIES: SCHED_FIFO, SCHED_RR: 1 - 99

Real Time Tuning of Applications
[Figure: priority scale from 99 down to 0. Real-Time range, scheduling class SCHED_FIFO or SCHED_RR: interrupts, softirqs, some kernel threads, ... toward 99, with application threads below them. Non Real-Time, priority 0, scheduling class SCHED_OTHER: other system tasks.]

Real-Time Kernel Features - Threaded Interrupts
- HW and SW Interrupts on RT are kernel threads
- Work deferred from HW interrupt to softirq (now a kernel thread)
  - except timer interrupt
- Their priorities fall somewhere between 30 - 99 (current default)
  - can be tricky if large number of high priority threads
  - manipulating these priorities is HIGH ART
  - need to ensure critical kernel activities not blocked
- Applications used to get interrupts which ran till they completed
  - caused a lot of variance in the time it took to run
  - now kernel activity competes for time with high priority real-time processes
  - allows high priority real-time tasks to run with fewer interrupts

- Would HPC benefit from using Real-Time? Depends!
- Should HPC be paying attention to Real-Time? Depends!
- Anything HPC can learn from or share with Real-Time? Depends!

Jitter is bad for lock-step applications (common in HPC)
[Figure: timeline for Core 0 through Core N alternating work and sync phases; the rest stalled waiting on one task]
- Improved determinism (max latencies capped) can improve throughput

Sources of Jitter
- Jitter - variation in execution times (also OS overhead)
  - lack of determinism
- Sources of jitter for application threads:
  - Scheduling - CPU contention
  - Lock contention
  - Interrupts - Hardware, Softirq (periodic, asynchronous)
  - Variance in memory allocation times, etc...
  - Device/Network/external factors
  - system mgt events, throttling, ...

What can users do?
- On mainline Linux (non-RT)
  - real-time priority scheduling
  - chrt to set scheduling class and priority (careful!)
  - taskset to pin interrupts, threads
- Run on RT (tuning, priorities)
- Run on RT with application code changes
  - e.g. mlockall, etc to pin memory

Typical.... (don't do that)

Plugin High-Level View
[Figure: User application on top of the MPI API, on top of the Modular Component Architecture (MCA); the MCA holds several Frameworks, each containing several Components (Comp.)]
Courtesy: Jeff Squyres, "What is MPI", 2008

Three Main Code Sections
- Open MPI layer (OMPI)
  - Top-level MPI API and supporting logic
- Open Run-Time Environment (ORTE)
  - Interface to back-end run-time system
- Open Portable Access Layer (OPAL)
  - OS / utility code (lists, reference counting, etc.)
- Dependencies - not layers
  - OMPI ➔ ORTE ➔ OPAL
Courtesy Brad Benton

Three Main Code Sections
[Figure: stack, top to bottom: User application, MPI API, OMPI, ORTE, OPAL, Operating system]

NASA's NAS Parallel Benchmark (NPB)
- BT: Block Tridiagonal
- CG: Conjugate Gradient
- EP: Embarrassingly Parallel
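
The "What can users do?" slide names chrt and taskset as the mainline tuning tools. As a rough sketch only, the same steps can be taken from inside a program using Python's Linux-only `os.sched_*` wrappers. The helper names `rt_priority_range`, `pin_to_one_cpu` and `try_set_fifo` are made up for this illustration and do not come from the talk; `mlockall` is not covered because the Python standard library has no direct wrapper for it.

```python
import os

def rt_priority_range():
    """Return the (min, max) static priority the kernel reports for SCHED_FIFO."""
    return (os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO))

def pin_to_one_cpu(pid=0):
    """Pin a process to one CPU from its currently allowed set (taskset-like).

    pid=0 means the calling process. Returns the CPU number chosen.
    """
    allowed = os.sched_getaffinity(pid)
    target = min(allowed)  # hypothetical choice: lowest-numbered allowed CPU
    os.sched_setaffinity(pid, {target})
    return target

def try_set_fifo(priority, pid=0):
    """Attempt what `chrt -f <priority>` does; return True on success.

    Switching to SCHED_FIFO normally needs root or CAP_SYS_NICE, so an
    unprivileged caller gets PermissionError, reported here as False.
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False
```

Calling `pin_to_one_cpu()` followed by `try_set_fifo(10)` would roughly mirror running a job under `taskset` and `chrt -f 10`. The priority 10 is an arbitrary illustrative value; as the threaded-interrupts slide warns, choosing a value relative to the interrupt threads needs care.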

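
The lock-step slide ("Jitter is bad for lock-step applications") claims that capping maximum latencies can improve throughput, because every core waits at each sync point for the slowest one. A toy model of that argument follows; it is a sketch under simple assumptions and not a measurement from the talk. The per-core work times in milliseconds are invented, and the function names `lockstep_total` and `stalled_core_time` are hypothetical.

```python
def lockstep_total(steps):
    """Wall-clock time for barrier-synchronised steps.

    `steps` is a list of steps; each step is a list of per-core work times.
    A step ends only when its slowest core reaches the sync point.
    """
    return sum(max(step) for step in steps)

def stalled_core_time(steps):
    """Total time cores spend idle at sync points waiting for the slowest core."""
    return sum(max(step) - t for step in steps for t in step)

# Invented data: 4 cores, 3 steps, 10 ms of work per core per step.
quiet = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
# Same workload, but one core is delayed by 5 ms of jitter in the second step.
jittery = [[10, 10, 10, 10], [10, 15, 10, 10], [10, 10, 10, 10]]

print(lockstep_total(quiet), stalled_core_time(quiet))
print(lockstep_total(jittery), stalled_core_time(jittery))
```

In this model a single 5 ms delay on one core lengthens the whole run by 5 ms and leaves the other three cores idle for 5 ms each, which is why bounding the worst case matters more here than improving the average.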