
Popcorn Linux OS and Compiler Framework: lessons from 7 years of research, development, and deployments Antonio Barbalace, Pierre Olivier, Binoy Ravindran From my old slide sets (2013) … Heterogeneity Trends: Integration AMD Same chip Same machine Each image is Copyright of the respective Company or Manufacturer. Images are used here for educational purposes. From my old slide sets (2013) … Heterogeneity Trends: Specialization AMD OS-capable Fully Special general purpose and purpose general Special purpose purpose Each image is Copyright of the respective Company or Manufacturer. Images are used here for educational purposes. Popcorn Linux and Compiler Framework Project • Started at Virginia Tech, Blacksburg, VA, mid-2012 • Binoy Ravindran, Antonio Barbalace • Targets platforms with multiple groups of general-purpose processing units • Non-cache-coherent • Microarchitectural or ISA heterogenous • Initial goal • Extend the multiple kernel OS design (Barrelfish) to Linux • Provide the same OS and programming environment among processing units • OS and compiler provide SMP functionalities on non-SMP platforms Was that worth? How to do that? Today’s Wildly Heterogenous Hardware Example TPU Memory MPU Memory Memory Memory Accel Memory X Accel TPU GPU CPU CPU CPU CPU CPU CPU GPU Memory X Memory NPU Network Disk SPU Memory Lesson 1: Processing units' heterogeneity is here to stay, but no cache- coherent memories Program like SMP Popcorn Design don’t care about heterogeneity ApplicationApplication MiddlewareMiddleware Same Compiler Runtime interface OS OS OperatingSystem System Kernel OS OS OS FW rtm krn Software(OS krn) part part part Multiple communicating Accel OS krn/rtm/FW MPU DPUSPU NPU CPU CPU CPU CPU GPU TPU X Why and how? Classic Software for Heterogeneous Hardware • Software runs on CPUs • Other processing units Application App App Func Drv Drv cannot run the same Middleware Drv software as the CPUs Drv • Programmer (strictly) Compiler System partitions the application Drv Runtime • Each partition runs only Software Drv on a predefined Accel processing unit CPU CPU CPU CPU GPU DPU X • Supporting drivers, runtime, compilers What Are the Problems? • For each hardware component • Modify all software layers • Nightmare for application’s programmers • Hard to program • Difficult to port to a new platform • Poor resource utilization (performance, energy efficiency, determinism) • One programmer focuses on one application • Many applications run at the same time New Software for Heterogeneous Hardware • The OS extends among all processing units • The compiler builds Application Application applications software to Middleware Middleware run among all Compiler Runtime processing units • The runtime supports OperatingSystem System Kernel OS OS OS all processing units (OS krn) part krn krn • Programmers don’t Software have to partition the Accel CPU CPU CPU CPU GPU DPU application, which may X run everywhere, transparently Popcorn Linux runtime migration • Runtime • Runtime ISA execution migration • State transformation Application State • Based on musl C library • Compiler Framework ISA a ISA A specific ISA B specific • Offline analysis code code • Model-based code optimization Het-ISA Application Binary Source Analyzed Per-ISA • One binary per ISA Code Source Code Single System Image • Based on gcc/LLVM • Replicated-kernel Operating System Linux krn0 Linux krn1 • One kernel per ISA • Distributed systems services cpu0 cpu1 cpu2 cpu3 cpu0 cpu1 • Single system Image • ISA A ISA B Based on Linux x86 ARM Popcorn Linux – Operating System • Single System Image • Based on Popcorn namespaces (NS) • Creates a single operating environment • Migrating app sees the same OS Single System Image • Extends Linux namespaces • Popcorn NS Popcorn NS Distributed OS Services Popcorn Popcorn • Task (thread and process) migration Linux krn0Linux krn0Services Linux krn1 Services • Native code migration • Distributed memory management (DSM) Popcorn Communication Layer • Distributed file system • Inter-kernel Communication Layer ISA A ISA B • Performance critical component x86 ARM • low-latency and high-throughput • Exclusively kernel-space PCIe • Single format among ISAs TCP/IP RDMA etc. Popcorn Linux – Task Migration • Process Migration • Thread Migration • Whole application is transferred • Selected threads are transferred • All threads, user- & kernel-state • Threads' state is transferred • No dependecies are left on the • Kernels coordinate to maintain origin kernel application state consistent t2 t3 t2 t3 t2 t3 t2 t3 t2 t3 t2 t3 t1 t1 t1 t1 t1 Application Application Application Application Application Single System Image Single System Image Single System Image Single System Image krn0 services krn1 krn0 services services krn1 krn0 services krn1 krn0 services krn1 cpu0 cpu1 cpu2 cpu3 cpu0 cpu1 cpu2 cpu3 cpu0 cpu1 cpu2 cpu3 cpu0 cpu1 cpu2 cpu3 Popcorn Linux – Thread Migration’s DSM • Replicated virtual address space • Kept consistent among kernels • Page coherency protocol • Based on Modified-Shared-Invalid (MSI) cache coherency protocol • Memory page granularity instead of cache line granularity • Additional states to improve performance • Scaled from two kernels to multiple kernels Popcorn Linux – Compiler/Runtime • Profiler runtime • Performance and power profiles migration • Function and sub-function granularity • Output performance and power code indicators • Affinity estimations with cost model Application State • Compiler Toolchain ISA a ISA A specific ISA B specific • Output heterogenous-ISA binary code code (native) Het-ISA Application Binary • Common address space (including TLS) Source Analyzed Per-ISA • Insert migration points (fun boundaries) Code Source Code ISA A ISA B • Add state transformation metadata x86 ARM • Runtime Framework Compiler Runtime Profiler • Support task migration Toolchain Framework • Implements state transformation • Stack-transformation (rewriting) • Register-transformation Popcorn Linux – Compiler ISA A ISA B • Produces program binaries for each ISA x86 ARM • Common address space • Common type system • Each symbol at same virtual address on any ISA • No address space conversion! • Common thread-local storage (TLS) layout • x86_64 layout forced • No TLS conversion! • Migration points • Cannot migrate at any instruction Program • State-transformation meta-data in binaries Binary • E.g., var properties, stack frame offsets (Merged) Virtual Address Space Popcorn Linux – Runtime Stack Transformation x86_64 Stack aarch64 Stack x86_64 Register State aarch64 Register State Popcorn Linux Results • Ease programmability Homogeneous System Heterogeneous System • Enable portability (and legacy support) • Improve resource utilization • Runtime decisions (vs static) 30% • On heterogeneous-ISA [1] 66% • Up to 3.5x more performant than other heterogeneous frameworks • On fully heterogeneous-ISA [2] • Up to 66% better energy consumption for bursty arrivals [1] “Bridging the Programmability Gap in Heterogeneous-ISA Platforms” [2] “Breaking the Boundaries in Heterogeneous-ISA Datacenters” A. A. Barbalace et al., EuroSys '15 Barbalace et al., ASPLOS '17 First 5 years of the project in Summary • Gigantic Engineering Effort Lesson 2: very complex to build and debug because • Operating Systems development affects several • Multiple kernels Linux software layers • Repurpose monolithic Linux kernel as a message-passing kernel • Convert Linux’s subsystems from SHM to SHM+message-passing • Compiler/Linker Lesson 3: instead of Linux, • Common address space layout, per-ABI stack layout Darwin or DragonFly BSD may • Compile into different ISA binaries with LLVM/gold have reduced development time • Insert equivalence points at which stacks can be converted (stackmaps) • Runtime Library Lesson 4: LLVM as a cross- • Extended standard library (based on muslc) compiler saved a lot of time, and • Provide “builtin” functions to convert and migrate at eq points muslc supports a large amount of apps Feedback from Industry and Academia #1 Lesson 5: for production apps, • Constraining dependencies that use hacks for performance, transparency is hard to provide • Need application source-code • Eventual code modifications • and compiler script rewriting Lesson 6: impossible to keep up with upstream developments – • Must use Popcorn Linux Compiler Framework fix one version • Specific version of LLVM • Specific version of musl C library Lesson 7: adding a new CPU • Must use Popcorn Linux kernel architecture may be • Few kernel versions and CPU architectures supported incompatible with previous assumptions (32bit?) • Limited POSIX support • Not all Linux subsystems supported Lesson 8: cannot support all Linux subsytems, need automatic way to convert subsystems into SHM+MSG Feedback from Industry and Academia #2 • Limiting factors Lesson 9: Implement functionalities in modules or • Not well integrated in the Linux kernel nor in LLVM plugins to minimize patching • Requires Linux kernel patching • Requires LLVM patching Lesson 10: for dynamically • Doesn’t support dynamically compiled code compiled code, need to control • Including JIT, self-modifying, etc. the way code is generated • E.g., Java, .NET • Restricted library support Lesson 12: • Doesn’t support dynamic libraries containers/namespaces nice • Cannot migrate in library-code (if not recompiled) abstraction for migration • Supports application/container migration • Doesn’t generalize to VMs Lesson 11: a more generic techniques is needed to runtime migration among VMs (Popcorn relies on the syscall abstraction)
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages27 Page
-
File Size-