Transparently Consistent Asynchronous Shared Memory
Hakan Akkan, Latchesar Ionkov, & Michael Lang LA-UR-13-20182
U N C L A S S I F I E D Slide 1
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Motivation: Current Trends
n High core count sockets and nodes
n Power is a big concern
n Future architectures suggest • Lower clock speeds • Less memory per core
n New memory technologies • NVRAM • Memory Cube
n Therefore, intra-node sharing is becoming more important
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Motivation: What problem are we trying to solve?
Data Movement/Memory requirements
n Per core memory is decreasing but total memory will be increasing
n There is more data to move around for analysis, visualization etc.
n New trend: Do it all in the node, while the application is running • Reduce the amount of data to store • Reduce the power/time to transmit data
n Applications share the node and the data is now local to two or more applications
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Goals
TCASM – Shared memory for coupling independent applications
n T -- Transparent • No large code modifications • Publish version • Want to allow app to make progress
n C – Consistent • Data doesn’t change while being processed
n A -- Asynchronous • Want to allow apps to make independent progress
n SM – Shared Memory.
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Motivation: Who would benefit
n Analytics: reduce the data in the node
n Visualization: view the data in the node in REALTIME
n Checkpointing: • Burst-buffer – hierarchical storage, faster than PFS, close to node
n NVRAM Management
n Debugging: unobtrusively monitor what is happening inside the application
n Application developers • Just define data • No need to link processes • Application remain independent
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Background/Related work
n Checkpoint: BLCR, CRIU • No for sharing between apps
n Distributed memory MUNIN,…
n KSM: Kernel same page merging • Kernel thread tries to merge pages with the same data • Also uses COW • MADVISE marks regions for merging • Single process space • No versioning
n Virtual processes: DUNE • Uses virtual processes (rather than virtual machines) • Can share pages, need common process parent
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Solution
n Producer / Observer(s) Model • Producer publishes data at its natural interval • Consumer consumes at natural interval
n Need a way for applications to share data with simple semantics • Use existing MMAP system call
n No synchronization • Producer is never blocked.
n Observer sees consistent view of data
n COW • No wasted memory • Data is only copied when required • Memory use depends on how much memory producer modifies at each iteration
U N C L A S S I F I E D Slide 7
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Current methods
n Single Copy – requires locking between processes, standard shared memory
producer observer
n Double-buffer – Two buffers, still need to coordinate the buffer swap
producer current old observer
n Multibuffer – no synchronization, lots of memory (N copies, N-1 observers) current observer producer old observer older observer oldest
U N C L A S S I F I E D Slide 8
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Implementation 1 – anon-asm
n New flag added to msync() call
n Uses a combination of “mmap”ed anonymous and file backed pages to manage versions of data
n Write protect pages to get notification when changed • Not cheap but already in use with many incremental checkpoint systems.
n This implementation has higher overhead with multiple observers
Walk through an example… next
U N C L A S S I F I E D Slide 9
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Anon-asm
Producer File Content Observer
Page A Page A Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Producer writes
Producer File Content Observer
Page A Page A Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Causes COW
Producer File Content Observer
Page A Page A Page A
COW
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Anonymous page
Producer File Content Observer
Page B Page A Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Producer calls msync()
Producer File Content Observer
Page B Page A Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Producer calls msync()
Producer File Content Observer
Page B Page B Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Producer writes again
Producer File Content Observer
Page B Page B Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Causes COW
Producer File Content Observer
Page B Page B Page A
COW
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Anonymous pages
Producer File Content Observer
Page C Page B Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Observers call mmap() for new version
Producer File Content Observer
Page C Page B Page A
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Observers gets new version
Producer File Content Observer
Page C Page B Page B
File-backed Page Anonymous Page U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Implementation 2 – file-asm
n Again we use flags to msync
n When producers write we create a file-backed page, “unpublished”
n Then when producers call msync we swap files • Unlink old file, give unpublished file proper name
n Observer unmap – open “file” -- mmap to access latest copy
n Easier implementation
n Better performance with multiple observers, but there is a downside… • “file-asm” implementation incurs the "full-copy" penalty due to a Linux restriction that only allows pages to belong to a singe file • A kernel module can remedy this, work-in-progress
U N C L A S S I F I E D Slide 21
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA File-asm
Producer File Content Observer
Same flow, exchange of named files and Page A Page A Page A unpublished files COW Producer Write
Page B Page A Page A
Producer msync
Page B Page B Page A
Producer COW Write
Page C Page B Page A
Observer mmap
Page C Page B Page B
Unpublished Named Old File File Page File Page Page
U N C L A S S I F I E D Slide 22
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Results – varying pages touched
1000 100% coverage anon-asm 80% coverage anon-asm 100% coverage file-asm Double buffering 100 40% coverage anon-asm 20% coverage anon-asm
10
Completion time (s) 20% anon-asm 1
0.1 100 1000 10000 100000 Number of pages U N C L A S S I F I E D Slide 23
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA
Results 100% coverage, with 1 or 10 observers 1000 Double buffering 10 O. Double buffering 1 O. 100% coverage file-asm 10 O. 100% coverage file-asm 1 O. 100
10 Completion time (s) 1
0.1 100 1000 10000 100000
NumberU N C Lof A Spages S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Application testing (new work since publication *)
n Currently integrated with LANL mini-app SNAP (SN transport proxy) • Implemented FORTRAN callable library in C — MMAP and MSYNC • Implemented consumer to copy data to stable storage ( ie burst-buffer) • Tested with multiple threads
*Doug Otstott (Florida International University)
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA
“New” initial DATA
SNAP working set
120
100
80
60
%Pages changed 40
20
0 1 2 4 10 25 50 75 100 Iterations per MSYNC
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Conclusions
n Prototype shows benefit of approach
n COW shared memory for symbiotic applications
n No synchronization requirement
n Saves memory (SNAP use case)
Future Work
n Complete burst-buffer use case
n Additional use case for visualization (paraview)
n Performance with many producers/consumers in the system
n Looking at porting to Kitten/Palacious
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA Questions ???
Targeted for open source, paperwork in the system.
Thanks
U N C L A S S I F I E D
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA