Empowering Data-Driven Discovery with a Lightweight Provenance Service for High Performance Computing
Yong Chen
Associate Professor, Computer Science Department
Director, Data-Intensive Scalable Computing Laboratory
Site Director, Cloud and Autonomic Computing Center
Texas Tech University
NITRD’s MAGIC Webinar, April 3, 2019

Data Challenges
• Scientific discovery has become highly data intensive (“big data”)
• This holds for both experimental and observational data
• More real-world examples: a factor of 1000x increase in less than a decade
• Present-day real world:
§ Phones: 100+ gigabytes
§ Science and business: 100s to 10,000s of petabytes

Reasons behind Data Revolution
• Rapid growth in computing capability has made data acquisition and generation much easier
§ Especially when compared with the much slower increase in I/O system bandwidth
• High-resolution, multi-model scientific discovery requires and produces much more data
• The need to mine insights from large amounts of low-entropy data has increased substantially over the years
§ Data-driven science vs. model-driven (computational) science
• Scientific breakthroughs are increasingly powered by advanced computing (HPC) plus data understanding capabilities

Our Vision
• To create a holistic software infrastructure for the collection, management, and analysis of provenance data
§ Lightweight Provenance Service (LPS) for high performance computing
• Objectives
§ Run as an always-on service to collect and manage provenance for batch jobs transparently
§ Capture comprehensive provenance with accurate causality to support a wide range of use cases
§ Provide easy-to-use analysis tools for scientists and system administrators to explore and utilize the provenance

What is Provenance
• In general, provenance is the documented history of an object; it is particularly useful as evidence for the originality of an artwork
• Example: Little Dancer Aged Fourteen
1. Degas, Edgar (created 1878-1881)
2. René De Gas (heritage 1917)
3. Adrien-Aurélien Hébrard (a contract 5/3/1918)
4. Nelly Hébrard (heritage 1937)
5. M. Knoedler & Company, Inc. (consigned 1955)
6. Paul Mellon (purchased 5/1956)
7. National Gallery of Art (bequest 1999)
From the National Gallery of Art website
• In computer science, provenance means the lineage of data, including the processes that act on the data and the agents responsible for those processes.

What is Provenance
Entities, linked by relationships:
• user: name, id, group, permission, …
• machine: name, ip_addr, dc, rack, …
• process: id, job, machine, reads, writes, start_ts, finish_ts, …
• job: id, params, config, inputs, outputs, start_ts, finish_ts, …
• file: name, location, permission, parent, children, …
Provenance is data lineage from all entities and the relationships among all elements that contribute to the existence of the data.
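The idea that provenance is the set of entities and relationships contributing to the existence of a piece of data can be illustrated with a minimal sketch. It assumes a simplified process record holding only some of the attributes listed above (id, job, machine, reads, writes); the `Process` class, the `lineage` function, and all file and job names are hypothetical and chosen for illustration, not taken from LPS. Timestamps are ignored here, so causality is approximated purely by read/write relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process record with a subset of the listed attributes."""
    id: int
    job: str
    machine: str
    reads: list = field(default_factory=list)    # names of files read
    writes: list = field(default_factory=list)   # names of files written


def lineage(target, processes):
    """Return names of all files that directly or transitively fed into target."""
    contributors = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for proc in processes:
            if current in proc.writes:
                # Every input of a process that wrote `current` is an ancestor.
                for src in proc.reads:
                    if src != target and src not in contributors:
                        contributors.add(src)
                        frontier.append(src)
    return contributors


# Illustrative records only; none of these names come from the slides.
procs = [
    Process(id=1, job="job-a", machine="node01",
            reads=["raw.dat"], writes=["clean.dat"]),
    Process(id=2, job="job-a", machine="node02",
            reads=["clean.dat", "params.cfg"], writes=["result.h5"]),
    Process(id=3, job="job-b", machine="node01",
            reads=["other.dat"], writes=["unrelated.out"]),
]

ancestors = lineage("result.h5", procs)
print(sorted(ancestors))  # ['clean.dat', 'params.cfg', 'raw.dat']
```

Files touched only by unrelated processes (here `other.dat`) never enter the result, which is the property that makes lineage useful for tracing how a dataset came to exist.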
How to Represent Provenance: A Graph-based Model for HPC Provenance Data
• Based on the property graph model
[Figure: an example property graph. Vertices carry properties (name: Alice, name: Bob, name: Chris, each with an age); edges carry properties (knows: 3, type: partners; knows: 4, type: colleagues).]
• Mapping HPC provenance onto a property graph model
[Figure: User Entity vertices (e.g. name: sam; name: john, group: admin), Execution Entity vertices, and File Entity vertices, connected by edges with properties/attributes. The remaining labels are illegible in the source.]

How to Represent Provenance: A Graph-based Model for HPC Provenance Data
• Entity => Vertex
§ Data Object: represents the basic data unit in storage
§ Execution: represents applications, including jobs, processes, and threads
§ User: represents the real end user of a system
§ Users can define their own entities
• Relationship => Edge
§ The table below gives the relationship from row to column
§ Reversed relationships are also defined
§ belongs/contains is a general relationship
§ Users can define their own relationships

                 User        Execution                          Data Object
User                         run
Execution        wasRunBy    belong, contain                    exe, read, write
Data Object                  exedBy, wasReadBy, wasWrittenBy    belong, contain

• Attributes => Property
§ Apply to both entities and relationships
§ Stored as key-value pairs attached to vertices and edges
§ Users can define their own properties

Requirements on Managing Provenance in HPC
• Performance requirements:
§ HPC users are performance sensitive
§ Management overhead should stay below a 1% slowdown and a 1 MB memory footprint per core
• Coverage requirements:
§ Provenance is generated at multiple physical locations
§ Provenance can have various granularities
• Transparency requirements:
§ Users should not have to change or recompile their code to collect provenance
§ More aggressively, users should not be able to disable it when provenance is used in critical missions

How to Collect and Manage Provenance: LPS Overall Architecture in HPC
[Figure: LPS overall architecture. Labels: Compute Nodes, Login Nodes, A Single Server, User Space (user processes, LPS Aggregator), Kernel Space (System Call Layer, /proc, LPS Tracer), Local LPS, Distributed LPS Cluster (LPS Builder instances), Parallel File System (servers), Data, Provenance.]

How to Collect and Manage Provenance: LPS Tracer
• LPS leverages kernel instrumentation to collect detailed runtime events to build provenance (chosen among three methods)

Method                    Transparency   No Privilege   Dynamic
Provenance APIs           X              ✓              X
Library Wrapper           X              ✓              X
Kernel Instrumentation    ✓              X              ✓

• To support flexible granularity, LPS needs to enable/disable probing of read/write events
§ Dynamic probe
§ Two kernel instrumentation scripts (SystemTap)
§ The second script probes only read/write events
§ It can be disabled/enabled at runtime

How to Collect and Manage Provenance: LPS Aggregator
1. Monitoring overhead and directing granularity change
2. Pruning noisy events to improve performance
• Instrumentation introduces overhead
§ Read/write instrumentation measured on an application issuing 1M 1-byte writes
[Figure: bar chart of time (y axis, 100 to 500; unit illegible in the source) for three cases: No Trace, Empty Read/Write Probe, LAPS Read/Write Probe.]
• The aggregator monitors read/write frequency
§ A counter records the events
§ A timer resets the counter
§ Notify and change granularity

How to Collect and Manage Provenance: LPS Aggregator
1. Monitoring overhead and directing granularity change
2. Pruning noisy events to improve performance
[Figure: graph of raw system events. Node labels are file paths, shared libraries, and process names (for example bash, emacs, firefox, libselinux.so.1, icon and cache files under /usr/share), and many nodes are labeled UNDEFINED.]
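The counter-and-timer mechanism that the aggregator uses to monitor read/write frequency can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LPS implementation: the class name `ReadWriteMonitor`, the threshold and window parameters, the callback, and the policy of re-enabling read/write probing once a window has elapsed are all hypothetical. Timestamps are passed in explicitly so the example runs deterministically; a real service would drive `tick` from a periodic timer and would toggle the read/write SystemTap script rather than call a Python callback.

```python
class ReadWriteMonitor:
    """Counts read/write events per time window and signals granularity changes."""

    def __init__(self, max_events_per_window, window_seconds, on_change):
        self.max_events = max_events_per_window
        self.window = window_seconds
        self.on_change = on_change      # called with True (fine) or False (coarse)
        self.fine_grained = True        # read/write probing enabled at start
        self.count = 0
        self.window_start = 0.0

    def _set_granularity(self, fine):
        if fine != self.fine_grained:
            self.fine_grained = fine
            self.on_change(fine)        # the "notify" step

    def tick(self, now):
        """Timer role: once the window has elapsed, reset the counter."""
        if now - self.window_start >= self.window:
            self.count = 0
            self.window_start = now
            # Assumed policy: give fine-grained probing another chance.
            self._set_granularity(True)

    def record(self, now):
        """Counter role: record one observed read/write event."""
        self.tick(now)
        self.count += 1
        if self.count > self.max_events:
            # Too many events in this window: fall back to coarse granularity.
            self._set_granularity(False)


changes = []
mon = ReadWriteMonitor(max_events_per_window=3, window_seconds=1.0,
                       on_change=changes.append)
for t in (0.1, 0.2, 0.3, 0.4):   # the fourth event exceeds the threshold
    mon.record(t)
mon.tick(1.5)                    # window elapsed: counter reset, probing re-enabled
mon.record(1.6)
print(changes)                   # [False, True]
```

Keeping the threshold check inside `record` means the switch to coarse granularity happens as soon as a burst is detected, rather than at the end of the window, which limits how long a write-heavy application pays the probing overhead.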