HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS Peter Messmer, NVIDIA

RANGE OF ANALYSIS AND VIZ TASKS

. Analysis: Focus quantitative

. Visualization: Focus qualitative

. Monitoring, Steering

TRADITIONAL HPC WORKFLOW

Workstation Analysis, Setup Visualization

Supercomputer Viz Cluster

Dump, Checkpointing Visualization, Analysis File System TRADITIONAL WORKFLOW: CHALLENGES

Lack of interactivity prevents “intuition” Workstation High-end viz Analysis, neglected due Setup Visualization to workflow complexity

Supercomputer Viz Cluster

Viz resources need I/O becomes main to scale with simulation Dump, simulation bottleneck Checkpointing Visualization, Analysis File System OUTLINE

. Visualization applications . CUDA/OpenGL interop . Remote viz . Parallel viz . In-Situ viz

High-level overview. Some parts platform dependent. Check with your sysadmin.

VISUALIZATION APPLICATIONS NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES

Surveyed sites:

LLNL LLNL- ORNL- AFRL- NASA- NERSC -OCF SCF LANL CCS DOD-ORC DSCR AFRL ARL ERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS TACC CHPC RZG HLRN Julich CSCS CSC Hector Curie NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES

Surveyed sites:

LLNL LLNL- ORNL- AFRL- NASA- NERSC -OCF SCF LANL CCS DOD-ORC DSCR AFRL ARL ERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS TACC CHPC RZG HLRN Julich CSCS CSC Hector Curie VISIT

. Scalar, vector and tensor field data features — Plots: contour, curve, mesh, pseudo-color, volume,.. — Operators: slice, iso-surface, threshold, binning,..

. Quantitative and qualitative analysis/vis — Derived fields, dimension reduction, line-outs — Pick & query

. Scalable architecture . Open source http://wci.llnl.gov/codes/visit/

VISIT

. Cross-platform — /, OSX, Windows . Wide range of data formats — .vtk, .netcdf, .hdf5,.. . Extensible — Plugin architecture . Embeddable . Python scriptable

VISIT’S SCALABLE ARCHITECTURE

. Client-server architecture . Server MPI parallel . Distributed filtering . (multi-)GPU accelerated, parallel rendering*

* requires X server on each node PARAVIEW

. Scalar, vector and tensor field data features — Plots: contour, curve, mesh, pseudocolor, volume,.. — Operators: slice, iso-surface, threshold, binning,..

. Quantitative and qualitative analysis/vis — Derived fields, dimension reduction, line-outs — Pick & query

. Scalable architecture . Developed by Kitware, open source http://www.paraview.org PARAVIEW’S SCALABLE ARCHITECTURE

. Client-server-server architecture

. Server MPI parallel

. Distributed filtering

. GPU accelerated, parallel rendering*

* requires X server on each node SOME OTHER TOOLS

. Wide range of visualization tools

. Often emerged from specialized application domain — Tecplot, EnSight: structural analysis, CFD — IndeX: seismic data processing & visualization — IDL: image processing

. Early adopters of visual programming — AVS/Express, OpenDX

FURTHER READING

. Paraview Tutorial: http://www.paraview.org/Wiki/The_ParaView_Tutorial

. VisIt Manuals/Tutorials: http://wci.llnl.gov/codes/visit/manuals.html

VTK - THE VISUALIZATION TOOLKIT VISUALIZATION TOOLKIT

. Focus on visualization, not (only) rendering . Provides more complex operations on data (“filtering”) . Introduces visualization pipeline . At the core of many high-level viz tools

http://www.vtk.org

VTK PIPELINE

Source Raw data, shapes

Filter Transform raw data

Mapper Map data to geometry

Renderer Render the geometry OPENGL From the CUDA perspective OpenGL OPENGL: API FOR GPU ACCELERATED RENDERING

• Primitives: points, lines, polygons • Properties: colors, lighting, textures, .. • View: camera position and perspective • Shaders: Rendering to screen/framebuffer

-style functions, enums

See e.g. “What Every CUDA Programmer Should Know About OpenGL” (http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf)

A SIMPLE OPENGL EXAMPLE glColor3f(1.0f,0,0); State-based API (sticky attributes) glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); // The bottom left corner glVertex3f(-1.0f, 1.0f, 0.0f); // The top left corner Drawing glVertex3f(1.0f, 1.0f, 0.0f); // The top right corner glVertex3f(1.0f, -1.0f, 0.0f); // The bottom right corner glEnd();

Render to screen glFlush();

glColor3f(1.0f,0,0); float* vert={-1.0f, -1.0f, ..}; float* d_vert; glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); glVertex3f(-1.0f, 1.0f, 0.0f); ? cudaMalloc(&d_vert, n); glVertex3f(1.0f, 1.0f, 0.0f); cudaMemcpy(d_vert, vert, n, glVertex3f(1.0f, -1.0f, 0.0f); cudaMemcpyHostToDevice); glEnd(); renderQuad<<>>(d_vert); glFlush(); flushToScreen<<<..>>>(); CUDA-OPENGL INTEROP: MAPPING MEMORY

• OpenGL: Opaque data buffer object • Virtex Buffer Object (VBO) • User has very limited control

• CUDA: C-style memory management • User has full control

• CUDA-OpenGL Interop: Map/Unmap OpenGL buffers into CUDA memory space RENDERING FROM A VBO

Create VBO

Initialize VBO ()

nit

i

Populate

Render display() Display RENDERING FROM A VBO WITH CUDA INTEROP

Create VBO

Initialize VBO () nit

Register VOB with CUDA i

Map VOB to CUDA

Populate

Unmap VBO from CUDA

Render display() Display MAIN OPENGL-CUDA INTEROP ROUTINES . Register VBO for CUDA cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags);

. Mapping of VBO to CUDA cudaGraphicsMapResources(1, cuda_vbo, 0); cudaGraphicsResourceGetMappedPointer(&d_ptr, size, *cuda_vbo);

. Unmapping VBO from CUDA cudaGraphicsUnmapResources(1, cuda_vbo, 0);

CAN ALL GPUS SUPPORT OPENGL?

. GeForce : standard feature set including OpenGL 4.3/4.4 . Quadro : + certain highly accelerated features (e.g. CAD)

. Tesla: If GPU is in “All on” operation mode* Graphics capabilities disabled nvidia-smi –-query-gpu=gom.current nvidia-smi –q

*requires “m” or “X” class devices

Graphics capabilities enabled

OPENGL SUMMARY

+ Relatively simple for basic viz + Existing/fixed/simple rendering pipeline

- Low level - Triangles, points, rather than isosurfaces, height-fields - (currently) depends on Xserver for context creation - What about remote, parallel etc?

MORE OPENGL INFORMATION AT GTC

S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E

S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web, 3/25, 13:30, LL21C S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C S4385 - Order Independent Transparency in OpenGL, 3/25, 210C

REMOTE VISUALIZATION OPENGL CONTEXT

. State of an OpenGL instance — Incl. viewable surface — Interface to

. Context creation: platform specific — Not part of OpenGL — Handled by Xserver in Linux/Unix-like systems

. GLX: Interaction X<-> OpenGL

LOCAL RENDERING

Application

libGL

GLX

X11

OpenGL Events Commands 2D/3D X Server

Driver

GPU, monitor attached X-FORWARDING: THE SIMPLEST FORM OF “REMOTE” RENDERING

Application X11 Commands libGL Xlib X11 Events

OpenGL/GLX

On remote system: 2D/3D X Server export DISPLAY=59.151.136.110:0.0 Driver

GPU, monitor attached Network X-FORWARDING + Simple! + ssh –X + No need to run X on remote machine

- Lots of data crosses the network - All rendering performed on the local Xserver - Not useful for visualization of remote CUDA Interop apps (X-server doesn’t “see” remote GPU)

SERVER-SIDE RENDERING + REMOTE VIZ APPLICATION

Application

libGL Xlib

GLX X11 X11 Events Cmds OpenGL 2D/3D X Server Client

Driver Driver

GPU GPU, monitor attached Network SERVER-SIDE RENDERING + SCRAPING

Application

libGL Xlib

GLX X11 X11 Events Cmds OpenGL Images 2D/3D X Server Client

Driver Driver Scraper

GPU GPU, monitor attached Network SERVER-SIDE RENDERING + SCRAPING

+ Full GPU acceleration + No X server on client

Question: when to scrape? — Xserver not informed about direct rendering — Intercept glxSwapBuffers()

- Not multi-user

=> Occasionally used for remote desktop tools

GLX FORKING: OUT-OF-PROCESS

Application

libGL Xlib X11 OpenGL/ X11 Events GLX Cmds Proxy X Server GLX Images OpenGL Images 3D X Server Client App

Driver Driver

GPU GPU, monitor attached Network GLX FORKING: OUT-OF-PROCESS

+ Full GPU acceleration + No X server on client + Multi-User

- All traffic through Proxy X server

=> Occasionally used for remote desktop tools

GLX FORKING WITH INTERPOSER LIBRARY

Application X11 Commands libGL VirtualGL Xlib X11 Events VGL transport VirtualGL GLX (compressed) client OpenGL Images Images 3D X Server 2D X Server

Driver Driver

GPU GPU, monitor attached Network GLX FORKING WITH INTERPOSER LIBRARY

Application

libGL VirtualGL Xlib X11 X11 Events Cmds GLX Proxy X OpenGL Images Server Images 3D X Server Client App

Driver Driver

GPU GPU, monitor attached Network VIRTUAL GL + TURBOVNC

+ Compressed image transport + Transparent to the application + Fully GPU accelerated OpenGL

+ Client with or without Xserver

- Requires Xserver to access GPU http://www.virtualgl.org

HOW TO SET UP VIRTUAL GL + TURBOVNC

. Requires Xserver running on server — Root privileges for Xserver

. Requires installation of VirtualGL, TurboVNC on server

. Start VirtualGL-accelerated VNC server vncserver :3

. TurboVNC viewer on client — Linux, Windows, Javascript

CONNECTING CLIENT TO REMOTE SERVER

CONNECTING TO REMOTE VNC SERVER VIA A GATEWAY

. Establish tunnel on client ssh –L 3333:node01.cluster.net:5903 login.cluster.net . Connect client to localhost:3333

port 5903 port 3333 Gateway

node01.cluster.net login.cluster.net client

LAUNCHING REMOTE CUDA/OPENGL APPLICATIONS VIA VGLRUN

vglrun :3 glxgears

export DISPLAY=:3 vglrun simpleGL

REMOTE VIZ IN PARAVIEW/VISIT

. Approach 1: VirtualGL and VNC Paraview Client & Server export DISPLAY=:3 under VirtualGL vglrun paraview

. Approach 2: Local Client — On remote server: pvserver

On local workstation: paraview

REMOTE VISUALIZATION SUMMARY

. Multiple approaches to remote rendering — X forwarding, remote viz app, scraping, interposer process/library

. Currently requires X server to generate context — EGL will fix this, but requires application changes

PARALLEL VISUALIZATION

Parallel Viz PARALLEL VISUALIZATION

. Parallelism at multiple levels — Filtering — Rendering

- Both supported by VisIt & Paraview - Heavy lifting already done!

- Challenge: Setup in parallel environment - Both tools provide support for most common cases - Both VisIt & Paraview MPI parallel -> need custom build

BASIC STEPS TO PARALLEL VISUALIZATION

. Launch visualization server processes — Most likely through queuing system

. Connect client to head node . Tunnel from workstation . Setting up on remote node

DOMAIN DECOMPOSITION OF DATA

- Visualization algorithms can work on decomposed data - May require ghost cells - May lead to load imbalance No ghost cells required

Ghost cells required PARALLEL COMPOSITING

- Distributed geometry - Render with depth information - Composition using IceT

http://icet.sandia.gov

PARALLEL VISUALIZATION

. Particularly important for large datasets . Visualization time often determined by filtering . Rendering important if — Highly complex visualization — Complex visualization effects — Low-power CPU

. Transparent support in Paraview, VisIt

IN-SITU VISUALIZATION & STEERING BENEFIT OF IN-SITU VISUALIZATION

. Pipeline simulation cycle — Visualization/analysis integral part of simulation

. Immediate feedback

. Reduce pressure on file system

. In some (future) cases: only way to analyze/visualize data

LIBSIM IN VISIT: VISIT SERVER AS A LIBRARY

CONNECTING TO A RUNNING APPLICATION

BUILD VISUALIZATION PIPELINE FOR RUNNING APPLICATION

BASIC INTERACTION/STEERING WITH RUNNING APPLICATION

INTERACTIVE VIZ APPLICATION WITH LIBSIM . User implements callbacks for VisIt — meta-data, mesh, variables and domains

. GetData: VisIt requests data for visualization — Work directly on simulation data — Transfer data from GPU

. Command server: Interaction VisIt front-end <-> simulation — “Steering” http://www.visitusers.org/index.php?title=VisIt- tutorial-in-situ

THE FUTURE

The Future VISUALIZATION DATA FLOW

Filtering Rendering

CPU

Simulation

GPU VISUALIZATION WITH GPU ACCELERATED FILTER

Filtering Rendering

CPU

Simulation Filtering

GPU VISUALIZATION WITH GPU ACCELERATED FILTER AND HARDWARE RENDERING

Filtering

CPU

Simulation Filtering Rendering

GPU FULLY GPU ACCELERATED VISUALIZATION

CPU

Simulation Filtering Rendering

GPU SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT SCALE (2011-2016)

. Management, Analysis, Visualization — In-situ analysis, indexing/compression — I/O-, Viz frameworks — Supporting application teams

. SDAV tools deployment: Paraview, VisIt, IceT, ..

http://www.sdav-scidac.org/

SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT SCALE (2011-2016)

. PISTON (LANL): Data Parallel Visualization Operators — Isosurface, cut, threshold — Built on top of Thrust — Support of most Paraview operators — Incorporation into VTK

. DAX (Sandia), DIY (Argonne), EVAL (ORNL)

S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL 4620 - DAX: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

VIZ TALKS AT GTC2014 . S4571 - Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight Center . Abel Brown ( Principal Systems Engineer, A.I. Solutions

. S4632 - Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging . S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL . S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

. S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD . S4745 - Now You See It: Unmasking Nuclear and Radiological Threats Around the World . S4203 - Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies . S4516 - Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems . S4599 - An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling . S4778 - Interactive Processing and Visualization of Geospatial Imagery . S4140 - Live, Interactive, In-Situ, In-GPU Visualization of Plasma Simulations Running on GPU Supercomputers . S4400 - Petascale Molecular Ray Tracing: Accelerating VMD/Tachyon with OptiX . S4811 - Extreme Machine Learning with GPUs

SUMMARY

. Different methods of visualization at different levels — OpenGL, VTK, VisIt & Paraview . Remote visualization concepts — X forwarding, Viz app, scraping, GLX forking . Parallel rendering and compositing — Handled transparently in key tools . In-situ visualization concepts — Expose simulation variables to visualization tool, interactive viz . Future directions — GPU accelerated filtering and rendering

Thanks to Gilles Fourestey (CSCS), Nina Suvanphim (Cray), Jean Favre (CSCS), Adam DeConinck (NVIDIA), Robert Crovella (NVIDIA), Dale Southard (NVIDIA), Hank Childs (U Oregon), Jeremy Meredith (ORNL), Ian Williams (NVIDIA), Steve Parker (NVIDIA), Kitware, Paraview, VisIt and VirtualGL developers for their support!

ENJOY THE CONFERENCE AND SEE YOU AT GTC 2015! ABSTRACT (FOR REFERENCE ONLY) Learn how to take advantage of GPUs to visualize results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis techniques allowing you to investigate your data on the fly. Starting with some basic CUDA/OpenGL interoperability, we will introduce more sophisticated data models allowing you to take advantage of widely used tools like ParaView and VisIt to visualize your GPU resident data. Questions like parallel compositing, remote visualization and application steering will be addressed in order to allow you to take full advantage of the GPUs installed in your supercomputing system.