Scientific Visualization on Gpu Clusters Peter Messmer, Nvidia

SCIENTIFIC VISUALIZATION ON GPU CLUSTERS PETER MESSMER, NVIDIA Visualization ≠ Rendering Visualization Coordinate Field Operators transformations Isosurfaces, (Gradient, Curl,.. ) Isovolumes Clip, Slice Streamlines Feature extraction Rendering Binning, Surface Resample Line Rendering Rendering Thresholding Volume Compositing Rendering Visualization Filtering Field Operators Coordinate transformations FieldIsosurfaces Operators, (Gradient, Curl,.. ) (Gradient,Isovolumes Curl,.. ) Streamlines Streamlines Clip, Slice Feature Feature extraction Binning, extraction Rendering Resample Surface Line Rendering Rendering Isosurfaces, Thresholding Line Isovolumes Volume CompositingRendering Rendering THE VISUALIZATION PIPELINE Simulation Filtering Rendering Compositing Visualization Simulation: Data as needed in numerical algorithm Filtering: Conversion of simulation data into data ready for rendering Typical operations: binning, down/up-sampling, iso-surface extraction, interpolation, coordinate transformation, sub-selection, .. Sometimes embedded in simulation Rendering: Conversion of shapes to pixels (Fragment processing) Compositing: Combination of independently generated pixels into final frame PARTICULARITIES IN HPC VIZ Parallelism Remoteness Heterogeneity Images may be subject to copyright OUTLINE Tools Paraview, Visit, others Rendering Enable HW rendering on Tesla Remote rendering Compositing Delivery In-situ visualization High-level overview. Some parts platform dependent. Check with your sysadmin. VISUALIZATION APPLICATIONS PARAVIEW Scalar, vector and tensor field data features Plots: contour, curve, mesh, pseudocolor, volume,.. Operators: slice, iso-surface, threshold, binning,.. Quantitative and qualitative analysis/vis Derived fields, dimension reduction, line-outs Pick & query Scalable architecture Developed by Kitware, open source http://www.paraview.org PARAVIEW’S SCALABLE ARCHITECTURE Client-server-server architecture Server MPI parallel Distributed filtering GPU accelerated, parallel rendering* * requires X server on each node VISIT Scalar, vector and tensor field data features Plots: contour, curve, mesh, pseudo-color, volume,.. Operators: slice, iso-surface, threshold, binning,.. Quantitative and qualitative analysis/vis Derived fields, dimension reduction, line-outs Pick & query Scalable architecture Open source http://wci.llnl.gov/codes/visit/ VISIT’S SCALABLE ARCHITECTURE Client-server architecture Server MPI parallel Distributed filtering (multi-)GPU accelerated, parallel rendering* * requires X server on each node SOME OTHER TOOLS Wide range of visualization tools Often emerged from specialized application domain Tecplot, EnSight: structural analysis, CFD IndeX: seismic data processing & visualization IDL: image processing Early adopters of visual programming AVS/Express, OpenDX FURTHER READING Paraview Tutorial: http://www.paraview.org/Wiki/The_ParaView_Tutorial VisIt Manuals/Tutorials: http://wci.llnl.gov/codes/visit/manuals.html RENDERING OPENGL: API FOR GPU ACCELERATED RENDERING • Primitives: points, lines, polygons • Properties: colors, lighting, textures, .. • View: camera position and perspective • Shaders: Rendering to screen/framebuffer • C-style functions, enums See e.g. “What Every CUDA Programmer Should Know About OpenGL” (http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf) A SIMPLE OPENGL EXAMPLE State-based API glColor3f(1.0f,0,0); (sticky attributes) glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); // The bottom left corner glVertex3f(-1.0f, 1.0f, 0.0f); // The top left corner Drawing glVertex3f(1.0f, 1.0f, 0.0f); // The top right corner glVertex3f(1.0f, -1.0f, 0.0f); // The bottom right corner glEnd(); glFlush(); Render to screen glColor3f(1.0f,0,0); float* vert={-1.0f, -1.0f, ..}; float* d_vert; glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); glVertex3f(-1.0f, 1.0f, 0.0f); ? cudaMalloc(&d_vert, n); glVertex3f(1.0f, 1.0f, 0.0f); cudaMemcpy(d_vert, vert, n, glVertex3f(1.0f, -1.0f, 0.0f); cudaMemcpyHostToDevice); glEnd(); renderQuad<<<N/128, N>>>(d_vert); glFlush(); flushToScreen<<<..>>>(); CUDA-OPENGL INTEROP: MAPPING MEMORY • OpenGL: Opaque data buffer object (Mention here that this is mainly in “legacy” OpenGL and more control available in modern OpenGL) • Vertex Buffer Object (VBO) • User has very limited control • CUDA: C-style memory management • User has full control • CUDA-OpenGL Interop: Map/Unmap OpenGL buffers into CUDA memory space cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags); cudaGraphicsMapResources(1, cuda_vbo, 0); CAN ALL GPUS SUPPORT OPENGL? GeForce : standard feature set including OpenGL 4.5 Quadro : + certain highly accelerated features (e.g. CAD) Tesla, K20X,m : Requires setting GPU Operation Mode (GOM) to “All on” Tesla K40, K80: GOM “All on” by default nvidia-smi –-query-gpu=gom.current nvidia-smi –q OPENGL CONTEXT State of an OpenGL instance Incl. viewable surface Interface to windowing system Context creation: platform specific Not part of OpenGL Handled by Xserver in Linux/Unix-like systems New alternative: EGL (currently: OpenGL ES only) GLX: Interaction X<-> OpenGL OPENGL ON HEADLESS SERVERS Application Application libGLES libGL libGLX Xlib libGLES libGL libEGL Xlib OpenGLES OpenGL OpenGL ES OpenGL GLX EGL 3D X Server 3D X Server Driver Driver GPU GPU Current State Current State for OpenGL ES applications REMOTE RENDERING LOCAL RENDERING Application libGL Xlib GLX X11 OpenGL Events Commands 2D/3D X Server Driver GPU, monitor attached X-FORWARDING: THE SIMPLEST FORM OF “REMOTE” RENDERING Application X11 Commands libGL Xlib X11 Events OpenGL/GLX On remote system: 2D/3D X Server export DISPLAY=59.151.136.110:0.0 Driver GPU, monitor attached Network SERVER-SIDE RENDERING + REMOTE VIZ APPLICATION Application libGL Xlib GLX X11 X11 Events Cmds OpenGL 2D/3D X Server Client Driver Driver GPU GPU, monitor attached Network SERVER-SIDE RENDERING + SCRAPING Application libGL Xlib GLX X11 X11 Events Cmds OpenGL Images 2D/3D X Server Client Driver Driver Scraper GPU GPU, monitor attached Network CAPTURE/ENCODING WITH NVFBC/NVIFR Application libGL Xlib GLX X11 X11 H.264 Events Cmds OpenGL Images 2D/3D X Server Client Driver Driver GPU NVFBC/NVIFR GPU, monitor attached Network https://developer.nvidia.com/grid-app-game-streaming https://developer.nvidia.com/nvidia-video-codec-sdk GLX FORKING WITH INTERPOSER LIBRARY Application libGL VirtualGL Xlib X11 X11 Events Cmds GLX Proxy X OpenGL Images Server Images 3D X Server Client App Driver Driver GPU GPU, monitor attached Network EXAMPLE OF REMOTING SOLUTIONS TurboVNC + VirtualGL + Open source solution + Compressed image transport + Remote GPU accelerates OpenGL http://www.virtualgl.org NICE DCV + Commercial grade product + H264 encoded video stream http://www.nice-software.com/products/dcv PARALLEL VISUALIZATION PARALLEL VISUALIZATION Domain decomposition Parallelism at multiple levels Filtering Rendering - Both supported by VisIt & Paraview - Heavy lifting already done! - Typically biggest challenge: Setup in parallel environment - Both tools provide support for most common cases - Both VisIt & Paraview MPI parallel -> need custom build PARALLEL COMPOSITING WITH ICET • Each node renders fraction of image • Sort last compositing • Highly scalable • Widely used (Paraview, VisIt .. ) http://icet.sandia.gov IN-SITU VISUALIZATION LEGACY WORKFLOW File system pressure Sim Sim Sim Long time-to-science Viz Viz Viz Time PIPELINED IN-SITU ANALYSIS Sim Sim Sim Reduced file system consumption Early detection of faulty runs Reduced Viz Viz Viz Time DIFFERENT VISUALIZATION SCENARIOS Compute Viz Filesystem 1) Legacy workflow GPU GPU Separate compute & viz system HPC System Viz System Communication via filesystem 2) Partitioned HPC system Compute Network Viz Different nodes for viz & compute GPU GPU Communication via High Perf Network HPC System 3) Co-Processing Compute Compute on GPU, viz on CPU +Viz Compute and visualization on same GPU nodes HPC System SUPPORT FOR IN-SITU VISUALIZATION Paraview: Catalyst VisIt: LibSim Instrument application to expose data Paraview: Adaptor LibSim: Data access callbacks Potential for interactive steering S5815, today, 3:30: Programming Pointers to Optimize your In-Situ Visualization Pipeline S5710, Fri 9:30: In-Situ Data Analysis and Visualization: ParaView, Calalyst and VTK-m SUMMARY Visualization is more than rendering Filtering often expensive component Fast rendering can enable new applications (interactive supercomputing) Tesla systems can be used for rendering On some systems requires GOM=All ON Xserver or EGL Remote visualization Supported by common tools Take advantage of available hardware (H264 encoder) Parallel Rendering Use IceT for compositing In-Situ Visualization Supported by common tools THANK YOU .

Scientific Visualization on Gpu Clusters Peter Messmer, Nvidia

THINC: a Virtual and Remote Display Architecture for Desktop Computing and Mobile Devices

Virtualgl / Turbovnc Survey Results Version 1, 3/17/2008 -- the Virtualgl Project

Supporting Distributed Visualization Services for High Performance Science and Engineering Applications – a Service Provider Perspective

SUSE® Linux Enterprise Desktop 12 and the Workstation Extension: What's New ?

Release Notes for Xfree86® 4.8.0 the Xfree86 Project, Inc December 2008

Release 0.11 Todd Gamblin

Vsim User Guide Release 10.1.0-R2780

HOW to VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS Peter Messmer, NVIDIA

Package Name Software Description Project

Upgrading and Performance Analysis of Thin Clients in Server Based Scientific Computing

Building a 3D Graphic User Interface in Linux

Remote Visualization