HOW to VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS Peter Messmer, NVIDIA
Total Page:16
File Type:pdf, Size:1020Kb
HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS Peter Messmer, NVIDIA RANGE OF ANALYSIS AND VIZ TASKS . Analysis: Focus quantitative . Visualization: Focus qualitative . Monitoring, Steering TRADITIONAL HPC WORKFLOW Workstation Analysis, Setup Visualization Supercomputer Viz Cluster Dump, Checkpointing Visualization, Analysis File System TRADITIONAL WORKFLOW: CHALLENGES Lack of interactivity prevents “intuition” Workstation High-end viz Analysis, neglected due Setup Visualization to workflow complexity Supercomputer Viz Cluster Viz resources need I/O becomes main to scale with simulation Dump, simulation bottleneck Checkpointing Visualization, Analysis File System OUTLINE . Visualization applications . CUDA/OpenGL interop . Remote viz . Parallel viz . In-Situ viz High-level overview. Some parts platform dependent. Check with your sysadmin. VISUALIZATION APPLICATIONS NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES Surveyed sites: LLNL LLNL- ORNL- AFRL- NASA- NERSC -OCF SCF LANL CCS DOD-ORC DSCR AFRL ARL ERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS TACC CHPC RZG HLRN Julich CSCS CSC Hector Curie NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES Surveyed sites: LLNL LLNL- ORNL- AFRL- NASA- NERSC -OCF SCF LANL CCS DOD-ORC DSCR AFRL ARL ERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS TACC CHPC RZG HLRN Julich CSCS CSC Hector Curie VISIT . Scalar, vector and tensor field data features — Plots: contour, curve, mesh, pseudo-color, volume,.. — Operators: slice, iso-surface, threshold, binning,.. Quantitative and qualitative analysis/vis — Derived fields, dimension reduction, line-outs — Pick & query . Scalable architecture . Open source http://wci.llnl.gov/codes/visit/ VISIT . Cross-platform — Linux/Unix, OSX, Windows . Wide range of data formats — .vtk, .netcdf, .hdf5,.. Extensible — Plugin architecture . Embeddable . Python scriptable VISIT’S SCALABLE ARCHITECTURE . Client-server architecture . Server MPI parallel . Distributed filtering . (multi-)GPU accelerated, parallel rendering* * requires X server on each node PARAVIEW . Scalar, vector and tensor field data features — Plots: contour, curve, mesh, pseudocolor, volume,.. — Operators: slice, iso-surface, threshold, binning,.. Quantitative and qualitative analysis/vis — Derived fields, dimension reduction, line-outs — Pick & query . Scalable architecture . Developed by Kitware, open source http://www.paraview.org PARAVIEW’S SCALABLE ARCHITECTURE . Client-server-server architecture . Server MPI parallel . Distributed filtering . GPU accelerated, parallel rendering* * requires X server on each node SOME OTHER TOOLS . Wide range of visualization tools . Often emerged from specialized application domain — Tecplot, EnSight: structural analysis, CFD — IndeX: seismic data processing & visualization — IDL: image processing . Early adopters of visual programming — AVS/Express, OpenDX FURTHER READING . Paraview Tutorial: http://www.paraview.org/Wiki/The_ParaView_Tutorial . VisIt Manuals/Tutorials: http://wci.llnl.gov/codes/visit/manuals.html VTK - THE VISUALIZATION TOOLKIT VISUALIZATION TOOLKIT . Focus on visualization, not (only) rendering . Provides more complex operations on data (“filtering”) . Introduces visualization pipeline . At the core of many high-level viz tools http://www.vtk.org VTK PIPELINE Source Raw data, shapes Filter Transform raw data Mapper Map data to geometry Renderer Render the geometry OPENGL From the CUDA perspective OpenGL OPENGL: API FOR GPU ACCELERATED RENDERING • Primitives: points, lines, polygons • Properties: colors, lighting, textures, .. • View: camera position and perspective • Shaders: Rendering to screen/framebuffer • C-style functions, enums See e.g. “What Every CUDA Programmer Should Know About OpenGL” (http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf) A SIMPLE OPENGL EXAMPLE glColor3f(1.0f,0,0); State-based API (sticky attributes) glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); // The bottom left corner glVertex3f(-1.0f, 1.0f, 0.0f); // The top left corner Drawing glVertex3f(1.0f, 1.0f, 0.0f); // The top right corner glVertex3f(1.0f, -1.0f, 0.0f); // The bottom right corner glEnd(); Render to screen glFlush(); glColor3f(1.0f,0,0); float* vert={-1.0f, -1.0f, ..}; float* d_vert; glBegin(GL_QUADS); glVertex3f(-1.0f, -1.0f, 0.0f); glVertex3f(-1.0f, 1.0f, 0.0f); ? cudaMalloc(&d_vert, n); glVertex3f(1.0f, 1.0f, 0.0f); cudaMemcpy(d_vert, vert, n, glVertex3f(1.0f, -1.0f, 0.0f); cudaMemcpyHostToDevice); glEnd(); renderQuad<<<N/128, N>>>(d_vert); glFlush(); flushToScreen<<<..>>>(); CUDA-OPENGL INTEROP: MAPPING MEMORY • OpenGL: Opaque data buffer object • Virtex Buffer Object (VBO) • User has very limited control • CUDA: C-style memory management • User has full control • CUDA-OpenGL Interop: Map/Unmap OpenGL buffers into CUDA memory space RENDERING FROM A VBO Create VBO Initialize VBO () nit i Populate Render display() Display RENDERING FROM A VBO WITH CUDA INTEROP Create VBO Initialize VBO () nit Register VOB with CUDA i Map VOB to CUDA Populate Unmap VBO from CUDA Render display() Display MAIN OPENGL-CUDA INTEROP ROUTINES . Register VBO for CUDA cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags); . Mapping of VBO to CUDA cudaGraphicsMapResources(1, cuda_vbo, 0); cudaGraphicsResourceGetMappedPointer(&d_ptr, size, *cuda_vbo); . Unmapping VBO from CUDA cudaGraphicsUnmapResources(1, cuda_vbo, 0); CAN ALL GPUS SUPPORT OPENGL? . GeForce : standard feature set including OpenGL 4.3/4.4 . Quadro : + certain highly accelerated features (e.g. CAD) . Tesla: If GPU is in “All on” operation mode* Graphics capabilities disabled nvidia-smi –-query-gpu=gom.current nvidia-smi –q *requires “m” or “X” class devices Graphics capabilities enabled OPENGL SUMMARY + Relatively simple for basic viz + Existing/fixed/simple rendering pipeline - Low level - Triangles, points, rather than isosurfaces, height-fields - (currently) depends on Xserver for context creation - What about remote, parallel etc? MORE OPENGL INFORMATION AT GTC S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web, 3/25, 13:30, LL21C S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C S4385 - Order Independent Transparency in OpenGL, 3/25, 210C REMOTE VISUALIZATION OPENGL CONTEXT . State of an OpenGL instance — Incl. viewable surface — Interface to windowing system . Context creation: platform specific — Not part of OpenGL — Handled by Xserver in Linux/Unix-like systems . GLX: Interaction X<-> OpenGL LOCAL RENDERING Application libGL Xlib GLX X11 OpenGL Events Commands 2D/3D X Server Driver GPU, monitor attached X-FORWARDING: THE SIMPLEST FORM OF “REMOTE” RENDERING Application X11 Commands libGL Xlib X11 Events OpenGL/GLX On remote system: 2D/3D X Server export DISPLAY=59.151.136.110:0.0 Driver GPU, monitor attached Network X-FORWARDING + Simple! + ssh –X + No need to run X on remote machine - Lots of data crosses the network - All rendering performed on the local Xserver - Not useful for visualization of remote CUDA Interop apps (X-server doesn’t “see” remote GPU) SERVER-SIDE RENDERING + REMOTE VIZ APPLICATION Application libGL Xlib GLX X11 X11 Events Cmds OpenGL 2D/3D X Server Client Driver Driver GPU GPU, monitor attached Network SERVER-SIDE RENDERING + SCRAPING Application libGL Xlib GLX X11 X11 Events Cmds OpenGL Images 2D/3D X Server Client Driver Driver Scraper GPU GPU, monitor attached Network SERVER-SIDE RENDERING + SCRAPING + Full GPU acceleration + No X server on client Question: when to scrape? — Xserver not informed about direct rendering — Intercept glxSwapBuffers() - Not multi-user => Occasionally used for remote desktop tools GLX FORKING: OUT-OF-PROCESS Application libGL Xlib X11 OpenGL/ X11 Events GLX Cmds Proxy X Server GLX Images OpenGL Images 3D X Server Client App Driver Driver GPU GPU, monitor attached Network GLX FORKING: OUT-OF-PROCESS + Full GPU acceleration + No X server on client + Multi-User - All traffic through Proxy X server => Occasionally used for remote desktop tools GLX FORKING WITH INTERPOSER LIBRARY Application X11 Commands libGL VirtualGL Xlib X11 Events VGL transport VirtualGL GLX (compressed) client OpenGL Images Images 3D X Server 2D X Server Driver Driver GPU GPU, monitor attached Network GLX FORKING WITH INTERPOSER LIBRARY Application libGL VirtualGL Xlib X11 X11 Events Cmds GLX Proxy X OpenGL Images Server Images 3D X Server Client App Driver Driver GPU GPU, monitor attached Network VIRTUAL GL + TURBOVNC + Compressed image transport + Transparent to the application + Fully GPU accelerated OpenGL + Client with or without Xserver - Requires Xserver to access GPU http://www.virtualgl.org HOW TO SET UP VIRTUAL GL + TURBOVNC . Requires Xserver running on server — Root privileges for Xserver . Requires installation of VirtualGL, TurboVNC on server . Start VirtualGL-accelerated VNC server vncserver :3 . TurboVNC viewer on client — Linux, Windows, Javascript CONNECTING CLIENT TO REMOTE SERVER CONNECTING TO REMOTE VNC SERVER VIA A GATEWAY . Establish tunnel on client ssh –L 3333:node01.cluster.net:5903 login.cluster.net . Connect client to localhost:3333 port 5903 port 3333 Gateway node01.cluster.net login.cluster.net client LAUNCHING REMOTE CUDA/OPENGL APPLICATIONS VIA VGLRUN vglrun :3 glxgears export DISPLAY=:3 vglrun simpleGL REMOTE VIZ IN PARAVIEW/VISIT . Approach 1: VirtualGL