HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS
Peter Messmer, NVIDIA

RANGE OF ANALYSIS AND VIZ TASKS
• Analysis: Focus quantitative
• Visualization: Focus qualitative
• Monitoring, Steering

TRADITIONAL HPC WORKFLOW
[Diagram labels: Workstation; Setup; Analysis, Visualization; Supercomputer; Dump, Checkpointing; Viz Cluster; Visualization, Analysis; File System]

TRADITIONAL WORKFLOW: CHALLENGES
• Lack of interactivity prevents “intuition”
• High-end viz neglected due to workflow complexity
• Viz resources need to scale with simulation
• I/O becomes main simulation bottleneck
[Diagram: the traditional HPC workflow, annotated with the challenges listed here]

OUTLINE
• Visualization applications
• CUDA/OpenGL interop
• Remote viz
• Parallel viz
• In-Situ viz
High-level overview. Some parts platform dependent. Check with your sysadmin.

VISUALIZATION APPLICATIONS

NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES
Surveyed sites: LLNL LLNL- ORNL- AFRL- NASA- NERSC -OCF SCF LANL CCS DOD-ORC DSCR AFRL ARL ERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS TACC CHPC RZG HLRN Julich CSCS CSC Hector Curie

VISIT
• Scalar, vector and tensor field data features
    • Plots: contour, curve, mesh, pseudo-color, volume, ..
    • Operators: slice, iso-surface, threshold, binning, ..
• Quantitative and qualitative analysis/vis
    • Derived fields, dimension reduction, line-outs
    • Pick & query
• Scalable architecture
• Open source
http://wci.llnl.gov/codes/visit/

VISIT
• Cross-platform
    • Linux/Unix, OSX, Windows
• Wide range of data formats
    • .vtk, .netcdf, .hdf5, ..
• Extensible
    • Plugin architecture
• Embeddable
• Python scriptable

VISIT’S SCALABLE ARCHITECTURE
• Client-server architecture
• Server MPI parallel
• Distributed filtering
• (multi-)GPU accelerated, parallel rendering*
* requires X server on each node

PARAVIEW
• Scalar, vector and tensor field data features
    • Plots: contour, curve, mesh, pseudocolor, volume, ..
    • Operators: slice, iso-surface, threshold, binning, ..
• Quantitative and qualitative analysis/vis
    • Derived fields, dimension reduction, line-outs
    • Pick & query
• Scalable architecture
• Developed by Kitware, open source
http://www.paraview.org

PARAVIEW’S SCALABLE ARCHITECTURE
• Client-server-server architecture
• Server MPI parallel
• Distributed filtering
• GPU accelerated, parallel rendering*
* requires X server on each node

SOME OTHER TOOLS
• Wide range of visualization tools
• Often emerged from specialized application domain
    • Tecplot, EnSight: structural analysis, CFD
    • IndeX: seismic data processing & visualization
    • IDL: image processing
• Early adopters of visual programming
    • AVS/Express, OpenDX

FURTHER READING
• ParaView Tutorial: http://www.paraview.org/Wiki/The_ParaView_Tutorial
• VisIt Manuals/Tutorials: http://wci.llnl.gov/codes/visit/manuals.html

VTK - THE VISUALIZATION TOOLKIT

VISUALIZATION TOOLKIT
• Focus on visualization, not (only) rendering
• Provides more complex operations on data (“filtering”)
• Introduces visualization pipeline
• At the core of many high-level viz tools
http://www.vtk.org

VTK PIPELINE
• Source: raw data, shapes
• Filter: transform raw data
• Mapper: map data to geometry
• Renderer: render the geometry

OPENGL
From the CUDA perspective

OPENGL: API FOR GPU ACCELERATED RENDERING
• Primitives: points, lines, polygons
• Properties: colors, lighting, textures, ..
• View: camera position and perspective
• Shaders: Rendering to screen/framebuffer
• C-style functions, enums
See e.g.
“What Every CUDA Programmer Should Know About OpenGL” (http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf)

A SIMPLE OPENGL EXAMPLE
    glColor3f(1.0f, 0, 0);              // State-based API (sticky attributes)
    glBegin(GL_QUADS);                  // Drawing
    glVertex3f(-1.0f, -1.0f, 0.0f);     // The bottom left corner
    glVertex3f(-1.0f, 1.0f, 0.0f);      // The top left corner
    glVertex3f(1.0f, 1.0f, 0.0f);       // The top right corner
    glVertex3f(1.0f, -1.0f, 0.0f);      // The bottom right corner
    glEnd();
    glFlush();                          // Render to screen

OpenGL:
    glColor3f(1.0f, 0, 0);
    glBegin(GL_QUADS);
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glVertex3f(-1.0f, 1.0f, 0.0f);
    glVertex3f(1.0f, 1.0f, 0.0f);
    glVertex3f(1.0f, -1.0f, 0.0f);
    glEnd();
    glFlush();
CUDA (?):
    float vert[] = {-1.0f, -1.0f, ..};
    float* d_vert;
    cudaMalloc(&d_vert, n);
    cudaMemcpy(d_vert, vert, n, cudaMemcpyHostToDevice);
    renderQuad<<<N/128, 128>>>(d_vert);
    flushToScreen<<<..>>>();

CUDA-OPENGL INTEROP: MAPPING MEMORY
• OpenGL: Opaque data buffer object
    • Vertex Buffer Object (VBO)
    • User has very limited control
• CUDA: C-style memory management
    • User has full control
• CUDA-OpenGL Interop: Map/Unmap OpenGL buffers into CUDA memory space

RENDERING FROM A VBO
• init(): Create VBO, Initialize VBO
• display(): Populate, Render, Display

RENDERING FROM A VBO WITH CUDA INTEROP
• init(): Create VBO, Initialize VBO, Register VBO with CUDA
• display(): Map VBO to CUDA, Populate, Unmap VBO from CUDA, Render, Display

MAIN OPENGL-CUDA INTEROP ROUTINES
• Register VBO for CUDA
    cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags);
• Mapping of VBO to CUDA
    cudaGraphicsMapResources(1, cuda_vbo, 0);
    cudaGraphicsResourceGetMappedPointer(&d_ptr, size, *cuda_vbo);
• Unmapping VBO from CUDA
    cudaGraphicsUnmapResources(1, cuda_vbo, 0);

CAN ALL GPUS SUPPORT OPENGL?
• GeForce: standard feature set including OpenGL 4.3/4.4
• Quadro: + certain highly accelerated features (e.g. CAD)
• Tesla: If GPU is in “All on” operation mode*
    nvidia-smi --query-gpu=gom.current --format=csv
    nvidia-smi -q
*requires “m” or “X” class devices
[Screenshots: nvidia-smi output with graphics capabilities disabled and with graphics capabilities enabled]

OPENGL SUMMARY
+ Relatively simple for basic viz
+ Existing/fixed/simple rendering pipeline
- Low level
    • Triangles, points, rather than isosurfaces, height-fields
- (currently) depends on Xserver for context creation
- What about remote, parallel etc?

MORE OPENGL INFORMATION AT GTC
• S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A
• S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E
• S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C
• S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web, 3/25, 13:30, LL21C
• S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C
• S4385 - Order Independent Transparency in OpenGL, 3/25, 210C

REMOTE VISUALIZATION

OPENGL CONTEXT
• State of an OpenGL instance
    • Incl. viewable surface
    • Interface to windowing system
• Context creation: platform specific
    • Not part of OpenGL
    • Handled by Xserver in Linux/Unix-like systems
• GLX: Interaction X <-> OpenGL

LOCAL RENDERING
[Diagram labels: Application, libGL, Xlib, GLX, OpenGL, X11 Events, X11 Commands, 2D/3D X Server, Driver, GPU (monitor attached)]

X-FORWARDING: THE SIMPLEST FORM OF “REMOTE” RENDERING
[Diagram labels: Application, libGL, Xlib, X11 Commands, X11 Events, OpenGL/GLX, Network, 2D/3D X Server, Driver, GPU (monitor attached)]
On remote system:
    export DISPLAY=59.151.136.110:0.0

X-FORWARDING
+ Simple!
+ ssh -X
+ No need to run X on remote machine
- Lots of data crosses the network
- All rendering performed on the local Xserver
- Not useful for visualization of remote CUDA Interop apps (X-server doesn’t “see” remote GPU)

SERVER-SIDE RENDERING + REMOTE VIZ APPLICATION
[Diagram labels: Application, libGL, Xlib, GLX, OpenGL, X11 Events, X11 Cmds, 2D/3D X Server, Driver, GPU, Network, Client, Driver, GPU (monitor attached)]

SERVER-SIDE RENDERING + SCRAPING
[Diagram labels: Application, libGL, Xlib, GLX, OpenGL, X11 Events, X11 Cmds, 2D/3D X Server, Scraper, Images, Driver, GPU, Network, Client, Driver, GPU (monitor attached)]

SERVER-SIDE RENDERING + SCRAPING
+ Full GPU acceleration
+ No X server on client
Question: when to scrape?
    • Xserver not informed about direct rendering
    • Intercept glXSwapBuffers()
- Not multi-user
=> Occasionally used for remote desktop tools

GLX FORKING: OUT-OF-PROCESS
[Diagram labels: Application, libGL, Xlib, OpenGL/GLX, X11 Events, X11 Cmds, Proxy X Server, GLX, OpenGL, Images, 3D X Server, Driver, GPU, Network, Client App, Driver, GPU (monitor attached)]

GLX FORKING: OUT-OF-PROCESS
+ Full GPU acceleration
+ No X server on client
+ Multi-User
- All traffic through Proxy X server
=> Occasionally used for remote desktop tools

GLX FORKING WITH INTERPOSER LIBRARY
[Diagram labels: Application, libGL, VirtualGL, Xlib, X11 Commands, X11 Events, GLX, OpenGL, Images, VGL transport (compressed), 3D X Server, Driver, GPU, Network, VirtualGL client, 2D X Server, Driver, GPU (monitor attached)]

GLX FORKING WITH INTERPOSER LIBRARY
[Diagram labels: Application, libGL, VirtualGL, Xlib, X11 Events, X11 Cmds, GLX, OpenGL, Images, Proxy X Server, 3D X Server, Driver, GPU, Network, Client App, Driver, GPU (monitor attached)]

VIRTUAL GL + TURBOVNC
+ Compressed image transport
+ Transparent to the application
+ Fully GPU accelerated OpenGL
+ Client with or without Xserver
- Requires Xserver to access GPU
http://www.virtualgl.org

HOW TO SET UP VIRTUAL GL + TURBOVNC
• Requires Xserver running on server
    • Root privileges for Xserver
• Requires installation of VirtualGL, TurboVNC on server
• Start VirtualGL-accelerated VNC server
    vncserver :3
• TurboVNC viewer on client
    • Linux, Windows, JavaScript

CONNECTING CLIENT TO REMOTE SERVER

CONNECTING TO REMOTE VNC SERVER VIA A GATEWAY
• Establish tunnel on client
    ssh -L 3333:node01.cluster.net:5903 login.cluster.net
• Connect client to localhost:3333
[Diagram: client (port 3333), gateway login.cluster.net, node01.cluster.net (port 5903)]

LAUNCHING REMOTE CUDA/OPENGL APPLICATIONS VIA VGLRUN
    vglrun :3 glxgears
    export DISPLAY=:3
    vglrun simpleGL

REMOTE VIZ IN PARAVIEW/VISIT
• Approach 1: VirtualGL
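
The gateway tunnel from the CONNECTING TO REMOTE VNC SERVER VIA A GATEWAY slide forwards to port 5903 because, by the usual VNC convention, display :N listens on TCP port 5900 + N, and the server was started as display :3. The following is a minimal sketch of that arithmetic, not part of the original slides. The function names vnc_port and tunnel_command are hypothetical helpers introduced here for illustration; the host names and port numbers are the ones shown on the slide.

```python
"""Sketch: build the ssh tunnel command for a VNC display behind a gateway."""
from __future__ import annotations

# Assumption: the VNC server follows the usual convention that
# display :N listens on TCP port 5900 + N.
VNC_BASE_PORT = 5900


def vnc_port(display: int) -> int:
    """Return the TCP port used by VNC display number `display`."""
    if display < 0:
        raise ValueError("display number must be non-negative")
    return VNC_BASE_PORT + display


def tunnel_command(local_port: int, node: str, display: int, gateway: str) -> list[str]:
    """Build the argument list for `ssh -L local_port:node:vnc_port gateway`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    forward = f"{local_port}:{node}:{vnc_port(display)}"
    return ["ssh", "-L", forward, gateway]


if __name__ == "__main__":
    # Values taken from the slide: vncserver :3 on node01, reached via login node.
    cmd = tunnel_command(3333, "node01.cluster.net", 3, "login.cluster.net")
    print(" ".join(cmd))
```

With the tunnel established, the TurboVNC viewer on the client connects to localhost:3333, as the slide describes. Choosing a different display number on the server changes only the remote port in the forward specification.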
