of Climate Data with ParaView

DATAR: Data science Aspects and Tools in Atmospheric Research

November 14, 2018 | Dr. Herwig Zilken

Contents

• Overview of ParaView
• Loading Data
• Important Visualization Techniques
• Animating Data
• ParaView Python Scripting

What is ParaView?

• General-purpose open-source data analysis and visualization application (two- and three-dimensional data sets) built on top of VTK
• Provides a comprehensive suite of visualization algorithms
• Supports many different file formats for both loading and exporting data sets
• Supported platforms: Linux, Windows, Mac
• Processing modes:
  • Stand-alone mode
  • Client-server configuration (in parallel)
  • Batch (Python scripting)
• Locally installed at JSC:
  • Linux Group
  • JURECA Visualization Partition

The History of ParaView

• 2000: collaboration between Kitware Inc. and Los Alamos National Laboratory, funding provided by the US Department of Energy ASCI program
• 2002: first release of ParaView
• September 2005 - May 2007: development of ParaView 3 by Kitware, Sandia National Lab. and other partners
  -> more user-friendly user interface
  -> quantitative analysis framework
• June 2013: ParaView 4.0
  -> more cohesive GUI controls
  -> better multiblock interaction
  -> in situ integration into simulation and other applications (Catalyst)
• Recent releases (currently ParaView 5.6)
  -> new VR backend (multi-screen 3D projection, VR devices)

ParaView's Architecture

Layers, from top to bottom:
ParaView Client (GUI) | pvpython | ParaWeb | Catalyst | Custom App
Interface Layer (Qt Widgets, Python Wrappings)
ParaView Server
VTK
OpenGL | MPI | IceT | … more …

Getting Started

• ParaView can be downloaded at www..org
  • precompiled versions available for Mac OS, Windows (32-bit and 64-bit) and Linux (64-bit)
  • sources for individual installations (e.g. using the mpi- version tailored to your system)

• The ParaView client launches like most applications:
  • Windows: launcher located in the Start menu
  • Linux: execute paraview from a command prompt
  • Mac OS: open the application bundle that you installed

Visualization Pipeline

• concept of a visualization pipeline as implemented in VTK

Source (initial data input or generated) -> Data 1 -> Filter 1 (modify the data in some way) -> Data 2 -> Filter 2 (modify the data in some way) -> Data 3

Each data stage (Data 1, Data 2, Data 3) is passed to its own Mapper and Actor:
• Mapper: generates function calls to the graphics system
• Actor: adjusts the visible properties
• Renderer: creates images of the data
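The pipeline idea can be sketched with a toy model in plain Python. This is only an illustration of the source -> filter -> filter data flow; the classes `Source` and `Filter` are hypothetical and are not VTK or ParaView classes.

```python
class Source:
    """Produces the initial data (here simply a list of sample values)."""

    def __init__(self, values):
        self.values = list(values)

    def output(self):
        return list(self.values)


class Filter:
    """Pulls data from an upstream stage and modifies it in some way."""

    def __init__(self, upstream, func):
        self.upstream = upstream
        self.func = func

    def output(self):
        # Data is requested from upstream only when the output is needed.
        return [self.func(v) for v in self.upstream.output()]


source = Source([1.0, 2.0, 3.0])                 # Data 1
scaled = Filter(source, lambda v: v * 10.0)      # Filter 1 -> Data 2
shifted = Filter(scaled, lambda v: v + 1.0)      # Filter 2 -> Data 3
result = shifted.output()
```

Because each stage pulls from its upstream stage, changing the source and requesting the output again updates every downstream result.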

User Interface

• Menu Bar: access to the majority of features
• Toolbars: quick access to the most commonly used features
• Pipeline Browser: overview of the data processing pipeline
• Properties & Information Panel: parameters of the selected object in the pipeline
• Advanced toggle: shows/hides advanced controls
• 3D View

Scientific Data

• A data set describes WHAT values are located WHERE.

• WHERE: the structure of the data
  1. Geometry: defines the (3D) location of the points
  2. Topology: describes the connectivity of points (cells)

Example: three points at (x1,y1,z1), (x2,y2,z2), (x3,y3,z3) forming a triangle

• WHAT: the attributes (values) of the data, e.g. temperature, pressure, …

• typically the data is discrete (not continuous), given at a set of points in 3D space.
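The triangle example can be written down as plain Python data to make the split into geometry, topology and attributes concrete. This is a minimal sketch; the coordinate and attribute values are invented for illustration.

```python
# Geometry: the 3D location of each point
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Topology: each cell lists the indices of the points it connects
cells = [(0, 1, 2)]  # one triangle

# Attributes (WHAT): values attached to the points
point_data = {"temperature": [280.0, 285.0, 290.0]}


def cell_centroid(cell):
    """Average the coordinates of all points belonging to a cell."""
    n = len(cell)
    return tuple(sum(points[i][axis] for i in cell) / n for axis in range(3))


centroid = cell_centroid(cells[0])
```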

Data Types: Structured Data

Uniform Rectilinear Grid (Image Data) (vtkImageData)

Non-uniform Rectilinear Grid (Rectilinear Grid) (vtkRectilinearGrid)

Structured (Curvilinear) Grid (vtkStructuredGrid)
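The difference between the first two grid types lies in how point locations are stored. The sketch below (helper names are hypothetical) shows that a uniform grid needs only an origin and a spacing, while a rectilinear grid needs one coordinate array per axis. A curvilinear grid stores an explicit location for every point and is not shown.

```python
def image_data_point(origin, spacing, ijk):
    """Uniform grid: the location follows from origin + index * spacing."""
    return tuple(o + s * i for o, s, i in zip(origin, spacing, ijk))


def rectilinear_point(x_coords, y_coords, z_coords, ijk):
    """Rectilinear grid: each axis has its own (possibly uneven) coordinates."""
    i, j, k = ijk
    return (x_coords[i], y_coords[j], z_coords[k])


p_uniform = image_data_point((-512.0, -512.0, -512.0), (1.0, 1.0, 1.0), (512, 512, 512))
p_rect = rectilinear_point([0.0, 1.0, 4.0], [0.0, 2.0], [0.0, 10.0, 100.0], (2, 1, 1))
```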

Data Types: Unstructured Data

Polygonal Mesh (Poly Data) (vtkPolyData)

Unstructured Grid (vtkUnstructuredGrid)

Multi-Block (several data sets grouped together)

Hierarchical Adaptive Mesh Refinement (AMR)

Data Attributes

• Data attributes are defined at grid points (grid data, node data) or on cells (cell data, element data)

• General data attributes:
  -> Scalar
  -> Vector
  -> Tensor (n x n matrix)

• Attributes with special meaning for the visualization process:
  -> Surface normals (3D vector)
  -> 2D or 3D texture coordinates

• Any number of variables can be defined at points and cells
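Point data and cell data can be converted into each other. As a rough sketch, assuming plain averaging as the conversion rule (chosen here for illustration), a cell value can be derived from the values at the cell's points:

```python
def point_to_cell(cells, point_values):
    """Derive one value per cell by averaging the values at its points."""
    return [sum(point_values[i] for i in cell) / len(cell) for cell in cells]


cells = [(0, 1, 2), (1, 2, 3)]          # two triangles sharing an edge
temperature = [1.0, 2.0, 3.0, 7.0]      # one value per point
cell_temperature = point_to_cell(cells, temperature)
```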

Loading Data

Loading Data: NetCDF

• Climate data is typically stored in NetCDF containers
• ParaView provides several NetCDF readers:
  – A generic NetCDF reader (respecting the Climate and Forecast conventions, see http://cfconventions.org/)
  – A NetCDF MPAS reader for MPAS data
  – A NetCDF POP reader for Parallel Ocean Program data
  – A NetCDF CAM reader for CAM data
  – A CDI NetCDF reader for unstructured ICON data sets (plugin)

Loading Data: HDF5 + XDMF

• NetCDF readers are sometimes inflexible and may not fit your file format
• Solution: convert your data to HDF5 and generate a corresponding XDMF file (next slide)
• The data can then be loaded by ParaView's XDMF reader
• The XDMF file gives additional flexibility on loading:
  - hyperslab for volume of interest or data subsampling
  - join scalars to build vectors
  - a few simple calculations possible (e.g. add two values)

How To Convert:
• Sometimes the NetCDF file is already an HDF5 container; check with:
    ncdump -k foo.nc
  Output "netCDF-4" or "netCDF-4 classic model" means HDF5
• Otherwise convert NetCDF to HDF5 via
    nccopy -k 4 foo3.nc foo4.h5
  Another example, with compression:
    nccopy -k 4 -d 1 foo3.nc foo4.h5
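The same check can be approximated in Python by looking at the first bytes of the file. This is a sketch, not a replacement for ncdump: it relies on the HDF5 format signature and the "CDF" magic of classic NetCDF files, and it only inspects offset 0 (an HDF5 file with a user block would not be detected). The function name `container_kind` is hypothetical.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of an HDF5 file


def container_kind(path):
    """Classify a file as HDF5-based or classic NetCDF by its magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(8)
    if magic == HDF5_SIGNATURE:
        return "hdf5"
    if magic[:3] == b"CDF":
        return "netcdf-classic"
    return "unknown"


# Demonstration with two tiny stand-in files (not real data files)
with tempfile.TemporaryDirectory() as tmp:
    h5_like = os.path.join(tmp, "a.nc")
    with open(h5_like, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"\x00" * 8)
    classic_like = os.path.join(tmp, "b.nc")
    with open(classic_like, "wb") as handle:
        handle.write(b"CDF\x01" + b"\x00" * 8)
    kinds = (container_kind(h5_like), container_kind(classic_like))
```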

XDMF

• Description of the structure of HDF5 datasets, e.g. which data defines coordinates, which data defines data values (attributes), …

• Small XML file (light data) in addition to the (large) data file (heavy data, typically HDF5)
  • light data: XML file containing XDMF language statements and references to datasets in the heavy data
  • heavy data: HDF5 files (or binary)

• http://www.xdmf.org

Loading Data: HDF5 + XDMF

• Alternatively, NetCDF can be converted by a simple, self-made Python script
• Advantages:
  – Only the variables relevant for visualization need to be converted
  – Any desired coordinate transformation can be precalculated (e.g. from spherical to cartesian coordinates, or from pressure to absolute height)
  – Additional variables can be derived from existing ones (e.g. magnitude of vectors, relative/fractional values)
  – A subset of the domain (region of interest) can be selected
  – Values can be recalculated to be cell centered (e.g. staggered grid)

Sample Copy Script: open files, copy metadata, copy selected variables, cut off 10 grid points from border

    import h5py
    import netCDF4 as nc
    import numpy as np

    variables = ["QNRAIN", "QNICE", "CLDFRA"]
    cutoff = 10

    fin = nc.Dataset("foo.nc", "r")
    fout = h5py.File("foo.h5", 'w')

    # copy global meta data (attributes)
    for attr in fin.ncattrs():
        fout.attrs[attr] = getattr(fin, attr)

    for var in variables:
        v = fin.variables[var]
        # generate shape of output variable taking cutoff into account
        outshape = []
        for i in range(len(v.shape)):
            outshape.append(v.shape[i])
        # cut off border
        outshape[-1] = outshape[-1] - 2*cutoff
        outshape[-2] = outshape[-2] - 2*cutoff
        dset = fout.create_dataset(var, tuple(outshape), dtype=np.float32)
        # copy variable meta data (attributes)
        for attr in v.ncattrs():
            dset.attrs[attr] = getattr(v, attr)
        # copy content of variable (assumes 4D variables)
        fout[var][:, :, :, :] = v[:, :, cutoff:-cutoff, cutoff:-cutoff]

    fin.close()
    fout.close()
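The border cut-off in the copy script is written for 4D variables. A possible generalization, sketched here on an in-memory NumPy array instead of a NetCDF variable, uses an Ellipsis so that the same slice works for any number of leading dimensions. The helper `crop_border` is hypothetical and not part of the original script.

```python
import numpy as np


def crop_border(data, cutoff):
    """Remove `cutoff` grid points from each side of the last two axes."""
    if cutoff == 0:
        return data
    return data[..., cutoff:-cutoff, cutoff:-cutoff]


# Stand-in for a (time, level, y, x) variable
field = np.arange(2 * 3 * 30 * 40, dtype=np.float32).reshape(2, 3, 30, 40)
cropped = crop_border(field, 10)

# The same helper also works for a 3D (time, y, x) variable
cropped_3d = crop_border(field[:, 0], 10)
```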

Sample Copy Script: calculate scalar from vector (result = sqrt(U10^2+V10^2))

    import h5py
    import netCDF4 as nc
    import numpy as np

    fin = nc.Dataset("foo.nc", "r")
    fout = h5py.File("foo.h5", 'w')

    U10 = fin.variables["U10"]
    V10 = fin.variables["V10"]
    resultName = "Result"
    dset = fout.create_dataset(resultName, U10.shape, dtype=np.float32, compression="gzip")
    fout[resultName][:, :, :] = np.sqrt(U10[:, :, :]*U10[:, :, :] + V10[:, :, :]*V10[:, :, :])

    fin.close()
    fout.close()
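For large files it may be preferable to compute the magnitude one time step at a time, so that only a single 2D slice per variable is held in memory. The following sketch shows this on in-memory stand-in arrays; the function name `wind_speed_by_step` is hypothetical, and with real files the slices would be read from and written to the NetCDF and HDF5 datasets instead.

```python
import numpy as np


def wind_speed_by_step(u, v):
    """Compute sqrt(u^2 + v^2) for one time step after the other."""
    out = np.empty(u.shape, dtype=np.float32)
    for t in range(u.shape[0]):
        out[t] = np.hypot(u[t], v[t])  # only one time slice is processed here
    return out


u10 = np.full((4, 5, 6), 3.0, dtype=np.float32)  # stand-in for U10 (time, y, x)
v10 = np.full((4, 5, 6), 4.0, dtype=np.float32)  # stand-in for V10 (time, y, x)
speed = wind_speed_by_step(u10, v10)
```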

Sample script to generate cartesian coordinates from lon, lat, level

    import h5py
    import netCDF4 as nc
    import numpy as np

    fin = nc.Dataset("foo.nc", "r")
    fout = h5py.File("foo.h5", 'w')

    latIn = fin['/lat']    # 1D variable
    lonIn = fin['/lon']    # 1D variable
    levIn = fin['/lev_2']  # 1D variable

    # one output dataset per coordinate, shape (level, lat, lon)
    shape3d = (levIn.shape[0], latIn.shape[0], lonIn.shape[0])
    xOut = fout.create_dataset("/x", shape3d, dtype=np.float32)
    yOut = fout.create_dataset("/y", shape3d, dtype=np.float32)
    zOut = fout.create_dataset("/z", shape3d, dtype=np.float32)

    # expand the 1D lat and lon axes to 2D (lat, lon) arrays
    lat = np.zeros((latIn.shape[0], lonIn.shape[0]), dtype=np.float32)
    lon = np.zeros((latIn.shape[0], lonIn.shape[0]), dtype=np.float32)
    for i in range(lonIn.shape[0]):
        lat[:, i] = latIn[:]
    for i in range(latIn.shape[0]):
        lon[i, :] = lonIn[:]

(continued on next slide)

Sample Script to generate cartesian coordinates from lon, lat, level (continued)

    x = np.zeros((latIn.shape[0], lonIn.shape[0]), dtype=np.float32)
    y = np.zeros((latIn.shape[0], lonIn.shape[0]), dtype=np.float32)
    z = np.zeros((latIn.shape[0], lonIn.shape[0]), dtype=np.float32)

    for i in range(levIn.shape[0]):
        print("calculating coordinates for level %d" % i)
        # map the level index to a radius
        height = (137.0 - levIn[i])*.5 + 150
        x = height * np.cos(lat*np.pi/180)*np.cos(lon*np.pi/180)
        y = height * np.cos(lat*np.pi/180)*np.sin(lon*np.pi/180)
        z = height * np.sin(lat*np.pi/180)
        xOut[i, :, :] = x
        yOut[i, :, :] = y
        zOut[i, :, :] = z

    fin.close()
    fout.close()
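The spherical-to-cartesian step of the script can also be written without explicit loops. The sketch below uses np.meshgrid to expand the 1D axes and applies the same formulas to small in-memory arrays; the function name `sphere_coords` and the sample values are assumptions for illustration.

```python
import numpy as np


def sphere_coords(lat_deg, lon_deg, height):
    """Return x, y, z arrays of shape (n_lat, n_lon) for one radius."""
    # meshgrid with the default 'xy' indexing yields (n_lat, n_lon) arrays
    lon2d, lat2d = np.meshgrid(np.radians(lon_deg), np.radians(lat_deg))
    x = height * np.cos(lat2d) * np.cos(lon2d)
    y = height * np.cos(lat2d) * np.sin(lon2d)
    z = height * np.sin(lat2d)
    return x, y, z


lat = np.array([-90.0, 0.0, 90.0])
lon = np.array([0.0, 90.0])
x, y, z = sphere_coords(lat, lon, 150.0)
```

Every resulting point lies on a sphere of the given radius, which is a quick sanity check for the transformation.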

XDMF: Example for Image Data

-512.000000 -512.000000 -512.000000
1.000000 1.000000 1.000000
/viswork/testDataZilken/test_4GB.h5:/sphere
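A light-data file for image data can be generated with Python's standard XML library. The sketch below is an assumption about how such a file may look: it uses the XDMF element and attribute names for a uniform grid (3DCoRectMesh topology, ORIGIN_DXDYDZ geometry) as the author of this example understands them. The grid dimensions and the file name `test.h5` are hypothetical; only the origin, the spacing and the dataset name `/sphere` are taken from the slide.

```python
import xml.etree.ElementTree as ET


def build_xdmf(h5_ref, name, dims, origin, spacing):
    """Build an XDMF description of one scalar on a uniform 3D grid."""
    dims_str = " ".join(str(d) for d in dims)
    xdmf = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(xdmf, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="grid", GridType="Uniform")
    ET.SubElement(grid, "Topology", TopologyType="3DCoRectMesh", Dimensions=dims_str)
    geometry = ET.SubElement(grid, "Geometry", GeometryType="ORIGIN_DXDYDZ")
    for values in (origin, spacing):  # first the origin, then the spacing
        item = ET.SubElement(geometry, "DataItem", Dimensions="3",
                             NumberType="Float", Format="XML")
        item.text = " ".join("%f" % v for v in values)
    attribute = ET.SubElement(grid, "Attribute", Name=name,
                              AttributeType="Scalar", Center="Node")
    data = ET.SubElement(attribute, "DataItem", Dimensions=dims_str,
                         NumberType="Float", Precision="4", Format="HDF")
    data.text = h5_ref  # reference into the heavy data: file:/dataset
    return ET.tostring(xdmf, encoding="unicode")


xml_text = build_xdmf("test.h5:/sphere", "sphere", (1024, 1024, 1024),
                      (-512.0, -512.0, -512.0), (1.0, 1.0, 1.0))
```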

Calculations within ParaView

• It can make sense to do calculations in ParaView during a visualization session

• Advantages:
  – Calculations are done on the fly -> enhanced interactivity for data exploration
  – No space on hard drive needed for intermediate results
• Disadvantages:
  – Calculations may take some time -> you have to wait
  – Memory consumption at runtime increases

Calculations within ParaView

• Calculator: calculates new attributes based on a simple expression
  – Example: "LANDMASK*(abs(HGT) + 20.0)"
  – Can generate vectors from scalars via "iHat*velocity_x + jHat*velocity_y + kHat*velocity_z"
  – Can generate new coordinates
  – Inflexible, no "if" statement
• PythonCalculator: calculates new attributes based on a simple Python expression
  – NumPy and SciPy functions can be used
  – Can generate vectors from scalars via "make_vector(velocity_x, velocity_y, velocity_z)"
  – No "if" statement, but numpy.where works, e.g. "numpy.where(Rain > 20, -1 * Rain, LANDMASK*(numpy.abs(HGT)+20))"
• Programmable Source/Filter
  – Most flexible
  – Needs some deeper knowledge of ParaView conventions and data flow
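The numpy.where expression acts as an element-wise "if". The sketch below evaluates it outside ParaView on small hand-made arrays; the variable names follow the example expression, while the values are invented.

```python
import numpy as np

Rain = np.array([5.0, 25.0, 40.0, 0.0])
LANDMASK = np.array([1.0, 1.0, 0.0, 0.0])
HGT = np.array([-10.0, 300.0, 0.0, 50.0])

# Where Rain > 20 take the negated rain, elsewhere the masked height term
result = np.where(Rain > 20, -1 * Rain, LANDMASK * (np.abs(HGT) + 20))
```

Note that both branches are evaluated for all elements; numpy.where only selects between the two precomputed arrays.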

Important Visualization Techniques

Visualize Textured Sphere

• Create a pipeline with a Sphere source and a TextureMapToSphere filter

Set a texture image in TextureMapToSphere -> Miscellaneous

Visualize Textured Sphere: Pitfalls

• You may have to transform the texture image to fit your data
• Don't set "End Theta" of Sphere to 360; use e.g. 359.9999, otherwise the sphere is displayed incorrectly
• Uncheck "Prevent Seam" of TextureMapToSphere, otherwise the texture is displayed incorrectly

Visualizing Scalar Data

• Slicing and clipping of data is possible, but the outcome is often a visually unexciting image; it can still be useful for data analysis

• Main techniques are Contour and Volume Rendering
  – Contour: generates one or more 2D or 3D iso-surfaces according to given iso-values (in 3D via a triangle mesh).
  – Volume Rendering: renders volumetric data directly (according to the "absorption emission" model). Internally either 3D textures or ray casting algorithms are used. A good transfer function for color and transparency is important.
  Hint: OSPRay (http://www.ospray.org/) is integrated in ParaView. Nice visual effects, e.g. shadows and ambient sampling
  Beware: a curvilinear grid has to be resampled to image data (via the ResampleToImage filter)
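The absorption-emission idea behind ray casting can be illustrated with front-to-back compositing along a single ray. This is a simplified sketch with scalar gray values, not ParaView's actual implementation; the function name and the termination threshold are assumptions.

```python
def composite_ray(samples):
    """Composite (color, opacity) samples ordered from front to back."""
    color_acc = 0.0
    alpha_acc = 0.0
    for color, alpha in samples:
        # A sample contributes only through what is still transparent
        weight = (1.0 - alpha_acc) * alpha
        color_acc += weight * color
        alpha_acc += weight
        if alpha_acc >= 0.999:  # early ray termination
            break
    return color_acc, alpha_acc


opaque_first = composite_ray([(1.0, 1.0), (0.5, 1.0)])
half_transparent = composite_ray([(1.0, 0.5), (0.0, 0.5)])
```

The color and opacity of each sample would come from the transfer function, which is why its design has such a strong influence on the image.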

Visualizing Scalar Data: 2 Examples

• Rain and ice as contour, cloud fraction as volume rendering

• Water vapor mixing ratio (humidity) as volume rendering

Visualizing Vector Data

• Typical techniques are Glyphs (e.g. arrows) and StreamTracer (for streamlines)
• The Surface LIC plugin and WarpByVector are also available

Example for streamlines

• For temporal data: ParticleTracer and ParticlePath filters. However, the implementation is crude: the integration time step cannot be adjusted and depends on the actual time steps of the data itself
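A streamline is obtained by integrating a seed point through the vector field. The sketch below uses a classical fourth-order Runge-Kutta step on a hypothetical analytic 2D field (a rigid rotation) to show the principle; it is not the StreamTracer implementation, which works on sampled grid data.

```python
import math


def velocity(p):
    """Hypothetical analytic field: rigid rotation around the origin."""
    x, y = p
    return (-y, x)


def rk4_step(p, h, field):
    """Advance point p by step size h with a fourth-order Runge-Kutta step."""
    def shifted(base, direction, scale):
        return (base[0] + scale * direction[0], base[1] + scale * direction[1])

    k1 = field(p)
    k2 = field(shifted(p, k1, h / 2.0))
    k3 = field(shifted(p, k2, h / 2.0))
    k4 = field(shifted(p, k3, h))
    return (p[0] + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
            p[1] + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]))


def trace_streamline(seed, h, steps, field):
    """Return the list of points visited when starting at the seed."""
    line = [seed]
    for _ in range(steps):
        line.append(rk4_step(line[-1], h, field))
    return line


line = trace_streamline((1.0, 0.0), 0.01, 100, velocity)
```

In a rotation field the streamline is a circle, so the distance of every point from the origin should stay constant.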

Animating Data

Using the Animation View, ParaView can animate:
• Data time steps (if you have time-dependent data)
• Nearly any property of any pipeline object
• The camera, to perform camera flights along a specified path or orbit
• Python scripts can be used to manipulate the scene at every time step

Animating Properties

• Select the object and property from the combo boxes within the Animation View, then press the + button
• A new track is created, holding keyframes that specify values for the property at specific time instances

• Double-click on the new track to edit the keyframes
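What a track does between two keyframes can be sketched as an interpolation over normalized animation time. The example assumes linear interpolation only, while ParaView offers further interpolation modes; the function name and the keyframe values are invented.

```python
def interpolate(keyframes, t):
    """Linearly interpolate a property value from sorted (time, value) pairs."""
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical track: opacity fades in and partly out again
opacity_track = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.2)]
values = [interpolate(opacity_track, t) for t in (0.0, 0.25, 0.5, 1.0)]
```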

Animating the camera

• Select “Camera” in the object combo box within the Animation View.

• Choose one of the options:
  • “Orbit” and “Follow Path” to create a (closed) spline widget around your data along which the camera is animated
  • “Follow Data” to look at the data in every step of the animation
  • “Interpolate camera locations” to create a path between two or more locations

• Double-click on the new element to edit the keyframes
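The camera positions of an orbit can be computed by hand, e.g. when a flight is scripted instead of being set up in the Animation View. This is a geometric sketch only: it assumes a circular orbit in the x-y plane around a given center, and the function name is hypothetical.

```python
import math


def orbit_positions(center, radius, n_frames):
    """Return one camera position per frame on a circle around the center."""
    cx, cy, cz = center
    positions = []
    for i in range(n_frames):
        angle = 2.0 * math.pi * i / n_frames
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle),
                          cz))
    return positions


positions = orbit_positions((0.0, 0.0, 0.0), 500.0, 36)
```

Each position would be combined with the orbit center as focal point so that the camera keeps looking at the data.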

Saving Screenshots and Animations

• File -> Save Screenshot
  • Saves the current view(s) as PNG, BMP, TIFF, PPM, JPEG, PDF
• File -> Export Scene
  • Saves your image as a vector graphic

• File -> Save Animation saves the current animation as
  • Ogg/Theora (many open source viewers)
  • AVI (Windows, some open source viewers)
  • JPEG, TIFF, PNG (creates a flipbook)

ParaView Python Scripting

• ParaView can be fully controlled by a Python script

Reasons to do this:
• For batch processing of data (many files, many time steps, many different visualization methods)
• To store and reconstruct (and to document) the state of a (complex) ParaView pipeline
• As a workaround for some ParaView flaws
  • e.g. memory leak when loading time steps: close and restart ParaView every N frames (just to clear the memory) and let the script resume the animation exactly at that point
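The restart workaround can be organized by a small driver that splits the frame range into chunks and starts a fresh pvpython process for each chunk. The sketch below only builds the command lines; the script name `render_frames.py` and its two command-line arguments (first and last frame) are hypothetical.

```python
def frame_chunks(first, last, frames_per_run):
    """Split the inclusive frame range into chunks, one per pvpython run."""
    chunks = []
    start = first
    while start <= last:
        end = min(start + frames_per_run - 1, last)
        chunks.append((start, end))
        start = end + 1
    return chunks


def build_commands(script, first, last, frames_per_run):
    """Create one pvpython command line per chunk of frames."""
    return [["pvpython", script, str(a), str(b)]
            for a, b in frame_chunks(first, last, frames_per_run)]


commands = build_commands("render_frames.py", 0, 9, 4)
```

Each command could then be executed one after the other, e.g. with subprocess.run(command, check=True), so that the memory is released whenever a process ends.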

ParaView Python Scripting: How To

• The GUI is a good tool to prototype the scene, especially pipeline properties, color tables and camera positions
• Automated facility for creating Python scripts: Tracing
  • Tools -> Start Trace to begin trace recording, Tools -> Stop Trace to end it
  • Produces a Python script that reconstructs many (not all) actions performed in the GUI
• The script can be started by
    pvpython ./script.py

• Images can be saved by
    SaveScreenshot("./foo.jpg", renderView, ImageResolution=[1920, 1080])

From GUI To Script

• Color tables can be saved in the GUI and loaded in the script by
    ImportPresets('colortable.json')
• Camera positions can also be saved in an XML file and loaded in the script by

    import xml.etree.ElementTree as ET
    from paraview.simple import *

    def readValues(node, count):
        # each component is stored as a string attribute; convert it to float
        return [float(node[i].attrib['value']) for i in range(count)]

    def assignCameraParameters(root, camera, camIdx):
        # XML node holding the parameters of camera number camIdx (1-based)
        cam = root[camIdx-1][1][0][0]
        camera.SetPosition(*readValues(cam[0], 3))
        camera.SetFocalPoint(*readValues(cam[1], 3))
        camera.SetViewUp(*readValues(cam[2], 3))
        camera.SetParallelScale(readValues(cam[6], 1)[0])

    tree = ET.parse('camera.pvcvbc')
    camera = GetActiveCamera()
    assignCameraParameters(tree.getroot(), camera, 1)

Visualize Textured Sphere: Python Code

    from paraview.simple import *

    renderView = GetActiveViewOrCreate('RenderView')

    # create sphere
    sphere = Sphere()
    sphere.Radius = 150.0
    sphere.ThetaResolution = 100
    sphere.EndTheta = 359.9999
    sphere.PhiResolution = 50

    # create a new 'Texture Map to Sphere'
    textureMaptoSphere = TextureMaptoSphere(Input=sphere)
    textureMaptoSphere.PreventSeam = 0
    textureMaptoSphereDisplay = Show(textureMaptoSphere, renderView)
    textureMaptoSphereDisplay.Representation = 'Surface'

    # load the texture image and assign it to the display
    texProxy = servermanager.CreateProxy("textures", "ImageTexture")
    texProxy.GetProperty("FileName").SetElement(0, "./images_earth/earth_heller_transformed.jpg")
    texProxy.UpdateVTKObjects()
    textureMaptoSphereDisplay.Texture = texProxy
