
Performing Geospatial Analysis Using and Data
Ginger Ni - Software Engineer
Hewlett Packard Enterprise confidential information
This is a rolling (up to three year) roadmap and is subject to change without notice.

This Roadmap contains Hewlett Packard Enterprise Confidential Information. If you have a valid Confidential Disclosure Agreement with Hewlett Packard Enterprise, disclosure of the Roadmap is subject to that CDA. If not, it is subject to the following terms: for a period of three years after the date of disclosure, you may use the Roadmap solely for the purpose of evaluating purchase decisions from HPE and use a reasonable standard of care to prevent disclosures. You will not disclose the contents of the Roadmap to any third party unless it becomes publicly known, rightfully received by you from a third party without duty of confidentiality, or disclosed with Hewlett Packard Enterprise’s prior written approval.

#SeizeTheData
Please give me your feedback
Session ID: B9697
Speaker: Ginger Ni

– Use the mobile app to complete a session survey
  1. Access “My schedule”
  2. Click on the session detail page
  3. Scroll down to “Rate & review”


Overview

– Vertica geospatial support
  – GEOMETRY and GEOGRAPHY data types
  – Built-in spatial functions
  – WGS84 longitude and latitude data
– Hurricane risk analysis example
  – Tools and datasets
  – Vertica SQL commands
  – Hurricane damage heat map

Image Credit: D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161. URL http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf

Geospatial analysis in daily life

People use Geographic Information Systems (GIS) technology in their everyday lives:
– View images on Google Earth
– Navigate cars using Global Positioning System (GPS) devices
– Show your location when you upload pictures to social media
– Search for nearby restaurants on your smartphone

Governments, companies, and individuals use, generate, and collect geospatial data. From industry-leading businesses to individual daily activities, vast amounts of geospatial data are collected each day. Vertica can help you analyze geospatial data.

Spatial analysis with Vertica

– People who use or develop GIS for data management and analysis can benefit from the geospatial package in Vertica.
– Vertica supports two spatial data types, GEOMETRY and GEOGRAPHY, allowing spatial data to be stored and queried. Many GIS-enabled applications can work with Vertica.
– Vertica provides 65+ built-in functions for spatial analysis.
– Vertica geospatial analysis functions follow the Open Geospatial Consortium (OGC) standards, which are international open standards for geospatial interfaces and content.
– Vertica also has additional spatial functions beyond the OGC standard functions.

Vertica spatial data types

Vertica supports two spatial data types that can be used to store geographical objects such as points, lines, and polygons.

GEOMETRY: Used to store planar data. It is generally used to store XY coordinates in a two-dimensional space.

GEOGRAPHY: Used to store spherical (round-earth) data. It is used to store longitude and latitude coordinates that represent points, lines, and polygons on the earth's surface.
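As a minimal sketch of how the two types might be declared side by side: the table name, column names, and column sizes below are hypothetical and not taken from the talk.

```sql
-- Hypothetical table holding a planar object and a round-earth object.
CREATE TABLE places (
    gid      INTEGER,
    xy_shape GEOMETRY(1000),   -- planar XY coordinates
    ll_shape GEOGRAPHY(1000)   -- longitude/latitude on the earth's surface
);

-- WKT coordinates are read as "X Y" for GEOMETRY
-- and as "longitude latitude" for GEOGRAPHY.
INSERT INTO places
SELECT 1,
       ST_GeomFromText('POINT(3 4)'),
       ST_GeographyFromText('POINT(-115.17 36.12)');
```

The number in parentheses is the maximum size of the stored object in bytes, so larger polygons need a larger value.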

Import/export spatial data

– Parse and load data from a Shapefile with a COPY command
    COPY table_name(columns) WITH SOURCE STV_ShpSource(file = 'filename') PARSER STV_ShpParser()
– Export data from Vertica to a Shapefile
    STV_Export2Shapefile(columns USING PARAMETERS shapefile = 'shapefile_name') OVER()
– Import spatial data from WKT/WKB formats (OGC standard geographic data formats)
    ST_GeomFromText(wkt), ST_GeomFromWKB(wkb), ST_GeographyFromText(wkt), ST_GeographyFromWKB(wkb)
– Export spatial data as WKT/WKB formats
    ST_AsText(g), ST_AsBinary(g)
– Export spatial data as GeoJSON format
    STV_AsGeoJSON(g)

Vertica spatial analysis functions

Vertica provides SQL functions that allow for the creation, comparison, analysis and retrieval of spatial data.

ST_ functions are compliant with OGC standards. (40+ functions)
– ST_Area
– ST_Distance
– ST_Boundary
– ST_Intersects
– ST_IsValid
– …

STV_ functions are unique to Vertica and not compliant with OGC standards. (25+ functions)
– STV_Create_Index
– STV_Refresh_Index
– STV_Intersect
– STV_Export2Shapefile
– STV_AsGeoJSON
– …

Vertica spatial function examples

– ST_GeomFromText
  Converts a well-known text (WKT) string into its corresponding GEOMETRY object.
– You can bulk load spatial data from a data file with the COPY statement.
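The following sketch illustrates both points. The table name, file path, delimiter, and column sizes are assumptions made for illustration; the input file is assumed to hold one `id|WKT` pair per line.

```sql
-- Convert a WKT string into a GEOMETRY object and echo it back as WKT.
SELECT ST_AsText(ST_GeomFromText('POLYGON((1 1, 2 3, 3 1, 1 1))'));

-- Bulk load: read each WKT string into a FILLER column,
-- then convert it to GEOMETRY while loading.
CREATE TABLE spatial_data (id INTEGER, geom GEOMETRY(1000));

COPY spatial_data (id,
                   wkt FILLER LONG VARCHAR(1000),
                   geom AS ST_GeomFromText(wkt))
FROM '/tmp/spatial_data.txt' DELIMITER '|';
```

The FILLER column is parsed from the file but not stored, so only the converted GEOMETRY value lands in the table.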

Vertica spatial function examples

– ST_AsText
  Creates the well-known text (WKT) representation of a spatial object. The OGC defines the format of a WKT string.
– ST_SRID
  Returns the spatial reference system identifier (SRID) stored with a spatial object.
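A small sketch of the two functions together, assuming a point that was created with SRID 4326 (the coordinates are arbitrary):

```sql
-- Build a point tagged with SRID 4326, then read back its WKT and its SRID.
SELECT ST_AsText(g) AS wkt,
       ST_SRID(g)   AS srid
FROM (SELECT ST_GeomFromText('POINT(-115.17 36.12)', 4326) AS g) AS t;
```

The SRID is metadata carried with the object; ST_SRID only reports it and does not change the coordinates.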

Vertica spatial function examples

– ST_Transform Returns a new GEOMETRY with its coordinates converted to the target SRID.
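As a hedged sketch, the query below reprojects a WGS84 point (SRID 4326) to Web Mercator (SRID 3857). Which SRID pairs are supported depends on the Vertica version, so this particular pair is an assumption to check against your release.

```sql
-- Reproject a longitude/latitude point from SRID 4326 to SRID 3857
-- and print the converted coordinates as WKT.
SELECT ST_AsText(
           ST_Transform(ST_GeomFromText('POINT(-115.17 36.12)', 4326), 3857)
       ) AS web_mercator_wkt;
```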

Vertica spatial function examples

– ST_Area
  Calculates the area of a spatial object.
– ST_IsValid
  Determines if a spatial object is well formed or valid. If the object is not valid, STV_IsValidReason returns a string that explains where the invalidity occurs.
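A minimal sketch of these functions, using a unit square and a deliberately self-intersecting "bowtie" ring as made-up inputs:

```sql
-- Area of a unit square, in squared planar units.
SELECT ST_Area(ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')) AS area;

-- A ring that crosses itself: check validity and ask for the reason.
SELECT ST_IsValid(g)        AS is_valid,
       STV_IsValidReason(g) AS reason
FROM (SELECT ST_GeomFromText('POLYGON((0 0, 2 2, 2 0, 0 2, 0 0))') AS g) AS t;
```

Checking validity before running joins is a reasonable habit, since malformed polygons can produce misleading results.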

Vertica spatial function examples

– ST_Intersects
  Determines if two spatial objects intersect or touch at a single point.
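The sketch below tests three made-up points against one square: one inside it, one on a corner (touching at a single point), and one outside.

```sql
-- Compare an interior point, a corner point, and an exterior point
-- against the same polygon.
SELECT ST_Intersects(pt_inside,  poly) AS inside,
       ST_Intersects(pt_corner,  poly) AS on_corner,
       ST_Intersects(pt_outside, poly) AS outside
FROM (SELECT ST_GeomFromText('POINT(1 1)') AS pt_inside,
             ST_GeomFromText('POINT(2 2)') AS pt_corner,
             ST_GeomFromText('POINT(5 5)') AS pt_outside,
             ST_GeomFromText('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS poly) AS t;
```

Per the definition above, the first two comparisons should be true and the third false.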

Spatial Joins

– Spatial joins allow you to identify spatial relationships between two sets of spatial data. Many applications are based on spatial join analysis. For example:
  • Calculate the density of mobile calls in various regions to determine the location of a new cell phone tower
  • Identify homes that fall within the impact zone of a hurricane
  • Calculate the number of users who live within a certain ZIP code
  • Calculate the number of customers in a retail store at any given time
– Spatial join functions in Vertica use spatial indexes, which provide good performance.
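An index-based spatial join can be sketched as follows. The tables `zones` and `homes`, their contents, and the index name `zones_idx` are hypothetical; the pattern is to build a spatial index on the polygons with STV_Create_Index and then probe it with STV_Intersect.

```sql
-- Hypothetical polygon and point tables.
CREATE TABLE zones (gid INTEGER, geom GEOMETRY(1000));
CREATE TABLE homes (gid INTEGER, geom GEOMETRY(100));

INSERT INTO zones SELECT 1,   ST_GeomFromText('POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))');
INSERT INTO homes SELECT 101, ST_GeomFromText('POINT(1 1)');
INSERT INTO homes SELECT 102, ST_GeomFromText('POINT(9 9)');

-- Build a spatial index on the polygon set.
SELECT STV_Create_Index(gid, geom
           USING PARAMETERS index = 'zones_idx', overwrite = true) OVER()
FROM zones;

-- Join the points against the index and count matches per polygon.
SELECT zone_gid, COUNT(home_gid) AS homes_in_zone
FROM (SELECT STV_Intersect(gid, geom USING PARAMETERS index = 'zones_idx')
             OVER (PARTITION BEST) AS (home_gid, zone_gid)
      FROM homes) AS hits
GROUP BY zone_gid;
```

Only points that fall in some indexed polygon appear in the result, so home 102 in this toy data would not be counted.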

Longitude and latitude data

– The World Geodetic System 1984 (WGS84) is an industry standard for mapping the Earth’s oblate spheroidal surface, and it is also the reference coordinate system used by GPS-enabled devices.
– WGS84 is so common, and is the standard for GPS data, that when people refer to longitude and latitude they usually mean WGS84 longitude and latitude.

GEOGRAPHY data type

– Vertica supports the GEOGRAPHY spatial data type.
– A GEOGRAPHY object is a spatial object defined on the surface of a perfect sphere, or a spatial object in the WGS84 coordinate system.

GEOGRAPHY data type

Compute ST_Distance on a perfect sphere and on WGS84. (Results are in meters.)
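A sketch of that comparison, using two arbitrary longitude/latitude points chosen for illustration. The `spheroid` parameter switches ST_Distance from the perfect-sphere model to WGS84.

```sql
-- Distance between the same two points under both earth models, in meters.
SELECT ST_Distance(a, b)                                  AS sphere_m,
       ST_Distance(a, b USING PARAMETERS spheroid = true) AS wgs84_m
FROM (SELECT ST_GeographyFromText('POINT(-115.17 36.12)') AS a,
             ST_GeographyFromText('POINT(-118.24 34.05)') AS b) AS t;
```

The two results usually differ slightly, because the sphere model ignores the flattening of the Earth at the poles.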

Using GEOGRAPHY data type

Pros
– If your data covers the globe or a large area, GEOGRAPHY is probably the best choice. WGS84 is not projected, so distance measurements and spatial joins are more accurate than with a projected system such as Universal Transverse Mercator (UTM).
– Most web mapping tools expect data to be fed to them in WGS84, so GEOGRAPHY will work fine in your spatial analysis workflow.

Cons
– Processing functions for GEOGRAPHY are limited.
– Computationally, functions for GEOGRAPHY are more expensive and therefore slower. For regional data, we usually use the GEOMETRY data type.

Summary

– Vertica provides 65+ functions that allow you to manipulate complex spatial objects.
– Functions follow the OGC standards, which makes migration easier.
– Computations are fast because they are based on a spatial index.
– The geospatial analysis package deploys automatically on a Vertica cluster.
– Vertica supports spatial analysis on longitude and latitude data in the WGS84 coordinate system.

Hurricane risk analysis example

– In this example, I will show you how to perform spatial analysis on longitude and latitude data with Vertica, and how to make a heat map from your spatial data using Vertica and third-party software.
– The ability to perform spatial queries, and display data spatially, is the essence and power of GIS. Vertica is a perfect fit for your GIS development.

Hurricane risk analysis with Vertica

Hurricanes leave behind extensive property damage. Using spatial analysis to estimate the impact of disasters in high-risk areas will provide important information to insurance companies on potential loss of property value.

– In this example, we do a point-in-polygon analysis to identify properties that fall within the impact zone of a hurricane.
– Determine areas most affected by a hurricane and visualize them on a heat map.
– The example shows how you can gain insight into real-world challenges using your spatial data with Vertica.

Hurricane heat map

The National Hurricane Center (NHC) provides various GIS datasets. In this example, we use the hurricane BONNIE dataset. The United States Census Bureau provides various census data. We downloaded the landmark dataset from the TIGER database. We then perform a spatial join of a set of points and a set of polygons.

– Polygon dataset: Hurricane BONNIE from http://www.nhc.noaa.gov/gis/
– Point dataset: Landmarks from ftp://ftp2.census.gov/geo/tiger/TIGER2015/POINTLM/
– Tools: Vertica + third-party software

Convert spatial reference system

Both datasets contain geographic longitude and latitude data. The BONNIE data is in SR-ORG:7219 and the landmark data is in NAD83. The first step is to convert both to WGS84.

Convert hurricane BONNIE polygon data to WGS84:

    ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:4326 al022016-001_5day_pgn_4326.shp al022016-001_5day_pgn.shp

Convert landmarks point data to WGS84:

    ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:4326 tl_2015_09_pointlm_4326.shp tl_2015_09_pointlm.shp -s_srs EPSG:4269

Load shapefile into a Vertica table

    -- Load the hurricane BONNIE shapefile
    -- Generate a CREATE TABLE statement from the shapefile, then run it
    SELECT STV_ShpCreateTable(USING PARAMETERS file='bonnie.shp') OVER();

    CREATE TABLE tmp(
        gid IDENTITY(64) PRIMARY KEY,
        ADVDATE VARCHAR(28),
        geom GEOMETRY(7581)
    );

    -- Parse the shapefile and load its records into the table
    COPY tmp WITH SOURCE STV_ShpSource(file='bonnie.shp') PARSER STV_ShpParser();

    -- Convert GEOMETRY to GEOGRAPHY
    CREATE TABLE bonnie AS
    SELECT gid, ADVDATE, STV_Geography(geom) AS geom FROM tmp;

Perform a spatial join analysis

    -- Count the landmarks inside the hurricane polygon, per advisory date and state
    CREATE TABLE countLM AS
    SELECT ADVDATE, STATEFP, COUNT(*)
    FROM bonnie a, landmarks b
    WHERE ST_Intersects(a.geom, b.geom)
    GROUP BY ADVDATE, STATEFP;

Using this count in our visualization shows which states have the most properties affected by hurricane BONNIE.

– More than 236 thousand landmarks along the US East Coast are evaluated in each frame.
– The spatial join over all 236+ thousand landmarks takes about 1 second to run.
– You can export the intersection results to a shapefile and then load it into a visualization tool.
– Or you can connect GIS tools to Vertica through JDBC/ODBC to fetch the spatial analysis results.
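The shapefile export route might look like the sketch below. The export directory and output file name are assumptions, and `landmarks` is assumed to have the `STATEFP` and `geom` columns used in the join above.

```sql
-- Tell Vertica where exported shapefiles should be written
-- (the path is an assumption).
SELECT STV_SetExportShapefileDirectory(USING PARAMETERS path = '/tmp/export');

-- Export the landmarks that intersect the hurricane polygons as point shapes.
SELECT STV_Export2Shapefile(STATEFP, geom
           USING PARAMETERS shapefile = 'impacted_landmarks.shp',
                            overwrite = true,
                            shape = 'Point') OVER()
FROM (SELECT b.STATEFP, b.geom
      FROM bonnie a, landmarks b
      WHERE ST_Intersects(a.geom, b.geom)) AS hits;
```

A landmark that falls inside several advisory polygons is exported once per match in this sketch; add a DISTINCT or a filter on ADVDATE if one frame at a time is wanted.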

Q&A
