
How to represent geospatial data in SDMX

SDMX Experts Workshop 2018
Juan Muñoz López
SDMX-TWG, INEGI

Agenda

• Basic Concepts on Geolocation
• Representation of Geographic Coordinates and Areas
• Proposal to Represent Geospatial Data in SDMX
• Some Examples of Information Systems With Intensive Use of Geo-Referenced Data
• Discussion
• Conclusions

Basic Concepts on Geolocation

Some Basic Concepts

• Geospatial data refers to geographical aspects such as coordinates, regions, countries, cities, addresses and places.
• This kind of data is georeferenced because it represents the location, size and shape of an object on planet Earth.
• Geo-reference: to locate something in physical space. We establish a reference between an object and its position on Earth.
• ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata.

Reference frameworks

• To geo-reference we can use:
  • Coordinate system: establishes imaginary points over the Earth. A coordinate system is a reference system used to represent the locations of geographic features, imagery, and observations, such as Global Positioning System (GPS) locations, within a common geographic framework.
  • Map projection: a systematic transformation of the latitudes and longitudes of locations from the surface of a sphere or an ellipsoid into locations on a plane.

Map Projections

• Map projections: Conical, Cylindrical, Pseudocylindrical (Combination of conical and cylindrical), Azimuthal (planar).

• Orientation: normal, oblique or transverse

• Cuts: tangent or secant

Projection Examples
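As a concrete illustration of a normal cylindrical projection, the following is a minimal sketch of the forward spherical Mercator transform in the form commonly used for web maps (EPSG:3857). The function name and the sample points are assumptions made for this example; the sphere radius is taken equal to the WGS 84 semi-major axis.

```python
import math

# Radius of the sphere used by the spherical (web) Mercator projection, in meters.
# It equals the WGS 84 semi-major axis.
EARTH_RADIUS_M = 6378137.0


def web_mercator_forward(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project geographic coordinates (degrees) to Mercator x, y (meters)."""
    if not -90.0 < lat_deg < 90.0:
        # The projection sends the poles to infinity, so they are rejected.
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + phi / 2.0))
    return x, y


if __name__ == "__main__":
    # Equal latitude steps map to growing y steps: the source of polar distortion.
    for lat in (0.0, 20.0, 40.0, 60.0, 80.0):
        x, y = web_mercator_forward(lat, 0.0)
        print(f"lat={lat:5.1f}  y={y:14.2f}")
```

Running the script shows the y spacing widening as latitude increases, which is why Mercator-style maps exaggerate areas near the poles.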

Mercator: a cylindrical map projection presented by the Flemish geographer and cartographer Gerardus Mercator in 1569.
Van der Grinten: a projection of the Earth into a circle. Polar regions are subject to extreme distortion.
Google Maps, WMS and OpenStreetMap have used a Web Mercator projection (EPSG:3857). Since August 2018 Google has been using a 3D globe model.

The Universal Transverse Mercator (UTM)

• Uses a 2-dimensional Cartesian coordinate system to give locations on the surface of the Earth. Like the traditional method of latitude and longitude, it is a horizontal position representation, i.e. it is used to identify locations on the Earth independently of vertical position.
• The UTM system is not a single map projection. Instead, it divides the Earth into sixty zones, each a six-degree band of longitude, and uses a secant transverse Mercator projection in each zone.
• Instead of using latitude and longitude coordinates, each 6° wide UTM zone has a central meridian that is assigned an easting of 500,000 meters. This is an arbitrary value, convenient for avoiding negative easting coordinates: all easting values east and west of the central meridian are positive.
• In the northern hemisphere, the equator has a northing value of 0 meters. In the southern hemisphere, the equator starts at 10,000,000 meters.

Geoid, Ellipsoid, Datum

• Geoid: the shape that the surface of the oceans would take under the influence of Earth's gravity and rotation alone, in the absence of other influences such as winds and tides.
• Ellipsoid: a three-dimensional geometric figure that resembles a sphere, but whose equatorial axis (a) is slightly longer than its polar axis (b). Ellipsoids are commonly used as surrogates for geoids so as to simplify the mathematics involved in relating a coordinate system grid with a model of the Earth's shape. Many ellipsoids are in use around the world.
• Datum: a coordinate system, and a set of reference points, used to locate places on the Earth (or similar objects), which is related to a specific ellipsoid. A datum in the modern sense is defined by choosing an ellipsoid and then a primary reference point. Therefore, giving the ellipsoid used is not enough.

Cartesian Coordinate System

• Is a coordinate system that specifies each point uniquely in a plane by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular directed lines, measured in the same unit of length.
• A Cartesian coordinate system for a three-dimensional space consists of an ordered triplet of lines (the axes) that go through a common point (the origin) and are pair-wise perpendicular; an orientation for each axis; and a single unit of length for all three axes.
• As in the two-dimensional case, each axis becomes a number line. For any point P of space, one considers a plane through P perpendicular to each coordinate axis, and interprets the point where that plane cuts the axis as a number. The Cartesian coordinates of P are those three numbers, in the chosen order. The reverse construction determines the point P given its three coordinates.

Geographical Coordinates Systems

Geographical coordinates (φ, λ, h):
• Latitude: phi, φ, Y-axis
  • Measured in degrees
• Longitude: lambda, λ, X-axis
  • Measured in degrees
• Altitude (elevation or height): h, Z-axis
  • Measured in meters
• References:
  • X: Meridian of Greenwich
  • Y: Equator
  • Z: Rotation axis
• Altitude can be measured as:
  • Ellipsoidal or orthometric: the height of a point above an ellipsoid.
  • Geocentric or Cartesian: the distance of a point from the center of mass of the Earth.

Latitude, Longitude, Altitude

• Latitude: coordinate that specifies the north-south position of a point on the Earth's surface. It is an angle which ranges from 0° at the Equator to 90° (North or South) at the poles. Lines of constant latitude, or parallels, run east-west as circles parallel to the equator.
• Geodetic latitude: the angle between the normal and the equatorial plane. The standard notation in English publications is φ. This is the definition assumed when the word latitude is used without qualification.
• Longitude: coordinate that specifies the east-west position of a point on the Earth's surface. Longitude is the angle between a plane containing the Prime Meridian and a plane containing the North Pole, the South Pole and the location in question. This forms a right-handed coordinate system with the z axis pointing from the Earth's center toward the North Pole and the x axis extending from the Earth's center through the equator at the IERS Reference Meridian (derived from the Greenwich Meridian).
• Geodetic longitude: angle from the prime meridian plane to the meridian plane passing through the given point, with eastwards usually treated as positive. The standard notation in English publications is λ.
• Altitude: synonym of elevation or height. It is the mean distance above sea level.
• Geodetic altitude: the distance from the selected point to the reference geoid, measured along the geodetic local vertical; it is positive for points outside the …

(Source: www.britannica.com)

Some Reference Ellipsoids
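To make the link between geodetic coordinates and a reference ellipsoid concrete, here is a minimal sketch that derives the semi-minor axis and first eccentricity from a and 1/F, then converts geodetic latitude, longitude and ellipsoidal height into geocentric Cartesian X, Y, Z in the right-handed frame described above. The function names are assumptions made for this example; the WGS 84 constants are the published defining values, and the height is assumed to be measured above the ellipsoid, not above sea level.

```python
import math

# WGS 84 defining parameters: semi-major axis (meters) and reciprocal flattening.
WGS84_A = 6378137.0
WGS84_INV_F = 298.257223563


def ellipsoid_parameters(a: float, inv_f: float) -> tuple[float, float]:
    """Return (semi-minor axis b, first eccentricity squared e2)."""
    f = 1.0 / inv_f
    b = a * (1.0 - f)        # b = a(1 - F)
    e2 = f * (2.0 - f)       # e^2 = F(2 - F) = 1 - b^2/a^2
    return b, e2


def geodetic_to_cartesian(lat_deg, lon_deg, h=0.0, a=WGS84_A, inv_f=WGS84_INV_F):
    """Convert geodetic latitude/longitude (degrees) and ellipsoidal height
    (meters) to geocentric Cartesian X, Y, Z (meters)."""
    _, e2 = ellipsoid_parameters(a, inv_f)
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - e2) + h) * math.sin(phi)
    return x, y, z


if __name__ == "__main__":
    b, e2 = ellipsoid_parameters(WGS84_A, WGS84_INV_F)
    print(f"b = {b:.3f} m, e^2 = {e2:.12f}")
    print(geodetic_to_cartesian(19.4319716, -99.13342539, 2250.0))
```

A point on the equator at the reference meridian maps to (a, 0, 0) and the North Pole maps to (0, 0, b), which is a quick sanity check on the axis conventions.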

Parameter                      Value
Semi-major axis                a
Reciprocal of flattening       1/F
Semi-minor axis                b = a(1 - F)
First eccentricity squared     $e^2 = 1 - \frac{b^2}{a^2} = F(2 - F)$
Second eccentricity squared    $e'^2 = \frac{a^2}{b^2} - 1 = \frac{F(2 - F)}{(1 - F)^2}$

Name                  Semi-major axis (a), km   Semi-minor axis (b), km   1/F
Airy 1830             6377.563                  6356.257                  299.32
Modified Airy         6377.340                  6356.034                  299.32
Australian National   6378.160                  6356.775                  298.25
Bessel 1841           6377.397                  6356.079                  299.15
Clarke 1866           6378.206                  6356.584                  294.98
Clarke 1880           6378.249                  6356.516                  293.46
Everest 1830          6377.276                  6356.075                  300.80
Fischer 1960          6378.155                  6356.773                  298.30
Helmert 1906          6378.200                  6356.818                  298.30
Indonesian 1974       6378.160                  6356.774                  298.25
International 1924    6378.388                  6356.912                  297.00
Krassovsky 1940       6378.245                  6356.863                  298.30
South American 1969   6378.160                  6356.774                  298.25
WGS 72                6378.135                  6356.751                  298.26
GRS 80                6378.137                  6356.752                  298.257
WGS 84                6378.137                  6356.752                  298.257

Some Reference Datums

Datum                                                       Ellipsoid
Adindan                                                     Clarke 1880
European 1950 (ED 50)                                       International 1924
European 1979 (ED 79)                                       International 1924
Pulkovo 1942                                                Krassovsky
World Geodetic System (WGS 84)                              WGS84
Korean Geodetic System                                      WGS84
North American 1927 (NAD 27)                                Clarke 1866
North American 1983 (NAD 83)                                GRS80
South American 1969                                         South American 1969
South American Geocentric                                   GRS80
Tokyo                                                       Bessel 1841
International Terrestrial Reference System 1992 (ITRF92)    GRS80
International Terrestrial Reference System 2008 (ITRF08)    GRS80
International Terrestrial Reference System 2014 (ITRF14)    GRS80

Most GPS receivers and map services, like Apple Maps, Google Maps and OpenStreetMap, among others, use a coordinate system based on WGS 84 (EPSG:4326); the GRS80 ellipsoid is close enough to align without correction.

Representation of Geographic Coordinates and Areas

Representing Coordinates

There are several standards to represent geospatial coordinates:
• ISO 6709:2008, Standard representation of geographic point location by coordinates
• KML 2.3, Open Geospatial Consortium, ISO 19111
• GML, Open Geospatial Consortium, ISO 19136

ISO 6709:2008, Standard representation of geographic point location by coordinates

• The horizontal position shall be described through a pair of coordinates.
• Latitude: a number preceded by a sign character. A plus sign (+) denotes the northern hemisphere or the equator, and a minus sign (-) denotes the southern hemisphere.
• Longitude: a number preceded by a sign character. A plus sign (+) denotes east longitude or the prime meridian, and a minus sign (-) denotes west longitude or the 180° meridian (opposite of the prime meridian).
• For digital data interchange, decimal degrees shall be the preferred representation. However, for backward compatibility with the first edition of this International Standard, sexagesimal degrees may be used.
• Height (or depth): its representation is optional. Height or depth from the reference surface in the positive direction shall be designated using a plus sign (+). Height or depth from the reference surface in the negative direction shall be designated using a minus sign (-). Height or depth on the reference surface …
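The sign conventions above can be sketched in code. The example below is a minimal, non-authoritative illustration that formats and parses only the decimal-degree form of an ISO 6709 point string; it assumes the fixed-width convention of two integer digits for latitude and three for longitude, and uses `WGS_84` as an illustrative CRS identifier. The function names are hypothetical, and the degrees-minutes-seconds forms are deliberately not handled.

```python
import re


def format_iso6709(lat, lon, height=None, crs="WGS_84", decimals=5):
    """Format decimal degrees as an ISO 6709 style point string."""
    if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
        raise ValueError("latitude or longitude out of range")
    # The sign is mandatory; the integer part is zero-padded to 2 digits
    # for latitude and 3 digits for longitude.
    lat_width = 1 + 2 + 1 + decimals
    lon_width = 1 + 3 + 1 + decimals
    text = f"{lat:+0{lat_width}.{decimals}f}{lon:+0{lon_width}.{decimals}f}"
    if height is not None:
        sign = "+" if height >= 0 else "-"
        text += f"{sign}{abs(height)}"
    # The string ends with the CRS designator and a solidus terminator.
    return f"{text}CRS{crs}/"


_POINT_RE = re.compile(
    r"^(?P<lat>[+-]\d{2}(?:\.\d+)?)"
    r"(?P<lon>[+-]\d{3}(?:\.\d+)?)"
    r"(?P<height>[+-]\d+(?:\.\d+)?)?"
    r"(?:CRS(?P<crs>[^/]+))?/$"
)


def parse_iso6709(text):
    """Parse a decimal-degree point string into (lat, lon, height, crs)."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a decimal-degree ISO 6709 string: {text!r}")
    height = match.group("height")
    return (
        float(match.group("lat")),
        float(match.group("lon")),
        float(height) if height is not None else None,
        match.group("crs"),
    )


if __name__ == "__main__":
    encoded = format_iso6709(40.20361, -75.00417, height=350.517)
    print(encoded)
    print(parse_iso6709(encoded))
```

Because every element starts with a sign or the `CRS` designator, the parser needs no separators between latitude, longitude and height.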

ISO 6709:2008 Single Text Representation Examples (Annex H)

• Latitude and longitude:
  • Degrees: +40-075CRSWGS_84/
  • Degrees and decimal degrees: +40.20361-075.00417CRSWGS_84/
  • Degrees and minutes: +4012-07500CRSWGS_84/
  • Degrees, minutes and decimal minutes: +4012.22-07500.25CRSWGS_84/
  • Degrees, minutes and seconds: +401213-0750015CRSWGS_84/
• Latitude, longitude and height or depth:
  • Degrees and decimal degrees: +40.20361-075.00417+350.517CRSWGS_84/
  • Degrees, minutes, seconds and decimal seconds: +401213.1-0750015.1+2.79CRSWGS_84/
• Rules:
  • The number of digits for latitude, longitude and height indicates the precision.
  • There shall be no separator between the elements for latitude, longitude, height (depth) and CRS.
  • The use of the designators "+", "-" and "CRS" preceding the value part of each element permits the recognition of the start of each element and the termination of the previous one.
  • The point location string shall be terminated. The terminator character shall be a solidus (/).
• Representation in XML (from Annex G): +40.20361-075.00417CRSWGS_84/

ISO 19136 Geography Markup Language (GML), OGC

• Is the official XML representation of ISO 6709.
• Is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet.
• Is an XML grammar written in XML Schema for the description of application schemas as well as the transport and storage of geographic information.

Examples:

50.42 -22.59

0.42 -22.59 543.43

45.67 88.56

45.67 88.56 55.56 89.44

45.256,-110.45 46.46,-109.48 43.84,-109.86 45.8,-109.2 (OGC 07-036 OpenGIS)

101.2 101.3 101.4 101.5 101.6 101.7 101.7 101.8 101.9 102.0 102.1 102.2 (OGC 07-036 OpenGIS)

KML 2.3, Google-Open Geospatial Consortium (OGC)

• KML encodes what to show in an earth browser, and how to show it. It was proposed by Google and accepted by OGC.
• Datum: the set of parameters used by a coordinate reference system that define the position of the origin, the scale, and the orientation of a coordinate system [ISO 19111]. The datums used by the KML coordinate reference system are a vertical datum based on the geoid earth model and a (horizontal) geodetic datum, which specifies the ellipsoid model, area of use, and position of the prime meridian. KML uses the WGS84 EGM96 Geoid vertical datum.
• A kml:Location element shall contain the kml:longitude and kml:latitude child elements outside of an update context, that is, when not a descendant of kml:Update.
  • kml:latitude: geodetic latitude of origin in decimal degrees.
  • kml:longitude: geodetic longitude of origin in decimal degrees.
  • kml:altitude: altitude of origin measured in meters and interpreted according to kml:altitudeMode.
• Example:

39.55375305703105 -118.9813220168456 1223

-122.0822035425683,37.42228990140251,0

-112.265654928602,36.09447672602546,2357
-112.2660384528238,36.09342608838671,2357
-112.2668139013453,36.09251058776881,2357
-112.2677826834445,36.09189827357996,2357
-112.2688557510952,36.0913137941187,2357

Representation of Geographical Areas

• A geographical area is a portion of land that can be considered as a unit for the purposes of some geographical classification.
• It may be as small as a park or a neighborhood, or as large as a continent or an ocean. Metropolitan areas, for example, help define the borders of large population centers for a census and other official purposes.
• A geographical area may be represented by means of a name referred to a …

National Geostatistical Framework (MGN)

• Basis for the geographical reference of censuses and statistical surveys
• Divides the Mexican territory into parts called "geo-statistical areas"
• Contains three disaggregation levels:
  • State (AGEE)
  • Municipality (AGEM)
  • Basic (AGEB)
    • Urban or rural
• Contains localities and blocks

Elements of the MGN

• For each disaggregation level, the MGN contains:
  • Code lists
  • Polygons

Proposal to Represent Geospatial Data in SDMX

Including Geographical Coordinates and Areas at any Level (like Observation)

• To represent any geographical place we can add an attribute (Coordinates: COORD) or a set of attributes (Latitude: LAT, Longitude: LON, [Height: H], [Coordinates Reference System: CRS]).
• We need to decide which representation we are going to use:
  • ISO 6709
  • GML
  • KML
  • An SDMX-defined representation
• To represent any geographical area we can add an attribute (RefPolygon) composed of a set of coordinates.
• We need to decide how we want to represent the polygon:
  • GML
  • KML
  • An SDMX-defined representation

Including Geographical Coordinates and Areas at any Level

• ISO 6709 single string. Notes: Height and CRS would be optional. If CRS is not set, we would interpret that WGS84 is being used.
• ISO 6709 / ISO 19136 / GML. Notes: Height and SRS would be optional. If SRS is not set, we would interpret that WGS84 is being used.

Including Geographical Coordinates and Areas at any Level

• KML 2.3 (A). Notes: Altitude would be optional. KML defines WGS 84 (EPSG:4326) as the Coordinate Reference System to be used.
• KML 2.3 (B): … LATITUDE="19.4319716" LONGITUDE="-99.13342539" ALTITUDE="2250.0" MEASURE="GP" FREQUENCY="A" TIME_FORMAT="P1Y" UNIT="PC" POWERCODE="0">
  Notes: Altitude would be optional. KML defines WGS 84 (EPSG:4326) as the Coordinate Reference System to be used.
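As a rough sketch of the attribute-set option, the code below attaches LAT, LON and optional H and CRS attributes to an observation element and reads them back, falling back to WGS84 when no CRS is given. The element name `Obs`, the helper names and the overall XML shape are assumptions made for illustration only; the attribute names and the sample values come from the slides, and nothing here is part of the SDMX standard.

```python
import xml.etree.ElementTree as ET


def build_observation(base_attrs, lat, lon, height=None, crs=None):
    """Return a hypothetical <Obs> element carrying geolocation attributes."""
    obs = ET.Element("Obs", dict(base_attrs))
    obs.set("LAT", str(lat))
    obs.set("LON", str(lon))
    if height is not None:
        obs.set("H", str(height))      # optional height in meters
    if crs is not None:
        obs.set("CRS", crs)            # optional coordinate reference system
    return obs


def read_location(obs, default_crs="WGS84"):
    """Read (lat, lon, height, crs); a missing CRS is interpreted as WGS84."""
    height = obs.get("H")
    return (
        float(obs.get("LAT")),
        float(obs.get("LON")),
        float(height) if height is not None else None,
        obs.get("CRS", default_crs),
    )


if __name__ == "__main__":
    attrs = {"MEASURE": "GP", "FREQUENCY": "A", "TIME_FORMAT": "P1Y",
             "UNIT": "PC", "POWERCODE": "0"}
    obs = build_observation(attrs, 19.4319716, -99.13342539, height=2250.0)
    print(ET.tostring(obs, encoding="unicode"))
    print(read_location(obs))
```

Keeping the coordinates as separate attributes lets a consumer filter or validate each value independently, at the cost of more attributes per observation than a single ISO 6709 string.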

Including Geographical Coordinates and Areas at any Level

• SDMX own set of attributes. Notes: H and CRS are optional. If CRS is not set, we would interpret that WGS84 is being used.

Defining a Geographical Area by a Polygon

• A geographical area can be represented by a polygon of coordinates.
• The polygon can be included as an attribute (GEO_POLYGON).
• We can represent it in different ways:
• In a GML tupleList style: GEO_POLYGON="45.256,-110.45 46.46,-109.48 43.84,-109.86 45.8,-109.2"

• In GML/KML style: GEO_POLYGON="-112.265654928602,36.09447672602546,2357 -112.2660384528238,36.09342608838671,2357 -112.2668139013453,36.09251058776881,2357 -112.2677826834445,36.09189827357996,2357 -112.2688557510952,36.0913137941187,2357"

• In an SDMX-defined style: GEO_POLYGON="45.256,-110.45;46.46,-109.48;43.84,-109.86;45.8,-109.2"

Including Geographical References to CL_AREA

• Geographical areas (CL_AREA)
  • This code list provides code values for geographical areas, defined as areas included within the borders of a country, region, group of countries, etc.

• A Polygon of coordinates may be associated to each code as an Annotation

• A reference to the capital city (or to a centroid of the polygon) may be associated with a Coordinates attribute or a set of LAT LON [H] [CRS] attributes

Defining a New Type of Dimension

• A Geographical Dimension might be incorporated into the SDMX Information Model.
• This dimension may be useful for precise geographical localization of statistical data and their representation on maps.
• This dimension should be derived from DimensionComponent.
• It would include attributes like:
  • Latitude
  • Longitude
  • Height
  • Datum
  • GeoPolygon
  • LayerID

SDMX Concepts Related to Geographical Issues

• Counterpart reference area (VIS_AREA)
  • The secondary area, as opposed to the reference area, to which the measured data is in relation.
• Reference area
  • Country or geographic area to which the measured statistical phenomenon relates.
• Geographical areas (CL_AREA)
  • This code list provides code values for geographical areas, defined as areas included within the borders of a country, region, group of countries, etc.

SDMX Concepts Related to Geographical Issues

• Comparability - geographical (COMPAR_GEO)
  • Extent to which statistics are comparable between geographical areas.
• Coverage (COVERAGE)
  • The definition of the population that statistics aim to cover. The "coverage" encompasses the descriptions of key dimensions delimiting the statistics produced, e.g. geographical, institutional, product, economic sector, industry, occupation, transaction, etc., as well as relevant exceptions and exclusions.
  • The term Coverage describes the scope of the data compiled, rather than the characteristics of the survey.

Some Examples of Information Systems With Intensive Use of Geo-Referenced Data

Mexico in Numbers

• Information from national to municipal level
• Several domains:
  • Economy
  • Environment
  • Demography
  • Society
  • Government
• Visualization in charts or maps
• Thematic, cartographic or maps

http://www.beta.inegi.org.mx/app/areasgeograficas/?ag=01#tabMCcollapse-Indicadores

National Statistical Directory of Business Units (DENUE)

• Identification and localization of 5,078,714 active business units in Mexico
• Data from 2012, permanently updated
• Selection of establishments by size, economic activity, and geographical area

• Geographical space chosen by the user

http://www.beta.inegi.org.mx/app/mapa/denue/

National Statistical Directory of Business Units (DENUE)

http://www.beta.inegi.org.mx/app/mapa/denue/

National Inventory of Dwellings (INV)

• Data on dwellings, population and urban environment
• Information from the 2010 National Census of Population and Dwelling and the growth between 2010 and 2015
• Shares the DENUE platform
• Indicators contained:
  • 12 on dwellings
  • 7 on population
  • 14 on urban environment (road infrastructure, equipment and services, access and commerce on public roads)

National Inventory of Dwellings (INV)

http://www.beta.inegi.org.mx/app/mapa/INV/Default.aspx?ll=22.993165420735874,-108.9915234375&z=5

National Inventory of Dwellings (INV)

Discussion

Questions to Solve About Representing Geospatial Data in SDMX

• Is it relevant to represent geographical information in SDMX?
• Which representation style may be used to represent geographical coordinates in SDMX?
• Which representation style may be used to represent geographical areas in SDMX?
• Must CL_AREA be modified to associate geographical polygons, or is it better to create a new code list (CL_GEOAREA)?
• Who would be in charge of maintaining this code list?
• Should geographical information be set as a regular dimension, or do we need to include a new type of dimension (Geographical Dimension) in SDMX?
• What else will we need to support these characteristics (modifications to the web services, APIs, software tools, etc.)?

Conclusions

Space to fill with the conclusions…

Additional Questions or Comments?

Contact: TWG@.org

References

• Encyclopedia Britannica (www.britannica.com)
• International Earth Rotation and Reference Systems Service (www.iers.org)
• International Hydrographic Organization (www.iho.int)
• International Union of Geodesy and Geophysics (www.iugg.org)
• ISO (www.iso.org)
• John A. Dutton E-Education Institute, Penn State (www.e-education.psu.edu)
• National Geographic (www.nationalgeographic.org)
• Open Geospatial Consortium (www.ogc.org)
• Open Source Geospatial Foundation (www.osgeo.org)
• Wikipedia (www.wikipedia.org)

Some Known Names for Common CRS

Abbreviation      Name                                          Year
NGVD 29           Sea Level Datum                               1929
OSGB36            Ordnance Survey Great Britain                 1936
SK-42             Systema Koordinat 1942 goda                   1942
ED50              European Datum                                1950
SAD69             South American Datum                          1969
GRS 80            Geodetic Reference System                     1980
ISO 6709          Geographic point coord.                       1983
NAD 83            North American Datum                          1983
WGS 84            World Geodetic System                         1984
NAVD 88           N. American Vertical Datum                    1988
ETRS89            European Terrestrial Ref. Sys.                1989
GCJ-02            Chinese obfuscated datum                      2002
Geo URI           Internet link to a point                      2010
ITRF92-ITRF2014   International Terrestrial Reference System
SRID              Spatial Reference System Identifier (SRID)
UTM               Universal Transverse Mercator (UTM)