RASTER DATA HANDLING IN SPATIAL : The Case for Images

Lúbia Vinhas Ricardo Cartaxo Gilberto Camara Karine Ferreira Antonio Miguel Vieira Monteiro

DPI/INPE

9th Workshop on Meteorological operational systems ECMWF AnAn OutlineOutline ofof thisthis TalkTalk

‰ INPE’s Motivation

‰ The Rationality for Having Images Stored in DBMS

‰ The Challenges

‰ Our Solution and Where We Are at this Stage

‰ Algorithm Development: API for Images Spatial Operations

‰ Conclusion and Future Works INPE’sINPE’s MotivationMotivation

ƒƒ SatelliteSatellite AcquiredAcquired DataData isis EverywhereEverywhere !!!!

ƒƒ SatelliteSatellite DerivedDerived ObservationalObservational DataData •• LargeLarge MassMass ofof HighlyHighly DimensionalDimensional SpatioSpatio-- TemporalTemporal DataData

ƒƒ 3030 YearsYears ofof lessonslessons learnedlearned fromfrom dealingdealing withwith HighHigh DimensionalDimensional SpatioSpatio--TemporalTemporal ImageImage DataData fromfrom EarthEarth RemoteRemote SensingSensing SatellitesSatellites andand AirborneAirborne Sensors.Sensors.

ƒƒ INPE’sINPE’s ImageImage DataData CentreCentre ProjectProject TThehe RationalityRationality forfor HavinHavingg ImaImaggeses StoredStored inin DBMSDBMS

ƒƒ AA NewNew GenerationGeneration ofof SpatiallySpatially EnabledEnabled DBMS;DBMS;

ƒƒ HugeHuge AmountAmount ofof DataData thatthat mustmust bebe DealtDealt with,with, comingcoming fromfrom aa VarietyVariety ofof SensorsSensors overover aa varietyvariety ofof plataformsplataforms;;

ƒƒ MakeMake DataData recoveryrecovery andand IntegrationIntegration aa moremore easyeasy Task;Task; TheThe ChallengesChallenges

ƒƒ Technological Challenges: • Efficient Spatially Enabled DBMS • Provide spatial operations on spatial data types stored in different DBMS

ƒƒ Scientific&Technological Challenges: • Methods and Techniques for Parameter/Pattern/Information-Content Extraction from High Dimensional Integrated Spatio-Temporal Datasets TheThe Challenges:Challenges: TheThe ApplicationsApplications NeedsNeeds drivingdriving thethe TechnologyTechnology NeedsNeeds

•• RunRun inin aa corporativecorporative environmentenvironment

•• AccessAccess datadata byby internetinternet andand intranetintranet

•• TypicalTypical useuse ofof imageimage datadata isis visualizationvisualization

•• IntegratesIntegrates descriptivedescriptive datadata storedstored inin aa conventionalconventional objectobject--relationalrelational DBMSDBMS

•• IntegratesIntegrates vectorvector datadata TheThe Challenge:Challenge: TheThe BasicBasic RequirementsRequirements

ƒƒ TheThe ImageImage DataData shouldshould bebe storedstored inin thethe existingexisting objectobject--relationalrelational databasedatabase managementmanagement systemsystem

•• DataData integrityintegrity andand consistencyconsistency

•• IndependentIndependent andand effectiveeffective accessaccess byby usersusers ofof multiplemultiple applicationsapplications TheThe Challenges:Challenges: TheThe ResearchResearch NeedsNeeds drivingdriving thethe ScientificScientific NeedsNeeds

‰‰Parameter/Pattern/Parameter/Pattern/ InformationInformation--ContentContent Extraction:Extraction:

ƒƒ AnotherAnother TypicalTypical useuse ofof imageimage datadata isis gettinggetting informationinformation outout ofof it:it:

Needs:Needs: NewNew MethodsMethods andand AlgorithmsAlgorithms OurOur AimAim …… ƒƒ ProvideProvide aa ResearchResearch TestbedTestbed forfor DealingDealing withwith LargeLarge RasterRaster DatasetsDatasets thatthat cancan helphelp in:in:

• Enabling Data Integration. Grid Data, Image Data, Observations Data and other Geographic Data types could be used together;

• Enabling easy new algorithms development for parameter extraction from Satellite Image Datasets;

• Enabling the test of new spatial-temporal statistics methods for “mining” high dimensional datasets …… andand WhereWhere wewe areare atat thisthis StageStage

ƒƒ AdvancesAdvances inin databasedatabase technologytechnology provideprovide supportsupport forfor majormajor advancesadvances inin nonnon--conventionalconventional databasedatabase applicationsapplications

ƒƒ SpatialSpatial DataData inin RelationalRelational DatabasesDatabases •Integration of spatial data types in object-relational management systems •Efficient handling of spatial data types • vector: polygons, lines and points • RasterRaster DataData Structures:Structures: ImagesImages oror anyany otherother GriddedGridded datadata •Tools for query and manipulation of spatial data ItIt isis TimeTime forfor Images…Images…

ƒƒ AA specialspecial interestinterest inin thethe spatialspatial databasesdatabases communitycommunity isis thethe efficientefficient handlinghandling ofof rasterraster datadata

ƒƒ AnAn approachapproach isis toto developdevelop specializedspecialized imageimage datadata serversservers •• MainMain advantageadvantage:: thethe capacitycapacity ofof performanceperformance improvementsimprovements OurOur ApproachApproach

ƒƒ IncludeInclude BuildingBuilding RasterRaster DataData ManagementManagement capabilitiescapabilities intointo ObjectObject--RelationalRelational DatabaseDatabase ManagementManagement SystemsSystems

•• MainMain advantagesadvantages:: •• eeasyasy interfaceinterface withwith existingexisting useruser environmentsenvironments •• ToTo accommodateaccommodate notnot onlyonly typicaltypical ImageImage Data,Data, butbut alsoalso RasterRaster DataData inin generalgeneral OurOur SolutionSolution ……

ƒƒ OurOur TechnologicalTechnological Solution:Solution:

TerraLibTerraLib (http://www.terralib.org) ƒƒ ThisThis workwork isis partpart ofof thethe developmendevelopmentt ofof ƒƒ TerraLibTerraLib isis anan OpenOpen SourceSource LicencedLicenced (LGPL)(LGPL) GeographicGeographic LibraryLibrary forfor providingproviding supportsupport forfor thethe developmentdevelopment ofof GeographicGeographic ApplicationsApplications poweredpowered byby SpatiallySpatially enabledenabled DBMSDBMS

ƒƒ MainMain featuresfeatures:: ƒƒ GeometryGeometry isis storestoredd andand managedmanaged inin thethe DBMSDBMS ƒƒ FacilitiesFacilities supportedsupported inin differentsdifferents DBMSDBMS asas ORACLE,ORACLE, PostgreSQLPostgreSQL,, MySQLMySQL,, ORACLEORACLE Spatial,Spatial, PostgreSQLPostgreSQL//PostGISPostGIS,, MSMS DatabasesDatabases throughthrough ADOADO TerraLib

‰ Interface with DBMS

TerraLib TeDatabase

ADO Driver MySQL Driver OracleSpatial Driver PostgreSQL Driver

Access

MySQL Oracle PostgreSQL Spatial SQLServer ImageImage ((RasterRaster)) DataData NeedsNeeds

ƒƒ efficientefficient storagestorage andand indexingindexing mechanismsmechanisms

ƒƒ decodingdecoding ofof thethe differentdifferent imageimage datadata formatsformats

ƒƒ basicbasic datadata manipulationmanipulation functionsfunctions

ƒƒ convenientconvenient waysways ofof accessingaccessing thethe imageimage datadata byby algorithmsalgorithms Two Main Aspects

1.1. AA DBMSDBMS DataData ModelModel

ƒƒ TablesTables schemaschema ƒƒ SpatialSpatial indexingindexing ƒƒ SupportSupport toto compressioncompression

2.2. AA setset ofof C++C++ classesclasses toto allowallow applicationsapplications toto dealdeal withwith RasterRaster DataData

ƒƒ EfficiencyEfficiency andand flexibilityflexibility toto accessaccess thethe datadata DBMS Data Model

ƒƒ Defines,Defines, atat aa physicalphysical level,level, howhow toto storestore rasterraster datadata inin aa objectobject--relationalrelational databasedatabase

ƒƒ AnAn ineffectiveineffective approach:approach: ƒƒ Store each point of the image in a row of a table [x,y,z]

ƒƒ AnotherAnother approach:approach: ƒƒ The entire image is written to a blob and stored in a field of a table ƒƒ AA variationvariation ofof thethe secondsecond approachapproach waswas adopted:adopted: ƒƒ Tiles of image are written to a blob and stored in a field of a table Tiling

ƒ Specific parts of the image can be retrieved and processing independently

ƒ User control over the size of the tiles ƒ Example: zooming operation Tiling → DBMS Data Model

ƒ Each raster data is stored in a table

ƒ Each row stores a tile of a particular band

tile_id band blob

T1 1 ...

T1 2 ...

T1 3 ... Multi-resolution

Large image Small canvas

ƒ Image is shown with a degraded resolution ƒ Much of the information retrieved is not used Multi-resolution

ƒ Lower resolution versions of the image are also stored in the database ƒ Application decides the best resolution level to be retrieved ƒ User control of the number of resolution levels

Level 2 → Resolution * 22

Level 1 → Resolution R * 21

Level 0 → Resolution R * 20 (original) Multi-resolution

ƒ To store an image in a lower resolution less tiles are needed

30 m

60 m

120 m

240 m Multi-resolution

ƒ Each row of a Raster table contains information about the level of resolution of the tile

tile_id band resolution_factor blob

T1 1 0 ...

T1 1 1 ...

T1 2 0 ...

T1 2 1 ...

T1 3 0 ...

T1 3 1 ... Spatial Indexing

ƒ For each tile the coordinates of its bounding box are stored

ƒ Using a SQL statement an application can select the tiles that intercept a given area in a given resolution level

tile_id band resolution_factor lower _x lower_y upper_x upper_y blob T1 1 0 ... T1 1 1 ... T1 2 0 ... T1 2 1 ... T1 3 0 ... T1 3 1 ...

SELECT * FROM raster_table WHERE NOT (lower_x > 10 OR upper_x < 20 OR lower_y > 10 OR upperY < 20 ) AND resolution_factor = 0 Accessing Pixels Individually

ƒ Typical image processing algorithm:

for i=0 to num rows for j=0 to num cols process Image(i,j) ƒ To query the database for each pixel of image can be costly

ƒ Solution: keep a cache of tiles in memory Virtual Memory

ƒ Optimize the access of pixels of an image ƒ Tiles in memory have the same identification of the database

Application Cache (3 tiles) (i,j) ? Tile XW (i,j) ? Tile X Tile Y Tile Tile Y Tile Z Pixel value Tile Z Tile W Tile Id

tile tile

tile tile should remain consistent should remain consistent ƒ

tiles tiles or (i,j) (x,y) Identification unique identification for each unique identification for each over mosaic operations A The function should return the same identification for every pixel that belongs to a The identification of over mosaic operations A The function should return the same identification for every pixel that belongs to a The identification of ƒ ƒ ƒ Tiles ƒ ƒ ƒ )(x / W) )(x / W) )(y / W) )(y / W) int ,b int ,b int int a a B B a = ( b = ( a = ( b = ( ⇒ ⇒ ⇒ ⇒ 1536m) No Data (x,y) (x,y) × j,i+m B (i +m) * W ... j,i+1 (in geographical units. I.e.: 1536m B i +1) * W ( H H × i i i , , , W W Identification j+1 j+n j+2 j,i B B B B i * W size: ...

Tiles Tile j * H +n) * H +1) * H (j (j Tiles Identification

Images can “grow” and identification of the tiles remains consistent

(j+N) * H Bj+N,i

Bj+2,i

Bj+1,i (j+1) * H

j * H Bj,i Bj,i+1 Bj,i+M

i * W (i +1) * W (i +M) * W Compression

ƒ Tiles can be compressed before stored in the database

ƒ Compression techniques: Zlib, JPEG or wavelets

ƒ An image of de 1778x2804 pixels (4985512 pixels), 1 band, X and Y resolution of 25m, stored in tiles of 512x512 pixels:

ƒ No compression - 6291456 bytes

ƒ ZLIB - 3746080 bytes (~59.0%)

ƒ JPEG 75% - 814694 bytes (~12.5%) Metadata

ƒ Database should also store metadata of the images in auxiliary tables API Raster TerraLibTerraLib

ƒ TerraLib provides a set of C++ classes do deal with Raster Data ƒ Class TeRaster ƒ Grid values are double ƒ Methods getElement and setElement access elements of a Raster ƒ Class TeRasterParams ƒ Information about a Raster representation ƒ Class TeDecoder

ƒ Strategy Pattern: allows the access to

different formats and storage aspects Decoders

ƒ Encapsulates the access to the elements of a Raster data ƒ Explicitly instantiated or defined from a file name for example ƒ Extensible Manipulation

ƒ Functions to import raster data into the database

‰ Class TeRasterRemap make a copy of a Raster Data solving differences in ƒ projections ƒ bounding boxes ƒ resolutions Manipulation

TeRasterRemap :

ƒ Import from file to database

ƒ Clipping

ƒ Mosaic

ƒ Visualization

ƒ Reprojection Importing

TeDecoderVirtualMemory TeDecoderVirtualMemory

getElement(c,l,b,val,res) setElement(c,l,b,val,res)

TeRasterRemap remap(); TeDecoderTiff remap.setInput(rIn); TeDecoderDatabase remap.setOutpu(rOut); getRasterBlock(blockId,...) getRasterBlock(blockId,...) remap.apply() putRasterBlock(blockId,...)

rOut

brasilia.tif -> rIn Visualization

TeDecoderQtCanvas TeDecoderVirtualMemory setElement(c,l,b,val,res) setElement(c,l,b,val,res)

TeRasterRemap remap(); remap.setInput(rIn); TeDecoderDatabase remap.setOutpu(rOut); getRasterBlock(blockId,...) remap.apply() putRasterBlock(blockId,...)

rOut

brasilia.tif -> rIn API for spatial operations on Images

DBMS TerraLib Oracle Access Spatial Spatial Spatial Operations Operations Geographic API for Application Spatial MySQL Operations Postgre SQL API – Zonal Operation

• Calculates statistics over a region or a zone of a Raster Data

API – Raster Data

• Mask Operation – Clips a raster data using a mask API – MASK Operation

– Clips a raster data using a mask

The Use of Iterators

ƒ Mechanism to traverse a Raster Data only in a region inside or outside a specific polygon

ƒ Developed: ƒ Iterator concept on TeRaster structure ƒ IteratorPoly ƒ Route strategies Algorithm Development made Easy

• Iterator is an abstraction of a pointer to a sequence

Algorithm TeCalculateStatistics(itBegin, itEnd, stat)

TeRaster::iteratorPoly itBegin = raster->begin(poly, TeBoxPiIn) Iterator TeRaster::iteratorPoly itEnd = raster->end(poly, TeBoxPixelIn)

Data TeRaster* raster Structure Conclusions

ƒƒ TilingTiling ++ MultiMulti--resolution:resolution:

ƒƒ EfficientEfficient toto visualizationvisualization applicationsapplications

ƒƒ TeRasterTeRaster providesprovides anan easyeasy interfaceinterface toto algorithmsalgorithms

ƒƒ TeDecoderTeDecoder providesprovides flexibilityflexibility toto dealdeal withwith differdifferentent typestypes ofof RasterRaster datadata Conclusions

ƒƒ TheThe developeddeveloped API:API: ƒƒ ProvidesProvides spatialspatial operationsoperations onon aa highhigh levellevel ofof abstractionabstraction forfor thethe developersdevelopers ofof geographicalgeographical applicationapplication

ƒƒ ExploresExplores aa newnew generationgeneration ofof objectobject--relationalrelational DBMSDBMS thatthat managemanage geographicalgeographical datadata Future Works

ƒ Implement other operations on Raster Data: ƒ Mathematical Operations ƒ Reclassify ƒ Slice ƒ Weight ƒ Extend the API to support new spatial extensions ƒ Spatial Extension in MySQL (release 4.1) ƒ ƒ Use future resources of spatial extensions to treat Raster Data (ex. Oracle Spacial)