Data Format and Database Development
Vicki Drake, Santa Monica College
Questions, Sources, Databases
Data Questions
• Data source questions to ask when using actual data:
  – For what purpose was the data collected?
  – What method was used to collect the data?
  – What entities are represented in the data?
  – Are the entities represented as discrete objects or as aggregations?
  – What time period does the data represent? Is it a “snapshot” or a continuous log of observations?
  – What person or agency is the immediate source of the data?
  – What coordinate system and datum does the data use?
  – If the data is ‘projected’, what projection was used?
  – In what format is the data distributed?
  – Will the data format need conversion before use?
• Questions about Tabular Data
  – What referencing systems does the data source use? Is there a data dictionary that explains the classification system?
GIS Data Formats: Tabular Data
• Tabular Data Formats
  – Tables are a common format for handling alphanumeric data
• Common table formats include:
  – dBase (.dbf) format
    • a format for transferring tabular information between applications while preserving its column structure
  – Comma-delimited text
    • files are plain text files
    • each line of the file is interpreted as a row of the table
    • the cells within each row are separated by commas
    • ArcView expects the first line of the file to contain the ordered list of column names
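The comma-delimited layout described above (a header line of column names, then one row per line) can be read with Python's standard `csv` module. This is a minimal sketch; the sample columns and values are hypothetical.

```python
import csv
import io

# Hypothetical contents of a comma-delimited text file.
SAMPLE = """PARCEL_ID,OWNER,AREA_SQFT
101,Smith,5400
102,Lopez,6120
"""

def read_table(text):
    """Read comma-delimited text whose first line lists the column names."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames      # taken from the first line of the file
    rows = list(reader)              # every later line becomes one row (a dict)
    return columns, rows

columns, rows = read_table(SAMPLE)
print(columns)           # ['PARCEL_ID', 'OWNER', 'AREA_SQFT']
print(rows[0]["OWNER"])  # Smith
```

All cell values come back as strings. Numeric columns such as `AREA_SQFT` would need explicit conversion before calculations.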
GIS Data Formats – Vector Formats
• Vector GIS data formats:
  – ArcView Shapefile
    • each data set is a collection of files
    • each shapefile needs at least three files: *.shp, *.shx and *.dbf
  – ArcInfo Coverage
    • contains point, line and polygon objects in a “folder”
    • the associated attribute data are kept in a directory called “info”
    • a directory containing coverages and an info directory is a “Workspace”
  – ArcInfo Export (.e00) format
    • ArcInfo coverages cannot simply be removed from their workspace, but they can be passed to ArcView as export files
  – MapInfo Interchange Format (.mif and .mid)
    • MapInfo datasets (‘tables’) employ several related files, similar to ArcView
    • MapInfo developed this format to simplify transferring data between systems
    • ArcView uses the ‘mif2shape’ program to import these files
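Because a shapefile is only usable when its component files travel together, a quick check for the three required files can catch incomplete copies. This is a minimal sketch. The `roads` data set is hypothetical, and the check is case-sensitive on case-sensitive file systems.

```python
import tempfile
from pathlib import Path

# The three component files every shapefile needs (others, such as .prj, are optional).
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")

def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no matching file next to shp_path."""
    shp = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not shp.with_suffix(ext).exists()]

# Demo: a hypothetical "roads" data set copied without its .shx index file.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        (Path(folder) / ("roads" + ext)).touch()
    demo_missing = missing_shapefile_parts(Path(folder) / "roads.shp")

print(demo_missing)  # ['.shx']
```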
GIS Data Formats – CAD Formats
• CAD (Computer-Aided Design) File Formats
  – CAD systems primarily manage geometric objects but are usually not adept at handling the attribute data associated with the geometry
• CAD drawings may lack real-world coordinate systems
• CAD file coordinates can be registered (rotated, scaled and shifted) so they display with other ArcView data
• ArcView can import geometric data from CAD into a GIS using:
  – .dxf (AutoCAD Drawing Exchange Format)
  – .dwg (AutoCAD drawing)
  – .dgn (Intergraph MicroStation format)
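Registering CAD coordinates by rotating, scaling and shifting them amounts to a 2-D similarity transform. The sketch below illustrates that idea only. It assumes the rotation angle, scale factor and shift are already known, and the sample values are hypothetical. It is not ArcView's own registration routine.

```python
import math

def register(points, angle_deg, scale, dx, dy):
    """Rotate, scale and shift CAD drawing coordinates (2-D similarity transform)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y in points:
        xr = x * cos_a - y * sin_a          # rotate counter-clockwise about the drawing origin
        yr = x * sin_a + y * cos_a
        out.append((scale * xr + dx, scale * yr + dy))  # then scale and shift
    return out

# Hypothetical drawing point, rotated 90 degrees, doubled and moved to map coordinates.
registered = register([(10.0, 0.0)], angle_deg=90, scale=2.0, dx=1000.0, dy=500.0)
print(registered)  # approximately [(1000.0, 520.0)], with tiny floating-point noise
```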
GIS Data Formats – Raster Images and Raster Formats
• Raster image data
  – An image is typically composed of a matrix of square cells, or pixels
  – In a typical 8-bit image, each pixel holds a value stored in one byte (256 possible values)
  – Image formats do not contain information about real-world coordinate systems, but ArcView can read this information from an image's world file
• Raster GIS formats
  – Similar to image data: information is stored at regularly spaced locations (cells)
  – Raster data permits assignment of attributes to cells
    • cell values are accessible through tables
  – ArcInfo GRID format data
    • A GRID is stored as a directory, with its associated attribute data kept in a directory called “info”
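A world file is a six-line text file holding an affine transform. The lines are, in order: pixel width (A), rotation term (D), rotation term (B), negative pixel height (E), and the x (C) and y (F) map coordinates of the centre of the upper-left pixel. The sketch below converts a pixel position to map coordinates. The 30-unit cell size and the origin values are hypothetical.

```python
def read_world_file(text):
    """Parse the six world-file values (stored in the order A, D, B, E, C, F)."""
    a, d, b, e, c, f = (float(v) for v in text.split())
    return a, b, c, d, e, f

def pixel_to_map(col, row, params):
    """Map coordinates of the centre of pixel (col, row); row 0 is the top row."""
    a, b, c, d, e, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Hypothetical world file for an image with 30 x 30 map-unit pixels and no rotation.
WORLD = """30.0
0.0
0.0
-30.0
350015.0
3750015.0
"""
params = read_world_file(WORLD)
print(pixel_to_map(10, 20, params))  # (350315.0, 3749415.0)
```

E is negative because row numbers increase downward in the image while map y coordinates increase northward.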
GIS Database Development
• Information can be entered into a GIS database using a variety of techniques
  – The choice depends on the format, content and condition of the source material
  – It also depends on accuracy and quality requirements
• Data Preparation
  – Source documents used in creating a GIS database include maps, paper records, card files, automated databases and aerial photography
  – Source preparation
    • ranges from organizing and assembling maps to a detailed review and revision of content
  – Document scrubbing
    • a time-consuming process for resolving inconsistencies
    • may involve redrafting documents
  – Photographic enlargement and reduction are also performed during document preparation
  – In some cases, field verification may be required to confirm map features and locations
GIS Database Development - Data Automation
• Photogrammetric Compilation
  – Primary source: aerial photography
  – Specialized equipment (a stereoplotter) projects overlapping aerial photos to create a 3-D view of the terrain, called a photogrammetric model
    • The locations of survey control points, the mathematical (trigonometric) relationships between the control points, and all other visible features are re-established
  – A map is compiled by digitizing the locations of roads, buildings and other features on the stereoplotter
  – The digitized information is rechecked and edited, and annotation is added to create a planimetric base map
Figure: Photogrammetric Workstation
GIS Database Development - Trace Digitizing
• A digitizing workstation with a digitizing tablet and cursor is used
• Both the tablet and cursor are connected to a computer that controls their functions
• Digitizing involves tracing features on a source map taped to the digitizing tablet, using the cross hair in the digitizing cursor
• The computer is programmed to accept the location and type of each feature
The Digitizing Operation
• Three or more control points are digitized for each map sheet
  – Easily identified points (intersections of major streets, major mountain peaks, coastlines)
  – Control point coordinates must be in the same coordinate system used in the final database (lat/long, State Plane Coordinates)
  – Control points are used to calculate the transformation that converts digitized coordinates to the final system (the more control points, the better!)
• Digitizing the map contents can be done two ways:
  – Point Mode: the operator explicitly identifies each point to be captured by pressing a button
    • the most common digitizing method
  – Stream Mode: points are captured automatically at set time intervals
• Advantages and disadvantages of the digitizing processes:
  – Point Mode: the operator selects points subjectively (no two operators will code the same way)
    • requires operator judgment
  – Stream Mode: generates large numbers of points (some may be redundant)
    • more demanding on the user
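One common way to use three or more control points is to fit an affine transformation from tablet coordinates to map coordinates by least squares. With exactly three points the fit is exact. With more points, the leftover residuals show how well the sheet registers, which is one reason more control points are better. This is a minimal pure-Python sketch under that assumption. The control-point values are hypothetical, and a given digitizing package may use a different transformation.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(m, v):
    """Solve the 3x3 system m * p = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear or duplicated")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]              # replace column k with the right-hand side
        solution.append(_det3(mk) / d)
    return solution

def fit_affine(tablet_pts, map_pts):
    """Least-squares affine transform: X = a*x + b*y + c, Y = d*x + e*y + f."""
    if len(tablet_pts) < 3 or len(tablet_pts) != len(map_pts):
        raise ValueError("need three or more matched control points")
    rows = [(x, y, 1.0) for x, y in tablet_pts]
    # Normal equations (A^T A) p = A^T t, shared by the X and Y fits.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * X for r, (X, _) in zip(rows, map_pts)) for i in range(3)]
    aty = [sum(r[i] * Y for r, (_, Y) in zip(rows, map_pts)) for i in range(3)]
    return _solve3(ata, atx), _solve3(ata, aty)

def apply_affine(params, x, y):
    """Convert one tablet coordinate pair to map coordinates."""
    (a, b, c), (d, e, f) = params
    return a * x + b * y + c, d * x + e * y + f

def rms_residual(params, tablet_pts, map_pts):
    """Root-mean-square distance between transformed and known control points."""
    sq = 0.0
    for (x, y), (X, Y) in zip(tablet_pts, map_pts):
        px, py = apply_affine(params, x, y)
        sq += (px - X) ** 2 + (py - Y) ** 2
    return (sq / len(tablet_pts)) ** 0.5

# Hypothetical control points: tablet inches -> map coordinates.
tablet = [(1.0, 1.0), (9.0, 1.0), (1.0, 7.0), (9.0, 7.0)]
ground = [(1000.0, 2000.0), (1800.0, 2000.0), (1000.0, 2600.0), (1800.0, 2600.0)]
params = fit_affine(tablet, ground)
print(apply_affine(params, 5.0, 4.0))        # close to (1400.0, 2300.0)
print(rms_residual(params, tablet, ground))  # close to 0 for these exact points
```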
Digitized Map Problems
• Paper maps were not drafted for the purpose of digitizing
  – Paper maps are unstable
    • each time a map is removed and re-affixed to the digitizing table, the reference points must be re-entered
  – Maps stretch or shrink over time
    • newly digitized points are offset from previously digitized points
  – Maps display information, but not always accurate locational information
    • errors in the source map are carried into the GIS database
    • boundary discrepancies also show up in the GIS
Digitizing Errors
• Operator error
  – Fatigue
  – Boredom
• Editing errors
  – Small gaps at line junctions
  – Overshoots, loops, undershoots and spikes at line intersections
  – Small-scale, complex maps are more susceptible to editing errors
• Digitizing costs
  – Digitizing rates are approximately one boundary per minute
  – At that rate, digitizing the boundaries of Iowa's 99 counties would take about 1.65 hours (99 minutes)
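Small gaps and undershoots at line junctions are often cleaned up by snapping nearby endpoints together within a tolerance. The sketch below shows that idea only. Fixing overshoots, loops and spikes would require intersection tests that it does not attempt. The tolerance and line coordinates are hypothetical.

```python
import math

def snap_endpoints(lines, tolerance):
    """Move line endpoints within `tolerance` of an earlier endpoint onto that endpoint.

    Closes small gaps (undershoots) where digitized lines should meet.
    """
    anchors = []                          # endpoints accepted so far
    snapped = []
    for line in lines:
        new_line = list(line)
        for idx in (0, -1):               # first and last vertex of the line
            px, py = new_line[idx]
            for ax, ay in anchors:
                if math.hypot(px - ax, py - ay) <= tolerance:
                    new_line[idx] = (ax, ay)   # snap onto the existing endpoint
                    break
            else:
                anchors.append((px, py))       # no neighbour: keep as a new anchor
        snapped.append(new_line)
    return snapped

# Hypothetical lines: the second one stops just short of the first one's end.
demo = snap_endpoints([[(0, 0), (10, 0)], [(10.3, 0.2), (10, 10)]], tolerance=0.5)
print(demo)  # [[(0, 0), (10, 0)], [(10, 0), (10, 10)]]
```

Choosing the tolerance is a trade-off. If it is too small, gaps remain. If it is too large, features that should stay separate get merged.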
GIS Database Development - Coordinate Geometry (COGO) Entry
• COGO is a technique for entering boundary information into a GIS database using a keyboard
  – Distances, bearings and curve data from field surveys are entered into the GIS database
• From this information, GIS software can compute boundary positions on a coordinate grid (such as the State Plane Coordinate System) and create a graphic representation of the lines
• Most commonly used for entering real property boundaries
  – Detailed locations of some features may also be collected in the field and downloaded into the GIS
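COGO entry turns each surveyed course (direction plus distance) into coordinate changes on the grid. The sketch below assumes azimuths in degrees clockwise from grid north. A quadrant bearing such as N 45° E would first need converting to an azimuth (here, 45°). Curves are not handled. The starting coordinates and the 100-unit square are hypothetical.

```python
import math

def traverse(start, courses):
    """Coordinates of each corner from a start point and (azimuth, distance) courses."""
    east, north = start
    points = [(east, north)]
    for azimuth_deg, distance in courses:
        az = math.radians(azimuth_deg)
        east += distance * math.sin(az)    # departure (east-west component)
        north += distance * math.cos(az)   # latitude (north-south component)
        points.append((east, north))
    return points

def closure_error(points):
    """Distance between the last computed point and the starting point."""
    (e0, n0), (e1, n1) = points[0], points[-1]
    return math.hypot(e1 - e0, n1 - n0)

# Hypothetical square parcel, 100 units on a side, starting at grid coordinates (5000, 8000).
square = traverse((5000.0, 8000.0), [(0, 100), (90, 100), (180, 100), (270, 100)])
print(square[2])               # about (5100.0, 8100.0)
print(closure_error(square))   # essentially zero for a perfectly closed figure
```

For a real boundary, a non-zero closure error signals a blunder or accumulated survey error that should be resolved before the lines are drawn.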
GIS Database Development - Map Scanning
• Using a raster format, optical scanning systems can automatically capture map features, text and symbols as individual cells or pixels
  – Scanning is sensitive to variations in line type, line width and creases in maps
  – Post-scanning clean-up and editing is required
    • time-consuming
• Software available with most scanning systems converts the raster data into a vector format consisting of point, line and polygon features
• The attribute data must be entered into the GIS database manually
GIS Database Development - Document Scanning
• Smaller-format scanners may be used to create raster data files of permit forms, site photographs, etc.
• The documents can then be indexed in a relational database so users can query and display them
• GIS users can then “point at” a file and display the scanned document
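Indexing scanned documents in a relational database usually means storing each file's path together with the key of the map feature it describes. Selecting a feature can then retrieve its documents. The sketch below uses Python's built-in `sqlite3` module. The table names, parcel IDs and file paths are all hypothetical.

```python
import sqlite3

# Hypothetical schema: each scanned document is linked to the parcel it describes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcels (
    parcel_id TEXT PRIMARY KEY,
    address   TEXT
);
CREATE TABLE scanned_docs (
    doc_id    INTEGER PRIMARY KEY,
    parcel_id TEXT REFERENCES parcels(parcel_id),
    doc_type  TEXT,
    file_path TEXT
);
""")
conn.executemany("INSERT INTO parcels VALUES (?, ?)",
                 [("P-101", "120 Main St"), ("P-102", "128 Main St")])
conn.executemany(
    "INSERT INTO scanned_docs (parcel_id, doc_type, file_path) VALUES (?, ?, ?)",
    [("P-101", "permit", "scans/p101_permit.tif"),
     ("P-101", "photo", "scans/p101_site.jpg"),
     ("P-102", "permit", "scans/p102_permit.tif")])

def documents_for(parcel_id):
    """Return (doc_type, file_path) for every scan indexed to the selected parcel."""
    cur = conn.execute(
        "SELECT doc_type, file_path FROM scanned_docs "
        "WHERE parcel_id = ? ORDER BY doc_id",
        (parcel_id,))
    return cur.fetchall()

print(documents_for("P-101"))
```

A GIS front end would pass the ID of the clicked feature to a query like `documents_for` and open the returned files in an image viewer.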
GIS Database - Scanners
• Video scanner: uses a television camera to create a computer-readable dataset
  – Typically has poor geometric characteristics, including various kinds of spatial distortion and uneven sensitivity to brightness
  – Black-and-white or color
    • typical data array of 250-1000 pixels on a side
  – Fast scan times (usually under 1 second)
  – $500-$10,000
• Electromechanical scanner: the graphic is attached to a rotating drum
  – As the drum rotates about its axis, the scanner head reads the reflectivity of the target graphic and digitizes the signal, one column of pixels at a time
  – Slower, but produces better-quality products
  – A single light source and detector are positioned on a regular grid over the graphic, which controls distortion
    • scan spot size as small as 25 micrometers
    • scans graphics on the order of 1 meter per side
  – Very expensive: $10,000-$100,000
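The drum-scanner figures above imply a very large data set. A 1 m graphic scanned at a 25-micrometer spot gives 1,000,000 / 25 = 40,000 pixels per side. That is 1.6 billion pixels, compared with about one million for a 1000 x 1000 video scan. The sketch below does this arithmetic. It assumes one byte per pixel and no compression.

```python
def scan_dimensions(side_m, spot_um, bytes_per_pixel=1):
    """Pixels per side, total pixels and uncompressed bytes for a square scan."""
    pixels_per_side = round(side_m * 1_000_000 / spot_um)  # metres -> micrometres
    total_pixels = pixels_per_side ** 2
    return pixels_per_side, total_pixels, total_pixels * bytes_per_pixel

# Drum scanner: 1 m graphic at a 25-micrometre spot, assuming 1 byte per pixel.
print(scan_dimensions(1.0, 25))   # (40000, 1600000000, 1600000000)
# Video scanner for comparison: a 1000 x 1000 array.
print(1000 * 1000)                # 1000000
```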
GIS Database Development - Heads-up Digitizing
• Heads-up digitizing provides a semi-automated setting for converting hard-copy maps to vector data formats suitable for a GIS
  – Works with scanned data (images or documents)
• The scanned raster image is displayed on a workstation monitor, and the operator uses a mouse to interactively edit and clean the raster image
  – Remove stray marks or line gaps left by the scanning process
• Using heads-up digitizing tools, the operator may select individual raster features for vector conversion, and enter annotation and attributes
• Various tools can speed the process of vector conversion:
  – Automatic line-following
  – Thinning vector conversion
  – Direct keying of attribute data
Figure: Heads-Up Digitizing
GIS Database Development - Tabular Data Entry
• Attribute data that exists as annotation on maps, or in paper files, must be entered into a GIS database manually
• Through keyboard entry, the information required for the GIS applications is converted to digital form
• GIS software can now create data entry screens that provide quality control on the information being entered
  – checking entries against lists of acceptable entries or ranges of acceptable values
GIS Database Development - Field Data Collection
• Advances in hardware and software allow GIS data to be captured in the field (property surveys, land use inventories, etc.)
• Electronic survey systems (theodolites) and the Global Positioning System (GPS) have opened up surveying and field data collection opportunities
• Survey data can be gathered quickly in electronic form, ready for uploading into a GIS
• GPS collection units provide a quick means of capturing the coordinates and attributes of features in the field
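The quality-control checks that data entry screens perform during tabular data entry come in two kinds: a value must appear in a list of acceptable entries, or it must fall within a range of acceptable values. The sketch below shows both. The land-use codes, the `year_built` field and its bounds are hypothetical.

```python
# Hypothetical lookup list and range for a parcel attribute entry screen.
VALID_LAND_USE = {"RES", "COM", "IND", "AGR", "OS"}
YEAR_BUILT_RANGE = (1800, 2030)

def validate_record(record):
    """Return a list of problems with one keyed-in record (empty if it passes)."""
    problems = []
    land_use = record.get("land_use")
    if land_use not in VALID_LAND_USE:                  # list-of-values check
        problems.append(f"land_use {land_use!r} is not an accepted code")
    year = record.get("year_built")
    low, high = YEAR_BUILT_RANGE
    if not isinstance(year, int) or not low <= year <= high:   # range check
        problems.append(f"year_built {year!r} is outside {low}-{high}")
    return problems

print(validate_record({"land_use": "RES", "year_built": 1962}))   # []
print(validate_record({"land_use": "RESS", "year_built": 19620}))
```

Rejecting a value at the moment it is keyed in is usually cheaper than finding it later in the database.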
GIS Database - Digital Terrain Models (DTMs)
• DTMs are computer files that store elevation information as a model of the Earth's terrain
• The model stores data in two main formats:
  – Triangulated Irregular Network (TIN) format: a mesh of triangles with defined elevations at the vertices
  – Grid format: an elevation value is assigned to each grid cell
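In a TIN, the elevation of a point inside a triangle is commonly estimated by linear interpolation across the triangle's plane, using barycentric weights from the three vertices. The sketch below shows that calculation for a single triangle. It assumes the correct triangle has already been found, and the vertex values are hypothetical.

```python
def tin_elevation(p, tri):
    """Linearly interpolate elevation at p = (x, y) inside one TIN triangle.

    tri holds three (x, y, z) vertices; returns None if p lies outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: each is the share of the triangle "owned" by one vertex.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        return None                       # point is outside this triangle
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical triangle with vertex elevations of 100, 110 and 120 m.
triangle = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)]
print(tin_elevation((5.0, 5.0), triangle))    # 115.0
print(tin_elevation((10.0, 10.0), triangle))  # None (outside)
```

In grid format, the equivalent operation is simpler: look up the cell containing the point, or interpolate between neighbouring cell values.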
Projections
• Projections mathematically “project” the curved surface of the Earth onto a flat surface
• Most GIS store map locations in planar coordinate systems
  – distances are measured on a regular grid, east (x) and north (y)
• The State Plane Coordinate System (SPCS) defines an origin for each specific geographic region, or “State Plane Zone”
  – ‘x,y’ coordinates are measured from the origin of the zone
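To illustrate how latitude/longitude becomes planar x, y measured from a zone origin, the sketch below uses the simplest case: an equirectangular projection on a spherical Earth. This is a deliberately simplified stand-in. Actual SPCS zones use Lambert Conformal Conic or Transverse Mercator projections on the datum's ellipsoid, with false eastings and northings. This approximation is only reasonable close to the chosen origin. The origin and point are hypothetical.

```python
import math

# Assumption: a spherical Earth with the mean radius, not the NAD83 ellipsoid.
EARTH_RADIUS_M = 6_371_000.0

def local_xy(lat, lon, origin_lat, origin_lon):
    """Equirectangular x, y in metres, measured east and north from a zone origin."""
    x = EARTH_RADIUS_M * math.radians(lon - origin_lon) * math.cos(math.radians(origin_lat))
    y = EARTH_RADIUS_M * math.radians(lat - origin_lat)
    return x, y

# Hypothetical origin and a point 0.01 degree north and east of it.
print(local_xy(34.01, -118.49, 34.00, -118.50))
```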
Database – Spatial Accuracy
• Spatial accuracy is the most critical factor in GIS database design
  – It defines how well the positions of features and boundaries plotted on a map conform to their actual positions on the Earth
  – It is stated as the maximum error between the mapped position and the actual ground position, relative to a standard grid
  – It has a direct impact on the cost of GIS applications
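Checking a database against a maximum-error standard means comparing mapped positions with surveyed ground positions at check points in the same grid coordinates. The sketch below reports the error at each point and the maximum error. It also reports the root-mean-square error, a commonly used companion statistic that the accuracy statement above does not itself mention. The coordinates are hypothetical.

```python
import math

def positional_errors(mapped, surveyed):
    """Horizontal error at each check point, plus the maximum and RMS error."""
    errs = [math.hypot(mx - sx, my - sy)
            for (mx, my), (sx, sy) in zip(mapped, surveyed)]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return errs, max(errs), rmse

# Hypothetical check points (same grid units for both lists).
mapped = [(100.0, 200.0), (300.0, 400.0)]
surveyed = [(103.0, 204.0), (300.0, 400.0)]
errs, max_err, rmse = positional_errors(mapped, surveyed)
print(max_err)  # 5.0
```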
GIS Database - Multimedia Access and GIS
• Multimedia provides an information system (i.e., a GIS) with access to geographic information that exists in a variety of forms
• Multimedia tools integrate GIS data of different types and formats, including:
  – CAD drawings and architectural plans
  – Aerial photographs
  – Orthophotos
  – Field sketches
  – Site photographs
  – Text documents