Foreword By Nancy von Meyer

This book is a compendium of the best papers presented during four years of URISA Street Smart and Address Savvy Conferences. The book also contains additional papers that are being published here for the first time.

The need for this publication has come from several fronts. URISA’s membership has had a keen interest in the use of address information and its application for data integration and decision support for many years. Nearly every URISA Annual Conference has had sessions and tracks related to addressing, there have been several addressing standards task forces, and in the past few years the Street Smart and Address Savvy Conference has been a highly successful stand-alone specialty event on the topic. The importance of emergency response has been highlighted in the year since September 11, 2001. Address is a critical component of emergency response and hence a critical part of homeland security. The time had come to pull together a summary of some of the recent thinking on address standards, address design and address applications.

This compendium begins with five papers related to address design and the use of addresses in databases and geographic information systems (GIS). Mike Walls describes how to begin an address design project and focuses on the work of a data modeler confronted with designing an address database.

Morten Lind provides an international taste of address formulation. The potential of a simple data element that can release "spatial power" is enormous because thousands of private and public databases reference address information. Morten’s paper is based on experiences with standardization, data modeling, legislation and geo-coding of address data in Denmark and in other Scandinavian countries.

Dan Parr, Scott Oppmann, and I describe an approach to addressing that is a part of a land management system. This paper is new and reflects the design work as of late summer 2002. Amy Purves then examines how to combine multiple sources of street information into a single useable data system. Andrew Pidgeon wraps up our design papers with a discussion about data quality. Those of you who manage and use address information recognize the importance of data quality in address systems. As Andy reminds us, both the technological revolution and the business evolution have put a premium on high-quality information as the key to success.

The remaining papers explore various case studies on the uses and implementation of address information, especially in emergency management. Jay Meehl and Chris Boyd describe their experience in Douglas County, Colorado. They highlight the original design and plan, attribution, quality control, and implementation of a geocodable road centerline. Agencies that want to begin this type of project will gain insight into issues such as linear interpolation errors, parity problems, and multiple address databases.

Peirce Eichelberger and Louise B. Wennberg describe the many benefits of GIS and 911 integration as experienced in Chester County, Pennsylvania. The County has taken an integrated approach to GIS and 911, from the Master Street Address Guide (MSAG) through current implementation and support.

In a previously unpublished paper Andrew Pidgeon describes how realtors use geocoding to spatially enable the multiple listing service (MLS). Geocoding brings information to life. This paper might help you sell your geocoding project and foresees many other uses for street and address files.

In another unreleased paper Greg DiGiorgio describes the problems and solutions concerning street and parcel addressing in geographic information systems for municipal governments. We found Greg through a web search and were very impressed with the work he has done in building consensus and data sharing.

From Newport News to Texas, Marc Berryman describes the Greater Harris County 911 Emergency Network at work in Houston, Texas. This is one of the most technologically advanced 911 systems in the nation with spatial data being maintained and updated on a daily, and often hourly, basis.

For the last three years, the Urban Planning Department at the University of Florida has been involved in various grants assisting Miami-Dade County, Florida in the mapping of pedestrian and bicycle crashes within their jurisdiction. Florida is currently one of the worst states in the nation in terms of the number of pedestrian and bicycle crashes each year, and Miami-Dade County consistently has the greatest number of crashes in the state. This is another unique application of the address. The value of address matching is that it reveals the spatial relationships of the incidents, thereby aiding in the evaluation and definition of a large-scale problem. Ilir Bejleri, Scott Wright, Ruth Steiner, and Richard Schneider explain their research and use of address information.

Sandra Johnson’s paper on an Internet transit trip planner describes a project that started as an Internet-based job placement tool. It was intended to help job placement workers, human resource personnel/trainers and employers route their clients and employees to work, training, childcare and other destinations using the city’s bus system. There are two address implications here. First, this project uses geocoding to locate transit riders. Second, with so many address projects moving to the Internet, this is a nice review of considerations for any Internet application project.

Lastly, Dan Parr and I review some of the address standards. The review may help you locate related standards and points to other activities that may help you set your own.

About the Editor

Nancy von Meyer is Vice President of Fairview Industries, providing consulting, education, and GIS implementation services to government agencies and the private sector. Nancy works with many counties and local governments on parcel, land records, and system design for automation and modernization projects. She is also active with federal initiatives related to the FGDC Cadastral Data Content Standard, the National Integrated Lands System (NILS), eastern states cadastral initiatives and other land records projects. She recognizes that the address is a critical component of all land management systems.

Nancy von Meyer, PhD, PE, RLS
Fairview Industries
233 East Main Street
PO Box 100
Pendleton, South Carolina 29670
voice (864) 646-2755
fax (864) 646-2712

Addressing for Emergencies

Table of Contents

Foreword by Nancy von Meyer

About the Editor

Selected Papers

“The Object of the Address” Michael D. Walls, Street Smart and Address Savvy Conference Proceedings, 2000

“Developing a System of Public Addresses as a "Language" for Location Dependent Information” Morten Lind, Street Smart and Address Savvy Conference Proceedings, 2001

“Oakland County, Michigan Address Standards and Database Design” Nancy von Meyer, Dan Parr and Scott Oppmann

“First Step to Addressing the Street Dictionary” Amy J. Purves, Street Smart and Address Savvy Conference Proceedings, 2000

“Enterprise Data Quality Management: The Time Has Come” Andrew D. Pidgeon, Street Smart and Address Savvy Conference Proceedings, 2000

“E911 Multi-Path Approach to Data Capture and Address Range Creation” Jay Meehl and Chris Boyd, Street Smart and Address Savvy Conference Proceedings, 2001

“The Benefits of GIS/911 Integration—An Approach Worth Emulating” F. Peirce Eichelberger and Louise B. Wennberg, Street Smart and Address Savvy Conference Proceedings, 2001

“Geocoding Brings Information to Life” Andrew Pidgeon

“City of Newport News, Virginia GIS Addressing Issues” Greg DiGiorgio

“Location Based Technologies and E911” Marc E. Berryman, Street Smart and Address Savvy Conference Proceedings, 2002

“Difficulties of Timely and Accurate Address Matching” Ilir Bejleri, Ph.D., Scott Wright, Ruth Steiner, Ph.D. and Richard Schneider, Ph.D., Street Smart and Address Savvy Conference Proceedings, 2002

“Internet Transit Trip Planner” Sandra Johnson, Street Smart and Address Savvy Conference Proceedings, 2001

Appendix and Reference Materials

Address Compendium Authors

Appendix A: Address Standard Review - Overview, Nancy von Meyer and Dan Parr

NENA Addressing Standard

United States National Grid – FGDC Standard

EPA Addressing Standards

Kansas Addressing Standards

THE OBJECT OF THE ADDRESS

Michael D. Walls

Introduction

Why is integrating addressing into a database design so difficult? Often the answer lies in figuring out the many things in the real world to which an address can be assigned and sorting out what the locality has been assigning addresses to over the years. Addresses can serve multiple purposes within a local government setting. Gaining a clear picture of how this process has been implemented and applied in the past can be a daunting but essential challenge, especially when there has not been systematic implementation. Geographic Information Systems (GIS) offer many new tools to tackle this problem, but they are useless without a clear understanding of the data requirements.

This paper focuses on the work of a data modeler confronted with designing an address database. In this context it is important to consider the work done by the addressing specialist. Addresses are not a static attribute of buildings and parcels, but are the result of work processes of assignment and maintenance. Understanding these practices is always essential, and frequently is the only way a data modeler can make sense of the various data sets provided as a starting point for building the address database.

This paper guides the data modeler through these challenges in four steps:

· Identifying the roles of the addressing specialist and their customers.

· Identifying the set of real world things to which addresses must be assigned, and the business processes which such assignments support.

· Identifying legacy data resources that must be reverse engineered to identify correlations with results of the first step.

· Specifying data structures and technical processes with which to build and maintain a data resource that adequately represents those things.

The Process Of Addressing

Early in our country’s history addresses were not necessary. Daily interactions occurred within a small community in which everyone pretty much knew everyone else, or could figure out how to find them if needed. If in doubt, one could simply ask a local resident who would know where the place or person you were seeking was located.

Correspondence from this period almost always was addressed very simply, perhaps to “Mike Walls, Frankfort Kentucky”. If more precision were needed, perhaps a place name would be added such as “Mike Walls, At the PlanGraphics Building, Frankfort Kentucky”.

Modern life would not function if we continued to depend on such local knowledge. There are too many of us, and too many structures, to rely on an individual’s memory for location. To deal with this complexity, and to accommodate the far greater volume of interactions that we expect in our daily lives, society has evolved a more complex and potentially more accurate and precise form of addressing. We’ve added street names and house numbers as a replacement for the building name. We’ve even added postal zones to subdivide the city into smaller units. In theory, all a complete stranger needs in order to find me is my address and a map of Frankfort.

This use of the address has demanded a source for cataloging addresses in use and for assigning new addresses. When we move to a new residence or seek to build our dream house, we citizens need someone to tell us what our new address is. In turn, we will use the address as the basic locator to describe our property when we set up our utility service or register to vote. It is also what we will give to service providers when we request help.

All of this is to remind us that addressing is an important business process of local government. It has a lengthy history, and past practice may be very different from the way we do things today. Because addressing has evolved as populations have grown, the methods of assigning addresses and storing the resulting information have also evolved. Local governments assign and maintain addresses to support necessary business processes. Data tables are a by-product, albeit a very important one.

The data modeler working with address information needs to understand the components of the addressing process. Broadly this process includes:

· Communicating the Current Address to Interested Parties

· Reviewing & Assigning Names To Street Segments

· Assigning Addresses

· Resolving Address Problems

· Changing Addresses And Street Names

The individuals charged with performing tasks in these processes will collectively be referred to in this paper as the “addressing specialist”. This specialist must apply a set of standards and systematic practices to these tasks to provide quality address information. The basic standards and practices are:

· Uniqueness of Address - Is there only one location within the jurisdiction identified as “112 Main ST”?

· Unambiguous Identification - Do we need to distinguish between East Main ST and West Main ST?

· Locatability - Could someone use a street map to get to the near vicinity of my location, using only the address as a clue? (For this to work, the two prior points must be satisfied, but also the office must be associated with Main Street in obvious ways and it should be oriented to the other 100 block addresses within the City in a consistent and predictable manner.)
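The first two standards lend themselves to simple automated checks. The following is a minimal Python sketch, not a production validator: the sample records, the directional list, and the function names are all assumptions made for illustration, and locatability is not tested because it needs map data.

```python
from collections import defaultdict

# Hypothetical sample records: (address as recorded, feature it identifies).
RECORDS = [
    ("112 Main ST", "office building"),
    ("112 MAIN ST", "vacant lot"),
    ("112 E Main ST", "restaurant"),
    ("114 Main ST", "garage"),
]

# Assumed list of directional tokens; a real jurisdiction may use others.
DIRECTIONALS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}


def normalize(address):
    """Uppercase the address and collapse repeated whitespace."""
    return " ".join(address.upper().split())


def find_duplicates(records):
    """Uniqueness check: addresses assigned to more than one feature."""
    by_address = defaultdict(list)
    for address, feature in records:
        by_address[normalize(address)].append(feature)
    return {a: f for a, f in by_address.items() if len(f) > 1}


def find_ambiguous(records):
    """Ambiguity check: addresses recorded without a directional when the
    same number and street also appear with one (e.g. Main vs. E Main)."""
    with_directional = set()
    without_directional = set()
    for address, _feature in records:
        norm = normalize(address)
        stripped = " ".join(t for t in norm.split() if t not in DIRECTIONALS)
        if stripped != norm:
            with_directional.add(stripped)
        else:
            without_directional.add(norm)
    return sorted(with_directional & without_directional)
```

Note that this naive token test would also strip a street that is itself named "West", so a real check would need the jurisdiction's street name list.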

However, in developing and subsequently applying these standards and practices the addressing specialist must always consider how the resulting address will be used. Part of this consideration must include how it will be stored in various data management systems, ranging from a homeowner’s address book to the E911 dispatch system.

The Process of Data Modeling

Computers and methodologies of data management have evolved to support businesses and governmental agencies in their business operations. A key part of data management has come to involve addresses of one type or another. However, practitioners very quickly found that they cannot put together the data sets without prior planning. This is the role of the data modeler: to provide the data planning support necessary to integrate information from a wide variety of sources.

Why Model?

When we set up a data resource we are modeling or representing certain aspects of reality. Many possible representations exist for a real world problem. The only true measure of quality is whether or not the resulting data resources meet the needs of the users. The challenges lie in (a) identifying all the users, (b) recognizing their various needs, and (c) integrating the results into a single, internally consistent data model.

To understand the modeling process we need to explore the relationship between the physical implementation of the database system, which comprises programs, data files, and so on, and the real world, which is ultimately what users of the data resource are interested in. Connecting these two is the logical data model.

Analyzing External Reality

The real world can be conceived of as a highly complex system of which we humans have only limited comprehension. The ways we comprehend reality are significantly shaped by our experience, training, and theoretical orientations.

We try to "make sense" out of reality by concentrating on only those aspects we deem important in a particular context. We analyze reality in terms of the particular problem, attempting to identify those parts that are involved. Those parts which are complicated enough to be treated as subsystems may be further broken down into their component subsystems, until we are left with the minimum complexity required to deal with the problem at hand.

To clarify this potentially confusing process, we need a set of fundamental definitions. These are often referred to as real world primitives. These include:

· System. That portion of reality that is relevant to a particular problem. It is comprised of things and the relevant associations among them. This is often referred to as the organization or "structure" of the system. "Function" is a more loosely defined concept that is also important to a system. An operational definition for our purpose is that a function of a system component is a reason why that component is relevant to the problem.

· Thing. A real phenomenon that can be represented in English by a noun or verb. Some writers use the term "object", but with the advent of object-oriented programming this has become ambiguous. It may be a physical thing like an oil well that can be touched or counted, or an abstract thing like ownership. Some things like an ecosystem may be intermediate in nature.

· Thing Class. A group of things formed by relevant generalization. The things "Mike" and "Joe" and "Debbie" can all be grouped in "people", "geographers", and "property owners", plus an infinity of other groupings. Which grouping(s) we need to work with depends on its relevance to solving our problem.

· Association. An association is a relationship or connection between things. The statement "Developer XYZ filed rezoning application R89-123" is an instance of an association between two specified things. If this connection can be generalized to all members of the thing classes to which these things belong, it can be directly captured in the database design. As you might suspect, an association is a special kind of thing.

· Property. A relevant characteristic of a thing. The statement "Rezoning case R89-123 affects 80 acres of land" specifies one property of that particular case. Here again our analysis of the problem must guide us in selecting only those that are relevant among the infinite number of properties that we might identify. For instance, zoning case R89-123 might weigh 2.3 pounds, but we probably don't care -- unless our problem involves selecting file cabinets to store zoning applications. Since its properties are normally used to classify a thing into a thing class, most of the relevant properties can be used to describe all things in a class. If not, this presents a possible means of refining our thing class definition into two smaller classes, things that share this property and things that do not.

· Fact. The intersection of a given thing and a given property, subject to the constraint of the property value set; that is, the specific value assigned to a property for a specific thing. A fact is an assertion of truth about a statement describing a particular thing. You must always keep in mind that your database is not reality but merely a representation of reality. When reality changes, your database must follow.

Modeling Reality in a Logical Model

We develop a logical model of the real world system that then guides us in specifying a physical implementation in the database management system environment. The logical model uses a set of components that correspond to the real world primitives that we have already defined.

· Model. A logical model represents a simplified subset of the real world system, with selection logic determined by a viewpoint dependent on our problem situation. The model should be rich enough to represent every component of the system that has significance to our problem situation, but nothing more. Most of the work in database design involves creating, testing, and revising models that are successive approximations of this ideal.

· Entity Instance. For each thing determined to be part of our system, we must create a corresponding entity instance.

· Entity. Sets of related entity instances are grouped into an entity, one for each thing class we identified. (WARNING: Many writers on database design use "entity" as both singular and collective; the meaning is usually clear from the context if you keep the underlying distinctions clear.)

· Attribute. Each relevant property identified for the thing is represented by a corresponding attribute in the logical model.

· Attribute Class. Attributes are aggregated into attribute classes. Part of the entity definition is the set of attribute classes which represent the relevant property classes.

· Relationship. Each relevant association between two things is represented by a corresponding logical relationship.

· Relationship Class. Relationships are also aggregated to represent the associations among thing classes. (WARNING: As you work through the design process, you will find that you wind up treating some relationship classes as entity classes. This is a by-product of the logic of modeling, and will make sense in context; just start out with the stricter definition of entities.)

Putting It All Together

The final step is for an applications developer to implement the logical design using the chosen GIS and/or database management system (DBMS) tools. This implementation will consist of one or more relational tables and graphics files, each of which will contain a certain number of data fields or variables. Each field value in a record of these files thus stores a single fact from our real world system.

Unfortunately, the database design task is made difficult by two kinds of complexity. First, the components of the three conceptual systems must be kept straight. Table 1, below, lists the components of each system -- including more detail than we used in our brief discussion -- and shows how the components relate to one another.

Second, the design process frequently creates rather artificial and seemingly unnatural entities that do not clearly correspond to the real world primitives.

Table 1: Correlations Among Conceptual Systems

REALITY | CONCEPT | IMPLEMENTATION
System | Model | Database or equivalent
A Thing | Entity Instance | Tuple or Record
A Class of Things | Entity Class | Table or File or Layer
A particular Association between 2 particular things | Relationship Instance | Values returned as the results of a particular JOIN operation on specified entities
Property of a particular thing | Attribute Instance | Value to be stored in the storage location assigned to that attribute for a particular Record
Property of a kind of Class of Things | Attribute Class | Storage location assigned to Attribute
Property Value Set | Domain of Attribute | Constraints OR Set of actual contents
Fact | Attribute Value for particular thing | Value stored at location assigned to instance
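As a rough illustration of how the three columns of Table 1 line up, the sketch below implements the paper's rezoning example with Python's built-in sqlite3 module. The table names, column names, and key values are assumptions made for the example; only "Developer XYZ", case "R89-123", and the 80 acres come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the Database stands in for the System

conn.executescript("""
    -- Each table implements an Entity Class (a Class of Things).
    CREATE TABLE developer (
        developer_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );
    CREATE TABLE rezoning_case (
        case_no  TEXT PRIMARY KEY,
        -- Column = Attribute Class; the CHECK constraint is one way to
        -- express a Property Value Set (domain).
        acres    REAL CHECK (acres > 0),
        -- Key column that lets a JOIN recover the "filed" association.
        filed_by INTEGER REFERENCES developer(developer_id)
    );
""")

# Each row is an Entity Instance (a Thing); each stored value is a Fact.
conn.execute("INSERT INTO developer VALUES (1, 'Developer XYZ')")
conn.execute("INSERT INTO rezoning_case VALUES ('R89-123', 80.0, 1)")

# A Relationship Instance is the result of a JOIN on the two entities.
rows = conn.execute("""
    SELECT d.name, c.case_no, c.acres
    FROM developer AS d
    JOIN rezoning_case AS c ON c.filed_by = d.developer_id
""").fetchall()
```

Here `rows` holds one tuple linking the developer to the case, and an attempt to insert a case with a non-positive acreage is rejected by the CHECK constraint.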

What In The World Gets An Address?

Looking out my window I see a typical city street corner. There is a restaurant on the corner with its parking lot holding several cars. Next to it and sharing a common wall is a garage, with doors on my street and on the alley behind it. The buildings appear to have been built in the 1920s, and have undergone several obvious remodelings. A door between the two businesses leads to stairs to two apartments over the restaurant. A woman is using the pay phone installed on the vacant lot next door to the garage. Her dog is tied to the utility meters, which are centrally located between the two structures.

In this scene I can find two addresses displayed, indicated by the building numbers over the doors of the businesses. But, are these the only addresses that I might have to consider if I were designing a database for my city?

Here is a list of possible addresses I might have to find a place for:

· Businesses -- The restaurant and the garage have addresses, the house numbers of which are clearly displayed over their entrances. (But, are those addresses assigned to the business or to the structure? More on this point later.)

· Apartments -- Certainly these would have addresses, which might have some relationship to those assigned to the businesses on the ground floor.

· Garage -- Does the garage have an address for the entrance off of the alley? (I happen to know the alley in question is an “addressable street feature”; that is, it can be used for the construction and assignment of situs addresses.)

· Restaurant -- The question about garage entrance addresses prompts me to think about the restaurant. Since it is on a corner lot, does it have a second address off of the cross street?

· Utility Meters -- These are placed to one side of the buildings they serve. Does the utility give them an address for locational purposes independent of the structure(s) they serve? What about the electric service, which I can now see comes into the buildings from the alley? Would the electric meters have a separate address?

· Vacant Lot -- The structures on both sides of it have assigned addresses, but does this lot? For that matter, does the lot on which the restaurant is built have an address independent of the structure? I know this area was subdivided in the late 1800s, so the current building layout is certainly not the original development on the property.

· Parcels -- If the platting of this area was so long ago, it is likely that the lots on the plat do not have much relationship to the current real estate parcels. If so, what is the parcel address?

Having gotten this far just examining the built environment from the perspective of an addressing specialist, I now shift perspectives and look at the scene from the perspective of a data modeler faced with building an enterprise database for this city. What else would I find to have an address?

· Businesses -- Both businesses have to have business licenses. Further, each gets mandatory inspections of different types, such as grease traps and spill handling at the garage or health inspections for the restaurant. What addresses are in those files? Certainly a premise address (which could be any of those already identified), but what about the address on the business license -- it could be the premise, the owner’s home, or a corporate headquarters in another state.

· Cars -- The county issues motor vehicle licenses on behalf of the state. Those cars have a home address, though it may no longer match the current residence of the driver.

· Telephone -- Our city is considering having an address for every telephone. However, the pay phone already has an address in the E911 dispatch system. If my neighbor using the pay phone were instead using a cell phone, would that phone too have an address?

· Dog -- Like cars, pets have a license that includes the address of the owner. (However, Butch -- the particular instance of our dog entity class -- belongs to my neighbor the scofflaw and is unregistered.)

· Signage -- The city requires a permit for certain types of signs, so the restaurant sign -- on a pole at the corner -- has an address associated with it. Is it the same as that of the business? To further complicate the modeling exercise, the garage sign over its door is not regulated and has no such information.

· Traffic Light -- The City’s old facility inventory forced a signal light to have an address. There are four immediate choices for this address, both sides of each of two streets, and no obvious means of preferring one to the others.

I think the above discussion indicates the potential complexity of deciding what gets an address. To summarize the example, we often see the following categories of features being assigned an address as an identifier:

· Lot versus Parcel

· Building versus Units

· Incidents

· Utilities

· Other Structures

· Entrances and Driveways

· Telephones

· Licensed Operations or Facilities

· ????

It should be apparent by now that not all of the addresses which might be in place in my neighborhood are the same kind of thing. They are used for different purposes, and therefore must be constructed somewhat differently. We should be wary of using apparent purpose uncritically in our addressing activities, however, because many addresses are pressed into service for multiple (and sometimes contradictory) simultaneous purposes. The more common purposes include:

· Mailing Address -- It is convenient to think of this as the locator of the mailbox into which a certain class of mail would be delivered. However, we may need to track multiple classes of mail, including:

a) Premise -- The address to use for on-site delivery, perhaps by a parcel delivery firm.

b) Occupant -- The individual(s) or firm inhabiting the property. NOTE: Mail box may be on premise or not (P.O. Box, home office out of town, etc.)

c) Owner -- The individual(s) or firm(s) owning all or part of the property. These lists tend to include multiple owners.

d) Taxpayer -- The individual or firm paying property taxes. This is not necessarily the owner (could be the mortgage company or attorney).

e) Utility Account -- The address of the individual or firm paying for utilities. This is not necessarily the owner or renter (could be the landlord).

· Situs -- The address assigned to the physical property, either the land or structure or unit within the structure.

· Access Point -- The address assigned to the driveway curb cut or to doorways into the building. These are primarily useful for emergency services, which might be misled by the situs address into cruising back and forth in front of a building when they need to get access off a side street.

· Utility Service -- The address assigned to the facility receiving services. Because utilities can be zoned within a property, different utility services could be billed to different parties within the same premise (e.g. bills for a rented house might go to the tenants while the separate landscaping utilities might go directly to the landlord).

· Utility Drop -- The address assigned to the utility tap, meter, or other connection between the premise services and the utility’s infrastructure. This is analogous to the situs address of the premise, but is the locator for the service drop, which may be off the alley behind the premise or a side street.

· Incident Locator -- The address assigned to a traffic accident, chemical spill, or any other type of incident tracked by the locality. These addresses may correspond to a premise address (when the incident is directly related to the premise, such as a burglary in the garage) or be no more than a convenient locator (such as a traffic accident on the street in front of the garage).

· Administrative Convenience -- These addresses are very similar to those assigned to incidents, distinguishable primarily because the features to which they are assigned (e.g. traffic signal, manhole) are more permanent than an incident.
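One way a data modeler might keep these purposes apart is to record the purpose alongside every address assignment rather than storing a bare address. The Python sketch below assumes hypothetical class names, addresses, and features; it only illustrates that a single address string can carry several purposes at once.

```python
from dataclasses import dataclass
from enum import Enum


class AddressPurpose(Enum):
    """The purposes listed above, as an enumeration."""
    MAILING = "mailing"
    SITUS = "situs"
    ACCESS_POINT = "access point"
    UTILITY_SERVICE = "utility service"
    UTILITY_DROP = "utility drop"
    INCIDENT_LOCATOR = "incident locator"
    ADMINISTRATIVE = "administrative convenience"


@dataclass(frozen=True)
class AddressAssignment:
    address: str             # the address as assigned
    feature: str             # the real world thing it identifies
    purpose: AddressPurpose  # why the address was assigned


# Hypothetical assignments for the street corner described earlier.
ASSIGNMENTS = [
    AddressAssignment("112 Main ST", "restaurant building", AddressPurpose.SITUS),
    AddressAssignment("112 Main ST", "restaurant owner's mailbox", AddressPurpose.MAILING),
    AddressAssignment("112 Main ST", "traffic accident out front", AddressPurpose.INCIDENT_LOCATOR),
    AddressAssignment("113 Alley A", "electric meter", AddressPurpose.UTILITY_DROP),
]


def purposes_by_address(assignments):
    """Group the purposes each address string is being used for."""
    result = {}
    for item in assignments:
        result.setdefault(item.address, set()).add(item.purpose)
    return result
```

Calling `purposes_by_address(ASSIGNMENTS)` shows "112 Main ST" serving three purposes, which is the kind of overlap that makes legacy address data hard to interpret.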

Interpretation of Legacy Data

The previous discussions about all the different classes of real world things which might have an address assigned to them, and about the types of addresses which might exist in the near vicinity of a particular thing, should clearly indicate the challenges of understanding what we have in a legacy set of addresses. (If not, try this laboratory exercise: For each of the address categories discussed in the previous section, acquire a set of colored labels. Write down all the addresses you can find that describe my neighborhood, making sure each address goes on a label of the correct color. Come to my neighborhood and stick a label of the correct color onto the real world thing to which the address applies.)

When we add to this complexity in the real world the ambiguity of representation in the legacy data sets we have to work with, the problem gets even more complicated. It is made still more difficult by the pride of ownership exhibited by the proprietors of the different data sets. Fortunately, the resulting muddle can be unraveled given sufficient commitment of resources and of administrative willpower.

Legacy Data

Surprisingly, one of the most straightforward situations to resolve is the interpretation of legacy data that is not actively being maintained. (The active data sets involve additional issues of ownership, pride, and conflicting business needs, which we’ll talk about in a moment.) True legacy systems involve “data archaeology”, the interpretation of the data contents through review of data structures and observable clues such as domains of contents. This process generally follows these steps:

· Select the legacy data set to be worked on. Find out everything you can about it, including data layouts, file sizes, management tools, etc. If you are lucky, there will be system documentation that explains something about the reasons the data set was created, but frequently this will be sparse or outdated.

· Catalog the addressing attributes. This often means building a partial data dictionary by reverse engineering the physical data structure and its contents. In an RDBMS, this is facilitated by the built-in metadata that defines tables and columns. In older systems such as flat files (VSAM, etc.) you may be lucky to even find documentation of the total record layout. Often the naming conventions are so cryptic that you will have to resort to direct examination of the contents.

· Classify the type(s) of addresses. Once you decide what columns constitute an address, you can begin to determine what type of addresses you have. There are several ways of deciding, including:

o If there is a ZIP Code column you probably have a mailing address. However, if ZIP is physically separate from the other address columns, it may only indicate a geographical classification like Council District. (This is especially probable if the data set was created before the jurisdiction began using GIS technology.)

o A combination of City, State, and ZIP almost always means a mailing address. If the domain list of values includes values clearly outside the jurisdiction, you can be sure. Cross referencing this list with known mailing lists (utility customer accounts, tax rolls) can help you decide what kind of mailing list it is -- but you will probably want to exclude any legacy mailing data from your production systems since it tends to become outdated very quickly.

· Cross-referencing the list against other known address lists will help narrow down the category. Historic utility accounts and tax rolls of the appropriate date are useful in this regard. Inspect the list for known anomalies in data (e.g. the utility accounts have older addresses in the western edge of town which still reflect the naming conventions of a Municipal Utility District which has been taken over by the jurisdiction).

· Use the GIS to geocode the legacy address list against the current addressing reference data. Compare the resulting points to those of known data (e.g. parcel situs, building, etc.). See if there is a reasonable match (note that no two data sets will ever give an exact match, due to the fluid nature of the addressing process).

· Work backwards to derive “valid” addresses for each feature. If we have addresses allegedly for water meters and a map of known meter locations, we can decide for each meter what address “makes sense” under current addressing standards and then see how close the addresses in the list under investigation come to the current ones. Note however that addresses to utilities may have been assigned when they were first extended, prior to completion of the street network.

· Decide if the data is worth reuse. The data is not reusable unless we both understand its structure and agree that its contents are relevant to the problem we are trying to solve. When in doubt, don’t use it.

· Design an algorithm for reuse. If you decide to reuse the data set in question, you will have to import the data into the new technical environment. This may involve restructuring and scrubbing of the data, or other preparation.
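Part of the classification step above can be mechanized. The following is a minimal Python sketch, assuming the legacy records have already been loaded as dictionaries. The column names (CITY, STATE, ZIP), the function name, and the returned labels are hypothetical and would need to be adapted to the data set at hand.

```python
def classify_address_set(rows, home_city, home_state):
    """Guess whether a legacy record set holds mailing addresses.

    Heuristics follow the text: City + State + ZIP together suggest a
    mailing address, and values outside the jurisdiction make it certain.
    Column names are assumptions for illustration only.
    """
    if not rows:
        return "unknown"
    columns = set().union(*(row.keys() for row in rows))
    has_zip = "ZIP" in columns
    has_city_state = {"CITY", "STATE"} <= columns
    if has_city_state and has_zip:
        outside = any(
            (row.get("CITY"), row.get("STATE")) != (home_city, home_state)
            for row in rows
        )
        return "mailing (certain)" if outside else "mailing (probable)"
    if has_zip:
        # ZIP alone may be a geographic classifier rather than part of an address.
        return "mailing or geographic code (inspect further)"
    return "not mailing (inspect further)"
```

A result from such a heuristic is only a starting point; it should still be checked against known lists and against the GIS as described in the steps above.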

Active Data Sets

Interpretation of a legacy data set that is in active use includes all of the above issues. These are made simultaneously more complicated and easier than when working with abandoned legacy data. This situation is easier in that there are knowledgeable staff around who can help you understand the structure, contents, and intended use of the data. However, their knowledge may be partial or self-serving so it should always be verified against the actual data set. The use of actively maintained systems is more complicated and more challenging because of the politics involved. Multiple address data sets tend to evolve in most jurisdictions (I’ve seen over 20 in a single city government). In part this is a legitimate response to differing business needs, since for example the police need a different set of addresses than does the planning department. All too often, however, the legitimate business differences get lost in bickering over whose file is “best”.

The discussion of how to resolve these issues is far beyond the scope of this paper, which concentrates only on the technical issues. From a technical perspective, there is no reason that multiple, inconsistent address data sets of the same type (e.g. building, situs, addresses) cannot be supported if necessary. Each one should be carefully documented as to the differences, the business reasons such differences are necessary, and how to use each one. Further, the addresses should be internally consistent as much as possible.

Some of these technical debates can be resolved at the level of address assignment data sets rather than in the actual addresses. That is, adding alleys and bike trails to the addressable street centerline may allow the police to generate the incident addresses they need from the “same” street network that other agencies use to generate the addresses they assign.

Data Structures For Representing Addresses

In this section we move to more concrete examples of how to represent addresses in a data model. Previous sections, theoretical and applied, have concentrated on background issues. Here we give specific hints and practices for handling address data.

Essential Data Components of the Addressing Process The data-modeling environment for address data must have some fundamental data components. The exact layouts of these will vary from place to place, but everybody needs the following:

· Approved Street Name List -- What street names are currently in use for addressing purposes.

· Street Centerlines, w. Status Code -- What streets currently exist. Are they physically present, or only platted?

· Master Address File, w. Type Code -- What addresses have officially been assigned? What type of address, to what features? (A GIS point file is an excellent way of maintaining this data.)

· Preservation of Alternative Names, Addresses -- How many names does this street segment have? Which ones are for use in addresses? What was the previous address of this building? (The more history you can preserve, the more useful the administrative records of your jurisdiction become. After all, no one is going back and readdressing old building permits just because the Council renamed a street.)

· Support of Many-to-Many Relationships -- The data management structure must support assigning many addresses to many structures and similar situations.

· Feature Level Metadata, esp. Temporal -- Documentation of the contents of your address data sets is essential for understanding how to use them.

Example Data Structures This section discusses specific addressing problems and shows a data model that will handle these problems. Business rules are stated that describe the problem. The solution is indicated by diagrams in an abbreviated Entity-Relationship notation, in which a rectangle indicates a logical entity, text in the box indicates logical attributes, and arrows connecting the rectangles indicate logical relationships. The arrowheads on the relationship lines indicate cardinality of the relationship. The meanings of these diagrams will be clear from the context.

A Thing Has an Address Business Rule: Each thing has one and only one address. (EX: A parcel has only one situs address.)

[Diagram: entity THING with attributes PAR_ID, ADDRESS.]

A Thing Has Several Addresses Business Rule 1: Each thing can have one or several addresses. Some things do not have an address. Business Rule 2: If a thing has an address, one of its addresses is the preferred address. Others are recognized alternatives.

[Diagram: entity THING with attribute PAR_ID, related to entity THING ADDRESS with attributes PAR_ID, ADDRESS, ADDR_USE.]
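The two business rules above can be checked mechanically. This is a minimal Python sketch, assuming the THING ADDRESS rows are available as records; the code values "PREFERRED" and "ALTERNATE" for ADDR_USE and the function name are assumptions, since the paper does not list the actual codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThingAddress:
    par_id: str    # identifier of the thing (PAR_ID in the diagram)
    address: str
    addr_use: str  # "PREFERRED" or "ALTERNATE" (assumed code values)


def check_preferred_rule(thing_addresses):
    """Return the PAR_IDs that violate Business Rule 2.

    A thing with no address rows is allowed (Rule 1). A thing with one
    or more addresses must have exactly one marked PREFERRED.
    """
    preferred_counts = {}
    for ta in thing_addresses:
        count = preferred_counts.setdefault(ta.par_id, 0)
        if ta.addr_use == "PREFERRED":
            preferred_counts[ta.par_id] = count + 1
    return sorted(pid for pid, n in preferred_counts.items() if n != 1)
```

Things without any address never appear in the THING ADDRESS rows, so they pass the check without special handling.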

An Address Must Be On The List Business Rule 1: Each thing has one and only one address. Business Rule 2: Each address must be on the approved address list in order to be assigned to a thing.

[Diagram: entity THING with attributes PAR_ID, ADDRESS, related to entity APPROVED ADDRESS with attributes ADDRESS, ADD_STATUS, EFF_DATE.]
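In a relational database, the rule that an address must be on the approved list is naturally expressed as a foreign key. The sketch below uses Python's standard sqlite3 module purely for illustration; the table and column names mirror the diagram, while the function names and the choice of SQLite are assumptions.

```python
import sqlite3


def build_schema(conn):
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE approved_address (
            address    TEXT PRIMARY KEY,
            add_status TEXT NOT NULL,
            eff_date   TEXT NOT NULL
        )""")
    # PRIMARY KEY on par_id plus NOT NULL on address gives
    # "one and only one address" per thing (Business Rule 1).
    conn.execute("""
        CREATE TABLE thing (
            par_id  TEXT PRIMARY KEY,
            address TEXT NOT NULL REFERENCES approved_address(address)
        )""")


def assign_address(conn, par_id, address):
    """Return True if the assignment is accepted, False if rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO thing (par_id, address) VALUES (?, ?)",
                (par_id, address),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this arrangement the database itself rejects an address that is not on the approved list (Business Rule 2), so individual applications do not each have to repeat the check.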

Address Related to Street Business Rule 1: Each address is associated with a street segment. Business Rule 2: A street segment may have more than one name. Business Rule 3: An address must be based on a valid street name.

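[Diagram: entity STREET SEGMENT with attribute SEGM_ID; entity THING ADDRESS with attributes PAR_ID, ADDRESS, ADDR_USE; entity NAME_USED with attributes SEGM_ID, NAM_ID, NAME_USE; entity STREET NAME with attributes NAM_ID, ST_PREFIX, ST_NAME, ST_TYPE, ST_SUFFIX, ST_STATUS.]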
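The linking entity between street segments and street names is what allows one segment to carry several names while only some of them are valid for addressing. A minimal Python sketch of that lookup follows; the code values "ADDRESSING" (for NAME_USE) and "APPROVED" (for ST_STATUS) and the function name are assumptions made for illustration.

```python
def addressing_names(segm_id, name_used, street_names):
    """Return the full street names that may be used in addresses on a segment.

    name_used:    iterable of (SEGM_ID, NAM_ID, NAME_USE) link rows
    street_names: dict mapping NAM_ID to
                  (ST_PREFIX, ST_NAME, ST_TYPE, ST_SUFFIX, ST_STATUS)
    """
    names = set()
    for seg, nam_id, name_use in name_used:
        if seg != segm_id or name_use != "ADDRESSING":
            continue
        prefix, name, st_type, suffix, status = street_names[nam_id]
        if status == "APPROVED":
            # Skip empty components so names without a prefix or suffix join cleanly.
            names.add(" ".join(p for p in (prefix, name, st_type, suffix) if p))
    return names
```

An address on a given segment would then be accepted only if its street name is in the returned set, which covers Business Rules 2 and 3 together.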

Multiple Address Types Business Rule 1: A thing can have a mailing address and a situs address. Business Rule 2: A thing can have more than one address of each type. It can also have zero addresses.

[Diagram: entity THING with attribute PAR_ID; entity THING ADDRESS with attributes PAR_ID, ADDRESS, ADDR_USE; entity THING MAIL ADDRESS with attributes PAR_ID, ADDRESS, MCITY, MSTATE, MZIP.]

Conclusion Our data model must support the complexities of the day-to-day business practices of our client community. As we’ve discussed, one of the more problematic issues to be dealt with is that of addresses. Fortunately, there are several workable solutions for each community. The decision as to which to implement should consider:

· The selected solution must meet the business needs of the community. If a business need is not satisfied, the solution is flawed or incomplete.

· GIS technology simplifies modeling address attributes because the graphical relationships it supports greatly simplify the interrelationships among entities that must be maintained. It also helps in the use of point representations of addresses, and the process of geocoding.

· Object-oriented approaches help work with addresses because of the clarity they can bring to representing the complexities of addressable features such as apartment houses or shopping centers.

· Data ambiguities or problems in the real world are not solvable through data modeling. The model must reflect reality in the community, and not vice versa.

Bio – Michael Walls is an Executive Consultant for PlanGraphics, Inc. out of Frankfort, Kentucky.

DEVELOPING A SYSTEM OF PUBLIC ADDRESSES - AS A "LANGUAGE" FOR LOCATION DEPENDENT INFORMATION Morten Lind

Introduction A basic element in a society's information infrastructure is a well-formed system of addresses and corresponding high-quality address data, pinpointing and labeling in human language all locations where people live, work, shop etc.

Using properly geo-referenced address data, every citizen, enterprise, institution or public authority will be able to "twist" their own address-related data from tabular data into spatial information. The potential of a simple data element that can release this "spatial power" is enormous, because thousands of private and public databases and base registers have reference to address information.

This paper is based on the experiences of work on standardization, data modeling, legislation and geo-coding in the field of address data in Denmark and in the other Nordic countries. In these countries, the system of addresses and address data are developed on the basis of public street gazetteers, property and building registers, and detailed base maps. Here, as well as in the UK and in a number of other European countries, addresses are basically represented as individual, geo-referenced points, as "roof-top" or "door-step" addresses.

Even though the situation in the United States is somewhat different, we believe that a general lesson could be taken from our experiences, considerations and proposals.

This paper discusses the properties and characteristics of the "address" as a common spatial and administrative object which must play a key role in today's "Spatial Information Society". We will argue for the need of a consistent data model that puts the address entity (or object) in focus, whenever we manage address data or address-related data. Furthermore, we will demonstrate the advantages and drawbacks of the "roof-top" address approach.

Characteristics of the address as a reference system In the geographical world you distinguish between reference systems based on co-ordinates and reference systems based on identifiers. Addresses belong in the latter category, which means that for instance the address “Rentemestervej 8, DK2400 Copenhagen” identifies a particular (more or less well-defined) location without use of geographical co-ordinates. (Figure 1)

The identifiers of the address system thus consist of ”names”, that is: country, region, town, district, street name and house or door number. It is a characteristic of the address system that it is structured hierarchically with a network of named streets as its backbone.

Place names make up a similar name-based reference system, as each place name indicates a particular locality or a particular area (town, part of town, village, lake, wood, district etc.) without using co-ordinates. Another example is the Cadastre where the name of the region (parish etc.) and the title number identify each plot in the area covered by the Cadastre.

Figure 1 - The system of addresses is based on identifiers, not co-ordinates

The particular properties of the address The address system, however, is distinguished by a series of special properties from other name-based reference systems:

· It is simple, well known and widespread! Everybody knows the system, and addresses are without comparison the most widely used method of collecting, storing and exchanging information on the location of this or that person, property or business.

· It is suitably detailed! In the built-up areas where we move around daily, the address system is so fine-meshed that we can find our way to the right front door or stairway entrance, solely with the aid of a street name and a house number.

· It is practical and logical! The system is based on the way we get about: The hierarchical structure with town name, street name and house numbers, divided into even and odd numbers in ascending order, is easy to find your bearings in, on foot as well as in a car or on a bike - or when looking at a map.

· It is visible! Signs with street names and house numbers mean that we can read and find our way around the address system, and find that particular front door we are looking for. (The combination of logical structure and visibility is perhaps the most special and valuable property of the address system.)

Because of these special properties, the address is a central component in GIS products and services: When we use addresses as a means of communication, we speak a ”language” which people understand and can relate to!

We can assume that the address constitutes a potential spatial key in thousands of registers and databases within the private as well as the public sector, and that we will be able to link records in different databases with the aid of the address identifier in order to transform billions of text-based records into spatial information.

A case: The address system in Denmark The address system in Denmark (as well as in the other Nordic countries) is regulated by the legislation pertaining to the central civil registration and by rules and regulations in connection with property registration, e.g. registration of buildings, dwellings and property.

The Danish Central Personal Register (CPR) law from the late 1960s and the Building and Housing Register (BBR) law from 1977 form the basis of the public authorities’ administration of street names and house numbers. The law governing the Building and Housing Register ensures that each individual dwelling is assigned a unique address designation which is registered by the municipality. At the same time, the law governing the Central Personal Register ensures that every citizen with a permanent dwelling is registered with his place of residence at a certain address.

The demand that dwelling addresses must be unique entails that residential buildings with more than one main entrance door or stairway are addressed with a house number for each individual stairway. Correspondingly, the insistence on uniqueness of reference entails that if two or more dwellings have access through the same stairway, each dwelling must be assigned information about floor number and door designation. Every named street is assigned a unique, four-digit street code.

The 20-year-old Danish address system is uniform over the whole country and includes towns as well as rural areas in all municipalities. (Figure 2)

Figure 2 - The "architecture" of the Danish address format

In the public sector it is of great advantage to use the official address format, because it reflects the administrative division of the country (by virtue of the municipal code). Moreover, the address format is based on digit codes and not on street or place names, where uncertainties about spelling, abbreviations etc. are often encountered. Finally, the format means that each individual element in the address is stored in a separate data field which entails that address designations are easier to handle and correlate by computer. The drawback is, of course, that an official address designation will only be readable and understandable, if it is joined to and supplied with a municipality name and street name.
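The trade-off described above, between a coded format that is easy to correlate and the need to join names before it becomes readable, can be illustrated with a small Python sketch. The field list follows the elements named in the text (municipal code, four-digit street code, house number, floor, door); the class and function names, the sample code values, and the choice to key street names by municipality and street code together are assumptions, since Figure 2 is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OfficialAddress:
    municipality_code: str        # administrative code (width is an assumption)
    street_code: str              # four-digit street code
    house_number: str             # e.g. "8" or "8A"
    floor: Optional[str] = None   # only where several dwellings share a stairway
    door: Optional[str] = None

    def __post_init__(self):
        if not (len(self.street_code) == 4 and self.street_code.isdigit()):
            raise ValueError("street code must be four digits")


def readable(addr, municipality_names, street_names):
    """Join the coded address to name tables to make it human-readable."""
    street = street_names[(addr.municipality_code, addr.street_code)]
    parts = [f"{street} {addr.house_number}"]
    if addr.floor:
        parts.append(f"{addr.floor}.")
    if addr.door:
        parts.append(addr.door)
    return ", ".join(parts) + f" ({municipality_names[addr.municipality_code]})"
```

Because every element sits in its own field, two registers can be compared field by field without worrying about spelling or abbreviations in street names.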

Benefits of using the address as an administrative key There is no doubt that the function of the address as an administrative key has already saved the public sector in Denmark considerable amounts of money. The existence of a reliable administrative key between the personal data area and the area of building and property data means that a connection can be created between various administrative public systems which handle amounts running into billions in the form of personal taxes, property taxes, health insurance, wages, pensions, benefit payments etc.

To cite an example, there has not been a traditional manual census in Denmark since the 1970s. On the other hand, a machine-based census and dwelling count is made several times a year based on the personal number and address systems. Corresponding benefits are gained by private industry, for instance the financial sector, which also uses the Danish ”address key” to a large extent.

Data Modelling the address In accordance with the above, Denmark should have the best possible starting point for using the address as an official, administrative “key” to link information from the various public and private databases and digital maps. Experience shows, however, that the existence of a common data format gives no guarantee of a sufficient consistency between the data contained in the various registers. (Figure 3)

Figure 3 - Inconsistency between property address information in two registers.

As an example, practical tests have exposed the fact that address information in the National Business Register (CER), which is based on reports from individual businesses, does not correspond very well with the addresses found in the Building and Housing Register that is maintained by the municipalities.

Attribute or entity? This problem is in fact a conceptual data model problem which could be summarized in this simple question:

· Should the address be regarded as an attribute, that is, as a characteristic of each of the entities in question: the person, the property, the building, the business etc. or

· Should the address be perceived as an entity, that is, as an independent object type which has associations to the other object types: person, property, building, business etc.? (Figure 4)

Figure 4 - Modelling the address as an attribute or as an independent entity.

In the private as well as in the public sector, the predominant tendency is for computer systems to favour the first concept: Most often, address information is managed as a simple attribute, that is, as a characteristic in the database record which is keyed in like other “soft” characteristics such as names, measurements, conversion figures etc. without any kind of validation.

The drawback to this “attribute approach” is that it prompts each individual computer system to build up its “own” address registration which has no connection with those of “other” systems. The result is that inconsistencies arise between the individual address collections, so that properties, buildings, businesses etc. that ought to carry the same address throughout the different registers, in fact do not.

Furthermore, we have many examples of the attribute approach encouraging each administrative system to create its own - slightly different - definition of an address and of the address concept. The result of this is that the different address registrations are not fully comparable even on a context level.

New Data Model: the address as an entity In connection with a Danish project for the modernization of the Building and Dwelling Register, the idea arose of regarding the address as an independent entity, that is, as a separate object of registration to be managed with the same care as other comparable administrative designations, like for instance personal numbers and cadastre numbers etc.

In general, the benefits of this new data model are that it strengthens the very concept of the system of addresses and positions the address entity more centrally as a common object type:

· The address object instance will, when it is created, exist “of itself”, and will thus no longer exist just as an ”appendage” to for instance a building or a plot of land. As a result, new addresses can be created and registered when it is practical, for example in a planning phase before a plot is parcelled out, or a building erected.

· The address object instance can be registered and equipped with its own relevant data characteristics (e.g. site co-ordinates, date of origin etc.) which can be reused in all data collections containing address information. (Figure 5)

Figure 5 - The Address entity types and relationship types towards the Building register.

In this data model, an address will always have three basic characteristics:

· A spatial location indicated by co-ordinates or by reference to at least one other identified spatial object (plot of land, building, entrance door etc.).

· A reference to a named road or the like which ensures that the address can be fitted into the hierarchical order of the address system. In certain cases, a road name is replaced by the name of a locality, for instance on smaller islands or in very small villages without a proper road system.

· A label which as a rule consists of a house number that must be determined so as to indicate the place of the address in relation to the other addresses on the same road.

Note, however, that an address instance in this data model can change as regards road reference as well as ”label” without changing identity! Therefore, every object, when it is registered in a database, should be supplied with a persistent and unique object identification. (Figure 6)
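The entity approach and the persistent identification can be illustrated with a short Python sketch. The class and field names are hypothetical, and the use of a UUID as the persistent identifier is an assumption; the paper only requires that the identification be persistent and unique.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Address:
    road_id: str   # reference to a named road (or locality)
    label: str     # as a rule the house number
    x: float       # spatial location as co-ordinates
    y: float
    # Persistent identity, independent of road reference and label.
    address_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Building:
    building_id: str
    address_id: str  # reference to the address entity, not a copied address string


def renumber(address, new_road_id, new_label):
    """Change the road reference and label; the identity stays the same."""
    address.road_id = new_road_id
    address.label = new_label
    return address
```

Because the building record stores only the identifier, a renamed road or a renumbered house does not break the link between the building and its address.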

Figure 6 - Address entities and attributes that compose an address instance

Requirements for a new address management The new data model will raise the demands on the practical as well as the legal management of the address system:

· The stock of addresses should be administered as a single quantity in a base register (or "address gazetteer"), clarifying fully which dataset is "the master" that must serve as validation reference in relation to other private and public address information sources.

· The authority to create, maintain and delete addresses must be placed unequivocally with a certain authority or institution (in Denmark the municipal council) to prevent unnecessary duplication, and to ensure that the exercise of authority is handled according to a well-defined set of rules.

· It is important to distinguish very clearly between situations where a new address instance is created and put into the base register, and situations where address information, correct or incorrect, is given about an object (e.g. when you give information on the address of a restaurant).

In Denmark the new focus on the address as an object has led to modernized legislation in the field of addresses. Until now, the rules on apportioning and registration of addresses have been spread over three areas of legislation under three different ministries. From July 1st 2001, all rules will be joined as one law under the Ministry of Housing. At the same time, the responsibility and the authority of the municipalities will be greatly strengthened. In connection with the law, more precise, binding rules and informative guidelines will be worked out.

A consequence of the new data model is that a new road name or a new address will always come about as the result of a conscious decision by the responsible authority. Similarly, for already existing road names and addresses, it is a rule that the responsible authorities can always decide to accept or reject an instance as an ”element” in the official address system. If this authority is to be exercised uniformly, common rules are called for which define the real world phenomena that can carry the decision of an independent address.

In Denmark it will appear from the modernized set of rules that an address is linked to an entrance door in a building or to an unbuilt plot under development, or to other unbuilt areas which are intensively used. An independent address can also be linked to certain technical installations etc. which have physical properties comparable to a building (e.g. larger windmills).

As something quite new, the rules further lay down that the purpose of the public system of addresses is for citizens and authorities to orient themselves:

§ 1. The municipality establishes addresses in accordance with §§ 3a-3e in the law on building and housing registration. Addresses are established with regard to helping citizens, authorities, utilities and others to orient themselves and to locate the properties, buildings, main entrances, housing or business entities in question.

Developing a public address register During the last couple of years, the government and the municipalities in Denmark have co-operated on a harmonization and quality control of the addresses in the public registers and maps. The purpose is to collect the official set of addresses which will compose the “backbone” in an independent public address register according to the conceptual approach described in chapter 2. An important part of this project is to establish a direct geo-reference in the register, based on a set of geographic co-ordinates that pinpoints exactly the "roof-top location" of each address.

Roof-top geo-referencing of addresses Evidently, an individual localization of each separate address – compared to address points computed by interpolation – is most valuable in applications where the positional accuracy is important, that is, where it is crucial to know which exact building or which part of the building is identified by the address. It also means that ”roof-top” address data are used mainly together with detailed maps in scales 1:25.000 to 1:1000 or even down to 1:500. The value of the individual geo-references depends naturally on the quality of the dataset, primarily as regards accuracy and completeness. (Figure 7)

Figure 7 - Individual geo-referencing: "Roof-top" addresses

The usability and benefits of the individual geo-references depend naturally on the quality of the dataset, primarily as regards accuracy and completeness. In general, the advantages of a ”roof-top” approach can be summed up as follows:

· Great positional accuracy: The address point pinpoints the exact building or plot of land, which is particularly important in sparsely or individually built-up areas.

· Great completeness by spatial search: The selection of addresses with the help of polygon search and the like will ensure a large degree of certainty that all addresses in question are represented in the response.

· Great security in join with other data: The positional accuracy means that detailed spatial queries can be done, for instance on the ”neighbour objects” in the map. Similarly, the address object can be supplemented with data from other databases where the address constitutes a key by simple database join.

· Simple data structure: Address data can be stored in a simple database table where the co-ordinates are stored as a set of attributes of the type ”float”. Standard database operations and computations can be done in this table without the use of a dedicated GIS.

· Simple queries: If the address table is input in a GIS, it will be simple to search for individual addresses using SQL or by browsing the table - particular geo-coding mechanisms will not be necessary.
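The points about polygon search and the simple data structure can be illustrated together. The following Python sketch treats the address table as a plain list of rows with float co-ordinates and selects addresses inside a polygon using a standard ray-casting test, without any dedicated GIS. The row layout and the function names are assumptions for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def addresses_in_polygon(address_table, polygon):
    """Select address rows whose co-ordinates fall inside the polygon."""
    return [
        row["address"]
        for row in address_table
        if point_in_polygon(row["x"], row["y"], polygon)
    ]
```

Since each address is its own point, the result contains exactly the addresses located in the search area, which is the completeness advantage named above.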

Compared to address datasets based on road segments with attached house number intervals, like for instance US TIGER (Figure 8), the principal drawback of the ”roof-top” approach is that, in principle, the individual addresses do not know the geometry of the underlying road network. Route calculations from address to address cannot be done until the individual addresses are linked to a suitable road network.

Figure 8 - Comparison of “roof-top” vs. segment-based addresses.

Another essential drawback to be considered is how to handle situations where there is a discrepancy between the looked-for address information and the address table. In a dataset based on road segments, it is possible in principle to calculate the position of every given address information, within the proper house number interval.

An individual address table contains a finite number of addresses and has no facility for calculating in-between or inaccurate house numbers by approximation.
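The contrast between the two approaches can be made concrete with a small Python sketch. It assumes a straight road segment with a single house number interval and a roof-top table keyed by house number; the function names and the data layout are hypothetical, and real segment-based datasets typically keep separate intervals for each side of the street.

```python
def locate_on_segment(house_number, segment):
    """Interpolate a position along a straight segment from its number interval.

    segment: dict with from_number, to_number, start (x, y) and end (x, y).
    Returns None if the number lies outside the interval.
    """
    lo, hi = segment["from_number"], segment["to_number"]
    if not lo <= house_number <= hi:
        return None
    t = 0.5 if hi == lo else (house_number - lo) / (hi - lo)
    (x1, y1), (x2, y2) = segment["start"], segment["end"]
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def locate_in_table(house_number, address_table):
    """Exact lookup only: a number absent from the table has no position."""
    return address_table.get(house_number)
```

The segment model returns a position for any number within the interval, whether or not such a house exists; the roof-top table returns nothing for a number it does not hold, which is more truthful but less forgiving of inaccurate input.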

Building a public address register: Data capture A public address register should build on a consistent data model, as sketched in chapter 2. Subsequently, a pragmatic course of action would be to develop the contents of the register on the foundation of the dataset containing the most complete collection of address data.

In Denmark we use the 20-year-old Building and Housing Register, which in principle holds all built-on addresses, as the foundation stone of a new address register. The build-up then implies that each municipality, on a voluntary basis, harmonizes and supplements this backbone of addresses with closely related address information from the municipal property register. As an aid to the control process, a complete join of the two registers is done every month. The discrepancies found are offered to the municipalities who can then do the necessary rectifications.
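The monthly control join can be pictured as a comparison of the address keys held by the two registers. The Python sketch below is only an illustration of the idea; the key layout (municipality code, street code, house number), the function name, and the result labels are assumptions, not the procedure actually used in the Danish project.

```python
def register_discrepancies(building_register, property_register):
    """Compare two collections of address keys and report the differences.

    Each register is an iterable of hashable address keys, for example
    (municipality_code, street_code, house_number) tuples.
    """
    bbr, prop = set(building_register), set(property_register)
    return {
        "only_in_building_register": sorted(bbr - prop),
        "only_in_property_register": sorted(prop - bbr),
    }
```

Each list would then be handed to the municipality concerned, which decides whether the address is missing from one register or wrongly present in the other.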

The task is then to locate a set of spatial co-ordinates for each individual address. In Denmark we have a particularly well-suited point of departure for this geo-coding process in the form of a series of detailed digital maps:

· The municipalities’ and the utilities’ technical basemaps in scales 1:1000 – 1:10.000 which include among other things a house number level for most built-on properties.

· The national digital cadastre map in scales 1:4000 – 1:10.000 which delimits and identifies each individual property and plot of land.

· The national topographic basemap in scale approx. 1:10.000 which, like the cadastre map, is maintained by the National Survey and Cadastre.

In co-operation with a series of pilot municipalities the National Survey and Cadastre has developed and published a set of algorithms which ensure that an average of approx. 90% of the addresses can be automatically geo-coded using the technical basemaps. Of the remaining addresses, a further 9% or more can be furnished with a provisional and approximate geo-code by calculation.

The calculation is done partly by simple interpolation in relation to the neighbouring addresses, partly by using an algorithm which employs the relation between address and property numbers from the municipal property register to locate the address on the digital cadastre map. (Figure 9) In case the property in question is built-on, the building polygon from the topographic basemap is used to calculate a set of co-ordinates which lies within the building.
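The fallback from automatic to provisional geo-coding can be sketched as a chain of lookups. The Python example below is a simplified illustration under stated assumptions: the data structures and the function name are hypothetical, the published algorithms are not reproduced here, and the vertex average used as the provisional point is a stand-in that may fall outside an irregularly shaped plot.

```python
def geocode(address_key, basemap_points, address_to_property, parcel_polygons):
    """Return ((x, y), status) for an address.

    basemap_points:      address key -> (x, y) from house numbers on the basemap
    address_to_property: address key -> property number (municipal register)
    parcel_polygons:     property number -> list of (x, y) vertices (cadastre map)
    """
    if address_key in basemap_points:
        return basemap_points[address_key], "automatic"
    property_no = address_to_property.get(address_key)
    polygon = parcel_polygons.get(property_no)
    if polygon:
        xs, ys = zip(*polygon)
        # Vertex average: a crude representative point, to be checked by hand.
        return (sum(xs) / len(xs), sum(ys) / len(ys)), "provisional"
    return None, "unresolved"
```

Keeping the status alongside the co-ordinates lets the municipality see at a glance which points still need to be moved to the right building.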

Figure 9 - Geo-referencing by use of property no. and cadastre map.
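The "simple interpolation in relation to the neighbouring addresses" mentioned above could look roughly like the sketch below. It assumes plain linear interpolation on the house number between two already geo-coded neighbours on the same side of the road; the published Danish algorithms are not reproduced here, and the function name and sample co-ordinates are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def interpolate_address(number: int,
                        low: Tuple[int, Point],
                        high: Tuple[int, Point]) -> Optional[Point]:
    """Estimate a provisional co-ordinate for `number` between two neighbours.

    `low` and `high` are (house number, (x, y)) pairs for geo-coded
    neighbouring addresses. Returns None when `number` does not lie
    strictly between them, so the caller can fall back to another method.
    """
    low_no, (x0, y0) = low
    high_no, (x1, y1) = high
    if not low_no < number < high_no:
        return None
    # Fraction of the way from the low neighbour to the high neighbour.
    t = (number - low_no) / (high_no - low_no)
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# Invented example: no. 14 lies halfway between no. 10 and no. 18.
provisional = interpolate_address(14, (10, (100.0, 200.0)), (18, (140.0, 220.0)))
```

A co-ordinate produced this way is only approximate, which is why the text treats it as provisional and leaves the final placement to the municipality.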

When the automatic processes have thus established a co-ordinate geo-reference to nearly all addresses, it is the task of the municipalities to check the results and to edit the provisionally calculated co-ordinates, that is, to move each individually calculated address to the right building.

The municipalities and the National Survey and Cadastre have played the primary role in the project, but the private sector has also played an important role. The suppliers of GIS applications and services to the municipalities have thus been free to use the algorithms, specifications and interfaces which are an important part of the project. In practice, there has been fierce competition between four or five IT companies who have implemented the technique of carrying out the automatic geo-coding processes.

The task of the National Survey and Cadastre has been to develop methods and establish norms, to consult and to supervise the data quality. Thus all specifications and instructions can be found on the address project’s website www.adresseprojekt.dk together with a series of free applications for the control of data quality, and conversions of data between different formats. The website functions at the same time as a market place for the private firms who have developed commercial tools and applications for the municipalities.

The address project has been instrumental in starting parallel projects. For example, it has turned out that the public base register of road names and codes, which belongs under the Home Office, includes a number of errors which have for many years bothered the municipalities, the postal services, the police and mail-order firms and other essential users.

The result has been that each municipality has been offered a password to a particular section of the address project’s website where lists of spelling errors in road names, unnecessary abbreviations etc. are made available. For use in corrections, the Danish Board of Linguistics has published special guidance on the spelling of road names.

Data maintenance and update

In 2001, the parties behind the Danish address project are implementing an IT co-operation between the address register, the municipal property data systems and the Cadastre.

The system ensures that each new address, which is not entered with co-ordinates in the address register, is automatically furnished, on a monthly basis, with an approximate geo-reference on the right plot of land, using the digital cadastre map. The system also ensures that all addresses on which the municipality has registered a building application are automatically provided with a code that informs the municipality that the co-ordinates of the address are to be checked. Similarly, all erased addresses in the register are marked with a special code.

The municipalities and the other parties in the collaboration will be able to follow the quality and the updates of the address register’s co-ordinates continuously on a website which includes, among other things, detailed statistics on data quality and update frequency as well as metadata etc. for each individual municipality.

Conclusions

The overall objective of an official address register is to represent a common resource in the geo-spatial information infrastructure - to the benefit of both the private and the public sector.

Of course it is possible to have a functional address system without having a public address register. It is also possible to have a telephone system without having telephone directories. However, it is more rational for all parties, if you do.

When building a public address register, you have to consider carefully the connection between organization, legislation and data model, and you have to build on the best accessible data sources. The key words are standardization and common rules, authority and responsibility for upkeep and the freest possible access to utilizing the addresses – in the private as well as in the public sector.

References

British Standard, BS7666, Part 3: Specification for Addresses, London, 1995, UK.
Cushine, J: “A British Standard is Published”, Mapping Awareness, June 1994, p.40-43.
Lind, M., "Addresses and Address Data Play a Key Role in Spatial Infrastructure", Proceedings for EC-GIS 2000 Conference, Lyon, France.
Lind, M., Lund Christensen, T., “The Address - a key to GIS”, Proceedings for Joint European Conference on Geographic Information, March 1996, Barcelona, Proceedings Vol. 2, p. 868-877.
Swedish Standards Institution, SS 63 70 03, Addresses, 1st. edition, Swedish Standards Institution, 1998, Stockholm, (Publication only in Swedish: ”Belägenhedsadresser”)
Danish Address project website: http://www.adresseprojekt.dk/

Bio - Morten Lind is with the National Survey and Cadastre in Copenhagen, Denmark

OAKLAND COUNTY, MICHIGAN ADDRESS STANDARDS AND DATABASE DESIGN

Nancy von Meyer
Dan Parr
Scott Oppmann

Introduction

Oakland County has reviewed existing address data standards and developed a data model that will serve as a framework for the development of the Address Management Program and all county applications that use address data. The data in these models will serve to coordinate address management within county government. The data will be the basis for interaction and exchange with local cities, villages, and townships (CVTs). The standards and data will provide recommendations for address data format to the CVTs and develop processes that will allow interaction and exchange of data with every CVT. The goal is to standardize at the county level and allow for adequate flexibility at the CVT.

Address Data Standards

There will be two standards for address content in the county, a service delivery address and a mailing address. Service delivery addresses are assigned as identifiers for places where the county or local government has or may have responsibilities, e.g., parcel, structure, and occupancy. Not all places may have or require a street address. Service delivery or place address is seen as one type of identifier that indicates location, but it is not the only one or even the most efficient one in many instances. These standards are adopted with the understanding that street address is not the universal location identifier for places that have location in Oakland County, but it is the identifier that allows for the most effective and efficient interaction with citizens because it is commonly understood and accepted. These standards do not intend nor encourage the assignment of street addresses to everything with a location in the County.

The second address standard will govern addresses used for mail delivery. These include street address, postal routes, military and international addresses. The intent is to impose standards on content and format that are sufficient to ensure the proper delivery of items mailed by the county. CVTs will benefit from this as well because the standard ensures proper mail delivery. The second aspect of mailing address standards is that the mailing address used for a place or person depends on the type of information being sent. As examples a tax bill may be mailed to a bank or a mortgage company, a water bill may be mailed to the occupant, and a legal notice affecting the property may be sent to the landowner.

The dichotomy imposed on address data is designed to maximize control and flexibility for the functions of county government. The design and implementation is also designed to provide maximum benefit to the CVTs and others who are dependent on address data in the county. This document describes the data design of these two standards. An implementation plan will describe the linkage to other systems, migration strategies, and other deployment details.

Service Delivery Address

Local government is responsible for providing services to many places within its jurisdiction. Most of these use street address as an intelligent identifier that allows one to know the location of the place by looking at the address. This identifier is pervasive in county files. It is, however, difficult to automate because the “intelligence” built into the identifier can be inconsistent at best and incorrect at worst. The service delivery standard is adopted to impose as much order as possible and to provide a foundation for dealing with the inconsistencies that have developed in Oakland County over the past 200 years.

A service delivery location is any place in Oakland County that receives or potentially receives a service. A service could come from the County, CVTs, or other jurisdictions. Examples of services include tax assessment, well and septic permits, law enforcement and emergency response. By definition a service location has an X,Y,Z location because it is a known location within the County. The accuracy and placement of the location in a mapping environment, such as the County’s geographic information system (GIS), is verified through use and may be managed as a standard component of the GIS.

The basis for the standard comes from the recently adopted FGDC Standard for geographic point locations FGDC-STD-011-2001. The objective of this standard is to create a more favorable environment for developing location-based services within the United States and to increase the interoperability of location services appliances with printed map products by establishing a nationally consistent grid reference system as the preferred grid for National Spatial Data Infrastructure (NSDI) applications. This standard defines the US National Grid. The U.S. National Grid is based on universally defined coordinate and grid systems and can, therefore, be easily extended for use world-wide as a universal grid reference system.

The proposed elements for the content of a service delivery address would be as shown:

Service Delivery Address
· Service Delivery Address ID
· Prefix1
· Prefix2
· Street Number
· Fractional Street Number
· Street Name
· Street Type Code
· Suffix1
· Suffix2
· Unit1 Type
· Unit1 Number
· Unit 2 Type
· Unit2 Number
· Physical Municipality
· Site Address Name
· Site Address Vanity
· Date Created (replaced with active)
· Date Updated
· Data Source
· Date of Data Source
· X
· Y
· Z
· Site Coordinate Type
· USNG Number (generated)
· GeoCode Address
· Status
· Status Date

Because this is a service delivery address and it has a location, the County, Taxing CVT, Zip Code and State can be easily generated by combining this information with other GIS information. The decision to generate or store the data will be determined by what will be most efficient and effective for a majority of transaction applications.
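As an illustration only, the proposed service delivery address elements could be carried into a record type along the following lines. The field names follow the element list; the Python types, the choice of which fields are required, and the sample values are assumptions, since the physical design is left to the implementation plan.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ServiceDeliveryAddress:
    # Required in this sketch (an assumption, not a rule from the standard).
    service_delivery_address_id: int
    street_number: str
    street_name: str
    street_type_code: str
    physical_municipality: str
    # Optional address components.
    prefix1: Optional[str] = None
    prefix2: Optional[str] = None
    fractional_street_number: Optional[str] = None
    suffix1: Optional[str] = None
    suffix2: Optional[str] = None
    unit1_type: Optional[str] = None
    unit1_number: Optional[str] = None
    unit2_type: Optional[str] = None
    unit2_number: Optional[str] = None
    site_address_name: Optional[str] = None
    site_address_vanity: Optional[str] = None
    # Record provenance ("Date Created" is replaced by the active status).
    date_updated: Optional[date] = None
    data_source: Optional[str] = None
    date_of_data_source: Optional[date] = None
    # Location; Z may stay empty in an initial deployment.
    x: Optional[float] = None
    y: Optional[float] = None
    z: Optional[float] = None
    site_coordinate_type: Optional[str] = None
    usng_number: Optional[str] = None
    geocode_address: Optional[str] = None
    status: Optional[str] = None
    status_date: Optional[date] = None


# Invented sample record for illustration.
example = ServiceDeliveryAddress(
    service_delivery_address_id=1,
    street_number="100",
    street_name="Elm",
    street_type_code="ST",
    physical_municipality="White Lake",
    prefix1="N",
)
```

County, Taxing CVT, Zip Code and State are deliberately absent from the record, reflecting the option of generating them from other GIS information rather than storing them.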

Street Name

Every valid street name needs to be identified and included in the database. To be accurate, a complete street name must include all of the concatenated elements that make it unique, such as a directional “prefix,” “street name” and “street type,” e.g., N. Elm Street, Elm Lane, W. Elm Avenue, Elm Circle. As the physical data files are implemented it may be most efficient to have a separate list of valid street names by jurisdiction as a quick look up table. The implementation plan will include a method for determining which agencies have the final authority on official street names.

Alias

Many streets have more than one name in common use. The general rule is that any street will have one official street name. All reasonable steps will be taken to be flexible with the public and existing anomalies in the address system. For example, State Route 59 shares the roadway with a portion of Elizabeth Lake Road. Not all of M59 is along Elizabeth Lake Road and not all of Elizabeth Lake Road is shared with M59. These types of aliases are common and are known as routes. These will be managed as segments in the GIS or there may be specific look up tables for specific address points. The implementation plan will test various scenarios for handling routes and determine which works best for Oakland County.

Another type of alias can arise from former names being used as if they were the current name. The status attribute will assist in managing these types of address labels. For example a single point might have a current address and it may have a formerly known as address. Because these are the same location, it will be easier to recognize the history of the aliases. Other aliases arise from specific application uses. For example, some legacy applications may drop the prefix and the suffix. A methodology for managing these legacy elements as well as those arising from disagreement on name spellings and form will be described in the implementation plan.

Geographic Coordinates

Geographic coordinates are shown in the Service Delivery Address for this level of design. However, the implementation plan will include a methodology for testing the best implementation of the X,Y,Z component of the Service Delivery Address.

Status

Tracking a status of service delivery addresses enhances address management by adding the ability to handle parcels that have not been assigned addresses by local jurisdictions and may even make it possible to provide “pending” or “interim” addresses automatically for those applications that depend on dispatching people and equipment. CVTs and the County may store many data elements for structures, from the plans and permits to the licensing, taxation, and use. The status allows the service delivery address to be tagged as interim or pending. The other status information includes added or active and deleted. In the physical design it may be desirable to track the dates of the various stages of the addresses. In this case the status is expanded to be interim (y/n) and interim date, active (y/n) and active date and deleted (y/n) and deleted date.
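The expanded status (a y/n flag plus a date for each stage) might be sketched as follows. The class and method names are hypothetical; the only rule taken from this document is that deleting an address sets active to no and deleted to yes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressStatus:
    interim: bool = False
    interim_date: Optional[date] = None
    active: bool = False
    active_date: Optional[date] = None
    deleted: bool = False
    deleted_date: Optional[date] = None

    def mark_interim(self, on: date) -> None:
        """Tag the address as an interim (pending) address."""
        self.interim = True
        self.interim_date = on

    def activate(self, on: date) -> None:
        """Record that the address has been assigned or is in use."""
        self.active = True
        self.active_date = on

    def delete(self, on: date) -> None:
        """Deleting flips active to no and deleted to yes; earlier dates are kept."""
        self.active = False
        self.deleted = True
        self.deleted_date = on


# Invented dates for illustration.
status = AddressStatus()
status.mark_interim(date(2002, 8, 1))
status.activate(date(2002, 9, 1))
status.delete(date(2003, 1, 15))
```

Keeping the earlier dates after deletion preserves the history of the address, which is the reason for tracking the stages separately.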

Street Centerline Database

The purpose of the Street Centerline Database is to record and maintain a node-segment-node street centerline file that contains data that describe address segments along the roadway. The Street Centerline Database can also be related to a road right of way database that would contain information about the road surface itself as well as infrastructure and services that are located within the right of way. The Street Centerline Database contains block-by-block address ranges and can be enhanced to include routes and routing information such as turning impedances. The Service Delivery Address will be related to the Street Centerline Database because these are both location and physical location addresses.

Street Centerline
· Segment ID
· Street Prefix
· Street Name
· Street Type
· Low even
· Low odd
· High even
· Low even actual
· Low odd actual
· High even actual
· High odd actual
· Alias Name

Both actual and potential address ranges may be maintained because they have important uses for multiple functions of government. This also will keep the Address Management Program in compliance with the National Emergency Number Association’s (NENA) standards that are designed to serve the needs of emergency response agencies such as fire and police. The current street centerline GIS data set is in a DIME format, meaning it has a left and right address range. The County may explore developing a “Nickel” format that has single side address ranges.

Note also the Street Centerline data set has alias names for the street segment. This may include route names or other also known as names.

Mailing Address

The basis for the postal address standard comes from the proposed Federal Geographic Data Committee (FGDC) Address Standard. The address standard proposes standards to govern content, i.e., the components of an address. The address content standard was developed to include all of the relevant elements of other existing national and international standards. By adopting it, Oakland County should ensure that it would be compatible with most mailing address data developed in the USA.

Postal Address
· Postal Address ID
· Prefix1
· Prefix2
· Street Number
· Fractional Street Number
· Street Name
· Street Type Code
· Suffix1
· Suffix2
· Unit1 Type
· Unit1 Number
· Unit 2 Type
· Unit2 Number
· Alternate Address 1
· Alternate Address 2
· City
· State Abbreviation
· Country
· Zip Code
· Zip Plus 4 code
· International State
· International Postal Code
· Date Created (replaced with active)
· Date Updated
· Data Source
· Date of Data Source
· Postal Address Type

The section of the FGDC standard called “mailing address” is the specific source. Because this standard includes components that are not Oakland County location identifiers, e.g., “alternative address, rural route, international state,” those elements are removed. This leaves the data elements shown in the Postal Address box for address identifiers. The mailing address standard will include all components of the FGDC mailing standard. The implementation plan will include tasks to develop cross walks between existing systems that use postal addresses and this standard.

International Addresses

The components of an international address, the International State and the International Postal Code, are included in the postal address table. Because of Oakland County’s proximity to Canada these components may be particularly important.

Address Type

The address type indicates whether the address is a military mailing address, rural route, or postal box. This helps analyze the expected content of the postal address.

Alternate Addresses

The alternate address lines contain the postal box, military address, highway contract (not in Oakland County) and rural route designations. These are not the locations of non-postal alternate addresses or vanity addresses.

Status

See the discussion of status under the service delivery address. As with the service delivery address, the status is tracked for when the postal address became active and when it was deleted. The only difference is that postal addresses do not have an interim status.

Address Data Model

The data model used to structure the adopted standards is designed to keep the address data as simple and accessible as possible. The address data model sees address as a single data entity that can be used in many ways. The design ensures that the address values are as accurate, consistent, and usable as possible. The address reference database will serve as a foundation for all other applications in the County, and potentially in the CVTs as they adopt more and more technology to speed and improve their services.

The purpose of the Address Database is to record and maintain the street address components for all parcels, structures and occupancies that use street address as a location identifier and do or may require government services. The purpose of a mailing address database is to record and maintain mail delivery data for all persons and organizations that receive mail from the County.

The relationship of the two address standards to each other and to other systems will be more fully described through the results of testing in the implementation. The implementation plan will describe the steps. From a high level design perspective the relationship of the two standards based on a sampling of some of the systems to which they may connect or relate is as shown. The heavier outlined boxes are directly related to the address management while the other boxes represent related systems and applications.

[Figure: relationship of the two address standards to related systems. Boxes shown: Domain of Values for Service Location Address (Look up Tables including Street Segment GIS data set); Critical Places GIS Database; Service Location Address; Postal Address; Parcel Identifier; Parcel (Map and Core Data); Postal Address Cleaning Application; Person (Addressee) Names (not in scope).]

Domain of Values Look Up Tables

The domain of values look up tables include look-ups for street names, city names and state names. By providing as many look up tables as possible for the domains of values in the address tables (both service delivery and postal), the County will achieve a level of consistency in the address information. The implementation plan will define specifically which attributes can be supported by domain of value look up tables and the content of those domains. Two examples to consider are the handling of zip code and street names.

In the case of zip codes, we all know from personal experience that the zip code is directly related to the postal city or mailing city in the address as well as the state information. There are many tables and applications available that keep this information current. It may be that the zip code and the city and state look up tables are part of an application or subscription that helps keep the County’s mailing information current.

In the case of street names, there will need to be a series of look up tables. One possible configuration is as shown below. The street name table contains the list of all valid names. However, the street name by itself is not sufficient to provide the level of standardization needed in the County. The street name needs to be combined with the prefix and suffix information and then related to a specific CVT. For example, White Lake might have an East and West Elm Street and Bloomfield Hills might have a North and South Elm Boulevard. It is the combination of the street elements with the municipality that standardizes the street name component. This standardization may be taken even further by specifying the allowable address numbers for a given street in a given jurisdiction.

[Figure: street name look up tables. Boxes shown: Street Name; Prefix or Prefixes; Suffix, Suffixes and Street Type; Combined Street Name; Municipality (CVT); Allowable Street Name for a CVT or Municipality.]
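The idea of an "allowable street name for a CVT" table can be sketched as a look-up keyed by municipality. The table contents below simply restate the Elm Street and Elm Boulevard example from the text; the structure, the normalization rules and the function names are assumptions, not the County's design.

```python
from typing import Dict, Optional, Set, Tuple

# Combined street name: (prefix, street name, street type, suffix).
CombinedName = Tuple[str, str, str, str]

# Hypothetical look up table built from the example in the text.
ALLOWED: Dict[str, Set[CombinedName]] = {
    "White Lake": {("E", "ELM", "ST", ""), ("W", "ELM", "ST", "")},
    "Bloomfield Hills": {("N", "ELM", "BLVD", ""), ("S", "ELM", "BLVD", "")},
}


def normalize(part: Optional[str]) -> str:
    """Upper-case a component and drop surrounding spaces and a trailing period."""
    return (part or "").strip().rstrip(".").upper()


def is_allowed(cvt: str, name: str, street_type: str,
               prefix: Optional[str] = None,
               suffix: Optional[str] = None) -> bool:
    """Check a combined street name against the list for one CVT."""
    key = (normalize(prefix), normalize(name),
           normalize(street_type), normalize(suffix))
    return key in ALLOWED.get(cvt, set())
```

For instance, `is_allowed("White Lake", "Elm", "St", prefix="E")` is accepted, while the same street name with a "Blvd" type is rejected for White Lake because only Bloomfield Hills lists it.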

Further standardization and validation may include verifying that assigned street numbers fall within allowable address ranges in the street segment file.
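A range check of this kind might look like the sketch below. It assumes each street segment stores one low/high pair for even numbers and one for odd numbers, loosely following the Street Centerline attributes; the class name, field names and sample ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SegmentRange:
    segment_id: int
    low_even: int
    high_even: int
    low_odd: int
    high_odd: int

    def contains(self, street_number: int) -> bool:
        """True when the number falls inside the range for its parity."""
        if street_number % 2 == 0:
            return self.low_even <= street_number <= self.high_even
        return self.low_odd <= street_number <= self.high_odd


def find_segment(street_number: int,
                 segments: Iterable[SegmentRange]) -> Optional[SegmentRange]:
    """Return the first segment whose range covers the street number, if any."""
    for segment in segments:
        if segment.contains(street_number):
            return segment
    return None


# Invented ranges for two consecutive blocks of one street.
blocks = [
    SegmentRange(segment_id=1, low_even=100, high_even=198, low_odd=101, high_odd=199),
    SegmentRange(segment_id=2, low_even=200, high_even=298, low_odd=201, high_odd=299),
]
```

A street number that matches no segment, such as 400 in the sample data, would be a candidate for review rather than an automatic rejection, since actual and potential ranges can differ.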

Anomalies and Relationships

The basic address reference table will have multiple relationships with other geographic data and application specific databases. Some of these tables will be used to manage anomalies in the existing address network such as alias names that were mentioned above. There is also an associated database of critical places that will contain landmarks and other physical places where people reference their location. The specification of this database is not part of the address design but the linkage to this database will be explored in the implementation plan.

Parcel

The parcel/address relation table depends on the creation of a parcel table as part of the land records for the county. Coordinates for a centroid could be stored at this level because they apply to the place “parcel” and not to the address. The relation table is the link between address and parcel data. The address point could be placed on the structure to which an address is assigned.

Structure

As with parcel, structures should be stored and managed in a structure table that includes descriptive and administrative data that are created by local and county government. This table should be a part of the land record and would contain the coordinates that would be used to map addresses when they are available.

Postal Address Management

Postal addresses will be maintained in a separate database(s) that can be more flexible given the uncertain and inconsistent nature of postal addresses. The county has no ability to control content and little control over format for postal or mailing addresses because they can be inside or outside the county or country.

The maintenance of a central postal database will allow for the elimination of duplicate addresses. By assigning a postal address ID to each new mailing address record the application specific databases can eliminate all other postal address data elements. This database would grow very large over time, but it would allow for a check during the data entry process for any mailing address in the many county application systems. The postal address ID would be different from the service delivery address ID because they should be arbitrary system assigned numbers for data that can change independent of each other. However, no valid Oakland County address would ever need to be stored completely in any application files.
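The central postal database with system-assigned IDs could be sketched as a small registry that returns the existing ID when a mailing address is already known. This is a minimal illustration under stated assumptions: the class name, the whitespace and case normalization, and the sample addresses are invented, and real address matching would need far more careful cleaning.

```python
from typing import Dict, Tuple


class PostalAddressRegistry:
    """Assigns one arbitrary ID per distinct (normalized) mailing address."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, ...], int] = {}
        self._next_id = 1

    @staticmethod
    def _key(*parts: str) -> Tuple[str, ...]:
        # Upper-case and collapse runs of whitespace so trivial variants match.
        return tuple(" ".join(part.upper().split()) for part in parts)

    def get_or_create(self, address_line: str, city: str,
                      state: str, zip_code: str) -> int:
        """Return the existing postal address ID, or assign the next one."""
        key = self._key(address_line, city, state, zip_code)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


# Invented addresses for illustration.
registry = PostalAddressRegistry()
first = registry.get_or_create("100 Example St", "Pontiac", "MI", "48341")
same = registry.get_or_create("100  example st ", "pontiac", "mi", "48341")
other = registry.get_or_create("200 Example St", "Pontiac", "MI", "48341")
```

An application database would then store only the returned ID, which is the check during data entry that the text describes.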

People

It was decided that “addressee” data management was outside the scope of the current effort. It is, however, important to note that there is a lot of duplicate and inconsistent data on people in county databases and files. The development of a standard, and a data model that could serve these data has the potential of simplifying and improving the management of people data.

Appendix A - Attribute Definitions

Postal Address Attribute Definitions

Postal Address ID - Primary key for the table

Prefix1 - This is normally a direction that precedes a site address. It will be standardized from a look up table. This is called a predirectional field in the postal standards and the EPA standards

Prefix2 - This is a secondary prefix for the address. This is only filled if the prefix 1 is not a directional prefix or there are two prefixes. This is not the second coordinate.

Street Number - The number assigned to a building or a land parcel along the street to identify location and to ensure accurate mail delivery. This is the number of the site. For the coordinate addresses it is NxxxWxxx, that is, include both numbers with no space.

Fractional Street Number - A sub-number to a street number

Street Name - This is the name of the street and this will be a part of the street naming list that will be provided to all municipalities for use as they approve new street names. This is the official name of a street assigned by a local governing authority. The street name will be concatenated with the street type and the municipality to verify street names in data entry and data maintenance programs.

Street Type Code – This is the abbreviation for the street type. The Postal Service provides a standard list of abbreviations for street types.

Suffix1 - Postdirectional field in the FGDC standard. The directional symbol that represents the sector of a city where a street address is located. This is a post directional field in the postal standards. These should be standardized to be the cardinal directions as with the prefixes.

Suffix2 - This is a post directional field in the postal standards. The second suffix is only filled in if the First Suffix is filled in.

Unit1 Type - This is the unit type, such as building, apartment, suite or office. The standards for these are listed in Appendix C of the postal standards. Called Secondary address identifier in the FGDC standard.

Unit1 Number - This is the number of the first unit and can contain letters or numbers.

Unit 2 Type - If Unit 1 location is already identified and there is a second unit like Office 1 Suite 32, then Office 1 is in the first field and the Suite is in the second field. Another example is in condominiums like building 1 unit 5.

Unit2 Number - This is a number and can contain numbers or letters.

Alternate Address 1 – This is the first alternative mailing address such as a post box, a rural delivery or military address. If this field is populated the Address type field indicates what type of alternate address this is. These are not the locations of non-postal alternate addresses or vanity addresses.

Alternate Address 2 – This is the second line of an alternative address if needed. This field is only used for a second line of an alternative address.

City - A finer partitioning of geographic subdivisions of a state or county, usually associated with additional levels of government. Note that the taxing municipality for a parcel or structure may be different than the mailing address. The tax files will contain the mailing municipality.

State Abbreviation - Two-character abbreviation for the name of a state, U.S. Territory, or Armed Forces ZIP Code Designation (“AA”, “AE”, or “AP”).

Country – The largest of the geo-political boundaries that define address areas of the world.

Zip Code - A five-digit code that identifies a specific geographic delivery area. ZIP Codes can represent an area within a state, an area that crosses state boundaries (unusual condition) or a single building or company that has a very high mail volume. “ZIP” is an acronym for Zone Improvement Plan.

Zip Plus 4 code - “ZIP” is the five-digit ZIP code (refer to ZIP Code); “+4” describes the last four positions of a ZIP+4 code. Most delivery addresses are assigned a single ZIP+4 Code. However, large companies may be given a range of ZIP+4 Codes that can be used to route mail to a specific department.

International State – This is the first division of a non-US Country such as a Canadian province.

International Postal Code – This is the non-US postal code

Date Created – Removed – see active date below

Date Updated – The date the record was updated or changed, not the same as the active or deleted date

Data Source – (File or source name) The file or source for the original record

Date of Data Source – The date of the file or source for the original record

Active – This is the active status. If the site address is active this field is yes. When a site address is deleted this becomes no. This is yes for interim site addresses if these interim addresses are active. That is, the interim address has been used or assigned.

Active Date – This is the date the site address became active.

Active Note – Any notes about the address’ active status

Deleted – This is the deleted status. When a site address is deleted the active field becomes no and the deleted field becomes yes.

Deleted Date – This is the date the site address was removed or deleted.

Deleted Note – Any notes about the address’ deleted status

Postal Address Type – The type of postal address. City street, international, military mailing address, rural route, postal box, unknown, secondary or alternate postal address

Service Delivery Address Attribute Definitions

Service Delivery Address ID – Primary key for the table

Prefix1 – This is normally a direction that precedes a site address. It will be standardized from a look up table. This is called a predirectional field in the postal standards and the EPA standards

Prefix2 – This is a secondary prefix for the address. This is only filled if the prefix 1 is not a directional prefix or there are two prefixes. This is not the second coordinate.

Street Number – The number assigned to a building or a land parcel along the street to identify location and to ensure accurate mail delivery. This is the number of the site. For the coordinate addresses it is NxxxWxxx, that is, include both numbers with no space.

Fractional Street Number – A sub-number to a street number

Street Name – This is the name of the street and this will be a part of the street naming list that will be provided to all municipalities for use as they approve new street names. This is the official name of a street assigned by a local governing authority. The street name will be concatenated with the street type and the municipality to verify street names in data entry and data maintenance programs.

Street Type Code – This is the abbreviation for the street type. The Postal Service provides a standard list of abbreviations for street types.

Suffix1 – Postdirectional field in the FGDC standard. The directional symbol that represents the sector of a city where a street address is located. This is a post directional field in the postal standards. These should be standardized to be the cardinal directions as with the prefixes.

Suffix2 – This is a post directional field in the postal standards. The second suffix is only filled in if the First Suffix is filled in.

Unit1 Type – This is the unit type, such as building, apartment, suite or office. The standards for these are listed in Appendix C of the postal standards. Called Secondary address identifier in the FGDC standard

Unit1 Number – This is the number of the first unit and can contain letters or numbers.

Unit 2 Type – If Unit 1 location is already identified and there is a second unit like Office 1 Suite 32, then Office 1 is in the first field and the Suite is in the second field. Another example is in condominiums like building 1 unit 5.

Unit2 Number – This is a number and can contain numbers or letters.

Physical Municipality – This is the municipality where the site is physically located. This is generally the taxing municipality and may be different than the mailing city.

Site Address Name – This is the named location for a site such as the Pontiac Silver Dome. The named site location may aid with navigation to a site or help identify or find a site.

Site Address Vanity – This is the vanity address for the site that may or may not be used for mailing. For example, 1 Chrysler Center may be a vanity address for the Chrysler Tech Center.

Date Created – Removed; see Active Date below.

Date Updated – The date the record was updated

Data Source – (File or source name) The file or source for the original record

Date of Data Source – The date of the file or source for the original record

X – The Easting or X coordinate value for a point representing the site. This may be the X coordinate of the parcel centroid or a location on a building.

Y – The Northing or Y coordinate value for a point representing the site. This may be the Y coordinate of the parcel centroid or a location on a building.

Z – The Elevation or Z coordinate value for a point representing the site. This may not be populated in the initial deployment and may only be used where multistory site addresses exist.

Site Coordinate Type – The type of site coordinate: centroid, driveway point, building, or other.

USNG Number (generated) – The U.S. is divided into 6-degree longitudinal zones designated by a number and 8-degree latitudinal bands designated by a letter. Each area is given a unique alphanumeric Grid Zone Designator (GZD) (e.g., 18S). Each 6 x 8 degree GZD area is covered by a specific scheme of 100,000-meter squares, where each square is identified by two unique letters (e.g., 18SUJ identifies a specific 100,000-meter square in the specified GZD). The UTM grid coordinates are expressed in terms of Easting (E) and Northing (N) values to determine a point position within the 100,000-meter square. An equal number of digits is used for the east and north coordinate values, where the number of digits depends on the precision desired in position referencing. In the USNG standard the coordinates are read from left to right, with Easting first and then Northing. A software tool will be used to generate the USNG, and its usability will be explored as a part of pilot testing: http://www.ngs.noaa.gov/TOOLS/usng.html
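As a rough illustration of the equal-digit rule described above, the following Python sketch assembles a USNG string from its parts. The function name format_usng and its parameters are hypothetical and are not taken from the NGS tool. The sketch assumes that the Easting and Northing are already expressed in meters within the 100,000-meter square, and that lower precision is obtained by truncating digits rather than rounding.

```python
def format_usng(gzd: str, square_id: str, easting_m: int, northing_m: int,
                digits: int = 4) -> str:
    """Build a USNG string: GZD, 100,000-meter square ID, Easting, Northing.

    Hypothetical helper. easting_m and northing_m are offsets in meters
    inside the 100,000-meter square (0 to 99,999).
    """
    if not 1 <= digits <= 5:
        raise ValueError("digits must be between 1 and 5")
    for value in (easting_m, northing_m):
        if not 0 <= value < 100_000:
            raise ValueError("offsets must fall inside the 100,000-meter square")
    # Zero-pad to five digits (1-meter precision), then keep the leading
    # digits so Easting and Northing always have the same length.
    easting = f"{int(easting_m):05d}"[:digits]
    northing = f"{int(northing_m):05d}"[:digits]
    return f"{gzd} {square_id} {easting} {northing}"


# Easting is written first, then Northing (values are made up).
print(format_usng("18S", "UJ", 23456, 6789, digits=4))  # 18S UJ 2345 0678
```

With four digits each, the coordinate pair locates a point to within 10 meters; five digits would give 1-meter precision.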

Interim – This is interim status of the site address. If this field is yes then this is an interim address. See the active field as well.

Interim Date – This is the date the address became an interim address.

Interim Note – Any notes about the address’ interim status

Active – This is the active status. If the site address is active this field is yes. When a site address is deleted this becomes no. This is yes for interim site addresses if these interim addresses are active, that is the interim address has been used or assigned.

Active Date – This is the date the site address became active.

Active Note – Any notes about the address’ active status

Deleted – This is the deleted status of the site address. When a site address is deleted the active field becomes no and the deleted field becomes yes.

Deleted Date – This is the date the site address was removed or deleted.

Deleted Note – Any notes about the address’ deleted status
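The relationship between the Active and Deleted elements can be sketched in a few lines of Python. The class and function names below are hypothetical and cover only the status fields described above; this is an illustration of the stated rule, not the system's actual implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SiteAddressStatus:
    """Hypothetical subset of the status elements of a site address."""
    active: bool = True
    active_date: Optional[date] = None
    deleted: bool = False
    deleted_date: Optional[date] = None
    deleted_note: str = ""


def delete_site_address(status: SiteAddressStatus, when: date,
                        note: str = "") -> SiteAddressStatus:
    """Apply the rule: on deletion, Active becomes no and Deleted becomes yes."""
    status.active = False
    status.deleted = True
    status.deleted_date = when
    status.deleted_note = note
    return status


# Dates are made up for the example.
status = SiteAddressStatus(active=True, active_date=date(2002, 1, 15))
delete_site_address(status, date(2002, 8, 1), note="Parcel combined")
print(status.active, status.deleted, status.deleted_date)
```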

GeoCode Address - The address components concatenated to match the format requirements for address geocoding. This element will be computed from the component parts.
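A minimal Python sketch of how the GeoCode Address might be computed from its component parts follows. The field subset, the component order, and the upper-casing are assumptions; the actual format depends on the requirements of the geocoding software, which are not specified here. The example address is made up.

```python
from dataclasses import dataclass


@dataclass
class SiteAddress:
    """Hypothetical subset of the site address elements listed above."""
    street_number: str
    street_name: str
    street_type: str = ""
    prefix1: str = ""
    suffix1: str = ""
    fractional_number: str = ""


def geocode_address(addr: SiteAddress) -> str:
    """Join the non-empty components in a conventional street-address order."""
    parts = [
        addr.street_number,
        addr.fractional_number,
        addr.prefix1,
        addr.street_name,
        addr.street_type,
        addr.suffix1,
    ]
    # Skip empty components so no double spaces appear, and upper-case
    # the result so that matching is not case sensitive.
    return " ".join(p.strip().upper() for p in parts if p and p.strip())


example = SiteAddress(street_number="1200", prefix1="N",
                      street_name="Telegraph", street_type="Rd")
print(geocode_address(example))  # 1200 N TELEGRAPH RD
```

Because the value is derived, it would be recomputed whenever any component changes rather than edited directly.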

Bio – Nancy von Meyer is with Fairview Industries in Pendleton, South Carolina.

Dan Parr is with Dan Parr Associates in Takoma Park, Maryland.

Scott Oppmann is the GIS Utility Manager in Oakland County, Michigan.

FIRST STEP TO ADDRESSING THE STREET DICTIONARY

Amy J. Purves, P.E.

Introduction

In late 1998, Prince George’s County, Maryland was faced with the prospect of implementing several new systems, including a Permitting System and a new Computer Aided Dispatch (CAD) System. Both of these systems have requirements for access to Geographic Information System (GIS) data and to related attribute data. In particular, they both require access to accurate street centerline information, including consistent street names, accurate premise addresses, and correct address ranges. The new CAD system, to be provided by Tiburon, Inc., requires a geographic centerline file instead of the tabular CAD file now in use.

In order to ensure that both of these new systems could be supported, an analysis of the street and address information currently being maintained by the County and by the Maryland-National Capital Park and Planning Commission (M-NCPPC) was undertaken. Many sources of street information were identified, including a street index file, a TIGER based file, a planimetric centerline file, a street name validation table, premise addresses in the Assessor/Treasury (A/T) system, permits addresses, and the CAD street segment file. Each of these data sources had independently developed or mutated the street names and conventions used.

To provide the data needed for the new CAD system, it was necessary to build a geographic based street segment file, merging data from the available sources. The tabular CAD street segment file in use by the previous CAD system contained fire box areas and police reporting district attributes for each 100-block street segment. However, there was no geographic element in this data to link it to a TIGER type file or other centerline file.

This paper describes the many sources of street information that were used in this effort and the process that was developed for conflating the sources. Although the primary goal was to develop accurate address ranges and landmark address points, it was determined that this could not be accomplished without consistent, formatted street names.

Street Name Sources

Prince George's County is typical in the types and variety of street information that is maintained for many purposes. Most local governments do not yet have a single, centralized Street Dictionary that all applications, whether GIS related or otherwise, can access. The Street Dictionary serves many purposes: 1) a look up of valid streets to validate data entry; 2) the single source of approved street names and their status; and 3) a resource for correcting street names used in the wide variety of address and street based files and applications.

Street names usually are stored in a variety of systems which fulfill different operational needs: assessor's parcel data, subdivision information, health records, permits databases, multiple centerline files (TIGER-based, planimetric derived), local street maps, public works roads inventories, and of course 911 systems. The physical street sign may be considered the final arbiter of correct street names in many jurisdictions. Each of these has different requirements and probably handles directional prefixes/suffixes, zip codes, and street type conventions in different ways. Even if the USPS standard for field formats and street types is adhered to, the use of the directional prefix and suffix may vary, as may conventions related to hyphens and apostrophes in street names.

The systems in use typically reside in a variety of environments, including mainframe systems with centralized VSAM/COBOL systems, client/server systems using databases distributed on NT servers, and stand-alone PC databases. The process of extracting and formatting data for comparison purposes can be complex.

Street Naming Agencies

Many agencies and departments are involved in assigning street names, maintaining street name data, and developing attribute data for streets.

The current mainframe permitting systems at the County directly access the Assessor/Treasury (A/T) system to obtain parcel related information, including streets and addresses. The Environmental Health Department address information is maintained completely separately as they do not have direct access to the A/T system. M-NCPPC is responsible for assigning new street names and correcting premise addresses in the County A/T system. The GIS department maintains a centerline file derived from planimetric mapping. They also are responsible for the development of the new street information for the CAD system. The Department of Public Works maintains a roadway inventory with associated pavement and condition information.

One of the first steps in establishing a street dictionary is to identify a team with representatives from all involved agencies. This team should have technical expertise so that the members can resolve formatting and data maintenance issues. The responsibilities of each agency must be clearly defined. These include approval of new street names, maintenance of the dictionary, processing of street name change or correction requests, street sign placement, maintenance of the centerline files, and oversight of QC and validation procedures.

All agencies should ideally have access to the same computing environments so that the street dictionary may be directly accessed and not maintained in duplicate on several systems. Prince George's County has implemented an intranet site that allows all connected agencies to submit corrections to street centerline information.

Street Dictionary Requirements

The Street Dictionary must be developed and maintained in a format that can be accessed and used by a variety of applications and systems. At its simplest level, it is a tabular list of valid street names in USPS format. Each valid name should have a unique identifier that can be used in any County system to reference that name. Ultimately, other databases would contain only the unique street name identifier, not the four fields required to hold the text for the name. This helps to ensure that inconsistent names do not "creep" into existence through data entry errors. Since most County and City jurisdictions have duplicate street names occurring within different towns or neighborhoods, the zip code should be included with each street name. A street that defines a boundary between two zip codes would have two records, one for each zip code.

The TIGER centerline file format and other centerline formats include the full street name as part of each segment's attributes, including a left and right side zip code. In most cases, these street names can only be validated against the street dictionary since applications are not designed to expect the unique identifier to be used instead of the full name.
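The dictionary described above can be pictured as a small lookup structure. The Python sketch below is illustrative only: the class name, the choice of prefix, name, type, and suffix as the four name fields, and the sample street and zip codes are all assumptions. It shows a unique identifier per valid name and zip code, and validation of a full name as a centerline file would require.

```python
from typing import Dict, Optional, Tuple

# (prefix, name, street type, suffix, zip code); field choice is an assumption.
StreetKey = Tuple[str, str, str, str, str]


class StreetDictionary:
    """Tabular list of valid street names, each with a unique identifier."""

    def __init__(self) -> None:
        self._ids: Dict[StreetKey, int] = {}
        self._next_id = 1

    @staticmethod
    def _key(prefix: str, name: str, street_type: str, suffix: str,
             zip_code: str) -> StreetKey:
        return (prefix.strip().upper(), name.strip().upper(),
                street_type.strip().upper(), suffix.strip().upper(),
                zip_code.strip())

    def add(self, prefix: str, name: str, street_type: str, suffix: str,
            zip_code: str) -> int:
        """Register a valid name and return its identifier."""
        key = self._key(prefix, name, street_type, suffix, zip_code)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

    def lookup(self, prefix: str, name: str, street_type: str, suffix: str,
               zip_code: str) -> Optional[int]:
        """Return the identifier for a full name, or None if it is not valid."""
        return self._ids.get(self._key(prefix, name, street_type, suffix, zip_code))


streets = StreetDictionary()
# A street on a zip code boundary gets two records, one per zip code.
id_a = streets.add("", "Oak", "St", "", "00001")
id_b = streets.add("", "Oak", "St", "", "00002")
print(id_a, id_b)                                     # 1 2
print(streets.lookup("", "oak", "st", "", "00001"))   # 1
print(streets.lookup("", "Oka", "St", "", "00001"))   # None
```

Other databases would then store only the identifier, while files that must carry the full name can still be checked with the lookup.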

Centerline Requirements

Since street centerline files are the best definition of physical streets in a jurisdiction, they can be used as the primary link or reference to the street dictionary. They also can provide a graphic interface and method for maintaining the dictionary. The County has several requirements for centerline-based data. These requirements are described below:

Centerline Requirement – Business Need

1. Spatially accurate centerlines based on pavement edge – Supports a County-wide GIS street base map and small-scale mapping.

2. Address ranges (left/right from low to high) for each segment between intersections – Supports GIS geocoding/incident mapping by locating address features along the centerline.

3. TIGER-based centerline file which defines census geography as well as the street network – Correlates street segments to census areas as well as administrative and reporting areas. Supports demographic analysis.

4. Centerline file which can be loaded into the TGF (Tiburon Geographic Format) in the Tiburon System – The centerline file with address ranges and fire box areas, reporting areas, and municipalities must be able to be loaded into the TGF using the Tiburon GCP (Geofile Conversion Process).

Typically, no single file format can meet all of these requirements. A decision must be made whether to maintain multiple centerline files with coordinated maintenance or to try to develop a single source that meets all requirements. Typically, a planimetrically derived centerline file must be maintained separately since it includes much more geographic detail, many more vertices per segment, more parallel segments, and more intersection detail. It is very difficult to relate this centerline structure to a logical structure primarily used to store street attributes, such as a TIGER based file. In the case of the County, it was necessary at a minimum to conflate the information from the original CAD street segment file with another centerline source in order to provide the format needed for the Tiburon CAD system.
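Requirement 2 is what makes geocoding along a centerline possible. The following Python sketch shows one common way this is done, by linear interpolation within a segment's left or right address range. It is an illustration under stated assumptions, not the County's or any vendor's method: the Segment class and its field names are hypothetical, the segment is treated as a straight line, and the low number is assumed to sit at the start of the segment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """Hypothetical centerline segment between two intersections."""
    start: Tuple[float, float]
    end: Tuple[float, float]
    left_lo: int
    left_hi: int
    right_lo: int
    right_hi: int


def _fraction(number: int, lo: int, hi: int) -> Optional[float]:
    """Position (0 to 1) of number within lo..hi, or None if it does not fit."""
    if not lo <= number <= hi:
        return None
    if number % 2 != lo % 2:
        return None  # odd/even parity says the address is on the other side
    return 0.0 if hi == lo else (number - lo) / (hi - lo)


def locate(segment: Segment, number: int) -> Optional[Tuple[str, float, float]]:
    """Return (side, x, y) for a house number, or None if out of range."""
    sides = (("left", segment.left_lo, segment.left_hi),
             ("right", segment.right_lo, segment.right_hi))
    for side, lo, hi in sides:
        f = _fraction(number, lo, hi)
        if f is not None:
            x = segment.start[0] + f * (segment.end[0] - segment.start[0])
            y = segment.start[1] + f * (segment.end[1] - segment.start[1])
            return side, x, y
    return None


seg = Segment(start=(0.0, 0.0), end=(100.0, 0.0),
              left_lo=101, left_hi=199, right_lo=100, right_hi=200)
print(locate(seg, 150))  # ('right', 50.0, 0.0)
print(locate(seg, 300))  # None
```

The returned point lies on the centerline itself; a real geocoder would usually offset it to the indicated side of the street.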

In the future, a centerline file should also be able to provide a foundation for building a routing system in which attributes such as traffic direction, impedance, and travel times can be added as attributes of segments.

Centerline File Sources

The formats and requirements of several sources of data are described in this section. These include the existing CLINE files maintained by the County, the TIGER file format, the Tiburon file format, and the existing CAD system files.

CLINE File

A centerline file with address ranges (referred to as the CLINE file) has been developed by M-NCPPC and the County. As the result of a complex series of address validation and editing tasks that have been performed, two versions of the file now exist. Version 1 contains the original centerlines developed as part of a photogrammetric compilation process. It also contains address ranges that have been validated by Centech based on the tax maps that have annotated parcel addresses. This version satisfies the requirements of items 1 and 2 in the above table. A second version of the file was developed by OAO. In Version 2, the segment configuration has been simplified by removing parallel segments for most divided highways, reducing the number of segments and nodes. This version more closely resembles the configuration of a TIGER file although the segments are more accurately located. The address ranges were adjusted in this version through automated procedures and are not as accurate as the Version 1 address ranges.

Neither CLINE file contains address ranges for crossovers and exit ramps or interstate highway segments. The Tiburon system requires all street segments to be addressed with non-zero numbers. Exit ramp numbers or highway mile markers can be used. The CAD system used mile markers for the beltway segments.

The CLINE version 1 was compared to the M-NCPPC street name file and index to ensure that correct street names were being used. The main problem with street names and addresses in this file is that many are missing since the file has not been kept up-to-date.

TIGER File

A TIGER format file also consists of street segments with address ranges. However, a TIGER file also includes non-street features such as hydrography and railroads, as well as invisible features such as administrative boundaries that do not correspond to physical features. Each segment in the file contains left and right attributes that indicate what areas the right and left polygons belong to. Examples include municipalities, census tracts, election districts, and planning areas. From these attributes, polygons can be generated and put into separate graphic layers. When a TIGER file is provided in Arc/Info format, the segment features are often divided into separate layers by feature type and the area features are placed in separate polygon layers. However, all of the layers were generated from the same file of segments and attributes.

When a TIGER file is updated, it is necessary to use an application that ensures that the topology of the segment network is maintained. In particular, any changes to polygon areas must be reflected automatically in the segment attributes.

TGF Format

The Tiburon file format is very similar to a TIGER file, particularly the relationship between segment attributes and the areas that they represent. There are several differences between the Tiburon and TIGER formats. However, the GCP (Geofile Conversion Process) that Tiburon provides for loading data automatically makes these changes in the data.

1. In the TGF, two records exist for each segment, one for the left side and one for the right. These are created from the TIGER records when they are loaded through the GCP.

2. Non-street segments are not loaded into the Tiburon file. Therefore, if there is a node that defines a boundary or reporting area split between two intersections, an internal node is created so that each of the two resulting segments can have the correct reporting area attributes.

3. The Tiburon segments do not have “shape points” defining curvature in the segments; however, shape records can be added, each of which defines a single point in a segment. These points add a degree of curvature; each point also corresponds to a break in the address ranges.

4. An intersection file is created when TIGER data is loaded which contains a record for every pair of streets that intersect.

5. A street name index and an alias file are both required. These can be loaded from the M-NCPPC files which are currently maintained in the premise address system and from the CAD system alias file.

6. A commonplace name file is used which needs to be created from the CAD system common place names file.

The TGF requires that firebox areas, reporting areas, and municipalities be stored as attributes of each left and right street segment. These attributes can be assigned automatically from existing polygon areas using the Tiburon Geographic Maintenance System (GMS) tools. The GMS can be used to generate the fire box and reporting area attributes from polygon layers. Alternatively, the attributes can be copied over from the existing CAD files by matching the 100-block records to the new CLINE segments. Then they would be loaded directly into the TGF file along with other CLINE segments and attributes using the Geofile Conversion Process (GCP).

CAD System Files

The current CAD system contains a record for the odd and even side of each 100-block segment. If a street intersects the block in the middle of the 100-block, there are two segments for that side only (for example, 100-136 and 138-198 on the even side, with a single 101-199 record on the odd side). If the intersecting street crosses, there are two records for each side of the block.

The CAD system contains fire box number, reporting area, and municipality as fields for each block record. There may be duplicate records in the CAD file for various spellings of a street name and there may be incorrect records in the file. As long as incoming calls can be matched in the file, the duplicate or incorrect records have not been a problem. They are superfluous but do not interfere with the functioning of the CAD system. The CAD file has not been matched to the M-NCPPC street name file or index, so there may be many inconsistent street names in use.

In order for the transition to the new Tiburon system to be successful, it is important that every call that would currently find a matching record in the CAD system would also find a record in the Tiburon system file. This can be tested by matching the CAD file records to the CLINE file. The street names in the CAD file should first be corrected to improve the match rate. As long as each street name can be matched in either a CLINE record or in the alias file, and the address falls within a segment address range, then operations can be safely supported.
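The test described above can be sketched as a simple coverage check in Python. The function name, the record layouts, and the street names are hypothetical. As a simplification, the sketch checks only the low and high number of each CAD 100-block record against the CLINE segment ranges rather than every address in the block.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]            # (low number, high number)
CadBlock = Tuple[str, int, int]    # (street name, low number, high number)


def find_unsupported_blocks(cad_blocks: List[CadBlock],
                            cline_ranges: Dict[str, List[Range]],
                            aliases: Dict[str, str]) -> List[CadBlock]:
    """Return CAD 100-block records that the new file could not locate."""
    unsupported = []
    for street, lo, hi in cad_blocks:
        # Match the name directly in CLINE, or fall back to the alias file.
        name = street if street in cline_ranges else aliases.get(street)
        ranges = cline_ranges.get(name, [])
        # Both ends of the block must fall within some segment address range.
        covered = all(
            any(r_lo <= number <= r_hi for r_lo, r_hi in ranges)
            for number in (lo, hi)
        )
        if not covered:
            unsupported.append((street, lo, hi))
    return unsupported


cad_blocks = [("MAIN ST", 100, 198), ("OLD MILL RD", 5000, 5098), ("ELM AV", 200, 298)]
cline_ranges = {"MAIN ST": [(100, 136), (138, 198)], "MILL RD": [(5000, 5098)]}
aliases = {"OLD MILL RD": "MILL RD"}
print(find_unsupported_blocks(cad_blocks, cline_ranges, aliases))
# [('ELM AV', 200, 298)]
```

Any record returned by the check is a call location that works today but would fail after the transition, so it needs a name correction, an alias, or a new segment.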

There is also a street name alias file and a commonplace names file in the CAD system that need to be put into the appropriate Tiburon TGF files. The street name alias file should be compared to both the CAD system and the M-NCPPC alias file which is used as part of the premise address system. In the Tiburon TGF alias file, aliases can apply to certain address ranges of the street segment instead of only the whole street.

Centerline File Development

It was determined that the County should develop a centerline file that is derived from both the CLINE versions and a TIGER-based file. The resulting file will match the TIGER file structure, but the segment spatial accuracy will be improved and the validated CLINE address ranges will be used. This product will meet all of the requirements for a centerline file identified above. A conflation process is required to achieve this combined product.

Ideally the CLINE and best available TIGER file would be conflated graphically prior to final street name and address validation and loading to Tiburon. However, because of the very tight schedule for the Tiburon implementation, this step had to be delayed. The two CLINE files were conflated initially and the segments matched to the CAD file. The resulting file was loaded into the Tiburon file system for testing purposes. Additional refinement of the CLINE result, the CAD file sources, and the transfer of attributes from CAD to CLINE were then performed. The first two stages of the development process are illustrated in the diagram that follows.

The process begins with the design of the data model and the establishment of quality standards for all components. The data model was developed in conjunction with the analysis of the permitting data requirements.

Based on the model and the standards, a detailed development process was defined and executed. The development process was structured to ensure the completeness and accuracy of this critical data resource. A thorough evaluation and analysis of all of the available data sources was completed. A detailed, technical review of the Tiburon TGF was completed and input from Tiburon technical staff obtained.

Centerline Development--Stages 1 and 2

· Develop Centerline Data Model--Establish Data Standards.

· Evaluate CLINE Versions.

· Define Development Plan.

· CAD Street Names: Match street names in the CAD system files to the M-NCPPC street name index, perform quick fixes where possible, write error file for further analysis and review.

· Conflation Step 1: Autolabel CLINE version 1 segments with the Centech validated address ranges. Overlay with CLINE version 2 segments. Transfer address attributes from version 1 to version 2 with a point and click tool (to be developed).

· Match CAD records to CLINE segments, write non-matching records to error file for further review. Transfer fire box, reporting area, and municipality attributes from CAD records to CLINE segments.

· Load test CLINE file to Tiburon System. Test result of load and identify needed refinements to process or data structure. (Tiburon System Test Data Load: February 28, 1999)

· Review and correct CAD errors in street names and non-matching 100-block records.

· Identify and add missing segments by comparison to tax maps or recent plats.

· Check and correct other Tiburon topological requirements, such as directionality, odd/even mismatches or flips, range overlaps.

· Review and improve street name alias file and common place names file.

· Match CAD records to CLINE segments, review non-matching records. Transfer fire box, reporting area, and municipality attributes from CAD records to CLINE segments.

· Generate polygons for fire box and reporting areas. Identify missing boundary features. Add features to CLINE.

· Load Production CLINE to Tiburon System. (Tiburon System Production Data Load: June 1, 1999)
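Two of the topological checks named in the development steps above, odd/even mismatches and range overlaps, lend themselves to simple automated tests. The Python sketch below is an assumption about what such checks might look like, not Tiburon's validation logic: it treats an odd/even mismatch as a side range whose low and high numbers differ in parity, and an overlap as two ranges on the same street and side that share house numbers. The record layout and sample data are hypothetical.

```python
from typing import List, Tuple

# (street name, low number, high number) for one side of one segment
SideRange = Tuple[str, int, int]


def parity_mismatches(ranges: List[SideRange]) -> List[SideRange]:
    """Ranges whose low and high numbers are not both odd or both even."""
    return [r for r in ranges if r[1] % 2 != r[2] % 2]


def range_overlaps(ranges: List[SideRange]) -> List[Tuple[SideRange, SideRange]]:
    """Pairs of same-street, same-parity ranges that share house numbers."""
    problems = []
    ordered = sorted(ranges)  # by street name, then low number
    for i, (street_a, lo_a, hi_a) in enumerate(ordered):
        for street_b, lo_b, hi_b in ordered[i + 1:]:
            if street_a != street_b:
                break  # sorted by street, so no later range can match
            if lo_a % 2 == lo_b % 2 and lo_b <= hi_a:
                problems.append(((street_a, lo_a, hi_a), (street_b, lo_b, hi_b)))
    return problems


ranges = [
    ("MAIN ST", 100, 136),
    ("MAIN ST", 130, 198),   # overlaps the even range above
    ("MAIN ST", 101, 199),
    ("ELM AV", 200, 299),    # low is even, high is odd
]
print(parity_mismatches(ranges))  # [('ELM AV', 200, 299)]
print(range_overlaps(ranges))
# [(('MAIN ST', 100, 136), ('MAIN ST', 130, 198))]
```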

Street Name Validation

The first street name validation performed was the match of street names between the existing CAD file and the CLINE file. If these street names did not match, then the process of matching address ranges could not be completed. An improved match rate ensures that a greater number of CAD 100-blocks have matching Centerline segments, which will improve the Tiburon System performance.

The original intention of the Street Name validation effort was to compare four sources of street names: the CAD file, Centerline file, TIGER file, and the M-NCPPC Street Index. Although the Street Index is an established County standard, it did not correspond well to the ADC street names or the conventions used in the CAD file. Since the primary objective of the Street Name validation was to achieve a better match between CAD 100-blocks and Centerline segments, the effort was focused on comparing these two sources, using the ADC map book as a guide. The TIGER file was missing too many street segments to be useful.

Standard street types to be applied to all files were first identified based on a review of the files. These type abbreviations are two characters. These types were globally applied to all streets prior to initial matching. The CAD and Centerline files were then matched and the Fire Box and Reporting Districts from CAD segments applied to the matching Centerline segments. When the non-matching street names were reviewed it was determined that additional checking and correction of names could improve the results. In this process, the street names in the two files that did not match were reviewed individually in two files called FIX-CAD and FIX-CLINE. Each non-matching name was classified as to whether it could be ignored, needed to be researched further, or was corrected. Two files were created which contain the original names from each source and the corrected names. These files will be used to apply the corrections to the complete CADEVEN, CADODD, and Centerline files. The following process illustrates how these files were used.

Street Name Correction Process

Input file: FIX_CAD records without a matching FIX_CLINE record (unmatched CAD streets). Each CAD street name is flagged as one of the following:

1. Highway or alias (ignore)

2. Missing street to be added to CLINE (note ADC page and grid)

3. Correct street name in FIX_CAD

The decision steps are:

· Is the street a highway or obvious alias? If yes, apply flag 1. If no, look for the street in the ADC Map Book.

· Is the street in the ADC Map Book? If yes, double-check the street in FIX_CLINE. If the street is misspelled in FIX_CLINE, correct the street in FIX_CLINE; if not, add the street segment to CLINE (flag 2).

· If the street is not in the ADC Map Book, check FIX_CLINE, FIX_TIGER, and the Street Index. If there is a misspelling or error in the CAD street, correct the street in FIX_CAD (flag 3) and re-match CAD streets to CLINE. If not, write the street to the file of unmatched CAD streets to be researched by CAD.
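The matching and correction steps described in this section can be sketched in Python as follows. Everything here is illustrative: the two-character type codes, the function names, and the sample street names are assumptions, and the classify function is only one possible reading of the correction flowchart, with the branch order assumed.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical two-character street type codes; the actual list is not given.
TYPE_CODES = {"STREET": "ST", "ROAD": "RD", "AVENUE": "AV", "DRIVE": "DR"}


def standardize(name: str) -> str:
    """Upper-case the name and replace a trailing street type with its code."""
    words = name.upper().split()
    if words and words[-1] in TYPE_CODES:
        words[-1] = TYPE_CODES[words[-1]]
    return " ".join(words)


def unmatched(cad_names: Iterable[str],
              cline_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Names found in only one source: (FIX_CAD list, FIX_CLINE list)."""
    cad = {standardize(n) for n in cad_names}
    cline = {standardize(n) for n in cline_names}
    return cad - cline, cline - cad


def classify(is_highway_or_alias: bool, in_adc_map_book: bool,
             misspelled_in_cline: bool, cad_error_found: bool) -> str:
    """One reading of the flowchart for a single unmatched CAD street."""
    if is_highway_or_alias:
        return "1: ignore"
    if in_adc_map_book:
        if misspelled_in_cline:
            return "correct in FIX_CLINE"
        return "2: add segment to CLINE"
    if cad_error_found:
        return "3: correct in FIX_CAD and re-match"
    return "research further"


def apply_corrections(names: Iterable[str],
                      corrections: Dict[str, str]) -> List[str]:
    """Apply an original-name to corrected-name table to a complete file."""
    return [corrections.get(standardize(n), standardize(n)) for n in names]


cad = ["Main Street", "Elm Rd", "Okl Avenue"]
cline = ["MAIN ST", "ELM ROAD", "OAK AV"]
print(unmatched(cad, cline))            # ({'OKL AV'}, {'OAK AV'})
fixed = apply_corrections(cad, {"OKL AV": "OAK AV"})
print(fixed)                            # ['MAIN ST', 'ELM RD', 'OAK AV']
print(unmatched(fixed, cline))          # (set(), set())
```

Keeping the corrections as a separate original-to-corrected table, as the text describes, means the same fixes can be re-applied to the CADEVEN, CADODD, and Centerline files without editing each by hand.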

Maintenance Workflow

Long-term maintenance of the Street Dictionary and related centerline files is very important. An analysis of all sources of street name changes and additions should be undertaken and a maintenance plan developed. In the case of the County, the processes that impacted street names were identified. The file will be maintained through four basic processes:

· Adding newly developed or vacated streets

· Adding newly assigned addresses

· Updating changes in area boundaries

· Incorporating corrections or additions discovered through use of the file.

The Centerline Maintenance Unit will manage the update and maintenance. Actual updating will be performed by a combination of that unit, Public Works, M-NCPPC, and other county departments. The Maintenance Unit will perform Quality Control (QC) on all updating and will be responsible for the quality and availability of the data.

The following table describes the sources and methods for updates that will be included in the overall process.

Source Information – Method

New Streets – Subdivision plat approved, M-NCPPC send plat with street names and assigned addresses to Centerline Maintenance Unit, enter new segment as single line centerline over plat, enter street name/address ranges, status code, attributes

Street vacation – Council vacation approval sent to the Centerline Maintenance Unit, locate street segment(s) in file, change status to vacated

Street under construction – Construction permit approved by Public Works, Public Works enter status code, adjust centerline fit as necessary to source permit, site plan, or engineering drawing and verify position and attributes, correct as necessary, update street name/address, attributes, pavement attributes, status code

Street constructed – Public Works update status code, verify as-built position, attributes

Accepted for maintenance – Public Works verify as-built position and attributes, enter attributes, enter status code

State/Municipal roads – Public Works obtain drawing from source, enter as in new street update

Occupancy/Use Permits – Verify street name, address range as part of permit application. Notify Centerline Maintenance Unit if error or missing name/address (e mail).

File use/field work – Various County departments observe errors in file during field activities, access file through applications, verify data, notify Centerline Maintenance Unit of correction/missing data (e mail)

Street name/address change – M-NCPPC enter new name/address, archive old data

Private Streets – Public Works identify private streets from permits, site plans, field detection, enter street line, attributes, status code

Census Geography – Department geographic area boundary owners notify Centerline Maintenance Unit or update themselves, enter necessary boundary segments, modify/delete replaced segments, rebuild polygons/test for polygon closure

Ideally, all potential editors of centerline or address range data will be able to have direct access to the central centerline file. All editors will use a maintenance application which will be tailored or customized to their role in the process and will grant specific access rights. Staff will have the ability to make appropriate edits at any time. Certain users may be notified of types of activities so that they can make follow-up edits. The Centerline Maintenance Unit staff will be responsible for validating all edits before accepting them into the production file.

Street Name Considerations

There are still many technical issues to be resolved. These are comparable to issues facing other local government jurisdictions.

· There are duplicate street names in some municipalities. In addition, these streets may have identical block numbers. These need to be distinguished, possibly by municipality or zip code.

· Street name and address ranges that are assigned by municipalities need to be transmitted and incorporated into the centerline file and street dictionaries.

· Which streets should be included in the system: paper streets, or only constructed streets?

· The status of the street should be included: approved, constructed, or abandoned.

· Should directional prefixes or suffixes be used? Different sources use different fields inconsistently.

· Should hyphens and apostrophes be allowed in a street name? They can interfere with processing in some applications.

· How should aliases be maintained? Aliases can be defined as previous names, commonly used names, and route numbers.

Conclusions

There are many implementation areas that must be addressed by the Street Dictionary team or management. Significant analysis and planning is required to establish a method for developing, maintaining, and publishing a Street Dictionary.

· Analyze all existing databases that contain street names, identifying the sources for the dictionary as well as databases that would need to be validated against the resulting dictionary.

· Develop a data model/database design for the dictionary and related centerline and address files.

· Identify all sources of street name changes and additions and develop a maintenance plan for the dictionary, including agency responsibilities.

· Determine the best environment for maintenance of the dictionary. Typically, a client/server database that is accessible through the organization's network is ideal.

· Determine how the dictionary can be accessed by the majority of applications, and develop guidelines for design and modification of applications and databases to take advantage of the dictionary.

Bio - Amy J. Purves is a professional engineer with Plangraphics, Inc., in Silver Spring, Maryland.

ENTERPRISE DATA QUALITY MANAGEMENT: THE TIME HAS COME

Andrew D Pidgeon

Introduction

As we begin the new millennium with global competition and global communications, companies must be in the business of being informed. Those that know the most about their customers, products, vendors, and locations will be able to formulate the most successful strategies and tactics – for a jump-start in winning. This information has a significant impact on the bottom line in both revenue-generating and cost administration initiatives.

However, data quality is a prerequisite for being fully informed. Only accurate, complete data about customers, products, vendors, and locations – and their relationships with each other – will yield the enterprise information intelligence needed to win. Yet, in the business world today data moves constantly among diverse systems. It streams into the corporation through data entry systems and unstructured web responses. It flows from one corporate system to another, disseminated more widely than ever before by enterprise application integration (EAI) solutions. And it travels outside the corporation to vendors and partners who interact with it through extranets and the Internet. In other words, every day, there is an increase in data and system complexity, which can lead to degradation in data and overall system quality. As a result, achieving and sustaining high data quality is a challenge.

That’s why the time has come for enterprise data quality management (EDQM). Why EDQM is necessary now to achieve successful business results, and how it is possible, become more apparent if we address the following questions:

· Why has the need for EDQM evolved?

· Why is EDQM important?

· What does EDQM involve?

Why Has the Need for Enterprise Data Quality Management Evolved?

Two mutually reinforcing trends – a technological revolution and a business evolution -- have put a premium on high-quality information as the key to success. Starting in the late 80s and culminating in the late 90s, the global infrastructure has been transformed: by fax machines, PCs, laptops, and sophisticated networking, as well as by the development of the Internet as the information superhighway and of hand-held devices like schedulers and mobile phones that link to it. This new infrastructure makes it possible to meet a rising demand for almost instant access to information. In three-to-five years, it will enable people to work and connect to data anywhere.

What will be the result? As the new communications bridges and highways expand, the traffic will increase. Remember what happened in the Industrial Revolution. Building the railroad increased the shipment of materials for production and finished goods for sale – and reduced the gaps between people imposed by horse-powered means of travel and communication. Now, in the Information Age, the new electronic tracks will increase transportation – of information.

In response to this new infrastructure and other technological advances, especially relational databases, business theory and practice have also been evolving. In the past, corporations valued three assets: their personnel, products, and customer set. Now, they are realizing that they have a fourth asset, their information – on customers (or patients and providers in healthcare), vendors, distributors, products, inventory, and locations.

This information can take companies to a new level in formulating effective strategies and tactics. The more informed they are about these areas and the relationships among them, the better they will know their business and business trends, the more rational decisions they will be able to make, and the more successful strategies and tactics they will be able to formulate and execute. In short, information is now a necessity rather than a luxury.

Where will companies find this information? They have it already (and are now realizing this) – in thousands, even millions, of records stored in legacy systems – with new data coming in daily. This realization has produced a new business trend. Companies have begun to mine their legacy systems for this information and will do so increasingly.

But in extracting data from these systems, they must be able to put the relationships that are critical to a business together. Nowhere is this more evident than in the growing popularity of customer relationship management (CRM) systems. These systems rely on information extracted from existing systems not just on customers and products, but on which customers own which products.

The point is that with CRM and other business intelligence systems you need to match data components to get more extended information. This is the valuable information for formulating strategies and tactics and producing business benefits. But to optimize these systems – to be able to match the data -- you need to ensure these systems are loaded with high-quality data.

Why Is EDQM Important?

What level of data quality is needed to leverage corporate information, and how and where does data quality have an impact on business? Examining these topics reveals why data quality is an enterprise-wide concern, demanding to be managed.

A Closer Look at Data Quality

Data quality includes but goes beyond accuracy – such as correct name and address data at the character level. Rather, it is the level of accuracy, consistency of format and data representation, and completeness that permits matching and integration of all records that pertain to an entity, such as a customer or patient, a product, or a location. Specifically, it requires domain integrity -- ensuring each data value is in a discrete, appropriate domain or field (that is, matches the metadata) so that it is addressable, or easily accessible to users and query tools – and data consistency -- conditioning and standardizing data, so that it has a common structure.

These are just prerequisites for matching to get complete information. Once there is a common data structure, including a separate field for each data value, you can match records from diverse systems accurately and extensively. Matching, in turn, enables you to link all information relevant to a specific entity. Achieving the highest data quality also includes finding non-exact matches – like John Doe and J.Q. Doe -- so you can, indeed, consolidate all relevant information about this entity and eliminate duplicates or redundancy in the database. Once this consolidated information is available, you can find otherwise hidden relationships – for the highest-level of data quality.
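As a minimal sketch of the two prerequisites and the matching step, the Python example below first conditions free-form names into separate fields and then links records from different systems on a standardized key. The function names, the sample records, and the choice of "last name plus first initial" as the key are assumptions made for illustration; a key this coarse would also link Jane Doe to John Doe, so in practice its output would be a list of candidates for review, not a final answer.

```python
import re
from collections import defaultdict


def standardize_name(raw: str) -> dict:
    """Condition a free-form name into separate first/last fields."""
    cleaned = re.sub(r"[^A-Za-z, ]", "", raw).strip().upper()
    if "," in cleaned:                       # "DOE, JOHN" style
        last, first = [part.strip() for part in cleaned.split(",", 1)]
        return {"first": first, "last": last}
    parts = cleaned.split()                  # "JOHN DOE" style
    if not parts:
        return {"first": "", "last": ""}
    return {"first": " ".join(parts[:-1]), "last": parts[-1]}


def match_key(record: dict) -> tuple:
    """Hypothetical linking key: last name plus first initial."""
    return (record["last"], record["first"][:1])


def link_records(records):
    """Group (system, raw_name) pairs that share a match key."""
    linked = defaultdict(list)
    for system, raw_name in records:
        linked[match_key(standardize_name(raw_name))].append(system)
    return dict(linked)


# Hypothetical records from three line-of-business systems.
records = [
    ("orders", "John Doe"),
    ("billing", "Doe, John"),
    ("support", "J.Q. Doe"),
    ("orders", "Mary Smith"),
]
for key, systems in link_records(records).items():
    print(key, systems)
```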

This level of data quality – accurate, consistent, non-redundant data with a complete view of all relationships to other data sets -- is required for leveraging enterprise information for success. This is clear if we look at its impact.

The Impact of Data Quality on Revenue Generation and Cost Administration

Within a company, two kinds of activities impact the bottom line: revenue-generating initiatives and cost-administering initiatives. Data quality is critical to the success of each.

Revenue-generating initiatives rely on customer information systems. The latest and greatest of these – customer relationship management or CRM systems – are becoming popular because their purpose is to maximize a company’s understanding of its customers and their relationships with its products. This combined information is critical to valid marketing profiling and segmentation – which, in turn, are required for strengthening relationships with existing customers and planning successful marketing campaigns that drive revenues.

Yet, data quality, as defined here, is essential for CRM systems to deliver their expected ROI. If you don’t know all the relationships that particular customers have with your products, your marketing profiling and segmentation will be flawed, and your marketing campaigns will fall short of their potential. For example, let’s say John Doe has purchased three products. But in the three line-of-business systems containing his records, he is listed as John, Jon, and J.Q., and his Social Security number contains transposed digits in one of the systems. These data inconsistencies and errors – or a lack of data quality -- could prevent matching of these records.

Why is matching so important? Consider what happens when you don’t know that John (Jon, J.Q.) has purchased three products, but think you have three customers, each with one product. (This is the problem of duplicates in your information.) The result is misinformation, which has a detrimental impact. You will miscount the number of your customers and, thus, fail to understand your revenue potential and to plan appropriate programs. For example, if you don’t perceive John to be a premium customer owning three products, you may miss the opportunity to offer special deals to strengthen his loyalty and add revenues.
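The John Doe example can be worked through in a few lines. The Python sketch below is one possible approach under stated assumptions: the record layout, the matching rule (same last name, same first initial, and an identifier that is equal or differs by one adjacent transposition), and the dummy identifier values are all hypothetical.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def same_customer(a: dict, b: dict) -> bool:
    """Hypothetical non-exact matching rule for two customer records."""
    same_last = a["last"].upper() == b["last"].upper()
    same_initial = a["first"][:1].upper() == b["first"][:1].upper()
    same_id = a["ssn"] == b["ssn"] or is_adjacent_transposition(a["ssn"], b["ssn"])
    return same_last and same_initial and same_id


def consolidate(records):
    """Greedily group records that match the first record of a cluster."""
    clusters = []
    for record in records:
        for cluster in clusters:
            if same_customer(cluster[0], record):
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


# Dummy records; the identifiers are placeholders, not real numbers.
records = [
    {"first": "John", "last": "Doe", "ssn": "123456789", "product": "checking"},
    {"first": "Jon", "last": "Doe", "ssn": "123456789", "product": "savings"},
    {"first": "J.Q.", "last": "Doe", "ssn": "123465789", "product": "loan"},
    {"first": "Mary", "last": "Smith", "ssn": "987654321", "product": "checking"},
]
clusters = consolidate(records)
print("records:", len(records), "customers:", len(clusters))
for cluster in clusters:
    print(cluster[0]["last"], [r["product"] for r in cluster])
```

Without the consolidation step the four records would be counted as four customers, each with one product; with it, the same data shows two customers, one of whom holds three products.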

What happens if you make these kinds of errors with ten percent of your customers? The answer is clear: missed sales and even lost customers. In other words, only 75, 85, or 90 percent of the correct information can amount to misinformation, skewing strategic planning and tactics -- and business results.

Data quality has a similar impact on administering costs, for example, in purchasing. Purchasing is an expenditure. But cost savings here can have a major impact on managing the corporate budget and containing overall costs to ensure maximum profitability.

Consider the benefits of data quality within your data sets on inventory, vendors, and locations. If you have accurate, standardized descriptions for inventory items, you can match this information from different locations and their systems – to discover costly and inefficient inventory duplicates (perhaps listed in different systems under different product numbers) and save money by reducing inventory. Similarly, if you have accurate, standardized vendor information across your locations, you can get a true picture of your total business with a vendor, and may see that some “different” vendors that your various locations buy from are actually divisions of the same company. So you can negotiate a better price. What’s more, if you have views of the complete relationships among all these data sets (i.e., inventory, locations, vendors), you can really understand what cost-efficiencies you can gain from critical mass purchasing with a few vendors. Not only will you get better prices for volume purchases and special deals as a premium customer, but you will be able to administer purchasing more efficiently.

Challenges to Achieving and Sustaining Data Quality

Taken together, these examples show the pervasive need for data quality. So what’s the problem? Typically, companies have a subset of applications critical to running the business: for example, customer sales order entry systems, financial systems, product information systems, and manufacturing or production systems. Data from these applications is constantly on the move.

Information enters the enterprise through varied data entry systems and portals. While you can train clerks to adhere to a standard data entry format, information entering through Web forms and extranets is subject to the user’s choices, no matter how well designed your interface is. That means your clean, standardized internal system environment is under threat of degradation from impurities such as missing data, the “wrong” or “extra” data in a field, and misspelling and typos; from non-standard data representation; and last, but definitely not least, from duplicates.

Data also moves within the corporation – and this movement is being extended and accelerated as legacy systems become the targets for enterprise application integration (EAI). Specifically, companies are beginning to extract data from their daily transaction applications and to route it in real time, especially information that is vital to the ongoing health of the business and to the procedures for meeting business objectives. This information is then directed to portals specific to corporate functions and users’ roles: portals for purchasing information, marketing information, product information, etc. In the near future, corporations will even begin to extract and route this data from mainstream transaction applications as they are updated on a daily, per-minute, or per-second basis.

But different systems, because of their varied purposes, require different data. As a result, there will be a need for data quality monitoring and filtering – a standardizing and, possibly, a re-engineering process – to enable the transaction data to be matched and integrated with data in these other enterprise application systems -- for optimal usage in decision support.

Add all this together and what do you have? Enterprise information portals, “life blood” enterprise application systems, decision support databases, extranets that interact with internal information systems, and Web sites that extend enterprise business into the Internet. All these systems need data and information, maintained on an enterprise-level, which conforms to enterprise standards and demonstrates the comprehensive relationships critical to optimizing business operations. The point is clear: it’s time for enterprise data quality management.

What Does EDQM Involve?

Enterprise data quality management (EDQM) is not a one-time, short-term fix. In the current, dynamic business environment, everything – procedures, products, relationships, the organization itself – changes on a short timetable. To remain competitive in the future will require quick reassessment and adjustment of information systems to support these changes.

As a result, to lay the ground for EDQM requires taking a top-level view: examining the entire enterprise to understand what information systems and pools are critical, how information flows through them to support business functions, and as a result, where the enterprise needs to implement data quality filters and procedures to create or maintain optimal quality information. Once a company has this understanding, EDQM is a three-phase process.

Loading systems with high-quality data

You need to implement the highest level of data quality when introducing a new system. That means you need data filters – tools to re-engineer legacy data for data quality -- before loading it into new CRM and other business intelligence systems. Unlike in the 90s, these implementations must be done rapidly, because people in the enterprise will want this information as quickly as possible, to operate in Internet time. In addition, at points of uncontrolled data entry like the Web, you need real-time data quality filters to correct data, condition it to corporate standards, and match it to internal information -- to prevent duplications. Finally, at points where transaction information is integrated into other systems, you also need data filters. These will standardize and re-engineer data to meet the requirements of the receiving systems – and, once again, perform matching in order to link the new data with the appropriate entities in the existing databases, so that you have up-to-date, complete information on these entities and avoid duplicates.

Maintaining the highest level of data quality in information systems

This phase has two important components. First, you need to create and empower an internal information quality group to maintain the quality of corporate data, including all of the relationships running the critical initiatives of the business. Second – and this is a new activity – systems integrators such as the Big Five need to come up with ways to perform regular data quality audits on critical information systems and pools (on customers, products, etc., and their relationships). The reason is simple: as we have seen, misinformation can wreak havoc on the bottom line or, conversely, high-quality information can produce business results.

In fact, these audits can play a strategic role in a company’s success. For example, until now, mergers and acquisitions have been done “blind”: that is, on the basis of financial histories and fuzzy forecasts of revenue. As a result, companies have often had unpleasant surprises. Once EDQM is in place in business environments, mergers and acquisitions will be based on a true count of customers (in each organization and the combined organizations), inventories of all customer-product relationships, and, based on this information, analysis of the potential for existing customers to purchase other products. In other words, with EDQM, companies will be able to accurately price mergers and acquisitions and accurately forecast revenue potential.

Also, auditing and enriching the information in certain systems can greatly enhance their function. For example, if field support systems have optimal data quality and have their address data integrated with spatial information such as latitude and longitude, you can plan more efficient service routes, to cut service delivery costs. Furthermore, if you can also relate name and address information to product information (e.g., the products purchased and their maintenance cycles and defects), and to service resources in different locations, you can deliver proactive service programs that provide preventive maintenance. By improving customer care and eliminating many product problems, the company would become more customer-oriented and build customer loyalty for a competitive advantage.

Information development – or proactively modifying and constructing information systems to parallel the planning and implementation of new strategies

A top executive, such as the CIO, needs to be looking at future changes in the business and their impact on information needs and quality – to proactively plan and roll out systems to support new strategies. Because of the huge advantage to corporations that can access all the information that’s out there, a company that can proactively construct information systems to support a new strategy can make it successful, faster.

For example, if a retail and a commercial bank merge, there will be new strategies and a new, hybrid set of products. So certain information systems will have to be merged, modified, and consolidated. Also, new information systems relevant to the new strategies and expanded business will need to be built. To hit the ground running with the new plan and products would require consistency in and matching of data across the merged enterprise -- to optimally leverage resources for cross-selling all products to all customers and get the quickest ROI. Or, if there is a new product release, people need to understand what type of information systems need to be modified to profile, segment, and forecast existing customers in relation to new product sales. If these systems are launched with the new product, the company will be able to detect trends immediately, so that they can adjust or accelerate their strategy or tactics.

Conclusions

Loading, maintaining, and planning for high-quality data -- that’s what EDQM is all about. Right now, we’re just scratching the surface. However, the tools to re-engineer data and ensure data quality in large system migrations, in real-time data-entry situations, at points of system transition and integration are available today. All that is needed is organizational commitment to do the top-level thinking about the enterprise’s data quality needs. Then, the practical EDQM implementation will follow: the creation of an information quality group, the regular audits, the proactive planning for quality in future systems to support the strategic direction. The time to begin is now -- to meet the future of instant, ubiquitous information access prepared for success.

Bio - Andrew D Pidgeon is a Senior Product Manager with Vality Technology, Inc. in

E911 MULTI-PATH APPROACH TO DATA CAPTURE AND ADDRESS RANGE CREATION

Jay Meehl, Chris Boyd

Introduction

In the last decade Douglas County, Colorado has experienced a 190% growth rate. In order to maintain a consistent quality of life amidst the ever-increasing suburban sprawl, municipal governments, emergency management agencies, and fire districts have had to create efficient methods of resource allocation. One of the tools required for such a project is an accurate geocodable road centerline file for E911 response.

In the spring of 2000 the Douglas County Sheriff’s E911 Board acquired the funds to create a GIS component to their automated emergency response system and organized a task force comprised of Emergency Service Providers, the County GIS Division and Community Development Department, and a team of hired consultants.

Since the initial scope of the project called for a multi-path approach to data capture and address range creation, the presentation will be a culmination of strategies and lessons learned from both the main stakeholders and the team of consultants. The paper will highlight the inception, attribution, quality control, and implementation of a geocodable road centerline. Agencies desiring to undertake such a project will gain insight into issues such as linear interpolation errors, parity problems, and multiple address databases.

Background

Situated south of Denver, Douglas County consists of 842 square miles of topographically diverse landscape comprised of buttes, mesas, mountains, valleys, and rock outcroppings. The county’s northern corridor is primarily urbanized with 46% of the population in 6% of the county’s land area. Approximately 27% of the population resides in incorporated areas encompassing another 6% of the landmass, and the remaining 27% of the county population lives in rural areas (88% of the landmass including national forest). The increased growth rate could be attributed to the county’s abundant wildlife and natural resources, as well as its proximity to the job markets in Denver and Colorado Springs.

With such a high growth rate, the Douglas County Sheriff’s Department is faced with the mounting pressures of servicing an ever-expanding urban population while stretching its resources to cover the dispersal of rural subdivisions. Furthermore, the FCC has mandated that, by October 2001, wireless phone carriers provide the location of an E911 caller in latitude/longitude coordinates. The need to efficiently dispatch to both addressed and non-addressed locations will require the Sheriff’s Department to incorporate GIS and AVL (Automatic Vehicle Location) technological solutions.

In the spring of 2000 the Douglas County Communications Center acquired the funds to create a geocoded road centerline component for the AVL element of the computer aided dispatch system. By utilizing such technology the Dispatch Center could:

· Immediately pinpoint incident locations

· Select the closest response vehicles and best routes

· Provide data (traffic patterns, lanes, speed limits, construction) for Smart Dispatching

· Improve methodology for mapping crimes, incidents, and offender information

· Provide a proactive, heads-up approach to accomplishing E911 Phase 2 goals

· Provide location information on Mobile Data Terminals (MDTs)

· Assist in determining real-time strategic recommendations for EMS

· Provide a deliverable back to the GIS department that will be used agency-wide.

While the Sheriff’s Department was acquiring project funding, the Douglas County GIS Division was completing a six-month overhaul of the digital road centerline. Considering the growth rate (13.4% in 2000, up from 11.1% in 1999) and the escalating number of new subdivision roads, the GIS Division published a new county road map to meet consumer needs. To accomplish this, as well as supply the Sheriff’s Department with an accurate base coverage, it was determined that a digital coverage should be created based on our latest digital orthophotography. The thematic layer requirements included:

· Spatial accuracy (+/-10 feet)

· Updated forest, private, and subdivision roads

· Full annotation including the correct text locations, size, and annotation levels

· Attributes with the correct street direction, name, and type

· Compliance with US Postal Service (USPS) standards

· A predefined editing environment and standards

· Quality control for data integrity (intersection node errors, dangles).

Although the road update project was a success and completed on time, the initial investigation of county resources failed to identify the extensive role of the Planning Department in this project. Planning staff, resources, and databases have played a significant role in address assignment, road naming and the dissemination of address information in Douglas County.

Addressing operations are presently isolated on a legacy operating system with no dynamic connection to any other municipal address databases. Submitted street names are verified against USPS. Addresses are mathematically created using one of five addressing grids: the Douglas County grid with its zero starting point in the center of the county, the Metro grid with its zero starting point in Denver, addressing by lot numbers (a system approved by the Board of County Commissioners for one distinct area of the county), and two city grid systems. After addresses are created and entered into the database, paper copies of the maps are furnished to emergency response entities, post offices, assessor, and elections, where the data is manually re-entered, increasing the error rate dramatically. Permit issuance is tied directly to the address database.
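The paper does not give the formula behind these grids. As a rough sketch of how a grid-based address number could be derived, the Python example below scales the distance from a grid's zero point and then forces the result to the parity of the street side. The figure of 1,000 address numbers per mile, the even/odd side convention, and the coordinates are hypothetical placeholders, not Douglas County's actual rules.

```python
FEET_PER_MILE = 5280


def grid_address_number(coord_feet, zero_point_feet,
                        numbers_per_mile=1000, odd_side=False):
    """Derive an address number from the distance to a grid's zero point.

    coord_feet and zero_point_feet are positions along one grid axis.
    numbers_per_mile and the parity convention are assumed values.
    """
    distance_feet = abs(coord_feet - zero_point_feet)
    raw = int(distance_feet / FEET_PER_MILE * numbers_per_mile)
    even = raw - (raw % 2)          # round down to an even number
    return even + 1 if odd_side else even


# The same location yields different numbers under different zero points,
# which is why it matters which of several grids governs an area.
location = 7920                      # hypothetical axis position, in feet
print(grid_address_number(location, zero_point_feet=0))
print(grid_address_number(location, zero_point_feet=5280))
print(grid_address_number(location, zero_point_feet=0, odd_side=True))
```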

Address Initiation Flow Chart

[Figure: flow chart of address initiation. Labels recoverable from the original: Outside Sources (Developers, Planners, Utility Companies, Landowners); City of Lone Tree; City of Parker; City of Castle Rock; City of Littleton; City of Aurora; Townships; Douglas County; Planners; Fire Dept.; Building Dept.; Commissioner Influence; No Referrals; Addressing Operations; Dissemination; MSAG; Assessor’s Office; GIS.]

Project Preparation

At the initial stage of the project the Sheriff’s Department organized the key players or stakeholders, consisting of county, city, and emergency staff. This group was called together prior to the contract signing to gain an understanding of the Sheriff’s Department’s ultimate objectives, and to discuss their roles within the confines of this project. The next step in creating an accurate geocoded road centerline from over 2400 miles of roads in Douglas County required the Sheriff’s Department to employ the services of CompassCom and the TSR Group. This consulting team, which was just completing a similar project in a nearby county, proposed a multi-path approach to accomplishing the road centerline capture and address range attribution. In simplest terms, the multi-path approach can be defined as the use of several resources at once: multiple data sets as well as the addressing expertise of the stakeholders. To achieve this, the consulting team solicited input and numerous data sets, including multiple road centerline coverages, the Master Street Address Guide (MSAG) information, the Street Maintenance Database file, hard copy maps, and aerial imagery.

This approach consisted of:

· Reviewing existing data

1. Completing an initial review
2. Prioritizing data
3. Importing existing data
4. Compiling data sets
5. QA/QC of compiled data sets

· Acquiring and creating new spatial data

1. Data field verification
2. Identifying and prioritizing data capture needs
3. Editing spatial data
4. GPS of arcs (for areas not meeting project specifications)
5. Compiling new data (heads up digitizing and GPS)

· Creating address and tabular data

1. Acquisition of the Orion MapManager Software for street addressing
2. Initial QA/QC
3. Creating street name list or MSAG (Master Street Address Guide)

· Delivering of Products

1. Delivery of Version 1 of the street centerline coverage
2. Review by the stakeholders
3. Revision of the coverage based on input from the stakeholders
4. Delivery of remaining versions

Project

Initial Data Review

After collecting datasets from all stakeholders, the consultant performed a thorough review of the data. The goal was to determine if existing datasets met the positional accuracy requirements as well as the database standards for addressing. Using orthophotography to determine positional accuracy, dataset comparisons, based on edge measurements and internal intersection locations, were performed. Road attribute tables were verified based on their conformance to the USPS standards for road abbreviations.

Delivery

After the consultant completed the first geocoding of the road centerline, delivery was made to the clients in Douglas County. Stakeholders performed various QA/QC operations that included:

· Review of MSAG error report. Problems included street name spellings, directions, road type suffixes and street range errors within the MSAG database. These issues were verified with the addressing database.

· Review of Addressing Error Report including verification of street name spellings and ranges, uniquely addressed areas, directions, suffixes, grid issues, un-addressed streets, aliases, and reverse addressing of cul-de-sacs. The majority of the questions were answered using the address database with backup verification from the addressing maps. Many of the errors were corrected by having the consultant use the grid maps and alias road name list.

· Roads were geocoded using the county’s parcel database as the master address file. Unmatched records were isolated and joined to the parcel shape file for visual dispersal of unmatched parcels.

· Statistical reports of excessive arc range assignments were created.

· Reports of unmatched street segments were verified against the Street Maintenance Database.

· Visual QA/QC was performed on geocoded points for obvious dispersal problems.

· Geocoding was performed on alias names to compare results.

· AML (Arc Macro Language) scripts were written to flag duplicate ranges and flipped arcs.

· The reference coverage indexing structure and the *.MAT files (located under esri/av_gis30/arcview/geocoding) were manipulated to isolate certain types of errors.

· A simple spatial join was performed (point in polygon) between the geocoded points that contained an address and the parcel coverage. Geocoded ranges and parcel address numbers were extracted ([propertyadd].extract(0).asNumber) and comparisons were run to determine the address number discrepancies which fell out of the normal 100-block range. Frequency checks were also performed to identify range dispersal problems and identify parcels with multiple geocode points.
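The comparison in the last item can be sketched outside ArcView. The Python example below is not the project's script. It assumes the spatial join has already paired each geocoded address number with the property address of the parcel it fell in, and it reads "out of the normal 100-block range" as a difference of 100 or more. The function names, the threshold, and the sample rows are assumptions.

```python
from collections import Counter


def leading_house_number(property_address: str):
    """Return the first token of the address as an int, or None if not numeric."""
    tokens = property_address.split()
    if tokens and tokens[0].isdigit():
        return int(tokens[0])
    return None


def flag_range_discrepancies(joined_rows, tolerance=100):
    """Flag joined rows whose geocoded number is far from the parcel's number.

    joined_rows holds (geocoded_number, parcel_property_address) pairs.
    Rows with no usable parcel number are flagged for manual review.
    """
    flagged = []
    for geocoded_number, parcel_address in joined_rows:
        parcel_number = leading_house_number(parcel_address)
        if parcel_number is None or abs(geocoded_number - parcel_number) >= tolerance:
            flagged.append((geocoded_number, parcel_address))
    return flagged


def parcels_with_multiple_points(parcel_ids):
    """Frequency check: parcel ids that received more than one geocoded point."""
    counts = Counter(parcel_ids)
    return sorted(pid for pid, n in counts.items() if n > 1)


# Hypothetical join output.
rows = [(1234, "1236 ELM ST"), (1500, "1820 ELM ST"), (200, "LOT 7 OAK RD")]
print(flag_range_discrepancies(rows))
print(parcels_with_multiple_points(["P1", "P2", "P1", "P3"]))
```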

Simple ArcView operations including spatial joins and manipulation of spelling sensitivity settings, and visual checks were used to identify geocoding issues or errors.

By using a point in polygon spatial join and extracting the ranges for comparison against parcel numbers, range problems were flagged.

This worked in areas of tight parcel rectification.

Douglas County Stakeholders met three times to review the initial deliverable and resolve addressing issues. The data was organized into a project update that was given to the consulting team along with parcels, copies of address maps, and written explanations of erroneous and unique data.

By using the above QA/QC procedures, the stakeholders documented the following concerns and responsible parties:

· Full street name, direction, or type is incorrect or missing. (County/City)

· Street name alias needs to be incorporated. (City/TSR)

· Parcel address has not been updated, or new subdivisions are missing roads (County/City)

· Address ranges have negative numbers, or incorrect high to low ranges due to MSAG (County/City/TSR)

· Street range is blank (County/TSR)

· Arc street ranges are excessive, incorrect, overlapping or containing gaps. (TSR)

· Arc segment direction or from/to flow is incorrect (TSR)

· Address parity errors. (TSR)

· Actual versus theoretical range problems. (TSR)

· Parcels adjacent to street are actually addressed off another street. (?)
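Several of these concerns lend themselves to automated checks. The Python sketch below shows one way to test a centerline segment for blank, negative, high-to-low, and parity problems, and to look for overlaps and gaps between ranges on one side of a street. The field names, the rule that a 0-0 side is treated as intentionally unaddressed, and the sample values are assumptions, not the project's AML logic. A high-to-low range may also indicate a flipped arc rather than a bad range, so it is reported for review.

```python
def check_side(from_num, to_num):
    """Return a list of problems for one side's address range."""
    if from_num is None or to_num is None:
        return ["blank range"]
    problems = []
    if from_num < 0 or to_num < 0:
        problems.append("negative number")
    if from_num > to_num:
        problems.append("high-to-low range")
    if from_num % 2 != to_num % 2:
        problems.append("mixed parity")
    return problems


def check_segment(seg):
    """Check both sides of a segment, then compare parity across sides."""
    left = (seg["left_from"], seg["left_to"])
    right = (seg["right_from"], seg["right_to"])
    problems = [f"left: {p}" for p in check_side(*left)]
    problems += [f"right: {p}" for p in check_side(*right)]
    zero_side = left == (0, 0) or right == (0, 0)   # assumed "unaddressed" marker
    if not problems and not zero_side and left[0] % 2 == right[0] % 2:
        problems.append("both sides share parity")
    return problems


def find_gaps_and_overlaps(ranges):
    """Compare (low, high) ranges for one side of one street."""
    issues = []
    ordered = sorted(ranges)
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            issues.append(("overlap", (low_a, high_a), (low_b, high_b)))
        elif low_b - high_a > 2:     # same-parity neighbors differ by 2
            issues.append(("gap", (low_a, high_a), (low_b, high_b)))
    return issues


# Hypothetical segments.
good = {"left_from": 101, "left_to": 199, "right_from": 100, "right_to": 198}
bad = {"left_from": -5, "left_to": 199, "right_from": None, "right_to": 198}
print(check_segment(good))
print(check_segment(bad))
print(find_gaps_and_overlaps([(100, 198), (200, 298), (250, 348), (500, 598)]))
```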

By combining efforts, the stakeholders and consultants created a flexible address methodology by which to guide the course of the project. These guidelines included (but were not limited to) the following recommendations:

· The use of a 50-block range versus a 100-block range to increase spatial dispersal precision.

· Creation of a standard to determine street name versus common alias name.

· Required directions for duplicate range possibilities.

· Range arcs were manually attributed for uniquely addressed areas where traditional addressing schemes do not consider parity, even/odd distribution, or contain dissimilar ranges.

· Half cul-de-sacs (eyelets) were deleted to promote continuous address parity.

· Private access easements were captured but not addressed.

· Road name ranges were determined from the street maintenance file rather than MSAG.

· Divided roadways were given either a left or right road range leaving the opposite side with a zero range.

· Uninhabited roads were given an address range based on the grid.

· Created multiple pre-directional fields to accommodate both required (platted) and common use directions.

Over the course of the next few months the consultant supplied corrected versions of the road centerline. Stakeholders reviewed the deliverables and checked for conformance to the above-mentioned recommendations.

Successes

Successful accomplishments of the E911 Road Centerline Project benefited all stakeholders and county offices as well as providing proactive technology to the community. Current and projected successes gained by this project include:

· An accurately geocoded street centerline

· Internal investigation and creation of QA/QC procedures

· Generation of address data flow charts

· Uncovering and rectification of address problems that exist within the county and county departments

· Increased understanding of geocoding issues as they relate to emergency dispatch

· Creation, revision, and compliance of addressing methodology

· Improved precision of street maintenance file, MSAG, and CAD.

Lessons Learned From E911 Road Centerline Project

The lessons learned from this project are best presented by organizing them into three categories: addressing administration, project management, and technical issues.

Issues pertaining to address administration have lingered in the county’s shadows for years and this project has highlighted the following:

· Acknowledgement of poor addressing practices that were allowed in Douglas County. Examples have been collected and will be presented to the Commissioners as part of a Virtual Government initiative that will include recommendations on addressing.

· Evidence for strong addressing policies, adhered to by both city and county, which will minimize the possibilities of:

a. Non-parity addressing

b. Out of grid non-sequential numbering

c. Duplicate numbering ranges

d. Non-standardized directionals

e. Duplicate road names.

· The importance of different addressing entities utilizing the same addressing grids, implementing the use of directionals only when appropriate, signage requirements, and recording procedures.

· The importance of implementing an addressing database for Douglas County through multi-jurisdictional cooperation.

At a project level Douglas County stakeholders realized:

· Specific roles and responsibilities should have been determined at the onset of the project.

· Complete details on the consultant’s QA/QC methodology should have been supplied at the onset.

· Prior to signing the contract, feedback from technical support staff should have been considered.

· Street maintenance files should have been the primary source of information since the MSAG contained questionable information entered historically from county resident input.

· Road names should have been compared to the street maintenance file prior to submittal to consultant.

· Check plots containing discrepancies should have been supplied to the stakeholders by the consultant for QA/QC.

· An important aspect of any project is the dedicated participation of all stakeholders working toward a common goal.

Technical issues were predominantly focused on the geocoding process within ArcView. By exploring various operations we found:

· There are a couple of ways to deal with non-parity. We tested the following process, which solved the parity problem but has yet to be tested for routing.

Non-Parity Addressing Issue Example Normal Address Dispersal after Geocoding

Example of Accuracy Issue Solution: Copy Arc, Flip, Change Ranges

Ranges were adjusted as shown in this portion of the road centerline table.

· It was necessary to manipulate the *.cls files located under the esri/av_gis30/arcview/geocode directory in order to reduce non-matched addresses. In particular we found roads with COVE street types were not being geocoded at the default settings. By editing the stname.cls and us_addr.cls files we scored 100 on the streets containing a COVE type. (Make sure to back up all files before editing)

Removed the line: CV COVE A (A = Abbreviations to expand)

Added two lines: COVE CV T and CV CV T (T = Street type (ST AV))
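The effect of this edit can be illustrated with a toy Python model. It does not reproduce ArcView's matching logic or the real layout of the *.cls files; the table layout (token, standard form, class) is only inferred from the lines quoted above, and the function and sample addresses are hypothetical. The point is simply that once both COVE and CV are classed as street types, either spelling yields the same standardized type.

```python
# Toy classification tables: token -> (standard form, class).
# "A" = abbreviation to expand, "T" = street type.
BEFORE_EDIT = {"CV": ("COVE", "A")}
AFTER_EDIT = {"COVE": ("CV", "T"), "CV": ("CV", "T")}


def street_type(address, table):
    """Return the standardized street type found in the address, or None
    if no token in the address is classed as a street type."""
    for token in address.upper().split():
        standard, token_class = table.get(token, (token, None))
        if token_class == "T":
            return standard
    return None


# Before the edit neither spelling is recognized as a street type.
before = [street_type(a, BEFORE_EDIT) for a in ("123 Eagle Cove", "123 Eagle Cv")]
# After the edit both spellings standardize to the same type.
after = [street_type(a, AFTER_EDIT) for a in ("123 Eagle Cove", "123 Eagle Cv")]
```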

Conclusion

Addressing and the creation of an accurate geocoded road centerline coverage are difficult tasks to conduct. Issues and mistakes compounded over the years have created a complex road network which required a multi-path approach for data capture and address range creation. When undertaking this project we realized that it took cooperation from all stakeholders, well defined project goals and expectations, and thorough review and prioritization of all resources to achieve our goal. As a result of lessons learned, we are plotting a course for future maintenance and update. Continuous efforts by all stakeholders will guarantee accurately maintained processes and databases.

Bio - Jay S. Meehl is a GIS Analyst with Douglas County Colorado and Chris Boyd is a Planning Technician-Addressing in Douglas County Colorado.

Contributions from:

Della Walker , Dispatch Specialist, Douglas County Sheriff’s Department

Allison Robenstein, Systems Analyst, Douglas County Sheriff’s Department

Karen Blaney, GIS Specialist, Douglas County GIS Division

References:

FCC E911 Phase 2: http://www.fcc.gov/e911

The Benefits of GIS/911 Integration--An Approach Worth Emulating

F. Peirce Eichelberger, Louise B. Wennberg

Introduction

This paper highlights the many benefits of GIS and 911 integration as experienced in Chester County, PA. The County has taken an integrated approach to GIS and 911, from MSAG/geofile building through current implementation and support. The advantages of integration are many: addresses can be leveraged for a far larger number of users, exact addresses can be placed in a context of address range information, the positional accuracy of GIS data and maps is usually better, staff can focus on core competencies, the division of labor improves, there are economies of scale, software tools are better used, and a more extensive geographic systems/database model is available for all address and geographic referencing features. Each of the benefits will be enumerated with examples and details.

The structure and functions of the integrated GIS/911 approach will be covered, as will roles and responsibilities. The special relationship the County has built with its 73 Municipalities will also be described. The advantages of using the GIS framework for address and street name maintenance are also manifold. Address workflows and procedures will also be described. The systems/database model of addresses will be described along with the geofile for CAD and the GIS’ map/vector/geographic data model. Ideas regarding GIS and 911 functional integration will also be described.

Background

Chester County is situated about thirty miles west of Philadelphia in Southeastern Pennsylvania. Its 760 square miles are home to more than 400,000 residents, with thousands of daytime workers swelling the ranks during the traditional work week. It is part of a Commonwealth system, where the local municipalities have the governing responsibilities. Hence we find ourselves with seventy-three municipal subdivisions that have the authority to “name streets and number buildings”. A once quaint, rural, woodland setting for William Penn’s colonists and descendants has now been “modernized” by diversified business, industry, and city dwellers moving to the country. Out of necessity rural addressing has been converted to city-style addressing to ensure unique, locatable addresses for telephone locations of residence and businesses. It was determined that 50% of Chester County had rural addressing, and an additional 10% needed to be changed to eliminate alpha-numerics, decimals, and fractions. Appropriate rapid response, especially in the case of an emergency, is crucial. Remember the old adage of “we can’t help you if we can’t find you.”

Preliminary 911 developments started in 1991. A 911 Department was created within the Emergency Services Department to prepare the way for public cutover to a 911 Enhanced system projected for January of 1994. A 911 Task Force was established to provide overall regional support to the 911 Project. Representation consisted of police, fire, and medical responders, the League of Women Voters, the County Public Information Officer, the County Solicitor, a County Commissioner’s staff member, the Local Emergency Planning Commission, the Addressing Information Systems Unit of the United States Post Office, Bell Atlantic (now Verizon) – the host telephone company, and municipal officials.

Addressing Coordinators were appointed by all the municipalities to serve as the liaison between the County and their respective area. The Addressing Information Systems unit of the Post Office also appointed an Addressing Coordinator in each of the delivery post offices in the County to serve as the liaison between the County, the municipalities, and the Post Office. This collaborative effort began with joint meetings being held throughout the County, and continues even today with open dialog and discussion of addressing issues which continue to arise as the population increases.

With limited staff in the 911 Department, a bid for a systematic review of all the tax parcels in the County (160,000 +) was awarded to a consulting firm. Each of the tax parcels in the County was field checked by the consultant’s staff, who recorded salient details about the parcels such as number of driveways, actual access onto the streets, pools, ponds, etc. Based on a grid system, computer-generated address proposals were made for each parcel that had no address. Addresses already in use were reviewed to ensure parity and sequential numbering. A report containing this information was compiled, and turned over to each municipality for review. This was the developmental stage of the county-wide database. The time frame for this process was one year; it began as soon as the contract had been awarded, and ended when the last municipality had received its data for review (1992-1993).

Although by the summer of 1993 all of the municipalities had received addresses coupled with tax parcel numbers for review, there was an extended period of time during which the County received the validated addresses for inclusion into the database. The hope had been to have all of the information ready for the creation of the Master Street Address Guide (MSAG), the backbone of the 911 dispatching system. The MSAG is an extremely large database with the street number ranges, the street names, the thoroughfare types, directionals if applicable, the municipality name, the County name and numerical code, and police, fire, and medical response “boxes”. Inasmuch as the information was incomplete, the actual cutover to 911 was made without the existence of the MSAG. That date was January 1, 1994.
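The MSAG contents described above can be pictured as a table of range records. The Python sketch below is only an illustration of that idea: the field names, the sample values, and the `find_record` helper are hypothetical and do not reflect the actual MSAG file layout used by the County or the telephone company.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MsagRecord:
    low: int              # low end of the street number range
    high: int             # high end of the street number range
    street_name: str
    street_type: str      # thoroughfare type, e.g. RD, ST
    directional: str      # "" when not applicable
    municipality: str
    county: str
    county_code: str      # numerical code (placeholder value below)
    police_box: str
    fire_box: str
    medical_box: str


def find_record(records: List[MsagRecord], number: int, street_name: str,
                street_type: str, municipality: str,
                directional: str = "") -> Optional[MsagRecord]:
    """Return the first record whose street and range contain the address,
    or None when the address falls outside every range."""
    for rec in records:
        if (rec.street_name == street_name
                and rec.street_type == street_type
                and rec.directional == directional
                and rec.municipality == municipality
                and rec.low <= number <= rec.high):
            return rec
    return None


# Hypothetical records; the names, codes and boxes are invented.
records = [
    MsagRecord(100, 199, "MAIN", "ST", "", "EXAMPLE TWP", "CHESTER", "000",
               "P01", "F12", "M07"),
    MsagRecord(200, 299, "MAIN", "ST", "", "EXAMPLE TWP", "CHESTER", "000",
               "P01", "F13", "M07"),
]
hit = find_record(records, 150, "MAIN", "ST", "EXAMPLE TWP")
miss = find_record(records, 950, "MAIN", "ST", "EXAMPLE TWP")
```

An address that returns no record here corresponds to the kind of "fall-out" that has to be researched and corrected.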

Advantages of GIS/911 Data Integration

Experience is now showing that there are many benefits of 911/GIS data integration. While this may imply some form of organizational change, ultimate success depends on focused staff doing the jobs they do best regardless of their agency affiliation. The focus in this section is upon data integration; the following section will describe the organizational changes that might be needed to attain these benefits.

The benefits of 911/GIS integration from a data perspective are:

· Improved address leveraging by all that need better addresses

· Newer tools to do a much better job of address assignment, maintenance, editing and utilization

· Earlier focus on data/address maintenance from an ongoing development perspective

· Improved division of labor and focus on core competencies.

Each of these advantages is discussed in the following sections.

Improved Address Leveraging

In the GIS arena, we are quickly realizing that addresses are an essential element in providing access among various maps, databases and many applications. While 911 requirements are critical consumers of address-based data they are certainly not the only ones. Addresses truly are the “locus of GIS” and Chester County has found that addresses are used and needed by nearly every government computer application that deals with people, places, things or events1.

In addition to the typical residential and commercial structures, there are several locations which are “common places”, readily identifiable in the community, which often do not have addresses assigned. The reasoning is that “everyone knows where it is”. That may be true for the first due fire apparatus responding to the scene, but what about the second or third due ambulance responding from out of the area? Some of the locations where an address has become essential are:

· Tower sites

· Landfills

· Group homes

· Medical Facilities (of all kinds)

· Gas stations and fuel terminals

· Nursery schools

· Silos

· Courts and District Judges offices

· Police, Fire, Medical buildings

· Elected Officials Offices

· Labs and family planning clinics

1 Eichelberger, 1993.

In addition to needing an address, each location must have it clearly posted and visible from the street. The increased use of cell phones and competition from other signage have required posting large numbers (minimum 2 ½”) on a sign with contrastive color. Standardizing color and placement is also essential.

Addresses have been found to be the most common of “geographic denominators” among applications in all governmental settings, i.e. cities and counties, sunbelt and rust-belt locales, old and new places. The pie charts (Figure 1) on the next page show the importance of addresses. Note the “address related” darkened, pie slices at 7-8% of total data elements. These pie charts are very representative of dozens of GIS design studies that found similar percentage rates for address data elements across a broad spectrum of local governments.

With so many other potential uses of accurate and up-to-date addresses there are important economies of scale to allow for heightened address utilization by everyone that needs them. This also provides for the adage “the more people looking at and using something, chances are that it will be kept current and that errors will be fixed or at least noted”. Address clean-ups come from many, many sources not just MSAG (Master Street Address Guide) “fall-outs”.

Figure 1. Importance of Address Information (pie charts for Orange County, FL and Alexandria, VA; slices: Land, Geography, Structure, Unique, Occupancy, Address Related, Contact Information, Right-of-Way)

In fact, Chester County wants to identify address problems well before an emergency call for service is needed. Waiting for MSAG file “fall-out” is unacceptable. This means that improved addresses are available for everyone, not just 911, and that all are participating in address use and edits, not just in an emergency call situation.

Addresses are used and needed in most governmental activities. As these functions work at collecting, editing and using good addresses, a central repository and consistent data model will benefit everyone. Important too, is that addresses should be assigned/examined much earlier in the development cycle, before the streets are built and utilities are provided, prior to most building activities.

Improved Tools

The GIS’ base map provides for the strongest geographic framework for accurate street and address data capture, editing and development. Often times the GIS base map is also built upon a photogrammetric base that indicates roads, right-of-ways and improvements. The single line street map/database should be the basis for all address range data used for 911 call management and subsequent dispatching. The unified database should also drive the assignment of all individual addresses, so all E-911 addresses also reflect the proper address ranges, street types, directions and municipal designations, as determined by the single line street map/database.

The mapping, database and reporting tools common to GIS implementation are just what 911 data needs to be even more accurate, consistent and up-to-date. The ability to view right-of-ways and to see whether or not improvements exist for addressing is a powerful quality control feature of GIS. All too often the MSAG file is built in a non-graphic context with address ranges and segments poorly defined to the “face of the earth”, actual intersections, hundred blocks, etc.

The GIS also allows for powerful geographic edits and less error prone data maintenance, as key geography is built using the powerful topological structures of the GIS. Segments and street name synonyms, i.e. County Route 200 and State Route 113, are built using visible features, closely related to the proper hundred block orientation. The powerful graphics of the GIS also quickly show the orientation of streets to assigned addresses. Bad address parity or addresses “out of sequence” are quickly located by using the GIS. Most importantly all addresses should be cross-referenced with “mappable” GIS features such as street segments, intersections, parcels, map polygons or building footprints. This cross-referencing alone is one of the most powerful features of 911/GIS integration, since it helps account for all known or assigned addresses. This also builds the relationship among all exact addresses and the address ranges from a graphic perspective. This makes it easier to correlate all MSAG(2) address ranges with E-911 exact addresses. The maps become the most powerful data quality control technique, since they help “zoom in” to addressing problem areas quickly and visually, rather than perusing large reports or database dumps.

The County uses an improved MSAG(2) format which defines each street segment per recognizable street intersections, not the “span file” segments which are typical in most non-mapped, MSAG implementations.

Focus on the Development Perspective

Keeping addresses, street names and address range information up-to-date just for 911 and emergency dispatch purposes is a great deal of work in rapidly growing areas. Keeping GIS maps and related land records up-to-date is an equally daunting task. There are tremendous economies of scale if both update workflows can be orchestrated so that they are maintained simultaneously.

As the GIS’ base maps are maintained, as new subdivisions are approved, platted and recorded, as right-of-ways are managed, as building permits are finalized, etc. all these steps affect address assignments. With a consolidated approach to 911/GIS data integration, these workflow processes can be developed to ensure accurate and timely database/map maintenance.

Advantages of integration are especially felt in areas where addressing is a local responsibility of the cities and townships. Much coordination is required for interaction with the local jurisdictions. Their interest in GIS is also another good reason to leverage cooperative events, where daily communication helps to ensure that all have access to the very best and latest data.

Division of Labor

Addressing staff with a closer affinity to GIS procedures helps raise overall awareness about the importance of addresses for the community at large. Emergency services and 911 staff will benefit from a narrower focus on dispatch and larger emergency preparedness requirements. This implies a “lock-step” degree of coordination usually missing in most organizations. This coordination and division of labor is paramount to successful implementation. If it does not exist, if it is found wanting in any manner, or if things change with differing staff, the economies will be quickly lost.

The division of labor and the need for close coordination might also benefit from an agreed upon Service Level Agreement which spells out, in writing, agreed-to procedures, data quality and timing parameters, so the success of the relationship can be measured. Note that if mapping gets backlogged, or resources are lost for important daily maintenance activities, the whole arrangement can become completely unworkable, very quickly.

Organizational Structure

The support of GIS/911 integration is predicated on a non-redundant, organizational model. Key staff must be organized, dedicated and directed to accomplish daily goals of accurate, consistent address and geographic referencing maintenance, else integration is a pipe dream and will be a detriment. GIS staff will need to fully understand their role: their work will be of life and death consequences, and will be used seven days a week, 24 hours a day. Nothing like 911 support brings the GIS team out of the shadows and into the forefront of daily automation. With 911 support, GIS explodes from the past of “producing pretty maps once a month for the managers” to providing daily services of critical consequences. GIS becomes mainstream and begins to support all address-based functions of government, not just dispatch and 911.

At Chester County, the GIS Addressing team is responsible for nearly two dozen activities organized into seven, major work program tasks. Key tasks will be described in the following sections.

Municipal Liaison/Exchange

As you could imagine, with 73 Townships, Cities and Boroughs a large effort is needed to work with key staff responsible for addressing and development review. A constant flow of correspondence and data is moving back and forth between the County and its municipalities. As new addresses are assigned, building permits issued, new streets paved or address problems uncovered--much work is needed to keep all the “geography” accurate and consistent.

In the Commonwealth, all actual address assignments rest with the municipalities. The County is responsible for the countywide compilation of all addresses for dispatch and 911 call handling.

Through various stages of 911/GIS implementation, many distinct tasks have occurred:

· Approval and agreement as to proper street names, street directions, i.e. N, S, E or W and street types, i.e. RD, ST, BV, etc.

· Approval of address ranges along all street segments in the County

· Address problem identification and verification

· Address problem correction and edits

· Exact, situs address determination

· Situs address(es) cross-referenced to the proper ownership parcel

· Address documentation and database construction.

Address Reconciliation/Situs Addressing

This has been a multi-year task for Chester County. By using the initial address ranges established for the County’s E-911 program, the next step was to ensure that each exact address was also properly assigned and within the proper address range, i.e. proper hundred block, odd or even side of the street and in the correct sequence--in ascending numerical order. Most importantly, the addresses were properly cross-referenced with the proper “parcel” in the GIS’ map, so address related data can always be reliably “mapped”. This work involves much municipal cooperation. County staff literally built the entire address database from the best available data and provided it to each of the municipalities in a report, map or digital format for their review and sign-off. Many maps and address edits or corrections are also “flagged” for further attention. The work is reviewed with each municipality representative in a face-to-face setting.
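The three checks named above (proper range, proper odd or even side, ascending sequence) are simple enough to express in code. The Python sketch below is a minimal illustration under stated assumptions, not the County's procedure: the function names are hypothetical, each street side is assumed to carry a single parity, and the sequence check assumes the addresses are already listed in their physical order along the street.

```python
def check_situs(number, low, high, side_parity):
    """Check one exact address against its segment side.

    side_parity is "even" or "odd", the parity expected on that side.
    Returns a list of problems; an empty list means the address passes."""
    problems = []
    if not (low <= number <= high):
        problems.append("out of range")
    actual = "even" if number % 2 == 0 else "odd"
    if actual != side_parity:
        problems.append("wrong parity")
    return problems


def out_of_sequence(numbers_in_street_order):
    """Return the address numbers that do not ascend relative to the
    address immediately before them along the street."""
    nums = list(numbers_in_street_order)
    return [cur for prev, cur in zip(nums, nums[1:]) if cur <= prev]


# Hypothetical examples for an even side ranged 1200-1298.
ok = check_situs(1204, 1200, 1298, "even")
bad = check_situs(1305, 1200, 1298, "even")
skipped = out_of_sequence([101, 105, 103, 107])
```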

Nearly 30 municipalities have returned their reviewed reports to the County. For the other municipalities we are focusing on the address Atlas and address map presentation format to elicit responses. These map products (in ARCView) can more readily highlight areas with addressing issues. The majority of the remaining municipalities are eager to review the maps indicating possible address problems. Follow-on work may lead to other 911 edits, TELCO updates and/or corrections.

This is a key area where future Internet enabled applications would allow for direct data and function sharing of all important address information. Currently a street report is available at the County’s web site. This helps keep street names unique, by Municipality, during subdivision plan review.

Process Municipal Requests/Correspondence

FAXes, letters and telephone calls are received daily from the municipalities and refer to new addresses assigned, address problems corrected, street vacations, new subdivisions, plats approved, etc. This correspondence will lead to a work-flow/ripple effect throughout many GIS address functions. This workflow is very critical due to the emergency service nature of 911 data. In the future all of the communication should be web enabled.

Verizon Research

Chester County has the only GIS in Pennsylvania with direct access to the major TELCO databases. This allows GIS staff to ensure very accurate and timely record/database maintenance. Verizon is the host TELCO for the MSAG file. The CLEC’s and other TELCO’s send their changes to GIS for forwarding to Verizon, as needed. The CESNA (Customer Emergency Service Number Assignment) terminal (in GIS) is used to access the MSAG file at Verizon. It is the MSAG that actually is used to switch the emergency calls to the proper PSAP (Public Safety Answering Point) for subsequent dispatching of the correct emergency vehicle by CAD software at the County.

Twice a year, County staff also assist Verizon in a major record compare of all exact addresses against the MSAG file as another important QA/QC check. Our most recent compare in April, 2001 netted a very high 98.6% match rate.
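A record compare of this kind reduces to counting how many exact addresses fall inside some MSAG range. The sketch below shows the arithmetic in Python; the data layout and function name are hypothetical and the sample records are invented, so it should be read as an illustration of how a match rate and a fall-out list might be produced, not as the procedure used with Verizon.

```python
def match_rate(addresses, msag_ranges):
    """Compare exact addresses against MSAG ranges.

    addresses   -- list of (number, street)
    msag_ranges -- list of (low, high, street)
    Returns (percent matched, list of unmatched addresses)."""
    unmatched = [
        (number, street)
        for number, street in addresses
        if not any(
            s == street and low <= number <= high
            for low, high, s in msag_ranges
        )
    ]
    if not addresses:
        return 0.0, unmatched
    rate = 100.0 * (len(addresses) - len(unmatched)) / len(addresses)
    return rate, unmatched


# Hypothetical sample: three of four addresses hit a range.
ranges = [(100, 199, "MAIN ST"), (1, 99, "OAK RD")]
exact = [(120, "MAIN ST"), (198, "MAIN ST"), (50, "OAK RD"), (250, "MAIN ST")]
rate, fallout = match_rate(exact, ranges)
```

The unmatched list is the useful product: each entry is a candidate for research before it ever surfaces during an emergency call.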

ANI/ALI Research

If an address is used during dispatch of an emergency vehicle and the address does not “hit” the proper address range/segment, an error is reported for follow-up. As the address error is researched, various other corrections/processing may occur. As always, the internal consistency of the address and geographic records must be maintained and kept current. Problems are researched by GIS staff, before transmission to the appropriate TELCO.

Conflation Phase 3

The GIS’ single line, street maps came from two entirely different sources. Originally, the TIGER file was used to encode the Police and Fire zones and was used to build the dispatch geofiles. With many changes and edits, it finally became apparent that this file should be “conflated” or registered to the GIS’ base map. Along with GIS map development, a photogrammetrically accurate, street centerline file was built as well. Starting in the summer of 1999, conflation work began to combine the two centerline files into one, with more accurate address attributes and the good positional accuracy of the GIS map. In this manner, the Geographic Index (GIX) Database/System was accurately built.

For the past four years our office has had two road centerline coverages. The one centerline coverage is spatially accurate, based on the digital orthophotos, but has few attributes. The other coverage is a TIGER Line file, which is not spatially accurate but is attributed for E-911 purposes. The attributes for this coverage include street name, thoroughfare type, theoretical address range, municipality name, etc. The spatial accuracy of this coverage is as much as +/- 600 feet in some instances.

At this time, the majority of the E-911 street centerlines have been realigned to match the spatially accurate centerlines. The current police and fire polygons, which are used to determine what Emergency units are sent to a particular address, are aligned to the spatially imprecise E-911 street centerlines. It is now time to realign the police and fire polygons to match the newly realigned street centerlines. This work is now very near completion. While this work is being done, there are street centerline issues that will be presented to the appropriate municipalities for resolution. There is also a final round of centerline address range edits to complete. Current GIS staff will complete these final edits by the end of the summer.

Once the police and fire boundaries are moved to match the newly aligned street centerlines and the municipal issues have been resolved, the realigned street centerline coverage will undergo a final QA/QC review and will then be made live for E-911 dispatching. Once this coverage is complete, the current E-911 and LRS street centerline coverages will be archived.

With the various large scale editing projects ongoing, QA/QC methods are normally assigned to different staff. This helps ensure overall accuracy, while providing a learning opportunity for other staff. With the powerful graphics capabilities of GIS, QA/QC procedures are often a fast examination of “filled” polygon maps, exception reporting, etc. Checking for unfilled or “zero” acreage polygons immediately highlights possible conflation problems.
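The zero-acreage check mentioned above can be automated with a standard area calculation. The Python sketch below uses the shoelace formula on plain coordinate lists; the function names, the tolerance value, and the sample polygons are assumptions made for illustration and are not taken from the County's GIS procedures.

```python
def polygon_area(ring):
    """Unsigned polygon area by the shoelace formula.

    ring is a list of (x, y) vertices; repeating the first vertex at
    the end is optional."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def zero_area_ids(polygons, tolerance=1e-9):
    """Return the ids of polygons whose area is at or below the tolerance."""
    return [pid for pid, ring in polygons.items()
            if polygon_area(ring) <= tolerance]


# Hypothetical dispatch polygons: one square and one collapsed sliver.
polygons = {
    "FIRE-01": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "FIRE-02": [(0, 0), (5, 0), (10, 0)],  # collinear vertices, zero area
}
flagged = zero_area_ids(polygons)
```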

The documentation of procedures, etc. is ongoing since the latest information is needed to train new staff. Procedures must be shared or data encoding differences will result. Especially critical is the proper documentation of the preferred workflow, so update procedures are completed in the proper sequence, i.e. street names must be assigned before any exact addresses are encoded.

Through a great deal of hard work and simple “trial and error,” several map series have been produced for address use. These maps represent an address atlas with easy to use and read parcel level maps. A map Atlas is presently under development and a municipal map sheet series is also available. One is in an 11” x 17” format, and the other is a wall sized map/graphic. There is nothing like the address maps to highlight the work undertaken in developing the address databases! The maps also represent a significant “end-product” to deliver to the Municipalities in recognition of their cooperation. Atlases have been prepared for seven municipalities to generate responses. Five municipal address maps have been completed for municipalities that had returned their earlier, reviewed reports. The maps show PIN’s, addresses and acreages. The maps are proving very useful for subsequent address assignments.

Crystal Reports and ACCESS database reports are also frequently prepared for internal use and for submission to the municipalities.

PIN (Parcel Identification Number) Maintenance and Address Workflow Procedure

Because of GIS map maintenance procedures, new parcels are often created outside of the subdivision platting process. By definition, these new parcels will require a check of current addresses used. This synchronization is one critical step in keeping the cross-reference of addresses and GIS maps current. Changes in the GIS maps are an important source of workflow changes for the addressing group/function. Another group of GIS staff review up to 600 recorded deeds, on a daily basis, which may reflect property ownership or map boundary changes.

Address maintenance is also triggered by new active PIN’s in the automated Assessment system. Current work to reduce the 8,000 active PINs with no situs addresses is a high priority. Note that many parcels are “landlocked” and may never be addressed. Ongoing work is estimated at 100-200 new PINs per week. Another 5,000 records must be reviewed to check for street names, types and directions and “out of range” addresses. New procedures have been instituted to ensure a one-to-one correspondence between all parcels and situs addresses.
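A one-to-one correspondence between parcels and situs addresses can be audited with a simple cross-tabulation. The following Python sketch is illustrative only: the record layout, function name, and sample PINs are hypothetical, and it assumes the parcel/address cross-reference is available as a list of (PIN, situs address) pairs with a missing address recorded as `None`.

```python
from collections import defaultdict


def correspondence_report(pairs):
    """Audit (pin, situs_address) pairs for one-to-one correspondence.

    A situs_address of None or "" means the PIN has no address yet."""
    by_pin = defaultdict(set)
    by_address = defaultdict(set)
    for pin, address in pairs:
        addresses = by_pin[pin]  # registers the PIN even without an address
        if address:
            addresses.add(address)
            by_address[address].add(pin)
    return {
        "pins_without_address": sorted(
            p for p, a in by_pin.items() if not a),
        "pins_with_multiple_addresses": sorted(
            p for p, a in by_pin.items() if len(a) > 1),
        "addresses_on_multiple_pins": sorted(
            a for a, p in by_address.items() if len(p) > 1),
    }


# Hypothetical cross-reference records.
pairs = [
    ("PIN-1", "100 MAIN ST"),
    ("PIN-2", None),            # active PIN with no situs address
    ("PIN-3", "102 MAIN ST"),
    ("PIN-3", "104 MAIN ST"),   # one parcel, two addresses
    ("PIN-4", "100 MAIN ST"),   # address shared with PIN-1
]
report = correspondence_report(pairs)
```

Landlocked parcels that may never be addressed would need to be excluded from the first list by whatever flag the assessment system provides.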

DES Services Functions and Liaison

The elegance of our integrated 911/GIS approach is that all the hard work for 911 address maintenance can be immediately leveraged to all the other GIS users, County agencies, Municipalities, other levels of government and the private sector. Just think, if the addresses are good for 7x24 operations, they are going to be the best ever for virtually all other users. This means that addresses and geographic data are maintained consistently for all to use, in a timely manner. While many of the other mentioned activities directly involve Chester County’s Department of Emergency Services (DES), this task leads to a regular extract and off-load of the geofiles for subsequent loading into the PRC (Planning Research Corporation) CAD (Computer Assisted Dispatch) system. Off-loads occur when new streets or address ranges are added, address ranges modified, records maintained or dispatch boundary changes occur.

There is daily contact between GIS staff and staff in DES. DES looks to GIS for its mapping and analytical needs. GIS staff maintain the fire and police polygon coverages that dictate emergency unit response. GIS staff prepared an ArcView application system for use in emergency preparedness situations involving either of our two nearby, aging nuclear plants. GIS staff frequently prepare maps showing recommended Police, Fire or Ambulance geocode changes. With the integrated GIS/911 model, DES staff can focus more on directly related dispatch, communications and emergency services liaison functions, rather than worrying about a multitude of address assignment responsibilities.

Note that there are two key geographic files external to GIS, the MSAG, which is at Verizon, and the geofile used for specific equipment dispatch.

With deregulation, there are another five to seven TELCOs providing local telephone service to deal with. Staff work with each to ensure accurate address/telephone number cross-referencing. In addition, Commonwealth and Conestoga Telephone Co. are other larger providers or CLECs (Competitive Local Exchange Carriers). An MSAG (Master Street Address Guide) licensing agreement is executed with each TELCO to allow for data sharing and coordination of records. Important QA/QC procedures are completed on any address data sent to the TELCOs.

With the addition of new streets and subdivisions, much interrelated work may be triggered. New situs addresses are needed, street name tables and address ranges must be updated, geofiles “cut,” and so on; an entire sequence of events is triggered. This ensures that the geography is current when it is needed for emergency dispatch purposes.

In the future, these GIX maintenance activities will automatically trigger many of the above-mentioned duties, now largely triggered by “brute force” methods.

Most recently, staff have been working to “conflate,” or cross-reference, the 2000 TIGER/Line file to the GIX’s single-line street map database (finished in July 2001)2. This means that County information can be directly compared with Census data, or that County data can be mapped to Census geography, like Blocks and Tracts.

With this work came the completion of Traffic Zone polygons and Voting Precincts, important to many other agencies and functions. The Traffic Analysis Zones (TAZs) will play a very important role in any future transportation modeling, planning or growth management functions facing the County. This work has highlighted the need to keep all geography current and up to date, not just 911 dispatch geography. A summary of 2000 Census data by Voting Precinct has recently been completed.

2U.S. Bureau of the Census,

System’s Model

In a coordinated GIS/911 data structure, key address range and exact address information are stored in an integral data model. Exact addresses cannot be stored unless they “fit into” the address ranges and agree with the street name, street type and street direction of abutting street segments. The “single line” street map/database, in a GIS environment, consists of both map vectors, i.e. graphics, and database attributes, i.e. address ranges, stored and maintained in a consistent fashion. The graphics add so much to the ability to store good attribute data! This data model also supports the likely precedence of the maintenance of key features, i.e. street names are normally approved, and address ranges known, prior to the assignment of individual addresses and the issuance of building permits. This precedence will also do much to ensure that addressing systems are maintained with consistency and future growth in mind. Nothing is worse than assigning addresses only to find that neighboring addresses are now “out of sequence”.
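The “fit into” rule described above can be illustrated with a minimal sketch. It is not the County's actual schema: the `SegmentSide` record, its field names and the sample Market Street ranges are assumptions made only for illustration. An exact address is accepted only when its street name, type and direction agree with a segment side and its number falls inside that side's range with the same odd/even parity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentSide:
    """One side of a street segment and its address range (hypothetical schema)."""
    street_name: str   # e.g. "MARKET"
    street_type: str   # e.g. "ST"
    direction: str     # e.g. "W", or "" when the street has none
    low: int           # lowest house number on this side
    high: int          # highest house number on this side


def fits_range(number, name, street_type, direction, side):
    """Return True if the exact address agrees with this segment side."""
    same_street = (name, street_type, direction) == (
        side.street_name, side.street_type, side.direction)
    in_range = side.low <= number <= side.high
    same_parity = number % 2 == side.low % 2
    return same_street and in_range and same_parity


def can_store(number, name, street_type, direction, sides):
    """An exact address may be stored only if it fits at least one segment side."""
    return any(fits_range(number, name, street_type, direction, s) for s in sides)


# Illustrative data only: the even and odd sides of one block.
sides = [
    SegmentSide("MARKET", "ST", "W", 100, 198),
    SegmentSide("MARKET", "ST", "W", 101, 199),
]
print(can_store(150, "MARKET", "ST", "W", sides))   # fits the even side
print(can_store(250, "MARKET", "ST", "W", sides))   # outside every range
print(can_store(150, "MARKET", "AVE", "W", sides))  # street type disagrees
```

Rejecting an address at entry time, rather than after the fact, is what keeps “out of range” records from reaching the dispatch geofile.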

Figure 3 summarizes the data model for key address/geographic entities. From this data structure and related GIS coverages come four flat files, off-loaded on a weekly basis to power the PRC/CAD dispatch functions. The files for CAD define block faces, intersections, commonplace names and interstate lanes.

[Figure 3. Subset Data Model for Addressing Entities, Chester County GIS/911. Entities shown: PINS_Addresses (exact situs/location addresses, with a link to map polygons); Segment Sides (hundred block ranges, left-right address ranges, left-right geography); Segments (from intersection to intersection); Thoroughfares (street names, street types, street directions); Segment Synonyms and Synonyms (street synonym names).]

This early focus on address ranges as they pertain to new subdivisions also takes advantage of the GIS facilities to manage the addressing system and the address grids. The house numbering increment, for example four even (or odd numbers) per 40 feet of street frontage, as well as the North/South-East/West orientation of each proposed street can also be more easily determined by the GIS’s maps and records. The GIS can help maintain the “system” of addresses to ensure uniqueness with each address assignment and over time. This is particularly important where addressing grids and street layouts do not parallel the cardinal directions, like in the original thirteen colonies.

The coordinate base of GIS will become that much more critical as 911 efforts shift toward wireless 911 location identification. In some locales, 911 calls received from cellular telephones are approaching 40+%. Coordinate verification, and subsequent dispatching of the closest emergency unit along the most direct path to the incident location will also benefit from improved 911/GIS linkage. The X-Y coordinate in space still does very little to locate the nearest available unit, direct that unit to the scene and provide the best information concerning access, ingress and egress to the emergency scene.

Even for current E-911 activities, the coordinate, graphic base of the GIS allows for the accurate cross-referencing between X-Y coordinates and the map vectors and individual addresses, intersections or address ranges. The GIS provides for the perfect marriage of address data and related X-Y coordinates maintained in sequence. Try that in your CAD package!

GIS/911 data integration will also improve the collection and linkage of telephone numbers at larger sites and complexes. Identification of the building entrance or the security office, etc. may do little to speed help to the proper, exact location. Campus, hospital, even office park complexes where Centrex or other private systems serve many local users help obscure the true location of a caller in need. GIS will play an increasing role with the proper 3-D orientation of land parcels, buildings and occupancies/suites, apartments and rooms 3.

Summary and Conclusions

This discussion may be stating the obvious, yet in most locales little GIS/911 integration exists or has even been attempted. While there is no question that a solid address base is absolutely essential for 911 functions, we are now realizing that addresses are essential for so many more functions of government. Good addresses, stored and maintained in an open, modern IT environment can support virtually all governmental functions.

3Eichelberger, 1998.

For this integration to be successful, all update processing must be accomplished in a timely and accurate manner. No delays of a week or month can be tolerated where emergency dispatch is supported. This timeliness is a critical issue for many GIS implementations. For GIS/911 integration to work, all key map and address records must be maintained in a real-time, day-by-day manner. GIS programs require a major commitment to keep all maps and data current, especially as others become dependent on reliable maps/data. All of these daily uses also support the maintenance of an address database outside the narrow confines of emergency dispatch. If we catch address problems in a non-emergency context, we can clean up the addresses before they are needed on a Saturday evening at midnight in a snowstorm!

Glossary

Ascending Sequence of Addresses--one of the key addressing “rules”. Numbers increase the further away from the address grid origin.

Address Parity--another “rule” of addressing. Normally even address numbers appear on one side of the street and odd addresses on the other side, not normally mixed.

CLEC (Competitive Local Exchange Carrier)--local telephone competition that came with deregulation.

Hundred Block Range--a lesser “rule” of addressing. With each intersection comes a new hundred block. Depending on address rules and local system, may not be used.

MSAG (Master Street Address Guide) File--basic geography/address range file hosted by the local Telephone Company, used to determine basic call switching to proper dispatch center for dispatching.

References

Eichelberger, F. Peirce, 1993, “The Importance of Addresses--The Locus of GIS”, Urban and Regional Information Systems Association (URISA), Volume I, Annual Conference Proceedings.

Eichelberger, F. Peirce, 1998, “3D GIS: The Necessary Next Wave”, Geo Info Systems, October, pp. 30-33.

United States Bureau of the Census, Geography Division, Guidelines to Update the TIGER/Line Files.

Bio - Peirce Eichelberger is the GIS Manager in Chester County, Pennsylvania.

Louise B. Wennberg is a Senior Research Analyst with the Chester County GIS Department, West Chester, Pennsylvania.

Geocoding Brings Information to Life Andrew Pidgeon

Introduction

Real estate agents are maximizing sales by developing flexible geocoding applications.

The resources of e-business are available to help real estate agents sell brick-and-mortar locations. Real-estate boards, brokers, and multiple listing service (MLS) providers can purchase new Web- and client/server-based MLS solutions to expedite and increase sales. Clients and agents can access critical information, such as property photos, maps, community information and an array of demographic data. The common key to this data is geography. All of this information can be brought together through street-level geocoding and made usable by associating coordinates to individual addresses.

Geocoding brings information to life. With a full-service, online MLS system that includes mapping and geocoding, such as System 4i from Interealty.com, real estate agents can help buyers and sellers. For example, real estate agents can enter a buyer's purchase criteria (e.g., number of bedrooms, number of baths, style, price range, location, etc.) into the system and get photos, maps and information on homes in the MLS database that fit the description.

Geocoding also helps real estate agents establish selling prices for their clients' homes. They enter the relevant attributes of the home into the system and search the MLS database for similar properties in the same location that have sold in recent months. The system returns a map showing the exact location of all the comparables, as well as other relevant pricing information such as the number of days on market. By determining locations accurately, geocoding helps real estate agents save time, price properties and provide valuable information to clients when making recommendations.

Geocoding For Real Estate

In MLS applications with geocoding, when address data is entered, the software typically standardizes the address first. It segregates specific data values (e.g., street number) into discrete fields and standardizes variations to a consistent form (e.g., Street and Str to St). This facilitates validating the address, by matching it to address records for a locale, and then geocoding it.
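As a rough illustration of the standardization step just described, the sketch below splits a raw address into discrete fields and maps street-type variants to one form. The `STREET_TYPES` table and the `standardize` function are hypothetical and far smaller than anything a production geocoder would use.

```python
import re

# Hypothetical lookup of street-type variants to a single standard form.
STREET_TYPES = {
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "AVENUE": "AVE", "AVE": "AVE",
    "ROAD": "RD", "RD": "RD",
}


def standardize(raw):
    """Split a raw address string into number, name and type fields."""
    # Upper-case, drop periods and commas, and split on whitespace.
    tokens = re.sub(r"[.,]", " ", raw.upper()).split()
    number = tokens.pop(0) if tokens and tokens[0].isdigit() else ""
    street_type = ""
    # Treat the last token as a street type only if a name remains before it.
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        street_type = STREET_TYPES[tokens.pop()]
    return {"number": number, "name": " ".join(tokens), "type": street_type}


print(standardize("123 Main Street"))
print(standardize("123 main str."))
```

Both inputs reduce to the same three fields, which is what lets the later matching step compare like with like.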

Geocoding then matches the address records to a geographic base file (GBF) that contains address range street segment records. GBFs define the portion of a street that falls within particular geographic boundaries, such as a postal zone or census unit area. The matching process assigns a geographic code to the address record to establish its spatial location. The geocoder then appends the spatial information to the record. At the same time, it can assign additional information (often census codes and/or demographic information) to the record. A mapping application can use the information to display the address on a map.
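The lookup described above might be sketched as follows. The `GbfRecord` layout, the sample ranges and the census tract codes are invented for illustration; real GBFs carry many more fields and vary by publisher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GbfRecord:
    """One address-range street segment record (hypothetical layout)."""
    name: str
    street_type: str
    zip_code: str
    low: int
    high: int
    census_tract: str  # made-up code, appended to matched addresses


# Illustrative data only.
GBF = [
    GbfRecord("MAIN", "ST", "02101", 100, 198, "0001.00"),
    GbfRecord("MAIN", "ST", "02101", 200, 298, "0002.00"),
]


def geocode(address, gbf):
    """Return a copy of the address with codes from the first matching record."""
    for rec in gbf:
        same_street = (address["name"], address["type"], address["zip"]) == (
            rec.name, rec.street_type, rec.zip_code)
        in_range = rec.low <= address["number"] <= rec.high
        same_parity = address["number"] % 2 == rec.low % 2
        if same_street and in_range and same_parity:
            return {**address, "matched": True, "census_tract": rec.census_tract}
    return {**address, "matched": False}


addr = {"number": 212, "name": "MAIN", "type": "ST", "zip": "02101"}
print(geocode(addr, GBF)["census_tract"])
```

This exact lookup either finds a record or fails outright, which is the limitation the matching technologies discussed next are meant to address.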

Four Basic Ingredients

The success of geocoding historically has been measured by the match rate achieved, and, recently, by the accuracy of the enriched information. Based on these criteria, there are four ingredients for optimal geocoding that a real estate board or MLS provider should look for in evaluating software capabilities: a good matching technology, flexibility, a good database, and the ability to append additional information.

Matching Technologies: Deterministic vs. Probabilistic

The two ways to automate record matching are deterministic and probabilistic. The deterministic, or decision-table, approach performs a pattern- or rule-based lookup to a table. With deterministic matching, a real estate record either matches exactly or not at all. Probabilistic matching can deal with gray areas or nonexact matches. It uses mathematical measurement of available data in the record to evaluate the match. Both approaches are satisfactory when data is simple and matching requirements are lax. The probabilistic approach, however, is required for a high hit rate when there is inconsistent, incomplete, conflicting, or missing data, as is the case with many address records.

In the deterministic approach, each field being compared is evaluated and given a score or letter grade that tells how well it matched. The "grades" are accumulated to generate an overall match score. The score is then compared to a hard-coded limit. If the score exceeds the limit, it's a match. If it falls short, it's no match.
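A toy version of this grade-and-limit scheme is sketched below. The grade table, the three-character partial-credit rule and the limit of 10 are all assumptions chosen to make the behavior visible; they are not taken from any particular product.

```python
# Hypothetical grade table and hard-coded limit.
GRADE_POINTS = {"A": 3, "C": 1, "F": 0}
MATCH_LIMIT = 10
FIELDS = ("number", "name", "type", "zip")


def grade_field(candidate, reference):
    """Grade one field: exact match, crude partial match, or failure."""
    if candidate == reference:
        return "A"
    if candidate[:3] and candidate[:3] == reference[:3]:
        return "C"
    return "F"


def deterministic_match(candidate, reference):
    """Accumulate the field grades and compare the total to the fixed limit."""
    score = sum(
        GRADE_POINTS[grade_field(candidate[f], reference[f])] for f in FIELDS)
    return score, score > MATCH_LIMIT


ref = {"number": "77", "name": "MASSACHUSETTS", "type": "AVE", "zip": "02139"}
exact = dict(ref)
abbreviated = {**ref, "name": "MASS"}
print(deterministic_match(exact, ref))        # (12, True)
print(deterministic_match(abbreviated, ref))  # (10, False)
```

The abbreviated street name drops the record to the limit, so it is rejected even though every other field agrees; the fixed table has no way to say that the remaining evidence is strong.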

Probabilistic matching evaluates each field, but the score numerically represents that field's "information content" through the use of frequency information associated with the field. In a probabilistic approach, a match to a common word (Main) would receive a lower score than a match to an uncommon word (Nahattan). The individual field scores are then summed to produce a final score that precisely measures the information content of the fields being compared for a match. The final score, or match weight, can be converted into an odds ratio for an accurate gauge of the match's probability.

This measurement of "information content" is based on mathematically defined information theory and adjusts value and field scoring based on characteristics of the data. The measurement process gives a higher weight to a match between a pair of street numbers than between street types like road or avenue. In addition, it can give more gradations of partial credit than tables can to non-exact matching values like a street number with two digits transposed or name variations (Massachusetts Ave. vs. Mass. Ave). Probabilistic matching measures finer distinctions, finds more non-exact matches and makes fewer erroneous matches than deterministic matching. It can be audited and validated. The difference in results can be dramatic.
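One common way to formalize frequency-based field weights is the Fellegi-Sunter record linkage model, which the paper does not name; the sketch below uses it only as an illustration. The value of `M_PROB` and the sample frequencies are assumptions, and the "odds" returned are a likelihood ratio in favor of a match rather than a final probability.

```python
import math

# Assumed probability that a field agrees when two records truly match.
M_PROB = 0.95


def agreement_weight(value_frequency):
    """Bits of evidence for a match when a field agrees on a value.

    value_frequency is the relative frequency of the value in the reference
    file, used here as the chance that two unrelated records agree on it.
    """
    return math.log2(M_PROB / value_frequency)


def disagreement_weight(value_frequency):
    """Bits of evidence (negative) when the field disagrees."""
    return math.log2((1 - M_PROB) / (1 - value_frequency))


def match_weight(fields):
    """Sum field weights; fields is a list of (agrees, value_frequency) pairs."""
    return sum(
        agreement_weight(freq) if agrees else disagreement_weight(freq)
        for agrees, freq in fields)


def odds(weight):
    """Convert a summed weight in bits back into a likelihood ratio."""
    return 2 ** weight


common = agreement_weight(0.05)    # a common street name: about 4.25 bits
rare = agreement_weight(0.0005)    # an uncommon street name: about 10.89 bits
print(round(common, 2), round(rare, 2))

# Street number agrees (rare value), street type disagrees (common value).
total = match_weight([(True, 0.001), (False, 0.4)])
print(round(total, 2), round(odds(total), 1))
```

Agreement on a rare value contributes far more than agreement on a common one, and a disagreement on a low-information field such as street type subtracts only a little, which is the graded behavior the deterministic table cannot express.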

"With proprietary geocoding with deterministic matching, we needed an exact match or we couldn't cross-reference an address to the GBF files and so couldn't map it," said Lisa Gaetz, business development manager, Interealty.com. "We had a hit rate of about 50%. To increase our hit rate, we integrated Vality Technology's geocoding solution with probabilistic matching. Now we achieve hit rates in the 80-90 percent range or better."

Flexibility Issues

A geocoding application can be "flexible" in modes of operation (batch and real-time interactive) and its ability to geocode unconventional or partial addresses for higher hit rates. Most purchasers probably want a system that processes address records in batch and real-time (batch for loading data the first time and regular updates, and interactively in real-time to process a single address on the spot for a buyer or seller).

Some geocoding software systems provide further flexibility and generate more hits, because they also geocode addresses that lack a street number or in which the input street number doesn't map. Such systems can geocode an input address containing an intersection, an alphanumeric street number and/or a point of interest. Some systems will also allow for "default matching," which can assign a ZIP+4, ZIP+2, or five-digit ZIP-level coordinate to an otherwise unmatched record. Or, in cases where the input address lacks a five-digit ZIP and the city has multiple ZIP codes, the software will geocode to the center of the city.
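The "default matching" cascade can be pictured as a series of progressively coarser lookups. The sketch below is only one plausible arrangement: the `CENTROIDS` tables, their keys and the coordinates are invented for illustration.

```python
# Hypothetical centroid tables, from finest to coarsest level.
# Coordinates are made up for illustration.
CENTROIDS = {
    "ZIP+4": {"02101-1234": (42.3701, -71.0271)},
    "ZIP+2": {"02101-12": (42.3705, -71.0268)},
    "ZIP5": {"02101": (42.3706, -71.0265)},
    "CITY": {"BOSTON": (42.3601, -71.0589)},
}


def default_match(zip_code, plus4, city):
    """Return (level, coordinate) for the finest level that can be resolved."""
    candidates = []
    if zip_code and plus4:
        candidates.append(("ZIP+4", f"{zip_code}-{plus4}"))
        candidates.append(("ZIP+2", f"{zip_code}-{plus4[:2]}"))
    if zip_code:
        candidates.append(("ZIP5", zip_code))
    if city:
        candidates.append(("CITY", city.upper()))
    for level, key in candidates:
        if key in CENTROIDS[level]:
            return level, CENTROIDS[level][key]
    return None  # nothing to fall back on


print(default_match("02101", "1234", "Boston"))  # resolved at ZIP+4
print(default_match("02101", "9999", "Boston"))  # falls back to the ZIP5 level
print(default_match("", "", "Boston"))           # falls back to the city center
```

Returning the level alongside the coordinate lets a mapping application show the user how approximate the placement is.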

This flexibility is critical to real estate professionals when the exact address is not known. In many cases, prospects know the general area where they want to live, but don't know the street names. In those cases, point-of-interest and intersection matching is critical.

Database Concerns

Critical to achieving the highest possible hit rate and accurate information is a good database. A variety of geographic directories or GBFs are available from public- and private-sector data publishers. They can be licensed for local, state, regional, or national areas. One critical criterion in evaluating the geocoding functionality of an MLS is the currency of the GBF (i.e., how regularly it is updated and maintained).

"While many of us think of geography as static, it really changes quite a lot," said Don Cooke, founder of Geographic Data Technology Inc. "New roads and new subdivisions are being built all the time. A good geocoding database must be able to capture those changes in order to deliver high hit rates. For MLS applications, the ability to locate a new subdivision is critical."

Public-sector data, while often freely and easily available, may be rarely updated and might not include the street and address updates necessary for geocoding success. Most private-sector data providers offer scheduled updates with new street and address information captured on a regular basis. Some database providers are able to incorporate customer feedback into their update process to allow a user to provide information on the addresses that failed to geocode. The data providers then use that information to target their update efforts. But be aware that GBFs seldom represent up-to-the-minute conditions, because capturing changed data and producing GBFs reflecting new streets and neighborhoods are time-consuming and never-ending tasks. (Achieving a 100% geocoding match rate only occurs under unique circumstances.)

GBFs also vary in form and content: from relatively simple digitized boundary files of counties, five-digit ZIPs, or flood plains; to more complex address coding guides (ACGs) that define complete streets and street sections contained inside geographic units, such as a five-digit ZIP, , or census block group; to sophisticated digital street centerline files, such as the U.S. Census Bureau's TIGER, that support coordinate coding. Most MLS applications require street-level geocoding, thus requiring an ACG or street centerline file to provide the most precise results.

The final component of the geocoding program is the street and address reference directory. All available directories, from TIGER to the various proprietary files, continuously attain higher states of completeness and accuracy. To ensure the best possible geocoding results, choose a street database that is updated on a regular basis and is as complete as possible, particularly in terms of street and address coverage.

Appending Additional Information

Once an address is geocoded, community information and demographic information (derived from U.S. Census data) can be appended to the address based on the geocodes. In MLS services with appended information, the real estate agent can receive the community and demographic information related to the neighborhood as well as maps of the candidate properties. This type of one-stop shopping for information is invaluable to making a sale.

The Power of Spatial Enablement

State-of-the-art geocoding functionality with probabilistic matching and flexibility will enable the mapping of most addresses in a locale despite inexact, incomplete, or unconventional address data, and will pinpoint the addresses accurately on a map, based on a current GBF. Mapping is a powerful force in making a sale.

"Every year, we go to the National Association of Realtors (NAR) conventions," said Gaetz. "In our booth, we have several large screens, and we conduct live demos of System 4i. Invariably, the mapping is perceived as the 'sexy' component of our applications. When we show people how it works (selecting property information and corresponding tax and deed data, school data, and community data by searching an area of the map), the Realtors just love it."

Mapping and geography are the key to real estate information, and a flexible geocoding application that offers a good matching technology, a good database and the ability to append information can truly unlock the opportunities for real estate professionals.

Bio - Andrew Pidgeon is senior product manager at Vality Technology Inc., Boston.

City of Newport News, VA GIS Addressing Issues Greg DiGiorgio

Introduction

The following paper outlines problems and solutions concerning street and parcel addressing in Geographic Information Systems (GIS) for municipal governments.

Serving a population of 180,000 citizens, the City of Newport News is located in southeastern Virginia near the mouth of the James River.

Newport News covers a total of 69 square miles and encompasses 52,000 real estate parcels with an assessed value exceeding $9,000,000,000. Within this vast infrastructure, there are hundreds of multi-family dwellings from apartments to trailer parks.

The street network of Newport News comprises 1,600 lane miles of roadway with 1,500 public streets.

The Department of Real Estate Assessment maintains parcel and address data, while the Department of Engineering is responsible for the street network.

All parcel address and street data is spatially maintained in the City’s primary GIS system. This GIS system is operated by the Department of Engineering’s Mapping Office, which was established in 1990 and is currently staffed by a mapping supervisor and four cartographic technicians.

Why are addresses critical?

Address-based data affects municipal governance from emergency response through revenue collection to school bus routing. Additionally, most digital data sources in municipal government include addresses, such as those for probation and parole, real estate assessments, immunization campaigns, vector control (mosquito spraying), taxation, payroll and benefits, utilities billing, police offenses and crime analysis, trash pickup, and storm water management. Add in other items such as garage sale permits, student registration, vehicle sales, voter registration, and census surveys, and the proliferation of addresses through every facet of municipal government becomes readily apparent.

Who uses address-based data?

Police departments use addresses of crimes in briefings to alert officers to emerging patterns of crime. In conjunction with aerial photography, police use address data in terrain analysis prior to raids to display possible escape routes in a suspect’s domain, show natural barriers, and display surrounding housing for risk assessment. Crime analysts use address-based data to establish offense frequency by time of day and area, calculate the number of violent crimes within a specific radius of certain areas, and determine holiday-related crimes at select locations such as shopping malls. Police administrators use address-based crime data in the formation of new precincts, to redistrict police response zone boundaries, and to justify additional officers.

School systems use student address data to identify student residences for mass mailings as well as planning for new schools. Additional benefits include providing driver directions for inexperienced bus drivers and maximizing the number of students on each bus. Schools can then realize decreased operating costs by lessening the number of bus stops and routes, which, in turn, decreases maintenance costs on buses.

E911 emergency call center operations use addresses to display available response units within a certain radius, to show nearby hazardous waste sites, and to alert responders to previous trouble calls at the same address.

Problems with addresses

GIS technology is embedded in many core systems, including the City’s E911 emergency response system, Waterworks service and billing, and Police crime analysis. As a result, the demand for correct addressing is growing by leaps and bounds as more municipal operations come to rely on GIS technology.

Though problems with addresses plagued municipalities long before GIS technology came into play, GIS has been the one technology to unify municipal asset data and bring street address problems to light.

Real world addresses present a host of problems for GIS such as:

· Odd and even addresses on the same side of a street
· Randomly sequenced, adjacent house numbers like 2, 12, 32
· Owners can change private street names at any time without notification

A specific problem in Newport News is that responsibility for address and street name assignment is spread across multiple departments. The Real Estate Assessor is responsible for recording legal street names and addresses from real estate transactions, while the Engineering department is responsible for street sign production.

Unfortunately, these departments use different computer systems to maintain their respective address and street name data. On occasion, the legally recorded street names from the Real Estate Assessor differ from the corresponding street signs from Engineering. And while changing a street sign may sound simple, it is not. Oftentimes, the name on an errant street sign has been in use for years with addresses from that street propagated through systems such as police reports, court records, tax records, voter lists, student registration and utility billing data.

In the absence of a central authority to resolve such conflicts, departments must not only agree to change street names on signs, but also consult with citizens to judge the impact of such changes. In other cases, the City Council must pass ordinances to change names. In all cases, vast amounts of computer data must be reconciled to reflect street name spelling changes.

Compounding the problem of misspelled street names, computer systems do not share a single street address master validation system, causing incorrect addresses and street names to proliferate throughout various computer systems. That is, there is no single source against which street names and addresses are validated for correct spelling and accurate numbering as they are entered into computer systems. Additionally, paper forms allow a variety of address formats with no accepted standard.

As such, computer systems may record specific addresses or street names that do not actually exist.
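A single validation source of the kind described as missing above could be as simple as a master street list that every system checks at entry time. The sketch below is hypothetical: the `MASTER_STREETS` values are sample names, not the City's actual list, and the 0.8 similarity cutoff is an arbitrary choice. It uses the standard library's `difflib.get_close_matches` to suggest corrections for misspellings.

```python
import difflib

# Hypothetical master list of legally recorded street names (sample values).
MASTER_STREETS = {"JEFFERSON AVE", "MAIN ST", "OAK RD"}


def validate_street(entered):
    """Check an entered street name against the master list.

    Returns (is_valid, names): the matched name when valid, otherwise a
    short list of close spellings an operator could choose from.
    """
    name = " ".join(entered.upper().split())  # normalize case and spacing
    if name in MASTER_STREETS:
        return True, [name]
    suggestions = difflib.get_close_matches(
        name, sorted(MASTER_STREETS), n=3, cutoff=0.8)
    return False, suggestions


print(validate_street("main  st"))
print(validate_street("JEFERSON AVE"))
print(validate_street("ZZZZ BLVD"))
```

Offering suggestions rather than silently correcting the entry keeps a person in the loop, which matters when the misspelled name may be the one on the street sign.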

Lastly, the modeling of street networks in the GIS can lead to problems. In the physical world, most streets are multi-lane and either unidirectional or bidirectional, yet most organizations model street networks in the GIS as a single line. While doing so is a standard and accepted practice, questions and problems arise. For example, how can the GIS represent the closure of a single turn lane in a multi-lane roadway such that emergency vehicle routing is optimized?

Address Geo-coding

Address geo-coding is a powerful GIS tool, allowing placement of incidents and resources on a digital map based solely on addresses.

Even with all the inherent problems of trying to coordinate address data across a municipal government, address geo-coding provides an economical, though not entirely accurate, way to apply GIS technology to municipal operations.

With address geo-coding, addresses can be downloaded from a variety of computer systems and placed on digital maps as diverse as police pin maps, social service client maps, emergency shelter maps, school district maps, and fire station service area maps.

Of course, the placement of incidents and resources on digital maps is only as accurate as the GIS-based street network.

To boost the accuracy of the street network, the street network should include all private and public streets, exhibit consistent spelling of street names, implement standardized street type abbreviations like “RD” for road and “AVE” for avenue, and use accurate address ranges for the street segments.

Address ranges are arguably the most important aspect of a well-defined street network.

In GIS, street networks are decomposed into thousands of street segments that connect to form a unified street network. Each street segment contains a starting and ending address range for the left and right sides of a block of a street.

Address geo-coding uses the address ranges of a street network to interpolate the location of an incident or resource along a street segment.

Some organizations use estimated address ranges whereby minimum and maximum address ranges are assigned to street segments to ensure the inclusion of any address – even those that have not yet been assigned. Other organizations use actual address ranges to include only developed properties on a given block. Still others use a combination of the two types of address ranges.

For address geo-coding, actual address ranges work best as the spacing between addresses more closely matches the real world. When estimated address ranges are used in geo-coding, there is a tendency for objects to appear either bunched up or too thinly spread across street segments.
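The effect of range choice on placement can be seen in a small linear-interpolation sketch. The function name, the planar coordinates in feet and the sample ranges are assumptions for illustration; a real geocoder would also offset the point to the correct side of the street and follow the segment's shape.

```python
def interpolate(number, low, high, start, end):
    """Place a house number along a straight segment by linear interpolation.

    start and end are (x, y) coordinates of the segment's endpoints, with
    `low` assumed to sit at `start` and `high` at `end`.
    """
    if not low <= number <= high:
        raise ValueError("house number is outside the segment's address range")
    # A single-address range is placed at the midpoint of the segment.
    t = 0.5 if high == low else (number - low) / (high - low)
    x = start[0] + t * (end[0] - start[0])
    y = start[1] + t * (end[1] - start[1])
    return x, y


segment_start, segment_end = (0.0, 0.0), (1000.0, 0.0)  # a 1,000-foot block

# Houses on this block are actually numbered 100 through 110.
actual = interpolate(110, 100, 110, segment_start, segment_end)
# The same house placed with an estimated range of 100 through 198.
estimated = interpolate(110, 100, 198, segment_start, segment_end)
print(actual)     # the last house lands at the far end of the block
print(estimated)  # the last house lands about a tenth of the way along
```

With the estimated range, every developed property on the block is squeezed into roughly the first tenth of the segment, which is the bunching described above.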

Pitfalls to avoid

Lack of authority to resolve street name issues

Cities may divide responsibility for street names and address ranges among several departments. For example, the Real Estate department may assign legal street names, but Engineering may be responsible for street signs. If the names do not match, who is responsible for fixing the problem? Also consider that changing street signs can cause major problems for citizens and businesses.

Lack of standardized addresses in computer systems

Standardization should cover street names, street type abbreviations (RD, ST, AVE), and street direction (N, NE, S, etc.) as well as address ranges.

Allowing address overrides

By allowing operators to override address entry in computer systems like E911 emergency dispatch or E311 information systems, you increase the chance of problems. Unfortunately, you may have no choice since police and fire departments respond to addresses that may not yet have been entered into your computer system.

Lack of database integrity assurance

Without a single, centralized source to validate addresses as they are entered into systems, many addresses will not geo-code. While this may not present a problem for non-critical systems, systems like E911 emergency dispatch that rely on address geo-coding will quickly highlight address deficiencies.

Conclusions

With careful planning and inclusion of decision makers from the outset, many pitfalls of addressing in GIS can be avoided.

Authorize an arbiter to resolve street name disputes by changing street signs, fixing errors in legally recorded street names, and standardizing inconsistent assignment of house numbers.

Enforce standardized addressing across systems by developing a standard address format and employing a centralized address validation system.

Implement actual address ranges in the street network to more accurately reflect the physical world.

Minimize the impact of address overrides by flagging overrides for manual confirmation and adding verified and correct overrides to the master database.

Develop GIS integrity assurance procedures to prevent duplicate and overlapping address ranges, reversed address ranges, and flipped orientation of line segments that can result from GIS technicians’ data entry errors.

Create reports to compare GIS address data to non-GIS address sources to highlight missing streets and identify errors in street names and address ranges.

Identify sources of geo-spatial data other than addresses like police beats, fire response zones, public works service zones, and school zones to use when erroneous addresses will not geo-code. Of course, data will need a key of some sort to link to these other spatial data sources.
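The integrity assurance procedures recommended above lend themselves to automation. The following Python sketch flags reversed, duplicate, and overlapping address ranges. The record layout (keys `id`, `street`, `side`, `low`, `high`) and the function name `find_range_problems` are hypothetical, and the check assumes that segments of a street are digitized in a consistent direction, so that "left" and "right" are comparable between segments.

```python
from itertools import combinations


def find_range_problems(segments):
    """Return a list of tuples describing reversed, duplicate, and
    overlapping address ranges among street segment records."""
    problems = []
    # A reversed range has its low address greater than its high address.
    for seg in segments:
        if seg["low"] > seg["high"]:
            problems.append(("reversed", seg["id"]))
    # Compare every pair of segments on the same street and side.
    for a, b in combinations(segments, 2):
        if a["street"] != b["street"] or a["side"] != b["side"]:
            continue
        a_lo, a_hi = sorted((a["low"], a["high"]))
        b_lo, b_hi = sorted((b["low"], b["high"]))
        if (a_lo, a_hi) == (b_lo, b_hi):
            problems.append(("duplicate", a["id"], b["id"]))
        elif a_lo <= b_hi and b_lo <= a_hi:
            problems.append(("overlap", a["id"], b["id"]))
    return problems


# Hypothetical sample records.
sample = [
    {"id": 1, "street": "MAIN ST", "side": "L", "low": 100, "high": 198},
    {"id": 2, "street": "MAIN ST", "side": "L", "low": 190, "high": 298},
    {"id": 3, "street": "MAIN ST", "side": "R", "low": 399, "high": 301},
    {"id": 4, "street": "OAK AVE", "side": "L", "low": 100, "high": 198},
    {"id": 5, "street": "OAK AVE", "side": "L", "low": 100, "high": 198},
]
report = find_range_problems(sample)
```

A report like this could be run after each editing session so that errors are caught before they reach systems that depend on geo-coding.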

Bio - Greg DiGiorgio is a Database Analyst with the City of Newport News, Newport News, Virginia.

LOCATION BASED TECHNOLOGIES AND E911 Marc E Berryman

Introduction

The Greater Harris County 911 Emergency Network (Network) in Houston, Texas administers one of the most technologically advanced 911 systems in the nation. The Network is dedicated to providing the finest technology available for the citizens that it serves. The Network is the third-largest 911 service in the United States, serving over 4 million citizens in 2 counties and 47 cities, including Houston.

There are 43 fully equipped 911 call answering centers (Public Safety Answering Points, or PSAPs) within the area served by the Network. The citizens within the Network’s service area have the most advanced levels of 911 service available anywhere in the nation. Locating wireless callers, an integrated digital mapping system, locations of potential responding law, fire, and EMS units, and the ability to automatically receive information about a traffic collision are just a few of the services that are part of the Network.

The year 2000 brought the initiation of a dedicated public safety Geographic Information System (GIS) task force. The Network partnered with the City of Houston and combined resources to produce accurate mapping for emergency response personnel. This program has grown since the very beginning, and now has the involvement of over 100 public safety related agencies assisting in maintaining the most accurate and up-to-date spatial database available. This data is being used as the common base map for all public safety agencies, and many others. This ensures that everyone is using standardized addresses and street naming conventions, and that all base data is synchronized with other databases.

The spatial data is maintained and updated on a daily and often hourly basis. This data is shared with the regional South Texas Addressing and Reference Map, the STAR*Map program. The STAR*Map program shares its data with the Texas Strategic Mapping Project, which in turn shares with the National Mapping Program.

Wireless E911

In 1994 the National Emergency Number Association (NENA), the Association of Public-Safety Communications Officials, and the National Association of State Nine One One Administrators officially lobbied the Federal Communications Commission (FCC) for service parity between existing wireline E911 systems and wireless services. The FCC’s wireless E911 rules require wireless carriers to begin transmission of enhanced location information in two phases, Phase I and Phase II.

Phase I requires carriers to transmit to the PSAP the caller’s call back number (that of the mobile device) and the location of the receiving cell tower, along with the orientation of the antenna that received the call. This provides a gross location covering anywhere from 3 to over 25 square miles. The FCC deadline for Phase I has come and gone; it was April 2000. According to NENA, fewer than 50% of PSAPs now have this feature available.

Phase II requires the wireless service provider to provide the call back telephone number, cell tower location, cell sector (antenna orientation) information, plus longitude and latitude (X, Y) information. Phase II E911 services exist today in a handful of locations, by a few wireless service providers, but these numbers will grow.

The accuracy requirement imposed on the wireless carriers by the FCC varies depending on the location technology used by the wireless carrier.

For carriers using a network-based Phase II location technology, accuracy of the latitude and longitude provided to the PSAP must be within 100 meters for 67 percent of the calls and 300 meters for 95 percent of the calls. Network-based location technology is usually based on the time it takes for the cellular call to reach two or more cell towers. While not as accurate, it works with any cellular telephone now in use.

For carriers using a handset-based Phase II location technology, accuracy of the latitude and longitude provided to the PSAP must be within 50 meters for 67 percent of the calls and 150 meters for 95 percent of the calls. Handset-based location technology is usually based on GPS, or assisted GPS technology. While more accurate than the network-based solution, it requires new cell phone handsets.
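One way to read these requirements is as limits on the 67th and 95th percentiles of location error. The Python sketch below checks a set of measured errors against the thresholds quoted above. The function names, the percentile convention used, and the sample error values are assumptions for illustration only; they are not taken from the FCC rules or from any carrier's test procedure.

```python
import math

# Thresholds in meters as quoted in the text: (67 percent limit, 95 percent limit).
PHASE_II_LIMITS = {"network": (100.0, 300.0), "handset": (50.0, 150.0)}


def percentile_error(errors_m, fraction):
    """Smallest error value such that at least `fraction` of the calls
    have an error at or below it."""
    if not errors_m:
        raise ValueError("no location errors supplied")
    ordered = sorted(errors_m)
    k = math.ceil(fraction * len(ordered))  # number of calls that must comply
    return ordered[k - 1]


def meets_phase_ii(errors_m, technology):
    """True if the measured errors satisfy both accuracy thresholds."""
    limit_67, limit_95 = PHASE_II_LIMITS[technology]
    return (percentile_error(errors_m, 0.67) <= limit_67
            and percentile_error(errors_m, 0.95) <= limit_95)


# Hypothetical location errors (meters) for ten test calls.
sample_errors = [10, 20, 30, 40, 60, 80, 90, 120, 140, 400]
```

With these sample values, the 67 percent figure is 90 meters but the 95 percent figure is 400 meters, so the set would fail the network-based requirement on the second threshold alone.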

Spatial Technology in E911

Wireless phones and other wireless location devices are the driving force of GIS use and awareness in public safety. Most PSAPs rely on tabular databases of addresses and block ranges to direct the responding agency to a call for help. These tabular databases do not provide adequate information to assist in locating wireless calls, especially if the caller or the call taker is unfamiliar with the area.

Being able to assist 911 callers who cannot provide an accurate location is very dependent on having spatial data that includes landmarks, structures, lakes, creeks, parks, common places, and other features. This data has proven very helpful in determining the location of 911 callers.

Not only does the PSAP need GIS technology to properly dispatch emergency services to the wireless caller, but routing the call to the proper PSAP also requires spatial technology. When a wireless Phase II call is placed, the coordinates are used to route the call to the proper PSAP. Wireless Phase I calls are routed based on the location of the cell tower and the orientation of the cell antennas.

When the longitude and latitude coordinate is received for a wireless Phase II call, it is “run through” a database which contains the service area polygons of the 6000+ PSAPs in the US. This point-in-polygon procedure allows the wireless call to be routed to the proper PSAP. The database that performs this feat is known as the Coordinate Routing Database or CRDB for short.
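The point-in-polygon idea can be illustrated with the standard ray-casting test. The Python below is only a sketch of the general technique, not a description of how the CRDB is implemented; the function names, PSAP identifiers, and the simple square service areas are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from the point crosses. An odd count means the point is inside.
    `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def route_call(lon, lat, service_areas):
    """Return the identifier of the first service area containing the
    point, or None if no area contains it."""
    for psap_id, polygon in service_areas.items():
        if point_in_polygon(lon, lat, polygon):
            return psap_id
    return None


# Hypothetical service areas drawn as two adjacent squares.
service_areas = {
    "PSAP_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "PSAP_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

Here `route_call(1, 1, service_areas)` selects `"PSAP_A"` and `route_call(3, 1, service_areas)` selects `"PSAP_B"`. A national database of thousands of polygons would also need a spatial index so that only nearby polygons are tested.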

All this depends on having accurate, current, and precise base map information. The main data layers of E911 are the street centerline and the PSAP boundaries. Without this spatial information, wireless requests for emergency services cannot be properly located. Without accurate and current street centerline data, emergency services cannot be efficiently dispatched.

GIS technology is allowing emergency responders to reduce response time and locate individuals in distress. The street centerlines, along with other related spatial data, help speed response, reduce transfer time, and maximize existing resources. GIS technology has come into public safety, and it will not be going away.

Automatic Crash Notification

Each year, more than 6 million crashes occur on US highways. Crashes kill more than 41,000 people, injure about 3.2 million, and cost more than $150 billion a year. Thousands of Americans die each year and far more suffer severe and lasting injuries because emergency responders do not know when an auto crash or medical incident has occurred. Precious minutes and lives are lost because emergency responders cannot automatically locate a wireless 911 caller or dispatch appropriate emergency care.

The likelihood of survival increases when crash victims receive medical attention within the first hour following the crash, referred to as the "Golden Hour." Most deaths occur within a few hours of the automotive crash.

· 30% of deaths occur within minutes of crash
· 50% of deaths occur prior to arrival at a hospital
· 70% of deaths occur within 2 hours of crash

Telecommunications, automotive and location technologies are converging to automatically notify emergency responders when a vehicle is in a serious collision. Automatic Crash Notification (ACN) systems use wireless telecommunication technologies to instantly alert a Telematics service provider (TSP) when a passenger presses the car’s Mayday button or the air bag deploys.

In an Automatic Crash Notification (ACN) situation, the TSP would locate the vehicle and determine the proper PSAP to call, based on the location. The TSP would then call a ten-digit telephone number of the PSAP and relay the information by voice. The PSAP would then have to determine the proper responding agency based on the information provided. This information relay takes time and could lead to misunderstandings.

In June of 2000, the Greater Harris County 911 Network, in partnership with Intrado (then SCC Communications), Southwestern Bell, Veridian Engineering, Plant Equipment, Inc., and Combix Corp., conducted the nation’s first end-to-end trial of automatic crash notification (ACN). This test delivered a GPS location of the vehicle involved, along with crash pulse data (information such as change in velocity, number of occupants, seat belts in use, and other data), to a PSAP and a trauma center.

Upon vehicle impact, the location and crash data were instantly transmitted to Intrado. Based on the longitude and latitude of the vehicle, this data was relayed to the Houston Fire Department (HFD). Intrado initiated a voice connection to the vehicle, and connected the call to the Southwestern Bell Selective Routing Tandem in Houston, where it was routed to the HFD PSAP. Integrated computer telephony equipment allowed voice communication, crash information, and the location of the crash on a digital map to be viewed and recorded. The crash pulse data was also relayed to the receiving trauma center.

The June 2000 staged crash showed that it was possible for a vehicle equipped with telematic devices to relay data directly into a PSAP, and to a trauma center. This proof-of-concept led to the "Enhanced Automatic Crash Notification" project. This project delivers enhanced crash data and voice communication from the telematic-equipped automobile into the existing E911 system. This project started in the latter part of June 2002, in the Network area.

Enhanced Automatic Crash Notification

The Greater Harris County 911 Emergency Network partnered with Ford Motor Company, Cross Country Automotive Services, Veridian, Trimble, Southwestern Bell Communications, Intrado, and 18 local area law enforcement and fire response agencies to begin the next generation of automatic crash notification. The project is known as the Enhanced Automatic Crash Notification (EACN) Fleet Program.

Ford Motor Company has installed advanced crash detection systems by Veridian, GPS units by Trimble, and next generation sensors by Ford Automotive into 300 public safety vehicles within the Network’s area. Cross Country Automotive Services is providing the Telematics service provider services, and Intrado is providing the database services, with communications networks via Southwestern Bell Communications.

In the event of a crash, an EACN device immediately initiates a connection to emergency responders and transmits critical crash information. Specifically, EACN provides responders with the location of the crash and crash data elements. These data elements can include date and time, longitude, latitude, datum, location confidence, Telematics service provider name and phone number, vehicle type, color, model, year, license plate number, owner’s name, number of occupants, crash severity estimation, thrown occupants, entrapped occupants, seat belt usage, whether or not the vehicle rolled over, the difference in vehicle speed at the time of the crash, the principal direction of force, and others.
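To make the idea of a crash data record concrete, the Python sketch below holds a handful of the elements listed above and serializes them as XML, the format named in the program's objective of an open interface for data delivery. The class name, field names, element names, and sample values are all hypothetical; the actual EACN message layout is not described in this paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class CrashRecord:
    """A small, hypothetical subset of the EACN data elements."""
    date_time: str
    longitude: float
    latitude: float
    datum: str
    number_of_occupants: int
    seat_belts_in_use: bool
    rollover: bool
    speed_change: float  # difference in vehicle speed; units are an assumption


def to_xml(record):
    """Serialize a CrashRecord as a flat XML document, one element per field."""
    root = ET.Element("CrashNotification")
    for name, value in asdict(record).items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample values, for illustration only.
record = CrashRecord("2002-07-01T14:05:00", -95.37, 29.76, "WGS84",
                     2, True, False, 28.0)
message = to_xml(record)
```

A flat record like this is easy for a PSAP or trauma center system to parse, though a real interface would also need agreed units, code lists, and validation rules.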

The Enhanced Automatic Crash Notification Fleet Program is a two-year program, using 300 law enforcement and fire response vehicles, to meet the following objectives:

· Deliver Telematics calls requiring emergency assistance as native 911 calls
· Minimize any impact to the 911 Call Center
· Develop an open interface for data delivery (XML)
· Utilize existing NENA protocols
· Gain support from Industry, Organizations, and US DOT
· Improve National Institute of Traffic Safety Model
· Significantly improve response time of EMS, which may save lives
· Reduce hospitalization time and improve trauma outcome, which should lower recovery cost
· Better prepare EMS and trauma centers on the nature and severity of the injuries
· Make it Work!

The goals of the EACN Fleet Program include:

· One minute or less notification times of serious accidents;
· Better understanding of safety counter-measures in real world accidents;
· Observe system functionality over a wide range of incidents, thereby increasing the overall robustness;
· Evaluate the 2nd Generation Urgency algorithm;
· Define appropriate data for transfer as required by various agencies;
· Build relationships and framework for future partnerships;
· Provide accurate crash location mapping for improved response time;
· Achieve leadership position by bringing the medical and response communities and the auto industry together;
· Provide a framework for future technology deployment;
· Improve Traffic Management;
· Obtain data for understanding real-world accidents, so safety systems can be improved.

The value of these devices may eventually lead to them being installed on every vehicle, much like seat belts are today.

Stored ACN data has tremendous value. Stripped of individual characteristics to preserve citizen privacy, ACN data could build a valuable aggregate database. This data can help government, industry, and safety experts avoid issues like the recent Firestone problems. The aggregated data could be used to design better safety systems, reduce errors in first response and trauma care, and decrease medical cost by providing better models.

Reducing emergency response times saves lives and reduces the impact of injuries. Knowing which emergencies are serious and where they are can save tax dollars and target emergency resources. Automatic Crash Notification (ACN) is the next major advancement in auto safety.

Network Early Warning System (NEWS)

The Greater Harris County 911 Emergency Network is integrating a target notification warning system into its existing systems. The target notification system allows a specific area to be targeted for rapid calling and delivery of emergency messages. It’s like having 911 call you. This type of service has proven valuable in quickly locating missing children and Alzheimer’s patients, and in responding to HazMat emergencies and threatening weather conditions.

The target notification system being developed in the Network’s area will be called the Network Early Warning System, or NEWS. The system will be one of the most advanced tools available to rapidly identify, notify, and instruct affected individuals based on their geographic location. By partnering with Southwestern Bell Communications and Intrado, the system will be able to dial and relay a message to thousands of people every minute.

The area to be notified can be identified by drawing a polygon on the GIS map display. The street centerlines within the target area are highlighted and selected. Each street centerline has attribute data such as the high and low address range, the left and right side addresses, street names, and other information. The selected streets and addresses are matched up, by address and area, with the 911 telephone numbers in the selected area. The telephone numbers are placed into a calling queue and dialed. When the phones are answered, a message is delivered. The system calls back unanswered and busy numbers, and keeps records of the totals of all calls.
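The matching step, in which selected street segments are paired with telephone records, might be sketched as follows. This Python example assumes the segments have already been selected by the polygon, and the record layouts, function name, and telephone numbers are hypothetical; it is not a description of the NEWS software itself.

```python
def numbers_to_call(selected_segments, phone_records):
    """Build a calling queue of telephone numbers whose service address
    falls on one of the selected street segments."""
    queue = []
    for rec in phone_records:
        for seg in selected_segments:
            on_street = rec["street"] == seg["street"]
            in_range = seg["low"] <= rec["house_number"] <= seg["high"]
            if on_street and in_range:
                queue.append(rec["phone"])
                break  # queue each number once, even if ranges overlap
    return queue


# Hypothetical street segments selected by the polygon on the map.
selected = [
    {"street": "MAIN ST", "low": 100, "high": 198},
    {"street": "OAK AVE", "low": 200, "high": 298},
]

# Hypothetical telephone records with their service addresses.
records = [
    {"phone": "555-0101", "house_number": 120, "street": "MAIN ST"},
    {"phone": "555-0102", "house_number": 450, "street": "MAIN ST"},
    {"phone": "555-0103", "house_number": 210, "street": "OAK AVE"},
    {"phone": "555-0104", "house_number": 210, "street": "ELM ST"},
]
queue = numbers_to_call(selected, records)
```

Only the first and third records fall inside the selected ranges, so only those two numbers enter the queue. Redialing of busy or unanswered numbers would be handled by a separate step.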

In a crisis, seconds count. Getting the right information to the right people at the earliest possible time saves lives. Using the telephone system as the contacting medium is more efficient, less costly, and more reliable than any other method available.

NEWS uses the GIS street centerline maintained by the Network, and updated daily, and the 911 telephone number database. This database contains every wireline telephone number, including un-listed and non-published numbers. Since the Network manages data on a daily basis, the data is the most accurate possible. Plans are underway to expand the notification to wireless phones in the target area.

NEWS also allows for call lists, so that individuals associated with search and rescue teams, volunteer fire departments, SWAT teams, HazMat crews, and other groups can be quickly called. The system can even call the same individuals, using multiple phone numbers, until they are contacted.

Since the Network covers over 150 different public safety agencies, one of the big tasks was getting them to agree that the first certified individual at an incident could initiate the activation of the system. Full activation of the system can only be initiated by an approved agency head, or one of their alternates, with a one-time password. Since the password would need to be transmitted over radio, it is only good once, to prevent possible unauthorized activation. The Network’s 7x24 command center is the primary contact point for launching any events.

The agreements, understandings, cooperation, collaboration, and team building efforts needed to make the target notification system work, with thousands of individuals working for hundreds of different agencies, are a larger task than simply making the technology work.

Real-Time Weather and Weather Prediction

The Network has also purchased a custom 911 weather alerting and prediction system. The system allows our GIS data to be integrated into the system, and their data to be integrated into our GIS.

The system allows for displaying real-time weather information in a highly detailed manner, and has the capability of providing critical storm prediction information. The accuracy of the future weather predictions rivals that of any other weather prediction system. The system allows alarms and alerts to be set and activated by certain weather conditions within certain geographic areas. The alerts and alarms can be faxed, sent by page or mobile phone text message, or e-mailed to certain individuals. The 911WxAlert™ System from Weather Decision Technologies of Norman, Oklahoma will play a key role in the daily operations of the Network’s weather coordinator. “The 911WxAlert System will assist the Network with continuing to provide the newest technology available for the citizens that it serves,” says John Melcher, Deputy Director of the Network. “This unique weather prediction and alerting service greatly enhances our PSAPs’ ability to make critical operational decisions, to protect the citizens of Harris and Fort Bend Counties and manage our resources during periods of severe and hazardous weather.”

The Future

GIS technology and spatial data play an increasingly important role in the public safety arena. In the near future one will probably see the 911 infrastructure of this country joining forces and forming partnerships that one cannot even imagine today. Technology will continue to improve, and will give rise to new problems for the 911 infrastructure. How can one make a 911 call from a two-way pager, a wireless connected PDA, or a computer connected to the Internet through a dial-up connection? These questions and many others are being asked by public safety personnel today, so the future will not "sneak up" on 911. We will see more innovations in the next few years than we have seen since the first 911 system was put in place, just over 30 years ago.

Bio - Marc E Berryman is with the Greater Harris County E9-1-1 Network of Houston, Texas

DIFFICULTIES OF TIMELY AND ACCURATE ADDRESS MATCHING Ilir Bejleri, Ph.D. Scott Wright Ruth Steiner, Ph.D. Richard Schneider, Ph.D.

Introduction

Beginning in 1999, the Urban Planning Department at the University of Florida attempted to match the addresses for pedestrian and bicycle crash locations in Miami-Dade County, according to the information provided on police crash reports. The end goal is a reduction in the overall number of incidents and in their degree of seriousness, through the development of appropriate countermeasures at specific locations.

Though the response to the problem here is long term in nature – rather than the immediate one required in emergency response situations – the concept is the same. In both situations, addresses should be matched as quickly and accurately as possible in order to provide the most appropriate and timely response. For this reason, much of the efforts of the University of Florida have been aimed at making the process of address matching more efficient, reliable and easier to understand.

The reliability of any system for analysis begins with the data involved. In the case of address matching, the vital data components are the addresses in the data table and the digital street data layer used as a base for matching. Any errors in either component are multiplied and sometimes amplified when the process of address matching begins. Errors in the address data table are found to result primarily from human mistakes in data entry. Other problems result from an incomplete or outdated digital street data layer, aliases for street names that are not defined and other inaccuracies in the digital street data layer.

To address these difficulties, a GIS application has been designed to make address matching more accurate, faster, and less confusing. The interface deals with the creation of portions of the initial database, the compilation and editing of existing addresses in the data table, and the reconciliation of street names with the address table and the address matching system.

In order to make address matching a viable technology in the fields of emergency response and disaster preparedness, the system must be made to be efficient, expedient, and reliable. The application, though originally intended to match crash locations, accomplishes this.

For the last three years, the Urban Planning Department at the University of Florida has been involved in various grants assisting Miami-Dade County, Florida in the mapping of pedestrian and bicycle crashes within their jurisdiction. Florida is currently one of the worst states in the nation in terms of the number of pedestrian and bicycle crashes each year, and Miami-Dade County consistently has the greatest number of crashes in the state. Over nine thousand pedestrian crashes from the years 1996-2000 were mapped and evaluated. The purpose of mapping these crashes was to add another dimension – a visual and spatial one – to the methods being currently used to understand and approach this situation.

The value of address matching is that it reveals the spatial relationships of the incidents, thereby aiding in the evaluation and definition of a large-scale problem. In this particular case, the end goal was a reduction in the overall number of incidents and in their degree of seriousness, through the development of appropriate countermeasures at specific locations. Although the response to the problem here is long term in nature – rather than the immediate one required in emergency response situations – the concept is the same. In both situations, addresses should be matched as quickly and accurately as possible in order to provide the most appropriate and timely response. For this reason, much of the effort here has been aimed at making the process of address matching more efficient, reliable and easier to understand.

In late 1999, address matching began for pedestrian crash locations in Miami-Dade County, based on information provided on police crash reports. The original data set, obtained from the Florida Department of Motor Vehicles, contained many different fields with information for each crash. However, the critical information describing the location of the crash was not included and therefore had to be entered as a new field in the data table. The address data table for all crashes was then address matched to an available digital street layer. Problems ensued. During the course of the project, multiple difficulties regarding the accuracy of the address matching were encountered. Since the exact location of each crash would matter in the final analysis of crash locations countywide, the precision of the results was of measurable importance.

So, the next step was to determine how to make this process more accurate and thus more useful, by first identifying the potential reasons that address matching would not work and then determining how to minimize these problems. Another goal was to reduce the time required to successfully match addresses, since this could be seen as another barrier to address matching on a large scale in the real world. To accomplish this, the most time-consuming steps of the process were examined to determine how to simplify or remove them. Improved overall accuracy and reduced matching time were sought to make the process more feasible for local governments. These measures were incorporated into a customized Geographic Information Systems application, made available to local governments, that automates many difficult steps and limits the potential sources of error in the address matching process.

Address Matching Process

Following is a description of the steps for the address matching process of pedestrian crashes in Miami-Dade County, which is presented here as a background in order to understand the difficulties and solutions encountered during this project.

1. Initial Review of Data – This phase involved the removal from further analysis of all the original crash records that were not included in the scope of this project such as automobile and bicycle crashes. In addition, crashes occurring outside of the study area (Miami-Dade County) and those without any information describing crash location were removed as well.

2. Data Entry – All locations of crashes described on the police crash reports (Figure 1) could be grouped as either simple street addresses or as intersections of streets. In the case of the latter, information describing the offset direction and distance of the crash from the intersection of the streets was included on the report. Following is an example of a police crash report, showing a unique case number (54749794), the two intersecting streets nearest the crash (Valenzia Avenue and Salzedo Street), and the offset distance (200 feet) and direction (east) from the intersection, describing exactly where the crash occurred.

Figure 1: Fragment of a police crash report showing the address portion

3. Initial Batch Matching of Addresses – Once all the data defining the location of crashes was entered, the data table was batch matched to the digital streets file using the GIS address matching function (software used was ArcView 3.x). An output GIS point file (in shapefile format) was created containing the points of all successfully matched crashes. Any unmatched crashes were included in the resulting file but no point locations were assigned to them.

4. Review and Editing of Data – The unmatched addresses were then selected from the table of the point GIS file and reviewed. Corrections were made to the addresses based on local knowledge or by referencing maps of the area. In some cases, multiple variations of an address were attempted until one matched successfully.

5. Re-matching of Addresses – Once all unmatched addresses were reviewed they were matched again. The successful matches were updated in the point GIS file.

6. Manual Matching – After the addresses were sufficiently edited and re-matched, any remaining unmatched addresses had to be located using maps as references and digitized onscreen using the GIS software.

7. Offset Processing – Once all crashes were correctly located, they could be adjusted according to the offset distances from intersection.
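The offset adjustment in step 7 can be illustrated with a small Python sketch. It assumes projected map coordinates measured in feet and moves the point along a cardinal direction rather than along the actual street geometry, which is a simplification; the function name and the coordinate values are hypothetical.

```python
# Unit vectors (x, y) for the four cardinal offset directions.
OFFSETS = {
    "north": (0.0, 1.0),
    "south": (0.0, -1.0),
    "east": (1.0, 0.0),
    "west": (-1.0, 0.0),
}


def apply_offset(x, y, distance_ft, direction):
    """Shift an intersection point by the reported offset distance in the
    reported direction and return the adjusted (x, y) location."""
    dx, dy = OFFSETS[direction.lower()]
    return (x + dx * distance_ft, y + dy * distance_ft)


# A crash reported 200 feet east of an intersection at a made-up location.
adjusted = apply_offset(1000.0, 2000.0, 200, "east")
```

For the report described in step 2 (200 feet east of the intersection), the matched intersection point would be moved 200 feet in the positive x direction.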

Following is a table (Table 1) displaying the matching percentages of Miami-Dade pedestrian crashes for the year 2000. Of the 1996 original crash records, 240 were found to be located outside of the county, lack information regarding location, or describe accidents not involving pedestrians. Of the 1756 remaining valid records, 73.5% matched automatically on the first attempt, 14.1% were reviewed and edited and then matched automatically, and 10.3% were manually matched. In total, 1720 of the 1756 records were successfully matched. The remaining 2.1% of the crashes could not be matched, either because the information describing the location of the crash was found to be incomplete, or because of inaccuracies in the streets data file used for matching. Obviously, if this process were being used in an emergency situation, then total accuracy (100% matching) and minimal time required for matching would be desired.

2000 Miami-Dade Matching

                                      # Records   % of All Records
Original Records                           1996        100.0%
Non-Valid Records                           240         12.0%
   No Address                                86          4.3%
   Outside Dade County                       50          2.5%
   Non-Pedestrian                           104          5.2%
Valid Records                              1756         88.0%

                                      # Records   % of Valid Records
Automatic Matching                         1291         73.5%
Interactive Matching                        248         14.1%
   1st Edit Rematch (Initial Review)         68          3.9%
   2nd Edit Rematch (Map Reference)         164          9.3%
   3rd Edit Rematch (Record Pull)            16          0.9%
Manual Matching                             181         10.3%
Total Matching                             1720         97.9%

Table 1: Miami-Dade County matching of pedestrian crashes for year 2000
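The percentages in Table 1 follow directly from the record counts, as the short Python check below shows. The counts are taken from the table; the variable and function names are introduced here for illustration only.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


original = 1996
non_valid = 240
valid = original - non_valid             # 1756 valid records

automatic = 1291
interactive = 248                        # 68 + 164 + 16 edited rematches
manual = 181
total_matched = automatic + interactive + manual
unmatched = valid - total_matched

summary = {
    "valid": pct(valid, original),
    "automatic": pct(automatic, valid),
    "interactive": pct(interactive, valid),
    "manual": pct(manual, valid),
    "total": pct(total_matched, valid),
    "unmatched": pct(unmatched, valid),
}
```

The 36 unmatched records account for the 2.1% reported in the text, and the three interactive rematch passes (68, 164, and 16 records) sum to the 248 records shown for interactive matching.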

It should be noted that the first version of a customized crash mapping GIS application interface was used for the year 2000 pedestrian crashes in Miami-Dade County, and so these percentages reflect the positive impact that this interface had on the effectiveness of the crash mapping process. Previous years had much lower percentages for automatic matching and total matching. Due to the additional editing of addresses and manual matching that was undertaken during these first years of the project, the entire process took two to three times longer.

Sources Of Error

Following is a description of the various types of problems that were encountered during the process of address matching, including inaccuracies resulting from data (address data and streets GIS file) and from the address matching process itself. In many cases, the errors ultimately resulted in the failure of a crash record to match, decreasing the overall success rate of the matching process. In many other instances, the errors were corrected or the crashes were manually matched, increasing the overall time and effort required.

Inaccuracies Resulting from Data

The reliability of any system for analysis begins with the data involved. In the case of address matching, the vital data components are the addresses in the data table and the digital street data layer used as a base for matching. Any errors in either component were multiplied and sometimes amplified when the process of address matching began. Errors in the address data table were found to result primarily from human mistakes in data entry. Other problems resulted from an incomplete or outdated digital street data layer, aliases for street names that were not defined, and other inaccuracies in the digital street data layer.

Errors in Crash Address Data Table

Many inaccuracies in the crash address data table result from the initial recording of the data by the officer arriving on the scene of the crash. Any mistakes made in the entry of the address at this point are carried over into the crash data table and eventually result in errors in the address matching (geocoding) process. Following are some of the common errors in addresses recorded at the scene of the crash:

- Misspellings – improper spelling of street names.
- Incomplete addresses – addresses that were missing one or more components.
- Incomplete offset data – many times the offset direction or distance was missing on the crash report.

Many other inaccuracies resulted from problems during data entry:

- Typographical errors – basic mistakes made during the transfer of information from the crash reports to the data table occurred far too often. A single error in the address or offset could result in an inaccurate match.
- Incomplete addresses – another common mistake in data entry involved incomplete transfer of the address information. Failure to include a component of the address (such as the street type or directional prefix) would prevent a match from occurring.

Errors in the format of the data also occurred:

- Improper format – sometimes street addresses simply did not fit the format required to match properly (for example, 120 Rue Bordeaux in Miami Beach has a nonstandard street name). Because the inaccuracy in these cases did not result from human error, this is an example of a problem that could not be controlled through any improvement of the process. In these cases, the crashes were usually matched manually in the end.
- Alternate street names – many streets are known by several different names, or change names as they pass from one area into another. These streets presented many problems in the address matching process, since the streets database contains only one street name for each line segment, and an alternate name was often used to describe the street.

Errors in Streets Database

During the first several attempts to match addresses in Miami-Dade County, it was realized that there were certain problem areas in the county where addresses rarely matched to the streets database:

- Interstates, State Roads, Causeways
- New Developments
- Outlying Rural Roads
- Northern Island Areas
- Port Area Roads

Upon investigation, it was realized that these areas of the street grid refused to match for a number of different reasons. Following is a list of the potential problems with any streets database, which may lead to incorrect matches or the failure of addresses to match:

1. Interstates and some state highways lack nodal intersections with other roads, and therefore many times will not match as intersections. Interstates and highways also rarely have an address numbering system, and so there is no linear reference to match points to the locations described in crash reports.
2. No intersection node exists where certain road segments intersect, and so crash locations defined as intersections will not match there.
3. Newly constructed roads are oftentimes missing from the data set.
4. Some road segments lack data in the street name field (such as bridges, causeways, or port area roads) or are listed only as unnamed roads (typically new roads in outlying suburban areas).
5. A very small number of roads may be incorrectly named.
6. Address numbering may be incorrect or outdated on certain sections of roads in the database.
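Some of the problems listed above (missing street names and missing address numbering) can be screened for before matching begins. The following is a minimal sketch of such a check, not part of the project's software; the field names `id`, `name`, `from_addr`, and `to_addr` are hypothetical and would need to be mapped to the actual attribute table of the streets file.

```python
def flag_street_segments(segments):
    """Return {segment_id: [problem, ...]} for segments likely to block matching.

    Each segment is a dict with hypothetical keys 'id', 'name',
    'from_addr', and 'to_addr'; these names are illustrative only.
    """
    problems = {}
    for seg in segments:
        found = []
        name = (seg.get("name") or "").strip().upper()
        # Blank names and generic "unnamed" labels cannot match an entered street name.
        if name in ("", "UNNAMED", "UNNAMED ROAD"):
            found.append("missing street name")
        # Without an address range there is no linear reference for a street number.
        if not seg.get("from_addr") and not seg.get("to_addr"):
            found.append("no address range")
        if found:
            problems[seg["id"]] = found
    return problems


# Made-up example records, for illustration only.
example_segments = [
    {"id": 1, "name": "NW 7 ST", "from_addr": 100, "to_addr": 198},
    {"id": 2, "name": "", "from_addr": 0, "to_addr": 0},
    {"id": 3, "name": "I-95", "from_addr": None, "to_addr": None},
]
print(flag_street_segments(example_segments))
```

A check of this kind only finds gaps in the attributes. Missing intersection nodes, missing roads, and incorrect names still require comparison against another source or local knowledge.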

Inaccuracies Resulting from Geocoding Process

Even with perfectly reliable data, there are still reasons that addresses may not match properly. In order to understand this, it’s important to know how the process of address matching actually works – how the GIS software assigns each address to a point on the digital street map. The understanding of the GIS software was gained largely through trial and error, and later through examination of the actual programming processes for the address matching function.

Streets matching style – there are a number of different formats that may be used to match a streets database to a table of addresses. In this project, the U.S. Streets matching format was used. This format requires either a street number, name, and type for a regular address, or two street names and types for an intersection address. Directional prefixes and suffixes are optional. If a particular address does not include one of the required fields for this format, it will not match.
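The required-field rule described above can be expressed as a simple pre-check run before an address is sent to the matcher. This is a sketch under assumptions: the record layout and the key names (`kind`, `number`, `name`, `type`, `name2`, `type2`) are hypothetical, and the GIS software's own parser is not reproduced here.

```python
# Required components for each kind of address, as described for the
# U.S. Streets matching format. Directional prefixes and suffixes are optional.
REQUIRED = {
    "address": ("number", "name", "type"),
    "intersection": ("name", "type", "name2", "type2"),
}


def missing_components(record):
    """Return the required components that are absent from a parsed address.

    `record` is a dict with hypothetical keys; 'kind' is either 'address'
    (the default) or 'intersection'.
    """
    required = REQUIRED[record.get("kind", "address")]
    return [key for key in required if not str(record.get(key) or "").strip()]


# A regular address entered without its street type would fail to match.
print(missing_components({"number": "120", "name": "FLAGLER", "type": ""}))
```

Running a check like this at data entry time lets an incomplete address be corrected immediately, instead of being discovered later as an unmatched record.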

Geocoding parameters – the existing address matching function of the GIS software (ESRI’s ArcView 3.x) works by dividing each address into subcomponents and then searching for a corresponding line segment in the streets file with all matching components. The precision with which these components are required to match can be adjusted by setting the desired geocoding (matching method) parameters. Spelling sensitivity determines how exactly the spelling of street addresses in the address table must match those in the streets file in order for an address match to occur. Matching accuracy determines what level of accuracy is needed for an address to count as a match. During the project in Miami-Dade County, it was discovered through trial and error that when spelling sensitivity was set below 92% and matching accuracy was set below 100%, some addresses were allowed to match improperly. In order to maximize the accuracy of the placement of address points, these parameters were set as high as possible to prevent inaccurate matches from occurring while still permitting the matching of all addresses that were known to be correct. If this system were being used to match addresses in an emergency response situation, total accuracy for each match would be imperative. Since each address match is of vital importance in such a situation, matching accuracy would have to be set at 100%.
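The effect of a spelling sensitivity threshold can be illustrated with a generic string similarity score. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in; it is not the scoring method used by ArcView, and the function names and sample street names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def spelling_score(entered, reference):
    """Similarity of two street names on a 0-100 scale (stand-in metric)."""
    a, b = entered.strip().upper(), reference.strip().upper()
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def candidate_names(entered, street_names, spelling_sensitivity=92):
    """Return street names whose score meets the threshold, best first."""
    scored = [(spelling_score(entered, name), name) for name in street_names]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= spelling_sensitivity]


streets = ["BISCAYNE", "BRICKELL"]
# A dropped final letter still clears a 92 threshold under this metric...
print(candidate_names("BISCAYN", streets, spelling_sensitivity=92))
# ...but is rejected when the threshold is raised to 95.
print(candidate_names("BISCAYN", streets, spelling_sensitivity=95))
```

The trade-off is the one described in the text: a lower threshold tolerates more data entry errors but admits more improper matches, so the threshold is best set as high as the known-correct addresses allow.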

Multiple matches – on a number of occasions there may be more than one 100% match for a given address. This happens when the same address is found in more than one city. In this case, unfortunately, the computer randomly chooses one of these addresses, which may not be the correct one. This problem can be controlled by introducing an additional geographic address component, such as a zip code, into the process of address matching. In this instance, however, the address data available did not include a zip code, and so the city boundaries were used as a geographic zone. This presented another problem, since the city code assigned to each crash in the report was often unreliable.
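The zone-based approach to multiple matches can be sketched as a filter over equally scored candidates. This is an illustrative outline only; the candidate structure and the key names `x`, `y`, and `city` are hypothetical. Because the city code on a crash report may be unreliable, the sketch returns nothing when the zone does not single out exactly one candidate, leaving the decision to a user.

```python
def resolve_multiple_matches(candidates, city_code=None):
    """Pick one location from several equally scored candidates.

    `candidates` is a list of dicts with hypothetical keys 'x', 'y', 'city'.
    Returns the single candidate in the given city, or None when the choice
    is still ambiguous and should be reviewed by a user.
    """
    if len(candidates) == 1:
        return candidates[0]
    in_zone = [c for c in candidates
               if city_code is not None and c["city"] == city_code]
    return in_zone[0] if len(in_zone) == 1 else None


# Made-up coordinates and city codes, for illustration only.
matches = [
    {"x": 910000.0, "y": 520000.0, "city": "01"},
    {"x": 935000.0, "y": 560000.0, "city": "02"},
]
print(resolve_multiple_matches(matches, city_code="02"))
print(resolve_multiple_matches(matches))  # no zone given: left for manual review
```

Returning no result in the ambiguous case avoids the random choice described above, at the cost of sending more records to interactive review.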

Solutions

Standardization of the data used and of the address matching process is needed to ensure accuracy and speed, and to eliminate the need for repeated troubleshooting. In order to limit the many errors that may result from improper data entry, the entry of data should be strictly controlled. To standardize the address matching process, the GIS software was customized to control much of the data entry and matching processes. The customization was developed as a GIS application extension (ArcView 3.x extension). This customized software was also designed to improve the speed, accuracy, and reliability of address matching, while reducing the technical expertise required for the process. The primary purposes of the customized GIS application were:

1. Streamline the crash mapping process (reduce the time requirement and the need for making critical decisions)
2. Reduce the room for human error (particularly during the data entry stage)
3. Make GIS crash mapping possible even for inexperienced users
4. Eliminate technical intermediary steps

The application allows for the creation of portions of the initial database, the compilation and editing of existing addresses in the data table, and the reconciliation of street name files with the address table and the address matching system. The various problems encountered during the steps of address matching are handled with several features:

- A pull-down menu guiding the user through the address matching process
- An interface for data entry that controls the input of addresses and reduces room for human error
- Spell checking of the addresses entered
- An interface that easily allows for address review and editing after initial entry
- The ability to import a street alias table and access this for alternate matches during the address matching process
- An interface that deals with the problem of multiple matches (multiple locations in the street data layer with the same address and street names)
- Addition of X and Y fields in the matched address file
- The inclusion of offset distances in the address matching process
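The street alias feature listed above can be pictured as a lookup that supplies a second name to try when the entered name is an alternate. The sketch below is an assumption about how such a table might be used, not the extension's actual code, and the alias entries are placeholders that do not describe real streets.

```python
def names_to_try(entered_name, alias_table):
    """Return the entered street name followed by the name it is an alias of.

    `alias_table` maps an alternate name to the name stored in the streets
    file. Keys are compared in upper case with surrounding spaces removed.
    """
    key = entered_name.strip().upper()
    names = [key]
    official = alias_table.get(key)
    if official and official not in names:
        names.append(official)
    return names


# Placeholder alias entries, for illustration only.
aliases = {"OLD MILL RD": "STATE ROAD 1", "HARBOR CSWY": "STATE ROAD 2"}
print(names_to_try("Old Mill Rd", aliases))
print(names_to_try("NW 7 ST", aliases))
```

Trying the entered name first and the official name second keeps the alias table from overriding a name that already matches the streets file.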

Although this particular extension is tailored to the existing process of crash mapping in Miami-Dade County, it is designed with the flexibility for a more generic use. Currently the system uses a commercial roads database (provided by Geographic Data Technology) as a mapping framework, and assumes the use of crash data tables based on police crash reports. However, the system is capable of working with data sources for other counties in Florida and beyond, provided that there is a street GIS data layer with address fields and a table that has at least one field to uniquely identify each record.

The crash mapping extension consists of several functions, including input data files, address entry and editing, and automatic and manual matching.

Input Data Files

This interface (Figure 2) represents the first step in the address matching process, and allows for the selection of several files that serve as the basis for crash mapping. The streets shapefile serves as the base for address matching or geocoding. The crash table is a database file (in DBF format) containing crash numbers and associated data, and may or may not contain information describing the location of each crash. The city limit shapefile is an optional data layer containing the geographic boundaries of all the cities in a given county. The city-county code table, also optional, is the corresponding database file that contains code numbers associated with all cities within the county. Together the city limit file and the city-county code table provide a spatial solution to the problem of multiple matches.

Figure 2: Data Entry Interface

Address Entry and Mapping

This interface (Figure 3) combines several functions:

Figure 3: Address Entry and Mapping Interface

Selection of crash records for address entry and/or editing: The Address Entry and Mapping interface allows the user to select the view of the crash records, which determines what crash records are displayed according to matching status. All crashes in the data table may be displayed, or only the matched or unmatched records. This interface also provides a means to determine the sorting method of the crash records, which may facilitate the location of certain records. Crash records may be sorted according to the original format, numerically by case number, or alphabetically by entered address of the crash. The Go to Case Number function allows for search and selection of crash records by case number.

Entry of addresses and offset data for selected records: Once the crash record has been selected, the address for the record will appear in the address component boxes at the bottom of the interface; if an address does not yet exist for the crash, then one can be entered into the boxes at this point. This portion of the Address Entry and Mapping interface includes pull-down lists for the entry of directional prefix, street name, street type, and directional suffix. The directional prefixes and suffixes are in a list limited to the basic compass directions (N, S, E, W, NW, NE, SW, and SE). The list of street types is designed to include all possible types of streets that may occur (from ‘Boulevard’ to ‘Plaza’ to ‘Isle’). Street types are listed alphabetically, except for ‘Avenue’ and ‘Street’, which are located at the top of the list because of the frequency with which they occur. The list of street names is drawn from the database for the streets file, and so will include all possible matching choices for the street name, listed alphabetically. The interface also contains a blank for entry of street numbers and an optional blank for typing the street name, in case the user prefers this method. The TAB key on the keyboard may be used to easily move from box to box and improve the speed with which addresses may be entered. The offset information for each crash – offset distance and direction – may be entered into similar boxes. The Apply button stores editing changes to the database. The data may be changed at any time by selecting this record again, entering new address and offset information into the appropriate boxes, and then saving it again to the database file.
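One way to picture how an offset distance and direction might adjust a matched location is to shift the point along the stated compass direction. The sketch below makes several assumptions that the text does not confirm: coordinates are in a projected system whose units match the offset distance, the offset direction uses the same eight compass directions as the pull-down lists, and the point moves in a straight line instead of following the street geometry.

```python
import math

# Unit vectors (east, north) for the eight compass directions.
_DIAG = math.sqrt(0.5)
DIRECTION_VECTORS = {
    "N": (0.0, 1.0), "S": (0.0, -1.0), "E": (1.0, 0.0), "W": (-1.0, 0.0),
    "NE": (_DIAG, _DIAG), "NW": (-_DIAG, _DIAG),
    "SE": (_DIAG, -_DIAG), "SW": (-_DIAG, -_DIAG),
}


def apply_offset(x, y, distance, direction):
    """Shift a matched point by `distance` map units in a compass direction.

    ASSUMPTION: straight-line shift in projected coordinates; a production
    tool would more likely move the point along the street centerline.
    """
    dx, dy = DIRECTION_VECTORS[direction.strip().upper()]
    return (x + distance * dx, y + distance * dy)


# A crash recorded 150 units north of a matched intersection (made-up coordinates).
print(apply_offset(1000.0, 2000.0, 150.0, "N"))
```

An unrecognized direction raises a `KeyError`, which mirrors the intent of the pull-down lists: only a fixed set of direction codes is accepted.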

Automatic or manual matching for each crash record: The Add Record function adds a new blank record to the list of crash records. This ability was included in the interface in order to account for any crashes that were not contained in the original crash database file. The Delete Record function provides a means to eliminate records that should not be involved in the mapping process. In the case of the Miami-Dade project, this included records that were located outside of the county, records that were incomplete, and records that involved types of crashes other than pedestrian. The Records to Match function allows the user to determine the records that will be matched: the selected ones, the unmatched ones, or all the records. The matching method may be set to automatic or manual. Automatic matching runs the matching function for all the selected records; manual matching brings selected records to a different interface used to match crashes manually.

Multiple Matches

A user interface (Figure 4) had to be added so that the user's judgment could make the ultimate decision as to the location of the address. The matching method may be set to match individual addresses with the input of the user, or to match them automatically according to assigned city-county codes from the city-county database file and city boundary shapefile. If the user chooses to match individual addresses, then the correct location must be selected by number, according to the labels of each choice of location in the map view window.

Figure 4: Multiple matching interface

Manual Matching

If a crash cannot be mapped automatically, even through editing, then it must be manually placed in the correct location. The manual matching interface (Figure 5) is designed to make this as easy as possible.

Figure 5: Manual matching interface

Zoom and pan tools (zoom to theme, zoom in, zoom out, and pan) assist in finding the correct location on the street map where the crash occurred. The Identify tool is used to identify individual streets or crash points. Clicking on a street segment displays a window that lists multiple fields describing the street, including the street name and address range; clicking on a crash point lists the fields for the crash, including the unique crash number. The Point tool allows the placement of the address point in the desired location. The Erase button is used to remove a point if it is incorrectly placed on the map view window. Finally, the Done button finalizes the placement of the point and adds it to the created crash point file.

Future Possibilities And Alternative Applications

The extension is transferable to other counties interested in mapping crashes, whether they are pedestrian, bicycle or automobile. As well, the extension could map addresses for just about any purpose. In this project, the desire to increase the speed and accuracy of the system came from a need to maximize return on time and monetary investments. Speed and accuracy are the two most important elements in mapping for an emergency situation, though for entirely different reasons.

Conclusions

In order to make address matching a viable technology in the fields of emergency response and disaster preparedness, the system must be made efficient, expedient, and reliable. The crash mapping GIS application described here, though originally intended to match the locations of pedestrian crashes, could be used for a number of other purposes, including those related to emergency response situations. Recommendations to improve the process include: better collection of crash information by police officers to avoid potential errors; improvement of streets database geometry and attribute information to reflect the most up-to-date streets information; use of high values for the geocoding parameters, such as 92% or higher spelling sensitivity and 100% matching accuracy; provision of local street names for each segment of the street database; and sufficient local knowledge to compensate for imperfections in the database and GIS geocoding functions.

Bio – Ilir Bejleri, Ph.D. is an Assistant Professor in the Department of Urban and Regional Planning at the University of Florida

Scott Wright is a Graduate Research Assistant in the Department of Urban and Regional Planning at the University of Florida

Ruth Steiner, Ph.D. is an Associate Professor in the Department of Urban and Regional Planning at the University of Florida

Internet Transit Trip Planner

Sandra Johnson

Introduction

In July of 1998 the Central Massachusetts Regional Planning Commission (CMRPC) was awarded an FTA Job Access Planning Challenge Grant for $20,000 to develop an Internet based trip planning application based on the Worcester Regional Transit Authority (WRTA) fixed route bus system. The major goal of this pilot was to determine whether or not an Internet based trip planning application could be developed to serve small to medium sized transit systems for as little as $20,000, and then to consider the possibilities for replication by similar sized systems nationwide.

The following is a portion of a document that discusses in detail the Planning Commission’s Internet trip planning development efforts. The original document fulfilled an FTA Challenge Grant reporting requirement and is available by contacting CMRPC. The original document covers the development process as it evolved, the lessons learned, and tips for other agencies interested in attempting to develop this technology with limited financial resources. Some of the key elements of the report are included in this synopsis.

Background

The Central Massachusetts Regional Planning Commission (CMRPC), on behalf of the Worcester Regional Transit Authority (WRTA), began Access to Jobs/Welfare-to-Work efforts in August of 1997 by identifying the major players across a 56-town region in central/southern Massachusetts. To date, staff has met with over 50 agencies and employers, where initial and ongoing discussions have centered on appropriate transportation strategies, given the difference between rural and urban transportation issues and the disparate solutions required to solve them.

This outreach effort with local agencies surprisingly revealed a wholesale lack of knowledge regarding WRTA services. It became obvious to CMRPC staff that although human resource personnel and trainers/placement staff who work with recipients are aware of the transit system, most do not know how to use it and, therefore, cannot promote it to their employees/clients as a viable transportation option. Conversations revealed that they didn’t know the bus service area, how to read schedules, how to make transfers or how to route themselves/employees/clients to destinations. This lack of knowledge sparked an exciting new educational effort called ‘Train the Trainer’ aimed at teaching staff who work with welfare recipients how to use the bus. The intent of these sessions is to make trainers/placement individuals comfortable enough with the system to be able to promote the bus as a reliable, convenient and user-friendly transportation resource to their clients.

While the ‘Train the Trainer’ effort is expected to make potential riders feel more comfortable using the bus, staff is aware that the 29-route system serving Worcester and 13 surrounding communities remains confusing. Additionally, after completing a training session at the Regional Employment Board (REB), REB staff expressed a desire for an ‘easy to use tool’ that would provide real-time transportation options at job placement centers to go along with their newly acquired knowledge of the bus system. After considering all of the various options, CMRPC staff concluded that the best way to disseminate this type of information would be through the Internet, where 24 hours/day – 7 days/week access would be free to users, updating route/schedule changes would be seamless and bus schedules would no longer be required at various sites.

Therefore, CMRPC set out to develop an Internet based transit trip-planning tool to work in tandem with the training sessions. In order to be truly useful, the Internet product needed to be capable of locating available jobs and childcare sites along existing bus routes so placement

Getting Started – A ‘How-To Manual’

The Trip Planner was envisioned as an Internet-based job placement tool. It was intended to help job placement workers, human resource personnel/trainers and employers route their clients and employees to work, training, childcare and other destinations using the bus.

Although CMRPC was unaware of existing web based trip planners when the project began three years ago, staff knew that such a concept was possible. Staff also realized that they did not have the technology, experience or the time to complete the task without help. The development would have to be accomplished by an outside agency. It was then that the search for ‘partners’ began.

Concurrently the search for funding began. CMRPC met with FTA Region I staff to discuss their Trip Planner concept, and were encouraged to apply for one of the five (5) FTA Job Access Planning Challenge Grants available nationwide. After preparing a detailed Scope of Work and applying for the grant, CMRPC learned in FY ’98 that they had received one of the five $20,000 grants to develop an Internet Trip Planner for the Worcester Regional Transit Authority.

Technical and Developmental Concerns

As part of its search for software partners, staff approached its own Internet Service Provider (ISP), Ultranet (now owned by RCN), based in Marlboro, MA. Because of their experience creating web-based applications, Ultranet introduced enlightening technical and developmental concerns not previously considered by CMRPC staff.

Ultranet began by identifying the development, design and maintenance concerns associated with an Internet based Trip Planning web page. The following is a list of questions Ultranet felt CMRPC needed to answer/consider before continuing on with their Trip Planner development:

1) Who are your primary users and what are their technological capabilities? Including:
   a) Hardware: types of computers, types of printers, ratio of computers/printers to worker, modem and internet connection at each location
   b) Level of computer/internet experience
   c) Languages needed for output
2) Which agency (WRTA or CMRPC) will control the page?
3) Will the application have its own domain name or will it be attached to an existing site, and if so, which site (WRTA or CMRPC)?
4) Should the application be launched in stages?
5) Should use of the Trip Planner be free of charge?
6) Should the web page be sponsored by a company, and if so, who should it be?
7) Should advertising be sold on the web site to pay for costs of development and maintenance?
8) How will the application be marketed?
9) What links should be created to and from the application?

Many of the questions raised could not be answered immediately, but they needed to be raised. Only a company like Ultranet, which has broad-based experience with Internet products, could raise such questions. Anyone interested in pursuing a similar project should seek out this expertise.

Outreach to Vendors

It was through Ultranet that CMRPC was introduced to local software vendors and designers. Ultranet had worked on similar (but not identical) projects with DesLauriers and Associates, Inc., a GIS developer based in Franklin, MA. DesLauriers created an Environmental Systems Research Institute Inc. (ESRI) based Internet product to assist municipalities with their planning requirements. Ultranet arranged a meeting between CMRPC and DesLauriers to explore the possibility of working together to create a Trip Planner application. Concurrently, staff learned of another local GIS vendor/developer that was working on a similar product and met with them, as well.

In explaining the nature of the project, staff made it clear that once funding was secured, it would probably be limited, at best. Pivotal to the success of accomplishing this project would be the ability of staff to persuade software developers to donate their services, experience, products, time and labor for the good of the project and any future benefits they might glean. It was after these first meetings that CMRPC was awarded an FTA Challenge Grant in July of 1998 to develop the trip planning application. Once CMRPC was able to bring money to the table, staff created a Solicitation for Services/Request for Proposals and a list of technical specifications to begin the bid process (both available in the full write-up).

During this time CMRPC became aware of and researched a number of existing trip planning applications in Ventura County, CA, Washington D.C., the state of New Jersey, Los Angeles, CA and Detroit, MI. All of these transit authorities are much larger than the WRTA and paid considerable amounts for their trip planning applications (from $150,000 to well over $1,000,000). Armed with this information, CMRPC returned to the local GIS providers/developers and discussions centered on the possibility of these companies developing a product within the $20,000 cost constraint. It was essential, at this point, for CMRPC to make it worth their while to get involved with the project by offering them opportunities to tap into markets not usually associated with the GIS industry, such as the human service field. Additionally, the development of this product could introduce the vendor to smaller transit authorities, nationwide, since Challenge Grant requirements included the replication potential. Both agencies felt the project was worthwhile and bids were received from the two vendors in October of 1998. However, the ensuing negotiation process was arduous and stretched out for several months.

Bids

Once bids were received and discussed between the WRTA and CMRPC, many issues not previously considered were revealed. A list of proposal pros and cons was created that included an assessment of CMRPC’s contribution to the development of the product. First, CMRPC/WRTA discussed the costly software purchases associated with each vendor’s proposal. While vendor 2 required a considerable investment in a product that CMRPC had not used before, the DesLauriers/ESRI bid was based on using a GIS system already owned and operated by CMRPC. Staff had important concerns regarding the time and effort required to learn a new piece of GIS software and the technical assistance/service costs associated with product development and beyond. While CMRPC recognized the ease of using a GIS software package familiar to staff, this was not the only criterion for choosing a vendor.

Other concerns included:

1) Final product user restrictions were considered since one bid stipulated the product could only be used for Job Access purposes (not acceptable to the WRTA or CMRPC).
2) Timeframe for completion - one vendor had a completed product based on software not owned or used by CMRPC (learning curve for staff). The other vendor proposed an application based on software used by CMRPC, but had to develop the application. Additionally, although both companies felt the product would be up and running within a few months of signing a contract, that time frame was underestimated in both cases due to the unanticipated and overwhelming data development requirements of the project.
3) Finally, continuing long-term costs, including web server licensing fees and web site hosting costs, were uncertain. This variable prolonged negotiations 6 more months, since CMRPC and the WRTA were concerned about the ongoing costs of maintaining the product on the Internet.

Lessons Learned

Negotiating

Armed with a cost/benefit analysis of the two bids and the above mentioned concerns, CMRPC learned its first valuable bargaining lessons:

1) Be sure you talk to the right people, both financial and administrative, to avoid misunderstandings and misinformation. Staff had a very difficult time obtaining up-front cost information on products to be developed in the future and had to be rather persistent to obtain that information. Staff was glad they were persistent, since they avoided what could have been a very costly ‘surprise’ by insisting on in-depth cost information up front.
2) Work with a developer(s) willing to accommodate your specific needs. For consideration are the functional requirements of your product and whether or not the vendor can/will develop the product to accommodate those requirements. Be cautious working with developers simply because they have a developed product, since you may be forced to use a product developed for another user’s needs. If you choose to work with a vendor that has an existing product, be sure you obtain the costs associated with customizing their product to your application. If you work with a vendor that has not developed a product before, make sure your application needs can be accommodated within the contracted price.
3) Make sure you work with a vendor that will allow the application to be open not only to initially identified users, but potential (future) users, as well. At the outset, CMRPC identified the Trip Planner users as individuals/agencies involved in the Job Access effort. However, the WRTA and CMRPC recognized many other potential users. By restricting the application to one population, the WRTA would be restricting the service from elders and people with disabilities, commuters, employers and tourists. If user restrictions are a part of the contract, be sure you are aware of them and the costs associated with removing them.

Costs

It is important to find out what you ‘own’ when the product is completed and what costs are associated with modifying the application once a final product is delivered. Along these lines, decide upon any ‘future’ uses of the trip planner, such as whether the product may eventually be used in a kiosk setting. CMRPC envisions kiosks at Worcester’s Union Station, the WRTA hub and local shopping centers. Decide if the application would one day be used to route individuals on commuter rail lines, subways, ferries or private buses, or if it should accommodate only fixed route buses. Regardless of your future needs, be aware that programming changes to a completed trip planning application would most likely require a fee, and depending on the vendor, can cost upwards of $100/hour. Essentially, be sure you are pleased with the product before making any final payments.

Also, research ongoing costs associated with maintaining the trip planner, such as web licensing fees, web hosting, maintenance and upgrading/changing the product. These answers may not come from the GIS vendor/developer you have chosen, so you must contact ISPs or other transit authorities running similar products to get price ranges.

Another important cost consideration deals with what is required of you and your agency to support the development of the product. At the outset, CMRPC believed that the majority of the application costs would be for programming and GIS development, but later realized that the bulk of the costs would be associated with CMRPC in-house data development. CMRPC’s data development costs would have been the same regardless of vendor, since any trip planning application seemingly would require similar information. It must be recognized that any agency developing a trip planning application will have to complete costly data development tasks unless the data is already available from the transit authority. Data collection sources and costs should be considered before embarking on this type of project. One way to avoid the data development process is to have the vendor complete these tasks. However, keep in mind that their labor costs can start at $100/hour.

Accepting a Bid

After considering all of the above-mentioned items, CMRPC accepted DesLauriers' bid to develop an application based on ESRI products. For a number of reasons, staff felt DesLauriers' bid was less constricting, more inclusive of CMRPC’s needs and, given staff's extensive experience using ESRI products, would ultimately take less time to develop. A contract was signed between CMRPC and DesLauriers in May of 1999.

On a final note, make sure that you are fully aware of your agency’s bid procedures and your funder’s bid/award requirements before beginning a project of this magnitude, as non-compliance could lengthen the process considerably.

Production Development

Data Development

The first phase in the Trip Planner development, the data development phase, began with DesLauriers and CMRPC jointly compiling the following list of the Planning Commission’s required tasks:

· Develop GIS based bus route model (with transfers and stops) suitable for geocoding and routing
· Provide ancillary data regarding bus schedules, fares, and other pertinent data to be accessed by the system
· Refine GIS based street network with associated data model suitable for geocoding and routing
· Provide technical support to agencies using the product.

Armed with these rather amorphous tasks, staff next approached the Administrator and General Manager of the WRTA in order to find out who at that agency would be most likely to aid in the data development process. Involving the RTA at the beginning of the process is crucial to the success of the project. After identifying a key person, staff began by assessing the data the WRTA and CMRPC had collected over the years related to the WRTA bus stops and routes.

At that point in time CMRPC had developed all 29 WRTA routes in ESRI's PC ArcView GIS program, one task required for the project. However, obtaining data on bus stops proved to be much more difficult. The WRTA bus route system has 29 routes covering the city of Worcester and 13 surrounding communities. The system is made up of approximately 1,500 bus stops in the city of Worcester, which are, for the most part, designated by a bus stop sign. Riders in the city are required to board or alight at a designated stop. Conversely, in the surrounding towns, the system changes to a flag system where riders wave down the bus in order to board and alight at a 'stop' deemed safe by the driver.

The WRTA and CMRPC had completed a joint survey of the city of Worcester bus stops in 1997 which included the following information: bus stop number and unique ID, the street and the side of street the stop is on, approximate location, direction, nearest intersection, if stop is signed, mileage from last stop, landmarks, and other information related to the environment of the stop (sidewalks, lighting, etc). The survey was completed to evaluate the disabled community’s degree of accessibility to the WRTA system and the data captured reflected the needs of the project. Unfortunately, the data collection process did not include the use of the most advanced technologies available at the time, including Global Positioning Systems (GPS) to obtain geographic coordinates or relational databases to store the information. As a result, the survey, which took approximately one year to complete, did not add to the overall data development process. It was quickly realized that what was required was a GPS survey of each stop in the city of Worcester in order to obtain the accurate latitude/longitude of each stop. Fortunately, CMRPC had recently purchased a GPS unit and staff members familiar with the technology were able to assist in training new staff.

GPS Data Collection

Two CMRPC staff members most familiar with the WRTA bus system undertook the GPS data collection task. Before the GPS data collection, these two individuals had conducted ridership surveys and passenger counts for the WRTA, which required them to ride each bus route numerous times and in turn gave them an intimate knowledge of each bus stop in the city of Worcester. This level of knowledge is by far the most important ingredient in the Trip Planner data development process. If no one with this knowledge exists, or their time cannot be tapped to complete the project, CMRPC strongly recommends you do not continue with the project. If you feel you can move forward without such a person, be prepared to learn every minuscule detail of your transit system and be sure you can double and triple check your information with someone more familiar with the system.

Next, staff worked with one of the staff members with this level of knowledge, a planning technician, and taught him the guidelines/steps for the most efficient data collection. This individual became responsible for all aspects of the GPS data collection including: creating data collection schedules, checking satellite positions and optimal data collection times, creating and uploading data dictionaries, locating all stops (of which approximately 30% - or 450 - were not signed at the time of the survey), capturing all stops, keeping track of all points collected and those not collected due to environmental interference, uploading data to computer, differentially correcting data, converting/exporting data sets to ESRI shapefiles and lastly viewing data on ArcView for quality control. Data collection began in the city of Worcester in the summer of 1998.

While initially CMRPC did not believe stops would have to be collected in the surrounding towns, the need for this information became essential due to a change in routing techniques from DesLauriers. CMRPC had assumed the application was going to be route (line) driven, meaning the bus routes would be the basis for trip routings. However, once programming work began, DesLauriers realized their programming package, ESRI's MapObjects, could only support node (point or bus stop) driven applications and not route driven applications. Therefore, the collection of bus stop data in flag areas, or areas where there are technically no bus stops, became essential. While it may seem impossible, the identification of bus stops where no stops exist can be done in a number of ways. First, CMRPC chose to GPS the 'timed' stops used in the schedules. These are locations from which the bus departs at a time designated on the schedule (either at the exact scheduled time or after) in order to ensure the bus is running on time. Secondly, the passenger count information collected by CMRPC was used to select certain 'high loading' locations, or stops that are used by many people to board the bus. Another identification technique, not used by CMRPC, could be to GPS all route intersections as points, a practice more likely to be used by smaller transit systems with flag systems in less dense areas.
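To illustrate what 'node driven' means in practice, the following is a minimal sketch in which bus stops are graph nodes and consecutive stops on a run are joined by edges weighted with travel time. The stop IDs, travel times and function name are hypothetical, and the sketch uses Dijkstra's algorithm as a stand-in; it is not DesLauriers' MapObjects implementation.

```python
import heapq

def shortest_trip(edges, origin, destination):
    """Dijkstra's algorithm over a stop-to-stop graph.

    edges maps a stop ID to a list of (next_stop, minutes) pairs.
    Returns (total_minutes, [stop, ...]) or None if unreachable.
    """
    queue = [(0, origin, [origin])]
    settled = {}
    while queue:
        minutes, stop, path = heapq.heappop(queue)
        if stop == destination:
            return minutes, path
        if stop in settled and settled[stop] <= minutes:
            continue  # already reached this stop at least as quickly
        settled[stop] = minutes
        for next_stop, leg in edges.get(stop, []):
            heapq.heappush(queue, (minutes + leg, next_stop, path + [next_stop]))
    return None

# Hypothetical stops: A-B-C on one run, B-D-C on another (B is a transfer point).
EDGES = {
    "A": [("B", 4)],
    "B": [("C", 6), ("D", 3)],
    "D": [("C", 1)],
}
result = shortest_trip(EDGES, "A", "C")
```

Because every trip is assembled from stop-to-stop legs, a flag area with no recorded stops contributes no nodes at all, which is why stops had to be captured there too.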

The GPS data collection system set up by the planning technician worked well for the data collection in the city of Worcester and the surrounding towns, but it took approximately two summers (roughly 3½ months full time) to collect all 2,000 bus stops. Once this is completed, however, the GPS data collection task does not end; it must continue in order to capture any new bus stops created by routing changes, which can occur as often as twice a month. To ensure the accuracy of the application, it is important to be kept informed of all routing and bus stop changes and to update the system as needed. In CMRPC's case this continues to be a problem, since the City Council controls bus stop location approval and removal, and communication remains the main stumbling block.

Obtaining Base Data

While the majority of the work to date revolved around collecting WRTA bus route system information, ongoing activities included contacting street data vendors and database development. An accurate street data set used for geocoding (or address matching) is absolutely essential to the success of the project once it is up and running. However, finding accurate data at a reasonable cost proved difficult. The State of Massachusetts (and most likely every other state) creates road inventory files (RIF) for every town in the Commonwealth, and these files are available free of charge. Unfortunately, they do not contain the information required to geocode, namely the address range and zip code for each street segment. However, many private companies sell relatively up-to-date street data sets with this important geocoding information. As it turns out, the State of Massachusetts already had a reciprocal agreement with a street data vendor, Geographic Data Technologies (GDT), located in Lebanon, NH. This agreement entitled all state planning commissions to use GDT’s 'Dynamaps' street data for geocoding on the condition that the planning commissions send regular street updates to GDT. Unfortunately, most planning commissions (including CMRPC) did not send updates and the data, in turn, was not highly accurate.

In an attempt to obtain accurate information, CMRPC contacted the appropriate officials in each of the towns with fixed route service in order to obtain a list of streets within ¾ mile of the fixed route system along with address ranges for each street segment. While CMRPC did obtain excellent information for the city of Worcester through the City Manager's Office of Planning and Community Development (OPCD), obtaining similar information from the surrounding towns, which have little or no GIS capability, proved to be an impossible task. Nonetheless, CMRPC decided to use GDT's Dynamaps as the geocoding data layer for the Trip Planner application. This was due in part to the fact that CMRPC felt most of the users of the application would live and work in the city of Worcester, which now had highly accurate information, thanks to OPCD.
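The reason address ranges by street segment matter for geocoding can be shown with a short sketch of linear interpolation along a segment. The segment record, its field names and its coordinates are hypothetical illustrations and do not reflect the actual Dynamaps format.

```python
def geocode_on_segment(house_number, segment):
    """Estimate a point along a street segment by linear interpolation.

    segment holds the low/high house numbers for the even and odd sides
    and the coordinates of the segment's two end points.
    """
    side = "even" if house_number % 2 == 0 else "odd"
    low, high = segment[side]
    if not low <= house_number <= high:
        return None  # number falls outside this segment's range
    fraction = 0.5 if high == low else (house_number - low) / (high - low)
    (x1, y1), (x2, y2) = segment["start"], segment["end"]
    return (x1 + fraction * (x2 - x1), y1 + fraction * (y2 - y1))

# Hypothetical segment: made-up street, ranges and planar coordinates.
SEGMENT = {
    "street": "MAIN ST", "zip": "01609",
    "even": (100, 200), "odd": (101, 199),
    "start": (0.0, 0.0), "end": (100.0, 0.0),
}
point = geocode_on_segment(150, SEGMENT)
```

Without the low and high numbers for each side of the segment, as in the road inventory files described above, there is nothing to interpolate against and an address cannot be placed.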

Access Database Development

Once the bus stop data was collected and the base data defined, the next step was to create and populate a database. DesLauriers developed a relational database model in Microsoft Access for CMRPC to use as a guide, which was eventually changed/upgraded by CMRPC to streamline the database population process. Microsoft Access was chosen because a relational database is well suited to 'relating' the complex, interdependent data sets that a trip planning application requires. CMRPC's application comprises six separate tables, all 'related' to one another through shared fields (or columns). The information required to route individuals is contained within these tables and includes the following: unique bus stop identifiers, route names and aliases, the starting and ending points of each run, the total number of stops in a route, the sequence of the routing, time between stops, timed stops, transfer points, fare zones and the latitude/longitude of each bus stop.
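As a rough illustration of tables 'related' through shared fields, the sketch below builds three small tables and joins them to list a run's stops in sequence. It uses SQLite from the Python standard library rather than Microsoft Access, and the table names, column names and sample rows are assumptions; the actual CMRPC schema is not published.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL);
CREATE TABLE routes (route_id INTEGER PRIMARY KEY, name TEXT, total_stops INTEGER);
CREATE TABLE route_stops (
    route_id INTEGER REFERENCES routes(route_id),
    seq INTEGER,                 -- position of the stop within the run
    stop_id INTEGER REFERENCES stops(stop_id),
    minutes_from_prev INTEGER,   -- time between stops
    is_timed INTEGER,            -- 1 if this is a timed stop
    PRIMARY KEY (route_id, seq)
);
""")
# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO stops VALUES (?, ?, ?, ?)", [
    (1, "Stop A", 42.26, -71.80),
    (2, "Stop B", 42.27, -71.81),
])
conn.execute("INSERT INTO routes VALUES (10, 'Route X outbound', 2)")
conn.executemany("INSERT INTO route_stops VALUES (?, ?, ?, ?, ?)", [
    (10, 1, 1, 0, 1),
    (10, 2, 2, 5, 0),
])
# The shared stop_id and route_id fields relate the three tables.
rows = conn.execute("""
    SELECT rs.seq, s.name, rs.minutes_from_prev
    FROM route_stops rs JOIN stops s ON s.stop_id = rs.stop_id
    WHERE rs.route_id = ? ORDER BY rs.seq
""", (10,)).fetchall()
```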

Populating the databases proved to be a time and labor intensive project, even though it was assigned to the one key person with extensive/intimate knowledge of the transit system, the planning technician responsible for GPS data collection. Even though the WRTA’s 29-route transit system is small in comparison to other systems, it is full of deviations, interlined routes and special runs that must be accounted for in the database. The first table populated was a 'simple' table in which each route was given a unique name and the total number of stops associated with each run was recorded (a route is split into inbound and outbound runs, so each route contains two runs). As stated, the WRTA system is made up of 29 routes. However, if a route has a deviation, or makes a scheduled trip to a stop not on the normal route, the total number of stops for the route increases and the pattern must therefore be treated as a different, separate route. As a result, the total number of routes increased from 29 to 136, making the data population task very lengthy and complicated.
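The growth from 29 routes to 136 follows from treating every distinct stop sequence as its own route. A minimal sketch of that bookkeeping is shown below; the route number, stop IDs and naming scheme are hypothetical and are not CMRPC's actual identifiers.

```python
def name_variants(runs):
    """Assign a distinct name to each distinct stop sequence of a base route.

    runs is a list of (base_route, direction, stop_sequence) tuples.
    Returns {variant_name: (stop_sequence, total_stops)}.
    """
    variants = {}
    counters = {}
    seen = set()
    for base, direction, stops in runs:
        key = (base, direction, tuple(stops))
        if key in seen:
            continue  # same stop pattern as an earlier run
        seen.add(key)
        n = counters.get((base, direction), 0) + 1
        counters[(base, direction)] = n
        variants[f"{base}-{direction}-{n}"] = (tuple(stops), len(stops))
    return variants

RUNS = [
    ("19", "IN", ["s1", "s2", "s3"]),
    ("19", "IN", ["s1", "s2", "s3"]),        # same pattern, not a new variant
    ("19", "IN", ["s1", "s2", "s9", "s3"]),  # deviation to stop s9
    ("19", "OUT", ["s3", "s2", "s1"]),
]
VARIANTS = name_variants(RUNS)
```

Here one published route yields three separately named records, which is the same effect that multiplied the WRTA route count.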

In total, staff populated 6 separate Access tables containing over 10,000 records required to run their Trip Planner application. CMRPC did not include the Access database at this time due to confidentiality issues. However, if you would like to discuss the database development requirements for a Trip Planning application, please contact CMRPC.

It should be noted that the output of this Trip Planner data collection/database development process is a highly accurate picture of the entire WRTA fixed route system. The database created for the Trip Planner application does not have to be used solely for this project, but can and should be used for numerous other efforts. Much of the data collection/database development work completed by CMRPC could have been done prior to the introduction of the Trip Planner project, and doing so would have decreased the cost and time frames dramatically. In retrospect, when CMRPC/WRTA embarked on the initial bus stop survey in 1997 and recognized the data intensive nature of the project, they should have asked two questions: what other projects could benefit from this data, and what other types of information should be collected while in the field?

Web Development

Envisioning your final product is the key to defining the functionality required in the application. From the beginning staff had envisioned a product easily used by many diverse populations ranging from first time bus riders with minimal English skills to commuters or tourists wanting to avoid traffic to WRTA call takers answering routing questions for riders. Staff articulated these needs to DesLauriers, but were still unsure as to what functions they could or would develop within the cost constraints. The ideas that arose from discussions on functionality and design were enlightening to both CMRPC and DesLauriers, since each agency brought to the table unique thoughts and ideas for the final product. What resulted from these meetings was a product designed to meet CMRPC's needs and programming that allowed the process to work as efficiently and quickly as possible. This was accomplished due to the fact that DesLauriers designed the product from scratch and incorporated the needs of CMRPC into the design, rather than having the programming dictate the functionality.

Once CMRPC identified the needs of the application, the final phase was to design a user-friendly web site for the product. Staff members did have experience in web page design and created an easy to use frame-based site. The page was written in easy to read language, with an instruction page link for new users. The application requires users to enter an origin and destination address and city, along with the arrival or departure time and the travel day (weekday, Saturday or Sunday).

While this may seem straightforward, the work leading up to the application was not. Each database had to be designed and populated to incorporate all relevant information, and new databases were created in order to simplify the web site route finding process for the user. These included a database linking all zip codes to town names, so that the user only needed to know the origin/destination town name rather than the town zip code, which could be a problem for visitors or those unfamiliar with the area. They also included a list of 500 landmarks (addresses of common trip destinations) for users to choose from, in case an individual knew they needed to go to the Worcester Department of Transitional Assistance but did not know the address.
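The two helper lookups described above can be sketched as simple dictionaries consulted before geocoding. The landmark name, its address and the function name are invented placeholders, and only one sample zip code is shown.

```python
# Hypothetical sample rows; the real tables held every zip code in the
# service area and about 500 landmarks.
TOWN_ZIPS = {"WORCESTER": ["01609"]}
LANDMARKS = {"EXAMPLE LIBRARY": ("1 EXAMPLE ST", "WORCESTER")}

def resolve_place(text, town):
    """Return (street_address, town, zip_codes) for a landmark name or a typed address."""
    street = text.strip().upper()
    town_key = town.strip().upper()
    if street in LANDMARKS:
        # A landmark name stands in for its stored address and town.
        street, town_key = LANDMARKS[street]
    zips = TOWN_ZIPS.get(town_key)
    if zips is None:
        raise KeyError(f"unknown town: {town}")
    return street, town_key, zips

by_landmark = resolve_place("example library", "Worcester")
by_address = resolve_place("35 Harvard Street", "worcester")
```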

Finally, while CMRPC was responsible for the design of the web site, DesLauriers was responsible for the design of the route finder results page. CMRPC did give DesLauriers an ArcView project depicting specifics about map colors, labels and caveats to be included on the results page, but had no control over the placement of the map or trip instructions. DesLauriers, however, has been very flexible and will customize the results page to suit any transit authority's needs.

CMRPC’s Internet Transit Trip Planner is currently in beta-testing mode and will be open to the public in October of 2001 at www.therta.com.

On-Going Considerations

It must be recognized that the development of a final product and a write-up of the procedures does not close the files on the Trip Planner application. Many unknowns still exist that will only be illuminated once the system has been up and running. Continued concerns include: finding a low-cost ISP with Internet Map Server capabilities, system maintenance and upgrading, site ownership and continued funding. These issues will be documented as they evolve and a follow-up report will be forwarded to the FTA.

Conclusions

One of the best outcomes of this demonstration project was the set of questions it raised that could not have been anticipated. In-depth analysis of the WRTA's system shed light on the numerous complexities of fixed route bus service, even in a smaller system. CMRPC learned throughout this process that, while the Trip Planner was created in order to make the system 'easier' for riders, it would not completely demystify bus service for every rider in every situation. It must be the responsibility of each rider to learn all of the ‘quirks’ of their bus system since many aspects of transit design and operation do not lend themselves easily to computerization. Along those same lines, it is hoped that CMRPC transportation planners and WRTA transportation operation managers will recognize the value of the Trip Planner database and use the enormous amount of highly accurate data in other transit authority projects beyond this application.

At the beginning of this write-up two questions were asked regarding the Trip Planner application: 1) is $20,000 enough to develop a trip planning application for a small/medium size transit authority, and 2) if so, what are the replication possibilities nationwide? CMRPC has shown that only the programming can be done within the $20,000 cost constraint, and then only if developers are willing to donate freely of their time, labor and expertise. The other necessary work (data collection, data development and on-going maintenance) requires funding well beyond this amount. If an agency can secure additional funding and staff to complete the needed tasks, DesLauriers has created a low-cost prototype that can be applied to any small/medium sized transit system in the country.

Please note that the complete Internet Transit Trip Planner write-up is available by contacting: CMRPC, 35 Harvard Street, 2nd Floor, Worcester, MA 01609. Phone (508) 756-7717 x 32, Fax (508) 792-6818, or email: sandra.j.johnson@usa.net

Bio - Sandra Johnson is with CMRPC in Worcester, Massachusetts.

Address Compendium Authors

The authors are listed in alphabetical order by last name. In some cases current addresses were not available, and this is noted in the listing.

Ilir Bejleri, Ph.D. Assistant Professor Department of Urban and Regional Planning University of Florida 431 Architecture Building Gainesville, FL 32611 Phone (352) 392-0997 Ext. 432 Fax (352) 392-3308 [email protected]

Marc E Berryman Greater Harris County E9-1-1 Network Houston, Texas [email protected]

Chris Boyd Planning Technician-Addressing, Douglas County Community Development Department Douglas County Government 100 Third Street Castle Rock, CO 80104 [email protected]

Greg DiGiorgio Database Analyst City of Newport News 2400 Washington Avenue Newport News, VA 23606 Phone 757-926-3744 [email protected]

Sandra Johnson At the time of original authorship CMRPC 35 Harvard Street, 2nd Floor Worcester, MA 01609

Morten Lind [email protected] National Survey and Cadastre, Copenhagen, Denmark

Jay S. Meehl At the time of original authorship at Douglas County Colorado

Scott Oppmann GIS Utility Supervisor Oakland County Information Technology 1200 N Telegraph Road #49 Pontiac, MI 48341 Phone (248) 452-9198 Fax (248) 452-9128 [email protected]

Dan Parr 1005 Houston Ave Takoma Park, MD 20912 Phone (301) 585-8031 Fax (301) 585-3809 [email protected]

Andrew Pidgeon Director of Sales and Marketing Des Lauriers Municipal Solutions, Inc 40 Kenwood Circle, Suite 8 Franklin, MA 02038 Phone (508) 520-0502 Fax (508) 528-4011 [email protected]

Amy J. Purves, P.E. PlanGraphics Associate 11612 Michale Ct Silver Spring, MD 20904 Phone (301)-346-0130 [email protected]

Richard Schneider, Ph.D. Associate Professor Department of Urban and Regional Planning University of Florida 431 Architecture Building Gainesville, FL 32611 Phone (352) 392-0997 ext. 430 Fax (352) 392-3308 [email protected]

Ruth Steiner, Ph.D. Associate Professor Department of Urban and Regional Planning University of Florida 431 Architecture Building Gainesville, FL 32611 Phone (352) 392-0997 Ext. 431 Fax (352) 392-3308 [email protected]

Nancy von Meyer Fairview Industries PO Box 100 - 233 East Main Street Pendleton SC 29670 Phone (864) 646-2755 Fax (864)-646-2712 [email protected]

Michael Walls Executive Consultant PlanGraphics, Inc. 112 East Main Street Frankfort, KY 40601 Phone (502) 223-1501 Fax (502) 223-1235 [email protected]

Louise B. Wennberg Senior Research Analyst Chester County GIS Department West Chester, PA 19380 Phone (610) 344-5215 Fax (610) 344-5211 [email protected]

Scott Wright Graduate Research Assistant Department of Urban and Regional Planning University of Florida 431 Architecture Building Gainesville, FL 32611 Phone (352) 392-0997 Ext. 431 Fax (352) 392-3308 [email protected]

APPENDIX A ADDRESS STANDARD REVIEW - OVERVIEW Nancy von Meyer Dan Parr

The following document reviews site and mailing address components of selected existing and proposed standards in the United States. The standards that were selected for review were those thought to be representative of the types of standards that might be encountered by most state and local governments.

This review covers the standards of the United States Postal Service (USPS), the Federal Geographic Data Committee (FGDC), the National Emergency Number Association (NENA), the State of Kansas Address Standards and the Environmental Protection Agency (EPA).

The review process combined elements from all of these standards into a generic model of address. The following diagram illustrates the combined address components. The diagram is followed by definitions for the components from the various standards.

[Diagram: combined address components. Entities and their attributes as shown in the figure:]

Addressee (Customer/Client/Agency/Firm): AddresseeID, Name prefix, First name, Middle initial, Middle name, Last name, Name qualifier, Educational achievements, Firm/Agency, Department/Division

Alternate-Name: AddresseeID, AlternateAddresseeID, AlternateType

Address: AddressID, Prefix 1, Prefix 2, Street number, Fractional street number, Street name, Street Type, Suffix 1, Suffix 2, Unit 1 Type, Unit 1 Number, Unit 2 Type, Unit 2 Number, Rural route description, Rural route number, Rural route box number, Municipality Mailing, State abbreviation, ZIP Code, ZIP+4 Code, International State, Country, International postal code, GIS Concatenated Address, Date Created, Date Updated, Data Source, Date of the Data Source

Alternate-Address: AddressID, AliasAddressID, AliasType, Alias Flag?

Contact-Address: AddresseeID, AddressID, RoleID

Parcel-Address: AddressID, ParcelID, ParcelAddressType

Geographic: GeographicID, AddressID, X, Y, Z, Latitude degree, Latitude minute, Latitude second, Longitude degree, Longitude minute, Longitude second, UTM Zone, Northing, Easting

Street-Segment: SegmentID, StreetName, LowEven, HighEven, LowOdd, HighOdd, LowEvenActual, HighEvenActual, LowOddActual, HighOddActual, Odd/Even/Both?

LUStreet Name: Street Name, Alternate Street Name

LUStreetType: StreetTypeID, StreetTypeAbbrev, StreetType

LUUnitType: UnitTypeID, UnitTypeAbbrev, UnitType

LURole: RoleID, Role, Role Description

LUMuni (CVT): MuniID, CVT Code, CVT Name

LUState: State Abbreviation, State Name, State FIPS Code

Addressee – This table identifies the person or organization for whom an address has been collected. Not all people or organizations will have an address.

Addressee (Client/Agency/Customer/Business)

AddresseeID (Number): Primary Key for the addressee
Name prefix (Character): Title preceding the name of an individual. Examples: Judge, Mr., Mrs., Ms., Miss, Colonel
First name (Character): Given name or nickname of an individual
Middle initial (Character): First letter of the second (or more) names of an individual
Middle name (Character): Second (or more) names given an individual preceding the individual’s last name
Last name (Character): Surname (i.e. family name) of the individual
Name qualifier (Character): Qualifier indicating a person has the same name as another family member. Examples: Junior [Jr.], III
Educational achievements (Character): One or more advanced degrees that may be important to an establishment (e.g., an educational institution). Examples: Ph.D., EdD, JD, MD
Business/Agency (Character): The name of the firm or agency
Division/Department (Character): The department or division or other subdivision of an agency or business to identify the specific entity

Alternate Name – This table tracks relationships among the name listings in the Addressee table

Alternate Name

AddresseeID (Number): Foreign key pointing to a record in the Addressee table
AlternateAddresseeID (Number): A pointer to another record in the Addressee table
AlternateType (Character): The relationship between the two listed addressees

Address - This is the primary address table and is a mailing address.

AddressID: Primary key for the table.
Prefix 1: This is normally a direction that precedes a site address. It will be standardized from a look up table. This is called a predirectional field in the postal standards and the EPA standards.
Prefix 2: This is a secondary prefix for the address. This is only filled if the prefix 1 is not a directional prefix or there are two prefixes. This is not the second coordinate.
Street Number: The number assigned to a building or a land parcel along the street to identify location and to ensure accurate mail delivery. This is the number of the site. For the coordinate addresses it is NxxxWxxx, that is, include both numbers with no space.
Fractional street number: A sub-number to a street number.
Street name: Official name of a street assigned by a local governing authority. This will also form the basis for the street naming list that will be provided to all municipalities for use as they approve new street names.
Street Type: The trailing designator in a street address. Called suffix in the FGDC standard. This is the standard street type abbreviation. These are defined in Appendix C of the postal standards. The post office provides a standard domain of values for these.
Suffix 1: Postdirectional field in the FGDC standard. The directional symbol that represents the sector of a city where a street address is located. This is a post directional field in the postal standards. These should be standardized to be the cardinal directions as with the prefixes.
Suffix 2: This is a post directional field in the postal standards. The second suffix is only filled in if the first suffix is filled in.
Unit 1 Type: This is the unit type, such as apartment, suite or office. The standards for these are listed in Appendix C of the postal standards. Called Secondary address identifier in FGDC.
Unit 1 Number: This is the number of the first unit and can contain letters or numbers.
Unit 2 Type: If Unit 1 location is already identified and there is a second unit like Office 1 Suite 32, then Office 1 is in the first field and the Suite is in the second field. Another example is in condominiums like building 1 unit 5.
Unit 2 Number: This is a number and can contain numbers or letters.
Rural route description: Type of rural route; route, rural route, highway contract route, star route, or PSC.
Rural route number: Number assigned to the rural route.
Rural route box number: Number of a box along the rural route.
Municipality Mailing: A finer partitioning of geographic subdivisions of a state or county, usually associated with additional levels of government. Note that the taxing municipality for a parcel or structure may be different than the mailing address. The tax files will contain the taxing municipality.
State abbreviation: Two-character abbreviation for the name of a state, U.S. Territory, or Armed Forces ZIP Code Designation (“AA”, “AE”, or “AP”).
ZIP Code: A five-digit code that identifies a specific geographic delivery area. ZIP Codes can represent an area within a state, an area that crosses state boundaries (unusual condition) or a single building or company that has a very high mail volume. “ZIP” is an acronym for Zone Improvement Plan.
ZIP+4 Code: ZIP equals the five-digit ZIP code (refer to ZIP Code); +4 describes the last four positions of a ZIP+4 code. Most delivery addresses are assigned a single ZIP+4 Code. However, large companies may be given a range of ZIP+4 Codes that can be used to route mail to a specific department.
International State: The first division of an international country, i.e. the state equivalent in other countries.
Country: The largest of the geo-political boundaries that define address areas of the world.
International postal code: The postal code used for final sorting by local or regional delivery unit. Different countries have their own coding systems and formats for this code.
GIS Concatenated Address: The address components concatenated to match the format requirements for address geocoding. This element will be computed from the component parts.
Date Created: The date the record was created.
Date Updated: The date the record was updated.
Data Source (File or source name): The file or source for the original record.
Date of the Data Source: The date of the file or source for the original record.
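Computing the GIS Concatenated Address from the component parts might look like the sketch below. The dictionary keys are hypothetical renderings of the field names above, and the component order (number, prefix, name, type, suffix, unit) is an assumption based on common U.S. address layout, not something the reviewed standards prescribe here.

```python
# Assumed output order of the street-level components.
COMPONENT_ORDER = [
    "street_number", "fractional_street_number",
    "prefix_1", "prefix_2",
    "street_name", "street_type",
    "suffix_1", "suffix_2",
    "unit_1_type", "unit_1_number",
    "unit_2_type", "unit_2_number",
]

def concatenate_address(record):
    """Join the non-empty components with single spaces, in upper case."""
    parts = [str(record[key]).strip() for key in COMPONENT_ORDER if record.get(key)]
    return " ".join(parts).upper()

# Made-up example record; empty components are skipped.
RECORD = {
    "street_number": 100, "prefix_1": "N", "street_name": "Main",
    "street_type": "St", "suffix_1": "", "unit_1_type": "Apt",
    "unit_1_number": "5",
}
concatenated = concatenate_address(RECORD)
```

Storing the computed string alongside the parts lets a geocoder match on one field while the individual components remain available for validation.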

Contact-Address – This table matches people to an address

Contact-Address

AddresseeID (Number): Foreign key pointing to the addressee
AddressID (Number): Foreign key pointing to the address entry
RoleID (Number): Value from a look up table indicating the role the contact plays at that address

Role – The role the contact, person or firm or agency, has with the address. This is a look up table.

LURole

RoleID (Number): Primary Key for the table
Role (Character): The name of the role, such as resident, owner, contractor, manager, primary contact, other, unknown, etc.
Role Description (Character): A more complete text description of the role for help and information
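How the Contact-Address table ties an addressee to an address through a role can be sketched with SQLite from the Python standard library. The table and column names follow the definitions above, but the column subset, the data types and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Addressee (AddresseeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, GISConcatenatedAddress TEXT);
CREATE TABLE LURole (RoleID INTEGER PRIMARY KEY, Role TEXT, RoleDescription TEXT);
CREATE TABLE ContactAddress (
    AddresseeID INTEGER REFERENCES Addressee(AddresseeID),
    AddressID INTEGER REFERENCES Address(AddressID),
    RoleID INTEGER REFERENCES LURole(RoleID)
);
-- Made-up sample rows: one addressee holding two roles at one address.
INSERT INTO Addressee VALUES (1, 'Doe');
INSERT INTO Address VALUES (7, '100 N MAIN ST');
INSERT INTO LURole VALUES (1, 'owner', 'Holds title to the property');
INSERT INTO LURole VALUES (2, 'resident', 'Lives at the address');
INSERT INTO ContactAddress VALUES (1, 7, 1);
INSERT INTO ContactAddress VALUES (1, 7, 2);
""")
roles = [row[0] for row in conn.execute("""
    SELECT r.Role FROM ContactAddress ca
    JOIN LURole r ON r.RoleID = ca.RoleID
    WHERE ca.AddresseeID = 1 AND ca.AddressID = 7
    ORDER BY r.RoleID
""")]
```

Keeping the role in a look up table means the same person can be recorded as both owner and resident of one address without duplicating either the addressee or the address record.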

Geographic Address – The coordinate location of the address. Note that the horizontal and vertical datum are expressed in the metadata for this file.

Geographic Address

GeoAddressID (Number): Primary key for the table
AddressID (Number): Foreign Key pointing to the address
X (Number): State Plane Coordinate Value for easting
Y (Number): State Plane Coordinate Value for northing
Z (Number): Elevation in Feet referenced to the County’s vertical datum
Latitude degree (Number): First unit of measure; 0-360 degrees domain – FGDC Standard
Latitude minute (Number): Second unit of measure; 60 minutes = 1 degree – FGDC Standard
Latitude Second (Number): Third unit of measure; 60 seconds = 1 minute – FGDC Standard
Longitude degree (Number): First unit of measure; 0-360 degrees domain – FGDC Standard
Longitude minute (Number): Second unit of measure; 60 minutes = 1 degree – FGDC Standard
Longitude second (Number): Third unit of measure; 60 seconds = 1 minute – FGDC Standard
UTM Zone (Number): Segment of a grid dividing the Earth - required for NIMA
Northing (Number): Distance in meters from the Equator - required for NIMA
Easting (Number): Distance in meters from the Prime Meridian - required for NIMA
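Because the table stores degrees, minutes and seconds in separate fields, most GIS software will need them combined into decimal degrees. The sketch below applies the relationships listed above (60 minutes = 1 degree, 60 seconds = 1 minute); the function name and the `negative` hemisphere flag are assumptions, since the table itself defines no hemisphere field.

```python
def dms_to_decimal(degrees, minutes, seconds, negative=False):
    """Convert degrees/minutes/seconds to decimal degrees.

    negative=True marks a southern latitude or western longitude
    (an assumed convention, not part of the table definition).
    """
    if not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if negative else value

latitude = dms_to_decimal(42, 15, 0)
longitude = dms_to_decimal(71, 30, 0, negative=True)
```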

Reference Information – FGDC Proposed Address Standard

Address type = MAILING
Descriptive Element Name | Source(4) | Definition
Addressee name/
Name prefix | EPA(5) | Title preceding the name of an individual. Examples: Judge, Mr., Mrs., Ms., Miss, Colonel
First name | EPA | Given name or nickname of an individual
Middle initial | EPA | First letter of the second (or more) names of an individual
Middle name | None | Second (or more) names given an individual preceding the individual’s last name
Last name | EPA | Surname (i.e. family name) of the individual
Name qualifier | EPA | Qualifier indicating a person has the same name as another family member. Examples: Junior [Jr.], III
Educational achievements | EPA | One or more advanced degrees that may be important to an establishment (e.g., an educational institution). Examples: Ph.D., EdD, JD, MD
Business/Agency | EPA | A finer partitioning of geographic subdivisions of a state or county, usually associated with additional levels of government.
Division/Department | EPA | The primary administrative subdivision of a state in the United States
County FIPS code | Census | A three-digit code assigned by the National Institute of Standards and Technology (NIST) to identify each county and statistically equivalent entity within a State. The NIST assigns the codes based on the alphabetic sequence of county names; it documents these codes in a Federal Information Processing Standard (FIPS) publication (FIPS PUB 6).
Country | EPA | The largest of the geo-political boundaries that define address areas of the world.
International postal code | EPA | The postal code used for final sorting by local or regional delivery unit. Different countries have their own coding systems and formats for this code.
Rural route description | Census | Type of rural route; route, rural route, highway contract route, star route, or PSC.
Rural route number | | Number assigned to the rural route
Rural route box number | | Number of a box along the rural route

4 The source indicates the agency documentation used for the listed definition.
5 EPA definitions are from the Data Standard for Representation of Address Information, SDC-0055-057-LF-5038

State name | Census | A type of governmental unit that is the primary legal subdivision of the United States.
State abbreviation | USPS | Two-character abbreviation for the name of a state, U.S. Territory, or Armed Forces ZIP Code Designation (“AA”, “AE”, or “AP”).
State FIPS code | Census | A two-digit FIPS code assigned by the NIST to identify each State and statistically equivalent entity. The NIST assigns the codes based on the alphabetic sequence of state names (Puerto Rico and the Outlying Areas appear at the end); it documents these codes in a FIPS publication (FIPS PUB 5).
Street/
Street number | EPA | The number assigned to a building or a land parcel along the street to identify location and to ensure accurate mail delivery
Fractional street number | USGS | A sub-number to a street number
Predirectional | EPA | The street vector, or direction the street has taken from some arbitrary starting point.
Street name | Census | Official name of a street assigned by a local governing authority.
Suffix | USPS | The trailing designator in a street address
Postdirectional | EPA | The directional symbol that represents the sector of a city where a street address is located
Secondary address identifier | EPA | The room, suite, apartment, unit, or building designator and number that are used by the postal service for mail delivery and for assigning the ZIP+4 postal code.
Secondary address range | USPS | A geographic direction which follows the Street Name
ZIP Code | USPS | A five-digit code that identifies a specific geographic delivery area. ZIP Codes can represent an area within a state, an area that crosses state boundaries (unusual condition) or a single building or company that has a very high mail volume. “ZIP” is an acronym for Zone Improvement Plan.
ZIP+4 Code | USPS | ZIP equals the five-digit ZIP Code (refer to ZIP Code); +4 describes the last four positions of a ZIP+4 Code. Most delivery addresses are assigned a single ZIP+4 Code. However, large companies may be given a range of ZIP+4 Codes that can be used to route mail to a specific department.
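The ZIP Code and ZIP+4 Code definitions above imply a simple structure: five digits, optionally followed by four more. The sketch below separates the two parts so they can be stored in distinct fields. The function name is hypothetical, and accepting the nine digits with or without a hyphen is an assumption about how source files may record the code.

```python
import re

# Five digits, then an optional four-digit extension with or without a hyphen.
_ZIP_PATTERN = re.compile(r"^(\d{5})(?:-?(\d{4}))?$")


def split_zip(value):
    """Return (zip5, plus4); plus4 is None when no extension is present."""
    match = _ZIP_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a ZIP or ZIP+4 code: {value!r}")
    return match.group(1), match.group(2)


print(split_zip("66044-1234"))  # prints ('66044', '1234')
```

The parts are kept as text rather than numbers so that leading zeros, which are common in northeastern ZIP Codes, are not lost.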

Address type = GEOGRAPHIC

Descriptive Element Name | Source | Definition
Latitude degree | | First unit of measure; 0-360 degrees domain
Latitude minute | | Second unit of measure; 60 minutes = 1 degree
Latitude second | | Third unit of measure; 60 seconds = 1 minute
Longitude degree | | First unit of measure; 0-360 degrees domain
Longitude minute | | Second unit of measure; 60 minutes = 1 degree
Longitude second | | Third unit of measure; 60 seconds = 1 minute
UTM/
Zone | NIMA | Segment of a grid dividing the Earth
Northing | NIMA | Distance in meters from the Equator
Easting | NIMA | Distance in meters from the Prime Meridian

Address type = PHYSICAL

Descriptive Element Name | Source | Definition
Reference item | Census | Permanent object used to find the location of an address
From distance | Census | Distance from the reference item to the address location
From direction | Census | Direction of the address location from the reference item

Reference Information – NENA Address Standard

This document sets forth National Emergency Number Association (NENA) standard formats for Automatic Location Identification (ALI) data exchange between Service Providers and Data Base Management System Providers. Movement of ALI data between Service Providers and/or Data Base Management System Providers is a necessary and common activity for the activation of E9-1-1 systems. Means of moving such data are varied and many. This document contains data exchange formats and data protocols recommended for creation and transporting of 9-1-1 data.

This recommendation advocates one of two common protocols (KERMIT and NDM) for the near term, with a move toward a single common protocol (TCP/IP) in the future. NENA acknowledges the advantage of one protocol as a goal, but existing systems are already in place, so an evolution plan is needed, and no single protocol can satisfy all applications.

The following table contains elements from the version 3 data exchange standards that may impact address exchange.

Record Type | Indicates start of data record (label only, no data follows). Valid labels: DAT = Data Record sent from the Service Provider to the Data Base Management System Provider; RTN = Data record returned from the Data Base Management System Provider to the Service Provider
Status Indicator | Record status indicator. Valid entries: E = Error; C = Completed; P = Pending; U = Unprocessed (Gateway received but not sent to processing, future date)
Function Code | Type of activity the record is being submitted for. Valid entries: C = Change; D = Delete; I = Insert; U = Unlock; M = Migrate
House Number | House Number.
House Number Suffix | House number extension (e.g. 1/2).
Prefix Directional | Leading street direction prefix. Valid entries: N S E W NE NW SE SW
Street Name | Valid service address of the Calling Telephone Number.
Street Suffix | Valid street abbreviation, as defined by the U S Postal Service Publication 28 (e.g. AVE).
Post Directional | Trailing street direction suffix. Valid entries: N S E W NE NW SE SW
MSAG Community Name | Valid service community name as identified by the MSAG.
Postal Community Name | Valid service community name as identified by the U S Postal Service.
State | Alpha state abbreviation (e.g., TX)
Location | Additional address information (free formatted) describing the exact location of the Calling Telephone Number (e.g., Apt 718)
Also Rings At Address | Secondary address for the Calling Telephone Number that rings at 2 locations. Not validated against the MSAG. This information may be displayed at the PSAP.
Zip Code | Postal Zip Code
Zip + 4 | Postal Zip Code Extension
X Coordinate | Longitude/X coordinate. Right justified; pad field with zeros to left of decimal degrees. +long: east of Greenwich; -long: west of Greenwich. Sample: +000.000000
Y Coordinate | Latitude/Y coordinate. Right justified; pad field with zeros to left of decimal degrees. +lat: north of equator; -lat: south of equator. Sample: +000.######
Z Coordinate | Altitude indicated as mean sea level, measured in meters. Blank record indicates data not available. Sample: ####
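The X and Y coordinate samples describe a fixed-width layout: an explicit sign, three zero-padded integer digits, a decimal point, and six decimal places. The following sketch shows one way to produce that layout. The function name is hypothetical, and applying the same eleven-character width to both longitude and latitude is an assumption drawn from the samples rather than from the full NENA specification.

```python
def format_nena_coordinate(value):
    """Format decimal degrees as an 11-character field such as +000.000000.

    Assumption: both X (longitude) and Y (latitude) use the same width, with
    a leading sign and zero padding to the left of the decimal point.
    """
    if not -180.0 <= value <= 180.0:
        raise ValueError("coordinate must lie between -180 and 180 degrees")
    # '+' forces the sign, '0' pads with zeros after the sign, 11 is the total width
    return f"{value:+011.6f}"


print(format_nena_coordinate(-97.5))    # prints -097.500000
print(format_nena_coordinate(38.9717))  # prints +038.971700
```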

Reference Information – United States National Grid – FGDC Standard

The objective of this standard is to create a more favorable environment for developing location-based services within the United States and to increase the interoperability of location services appliances with printed map products by establishing a nationally consistent grid reference system as the preferred grid for National Spatial Data Infrastructure (NSDI) applications. This standard defines the U.S. National Grid. The U.S. National Grid is based on universally defined coordinate and grid systems and can, therefore, be easily extended for use world-wide as a universal grid reference system.

This standard defines a preferred U.S. National Grid (USNG) for mapping applications at scales of approximately 1:1,000,000 and larger. It defines how to present Universal Transverse Mercator (UTM) coordinates at various levels of precision. It specifies the use of those coordinates with the grid system defined by the Military Grid Reference System (MGRS). Additionally, it addresses specific presentation issues such as grid spacing. The UTM coordinate representation, the MGRS grid, and the specific grid presentation requirements together define the USNG. This standard is a process standard as defined by the Federal Geographic Data Committee (FGDC) Standards Reference Model. Specifically, it is a presentation process standard.

A point position within the 100,000-meter square shall be given by the UTM grid coordinates in terms of its Easting (E) and Northing (N). For specific requirements or applications, the number of digits will depend on the precision desired in position referencing. In this convention, the reading shall be from left with Easting first, then Northing. An equal number of digits shall always be used for E and N. Examples:

18SUJ20 - Locates a point with a precision of 10 km
18SUJ2306 - Locates a point with a precision of 1 km
18SUJ234064 - Locates a point with a precision of 100 meters
18SUJ23480647 - Locates a point with a precision of 10 meters
18SUJ2348306479 - Locates a point with a precision of 1 meter
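The examples follow a regular pattern: a UTM zone number, a latitude band letter, a two-letter 100,000-meter square identifier, and then an even number of digits split equally between Easting and Northing. With n digits each, the precision is 10^(5-n) meters. The sketch below parses a reference along those lines. The names are hypothetical, and the letter checks are simplified (they exclude only I and O), so this should be read as an illustration of the digit convention rather than a full USNG validator.

```python
import re
from typing import NamedTuple


class UsngReference(NamedTuple):
    zone: int         # UTM zone number, 1-60
    band: str         # latitude band letter
    square: str       # 100,000-meter square identifier
    easting: int      # meters east of the square's west edge
    northing: int     # meters north of the square's south edge
    precision_m: int  # size of the referenced cell in meters


# Simplified pattern: band and square letters only exclude I and O.
_USNG = re.compile(r"^(\d{1,2})([C-HJ-NP-X])([A-HJ-NP-Z]{2})(\d*)$")


def parse_usng(text):
    """Split a USNG reference into its parts and derive its precision."""
    match = _USNG.match(text.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"not a USNG reference: {text!r}")
    zone, band, square, digits = match.groups()
    if not 1 <= int(zone) <= 60:
        raise ValueError("UTM zone must be between 1 and 60")
    if len(digits) % 2 or len(digits) > 10:
        raise ValueError("Easting and Northing need an equal number of digits (0-5 each)")
    n = len(digits) // 2
    scale = 10 ** (5 - n)  # one digit each -> 10 km, five digits each -> 1 m
    easting = int(digits[:n] or "0") * scale   # Easting is read first
    northing = int(digits[n:] or "0") * scale  # then Northing
    return UsngReference(int(zone), band, square, easting, northing, scale)


print(parse_usng("18SUJ234064").precision_m)  # prints 100
```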

Reference Information – EPA Address Standards

Facility Identification Data Standard - Mailing Address

The standard address used to send mail to an individual or organization affiliated with the facility site. Each Mailing Address must be the delivery point for one or more Affiliation(s).

COMPONENTS

Supplemental Address Text - The text that provides additional information to facilitate the delivery of a mail piece, including building name, secondary units, and mail stop or local box numbers not serviced by the U.S. Postal Service. Value Domain: All text that provides additional information to facilitate the delivery of a mail piece, including building name, secondary units, and mail stop or local box numbers not serviced by the U.S. Postal Service. Example: Waterside Mall Mail Code 5204G, Pulaski Bldg Rm 8130

Mailing Address - The exact address where a mail piece is intended to be delivered, including urban-style street address, rural route, and PO Box. Value Domain: All exact addresses where a mail piece is intended to be delivered, including urban-style street address, rural route, and PO box. Example: 200 N Glebe Rd, RR3 Box 2, PO Box 135

Mailing Address Country Name - The name of the country where the addressee is located.

Mailing Address ZIP Code/International Postal Code - The combination of the five-digit Zone Improvement Plan (ZIP) code and the four-digit extension code (if available) that represents the geographic segment that is a subunit of the ZIP code, assigned by the U.S. Postal Service to a geographic location to facilitate mail delivery; or the postal zone specific to the country, other than the U.S., where the mail is delivered. Value Domain: All combinations of the five-digit Zone Improvement Plan (ZIP) code and the four-digit extension code that represents the geographic segment that is a subunit of the ZIP code, assigned by the U.S. Postal Service to a geographic location to facilitate mail delivery; or all the postal zone specific to the country, other than the U.S., where the mail is delivered.

Mailing Address City Name - The name of the city, town, or village where the mail is delivered.

Mailing Address State Name - The name of the state where mail is delivered.

Reference Information – Kansas Address Standards October 29, 1999

This document is a standard for addressing. For the sake of clarity, the term address refers to the simple, everyday element that designates a specific, situs location, such as a home or office. Addresses are very important, but they are not always recorded and maintained in a standard manner. This document provides a set of guidelines by which addresses can be uniformly developed and, thereby, integrated with geospatial data structures. The guidelines should be incorporated into all efforts to establish address databases, for geocoding validation, and for the development of a master address list. The standard may be applied to both attribute databases and geospatial datasets.

An associated address table shall be comprised of the following components:
• Unique identifier
• Address Number
• Directional Prefix
• Street Name
• Street Type
• Directional Suffix
• Unit Type
• Unit Number
• City Name
• State
• 5 Digit Zip Code
• +4 Zip Code
Example: 1235 W 19TH ST APT 24
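The example address can be rebuilt from its components by joining the non-empty street-level parts in the order listed. The sketch below illustrates this; the class and attribute names are hypothetical, and converting the result to upper case is an assumption based on the form of the example.

```python
from dataclasses import dataclass


@dataclass
class AddressComponents:
    # Attribute names are illustrative, not the standard's field names.
    number: str
    street_name: str
    pre_dir: str = ""
    street_type: str = ""
    suf_dir: str = ""
    unit_type: str = ""
    unit_number: str = ""


def concatenate(addr):
    """Join the non-empty components in the order given by the standard."""
    parts = [
        addr.number, addr.pre_dir, addr.street_name, addr.street_type,
        addr.suf_dir, addr.unit_type, addr.unit_number,
    ]
    return " ".join(p.strip().upper() for p in parts if p and p.strip())


example = AddressComponents(
    number="1235", pre_dir="W", street_name="19th", street_type="St",
    unit_type="Apt", unit_number="24",
)
print(concatenate(example))  # prints 1235 W 19TH ST APT 24
```

Storing the components separately and computing the concatenated form on demand avoids the harder problem of parsing a free-text address back into its parts.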

At a minimum, the components shall be formatted as shown below:

Field Name | Length | Type | Description
UNIQ | 20 | Alpha or Numeric | A unique identifier within the associated address table that can be linked to other tables.
NUMBER | 6 | Alpha | Address Number
SUB_NUM | 3 | Alpha | Address Sub-number
PRE_DIR | 2 | Alpha | Directional Prefix
STR_NAM | 30 | Alpha | Street Name
STR_TYPE | 4 | Alpha | Street Type
SUF_DIR | 2 | Alpha | Directional Suffix
UNIT_TYPE | 4 | Alpha | Unit (i.e., APT, STE, BLDG)
UNIT_NUM | 4 | Alpha | Unit Number
CITY | 17 | Alpha | City name
ST | 2 | Alpha | State
ZIP5 | 5 | Alpha | Zip Code
ZIP4 | 4 | Alpha | +4 Zip Code

The line in this instance is a linear geospatial feature that represents a street centerline. Address ranges are typically established for individual centerline segments so address matching may be performed. Whenever practical, street names and address ranges shall conform to the actual situs addresses assigned to points and polygons.

• Unique identifier
• Left From (Low) Address
• Left To (High) Address
• Right From (Low) Address
• Right To (High) Address
• Directional Prefix
• Street Name
• Street Type
• Directional Suffix

The components shall be formatted as shown below:

Field Name | Length | Type | Description
UNIQ | 20 | Alpha or Numeric | A unique identifier within the geospatial feature attribute table that can be linked to an associated address table.
L_ADD_FROM | 5 | Numeric | Left From (Low) Address
L_ADD_TO | 5 | Numeric | Left To (High) Address
R_ADD_FROM | 5 | Numeric | Right From (Low) Address
R_ADD_TO | 5 | Numeric | Right To (High) Address
PRE_DIR | 2 | Alpha | Directional Prefix
STR_NAME | 30 | Alpha | Street Name
STR_TYPE | 4 | Alpha | Street Type
SUF_DIR | 2 | Alpha | Directional Suffix
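The four range fields are what make address matching against a centerline possible. The sketch below shows the basic idea: pick the side of the street whose range contains the house number, then interpolate a position along the segment. The class and function names are hypothetical, and the rule that a house number must share the odd or even parity of its side's From address is a common convention assumed here, not a requirement stated in the standard.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CenterlineSegment:
    l_add_from: int
    l_add_to: int
    r_add_from: int
    r_add_to: int


def locate(house_number: int, seg: CenterlineSegment) -> Optional[Tuple[str, float]]:
    """Return (side, fraction along the segment) or None if no range matches.

    Assumption: odd and even numbers fall on opposite sides, so the house
    number must share the parity of the matching side's From address.
    """
    sides = (
        ("L", seg.l_add_from, seg.l_add_to),
        ("R", seg.r_add_from, seg.r_add_to),
    )
    for side, start, end in sides:
        low, high = min(start, end), max(start, end)
        if low <= house_number <= high and house_number % 2 == start % 2:
            # Linear interpolation from the From end (0.0) to the To end (1.0)
            fraction = 0.0 if end == start else (house_number - start) / (end - start)
            return side, fraction
    return None


segment = CenterlineSegment(l_add_from=101, l_add_to=199, r_add_from=100, r_add_to=198)
print(locate(199, segment))  # prints ('L', 1.0)
```

The interpolated position is only as good as the ranges themselves, which is why the standard asks that ranges conform to the actual situs addresses wherever practical.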