US Army Engineer District, Detroit
DACW35-01-D-0003, D.O. 0005 Task 5
Information Management Strategy for the International Joint Commission Lake Ontario-St. Lawrence River Study
May, 2002
Submitted To:
US Army Corps of Engineers Detroit District
Watershed Hydrology Branch
Engineering and Technical Services
USAED P.O. Box 1027
477 Michigan Ave.
Detroit, MI 48226 USA
Submitted By:
Pangaea Information Technologies, Ltd.
14 E. Jackson Blvd., Suite 1325
Chicago, IL 60604 USA
USACE IJC LOSLR IMS
EXECUTIVE SUMMARY
The Common Data Needs Technical Working Group (CDNTWG) of the International Joint Commission’s Lake Ontario – St. Lawrence River Study was charged with the development and implementation of an Information Management Strategy (IMS). In response, the CDNTWG assembled an IMS Team consisting of professionals who have relevant experience in information technologies and who either participate in the Study or are associated with agencies or organizations in the Study region. With assistance from a contractor, Pangaea Information Technologies, the IMS team has conducted a comprehensive Needs Assessment (NA) and hosted two workshops to aid in the formulation of the IMS. This document presents a summary of the needs assessment process and results, a survey of the strategies and policies adopted by other organizations, the strategy alternatives and options associated with the implementation phase, and recommendations among them. After the Study Board selects the alternatives and options to be implemented, and Study IM policies to support those alternatives and options are endorsed, the CDNTWG will coordinate the development and implementation of a detailed Information Management Plan.
During the IM assessment and analysis phase, several key Study properties important to the Information Management Strategy were identified. They include:
- Modeling and data processing is widely distributed within the TWGs among contractors, affiliated agencies and TWG members; other Study Participants and regional resource stakeholders are also widely distributed.
- TWGs identified specific geospatial and aspatial datasets (both model inputs and outputs) as sensitive, for reasons related to security, proprietary, and liability issues.
- While communication and information/data transfer via email and FTP presently address Study needs, this short-term solution is anticipated to become insufficient within 6 months due to dramatic increases in data volumes and demand for that data.
- The IJC does not wish to serve in a data stewardship or distributor capacity after the Study ends.
- Making the Study process transparent to the public is essential to the Study’s success. This includes making model descriptions and datasets accessible to the public for review and evaluation.
Pangaea Information Technologies, Ltd. ii USACE IJC LOSLR IMS
- Providing data discovery, evaluation, and access for Study Participants has much potential to reduce redundancy and allow for more integrated modeling across resource sectors. Providing this functionality to the public is also desirable if data sensitivity and security can be ensured.
- Any IM system developed for the Study needs to be reliable, redundant (backed-up), and secure.
One of the principal functions required of a system designed to support the sharing of information is the ability of potential users to learn of the existence of, and significant details about, data or information. Popularly known as data discovery, such mechanisms employ search procedures that match user-defined criteria against information held about the data, known as metadata. Metadata is essential for the data discovery process. The degree to which an organization generates metadata for data and information determines the effectiveness of the data discovery mechanism that can be implemented.
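The matching of user-defined criteria against metadata can be illustrated with a minimal sketch. The record fields, the sample catalogue entries, and the function names below are assumptions made for illustration only; they do not represent the Study's metadata or any standard search protocol.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Minimal, hypothetical metadata record for one dataset."""
    title: str
    abstract: str
    keywords: set = field(default_factory=set)
    # Bounding box as (west, south, east, north) in decimal degrees.
    bbox: tuple = (-180.0, -90.0, 180.0, 90.0)


def boxes_overlap(a, b):
    """Return True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(catalogue, keyword=None, bbox=None):
    """Return titles of records matching every criterion the user supplied."""
    hits = []
    for rec in catalogue:
        if keyword is not None:
            # Keyword matching is case-insensitive.
            if keyword.lower() not in {k.lower() for k in rec.keywords}:
                continue
        if bbox is not None and not boxes_overlap(rec.bbox, bbox):
            continue
        hits.append(rec.title)
    return hits


# Invented sample records, for illustration only.
CATALOGUE = [
    MetadataRecord("Shoreline erosion model output", "Hypothetical example.",
                   {"coastal", "erosion"}, (-79.9, 43.1, -76.0, 44.3)),
    MetadataRecord("Wetland vegetation survey", "Hypothetical example.",
                   {"wetlands", "vegetation"}, (-76.5, 44.0, -74.0, 45.5)),
]
```

A search is only as good as the metadata behind it: a dataset with no keywords or bounding box recorded cannot be found by either criterion, which is why the completeness of metadata determines the effectiveness of discovery.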
Data storage, maintenance and access needs require the coordination and integration of the many responsibilities associated with the system and its data. Data owners hold the responsibility for data use and maintenance, and they have the authority to define and manage data access and distribution through the application of a flexible security model. Authorized by the data owner, data stewards are familiar with the issues and concerns specific to a data set, and are responsible for its day-to-day maintenance. Consistent maintenance is essential to ensure the currency and quality of data. The integration of these responsibilities with the infrastructure and organizational procedures that support the system ensures reliability and sustainability.
The Internet continues to be the most commonly utilized method of distributing an information management system to a large number of dispersed users. Through consistent and well-designed implementation, an information management system can consist of one or many servers and provide simultaneous access to multiple users in many different locations. In establishing a shared environment from which data and information resources can be utilized, a client/server strategy can promote efficient system administration and access. The organization and design of the system will also need to address extensibility, the capacity to implement additional technical functionality after the initial implementation phase. Examples of such additional functionality are web services such as web mapping services (WMS) and web feature services (WFS) which provide functionality for interactive geospatial data viewing and querying over the Internet. This type of functionality would require the implementation of a system that allows for connectivity to multiple datasets, potentially over multiple systems.
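To make the web mapping service concept concrete, the sketch below assembles a map request using the parameter names of the OGC WMS 1.1.1 GetMap operation. The server address and layer name are hypothetical, and the choice of version, coordinate system, and image format is an assumption for illustration rather than a recommendation.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width=600, height=400):
    """Assemble a WMS 1.1.1 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the coordinate system named by SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests each layer's default style
        "SRS": "EPSG:4326",  # geographic longitude/latitude
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    # urlencode percent-encodes reserved characters such as "," and ":".
    return base_url + "?" + urlencode(params)


# Hypothetical server address and layer name.
url = build_getmap_url("https://example.org/wms", ["shoreline"],
                       (-80.0, 43.0, -74.0, 45.5))
```

Because the request is an ordinary URL, the same map can be requested from any server that implements the interface, which is what allows one viewer to draw on datasets held in multiple systems.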
With knowledge gained from a review of the IMS approaches and policies implemented by organizations similar in structure and function, and with the information about Study properties and needs presented above, the IMS team synthesized a strategy and several specific approaches for implementation. The alternatives and options were divided into three distinct areas:
1) Data Discovery
2) Data Storage, Maintenance, Access, and Distribution
3) Document Information Management
Data Discovery Alternatives and Options
Alternative 1 - Status-Quo: Currently, data discovery in the LOSLR Study depends on gleaning information from documents detailing the Study organization and work plans and/or on “word-of-mouth.” The currency and completeness of information obtained through this Status Quo approach are often poor.
Alternative 2 - Tabular List of Data: A second alternative would be to generate a tabular list of all data used or generated by the Study, and ask the respective data owners to add some brief metadata to their entries. The list could only be distributed to Study Participants because the non-compliant metadata would not be fit for public consumption. This alternative addresses the immediate need for inter-TWG data awareness in a limited manner, but does nothing to promote transparency and openness of the Study for the public.
Alternative 3 - Metadata Catalogue: A third alternative would be to develop a collection of standard-compliant metadata files, or metadata catalogue. This alternative is the first to be fit for public consumption, addressing the need for public involvement and transparency in the Study process and thereby promoting its overall credibility. This alternative requires a Study-wide commitment for the development and coordination of standard-compliant metadata, which requires a formal metadata review process.
Metadata Options:
The following options may be selected to further assist in the metadata development and coordination:
Option 1 - Metadata Review Team: The first of these options is the formation of a Metadata Review Team, which would conduct quality assurance and quality control on metadata as it is generated by the TWGs.
Option 2 - Metadata Coordinator: The second option is the hiring of a Metadata Coordinator, who would coordinate all metadata training, provide assistance in metadata development, ensure completeness of metadata produced, and confirm compliance with FGDC 1998 metadata standards.
Option 3 - Metadata Workshop: The third option is to hold a Metadata Workshop for training all Study participants involved in the production of standard-compliant metadata. Workshop training on metadata generation software could provide a jump-start to the metadata creation process, and reduce the time spent by a Metadata Review Team and/or Metadata Coordinator over the course of the Study.
Option 4 - On-line Metadata Development Assistance: The final option would be to design and implement On-line Metadata Development Assistance. This service would help TWGs that are generating metadata by providing simple text instructions and easy-to-understand manuals, and by directing specific questions to an identified metadata expert (e.g., the Metadata Coordinator), who would be required to provide timely assistance.
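Part of the completeness checking described in these options could be automated. The sketch below assumes metadata records are stored as XML using the short element names of the FGDC 1998 content standard; the list of required paths is an illustrative subset chosen for this example, not the full set of mandatory elements, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

# Paths use short element names from FGDC 1998 XML encodings.
# Illustrative subset only; the standard defines many more elements.
REQUIRED_PATHS = [
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
    "idinfo/keywords/theme/themekey",
    "metainfo/metd",
    "metainfo/metstdn",
]


def missing_elements(xml_text):
    """Return the required paths that are absent or empty in a metadata record."""
    root = ET.fromstring(xml_text)  # root element is expected to be <metadata>
    missing = []
    for path in REQUIRED_PATHS:
        node = root.find(path)
        # Treat whitespace-only content the same as a missing element.
        if node is None or not (node.text or "").strip():
            missing.append(path)
    return missing


# Invented record with a title and abstract but nothing else filled in.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo><title>Example dataset</title></citeinfo></citation>
    <descript><abstract>Hypothetical abstract.</abstract><purpose> </purpose></descript>
  </idinfo>
</metadata>"""
```

A check of this kind finds only structural gaps. Judging whether an abstract or purpose statement is accurate and useful would remain a task for the Metadata Review Team or Metadata Coordinator.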
Alternative 4 - Spatial Data Infrastructure (SDI) Participation: The fourth and final alternative that addresses the need for Data Discovery is the participation in the spatial data infrastructure (SDI) of the United States and Canada. The SDI includes a network of metadata providers that use a standard search protocol to allow access to metadata through a single data discovery portal. Participation in the clearinghouse networks requires FGDC- or ISO-compliant metadata, and a searchable server. Thus, this alternative incorporates the tasks necessary for implementation of the third alternative, i.e., production of the metadata catalogue.
Because participation in the SDI network requires the implementation of a searchable (i.e., Z39.50-compliant) server, the Study would most efficiently utilize resources by submitting metadata to an agency or organization that has already implemented a clearinghouse node server. Examples include the Great Lakes Information Network Data Access (GLINDA) Clearinghouse, established by the Great Lakes Commission (GLC), and the Canadian GeoConnections.
Recommendation of Data Discovery Alternatives and Options
To best address the need for Data Discovery and Evaluation, implementation of Alternative 4 - SDI Participation is recommended. In addition to its primary function, positive externalities of this alternative for the Study include becoming part of a developing service provided to the geospatial data community, facilitating the transparency of the Study, and enhancing the overall visibility of the Study through its inclusion in the Global Spatial Data Infrastructure (GSDI). Options 2 – 4 are also recommended: hiring a “Metadata Coordinator”, conducting a “Metadata Workshop”, and providing “On-line Metadata Development Assistance”. The primary cost associated with this alternative is related to the creation of metadata and the optional support functions. While some support for the SDI node may be appropriate (requested or required), that additional expense would be minimal. The total estimated cost of implementing the recommended Data Discovery and Evaluation alternative and options is US$73,356.00 in FY2002 and US$176,101.50 through FY2005.
Data Storage, Maintenance, Access, and Distribution Alternatives and Options
Six alternatives have been identified for addressing the needs for data storage, maintenance, access and distribution. While the simultaneous implementation of multiple alternatives was possible for data discovery, the alternatives here are less compatible. The possible exception to this would be the temporary “implementation” of an “extended status quo” to accommodate the short-term needs of the Study during the development, testing, and final implementation of a better alternative. Additional options have also been identified that could be implemented with the three more functional alternatives.
Alternative 1 – Status-Quo: The current data storage and access scheme implemented for the Study allows users (Study Participants) to store and access data in their local environments. Data transfers to non-local users require an FTP site, such as the one managed by the Canadian Centre for Inland Waters (CCIW), or various media (e.g., CDs, magnetic tapes, etc.). The system for data distribution is largely uncoordinated and fails to facilitate data integrity, security, back-ups or archiving. This system includes no active maintenance functionality for individual datasets: incremental changes to parts of a dataset cannot be made, and only wholesale replacement is possible. Considerations for public accessibility of data and long-term sustainability of data and systems are not addressed under the current strategy. No immediate additional costs are associated with continuing with the Status Quo alternative; however, because the CCIW FTP site was intended as a temporary solution, a decision to continue with this strategy will likely require that additional capacity be added in the near future as the demand for its use increases.
Alternative 2 – Single Repository: The second alternative identified to address the need for coordinated data storage, maintenance, access and distribution is the implementation of a Single Repository for Study data. The repository would exist as a single FTP site to which users can be assigned rights and permissions according to their specific information needs. As a single location for all Study data, the repository would allow for much greater coordination of data distribution, and data integrity, security, back-up and archiving would be facilitated. The repository would be able to accommodate public access to data by providing limited access with read-only permissions or by implementing a webpage with hyperlinks to FTP-downloadable files.
Alternative 3 – Single Data Base Management System (DBMS): The third alternative identified to address the need for a coordinated data management strategy involves the implementation of a Single Data Base Management System (DBMS) for data storage, maintenance, access, and distribution. Establishing a Single DBMS in which data is loaded and stored in a logical structure in a relational database environment would allow for data to be integrated into other systems. It would also accommodate the application of other technologies much more effectively than the file structure approach of the previous two alternatives. The single location will facilitate data integrity, security, back-up and archiving. However, long-term sustainability is dependent upon the willingness and ability of data owners and stewards to maintain datasets; like the previous alternatives, this one undermines long-term sustainability by inhibiting regional ownership and stewardship. Policies to provide for appropriate public accessibility would need to be established under the Single DBMS alternative. Similar to the single repository, a flexible data security model and standards for data transfer would need to be implemented.
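What storing data "in a logical structure in a relational database environment" might look like can be sketched with a small example. The table layout, column names, and sample rows below are assumptions made for illustration and use an in-memory SQLite database; they are not a proposed Study schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE party (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    owner_id   INTEGER NOT NULL REFERENCES party(party_id),
    steward_id INTEGER REFERENCES party(party_id),
    sensitive  INTEGER NOT NULL DEFAULT 0  -- 1 = restricted distribution
);
""")

# Invented rows, for illustration only.
conn.executemany("INSERT INTO party VALUES (?, ?)",
                 [(1, "Example Agency A"), (2, "Example Agency B")])
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
                 [(1, "Water level model output", 1, 2, 0),
                  (2, "Shoreline property parcels", 2, None, 1)])


def datasets_without_steward(connection):
    """List titles of datasets that still need a steward assigned."""
    rows = connection.execute(
        "SELECT title FROM dataset WHERE steward_id IS NULL ORDER BY title")
    return [title for (title,) in rows]


def public_datasets(connection):
    """List (title, owner name) for datasets not flagged as sensitive."""
    rows = connection.execute(
        "SELECT d.title, p.name FROM dataset d "
        "JOIN party p ON p.party_id = d.owner_id "
        "WHERE d.sensitive = 0 ORDER BY d.title")
    return rows.fetchall()
```

Recording ownership, stewardship, and sensitivity as structured fields is what lets such questions be answered by query, something the file-based alternatives cannot offer.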
Alternative 4 – IJC Distributed DBMS: The fourth alternative identified to address the need for a coordinated data management strategy involves the implementation of a data system similar to the single DBMS described above, but divided and managed by the respective national offices of the IJC in Ottawa and Washington DC. A dual system would be developed and maintained in a consistent and interoperable manner so as to support seamless data access across national jurisdictions. By committing to the development and maintenance of a system managing data for the LOSLR Study by national jurisdiction, the IJC would build an information management infrastructure to support the data management needs of the LOSLR Study, and potentially, future studies.
This alternative would require the Study Board’s support to equip the IJC national offices with the necessary hardware, software and expertise required to develop, implement and maintain interoperable geodata management systems. Because this approach requires the development of IM support staff and resources, the cost associated with this dual system is substantially greater than the regionally distributed alternative, which takes advantage of the infrastructure and established knowledge base of other regional organizations. However, while the cost is associated directly with the LOSLR Study’s IM system development, implementation, and maintenance, it could also be considered an investment for future studies and other IJC information management needs.
Alternative 5 – Regionally Distributed DBMS: The fifth alternative identified to address the need for a coordinated data management strategy involves the implementation of a data system similar to the Single DBMS described above, but divided and managed at the regional level. The Regionally Distributed DBMS most effectively addresses the need for regional partners to ensure the longevity of data associated with the Study. As with data owners, regional system maintainers would need to be identified. This data management model is the most flexible and progressive; it is endorsed and actively promoted by leaders in the geospatial IT community from the public and private sectors, as well as NGOs.
A Regionally Distributed DBMS would be developed in a coordinated effort to ensure maximum consistency in system implementation and maintenance. Interoperability standards would need to be specified to ensure greater integration and connectivity to other systems and to more easily accommodate other technologies such as geospatial web services. At present, probable candidates as regional components in this distributed set of DBMSs include systems managed by the Great Lakes Commission, Land Information Ontario (LIO, a part of the Ministry of Natural Resources) or Environment Canada – Ontario Region, and Environment Canada – Quebec Region (EC-QR). While all three regionally-distributed DBMSs will have separate administration, consistency must be promoted during development to ensure a common approach to data storage, maintenance, access, and distribution. In addition to addressing seamless system development and implementation, data held in the systems would need to be clipped to a common boundary and otherwise made seamless in order to facilitate the overall consistency of the Study data. System development for this alternative will require investment exceeding that necessary for the Single DBMS, in order to accommodate the additional coordination of effort and system implementation.
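The idea of clipping regional holdings to a common boundary and making them seamless can be sketched as follows. This simplified example assumes point records with hypothetical "id", "lon", and "lat" fields and a rectangular boundary; an actual implementation would clip against the Study area polygon using GIS software, and all values shown are invented.

```python
def clip_and_merge(regional_sets, boundary):
    """Merge point records from several systems, keeping those inside boundary.

    boundary is (west, south, east, north). A record seen in more than one
    regional system (same "id") is kept once, so overlapping holdings do not
    produce duplicates in the combined result.
    """
    west, south, east, north = boundary
    merged = {}
    for records in regional_sets:
        for rec in records:
            inside = west <= rec["lon"] <= east and south <= rec["lat"] <= north
            if inside and rec["id"] not in merged:
                merged[rec["id"]] = rec
    return [merged[key] for key in sorted(merged)]


# Hypothetical common boundary and invented sample records.
BOUNDARY = (-80.0, 43.0, -73.0, 46.0)
REGION_A = [{"id": "g1", "lon": -77.0, "lat": 43.9},
            {"id": "g2", "lon": -82.5, "lat": 42.0}]
REGION_B = [{"id": "g1", "lon": -77.0, "lat": 43.9},
            {"id": "g3", "lon": -74.2, "lat": 45.3}]

combined = clip_and_merge([REGION_A, REGION_B], BOUNDARY)
```

Here record "g2" falls outside the boundary and is dropped, while "g1", held by both regions, appears once in the combined result.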
Alternative 6 – Technical Work Group (TWG) Distributed DBMS: A final alternative that should be considered to address the need for coordinated geospatial data management involves implementing a DBMS similar to the alternative described above, but with components distributed among TWGs. This approach has several advantages, although these are confined to activities that will take place during the Study. The TWG Distributed DBMS alternative would place the data and system in relatively close association with the data developers and initial data users. As such, reliable access to and control over the data have the potential to increase the overall motivation required for system upkeep during the Study. Moreover, because the system and geodata would be managed by that data’s primary user group, the data should remain current and its integrity intact.
However, unlike the “Regionally Distributed DBMS” alternative, this approach includes datasets that span international and provincial boundaries, so securing data owners with the motivation to provide for database maintenance beyond the Study’s terminus could prove problematic. Likewise, the system longevity would be dependent on securing a motivated steward prior to the completion of the Study. This alternative would also require the largest allocation of funding, because it entails implementing a large network of distributed systems, one for each individual TWG.
DBMS Options:
The three options for the three preceding DBMS alternatives (alternatives 3-5) include:
Option 1 – Data Viewing and Map Making: The development and provision of interactive Data Viewing and Map Making using open source software,
Option 2 – Proprietary Internet Mapping Service: The implementation of Proprietary Internet Mapping Services that offer more robust geospatial analysis functionality than that in the first option, and/or
Option 3 – Middleware: The implementation of system “middleware” that allows the connection of geospatial applications in certain DBMS environments.
Recommendation for Data Storage, Maintenance, Access, and Distribution
To address the need for Data Storage, Maintenance, Access, and Distribution, implementation of Alternative 5 - Regionally Distributed System is recommended. The system recommended in Alternative 5 would be distributed among the three political regions (Quebec, Ontario, and New York State) that comprise the Study area. Because the IJC does not wish to serve in a data maintenance capacity beyond the life of the Study, data owners and stewards will need to be assigned to ensure long-term sustainability of data. Regional agencies have the necessary interest in the datasets and motivation to ensure the data’s longevity. Hence, this alternative increases the likelihood that the system and the data will remain sustainable in the long term, and can be recommended because of the existing resources available to the Study in the form of regional DBMSs and knowledge bases.
DBMS Options 1 and 3 are recommended: establishment of web-based Data Viewing and Mapping capabilities, and installation of “middleware” to provide for system interoperability and Open GIS Consortium compliance for other Open Web Services (OWS). Total estimated cost of implementing the recommended alternative and options is US$166,675 in FY2002 and US$312,175 through FY2005.
Policies essential to the implementation of the recommended alternative and options are:
- All primary Study participants (e.g., Study Board, PIAG, and TWG members) should be given access to all data and information utilized and/or produced by the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns.
- All other interested parties should be given access to any data and information which is considered new or having value added to it by activities of the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns. “New data or information” could be defined as that which did not exist prior to Study activities and was generated from primary data collection procedures as a direct result of Study activities, i.e., model output or results. “Value-added data and information” could be defined as that which has been significantly improved as a result of Study activities in either its content or usability.
- Data owners, and especially data stewards, should be identified as early as possible prior to the end of the Study.
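The two access policies stated above can be expressed as a simple rule, sketched below. The role names, field names, and the decision to deny default access to restricted material are assumptions made for illustration; in practice the data owner would decide how restricted data and information are handled.

```python
from dataclasses import dataclass

# Hypothetical role labels for primary Study participants.
PRIMARY_ROLES = {"study_board", "piag", "twg"}


@dataclass(frozen=True)
class Dataset:
    title: str
    restricted: bool = False   # security, liability, privacy, licensing, or proprietary concern
    new: bool = False          # did not exist before the Study; produced by Study activities
    value_added: bool = False  # significantly improved by Study activities


def may_access(role, dataset):
    """Return True if the policy grants this role default access to the dataset."""
    if dataset.restricted:
        # Excepted from both policies; left to the data owner to decide.
        return False
    if role in PRIMARY_ROLES:
        # Primary participants: all data used and/or produced by the Study.
        return True
    # All other interested parties: new or value-added material only.
    return dataset.new or dataset.value_added
```

For example, `may_access("public", Dataset("Model output", new=True))` evaluates to `True`, while the same request for a dataset that the Study merely used, without adding value, evaluates to `False`.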
Document Information Management Alternatives and Options
Without question, other flows of information will be necessary for the Study to be successful. In particular, it is likely that administrative and document management tools will become increasingly desirable as the Study progresses. However, without having developed a Communications Strategy or specific policies for internal reporting procedures or functions, specific recommendations are difficult. Given this lack of information it would be prudent to err towards a more robust document management system that is scalable and possesses the capacity for extensibility. Designing and implementing a system that will not meet changing or currently unforeseen critical Study needs could prove very costly (and wasteful) in the long run. Designing a system that is extensible and scalable provides insurance against this.
Having said this, the following document management system components and functions are tentatively recommended:
1) Commercial Off The Shelf (COTS) software for web-based document and other information management, such as Xerox’s DocuShare (see Section 7.3.2.1.1). This higher-end web-based document management system could prove extremely useful in meeting internal Study IM needs.
2) A web-site with documents and other information presented in a hierarchical structure. This is simply a recommendation for the organization of the existing web-site. Basic HTML text search functionality should be provided.
3) A web-enabled Shared Vision model. The IMS team views this model as having the potential to be more than an excellent decision support tool. Its structure allows the integration of all essential information (e.g., links to model descriptions and model inputs) that could facilitate evaluation of the Study and support the recommended data discovery, evaluation and access schema.
System Component Integration
The primary system components, as recommended above, are:
- the three regional database management systems (DBMS),
- a web-mapping and geodata viewing application,
- the Study website,
- a Study-wide document management system with web interface, and
- a web-enabled version of the Shared Vision Model (SVM).
Given the distributed nature of the Study Participants and the stakeholders across the study region, the Internet should serve as the backbone for integrating the Study’s IM system. Study web pages and hyperlinks contained therein then serve as the means for providing linkages among the recommended applications as well as the collection of documents, databases, images, etc. that comprise the Study’s body of data and information. Hyperlinks to a particular document, database, or application should be present at all logical locations within the system.
Under this scenario, the Study website serves as the focal point, and point of departure, for all system functions. Organization of data and information services via the existing Study website will allow for the efficient and simple query and transfer of information to both the public and to Study Participants. Moreover, by utilizing the familiar structure of web portals as a central information store, all users of the system will immediately be able to find the information that they are seeking.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
TABLE OF CONTENTS
TABLE OF FIGURES
1.0 INTRODUCTION
2.0 EXISTING STUDY POLICY
2.1 Study Mandate
2.2 Plan of Study (POS)
2.3 Great Lakes Water Quality Agreement (GLWQA)
3.0 NEEDS ASSESSMENT
3.1 Overview of Needs Assessment Process
3.2 NAQ Responses and Follow-up Interviews
3.2.1 Common Data Needs TWG
3.2.2 Hydrologic and Hydraulics TWG
3.2.3 Hydropower TWG
3.2.4 Coastal TWG
3.2.5 Environmental/Wetlands TWG
3.2.6 Recreational Boating and Tourism TWG
3.2.7 Study Board – Scirmemmano
3.2.8 Public Interest Advisory Group
3.2.9 Plan Formulation and Evaluation Group
3.3 Study Properties and Needs
4.0 EXISTING RESOURCES AND POTENTIAL SYSTEM COMPONENTS / PARTICIPANTS
4.1 International
4.1.1 Open GIS Consortium (OGC)
4.1.2 Great Lakes Information Network (GLIN)
4.1.3 Binational.net
4.2 United States
4.2.1 Federal Geographic Data Committee (FGDC) & the National Spatial Data Infrastructure (NSDI)
4.2.2 Cornell University Geospatial Information Repository (CUGIR)
4.2.3 New York State GIS Clearinghouse
4.3 Canada
4.3.1 GeoConnections
4.3.2 Ontario
4.3.2.1 Land Information Ontario
4.3.3 Quebec
4.3.3.1 Environment Canada Quebec Region
5.0 OTHER ORGANIZATIONS / STUDIES
5.1 Red River Basin Decision Information Network
5.2 Yellowstone-to-Yukon
5.3 United States Global Change Research Program’s Data Working Groups
6.0 FOCUS DISCUSSIONS AT IMS WORKSHOPS
6.1 Policy
6.2 Technical
6.3 FGDC (USGS) Cooperative Agreements Program (CAP) Grant Opportunity
6.4 Study Properties and Needs
7.0 PRIMARY ALTERNATIVES AND OPTIONS
7.1 Data Discovery
7.1.1 Alternatives
7.1.1.1 Status Quo
7.1.1.2 Generating a Data List
7.1.1.3 Metadata Catalog
7.1.1.4 Participation in SDI
7.1.2 Additional Options
7.1.3 Evaluation of Alternatives
7.1.4 Costs
7.1.5 Recommendations
7.2 Data Storage, Access, and Distribution
7.2.1 Alternatives
7.2.1.1 Status Quo
7.2.1.2 Single Repository
7.2.1.3 Single Data Base Management System (DBMS)
7.2.1.4 IJC Distributed DBMS
7.2.1.5 Regionally Distributed DBMS
7.2.1.6 TWG Distributed DBMS
7.2.2 Additional Options
7.2.3 Evaluation of Alternatives
7.2.4 Cost
Status Quo
Proprietary
7.2.5 Recommendations
7.3 Document and General Information Management Tools
7.3.1 Simple Web Site Approach
7.3.1.1 Hierarchical Web Page Structure
7.3.1.2 Optional Web Tools
7.3.1.2.1 HTML Text Search
7.3.1.2.2 Metadata for aspatial information
7.3.2 Document Management Systems
7.3.2.1 COTS Software
7.3.2.1.1 DocuShare – Xerox Corporation
7.3.2.1.2 EasyDocs - Internet Development Ltd., UK
7.3.2.2 Customized Information Management Systems
7.3.2.2.1 US EPA Environmental Information Management System
7.3.3 Presentation Options for Study Content and Decision Support
7.3.3.1 Stand Alone
7.3.3.2 Web Enabled
7.3.4 Recommendations
7.4 System Component Integration
8.0 IMPLEMENTATION
8.1 Data Discovery
8.2 Data Storage, Maintenance, Access, and Distribution
8.3 Document and General Information Management Tools
9.0 SUMMARY
10.0 REFERENCES
Appendix I: Needs Assessment Questionnaire distributed to all TWGs
Appendix II: Lists of model inputs and outputs
Appendix III: DIWG Policy Examples
Appendix IV: FY2002 CAP Grant Proposal Summary
Appendix V: Public Participation Management Tools
A.V.1 Information Collection Using Web-based Forms
A.V.1.1 Web-Based Surveys
A.V.1.1.1 HTML Survey Design
A.V.1.1.2 COTS Software Programs
A.V.1.2 Feedback Forms
A.V.2 Contact Addresses/Lists
Appendix VI: List of Acronyms
TABLE OF FIGURES
Figure 7.1.1 - Geospatial Data Discovery through the SDI
Figure 7.1.2 - Evaluation of Data Discovery Alternatives
Figure 7.2.1(a) - Flow of information between modeling groups and Single DBMS server in Alternative 3
Figure 7.2.1(b) - Flow of information between modeling groups and IJC Distributed DBMS servers in Alternative 4
Figure 7.2.1(c) - Flow of information between modeling groups and Regionally Distributed DBMS servers in Alternative 5
Figure 7.2.1(d) - Flow of information between modeling groups and TWG Distributed DBMS servers in Alternative 6
Figure 7.2.2 - Evaluation of Storage, Maintenance, and Access Alternatives
Figure 7.3.1 - LMPDS Document Clearinghouse Contents Page
Figure 7.3.2 - Basic Search Form Example
Figure 7.3.3 - Query Results Page
Figure 7.3.4 - DocuShare Integration with Windows Explorer
Figure 7.3.5 - DocuShare Outlook Integration
Figure 7.3.6 - EasyDocs Search Screen Example
Figure 7.3.7 - EasyDocs Search Results Example
Figure 7.3.8 - EIMS Search Form
Figure 7.3.9 - EIMS Metadata Search Results List
Figure 7.3.10 - EIMS Metadata Summary Form
Figure 7.3.11 - The Microsoft Encarta Start-Up Window
Figure 7.3.12 - Zambezi River Information Management System GUI
Figure 7.4.1 - System Components Integration
Figure 8.1.1 - Alternatives and Options Timeline
Figure A.V.1 - Example of Web Based Questionnaire
Figure A.V.2 - HRDC NOC/SIC Code Database Query Screen
Figure A.V.3 - Feedback / Submittal Form for Section 227 Project Information
Figure A.V.4 - Section 227 Database Query Form
Figure A.V.5 - Section 227 Search Results Page
Pangaea Information Technologies, Ltd. xvii USACE IJC LOSLR IMS
1.0 INTRODUCTION
The Common Data Needs Technical Working Group (CDNTWG) of the International Joint Commission’s Lake Ontario – St. Lawrence River Study (LOSLR “Study”) was charged with the development and implementation of an Information Management Strategy (IMS) for the Study. In response, the CDNTWG assembled an IMS Team consisting of GIS, IM, and IT professionals either participating in the Study or associated with agencies or organizations in the Study region. With assistance from a contractor, Pangaea Information Technologies, the IMS Team has conducted a comprehensive Needs Assessment and hosted two workshops to aid in the formulation of that Strategy. The results of that development effort are presented in this document. After selection of implementation alternatives and options, and endorsement of Study IM policies to support those alternatives and options, the CDNTWG will coordinate the development and implementation of a detailed Information Management Plan.
Information management is a critical component of any study conducting a regional impact assessment with as large and diverse a scope as the Lake Ontario – St. Lawrence River Study. At the heart of the study’s results will be the data and information collected, analyzed and produced by the Technical Working Groups (TWGs) upon which decisions will be made and justified. To ensure the impartiality of the study’s conclusions, ideas and information will need to be exchanged freely and openly among study participants and in as near real-time as possible. For the LOSLR Study, an information management system will be required to organize a large amount of geospatial and non-spatial data. The procedures and mechanisms employed in such a system will need to facilitate the sharing of ideas and information to a distributed set of users. Because the results of the Study will formulate recommendations that could affect large segments of the population, an information management system developed for the Study should address the public’s need for free and open access to information.
One of the principal functions required of a system that supports the sharing of information is the ability of potential users to learn of the existence of, and significant details about, data or information. Popularly known as data discovery, such mechanisms employ search procedures that match user-defined criteria against information held about the data, known as metadata. Metadata is the most important component of the data discovery process. The degree to which an organization generates metadata for its data and information determines how effectively a data discovery mechanism can be implemented. Standard geospatial metadata formats have been developed by the FGDC and ISO to ensure that all essential information about the data has been collected and represented in a consistently organized and searchable way. Metadata that is not compliant with commonly accepted standards lacks the completeness required to publish information in a meaningful way. Because data discovery only provides information contained in metadata, it can be considered separate from most liability and security concerns associated with data access and distribution.
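As a minimal sketch of the matching step described above, the following Python example filters a small in-memory catalog by keyword and bounding box. The `MetadataRecord` class, its fields, and the sample coordinates are hypothetical simplifications for illustration only; a compliant FGDC or ISO record carries many more elements, and a production clearinghouse would not be implemented this way.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, heavily simplified record (not FGDC/ISO compliant)."""
    title: str
    abstract: str
    keywords: list[str] = field(default_factory=list)
    # Bounding box as (west, south, east, north) in decimal degrees.
    bbox: tuple[float, float, float, float] = (-180.0, -90.0, 180.0, 90.0)


def bbox_intersects(a, b):
    """Return True when two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(records, keyword=None, bbox=None):
    """Match user-defined criteria against metadata only; no data is opened."""
    hits = []
    for rec in records:
        if keyword is not None:
            text = " ".join([rec.title, rec.abstract, *rec.keywords]).lower()
            if keyword.lower() not in text:
                continue
        if bbox is not None and not bbox_intersects(rec.bbox, bbox):
            continue
        hits.append(rec)
    return hits


# Illustrative catalog; extents are rough placeholders, not surveyed values.
catalog = [
    MetadataRecord("Lake Ontario bathymetry", "Gridded depths",
                   ["bathymetry", "elevation"], (-79.9, 43.1, -76.0, 44.3)),
    MetadataRecord("Upper St. Lawrence shoreline", "Shoreline vectors",
                   ["shoreline"], (-76.5, 44.0, -74.5, 45.1)),
]

results = discover(catalog, keyword="bathymetry", bbox=(-78.0, 43.0, -77.0, 44.0))
print([rec.title for rec in results])
```

Because the search touches only the metadata records, a sensitive dataset can be discoverable without being accessible, which is the separation the paragraph above relies on.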
Data storage, maintenance and access needs require the coordination and integration of the many responsibilities associated with the system and its data. Data owners hold the responsibility for data security, use and maintenance, and they have the authority to define and manage data access and distribution. This is commonly achieved through the application of a flexible security model. Data stewards are responsible for the day-to-day maintenance of data. Given the authority by the data owner, data stewards are familiar with the issues and concerns specific to a data set. Consistent maintenance is essential to ensure the currency and quality of data. The integration of these responsibilities with the infrastructure and organizational procedures that support the system ensures reliability and sustainability.
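The division of responsibility between data owners and data stewards can be illustrated with a short Python sketch. The sensitivity levels, class names, and user names below are assumptions made for this example; the Study had not defined a security model at the time of this report.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical sensitivity levels, recorded alongside the metadata."""
    PUBLIC = 1       # open to everyone
    STUDY_ONLY = 2   # Study Participants only
    RESTRICTED = 3   # named users only (for example, licensed data)


@dataclass
class Dataset:
    name: str
    owner: str      # holds authority over access and distribution
    steward: str    # performs day-to-day maintenance on the owner's behalf
    sensitivity: Sensitivity = Sensitivity.PUBLIC
    allowed_users: set[str] = field(default_factory=set)


def can_read(ds, user, is_participant):
    """Decide read access from the sensitivity level set by the owner."""
    if user in (ds.owner, ds.steward):
        return True
    if ds.sensitivity is Sensitivity.PUBLIC:
        return True
    if ds.sensitivity is Sensitivity.STUDY_ONLY:
        return is_participant
    return user in ds.allowed_users


def can_edit(ds, user):
    """Only the owner and the steward the owner has authorized may modify data."""
    return user in (ds.owner, ds.steward)


bathy = Dataset("Raw digital bathymetry", owner="owner_org", steward="steward_a",
                sensitivity=Sensitivity.RESTRICTED, allowed_users={"modeler_1"})
print(can_read(bathy, "modeler_1", is_participant=True))
print(can_read(bathy, "member_of_public", is_participant=False))
```

Keeping the sensitivity level on the dataset record, rather than in application code, lets the owner change access rules without changes to the system itself.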
The Internet continues to be the most commonly utilized method of distributing an information management system to a large number of dispersed users. Through consistent and well-designed implementation, an information management system can consist of one or many servers and provide simultaneous access to multiple users in many different locations. In establishing a shared environment from which data and information resources can be utilized, a client/server strategy can promote efficient system administration and access. The organization and design of the system will also need to address extensibility, that is, the capacity to add technical functionality after the initial implementation phase. Examples of such additional functionality are web services such as Web Map Services (WMS) and Web Feature Services (WFS), which provide interactive geospatial data viewing and querying over the Internet. This type of functionality would require the implementation of a system that allows for connectivity to multiple datasets, potentially over multiple systems.
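To make the web mapping functionality more concrete, the sketch below assembles a GetMap request using the parameter names defined in the OGC Web Map Service 1.1.1 specification. The server address, layer names, and extent are hypothetical placeholders; no such Study endpoint is described in this report.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL for the given layers and extent."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),            # comma-separated layer names
        "STYLES": "",                          # empty value requests default styles
        "SRS": srs,                            # spatial reference system of BBOX
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer names, for illustration only.
url = build_getmap_url(
    "https://example.org/wms",
    layers=["shoreline", "bathymetry"],
    bbox=(-80.0, 43.0, -74.0, 45.5),
    width=800,
    height=400,
)
print(url)
```

A client needs only this URL convention to request a map image from any compliant server, which is what allows one viewer to draw on datasets held on multiple systems.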
Developing a system that includes the required functionality begins with the strategy and analysis phase of the system development life cycle, the traditional methodology used to develop information systems. The primary purpose of the strategy and analysis phase is to establish a solid understanding of the organization and functions for which the information system is being developed. Carried out through research, a needs assessment, and interviews with relevant participants and system users, this process leads to a strategy appropriate to a particular organization and its specific needs. The strategy will provide guidance through the following phases of the system development life cycle. This report represents a synthesis of the information collected during the strategy and analysis phase of system development for the IJC LOSLR Study. By collecting more detailed information about the specific data for which the system is being developed, logical and physical models will be developed to provide a clear concept of the system and direct the build and document phase of the development process. The initial implementation of the system will require the installation of hardware and software, loading of sample data, and the development of user documentation, help text and operations manuals to support the use and operation of the system. With initial implementation complete, the testing phase of the development process will ensure that the system performs the required functions and supports the organization’s business processes as they were defined in the strategy and design phases. After testing has been completed and the system refined, the production phase, the final step of the development process, can be initiated. The final system is delivered to the users and any necessary training is conducted. The system is closely monitored through the early period of production and enhanced or refined accordingly to optimize system performance.
This report is intended to summarize the strategy and analysis phase of the system development process and provide appropriate alternatives from which a specific system design and implementation plan can be formulated. This report is not intended to provide the design specifications required for building the system. However, through the alternative approaches discussed in this report, direction is provided to assist in the beginning of the design phase of the development process.
2.0 EXISTING STUDY POLICY
Policies with organizational authority are necessary to ensure consistency throughout the Study and support the coordinated effort required for success. Formal policies, in the form of directives or mandates, provide clear guidance and can serve as a model for other Studies. The needs assessment revealed a very limited number of IJC-level policies on data management, and a few more at the Study-level, though relatively general in most cases.
2.1 Study Mandate
The following policy statements are taken directly from the Mandate for the IJC LOSLR Study.
10. “The Commission emphasizes the importance of public outreach, consultation, and participation. … The Commission expects the Study Board to involve the public in its work to the fullest extent possible. The Study Board shall provide the text of media releases to the Secretaries of the Commission prior to their release.”
11. “To facilitate public outreach and consultation, the Study Board shall make information related to the study as widely available as practicable, including white papers, data, reports of the Study Board or any of its subgroups, and other materials, as appropriate.”
2.2 Plan of Study (POS)
The following policy statements are taken directly from the Plan of Study for Criteria Review for the IJC LOSLR Study.
4. Coordination of Common Elements by the Study Board
4.1 Direct and Coordinate Work of Study Teams
4.1.d The authority and tasks of the Board would include to “act as coordinator to ensure effective exchange of information among the study teams, and full use of studies or information from other sources.”
4.6 Process Management and Integration of Work
“Given the considerable cost of the overall Plan of Study activities, the Study Board will also need to ensure that duplication of effort is minimized, and data collected is made widely available across all teams.”
“The Study Board will also need to satisfy itself that each Study Team is carrying out the required work in a satisfactory manner, and that cross-interest impacts have also been considered.”
Annex 4 – Background Documentation and Correspondence
4(c) Directive in the Lake Ontario – St. Lawrence River “Plan of Studies” Team
“Documents, letters, memoranda, and communications of every kind in the official records of the Commission are privileged and become available for public information only after release by the Commission. The Commission considers all documents in any official files that the team may establish to be similarly privileged. Accordingly, all such documents shall be so identified and maintained as separate files.”
2.3 Great Lakes Water Quality Agreement (GLWQA)
The policy statements listed below are taken from the 10th Biennial Report on the Great Lakes Water Quality, Chapter 6 Information and Data Management, and are only directly applicable to the GLWQ Agreement.
o Quality assurance for legal and scientific defensibility
o Broad waiver of data recovery costs
o Promote accessibility of data and information
o Promote organization and management of data bases
o Establish protocols to ensure compatibility and comparability of data [across programs and boundaries]
3.0 NEEDS ASSESSMENT
3.1 Overview of Needs Assessment Process
The Needs Assessment (NA) process consisted of the development of a Needs Assessment Questionnaire (NAQ; presented as Appendix I), completion of the NAQ by TWGs and the Public Interest Advisory Group (PIAG), and follow-up interviews conducted by Pangaea. A list of datasets associated with the inputs and outputs of models addressing Performance Indicators (PIs) was compiled as a result of this and associated efforts, and is presented as Appendix II to this report.
3.2 NAQ Responses and Follow-up Interviews
Responses to the NAQ were received from all of the TWGs except two: Commercial Navigation and Municipal, Industrial, and Domestic Water Use. Follow-up interviews were conducted via conference call after reviewing the completed questionnaires.
3.2.1 Common Data Needs TWG
The Common Data Needs (CDN) TWG was formed to provide for the elevational (bathymetric and topographic) data and imagery requirements of the full complement of TWGs, and to work towards an information management strategy to facilitate the sharing, access and use of all data and information generated within the Study. The CDN TWG is nearing completion of data collection activities for the elevation and imagery data, as well as geodata for shorelines, political units, transportation features, watersheds, and tributaries. All the data will need to be transmitted to the other TWGs for use in their modeling and analysis. An FTP site is expected to be sufficient for these data transfers through the summer of 2002, though this does not allow for data viewing. The group has identified some restrictions on data use (e.g., with the City of Kingston and City of Hamilton orthoimagery).
3.2.2 Hydrologic and Hydraulics TWG
The Hydrologic and Hydraulics (H&H) TWG will produce a series of hydrologic scenarios that describe the levels and flows associated with different regulation plans and climate conditions. These scenarios will be used to perform resource-level impact assessments: the TWGs (other than CDN) will use these in models which address the PIs associated with their particular resource sector. Hence, upon completion of hydrologic data set production, H&H TWG will need to make their products available to the TWGs. The H&H TWG’s response to the NAQ focused on data needed to support their modeling efforts, data transfer to other TWGs, and archiving their outputs.
H&H TWG model inputs and outputs will need to be made accessible to the public as well as the other TWGs. However, with respect to the outputs, the H&H TWG perceives that the usefulness and desirability of the full datasets to the public is minimal. The only input datasets not accessible to the public are a portion of the raw digital bathymetry and raster shoreline files for the Upper St. Lawrence River. These are owned by the Canadian Hydrographic Service (via Nautical Data International) and were made available to Environment Canada for modeling purposes only.
Upon completion of dataset production, at which point updates will no longer be needed, the H&H TWG believes it best to transfer all data to a single repository. Prior to summarizing, there will be about 400 Mb of output for each scenario for each geographic location within the Study area that is being modeled.
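The volume implied by this figure can be estimated with simple arithmetic. The sketch below assumes that “400 Mb” means roughly 400 megabytes, and the scenario and location counts are hypothetical placeholders, since the report does not state them.

```python
# Figure quoted by the H&H TWG; interpreting "Mb" as megabytes is an assumption.
MB_PER_SCENARIO_PER_LOCATION = 400


def estimate_output_mb(n_scenarios, n_locations,
                       mb_each=MB_PER_SCENARIO_PER_LOCATION):
    """Total unsummarized output: scenarios x locations x size of each run."""
    return n_scenarios * n_locations * mb_each


# Hypothetical counts, for illustration only.
total_mb = estimate_output_mb(n_scenarios=10, n_locations=5)
print(f"{total_mb} MB (about {total_mb / 1024:.1f} GB)")
```

Even modest counts push the total well beyond what email or casual FTP exchange handles comfortably, which supports the TWG's preference for a single repository.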
3.2.3 Hydropower TWG
The Hydropower TWG is still defining Performance Indicators (PIs), models, and thus input data needs. Some data needs for the group have already been addressed, as the “Hydropower Entities” associated with the TWG hold a wealth of historical data that is already in the public domain. These data sets are maintained by the Entities themselves, and are continually being updated. At the time of the interview, the Hydropower TWG believed that all of their data could be shared with other groups, with the exception of some megawatt pricing information. However, the Entities would not want to expend the resources to actively maintain and update their datasets in a remote location on a frequent basis. This group perceives a need for consistent, Study-wide guidelines for PI valuation.
3.2.4 Coastal TWG
The Coastal TWG is split into two sub-groups: one for Lake Ontario and the upper St. Lawrence River, and the other for the Lower St. Lawrence River. Both groups have defined their PIs, modeling approaches, and input data needs. Their model inputs and geospatial data requirements are extensive relative to the other TWGs (see Appendix II). The “Upper” sub-group of the Coastal TWG will use Baird’s Flood and Erosion Prediction System (FEPS), with data processing done primarily by consultants. The TWG is purchasing a Coastal Data Server (CDS) for the Upper sub-group, which will hold all of their data. The sub-group considered the possibilities of holding all data for the Study on this server, though they do not currently have the funding necessary for public accessibility to the CDS. The “Lower” sub-group, centered at the Environment Canada, Meteorological Service of Canada, Quebec Region, Hydrology Section (ECQR), is utilizing an approach relying on finite element grid model output. A new database management system (DBMS) is being developed at ECQR, in part to address the Lower sub-group’s needs.
The Coastal TWG has some liability concerns associated with premature release of data that might allow for misuse and misinterpretation of the data. Also, there are potential security issues with some higher-resolution data sets (e.g., aerial photos, topometry). The Coastal TWG noted that some datasets are licensed and owned by private companies. The group perceived no problems with the short-term GIS Guidelines.
3.2.5 Environmental/Wetlands TWG
The Environmental/Wetlands TWG provided a limited response to the Needs Assessment Questionnaire. The group has identified an initial set of Performance Indicators, models, and supporting data inputs. The data modeling and processing will be performed by a combination of affiliated organizations, consultants, and TWG members. Some of their data will be funded and owned by the Ministry of Natural Resources. The group may require additional GIS capabilities and technical support. A need for designation of an information management lead for each TWG was suggested. The sole responder to the NAQ suggested that it was likely that an FTP site could meet the group’s needs for information distribution. At the March 7th and 8th Environmental/Wetlands TWG meeting, it was confirmed that providing data discovery, evaluation (including visualization), and access within this TWG and among others could be of considerable benefit.
3.2.6 Recreational Boating and Tourism TWG
The Recreational Boating and Tourism TWG has identified their Performance Indicators, models, and data inputs. To date, much of the information identified as “required” by this TWG is survey-based (data about marina owners and recreational users). As a result, the group has confidentiality concerns (e.g., individual survey responses). This TWG also needs highly resolved bathymetric data for areas around the marinas, boat docks, and launching ramps. The geodata inputs to the models were created in UTM, and will be reprojected as per the Short-Term GIS Guidelines. Aside from this, the group has not yet evaluated the Short-Term GIS Guidelines, including the metadata standards. The group does not have funding for public awareness and access included in their budget, and believes that the policy and funding for public accessibility and data sharing should come from the IJC, because the IJC is the principal owner of the data during the course of the Study. The group does see potential benefits from learning about the other TWGs’ model inputs and outputs. They foresee that extending applications developed for the Study could benefit other user groups during or after the Study.
3.2.7 Study Board – Scirmemmano
Frank Scirmemmano had several recommendations for the Study, most relating to information accessibility and transparency with respect to the public. Scirmemmano believes that public awareness and accessibility should be addressed at all organizational levels with consistent methods and structures. In general, all the data should be made available for public scrutiny. This is necessary to ensure transparency of the Study process, which is crucial to public acceptance and perceived credibility of the Study. Hence, access is important for the success of the Study. In order to address public accessibility, there is a need for a Study policy regarding what data and information should be made accessible to the public. Scirmemmano generally approves of the draft definition of “publicly-accessible data”: new data, or value-added data produced by the Study that is not readily-available through other sources.
Scirmemmano also related the need for Study-wide communication in the area of data collection and use. The discovery, acquisition, and use of data should be coordinated to maximize the efficient use of Study resources. Also, the development of the IMS is crucial to the success of the Study. In order to ensure compliance, contracts and the balance of funding should be tied to compliance with the process and related policies that make up the IMS. However, some contingency should be created so that compliance with communication, metadata, and data policies, despite its importance, does not detract from the funding obligations to the Study’s working groups (or inhibit the TWGs from completing their research or analyses).
3.2.8 Public Interest Advisory Group
The Public Interest Advisory Group’s (PIAG) response to the Needs Assessment focused on the need for public accessibility and transparency. The group’s principal concern is in maximizing the public’s knowledge of the Study, as well as facilitating the public’s involvement in the Study process. As knowledge and participation increase, the public’s perception of the Study’s credibility increases. This perception of credibility is crucial to the success of the Study.
The first step in facilitating public involvement is to advertise the Study’s existence to the public. PIAG plans to increase visibility of the Study through press releases, status updates and reports. In order to address the need for public accessibility and transparency, PIAG recommends that all data, models, procedures, and policies be disclosed throughout the duration of the Study, and for some time thereafter. Also, in order for the public to access the data, it needs to be available to them at the appropriate level of complexity. Therefore, the level of detail and subject matter for all data, including summary reports and periodic updates of TWG activities should correspond to the needs and desires of the target audience. This disclosure has the potential to be an involved process requiring a system capable of managing public surveys and contact address lists.
Along with assuring that information is matched with the audience at the appropriate level of detail, the information must be organized in a manner that enables public consumption. To best aid the public, the information will be organized with an emphasis on usability in terms of content and format, via a hierarchical information structure. A document search and retrieval functionality must exist, which can be implemented in several different ways, ranging from documents searched and presented through metadata search functionality (such as the RRBDIN web site) to a hierarchical web page structure where the public can pass through hyperlinks for increasing detail. The public also needs the ability to find specific information within Study databases. To facilitate this, there needs to be database search and display functionality. There are many options for this search and display functionality, ranging from a stand-alone database with query and report functions to a web-enabled query and reports system associated with a backend database. A minimum requirement will be the capability to navigate through the information using some form of HTML text search capability.
To ensure that the system will meet public needs and consider public input, the information access structure also needs to allow for public feedback and/or questions that can be considered and responded to in a timely manner. In certain cases, it may be necessary to direct such questions to one or more appropriate Study members. However, a separate ask-an-expert capability, in which questions would be forwarded to experts within the various TWGs, was discouraged by the PIAG during their interview.
3.2.9 Plan Formulation and Evaluation Group
The Plan Formulation and Evaluation Group (PFEG) consists of all Study leaders: the Planning Group members, the entire Study Board, and a representative (Co-Chair) from each of the Technical Working Groups (TWGs) and the Public Interest Advisory Group (PIAG). PFEG needs to be able to receive the output from the TWG models, each addressing a Performance Indicator response to the different hydrologic regimes (levels and flows) specified by the H&H TWG. At present, this information, presented in terms of monetary gains/losses as far as possible, is to be integrated through Shared Vision Planning. A specific Shared Vision Model will be developed for the Study. This decision support tool will assist stakeholder groups in the comparison of alternative regulations and their associated hydrologic regimes and resource sector impacts.
3.3 Study Properties and Needs
The Needs Assessment process revealed information essential to the formulation of an information management strategy. First, two primary user groups associated with the Study were recognized: Study Participants (all TWGs, the PIAG, and the Study Board), and the Public. Second, the general flows of information in the Study were identified. Third, there were requests, suggestions, comments, and needs identified that were either held in common among the individual Study Participant groups, or should be considered at the Study-level.
The general flows of information in the Study start with regulatory alternatives passed to the Hydrologic and Hydraulics (H&H) TWG. The H&H TWG models the “levels and flows” scenario associated with each regulatory alternative, and provides these to the TWGs who evaluate resource sector response for selected Performance Indicators (PIs). In addition to these “levels and flows”, which can be considered forcing or driving variables of the PI models, modeling groups within the TWGs require many other input variables. While many model inputs are derived directly from organizations outside of the Study, inputs from at least three sources inside the Study can be used: 1) “basemap layers” provided through the CDN TWG, 2) model inputs obtained from other TWGs, or from modeling teams within the same TWG, and 3) model outputs from other TWGs, or from modeling teams within the same TWG. All model outputs, aggregated as appropriate and valuated in dollar ($) terms whenever possible, will be made available to all Study Participants and the Public. As suggested above, some of these model outputs may serve as inputs to models addressing PIs in the same or a different resource sector. Model approach, results, analysis, and discussion will be documented in report form, which will be made available to all Study Participants, and to the Public via the PIAG. Last, all model outputs will be transferred to the PFEG for incorporation in the Shared Vision Model, subsequently used by all Stakeholders.
These flows of information can be summarized as follows:

Primary Model Drivers
H&H TWG → TWGs other than CDN

Other Model Inputs
Non-Study Orgs. → TWGs
CDN TWG → other TWGs
TWGs → TWGs other than CDN

Model (Outputs) Results
TWGs → Model Inputs for TWGs (other than CDN)
TWGs other than CDN → PIAG → all Study Participants and the Public
TWGs other than CDN → PFEG → Stakeholders (via Shared Vision Model)
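The flows summarized above form a small directed graph, and a sketch in Python shows how such a structure could be used to trace where information from any one group ends up. The node names and the simplification of TWG-to-TWG exchange into a single node are assumptions made for this example, not part of the Study design.

```python
from collections import deque

# Simplified directed graph of the information flows described above.
# Exchange among the TWGs themselves is folded into one node (an assumption).
FLOWS = {
    "Non-Study Orgs.": ["TWGs (other than CDN)"],
    "CDN TWG": ["TWGs (other than CDN)"],
    "H&H TWG": ["TWGs (other than CDN)"],
    "TWGs (other than CDN)": ["PIAG", "PFEG"],
    "PIAG": ["Study Participants and Public"],
    "PFEG": ["Stakeholders"],
}


def downstream(source, flows=FLOWS):
    """Breadth-first search for every group that receives information from source."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(downstream("H&H TWG")))
```

Tracing the graph in this way makes it easy to see which downstream groups are affected when a source dataset is restricted or delayed.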
These study-wide results can be summarized as “Study Properties”, and as “Study Needs”:
Study Properties
o Modeling and data processing is widely distributed within the TWGs among contractors, affiliated agencies and TWG members.
o TWGs identified specific datasets as sensitive (for reasons related to security, proprietary, and liability issues).
o An inter-TWG data discovery (and access and evaluation) mechanism, while not currently in place, has potential benefits for input/output evaluation, further development of PI models, and better overall integration in the Study.
o The need for inter-TWG data transfer is currently limited. Provision of a mechanism for comprehensive data discovery, evaluation, and access would increase the need for inter-TWG and intra-TWG data transfers.
o The Common Data Needs TWG’s Short-term GIS Guidelines were consistently accepted by those TWGs that responded to the NA questionnaire.
o Most TWGs stated that obtaining metadata would be relatively easy, but none had it in a format ready for the Study.
o Responses to questions about making activities, forms or procedures web-enabled raised concerns about funding.
o Funding for public accessibility of Study information is not presently budgeted at the TWG-level.
Study Needs
o Information management strategy that will provide for a widely distributed user group.
o Flexible security model and communication of data set sensitivity to users (via metadata).
o Data discovery, evaluation, and access mechanism.
o Policy and mechanism for data archiving.
o More specific metadata guidelines for TWGs.
o Importance of making the process transparent to the public.
o Study policy on public accessibility to data and information.
o Study policy on bilinguality of metadata and/or data.
o Study policy or clarification on PIAG vs. TWG responsibility and funding for public access and outreach.
o Study policy regarding what data and information should be made accessible to the public (e.g., “new and value-added not otherwise readily-available through other sources”).
o Study policy on tracking sensitive data (including that which is licensed).
o Study policy that links compliance to metadata and data standards to contracts and the balance of funding.
[Please note that many properties and needs directly associated with PIAG activities are not included above, though some options will be addressed in Section 7.3 and Appendix V.]
Fourth, a substantial number of both geospatial and non-spatial databases were identified as inputs and outputs related to Study Participant activities. Most of these were associated with TWG models that address PIs. A list was generated from the Needs Assessment. For those TWGs with limited (or no) response to the NA questionnaire, the Plan of Study was used to augment (or wholly create) their parts of the input/output list. This list, as comprehensive as possible at the time of this report, is presented in Appendix II.
4.0 EXISTING RESOURCES AND POTENTIAL SYSTEM COMPONENTS / PARTICIPANTS
The Common Data Needs TWG held an Information Management Strategy Workshop in Burlington, Ontario, on February 14th and 15th. The workshop consisted of three presentations, followed by focused discussions. The first presentation focused on the results of the NA process. The second focused on existing IM resources (systems, knowledge bases, etc.) available to the Study, as well as IM strategies, policies, and “lessons learned” by organizations as structurally and functionally similar to the Study as possible. The third presentation explored potential policies and alternative system architectures to meet Study needs. “Break-out groups” engaged in focused discussions on IM policy, technical issues, and writing a proposal for funding assistance to help meet Study IM needs. The latter topic focused on a FY2002 Category 4 CAP Grant, funded jointly through GeoConnections (Canada) and its US counterpart, the Federal Geographic Data Committee (U.S. Geological Survey). [A grant proposal was submitted and has been accepted. See Appendix IV for a summary of the proposed project.]
4.1 International
4.1.1 Open GIS Consortium (OGC)
The Open GIS Consortium (OGC) is a consortium of government agencies, non-profit organizations, universities, and private organizations working together to ensure interoperability in the geospatial community. The group is working towards interoperability by developing standards for data formats and quality, as well as procedural standards. The Consortium is working to develop and incorporate ISO standards so that all geodata and services can be internationally compatible. Through the widespread use of these standards, the Web can become “geo-enabled,” which will allow geospatial data to be more widely used and therefore incorporated into more decision-making processes.
4.1.2 Great Lakes Information Network (GLIN)
The Great Lakes Information Network (GLIN) serves as the clearinghouse of Great Lakes information. It was established by the Great Lakes Commission and has been online since 1993. The Great Lakes Commission was established in 1955 by federal legislation as an interstate agency to work with the eight Great Lakes States in the United States. GLIN now serves as the gateway for Great Lakes geospatial data, and provides some Web Map Services. GLIN is structured as a decentralized network of regional partners and information providers (including USEPA, USACE, IJC, and Environment Canada). GLIN provides centralized access to the information providers via a page of links to the individual organizations. The information providers develop, host, and maintain their information at their own location. GLIN serves to organize and enhance access to the information by providing a central link to all information in the network. This works to increase exposure of the information through integration among the information providers in the context of the partnership network. GLIN will soon serve as the site for the GLIN Data Access Clearinghouse (GLINDA). GLINDA will serve as a clearinghouse for all GLIN data and will be a node of the Federal Geographic Data Committee (FGDC) National Spatial Data Infrastructure (NSDI), facilitating more widespread data discovery.
4.1.3 Binational.net
There are many binational program web sites that currently are hosted on Canadian and United States web sites. This split between country web sites makes the discovery and access of data difficult for the users of the binational data. Until this program, it has been difficult to create a system capable of hosting both countries’ data due to different regulations in each country governing website design. Binational.net could get around these regulations by starting a website that is not under the auspices of either country; having a .net address rather than .ca or .gov. Binational.net was announced at an IJC conference and is a joint venture by the EPA and Environment Canada - Ontario Region that seeks to eliminate the redundancy and confusion created with multiple hosting sites by hosting binational data for the United States and Canada at a single location. The focus for this project is in making governmental data easily accessible to the public.
This option is of potential interest to the Study, but it appears that some procedural steps, at least from the US side, have slowed the development of binational.net. One criticism thus far is difficulty using the system due to inefficiencies in speed. More information is currently needed in order to fully consider the potential benefits of this resource.
4.2 United States
4.2.1 Federal Geographic Data Committee (FGDC) & the National Spatial Data Infrastructure (NSDI)
The USGS’s Federal Geographic Data Committee (FGDC) is made up of 17 federal agencies and promotes the nationwide use and sharing of geospatial data. The FGDC, with the help of partner organizations from state and private entities, is the developer of the National Spatial Data Infrastructure (NSDI). The NSDI sets standards for data and metadata in terms of both quality and format. The FGDC also maintains a network of decentralized metadata clearinghouses which users can query to find needed metadata based on keywords, time period, and/or geographic location. The FGDC is also involved in data, metadata, and infrastructure development by offering funding opportunities via Cooperative Agreements Program (CAP) grants.
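The clearinghouse query described above (keywords, time period, and/or geographic location) can be sketched in a few lines. This is only an illustration of the filtering logic under stated assumptions: the record fields, the function names, and the two sample records are hypothetical and are not taken from the FGDC clearinghouse software or from any Study dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified stand-in for a metadata record."""
    title: str
    keywords: tuple
    begin: date   # start of the period the data describes
    end: date     # end of the period the data describes
    bbox: tuple   # (west, south, east, north) in decimal degrees


def bbox_intersects(a, b):
    """True when two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, keyword=None, period=None, bbox=None):
    """Return records matching every criterion that was supplied."""
    hits = []
    for rec in records:
        if keyword is not None:
            if keyword.lower() not in [k.lower() for k in rec.keywords]:
                continue
        if period is not None:
            # Skip records whose time span lies wholly outside the period.
            if rec.end < period[0] or rec.begin > period[1]:
                continue
        if bbox is not None and not bbox_intersects(rec.bbox, bbox):
            continue
        hits.append(rec)
    return hits


# Illustrative records only (invented for this sketch).
CATALOG = [
    MetadataRecord("Lake Ontario shoreline", ("shoreline", "hydrography"),
                   date(1999, 1, 1), date(2001, 12, 31),
                   (-79.9, 43.1, -76.0, 44.3)),
    MetadataRecord("Upper St. Lawrence bathymetry", ("bathymetry", "elevation"),
                   date(2000, 6, 1), date(2000, 9, 30),
                   (-76.5, 44.0, -74.3, 45.1)),
]
```

A call such as search(CATALOG, keyword="shoreline") returns only the records tagged with that keyword; supplying several criteria narrows the result, since a record must satisfy all of them.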
4.2.2 Cornell University Geospatial Information Repository (CUGIR)
Cornell University Geospatial Information Repository (CUGIR) is an active online repository providing geospatial data and metadata for New York State, with special emphasis on those natural features relevant to agriculture, ecology, natural resources, and human-environment interactions. Subjects such as landforms and topography, soils, hydrology, environmental hazards, agricultural activities, wildlife and natural resource management are appropriate for inclusion in CUGIR. All data files are cataloged in accordance with FGDC standards and made available in widely used geospatial data formats.
4.2.3 New York State GIS Clearinghouse
The New York State GIS Clearinghouse has a number of primary functions. It serves as an access point for downloading data associated with the state of New York, at no cost to users for some limited datasets. It also houses the New York State GIS Data Sharing Cooperative, a group of government agencies and non-profit organizations that have entered into Data Sharing Agreements. Once groups have entered the Cooperative, they must provide metadata to the clearinghouse and fill GIS data sharing requests from other members of the Cooperative. Members are encouraged to put their data on a web site to minimize the need for staff interaction. There is no cost for joining, aside from costs associated with meeting member requests for data.
4.3 Canada
4.3.1 GeoConnections
GeoConnections is a national public and private partnership initiative led by Natural Resources Canada (NRCAN), and serves a role generally analogous to the US FGDC. The initiative fosters the creation of a Canadian Geospatial Data Infrastructure (CGDI) to enable online access and sharing of geographic information and services. GeoConnections is jointly funded by the Canadian government and partner organizations. GeoConnections provides many services to fulfill the needs of its different users. The GeoConnections Discovery Portal (formerly CEONet) allows users to search available metadata to discover data. Metadata is reviewed initially upon receipt from data owners/distributors, and periodically thereafter, by the Metadata Content Team. GeoConnections offers support for Web Mapping Services (WMS) and Web Feature Services (WFS), which utilize Open GIS Consortium (OGC) interfaces. GeoConnections also offers CGDI Re-Usable Components (e.g., Earthscape map viewer clients), which are a set of tools that provide geospatial location display in Web pages. Standardized interfaces (wizards) are provided for Re-Usable Components so developers can embed these tools within their own web-based applications. GeoConnections also permits the use of a Web API, which enables customized portals into any part of the GeoConnections Discovery Portal, giving any external website the capability to use any CEONet service. GeoConnections also plays a role in data development by offering funding opportunities via “Access” grants, CAP grants (a cooperative effort with the FGDC), etc.
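As an illustration of what an OGC Web Mapping Service interface looks like to a client, the sketch below assembles a GetMap request URL. The parameter names are those of the OGC WMS specification; the use of version 1.1.1, the server address, and the layer name are assumptions made for the example and do not describe any GeoConnections endpoint.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width=600, height=400):
    """Build a WMS GetMap URL (version 1.1.1 is assumed for illustration).

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the default style
        "SRS": "EPSG:4326",      # geographic latitude/longitude coordinates
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, used only to show the call.
url = build_getmap_url("https://example.org/wms", "shoreline",
                       (-79.9, 43.1, -74.3, 45.1))
```

Because every compliant server accepts the same request parameters, a map viewer written against one service can in principle display layers from another, which is the interoperability benefit the OGC interfaces are meant to provide.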
4.3.2 Ontario
4.3.2.1 Land Information Ontario
Land Information Ontario (LIO) was designed to create a common infrastructure that will allow a wide range of consistent and well-managed land information to be captured, cataloged, and made readily available from a centralized warehouse. The group has coordinated Ontario’s participation in the CGDI.
There are several components within LIO. The Ontario Land Information Warehouse (OLIW) allows for online data viewing using the OLIW Map Browser. The Warehouse contains 140 viewable geospatial datasets; however, the data cannot be extracted and downloaded by users, except by subscription. The Ontario Land Information Directory (OLID) allows for data discovery by keyword and geographic area. The Ontario Digital Geographic Database (ODGD) consists of datasets maintained by the Ontario Ministry of Natural Resources (OMNR) that contain OMNR base data plus features of interest to the OMNR. Public and private users can access the data via several different licensing options. Users can purchase an Electronic Intellectual Property Copyright License if they have no plans to redistribute the data in any way. A Non-Value Added Resale License is available for users to sell and distribute the original data. Also, a Value Added Resale License is available for users to enhance, resell, and distribute the data. Finally, the Ontario Geospatial Data Exchange (OGDE) is a collection of shared data that only members can access. Membership is open to several different categories of organizations. All Schedule I and III ministries within Ontario are expected to join and must pay an annual levy of up to 50,000 CDN in order to join. Community groups with an annual budget over 100 million pay no fee in the first year, and then 3,000 CDN annually in each following year. Community groups with an annual budget less than 100 million pay no fee in the first year, and then 1,000 CDN annually in each following year. Other nations and commercial groups are considered on a case-by-case basis. LIO is currently working with GeoConnections to incorporate OGC-compliant Web Feature Services and Web Mapping Services connectors to improve effectiveness of data use.
For each of the various datasets housed in the LIO warehouse, a custodian, or data steward(s), is identified by the data owner. The custodian has a support network of “information teams” who help define standards and protocols. The custodian is ultimately responsible for defining specific database updating and maintenance tools. LIO has carefully documented their policies, standards, and procedures and made this documentation available to the Common Data Needs TWG.
4.3.3 Quebec
As with Ontario, Quebec contains at least two existing resources for the LOSLR Study. These include the Quebec Ministry of the Environment (QME), and Environment Canada, Meteorological Service of Canada, Quebec Region, Hydrology Section (ECQR). The portion of the St. Lawrence River in Quebec is classified as an International Seaway. As such, the development and maintenance of a system for the hydrologic and coastal processes occurs at the federal level in Quebec. Hence, this is one reason that data development and modeling activities associated with this Study are being conducted with ECQR resources (i.e., facilities and staff). Although coordination with QME staff may be necessary and/or advisable at times during the Study, only ECQR resources will be described below.
4.3.3.1 Environment Canada Quebec Region
The Environment Canada, Meteorological Service of Canada, Quebec Region, Hydrology Section has recently made a substantial investment in a database management system: ~$50,000 CDN for hardware and software alone. This was purchased by EC for the purpose of EC activities with the understanding that much of the IJC LOSL Study’s coastal analysis of the lower St. Lawrence would utilize the system. Currently in its implementation stage, the system is designed to run Oracle and support OGC-compliant geospatial web services. Experience gained by EC staff and the Database Administrator in the design, development, and implementation of the information management system constitutes a valuable resource for the Study and increases the available knowledge base (KB).
5.0 OTHER ORGANIZATIONS / STUDIES
Information management has been addressed by many organizations engaged in work of a comparable nature to the LOSLR Study. Many of these organizations were identified at the start of the IMS development process and have been evaluated in terms of their information management approaches. Lessons learned from the policies and decisions implemented by other organizations serve to promote a more thorough and successful information management strategy for the LOSLR Study. Organizations reviewed for their information management policies and decisions include the Yellowstone to Yukon Conservation Initiative, the Red River Basin Decision Information Network, the Data and Information Working Group of the United States Global Change Research Program (USGCRP), and the Data Management Working Group of the USGCRP National Assessment Program.
5.1 Red River Basin Decision Information Network
The Red River Basin (RRB) Decision Information Network was developed to create an internet-based information dissemination system for the Red River Basin. The RRB Decision Information Network has two primary components, the RRB Decision Support System and the RRB Virtual Data Base. Data planned for inclusion in the RRB Virtual Data Base are an authoritative base map for the basin, spatial data (e.g. topography, imagery), water quality data, and other related information. The International Joint Commission (IJC) is the data provider to the RRB Virtual Data Base. For the purposes of data discovery, the RRB Virtual Data Base is integrated with the Manitoba Land Initiative (MLI) through a shared web server and a shared metadata catalog. The MLI ensures the long-term viability of the RRB Virtual Data Base by maintaining the metadata catalog. Data discovery queries to the RRB Decision Information Network are processed on a replicate metadata catalog outside the Manitoba government firewall.
The RRB Decision Information Network recommends that a “watch-dog” group be created for the maintenance of the metadata catalog. This group would be responsible for overseeing the maintenance, collection, and integration of metadata from private organizations and governmental agencies in the RRB Virtual Data Base system on an ongoing basis. This “watch-dog” role would be carried out by maintaining contact with all private and non-government agencies that have contributed metadata in the past, and ensuring that metadata collection is current and that metadata is accurate. The group would also assist agencies with metadata collection tasks, help to identify new data sources related to Red River Basin flood management, and assist with integrating new and revised metadata.
The RRB Decision Information Network is also intended to facilitate future development of decision support tools via the RRB Decision Support System. The network is administered jointly by the IJC’s Red River Task Force, and by the Global Disaster Information Network under the direction of the Office of the US Vice-President.
5.2 Yellowstone-to-Yukon
One component of the Yellowstone to Yukon (Y2Y) Conservation Initiative, the Y2Y Framework Dataset Demonstration Project, is a collaborative transboundary project focused on creating 10 seamless geospatial datasets from the best available sources. The Project is working with the FGDC and GeoConnections to develop the Framework Architecture needed to facilitate the transboundary sharing and use of the datasets. The Project has compiled a catalogue of FGDC-compliant metadata for project data. Quality Assurance/Quality Control has been a priority, with checks to ensure attribute consistency, spatial accuracy, vertical integration with other project layers, and metadata completeness. The Project has also established a policy to document errors and/or inconsistencies in the metadata. This facilitates a more informed data evaluation by potential users in relation to data quality and limitations. The Y2Y project also established clearinghouse nodes to allow for public data discovery. The group recognized the need for stable server locations for the nodes. They were able to find these in the US, using agency and university resources. Nodes were also established in Canada, with the Canadian and US clearinghouses employing mirrored indexes. These nodes create a distributed, virtual warehouse from which the public can access data.
Along with node creation, the project has also established procedures for data storage, maintenance, and access location based on stewardship need. There is a three-tiered approach for data stewardship need. Core datasets (i.e., those not likely to change) are mirrored on servers in Canada and the US. Metadata lists both site URLs in the “Distribution” section of the FGDC standard metadata (section 6). Datasets with routine updates (e.g. roads) remain under the stewardship of the owner. The owner provides maintenance and archiving offsite and the data is completely reloaded as needed. The third tier is for owners with access capacity who maintain their own data online or on request.
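The three stewardship tiers described above can be expressed as a small classification rule. The sketch below is illustrative only: the tier names, the two yes/no inputs, and in particular the order in which the conditions are checked are assumptions, since the Y2Y procedures are summarized here only at a high level.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical labels for the three stewardship tiers."""
    CORE_MIRRORED = 1            # stable data, mirrored in Canada and the US
    OWNER_MAINTAINED_RELOAD = 2  # routinely updated, reloaded as needed
    OWNER_HOSTED = 3             # owner serves the data online or on request


def assign_tier(changes_routinely, owner_has_access_capacity):
    """Pick a tier for a dataset.

    Assumed precedence: an owner able to serve the data keeps it; otherwise
    routinely updated data is reloaded from the owner; otherwise the dataset
    is treated as core and mirrored.
    """
    if owner_has_access_capacity:
        return Tier.OWNER_HOSTED
    if changes_routinely:
        return Tier.OWNER_MAINTAINED_RELOAD
    return Tier.CORE_MIRRORED
```

Under these assumptions a routinely updated roads layer whose owner cannot host it falls in the second tier, while a dataset that is unlikely to change and has no hosting owner is mirrored as a core dataset.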
For access, the project has a drill-down approach in HTML pages for larger, tiled datasets. They are developing specific protocols for updating databases. Y2Y uses a membership approach and agreement to an “acceptable use policy” to address access to data with significant licensing issues. The group has piloted the use of core data in a transboundary cumulative effects analysis application, and has plans to develop other applications.
5.3 United States Global Change Research Program’s Data Working Groups
Two primary groups address data policy issues within the United States Global Change Research Program (USGCRP). The Data Management Working Group (DMWG) formulates data policy related to the diverse studies conducted under the auspices of the USGCRP’s National Assessment Program. The Data and Information Working Group (DIWG) serves the same function for the USGCRP at large. The policies created by the DMWG include: 1) the “Suggested Data Product Requirement for Grants, Cooperative Agreements, and Contracts” should be included in every contractual document (1997; see Appendix III), 2) metadata should meet the FGDC standards (1998), 3) servers should be ANSI Z39.50 compliant (1998), and 4) data abstracts should be submitted to the Global Change Master Directory (GCMD) (1998).
The Global Change Data and Information System (GCDIS), managed by the DIWG, provides a gateway to data and information related to global environmental change generated by federal agencies participating in the USGCRP. The GCDIS provides access to a wealth of documents related to data policies, including those advocated by the DIWG. The GCDIS is governed by the idea that there should be “full and open sharing” of data for free or at cost, via the World Wide Web whenever possible (DIWG 1991). To ease sharing, the GCDIS requires a standard data citation format (DIWG 1998; see Appendix III). The GCDIS also provides an important public outreach function through its “Ask Doctor Global Change” service, which puts users in touch with experts.
6.0 FOCUS DISCUSSIONS AT IMS WORKSHOPS
Three “break-out discussion groups” met on the second day of the IMS Workshop. Two of these groups focused on “Policy” and “Technical” issues, alternatives, and recommendations to be included in this report. A third group evaluated the feasibility of submitting a Category 4 Cooperative Agreements Program (CAP) grant proposal to the FGDC and GeoConnections.
6.1 Policy
Several policies were discussed and recommendations were made in the Policy Break-out Group at the IMS Workshop. The group agreed on the need for free and open data sharing: “new” and “value-added” data should be made accessible if they are not available elsewhere and no restrictions exist on the data. To facilitate free and open sharing, the Common Data Needs TWG needs to create metadata guidelines for the TWGs and assist the groups in producing the metadata. This will be implemented in such a manner that anyone working in the Study has access to all data. PIAG would have access to everything that has metadata (and/or has been reviewed). Agencies, outside contractors, academics, and members of the general public external to the Study can access any data for which metadata exists when the data was produced by the Study, but must go to the original owner if the data was not produced by the Study. Free and open sharing could be complicated by the existence of sensitive data. Several types of potentially sensitive data were identified, including modeled output (e.g. erosion lines, flood limits, and property values), climate change scenarios and water levels, marina data (e.g. competition), exact locations of vulnerable and threatened species, intakes and outfalls, and detailed imagery.
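The access rules discussed by the Policy Break-out Group can be read as a simple decision procedure. The sketch below encodes only the three rules stated above; the role labels, field names, and return values are hypothetical, and the handling of sensitive datasets (which the group flagged as a complication) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical minimal description of a dataset for access checks."""
    name: str
    has_metadata: bool
    produced_by_study: bool


def access_decision(role, dataset):
    """Return 'granted', 'denied', or 'refer_to_owner' for a request.

    role is one of 'study_participant', 'piag', or 'external' (labels are
    assumptions made for this sketch).
    """
    if role == "study_participant":
        # Anyone working in the Study has access to all data.
        return "granted"
    if role == "piag":
        # PIAG has access to everything that has metadata.
        return "granted" if dataset.has_metadata else "denied"
    if role == "external":
        # Data not produced by the Study must be requested from its owner.
        if not dataset.produced_by_study:
            return "refer_to_owner"
        return "granted" if dataset.has_metadata else "denied"
    raise ValueError("unknown role: " + role)
```

For example, an external request for a licensed base map not produced by the Study yields "refer_to_owner", which matches the recommendation that requests for licensed data be directed to the original owner.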
Proprietary data issues were also discussed. The group decided that 1) any requests for licensed data should be directed to the original owner, 2) the IJC cannot assume ownership of licensed data, and 3) the Common Data Needs TWG should track licensing agreements. For new or value-added data, future ownership will be transferred to a willing recipient who would ultimately be responsible for the storage, archiving, and maintenance of the dataset. The new data owner should be carefully chosen to avoid transferring ownership to an agency unwilling or unable to make the data freely available. Licenses for commercially sensitive data should be noted in the metadata.
Data security and liability were also both addressed. It was agreed that data liability should be covered in the metadata. For all information products, a disclaimer is necessary. As specified in the Study Directives, the Board needs to approve all information products before they can be released to the public. [How this Directive will be applied in practice, and to whom (if anyone) authority will be delegated needs to be resolved.] In terms of data security, the levels of data access must be defined, and specified within the metadata. Systems and mechanisms must be in place to ensure the security of the data. Overall, security for any licensed or proprietary data is the responsibility of the CDN TWG.
The issue of bilinguality of metadata and data was also addressed. The IMS team believed that the IJC policy of producing Study documentation in both English and French, and supporting translation costs, must be adhered to for metadata as well. The rationale for this requirement was that evaluation of the Study at its most basic level – the models that address Performance Indicators (PIs) under the “levels and flows” associated with different regulatory alternatives – requires the ability to evaluate the metadata associated with model inputs and outputs. Given that the majority of these models have a spatial component, it seemed prudent to provide geospatial metadata in both English and French. [Line item costs associated with translation of metadata are included in the evaluation of data discovery alternatives (below).]
While the Policy Break-out Group at the IMS Workshop felt that providing for bilingual metadata was justified, they did not support translation of the attribute information of the data sets themselves. Assuming that overviews of every model and supporting technical documentation will be translated according to IJC policy, the IMS team did not feel that the cost of translating the regional databases themselves was warranted. Given the above assumption, full evaluation of the modeling approach as well as inputs and outputs would be possible without translation of the databases. Moreover, the IJC should not bear costs of making every “new or value-added” geospatial dataset (including both PI-model inputs and outputs) immediately useful to the public, but only those costs necessary for full evaluation of the model approach. Translating metadata fulfills this latter requirement.
6.2 Technical
This group agreed that interoperability standards needed to be specified before technical needs could be addressed. All TWGs must also conform to GIS and database guidelines. To assist TWGs in compliance, it would be helpful to create very specific metadata instructions and a set of Frequently Asked Questions for technical issues. Metadata support for the TWGs would also help ensure that standards are met. The group anticipated that bilingual metadata would need to be produced (i.e., one English and one French version).
The group agreed that once standards were created, the technical needs of the project at different points in its life cycle could be addressed. They defined three main phases for the project that affect technical requirements. The first phase is the current stage, which is infancy. At present, FTP and email are serving as the distribution mechanisms and distribution is driven by need/demand. There is a small volume of data and minimal use requirements, which keep the cost low.
The next phase is growth, which is expected to begin in the summer and fall of FY2002 and extend into FY2004. This phase will be characterized by an increased volume of data to manage and an increased interest in data by the TWGs. The FTP site established as a temporary solution for the TWGs (currently managed by Ian Gillespie at CCIW) has accommodated much of the Study’s data access and distribution needs to this point. However, the increased demand for data storage volume, speed of access, and site maintenance is expected to soon exceed this solution’s capacity. Data discovery needs will increase among the TWGs, but not yet for the public. On-line interactive data browsing and mapping would be helpful in addressing many user and TWG needs. During the growth phase, security issues will be clarified. In the growth phase it might become attractive to utilize regional resources such as the IT divisions in Environment Canada Ontario Region and Environment Canada Quebec Region.
The last phase is maturity and includes everything after FY2004. At this point, most geospatial data and associated metadata should be complete and housed in a repository. The dissemination system should be operational. The amount of money necessary to run the system will increase with capacity and functionality. At this point the system will be driven by user needs. Mirrored sites may be needed for data distribution, after discovery through GLIN or GeoConnections.
6.3 FGDC (USGS) Cooperative Agreements Program (CAP) Grant Opportunity
Category 4 of the 2002 National Spatial Data Infrastructure (NSDI) Cooperative Agreements Program (CAP) provides an opportunity to acquire additional resources for implementing an information management strategy for the Study. The Joint U.S. and Canadian Spatial Data Infrastructure Project funds projects implementing and demonstrating the ability to address sound community decision-making through the collaborative use, maintenance and sharing of geospatial data over a common geography. FGDC and GeoConnections are collaborating to sponsor one project for the 2002 CAP. The award potential for this program is $75,000 US provided by FGDC and $100,000 Canadian provided by GeoConnections. This proposal requires 100% in-kind matching funds to be provided by the U.S. and Canadian partners respectively.
Prior to the Workshop, discussions with FGDC and GeoConnections provided some guidance for responding to the RFP. The group identified a preliminary set of public and private partners, with the IJC serving as the lead organization for the proposed project and Roger Gauthier and Wendy Leger as POCs, serving in their capacity as Common Data Needs TWG Co-chairs. Other public sector partners would include the U.S. Army Corps of Engineers, Environment Canada, Ontario MNR, and may include other provincial and state agencies. Pangaea Information Technologies and Great Lakes Commission would constitute the principal U.S. partners. CJS Consulting and Baird & Associates would constitute the principal Canadian partners. [Note: This organizational structure was changed in the actual grant proposal to have the ACE Detroit District and Environment Canada serve as the national Leads, with the IJC serving as a partner to both the US and Canadian groups. See Appendix IV for a summary of the proposed Project.]
Funds from this award would be matched by FY2002 funds already allocated for information management and would allow for greater attention to be given to the design and implementation of an information management system for the Study. Through the thorough documentation of the development process, made more feasible by the additional funding, the Lake Ontario and St. Lawrence River Study and its information management strategy and system design could serve as a model for other bi-national studies, particularly within the Great Lakes region. It should be noted that public outreach and publicity activities are specified as components of any project receiving CAP funding. The group viewed the CAP project as an excellent complement to the IM implementation activities recommended here, the PIAG’s mission in particular, and the objectives of the Study in general.
The project would require a commitment to producing several specific data layers: these include geodetic control, cadastral, hydrography, elevation (topographic and bathymetric), political boundaries, transportation, ortho-imagery, and shoreline. The group proposed to have metadata stored at and accessed through the Great Lakes metadata clearinghouse (GLINDA) on the Great Lakes Information Network (GLIN). It would be necessary to develop a QA/QC team for review prior to metadata release.
These actions would meet FGDC goals (and our recommendations to the Study Board, below) by promoting, developing, and providing training on metadata for the CDN and all other LOSLR TWGs. A guidelines manual would be created for reference purposes, and be made available to other groups. The goals of PIAG in promoting access to data and metadata would also be met, as well as the mid- and long-term goals of data discovery.
If the grant were awarded to the proposed participants, it would have implications for data management, too. Some decisions would revolve around how the CDN is going to handle data management for the Study and possibly beyond the Study. Possibilities include a single server or a distributed network consisting of servers outside firewalls that are maintained and updated by several agencies, such as LIO, the NYS GIS Clearinghouse, Quebec, Environment Canada, or US ACE. A plan of action for data was proposed for use in the proposal. It consists of dedicating 2-3 servers to the data sharing effort and developing an interface page on the IJC project website that is essentially an index of data, including FTP access to maps, and that provides options for download. The next steps include database development and connectivity for web mapping services (WMS) and web feature services (WFS).
Questions that were still open at the end of the meeting were whether enough time existed to apply for the grant, how equipment purchases could be funded through this grant, and whether the proposal would mesh with the IMS Alternatives and Options (below) selected by the Study Board. It was decided that flexibility in operations could help manage the CAP timeline and compliance with the Study Board’s desires, and also that there are other funding opportunities.
6.4 Study Properties and Needs
The IMS Workshop illuminated several new “Study Properties” and “Study Needs”, as well as clarifying or underlining those identified during the Needs Assessment Process (see section 3.3):
Study Properties (and Related Facts/Perceptions)
Data discovery opportunities among (and even within) TWGs are presently limited.
While communication and information/data transfer via email and FTP presently meet Study needs, this solution will likely be insufficient within 6 months.
Standards-based metadata would be essential to the Study if: 1) Study datasets are going to be made public, 2) the value and longevity of Study datasets are to be preserved, and 3) automated discovery and evaluation tools need to be implemented in the Study.
A fully-featured system that provides for data discovery, evaluation, and access for Study Participants has much potential to reduce redundancy and provide for a greater degree of integration among the modeling efforts with individual resource sectors.
On-line, interactive mapping tools would aid in data evaluation for both Study Participants and the Public.
On-line, interactive mapping tools would enhance transparency and public participation.
The IJC does not wish to serve in a data stewardship or distributor capacity after the Study ends.
For some datasets, data owners need to be identified by the end of the Study.
New data owners likely will provide for the maintenance and distribution of Study datasets only for those political areas with which they are associated.
A regionally-distributed storage, maintenance, and access system will likely preserve the value and increase the use of Study data.
Study Needs
Metadata production is a present need.
Standards that promote uniformity and interoperability need to be selected, communicated, promoted, and supported.
CDN assistance with metadata creation for other TWGs.
Support for compliance with standards (e.g., FGDC-1998 for metadata).
Data discovery, evaluation, and access for Study Participants is a present need.
Data discovery, evaluation, and access for the Public is a future need, but well within the life of the study.
Free and open data sharing policy.
Address sensitive datasets in policy and system implementation.
Include data disclaimers and use restrictions in the metadata.
Ensure system reliability and security.
Data/information review process to have IJC, Board, or designated party confirm appropriateness for publication (see Annex 4c of the Plan of Study).
7.0 PRIMARY ALTERNATIVES AND OPTIONS (& ESSENTIAL SUPPORTING POLICIES AND COSTS)
7.1 Data Discovery
Data discovery and its associated mechanisms and functions provide the means by which information about the existence of data can be obtained. Current data discovery mechanisms require that some form of metadata, data about data, be compiled for each dataset and typically be made available in some organized fashion. The identified needs for data discovery and evaluation come from within the Study, in promoting Study-wide coordination of data, and from the associated desires to actively promote transparency in the Study and encourage public involvement. Geospatial data discovery has grown to become strongly associated with the establishment of spatial data infrastructures (SDI) in federal governments. The United States National Spatial Data Infrastructure (NSDI) and the Canadian Geospatial Data Infrastructure (CGDI) are the spatial data infrastructures which have implemented networks of data discovery clearinghouse nodes. More details of the SDI clearinghouse structure are presented later in this section of the report. A graphical depiction of geospatial data discovery using the SDI approach is presented in Figure 7.1.1 (below).
At the heart of any data discovery mechanism is the metadata from which information about the data can be retrieved. Data discovery can only be as effective as the underlying metadata is complete in content and sound in quality. Metadata standards guide organizations in the creation of “complete” metadata by specifying format and content, and have been created by the FGDC and the International Organization for Standardization (ISO). Given the importance of metadata, not only in data discovery mechanisms but also in generally promoting clarity in data development, many organizations are committing resources to ensure metadata generation for all geospatial data and compliance with metadata standards. The Common Data Needs TWG has identified the need to support a single metadata standard and has committed to the FGDC-1998 metadata standard as the one with which all technical working groups should comply. However, metadata does not currently exist for some of the data being used and produced by the Study, which complicates the discovery and cultivation of data used in Study activities.
Figure 7.1.1 – Geospatial Data Discovery through the SDI
Study Participants (comprised of all the TWGs, the PIAG, and the Study Board) need the ability to explore within-Study data availability and to learn of the work of other study
participants to meet the study mandate to “minimize data redundancy”. Potential benefits of a comprehensive data discovery mechanism include greater transparency in the system and the promotion of open dialog between the resource sectors being evaluated. Cross-TWG discussion and increased awareness of available (and planned) data sets will promote a more holistic approach to studying the impacts of water regulation schemes and will generally create greater understanding of the inter-related interests and concerns of the various resource sectors associated with the Lake Ontario and St. Lawrence system. The public’s capacity to discover and evaluate study data, modeling approaches, and results will promote greater public involvement in the Study. Again, the IJC and the LOSLR Study Board have expressed their commitment to public involvement and can, through implementing a data discovery mechanism, promote greater transparency in the Study methods. However, premature release of information and subsequent liability and security concerns make any sharing of information beyond summary information an important consideration. It is therefore necessary to establish clear rules and procedures for developing and publishing information in a public forum. While data discovery is a form of (summary) data distribution, the data being distributed (metadata) hold no significant liability or security concerns that warrant any reluctance in sharing the information. Data discovery simply shares information about the data and processing at a technical level within the Study.
7.1.1 Alternatives
Four alternatives have been identified for addressing the needs for data discovery. Each successive alternative addresses the need for metadata and data discovery to an increasing degree. Multiple alternatives can be conducted concurrently, and can therefore most effectively address different needs at various stages in the life of the Study. Additional options have also been identified that could be implemented in conjunction with the last two alternatives. The alternatives, options, and evaluation criteria are detailed in the paragraphs below.
7.1.1.1 Status Quo
Data discovery performed in the LOSLR Study is currently a function of gleaning information from documents detailing the Study organization and work plans and/or by
“word-of-mouth.” In some cases limited information about data being used in the Study is included in such documents; however, the currency and completeness of such information is usually poor. The usability of existing documentation to learn about the data being used and generated in the Study is poor. The information contained in the documents would have to be reviewed in its near entirety to generate a comprehensive list of data; even then the data list would be only as current as when the source text was generated. Details concerning any specific data layers would not be available in the status quo alternative without first identifying the data owner and corresponding with them to inquire about metadata. Even if the data owner can be identified and contacted, the existence of metadata is not required in this alternative. The status quo option would require no additional funding or policy considerations; in essence, this alternative is already implemented.
7.1.1.2 Generating a Data List
The second alternative identified to address the need for data discovery involves generating a tabular list of all data used or generated by the Study. The tabular list would include general information about the data, its use, and its ownership, maintenance, and distribution. The list would be distributed only to Study Participants. Because the information about the data is not compliant with metadata standards, it is not fit for public consumption. This alternative addresses the immediate need for inter-TWG data awareness in a limited manner, but does nothing to promote transparency and openness of the Study for the public.
Because the data contained in a table format is not parseable with standard metadata parsing engines, the functionality of searching the metadata table is limited to the searching functionality of the application through which the table is displayed. The most significant functionality that the data table alternative fails to support which is supported by other alternatives is the ability to search for data using geographic coordinates as part of the query.
The amount of effort and coordination necessary to generate the table and distribute it to the Study participants is, relative to the following two alternatives, minimal. Hence, this alternative is feasible in the short term. Because the information is not parsed or indexed, searching could only be done through basic text string searches on the
distributed list. Other than a directive for TWG cooperation to provide brief metadata (name, scale/resolution, extent, data, etc.), no additional policy considerations would be required in implementing this alternative. A full list of data being used and generated by the Study could be generated within ~2 months of initiating this alternative. Compiling a comprehensive data list would require a coordination effort by the Common Data Needs TWG or an independent contractor, and time from the Study participants involved in data development and analysis. The basis for this list could be the list of dataset titles, grouped by TWG, which is presented in Appendix II.
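The search limitation of a data list can be illustrated with a minimal sketch. It assumes the list is circulated as a simple comma-separated table; the column names and the two sample rows are hypothetical placeholders, not Study datasets. A plain substring match is the only query such a list supports, and because the extent is free text, a query by coordinates finds nothing.

```python
import csv
import io

# Hypothetical data list; column names and rows are illustrative only.
DATA_LIST_CSV = """title,twg,scale,extent,owner
Shoreline classification,Coastal,1:10000,Lake Ontario south shore,Agency A
Wetland inventory,Environmental,1:24000,Upper St. Lawrence River,Agency B
"""

def load_data_list(text):
    """Read the tabular data list into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def text_search(rows, term):
    """Return rows in which any field contains the term (case-insensitive)."""
    needle = term.lower()
    return [r for r in rows if any(needle in v.lower() for v in r.values())]

rows = load_data_list(DATA_LIST_CSV)
print([r["title"] for r in text_search(rows, "wetland")])
# A latitude value cannot match a free-text extent description.
print(text_search(rows, "43.5"))
```

The second query returns an empty list, which is the gap that the parsed and indexed alternatives are meant to close.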
7.1.1.3 Metadata Catalog
A catalog of standard-compliant metadata would provide complete information about the data being used and generated by the LOSLR Study. A collection of metadata files, which can be made available over the Study website, represents a comprehensive list of information about data used in or produced by the Study. When generated in accordance with the FGDC standard, metadata files will include all information required to learn about (i.e., discover and evaluate) the particulars of data. Information regarding the party responsible for data distribution is essential in order to acquire a copy. Information regarding the parties responsible for data maintenance is useful in providing feedback if errors are found, or for technical questions if such should arise. Furthermore, the information contained in the metadata would provide more detail on how the data is being used and/or generated by the Study. This alternative is the first to be fit for public consumption, addressing the need for public involvement and transparency in the Study process. It is important to note that a catalog available on the Study website requires the data discoverer to visit the Study website, thus improving visibility of the Study.
This alternative requires additional funding and a Study-wide commitment for the development and coordination of standard-compliant metadata. Standard-compliant metadata provides a common set of terminology and definitions to document data and allows an organization to maintain the investment made in collecting or generating geospatial data. Primary elements (text sections) of FGDC compliant metadata include: Identification, Data Quality, Spatial Data Organization, Spatial Reference, Entity and Attribute, Distribution, Metadata Reference, Citation, Time Period and Contact
Information. These common elements, and any specific data elements contained within them, allow users to determine things like the availability, fitness for use, and accessibility of datasets. Commitment to standard-compliant metadata will require that some kind of metadata review process be implemented. Once reviewed, we expect that most Study metadata will require little or no maintenance through the Study’s duration. The Study’s dedication to metadata has strong implications for the functionality of a data discovery mechanism. Additional options may be selected to further assist in metadata development and coordination. These options are listed in Section 7.1.2.
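As a rough illustration of what a first-pass metadata review might automate, the sketch below checks which of the seven main FGDC sections are present in a record. It assumes the metadata is exchanged as XML using the short tag names of the FGDC content standard; the sample record is hypothetical. The standard treats several sections as conditional, so a missing section is a prompt for review rather than proof of non-compliance, and this check is not a substitute for a full compliance review.

```python
import xml.etree.ElementTree as ET

# Main sections of an FGDC record, keyed by their short XML tag names.
FGDC_SECTIONS = {
    "idinfo": "Identification",
    "dataqual": "Data Quality",
    "spdoinfo": "Spatial Data Organization",
    "spref": "Spatial Reference",
    "eainfo": "Entity and Attribute",
    "distinfo": "Distribution",
    "metainfo": "Metadata Reference",
}

# Hypothetical, deliberately incomplete record used only for illustration.
SAMPLE = """<metadata>
  <idinfo><descript><abstract>Hypothetical abstract.</abstract></descript></idinfo>
  <distinfo/>
  <metainfo/>
</metadata>"""

def missing_sections(xml_text):
    """List the main sections that do not appear directly under the root."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for tag, name in FGDC_SECTIONS.items() if tag not in present]

print(missing_sections(SAMPLE))
```

A reviewer or coordinator could run such a check on each submitted file before reading it in detail.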
Policies in support of publicly discoverable metadata would need to be addressed in this alternative. All data and information produced by the Study should be made discoverable for the Study Participants and the public-at-large through a standard metadata documentation and collection procedure. Data used as model inputs that are not produced by the Study, and are readily available elsewhere, would only need to be cited appropriately in Study documentation. Regardless of the specific data or information restrictions, all metadata should be made accessible. In addition to providing for inter-TWG information discovery, such documentation and procedures will improve the visibility and transparency of the Study for the public, thereby promoting its overall credibility.
Consistency in metadata content and quality is crucial to the successful implementation of a data discovery structure. The development of data and information by any Study participant or contractor should be considered incomplete without compliance with metadata content and quality standards. A standard clause should be included in all contracts related to data and information development, stating that required metadata is to meet all Study-approved content and quality standards (see Appendix III for an example).
7.1.1.4 Participation in SDI
The fourth and final alternative identified to address the need for data discovery is the participation in the spatial data infrastructures of the United States and Canada. The NSDI and the CGDI are both networks of metadata providers that use a standard search protocol to allow access to metadata through a single data discovery portal. Participation
in the clearinghouse networks requires FGDC- or ISO-compliant metadata and a Z39.50-compliant server. Metadata is parsed and indexed when loaded onto the clearinghouse node, facilitating fully functional search capability. Once the metadata is loaded, the clearinghouse is notified and the metadata is made searchable through the primary data discovery portal. In the case of CGDI, portal services could be accessed from a hyperlink within any website utilizing the Web API developed by GeoConnections.
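The benefit of parsing and indexing at load time can be sketched with a simple inverted index. This is not the Z39.50 protocol or any clearinghouse's actual implementation; it only illustrates, under that stated simplification, why indexed metadata can answer multi-term queries without rereading every record. The record ids and text are hypothetical.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each lowercase word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(rec_id)
    return index

def lookup(index, *words):
    """Return ids of records that contain every query word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

# Hypothetical metadata abstracts, keyed by a made-up record id.
records = {
    "rec1": "Bathymetry of Lake Ontario nearshore zone",
    "rec2": "Shoreline erosion model outputs, Lake Ontario",
}
index = build_index(records)
print(sorted(lookup(index, "lake", "shoreline")))
```

The index is built once when records are loaded, so each later query is a set intersection rather than a scan of all text.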
Because participation in the SDI network requires the implementation of a Z39.50-compliant server, the Study would most efficiently utilize resources by submitting metadata to an agency or organization that has already implemented a clearinghouse node server. At present, the Great Lakes Commission (GLC) is establishing the Great Lakes Information Network Data Access (GLINDA) Clearinghouse, which is to be a clearinghouse node for the Great Lakes region. Participation in GLINDA or another metadata clearinghouse node, such as the GeoConnections Discovery Portal, would require minimal support by the Study, as the individual metadata files comprising the catalogue would simply be uploaded into the existing node. Once loaded onto a clearinghouse node, metadata would not require reloading unless updated to reflect changes to a dataset or contact information.
An international directory of SDI networks connects the nodes of different clearinghouses to create a world-wide network of metadata clearinghouse nodes. Discovery of the Study data in the SDI alternative can occur from multiple data discovery portals and nodes. This mechanism for data discovery increases the exposure of the Study and has the potential to attract the interest of more individuals than just those who would have otherwise known of or found the Study website.
The cost associated with this alternative would be similar to that of the metadata catalog alternative above, as the primary cost is related to the creation of metadata and support functions. While some support for the SDI node may be appropriate (requested or required), the additional expense would be minimal. As with the metadata catalog alternative above, the four additional metadata options are applicable to the SDI participation alternative. Policies that support metadata creation, review, and uploading to an SDI clearinghouse should be included as part of implementing this alternative. These policies would include adoption of the FGDC 1998 metadata content standard, ANSI
Z39.50 compliance for server(s) holding the metadata catalog, and promotion of these standards at the contractual level.
7.1.2 Additional Options
Metadata Review Team: To ensure compliance with metadata standards and consistent application of those standards, a metadata review team could be established for the purpose of conducting quality assurance and quality control on metadata as it is generated by the TWGs. The metadata review team could provide guidance in the development of metadata. [See description of “watch-dog” group formed for the Red River Basin Decision Information Network, Section 5.1.]
Metadata Coordinator: A dedicated staff person could be responsible for ensuring metadata compliance with standards and consistency in metadata generated for the Study. As a short-term assignment, the metadata coordinator’s primary functions would be completed within the next fiscal year. Specific functions of the metadata coordinator could potentially include: coordinating all metadata training, providing assistance in metadata development, ensuring completeness of metadata produced, and confirming compliance with FGDC 1998 metadata standards.
Metadata Workshop: A metadata workshop held for all study participants involved in the production of metadata could provide for the necessary training and coordination to facilitate creation of standard compliant metadata. Training in metadata generation software could provide a jump-start to the metadata creation process, and reduce the time spent by a Metadata Review Team and/or Metadata Coordinator.
On-line Metadata Development Assistance: On-line development assistance could help TWGs that are generating metadata through simple text instructions and easy-to-understand manuals. This option could also include a mechanism to direct specific questions to an identified metadata expert (e.g., the Metadata Coordinator), who would be required to provide timely assistance.
This option illustrates another means by which the creation of metadata can be facilitated to ensure fully compliant metadata for data layers.
7.1.3 Evaluation of Alternatives
Below are listed the primary criteria by which the above alternatives have been evaluated. A summary of the evaluation can be found in Figure 7.1.2.
Currency: Currency in terms of data discovery is principally a concern of completeness. It addresses the question “Is the information about the data complete and up-to-date?” The Status Quo alternative involving the use of existing Study documents to glean information about data being used and generated in the Study lacks currency. The data defined within the planning documents available to all Study participants via the Study website is incomplete and only as up-to-date as the documents themselves. The three other alternatives would provide up-to-date information as they coordinate the identification of all data and provide the most updated information about those data.
Ease of Discovery by Study Participants: The capacity for study participants to discover data being used and generated by other TWGs is consistent with the Study mandate of minimizing redundancy in data development. Inter-TWG data coordination is required to make the most efficient use of Study resources and to provide the highest level of integration among the resource-specific assessments. Therefore, facilitating data discovery within the Study fosters a more efficient use of Study resources, and an improved result. As the organization of information about data is the principal measure of ease of discovery, the Status Quo alternative fails to provide the necessary ease of discovery. The other three alternatives organize information about data sufficiently to provide ease of discovery.
Ease of Discovery by the Public: The ability for the public to learn about data being used and generated by the Study is meant to address the need for transparency in the Study process. By providing a mechanism by which the public is able to learn of the data used in the Study, public involvement and acceptance of the Study results will be enhanced. Because information
about data is published for public consumption only when it is standard-compliant metadata, the Status Quo and Data List alternatives fail to provide ease of discovery by the public. The Metadata Catalog alternative, while an acceptable collection of compliant metadata, would only be available to the public as a single document and would lack the search functionality necessary to constitute ease of discovery. The SDI alternative provides full access and search functionality through the SDI data discovery portal, making this the alternative that provides the easiest public access, even to those not familiar with the Study.
Fit for Public Consumption (i.e., metadata standard): Unless compliant with FGDC or ISO metadata standards, the information about data is not appropriate to present to the public. Only completely compliant metadata will serve the public’s need for data discovery and the Study’s interest in including the public in the data discovery process. The Status Quo and Data List alternatives do not involve standard-compliant metadata and are therefore not fit for public consumption. The Metadata Catalog and SDI alternatives both require fully compliant metadata and are therefore fit for public consumption.
Comprehensiveness of Metadata: Comprehensiveness of metadata refers to the completeness of the metadata in terms of metadata attributes. The Status Quo consists of no metadata, and the Data List alternative consists of a very limited amount of information describing the data. Both the Metadata Catalog and the SDI Participation alternatives involve fully compliant (i.e., complete) metadata.
Organization of Metadata: Only the Status Quo fails to organize information about the metadata; all other alternatives bring together a list of all the data and information about the data.
Fully Searchable: An important function of any metadata discovery mechanism is the ability to search for information or specific characteristics about the data. The efficiency, flexibility, and thoroughness of the search function directly determine how successfully and easily one can use the data discovery mechanism. While all the
alternatives, at a minimum, involve digital information in one form or another that is searchable by text string, the SDI is the only alternative in which a search for geospatial data can accommodate geospatial (i.e., by x- and y-coordinates), categorical, and keyword searching.
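The difference between text-only and geospatial searching can be shown with a minimal sketch of a bounding-box query combined with a keyword filter. The record structure, dataset titles, and coordinates are hypothetical and only roughly approximate the region; a clearinghouse node would apply the same idea to the bounding coordinates recorded in each metadata file.

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    west: float
    east: float
    south: float
    north: float
    keywords: tuple

def intersects(rec, west, east, south, north):
    """True when the record's bounding box overlaps the query box."""
    return (rec.west <= east and rec.east >= west
            and rec.south <= north and rec.north >= south)

def search(records, bbox=None, keyword=None):
    """Filter by bounding box (west, east, south, north) and/or keyword."""
    hits = []
    for rec in records:
        if bbox is not None and not intersects(rec, *bbox):
            continue
        if keyword is not None and keyword.lower() not in (k.lower() for k in rec.keywords):
            continue
        hits.append(rec.title)
    return hits

# Hypothetical records with approximate, illustrative extents (degrees).
CATALOG = [
    Record("Lake Ontario bathymetry", -79.9, -76.0, 43.1, 44.3, ("elevation", "bathymetry")),
    Record("Upper St. Lawrence hydrography", -76.5, -74.3, 44.0, 45.1, ("hydrography",)),
]

print(search(CATALOG, bbox=(-78.0, -77.0, 43.0, 44.0)))
print(search(CATALOG, keyword="hydrography"))
```

The first query finds data by location alone, which a text string search on a data list or a single catalog document cannot do.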
Increased Exposure to the Study: Unique to the alternative of participating in the SDI, data discovery portals not affiliated with the Study could be used to discover data pertaining to Study activities. In this case, the metadata provided to the user by the data discovery portal will contain information about the Study and who to contact for more information. In such circumstances, an individual interested in data for the region would be made aware of the Study and forwarded to the Study website by the metadata discovered through an unrelated source. This positive externality of implementing a networked data discovery mechanism promotes greater public involvement and awareness of the Study.
Rating scale: poor, fair, good, excellent
Figure 7.1.2 - Evaluation of Data Discovery Alternatives
7.1.4 Costs
Data Discovery Budget Assumptions and Justification
Unless noted otherwise, rates are calculated at $475 per day. This is a blended average rate in which agency staff complete 75% of the work at $300 per day and contractors complete the remainder at $1000 per day.
All costs are for labor, unless otherwise noted.
All amounts are in US dollars.
The estimated number of Study datasets is 200.
Translation cost for each metadata file is $124.44 US.
Amounts do not reflect yearly increases in salary and overhead.
There are no costs associated with implementing the status quo alternative.
Creation of metadata coincides with the completion of data development, and costs of metadata creation are distributed across the years of the Study in a 40%, 35%, 20%, and 5% scheme.
Costs associated with metadata development may be reduced in whole or in part by selecting Options 2, 3, 4, or a combination thereof.
Support of the SDI Node may be optional.
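The blended rate and the yearly distribution can be checked with a short calculation. The sketch below takes the rates and percentages from the assumptions above and the Metadata Development study total ($29,688) from the cost table; the variable and function names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

AGENCY_RATE = Decimal("300")       # dollars per day
CONTRACTOR_RATE = Decimal("1000")  # dollars per day
AGENCY_SHARE = Decimal("0.75")     # fraction of work done by agency staff

# Blended daily rate: 0.75 * 300 + 0.25 * 1000
blended = AGENCY_SHARE * AGENCY_RATE + (1 - AGENCY_SHARE) * CONTRACTOR_RATE

# Share of metadata creation cost assigned to each fiscal year.
SHARES = {
    "FY2002": Decimal("0.40"),
    "FY2003": Decimal("0.35"),
    "FY2004": Decimal("0.20"),
    "FY2005": Decimal("0.05"),
}

def spread(total):
    """Distribute a study total across fiscal years, rounded to whole dollars."""
    return {
        fy: (Decimal(total) * share).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
        for fy, share in SHARES.items()
    }

print(blended)
print(spread(29688))
```

The blended rate comes to $475 per day, and the distribution gives $11,875, $10,391, $5,938, and $1,484, matching the Metadata Development row of the cost table.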
Task                          FY2002    FY2003    FY2004    FY2005   Study Total   Per Year After Study

Alternative 1: Status Quo
  None                            $0        $0        $0        $0            $0                     $0
  Yearly Total                    $0        $0        $0        $0            $0                     $0

Alternative 2: Data List
  Development                 $4,750        $0        $0        $0        $4,750                     $0
  Yearly Total                $4,750        $0        $0        $0        $4,750                     $0

Alternative 3: Metadata Catalog
  Metadata Development       $11,875   $10,391    $5,938    $1,484       $29,688                     $0
  Translation                 $9,956    $8,711    $4,978    $1,244       $24,889                     $0
  Yearly Total               $21,831   $19,102   $10,915    $2,729       $54,577                     $0

Alternative 4: SDI Participation
  Metadata Development       $11,875   $10,391    $5,938    $1,484       $29,688                     $0
  Translation                 $9,956    $8,711    $4,978    $1,244       $24,889                     $0
  Support SDI Node            $2,000        $0        $0        $0        $2,000                     $0
  Yearly Total               $23,831   $19,102   $10,915    $2,729       $56,577                     $0

Option 1: Metadata Review Team
  Study Participant Time      $5,700    $4,988    $2,850      $713       $14,250                     $0
  Yearly Total                $5,700    $4,988    $2,850      $713       $14,250                     $0

Option 2: Metadata Coordinator
  Agency Staff Salary        $40,000   $40,000   $20,000   $10,000      $110,000                     $0
  Yearly Total               $40,000   $40,000   $20,000   $10,000      $110,000                     $0

Option 3: Metadata Workshop
  Participants                    $0        $0        $0        $0            $0                     $0
  Training Material             $500        $0        $0        $0          $500                     $0
  Yearly Total                  $500        $0        $0        $0          $500                     $0

Option 4: Online Metadata Development Assistance
  Implementation              $4,275        $0        $0        $0        $4,275                     $0
  Yearly Total                $4,275        $0        $0        $0        $4,275                     $0

7.1.5 Recommendations
Alternative 2 is recommended as a short-term solution to the need for data discovery within the Study. In order for study participants to learn of the data being used and generated by other study participants with enough time to allow for integration of data into analysis, a mechanism for data discovery needs to be implemented soon. For the purpose of the study, distribution of a list inventorying data and providing limited details is appropriate to meet the immediate need. The most
important element of this alternative is the short amount of time needed to generate and distribute the information.
Alternative 4 is the long-term recommendation for the Study. While it requires the development of standard-compliant metadata, participation in the SDI initiative would provide a great amount of exposure with limited development effort, given the use of existing resources such as GLINDA or the GeoConnections Discovery Portal to host the metadata clearinghouse node. For the purpose of data discovery, loading the metadata onto the clearinghouse node involves parsing and indexing, which allows for greater search functionality. Information about the Study’s data would be stored on the node and be available from any metadata clearinghouse portal, thereby increasing the potential exposure of the Study to the public. This alternative is consistent with recent initiatives of both governments related to the coordination of geospatial data. By utilizing existing resources and providing fully compliant metadata, the Study would promote the SDI initiatives supported by both the United States and Canadian governments and would at the same time make a fully searchable data discovery mechanism available to anyone with interest in the region or a relevant data topic.
Options 2, 3, and 4 are recommended in support of data discovery. These include the creation of a staff position dedicated to the coordination of metadata for the Study, a metadata workshop, and online metadata development assistance. The three recommended options support the creation of quality metadata, a crucial component of the data discovery process. A single metadata coordinator would be responsible for ensuring that metadata generated by study participants is consistent and fully compliant with the FGDC metadata standard. The person assigned to this position would be accessible to study participants involved in the creation of metadata and could provide any necessary technical assistance in completing fully compliant metadata. A metadata workshop is another option chosen to support the process of metadata creation. As many organizations have only recently begun the process of generating standard-compliant metadata, the expertise required to comply effectively with metadata standards is limited. A one-and-a-half-day workshop instructing participants on how to generate fully compliant metadata would improve the overall effectiveness of the study’s metadata creation tasks. The final option is a provision for online metadata development assistance. This option would include any software package or other user-driven
metadata tutorial that would assist study participants with simple, straightforward questions concerning metadata creation.
Our recommendations are to implement:
Alternative 2: Creation of Data List (with brief metadata; a short-term solution)
Alternative 4: SDI Participation
Option 2: Metadata Coordinator
Option 3: Metadata Workshop
Option 4: Online Metadata Development Assistance
Total estimated cost of implementing the recommended alternatives and options is $73,356 US in FY2002 and $176,101.50 US through FY2005.
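The totals can be reproduced by summing the yearly totals for the recommended items in the cost table of Section 7.1.4. The sketch below uses the rounded table entries as inputs; the dictionary names are illustrative only.

```python
# FY2002 yearly totals for the recommended alternatives and options.
FY2002 = {
    "Alternative 2: Data List": 4750,
    "Alternative 4: SDI Participation": 23831,
    "Option 2: Metadata Coordinator": 40000,
    "Option 3: Metadata Workshop": 500,
    "Option 4: Online Metadata Development Assistance": 4275,
}

# Study totals (FY2002 through FY2005) for the same items.
STUDY_TOTAL = {
    "Alternative 2: Data List": 4750,
    "Alternative 4: SDI Participation": 56577,
    "Option 2: Metadata Coordinator": 110000,
    "Option 3: Metadata Workshop": 500,
    "Option 4: Online Metadata Development Assistance": 4275,
}

print(sum(FY2002.values()))       # 73356
print(sum(STUDY_TOTAL.values()))  # 176102
```

The FY2002 sum matches the stated figure. The study-total sum from the rounded table entries is $176,102, within a dollar of the stated $176,101.50; the difference presumably reflects rounding in the table.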
Policies essential in the implementation of the recommended alternatives and options are:
The development of data and information by any Study participant or contractor should be considered incomplete without compliance with metadata content and quality standards (FGDC-1998).
A data abstract, for use in data discovery, should be submitted with metadata.
A data citation should be submitted with metadata.
A standard clause should be included in all contracts related to data and information development, stating that required metadata is to meet all Study-approved content and quality standards.
All metadata should be made available in both English and French. Translation of the datasets themselves is not required.
7.2 Data Storage, Access, and Distribution
A coordinated approach to data access is, at a minimum, essential for TWGs to complete their responsibilities in an efficient manner. Beyond facilitating the work being done within independent TWGs, the approach to data access has other implications for how effectively the Study is able to minimize data redundancy and ensure consistency between datasets and related analyses. The need for a system to facilitate data storage and access was repeatedly expressed in the Needs Assessment process. A data storage
and access strategy has further implications on the extendibility of a system in accommodating the application of technologies such as the implementation of web services. Prior to determining how the Study will provide for data storage, maintenance, access and distribution, a clear understanding of the Study’s commitment to facilitating long-term sustainability and public accessibility to data and systems developed and utilized by the Study is needed.
Data produced or significantly enhanced through the course of the Study will be the property of the IJC in most cases. The responsibility for ensuring the proper maintenance and presentation of data, while held by the IJC, will likely be assigned to the TWG associated with the data. In this scenario, data ownership is held by the IJC and the responsibility for storage, maintenance, access and distribution is assigned to TWGs (or TWG members) serving in a data stewardship capacity. At the conclusion of the Study, the IJC may no longer desire or be able to continue in its role as data distributor, and thus may be willing to forgo its role as data owner. The responsibilities of TWGs will be discontinued, preventing them from serving as data stewards. Therefore, under the current scheme the sustainability of data is probably limited to the life of the Study, after which point it would exist only in data archives. The value of the data for use in evaluation of Study results, in facilitating further studies, or for a variety of other uses (some undoubtedly unforeseen at present) continues well beyond the defined life of the Study. Therefore, the need for accommodating long-term sustainability of data and systems should be addressed from the beginning of the information management strategy implementation.
Public accessibility of Study data has significant implications and is most efficiently addressed in concert with providing accessibility to all Study participants and related agencies. Access to data and information utilized and/or produced by the Study should be determined through a rules-based procedure considering the data’s ownership, security, liability, licensing, privacy, and proprietary status and the relationship of the interested party to the Study. The following simple rules for access have come out of discussions and correspondence throughout the information management strategy development process.
o All primary Study participants (e.g., Study Board, PIAG, and TWG members) should be given access to all data and information utilized and/or produced by the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns.
o All other interested parties should be given access to any data and information which is considered new or having value added to it by activities of the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns.
o “New data or information” is defined as that which did not exist prior to Study activities and was generated from primary data collection procedures as a direct result of Study activities, i.e., model output or results.
o “Value-added data and information” is defined as that which has been significantly improved as a result of Study activities in either its content or usability, and cannot be readily accessed elsewhere.
o Data and information deemed “sensitive” (i.e., possessing special security, liability, privacy, licensing, or proprietary concerns), should be systematically tracked by the Common Data Needs TWG. The CDN TWG should track all licensing agreements.
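The access rules above can be read as a small decision procedure. The sketch below is one possible encoding, not the Study's implementation: the role names, the `Dataset` fields, and the choice to deny sensitive data by default (rather than route it to a case-by-case review) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed role labels for primary Study participants (Study Board, PIAG, TWG members).
PRIMARY_ROLES = {"study_board", "piag", "twg"}


@dataclass
class Dataset:
    name: str
    is_new: bool = False          # did not exist prior to Study activities
    is_value_added: bool = False  # significantly improved by Study activities
    is_sensitive: bool = False    # security, liability, privacy, licensing, or proprietary concerns


def may_access(role, dataset):
    """Apply the simple access rules to one requester and one dataset."""
    if dataset.is_sensitive:
        # Assumption: sensitive data is withheld by default and tracked separately.
        return False
    if role in PRIMARY_ROLES:
        return True
    # All other interested parties see only new or value-added data.
    return dataset.is_new or dataset.is_value_added


model_output = Dataset("model_output", is_new=True)
licensed_base = Dataset("licensed_basemap", is_sensitive=True)
third_party = Dataset("existing_agency_data")

print(may_access("public", model_output))   # new data is open to other parties
print(may_access("twg", licensed_base))     # sensitive data is withheld
print(may_access("public", third_party))    # neither new nor value-added
```

In practice the sensitive branch would likely consult the list of licensing agreements tracked by the CDN TWG rather than return a flat refusal.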
These guidelines can be applied in assigning rights and privileges in a coordinated data storage, maintenance, access and distribution strategy. Additional considerations in implementing a data access strategy will include specific liability and/or security concerns associated with the premature release of data to public scrutiny. Careful steps will need to be taken when datasets become mature and public accessibility is addressed. Products of the Study will need to be thoroughly reviewed and any disparate opinions among Study members regarding results or appropriate use should be addressed prior to the data being published. Disclaimers and appropriate use restrictions will need to be presented (e.g., in the metadata), to anyone who wishes to acquire data.
7.2.1 Alternatives
Six alternatives have been identified for addressing the needs for data storage, maintenance, access and distribution. While the simultaneous implementation of multiple alternatives was possible for data discovery, the alternatives here are much less compatible, with the possible exception of a temporary "implementation" of an extended status quo to accommodate the short-term needs of the Study during the development, testing, and final implementation of a better alternative. Additional options have also been identified that could be implemented with the more functional (DBMS-based) alternatives. The alternatives, options, and evaluation criteria are described in the paragraphs below.
7.2.1.1 Status Quo
The current data storage and access scheme implemented for the Study allows users (Study Participants) to store and access data in their local environments. While this typically provides easy access to data for those who are connected to the local environment in which the data is stored, access to data by other users normally requires the use of an FTP site, where data is uploaded by the source user and then removed by the destination user. Other mechanisms for data transfer involving various media (e.g., CDs, magnetic tapes, etc.) are also likely being used. The system for data distribution is largely uncoordinated and fails to facilitate data integrity, security, back-ups or archiving. This system includes no active maintenance functionality for individual datasets: incremental changes to parts of a dataset cannot be made, and only full replacement is possible. While the FTP site being managed by Ian Gillespie at CCIW has accommodated much of the Study’s data access and distribution needs to this point, the increased demand for data storage and site maintenance is expected to soon exceed the capacity of this temporary solution. Considerations for public accessibility of data and long-term sustainability of data and systems have not been addressed under the current strategy. No immediate additional costs are associated with continuing with the status quo; however, because the status quo FTP site was intended as a temporary solution, a decision to continue with this strategy will likely require that additional capacity be added in the near future as the demand for its use increases. No additional policy considerations are essential for the persistence of the status quo; however, any consideration of public accessibility will require a coordinated Study policy, particularly in a less coordinated storage and access strategy.
7.2.1.2 Single Repository
The second alternative identified to address the need for coordinated data storage, maintenance, access and distribution is the implementation of a single repository for Study data. The repository would exist as a single FTP site to which users can be assigned rights and permissions according to their specific information needs. As a single location for all Study data, the repository would allow for much greater coordination of data distribution. Data integrity, security, back-up and archiving would be facilitated in a single environment. The repository would be able to accommodate public access to data by providing limited access with read-only permissions or by implementing a webpage with hyperlinks to FTP downloadable files.
While more coordinated than the status quo, a single repository has limited potential for facilitating long-term data sustainability. Data owners and corresponding data stewards with the ability, interest and motivation to ensure long-term data sustainability are likely to be less willing to manage data in a single system (read: national and provincial concerns and legal issues). As with the preceding alternative, this system would preclude the possibility of active maintenance functionality for individual datasets.
This alternative, like those following it, will require a flexible data security model to be implemented. Such an approach for granting rights and permissions has been, and can be, implemented from a technical standpoint without significant effort. (From a policy standpoint, of course, this is not so straightforward.) In addition to managing file security, a common data transfer standard (e.g., SDTS) or de facto standard (e.g., shapefile or .e00) will be necessary to provide consistency across the Study. The additional costs associated with the implementation of a single data repository would include the addition of storage volume on a system having ample bandwidth to accommodate the data transfer associated with data distribution.
7.2.1.3 Single Data Base Management System (DBMS)
The third alternative identified to address the need for a coordinated data management strategy involves the implementation of a single data storage, maintenance, access, and
distribution system. Establishing a single system in which data is loaded and stored in a relational database environment will facilitate the full integration of data into a comprehensive system. A database system in which data is stored in a logical structure will allow data to be integrated into other systems and will accommodate the application of other technologies much more effectively than a file structure. The single location will facilitate data integrity, security, back-up and archiving. However, because long-term sustainability is dependent upon the willingness and ability of data owners and stewards to maintain datasets, this alternative, like the previous ones, limits long-term sustainability by inhibiting regional ownership and stewardship. A single system could potentially alienate the regional partners who are removed from the system location. The long-term sustainability of the single system is tied to the motivation of a single maintainer to manage it beyond the life of the Study.
Policies to provide for appropriate public accessibility would need to be established under the single system alternative. Similar to the single repository, a flexible data security model and standards for data transfer would need to be implemented. Costs associated with the single system alternative include hardware, software, development, training, implementation, and maintenance. Additional options for enhancing functionality represent add-ons to the single system alternative. The three options are listed in Section 7.2.2.
7.2.1.4 IJC Distributed DBMS
The fourth alternative identified to address the need for a coordinated data management strategy involves the implementation of a data system similar to the single DBMS described above, but divided and managed by the respective national offices of the IJC in Ottawa and Washington DC. A dual system would be developed and maintained in a consistent and interoperable manner so as to support seamless data access across national jurisdictions. By committing to the development and maintenance of systems managing data for the LOSLR Study by national jurisdiction, the IJC would build an information management infrastructure to support the data management needs of the LOSLR Study, and potentially, future studies. This option offers direct control over almost every aspect of systems development, implementation, and maintenance, without reliance on the coordinated effort of other agencies to form a functional information infrastructure.
However, it does not take advantage of the existing pool of available resources nor the long-term benefits associated with a more distributed, regional approach.
This alternative would require the Study Board’s support to equip the IJC national offices with the necessary hardware, software and expertise required to develop, implement and maintain interoperable geodata management systems. Because this approach requires the development of IM support staff and resources, the cost associated with this dual system is substantially greater than the regionally distributed alternative, which takes advantage of the infrastructure and established knowledge base of other regional organizations. However, while the cost is associated directly with the LOSLR Study’s IM system development, implementation, and maintenance, it could also be considered an investment for future studies and other IJC information management needs. So long as the IJC would choose to support and maintain scalable data management systems, future studies could take advantage of the infrastructure created by this alternative as well as the knowledge base established within the IJC as a result of the development and maintenance of the systems.
Insofar as the IJC commits to maintaining a dual data management system, the long-term sustainability of the system will be provided for. Certainly, for the duration of the study, the IJC has the necessary motivation to maintain the new and value-added datasets that form the basis for many of the Study’s recommendations. But unless access to the datasets is necessary after conclusion of the Study (e.g., for continued public review, or for use in other IJC studies), the IJC would no longer have a direct need for or motivation to maintain the datasets. It will be important for the IJC to weigh this future need when deciding upon which alternative is most suitable: this system would be cost-effective only if other IJC studies can utilize it after FY2005. If future IJC needs would not be met, then the large investment associated with this nationally-distributed alternative would be outweighed by the advantages of less expensive alternatives, as will be discussed below.
7.2.1.5 Regionally Distributed DBMS
The fifth alternative identified to address the need for a coordinated data management strategy involves the implementation of a data system similar to the single system
described above, but divided and managed at the regional level. The establishment of regional data management systems most effectively addresses the need for regional partners to ensure the longevity of data associated with the Study. As with data owners, regional system maintainers would need to be identified. This data management model is the most progressive, and is endorsed and promoted by the public sector (FGDC, GeoConnections), private sector (CubeWerx, Inc.), and NGOs (OpenGIS Consortium).
The regionally distributed systems would be developed in a coordinated effort to ensure maximum consistency in system implementation and maintenance. As described above for the single system alternative, a regionally distributed system would take advantage of a relational database environment to manage the structure of the data store. Again, this facilitates greater integration and connectivity to other systems, and can more easily accommodate other technologies such as web services. At a minimum, interoperability standards would be specified (and would need to be adhered to!).
Preliminary evaluations of regional resources (introduced in Section 4.1), which could facilitate system development and ensure system reliability, have identified the following agencies and organizations. For the United States region of the Study area, preliminary investigations of New York State options revealed no system or organization with the capacity to provide for the information management needs of the Study. The GLC may soon secure non-Study funding to support the development of a database-driven information management system at the University of Michigan, which could serve as a major component of the LOSLR–US Region Information System. This system is being designed with the scalability to provide similar information management services for other studies within the Great Lakes region.
For the Ontario region of the Study area, LIO and Binational.net have been identified as potential regional partners. LIO is a fully functional information management system running on a database environment and is able to support WMS and is developing support for other OWS. Binational.net is being developed by EC-Ontario Region and the US EPA in order to accommodate binational programs conducted between those agencies. While a possible option, Binational.net is less developed than LIO and may introduce bureaucratic challenges in implementing a flexible system to accommodate the
Study’s information management needs. For the Quebec region of the Study area, ECQR has been identified as the forerunner for implementing an information management system to accommodate the Study’s needs. ECQR is presently in the implementation stage of a database-driven information system behind its firewall; knowledge gained from this experience would facilitate the development of the system for the Study outside of the firewall, as well as provide a knowledge base for development of the GLC DBMS.
While all three systems in a regionally distributed information system will have separate administration, development should concentrate on consistency to ensure a common approach to data storage, maintenance, access, and distribution. In addition to addressing seamless system development and implementation, data held in the systems will be clipped to a common boundary and/or need to be made seamless in order to facilitate the overall consistency of the Study data. The costs associated with establishing a regionally distributed information management system for the Study include hardware, software, development and implementation. System development for the regionally distributed information management system will require additional time in comparison to the single system development to accommodate the additional coordination of effort and system implementation. Options associated with this alternative are identical to those listed for the single system alternative but with additional considerations due to the distributed nature of this alternative. Because the Study data is distributed across three servers, it will be necessary to implement middleware on each of the regional systems to allow a single web service to utilize all three stores of data. Implementation costs associated with the middleware would approximately double those of the middleware option in the single system alternative.
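The role of the middleware described above, letting a single service draw on all three regional stores, can be illustrated with a small fan-out query. This is a sketch under stated assumptions: in-memory SQLite databases stand in for the regional DBMS servers, and the `layers` table, the layer names, and the region keys are hypothetical. A real deployment would use the DBMS and middleware products selected by the Study.

```python
import sqlite3


def make_store(rows):
    """Create a stand-in regional data store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE layers (name TEXT, region TEXT)")
    conn.executemany("INSERT INTO layers VALUES (?, ?)", rows)
    return conn


# Three stand-in stores, one per region of the Study area.
stores = {
    "us": make_store([("shoreline", "us")]),
    "ontario": make_store([("shoreline", "ontario"), ("wetlands", "ontario")]),
    "quebec": make_store([("shoreline", "quebec")]),
}


def find_layer(stores, layer_name):
    """Query every regional store and merge the results into one list."""
    results = []
    for conn in stores.values():
        cursor = conn.execute(
            "SELECT name, region FROM layers WHERE name = ?", (layer_name,)
        )
        results.extend(cursor.fetchall())
    return sorted(results)


print(find_layer(stores, "shoreline"))
```

The same query must be understood identically by every store, which is why a shared schema and common standards matter more in the distributed alternatives than in a single system.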
7.2.1.6 TWG Distributed DBMS
A final alternative that should be considered to address the need for coordinated geospatial data management involves implementing data systems similar to the “Single DBMS” described above, but with components distributed among TWGs. This approach has several advantages. However, these are confined largely to activities that will take place during the Study.
In the TWG Distributed DBMS alternative, maintenance responsibilities would be determined by whether: 1) the system resources were acquired by the TWG specifically for its work for the Study, or 2) the resources were on loan from a participating agency. In the scenario in which resources are on loan from a participating agency, the system resource management responsibilities likely rest with that agency, while data management responsibilities rest with the TWG in whatever arrangement it deems reasonable. In the scenario in which resources are acquired by the TWG specifically for implementation of the system, the TWG is responsible both for maintaining the system and data and for finding a suitable location for its system. At present, the latter scenario is likely for all TWGs with the exception of Hydrologic & Hydraulic, and Coastal.
A geodata management system distributed among TWGs would place the data and system in relatively close association with the data developers and initial data users. As such, reliable access to and control over the data have the potential to increase the overall motivation required for system upkeep during the Study. Moreover, because the system and geodata would be managed by that data’s primary user group, data currency and integrity should be maintained. At the conclusion of the Study, the data and system resources acquired by the Study would be transferred to the IJC; resources on loan from a participating agency would be released, and data relying on those resources would be transferred to the IJC or to identified data owners whenever possible. Because this approach, unlike the “Regionally Distributed DBMS” alternative, includes datasets that cross international and provincial boundaries, securing data owners with the motivation to provide for database maintenance beyond the Study’s terminus could prove problematic.
Data transfers between TWGs will rely heavily on the ability for other Study Participants to connect to and access data from the distributed network of servers. Similarly, for Study Participants or the public to simultaneously access multiple geospatial databases that are distributed among TWG servers for interactive data viewing or map-making (WMS-related activities), interoperability is essential. In this alternative, the development and exact system configurations of the TWG systems have the potential to vary considerably, based on the specific demand and resources available. Thus, inasmuch as possible, each system component (typically one per TWG) should be standardized for consistency across the Study. Insofar as the systems are standardized,
interoperability and the potential for providing connective features such as web services would be promoted. To this end, a specific system design and configuration (i.e., “standard build”) including network operating systems, middleware, software, and database structures should be developed and adopted.
This alternative would require the Study Board’s support through the allocation of funding required to implement a large network of distributed systems, one for each individual TWG. Without specifying design criteria here, it should be stated that each TWG would need a server for data accessibility (both inter-TWG and to the Public), and a workstation with software capable of geospatial data management and production purposes. As with the previous two alternatives, a database (DBMS) rather than file system approach is recommended. Cost estimates for this alternative (below) reflect these specifications.
Cost estimates do not reflect the unique problems associated with the Lake Ontario – Upper St. Lawrence River Coastal Data Server (CDS). Although the exact status of implementation is unknown at the time of this report, a file system approach was recently proposed. Although the CDS could be used within this alternative, hardware and software would need to be standardized to the greatest extent possible with the other TWG servers, including the use of a database management system (DBMS) rather than a file system structure. This type of dilemma (starting work versus waiting for an IM strategy and specifications) points to the need for IMS efforts to be addressed earlier in the Study life cycle.
7.2.2 Additional Options
Interactive Data Viewing and Map Making (i.e., WMS Capability): OGC Web Services is a set of web-based services developed by the OGC to facilitate the open, standards-based application of mapping (WMS) and GIS functionality (WFS) over the Internet. Implementation of OGC Web Services will require a connection to the database in which data layers are stored. Costs associated with implementation of this option include program development to customize the services for use with the Study’s data.
Support Proprietary Internet Mapping Services:
More highly developed than the OGC Web Services, proprietary Internet mapping services can provide more robust functionality, i.e., involving geospatial operations such as overlaying or proximity analysis. The largest disadvantage with implementing proprietary Internet map services is the cost associated with purchase and licensing.
Implementation of a Database Middleware: Middleware is used to connect applications to a database management system (DBMS) environment. The implementation of middleware in the single system alternative would allow for remote access to data layers stored on the system from desktop GIS applications or web services located on different servers.
7.2.3 Evaluation of Alternatives
Graphical depictions of the flows of information between modeling groups (each associated with a particular TWG) and DBMS servers, and how alternatives 3, 4, 5, and 6 differ in this regard are presented as Figures 7.2.1 (a-d).
Below are listed the primary criteria by which the above alternatives have been evaluated. A summary of the evaluation can be found in Figure 7.2.2.
Interactive Data Viewing and Mapping (i.e., WMS Capability): Data viewing is the ability to simply look at the spatial data on-screen through a web mapping viewer. A web mapping service (WMS) produces maps from data located in a structured data store as images in an Internet-enabled environment. The structure of the data storage system dictates the ability of that system to accommodate WMS implementation. The Status Quo and Single Repository alternatives lack the necessary structure in the file system to accommodate WMS. All the other alternatives provide the structure required and therefore the capability to support WMS. The ability of the two distributed alternatives to support WMS relies heavily on interoperability and the consistent implementation of standards in their disparate systems. Systems implemented using different technologies hinder the interoperability required in providing cohesive data viewing and mapping.
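To make the WMS discussion concrete, the sketch below builds a GetMap request of the kind a web mapping viewer would send. The parameter names follow the OGC WMS 1.1.1 GetMap interface; the server URL, the layer name, and the bounding box are placeholders invented for illustration and do not refer to any Study system.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=600, height=400):
    """Build a WMS 1.1.1 GetMap request URL for one layer.

    bbox is (min_x, min_y, max_x, max_y) in the units of the SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the server's default style
        "SRS": "EPSG:4326",      # geographic coordinates (longitude, latitude)
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer; the bounding box is illustrative only.
url = getmap_url("https://example.org/wms", "shoreline", (-80.0, 43.0, -74.0, 45.5))
print(url)
```

The server answers such a request with a rendered image, which is why the underlying data must sit in a structured store that the service can query by layer and extent.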
Figure 7.2.1(a) - Flow of information between modeling groups and Single DBMS server in Alternative 3.
Figure 7.2.1(b) - Flow of information between modeling groups and IJC Distributed DBMS servers in Alternative 4.
Figure 7.2.1(c) - Flow of information between modeling groups and Regionally Distributed DBMS servers in Alternative 5.
Figure 7.2.1(d) - Flow of information between modeling groups and TWG Distributed DBMS servers in Alternative 6.
Capacity for WFS: The web feature service (WFS) can support more robust functionality in geospatial services than can WMS, but the technology is less well developed. A WFS allows for greater flexibility in developing web-based custom processes with GIS data. As a burgeoning technology, WFS should be considered if future information management and system development is desired. The structure of the data storage system dictates the ability of that system to accommodate WFS implementation. The Status Quo and Single Repository alternatives lack the necessary structure in the file system to accommodate WFS. All the other alternatives provide the structure required and therefore the capability to support WFS. As with WMS, the ability of the distributed alternatives to support WFS relies heavily on interoperability and the consistent implementation of standards.
Potential for Long-term Sustainability of Data: “Long-term sustainability” of data is directly related to the ability to identify new owners willing to take responsibility for data beyond the life of the study. It is unlikely that data owners can be identified who are willing and able to ensure long-term sustainability of data and data access within a system housed in a region different than their own. This does not preclude a regional authority’s acceptance of a single Study- wide DBMS; however, an agency’s or organization’s familiarity with systems managed within another agency or organization of their own region increases the likelihood of identifying appropriate data owners. Potential for regional ownership and stewardship indicates an ability to identify a committed, regional level agency or organization. The activities (and data) associated with regional agencies are controlled by the policies dictated by regional authorities. Thus, the feasibility of implementing a regional ownership and stewardship arrangement is directly tied to the acceptance of such a system by these regional authorities. Acceptance by the regional authorities is more likely to occur in an environment familiar to them in which some manner of control can be exercised. The Regionally Distributed DBMS is the only alternative that completely supports the potential for regional ownership/stewardship and therefore long-term sustainability of the data. The Status Quo alternative lacks the organization required of any ownership/stewardship scheme that would support long-term sustainability of the
data. The other three alternatives could, but are unlikely to, support an ownership/stewardship scheme that provides for long-term sustainability of the data.
Consistency of the System: The consistency of the system indicates the Study’s ability to manage data and provide for access in a way that is seamless in method and content. The consistency of the system is not an evaluation of the seamlessness of the data; instead, it evaluates the amount of work it would take to display the seamless data in an appropriate manner. The Status Quo alternative lacks the coordinated organization to be considered consistent. The distributed alternatives have the potential to support consistency across the network of disparate systems. The Regionally Distributed DBMS approach has better potential for system consistency than the TWG Distributed DBMS approach because there are fewer systems to coordinate. This distinction highlights the problem created by trying to coordinate disparate systems that are technically complex. The evaluation of consistency in the distributed DBMS alternatives reflects the effort needed to ensure the appropriate level of interoperability between the various components of the system. The IJC Distributed DBMS approach has better potential for system consistency than the Regionally Distributed DBMS approach for two reasons: in the former approach, there would be one fewer DBMS to coordinate, and the two systems would be controlled by a single authority (i.e., the IJC). The Single Repository and Single DBMS alternatives have the greatest degree of consistency because they are solitary and isolated approaches that do not rely on the coordination of technically complex systems. The ultimate test of consistency in an information management system for the LOSLR Study, particularly in the regionally distributed approach, would be to evaluate the results of an information request encompassing the New York – Ontario – Quebec border area. Failure to return a consistent result would indicate potentially significant discrepancies in the system design and/or content that could affect the effectiveness of the Study in general.
Ease of Accessibility by Study Participants: The ease of accessibility by Study Participants is necessary to ensure the efficient use of study resources and is one of the most critical of the evaluation criteria. Without a reliable system for ensuring Study Participants’ access to information necessary to complete their responsibilities, the information management strategy would be a failure. While the Status Quo allows for access to data primarily on a by-request basis, the other
alternatives all provide a mechanism for Study Participants to transfer data within the Study, though the Single Repository alternative does so in a relatively cumbersome manner.
Ease of Accessibility by Public: The Public’s ease of accessibility has significant implications for the public’s acceptance of the Study and the credibility of its results. Furthermore, the value of data utilized and generated by the Study may warrant its provision to the public if only as a service provided by the Study. While the Status Quo provides no mechanism by which the public can access data, the other alternatives all provide some mechanism in support of public accessibility to Study data. Again, the Single Repository alternative does so in a relatively cumbersome manner.
Foster Study Transparency and Facilitate Public Involvement: Directly related to the ease of access by the public, fostering Study transparency is intended to allow the public to become more involved in the Study and knowledgeable of its methods and results. Study transparency is necessary throughout the life of the Study, and possibly for some time thereafter, to ensure the credibility of the Study. Study transparency is promoted across the alternatives as the range of functionality and extensibility of the various alternatives increases. Because of the TWG Distributed DBMS alternative’s complexity, it is evaluated as having lower potential for promoting transparency in the Study than the IJC and Regionally Distributed DBMS alternatives.
Provide Model for Other Organizations and Studies: The IJC LOSLR Study has the opportunity to provide an example for other Studies of comparable scope to use as a template for the design and implementation of an information management system that utilizes information technology to advance the efficiency, effectiveness, and public participation in a public sector study. In essence, this criterion is an aggregate evaluation of the potential support of the functionality and extensibility of the alternatives.
Long-term Sustainability of the System after the Study:
Similar to the concern of long-term sustainability of data, sustainability of the Study information management system will be determined by the willingness of the system maintainer(s) to keep the system up after the conclusion of the Study. The utilization of the Study by the public will warrant the necessity of maintaining the system beyond the life of the Study, as will the investment of public funds in the system. The Status Quo and TWG Distributed DBMS alternatives fail to coordinate data or functionality at any level greater than the TWG of the Study. The Single DBMS is the most coordinated alternative, but is evaluated as poorly supporting long-term sustainability after the Study because of the limited potential for a politically and technically feasible scheme for long-term ownership/stewardship of the data given the orientation and complexity of the system. The Single Repository alternative is evaluated more favorably because the long-term sustainability of the system would require minimal work given the relative simplicity of supporting an FTP site versus a DBMS. The Regionally Distributed DBMS alternative is evaluated most favorably because it directly accounts for and incorporates the interest of organizations and agencies required to support long-term sustainability. Proper evaluation of the IJC Distributed DBMS alternative depends on the IJC’s commitment to maintain such systems in the long-term. Given the investment necessary for developing and implementing an IJC Distributed DBMS approach, it is likely that the IJC would plan on sustaining the system for some period of time after the completion of the LOSLR Study.
Potential for Study-wide Backup and Archiving: As a function of the Study’s data management needs, data backup and archival is necessary to protect Study resources. Much of the data being used by the Study will not need to be made easily accessible after a particular phase is complete; however, a record must be maintained. The principal measure for evaluating this criterion is whether an organized collection of the data exists. In the Status Quo alternative no organized collection exists and therefore the potential for backup and archiving is not supported. In all the other alternatives some form of backup and archival can be supported in either a single or distributed fashion.
[Figure: matrix rating each implementation alternative against each evaluation criterion on a four-level scale: poor, fair, good, excellent]
Implementation Alternatives (columns): Status Quo; Single Repository; Single System; IJC Distributed System; Regionally Distributed System; TWG Distributed System
Evaluation Criteria (rows): Data Viewing (i.e., WMS Capability); Capacity for WFS; Potential for Long-term Sustainability of Data; Consistency of System; Ease of Accessibility by Study Participants; Ease of Accessibility by Public; Foster Study Transparency and Facilitate Public Involvement; Long-term Sustainability of the System after the Study; Potential for Study-wide Backup and Archival; Provides Model for Other Organizations and Studies; Time to Delivery; Cost
Figure 7.2.2 - Evaluation of Storage, Maintenance, and Access Alternatives
7.2.4 Cost
Data Access, Maintenance and Distribution Budget Assumptions and Justification
Unless noted otherwise, rates are calculated at $825.00 per day. This is a blended average rate in which agency staff complete 25% of the work at $300.00 per day and contractors complete 75% of the work at $1,000.00 per day. All money is in US dollars. Amounts do not reflect yearly increases in salary and overhead. There are no costs associated with implementing the Status Quo alternative. The specifics of a licensing arrangement with LIO remain unknown and have not been included in the budget information. The Workstation / Arc8 Bundle includes annual licensing fees. Implementation costs associated with Options 1, 2 and 3 include design, pilot, installation, testing and training phases. The cost for Option 3 is representative of a single DBMS and would need to be doubled in a regionally distributed DBMS approach. The cost under the TWG DBMS approach would increase according to the number of servers implemented in the system. Administrative operating costs associated with Options 2 and 3 are proportionally distributed to reflect the updating and maintenance necessary with the addition of data as it is completed. The 40%, 35%, 20% and 5% distribution scheme is applied to these costs.
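The arithmetic behind these assumptions can be sketched as follows. This is a minimal illustration, not part of the Study budget: the function names are hypothetical, the $30,000 total is an illustrative input, and applying the four percentages to FY2002 through FY2005 in that order is an assumption.

```python
# Rates and shares are taken from the budget assumptions above (US dollars).
AGENCY_RATE = 300.00        # per day, agency staff
CONTRACTOR_RATE = 1000.00   # per day, contractors
AGENCY_SHARE = 0.25         # share of work done by agency staff
CONTRACTOR_SHARE = 0.75     # share of work done by contractors


def blended_rate():
    """Weighted average daily rate for the stated staffing mix."""
    return AGENCY_SHARE * AGENCY_RATE + CONTRACTOR_SHARE * CONTRACTOR_RATE


def days_of_effort(cost):
    """Convert a labour cost into days of effort at the blended rate."""
    return cost / blended_rate()


def distribute(total, shares=(0.40, 0.35, 0.20, 0.05)):
    """Split a total across four years using the stated percentage scheme.

    Assumption: the shares apply to FY2002..FY2005 in that order.
    """
    return [round(total * share, 2) for share in shares]


print(blended_rate())         # 825.0
print(days_of_effort(9900))   # 12.0
print(distribute(30000))      # [12000.0, 10500.0, 6000.0, 1500.0]
```

Read this way, a $9,900 line item corresponds to 12 days of effort at the blended rate.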
Task | FY2002 | FY2003 | FY2004 | FY2005 | Study Total | Per Year After Study
Alternative 1: Status Quo
None | $0 | $0 | $0 | $0 | $0 | $0
Yearly Total | $0 | $0 | $0 | $0 | $0 | $0
Alternative 2: Single Repository
Additional Storage on Server | $800 | $0 | $0 | $0 | $800 | $0
Installation and Development | $9,900 | $0 | $0 | $0 | $9,900 | $0
Administrative Operating | $9,900 | $9,900 | $9,900 | $9,900 | $39,600 | $9,900
Yearly Total | $20,600 | $9,900 | $9,900 | $9,900 | $50,300 | $9,900
Alternative 3: Single DBMS
Server (w/ software) | $40,000 | $0 | $0 | $0 | $40,000 | $0
Development | $48,000 | $0 | $0 | $0 | $48,000 | $0
Administrative Operating | $40,000 | $40,000 | $40,000 | $40,000 | $160,000 | $40,000
Yearly Total | $128,000 | $40,000 | $40,000 | $40,000 | $248,000 | $40,000
Alternative 4: IJC Distributed DBMS
United States IJC Office – Washington D.C.
Server (w/ software) | $40,000 | $0 | $0 | $0 | $40,000 | $0
Development | $48,000 | $0 | $0 | $0 | $48,000 | $0
Administrative Operating | $40,000 | $40,000 | $40,000 | $40,000 | $160,000 | $40,000
Canadian IJC Office - Ottawa
Server (w/ software) | $40,000 | $0 | $0 | $0 | $40,000 | $0
Development | $48,000 | $0 | $0 | $0 | $48,000 | $0
Administrative Operating | $40,000 | $40,000 | $40,000 | $40,000 | $160,000 | $40,000
Yearly Total | $256,000 | $80,000 | $80,000 | $80,000 | $496,000 | $80,000
Alternative 5: Regionally Distributed DBMS
United States
GLC-UM System | Alt. Funding | $0 | $0 | $0 | $0 | $0
Development | $29,700 | $0 | $0 | $0 | $29,700 | $0
Administrative Operating | $19,800 | $19,800 | $19,800 | $19,800 | $79,200 | $19,800
Ontario
LIO | $0 | $0 | $0 | $0 | $0 | $0
Development | $4,125 | $0 | $0 | $0 | $4,125 | $0
Administrative Operating | $19,800 | $19,800 | $19,800 | $19,800 | $79,200 | $19,800
Quebec
Mirror Server (w/ software) | $40,000 | $0 | $0 | $0 | $40,000 | $0
Development | $19,800 | $0 | $0 | $0 | $19,800 | $0
Administrative Operating | $9,900 | $9,900 | $9,900 | $9,900 | $39,600 | $9,900
Yearly Total | $143,125 | $49,500 | $49,500 | $49,500 | $291,625 | $49,500
Alternative 6: TWG Distributed DBMS
Servers | $10,000 | $0 | $0 | $0 | $10,000 | $0
Workstation / Arc8 Bundle | $11,000 | $1,100 | $1,100 | $1,100 | $14,300 | $1,100
Development | $19,800 | $0 | $0 | $0 | $19,800 | $0
Administrative Operating | $9,900 | $9,900 | $9,900 | $9,900 | $39,600 | $9,900
Individual TWG Sub-total | $50,700 | $11,000 | $11,000 | $11,000 | $83,700 | $11,000
Yearly Total (w/ 6 systems) | $304,200 | $66,000 | $66,000 | $66,000 | $502,200 | $66,000
Option 1: Data Viewing and Mapping (WMS)
Implementation | $12,000 | $0 | $0 | $0 | $12,000 | $0
Administrative Operating | $0 | $10,500 | $6,000 | $1,500 | $18,000 | $0
Yearly Total | $12,000 | $0 | $0 | $0 | $12,000 | $0
Option 2: Proprietary Internet Map Services
Software | $12,000 | $1,400 | $1,400 | $1,400 | $16,200 | $1,400
Implementation | $18,000 | $0 | $0 | $0 | $18,000 | $0
Administrative Operating | $0 | $15,750 | $9,000 | $2,250 | $27,000 | $0
Yearly Total | $30,000 | $1,400 | $1,400 | $1,400 | $61,200 | $1,400
Option 3: Middleware
Implementation | $5,775 | $0 | $0 | $0 | $5,775 | $0
Administrative Operating | $0 | $1,000 | $1,000 | $1,000 | $3,000 | $1,000
Yearly Total | $5,775 | $0 | $0 | $0 | $8,775 | $0
7.2.5 Recommendations
For the short term, we recommend that the Status Quo (Alternative 1) remain in place during the development of the Regionally Distributed DBMS (Alternative 5). As mentioned, the Status Quo solution will soon become unable to accommodate the needs of the Study because of the anticipated increase in data and demands for that data. An extension of the Status Quo to accommodate the needs of the study while the Regionally Distributed DBMS is being developed has been offered by André Plante at ECQR. With the understanding that their resources would only be required in the short term, until a distributed system is fully operational, ECQR would provide the necessary capacity to accommodate the study’s short-term data transfer needs. A spatial data would be managed in the ECQR development DBMS using the model that would be developed for the regionally distributed network of servers. Use of the ECQR DBMS would be available until the time that the data outgrows the ECQR system or the regionally distributed DBMS is implemented. An additional benefit of using the ECQR DBMS to satisfy the short-term data storage and transfer needs of the study is that the amount of backfill required in implementing the ECQR node of the regionally distributed DBMS would be limited, since the data would already be in the database.
The Regionally Distributed DBMS approach to implementing a study-wide information management system would best provide for the long-term sustainability of the data used and generated by the study. Data ownership and stewardship responsibilities will need to be assigned to willing agencies with the necessary interest and motivation to maintain the information. Most agencies willing to perform in this capacity will be found at the regional level and would benefit from having the data warehouse located within known geopolitical boundaries. By its design, the Regionally Distributed DBMS approach would support the development of web-based mapping and feature services. This regionally-distributed, yet integrated, approach would accommodate a study-wide information management solution that would most effectively support the long-term sustainability of the information used and generated by the study.
Options 1 and 3 are both recommended. They will support the regionally distributed approach by promoting the extension of a basic storage and distribution system to
accommodate greater flexibility and functionality. Support for and eventual implementation of an OGC-compliant web-based data viewing and mapping service adds to the overall accessibility and visualization of the study information. Provided as a web service, the mapping functionality would be available to Study Participants and the public. As a technical prerequisite to a completely integrated study-wide system, the implementation of middleware to support the accessibility to distributed data stored by distributed applications would extend the potential functionality of the regionally distributed storage system.
Our recommendations are to implement:
Alternative 1: Status Quo (for short-term)
Alternative 5: Regionally Distributed System
Option 1: Data Viewing and Mapping (WMS)
Option 3: Middleware ($5,775US times 2 = $11,550)
Total estimated cost of implementing the recommended alternative and options, given the regional partners identified in 7.2.4, is $166,675US in FY2002 and $312,175US through FY2005.
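As a quick check on the FY2002 figure, the following sketch sums the FY2002 amounts for the recommended items as they appear in the 7.2.4 cost table. The variable and function names are hypothetical, and the snippet covers the FY2002 total only.

```python
# FY2002 amounts from the 7.2.4 cost table (US dollars).
ALT5_FY2002 = 143_125       # Alternative 5: Regionally Distributed DBMS, yearly total
OPTION1_FY2002 = 12_000     # Option 1: Data Viewing and Mapping (WMS), implementation
OPTION3_FY2002 = 5_775      # Option 3: Middleware, one implementation
MIDDLEWARE_INSTALLS = 2     # "times 2" in the recommendation list


def recommended_fy2002_total():
    """Sum the FY2002 costs of the recommended alternative and options."""
    return ALT5_FY2002 + OPTION1_FY2002 + OPTION3_FY2002 * MIDDLEWARE_INSTALLS


print(recommended_fy2002_total())  # 166675
```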
Policies essential to implementing the recommended alternative and options are:
All primary Study Participants (e.g., Study Board, PIAG, and TWG members) should be given access to all data and information utilized and/or produced by the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns.
All other interested parties should be given access to any data and information which is considered new or having value added to it by activities of the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns.
“New data or information” could be defined as that which did not exist prior to Study activities and was generated from primary data collection procedures as a direct result of Study activities, i.e., model output or results.
“Value-added data and information” could be defined as that which has been significantly improved as a result of Study activities in either its content or usability.
Data owners, and especially data stewards, should be identified as early as possible preceding the end of the Study.
7.3 Document and General Information Management Tools
The preceding two sections have addressed alternatives and options for database management, with an emphasis on geospatial data. Study Participants (e.g., TWG and PIAG members, and the Study Board) will also likely need to share a considerable and increasing amount of aspatial information, both internally and to some extent with the Public. This will include status reports, technical documentation, meeting minutes, memos, messages, contact information, calendars, etc. Given the distributed nature of the Study Participants within and beyond the Study region, and the needs of the Public, providing for web-enabled information management and presentation tools is clearly desirable. Preliminary investigation of the web services provided to the Study indicates that the current Study website (accessed via www.ijc.org) and consultants could accommodate the alternatives that follow.
This section will introduce and evaluate alternatives and options for the management of documents and other aspatial information, and make preliminary strategy recommendations. Because a Study-wide Communications Strategy has not yet been developed, making fully informed recommendations regarding Study-wide aspatial information management is difficult. Note that other IM tools that address needs identified during interviews with the PIAG (e.g., automated web-based user feedback and resource user/provider survey forms with back-end databases) are described in Appendix V.
7.3.1 Simple Web Site Approach
7.3.1.1 Hierarchical Web Page Structure
Perhaps the least complex method of providing web-based access to documentation is to simply build the documents into a hierarchical structure within a web page. While the exact structure of such a method will vary depending on information content and overall web page design, in general a user will be presented with an “index” or “contents” page that will list the titles of the documents available. The list might be arranged by topic, or perhaps geographical location. Alternatively, an initial list of topics might be presented on an initial, higher-level page, with hyperlinks to a set of lower-level pages (classified according to topic or geography) with lists of specific documents.
An example of this is found in Figure 7.3.1 for the Lake Michigan Potential Damages Study being conducted by the US Army Corps of Engineers. In this case, a “Document Clearinghouse” presents the user with a list of available documents along with a brief description. The documents are made available on the web server in various formats including Word, Adobe PDF or HTML. Users can click the appropriate link to either view or download the document directly.
Figure 7.3.1 - LMPDS Document Clearinghouse Contents Page (http://huron.lre.usace.army.mil/coastal/LMPDS/documents.htm)
Developing or updating a document site such as this requires a number of basic steps. First, the document has to be stored on the web server. This can be in its original format (e.g., Word) or, as with many web documents, in Adobe PDF format. This will require the conversion of the original document and then the upload of the PDF document to the web as well. If simple HTML format is required, the document will also need to be converted. Next, the “contents” page has to be updated. This has to be done manually for each new document added to the site. Since there is no database or database index being used, the only way for a user to know what documents are available is for the web administrator/developer to add the new link to the contents page so that it is visible to the user the next time they visit the site. While such a process is relatively simple, it could prove cumbersome should there be a large number of documents required to be listed, or should frequent updates be required.
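A minimal sketch of what the “contents” page amounts to is given below. The function name, file names, and descriptions are all hypothetical. The point is that the page is just a list of links, one per document; because no database is involved, each new document still means adding an entry to the list by hand and republishing the page.

```python
from html import escape


def build_contents_page(title, documents):
    """Return a simple HTML contents page listing each document as a link.

    `documents` is a list of (link_text, href, description) tuples.
    """
    items = []
    for link_text, href, description in documents:
        items.append(
            f'<li><a href="{escape(href, quote=True)}">{escape(link_text)}</a>'
            f" - {escape(description)}</li>"
        )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</body></html>"
    )


# Hypothetical entries; a new document requires a new tuple here.
docs = [
    ("Meeting Minutes, March 2002", "docs/minutes_2002_03.pdf", "Adobe PDF"),
    ("Status Report", "docs/status_report.doc", "Word document"),
]
page = build_contents_page("Document Clearinghouse", docs)
print(page)
```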
An advantage of this method is that no complex computer code is required, nor is complex database software required. The index page can be done in simple HTML and the documents can be stored on the web server in a standard file directory structure if needed for organizational purposes. They can also be stored in their native format, or Adobe PDF format. Some disadvantages are that each time a document is added to the web server, a manual change must be made to the web page providing a hotlink to that document. Also, if the web site is hosted on an Internet Service Provider (ISP) server, there may be limitations as to the total storage allowed, or on the amount of information that can be transferred in any given month. Large organizations running their own web servers would not be subject to this type of limitation. For organizations with a large number of documents, and in a variety of formats, this method may not be very efficient. For smaller numbers of documents, as well as for sharing of smaller documents such as minutes, memos, and short documents, this method can be both efficient and cost effective.
Cost The cost to develop this type of functionality can be wrapped into the cost for an overall web design and will vary depending on the number of documents anticipated to be placed on the site. For document conversion to PDF format (where required) a unit cost can be developed. This would only apply to very complex documents as MS Office integrates nicely with Adobe Acrobat (Version 4 and up) and documents (Word, Excel and
PowerPoint) can be converted to PDF format at the click of a button. Should Adobe Acrobat be required for PDF document creation and conversion, it currently retails for ~$250.00US.
7.3.1.2 Optional Web Tools
7.3.1.2.1 HTML Text Search
One of the most basic methods for searching for information within a web site or series of related web pages is a basic text search. This allows users to use a search form or box to search a web site for specific words or phrases. When site visitors enter text in a “search form” they will get a list of web pages and other documents within that site that contain content matching their search criteria. This list is usually “hot-linked” so that users can then go directly to that page. An example of a basic search form is found in Figure 7.3.2. In this example the word “revetment” is entered into the search box. When the user clicks “Go” or presses the Enter key, a list of page titles or documents containing that word is provided (Figure 7.3.3). These titles are “hot-linked” and the user can then go directly to that page by clicking on the title.
Figure 7.3.2 - Basic Search Form Example
Search boxes or forms can be easily incorporated into an existing web site. Most web creation programs (e.g., MS Front Page) have automated forms that contain the required HTML code (or other scripting language) that can be inserted directly into the page, which can then be customized with text, copyright information, button labels, and so on.
Figure 7.3.3 - Query Results Page
How Search Forms Work In this type of search capability, web creation programs (e.g., MS Front Page) will generate a text index (basically a simple database file) based on all words contained in the web pages and documents making up the site (excluding the approximately 300 most common English words, such as "a," "the," and so on). This text index file is stored on the web server when the web site is “published” to the server. When a user fills in the search form and clicks the submit button, this sends a message (request) to the server (via the computer code embedded in the search form) to search the text index file for any records containing those words and to then send those back to the user in a list or a table. The script or code that is embedded in the search form also governs the way the list or table is presented back to the user. As mentioned, web page creation programs have simple automated search forms that can be used very easily. Custom search forms can also be created if required, using other scripting technologies (e.g., JavaScript, CGI, Perl).
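The indexing and lookup described above can be sketched in a few lines. This is an illustration of the general idea only, not how MS Front Page implements it: the short stop-word set stands in for the roughly 300 excluded words, the page names and text are hypothetical, and requiring every query word to match is an assumption.

```python
import re
from collections import defaultdict

# A tiny stand-in for the common words a real tool would exclude.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text):
    """Lower-case the text and keep only words that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(pages):
    """Map each indexed word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted page names that contain every indexed query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical site content.
pages = {
    "shore.htm": "Design of a revetment in the coastal zone",
    "minutes.htm": "Minutes of the March meeting",
}
index = build_index(pages)
print(search(index, "revetment"))  # ['shore.htm']
```

The index is built once, when the site is published, so each search is a lookup rather than a scan of every page.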
Cost The cost to design this type of search function within a web site is very low and should be within the basic design costs of the web site. Web designers can make use of automated scripts and search forms built into most of the industry standard web design programs and as such, additional programming costs should not be required. Increased costs may arise, however, in ensuring that the web server that is storing the web site can support the types of scripts (or computer code) that are used in the search forms. For example, if designing web sites with MS Front Page (bundled with MS Office Suite, retail cost ~$600-800US for full version), the web server will need to support and utilize MS Front Page Extensions (essentially small bits of computer code) in order for capabilities such as the search function to work properly.
7.3.1.2.2 Metadata for aspatial information
An alternative method of providing web-based access directly to data and information is through a metadata search engine. As described in the preceding section focused on geospatial data, metadata for aspatial data is essentially “data about data” – it provides a comprehensive description of the data or document that is available, including information on its format, source, location, etc. A metadata search engine does not provide a direct link to the actual data or document, but instead returns the metadata file(s) meeting the search criteria. Users can then view the metadata to see if the document is in fact what they are looking for. In most cases, the metadata will then provide a link that will allow the user to go directly to the data or document for viewing or download.
Metadata search engines work like most other web based search engines. Metadata is stored in a database or “catalog” that resides on a web server. Users fill in a standard search form that then queries the catalog and a list of metadata matching the search criteria is returned to the user.
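A minimal sketch of such a catalog query follows. The record fields mirror the kinds of information mentioned above (format, source, location); the class name, function name, agencies, and URLs are hypothetical placeholders. The search returns metadata records rather than the documents themselves, and each record carries a link to where the data or document is actually hosted.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    title: str
    description: str
    data_format: str
    source: str
    location: str  # link to where the data or document is actually hosted


def search_catalog(catalog, term):
    """Return the metadata records whose title or description mentions the term."""
    term = term.lower()
    return [
        record
        for record in catalog
        if term in record.title.lower() or term in record.description.lower()
    ]


# Hypothetical catalog entries.
catalog = [
    MetadataRecord(
        "Shoreline survey report",
        "Summary of field survey methods and results",
        "PDF",
        "Hypothetical Agency A",
        "http://example.org/docs/shoreline.pdf",
    ),
    MetadataRecord(
        "Water level time series",
        "Hourly gauge readings",
        "CSV",
        "Hypothetical Agency B",
        "http://example.org/data/levels.csv",
    ),
]

for record in search_catalog(catalog, "survey"):
    print(record.title, "->", record.location)
```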
A key advantage of a system of this type is that the organization hosting the metadata catalog does not also have to store and archive all the associated data. The data can be
hosted by its original developer, or parent organization, who can then provide more direct control and, if necessary, maintenance. In addition, templates can be set up to allow organizations to post new metadata to the site when they develop new data sets.
The US Environmental Protection Agency has employed metadata and associated search functionality as the core of its Environmental Information Management System (EIMS). This system is described in detail below.
7.3.2 Document Management Systems
7.3.2.1 COTS Software
There is a range of Commercial Off-The-Shelf (COTS) Document Management Software on the market that can be utilized to store, manage and share all types of corporate documentation, including geospatial data and information. These can range from very complex and expensive enterprise-wide solutions to systems designed for small office or home users.
COTS software has a number of advantages over developing custom solutions for document management. Typically, these products come from reputable and well-established companies. They have usually undergone significant beta and user testing and have incorporated any required fixes in the latest versions. Depending on the software, acquiring multiple licenses may cost the same as, or in some cases less than, contracting out a custom solution.
There are also disadvantages to the COTS software. The data must be formatted to fit the structure provided by the software. Any customization of the software will increase the cost, as will any necessary network infrastructure upgrades. Significant training may also be needed to use the software.
Two examples of commercially available Document Management Software are provided below. This does not represent the full range of available software, but is intended to provide representative examples of the types of software available.
7.3.2.1.1 DocuShare – Xerox Corporation (http://docushare.xerox.com/marketing/index.shtml)
DocuShare is an example of a higher-end web-based document management system that allows users to easily store, access, and share information in a secure and collaborative work environment. DocuShare allows any user on any system to post and retrieve information in any format. Text, scanned images, video clips, Microsoft Office documents, sound files, executables, web links, bulletin boards, calendars, etc. can be managed over the Web without using complex FTP software, browser plug-ins, or client-side applications.
The DocuShare Client components allow seamless integration with applications such as Microsoft Office and Corel Office; the Microsoft Windows Explorer desktop; and Microsoft Outlook. Files may be dragged and dropped between the local hard disk and DocuShare with Microsoft Windows Explorer (Figure 7.3.4).
Figure 7.3.4 - DocuShare Integration with Windows Explorer
An Outlook Client is also available (see Figure 7.3.5) that seamlessly integrates with Microsoft Outlook to provide a transparent drag and drop interface between mail applications and an online repository.
Figure 7.3.5 - DocuShare Outlook Integration
DocuShare uses an “open architecture.” All documents are securely stored in their original formats on the web server and are accessed by standard HTML links. These links do not change when objects are modified or moved, enabling easy integration with other Web pages and products. End users control which individuals or groups have authority to read, write, or manage. For unauthorized users, links are not only disabled but made invisible.
System Requirements The following are the minimum system configurations required:
Windows: 600 MHz Pentium III®; 128 MB RAM; 100 MB disk space; Windows NT® Server 4/SP 3/4/5/6 or Windows 2000; Microsoft IIS 4.0 or Netscape® Enterprise Server 3.0.
UNIX: Sun Ultra SPARC™; 64 MB RAM; 100 MB disk space; Sun Solaris™ 2.5/2.6/7; Apache web server (included) or Netscape Enterprise Server 3.0.
Minimum Client Platform Requirements
Web Interface Any platform (Windows, Mac OS, UNIX) with a current web browser (Internet Explorer 4.x/5.x or Netscape Navigator 4.x recommended).
Windows Interface Microsoft Windows 95/98/NT4, 2000. Internet Explorer 4.x/5.x (required).
Cost Cost for a DocuShare license is based on the total number of users, with users defined as those who actually need access to the software to create and post content. There is no licensing fee for the number of users who simply wish to download information. The table below outlines the scaled pricing:
# Users | Cost (US Dollars) | Annual Maintenance
50 | $4995* | $816
500 | $19,995 | $3144
Unlimited | $49,995 | N/A
*Additional 50 User Bundles - $2495
7.3.2.1.2 EasyDocs - Internet Development Ltd., UK (http://www.easydocs.com/frame.html)
At the other end of the spectrum, EasyDocs is an example of a low cost document management system that is designed to provide simple access and sharing of documentation. It is aimed primarily at organizations which use Microsoft Office for their office documents (i.e., Word, Excel, PowerPoint, etc.) and allows users to upload, search and manage all their documents or their colleagues’ documents using their web browsers.
From their web browser, users can upload Microsoft Office based files (only) to EasyDocs. Once the files are uploaded, EasyDocs indexes the documents and makes them available for other users who are allowed to access those documents.
Users can then access documents they are allowed to see by searching document contents or document properties using the EasyDocs search screen (Figure 7.3.6). Documents matching the search criteria are then returned via a results screen (Figure 7.3.7). Users who have the relevant permissions can amend or delete documents on the system. Each user within EasyDocs is given his/her own profile, which determines what he/she can and cannot do with documents stored in EasyDocs.
Figure 7.3.6 - EasyDocs Search Screen Example
Once a search query is submitted, EasyDocs returns a listing of the relevant documents which are hotlinked for download or viewing (Figure 7.3.7).
Figure 7.3.7 - EasyDocs Search Results Example
System Requirements
EasyDocs needs to be installed on an NT4 Server with appropriate hardware depending on anticipated usage and running Internet Information Server 4, Index Server 2 and Service Pack 3. Index and Internet Information Server are part of the NT4 Option Pack which is freely available from Microsoft. EasyDocs integrates tightly with NT4 making full use of NT's accounts database and NTFS for document security.
Client
End users will require IE 4 or above, Netscape 3 or above and Microsoft Office 95 or 97 (or just the required components such as Word or Excel).
Network
A TCP/IP based network.
Server Hardware
RAM Requirements

Number of Documents   Amount of RAM Required
up to 100,000         128 MB
up to 250,000         196 MB
up to 500,000         256 MB
Hard Disk Space
The EasyDocs software uses about 3 MB of disk space. However, most of the disk space on the server will be used by the documents that users upload from their PCs through their web browsers. EasyDocs is capable of indexing approximately 500,000 documents.
The table below gives an idea of the amount of disk space required for a given number of documents. It is based on the formula 1.4 * average size of document * number of documents.
Disk Space Requirements in Gigabytes

Average Doc Size   Number of Documents
                   up to 100,000   up to 250,000   up to 500,000
10K                1.4             3.5             7.0
50K                7.0             17.5            35.0
100K               14.0            35.0            70.0
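The sizing rules above are simple enough to express directly. The sketch below applies the stated formula (1.4 * average size of document * number of documents) and the RAM tiers from the RAM requirements table. The function names are hypothetical, and decimal units (1 GB = 1,000,000 KB) are assumed because they reproduce the published figures.

```python
# RAM tiers from the RAM requirements table: (maximum documents, RAM in MB).
RAM_TIERS_MB = [(100_000, 128), (250_000, 196), (500_000, 256)]

OVERHEAD_FACTOR = 1.4   # multiplier from the formula in the text
KB_PER_GB = 1_000_000   # assumption: decimal units, which match the table


def disk_space_gb(avg_doc_kb: float, num_docs: int) -> float:
    """Estimated disk space in GB: 1.4 * average size * number of documents."""
    return OVERHEAD_FACTOR * avg_doc_kb * num_docs / KB_PER_GB


def ram_required_mb(num_docs: int) -> int:
    """Look up the RAM tier that covers the given number of documents."""
    for max_docs, ram_mb in RAM_TIERS_MB:
        if num_docs <= max_docs:
            return ram_mb
    raise ValueError("more than 500,000 documents exceeds the documented capacity")


# Example: 250,000 documents averaging 50K each.
print(round(disk_space_gb(50, 250_000), 1), "GB disk,", ram_required_mb(250_000), "MB RAM")
```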
Cost
EasyDocs itself has a very simple pricing scheme of US$30 per user license. For example, 20 licenses must be purchased for 20 people within an organization to use the system on individual computers. Additional costs may be incurred to meet the server and system requirements described above.
7.3.2.2 Customized Information Management Systems
7.3.2.2.1 US EPA Environmental Information Management System (http://www.epa.gov/eims/eims.html)
Overview
EPA's Office of Research and Development (ORD) has developed a scientific environmental information management system (EIMS) that stores, manages, and delivers descriptive information (metadata) for data sets, databases, documents, models, multimedia, projects, and spatial information.
EIMS is a repository of products and metadata. The descriptive information in metadata enables users to evaluate and use these products. EIMS stores and maintains descriptive information in a relational database (Oracle) and refers to the products (data, documents, etc.) stored either within EIMS or as distributed external files. This architecture supports the management of remote sensing data, geographical information system (GIS) coverages, and other types of data for which entry into relational tables is not appropriate. Descriptive information stored within EIMS is consistent with the Federal Geographic Data Committee (FGDC) metadata content standards for spatial data. A significant enhancement of these standards, however, is the addition of a hierarchical metadata framework that organizes detailed scientific data and documentation, and accommodates customized information at the catalog level to facilitate a review of the different types of metadata in EIMS.
The EIMS repository of scientific documentation, accessed with standard web browsers, places a virtual library on the desktop of EPA staff and others with Internet access. Users can search within EIMS to find information sources of interest based upon topic or defined criteria related to types of environmental resources, geographical extent, date, or content origin. These user-defined searches typically are more efficient than currently used web search engines. The basic EIMS Search Form is seen in Figure 7.3.8.
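The catalog pattern described above, in which descriptive metadata is held in a relational table that points to products stored elsewhere, can be illustrated with a small sketch. This is not EIMS code: EIMS uses Oracle, whereas the example uses Python's built-in sqlite3 module as a stand-in, and the table layout, column names, and sample records are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metadata (
           id INTEGER PRIMARY KEY,
           title TEXT, resource_type TEXT, region TEXT,
           pub_date TEXT, origin TEXT,
           location TEXT)"""  # location points to the product, internal or external
)
conn.executemany(
    "INSERT INTO metadata (title, resource_type, region, pub_date, origin, location) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Shoreline land cover", "spatial", "Lake Ontario",
         "2001-06-01", "Agency A", "ftp://example.org/landcover.zip"),
        ("Wetland model report", "document", "St. Lawrence River",
         "2002-01-15", "Agency B", "internal:doc/42"),
        ("Water level records", "data set", "Lake Ontario",
         "1999-03-10", "Agency A", "internal:db/levels"),
    ],
)


def search(region=None, resource_type=None, since=None):
    """Return (title, location) pairs matching the optional search criteria."""
    clauses, params = [], []
    if region is not None:
        clauses.append("region = ?")
        params.append(region)
    if resource_type is not None:
        clauses.append("resource_type = ?")
        params.append(resource_type)
    if since is not None:
        clauses.append("pub_date >= ?")  # ISO dates compare correctly as text
        params.append(since)
    sql = "SELECT title, location FROM metadata"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY pub_date"
    return conn.execute(sql, params).fetchall()


print(search(region="Lake Ontario", since="2000-01-01"))
```

Because only the metadata lives in the relational table, large products such as GIS coverages can remain as files while still being discoverable through the same query interface.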
Figure 7.3.8 - EIMS Search Form
Figure 7.3.9 - EIMS Metadata Search Results List
When a search is submitted, the system returns a list of items matching the search criteria (Figure 7.3.9). These are hot-linked, and users can click on a link to go to a summary of the metadata for that item (Figure 7.3.10). This summary then has links in the left-hand portion of the screen that allow the user to learn more about the data set or to download the data from its source.
Figure 7.3.10 - EIMS Metadata Summary Form
Both EIMS partners and public users of EIMS may submit directory entries to be considered for inclusion in the EIMS database. A data administrator reviews all submissions prior to approval. Approved directory entries are then viewable by designated EIMS users. By design, all directory entries submitted by public users are public and viewable by all users of EIMS. Partners may wish to restrict some of their descriptive information to internal use only, and EIMS has the security options needed to permit different levels of access to metadata.
Cost
Cost information for the development of EIMS has been requested from EPA, but was not available at the time of writing. Based on knowledge of design projects such as this, it is estimated that the cost for the basic development of such a search engine by an IT consultant would be in the range of $75,000 to $100,000. Additional costs would be incurred for initial population of the catalog as well as for maintenance and updates.
7.3.3 Presentation Options for Study Content and Decision Support
The Shared Vision Planning approach to be implemented includes a Shared Vision Model (SVM), an example of which Bill Werick has presented to Study Participants during the past year. The SVM provides an excellent framework to integrate the varied model outputs, each of which addresses a particular Performance Indicator (PI) in a particular resource sector. As such, and given its use as a decision support tool for Shareholders in the Lake Ontario – St. Lawrence River region, it provides a means of presenting, in an organized and meaningful way, the immense body of information that will be available upon completion of the Study.
A review of the technical work that supports the SVM could be presented in the following way. Hypertext could be placed within the SVM that links users to overviews of individual PI models. From a PI model overview, further hypertext could direct users to detailed reports and/or articles (peer-reviewed, popular, etc.), metadata files for each model input, or web-mapping browsers. From the metadata, users could evaluate data sources, and download the data if desired and permissible.
Currently, the SVM is in workbook (MS-Excel) format. Alternatively, the model could be developed in STELLA, or in a number of other modeling environments. Moreover, the model could exist either as a stand-alone program (as it is presently) or be developed to be web-enabled, with access and use requiring only a web browser (e.g., Netscape or Microsoft Internet Explorer). Stand-alone and web-enabled information management systems, and examples thereof, are described below.
7.3.3.1 Stand Alone
An alternative to web-based database query systems is what can be called a “stand-alone” system. These are basically database management systems or information management systems that contain all the software and data necessary to run them on one or more CD-ROMs or DVDs. They can be run directly from the disc, or they can be installed (usually in unlimited numbers) on any computer you wish.
A good, common example of such a system (albeit a high-end one to develop) is Microsoft’s Encarta Encyclopedia. The latest version of this program contains all content, as well as the Graphical User Interface (GUI) used to access the data, on a single DVD. Users enter via a start-up screen (Figure 7.3.11) from which they search or query the database of articles, photos, maps, videos, etc. through a search index along the left margin, or via hot-linked topic titles in the main “browser” window.
Figure 7.3.11 - The Microsoft Encarta Start-Up Window
One of the concerns with stand-alone database programs of this nature is that once they are cut to CD or DVD, they are out of date. New information cannot be added to the database on the disc. Fortunately, Encarta, for example, provides a close integration with Microsoft’s Encarta web site so that users can download new information via a subscription service (free for the first year). While the downloaded data does not update the static data that came on the disc, it creates a secondary database file on the computer’s hard drive that the user can search in a seamless and integrated fashion.
Another example is provided below for an information management system that was developed for the Zambezi River basin in Africa. A wealth of information on the basin was available, including articles, fact sheets, photos, maps, videos, reports, etc. The World Conservation Union (IUCN) wanted to make all this information available in a central location for its own use, but also for many of the local planning agencies and officials in the various countries. While many of these agencies had some degree of computer access, not all had reliable internet access, so a web-based solution was not feasible. As such, a self-contained database management system was developed. The system resides entirely on one CD and can be run directly from the CD or installed on a computer’s hard drive. The data in this case includes a number of PDF files as well as photo and video files.
Figure 7.3.12 shows the main components of the GUI that users encounter. Across the top of the screen are “browser” buttons representing the key data types archived (bibliographies, factsheets, reports, maps, photographs, etc.). On the left side is a search window. This is set up in a simple Windows Explorer-like tree structure where users can search by geographic region or by topic and then collapse or expand the directories to see the types of data available. When an item is selected from the list on the left (in this case a map), it is displayed (via integration with Adobe Acrobat) in the browser window, which occupies the right part of the screen. If a user selects a browser button along the top, the search window on the left switches to show only that type of data for the region/sub-region or the topic/sub-topic selected.
Figure 7.3.12 - Zambezi River Information Management System GUI
Cost
The cost of developing systems like these will clearly vary depending on their complexity. The cost for development of the Zambezi River system above was on the order of US$300,000 over a three-year period. This figure also included, however, travel costs to Africa to research and collect the data, as well as development costs for a web-based version (integrated with a GIS map-viewer).
7.3.3.2 Web Enabled
Many of the examples discussed previously in this section use web-based database search and display functionality to some extent. The basic text search function (Section 7.3.1.2.1) searches a small “text index” database file and sends results back to the user. Commercial web survey software uses higher-end Oracle or SQL database software to store results and to generate reports back to the user. Feedback form data is populated into a database that is then searchable from the same web site.
These types of systems have a number of advantages, perhaps the best of which is the “currency” of information: they can be kept up to date and populated with the most recent data. These systems can be nicely integrated into an organization’s web site, they can make use of existing database management software ranging from the simple (Access) to the more complex (Oracle), and they can generally be maintained by personnel with a good understanding of web design and database programming languages. On the downside, such systems may require that specific proprietary web serving and hosting software be running on the machine hosting the web site. In addition, depending on the volume of information, upgrades of RAM, hard drive space and processing speed may be required. The cost of these items may in some cases limit their implementation.
7.3.4 Recommendations
It must be stated again that because a Study-wide Communications Strategy has not yet been developed, making fully informed recommendations regarding Study-wide aspatial information management is difficult. Given this lack of information, it would be prudent to err towards a more robust document management system that is scalable and possesses the capacity for extensibility. Designing and implementing a system that will not meet changing or currently unforeseen critical Study needs could prove very costly (and wasteful) in the long run. Designing a system that is extensible and scalable provides insurance against this.
The following system components and functions are recommended:
1) Commercial Off The Shelf (COTS) software for web-based document and other information management, such as Xerox’s DocuShare (see Section 7.3.2.1.1). This higher-end web-based document management system could prove extremely useful in meeting internal Study IM needs.
2) A web-site with documents and other information presented in a hierarchical structure. This is simply a recommendation for the organization of the existing web-site. Basic HTML text search functionality should be provided.
3) A web-enabled Shared Vision model. The IMS team views this model as having the potential for more than an excellent decision support tool. Its structure allows the integration of all essential information (i.e., links to model descriptions, model inputs, etc) that could facilitate evaluation of the Study and support the recommended data discovery, evaluation and access schema.
7.4 System Component Integration
An effective information management (IM) strategy provides for the integration of the components comprising an IM system. The primary system components, as recommended above, are:
- the three regional database management systems (DBMS), each consisting of hardware, software, data, and people,
- a web-mapping and geodata viewing application,
- the Study website,
- a Study-wide document management system with web interface, and
- a web-enabled version of the Shared Vision Model (SVM).
Given the distributed nature of the Study Participants and the stakeholders within the study region, the Internet should serve as the backbone for integrating the Study’s IM system. Study web pages and hyperlinks contained therein then serve as the means for providing linkages among the recommended applications as well as the collection of documents, databases, images, etc. that comprise the Study’s body of data and information.
Under this scenario, the Study website serves as the focal point, and point of departure, for all system functions. One link from the Study homepage would take the user to a
page (or separate site) devoted to geospatial data. From there, the user could access static maps, interactive web mapping services, and/or a metadata clearinghouse for data discovery. A second link from the homepage would take the user to the document delivery system, where Study Participants and users would be authenticated, and provided with access to different sets of documents at login. From there, the user could perform Boolean searches on the index generated from the complete text of every document in the system, and/or keyword searches on document metadata. A third linked page would be devoted to the Shared Vision Approach. The SVM would be described there and, when it becomes available, links to the web-enabled Shared Vision Model, a user manual, etc. would be incorporated. Finally, pages with frequently used or essential documentation or information could be accessed through additional links present on the homepage (some of these links and associated pages currently exist).
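The document delivery behaviour described above (authentication-dependent document sets combined with Boolean full-text search) can be sketched in a few lines. This is an illustration only, not the behaviour of any particular product: the documents, group names, and function names are hypothetical, and a real system would use a proper full-text indexing engine.

```python
from collections import defaultdict

# Hypothetical documents with the user groups allowed to see them.
DOCUMENTS = {
    "doc1": {"text": "wetland model report for Lake Ontario",
             "groups": {"public", "participants"}},
    "doc2": {"text": "draft shoreline erosion model inputs",
             "groups": {"participants"}},
    "doc3": {"text": "public meeting summary Lake Ontario",
             "groups": {"public", "participants"}},
}


def build_index(documents):
    """Build an inverted index mapping each word to the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
    return index


def boolean_search(index, documents, user_group, all_of=(), any_of=(), none_of=()):
    """Apply AND / OR / NOT terms, then keep only documents the group may access."""
    hits = set(documents)
    for term in all_of:                      # AND: every term must be present
        hits &= index.get(term.lower(), set())
    if any_of:                               # OR: at least one term must be present
        hits &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in none_of:                     # NOT: exclude documents with these terms
        hits -= index.get(term.lower(), set())
    return sorted(d for d in hits if user_group in documents[d]["groups"])


index = build_index(DOCUMENTS)
print(boolean_search(index, DOCUMENTS, "public", all_of=["model"]))
print(boolean_search(index, DOCUMENTS, "participants", all_of=["model"]))
```

The same query returns different results for the two groups, which is the effect of providing users with access to different sets of documents at login.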
In designing this system for the Study, the redundancy of user pathways to different system components, and their contents or applications, has been emphasized. For example, hyperlinks to a particular document should be present at all logical locations within the system. Take for example a geospatial database used as an input to a model that addresses a particular performance indicator (PI). The database would be listed or mentioned in the report that describes the model. The report would be available both in the hierarchy of “essential documentation” pages suggested above, and through searches via the document management system. A hyperlink on the name of that database would take the user to the metadata for that geospatial database which, in turn, provides the information for distribution, a link to the report, and a link to the Study’s mapping and data viewing application.
If a user entered a data discovery node (metadata clearinghouse) that is a part of the global spatial data infrastructure (GSDI), the metadata for that database could be accessed and retrieved after a search with the appropriate criteria. The database, with hypertext link to its metadata, could be present as a row listed in a page linked to the geospatial data page. The database would be a theme that could be turned on or off, via radio-button or check-box located next to the theme (i.e., database) name in the web mapping and data viewing application GUI. Links to the database’s metadata could be provided in that application through a right mouse click on the theme (i.e., database) name. A simple diagram of system component integration is presented in Figure 7.4.1.
This generic example can easily be extended to accommodate a wide range of user queries and the delivery of data and information from the system. Organizing data and information services via the existing Study website will allow for the efficient and simple query and transfer of information to both the public and Study participants. Moreover, because the system uses the familiar structure of a web portal as its central information store, users should be able to find the information they are searching for quickly.
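The redundancy of user pathways emphasized in this section can be checked mechanically once the site's links are written down. The sketch below models the pages and links from the example as a directed graph and counts the distinct click paths from the homepage to a dataset's metadata. The page names and link structure are hypothetical simplifications of the description above.

```python
# Hypothetical link structure: each page maps to the pages it links to.
SITE_LINKS = {
    "homepage": ["geospatial_page", "document_system",
                 "shared_vision_page", "essential_docs"],
    "geospatial_page": ["dataset_metadata", "web_mapping_app",
                        "metadata_clearinghouse"],
    "metadata_clearinghouse": ["dataset_metadata"],
    "web_mapping_app": ["dataset_metadata"],
    "document_system": ["model_report"],
    "essential_docs": ["model_report"],
    "model_report": ["dataset_metadata"],
    "shared_vision_page": [],
    "dataset_metadata": ["model_report", "web_mapping_app"],
}


def count_paths(links, start, target, visited=frozenset()):
    """Count paths from start to target that never revisit a page."""
    if start == target:
        return 1
    visited = visited | {start}
    return sum(
        count_paths(links, nxt, target, visited)
        for nxt in links.get(start, [])
        if nxt not in visited
    )


print(count_paths(SITE_LINKS, "homepage", "dataset_metadata"))
```

A count of one for an important resource would indicate a single point of failure in navigation; higher counts reflect the redundancy recommended here.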
Figure 7.4.1 - System Components Integration
8.0 IMPLEMENTATION
8.1 Data Discovery
The recommendations for data discovery address the immediate need for a data discovery mechanism within the Study and the importance of developing FGDC-compliant metadata. After all standardized metadata is generated, translated, and reviewed, the more robust data discovery mechanism with the SDI can be implemented. Because metadata creation can be a very time-intensive process, it is necessary to provide for the immediate needs of a data discovery mechanism separately from the need to develop fully compliant metadata. Figure 8.1.1 provides timelines for implementation of the four alternatives addressing data discovery needs (as well as the five related to storage, maintenance, access, and distribution needs).
In the first quarter of FY 2002, a comprehensive Study-wide table should be created that lists all datasets used or created by the Study along with basic metadata: dataset name, owner, scale, date, format, etc. This list should be periodically maintained through the second quarter of FY 2002 and remain available to Study Participants through FY 2003. In creating the data list, the process of generating FGDC-compliant metadata will begin. Study Participants responsible for metadata coordination will be identified, and the metadata collection process will begin with this basic information regarding the Study data. Beginning with the third quarter of FY 2002, and once fully compliant metadata has been generated for the Study, utilization of the GLINDA system will provide full search capability for Study datasets. Use of the basic, non-standard metadata table will continue beyond the initial inclusion of Study data in the GLINDA system to accommodate the variability in how quickly FGDC-compliant metadata can be created for certain Study datasets. As fully compliant metadata is generated for more of the Study’s data, the need for and usefulness of the data discovery table will be outweighed by the increased functionality of GLINDA and the spatial data infrastructure network. The short timeline associated with the implementation of the data discovery recommendations illustrates the immediate need for a data discovery mechanism for Study Participants and the importance of generating metadata as a component of the data development process.
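The interim data discovery table could be as simple as a delimited text file. The sketch below shows one possible layout using the basic fields named above (dataset name, owner, scale, date, format). The `fgdc_complete` flag, the class and function names, and the sample entries are hypothetical additions used to track which datasets still lack fully compliant metadata.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DatasetEntry:
    name: str
    owner: str
    scale: str
    date: str
    format: str
    fgdc_complete: bool = False  # hypothetical flag: True once FGDC-compliant metadata exists


# Hypothetical sample entries, for illustration only.
ENTRIES = [
    DatasetEntry("Shoreline basemap", "CDN TWG", "1:24,000", "2001-11-01", "shapefile", True),
    DatasetEntry("Wetland plots", "Environmental TWG", "n/a", "2002-02-15", "spreadsheet"),
]


def to_csv(entries):
    """Serialize the discovery table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DatasetEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def awaiting_fgdc(entries):
    """List the datasets that still need FGDC-compliant metadata."""
    return [e.name for e in entries if not e.fgdc_complete]


print(to_csv(ENTRIES))
print("Still to document:", awaiting_fgdc(ENTRIES))
```

Keeping such a flag alongside the basic fields would let metadata coordinators see at a glance which datasets must remain in the interim table and which can be discovered through the standards-based catalog.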
[Figure 8.1.1: timeline chart. Time axis: Short Term, Mid Term, Long Term, covering FY 2002, FY 2003, FY 2004 and FY 2005. Rows by group:
Discovery: Status Quo; Table; Catalog; SDI
Storage, Maintenance and Access: Status Quo; Single Repository; Single DBMS; IJC Distributed DBMS; Regionally Distributed DBMS; TWG Distributed DBMS
Options: OGC WMS Single System; OGC WMS Regionally Distributed; GIS Middleware Single System; GIS Middleware Regionally Distributed; OGC WMF Single System; OGC WMF Regionally Distributed
Document Management: Hierarchical Web Site Structure; Document Management Software; Web-Enabled Shared Vision Model]
Figure 8.1.1 - Alternatives and Options Timeline. In Figure 8.1.1, solid black lines indicate the probable timeframes in which an alternative or option could be implemented. Dashed black lines indicate the possible implementation (or production) timeframe. Green shading indicates the recommended timeframes for implementation and/or production phases.
8.2 Data Storage, Maintenance, Access, and Distribution
The principal recommendation for how the Study should address data storage, maintenance and access needs is the development of a Regionally Distributed DBMS, the only system that can potentially be supported in the long term. Because development of such a system would require a timeframe exceeding the longevity of the Status Quo system, we recommend that the Study look to extending the functional capacity of the FTP site solution to meet the needs of Study Participants in the short term. André Plante of ECQR suggested that, as a temporary solution, resources could be dedicated to accommodate the short-term data storage needs of the Study. By the beginning of FY2003, a Regionally Distributed DBMS to store data should be implemented; however, the extended Status Quo should remain functional through the second quarter of FY2003 in support of data transfers while the performance and reliability of the Regionally Distributed System are finalized. As recommended options to add greater functionality to the system, development and implementation of an OGC-compliant web mapping service and the necessary supporting middleware should begin in the first quarter of FY2003. While the possibility exists to implement an OGC-compliant web feature service sooner, the development and implementation of this recommended option should begin in the third quarter of FY2004. The recommended alternatives and options for how the Study should address data storage, maintenance and access allow for the potential support of the system in the long term.
8.3 Document and General Information Management Tools
Both the hierarchical web site structure and document management software can be implemented immediately. Content of the website and the document management system will be generated as the Study proceeds, making the organizational structure of this component of the IM system more logical and useful. While the publication of the Shared Vision Model in a web environment is technically feasible now, further development of the model itself, a more detailed evaluation of the technical options for online publication, and incorporation of the results of the modeling groups (PI response), will be necessary.
9.0 SUMMARY
The Common Data Needs Technical Working Group (CDNTWG) of the International Joint Commission’s Lake Ontario – St. Lawrence River Study was charged with the development and implementation of an Information Management Strategy (IMS). In response the CDNTWG assembled an IMS Team consisting of professionals either participating in the Study or associated with agencies or organizations in the Study region, that have relevant experience in information technologies. With assistance from a contractor, Pangaea Information Technologies, the IMS team has conducted a comprehensive Needs Assessment (NA) and hosted two workshops to aid in the formulation of the IMS. A summary of the needs assessment process and results, a survey of the strategies and policies adopted by other organizations, the strategy alternatives and options associated with the implementation phase and recommendations thereof, are presented in this document. After the Study Board selects the alternatives and options to be implemented, and Study IM policies to support those alternatives and options are endorsed, the CDNTWG will coordinate the development and implementation of a detailed Information Management Plan.
The chronology of the strategy development process formally started with the first IMS Workshop, conducted in October 2001. This meeting consisted of presentations of geographic information systems (GIS), database management systems (DBMS), and IM systems developed by a number of agencies and organizations, as well as review and discussion by the IMS Team of short- and long-term issues related to the management of the Study’s geospatial information. Next, the Needs Assessment Questionnaire was developed and distributed to the TWGs and the Public Interest Advisory Group (PIAG), and follow-up interviews conducted by Pangaea. A list of datasets associated with the inputs and outputs of models addressing Performance Indicators (PIs) was compiled as a result of this and associated efforts, and is presented as an appendix to this report. The second IMS workshop was conducted in February 2002, and consisted of three primary components: 1) a summary of the NA process, 2) IM resources (systems, knowledge bases, etc.) available to the Study, as well as IM strategies, policies, and “lessons learned” by organizations as structurally- and functionally-similar to the Study as possible, and 3) potential policies and alternative system architectures to meet Study needs. “Break-out groups” engaged in focused discussions on IM policy, technical
issues, and writing a proposal for funding assistance to help meet Study IM needs. The latter topic focused on a FY2002 Category 4 Cooperative Agreement Program Grant, funded jointly through GeoConnections (Canada), and its US counterpart, the Federal Geographic Data Committee (U.S. Geological Survey). [A grant proposal was subsequently submitted, and has been accepted. See Appendix IV for a summary of the proposed project.]
Study Properties and IM Needs
Several key Study properties and related facts (or perceptions) were identified in the above activities. Two primary user groups are associated with the Study: Study Participants (all TWG, PIAG, Study Board members), and the Public. The general flows of information in the Study start with the regulatory alternatives being passed to the Hydrologic and Hydraulics (H&H) TWG. The H&H TWG models the “levels and flows” scenario associated with each regulatory alternative, and provides these to the TWGs who evaluate resource sector response for selected Performance Indicators (PI). In addition to these “levels and flows”, which can be considered forcing or driving variables of the PI models, many other input variables are required by the TWG modeling groups. While many model inputs are derived directly from organizations outside of the Study, inputs from at least three sources inside the Study can be used: 1) the “basemap layers” provided through the CDN TWG, 2) model inputs obtained from other TWGs, or from modeling teams within the same TWG, and 3) model outputs (to be used as inputs in a different model).
All model outputs, aggregated as appropriate and valued in dollar ($) terms whenever possible, will be made available to all Study Participants and the Public. As suggested above, some of these model outputs may serve as inputs to models addressing PIs in the same or a different resource sector. Model approach, results, analysis, and discussion will be documented in report form, which will be made available to all Study Participants, and to the Public via the PIAG. Last, all model outputs will be transferred to the PFEG for incorporation in the Shared Vision Model, subsequently used by all Stakeholders.
In addition to the Study’s general flows of information, several key Study properties important to the Information Management Strategy were identified in this IM assessment and analysis phase.
- Modeling and data processing are widely distributed within the TWGs among contractors, affiliated agencies and TWG members; other Study Participants and regional resource stakeholders are also widely distributed.
- TWGs identified specific geospatial and aspatial datasets (both model inputs and outputs) as sensitive, for reasons related to security, proprietary, and liability issues.
- While communication and information/data transfer via email and FTP presently address Study needs, this short-term solution is anticipated to become insufficient within 6 months due to dramatic increases in data volumes and demand for that data.
- The IJC does not wish to serve in a data stewardship or distributor capacity after the Study ends.
- Making the Study process transparent to the public is essential to the Study’s success. This includes making model descriptions and datasets accessible to the public for review and evaluation.
- Providing data discovery, evaluation, and access for Study Participants has much potential to reduce redundancy and allow for more integrated modeling across resource sectors. Providing this functionality to the public is also desirable if data sensitivity and security can be ensured.
- Any IM system developed for the Study needs to be reliable, redundant (backed up), and secure.
One of the principle functions required of a system responsible for supporting the sharing of information is the ability of potential users to learn of the existence of and significant details about data or information. Popularly known as data discovery, such mechanisms employ search procedures which match user defined criteria against information held about the data, known as metadata. Metadata is essential for the data discovery process. The degree to which an organization generates metadata for data and information determines how effective a data discovery mechanism can be implemented. Standard geospatial metadata formats have been developed by the FGDC and ISO to ensure that all essential information about the data has been collected and represented in a consistently organized and searchable way. Standard-compliant metadata provides a common set of terminology and definitions to document data and allows an organization to maintain the investment made in collecting or generating geospatial data. Primary elements (text
Pangaea Information Technologies, Ltd. 105 USACE IJC LOSLR IMS
sections) of FGDC-compliant metadata include: Identification, Data Quality, Spatial Data Organization, Spatial Reference, Entity and Attribute, Distribution, Metadata Reference, Citation, Time Period and Contact Information. These common elements and any specific data elements contained within allow users to determine such things as the availability, fitness for use and accessibility of datasets. Metadata that is not standardized lacks the necessary completeness to publish information in a meaningful way, and should not be presented to the public. Because data discovery only provides information contained in metadata, it can be considered separate from most liability and security concerns associated with data access and distribution.
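The matching of user-defined criteria against metadata described above can be illustrated with a minimal sketch. The record layout here is a deliberately simplified stand-in and not an FGDC-compliant structure; the class name, field names, example records and coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Simplified, hypothetical metadata record (not a full FGDC record)."""
    title: str
    abstract: str
    keywords: list = field(default_factory=list)
    west: float = -180.0
    east: float = 180.0
    south: float = -90.0
    north: float = 90.0


def matches(record, text=None, bbox=None):
    """Return True if the record satisfies every criterion that was supplied."""
    if text is not None:
        haystack = " ".join([record.title, record.abstract, *record.keywords]).lower()
        if text.lower() not in haystack:
            return False
    if bbox is not None:
        west, south, east, north = bbox
        # Two boxes overlap unless one lies entirely to one side of the other.
        if record.east < west or record.west > east:
            return False
        if record.north < south or record.south > north:
            return False
    return True


def discover(catalog, text=None, bbox=None):
    """Return the titles of all records in the catalog that match the criteria."""
    return [r.title for r in catalog if matches(r, text, bbox)]


# Hypothetical catalog entries; values are illustrative only.
catalog = [
    MetadataRecord("Shoreline erosion model output", "Hypothetical example record.",
                   ["erosion", "shoreline"], -79.9, -76.0, 43.1, 44.3),
    MetadataRecord("Wetland vegetation survey", "Hypothetical example record.",
                   ["wetland", "vegetation"], -76.5, -74.0, 44.0, 45.2),
]

print(discover(catalog, text="wetland"))
print(discover(catalog, bbox=(-78.0, 43.0, -77.0, 44.0)))
```

Note that the search touches only the metadata and never the datasets themselves, which is why discovery can be treated separately from access and distribution concerns.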
Data storage, maintenance and access needs require the coordination and integration of the many responsibilities associated with the system and its data. Data owners hold the responsibility for data security, use and maintenance, and they have the authority to define and manage data access and distribution through the application of a flexible security model. Data stewards are responsible for the day-to-day maintenance of data. Given the authority by the data owner, data stewards are familiar with the issues and concerns specific to a data set. Consistent maintenance is essential to ensure the currency and quality of data. The integration of these responsibilities with the infrastructure and organizational procedures that support the system ensures reliability and sustainability.
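The division of responsibility between data owners and data stewards could be sketched as follows. This is only an illustration of one possible security model under stated assumptions: the class, its fields, and the user names are hypothetical, and a production system would rely on the access controls of the underlying DBMS rather than on application code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset entry carrying its own access rules."""
    name: str
    owner: str
    stewards: set = field(default_factory=set)
    readers: set = field(default_factory=set)  # users granted access by the owner
    public: bool = False

    def grant_read(self, acting_user, user):
        """Only the data owner may extend read access to another user."""
        if acting_user != self.owner:
            raise PermissionError(f"{acting_user} is not the owner of {self.name}")
        self.readers.add(user)

    def can_read(self, user):
        """Owners, stewards and explicitly granted users may read; anyone may if public."""
        return (self.public or user == self.owner
                or user in self.stewards or user in self.readers)

    def can_maintain(self, user):
        """Day-to-day maintenance is limited to the owner and delegated stewards."""
        return user == self.owner or user in self.stewards


# Hypothetical users and dataset, for illustration only.
ds = Dataset("example_dataset", owner="owner_agency", stewards={"steward_1"})
ds.grant_read("owner_agency", "twg_member")
print(ds.can_read("twg_member"), ds.can_read("member_of_public"))
print(ds.can_maintain("steward_1"), ds.can_maintain("twg_member"))
```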
The Internet continues to be the most commonly utilized method of distributing an information management system to a large number of dispersed users. Through consistent and well-designed implementation, an information management system can consist of one or many servers and provide simultaneous access to multiple users in many different locations. In establishing a shared environment from which data and information resources can be utilized, a client/server strategy can promote efficient system administration and access. The organization and design of the system will also need to address extensibility in terms of supporting the capacity to implement additional technical functionality after the initial implementation phase. Examples of such additional functionality are web services such as web mapping services (WMS) and web feature services (WFS) which provide functionality for interactive geospatial data viewing and querying over the Internet. This type of functionality would require the implementation of a system that allows for connectivity to multiple datasets, potentially over multiple systems.
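As an illustration of the kind of request a web mapping service handles, the sketch below assembles a GetMap request URL. It assumes version 1.1.1 of the WMS interface; the server address and layer names are hypothetical, and the function name is introduced here only for the example.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Assemble a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style for each layer
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server address and layer names.
url = build_getmap_url("http://example.org/wms", ["shoreline", "wetlands"],
                       (-80.0, 43.0, -74.0, 45.5), 800, 600)
print(url)
```

Because the client needs only a URL and receives an image in return, a map viewer built this way can draw on layers held on several different servers, which is the connectivity requirement noted above.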
With knowledge gained from a review of the IMS approaches and policies implemented by organizationally- and functionally-similar organizations, and the information about Study properties and needs (presented above), the IMS team synthesized a strategy and several specific approaches for implementation. The alternatives and options were divided into three distinct areas:
1) Geospatial Data Discovery
2) Geospatial Data Storage, Maintenance, Access, and Distribution
3) Aspatial (Document) Information Management
Data Discovery Alternatives and Options
Data discovery in the LOSLR Study is currently a function of gleaning information from documents detailing the Study organization and work plans and/or by “word-of-mouth.” The currency and completeness of this Status Quo approach are often poor. A second alternative would be to generate a tabular list of all data used or generated by the Study, and ask the respective data owners to add some brief metadata to their entries. The list could only be distributed to Study Participants because the non-compliant metadata would not be fit for public consumption. This alternative addresses the immediate need for inter-TWG data awareness in a limited manner, but does nothing to promote transparency and openness of the Study for the public.
A third alternative would be to develop a collection of standard-compliant metadata files, or metadata catalogue. This catalogue, which could be made available over the Study website, would represent a comprehensive list of information about data used in or produced by the Study. This alternative is the first to be fit for public consumption, addressing the need for public involvement and transparency in the Study process and thereby promoting its overall credibility. This alternative requires additional funding and a Study-wide commitment for the development and coordination of standard-compliant metadata, the latter requiring a formal metadata review process.
All data and information produced by the Study should be made discoverable for the Study Participants and the public-at-large through a standard metadata documentation and collection procedure. Data used as model inputs that are not produced by the Study,
and are readily available elsewhere, would only need to be cited appropriately in Study documentation. Regardless of the specific data sensitivity, all metadata should be made accessible; in fact, metadata provides an ideal location for presenting data liability disclaimers and use restrictions.
The IMS team believes that the IJC policy of producing Study documentation in both English and French, and supporting translation costs, must be adhered to for metadata as well. The rationale for this requirement was that evaluation of the Study at its most basic level – the models that address Performance Indicators (PIs) under the “levels and flows” associated with different regulatory alternatives – requires the ability to evaluate the metadata associated with model inputs. Given that the majority of these models have a spatial component, it seemed prudent to provide geospatial metadata in both English and French.
While the IMS team believes that providing for bilingual metadata is justified, it does not support translation of the attribute information of the data sets themselves. Assuming that overviews of every model and supporting technical documentation will be translated according to IJC policy, the IMS team did not feel that the cost of translating the regional databases themselves is warranted. Full evaluation of the modeling approach as well as inputs and outputs would be possible without translation of the databases. Moreover, the IJC should not bear costs of making every “new or value-added” geospatial dataset (including both PI-model inputs and outputs) immediately useful to the public, but only those costs necessary for full evaluation of the model approach. Translating metadata fulfills this latter requirement.
Options may be selected to further assist in the metadata development and coordination. The first of these options is the formation of a Metadata Review Team, which would conduct quality assurance and quality control on metadata as it is generated by the TWGs. The second option is the hiring of a Metadata Coordinator, who would coordinate all metadata training, provide assistance in metadata development, ensure completeness of metadata produced, and confirm compliance with FGDC 1998 metadata standards. The third option is to hold a Metadata Workshop for training all study participants involved in the production of standard-compliant metadata. Workshop training on metadata generation software could provide a jump-start to the metadata
creation process, and reduce the time spent by a Metadata Review Team and/or Metadata Coordinator over the course of the Study. The final option would be to design and implement Online Metadata Development Assistance. This service would help TWGs that are generating metadata by providing simple text instructions and easy-to-understand manuals, and by directing specific questions to an identified metadata expert (e.g., the Metadata Coordinator), who would be required to provide timely assistance.
The fourth and final alternative that addresses the need for Data Discovery is the participation in the spatial data infrastructure (SDI) of the United States and Canada. The SDI is a network of metadata providers that use a standard search protocol to allow access to metadata through a single data discovery portal. Participation in the clearinghouse networks requires FGDC- or ISO-compliant metadata, and a searchable server. Thus, this alternative incorporates the tasks necessary for implementation of the third alternative, i.e., production of the metadata catalogue.
Because participation in the SDI network requires the implementation of a searchable (i.e., Z39.50-compliant) server, the Study would most efficiently utilize resources by submitting metadata to an agency or organization that has already implemented a clearinghouse node server. An international directory of SDI networks connects the nodes of different clearinghouses to create a world-wide network of metadata clearinghouses. Thus, discovery of the Study data in the SDI alternative can occur from multiple data discovery portals and nodes. This mechanism for data discovery increases the exposure of the Study and has the potential to attract the interest of more individuals than just those who would have otherwise known of or found the Study website.
Recommendation of Data Discovery Alternatives and Options
To best address the need for Data Discovery and Evaluation, implementation of “SDI Participation” (Alternative 4) is recommended. In addition to its primary function, positive externalities of this alternative for the Study include becoming part of a developing service provided to the geospatial data community, facilitating the transparency of the Study, and enhancing the overall visibility of the Study through its inclusion in the Global Spatial Data Infrastructure (GSDI). Options 2 – 4 are also recommended: hiring a “Metadata Coordinator”, conducting a “Metadata Workshop”, and providing “Online Metadata Development Assistance”. The primary cost associated
with this alternative is related to the creation of metadata and the optional support functions. While some support for the SDI node may be appropriate (requested or required), the additional expense would be minimal.
The costs associated with these alternatives and options are presented in the table that follows (recommendations appear in bold text):
Data Discovery Alternatives, Options,               Time of           Probable Cost    Cost
and Recommendations                                 Implementation    (FY2002)         (Thru FY2005)
Alternative 1: Status Quo                           Now               $0.00            $0.00
Alternative 2: Data Table                           2nd Q, FY2002     $4,750.00        $4,750.00
Alternative 3: Metadata Catalog                     3rd Q, FY2002     $21,831.00       $54,576.50
Alternative 4: SDI Participation                    3rd Q, FY2002     $23,831.00       $56,576.50
Option 1: Metadata Review Team                      2nd Q, FY2002     $5,700.00        $14,250.00
Option 2: Metadata Coordinator                      2nd Q, FY2002     $40,000.00       $110,000.00
Option 3: Metadata Workshop                         2nd Q, FY2002     $500.00          $500.00
Option 4: Online Metadata Development Assistance    2nd Q, FY2002     $4,275.00        $4,275.00
Total Cost of Recommendations:                                        $73,356.00       $176,101.50
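The totals in the table can be checked with a short calculation, offered here as a sketch rather than as part of the original cost analysis. Summing the items named in the recommendation text (Alternative 4 and Options 2 through 4) gives $68,606.00 for FY2002; the stated totals are reproduced when the cost of Alternative 2 (Data Table) is also included. The dictionary keys and variable names are introduced only for this example.

```python
from decimal import Decimal

# (FY2002 cost, cost through FY2005), copied from the table above.
costs = {
    "Alternative 2": (Decimal("4750.00"), Decimal("4750.00")),
    "Alternative 4": (Decimal("23831.00"), Decimal("56576.50")),
    "Option 2": (Decimal("40000.00"), Decimal("110000.00")),
    "Option 3": (Decimal("500.00"), Decimal("500.00")),
    "Option 4": (Decimal("4275.00"), Decimal("4275.00")),
}


def total(items):
    """Sum both cost columns over the listed table rows."""
    fy2002 = sum((costs[i][0] for i in items), Decimal("0"))
    thru_fy2005 = sum((costs[i][1] for i in items), Decimal("0"))
    return fy2002, thru_fy2005


named = ["Alternative 4", "Option 2", "Option 3", "Option 4"]
named_total = total(named)
with_alt2_total = total(named + ["Alternative 2"])

print("Items named in the text:", named_total)
print("Including Alternative 2:", with_alt2_total)
```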
Policies that support metadata creation, review, and uploading to an SDI clearinghouse should be included as part of implementing this alternative. These policies would include adoption of the FGDC 1998 metadata content standard, ANSI Z39.50 compliance for server(s) holding the metadata catalog, and promotion of these standards at the contractual level. A standard clause should be included in all contracts related to data and information development, stating that required metadata is to meet all Study-approved content and quality standards. In addition to the metadata file itself, a data abstract (for use in data discovery) and a data citation should be submitted. Finally, all metadata should be made available in both English and French. Translation of the datasets themselves is not required.
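A completeness check for the submission requirements above might look like the following sketch. The submission layout (a dictionary with "metadata", "abstract" and "citation" keys, and metadata keyed by language code) is an assumption made for illustration; the policy text does not prescribe a file format, and this check does not test compliance with the FGDC content standard itself.

```python
REQUIRED_LANGUAGES = ("en", "fr")  # English and French, per the policy above


def submission_problems(submission):
    """Return a list of human-readable problems; an empty list means complete.

    Assumed (hypothetical) layout:
      submission["metadata"] -> {"en": "...", "fr": "..."}
      submission["abstract"] -> str
      submission["citation"] -> str
    """
    problems = []
    metadata = submission.get("metadata", {})
    for lang in REQUIRED_LANGUAGES:
        if not metadata.get(lang, "").strip():
            problems.append(f"metadata missing for language '{lang}'")
    for part in ("abstract", "citation"):
        if not submission.get(part, "").strip():
            problems.append(f"{part} missing")
    return problems


# Hypothetical submissions, for illustration only.
complete = {
    "metadata": {"en": "English metadata text", "fr": "Texte des métadonnées"},
    "abstract": "Short description used for data discovery.",
    "citation": "Example Agency (2002) Example dataset.",
}
incomplete = {"metadata": {"en": "English metadata text"}, "abstract": ""}

print(submission_problems(complete))
print(submission_problems(incomplete))
```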
Data Storage, Maintenance, Access, and Distribution Alternatives and Options
Six alternatives have been identified for addressing the needs for data storage, maintenance, access and distribution. While the implementation of multiple alternatives simultaneously was possible for data discovery, the alternatives here are much less compatible, with the possible exception being the temporary “implementation” of an extended status quo to accommodate the short-term needs of the Study during the development, testing, and final implementation of a better alternative. Additional options have also been identified that could be implemented with either of the two more functional alternatives.
The current data storage and access scheme implemented for the Study allows users (Study Participants) to store and access data in their local environments. Data transfers to non-local users require an FTP site, such as the one managed by the Canadian Centre for Inland Waters (CCIW), or various media (e.g., CDs, magnetic tapes). The system for data distribution is largely uncoordinated and fails to facilitate data integrity, security, back-ups or archiving. This system includes no active maintenance functionality for individual datasets: incremental changes to parts of a dataset cannot be made, and only wholesale replacement is possible. Considerations for public accessibility of data and long-term sustainability of data and systems have not been addressed under the current strategy. No immediate additional costs are associated with continuing with the Status Quo alternative; however, because the CCIW FTP site was intended as a temporary solution, a decision to continue with this strategy will likely require that additional capacity be added in the near future as the demand for its use increases.
The second alternative identified to address the need for coordinated data storage, maintenance, access and distribution is the implementation of a Single Repository for Study data. The repository would exist as a single FTP site to which users can be assigned rights and permissions according to their specific information needs. As a single location for all Study data, the repository would allow for much greater coordination of data distribution. Data integrity, security, back-up and archival would be facilitated in a single environment. The repository would be able to accommodate public access to data by providing limited access with read-only permissions or by implementing a webpage with hyperlinks to FTP downloadable files.
While more coordinated than the Status Quo alternative, a Single Repository has limited potential for facilitating long-term data sustainability. Data owners and corresponding data stewards with the ability, interest and motivation to ensure long-term data sustainability are likely to be less willing to manage data in a single system (read: national and provincial concerns and legal issues). As with the preceding alternative, this system would preclude the possibility of active or incremental maintenance functionality for individual datasets. The additional costs associated with the implementation of a single data repository would include the addition of storage volume on a system having ample bandwidth to accommodate the data transfer associated with data distribution.
The third alternative identified to address the need for a coordinated data management strategy involves the implementation of a Single Database Management System (DBMS) for data storage, maintenance, access, and distribution. Establishing a single DBMS in which data is loaded and stored in a relational database environment will facilitate the full integration of data into a comprehensive system. A database system in which data is stored in a logical structure will allow for data to be integrated into other systems and accommodate the application of other technologies much more effectively than the file structure approach of the previous two alternatives. The single location will facilitate data integrity, security, back-up and archiving. However, because long-term sustainability is dependent upon the willingness and ability of data owners and stewards to maintain datasets, this alternative, like the previous ones, hinders long-term sustainability by inhibiting regional ownership and stewardship.
Policies to provide for appropriate public accessibility would need to be established under the Single DBMS alternative. Similar to the single repository, a flexible data security model and standards for data transfer would need to be implemented. Costs associated with the single system alternative include hardware, software, development, training, implementation, and maintenance.
The three options for the Single DBMS alternative and the following three DBMS alternatives are: 1) the development and provision of interactive Data Viewing and Map Making using open source software, 2) the implementation of Proprietary Internet Mapping Services that offer more robust geospatial analysis functionality than that in the
first option, and/or 3) the implementation of system “Middleware” that allows the connection of geospatial applications in certain DBMS environments.
The fourth alternative identified to address the need for a coordinated data management strategy involves the implementation of a data system similar to the single DBMS described above, but divided and managed by the respective national offices of the IJC in Ottawa and Washington DC. A dual system would be developed and maintained in a consistent and interoperable manner so as to support seamless data access across national jurisdictions. By committing to the development and maintenance of a system managing data for the LOSLR Study by national jurisdiction, the IJC would build an infrastructure to support the data management needs of the LOSLR Study, and potentially, future studies. This alternative offers direct control over almost every aspect of systems development, implementation, and maintenance, without reliance on the coordinated effort of other agencies to form a functional information infrastructure. However, it takes advantage of neither the existing pool of available resources nor the long-term benefits associated with a more distributed, regional approach.
This alternative would require the Study Board’s support to equip the IJC national offices with the necessary hardware, software and expertise required to develop, implement and maintain interoperable geodata management systems. Because this approach requires the development of IM support staff and resources, the cost associated with this dual system is substantially greater than the regionally distributed alternative, which takes advantage of the infrastructure and established knowledge base of other regional organizations. However, while the cost is associated directly with the LOSLR Study’s IM system development, implementation, and maintenance, it could also be considered an investment for future studies and other IJC information management needs. So long as the IJC would choose to support and maintain scalable data management systems, future studies could take advantage of the infrastructure created by this alternative as well as the knowledge base established within the IJC as a result of the development and maintenance of the systems.
The fifth alternative identified to address the need for a coordinated data management strategy involves the implementation of a data system similar to the single system described above, but divided and managed at the regional level. The Regionally
Distributed DBMS most effectively addresses the need for regional partners to ensure the longevity of data associated with the Study. As with data owners, regional system maintainers would need to be identified. This data management model is the most flexible and progressive; it is endorsed and promoted by leaders in the public and private sectors, and by NGOs in the geospatial IT community.
The regionally distributed systems would be developed in a coordinated effort to ensure maximum consistency in system implementation and maintenance. Interoperability standards would need to be specified to ensure greater integration and connectivity to other systems and to more easily accommodate other technologies such as web services (WMS and WFS). At present, regional systems to be established through or as a part of the Great Lakes Commission (Great Lakes Information Network at the University of Michigan), Land Information Ontario (part of the Ministry of Natural Resources), and ECQR are probable candidates as regional components in this distributed set of DBMSs. While all three systems in a regionally distributed information system will have separate administration, consistency should be promoted during development to ensure a common approach to data storage, maintenance, access, and distribution. In addition to addressing seamless system development and implementation, data held in the systems will need to be clipped to a common boundary and/or made seamless in order to facilitate the overall consistency of the Study data. System development for this alternative will require more than that necessary for the Single DBMS, in order to accommodate the additional coordination of effort and system implementation. Options associated with this alternative are identical to those listed for the Single DBMS alternative but with additional considerations due to the distributed nature of this alternative. Because the Study data would be distributed across three servers, it will be necessary to implement middleware on each of the regional systems to allow a single web service to utilize all three stores of data, hence increasing cost.
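The role of middleware in letting a single service draw on all three regional stores can be sketched in simplified form. The classes below are in-memory stand-ins and do not represent any particular middleware product or regional system; the store names, layer names and feature identifiers are hypothetical.

```python
class RegionalStore:
    """Hypothetical stand-in for one regional DBMS holding Study features."""

    def __init__(self, name, features):
        self.name = name
        self._features = features

    def query(self, layer):
        """Return the features this store holds for the requested layer."""
        return [f for f in self._features if f["layer"] == layer]


class Middleware:
    """Presents several regional stores to a client as one queryable source."""

    def __init__(self, stores):
        self.stores = list(stores)

    def query(self, layer):
        results = []
        for store in self.stores:
            for feature in store.query(layer):
                # Tag each feature with its origin so the steward can be traced.
                results.append({**feature, "source": store.name})
        return results


# Three hypothetical regional stores with illustrative contents.
stores = [
    RegionalStore("region_a", [{"layer": "shoreline", "id": 1}]),
    RegionalStore("region_b", [{"layer": "shoreline", "id": 2},
                               {"layer": "wetlands", "id": 3}]),
    RegionalStore("region_c", [{"layer": "wetlands", "id": 4}]),
]
middleware = Middleware(stores)
shoreline = middleware.query("shoreline")
print(shoreline)
```

The client issues one query and receives one merged result; the additional cost noted above arises because each regional system must expose a compatible query interface for this to work.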
A final alternative that should be considered to address the need for coordinated geospatial data management involves implementing data systems similar to the “Single DBMS” described above, but with components distributed among TWGs. This approach has several advantages, although these are confined to activities that will take place during the Study. The TWG Distributed DBMS alternative would place the data and system in relatively close association with the data developers and initial
data users. As such, reliable access and control over the data have the potential to increase the overall motivation required for system upkeep during the Study. Moreover, because the system and geodata would be managed by that data’s primary user-group, data currency and integrity should be maintained.
As with the other distributed DBMS alternatives, interoperability is essential. In order for Study Participants or the public to simultaneously access multiple geospatial databases that are distributed among TWG servers for interactive data viewing or map-making, each system component should be standardized for consistency across the Study as much as possible. Insofar as the systems are standardized, interoperability and the potential for providing connective features such as web services would be promoted.
Because this approach includes datasets that encompass international and provincial boundaries, unlike the “Regionally Distributed DBMS” alternative, securing data owners with the motivation to provide for database maintenance beyond the Study’s terminus could prove problematic. Likewise, the system longevity would be dependent on securing a motivated steward prior to the completion of the Study. Obviously, this alternative would require the Study Board’s support through the allocation of funding required to implement a large network of distributed systems, one for each individual TWG.
Recommendation for Data Storage, Maintenance, Access, and Distribution
To address the need for Data Storage, Maintenance, Access, and Distribution, implementation of a “Regionally Distributed System” (Alternative 5) is recommended. The system recommended in Alternative 5 would be distributed among the three political regions (Quebec, Ontario, and New York State) that comprise the Study area. Because the IJC does not wish to serve in a data maintenance capacity beyond the life of the Study, data owners and stewards will need to be assigned to ensure long-term sustainability of data. Regional agencies have the necessary interest in the datasets and motivation to ensure the data’s longevity. This alternative increases the likelihood that the system and the data will remain sustainable in the long term, and can be recommended because of the existing resources available to the Study in the form of regional DBMSs and knowledge bases. [A DBMS approach is also recommended because of its unique ability to address identified Study needs.]
Options 1 and 3 are recommended: establishment of web-based Data Viewing and Mapping, and installation of “middleware” to provide for system interoperability and OpenGIS Consortium compliance for other Open Web Services (OWS).
The costs associated with these alternatives and options are presented in the table that follows (recommendations appear in bold text):
Data Storage, Access and Distribution Alternatives,    Time of           Probable Cost    Cost
Options, and Recommendations                           Implementation    (FY2002)         (Thru FY2005)
Alternative 1: Status Quo                              Now               $0.00            $0.00
Alternative 2: Single Repository                       2nd Q, FY2002     $20,600.00       $50,300.00
Alternative 3: Single DBMS                             3rd Q, FY2002     $128,000.00      $248,000.00
Alternative 4: IJC Distributed DBMS                    1st Q, FY2003     $256,000.00      $496,000.00
Alternative 5: Regionally Distributed DBMS             1st Q, FY2003     $143,125.00      $291,625.00
Alternative 6: TWG Distributed DBMS                    3rd Q, FY2003     $304,200.00      $502,200.00
Option 1: Data Viewing and Mapping (WMS)               1st Q, FY2003     $12,000.00       $12,000.00
Option 2: Proprietary Internet Map Services            1st Q, FY2003     $30,000.00       $61,200.00
Option 3: Middleware                                   1st Q, FY2003     $11,550.00       $17,550.00
Total Cost of Recommendations:                                           $166,675.00      $312,175.00
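As a sketch of how the totals relate to the recommended items (Alternative 5 and Options 1 and 3), the calculation below sums both cost columns. The FY2002 figures sum to the stated $166,675.00. The through-FY2005 figures as printed sum to $321,175.00, which differs from the stated $312,175.00; the table does not indicate which figure is in error. Variable names are introduced only for this example.

```python
from decimal import Decimal

# (FY2002 cost, cost through FY2005), copied from the table above.
recommended = {
    "Alternative 5: Regionally Distributed DBMS": (Decimal("143125.00"), Decimal("291625.00")),
    "Option 1: Data Viewing and Mapping (WMS)": (Decimal("12000.00"), Decimal("12000.00")),
    "Option 3: Middleware": (Decimal("11550.00"), Decimal("17550.00")),
}

fy2002_total = sum((v[0] for v in recommended.values()), Decimal("0"))
thru_fy2005_total = sum((v[1] for v in recommended.values()), Decimal("0"))

stated_fy2002 = Decimal("166675.00")
stated_thru_fy2005 = Decimal("312175.00")

print("FY2002:", fy2002_total, "matches stated:", fy2002_total == stated_fy2002)
print("Thru FY2005:", thru_fy2005_total,
      "difference from stated:", thru_fy2005_total - stated_thru_fy2005)
```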
Policies essential to the implementation of the recommended alternative and options are:
All primary Study participants (e.g., Study Board, PIAG, and TWG members) should be given access to all data and information utilized and/or produced by the Study, with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns. All other interested parties should be given access to any data and information which is considered new or having value added to it by activities of the Study,
with the exception of data and information having special security, liability, privacy, licensing, or proprietary concerns. “New data or information” could be defined as that which did not exist prior to Study activities and was generated from primary data collection procedures as a direct result of Study activities, i.e., model output or results. “Value-added data and information” could be defined as that which has been significantly improved as a result of Study activities in either its content or usability. Data owners, and especially data stewards, should be identified as early as possible prior to the end of the Study.
Aspatial Information Management Alternatives and Options
Without question, other flows of information will be necessary for the Study to be successful. In particular, it is likely that administrative and document management tools will become increasingly desirable as the Study progresses. However, without a Communications Strategy or specific policies for internal reporting procedures or functions, specific recommendations are difficult. Given this lack of information, it would be prudent to err towards a more robust document management system that is scalable and possesses the capacity for extensibility. Designing and implementing a system that will not meet changing or currently unforeseen critical Study needs could prove very costly (and wasteful) in the long run. Designing a system that is extensible and scalable provides insurance against this.
Having said this, the following document management system components and functions are recommended:
1) Commercial Off The Shelf (COTS) software for web-based document and other information management, such as Xerox’s DocuShare (see Section 7.3.2.1.1). This higher-end web-based document management system could prove extremely useful in meeting internal Study IM needs.
2) A website with documents and other information presented in a hierarchical structure. This is simply a recommendation for the organization of the existing website. Basic HTML text search functionality should be provided.
3) A web-enabled Shared Vision model. The IMS team views this model as having potential beyond that of an excellent decision support tool. Its structure allows the integration of all essential information (i.e., links to model descriptions, model inputs, etc.) that could facilitate evaluation of the Study and support the recommended data discovery, evaluation and access schema.
System Component Integration
The primary system components, as recommended above, are:
the three regional database management systems (DBMS),
a web-mapping and geodata viewing application,
the Study website,
a Study-wide document management system with web interface, and
a web-enabled version of the Shared Vision Model (SVM).
Given the distributed nature of the Study Participants and the stakeholders within the study region, the Internet should serve as the backbone for integrating the Study’s IM system. Study web pages and hyperlinks contained therein then serve as the means for providing linkages among the recommended applications as well as the collection of documents, databases, images, etc. that comprise the Study’s body of data and information.
Under this scenario, the Study website serves as the focal point, and point of departure, for all system functions. One link from the Study homepage would take the user to a page (or separate site) devoted to geospatial data. From there, the user could access static maps, interactive web mapping services, and/or a metadata clearinghouse for data discovery. A second link from the homepage would take the user to the document delivery system, where Study Participants and users would be authenticated, and provided with access to different sets of documents at login. From there, the user could perform Boolean searches on the index generated from the complete text of every document in the system, and/or keyword searches on document metadata. A third linked page would be devoted to the Shared Vision Approach. The SVM would be described there and, when it becomes available, links to the web-enabled Shared Vision Model, a user manual, etc. would be incorporated. Finally, pages with frequently used or essential
documentation or information could be accessed through additional links present on the homepage (some of these links and associated pages currently exist).
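The Boolean full-text search described above rests on an index built from the text of each document. The following is a minimal sketch of that idea and not a description of how any particular document management product implements it; the document identifiers and text are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term (Boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Documents containing at least one term (Boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Hypothetical document collection, for illustration only.
documents = {
    "plan": "Plan of Study for water levels and flows",
    "minutes": "Meeting minutes on wetland water quality",
    "memo": "Memo on shoreline erosion",
}
index = build_index(documents)

print(search_and(index, "water", "levels"))
print(search_or(index, "wetland", "erosion"))
```

In a system with authenticated users, the result set would additionally be filtered to the documents the logged-in user is permitted to see.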
In designing this system for the Study, the redundancy of user pathways to different system components, and their contents or applications, is emphasized. For example, hyperlinks to a particular document will be present at all logical locations within the system. Organization of data and information services via the existing Study website will allow for the efficient and simple query and transfer of information to both the public and Study Participants. Moreover, by utilizing the familiar structure of web portals as a central information store, users of the system should be able to quickly find the information that they are searching for.
10.0 REFERENCES
International Joint Commission (2000) Tenth Biennial Report on GLWQA http://www.ijc.org/comm/10br/en/chap6en.html, accessed 3/26/02.
International Joint Commission (1999) Plan of Study for Criteria Review in the Orders of Approval for Regulation of Lake Ontario - St. Lawrence River Levels and Flows http://www.ijc.org/boards/islrbc/pos/pose.html, accessed 3/26/02.
International Joint Commission (2000) Directive to the International Lake Ontario-St. Lawrence River Study http://www.losl.org/about/mandate-e.html, accessed 3/26/02.
Appendix I: Needs Assessment Questionnaire distributed to all TWGs
Please answer each question as thoroughly as you are able. We recognize the likelihood that no one person receiving this questionnaire will be able to respond to every question. Please note that the response box size should not dictate the length of your response; the boxes will expand to accommodate more lengthy responses.
What is your vision of an implemented information management system for the study? What characteristics would that system possess?
Responses may be different for different data sets or information types. In these cases, identify the data set or information type to which you are referring in your response (see example).
A. Data Identification (Shared Vision)
What Performance Indicators (PI) has your group identified?
What data has your group identified that is necessary to support the PIs?
B. Data Collection/Acquisition
What data has been collected/acquired? How was the data collected? Who collected the data? (e.g., TWG member, contractor, other agency) Is the data or information associated with a particular geography within the study area?
What data is currently planned for collection/acquisition? When is collection of the data planned? How will the data be collected? Who will collect the data? (e.g., TWG member, contractor, other agency) Will the data be associated with a particular geography within the study?
In addition to data listed above, what data is still needed to support the PIs identified by your group? (i.e., are there any data gaps?) Who do you expect will fulfill the data need?
Who provided (or will provide) funding for the data acquisition? Who owns (or will own) the data?
C. Modifying/Manipulating Data (To get data to usable format)
What modification/manipulation of the data is necessary to get the data to a usable format? What is that format?
What data standards are (or will be) applied to the data? (e.g., SDSFIE, OGC, ISO)
What quality assurance and quality control procedures are (or will be) in place?
D. Data Use: Creating Information from the Data (via Display, Query, Analysis, Modeling)
For what activities, tasks, procedures are the data to be used? (e.g., as inputs to models, for displaying locations of wetlands sensitive to water levels, etc.)
What are the information products of the activity?
How will these information products be used? Will special software or hardware be necessary for use? Are there proprietary issues associated with the software? Will training be required?
Who needs to use the information? Do they have the capability to use it? If they do not, how will this capability be achieved?
Are there any restrictions regarding the appropriate use of the data and/or information? Are there liability concerns?
E. Data Cataloging/Indexing (Metadata) Note: If metadata (information about data sets) exists in digital format, please attach files to your response.
Does metadata exist for current data? What metadata standards were used? How will metadata be handled in the future?
What relationships exist between attributes of different data sets? (e.g., What key attributes exist for the data sets?) [See Glossary.doc, if necessary.]
F. Data Storage/Management
Where does your group currently store data?
What are the limitations, if any, of your current data storage system?
What, if any, are the current maintenance schedules for your data? What is (or will be) the frequency of data updates? Who is responsible for data maintenance?
Do you use the ftp site or study website for information management purposes? If so, do they meet your needs? Would an Intranet or dedicated, secure website be more convenient for you?
G. Data Distribution/Networking
Who needs access to the data and/or information? (e.g., TWG members, Board members, the Public-At-Large, Permitted users such as certain researchers, etc.) Do these parties currently have the capability to access the data?
Are there (or will there be) any data sharing agreements required to have access to data and/or information?
Are there security issues with the data and/or information? What potential security problems do the data and/or information pose?
H. Archiving
What information needs to be archived?
Will the archive need to be publicly accessible?
How are people going to use the information in the future (e.g., ten years from now)?
Where should the data ultimately reside? Who will or should pay to maintain this archive?
Is it desirable to archive key correspondence? Do you currently archive key correspondence? If correspondence is currently archived, how is this achieved?
Appendix II: Lists of model inputs and outputs as defined by Technical Working Groups (TWGs) or extracted from the Plan of Study
Common Data Needs TWG
Outputs
- Topographic LIDAR data
- Existing ortho-imagery
- Color IR aerial photos
- IR aerial photos
- Bathymetric data (SHOALS and other sources)
- Half-meter contours
  o TWG-defined and other elevation products
  o Derived from LIDAR, SHOALS, and other sources
- Base mapping layers
  o Shoreline units
  o Shoreline data
  o Watershed boundaries
  o Political boundaries
  o Transportation features
  o Conservation authority boundaries
- IKONOS 4m multi-spec
- Digital Raster Graphics (DRGs, scanned topos) and Canadian equivalent
H&H TWG
Outputs
- Three different hydrologic scenarios:
  1. Recorded historic sequence
  2. Stochastically generated sequences (10,000 yrs long)
  3. Climate change scenarios
- Time series datasets for each scenario:
  o Net basin supplies to the Great Lakes
  o Outflows from the Ottawa River and from other key downstream tributaries
  o Hydraulic effects of ice and vegetation
  o Diversions
- For each of the 12 datasets (3 scenarios x 4 datasets for each scenario):
  o Levels for Lake Ontario
  o Levels and flow for river systems (includes flow splits around islands, velocity distributions, water depths, and other detailed hydraulic information if necessary)

Inputs
- Bathymetry
- Water level
- Discharge
- River velocities
- Precipitation
- Temperature
- Site specific hydraulic relationships
- Shoreline
- Statistical characteristics of hydraulics
- GCM outputs
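The count of 12 H&H datasets follows from pairing each of the 3 hydrologic scenarios with each of the 4 time series. As a small illustration only, the pairing could be enumerated as a catalogue; the short labels below are abbreviations of the list entries above, and the function and field names are hypothetical.

```python
from itertools import product

# Labels abbreviated from the H&H list; they are not official dataset names.
SCENARIOS = [
    "recorded historic sequence",
    "stochastically generated sequences",
    "climate change scenarios",
]
TIME_SERIES = [
    "net basin supplies",
    "tributary outflows",
    "ice and vegetation effects",
    "diversions",
]


def dataset_catalog():
    """Pair every scenario with every time series (3 x 4 combinations)."""
    return [{"scenario": scenario, "series": series}
            for scenario, series in product(SCENARIOS, TIME_SERIES)]


catalog = dataset_catalog()
print(len(catalog))
```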
Hydropower TWG
Outputs
- Value of power generation
- Timing of outflows ~f(seasonal and daily demand for electricity)
- Replacement cost of generation
- Environmental impacts of replacing hydropower with generation from fossil plants

Inputs
- Demand curves and supply curves for generation
- Cost of emissions that are offset by hydropower generation
- Entity-specific market value of generation during various seasons and peak times of the day
- Historical river levels, head, and outflows at hydro-plants
- Historical hydro-plant generation data
- Measured noxious waste emissions
- Noxious waste: MWh ratios
- Noxious credit value
- Predicted effects of hydro-plant upgrades
Coastal TWG
Outputs
- Loss of land (riparian, public, environmental) and buildings
- Preservation of beaches, and other depositional areas
- Costs for shore protection (new or maintenance of old)
- Percentage of shoreline requiring new or improved shore protection
- Costs of preserving natural lands
- Percentage change in average annual recession rates of different shore types
- Flooding (areas, location, timing, duration)
Inputs
- Hypsography and Digital Elevation Models (DEM)
  o Contours
  o Profiles
- Building footprints or point locations (structural type)
- Recreational facilities (camps and campgrounds)
- Digital Line Graphs (or local source data)
  o Roads and rail
  o Pipes and wire infrastructure (sewers, water and gas lines, etc.)
- Heinz factor
- Property values (including assessment data if necessary)
- Parcel
- Ownership
- Land use (current, future; trend)
- Zoning
- Water levels and flows
- Water depths
- Basic shore type (e.g., bedrock, bluff, beach, riverbank, etc.)
- Stratigraphy (i.e., is it a composite bluff, is it homogeneous?)
- Sand content for bluff shorelines (how much sand is available to be put into transport and possibly build beaches)
- Bluff heights, slopes, degree of gullying (number of gullies)
- Types of shore protection in place (again based on existing classification scheme)
- Quality of shore protection (where discernible)
- Shore protection trends, if available; otherwise from:
  o FDRP 1:2000 shoreline mapping
  o Historical shoreline mapping
- Crest elevation and toe elevation
- Recreational boating structures (docks, marinas, ramps, etc.); any existing mapping information?
- Shore management plans
- Nearshore geology
- Wind data
- Wave data
- Ship wake wave data
  o Ship traffic by vessel size and type
  o Rec. boat traffic by vessel size
- Wave run-up and storm water frequencies
- Ice conditions (percent and duration of ice cover, thickness, armouring effects, etc.)
- River currents
- Sediment transport rates
- Sediment budgets
- Sediment characteristic studies (grain sizes, etc.)
- Soil erodibility (for river banks)
- Soils databases
- Finite element flow fields for various river stages for both Cornwall-Montreal and Montreal-Trois-Rivières (EC)
- Recession rates
- Canadian and U.S. topographic maps (DRGs if available)
- Ontario Base Maps; any Quebec equivalent?
- Digital orthophotography
- Aerial photography
  o Most recent air photos, ideally at a 1:6000 scale or better, can assist in defining shoreline features or in digitizing shore features and other information. It would also be useful (essential for at least some if not all of the local/regional study sites) to have air photo sets from previous years. Basically we should catalogue here any and every air photo set that is potentially available. Oswego County Soil and Water Conservation District has a set of vertical aerials of the ELO sandy shoreline, both Jefferson and Oswego Counties, consisting of one set from each decade: 1938, 1942, 1955, 1964 or 65, 1974, 1984, 199?
  o Especially interesting would be any sets of photos prior to 1958 (e.g., 1930s and 1950s), as these may help in examining recession rates during the pre-regulation years. It may not be that important to assess pre-regulation recession rates, as these simply are associated with a unique water level range (and, importantly, wave climate), as are the post-regulation changes. In other words, there is not a large advantage in testing/calibrating the model to one period or the other for most shore types.
- Ground and oblique photography: basically any “field” photography that may exist that was taken from boats, planes, cars, or on foot. This may help us better define shore features or protection structures, for example.
- Videos: we used aerial video to assist in classifying the US shoreline and it was extremely useful. Thus if any recent video of any portions of the shoreline can be found, it may be very useful to us. OMNR and EC have already identified some video for the Ontario shoreline; we have 1998 for the US. Quebec is unknown.
- Government or consulting reports: any coastal investigations and reports generated by consultants, government agencies, etc. will be useful, if only for background information. This might include site-specific consulting reports related to shorefront development.
  o Master plans
  o Zoning ordinances
  o Setback regulations
  o Land use management practices
Rec. Boating and Tourism TWG
Outputs
- Marina, yacht club (available docks) use
- Number of boaters and boater-days
- Water related tourism ($)
- Regional economic activity
- Effect of alternative plans on recreational benefits, related revenues, and adaptation costs for boaters and service providers
Inputs
- Depth and specific conditions at marina and yacht club sites
- General bathymetry
- Water levels
- Marina and yacht club site specific data
  o Number and type of infrastructure (e.g., buildings, docks, ramps, etc.)
  o Infrastructure protection features and types
  o Depth constraints
  o Adaptation measures and types
  o Dock occupancy at different times (seasonal and water level specific)
  o Number, type and draft of boats
  o Damages (type, degree, and economic value)
- Boater information
  o Basic demographic and revenue characteristics
  o Type of boaters
  o Boater days in the lake and the river (and for substitution sites)
  o Actual expenses (total and by type)
  o Willingness-to-pay to keep the water level acceptable (according to water level scenarios)
  o Attitudinal questions (appreciation of water level and quality)
- Water related tourism information
  o Tourism population (demographic and revenue characteristics by activity type)
  o Type of tourism (e.g., distraction, adventure, “eco-tourism”, etc.)
  o Tourism days in the lake and the river (and for substitution sites)
  o Actual expenses (total and by type)
  o Willingness-to-pay to keep the water level acceptable (according to water level scenarios)
- Regional impact of tourism
- Aquatic vegetation
Environment TWG
Outputs
- Coastal wetland plant community extent and composition/diversity
- Wetland vertebrate community response
- Fish habitat response
- Fish mortality response
- Fish production response
- Muskrat population
- Bird populations
- Amphibian and reptile populations
- Availability of natural habitats to meet faunal requirements
- Special interest habitat
- Water quality

Inputs
- Current and historic aerial photography
- Current and historic plant community extent
- Bathymetry and topography
- Plant community information
- Bird habitat association information (e.g., surveys, literature, etc.)
- Fish habitat association information (e.g., surveys, literature, etc.)
- Mammal habitat association information (e.g., surveys, literature, etc.)
- Herps habitat association information (e.g., surveys, literature, etc.)
- Sediment composition database
- Sediment core database
- Rebound rates database
- Long-term historical lake level database
- Pelagic zone habitat information
- Water quality (e.g., transparency, contaminant concentrations, suspended particulates)
- Phytoplanktonic production
- Biomass accumulation (algal blooms)
- Zebra mussel population information
- TES information
- Vertebrate breeding sites
- Vertebrate growth, migration patterns, and food web processes
- Ground water flow information
- Long-term St. Lawrence River level database
- Experimental fisheries in the St. Lawrence River
- Fish production in managed marshes
- Long-term series for commercial fisheries
Municipal, Industrial and Domestic Water Use
Outputs
- [Undefined]

Inputs
- Intake and treatment facility locations
- “Shore well” locations
- Database resulting from interviews
- Water use demographics
Commercial Navigation
Outputs
- Stability and predictability of water levels
- Transportation costs

Inputs
- Physical commercial navigation systems
  o Ports (controlling depths, dredging needs, dock depths and locations, loading capabilities/rates, etc.)
  o Channels and locks (maintained depths and widths)
- Regional transportation infrastructure (truck, rail, and barge systems)
- Vessel information
  o Characteristics (length, width, maximum vessel draft, tons per inch immersion factors, maneuvering characteristics, efficiencies per inch of immersion, etc.)
  o Historical tonnage levels
  o Origin and destination routes for Lake Ontario ports
  o Operating characteristics and limitations
- Currents
- Water depth [level minus bathymetric?]
- Wind speed and direction
- Tributary flows
- Deep sea traffic information (load and schedule)
- Water level gauge databases maintained by power entities
- Existing reports and literature
- Survey information from vessel operators and port/dock operators
- Revenue for shippers and port/dock operators
- Hydrodynamic phenomena information (e.g., squat and bank suction)
PIAG
Outputs
- Press releases
- Periodic status reports
- Technical reports
- Summary reports
- Public feedback database
- Budgetary reports (funding and expenditures)
- Personnel involvement reports
- Survey questionnaire (pdf format) and results reports
- Meeting minutes
- Calendar of events
- Educational materials
- Contact, area-of-interest, and level-of-detail database for all interested parties

Inputs
- Existing contact information lists
- Reports at various levels of detail from TWGs and the Study Board
- Meeting minutes from TWGs and the Study Board
- Survey responses
- Feedback from public
- New contact information from website
Sensitive Data with Security, Privacy, Proprietary, or Liability Issues
- City of Kingston Geo-data
- City of Hamilton Ortho-imagery
- IKONOS data
- Identities and addresses of public
- Revenue information associated with specific marinas or yacht clubs
- High resolution aerial photos of facilities (e.g., power plants)
- Megawatt pricing associated with each power generation entity
- Erosion lines, flood limits, and property values
- Water levels under changed climate regimes
- Precise T&E locations
- Water intakes and outflow/outfalls
Transfer of information between TWGs
- Topographic and bathymetric data from CDN to
  o Coastal
  o Environment
  o Rec. Boating and Tourism (possibly)
- Ortho-imagery from CDN to
  o Coastal
- IKONOS data from CDN to
  o Environment
- Other base mapping data from CDN to
  o ALL
- Coastal data to
  o Environment
  o Rec. Boating and Tourism
  o Water Supply
- Rec Boating (launching ramp, marina, and other locations) to
  o Coastal
- Aquatic plants information from Environment to
  o Rec. Boating and Tourism
- Aerial ortho-photos from Environment to
  o Coastal
- H&H levels and flows to
  o ALL
- Output PI values/results from ALL to
  o Plan Formulation (and Evaluation?) Group
- Rec Boat traffic (wave data) from
  o Coastal
- Ship traffic from Commercial Navigation to
  o Coastal
Appendix III: DIWG Policy Examples
Suggested Data Product Requirement for Grants, Cooperative Agreements, and Contracts (DIWG 1997):
Describe the plan to make available the data products produced, whether from observations or analyses, which contribute significantly to the
Standard Data Citation Format (1998 DIWG):
o Name of the data set,
o Name(s) of the individual(s) with primary intellectual responsibility for the data set's development
o Organization from which the data set is available,
o Month and year that the data set was made available,
o URL for the data set, if available, or for its location.
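The five citation elements listed above could be assembled programmatically when datasets are catalogued. The sketch below is one possible rendering, not a DIWG-specified layout: the class and field names, the separators between elements, and the example dataset are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """The five elements of the citation format; field names are hypothetical."""
    name: str                   # name of the data set
    authors: List[str]          # individual(s) with primary intellectual responsibility
    organization: str           # organization from which the data set is available
    month: str                  # month the data set was made available
    year: int                   # year the data set was made available
    url: Optional[str] = None   # URL for the data set or its location, if available

    def format(self) -> str:
        # The separators are an assumption; the source lists only the
        # elements and their order.
        parts = [self.name, "; ".join(self.authors), self.organization,
                 f"{self.month} {self.year}"]
        if self.url:
            parts.append(self.url)
        return ". ".join(parts) + "."


# Invented example values, for illustration only.
citation = DataCitation(
    name="Example Bathymetry Grid",
    authors=["J. Doe", "R. Roe"],
    organization="Example Agency",
    month="May",
    year=2002,
    url="http://example.org/data",
)
print(citation.format())
```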
Appendix IV: FY2002 CAP Grant Proposal Summary
A multi-sector partnership proposes developing a framework for geospatial data essential for research, management, and business operations in the Great Lakes region. As a starting point, the “Lake Ontario – St. Lawrence River Framework Data Project” will integrate, afford discovery of, and begin to provide for the long-term storage, maintenance, and flexible accessibility of a number of “framework data” layers. These include shoreline, political units, transportation features, watersheds, hydrography, conservation management areas, orthoimagery, and elevation (hypsographic and topometric) data. The project is designed to provide a scalable system with respect to new participants, data types, geographies, and data uses, and to augment the growing knowledge base by documenting all procedures, policies, and lessons learned, and making these widely available.
Appendix V: Public Participation Management Tools
A.V.1 Information Collection Using Web-based Forms
Organizations or private companies often wish to collect information from the general public, or may wish to actively solicit specific information via surveys targeted at specific user groups. The Internet can be an effective medium for collecting this type of information via a number of mechanisms, including web-based surveys, web-based “customer” feedback forms, and web-based contact lists. A brief discussion and examples of alternatives associated with each of these functions are provided below.
A.V.1.1 Web-Based Surveys
Web-based surveys are commonplace on the Internet and range from brief questionnaires that ask a few questions to complex multi-page surveys on a specific topic or range of topics.
There are multiple advantages to conducting a web-based survey. First and foremost, the survey response data (submitted by the user) can be written directly into a database file which, in most cases, can then be input directly into specialty survey software for statistical and other analysis. This eliminates the need to enter survey responses into the database manually.
Web surveys can also be easier for the user to complete. Use of HTML drop-down lists, “radio-buttons” or check-boxes, makes it easier for the user to provide responses to specific questions. The graphical nature of the web also allows for creative survey presentation, as well as the ability to easily integrate hot-linked “Help” files for the user as they progress through the survey.
Where required, on-line surveys can be password protected so that they can be accessed only by permitted target groups. In addition, if required, unique passwords can be provided to each user so as to eliminate multiple responses from the same user. This allows tight control over the response group and can help eliminate error or bias in the results.
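The unique-password approach can be sketched as a set of single-use access tokens: each invited respondent receives one token, and a token is accepted only once. This is a minimal in-memory illustration under assumed names (`SurveyAccess`, `issue_token`, `submit`); a real survey would persist tokens and responses in the survey database.

```python
import secrets


class SurveyAccess:
    """Issue one-time tokens and accept at most one response per token."""

    def __init__(self):
        self._unused = set()   # tokens issued but not yet redeemed
        self.responses = []    # accepted survey responses

    def issue_token(self) -> str:
        """Create a hard-to-guess token to send to one invited respondent."""
        token = secrets.token_urlsafe(16)
        self._unused.add(token)
        return token

    def submit(self, token: str, response: dict) -> bool:
        """Store the response if the token is valid and unused; else reject."""
        if token not in self._unused:
            return False
        self._unused.remove(token)  # a token can be redeemed only once
        self.responses.append(response)
        return True


access = SurveyAccess()
token = access.issue_token()
print(access.submit(token, {"q1": "yes"}))  # first submission is accepted
print(access.submit(token, {"q1": "no"}))   # repeat submission is rejected
```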
On-line surveys can also have a degree of quality control or error checking built into them. For example, if a particular question requires the user to input a number (e.g., number of employees) and the user inputs a letter by mistake, that field can be coded to accept only numbers and to return an error message to the user upon submission.
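A field-level check of this kind might look like the following server-side sketch. The function name, the returned (value, error) pair, and the wording of the message are assumptions made for illustration; the same rule could equally be enforced in the browser.

```python
import re


def validate_whole_number(raw_value: str, field_name: str):
    """Return (value, None) for a whole number, or (None, error message)."""
    text = raw_value.strip()
    # Accept only the digits 0-9, so letters, signs, and decimals are rejected.
    if not re.fullmatch(r"[0-9]+", text):
        return None, f"{field_name}: please enter a whole number."
    return int(text), None


print(validate_whole_number("25", "Number of employees"))
print(validate_whole_number("twenty", "Number of employees"))
```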
There can be a number of limitations to web surveys as well. Foremost is that response rate will be governed by the number of people in the target audience who have easy access to the Internet. This should be a critical consideration to evaluate prior to survey development.
If the survey is accessible to the general public (e.g., no password limitations), there is a chance that malicious or false data could be entered. In this case it is important to have a mechanism in place where all survey responses are checked for accuracy prior to population of the database file. Similarly, without password protection users can answer the survey more than once, which would introduce bias into the survey results.
Complex surveys can often require significant database coding (along with the HTML coding of the survey form) in order for the survey results to be written directly and accurately to a database file. This can lead to increased development costs.
Web surveys can be created from “scratch” using basic HTML as well as database programming as required. Alternatively there are a number of commercial web survey design software packages available on the market. Examples of these alternative approaches are provided below.
A.V.1.1.1 HTML Survey Design
Developers with a good knowledge of HTML (including web design programs) as well as relational database programs can quickly and efficiently design and implement web-based surveys. Figure A.V.1 is an example of a portion of a survey that was designed using only MS FrontPage as the web development tool. This was a fairly complex survey designed to solicit detailed employee wage and training information from businesses for Human Resources Development Canada. The survey was made easier for the user to view and navigate through the effective use of colors and tables, as well as through careful placement of survey tips and instructions.
Figure A.V.1 - Example of Web Based Questionnaire
Upon completion of the survey, users clicked on a submit button and responses were submitted directly into an MS-Access database file that resided directly on the web server. This file was downloaded weekly by the research consultant and checked for errors and missing data or duplicate entries. Once verified, the data was imported directly into their survey analysis software for statistical analysis. Survey response data was also used to develop a publicly-available web page that presented the results of the survey and allowed users to request and generate a series of reports summarizing key aspects of the survey responses (excluding proprietary or confidential information). Users could request a series of pre-defined reports on the employment data provided in the survey responses; additionally, they could tailor their search by Standard Industry Codes (SICs) or National Occupational Codes (NOCs) as well as by geographic region (Figure A.V.2). The survey database was then queried for the appropriate information and a report was generated and sent back to the user in the web browser.
Figure A.V.2 - HRDC NOC/SIC Code Database Query Screen
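The report query described above (filtering survey responses by SIC code, NOC code, and/or region before summarizing them) can be sketched with an in-memory SQLite database. The HRDC survey itself used an MS-Access file; the table layout, column names, codes, and wage figures below are invented for illustration and are not taken from that survey.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory table of invented survey responses."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE responses (business_id INTEGER, sic_code TEXT, "
        "noc_code TEXT, region TEXT, hourly_wage REAL)"
    )
    conn.executemany(
        "INSERT INTO responses VALUES (?, ?, ?, ?, ?)",
        [
            (1, "5411", "6421", "East", 12.0),
            (2, "5411", "6421", "West", 14.0),
            (3, "7011", "6435", "East", 10.0),
        ],
    )
    return conn


def wage_report(conn, sic_code=None, noc_code=None, region=None):
    """Summarize responses, applying only the filters the user supplied."""
    clauses, params = [], []
    # Column names are fixed here; user input is passed only as parameters.
    for column, value in (("sic_code", sic_code),
                          ("noc_code", noc_code),
                          ("region", region)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = "WHERE " + " AND ".join(clauses) if clauses else ""
    count, average = conn.execute(
        f"SELECT COUNT(*), AVG(hourly_wage) FROM responses {where}", params
    ).fetchone()
    return {"responses": count, "average_wage": average}


conn = build_demo_database()
print(wage_report(conn, sic_code="5411"))
print(wage_report(conn, region="East"))
```

Passing user-supplied values as query parameters rather than building them into the SQL text is a deliberate choice, since the query screen is exposed to the public.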
Cost
The cost of developing a web-based survey from scratch, like the one above, depends on the complexity of the survey (number of questions, number of potential responses, and associated database coding required). For the example provided, which ultimately surveyed over 2000 businesses (web and paper surveys combined), the total cost, which included survey development (both web and paper), administration, statistical analysis and reporting, and development of the web-accessible back-end reporting database, was on the order of $75,000 to $100,000 US. Additional costs can also be incurred for the hardware required to host the survey and for the hardware and software required to host the back-end database.
A.V.1.1.2 COTS Software Programs
A number of commercial software packages are available to assist in the development of web-based surveys. Such programs are particularly advantageous for those who do not have good knowledge of HTML or database access programming. On the downside, they can be fairly expensive to acquire. Most include a number of standard features such as:
- Pre-defined sample survey templates to help users get started
- Sample questions that can be used or modified
- Ability to save questions for future use
- Validation of responses on-the-fly to eliminate errors
- Real-time monitoring of responses
- Built-in testing features to ensure survey is functioning and complete
- Ability to export results to SPSS analysis software or spreadsheets/database programs
- Ability to incorporate corporate and organizational logos, colors, banners, etc.
Two examples of commercial web survey software programs are provided below.
Halogen eSurveyor 3.1 Domino http://www.halogensoftware.com/products/esurveyordomino.php
Halogen eSurveyor Domino 3.1 is a sophisticated yet easy-to-use web-based survey solution that delivers powerful research capabilities in a secure and configurable format. It makes online surveying simple, interactive, fast, and cost-effective. It comes with many standard features that allow the development of simple, but comprehensive surveys, and a number of reporting features that allow for detailed analysis of the data.
System Requirements
Domino can run on the MS Internet Information Server or Lotus Domino Server environment. The server needs to be at a minimum a Pentium PC with 256 MB of memory, 100 MB of disk space, running MS Windows NT 4.0 and (for MS IIS) running an Oracle or SQL database. For the desktop survey application, minimum system requirements are a Pentium PC with 64 MB of memory, MS Windows 98, NT or 2000 (should also be compatible with XP), 20 MB of disk space and either Netscape Navigator 4.07 (or later) or MS Internet Explorer 4, Service Pack 2 (or later).
Cost
The list price for the starter survey package (as shown on the Halogen web site) is $15,000 U.S. This provides a “floating license” which limits survey responses to no more than 1000 at any given time. Once survey responses are cleared or downloaded, the license renews itself. A single processor server license is available at $40,000 U.S. and a dual processor server license is available at $55,000 U.S. The advantage of the server licenses is that they allow essentially unlimited survey responses (contingent on the horsepower and capacity of the server they are running on).
Object Planet – Surveyor http://www.objectplanet.com/Surveyor/
Surveyor is a fully web based survey application that enables the production and publication of surveys and questionnaires on the Internet or your intranet in minutes using a regular web browser. Surveys are created, published, and managed through a standard web browser. There is no need for cumbersome installations on client computers. In Surveyor, reports are immediately available after respondents have started answering the survey, and can be displayed directly in a web browser, or exported to other statistical applications for further analysis.
In addition to being web-based, Surveyor is easy to use with surveys created and published in a few minutes when you have your questions ready. Online reports and analysis are available at the moment the recipients start submitting their survey replies.
System Requirements
Minimum requirements
Hardware
• Intel-based processors
• 192 MB memory
• 100 MB hard drive
Software
• Microsoft Internet Information Server 4.0 or newer (web server)
• SQL Server 7.0 or newer (database engine)
• Installation of 3 dll files: surveyor.dll, InfoLite.dll and mail component
• Clients: web browser (IE 4.0+ or Netscape 4.0+ recommended)
Recommended requirements
Recommended hardware requirements of the server will depend heavily on the number of surveys you plan to publish, and the number of respondents per survey. The size of your surveys (number of questions) will also be important. Most installations require a single processor, Pentium III, with 128 MB of memory. If you are running SQL Server on the same computer as your web server, at least 256 MB memory is recommended. A fast hard drive is an advantage, because the survey engine will store responses to disk continually.
Hardware
• Dual Intel processors
• 256 MB memory
• SCSI type hard drive
Cost
The Surveyor software itself costs $499 U.S. for an unsupported license. For a license with 1 year of e-mail support the list price is $799 U.S. A separate license would be required for each person wishing to use the software to create and administer surveys. Licenses do not appear to be based on the number of survey respondents. The software is ordered and downloaded directly from the Object Planet web site.
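The list prices quoted above can be compared with a few lines of arithmetic. The sketch below is illustrative only: it uses the 2002 list prices given in this appendix, the function names are hypothetical, and it ignores server hardware, web server and database licensing, and any volume discounts.

```python
import math

# List prices quoted in this appendix (U.S. dollars, 2002).
HALOGEN_STARTER = 15_000      # floating license, 1000 responses at a time
HALOGEN_SINGLE_CPU = 40_000   # single-processor server license
HALOGEN_DUAL_CPU = 55_000     # dual-processor server license
SURVEYOR_UNSUPPORTED = 499    # per person, no support
SURVEYOR_SUPPORTED = 799      # per person, 1 year of e-mail support


def surveyor_cost(administrators: int, supported: bool = True) -> int:
    """Total Surveyor license cost; one license per survey administrator."""
    if administrators < 0:
        raise ValueError("administrators must be non-negative")
    per_seat = SURVEYOR_SUPPORTED if supported else SURVEYOR_UNSUPPORTED
    return administrators * per_seat


def seats_to_match(price: int, supported: bool = True) -> int:
    """Smallest number of Surveyor licenses costing at least `price`."""
    per_seat = SURVEYOR_SUPPORTED if supported else SURVEYOR_UNSUPPORTED
    return math.ceil(price / per_seat)


if __name__ == "__main__":
    print("5 supported Surveyor licenses:", surveyor_cost(5))
    print("Supported licenses to reach the Halogen starter price:",
          seats_to_match(HALOGEN_STARTER))
```

Under these assumptions, Surveyor licensing stays below the Halogen starter package price until roughly 19 supported administrator licenses are purchased.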
A.V.1.2 Feedback Forms
Feedback forms are another useful way of soliciting information from the public via a web interface. In many ways they are a type of survey or questionnaire in that they are
asking for specific information that is then received and can be stored in some type of a database for future use. Feedback forms can be set up very simply using many of the common web design programs. They can be structured to simply e-mail the required information, in text format, to the web administrator, or they can be set up to load the requested information into a database file.
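The two delivery options described above (e-mailing the submission as text, or loading it into a database) might be sketched as follows. This is a minimal illustration, not a description of any system used by the Study: the field names, the `feedback` table, and the administrator address are all hypothetical, and the e-mail message is only built, not sent.

```python
import sqlite3
from email.message import EmailMessage

# Hypothetical form fields; a real form would define its own.
FIELDS = ("name", "email", "comment")


def as_email(form: dict, admin_address: str) -> EmailMessage:
    """Option 1: package the submission as a plain-text e-mail message."""
    msg = EmailMessage()
    msg["To"] = admin_address
    msg["Subject"] = "Web feedback form submission"
    # User-supplied values go only into the body, never into headers.
    body = "\n".join(f"{field}: {form.get(field, '')}" for field in FIELDS)
    msg.set_content(body)
    return msg


def store(conn: sqlite3.Connection, form: dict) -> None:
    """Option 2: load the submission into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback (name TEXT, email TEXT, comment TEXT)"
    )
    # Parameter placeholders keep user input out of the SQL text.
    conn.execute(
        "INSERT INTO feedback (name, email, comment) VALUES (?, ?, ?)",
        tuple(form.get(field, "") for field in FIELDS),
    )
    conn.commit()


if __name__ == "__main__":
    submission = {"name": "A. Resident", "email": "a@example.org",
                  "comment": "Water levels were high this spring."}
    print(as_email(submission, "webadmin@example.org").get_content())
    db = sqlite3.connect(":memory:")
    store(db, submission)
    print(db.execute("SELECT COUNT(*) FROM feedback").fetchone()[0])
```

The database option needs more coding and testing than the e-mail option, which matches the cost difference noted for this type of form.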
An example of a feedback form is found below in Figure A.V.3. This particular form allows users to submit information on specific shoreline protection structure projects to the U.S. Army Corps of Engineers, which is developing and maintaining a database of such projects throughout the U.S. In this example, drop-down lists help the submitter select categories and sub-categories that have already been set up within the database.
Figure A.V.3 - Feedback / Submittal Form for Section 227 Project Information
Once the form is completed the user clicks the Submit button. This sends the submission to a temporary database file on a server at the Corps. Records in this temporary database file are checked regularly and verified by Corps staff. Once verified, they are copied into the “live” database, which is also searchable through the web interface.
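The submit, verify, and copy workflow described above could be sketched along the following lines. The table and column names are hypothetical, both tables are kept in one SQLite database for brevity, and removing the record from the temporary table after it is copied is an assumption (the description above says only that verified records are copied to the live database).

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create a temporary (pending) table and a live table. Names are hypothetical."""
    conn.executescript("""
        CREATE TABLE pending (id INTEGER PRIMARY KEY, project TEXT, structure_type TEXT);
        CREATE TABLE live    (id INTEGER PRIMARY KEY, project TEXT, structure_type TEXT);
    """)


def submit(conn: sqlite3.Connection, project: str, structure_type: str) -> int:
    """Store a web submission in the pending table and return its id."""
    cur = conn.execute(
        "INSERT INTO pending (project, structure_type) VALUES (?, ?)",
        (project, structure_type),
    )
    conn.commit()
    return cur.lastrowid


def promote(conn: sqlite3.Connection, pending_id: int) -> bool:
    """Copy a staff-verified record into the live table.

    Returns False if no pending record has that id.
    """
    row = conn.execute(
        "SELECT project, structure_type FROM pending WHERE id = ?", (pending_id,)
    ).fetchone()
    if row is None:
        return False
    with conn:  # one transaction: both statements succeed or neither does
        conn.execute("INSERT INTO live (project, structure_type) VALUES (?, ?)", row)
        # Assumption: the pending copy is removed once the record is live.
        conn.execute("DELETE FROM pending WHERE id = ?", (pending_id,))
    return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_tables(db)
    new_id = submit(db, "Example Harbor", "Breakwater")
    print(promote(db, new_id))
```

Keeping unverified submissions out of the searchable table means the public query form only ever returns records that staff have reviewed.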
Figure A.V.4 presents the database search query form. Once again the form is simplified through the use of drop-down boxes to keep the search possibilities easy for the user. Searches can be made by type of structure, composition of structure and location. Once submitted, the form sends the request to the server, the database is searched for all relevant entries, and a list of entries is returned to the user via the web browser (Figure A.V.5).
Figure A.V.4 - Section 227 Database Query Form
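A search of this kind, where each drop-down box is an optional filter, might be implemented roughly as shown below. This is a sketch under stated assumptions rather than the Section 227 site's actual code: the `projects` table, its column names, and the `search` function are hypothetical.

```python
import sqlite3

# Hypothetical searchable columns, one per drop-down box on the query form.
SEARCHABLE = ("structure_type", "composition", "location")


def search(conn: sqlite3.Connection, criteria: dict) -> list:
    """Return projects matching every non-empty criterion.

    An empty or missing value means "any", as when a drop-down is left unset.
    """
    unknown = set(criteria) - set(SEARCHABLE)
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")

    clauses, params = [], []
    for column in SEARCHABLE:
        value = criteria.get(column)
        if value:
            # Column names come from the fixed SEARCHABLE list; only the
            # user-supplied values are passed as bound parameters.
            clauses.append(f"{column} = ?")
            params.append(value)

    sql = "SELECT project, structure_type, composition, location FROM projects"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY project"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE projects (project TEXT, structure_type TEXT, "
               "composition TEXT, location TEXT)")
    db.execute("INSERT INTO projects VALUES ('Example Harbor', 'Breakwater', "
               "'Stone', 'Ohio')")
    for record in search(db, {"structure_type": "Breakwater"}):
        print(record)
```

Restricting the form to drop-down values, as the figures show, also limits what can reach the query, although bound parameters are still used here as a safeguard.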
Figure A.V.5 - Section 227 Search Results Page (only a portion of the returned record is viewable)
Cost
The cost to develop this type of functionality can be wrapped into the cost of an overall web design and will vary depending on the number of fields that a user is required to fill in. If the information is sent as text via e-mail, additional development costs are minimal. If the information is loaded into a database, additional development costs will be incurred to code and test the form appropriately.
A.V.2 Contact Addresses/Lists
Essentially another type of feedback form, a contact request page asks users to submit their names and addresses so that they can receive additional information on various topics or be added to mailing lists for distribution. Like feedback forms, the request page can be set up very easily in most web design programs, and the responses can be sent to an individual for manual entry into contact management software or populated directly into a database file residing on the server. Perhaps the most common form of this nature is the on-line registration form that needs to be filled out when installing new software on a computer; the company is gathering the user's key contact information for future reference. Once again, costs can be part of the overall web design project and will vary depending on project complexity.
Appendix VI: List of Acronyms
ANSI – American National Standards Institute
API – Application Programming Interface
CAP – Cooperative Agreements Program
CCIW – Canadian Centre for Inland Waters
CDNTWG – Common Data Needs Technical Working Group
CDS – Coastal Data Server
CEONet – Communities of Eastern Ontario Network
CGDI – Canadian Geospatial Data Infrastructure
COTS – Commercial Off-The-Shelf
CUGIR – Cornell University Geospatial Information Repository
DBMS – Database Management System
DEM – Digital Elevation Model
DIWG – Data and Information Working Group
DMWG – Data Management Working Group
DSS – Decision Support System
EC – Environment Canada
ECOR – Environment Canada Ontario Region
ECQR – Environment Canada Quebec Region
EIMS – Environmental Information Management System
FEPS – Flood and Erosion Prediction System
FGDC – Federal Geographic Data Committee
FTP – File Transfer Protocol
GCDIS – Global Change Data and Information System
GCMD – Global Change Master Directory
GIS – Geographic Information System
GLC – Great Lakes Commission
GLIN – Great Lakes Information Network
GLINDA – Great Lakes Information Network Data Access
GLWQA – Great Lakes Water Quality Agreement
GSDI – Global Spatial Data Infrastructure
GUI – Graphical User Interface
H&H – Hydrologic and Hydraulic Modeling
HTML – Hyper Text Markup Language
IJC – International Joint Commission
IM – Information Management
IMS – Information Management Strategy
ISO – International Organization for Standardization
IT – Information Technology
KB – Knowledge Base
LIDAR – Light Detection and Ranging
LIO – Land Information Ontario
LMPDS – Lake Michigan Potential Damages Study
LOSLR – Lake Ontario and St. Lawrence River
MLI – Manitoba Land Initiative
NA – Needs Assessment
NAQ – Needs Assessment Questionnaire
NOC – National Occupational Code
NRCAN – Natural Resources Canada
NSDI – National Spatial Data Infrastructure
ODGD – Ontario Digital Geographic Database
OGC – Open GIS Consortium
OGDE – Ontario Geospatial Data Exchange
OLID – Ontario Land Information Directory
OLIW – Ontario Land Information Warehouse
OMNR – Ontario Ministry of Natural Resources
OWS – Open Web Services
PDF – Portable Document Format
PFEG – Plan Formulation and Evaluation Group
PI – Performance Indicator
PIAG – Public Interest Advisory Group
QA/QC – Quality Assurance / Quality Control
QME – Quebec Ministry of the Environment
RRBDIN – Red River Basin Decision Information Network
SDI – Spatial Data Infrastructure
SDSFIE – Spatial Data Standard for Facilities, Infrastructure, and Environment
SDTS – Spatial Data Transfer Standard
SHOALS – Scanning Hydrographic Operational Airborne Lidar Survey
SIC – Standard Industry Code
TES – Threatened and Endangered Species
TWG – Technical Working Group
USACE – United States Army Corps of Engineers
USEPA – United States Environmental Protection Agency
USGCRP – United States Global Change Research Program
USGS – United States Geological Survey
UTM – Universal Transverse Mercator
WFS – Web Feature Services (Server)
WMS – Web Mapping Services (Server)
Y2Y – Yellowstone to Yukon