Automatic Data Processing for Systematic Entomology: Promises and Problems
A report for the Entomological Collections Network
First Annual Meeting, 1 December 1990
Louisiana State University, Baton Rouge, Louisiana

Ronald A. Hellenthal, University of Notre Dame, Notre Dame, INDIANA 46556
Jerry Louton, Smithsonian Institution, Washington, D. C. 20560
Gerald R. Noonan, Milwaukee Public Museum, Milwaukee, WISCONSIN 53233
Randall T. Schuh, American Museum of Natural History, New York, NEW YORK 10024
Margaret K. Thayer, Field Museum of Natural History, Chicago, ILLINOIS 60606
F. Christian Thompson, Coordinator, Systematic Entomology Lab., ARS, USDA, Washington, D. C. 20560

Contents
1. Introduction    THOMPSON
2. Justification for Literature Inventories    SCHUH
3. Justification of collection-based name capture    THAYER
4. Proposed Model and Database file structures for Arthropod collection management    HELLENTHAL & LOUTON
5. Standard fields and terms for ecological and geographical data on arthropod    NOONAN & THAYER
6. Standard data elements for classification and nomenclature    THOMPSON
7. Proposed Data Exchange Standard for Arthropod Collections    HELLENTHAL

Biosystematic information is critical for today's world. Every major concern, such as global warming, food supply, environmental quality, etc., has a biological component that is dependent in part on biosystematic information. What is biosystematic information? Biosystematic information is all data that may be useful to man about organisms, such as what is it, what is it called, what does it look like, where does it occur, what does it do, when does it do it, and what does all this mean to me (= economic importance). Biosystematic information is organized by names arranged in a hierarchical classification based on shared (synapomorphic) similarities. Hence, biosystematic information can be obtained with a name.
Names are obtained by identification of specimens, and identifications are made by matching attributes of unknown with known organisms. While everyone makes some identifications, for diverse and little-known organisms, such as insects, identifications are made by systematists. Systematists need the data derived from specimens (and literature) to make the comparisons which lead to identifications. Specimens and their associated literature form collections. So, ultimately, biosystematic information must be derived from systematists and their collections. Therefore, the methodology used by systematists to manage their collections and to produce biosystematic information is critical. Automated Data Processing (ADP) methods hold the promise of greater efficiency, but implementation appears to have caused problems. We, a small working group, met to investigate both the promise and the problems, which were summarized by a series of questions. While these questions and our answers follow, our overall conclusion was that the promise was real but the problems were not, being due more to semantics and lack of communication.

Systematic Entomology is at a critical transition. The goals of Systematic Entomology have been the enumeration of arthropod species and the illumination of their characters and relationships. Today, the number of arthropod species is estimated to be in the 30 to 50 million range, with fewer than 10 percent of them known. This has led some to call for the abandonment of the goal of complete enumeration and the restriction of our work to those groups already well known (butterflies and mosquitoes) or of critical importance to man's welfare (agricultural pests and beneficials). Others have suggested, instead, that improvements can be made in the way systematists work. Such improvements would increase the rate of progress, making our goals realistic. Automation offers the promise of greater efficiency.
For automated data processing technology to be truly useful, data must be shared. Sharing requires that all users understand how data and information are stored. Efficiency increases when common data standards are used, as less effort is required for conversion between different computer environments and less effort is spent on program development and maintenance, training, etc. This report is the first step toward the development and adoption of common ADP standards for Systematic Entomology.

ADP Philosophy, Strategy and Goals

What are the goals we seek from ADP for Systematic Entomology? Who are our users (curators? scientists? students? the public?), and what are their needs? And, therefore, what is our strategy and philosophy? Our belief is that ADP offers the best promise of aiding systematics in reaching its goals. Our ADP philosophy is to handle data once and when first encountered, to analyse data frequently, and to generate and disseminate information as needed. Our strategy is to encourage all ADP efforts, to work toward common data standards, and to share data and information. Our goals are to increase research productivity, information dissemination, and users' access to and satisfaction with biosystematic information.

Given the massive amount of data that systematists must handle to generate biosystematic information, the principal goal we seek from ADP is greater efficiency in data processing and sharing. Specimens and their associated data are wanted by systematists for analysis; the resulting information is desired by all. Given the enormous number of arthropod taxa, valuable manpower cannot be wasted. So, literally every keystroke must be preserved and shared, so that together the diminished few can do what once many did and what everyone now wants!

The basic problem with ADP standards in Systematic Entomology appears to be that of the blind men and the elephant.
Various people have used ADP extensively in their work. Each feels that they know precisely what these ADP standards, the "elephant," should be, but each describes the elephant differently. So, the first question is: Is there really one and only one "elephant"? Second, if there is only one "elephant," can all our different views be integrated into a comprehensive description? Third, can each work independently on their part of the "elephant" so that the results can be used by all (that is, is parallel processing desirable)?

A single comprehensive view of the data and information of interest to all is presented, and a standard is proposed for the documentation necessary for sharing data and information. While these are preliminary proposals which may need further modification, we believe their eventual acceptance by systematic entomologists will allow the community to maximize the promise of ADP. As users have different priorities, no one will begin by implementing the full view, and the approaches used to build the complete database will differ. However, acceptance of the comprehensive view and the standards associated with it should ensure that eventually all data and information can be integrated.

Endorsement of this report by the Entomological Collections Network will establish a protocol and begin the acceptance process for ADP standards for Entomological Systematics. The community needs to study this report and provide its comments to the working committee so that a final report can be prepared for adoption by ECN, the Systematic Resources Committee of the Entomological Society of America, and other interested parties. Ultimately, these standards will be used to develop a consensus among biologists as a whole.
Building comprehensive systematic databases may start from an inventory of collections or of the literature, but the two approaches are interdependent, as one cannot be completed without the other. Different funding sources make the distinction between these approaches significant. For example, at the National Science Foundation collection-based inventory work is funded by the Biological Research Resources Program, whereas funding for systematic catalogs (literature inventories) is provided by the Systematic Biology Program. Hence, collection-based inventory work is viewed more favorably among its peers than is literature-based inventory work. This is unfortunate, as both are fundamental research resources for biologists and should be considered together on their merits for funding.

Inventory goals will vary with respect to the classification hierarchy. Minimally, inventory data should be accumulated for higher order groups, such as family units. This is critical for the proper management of collections. Maximally, users would like inventory information for species units. Literature-based species inventories (catalogs) are necessary for species-level inventories of collections, as well as being critical resources for other biologists.

Specimens, which form collections, and their associated data (biological, geographic and temporal) are the basis from which all biological information is derived. Biological information is disseminated in publications. Modern databases of biosystematic information can be built from the original sources of data (specimens in collections) or from the sources of information themselves (the literature). Unfortunately, some biosystematic information is now preserved only in the written word (literature), because many specimens from which this information was derived were never preserved or have through time become lost. Likewise, no collection is complete, each