
A collaborative system for sharing paleontology collections data

Kenneth G. Johnson¹, Harry F. Filkorn, Mary Stecheson
Department of Invertebrate Paleontology, Natural History Museum of Los Angeles County, 900 Exposition Boulevard, Los Angeles, California 90007, USA

ABSTRACT

Museum collections provide primary data for paleontologists, and recent advances in information technology have revolutionized how museums collect and share this information. However, many natural history museums have huge collections and small budgets, so museum scientists are challenged to keep these critical data current and available to the public. We suggest that establishing an open collaboration through the Internet is one possible solution to this challenge. To achieve this solution, we have implemented a Web-based collections catalog to encourage collaborative maintenance of collections data as a shared resource. Anyone can search the catalog via a simple interface designed for any standard Web browser, and Web users can also be authorized to add information or update records as stratigraphic and taxonomic concepts change. The goal is to establish two-way communication between our catalog and the scientific community wherein the museum shares its collections and related data, and in return the community contributes new data acquired through use of the collections. The catalog also provides a basic function for building links with online publications and other data sources. As data exchange standards become accepted, these links can be used to create metadatabases that could lead to global networks of collections, taxonomic, stratigraphic, and bibliographic information. By providing an efficient mechanism to locate and synthesize large volumes of disparate information, such loosely integrated systems have resulted in rapid progress in disciplines of the [...] and physical sciences, and they represent one way forward into a data-rich future for paleontology.

Keywords: geoinformatics, paleontology, collections.

¹Current address: Department of Palaeontology, Natural History Museum, Cromwell Road, London SW7 5BD, UK.

Geosphere; October 2005; v. 1; no. 2; p. 61–77; doi: 10.1130/GES00011.1; 7 figures.

For permission to copy, contact [email protected]. © 2005 Geological Society of America.

Downloaded from http://pubs.geoscienceworld.org/gsa/geosphere/article-pdf/1/2/61/3332304/i1553-040X-1-2-61.pdf by guest on 02 October 2021

INTRODUCTION

Fossil specimens are the best record of the occurrence of a particular organism at a specific time and place (Allmon and Poulton, 2000), so collections are the raw data of paleontology. Collections are required for subsequent researchers to check and reinterpret previous work, and they are an important source of new information that can be released by the arrival of new technologies and new research questions. For example, collections have been used in studies based on morphometric analysis, molecular methods including DNA sequencing, and various geochemical techniques (Suarez and Tsutsui, 2004; Allmon, 2005). Collections held by museums become especially important in cases where original exposures are no longer available for collecting, as is commonly the case for man-made exposures produced during road building, quarrying, or construction. However, collections of fossils are only useful if they are accessible to potential users. Traditional use of paleontology collections required researchers to visit museums and work with material on-site or resort to secondary sources in the published literature. In reality, much of the information about the contents of paleontology collections is passed along by word of mouth, as a kind of folklore: for example, Heinz Lowenstam was a professor at the California Institute of Technology, so his collections might be held by an institution in Southern California. Obviously, this is not the most efficient method to advertise the availability of important research collections. For at least a decade, it has been clear that the World Wide Web is an ideal forum to publish collections catalogs. Besides widespread availability and ease of access, the Internet offers the additional benefit of allowing databases to be integrated into new networks of bioinformatics and geoinformatics (Graham et al., 2004). Such networks enable researchers to address questions regarding the large-scale history of regional or global diversity in response to global environmental change (e.g., Jackson and Johnson, 2000; Alroy et al., 2001), and are an inevitable part of the future of paleontology.

Figure 1. An example of a specimen lot from the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) collections, including paper labels containing potentially useful information that should be incorporated into the LACMIP specimen catalog.

Most natural history collections belong to public or nonprofit institutions that hold their collections in the public trust (American Association of Museums, 2005). However, many of these institutions have recently been subject to budget shortfalls (Dalton, 2003; Suarez and Tsutsui, 2004) that have reduced support for collections. At the same time, changing organizational priorities has resulted in the transfer of collections to a smaller number of institutions (Gropp, 2003). For example, the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) currently contains collections that formerly belonged to the University of Southern California, the University of California at Los Angeles, the California Institute of Technology, and California State University, Northridge. The consequence of these transfers is that relatively small staffs are caring for many large and important collections that are critical to the future of paleontology. Besides limitations in manpower, there is an increasing shortage of expertise. With reduced staff, most institutions do not have in-house experts that can serve as taxonomic authorities in the entire spectrum of fossil groups represented in their enormous combined collections. Without this expert knowledge in-house, it is difficult to adequately maintain and improve collections without enlisting the support of experts in the broader paleontological community. This outside assistance must come from the researchers using museum collections to address questions in their own specialized fields, whether Cambrian [...] of the Great Basin or Pleistocene mollusks of western North America. Collections managers provide free access to specimens and data, but sharing must become a two-way street. The research community using these resources must contribute its expertise to ensure continued access to high-quality information. Other fields of research within bioinformatics are reaching the same conclusion (Eiden, 2004; Wilson, 2005). To help achieve this, we have developed a Web-based collections catalog that can be jointly managed by the museum


Figure 2. A schematic model illustrating the architecture underlying the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) catalog. All information is stored in a relational database and is accessible through four user interfaces. Web forms and REST-style Web services can be used to search, browse, and add information into the system. Software underlying each component is indicated in parentheses, including Apache Web server, PostgreSQL database management system, and SQL and PHP programming languages. Connections between Web server and clients on the World Wide Web can be encrypted using the mod_ssl module available with the Apache Web server software.

collections staff and research community as a shared resource.

The LACMIP holds more than five million specimens, primarily from the western United States, including the world's largest collections of Cretaceous and Neogene mollusks from western North America. Our collections have been built over the past 90 yr and include the important university collections mentioned above that were transferred to the museum as local universities decided to eliminate their research collections. The department is currently housed in an off-site facility about a half mile from the main museum. This site contains collections storage as well as laboratories and staff offices. Within the collections space, the fossils are stored in 674 steel cabinets. Specimens collected from the same locality are stored together, and the entire main collections are arranged first according to geologic age (Cambrian to Quaternary) and then by geographic place (country, state, county) within each age. Each steel cabinet has a set of drawers containing specimen lots. These are groups of specimens from a single collecting locality that have been identified as belonging to the same taxon. Over the years, each lot may have accumulated a group of paper labels that contains information regarding the fossil collecting locality and sometimes multiple taxonomic determinations made by different researchers who have studied the material. For example, the gastropod illustrated in Figure 1 has four different hand-written and typed labels that contain such data. One of our challenges is to capture these data and make them available to the public.

Cataloging of the collection was started in the 1960s with the development of a card-based locality register. In this system, each locality was given a unique number and a card with essential geographic and stratigraphic information. These numbers were attached to specimens and became the primary identification of specimen lots in the collection. A similar card file system was developed for type and figured specimens. Each type specimen was associated with a unique number and was cross-referenced with specimen identification and bibliographic information. During the late 1980s these data were entered by hand into a custom collections management system developed in Borland Paradox. Nontype specimens that had never been figured in publications were not cataloged. However, the card system continued to be maintained in parallel with the computer database and was considered the standard. In 2002 we extracted the data from the legacy database and reformed it into a new system.

THE LACMIP COLLECTIONS CATALOG

Our goal was to build an electronic catalog that could meet the following objectives: (1) The catalog must allow the rapid acquisition of basic taxonomic, stratigraphic, geographic, and bibliographic information. The majority of these data need to be entered manually by part-time staff with little training, mainly volunteers and work-study students, so we have developed entry forms with pick lists to minimize typing of long and unfamiliar scientific names; (2) The catalog must be accessible


Figure 3. Forms for searching for collecting localities in the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) catalog. A: Search form allows users to specify values for various fields. Continued on next page.

from any computer connected to the Internet. To achieve this, we decided to take advantage of a Web architecture approach and the existing mature technology developed for e-commerce sites on the World Wide Web. This decision was made both to streamline the development process and to allow access for the broad community of research scientists contributing to the site as well as museum staff working in other locations; (3) The system must be able to share information with outside data networks in geoinformatics and other systems in our own institution. Therefore, we used a multitiered application architecture to facilitate this sharing; (4) The system must allow links to be made directly from online taxonomic publications to the type and figured specimens in our collections. These are among the most important materials in our collections, and we strive to maximize their exposure for convenient use by the research community; and (5) Images of specimens, collecting localities, and digital copies of field notes, maps, and other resources must also be available for remote use.

With these objectives in mind, we have developed a flexible, modular system that can be adapted to changing technology because individual components can be added, modified, or removed as necessary. This will allow the system to be improved incrementally as new technologies become available. For example, the current system does not include collections management functions so it cannot be used to track loans, insurance values, or the physical location of specimen lots. Our institutional Office of the Registrar performs many of these tasks, and we are building automated links from their registration system to our collection catalog. Our system also does not include a sophisticated geographic information system to allow mapping or geospatial analysis, nor have we attempted to track complex synonymies and changes in taxonomic practice. Instead we plan to take advantage of other tools developed especially for these purposes. For example, we would likely cede responsibility for maintaining taxonomic information to other systems when distributed taxonomic dictionaries become available for fossils. The LACMIP electronic catalog has been designed to publish information regarding our collections only.

The new LACMIP system has been developed as a Web-based, client-server database with multiple interfaces (Fig. 2). The data are stored in a relational database as a backend, using the PostgreSQL database system (PostgreSQL Global Development Group, 2005). Some of the basic business logic is implemented on this server including checks for referential integrity and triggers that enforce data updates. At the moment there are four interfaces to the data. The most simple is an interface that communicates via the SQL database programming language (Wikipedia, 2005) used for administration and maintenance. Three interfaces written in the PHP scripting language (PHP Group, 2005) run via an Apache Web server (Apache Software Foundation, 2005). Two of these interfaces are Web forms that allow input, searching, and browsing of the data using standard Web browsers on any machine connected to the Internet. One is composed of simple read-only forms accessible to the public, and the second interface includes data input forms and accepts user authentication using secure protocols. The third interface is a set of basic Web services built under a Web architecture (Jacobs, 2004) or "REST-like" philosophy (Fielding, 2000) that allows integration with other systems.

Figure 3. (Continued.) B: Results for a search for localities from Redding Formation in Shasta County include 88 localities. Note that there can be multiple entries for each field, for example, the age of locality LACMIP 10726 has been refined from Cretaceous to Turonian by Harry Filkorn in August 2004. Public view cannot be modified. Continued on next page.

Data models for [...] collections have been described in detail elsewhere (Association of Systematics Collections Committee on Computerization and Networking, 1992; Morris, 2000; Pullan et al., 2000; Raguenaud et al., 2002), and further analysis is not warranted here. Our underlying database structure is loosely based on these other models. The goal was to keep the schema relatively simple but to capture as much useful information as possible. The subject areas include localities, [...], lots, people, images, and a bibliography. One critical difference between our model and many other systems is that we track multiple interpretations for most data fields. That is, data are never deleted as new information is added. This is in keeping with the fundamental paradigm of collections data as tools for online collaboration. All additions are time stamped and marked with the name of the person that made the contribution. This allows researchers to know who added the information and when it was added. Therefore, anyone who is interested can track changes in the system.

Figure 3. (Continued.) C: In contrast, authorized users may add additional information using controls along left margin of form. Continued below.

Figure 3. (Continued.) D: Clicking the control for Unit results in a new form that can be used to add additional information regarding stratigraphic units. This simple mechanism allows researchers to update the catalog as they use it from any computer connected to the Internet.

Figure 4. A new collecting locality can be added using this Web form.

Locality associated information includes geographic, stratigraphic, and collection data. Our use of locality is similar to the concept of collecting event used in the ASC model (Association of Systematics Collections Committee on Computerization and Networking, 1992). In theory it would be possible to make multiple collections from the same geographic and stratigraphic context, but in practice many repeated collections are not from precisely the same context. Therefore, we consider each new collection as a new locality in our system. The collector, field number, and date of collection are associated with the collecting locality in the LACMIP data model. Geographic data are categorized as political place names (city, county, state or province, country) and supplemented by detailed written descriptions provided by collectors. Geospatial data are included where available and provided by the collector (usually in the form of United States township/range system or latitude/longitude), but standardized georeferencing remains to be completed. Stratigraphic information is limited to stratigraphic units (member, formation, group) and associated age range. The chronostratigraphic units used in the system are the internationally accepted standard stage names (Geological Society of America, 1999). Additional information on stratigraphy and age can be included in the text description for each locality.

A specimen lot is a group of specimens from a collecting locality that has been sorted out and identified as belonging to a particular species or higher taxon. In theory all specimens identified as the same species from a single locality would be contained in one lot, but in practice there might be more than one lot of this species because of specimen abundance, limitations in container size, or special use of individual specimens from a lot (illustration, geochemical analysis, etc.). Information associated with specimen lots includes taxonomic determinations, the number of specimens in the lot, and whether the specimen has been cited in a published work. Digital images of specimens are provided for some specimen lots.

Figure 5. A simple pick list mechanism can be used to select taxonomic names. A: For example, when adding a new lot, determination is selected using a pick list. In this form, user is searching for the genus Chione. Continued on next page.

Managing taxonomic data is a complex problem, and data models have been developed to track synonymies, changes in rank, splitting, and the detailed consequences of changing taxonomic concepts and practice (Taxonomic Databases Working Group, 2004; Shattuck, 2005). The LACMIP catalog records updates to determinations of specimen lots and allows users to search for lots using supraspecific classification. We use a combination of our legacy database and data from the United States Department of Agriculture Integrated Taxonomic Information System (ITIS) (ITIS, 2005) as the starting point for mollusks and corals, and we could easily integrate other taxonomic dictionaries as they become available for fossil groups. Multiple determinations can be included for each specimen lot, and we have implemented a basic system for tracking synonyms to aid in the consistent application of taxon names.

Although collecting localities and specimen lots are the basic units of information in our catalog, we also maintain information regarding associated personnel, digital images, and a bibliography relevant to the LACMIP collections. These supplementary modules have been kept simple. People associated with the collections include collectors, collections maintenance staff, authorized users of the catalog, and specialists who have contributed data to the system. A basic bibliographic table that allows publications to be associated with localities and specimen lots is also maintained. Most collection localities are associated with maps, and these are referenced as publications in the bibliography. In our current catalog, images are maintained as digital files on a fileserver at two resolutions. Thumbnails are small compressed files with widths of 150 pixels for photographs and 300 pixels for field maps or other scanned images. High-resolution images are also available to the public at widths of 450 pixels for specimens and 800 pixels for scanned materials. Image file data are maintained in a basic image database associated with our catalog so that they can be published over the World Wide Web.

A WEB INTERFACE TO THE COLLECTIONS CATALOG

There are both public and restricted Web interfaces to the LACMIP collections catalog (Fig. 3A–D). The public interfaces allow researchers to browse the catalog over the World Wide Web (Johnson et al., 2005a; Fig. 3B). Note that we track multiple interpretations for


Figure 5. (Continued.) B: One genus is found by that search and can be selected by choosing the Yes control. Continued on next page.
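The practice described in the text of never deleting data, and of stamping every addition with a contributor and a time, can be sketched as an append-only table. This is only an illustration under stated assumptions: SQLite stands in for the PostgreSQL backend, and the table, column, and function names are hypothetical rather than the catalog's actual schema. The locality number and the refinement from Cretaceous to Turonian come from the Figure 3B caption; the contributor and date of the first entry are placeholders.

```python
import sqlite3


def open_catalog():
    """Create an in-memory stand-in for the catalog (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE interpretation (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            locality    TEXT NOT NULL,
            field       TEXT NOT NULL,
            value       TEXT NOT NULL,
            contributor TEXT NOT NULL,
            entered     TEXT NOT NULL
        )
        """
    )
    return db


def add_interpretation(db, locality, field, value, contributor, entered):
    """Append a new interpretation; existing rows are never updated or deleted."""
    db.execute(
        "INSERT INTO interpretation (locality, field, value, contributor, entered) "
        "VALUES (?, ?, ?, ?, ?)",
        (locality, field, value, contributor, entered),
    )


def history(db, locality, field):
    """Return every interpretation of a field, oldest first."""
    return db.execute(
        "SELECT value, contributor, entered FROM interpretation "
        "WHERE locality = ? AND field = ? ORDER BY id",
        (locality, field),
    ).fetchall()


def current_value(db, locality, field):
    """Return the most recently added value, or None if nothing was recorded."""
    rows = history(db, locality, field)
    return rows[-1][0] if rows else None


if __name__ == "__main__":
    db = open_catalog()
    # Placeholder contributor and date for the original entry (assumption).
    add_interpretation(db, "LACMIP 10726", "age", "Cretaceous", "legacy import", "2002")
    add_interpretation(db, "LACMIP 10726", "age", "Turonian", "Harry Filkorn", "2004-08")
    for value, contributor, entered in history(db, "LACMIP 10726", "age"):
        print(f"{value} ({contributor}, {entered})")
```

Because rows are only ever appended, the full history stays available for display in parentheses next to each value, while the latest row serves as the current interpretation.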

most data fields. The name of the person who made each entry and the date of entry are indicated in parentheses. Researchers can browse through a set of localities or specimen lots or can download the information for local use. Data can be downloaded as delimited text files that include only the most up-to-date information, because the full information associated with any particular locality cannot be represented in a simple two-dimensional table if multiple interpretations are present for any piece of information. For printing specimen lot labels or hard copies of locality information, portable document format (PDF) files can be downloaded. A thumbnail is shown if images are available, and higher-resolution images can be viewed by clicking on the thumbnail. The restricted Web forms can be accessed using our secure Web server. Museum collections staff and researchers interested in contributing to the system are assigned user names and passwords that are required to access this part of the site. Restricted forms for searching and browsing the catalog are similar to the public pages except they allow input of additional data.

The initial entry of locality and lot records into the catalog can only be performed by museum collections staff. There are data entry forms for each of the main subject areas (Fig. 4), written as standard hypertext markup language (HTML) Web forms. An online data entry guide is provided to ensure consistent data input, and pick lists have been implemented where possible to minimize typographical errors. For example, when a determination is made, there are several steps to selecting a taxon name (Fig. 5A–D). Also, modern Web browsers have autocomplete functions that may reduce typographic errors. There is a simple mechanism to increase the consistent use of taxonomic names via tracking synonyms. Junior synonyms can be associated with senior synonyms so that when a junior synonym is requested as a taxonomic determination, both that name and the senior synonym are returned as determinations. In general, this interface has been designed to minimize potential data-entry errors because much information is hand keyed into the catalog by assistants who may have limited geological or taxonomic expertise. However, information is not proofed and all data entered into the system are immediately available to the research community.

Both the public and authorized researchers can search and browse the data using the provided set of Web forms. For example, to find all localities in Shasta County from the Redding Formation (Fig. 3A), a researcher needs to fill in the appropriate fields on the locality search form. In this case, a total of 88 localities is returned, and users may browse through them one by one (Fig. 3B). Alternately, a researcher could return to the search form and limit or refine the search (using the


Figure 5. (Continued.) C: The pick list can then be used to select the appropriate subgenus and species within the genus. If a particular taxon is not found in the system, it can be added by selecting entry for New in pick list. Continued on next page.
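The synonym mechanism described in the text, in which requesting a junior synonym returns both that name and its senior synonym, might look like the following minimal sketch. The function name and the mapping are hypothetical, the taxon names are invented placeholders rather than real synonymies, and only a single level of synonymy is assumed (no chains).

```python
def resolve_determination(name, senior_of):
    """Return the names to record when `name` is requested as a determination.

    `senior_of` maps each junior synonym to its senior synonym. A name that
    is not listed as a junior synonym is returned on its own.
    """
    senior = senior_of.get(name)
    if senior is None or senior == name:
        return [name]
    return [name, senior]


if __name__ == "__main__":
    # Placeholder names for illustration only; not real taxa or synonymies.
    synonyms = {"Aus bus": "Aus cus"}
    print(resolve_determination("Aus bus", synonyms))  # junior plus senior
    print(resolve_determination("Aus cus", synonyms))  # senior only
```

Keeping the requested name alongside the senior synonym preserves what the contributor actually wrote while still letting searches on the senior name find the lot.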

Modify Search control), or the researcher could download the entire data set either as a text file or a PDF-formatted file that is ready to be printed. Authorized researchers see a slightly different view (Fig. 3C), because they are able to add information. The labels associated with each line of data are now controls that may be used to access additional forms for data entry. For example, to add new information regarding the stratigraphic unit of a locality, a contributor would click on the button marked Unit to use the appropriate form (Fig. 3D).

Searching and browsing for specimen lots is similar to working with locality data and can be performed using a similar set of Web forms (Fig. 6A–C). A search can be performed for both lot information and the locality from which the lots were collected. For example, a search for the gastropod genus Paosia from the Redding Formation (Fig. 6A) returns eight lots from a selection of localities (Fig. 6B). Data for this list of lots can then be downloaded as a text file by selecting the Download Lot List, or labels for specimen trays can be produced by selecting Create Labels. Information for one of the lots (lot LACMIP 10726-2) is shown in Figure 6C. However, the downloaded data will not include all of the information associated with this lot because this information cannot be organized into a simple two-dimensional table. This lot has been identified several times, first as Oonia? californica (Gabb, 1864), later as Paosia colusaensis (Anderson, 1958), and most recently as Paosia californica (Gabb, 1864). In addition, the specimen lot has been cited in two publications (Jones et al., 1978; Squires and Saul, 2004) as type specimen LACMIP 10810. Several images are also available that can be downloaded in high resolution. Information about locality LACMIP 10726 is at the bottom of the lot page, including a map. This series of Web forms provides the primary interface for the catalog. Similar forms exist to search, browse, and add bibliographic and biographic information. A comprehensive user guide that will assist researchers with use of the system, including standards for data entry, is available through a link on all of the forms.

As of May 2005, our entire locality register of 27,970 collections has been included in the catalog. To date 28,197 specimen lots have been cataloged comprising 601,409 individual specimens. We estimate that this includes ~20% of our complete collection, but we do not have precise estimates for the total size of the collection. In fact, during the cataloging process we are finding that the previous attempts to estimate collection size probably are 25%–30% lower than the true figure. A similar result may be obtained during cataloging of other large paleontological collections. The majority of these records is derived from our extensive holdings of Neogene Mollusca from


Figure 5. (Continued.) D: In this case Chione (Chionista) fluctifraga is selected as the determination for the new lot.
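The text explains that downloads contain only the most up-to-date value of each field, because a history of multiple interpretations does not fit a simple two-dimensional table. A minimal sketch of that flattening step is given here; the function names, the column names, and the tab delimiter are assumptions, and the three determinations are those listed in the text for lot LACMIP 10726-2.

```python
import csv
import io


def latest_values(entries):
    """Collapse (record_id, field, value) entries, given oldest first,
    to the most recent value of each field for each record."""
    latest = {}
    for record_id, field, value in entries:
        latest.setdefault(record_id, {})[field] = value
    return latest


def to_delimited(entries, fields, delimiter="\t"):
    """Write one row per record holding only the latest value of each field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["id", *fields])
    for record_id, values in latest_values(entries).items():
        writer.writerow([record_id, *(values.get(f, "") for f in fields)])
    return buffer.getvalue()


if __name__ == "__main__":
    entries = [
        ("LACMIP 10726-2", "determination", "Oonia? californica"),
        ("LACMIP 10726-2", "determination", "Paosia colusaensis"),
        ("LACMIP 10726-2", "determination", "Paosia californica"),
    ]
    print(to_delimited(entries, ["determination"]), end="")
```

The earlier determinations are dropped from the flat file but remain in the catalog itself, which is why the text directs users to the lot page for the full record.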

Southern California. Cataloging of this material was determined to be a priority due to the potential use for studies of the impact of regional environmental change on shallow marine communities. In addition, our complete set of type and figured specimen lots has been incorporated, including 10,429 specimens. These are the most important components of the collection, so they were a priority for cataloging.

WEB SERVICES

There are several problems with the type of Web forms interface outlined above. Most serious is the requirement for human intervention to locate a particular piece of information regarding a particular locality or specimen lot. This means that it is difficult to generate direct links to information, for example to link from another Web site to one particular locality. Secondly, the "Web spider" programs used by standard Web search engines to index Web pages cannot access Web forms easily. To overcome these limitations, we have designed a simple Web interface to the LACMIP catalog that allows direct linking to individual locality, specimen lot, type specimen, and digital image records. We have followed a REST-like architecture (Fielding, 2000) that takes advantage of existing Web protocols to allow access to our data from outside systems. Each data resource is represented by a Web address or uniform resource locator (URL). These addresses are static and easy to construct if the user knows what he or she is looking for. For example, the URL http://ip.nhm.org/ipdatabase/locality/17575 will return whatever information we have regarding locality 17575, and the URL http://ip.nhm.org/ipdatabase/lot/10762-2 will link directly to information about specimen lot 10762-2. Similar links exist for type specimens and images of specimen lots. For example, the URL http://ip.nhm.org/ipdatabase/type/9786 links directly to type specimen LACMIP 9786. The returned pages are not static Web pages but are generated by the Web server at each request so they are always up to date. As standard schemas for the publication of paleontological specimen data become available, we will be able to publish extensible markup language (XML)-formatted information using this mechanism.

As a test of this Web services interface, we developed a system that allows joint queries


Figure 6. Specimen lot data can be browsed and new information can be added to the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) collections catalog using Web forms. A: A search for the gastropod genus Paosia from the Redding Formation is performed by entering Paosia and Redding in the appropriate fields.
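
The address scheme described under WEB SERVICES can be illustrated with a short sketch. It is not the catalog's own code: the function name `record_url` and the tuple `RECORD_TYPES` are hypothetical, and only the three address patterns quoted in the text (locality, lot, and type) are assumed. The pattern for image records is not given in the text, so it is left out.

```python
from urllib.parse import quote

# Base address taken from the example URLs quoted in the text.
BASE_URL = "http://ip.nhm.org/ipdatabase"

# Record types that appear in the quoted example URLs. The address pattern
# for image records is not given in the text, so it is not included here.
RECORD_TYPES = ("locality", "lot", "type")


def record_url(record_type: str, identifier: str) -> str:
    """Build the address of one catalog record (hypothetical helper)."""
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type!r}")
    # Percent-encode the identifier so it always forms a single path segment.
    return f"{BASE_URL}/{record_type}/{quote(str(identifier), safe='')}"


if __name__ == "__main__":
    print(record_url("locality", "17575"))
    print(record_url("lot", "10762-2"))
    print(record_url("type", "9786"))
```

Because each address depends only on the record type and its identifier, an outside system or an online publication can construct a link without first submitting a Web form.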

across both the Holocene and fossil mollusk collections at the Natural History Museum of Los Angeles County (LACM; Johnson et al., 2005b). In our institution, most departments use different systems that are appropriate for the needs of each department. For example, the LACM Holocene malacology database requires no treatment of stratigraphy, and the invertebrate paleontology database has no way to track water depth. In the joint search tool, searches are performed on a subset of fields from the LACM malacology and LACMIP databases, and the results include links back into the original databases so users can access more complete information that might not be contained in both systems (Fig. 7A–C).

DISCUSSION

Developing any information system requires compromise. Our priority has been to publish as much collections-related information as possible with limited resources. The quality of these data varies, but even imperfect data can be useful (Lieberman and Kaesler, 2000). Furthermore, we acknowledge that as museum curators we will never have the resources to fully verify the immense volume of information held in our catalog. Instead, the paleontological community must help with this never-ending task. The LACMIP catalog is a living document that is constantly being improved by museum staff and database users, and the information published in it should not be used uncritically in large compilations. Although effort is made to publish only accurate information, there is large variation in the quality of the information included in the catalog. Stratigraphic and taxonomic concepts change with time, and these updates are not always included in the system. Indeed, we


Figure 6. (Continued.) B: This search returned eight lots from various collecting localities; selecting one of the buttons on the left of the page will return more information for a particular lot.
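
The joint search described under WEB SERVICES, in which a shared subset of fields is searched and each result links back to its original database, might be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual implementation: the sample records, the field names, the `Hit` class, and the function names are all hypothetical, and the Holocene link address is a placeholder. Only the fossil lot address pattern is taken from the text.

```python
from dataclasses import dataclass

# Hypothetical sample records. Field names are assumptions; the text states
# only that the fossil catalog handles stratigraphy and the Holocene catalog
# tracks water depth, so the two record layouts differ.
FOSSIL_LOTS = [
    {"lot": "F-1", "genus": "Terebralia", "formation": "Formation A"},
    {"lot": "F-2", "genus": "Paosia", "formation": "Formation B"},
]
HOLOCENE_LOTS = [
    {"lot": "H-1", "genus": "Terebralia", "depth_m": 2.0},
]


@dataclass
class Hit:
    source: str  # catalog the lot came from
    lot: str     # lot identifier within that catalog
    genus: str   # shared field used for the search
    link: str    # address of the full record in the original catalog


def fossil_link(lot: str) -> str:
    # Lot address pattern quoted in the text.
    return f"http://ip.nhm.org/ipdatabase/lot/{lot}"


def holocene_link(lot: str) -> str:
    # Placeholder address; the real malacology address is not given in the text.
    return f"http://malacology.example.org/lot/{lot}"


def joint_search(genus: str) -> list:
    """Search both record sets on the shared genus field."""
    wanted = genus.strip().lower()
    hits = []
    for source, records, make_link in (
        ("LACMIP", FOSSIL_LOTS, fossil_link),
        ("LACM", HOLOCENE_LOTS, holocene_link),
    ):
        for record in records:
            if record["genus"].lower() == wanted:
                hits.append(
                    Hit(source, record["lot"], record["genus"], make_link(record["lot"]))
                )
    return hits


if __name__ == "__main__":
    for hit in joint_search("Terebralia"):
        print(hit.source, hit.lot, hit.link)
```

Only the shared field is compared, so neither catalog needs to adopt the other's structure. Fields that exist in one system only, such as stratigraphy or water depth, stay reachable through the link back to the original record.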

hope that users will help improve the data, and we ask that authors publishing studies based on information in the LACMIP catalog help update the catalog with new information and interpretations resulting from their research. We also expect authors to include citations to the LACMIP catalog if they have used it as a data source.

Besides sharing information with our community of researchers, we encourage links to the LACMIP catalog from online versions of publications that make use of our collections. Such links should greatly enhance the utility of research collections catalogs (National Research Council, 2002). For example, papers published in the online version of the Journal of Paleontology or Geosphere could contain direct links from specimen or locality citations to the LACMIP catalog. Such links allow readers rapid access to the most up-to-date information available. Changes in the interpretation of stratigraphy, environment, or taxonomic classification cannot be tracked in a static document, but the static document can provide links back to systems that can be changed. Similar links could be incorporated into compilations of paleontological occurrences based on published records, thus allowing users direct access to the underlying data and allowing database administrators to automatically track revisions in data associated with museum collections.

In the current implementation we have adopted a Web architecture approach rather than a more complex Web services approach. The benefit of this type of interface is that it can be implemented right now: the protocols exist, and they are simple to use. The only software required to view the catalog is a standard Web browser. Furthermore, as new data standards and messaging protocols develop we will be able to accommodate them into new versions of the LACMIP collections catalog.

Probably the main obstacle to the widespread adoption of community-based catalogs is encouraging qualified researchers to contribute hard-earned data to a collaborative system. There are several potential models to rectify this problem, some of which offer a "carrot," and others that threaten a "stick." For example, we could require some level of contribution as a condition for providing loans of specimens or access to the collections catalog, but so far we reject this approach because it might result in reduced collections use. An alternative is to provide a mechanism by which contributors could receive some form of professional credit in the form of measures that could be added to curricula vitae or management reports used in professional performance reviews. To achieve this, we plan to implement an electronic recorder or scoreboard that lists the number and type of data contributed by each member of the community using the catalog. As links develop into the catalog from online journals or other publications, this scoreboard could be used to track usage of particular types of information, and the resulting track record could be used to weight contributions from individual researchers in the same way that publications are weighted based on the number of times that they are cited in works of other authors.

However, in the end, researchers and other users of information in the LACMIP catalog must take on part of the responsibility for maintaining this shared resource. As a community, we all require high-quality information to place fossils in the proper taxonomic, stratigraphic, and geologic context because the scientific value of paleontological collections lies as much in this context as in the fossils themselves. Unfortunately, with the funding levels currently available for the support of collections, museum staff will never be able to maintain and update all of the information for researchers and other users of the catalog. The resulting bottleneck will impede progress by limiting the availability of up-to-date information. To avoid this, the community of paleontologists must perform as much of the required maintenance and updating as their collections use dictates. These data must become a shared resource maintained by all as we move together into a data-rich future for paleontology.

Figure 6. (Continued.) C: For example, the complete record for lot 10726-2 includes taxonomic determination, type status, citation information, and collecting locality details. In this case images of the specimen and a map of the collecting locality are available.

Figure 7. The Web services interface to the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) catalog has been used to construct a joint search tool for the catalogs of the Department of Invertebrate Paleontology and the Malacology Section (LACM) of the Natural History Museum of Los Angeles County. A: This search form allows researchers to locate specimens from two different data sets.

Figure 7. (Continued.) B: A search for genus Terebralia in both LACM and LACMIP collections results in a total of six lots, including four from fossil localities and two sites from the Holocene.

Figure 7. (Continued.) C: Links are available on the page along with results that allow researchers convenient access to the additional information in the LACMIP catalog. For example, the full information for specimen lot 26814-1 is available from http://ip.nhm.org/ipdatabase/lot/26814-1.

ACKNOWLEDGMENTS

Much of the data in the LACMIP collections catalog was entered by our predecessors including J.M. Alderson, L.T. Groves, G. Kennedy, P.G. Owen, and E.C. Wilson. Our team of work study students, research associates, and volunteers includes M. Alonso, J. , S. Cowles, A. Fu, B. Gillies, L. Moore, H. Murdock, L.R. Saul, J. Severe, R.J. Stanton Jr., and J. Wiggins. We thank C.M. Kelly for producing many of the photographs. W. Allmon, W. Kiessling, D. Pentcheff, A. Valdés, and R. Wetzer provided useful suggestions for improving this contribution. We gratefully acknowledge the support of the United States National Science Foundation (grant DBI-0237337).

REFERENCES CITED

Allmon, W.D., 2005, The importance of museum collections in paleontology: Paleobiology, v. 31, p. 1–5.
Allmon, W.D., and Poulton, T.P., 2000, The value of fossil collections, in White, R.D. and Allmon, W.D., eds., Guidelines for the management and curation of invertebrate fossil collections: Paleontological Society Special Publication 10, p. 5–24.
Alroy, J., and 24 others, 2001, Effects of sampling standardization on estimates of marine diversification: Proceedings of the National Academy of Sciences, v. 98, p. 6261–6266.
American Association of Museums, 2005, Code of ethics for museums: http://www.aam-us.org/museumresources/ethics/coe.cfm (May 2005).
Anderson, F.M., 1958, Upper Cretaceous of the Pacific coast: Geological Society of America Memoir 71, 378 p.
Apache Software Foundation, 2005, Apache http server, v. 1.3.33: http://www.apache.org (February 2005).
Association of Systematics Collections Committee on Computerization and Networking, 1992, An information model for biological collections: http://www.nscalliance.org/bioinformatics/asc%20model/Ascmodrpt.pdf (December 2004).
Dalton, R., 2003, Natural history collections in crisis as funding is slashed: Nature, v. 423, p. 575.
Eiden, L.E., 2004, A two-way bioinformatic street: Science, v. 306, p. 1437, doi: 10.1126/science.1107196.
Fielding, R.T., 2000, Architectural styles and the design of network-based software architectures [Ph.D. thesis]: Irvine, University of California, http://www1.ics.uci.edu/%7Efielding/pubs/dissertation/top.htm (October 2004).
Gabb, W.M., 1864, Description of the Cretaceous fossils: Palaeontology, v. 1, p. 55–236.
Geological Society of America, 1999, 1999 geologic time scale: http://www.geosociety.org/science/timescale/timescl.htm (January 2005).
Graham, C.H., Ferrier, S., Huettman, F., Moritz, C., and Peterson, A.T., 2004, New developments in museum-based informatics and applications in biodiversity analysis: Trends in Ecology and Evolution, v. 19, p. 497–503, doi: 10.1016/j..2004.07.006.
Gropp, R.E., 2003, Are university natural science collections going extinct?: Bioscience, v. 53, p. 550.
Integrated Taxonomic Information System, 2005, Integrated Taxonomic Information System: http://www.itis.usda.gov (February 2005).
Jackson, J.B.C., and Johnson, K.G., 2000, Life in the last few million years, in Erwin, D.H., and Wing, S.L., eds., Deep time: Paleobiology's perspective: Paleobiology, supplement to v. 26, p. 221–235.
Jacobs, I., ed., 2004, Architecture of the World Wide Web, volume one: http://www.w3.org/TR/2004/REC-Webarch-20041215 (December 2004).
Johnson, K.G., Filkorn, H.F., and Stecheson, M., 2005a, Collections catalog of the Department of Invertebrate Paleontology, Natural History Museum of Los Angeles County: http://ip.nhm.org (February 2005).
Johnson, K.G., Valdés, A., and Groves, L.T., 2005b, Extinct and extant molluscs in the collections of the Natural History Museum of Los Angeles County: http://ip.nhm.org/nhmsearch/findlots.php (May 2005).
Jones, D.L., Sliter, W.V., and Popenoe, W.P., 1978, Mid-Cretaceous (Albian to Turonian) biostratigraphy of northern California: Annales de Muséum l'Histoire Naturelle de Nice, v. 4, p. xxii.1–xxii.13.
Lieberman, B.S., and Kaesler, R.L., 2000, The scientific value of natural history museum collections, in White, R.D. and Allmon, W.D., eds., Guidelines for the management and curation of invertebrate fossil collections: Paleontological Society Special Publication 10, p. 109–117.
Morris, P.J., 2000, A model for invertebrate paleontology collections information, in White, R.D. and Allmon, W.D., eds., Guidelines for the management and curation of invertebrate fossil collections: Paleontological Society Special Publication 10, p. 155–260.
National Research Council, 2002, Geoscience data and collections: National resources in peril: National Academy of Sciences, 128 p.
PHP Group, 2005, PHP version 4.3.10: http://www.php.net (February 2005).
PostgreSQL Global Development Group, 2005, PostgreSQL, version 8.0: http://www.postgresql.org (February 2005).
Pullan, M.R., Watson, M.F., Kennedy, J.B., Raguenaud, C., and Hyam, R., 2000, The Prometheus Taxonomic Model: A practical approach to representing multiple classifications: Taxon, v. 49, p. 55–75.
Raguenaud, C., Pullan, M.R., Watson, M.F., Kennedy, J.B., Newman, M.F., and Barclay, P.J., 2002, Implementation of the Prometheus Taxonomic Model: A comparison of database models and query languages and an introduction to the Prometheus Object-Oriented Model: Taxon, v. 51, p. 131–142.
Shattuck, S., 2005, Biolink, version 2.0: http://www.ento.csiro.au/biolink/index.html (January 2005).
Squires, R.L., and Saul, L.R., 2004, The pseudomelaniid gastropod Paosia from the marine Cretaceous of the Pacific slope of North America and a review of the age and paleobiogeography of the genus: Journal of Paleontology, v. 78, p. 484–500.
Suarez, A.V., and Tsutsui, N.D., 2004, The value of museum collections for research and society: Bioscience, v. 54, p. 66–74.
Taxonomic Databases Working Group, 2004, International Working Group on Taxonomic Databases: http://www.tdwg.org (November 2004).
Wikipedia, 2005, SQL, in Wikipedia, the free encyclopedia: http://en.wikipedia.org/wiki/SQL (April 2005).
Wilson, E.O., 2005, Systematics and the future of biology: Proceedings of the National Academy of Sciences of the United States of America, v. 102, p. 6520–6521, doi: 10.1073/pnas.0501936102.

MANUSCRIPT RECEIVED BY THE SOCIETY 18 FEBRUARY 2005
REVISED MANUSCRIPT RECEIVED 10 MAY 2005
MANUSCRIPT ACCEPTED 17 MAY 2005

Printed in the USA
