This document has been archived.

On the Creation of Environmental Data Sets for the Region

When Karl Weyprecht proposed better coordi- ronmental data sets can be found in almost every nation of research in 1874, leading to a series of NOAA line office, but it is the NOAA National coordinated synoptic observations in the Arctic, Environmental Satellite, Data, and Information little did he think that his ideas would produce Service Data Centers that share a mission of data scientific data that remains of intense interest to management. The NOAA National Data Centers’ This article was prepared researchers 130 years later. And still less would he commitment to long-term data management pro- by Florence Fetterer, NOAA Liaison, National have imagined that his proposals, and the result- vides institutional support for producing exemplary Snow and Ice Data ing of 1882–1883, would environmental data sets. Each center has a partic- Center/World Data Center inform goals for creating and managing scientific ular research focus and expertise that adds value for Glaciology, Boulder, data in the 21st century. Because of advances in to its data management results. After a brief profile Colorado, and Igor observation and data technologies, the questions of these centers, we will discuss what general Smolyar, of NOAA’s that Weyprecht addressed have only increased in characteristics make certain data sets especially Ocean Climate significance. What constitutes useful environmen- valuable and what elements come into play during Laboratory, National Oceanographic Data tal data? How are data both a product of research the production of these data sets, highlighting Center, Silver Spring, and a catalyst for new research? How should data enough of them here to provide a sense of the Maryland. be managed to ensure continued accessibility and breadth of NOAA’s Arctic data production activi- usefulness? The NOAA Arctic research activities ties. An atlas, the Climatic Atlas of the Arctic described elsewhere in this edition of Arctic Seas 2004, serves as a case study. We also cite a Research of the United States both use and pro- number of NOAA operational, research, and mod- duce data. This article examines the process of eling products as examples of particular aspects of data product creation. National Data Centers National Oceanographic Data Center Located in Maryland, the National Oceano- graphic Data Center (NODC) is a repository and IPY meteorological dissemination facility for global ocean data. station, 1882. Researchers from NODC’s Ocean Climate Labora- creating environmental Arctic data sets and the tory (OCL) announced in 2000 that the world symbiosis of research, data, and data management. ocean has warmed significantly over the past 40 These data sets may have value beyond that of years. Just as the atmosphere has a climate, with advancing Arctic research objectives: they may variability on different time scales, the ocean’s be, for example, monitoring tools for change detec- temperature, salinity, and other characteristics tion, or they may underlie decision support appli- change over time. OCL researchers based their cations. conclusions on data laboriously collected, quality This focus on data and data management has controlled, and assembled into a special form of become a proper discipline at NOAA in the 130 environmental atlas called a climatology. To facili- years since Weyprecht’s call for better coordina- tate comparisons of the past with the present, tion of research and data resources. Arctic envi- and to investigate interannual-to-decadal ocean

77 (NCDC) in Asheville, North Carolina, are Arctic station data from the Global Historical Climate Network, the most comprehensive homogeneous collection of station temperature data available. “Homogeneous” means consistent over the years and from place to place, through changes in instrumentation, acquisition method, and site characteristics, so that scientists may look for trends in the data. Homogeneous data sets require Arctic temperature climate variability, many thousands of raw obser- careful quality control. Historical data are made anomalies vations acquired from ships were interpolated to homogeneous with present-day observations by a regular spatial grid and combined over annual, adjusting for non-climatic discontinuities, such as seasonal, and monthly compositing periods. A a jump in precipitation that might be caused by a definitive statement about oceanic warming would change in instrumentation. An important part of the not be possible without these climatologies. quality control process is compiling station inven- In addition to supporting scientific studies, tories that detail the history of each station, includ- OCL’s International Ocean Atlas and Information ing changes in instrumentation, changes in loca- Series (currently nine in number) exemplify inter- tion, and changes in surroundings. If a town grows national cooperation. Much of it is taking place up around a formerly rural station, for example, a through the OCL’s World Data Center (WDC) for heat island effect may be present in the data record. Oceanography, in Silver Spring, Maryland. Inter- NCDC also operates the World Data Center for national collaboration is an absolute necessity for Paleoclimatology (WDC Paleo), located in Boulder, acquiring a sufficient number of observations for Colorado. Paleoclimatology puts the relatively climatologies. The Global Oceanographic Data recent changes in Arctic climate, apparent in the Archaeology and Rescue (GODAR) Project, for instrumented record, in long-term context. Proxy example, has added over six million historical data from tree rings, ice cores, and lake and marine ocean temperature profiles to the archives, as well sediments available from the WDC Paleo were as a large amount of other data. Initiated by the used by an international team of scientists for a NODC and WDC, this OCL-directed project was circum-Arctic view of surface air temperature subsequently endorsed by the UNESCO Inter- changes over the last 400 years. governmental Oceanographic Commission. The WDC Paleo web site provides interpreta- Ice core samples. Ice cores are taken from tions of the record: A steep increase in warming ice sheets or ice caps National Climatic Data Center between 1850 and 1920 was most likely due to and are used by natural processes. Warming since 1920 is more paleoclimatologists as a Among the hundreds of climate data compila- difficult to ascribe to natural forcing alone. For an record of past climate. tions housed at the National Climatic Data Center even longer view, ice core data are valuable. WDC Paleo and the National Snow and Ice Data Center jointly maintain the Ice Core Gateway. Proxy climate indicators from ice cores such as oxygen isotopes, methane concentrations, dust content, and other parameters stretch the record back more than 1000 years.

National Geophysical Data Center The National Geophysical Data Center (NGDC), Boulder, Colorado, contributes significantly to Arctic science through participation in the devel- opment of the International Bathymetric Chart of the (IBCAO). IBCAO bathymetry provides a detailed and accurate representation of the depth and morphology of the Arctic Ocean seabed. This dynamic database contains all available bathymetric data north of 64°N. It is

78 maintained as a gridded database, and a version NSIDC has 522 data products in its on-line has been published in map form. The IABCO team archive. Excluding satellite data sets, 104 of these remapped the Lomonosov Ridge, showing it to be are Arctic data sets, and of these about 20% are more segmented in structure, wider, and shallower the more evolved compilations termed Arctic envi- than had previously been mapped. The Lomon- ronmental data products. osov Ridge is an important topographic barrier that influences deep water exchange between the What Makes a Good eastern and western basins of the Arctic Ocean. An accurate seafloor is important for applications Environmental Data including ocean modeling, mapmaking, and other Product? research endeavors. The IABCO effort involves investigators from eleven institutions in eight To make a good environmental data product, countries. It has been endorsed and supported by one starts with raw data and then processes or the Intergovernmental Oceanographic Commission, presents them in such a way that they become the International Arctic Science Committee, the information. Data are transformed into a product International Hydrographic Organization, and the that can advance a user up a hierarchy from data U.S. Office of Naval Research. to information, to knowledge, to wisdom, shorten- ing the user’s path from data to knowledge or to National Snow and Ice the why and how of environmental interactions Data Center and change. Following is a summary of some of the data management practices that effect this Operational products, such as sea ice charts for transformation. shipping interests from the NOAA/Navy/Coast Guard National Ice Center, are often laboriously Context produced by manually interpreting and synthesiz- ing data from many sources, both satellite and in Simply presenting data systematically is some- situ. They are generally more accurate than similar times enough to transmit any underlying meaning. products from single sources. The National Snow That is, even a well-organized collection of raw and Ice Data Center (NSIDC) works with opera- data can be an environmental data product. Usually, tional groups within NOAA to make these prod- though, data products are more sophisticated. ucts available to a different user base by archiving Presenting data in context is important, and the operational data, making data available online, data product creator must decide what “context” providing documentation, and fielding questions means for the particular data under consideration. from researchers about the data. Temporal context usually means having as long a Originally founded to manage scientific data record as possible, while spatial context may mean from the International Geophysical Year of 1957– covering as much of the Arctic as possible at an 1958 (the follow-on inspired by the IPY of 1882– appropriate resolution or station density. Context 1883), the World Data Center (WDC) for Glaciology may mean including population data for both pred- is operated by NSIDC in Boulder, Colorado. Today, ator and prey in an Arctic species survey or includ- NSIDC is a NOAA-affiliated data center, designated ing as complete a set of oceanographic hydro- by NOAA in 1976, affiliated with NGDC, and part of chemical parameters as possible. The point is to the University of Colorado’s Cooperative Institute make significant patterns evident, while committing for Research in Environmental Sciences (CIRES). no “sins of omission” in choosing what to include. The NOAA program at NSIDC supports the Methods are important, as is documenting uncer- WDC and emphasizes data rescue and data from tainty. For example, if the product is a gridded clima- operational sources that can be used for climate tology of snow depth on Arctic Ocean sea ice, some research and change detection. NOAA-funded analysis should be done to ensure that enough activities complement the activities of NSIDC’s observations are included in each grid cell for an Distributed Active Archive Center (DAAC) and its acceptable level of accuracy. Sometimes the way NSF-funded Arctic System Science Data Coordi- in which data are gridded, interpolated, and pre- nation Center. The latter centers handle large vol- sented implies a certain level of precision. For exam- umes of satellite data (about 80% of NSIDC’s ple, a two-dimensional quadratic function fit to snow funding comes from NASA for operation of the depth on sea ice tells the user immediately that the DAAC) and data from individual scientists. snow depth information is not very precise.

79 Data products may include the results of an analysis, such as an empirical orthogonal function analysis of surface pressure data that shows the Arctic Oscillation pattern (that is, the tendency of pressure near the pole to act counter to pressure at mid-latitudes) or the addition of a trend line or other model fitting to observations. In such cases, the product creators have an extra responsibility to explain the limitations of their method of analy- sis, since these methods, if appropriately used, draw the pattern in the data rather than leave it to inference.

Documentation Words are often the only way to provide appro- priate context. The heat island effect, known only if the weather station history is known, is but one example of the importance of clear and complete documentation. Good documentation is written with the user in mind. For example, if the users are scientists, they will need to know about any known biases in the data record. If the data prod- uct is created for the general public, this informa- tion is just as important because it influences what a user infers from the data, but the information must be given in non-technical language. Opera- tional users often need today’s data irrespective Variation in snow depth on Arctic sea ice, depicted by fitting a two-dimensional qua- of historical biases and may require little or no dratic function to available data by month. There are very few measurements of snow characteristics on Arctic sea ice. This representation, from the Environmental Working documentation. Group’s Meteorology Atlas, is the best possible in the absence of dense station coverage. For example, the NOAA National Weather Ser- While it appears unrealistic, product documentation explains why a more sophisticated vice’s National Operational Hydrologic Remote gridding method for the available data is not appropriate. Sensing Center (NOHRSC), located in Minneapo-

Examples of the Arctic Oscillation in its positive phase (left) and negative phase (right) from the NOAA National Weather Service’s Climate Prediction Center web site. Blue indicates negative pressure anomalies, and orange indicates positive. The Arctic Oscillation is a large-scale atmospheric circulation pattern. Variability in the AO has been implicated in changes such as the recent steep decline in ice extent.

80 represent the underlying resolution of the data, lest a scientist infer regional relevance that is not supported. Color tables should not have sudden jumps in intensity or hue that can draw the eye to a sea surface temperature difference, for example, that is an arbitrary point in a continuum, thus sug- gesting a pattern that is not there.

Data Integrity Three main components ensure environmental data product integrity: • The product must have scientific integrity; peer review of the data and a citation for the data set are needed to accomplish this. • The data repository must be trustworthy. Sea ice extent trends. lis, Minnesota, provides snow information in a • The data must not have been altered since the When data are presented variety of products and formats to meet opera- data were acquired or produced (or any alter- with a trend line, as in tional forecasting needs. The NOHRSC web site ation must be well described). The data man- this example, data provid- ers should include error (http://www.nohrsc.noaa.gov/) is designed to agement concepts of fixity, provenance, and bars and document the serve these users efficiently with interactive prod- source authentication come into play here. limitations of the method ucts and brief documentation. NSIDC archives Often it is the reputation of an individual scien- (linear regression in this assimilation model output (http://nsidc.org/data/ tist that imbues his or her data product creation case) when it comes to g02158.html) from NOHRSC and gives the research with an aura of integrity. Data centers work with providing information community access to this unique data set. Exten- scientists to ensure that the reality measures up. from the raw data. sive documentation on the NSIDC site, not needed Though it is common in the U.S. for investigators on the NOHRSC site, covers alternative products, to manage their own data, this is rarely successful data quality and value, and potential research over the long term because scientists rarely have uses of the data. Similarly, the NOAA NESDIS Sat- the requisite data management background needed ellite Services Division Operational Daily Northern to keep their data useful and accessible to the next Hemisphere Snow Cover Analysis is made avail- generation of scientists or the resources to deal able to the operational community at http://www. with technical issues that keep data secure, such ssd.noaa.gov/PS/SNOW/ and is archived for the as media migration and off-site backups. research community at NSIDC (http://nsidc.org/ data/g02156.html). Likewise, the NODC web site (http://www.nodc.noaa.gov/) is designed to pro- What Makes a Well-Used vide users access to various products with a Environmental Data Set? higher level of documentation than would be needed for operational users. Certain attributes will ensure that a data set will have many users. In such cases it is especially Graphics and Site Design important to follow the design precepts above. Data products that include unique data, that are Graphical presentation and site design are comprehensive collections, that offer continuous aspects of information architecture that are espe- coverage over a long time period, that are easy cially important for complex environmental data to use, and that provide a synthesis of available sets that are viewed or used through a web page. information are characteristics of the most popular Though not an environmental data set per se, the Arctic data products. NOAA Arctic Theme Page (http://www.arctic. noaa.gov/) offers an example of good site design. Uniqueness Careful attention must be paid to the graphical presentation of data. Gridded data to which a color Upward-looking sonar data from submarines table has been applied, or data smoothed by inter- provide the only measurements of ice thickness polation, are often subject to misinterpretation. over a large portion of the Arctic. Ice thickness The display resolution of pixels should faithfully estimates are critical to estimating the mass bal-

81 ance of ice in the Arctic Ocean—ice extent and concentration are only two dimensions of a three- dimensional problem. One difficulty in working with original records is that almost all submarine data are classified. Investigators at the U.S. Army’s Cold Regions Research and Engineering Temperature anomalies Laboratory (CRREL) in Hanover, New Hampshire, for October through and the Polar Science Center, Applied Physics December 2002 in the Laboratory, University of Washington, worked Arctic. This illustration with the U.S. Navy’s Arctic Submarine Laboratory assisted in attributing the to find a way to “fuzz” the submarine track posi- causes of the 2002 and 2003 sea ice extent mini- tions so that the data could be cleared for release ma to, in part, anoma- by the Chief of Naval Operations. NSIDC distrib- lously warm air tempera- utes the data set, Submarine Upward Looking A submarine surfacing through sea ice. This photo comes ture. The NOAA-CIRES Sonar Ice Draft Profile Data and Statistics (http:// from SCICEX (Scientific Ice Expeditions), a collaboration Climate Diagnostic Center nsidc.org/data/g01360.html). It has been the basis between the U.S. Navy and civilian scientists for environ- (http://www.cdc.noaa.gov) of a number of research articles on the controver- mental research in the Arctic. display tool allows users sial topic, “Is Arctic ice thinning?” to choose the month and year to display from the and other platform types. As such, the Interna- NCEP/NCAR re-analysis Comprehensiveness tional Comprehensive Ocean Atmosphere Data Set products. The color (ICOADS, http://www.cdc.noaa.gov/coads/) of table is intuitive (warm Comprehensive data products offer more value quality-controlled data dating from the late 18th colors are warmer than than data sets that must be combined with others century is a remarkable achievement. An entire average temperatures), in order to have enough data of a single type for body of literature has grown up around topics and the resolution of a scientific study. It takes a well-funded project, a related to the quality control of historical ship data one degree shown in the color bar is multi-year commitment, and many individual and in ICOADS. For example, sea surface temperatures appropriate to institutional partners to assemble, for example, acquired by throwing a bucket over the side and the data set. “all” surface marine reports from ships, buoys, measuring the temperature of the retrieved water are not the same as temperatures acquired from the engine intake. A “bucket correction” must be applied. This correction is based on modeled heat loss for water in a bucket on deck and should take into account ship speed (and its uncertainty) and the material of the bucket (wood or canvas). Clearly, quality controlling the millions of observations of various types so that they are homogeneous over time is a Herculean task. ICOADS began as a U.S. project (COADS) in 1981 as a partnership between the NOAA Office of Oceanic and Atmospheric Research’s Environmen- tal Research Laboratories and NCDC, CIRES, and NSF’s National Center for Atmospheric Research. The NOAA portion of ICOADS is currently sup- ported by the NOAA Climate and Global Change Program. NSIDC makes an Arctic subset available (http://nsidc.org/data/nsidc-0057.html). Continuous Spatial and Temporal Coverage Products that are as close as possible to continuous in space and time are often desirable because, for example, the danger of aliasing is minimized (that is, there is a smaller chance of

82 missing a significant event or pattern in the data). used to detect Arctic change and assist in attribut- The enormously popular “reanalysis” products ing change to specific causes. are an example. Reanalysis projects take data such as those in Ease of Access and Use ICOADS and assimilate them through a numerical weather prediction model to produce a series of Many valuable data sets lie unused in archives analyses in which parameters such as surface simply because they are not easily accessible. For- pressure and temperature fields are physically merly, NSIDC’s Glacier Photograph Collection of consistent with one another. The fields cover a thousands of historical glacier photographs dating large area (the Northern Hemisphere, for instance) from the 1880s saw only a handful of users each without gaps and are available at regular time year because users had to travel to NSIDC to view intervals over a long record, making reanalysis the fragile collection of prints. Now, thanks to output more useful than observations for many NOAA’s Climate Database Modernization Program applications. One example is studying the spatial funding for scanning the photos, many of the photo- and temporal variability of large-scale atmospheric graphs can be viewed on line, and high-quality circulation patterns, such as the Arctic Oscillation. digital images can be downloaded (http://nsidc.org/ NOAA’s Arctic Research Office is planning a cou- data/g00472.html). As a result, the number of users pled atmosphere–sea ice–ocean–terrestrial reanal- has climbed to about a thousand each month. ysis optimized for the Arctic region. The descrip- Improving access can broaden the user base tion of the climate system it will produce can be for a data set. NSIDC’s satellite passive microwave Cushing Glacier, Alaska. This photograph, taken in 1967, is one of thousands that are part of the Glacier Photograph Collection, created by the National Snow and Ice Data Center and available on line at http://nsidc.org/data/ g00472.html.

83 Sample result from the sea ice concentration data are popular among sci- from multiple sources. An exciting and successful Sea Ice Index, which entists but not geared toward the general public. example of this kind of product is NOAA’s Near displays anomalies in ice The data are voluminous, and some technical and Realtime Arctic Change Indicator web site (http:// extent and other ice scientific sophistication is needed to simply read www.arctic.noaa.gov/detect/), which summarizes parameters going back to 1979. Here, recent sum- and interpret the data. To mitigate these issues, “the present state of the Arctic climate and eco- mer ice extent, which has NSIDC developed the Sea Ice Index, which pro- system in an accessible, understandable, and been the lowest in the data vides an easy way to visualize the satellite data. credible historical context.” Designed for decision record, is displayed with The Sea Ice Index web site lets any user track makers and the general public, it presents a the median extent (pink changes in sea ice extent and compare conditions sophisticated 30-year principal components analy- line) to give climatological between years. About 3,000 users visit the Sea Ice sis (the synthesis) of 19 climate, land, marine, context to the information. Index site every month. and ecosystem “indicator” time series, such as the Similarly, the OCL web site (http://www.nodc. length of the travel season over tundra, the Bering noaa.gov/OC5//SELECT/dbsearch/dbsearch.html) Sea pollock population, the number of extremely allows users to extract data from the World Ocean cold days each year in cities such as Minneapolis, Database 2001. While contributors to an environ- and the extent of Arctic sea ice. mental data product often prefer that the data be Taken alone, any one of these time series compiled on CD-ROM or DVD for reasons of fixity would not present a compelling account of Arctic and attribution, data sets are much more likely to be change. Taken together, the big picture emerges. used if they can be easily browsed or manipulated The site tracks the rate and extent of changes in on line with a selection tool to facilitate access. the Arctic to facilitate informed decisions concern- ing the impacts that result. Web pages for each Synthesis of the indicators give a succinct but complete analysis of the data record in non-technical terms. Data products that offer synthesis—a “big pic- Changes are given in context, including the con- ture” version of the information in the data—are text of the human dimension. Links to reports and rare because they are difficult to construct. Syn- more detailed data make it a useful resource for thesis products are built by distilling information scientists as well.

84 of collaborating with Russian institutions to add more historical data to the OCL World Ocean Database Project. The most recent result is the Climatic Atlas of the Arctic Seas on DVD, with meteorology, oceanography, and hydrobiology (plankton, benthos, fish, sea birds, and marine mammals) data from the Barents, Kara, Laptev, and White Seas, collected by scientists from 14 coun- tries during the period 1810–2001. The Murmansk Marine Biological Institute of the Academy of Sci- ences of the Russian Federation and OCL/NODC prepared the atlas with support from NOAA NESDIS and the Climate and Global Change Program. The atlas provides historical context for its observations by including a written history of oceanographic observations in the Arctic, as well as scanned copies of selected rare books and arti- cles. A gallery with photos and drawings gives the user some idea of what historical data collection platforms and expeditions looked like. As is often the case with projects involving data rescue, libraries provided much of the material and documentation; the NOAA Central Library

Selection of time series representing Arctic change. The combined indicators are the result of a mathematical analysis (principal component analysis) that resolves the trends in all the time series into two major components. Series noted by an asterisk have been inverted. Red indicates large changes in recent years.

The Arctic Change Indicator web site was developed by NOAA’s Arctic Research Office under the stewardship of investigators at the NOAA Pacific Marine Environmental Laboratory. It draws on the work of hundreds of investigators around the globe. A major challenge will be to keep the site updated. As NOAA looks toward building new observing systems, it will be critical to maintain data flow from existing observing stations. The Climatic Atlas of the Arctic Seas

With these principles in mind, we now turn to a case study of active, collaborative data manage- ment that produces new knowledge and dissemi- Marine biologists P. Savitsky and I. Molchanovsky sam- pling plankton in the Kara Sea as part of an expedition of nates this potential across a wide community of the Murmansk Marine Biological Institute on the nuclear researchers. Sovetsky Soyz in April–May 2002. The Cli- The WDC for Oceanography in Silver Spring , matic Atlas of the Arctic Seas weds early observations Maryland, and OCL/NODC have a long history with contemporary observations in a seamless package.

85 One of the first Russian research vessels, Andrey Pervozvanny, on an expe- dition at the beginning of the 20th century, in the Barents Sea. (Silver Spring, Maryland), the Slavic and Baltic Users have two ways of accessing raw obser- Branches of the New York Public Library, the New vations: by oceanographic cruise or through 1° York Museum of Natural History Library, the Dart- squares. For every month, a distribution map of mouth College Library (Hanover, New Hampshire), stations is generated that allows a user to access the Slavic Library (Helsinki, Finland), and the pub- data from a chosen square. Data may be easily lic libraries of Moscow, Murmansk, and St. Peters- imported into Excel or other database applications. burg (Russian Federation) all contributed. Access to the actual observations is important Assembling an atlas on this scale presents a for many users. Other users are likely to prefer a number of challenges. A surprisingly difficult one climatological presentation, since climatologies is the elimination of duplicate stations from differ- provide a convenient representation of average ent data sources. As databases or parts of data- conditions, such as monthly or decadal means. bases are shared, metadata are altered. For exam- The atlas satisfies both by including mean monthly ple, one database may have a station location in temperature and salinity distribution fields at five degrees, minutes, and seconds, and another may standard depths, using an objective data analysis convert to decimal degrees. Rounding errors may method. give the appearance that these are two stations separated by as much as a few miles. Values of A Long Journey from the parameters may be presented at observed levels in one data set but interpolated, often by an Past: The International unknown method, to a standard level in another Polar Year data set. Another source of uncertainty is convert- ing units of measurement. As a result, the same The International Polar Year serves as an station data from different sources may differ in important milepost for assessing our efforts and coordinates, time of measurement, and values of establishing stronger standards to carry the value the parameters themselves. To help choose what of observations and research far into the future. station records to include, the atlas authors used As we look back over past IPY/IGY efforts and a system of priorities: cruise reports, ship logs, forward to those coming in 2007–2008, those of and expedition diaries (all original sources) were us who create Arctic environmental data sets have deemed more reliable that data sources where the observed some lessons over the years. data apparently were repeatedly transformed. Many of us know the tragic story of the Greely Elimination of duplicates and “near duplicates” Expedition, an American venture sent into the brought the number of stations down from an Arctic in 1881 to establish an IPY station that initial 1,506,481 to a still sizeable 433,179. ended in starvation for most of the party, but

86 Average September surface temperature (top) and salinity (bottom), from the Climatic Atlas of the Arctic Seas. Note the low salinity at the river mouths. Scientists are working to understand the role of freshwater input from rivers on Arctic Ocean (and global ocean) circulation. Climatologies provide a picture of average conditions against which to evaluate changes.

fewer of us are aware of the sad tale of their scien- didn’t arrive in time they would be left to retreat on tific data. Their observational records should have their own, Greely began making copies of their sci- become a legacy to their efforts and sacrifice, but entific work (amounting to some 500 observations this is not the case. As Kevin Wood of NOAA per day). When they were forced to abandon Fort writes, narrating the story of their data: Conger in August 1883 they took with them—in lieu of extra rations—these copies sealed in three tin “Perhaps the most compelling aspect of the Greely boxes of 50 pounds each, all of the daily journals, tragedy is the utter commitment of these men to 70 pounds of glass photographic plates, and all of preserve their scientific work. Aware that if relief the standard thermometers and several other impor-

87 tant instruments. They also continued a program of create from the experience of the first IPY. As we scientific observation, in the face of starvation, until look forward to a new International Polar Year, we just 40 hours before they were rescued.” must remain focused on these key lessons about data management. Surely, Greely and his men hoped that their work would lead to important advances in science. Lesson 1: Applying the Right Kind and Right Greely expressed this sentiment when he wrote in Amount of Effort at the Right Time is Imperative his official report, “The conviction that at no dis- While data rescue is difficult, tedious, and tant day the general laws of atmospheric changes often expensive, it is crucial. The only way to will be established, and later, the general character reduce uncertainty in our estimates of past, and of the seasons be predicted through abnormal predictions of future, Arctic environmental change departures in remote regions, causes this work to is to incorporate more, older, and better data into be made public…in the hope that it may contribute our analyses. NOAA’s Climate Database Modern- somewhat to that great end.” The desire to be able ization Program, now in its sixth year, has keyed to “predict the character of the seasons” still moti- or scanned and placed on line over 45 million envi- vates researchers today. ronmental records. More needs to be done, espe- Unfortunately, the research program of the first cially in documenting and quality controlling IPY was never completed as Weyprecht had origi- these records, because these last steps require nally planned. Each nation issued an individual capturing the knowledge of people who know the report over the ensuing years, but no systematic “rescued” data best, often a cadre that are beyond study of the simultaneous observations—the retirement age. heart of the IPY program—was undertaken. The International Polar Com- Lesson 2: Structures that Enable International mission dissolved, and Collaboration can Dramatically Increase Value the data collected at As a result of Arctic geography, the most com- such cost during the prehensive data sets result from international first IPY soon fell into cooperation. GODAR and ICOADS are models for obscurity. this cooperation. International data-sharing agree- Today the original ments are essential. In contrast to the National records of the first IPY Data Centers, the World Data Center system pro- are widely scattered in vides a structure within which data sharing can various libraries and occur with a minimum of diplomatic overhead. archives and are often WDCs in the U.S. that share Arctic data interna- in a perilous state of tionally are the WDC for Glaciology, Boulder preservation. Some of (co-located with NSIDC), WDC for Oceanography, the published reports Silver Spring (co-located with NODC), the WDC are extremely rare and for Marine Geology and Geophysics, Boulder are very difficult to (co-located with NGDC), the WDC for Meteorol- obtain. The fate of the ogy, Asheville (co-located with NCDC), and the first IPY records, gained WDC for Paleoclimatology (affiliated with NCDC). at such high cost, under- utilized both then and Lesson 3: Good Data Stewardship is Superior to now, and scattered over Untimely Data Rescue Sgt. Jewell recording tem- the course of time, highlights how important it is We can avoid expensive and possibly fruitless perature, Fort Conger, to provide for the effective preservation and man- data rescue efforts in the future by heeding the during the Greely expedi- agement of such extremely valuable data. lessons of the past. The International Polar Year, tion, 1881–1883. The scientific legacy of the Greely Expedition 2007–2008, will be a catalyst for reinvigorating and the other expeditions of the first IPY has only professional data management. The IPY promises with difficulty been preserved. NOAA has recently new international collaboration and the potential made meteorological data from the first IPY avail- for synthesis of knowledge under the headings able in digital format, along with an extensive col- of cross-disciplinary research themes. Good data lection of documentary images (see http://www. stewardship will help ensure that this major under- arctic.noaa.gov/aro/ipy-1). taking will not shortly become a dimly receding There is another kind of legacy that we can spot on the horizon behind us.

88 This effort requires not only attention to the vations, satellite data, and environmental data data, but also to capturing the “data about the products in a number and of a complexity that data” that enables continuing understanding and would have been hard to imagine in the late 1880s. value. A disciplined effort to define and organize To ensure preservation, the metadata will enable other researchers to • NOAA’s Data Centers and the Arctic Research locate, understand, and interpret data for years Office will work to advance standards and Jane Beitler and Ruth Duerr, NSIDC, assisted in to come, providing the foundation for long-term technologies that support this goal. NOAA editing this article. Kevin coordination and synthesis. In the instance of the advocates the use of the Open Archival Infor- Wood, PMEL, provided first IPY, Weyprecht’s vision of coordinated syn- mation System (OAIS) Reference Model for the material on the optic observations led to the acquisition of a data metadata. Work on the OAIS model and on first IPY and the set that serves as a snapshot of climatic condi- technological advances such as GRID com- Greely expedition. tions as they existed in that now long-past year. puting and interoperable catalogs is happen- Data collected then can now be compared with ing now at NOAA’s National Data Centers. conditions as they are today. Making that compar- • Cross-agency support for IPY data manage- ison, Wood and Overland (2005) found that ment is needed. Because of the international, monthly mean air temperatures at IPY-1 stations distributed nature of IPY activities, the data were generally within recent climatological limits, they produce will necessarily be archived and spatial patterns in temperature anomalies and made accessible through distributed data (departure from the long-term mean) were consis- management. This distributes the burden of tent with Arctic-Oscillation-driven patterns of vari- data management but imposes additional ability. In a nod to the value of documentation coordination challenges. Within the U.S., and metadata, Wood and Overland noted that “the the National Academy of Sciences’ Polar qualitative logs are particularly useful in validating Research Board has endorsed the concept climate information.” put forward by the International Council of NOAA will focus its strength in environmental Scientific Unions’ IPY Planning Group of a observations and analysis on the polar regions coordinating IPY Data and Information Ser- during IPY. NOAA’s Arctic Research Office has vice (IPY-DIS). Cross-agency support of the endorsed a fundamental goal for IPY data manage- DIS at a national level will ensure that the U.S. ment: to securely archive a baseline of data leaves a secure IPY data legacy. against which to assess future change, and to • Adequate funding is needed. Funding for ensure that IPY data are accessible and preserved the management of data acquired through for current and future users. research programs is often difficult to obtain, What will this IPY snapshot look like, and how either because the importance of data man- will data be preserved? In contrast to Weyprecht’s agement as a discipline is not recognized or IPY, most data from the coming IPY will be “born because there are simply not enough dollars digital.” Station logbooks from Weyprecht’s day to go around. Currently, for every $30 dollars could be preserved in libraries, where they had to spent nationally on Arctic research, about $1 be physically protected from destruction by fire, is spent on Arctic data management. insects, and chemical decomposition of paper and In the end, it is important to remember that ink. One might think it is easier to preserve digital technological advances and digital archives will information, but digital data are not immune from secure data for future generations of researchers physical destruction, and they require a host of only to the extent that they are successful in cap- measures to ensure their usability into the future: turing what people know about the data. We must “digital objects require constant and perpetual also keep today’s equivalent of the IPY-I station’s maintenance, and they depend on elaborate sys- “qualitative logs.” With them, future researchers tems of hardware, software, data and information will have the appropriate contextual material to models, and standards that are upgraded or turn the coming IPY data into information-filled replaced every few years.”* Arctic environmental data products. During the coming IPY, hundreds of investiga- tors and agency programs will produce raw obser- References * From It’s About Time: Research Challenges in Digital Arzberger, P., P. Schroeder, A. Beaulieu, G. Bowker, Archiving and Long-Term Preservation, final report of a workshop sponsored by NSF and the Library of Congress, K. Casey, L. Laaksonen, D. Moorman, P. Uhlir, August 2003. and P. Wouters (2004) An international frame-

89 work to promote access to data. Science, Vol. Series, Vol. 8, World Data Center for Oceanog- 303, No. 5665, p. 1777–1778. raphy, Silver Spring, Maryland. Levitus, S., J.I. Antonov, T.P. Boyer, and C. Markhaseva, E., A. Golikov, T. Agapova, A. Beig, Stephens (2000) Warming of the world ocean. and I. Smolyar (2002) Zooplankton of the Arctic Science, Vol. 287, No. 5461, p. 2225–2229. Seas 2002. CD-ROM, International Ocean National Science Foundation and Library of Con- Atlas and Information Series, Vol. 6, World Data gress (2003) It’s About Time: Research Chal- Center for Oceanography, Silver Spring, Mary- lenges in Digital Archiving and Long-term land. Preservation, Final Workshop Report. Spon- Matishov, G., A. Zuyev, V. Golubev, N. Adrov, V. sored by the National Science Foundation’s Slobodin, S. Levitus, and I. Smolyar (1998) Digital Government Program and Digital Librar- Climatic Atlas of the Barents Sea 1998: ies Program, Directorate for Computing and Temperature, Salinity, Oxygen. NESDIS Atlas Information Sciences and Engineering, and The 26, NOAA, Washington D.C. Library of Congress’s National Digital Informa- Matishov, G. P., Makarevitch, C. Timofeyev, L. tion Infrastructure and Preservation Program. Kuznetsov, N. Druzhkov, V. Larionov, V. Golubev, Overpeck, J.T., K. Hughen, D. Hardy, R. Bradley, R. A. Zuyev, V. Denisov, G. Iliyn, A. Kuznetsov, S. Case, M. Douglas, B. Finney, K. Gajewski, G. Denisenko, V. Savinov, A. Shavykin, I. Smolyar, Jacoby, A. Jennings, S. Lamoureux, A. Lasca, G. S. Levitus, T. O’Brien, and O. Baranova (2000) MacDonald, J. Moore, M. Retelle, S. Smith, A. Biological Atlas of the Arctic Seas 2000: Wolfe, and G. Zielinski (1997) Arctic environ- Plankton of the Barents and Kara Seas. mental change of the last four centuries. NESDIS Atlas 39, NOAA, Washington D.C. Science, Vol. 278, No. 5341, p. 1251–1256. Matishov, G., A. Zuyev, V. Golubev, N. Adrov, S. Peterson, T.C., and R.S. Vose (1997) An overview Timofeev, O. Karamusko, L. Pavlova, O. Fady- of the Global Historical Climatology Network akin, A. Buzan, A. Braunstein, D. Moiseev, I. temperature database. Bulletin of the American Smolyar, R. Locarnini, R. Tatusko, T. Boyer, and Meteorological Society, Vol. 78, No. 12, p. S. Levitus (2004) Climatic Atlas of the Arctic 2837–2849. Seas 2004: Part I. Database of the Barents, Wood, K.R., and J.E. Overland (2005) Climate Kara, Laptev, and White Seas - Oceanography lessons from the First International Polar Year and Marine Biology. NESDIS Atlas 58, NOAA, 1881–1884. Poster presented at the American Washington, D.C. Meteorological Society 8th Conference on Polar National Operational Hydrologic Remote Sensing Meteorology and Oceanography, San Diego, Center (2004) SNODAS Data Products at California. NSIDC. Digital media, National Snow and Ice Data Center, Boulder, Colorado. Data Set Citations National Snow and Ice Data Center (1998) Subma- rine Upward Looking Sonar Ice Draft Profile Berger, V. J., A.D. Naumov, N.V. Usov, M.A. Zuba- Data and Statistics. Digital media, National ha, I. Smolyar, R. Tatusko, and S. Levitus (2003) Snow and Ice Data Center/World Data Center 36-Year Time Series (1963–1998) of Zoo- for Glaciology, Boulder, Colorado. plankton, Temperature, and Salinity in the White National Snow and Ice Data Center/World Data Sea. NESDIS Atlas 57, NOAA, Washington, D.C. Center for Glaciology, Boulder (Compiler) (2002) Fetterer, F., and K. Knowles (2004) Sea Ice Index. Online Glacier Photograph Database. Digi- Updated from 2002, digital media, National tized subset of the Glacier Photograph Collec- Snow and Ice Data Center, Boulder, Colorado. tion, National Snow and Ice Data Center, Boul- Fetterer, F., and V. Radionov (Ed.) (2000) Environ- der, Colorado. mental Working Group Arctic Meteorology NOAA/NESDIS/OSDPD/SSD (2004) IMS Daily and Climate Atlas. CD-ROM, Arctic Climatology Northern Hemisphere Snow and Ice Analysis Project, National Snow and Ice Data Center, at 4 km and 24 km Resolution. Digital media, Boulder, Colorado. National Snow and Ice Data Center, Boulder, Lappo, S., Y. Egorov, M. Virsis, Y. Nalbandov, E. Colorado. Makovetskaya, L. Virsis, I. Smolyar, and S. Serreze, M. (Compiler) (1997) Comprehensive Levitus (2003) History of the Arctic Explora- Ocean–Atmosphere Data Set, LMRF Arctic tion 2003: Cruise Reports, Data. CD-ROM, Subset. Digital media, National Snow and Ice International Ocean Atlas and Information Data Center, Boulder, Colorado.

90 91