EMANI Meeting

July 25-26, 2002

Cornell University Library

Ithaca, New York

EMANI – Electronic Mathematical Archiving Network Initiative: An international collaboration to support and coordinate the long-term preservation and open accessibility of mathematical publications in digital form.

Project Coordinator: Bernd Wegner

1. Project Tasks and Priorities

Immediate

Send proposed advisory board names to Wegner / All
Distribute PDFs of EMANI logo to team / Springer
Press release / Wegner and Hasan
Survey of the EMANI partners on metadata requirements / Göttingen
Collect and distribute “problematic examples” to test metadata / All

September

Send work package reminders 5 weeks prior to November meeting / Wegner
DOI report / De Kemp

October

Definition of work packages due – send to Becker / All WPs

November

November meeting / Göttingen
Website operational / Springer and Neuroth
Minimum list of metadata requirements available by November meeting / Metadata WP
Functional requirements for users – proposal available by November meeting (prototypes by March 2003) / Access, Nav, Design WP
Report on the NSDL / Becker

No date / ongoing

Prepare budget for EMANI / Springer
Prepare 2-year plans for digitization (toward a registry). Submit descriptions to Becker / All partner organizations
Assemble a list of organizations for outreach / All
Conference papers and publications (partner should review before submission) / All
Funding efforts / All
Interoperability of distributed system

2. Presentations

  Hans Becker (SUB Göttingen), “Retrodigitization”

  Pierre Bérard (Grenoble), “NUMDAM”

  Arnoud de Kemp and Syed Hasan (Springer), “Formalizing Project EMANI”

  Gertraud Griepke (Springer), “Contents and Formats: Existing Digital Sources”

  Jiang Airong, “The Development of the Chinese Math Digital Library at Tsinghua University”

  Bernd Wegner (TU Berlin), “Information dissemination”

  Bernd Wegner, “Ongoing Projects”

  Project Euclid economic model

3. Meeting Minutes

Below are the minutes from the July 25-26 meeting of the EMANI project, held in Olin Library at Cornell University, Ithaca, New York. The notes are my paraphrase of the transactions and should not be taken as a direct transcript of the discussions. Links to participants’ PowerPoint presentations are provided above, as well as in the text of the minutes. Brief notes on the presentations in the minutes are meant to supplement the slides. – Kizer Walker

N.b. Tasks requiring specific follow-up from participants are marked with “→” and bold face.

Thursday, July 25

Sarah Thomas (CUL) opened the meeting by welcoming the EMANI partners to Cornell University Library and wishing the group success in its collaboration.

· · ·

Bernd Wegner (TU Berlin/Zentralblatt) provided an overview of projects relevant to digital archiving in general and the archiving of math materials in particular. [View presentation]. He noted that several participants in the EMANI meeting would go on to attend the July 29-30 planning meeting of the Digital Mathematics Library project at the US National Science Foundation in Washington, DC. Among other developments, Wegner pointed to a backfiles system being developed by Elsevier for 39 of its journals as a possible competitor for EMANI. He emphasized the importance of establishing visibility for EMANI, noting that content results should be made available as soon as possible – even before the infrastructure is perfected, since the infrastructure is not visible.

Discussion:

Gertraud Griepke (Springer Heidelberg) stressed that the priority assigned to archiving digital materials must influence current production. Today’s digital production must facilitate a future transition to the archives. What is new and significant about EMANI is the way in which the project unites future and past. Bernd Wegner agreed, adding that this influence on the production of digital materials must be a measure of EMANI’s success, not only what the project is able to make available on the web.

Hans Becker (SUB Göttingen) cited the development (e.g. in Germany and the UK) of preservation and archiving projects that are regionally based, rather than subject-based. SUB Göttingen is attempting to play the role of coordinating these two approaches.

Tom Hickerson (CUL) explained that in the US context, subject-based, national (LoC), and institutional archives (i.e., archives that collect scholarship in the institution where it is produced) are typically understood as competing models. He insisted that these models must interoperate. Today (especially in the US context), there can be no single solution, and at the moment, no single model is dominant. EMANI can serve as an exemplar of the subject-based approach and at the same time advocate for interoperability among the different models.

Several of the Work Package subjects assigned at the February Heidelberg meeting were discussed in the context of a grant proposal that EMANI partners from Cornell and Göttingen are preparing for submission to NSF/DFG. The proposal – “Access to Mathematics Over Time: Cooperative Management of Distributed Digital Archives” – responds to a call for proposals from DFG and NSF’s joint funding program for International Digital Libraries Research (see <http://www.dfg.de/foerder/biblio/neues/dfg_nsf.pdf>). The NSF/DFG proposal took priority at the EMANI metadata meeting in Göttingen in May. Some of the participants at the Ithaca meeting also met separately to finish the proposal, which had an August 1 submission deadline.

David Ruddy (CUL) described the proposed collaboration between SUB Göttingen and CUL as a project that could build, with external funding, core working components of EMANI. If funded, the project will focus on the archiving of electronic journal materials in a distributed system that is open and non-proprietary, taking into account the heterogeneous systems at the different institutions. The partners would develop metadata to support long-term archiving in the framework of the Open Archives Initiative.

Hickerson stressed that the grant activity should not be seen as an attempt to break away from the larger EMANI project. The proposal responds to a specific funding opportunity that has arisen. It is a grant proposal to the two funding agencies, but should be understood at the same time as a proposal to the EMANI group, a suggestion of the direction the larger project can take.

Becker urged that the grant be pursued as a means of fulfilling the tasks laid out in the EMANI Work Packages, but that the Work Package structure be retained, along with the assignments made to the particular organizations in EMANI.

Hickerson emphasized that while the NSF/DFG grant is an International Digital Libraries Research grant specifically requiring US-German collaboration, in the EMANI context, the grant activity would be the business of all the national partners.

Here discussion diverged from the NSF/DFG proposal. Wegner asked whether the style files promised by Springer-Verlag at the Heidelberg meeting in February were forthcoming. The Springer partners assured the group that the publisher is at work on style files for worldwide production.

Wegner suggested the group resume the discussion begun at the Göttingen meeting (May ’02): What does EMANI want to preserve? What are the minimum requirements?

Thomas posed the question of the extent to which “context” should be preserved along with “content.” For instance, is it desirable to archive information about a journal’s editorial board? This is not only a metadata question, but also a policy question. Not only a question for mathematicians, but also a question for historians and other user groups.

Arnoud de Kemp (Springer Heidelberg) characterized the “context” question as a structural question: how do the originators of information structure that information?

Pierre Bérard (Grenoble) responded that this is a technical point of view, and not an end-user perspective.

On the particular question of editorial boards, Ruddy noted that Project Euclid had made a decision to capture them in the metadata. But it would be difficult to include them retroactively.

Wegner proposed that the group develop a set of potential preservation problems and a set of minimum requirements, with examples.

Hickerson agreed that these kinds of questions need to be addressed immediately as a starting point for future developments.

Hickerson suggested a new Work Package might focus on the question of minimum requirements for content/context archiving.

Jean Poland (CUL) suggested that math librarians consult historians of science on the question of “context” preservation needs.

Griepke: EMANI needs to arrive at a clear definition of what a “journal” is (including, e.g., table of contents, index).

De Kemp: This is a question of structure – the structure of the journal.

Becker noted that SUB Göttingen has already developed a document to address this.

Heike Neuroth (SUB Göttingen) is working on a survey of the EMANI partners on metadata requirements.

Becker reminded the group that we must consider the economics of preservation in formulating metadata requirements. Hickerson cautioned against expending too much of the group’s efforts in the search for a perfect solution – we must make our best guess at what must be preserved and move on. The group must be flexible and pragmatic and not get bogged down by the immensity of the question of what to keep.

→ Hickerson proposed the November EMANI meeting as a deadline for developing a minimal list of metadata requirements.

Marcy Rosenkrantz (CUL) asked whether the group is prepared to preserve TeX on the assumption that TeX will continue to be usable in the future.

Ruddy: If we want to preserve presentation formats, preserving TeX will not suffice.

Steve Rockey (CUL) added that publishers must decide whether print or electronic will be the copy of record.

In summary, Thomas noted that two pieces are needed to move forward with the formulation of metadata requirements: the results of the Göttingen metadata survey and examples of problematic materials.

→ Neuroth will distribute survey and results.

→ Wegner agreed to collect and distribute examples of problematic materials.

→ Griepke agreed to distribute examples from Springer.

The problematic examples will be tested against the proposed metadata – can the metadata describe the examples?

→ The five participating groups will make recommendations based on the tests.
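
The test described above (can the proposed metadata describe the problematic examples?) could be run mechanically once a field list exists. The following is only a minimal sketch: the field names in `REQUIRED_FIELDS` and the two sample records are hypothetical, since the minimum list of requirements had not yet been defined at the time of the meeting.

```python
# Hypothetical minimum field set; the actual EMANI list was still to be defined.
REQUIRED_FIELDS = ("title", "authors", "journal", "year")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in required if not record.get(field)]


# Invented sample records, for illustration only.
examples = [
    {"title": "On a problem", "authors": ["A. Author"],
     "journal": "J. Example", "year": 1931},
    # A "problematic" example: an unsigned editorial note with no author.
    {"title": "Editorial note", "authors": [],
     "journal": "J. Example", "year": 1931},
]

# Map each example to the fields the proposed metadata cannot fill.
report = {rec["title"]: missing_fields(rec) for rec in examples}
```

A non-empty list for a record would flag either a gap in the example or a requirement that is too strict for that kind of material, which is the sort of finding the groups' recommendations could draw on.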

Access, Navigation, Design

Hickerson: The next step after coming to some conclusions about metadata requirements will be to create models/prototypes of a user system. When should such models be ready for review by the group? In one year?

Neuroth: Six months would be better. If prototypes are available sooner, this experience can still influence the development of the metadata.

→ Ruddy: The access, navigation, design group (headed by CUL) will propose functional requirements for users in time for the November meeting.

Ruddy asked for clarification of whether the group understands the EMANI project to be creating a “dark” or a “light” archive. Are access controls, etc., a requirement? He noted that he has been working under the assumption that the project will produce a light archive.

Hickerson recalled that Rüdiger Gebauer had spoken at the February meeting in Heidelberg of opening the archive after a “short period.” He asked the Springer-Verlag partners to make clear if this had not been correctly understood.

De Kemp responded that it will not be difficult to discuss opening access to the archive with Springer. He pointed to the open discussion between Springer and Göttingen on Göttingen’s digitization of Springer math books. Springer is willing to discuss an open archive – but probably with restrictions. This might mean a model with 2 or 3 time layers. In any case, Gebauer must be involved in this discussion. There is currently no written policy on this question at Springer.

Hickerson answered that time constraints on access, a rolling model, do not pose a problem for CUL. What would be problematic is a subscriber-specific system to support subscription processes – this would be too expensive for the library.
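
A rolling model with time layers, as raised in this exchange, amounts to classifying each item by its age. The sketch below is illustrative only: the layer names and the 2-year and 5-year thresholds are assumptions chosen for the example, not figures agreed by Springer or the EMANI partners.

```python
from datetime import date


def access_layer(published, today, thresholds=(2, 5)):
    """Classify an item into a time layer by its age in whole years.

    The thresholds and layer names are hypothetical placeholders.
    """
    # Subtract one year if this year's publication anniversary has not yet occurred.
    before_anniversary = (today.month, today.day) < (published.month, published.day)
    age = today.year - published.year - int(before_anniversary)
    if age < thresholds[0]:
        return "restricted"
    if age < thresholds[1]:
        return "limited"
    return "open"


meeting_day = date(2002, 7, 25)
recent = access_layer(date(2001, 7, 25), meeting_day)
older = access_layer(date(1999, 1, 1), meeting_day)
oldest = access_layer(date(1990, 1, 1), meeting_day)
```

Because the rule depends only on the publication date, it needs no per-subscriber records, which is the property Hickerson identified as important for the library.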

De Kemp noted that part of the question of access has to do with the definition of “archive”: is it old materials? Current library collections?

Access, Navigation, Design tasks:

→ basic functional specifications: by November meeting

→ prototypes for presentation to the group: March 1 (to share with the group for discussion in April)

Participants from the CUL group noted that the prototypes should not represent a universal model, but rather prototypes to be adapted to needs of different user communities.

Architecture

Frank Klaproth (SUB Göttingen) opened the discussion on EMANI architecture. He emphasized that the system should be OAI compliant. It will be a distributed system. Rights management must be addressed, as well as the question of which institution holds the digital materials.

Wegner outlined a model of a distributed system in which copies of content would be stored at multiple mirror sites, but individual institutions would retain proprietary rights over specific content. This is the model employed by Zentralblatt MATH and EMIS.

De Kemp said he expected such a model would be of interest to CUL and Göttingen, but wondered whether France or Tsinghua would also be interested.

Bérard responded that the French partners would be interested in principle, but that it is too early for a decision.

Thomas referred to the ongoing discussion in the archives community of the status of mirror sites. Most hold that while mirror sites represent an important safety net for access, they do not constitute an archive.

Nancy McGovern (CUL) drew a distinction between redundancy and replication in the context of digital archiving. Because a mirror site holds copies of documents within an environment that replicates the main site, it is vulnerable to the same problems as the main site. Redundancy in the sense advocated by the LOCKSS project (http://lockss.stanford.edu/) provides more security by storing copies in a different environment.
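
One practical benefit of independently held copies is that they can be checked against one another. The sketch below compares checksums of the same document held at several sites and reports any copy that disagrees with the majority. It is a simplified illustration and not the LOCKSS protocol itself; the site names and sample contents are invented.

```python
import hashlib
from collections import Counter


def digest(data):
    """Return the SHA-256 hex digest of a copy's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(copies):
    """Return the names of sites whose copy differs from the majority digest."""
    digests = {site: digest(data) for site, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority)


# Invented sample: three sites hold the same article; one copy is corrupted.
copies = {
    "site_a": b"article text",
    "site_b": b"article text",
    "site_c": b"article tex?",
}
damaged = find_damaged(copies)
```

A check of this kind is only meaningful when the copies are stored in genuinely different environments. A mirror that replicates the main site automatically would reproduce the damage, and every digest would still agree.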

Wegner added that the EMANI architecture should accommodate updates.

De Kemp envisioned a docking station for the delivery of information directly (i.e., Springer-Verlag to EMANI) in the desired format. Springer wants to see a platform for the delivery of digital information that offers an alternative to Elsevier. This would not be a proprietary Springer system, but an open system with participation from multiple publishers.

Thomas summarized the criteria for evaluating the EMANI architecture: capacity for interoperability with other systems and accommodation of updates.

Retrodigitization

Becker presented on SUB Göttingen’s Retrodigitization plan for the next two years. [View presentation]. Göttingen will continue work on Springer journals. They wish to create a definitive list of Springer math journals. Teubner journals are also a priority, but like Birkhäuser