
Features

From the Los Alamos Archive to the arXiv: An Interview with Paul Ginsparg

Electronic dissemination of findings has long interested editors, but many of us in CSE know relatively little about such dissemination in fields other than medicine. Therefore, for this issue of Science Editor, I interviewed physicist Paul Ginsparg, who in 1991 developed the Los Alamos Electronic Preprint Archive, recently renamed arXiv. I appreciate his having answered my questions while busy moving from Los Alamos National Laboratory (LANL) to Cornell University, where he is continuing to maintain the arXiv in conjunction with the Cornell University Library.

Barbara Gastel

What is the arXiv? How can one access it?

The arXiv is an automated electronic repository that permits researchers to deposit their full-text research articles, including all graphics, and permits interested parties to access them free of charge. The articles are typically posted either before, during, or after peer review, at the author’s discretion. The arXiv can be accessed via the World Wide Web at arXiv.org/ or through its historical e-mail and ftp interfaces at the same address. It includes a subscription list that provides a subject-based alert system for new submissions.

How did the idea for the arXiv arise? How quickly did it catch on?

The idea germinated for a few years. By the middle 1980s, most physicists and mathematicians were using the scientific typesetting language TeX to produce their documents, and we’d switched from the telephone to e-mail for much of our communication.

In 1987, I’d mentioned to librarian Louise Addis at the Stanford Linear Accelerator Center (SLAC) library, who was in charge of a title-author indexing system for the high-energy physics literature known as SLAC-Spires, that they should consider maintaining an electronic repository of the full texts in TeX. She said it was something they’d love to do but that it would require too much manual labor beyond what they were already doing. Neither of us thought in terms of a fully automated system; moreover, as a faculty member at Harvard at the time, I wouldn’t have had the elective time to pursue it anyway.

By 1991, full-text articles in my field were being regularly e-mailed to a mailing list, and at Aspen that summer a physicist commented about being inundated with these “large” files (actually much smaller than the typical .doc or .pdf attachment these days). By then I had my own workstation rather than a shared mainframe, knew how to program it, and knew how minimal the disk space and CPU requirements of such a system would be. Having joined LANL as a research staff member in 1990, I also had the time to undertake such a project.

I decided it would be feasible to set up an automated e-mail repository for the full text. It would have an alert system that sent around only the accumulated new abstracts once a day, along with instructions for retrieving the full articles via automated e-mail request. Later that summer (after some travel), I spent an afternoon or two implementing the software and put it online, and it caught on immediately. I was originally anticipating about 100 submissions per year from the roughly 200 people in the one little subfield it originally covered. Instead, there were multiple submissions per day from day 1, and by the end of the year a few thousand people were involved.

See arXiv.org/show_monthly_submissions for how this developed. We received 33,159 new submissions in calendar year 2001.

How does the arXiv work? For example, how are papers submitted and disseminated? How is the arXiv funded?

Articles can be submitted by e-mail, by anonymous ftp, or by Web upload. Any package of files can be submitted, and they arrive with metadata (authors, title, abstract, and so on) in a specified format for use in generating the search indexes. For the last few years, the arXiv had been supported by a combination of National Science Foundation (NSF), Department of Energy (DOE), and LANL library funds. At Cornell it will be supported instead by a combination of NSF and Cornell University Library funds (that is, no longer DOE funds).

How has the arXiv evolved over the years?

In 1993, as Web browsers became more commonplace, we added a Web interface to the original e-mail interface. Most of the other changes have been incremental: better autoprocessing of submissions, improved indexing and searching, and the addition of the international network of mirror sites. The basic core operations and underlying philosophy have remained unchanged.

What is your role in the arXiv? What do you do to maintain it? Are others involved, and what are their roles?

Since 1993, when the NSF funding started, I’ve typically employed two people to help with software development. They also provide an e-mail “help desk” for the occasional questions that need personal intervention. My own technical role in the last few years has been minimal. There is not much time left after securing funding and giving presentations at meetings (this is sad because designing and writing software were my only real talents in any of this).

How has disseminating papers on the arXiv related to publishing papers in journals? For example, does inclusion in the arXiv tend to replace publication in a journal?

42 • Science Editor • March – April 2002 • Vol 25 • No 2

Does it tend to precede it?

As mentioned earlier, authors are free to submit either before or after journal submission (or not submit to a journal at all). Some journals permit electronic submission directly from the arXiv; that is, they have a submission form that permits simply specifying the arXiv-assigned identifier.

Many high-energy physicists have asserted that the journals are less relevant and that they can get by on the arXiv alone. There still remains much truth to that, at least as far as communication of research results goes. But there remains much lingering conservatism in the system: If people still need the “peer-reviewed” publications for grants and jobs, and moreover if it’s relatively painless, then why not take that additional small step? It doesn’t “cost” anything, and it’s a form of insurance policy. I recently scanned the high-energy physics hep-th and hep-ph archives for submissions entered during 1999 and found that over 70% had an entered journal reference. (The journal references for these fields are provided by SLAC-Spires instead of relying on authors, so they’re fairly well covered.) The remaining percentage includes a substantial fraction of conference proceedings and theses, so only a relatively small number were never submitted to journals or were rejected by them.

My intuition is that even if the journal system were to be abandoned by this most “radical” community, some form of review system would be reinvented anyway. So it makes sense to remain in coordination with the professional societies (like the American Physical Society) so they can adiabatically evolve to where they need to go instead of having to rise from the ashes later.

Are papers edited or reviewed in any way before they appear in the arXiv? Does inclusion in the arXiv itself serve an editorial role?

No explicit editorial role; submissions are entered by the authors. The only screening is of the e-mail address of the submitter to ensure a recognized institutional affiliation.

If you haven’t yet said so: How large is the arXiv? What are the main fields represented? How prominent is the arXiv in those fields?

The arXiv had roughly 185,000 total submissions by the end of calendar year 2001. The main fields represented are physics, mathematics, nonlinear dynamics, and computer science. The arXiv’s greatest prominence is in physics. Some subfields there (such as high-energy physics, where it started) have had virtually 100% participation since the middle 1990s. The fastest growing fields since then have been astrophysics and condensed-matter physics.

Do you foresee expanding the arXiv to include fields not yet represented? If so, what might be some of the issues in doing so?

Expansion is certainly likely, and there have been many requests from representatives of fields that would like to be included. The likely problems are more sociologic than technical. For example, it seems that in the biomedical and life sciences, researchers have ceded a great deal of power to high-visibility journals that might act to suppress this alternative mode of research communication. (Certainly the “Public Library of Science” movement, publiclibraryofscience.org, is an alternative means of reforming some of these practices from within.)

What other changes do you foresee for the arXiv? What effect do you think your move to Cornell is likely to have?

The main effect of moving to Cornell is that the system will at last have a solid long-term institutional home. In principle, that will ultimately permit me to return full-time to my primary avocation, for which I’m somewhat better trained: physics research. The move to Cornell also enhances the possibility of expanding into other fields, since it is such a broad-based academic institution.

For those of us interested in word origins: How did you decide on arXiv?

The main site at Los Alamos had been named xxx.lanl.gov. This was back in 1991, long before “xxx” had acquired its current “adult” implications on the Internet. (There’s also a little story behind the “xxx”, but fortunately you only asked about arXiv.)

In late 1998 I decided we needed to register an “.org” domain name. It would facilitate rapid redirection of accesses to a different main site in the event of hardware or network problems. It would also let us normalize the mirror-site names (for example, uk.arXiv.org for the UK site, fr.arXiv.org for the French site, and so on). The archive.org, archives.org, thearchive.org, and thearchives.org domains were all already taken, so I had to dream up something else. While driving up to Taos for a holiday dinner, I decided that since the word had a Greek root I could use X to indicate the Greek chi. This imitated Knuth’s usage in the scientific typesetting language TeX (pronounced Tech). I liked being able to preserve at least one of the original three x’s, and I recognized the virtues of a unique “brand name”. At dinner, I wrote down “arXive” on the back of a receipt to get my wife’s opinion. She suggested eliminating the final e (as in the German archiv), and that’s what I went ahead and registered a couple of days later. It looked odd at first, but people get used to these things.

Also, the word archive itself goes back to the Greek archos for ruler, or arche to begin or rule. From these came archeion, the government house, which led to the Latin archivum as a place where public records or historical documents were preserved. Hence, it was natural to have started this in the .gov domain.

For those who wish to learn more about the arXiv, what sources of further information would you recommend?

My most recent writeup for a UNESCO conference on “Electronic Publishing in Science” can be found at arXiv.org/blurb/pg01unesco.html, and it contains references to earlier resources.

Many thanks.
