@Xxx.Lanl.Gov
Total Page:16
File Type:pdf, Size:1020Kb
@xxx.lanl.gov first steps toward electronic research communication Paul H. Ginsparg [email protected] is the e-mail short period the primary means of com- The archiving software has been ex- address for the first of a series of municating ongoing research informa- panded to serve a number of other re- hautomated archives for electronic tion in formal areas of high energy par- search disciplines (see Figure 1). The communication of research information. ticle theory. Its rapid acceptance within extended, automatically maintained This “e-print archive” went on-line in this community depended critically on database and distribution system cur- August, 1991. It began as an experi- both recent technological advances and rently serves over 20,000 users from mental means of circumventing recog- particular behavioral aspects of the more than 60 countries and processes nized inadequacies of research journals, community. There are now more than over 30,000 messages per day. It is al- but unexpectedly became within a very 3600 regular users of hep-th worldwide. ready one of the largest and most active 156 Los Alamos Science Number 22 1994 @xxx.lanl.gov databases on the internet. This system may be a paradigm for worldwide, dis- cipline-wide scientific-information ex- change when the next generation of “electronic-data highways” begins to provide more universal access to high- speed computer networks. Background The rapid acceptance of electronic communication of research information in my own community of high-energy theoretical physics was facilitated by a pre-existing “preprint culture,” in which the irrelevance of refereed journals to ongoing research has long been recog- nized. Since the mid-1970s the primary Figure 1. Number of Users of E-Print Archives means of communicating new research The bar chart shows the combined number of e-print-archive users over the period be- ideas and results has been a preprint- ginning August 1991 and ending April 1994. The data include users of the following distribution system in which printed e-print archives: High-energy particle theory (formal), started August 1991; Algebraic copies of papers were sent through the geometry, started February 1992; High-energy particle theory (phenomenological), start- ordinary mail to large distribution lists ed March 1992; Astrophysics, started April 1992; Condensed-matter theory, started at the same time that they were submit- April 1992; Computational and lattice physics, started April 1992; Functional analysis, ted to journals for publication. (The started April 1992; General relativity/Quantum cosmology, started July 1992; Nuclear larger high-energy physics groups typi- theory, started October 1992; Nonlinear Sciences, started March 1993; Economics, cally spent between $15,000 and started July 1993; High-energy experimental physics, started April 1994; Chemical $20,000 per year on copying, postage, physics, started April 1994; Computation and language, started April 1994. and labor costs for their preprint distri- bution.) Typically, it takes six months puter scientist Donald E. Knuth of dardized Postscript files produced by a to a year for a paper to appear in a Stanford. TeX was soon adopted as variety of graphics programs. journal. Members of our community our standard scientific wordprocessor, A second technological advance have therefore learned to determine and for the first time we could produce achieved during the same period was from the title and abstract (and occa- for ourselves a printed version equal or the exponential increase in computer sionally the authors) whether we wish to superior in quality to the published ver- network connectivity. By the end of read a paper as well as to verify results sion. TeX has the additional virtue of the 1980s, virtually all researchers in ourselves rather than rely on the alleged being based on ASCII, so transmitting this community were plugged into one verification of overworked or otherwise TeX files between different computer or another of the interconnected world- careless referees. The small amount of systems is straightforward. Collabora- wide networks and were using e-mail filtering provided by refereed journals tion at a distance became extraordinari- on a daily basis. Finally, the develop- plays no effective role in our research. ly efficient, since we no longer had to ment of large on-line archives of re- Taking advantage of advances in express-mail versions of a paper back search papers has been enabled by the computer software and hardware, many and forth and could instead see one an- widespread availability of low-cost, of us had already begun using highly other’s revisions essentially in real high-powered workstations with high- informal mechanisms of electronic-in- time. Figures and technical illustrations capacity storage media. After compres- formation exchange by the mid-1980s. can also be generated within a TeX-ori- sion, storing an average paper with fig- The first such advance was a program ented picture environment or, more ures requires 40 kilobytes of memory. called TeX, which was written by com- generally, can be transmitted as stan- Thus one of the current generation of Number 22 1994 Los Alamos Science 157 @xxx.lanl.gov Figure 2. User Interface for E-Print Archives The left side of the sample screen above shows two user interfaces for accessing the e-print archives. The window in the upper left corner shows abstracts received through e-mail, and the window in the lower left corner shows the graphical user interface provided by a WorldWideWeb client (in this case OmniWeb.app running under NeXTstep) accessing the frontpage http://xxx.lanl.gov/. (The underlined text signifies network hyperlinks that bring up new hypertext when clicked upon.) A paper extracted from the e-print archive appears in the window on the right side, where it can be read or sent to a printer. 158 Los Alamos Science Number 22 1994 @xxx.lanl.gov rapid-access gigabyte disk drives cost- Implementation isting e-mail distribution lists in the ing under $1,000 can hold 25,000 pa- subject of two-dimensional gravity and pers at an average cost of 4 cents per Having concluded that an electronic conformal field theory. Within six paper. Slower-access media for preprint archive was possible in princi- months the user base grew to encom- archival storage cost even less: A digi- ple, I spent a few afternoons during the pass most of the workers in formal tal audio-tape cartridge, available from summer of 1991 writing the original quantum field theory and string theory, discount electronics dealers for under software. It was designed as a fully au- and now includes the 3600 subscribers $15, can hold over 4 gigabytes, that is, tomated system in which users con- mentioned above. Its smooth operation over 100,000 such papers. The data struct, maintain, and revise a compre- has transformed it into an essential re- equivalent of multiple years of most hensive database and distribution search tool—many users have reported journals is often far less than the network without outside supervision or their dependence on receiving multiple amount many experimentalists handle intervention. The software is rudimen- “fixes” each day. The original hep-th every day. Moreover, the costs of data tary and allows users with minimal archive now receives roughly 200 new storage will only continue to decrease. computer literacy to communicate e- submissions per month, responds to Since storage is so inexpensive, an mail requests to the Internet address more than 700 e-mail requests per day, archive can be duplicated at several dis- [email protected]. Remote users and transmits more than 1000 copies of tribution points, minimizing the risk of can submit and replace papers, obtain papers on peak days. Internet e-mail loss due to accident or catastrophe and papers and listings, get help on avail- access time is typically a few seconds. facilitating worldwide network access. able commands, search the listings for The system originally ran as a back- The Internet runs 24 hours a day—with author names, and so on. ground job on a small UNIX worksta- virtually no interruptions—and transfers The formal communication provided tion (a 25-megahertz NeXTstation with data at rates of up to 45 megabits per by an “e-print archive” should be dis- a 68040 processor purchased for rough- second (that is, less than a hundredth of tinguished from the informal (and unar- ly $5,000 in 1991), which was primari- a second per paper). Projected up- chived) communication provided by ly used for other purposes by another grades of NSFnet to a few gigabits per electronic bulletin boards and network member of my research group, and second within a few years should be news. In the case of an e-print archive, placed no noticeable drain on CPU re- adequate to accommodate increased researchers are restricted to communi- sources. The system has since been usage for the academic community. cation by means of abstracts and re- moved to an HP 9000/735 that sits ex- The commercial networks that will con- search papers suitable for publication in iled on the floor under a table in a stitute the nation’s electronic data high- conventional research journals. Elec- corner. way will have even greater capacity. tronic bulletin boards are more akin to For those directly on the Internet, These technological advances—com- ordinary conversation or written corre- the system allows anonymous FTP ac- bined with a remarkable lack of re- spondence; that is, they are neither in- cess to the papers and listings directo- sponse to the electronic revolution from dexed for retrieval nor stored indefinite- ries. Now access can be gained conventional journals—rendered the de- ly. The e-print archives allow a through WorldWideWeb for those with velopment of e-print archives “an acci- submitter to replace his or her submis- the required (public-domain) client soft- dent waiting to happen.” Perhaps more sion, and the program automatically ware (see Figure 3).