Streamlining Publishing Procedures E. van Herwijnen1 and J.C. Sens2 (1 : CERN, 2: National Institute for Nuclear and High Energy Physics, Amsterdam) Big changes have taken place in the ways physicists prepare their scientific papers and new opportunities have arisen to limit retyping and to speed communication in the phases prior to publication. The EPS Publications Committee is exploring possible methods of standardization in the light of its survey of the word processors currently in use and the merits of the SGML generalized mark-up language.

The world of science publishing ap­ sors and networks, serving some of the should not have to worry about ques­ pears to suffer from indigestion in seve­ potential authors, editors, referees and tions of presentation. ral key areas and the road from logbook publishers (q.v.). The most obvious solu­ What is therefore needed is a system to journal is long and winding. Sub­ tion i.e. abandoning all but one of the of "generalised mark-up", i.e. a system mitting a paper to an editor resembles existing systems, is obviously out of the which permits the author to indicate the staring down a long tunnel; there is question, in view of the cost and effort logical purpose of a given textual ele­ often no end in sight. that have been invested in today's sys­ ment without considering its physical Attempts at improving this situation tems. appearance. For example, instead of centre on increased automation in the The only viable solution is to accept a entering — \ bf text \ rm ·—, the submission and publication of papers truly machine independent standard author would say — \ highlight text — through the use of inter-connected that is compatible with existing com­ and leave it up to the system to decide computers. While daily communication puter hardware, soft­ how the text would be highlighted. between some scientists has improved ware, and the requirements of transmis­ Identifiers like — \ highlight — are to the point where the absorptive sion across heterogeneous networks. called "tags". Such "generic mark-up capacity of the human brain is the limit, In the traditional system it is the languages" remove the need for the little use has been made of these faci­ copy-editor who annotates manuscripts author to bother about questions of lities for reducing delays in publishing with detailed instructions to the type­ presentation, and make the document papers. Although computers are used setter as to layout, fonts, spacings, in­ less device dependent since the tag — heavily by both publishers and authors, dentation, page numbering, etc. This \ highlight — may produce boldface there is as yet no link connecting the "mark-up" is the first step in bridging text on one system and italicised text on two ends of the process. the gap between the different MS for­ another. They are the natural extension mats and that of the final publication. In of the mark-up procedures in today's Need for a Standard Format electronic processing, a "mark-up lan­ word processors. Unfortunately how­ In a zero-order world, there would be guage" may be defined as a framework ever, mark-up codes are still heavily one type of computer, functioning with into which are embedded commands dependent on the word processing one operating system, housing one which spell out the exact action to be package which is used. package of word processing software, taken when such a command is en­ A meta-language is needed, which equipped with software for handling countered. For example — \ bf — carries the notion of generic mark-up mathematics and figures, connected to would imply that the text following one step further: the author should not a single network, and available to all must be printed in bold-face type style. have to worry about which word potential authors, editors and publi­ There is effectively mark-up in every processor will ultimately deal with the shers. An author would prepare a word processor: the string — \ bf text manuscript at the time of printing the manuscript, issue a preprint for local \ rm — placed in the middle of a line of journal. This meta-language should be consumption in the one and only stan­ text tells the system to put the piece able to define a mark-up language dard format, and send the manuscript [text] in bold-face type style and to con­ which is independent of any word pro­ through the network to the editor. The tinue in roman style from there on. This cessor. editor corrects the manuscript and for­ type of mark-up is called "specific In order to "connect" such a mark-up wards it, again through the network, to mark-up". language to a word processor language referees who return it with comments. Mark-up of this type can take on another ingredient is required namely an The editor then communicates with the many forms: it may be applied at the "application program" which translates author, and after a second round, sends point at which it is required (as in the mark-up symbols into symbols that are the manuscript to the publisher, who example), by means of a special charac­ recognized by the language of a particu­ stores the article in his database, gene­ ter at the beginning of a line (such as a lar word processor. Fig. 1 indicates the rates abstracts, arranges the page dot "), by means of a "style sheet" at relationship between a candidate for a layout and prints the journal, without the beginning of a paragraph, or by meta-language, SGML, and the diffe­ the need for rekeying the text. means of macro's at the top of the rent word processor mark-up langua­ In the real world, there is a bewilder­ manuscript, or by any combination of ges, as well as the role of applica­ ing variety of computers, word proces- the above. All these forms are restricted tion programs. The traffic is one-way: to a single word processor language. having generated a marked-up manu­ Their development has revealed an in­ script an author can "reach" one or more word processor languages by 1) Although there is a difference between creased need for a clear separation bet­ word- and text processing systems, we shall ween the content and the form of a invoking the corresponding application use one term to indicate both. manuscript: while writing, the author programs. 171 In the case of publishing a book, other programs could handle the generation of an index, etc. SGML A meta-language capable of describ­ ing any (software and hardware inde­ pendent) generalised mark-up language is the "Standard Generalized Mark-up Language (SGML)" developed by the International Standards Organisation (ISO). This system does not itself describe the structure of a document, but provides the framework for a mark­ up language that does. SGML has been an international stan­ dard for document interchange since December 1986 (ISO 8879) and it has attracted considerable attention from software developers (it is the second most sold ISO standard). The under­ lying assumption is that the text con­ sists of logical components called "ele­ ments"; SGML then specifies the Fig. 1 — SGML and its applications. manner in which the elements of a document should be indicated in the A scenario for using mark-up in the processor language to obtain a printed text. Marking-up is done via the "tags" publication of scientific papers is then version, while a fourth generates came­ mentioned above. A "Document Type the following. The author prepares his ra-ready copy appropriate to the journal. Definition (DTD)" defines the structure manuscript to his satisfaction on his own word processor and once he has Fig. 2 — Example of a physics article marked-up with SGML. finished the text, he modifies it in the context of a universally accepted gene­ ralized mark-up language. Fig. 2 shows an example of a tagged manuscript. It is

then processed and forwarded, either Experimental obeservation of lepton pairs of invariant by mail on a diskette or by transmission mass around 95 GeV/c² at the CFRN SPS collider through a network, to the editor of the </TITI.E> journal in question. Preprints for limited <COLLAB>UA1 Collaboration, CERN, Geneva, Switzerland distribution in advance of publication <DATE>3 june 1983 are obtained by invoking the application <AUTHLIST> software of the word processor in use in <AUTHOR ID=RAL>G. Arnison the author's Institute or University. The editor's computer reads the ma­ <AUTHOR ID=CERN>C. Rubbia nuscript and uses the application soft­ ware relevant to his processor, to obtain <ADDRESS ID=RAL>Rutherford Appleton Laborarory a readable, i.e. a tag-free, copy. He may then add his comments to the marked- <ADDRESS ID=CERN>CERN up version and send the revised manu­ </AUTHLIST> script on to one or several referees, who <ABSTRACT>Abstract use their own application programs to <P>We report the observation of four electron--positron pairs obtain readable copies for their own and one muon pair which have the signature of a twobody use. Their comments are added to the decay of a particle of mass ≈ 95 GeV/ctsup2;. These original marked-up manuscript (or a events fit well the hypothesis that they are produced by the process &pbar; + p &rarrow; Z&supO; + X (with copy), and returned via the editor to the Z&supO; &rarrow; &cell;&sup+; + ℓ&sup-;), where Z&sup0; is author if revisions are required. the Intermediate Vector Boson postulated by the electroweak Once the editing process is complete, theories as the mediator of weak neutral currents. the final marked-up manuscript is sent </ABSTRACT> to the publisher. The publisher uses the </TITLEP> tree structure of Fig. 1 for more than <BODY> one purpose. One application program <H1 >Introduction leads to an "archive" where the <P>We have recently reported the observation of large manuscript is classified and stored in a invariant mass electron--neutrino pairs... permanent fashion. Another copies the </BODY> abstract of the paper, if any, on to a file </ARTICLE> containing abstracts. A third translates the manuscript into a locally used word 172 for a given "class" of documents; A second drawback is that there are scientific articles could be defined to be only a small number of WYSIWYG such a class (see Fig. 1), (What You See is What You Get) word The DTD contains inter alia the fol­ processors that understand SGML, and lowing information: those that exist require a fair amount of - The names and definitions of the programming to set up an application. tags (and the end-tags) that may A third drawback is that authors take Mathematica ™ occur in the document. pride in creating aesthetically pleasing A System for Doing Mathematics by - The order, permissible frequency of documents; imposing a structure on a Computer appearance, and the nesting of the document by using a generalised mark­ A Wolfram Research Inc. product tags. up scheme defined by SGML is some­ □ Numerics - Works with numbers - The attributes of the tags (e.g. < p id times seen as a restriction on their crea­ of arbitrary magnitude and preci­ = A > permitting a paragraph to be tivity. sion. referred to elsewhere in the manu­ Some well known SGML systems are □ Symbolics - Encyclopaedia of script as < refpar refid = A >). Datalogics "Writerstation" (for IBM mathematical functions and opera­ - The reference symbols used in the compatible PCs); SoftQuad "Author/ tions used in arithmetic, algebra text. Editor" (for the Macintosh); the IBM and analysis. In order to let the DTD interact with SGML/DCF (Document Composition Procedural, functional and mathe­ the document some sort of framework Facility) translator products for VM/ matical programming. that will hold both the document and CMS and the Sobemap SGML parser for □ Graphics - 2D, 3D and animated the DTD is needed. This framework is VAX VMS and for other systems. PostScript graphics. called the SGML input system. It first □ Text processing - Fully interac­ task is to submit the DTD to a compiler Current Use of Processors: a Survey tive reports and textbooks. called the parser. The next task is to During 1988, the EPS Publications □ Runs on - MS-DOS based com­ have the compiled DTD check that the Committee formulated a plan of action puters; Macintosh, Apollo, Hew­ lett Packard, IBM AIX/RT, MIPS, document, as marked-up by the author, comprising the following: Silicon Graphics, Sony, Sun, VAX. contains no mistakes (e.g. no end-tag 1. Conduct a survey on the use of word where there should have been one). The processors for the preparation of Now available in Europe from: final tasks are to submit the document physics papers. MathSoft Overseas, Inc. to an application program which trans­ 2. Initiate a pilot project, to be carried POB 641, 1211 Geneva 3, Switzerland lates the tags etc. into a form suitable out by experts in SGML and develop Tel. ++41 (22) 46 52 60 for processing by a word processor an SGML input system, a DTD, a Fax ++41 (22) 46 59 39 such as TEX, <a href="/tags/Microsoft_Word/" rel="tag">Microsoft Word</a>, Script, parser, and application programs. etc. and to transport the document to Work on point 2 is in progress and will EPS, Academia (Czechoslovakian Acad, a file transfer utility for transmission not be discussed here. of Sci.), Edition de Physique, the Insti­ across a network. For the survey, questionnaires were tute of Physics Publishing Ltd., North- The principal limitation of SGML as distributed to publishers, who mailed Holland, Nuovo Cimento, the Royal sketched above is its user-unfriendli­ them to authors eg. along with proofs Swedish Academy of Sciences and ness in coding the manuscript with of papers. The questions concerned the Springer Verlag. tags. Replacing the tags (or some of types of computer, word processor Of these, 1089 or 39%, were retur­ them) by any context sensitive proper­ packages, the processing of mathema­ ned out of which 932 authors, or 85%, ties of the text (e.g. defining a blank line tics, storage media and access to net­ used a word processor in preparing their followed by a line indented by 3 spaces works. manuscript. as the equivalent of <p>, a new para- An estimated 2800 questionnaires The replies were subsequently divi­ grah) would violate the principle of were sent out to authors worldwide ded into those of users of large, multi­ separation of structure and appearance, who had submitted papers to a range of user systems and those of small, single- which is at the very basis of the system. European physics journals published by user systems, personal computers or work stations. Version numbers were ignored. Of the 932 authors using a word processor, 246, or 26%, used a large system, and 686, or 74% used a small system. This confirms the trend towards distributed computing and the suitability of small systems for word processing. For each of these two categories the sample was further sorted according to the type of system, the type of word processor, and the capability for hand­ ling mathematics. Fig. 3 shows the fraction of cases where the types of computer and word processor were known to the author ("unknowns" comprise the categories no math : 26 math. : 220 no math.: 157 math. : 529 "not mentioned", work done by secre­ Fig. 3 — Summary of the 1089 replies analysed, 39% of the total sent out. taries, etc.). 173 Fig. 4 — (Below) Large systems used by physicists for document SHARP RENFUL production. ERICSON VAX APOLLO TANDY Fig. 5 — (Right) Small systems. SINCLAIR SPERRY DONATEC ABK2 SIRIUS ADREX NCR CDC VICTOR TANDON DATA GENERAL CPT HITACHI AMSTRAD IMAX EPSON NBI APRICOT •NBI SIEMENS DECMATE FACOM OFUS APPLE P D P11 SUN GEC HP VANG XEROX GOULD BBC ZENITH MICROVAX OLIVETTI AES NEC NORSK DAT ATARI OTHER UNKNOWN MACINTOSH IBM UNKNOWN VAX IBM</p><p>Figs. 4, 5, 6 and 7 show, respectively authors used it. The next largest cate­ Concluding Remarks the large systems, the small systems, gory was IBM, with 20% of all users. The high cost of journals and the long the word processors on the large sys­ Among the small systems IBM domi­ delays in the appearance of papers in tems and the word processors on the nate with 30% of all users, while 16% journals on the one hand, and the ready small systems. used Macintosh. If one adds IBM com­ availability of word processing soft­ For the large systems, a total of 18 patible machines to the category IBM, ware, installed in desk top work stations different types were reported; of these this group increases by a further 10%. on the other, are pointing the way eight were reported by one author each. The number of these systems connec­ towards new, speedier ways of commu­ For the small systems, there are a total <a href="/tags/Ted_(word_processor)/" rel="tag">ted</a> to networks (via mainframes) is nicating scientific results among scien­ of 77 different types of computers, of unknown. tists. which 34 were reported by one author Among the word processors TEX do­ The major challenge lies in defining a each. minated on both the large and the small system that can act as an umbrella over systems followed closely by Wordstar the system currently in use. The Stan­ In the large systems 20 different on the small systems. In these totals, no dard Generalized Mark-up Language is a word processors were reported. Of distinction has been made between the contender for this position. It has been these, 10 were reported by one author various TEX macro packages (LaTEX adopted by organizations as diverse as each, and 15 have mathematics. In the PHYZZX, AMS-TEX etc.). Although TEX the Commission of the European Com­ small systems, 85 different word pro­ was used most frequently it still ac­ munities, the U.S. Department of De­ cessors were reported. Of these, 46 counts for only 22% of the users. fence, and Oxford University Press. The were reported by one author each, and Mathematics appears to be a general authors believe that the community of 71 have mathematics. requirement: more than 80% of the physicists working in physics institutes Among the large systems, VAX was users handled mathematical formulae and Universities should also adopt this the predominant computer: 46% of all with word processors. standard.</p><p>Fig. 6 — (Below) Text processing systems on large computers. The PROFITEXT LKPC PLUS category "other" refers to systems which were cited in one reply FRAMEWORK only. WPS VOULKSWRITER Fig. 7 — (Right) Text processing systems on small computers. The SAHNA CPT category "other" refers to systems which were cited in one reply WORDPLUS WORDMRILL only. SCICNCETEX HUE ΤΙΜΑΤΕ DSR EXP EM AX VIEY DECTEXT APPLEWDRKS SCRIBE 1ST WORD OPEN ACCESS WRITENOW SELFMADE NBI HIT MANUSCRIPT CTOS WORDMARC SPF DISPLAWRITE NBI SIGNUM WORD II MATHOR ATF VUWRITER WPS T3 WANG WP WORDPERFECT MASS II OTHER AES MSWORD NOT IS WP MACWRITE SCRIPT CHIWRITER RUNOFF UKWOWN UNKNOWN WORDSTAR TEX TEX</p><p>174</p> </div> </article> </div> </div> </div> <script type="text/javascript" async crossorigin="anonymous" src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8519364510543070"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.1/jquery.min.js" crossorigin="anonymous" referrerpolicy="no-referrer"></script> <script> var docId = '7230ce437985348265799effc15b6813'; var endPage = 1; var totalPage = 4; var pfLoading = false; window.addEventListener('scroll', function () { if (pfLoading) return; var $now = $('.article-imgview .pf').eq(endPage - 1); if (document.documentElement.scrollTop + $(window).height() > $now.offset().top) { pfLoading = true; endPage++; if (endPage > totalPage) return; var imgEle = new Image(); var imgsrc = "//data.docslib.org/img/7230ce437985348265799effc15b6813-" + endPage + (endPage > 3 ? ".jpg" : ".webp"); imgEle.src = imgsrc; var $imgLoad = $('<div class="pf" id="pf' + endPage + '"><img src="/loading.gif"></div>'); $('.article-imgview').append($imgLoad); imgEle.addEventListener('load', function () { $imgLoad.find('img').attr('src', imgsrc); pfLoading = false }); if (endPage < 7) { adcall('pf' + endPage); } } }, { passive: true }); </script> <script> var sc_project = 11552861; var sc_invisible = 1; var sc_security = "b956b151"; </script> <script src="https://www.statcounter.com/counter/counter.js" async></script> </html>