A Comparative Study of Methods for Bibliographies 290 Tugboat, Volume 32 (2011), No
Total Page:16
File Type:pdf, Size:1020Kb
TUGboat, Volume 32 (2011), No. 3 289 A comparative study of methods the features of these successive bibliography proces- for bibliographies sors. Finally, a synthesis is given in Section 3. As mentioned above, the present article only covers bib- Jean-Michel Hufflen liography processors used in conjunction with LATEX, Abstract it is complemented by [25] about bibliography proces- sors used in conjunction with ConTEXt [11], another First, we recall the successive steps of the task per- format built out of TEX. Reading this article only formed by a bibliography processor such as BibTEX. requires basic knowledge about LATEX and BibTEX. Then we sketch a brief history of the successive meth- Of course, the short descriptions we give hereafter ods and fashions of processing the bibliographies of do not aim to replace the complete documentation of (LA)T X documents. In particular, we show what is E the corresponding tools. Readers interested in typo- new in the LAT X 2 packages natbib, jurabib, and E " graphical conventions for bibliographies can consult biblatex. The problems unsolved or with difficult [4, Ch. 10] and [7, Ch. 15 & 16]. implementations are listed, and we show how other processors like Biber or Ml T X can help. Bib E 1 Tasks of a bibliography processor Keywords Bibliographies, bibliography proces- sors, bibliography styles, Tib, BibTEX, MlBibTEX, In this section, we summarise the tasks to be per- Biber, natbib package, jurabib package, biblatex pack- formed by a bibliography processor such as BibTEX. age, sorting bibliographies, updating bibliography Along the way, we give the terminology used through- database files. out this article. Then we point out the features that are still unsolved or with difficult implementations. Introduction Of course, a bibliography processor works in conjunc- tion with a text processor such as LAT X or ConT Xt, As mentioned at the beginning of “‘Bibliography E E denoted by ‘the word processor’ in the following. Generation”, the 13th chapter of The LATEX Com- panion’s Second Edition [35], the items of a printed End-users can use bibliography database files, document’s bibliography may be composed manually, containing bibliographical entries. The main role but this method is not recommended, since the result of a bibliography processor is to extract the biblio- may not be reusable within another context. Indeed, graphical references of a document from these entries. bibliography layouts are very diverse: a publisher According to BibTEX’s standard use, bibliographical may require that authors’ names are written in ex- entries (resp. references) are stored in .bib (resp. .bbl) tenso as far as possible, whereas another prefers for files. Let us notice that in some documents, there first names to be abbreviated using only initials, etc. is no ‘References’ section, but rather bibliographical So the best way to deal with bibliographies is the use references are given as footnotes wherever they are of bibliography database files, containing the whole cited. In other words, bibliographical references exist information about bibliographical items. These data- as resources and are intended to be typeset — so the base files are searched by a bibliography processor, bibliography processor must build them as process- which builds ‘References’ sections for printed or on- able by the word processor — possibly as a section line documents. or sparsely. Sometimes there are several ‘References’ sections, because each chapter of an important book As we recall below, BibTEX[38] was unrivalled for a long time as the bibliography processor used has its own bibliography, or a unique bibliography is divided into several rubrics. in conjunction with the LATEX word processor. Now the landscape is changing and other comparable pro- When several bibliographical references are to grams have come out. So this article aims to focus be grouped into a section, some bibliographies are on the directions taken by BibTEX and its possible unsorted, that is, the order of items must be the order successors. In [19], we compared the programming of first citations of these items throughout the docu- languages used to design bibliography styles, con- ment. In practice, most bibliographies are ‘sorted’, trolling bibliographies’ layout. The present article’s most often according to first the authors’ names,1 purpose is different: we are interested in the evolu- second the dates: in such a case, it is up to the bib- tion of some successive bibliography processors used liography processor to perform this sort operation. in conjunction with LATEX, this evolution still being Let us mention that the ‘standard’ sort given above in progress. First, in Section 1, we delineate the is not universal: we personally were in charge of tasks to be performed by such a bibliography proces- sor. We also explain the requirements for how such a 1 . or editors’ names, when there is no author, for ex- program should be updated. Then Section 2 sketches ample, for a conference’s complete proceedings. A comparative study of methods for bibliographies 290 TUGboat, Volume 32 (2011), No. 3 the publication list of our laboratory — the LIFC2 — universal character encoding, Unicode [43], was de- when the activity report was written according to the signed, including some formats7 — such as UTF-8 directives given by the AERES:3 we had to sort this and UTF-16 — that allow the complete set of Unicode list first by research teams, second by categories,4 characters to be represented by byte or double-byte third by years decreasingly, fourth by authors’ names sequences. A modern bibliography processor should increasingly, fifth by months decreasingly. be able to deal with all these different encodings, Each bibliographical entry is supposed to be in particular UTF-8, which is becoming more and accessible from a citation key, that is, citation keys more common. Another point related to multilin- must be non-ambiguous. Source texts written by guism may be viewed as a particular case of software end-users only contain citation keys to point to bibli- localisation. Let us give an example about person ographical resources, whereas results typeset by the names: first names are usually put before last names word processor deal with bibliographical keys. It is in most languages written with the Latin alphabet; up to the bibliography processor to build a mapping as a counter-example, that is not the case for the between citation and bibliographical keys. These Hungarian language, where last names come first; a bibliographical keys depend on the system chosen; style suitable for bibliographies of documents written as an example, they are positive natural numbers in in Hungarian should take this point into account. Re- the number-only system. Sometimes, bibliographical garding the document’s language, some information keys are built from the first letters of authors’ names, included in bibliographical entries should be included followed by the year and possibly by a letter; so does or discarded. For example, a transliteration of titles BibTEX’s alpha bibliography style. In some other of works in Russian written with the Cyrillic alpha- systems — e.g., the author-date or author-number bet may be of interest for a document in English, system — some parts of an entry can identify it obvi- but would be useless for a document in Russian. ously: cf. [4, Ch. 10] or [35, Ch. 12] for a survey about A modern bibliography processor should gen- these systems. Let us mention that the alpha bib- erate source texts for word processors that typeset liography style belongs to the number-only system, documents to be printed — as mentioned above — but rather than the author-date one, because bibliograph- should also be able to build bibliographies for online ical keys are univoque — like natural numbers — and documents. Besides, let us recall that for several atomic in the sense that you cannot divide them years, the XML8 metalanguage has become a central into an author and year parts. In other words, they formalism for data interchange in general and for actually work like natural numbers, up to an isomor- production process of documents in particular. In phism. other words, a bibliography processor should be able Given a document, rules governing the layout of to deal with languages using XML-like syntax — a bibliographical references and citations, including or- good example is XSL-FO9 — and languages for the dering bibliographical items in a ‘sorted’ ‘References’ Web, such as (X)HTML.10 section, comprise a bibliography style. What we have expressed above could have been Last but not least, a bibliography processor for A put down when BibTEX was designed and put into (L )TEX source texts should be able to deal with bib- action, in the 1980s. Since that time, some addi- liography database (.bib) files written according to tional requirements have appeared. First, the charac- BibTEX’s format, because of backward compatibil- ter encoding that was most commonly used at that ity. As proof of this program’s success, there is a time and for a long period was ASCII,5 7-bit based. huge number of such files in end-users’ directories. Later, some 8-bit extensions — such as Latin 1 or However, that may be also viewed as legacy. Any- Latin 26 — allowed some additional characters used way, if another format for bibliographical entries was in non-English languages to be included. Then a adopted, a converter from .bib files into this new format would be needed. 2 Laboratoire d’Informatique de l’université de Franche- Comté. 7 ‘UTF’ stands for ‘Unicode Transformation Format’. 3 Agence d’Évaluation de la Recherche et de l’Enseigne- 8 EXtensible Markup Language. Readers interested in an ment Supérieur, that is, ‘agency evaluating research and introductory book to this formalism can consult [41]. university courses’.