A Modern Bibliography Management Tool
Total Page:16
File Type:pdf, Size:1020Kb
CrossTEX: A modern bibliography management tool Robert Burgess and Emin G¨unSirer Cornell University, Ithaca, NY http://www.cs.cornell.edu/People/egs/crosstex Abstract CrossTEX is a new bibliography management tool that eases database mainte- nance, style customization, and citation. It is based on an object-oriented data model that minimizes redundant information in bibliographic databases. It en- ables a work to be cited not only through arbitrarily-assigned object keys but through semantic information that uniquely identifies the work. It automates common tasks in order to avoid human errors and inconsistencies in bibliogra- phies, while providing users with fine-grained control. CrossTEX's other features include support for modern reference objects such as URLs and patents, direct generation of HTML documents, the ability to write styles in a modern program- ming language, and extensive databases of published works included in the distri- bution. It is backwards-compatible with existing BibTEX databases, and, overall, builds on BibTEX's strengths while fundamentally fixing the design restrictions that lead to errors in BibTEX-formatted documents. 1 Introduction cialized form of \inheritance", allowing information Bibliography management and typesetting play a to be factored out into a single other object, but critical role in publishing. Since its introduction in is not a generalized feature | it is insufficient, for example, for an author who wants to create a bibli- 1985, BibTEX has become the dominant tool for pro- ography of all of his or her own works, with a com- fessional bibliography management with LATEX. It has achieved this dominance due to several good de- mon note or URL associate with each entry. Al- though @string objects could help prevent spelling sign decisions, such as tight integration with LATEX, a human-readable format for bibliographic data- mistakes, there is no practical way around adding bases, and overall ease of use. However, two decades the fields explicitly to each entry. Furthermore, meeting publication requirements of experience with BibTEX have revealed several fundamental weaknesses that require re-evaluating in a professional setting that requires consistent ab- what features a modern bibliography management breviations, names, and formatting guidelines re- quires users to edit the database. To abbreviate system must provide. CrossTEX is such a tool, which a journal name, the user must edit a copy of the learns from the example of BibTEX and provides backwards compatibility while moving forward with database, find each occurrence of the name, and new models of bibliographic data and stylistic con- change it to the desired value. Even if the data- trol. base takes advantage of @string objects, a feature included in BibTEX to attempt to circumvent the 1.1 What's wrong with BIBTEX? restrictive relational model, the values of the strings must be changed, because information appears in a First, BibTEX interprets databases with a single- table relational model in which every bibliographic BibTEX-formatted bibliography as it appears in the entry contains all of the information that can ap- database. This fact alone prevents large, common pear in it. This redundancy is a challenge to those databases from being useful save for looking up cita- who maintain bibliographic databases because they tion information to copy-and-paste to smaller, per- must correctly look up, enter, and maintain all that document bibliographies that authors must manage. information for every reference; it is easy to enter BibTEX citations within documents are based or modify entries separately and use slightly differ- on arbitrary keys attached to each entry in the data- ent versions of the name of a journal, conference, or base. In the best case, authors establish site-wide even author. rules for creating keys from entries so they can cor- rectly guess the required key based on the publica- BibTEX's crossref field provides a simple, spe- tion information of the article to be cited. This is 342 TUGboat, Volume 28 (2007), No. 3 | Proceedings of the 2007 Annual Meeting CrossTEX: A modern bibliography management tool still quite fragile, as a single typo may cause the to do very little searching through the database for wrong work to be cited; when databases have been keys, because their concern is writing. Maintainers assembled by cutting and pasting entries from dif- should be able to manage databases without concern ferent sources, the keys are unlikely to follow a con- for document styles, because independence from in- vention, and the author must instead look up the dividual documents allows databases to be shared appropriate key for each citation. widely. BibTEX style files are written in their own, ar- CrossTEX is designed not only to enable, but to cane programming language. Few authors or data- encourage large, common databases so that authors base maintainers have the skill to create or accu- can get on with their writing while the maintain- rately modify a bibliography style to meet require- ers have clean databases to manage. As a start, it ments handed to them by publishers | they must comes distributed with many large databases of the count on an appropriate style file being provided or papers published at major computer science confer- already existing, or they must blindly change one ences, converted from the DBLP [13] project. that is \close" until it seems to produce the correct The most important aspect of CrossTEX is that typeset appearance. an object-oriented model replaces the underlying re- In addition, BibTEX does not easily permit the lational model of BibTEX. Every entry is an object, introduction of new kinds of objects. Database which contains fields and ultimately has a value it- maintainers represent newly emerging referenced self; in BibTEX, entries have fields, and @string works, such as URLs, and unsupported objects, such objects have values, but this notion is not taken to as patents, with the all-purpose @misc object whose the level of principle. One object can use the value appearance is difficult to control because it is too of another by assigning its key to a field, by exten- general. sion of the syntax for using @strings in BibTEX. In Finally, BibTEX provides very little automation addition, if the containing object leaves any fields besides formatting the references as specified in the unspecified, it will inherit them, if possible, from database. For example, it does not check capitaliza- the other objects it refers to. For example, an ob- tion in titles, ensure that an author's name appears ject representing a conference could specify not only consistently, or abbreviate journals. This can lead a value corresponding to the name of the conference, to inconsistent spelling, capitalization, or accents for but include fields such as editor or location. Any ar- authors' names or titles of papers. A modern tool ticles that appear in the conference will simply use should be able to automate the tasks of checking and that value as a book title and then automatically in- fixing these common errors by enforcing consistency herit an editor and location without specifying any- by default. thing extra. Those very familiar with BibTEX will note that this is similar to the behavior of the spe- 1.2 A modern bibliography tool cial `crossref' field, but has become a part of the way We have developed a new bibliographic manage- data is interpreted across the board rather than a ment tool, CrossTEX, that addresses these problems. special-purpose feature. It unifies features from modern programming lan- The object-oriented model enables CrossTEX guages, databases, and bibliography management databases to be concise and easily customized. For tools to solve problems authors and database main- instance, most authors refer to conferences by their tainers have struggled with for two decades, and full names in formal journal papers, but abbreviate also adds convenient new features while being com- them otherwise; in CrossTEX, objects are flexible pletely backwards-compatible to allow authors to and have both long and short values. For exam- use old BibTEX databases unmodified. ple, \OSDI" and \Symposium on Operating System Authors and maintainers have very different Design and Implementation" are two very different needs from a tool such as CrossTEX. Authors must strings, but they are both names for the same con- conform to a variety of requirements on appearance, ference. In BibTEX, the database maintainer would fields, and formatting, and so must have great flexi- have to choose just one value | but in CrossTEX, bility with their bibliographic data. Database main- both can appear in the same @conference object, tainers, on the other hand, should be able to spec- and the choice can wait until the author chooses ify everything just once, so they need look up and stylistic options for each document. correctly enter conference locations, author names, Objects can also specify fields conditionally. and book editors just once, and can find them in Some information depends on context, especially one place to update and verify. Separating their for often-reused objects such as conferences and au- jobs is also very important. Authors should need thors. If the location of a conference changes every TUGboat, Volume 28 (2007), No. 3 | Proceedings of the 2007 Annual Meeting 343 Robert Burgess and Emin G¨unSirer year, it can be specified one way for 2006, and an- bases entirely without copying and pasting, even if other for 2007. Conditional fields are inherited along they do not have permission to change the database with their conditions, allowing richly specified ob- itself. jects to adapt to the contexts where they are used. CrossTEX is structured as a drop-in replace- This enables powerful new idioms such as the ability ment for BibTEX. Simply replace invocations of to express everything about a conference over time bibtex in the typesetting process with crosstex, in a single object.