
DRAFT - DO NOT DISTRIBUTE - DRAFT

Towards a Universal Representation of Time

Donald Byrd, School of Informatics and Research Technologies
Indiana University, Bloomington
Begun May 2010; last rev. early February 2014

Note: Comments on this draft are very welcome, but please do not forward without my permission. Much of the material was originally part of my design whitepaper for the General Temporal Toolkit and Workbench (GTT/W). For a brief introduction to the GTT/W, with pointers to more information, see http://www.informatics.indiana.edu/donbyrd/Spc/GTWBriefIntro.pdf . -DAB

Introduction: Why and What Is a Universal Representation of Time?

This paper is about explicitly representing time in the sense of saying when something happened or is expected to happen, or saying for how long it happened or is expected to happen. We are interested in a representation that is universal, that is, one that can satisfy the needs of any discipline or field of concern whatever: hard sciences, social sciences, humanities, creative arts, sports, or anything else. A practical, truly universal format is almost certainly impossible, but we want to come as close as we can, and such flexibility requires ways to describe:

• both absolute times (e.g., “8:46:30 AM EDT on Sept. 11, 2001”) and relative times (“45.2 sec. after this video starts”), although there’s less to the difference than meets the eye; see below for discussion.
• absolute times expressed in many, many ways, including a variety of both current and obsolete calendars.
• time intervals from something like 10^-44 seconds on the short end to perhaps a thousand trillion (10^15) years on the long end.
• both exact and approximate times.
• for approximate times, uncertainty in the very different terms used by any of several groups: scientists, humanists, librarians, creative artists, and others.
The differences in the way they talk about uncertainty are not surprising if one accepts the view that there are major differences in the ways people think about knowledge in different fields, as Pace & Middendorf (2004) argue. (Scientists may be surprised to discover that letting users specify any precision at all is still not enough flexibility for some disciplines, but, as we will see, that is certainly the case.)

Of course, we must concern ourselves with dates as well as times in the narrower sense; to avoid confusion, we will generally use the term datime instead. ??CLEAR ENOUGH?

Numerous more-or-less formal ways of describing datimes already exist. From the bibliographic community, there are AACR2R, the library-cataloging standard in English-speaking countries (??REF) and EAC/EAD(?), and development of an Extended Date Time Format (EDTF) is underway. From music/audio/multimedia, we have SMPTE time code (??REF), Standard MIDI Files ??MIDI Time Code? (??REF), and HyTime (??REF), among others. ISO 8601 (ISO, 2004) and the date/time description features of XSD (the XML Schema Definition language) (?REF) are more neutral, and they are widely used in many disciplines. (Most of these are discussed below.) However, to our knowledge, no datime representation in existence or under development even attempts to be as universal as possible.

But is there a need for such a representation? It seems clear that there is. Recent years have seen a surge of interest in interdisciplinary work. Interest in “Big History” (??REF) has led to work on ChronoZoom (??REF) blah blah. SIMILE Timeline (??REF) blah blah. The Scholars Lab at the University of Virginia is working on Neatline ??RELEVANT?. In any case, the world cannot be cleanly divided into independent disciplines; in fact, we have argued ??CITE GTW DOCS

The world is full of time-dependent phenomena, both natural and artificial.
Among the former are nuclear reactions, chemical reactions, animal movement, weather, illnesses, tumor growth, development of individuals, development of species, and development of geographic features. The latter include story telling, movies, operas, teaching algebra, football games, multimedia shows, multimedia show productions, political crises, and wars. Many phenomena might be classified either way, for example, early language learning and behavior of complex systems like computer networks and nuclear reactors. Even if occurrences of these phenomena appear independent, people are constantly interested in seeing how they might have been related in the past, or (for artificial things) how one might intentionally relate them in the future.¹

Our own interest in a universal representation of time arose from work on the General Temporal Tools and Workbench (GTT/W) project (??REF?). The aim of the project is to create a framework and toolbox for exploring, studying, and, where appropriate, editing any temporal phenomenon or combination of phenomena in any combination of visualizations, sonifications, and even “tactilizations” simultaneously. It’s most unlikely such a system can be achieved in full, but we wish to come as close as possible.

The time scale we are thinking about needs some explanation. If one considers areas that are purely creations of the mind (mythology and science fiction, for instance), any range at all might be needed, so there’s no point in taking what they might require too seriously. Otherwise, in terms of extreme values on both ends of the scale, cosmology is arguably the most demanding discipline in existence. On one hand, cosmologists talk about the “Planck epoch”, which ended one Planck time (about 5 x 10^-44 seconds) after the Big Bang; on the other, they consider the heat death of the universe, involving durations of perhaps 10^100 years. ??NEED REF. This is a range of some 150 orders of magnitude!
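The figure of roughly 150 orders of magnitude can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the Planck time as the 5 x 10^-44 seconds quoted above, assumes a Julian year of 365.25 days (the text does not say which year length is meant), and uses a hypothetical helper name, orders_of_magnitude.

```python
import math

# Assumption: a Julian year of 365.25 days; the text does not specify a year length.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Value quoted in the text ("about 5 x 10^-44 seconds").
PLANCK_TIME_SECONDS = 5e-44


def orders_of_magnitude(shortest_seconds: float, longest_years: float) -> float:
    """Return log10 of the ratio between the longest and shortest durations."""
    longest_seconds = longest_years * SECONDS_PER_YEAR
    return math.log10(longest_seconds / shortest_seconds)


# Planck time versus a heat-death time scale of 10^100 years.
heat_death_span = orders_of_magnitude(PLANCK_TIME_SECONDS, 1e100)
print(f"Planck time to 10^100 years: about {heat_death_span:.1f} orders of magnitude")
```

With these inputs the result is a little under 151, consistent with "some 150"; the choice of year length changes the answer only in the third decimal place.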
Even if you assume that there’s no need to represent more than, say, a trillion years because so little will happen after that, the range is still over 50 orders of magnitude.

While this paper focuses on the representation of temporal information, it also has something to say about a specific encoding of that information.

¹ Recent years have seen the appearance of such phenomena as Coordinated Universal Time (UTC), the worldwide time standard (which replaced Greenwich Mean Time in scientific use in 1963); USB, the Universal Serial Bus for communication between computers and peripheral devices; and Unicode, an encoding scheme intended to represent any character from any writing system or other notation. This increased interest in universality for many purposes is not a coincidence.

Terminology

To avoid misunderstanding, it is worth clarifying some terminology. By far the most important is the distinction between representation on one hand, and encoding and format on the other.

Representations, Encodings, and Formats

The terms “representation”, “encoding”, and “format” are often confused, sometimes with the result (as we will argue) that a problem looks much more difficult than it really is. Representation is a matter of what information is present. For our purposes, encoding and format are synonymous: they are concerned with something quite different, namely exactly how the information is shown or stored. Another way to put it is that representation is concerned with meaning; encoding and format, with the syntax needed to convey the meaning. An important example for us is a popular standard for datime representations, ISO 8601, which generally supports encoding the same information in several ways. For example, it allows describing the second day of the year 1999 as “1999-01-02” or just “19990102”; it calls these extended format and basic format, respectively.
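The distinction can be made concrete with a small sketch. Here the two ISO 8601 strings are decoded with Python's standard datetime.strptime; the format table ISO_DATE_FORMATS and the helper name decode are illustrative assumptions, not part of ISO 8601 or of any proposal in this paper. Decoding either encoding yields the same value, that is, the same representation.

```python
from datetime import date, datetime

# Hypothetical table: one strptime pattern per ISO 8601 calendar-date encoding.
ISO_DATE_FORMATS = {
    "extended": "%Y-%m-%d",  # e.g. 1999-01-02
    "basic": "%Y%m%d",       # e.g. 19990102
}


def decode(text: str, encoding: str) -> date:
    """Decode one encoding of a calendar date into a date value (the representation)."""
    return datetime.strptime(text, ISO_DATE_FORMATS[encoding]).date()


extended = decode("1999-01-02", "extended")
basic = decode("19990102", "basic")

# Two different strings, one piece of information.
print(extended == basic)
# Day-of-year of the decoded value (the second day of 1999).
print(extended.timetuple().tm_yday)
```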
But both of those strings, as well as the forms used in ordinary writing (“January 2, 1999”, “2 January 1999”, and, in the U.S., “1/2/1999”), represent the same information.

Now most people are interested in practical applications, which require an encoding. Note that any encoding directly implies a representation: that is, given an encoding, it is not hard to say what information the encoding can represent. The reverse is not true. Besides, it’s much easier to describe a representation in terms of an encoding than to describe one abstractly. As a result, descriptions of encodings or formats are common, but (outside of computer science) descriptions of representations are not; this is undoubtedly a major factor in the confusion.

Other Terms

A point of confusion we have already mentioned involves the words “date” and “time”. In computer systems as well as ordinary English, the word “date” is sometimes used to include time of day, and the word “time” is sometimes used to include the date. In reality, time is the simpler and more general concept, since dates necessarily involve a calendar of some kind. To avoid ambiguity, we often use the artificial term datime, as in the command some operating systems have, to include everything the words “date” and “time” together refer to.

The terms “precision” and “accuracy” are often confused with each other; they’re related to (but rarely confused with) “resolution”. The first two are actually quite different. The mathematical constant pi to 10 decimal places is 3.14159 26536; it is often approximated by 3 1/7 (that is, 22/7). Rounded to 10 decimal places, 3 1/7 is 3.14285 71429, which is much more precise than 3.14159, with only 5 decimal places. But it is considerably less accurate, i.e., it’s not nearly as close to the actual value as 3.14159 is. Precision and resolution are much more closely related concepts.
A 10-decimal-place value has a resolution, and therefore implies a precision, of .00000 00001; a 5-place value has resolution and implied precision of .00001. To avoid the confusion between precision and accuracy, we prefer to speak of resolution instead of precision. For example, the ISO 8601 representation of time expresses dates and times to the second, but it refers to “reduced precision” formats that give values only to the minute, hour, or day; we will use the term “reduced resolution” instead.
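The pi example can be worked through numerically. The following is a minimal sketch using Python's standard decimal module; the helper names resolution and error are hypothetical, and the reference value is simply pi to 10 decimal places as quoted in the text.

```python
from decimal import Decimal

# Reference value taken from the text: pi rounded to 10 decimal places.
PI_10_PLACES = Decimal("3.1415926536")


def resolution(value: Decimal) -> Decimal:
    """Smallest step expressible with the number of decimal places the value is written with."""
    return Decimal(1).scaleb(value.as_tuple().exponent)


def error(value: Decimal) -> Decimal:
    """Distance from the reference value; smaller means more accurate."""
    return abs(value - PI_10_PLACES)


# 3 1/7 (that is, 22/7) written to 10 decimal places, and pi written to 5.
three_and_a_seventh = (Decimal(22) / Decimal(7)).quantize(Decimal("1E-10"))
short_pi = Decimal("3.14159")

for label, value in [("3 1/7", three_and_a_seventh), ("3.14159", short_pi)]:
    print(f"{label}: value={value} resolution={resolution(value)} error={error(value)}")
```

The 10-place value has the finer resolution, yet its error is several hundred times larger than that of the 5-place value: finer resolution (or higher precision) says nothing by itself about accuracy.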
