
DRAFT - DO NOT DISTRIBUTE - DRAFT

Towards a Universal Representation of Time

Donald Byrd, School of Informatics and Research Technologies, Indiana University, Bloomington

Begun May 2010; last rev. early February 2014

Note: Comments on this draft are very welcome, but please do not forward without my permission. Much of the material was originally part of my design whitepaper for the General Temporal Toolkit and Workbench (GTT/W). For a brief introduction to the GTT/W, with pointers to more information, see http://www.informatics.indiana.edu/donbyrd/Spc/GTWBriefIntro.pdf . —DAB

Introduction: Why and What Is a Universal Representation of Time?

This paper is about explicitly representing time in the sense of saying when something happened or is expected to happen, or saying for how long it happened or is expected to happen. We are interested in a representation that is universal, that is, one that can satisfy the needs of any discipline or field of concern whatever: hard sciences, social sciences, humanities, creative arts, sports, or anything else. A practical, truly universal format is almost certainly impossible, but we want to come as close as we can, and such flexibility requires ways to describe:

• both absolute times (e.g., “8:46:30 AM EST on Sept. 11, 2001”) and relative times (e.g., “45.2 sec. after this video starts”), although there’s less to the difference than meets the eye; see below for discussion.
• absolute times expressed in many, many ways, including a variety of both current and obsolete calendars.
• time intervals from something like 10^-44 seconds on the short end to perhaps a thousand trillion (10^15) years on the long end.
• both exact and approximate times.
• for approximate times, uncertainty in the very different terms used by any of several groups: scientists, humanists, librarians, creative artists, and others. The differences in the way they talk about uncertainty are not surprising if one accepts the view that there are major differences in the ways people think about knowledge in different fields, as Pace & Middendorf (2004) argue. (Scientists may be surprised to discover that letting users specify any precision at all is still not enough flexibility for some disciplines, but, as we will see, that is certainly the case.)

Of course, we must concern ourselves with dates as well as times in the narrower sense; to avoid confusion, we will generally use the term datime instead. ??CLEAR ENOUGH?

Numerous more-or-less formal ways of describing datimes already exist.
From the bibliographic community, there are AACR2R, the library-cataloging standard in English-speaking countries (??REF), and EAC/EAD(?), and development of an Extended Date/Time Format (EDTF) is underway. From music/audio/multimedia, we have SMPTE time code (??REF), Standard MIDI Files ??MIDI Time Code? (??REF), and HyTime (??REF), among others. ISO 8601 (ISO, 2004) and the date/time description features of XSD (the XML Schema Definition language) (?REF) are more neutral, and they are widely used in many disciplines. (Most of these are discussed below.) However, to our knowledge, no datime representation in existence or under development even attempts to be as universal as possible.

But is there a need for such a representation? It seems clear that there is. Recent years have seen a surge of interest in interdisciplinary work. Interest in “Big History” (??REF) has led to work on ChronoZoom (??REF) blah blah. SIMILE (??REF) blah blah. The Scholars Lab at the University of Virginia is working on Neatline ??RELEVANT?. In any case, the world cannot be cleanly divided into independent disciplines; in fact, we have argued ??CITE GTW DOCS

The world is full of time-dependent phenomena, both natural and artificial. Among the former are nuclear reactions, chemical reactions, animal movement, weather, illnesses, tumor growth, development of individuals, development of species, and development of geographic features. The latter include story telling, movies, operas, teaching algebra, football games, multimedia shows, multimedia show productions, political crises, and wars. Many phenomena might be classified either way, for example, early language learning and behavior of complex systems like computer networks and nuclear reactors. Even if occurrences of these phenomena appear independent, people are constantly interested in seeing how they might have been related in the past, or (for artificial things) how one might intentionally relate them in the future.[1]

Our own interest in a universal representation of time arose from work on the General Temporal Tools and Workbench (GTT/W) project (??REF?). The aim of the project is to create a framework and toolbox for exploring, studying, and, where appropriate, editing any temporal phenomenon or combination of phenomena in any combination of visualizations, sonifications, and even “tactilizations” simultaneously. It’s most unlikely such a system can be achieved in full, but we wish to come as close as possible.

The time scale we are thinking about needs some explanation. If one considers areas that are purely creations of the mind (mythology and fiction, for instance), any range at all might be needed, so there’s no point to taking what they might require too seriously. Otherwise, in terms of extreme values on both ends of the scale, cosmology is arguably the most demanding discipline in existence. On one hand, cosmologists talk about the “Planck era”, which ended one Planck time (about 5 x 10^-44 seconds) after the Big Bang; on the other, they consider the heat death of the universe, involving durations of perhaps 10^100 years. ??NEED REF. This is a range of some 150 orders of magnitude! Even if you assume that there’s no need to represent more than, say, a trillion years because so little will happen after that, the range is still over 50 orders of magnitude.

While this paper focuses on the representation of temporal information, it also has something to say about a specific encoding of that information.

Terminology

To avoid misunderstanding, it is worth clarifying some terminology. By far the most important is the distinction between representation on one hand, and encoding and format on the other.

Representations, Encodings, and Formats

The terms “representation”, “encoding”, and “format” are often confused, sometimes with the result (as we will argue) that a problem looks much more difficult than it really is. Representation is a matter of what information is conveyed. For our purposes, encoding and format are synonymous: they are concerned with something quite different, namely exactly how the information is shown or stored. Another way to put it is that representation is concerned with meaning; encoding and format, with the syntax needed to convey the meaning.

[1] Recent years have seen the appearance of such phenomena as Coordinated Universal Time (UTC), the worldwide time standard (which replaced Greenwich Mean Time in scientific use in 1963); USB, the Universal Serial Bus for communication between computers and peripheral devices; and Unicode, an encoding scheme intended to represent any character from any writing system or other notation. This increased interest in universality for many purposes is not a coincidence.

An important example for us is a popular standard for datime representations, ISO 8601, which generally supports encoding the same information in several ways. For example, it allows describing the second of January 1999 as “1999-01-02” or just “19990102”; it calls these extended format and basic format, respectively. But both of those strings, as well as the forms used in ordinary writing (“January 2, 1999”, “2 January 1999”, and, in the U.S., “1/2/1999”), represent the same information.

Now most people are interested in practical applications, which require an encoding. Note that any encoding directly implies a representation: that is, given an encoding, it is not hard to say what information the encoding can represent. The reverse is not true: a representation does not determine any one encoding. Besides, it’s much easier to describe a representation in terms of an encoding than to describe one abstractly. As a result, descriptions of encodings or formats are common, but (outside of computer science) descriptions of representations are not; this is undoubtedly a major factor in the confusion.
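The distinction between encoding and representation can be made concrete with a small sketch. It assumes Python's standard datetime module and an English locale for month names; the helper name parse_any and the list of patterns are ours, for illustration only, and are not part of ISO 8601. Four of the encodings quoted above decode to one and the same value:

```python
from datetime import date, datetime

# Encodings of the same date, each paired with a strptime pattern.
# The list is illustrative, not exhaustive.
ENCODINGS = [
    ("1999-01-02", "%Y-%m-%d"),        # ISO 8601 extended format
    ("19990102", "%Y%m%d"),            # ISO 8601 basic format
    ("January 2, 1999", "%B %d, %Y"),  # ordinary U.S. prose
    ("2 January 1999", "%d %B %Y"),    # ordinary prose, day first
]


def parse_any(text: str, pattern: str) -> date:
    """Decode one encoding into the shared representation (a date)."""
    return datetime.strptime(text, pattern).date()


# A set keeps only distinct values, so it shows how many different
# pieces of information the four strings actually carry.
decoded = {parse_any(text, pattern) for text, pattern in ENCODINGS}
```

The set ends up with a single element: the strings differ only in syntax. The U.S. form “1/2/1999” is left out on purpose, since reading it correctly depends on knowing the writer's convention, which is itself a point about encodings.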

Other Terms

A point of confusion we have already mentioned involves the words “date” and “time”. In computer systems as well as ordinary English, the word “date” is sometimes used to include time of day, and the word “time” is sometimes used to include the date. In reality, time is the simpler and more general concept, since dates necessarily involve a calendar of some kind. To avoid ambiguity, we often use the artificial term datime as in the command some operating systems have, to include everything the words “date” and “time” together refer to.

The terms “precision” and “accuracy” are often confused with each other; they’re related to (but rarely confused with) “resolution”. The first two are actually quite different. The mathematical constant pi to 10 decimal places is 3.14159 26536; it is often approximated by 3 1/7 (i.e., 22/7). Rounded to 10 decimal places, 3 1/7 is 3.14285 71429, which is much more precise than 3.14159, with only 5 decimal places. But it is considerably less accurate, i.e., it’s not nearly as close to the actual value as 3.14159 is. Precision and resolution are much more closely related concepts. A 10-decimal-place value has a resolution, and therefore implies a precision, of .00000 00001; a 5-place value has resolution and implied precision of .00001. To avoid the confusion between precision and accuracy, we prefer to speak of resolution instead of precision. For example, the ISO 8601 representation of time expresses dates and times to the second, but it refers to “reduced precision” formats that give values only to the year, month, or day; we will use the term “reduced resolution” instead.

We say discipline for any field or subject that might be of interest to anyone, even one as trivial as, say, the game of tic tac toe. “Field” is too often used for other things: for example, an element of data in a database. “Subject” might serve as well as “discipline”.

Unfortunately, the term epoch has two important senses. In some disciplines (e.g., cosmology and geology) an epoch is a specific period of time. But in chronology and astronomy, an epoch is an instant in time chosen as the origin for a particular period of time: say, midnight on January 1, 1 CE in the Gregorian calendar (the “Rata Die” epoch), or J2000.0, i.e., January 1, 2000, 11:58:55.816 UTC (the current epoch in astronomy). The epoch serves as a reference point from which time is measured within that period. The phrase epoch date is sometimes used for this sense, and we adopt that terminology, reserving the word “epoch” by itself for the other sense.

Finally, we use the abbreviations “BCE” (Before the Common Era) and “CE” (Common Era) rather than the more common “BC” and “AD” for two reasons: to avoid any suggestion of Christian orientation, which is simply irrelevant to this work, and to avoid the apparent paradox of saying that Christ was born (as most scholars now believe) between 7 BC and 2 BC, that is, several years before his birth.
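The arithmetic behind the pi example in the discussion of precision and accuracy can be checked directly. This is only a sketch using Python's standard library; the variable names are ours.

```python
import math
from fractions import Fraction

# 3 1/7 written to 10 decimal places: high resolution, poor accuracy.
approx_long = round(float(Fraction(22, 7)), 10)

# pi written to 5 decimal places: lower resolution, better accuracy.
approx_short = 3.14159

# Accuracy is the distance from the true value, whatever the number
# of digits written down.
error_long = abs(approx_long - math.pi)
error_short = abs(approx_short - math.pi)
```

Here error_long is several hundred times larger than error_short, even though approx_long carries twice as many decimal places.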

Related Work

All of the work discussed here is intended for practical use, which (as we have said) requires an encoding and not just a representation. We are aware of related work oriented towards libraries and archives; for music and audio; for audio, video, and multimedia systems; and for general use. These projects are very diverse, but even the most general is quite limited in scope compared to what we have in mind. Here are some projects, protocols, and systems.

We have already mentioned ISO 8601, the ISO standard for datime representations (ISO, 2004). ISO 8601 can represent time points with precision of a year, month, day, hour, minute, or second, or of a decimal fraction of an hour, minute, or second with any number of decimal places. These can be modified by time zones (as time relative to UTC). For example, “2001-09-11T08:46:30-05:00” describes the moment American Airlines Flight 11 hit the North Tower of the World Trade Center in New York time (five hours behind UTC), with implied precision of one second. ISO 8601 can also express time intervals. “1985-04-12T23:20:50/1985-06-25T10:30:00” represents a time interval beginning at 20 minutes and 50 seconds past 23 hours on 12 April 1985 local time and ending at 30 minutes past 10 hours on 25 June 1985 local time. It can also express recurring time intervals.

This is a fair amount of flexibility, and ISO 8601 is widely used; but in some ways it is still quite limited. Most importantly, other than the precision levels just mentioned, there is no way to describe approximate, unspecified, or uncertain values; this is a severe limitation for many bibliographic, social-science, and humanities applications. It cannot describe multiple dates (e.g., to say that something happened in either 1775 or 1778) or seasons. And, while it allows years of more than four digits, it requires users to agree among themselves on their exact format.
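As a minimal sketch of how the start/end interval form quoted above can be decoded, the following uses only Python's standard datetime module. The function name parse_interval is ours. It handles only the simple start/end form with complete date and time on both sides, not durations, recurring intervals, or reduced-precision values.

```python
from datetime import datetime, timedelta
from typing import Tuple


def parse_interval(text: str) -> Tuple[datetime, datetime]:
    """Split an ISO 8601 start/end interval and parse both time points."""
    start_text, end_text = text.split("/")
    return datetime.fromisoformat(start_text), datetime.fromisoformat(end_text)


start, end = parse_interval("1985-04-12T23:20:50/1985-06-25T10:30:00")

# The interval's length follows from its two end points.
duration = end - start
```

Both time points here are local times with no UTC offset, as in the original example, so the computed duration is only meaningful if both refer to the same local clock.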
EDTF, the “Extended Date/Time Format”, is perhaps the most important step towards a very broadly applicable representation of time, though it currently exists only in draft form. The EDTF Web page (Library of Congress, 2013) says “This website describes the current effort to develop a reasonably comprehensive date/time definition for the bibliographic community, as well as other interested communities, and submitting it for standardization or some other mode of formalization…” EDTF is itself a community project, led by the Library of Congress. EDTF is intended to “meet the needs of various well-known XML metadata schemas, for example MODS, METS, PREMIS, etc.” Thus, it is (unfortunately) not intended to replace ISO 8601; the EDTF coordinator(?) at LoC has said ??REWRITE!

Interestingly, the current draft of EDTF has three “levels” of implementation. Level 0, the least powerful, is a profile of ISO 8601, but Level 2, its most general level, is indeed far more expressive than 8601. Among the extensions in the EDTF draft proposal are ways to describe approximate and unspecified datimes, multiple datimes, and seasons. It can also represent time intervals. However, the current EDTF, even at Level 2, is significantly less expressive than ISO 8601 in one way (its finest resolution is one second), and it still has many other limitations. Its Annex A discusses features that “may be considered in a future version”, including:

• reliability (uncertainty);
• precision (resolution);
• volatile/dynamic/contextual dates;
• season qualifiers;
• alternative definitions of year, month, day (e.g., sidereal vs. solar vs. calendar years);
• holidays with no fixed (or unspecified) date;
• named periods.

This is a good start on a list of difficult issues, though supporting much greater precision should not be difficult; see below.

AACR2R is the Anglo-American Cataloguing Rules, 2nd ed. revised (Joint Steering Committee for Revision of AACR, 2002).
It is used by catalogers in libraries in the English-speaking world. blah blah blah ??NOW BEING REPLACED?? Sec. 22.17A of AACR2R, “rules and approximation methods for dates”, includes some interesting examples (Table 1).

Form                Meaning
1924-               living person
1900 Jan. 10-       living person, precise birth date is known
1837-1896           both years known
1836 or 7-1896      year of birth uncertain, known to be one of two years
1837?-1896          probable year of birth
ca. 1837-1896       year of birth uncertain by several years
1837-ca. 1896       approximate year of death
ca. 1837-ca. 1896   both years approximate
b. 1826             year of death unknown
d. 1859             year of birth unknown
fl. 1893-1940       years of birth and death unknown; some years of activity known; do not use fl. dates within the twentieth century
12th cent.          years of birth and death unknown, years of activity unknown, century known
12th/14th cent.     years of birth and death unknown, years of activity unknown, but active in both centuries; do not use for the twentieth century
chin shih 1152      date at which a Chinese literary degree was conferred

Table 1. AACR2R Approximation Methods ??CHECK!

MARC 21 is also a product of the library world, specifically from the Library of Congress. blah blah blah EAC-CPF (EACWG 2010) blah blah blah ??

Many date/time representations explicitly include both points in time and intervals of time. Of course an interval can be described by two points, its start and end. But time intervals are extremely common, and it’s obvious that the phrase “18th century” generally does not mean quite the same thing as “the interval from a moment after midnight on January 1, 1700 through midnight on December 31, 1799”: imagine, for example, each description appearing in ?? . EAC-CPF (EACWG 2010) goes further by including a <chronList> element. ??QUOTE ITS EXAMPLE?

Standards for multimedia systems include HyTime, SMIL, MPEG-7, QuickTime, SMPTE Time Code, blah blah blah

HyTime (ISO/IEC 10744:1992) is in a class by itself. It was a spinoff from the group working on the ambitious music-encoding language SMDL (the “Standard Music Description Language”, ISO/IEC CD 10743); the latter never became an official standard, nor has it seen much use in music programs, though it’s been very influential. HyTime’s representation of time is exceptionally flexible and powerful, but mostly in ways irrelevant to our purposes. ??MORE?

Scope

This paper focuses primarily on representation; the encoding issues are relatively minor. But we are very interested in practical applications, and it’s hard to tell if an elegant but very general representation like the one described here is practical without considering encoding, so we ?? SEPARATE STUFF ON ENCODING, ESP. NUMBER OF BITS NEEDED!

For the reason discussed above, we include representations of both points in time and intervals of time. Some representations (for example, EAC-CPF (EACWG 2010)) go further and include lists of time points and/or intervals, but of course they need a way to describe those points and intervals, and we restrict our attention to the latter question.

Concepts of time that are not strictly linear are fascinating; they are also important for some purposes. For example, much notated music includes directions that sections are to be repeated, directions that performers often ignore. Similarly, recordings of songs and hymns may include different numbers of verses. And cyclical phenomena are widespread. Usage of time in fiction, including mythology, also introduces complications, for example extending durations to arbitrary lengths, and the non-linearity of time travel and other features of science fiction.[2] A number of existing formats, for example ISO 8601, handle simpler cases of non-linear time. But non-linear time in general complicates matters tremendously, and it seems to us that its complexities are independent of the complexity of representing linear time relations. Therefore, we will not consider it further here. ??INCLUDE AT LEAST WHAT 8601 HANDLES!

Relative vs. Absolute Time

We want to represent both absolute times (e.g., “8:46:30 AM EST on Sept. 11, 2001”: that’s the time American Airlines Flight 11 hit the North Tower of the World Trade Center) and relative times (e.g., “45.2 sec. after this video starts”), although there’s far less to the difference than meets the eye. Absolute times can always be viewed as times relative to the occurrence of a well-known event, what astronomers and chronologists call an epoch or epoch date (see below). And some times described in relative terms are effectively as absolute as any time can be (e.g., “45.2 sec. before American Airlines Flight 11 hit the North Tower of the World Trade Center”).

Absolute times are much more difficult in several ways. ??IS THAT REALLY TRUE? For example, it appears that some phenomena (the moment of a person’s birth, the time a certain nuclear reaction went critical) inherently occur at more-or-less well-defined points in time; others (the beginning of the Renaissance, for example) do not. But the phrase “more-or-less” is key: at sufficiently high resolution, nothing occurs at a precisely known time. How meaningful would it have been to record Barack Obama’s birth time to the nanosecond, even if a precisely calibrated atomic clock had been available in the room where he was born? ??NOT CLEAR WHERE THIS PARAGRAPH BELONGS
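The claim that an absolute time is just a time relative to some epoch date can be illustrated with a short sketch. It uses Python's standard datetime module and takes the text's “EST” at face value as UTC-05:00. The two epoch dates chosen (the Unix epoch and J2000.0) and the function name seconds_since are our own choices for illustration, not part of any representation proposed here.

```python
from datetime import datetime, timedelta, timezone

# Two epoch dates; nothing privileges one over the other.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
J2000 = datetime(2000, 1, 1, 11, 58, 55, 816000, tzinfo=timezone.utc)

# The "absolute" time from the text, with "EST" read as UTC-05:00.
impact = datetime(2001, 9, 11, 8, 46, 30,
                  tzinfo=timezone(timedelta(hours=-5)))


def seconds_since(epoch: datetime, moment: datetime) -> float:
    """Express a moment as a signed offset, in seconds, from an epoch date."""
    return (moment - epoch).total_seconds()


since_unix = seconds_since(UNIX_EPOCH, impact)
since_j2000 = seconds_since(J2000, impact)

# A "relative" time anchored to a well-defined event is just as absolute.
relative_event = impact - timedelta(seconds=45.2)
```

The two offsets differ only by the fixed distance between the two epoch dates, so changing the epoch date changes the numbers but not the information.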

[2] One of a great many examples in written literature is Robert Heinlein’s tour de force short story “All You Zombies”, in which the narrator turns out, via time travel and a sex-change operation, to be his own mother and father; of course this leads to an infinite loop. In movies, a few examples are Rashomon (the same time interval and “events” are shown from several people’s very different perspectives); Groundhog Day (the main character experiences a loop in time covering a single day, with each occurrence starting the same but diverging at some point); and Run Lola Run (a character imagines in some detail several versions of upcoming events, depending on her own choices of a course of action).

Supporting Impossible vs. All Possible Time Relationships

Even disregarding uncertainty and limited resolution, a representation of time as general as we consider here is capable of expressing a huge range of essentially meaningless things. Relating to millisecond accuracy the time points for events in the physical universe years apart is rarely possible. If the time difference is thousands or millions of years and the required accuracy is a nanosecond, it is never possible. Though his title emphasizes calendars, the book by Steel (2000) covers many aspects of timekeeping. He has appendices entitled “How Long is a Year?”, “How Long is a Month?”, “How Long is a Day?”, and “How Long is a Second?”; each of the four brings up multiple non-trivial questions. The discussion below covers only a small subset of the issues, but it should give some idea of their complexity.

However, we hope it is obvious that a representation that supports nonsensical relationships may still be useful; in fact, if one wants a representation that can be used in the greatest variety of disciplines, it simply must be as flexible as possible. Then users can decide for themselves if a given application is meaningful.

The Range of Time Durations

To help the reader think about time ranges and resolutions, here are some examples covering a wide range of durations. Most of the numbers are approximate.

Phenomenon                                                                    Seconds                                   Years
Stelliferous Era (time during which the stars will shine)                     3 x 10^21 sec.                            100 trillion (10^14) years
Age of the universe (scientific)                                              4.36 x 10^17 sec.                         13.82 billion years
Recorded history                                                              2.2 x 10^11 sec.                          7000 years
One year (average)                                                            31,560,000 = 3.156 x 10^7 sec.            1 year
Video frame time                                                              1/24 to 1/30 sec.                         10^-9 years
Digital audio sample time                                                     1/8,000 to 1/96,000 sec.
One cycle of a 2.5 GHz CPU                                                    1/2,500,000,000 = 4 x 10^-10 sec.
Atomic vibration (cesium)                                                     1/9,192,631,770 = ca. 1.1 x 10^-10 sec.
One of the shortest times mentioned in physics literature (Ipp et al., 2009)  1 yoctosecond = 10^-24 sec.
Planck time                                                                   5.391 x 10^-44 sec.

Common temporal resolution values for audio and video include the film and video frame rates of 24, 25, 30, and 30 drop-frame (effectively 29.97); audio sampling rates from 8,000 to 192,000; and numbers like 600 (QuickTime), 1000, etc.

It is generally believed that the laws of physics imply that the Planck time is the smallest meaningful interval of time (??REF?). In the scientific history of the universe, the Planck era (or Planck epoch) is the period from the Big Bang to one Planck time later. To relate the extremes in the table directly, the duration of the Stelliferous Era is about 10^65 Planck times.

Even more extreme values have been contemplated on both ends of the time scale. Obviously Planck-time resolution will not be sufficient to describe what happened during the Planck era, assuming that is a meaningful idea; and it might be, because the laws of physics that would apply are not at all well understood. And “100 trillion years” is nowhere near the longest duration ever contemplated. The lifetime of Brahma in Hindu mythology has been estimated as over 300 trillion years (??the Wikipedia article “Hindu units of measurement” comments that “The time elapsed since the current Brahma has taken over the task of creation can be calculated as… 155.52 Trillion Years”). Finally, the time till the expected “heat death” of the universe is very much longer yet, probably over 10^100 years. ??NEED REF
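The ratios quoted above are easy to reproduce, and they give a rough feel for the size of an encoding. The sketch below takes the average year and the Planck time from the table. It then assumes, purely for illustration, an encoding that stores an unsigned integer count of Planck times; that assumption and the function names are ours and are not a proposal made in this paper.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # average year, from the table
PLANCK_TIME = 5.391e-44     # seconds, from the table


def orders_of_magnitude(longest_years: float) -> float:
    """log10 of a duration in years, counted in Planck times."""
    return math.log10(longest_years * SECONDS_PER_YEAR / PLANCK_TIME)


def bits_needed(longest_years: float) -> int:
    """Bits for an unsigned integer count of Planck times (assumed encoding)."""
    return math.ceil(orders_of_magnitude(longest_years) * math.log2(10))


stelliferous = orders_of_magnitude(1e14)   # Stelliferous Era
heat_death = orders_of_magnitude(1e100)    # heat death of the universe
```

Under these assumptions the Stelliferous Era needs a little over 200 bits and the heat-death figure about 500, which suggests that even the extreme ranges discussed here are not an obstacle in terms of storage.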

Issues

Here are what we consider the main issues in representing time in the most general way.

1. Actual Time vs. Abstractions of Time

Do we represent actual time (in absolute units like years or microseconds or, given a sampling rate, samples) or something more abstract (like beats for music)? That is, can we just use multiples of some fixed time unit, or is something higher-level (such as beats or quarter-notes for music) required? The latter seems almost certain to lead to enormous complexity, and with little benefit, since different domains are unlikely to share high-level representations; so it seems far preferable to shield a general framework from abstract descriptions of time.

For music, poetry, and no doubt other applications, the designers of HyTime and SMDL went even further by separating out temporal stress patterns as well. As summarized by SIGhyper (1994): “A key accomplishment of SMDL was the separation of temporal information (the ‘time model’) from other music information, and the time model's principled division into duration (the HyTime ‘extent’ construct), meter (the SMDL ‘stress template’ construct), and tempo (the HyTime ‘projection’ construct).”

But what about symbolic times, as in timing diagrams for digital hardware (Figure 1, below)? This is too complex a question to tackle for now, so this article will exclude them from consideration.

In any case, we will describe actual time, i.e., time as it might be measured with a stopwatch, though a stopwatch with a range and precision from far below nanoseconds to perhaps trillions of years.

Figure 1. Timing diagram for PowerPC => SRAM write cycle

2. Uncertainty

How should uncertainty be described? To our knowledge, no existing representation of time offers anywhere near the flexibility many applications require.

First, can precision be fixed (presumably, assuming every value is accurate to the resolution in effect), or must it be explicitly variable? If the latter, over what range, and can it be global or must it be variable from event to event? Surely the last option: for many purposes, time values of widely varying precision are likely to be used together, and it would be very helpful to be able to describe the precision of each. Geological times might be no more accurate than, say, a 10-million-year interval.

But uncertainty involves much more than precision in the usual sense. Historical dates, including dates relating to documents and other cultural artifacts, are often described in terms of a year, fraction of a century, etc., and often with vague or uncertain values. Obvious examples include things like “1685?”, “around 1840”, and “flourished early 13th century”, but more complex forms are not hard to find. As discussed above, AACR2R, the Anglo-American Cataloguing Rules, 2nd ed. revised, used in library catalogs (Joint Steering Committee for Revision of AACR, 2002), includes one of the few statements of this type of description we know of (Table 1). ??MARC 21 HAS SOME ADDITIONAL FORMS

Beyond these examples, many people know their friends’ birthday anniversaries, i.e., the day and month, but not the year. And, for reasons like damage to original sources, the day and year of an event, perhaps even the time of day, might be known, but not the month. Finally, situations where some evidence points to one date and other evidence to another, perhaps years away, are not uncommon. Some creative artists (e.g., Charles Ives ??CHECK) are notorious for claiming much earlier creation dates than the real ones for their works, presumably in order to make themselves appear more original. For example, Henry Cowell’s piano piece The Tides of Manaunaun, probably written in 1917, is a highly original piece (it was one of the first pieces to employ tone clusters); but Cowell claimed he had written it in 1912, which makes it seem even more remarkable, and the published music bears that date. ??NEED REF . The earlier date might be called a “known wrong” date, a date that nonetheless should not be ignored. Indeed, library catalogers often include such dates, clearly annotated, in records they produce (Sue Stancu, personal communication, March 2013).

To handle these examples, consider that some existing time formats allow replacing one or more trailing digits of the year with a question mark: for example, “17??” means some year in the 1700’s. Along the same lines, allowing partly- and completely-unknown values for every field of the time/date would handle most of these examples, though not all. Finally, for the <date> item in EAC-CPF, a new proposed standard for archives, “attributes include @notBefore and @notAfter for dates of uncertainty. The @localType attribute can be used to supply a more specific characterization of the date.” (EACWG 2010) Needless to say, the most general time representation must be able to express the temporal information in all of these as well as that of all of the AACR2R approximation methods.

Note that all of this implies that it would not always be possible to tell if one datime precedes another, or even if they’re the same. This means a representation supporting these features can describe the real-world situation of ambiguously-ordered events, so it’s actually a good thing: a feature and not a bug, as the expression goes!

Probability Density Functions. The most flexible way to handle time uncertainty we can imagine is to allow attaching a probability density function (PDF) to a datime value.
So, for example, if you know everything but the month, the datime’s PDF would be 1/12 at the known time on the proper day of every month of the year, and zero everywhere else. This seems to make sense, but we know of no prior work along these lines. Some objections are:

(1) In a great many disciplines, far more people are likely to be confused by PDFs than to want to use them. This is true, but it’s purely an issue of user interface for an application, with no impact on representation.

(2) A user-interface question that does affect representation is: Can an application that prefers a different way to describe uncertainty always convert descriptions from that way to PDF and back, or is there a chance significant information will be lost in the process? The answer is undoubtedly the latter. Consider just one example, the “known wrong” datimes we mentioned before. Presumably these have a probability of zero; but there is no way a PDF can properly represent an ordinary datime and one or more “known wrong” datimes. Going the other way, it’s obvious that converting a PDF with a value of 1.0 for a certain datime and zero elsewhere into a list of one known datime and infinitely many known-wrong datimes is not likely to make users happy.

(3) Geraint Wiggins (personal communication, January 2010) commented that, to use PDFs to handle uncertain values, “You need a good theory of how to combine the probabilities, i.e., Bayesian reasoning about time… note that these confidence judgments are not the same thing as probability of correctness.” This is undoubtedly correct for many applications, but many others will likely never need to combine probabilities of events. In any case, this is a question for an application using our representation; it does not militate against allowing PDFs in the representation.

Summary. We believe representing time values with PDFs will be invaluable in some situations but more trouble than they’re worth in others, and converting back and forth without losing information will not always be practical. Therefore the most general representation should support both PDFs and something like EDTF level 2 descriptions of time values.
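The “everything but the month” example can be sketched as follows. This is a simplification under stated assumptions: it models the uncertainty as a discrete distribution over candidate datimes, not as a true density over continuous time. The function name unknown_month_distribution is ours. It also refines the text's 1/12 slightly by dropping months in which the given day does not exist, which is our own assumption.

```python
from datetime import datetime
from fractions import Fraction
from typing import Dict


def unknown_month_distribution(year: int, day: int, hour: int,
                               minute: int) -> Dict[datetime, Fraction]:
    """Spread probability evenly over the months in which the day exists."""
    candidates = []
    for month in range(1, 13):
        try:
            candidates.append(datetime(year, month, day, hour, minute))
        except ValueError:
            pass  # the day does not exist in this month (e.g., April 31)
    weight = Fraction(1, len(candidates))
    return {moment: weight for moment in candidates}


# Day, year, and time of day known; month unknown.
dist = unknown_month_distribution(1850, 14, 9, 30)
```

For a day such as the 14th, every month qualifies and each candidate gets probability 1/12, as in the text. Note that nothing in this structure can mark a candidate as “known wrong” while still recording it, which is exactly objection (2).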

3. Absolute Times

How should we describe absolute times? An obvious requirement is to choose an instant in time as a reference point, i.e., an epoch date in the astronomy/chronology sense. But it’s safe to say that, with one exception, no single point in time is sufficiently well defined for all purposes. (Note that the definition of the current astronomer’s epoch date we gave, January 1, 2000, 11:58:55.816 UTC, implies a precision of a millisecond, and therefore maximum accuracy in the same range.) The exception is, of course, the beginning of the universe. However, assuming the scientific version of that moment (the Big Bang, not Bishop Ussher’s calculation of the Creation), it has another problem, namely that its relationship to the time of almost anything of interest outside of cosmology is not known very accurately. As of this writing, the best figure for the age of the universe seems to be 13.8 billion years, accurate to within 0.3% or about ±40 million years ??REF; that spans a range of 80 million years!3

Another time point of enormous import is midnight on January 1, 1 CE: in common use and in most disciplines, the default epoch date. While that time is known in relation to everything since, say, the beginning of civilization with orders of magnitude more accuracy than an 80 million year interval, it is obvious it’s still not perfectly well-defined.

As we have already said, we are describing real time, time as it might be measured with a stopwatch, albeit a stopwatch with a range from far below nanoseconds to at least billions of years. That is, we are not concerned with abstract descriptions of times like musical beats whose mapping to real time is not fixed. Note, however, that demanding very high precision over long periods of time makes complex mappings unavoidable due to factors like the gradual slowdown of the earth's rotation, a slowdown that is not perfectly regular.
(The huge earthquake in Chile of February 2010 is estimated to have shortened the length of a day by slightly more than a microsecond ??REF ??MOVE TO FOOTNOTE?). To reduce accumulating error, astronomers have switched epoch dates a couple of times just in the last hundred years or so, despite the fact that they presumably do not care about enough precision to describe events of interest to high-energy physicists or students of the very early universe.4 For a system intended to cover billions of years with very high accuracy, nothing like a perfect solution is possible; this simply means that some extreme but conceivable applications of the GTT/W will not be workable in practice, and the reference point must be under the user’s control. See the Wikipedia articles “Epoch” and “Epoch (astronomy)”. ??NEED BETTER REFS! STEEL? Another issue is sidereal vs. . ??SURELY THIS IS A CALENDAR ISSUE?

Calendars, Time Zones, and the Semantics of Datimes

The issue of calendars is remarkably difficult. While the standard calendar almost everywhere is now the same (the Gregorian calendar promulgated by Pope Gregory in 1582), dozens of calendars are still in use by various groups (Steel, 2000). Furthermore, even a question as seemingly simple as whether a given date in European history is in the Julian (“Old Style”) or the Gregorian (“New Style”) calendar can be very difficult to answer (Steel, 2000; Sapp & Selfridge-Field, n.d.). For one thing, some countries adopted the Gregorian calendar more or less immediately after its introduction, but others not for decades or centuries. In some cases, adoption was on a region-by-region basis within a country; in at least one case, a country adopted the Gregorian calendar, went back to the Julian, then returned to the Gregorian. (??REF)

Even if it is clear which calendar was in use, it may not be easy to convert a date to a standard form. The new year hasn’t always been considered to start on 1 January. In Great Britain and the Colonies, until 1752, it started on March 25 (Feast of the Annunciation); in other places, it started with March 1, Christmas, or Easter, a date that is not even constant from year to year. In terms of the modern Gregorian calendar, George Washington was born on February 22, 1732. Since 1732 did not start till March 25 in the version of the Julian calendar in use in his birthplace at the time, that day was called February 11, 1731. Even the time a day starts can be a source of confusion: midnight is not the only time that’s been used for changing the date! And was there a year zero? Astronomers include one; most others (including the standard Gregorian calendar) do not. Again, no perfect solution is possible.

A great deal has been written about calendars. A “Historic Calendars of Europe” website (Sapp & Selfridge-Field, n.d.) gives an idea of the complexity of the problem for Europe alone, and provides a tool for converting. Needless to say, considering calendars from other cultures (some still used today, e.g., the Jewish and Chinese calendars) complicates matters further. For a wider and far more detailed view of the problem, see for example Richards (1999) or Steel (2000).

Time zones are another phenomenon that is more complex than any reasonable person would suspect. Contrary to what many people believe, time zones are not limited to whole hours away from UTC.5 Beyond that, at least one historical time zone involved an offset from UTC that was not even an integral number of seconds.6 And when daylight saving time starts and ends varies considerably from place to place, and is not necessarily constant in any place. Finally, in some parts of the world, no time zone exists, or at least there is no general consensus on one (??REF).

3 By Bishop James Ussher’s famous calculation of the mid-17th century, God created the universe at about midday of October 23, 4004 BCE (Steel, 2000), so its age as of this writing (1 July 2013) is some 6016.61 years. Ussher’s date has a precision of a fraction of a day, making it a great deal more precise than the scientific estimate based on the Big Bang, but it is doubtful that it is anywhere near as accurate.

4 VLBI seems to require the greatest timing accuracy in astronomy, on the order of perhaps 1 nanosecond. I’ve heard that described as an absolute timing requirement, but it’s not clear to me that VLBI times need that much accuracy relative to times outside of its apparatus. And even if they do, is the current epoch really defined within a nanosecond?

5 Some areas in south Asia (India, for example) are 30 minutes away from the hour, and Nepal is 45 minutes away.

6 According to Dutch law, from some time in 1909 to some time in 1937, Netherlands time was 19 minutes 32.13 seconds ahead of UTC.
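The Old Style to New Style conversion described for George Washington’s birthday can be sketched in a few lines of Python. This is a minimal sketch, not a general calendar converter: the function names are hypothetical, it assumes the Julian calendar with a civil year starting on March 25 unless told otherwise, it uses the standard integer formula for the Julian Day Number (JDN) of a Julian-calendar date, and it relies on Python’s proleptic Gregorian date type, so it only covers years 1 to 9999.

```python
from datetime import date

# Difference between a Julian Day Number and Python's proleptic
# Gregorian ordinal (date(1, 1, 1).toordinal() == 1).
JDN_MINUS_ORDINAL = 1_721_425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian-calendar date whose year starts 1 January."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def old_style_to_gregorian(year: int, month: int, day: int,
                           new_year: tuple[int, int] = (3, 25)) -> date:
    """Convert a Julian date written with a civil year that starts on `new_year`."""
    # Dates before the civil new year carry the previous year's number,
    # so they belong to the next year when counting from 1 January.
    if (month, day) < new_year:
        year += 1
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_MINUS_ORDINAL)


# February 11, 1731 (Old Style, year starting March 25).
washington = old_style_to_gregorian(1731, 2, 11)
print(washington.isoformat())
```

Passing new_year=(1, 1) treats the input as an ordinary Julian date. The hard part, deciding which calendar and which new-year convention a source actually used, is exactly what the sketch leaves to the caller.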

Unit Conversion

A related issue, probably not part of the representation, is how converting time units like seconds to days and days to years should be done. We must distinguish between units with an astronomical basis (primarily days, months, and years) and artificial units like seconds, minutes, and centuries. The former are vastly more difficult, thanks to factors like gradual changes in the speed of rotation and revolution of the earth and, for months, the fact that some calendars use lunar months. For relative time, all one can do is use an appropriate constant (though someone dealing with the distant past or future might want constants other than the usual ones). For absolute time, converting either seconds to days or days to years with really high precision is extremely difficult, especially over long intervals of time, for reasons like the very gradual slowdown in the earth’s rotation. One result is the occasional addition of leap seconds to a day. Of course this complicates conversion of both past and future times, but, more seriously, it has not been done systematically in the past (cf. Steel 2000), so it’s not even clear how to extrapolate to the future.

On the other hand, artificial units are generally constant and perfectly well-defined, though they may be incommensurable. Converting between such units, e.g., to synchronize audio samples at a rate of 44,100 per second (the CD rate) and video frames at 29.97 per second (30 fps drop-frame), may lead to roundoff error, but otherwise should be trivial.
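The audio/video case shows where the roundoff comes from. The following Python sketch uses exact rational arithmetic; it assumes the video rate is exactly 30000/1001 frames per second (the value usually rounded to 29.97), and the function names are ours, chosen only for illustration.

```python
from fractions import Fraction

AUDIO_RATE = Fraction(44_100)          # CD audio samples per second
VIDEO_RATE = Fraction(30_000, 1_001)   # assumed exact rate, about 29.97 frames per second


def frame_to_sample(frame_index: int) -> Fraction:
    """Exact (possibly fractional) audio sample position where a video frame starts."""
    return frame_index * AUDIO_RATE / VIDEO_RATE


def nearest_sample(frame_index: int) -> tuple[int, Fraction]:
    """Nearest whole sample and the roundoff error, in samples."""
    exact = frame_to_sample(frame_index)
    rounded = round(exact)
    return rounded, exact - rounded


for frame in (1, 2, 100):
    sample, error = nearest_sample(frame)
    print(frame, sample, float(error))
```

Under these assumptions one frame lasts 1471.47 samples, so frame boundaries coincide with sample boundaries only every 100 frames; in between, an application has to round and carry the exact position separately if errors must not accumulate.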

4. Compatibility

How important is compatibility with existing standards and representations? It depends to some extent on the application.

In General

It is vital to distinguish between compatibility of representations and compatibility of formats or encodings. The latter is both more demanding and less important. Why is this? The distinction is particularly relevant for ISO 8601. For the day terrorists destroyed the World Trade Center in 2001, ISO 8601 supports the following six Calendar Date Formats, among others:

(1a) 20010911 or (1b) 2001-09-11 (basic and extended complete representation, with precision of a day)
(2a) 010911 or (2b) 01-09-11 (basic and extended truncated representation, current century assumed)
(3) 2001-09 (extended representation, precision reduced to a month)
(4) 2001 (precision reduced to a year)

Forms 1a and 1b obviously represent the identical information, as do 2a and 2b. In addition, the information in 1a and 1b is currently identical to that in 2a and 2b and will remain so for another 85 years or so. On the other hand, each of forms 3 and 4 represents different information from all five of the others. It is clear that supporting, in a much richer time encoding, just one way to convey any temporal information 8601 allows would be considerably easier than supporting every way it allows. If the richer encoding does not support the way a given application expresses time via ISO 8601, that would be a nuisance, presumably requiring a programmer to modify the application; but if it does not support any way to express the time that application wants to describe, it could easily be impossible to use the new encoding. Of course we consider it essential that our universal representation be able to describe anything ISO 8601 can describe. We do not by any means consider it essential that it support every encoding 8601 supports.
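The difference between an encoding and the information it represents can be made concrete with a small Python sketch that maps the six forms above onto a single record. Everything here is illustrative: the CalendarDate type and parse_calendar_date function are hypothetical names, the century for truncated forms is a parameter we assume defaults to 2000, and no range checking of months or days is attempted.

```python
import re
from typing import NamedTuple, Optional


class CalendarDate(NamedTuple):
    """The represented information; None means precision stops earlier."""
    year: int
    month: Optional[int]
    day: Optional[int]


# (pattern, is_truncated) for forms 1b, 1a, 2b, 2a, 3 and 4 respectively.
_FORMS = [
    (r"(\d{4})-(\d{2})-(\d{2})", False),
    (r"(\d{4})(\d{2})(\d{2})", False),
    (r"(\d{2})-(\d{2})-(\d{2})", True),
    (r"(\d{2})(\d{2})(\d{2})", True),
    (r"(\d{4})-(\d{2})", False),
    (r"(\d{4})", False),
]


def parse_calendar_date(text: str, century_base: int = 2000) -> CalendarDate:
    """Parse one of the six calendar-date encodings into a CalendarDate."""
    for pattern, truncated in _FORMS:
        match = re.fullmatch(pattern, text)
        if match:
            parts = [int(group) for group in match.groups()]
            if truncated:
                parts[0] += century_base  # "current century assumed"
            parts += [None] * (3 - len(parts))
            return CalendarDate(*parts)
    raise ValueError(f"unsupported calendar date encoding: {text!r}")


for encoding in ("20010911", "2001-09-11", "010911", "01-09-11", "2001-09", "2001"):
    print(encoding, parse_calendar_date(encoding))
```

The first four encodings yield the same record, while forms 3 and 4 yield records with less precision; a richer representation needs to cover the three distinct records, not all six spellings.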

Finally, it is worth mentioning how desirable it is that the encoding of every 8601-expressible time our universal representation supports is actually an encoding 8601 supports! Fortunately, this goal should not be difficult to achieve.

For the Proposed GTT/W System

Our proposed GTT/W system would never need to deal directly with any data other than what its domain-knowledge modules send it or receive from it; that makes compatibility a relatively minor consideration instead of a hugely important and difficult one. Of course other users of the universal time representation might have different needs.

AACR2R (Anglo-American Cataloguing Rules, 2nd ed. revised, used in library catalogs), ISO 8601 (the ISO standard for date/time representations), and standards for multimedia systems (HyTime, SMIL, MPEG 7, QuickTime, SMPTE Time Code, etc.) are particularly worth considering. Inasmuch as there are thousands of disciplines, most with ill-defined boundaries, it’s doubtful that discipline-specific standards are worth considering; but medium-specific standards like BWF (Broadcast Wave Format) should probably get some attention. For absolute times, the representation should support to the greatest possible extent all descriptions of uncertainty and precision in library and scholarly use, as well as numeric time intervals.

Many existing or proposed time-description formats have good ideas. We designed the Variations date/time format based on several standards and sets of guidelines, including ISO 8601:1988, AACR2R and some other library standards, etc.; cf. Byrd (2001). More importantly, the Library of Congress and partners are working on EDTF, an “Extended Date/Time Format” intended to “meet the needs of various well-known XML metadata schemas, for example MODS, METS, PREMIS, etc.” But we know of no format that comes anywhere near doing all this.

5. Temporal Resolution

??THIS SECTION IS ALMOST ALL IMPLEMENTATION – AND NOT SUCH A BIG DEAL ANYMORE! REVISE, SHRINK, AND/OR CUT IT

Can the temporal resolution be fixed or must it be variable? If the latter, over what range, and can it be constant in a given use or must it be variable from event to event? See the numbers in the table under The Range of Time Durations, above. For example, the age of the universe, taking it as the time since the Big Bang, is about 13.8 billion years; with 31.56 million seconds per year, that’s about 4.35 x 10^17 sec., a 59-bit number. As we have said, common resolutions for audio and video include the film and video frame rates of 24, 25, 30, and 30 drop-frame (effectively 29.97); audio sampling rates from 8,000 to 192,000; and tick rates of 600 (QuickTime), 1000, etc. But for many purposes, resolution of a day, month, or year is good enough; for many more, resolution of an hour or, at best, a second is sufficient. With 1-second resolution, all of recorded history (say, 7000 years or 2.2 x 10^11 sec.) can be expressed in about 38 bits, and any time from the Big Bang through hundreds of billions of years in the future (say, 10^19 sec.) can be expressed in 64 bits.

On the other hand, for some purposes (say, describing the operation of a 3 GHz CPU, or events just after the Big Bang) a much higher resolution is needed; but how much higher? Time units at least as short as attoseconds (10^-18 sec.) are in use, and yoctoseconds (10^-24 sec.) have been mentioned in high-energy physics (Physics Today 2009). But it’d be best to go even further, probably to the Planck time, roughly 5.391 x 10^-44 sec.; this is generally believed to be the smallest meaningful unit of time (??REF?). So Planck-time resolution should be enough for almost anything. (See discussion under The Range of Time Durations, above.) The age of the universe is about 8 x 10^60 Planck times, a 203-bit value; even one second at Planck-time resolution would take 144 bits.
Needless to say, using numbers that large is hugely wasteful for normal situations. Thus, variable resolution is essential. A global value under the user’s control, perhaps specified in a configuration file, is probably sufficient for almost all purposes, but not all: consider cosmology! The 150-odd orders of magnitude mentioned above would require something like 500 bits. A reasonable solution here is just the standard way computers store what most programming languages call “real numbers”, with an exponent and a fixed number of bits to represent the mantissa (coefficient).

For comparison, QuickTime expresses time values as an integer number of ticks, with a 32-bit signed number of ticks per second (cf. the Movie.setTimeScale method). Thus, its maximum resolution is 1/2^31 sec. = about 4.66 x 10^-10 sec. = 466 picosec. On the other hand, in SMIL 3.0, time values are “float” numbers of seconds. The discussion of clock values in Section 5.4.3 states that “Fractional values are just (base 10) floating point definitions of seconds. The number of digits allowed is unlimited (although actual precision may vary among implementations).” One cycle of a 3 GHz CPU takes about 333 picosec., so it’s obvious that QuickTime’s representation can’t handle the needs of computer hardware designers or physicists; SMIL’s might, but users would be ill-advised to rely on it. But this should be no surprise, since both representations were designed for direct use in multimedia systems.
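The bit counts in this section are easy to recheck. The sketch below, in Python, counts the bits needed to number every tick of a given resolution within a given duration; the constant and function names are ours, the year length and Planck time are the rounded values used in the text, and QuickTime’s finest tick is taken as 1/2^31 sec. as above.

```python
import math
from fractions import Fraction

SECONDS_PER_YEAR = 31_560_000                       # 31.56 million, as in the text
AGE_OF_UNIVERSE_S = 13_800_000_000 * SECONDS_PER_YEAR
RECORDED_HISTORY_S = 7_000 * SECONDS_PER_YEAR
PLANCK_TIME_S = Fraction(5391, 10**47)              # 5.391 x 10^-44 sec.
QUICKTIME_FINEST_TICK_S = Fraction(1, 2**31)
CPU_CYCLE_3GHZ_S = Fraction(1, 3_000_000_000)


def bits_needed(duration_s, resolution_s=1) -> int:
    """Bits needed for an unsigned count of `resolution_s` ticks spanning `duration_s`."""
    ticks = math.ceil(Fraction(duration_s) / Fraction(resolution_s))
    return max(1, ticks.bit_length())


print("age of universe, 1-sec ticks:  ", bits_needed(AGE_OF_UNIVERSE_S))
print("recorded history, 1-sec ticks: ", bits_needed(RECORDED_HISTORY_S))
print("one second, Planck-time ticks: ", bits_needed(1, PLANCK_TIME_S))
print("age of universe, Planck ticks: ", bits_needed(AGE_OF_UNIVERSE_S, PLANCK_TIME_S))
print("QuickTime tick longer than a 3 GHz cycle:",
      QUICKTIME_FINEST_TICK_S > CPU_CYCLE_3GHZ_S)
```

Python integers and Fraction values are arbitrary-precision, so none of these figures is limited by the 64-bit floats the section warns about.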

Sketch of a Universal Representation

As we have suggested, in our view, the best starting points for a universal representation are ISO 8601 and EDTF. EDTF, as its “Level 2” appears in the current draft, is far more powerful than 8601 while retaining a good deal of compatibility (its “Level 0” is a profile of 8601). In fact, it is not too difficult to offer full compatibility with ISO 8601’s representational powers while ??REPLACE WITH PROPOSED “LEVEL 5”(?) EXTENSIONS TO EDTF?

Here are some examples of how we propose to represent various datimes. ??ADD EXX FROM 8601 & EDTF DOCS.

Calendar date: 12 April 1985
    1985-04-12

Local time with decimal fractions: 27 minutes and 35 and a half seconds past 15 hours
    15:27:35,5

A time interval of 1 year, 2 months, 15 days and 12 hours, beginning on 12 April 1985 at 20 minutes past 23 hours
    1985-04-12T23:20:00/P1Y2M15DT12H

the moment of the Big Bang

the “moment” Jesus Christ was born

the “moment” President Obama was born

8:46:30 AM EST on Sept. 11, 2001

“45.2 sec. after this video starts”

1. Actual Time, not Abstractions of Time

2. Representing Uncertainty, including Precision and Approximations

Uncertainty can be expressed in either of two ways: (1) an arbitrary user-supplied PDF, or (2) ??SOMETHING THAT CAN HANDLE ANYTHING IN EDTF, AACR2R, ETC.

3. Absolute Times, Calendars, and Epochs

4. Compatibility

5. Temporal Resolution and Range

The temporal resolution is variable, but with global scope; it can be expressed in seconds or years. If in seconds, it is described by an integer number of ticks per second from 1 to 10^45. The range is ??
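One possible reading of this proposal, for the case where resolution is given in seconds, is a time value stored as an integer tick count together with the tick rate. The Python sketch below is only that, a sketch: the TickTime class, its method names, and the rounding used when changing resolution are our assumptions, and the still-open question of range is not addressed.

```python
from dataclasses import dataclass
from fractions import Fraction

MAX_TICKS_PER_SECOND = 10**45   # upper limit proposed in the text


@dataclass(frozen=True)
class TickTime:
    """A time offset stored as an integer number of ticks (hypothetical type)."""
    ticks: int
    ticks_per_second: int

    def __post_init__(self) -> None:
        if not 1 <= self.ticks_per_second <= MAX_TICKS_PER_SECOND:
            raise ValueError("ticks_per_second must be between 1 and 10^45")

    def seconds(self) -> Fraction:
        """Exact value in seconds."""
        return Fraction(self.ticks, self.ticks_per_second)

    def rescale(self, new_rate: int) -> "TickTime":
        """Re-express at another resolution, rounding to the nearest tick."""
        return TickTime(round(self.seconds() * new_rate), new_rate)


# "45.2 sec. after this video starts", at QuickTime's 600 ticks per second.
offset = TickTime(27_120, 600)
print(offset.seconds(), offset.rescale(1000))
```

Because Python integers are unbounded, the same type works at one tick per second or at 10^45; a fixed-width implementation would need the exponent-and-mantissa form discussed in section 5.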

References

Byrd, Donald (2001). Proposed DML Date Formats. Variations2 project working paper (DMLDateFormatProp.txt).
Craig, G. Y., & Jones, E. J. (1982). A Geological Miscellany. Princeton University Press.
EACWG (2010). Encoded Archival Context – Corporate Bodies, Persons, and Families (EAC-CPF) Tag Library Version 2010 (initial release), by the Encoded Archival Context Working Group of the Society of American Archivists and the Staatsbibliothek zu Berlin. Available at http://www3.iath.virginia.edu/eac/cpf/tagLibrary .
Ipp, Andreas; Keitel, Christoph H.; and Evers, Jörg (2009, 9 October). Yoctosecond Photon Pulses from Quark-Gluon Plasmas. Physical Review Letters 103, 152301.
ISO (2004). Data elements and interchange formats – Information interchange – Representation of dates and times. 3rd ed. ISO 8601:2004. Geneva: International Organization for Standardization.
Joint Steering Committee for Revision of AACR (2002). Anglo-American Cataloguing Rules (2nd ed., 2002 revision) (called “AACR2R” for short). Ottawa: Canadian Library Association; London: Chartered Institute of Library and Information Professionals; Chicago: American Library Association.
Library of Congress (2013). Extended Date Time Format (EDTF) 1.0 Draft Submission, 13 January 2012, with minor corrections through 10 September 2013. Available at http://www.loc.gov/standards/datetime/ .
Pace, David, & Middendorf, Joan, eds. (2004). Decoding the Disciplines: Helping Students Learn Disciplinary Ways of Thinking. New Directions for Teaching and Learning no. 98. San Francisco: Jossey-Bass.
Physics Today (2009 October 16). Enter the Yoctosecond. Available at http://physicsworld.com/cws/article/news/40687 .
Richards, E. G. (1999). Mapping Time: The Calendar and Its History. Oxford University Press.
Rosenberg, Daniel (2007). Joseph Priestley and the Graphic Invention of Modern Time. Studies in Eighteenth Century Culture 36(1), pp. 55–103.
Sapp, Craig Stuart, & Selfridge-Field, Eleanor (n.d.). Historic Calendars of Europe. Available at http://hcal.ccarh.org/ .
Steel, Duncan (2000). Marking Time: The Epic Quest to Invent the Perfect Calendar. John Wiley and Sons.
SIGhyper (1994). A Brief History of the Development of SMDL and HyTime. Available at http://www.sgmlsource.com/history/hthist.htm
