Representation of Derived Units in UnitsML
Peter J. Linstrom∗
November 6, 2006
∗phone: (301) 975-5422, e-mail: [email protected]

Contents
1 Introduction
2 Why this convention is needed
3 Information needed to define a unit
4 Proposed XML encoding
5 Important conventions
6 Potential problems
7 Possible alternatives
A Prefixes from the SI
B SI units and units acceptable for use with the SI
C non-SI Units

1 Introduction

This document describes a proposed convention for defining derived units in terms of their base units. This convention is intended for use in the UnitsML markup language to allow a precise definition of a wide range of units. The goal of this convention is to improve interoperability among applications and databases which use derived units based on commonly encountered base units.

It is understood that not all units can be represented using this convention. It is, however, anticipated that a wide range of scientific and engineering units of measure can be represented with this convention.

The convention consists of representing the unit in terms of its base units and providing a controlled vocabulary of base units. For example, the unit centimeter per second squared would be represented in terms of the following:
1. The unit meter with the prefix centi raised to the power 1.
2. The unit second raised to the power −2.

Please note that this convention seeks to address the problem of defining derived units, not to define conversion factors. For this reason it will only support multiplication by constants which have defined SI prefixes.

2 Why this convention is needed

Without this convention, there is no easy way to reliably compare unit definitions from different sources to see if they are the same.
The proposed symbolic identifier can be used for this purpose, but it is not parsable XML, so it requires a specialized parser and cannot be validated against an XML schema. As will be noted later, other than syntax, this proposal is similar to the symbolic identifier; the need to enumerate a set of base units and multiplicative prefixes is the same for both approaches.

Other identifying data in the current XML schema lack the qualities which would make them useful for comparing unit definitions from different sources. Numeric identifiers are assigned by the author of the definition and thus are only useful for comparison within the context in which they were assigned. Names are language specific. Even within a given language there may be multiple names for a given unit, so names may not be unique identifiers.

Under this proposal, information about the definition is provided in a structured format based on explicitly enumerated base units and multiplicative prefixes. This will allow comparison of unit definitions from different sources, something essential for interoperability of applications with different unit definition databases. Such comparison will be done by comparing base units, multiplicative prefixes, and exponents of units to see if they match.

3 Information needed to define a unit

In order to define a unit in terms of other units, the following information is needed for each unit which will be used in the definition:

identifier: A code or name which identifies the unit.
prefix: The SI prefix which notes a factor to multiply the unit by.
exponent numerator: Numerator of the exponent to raise the unit and prefix to. The exponent is expressed as a separate numerator and denominator to restrict it to rational numbers (by restricting the numerator and denominator to integers). The exponent is applied to both the unit and the prefix.
exponent denominator: Denominator of the exponent to raise the unit and prefix to.

Proposed codes for prefixes and units are provided in the appendix. It is important to note that the codes for units are internal representations to be used by the markup language to denote specific units. They are not to be confused with symbols to be used in text documents or official abbreviations for the units. In most applications, users should never see the codes defined in the appendix.

It is proposed that only well defined units which are not explicitly derived units be included in the set of units which may be used for definitions. This would mean that named derived units, such as newtons, could be used, but explicitly derived units, such as acre-feet, could not. Units such as acre-feet can be defined as derived units.

Only those units and prefixes defined in the appendix should be used in definitions. If users are allowed to add their own unit and prefix codes, interoperability will be lost. Because of this, however, this representation scheme will not work for all units and, hence, must be an optional part of the markup language.

There is one important and potentially controversial unit listed in table 23. The item unit refers to a count of items and can be used to note derived units which include such counts (e.g. neutron flux). This concept is at odds with the SI, which assigns such counts a unit of 1. In this proposal it was chosen to include counts as a named unit, because doing so provides additional semantic clarity over the practice endorsed by the SI.

The units defined in the appendix have been taken from several sources [1, 2, 3, 4, 5, 6].

It is important to remember that this is a working document and that the list of units in the appendix is only an initial attempt at enumerating units to be defined.
It is envisioned that units will be added or removed from the list based on input from the UnitsML developers. In addition, it should be recognized that the codes defined in this document are solely for enumerating base units in the XML schema; they are not intended for use in any way outside of representing derived units in UnitsML.

4 Proposed XML encoding

As noted above, derived units can be expressed as the product of base units with a multiplicative prefix raised to a specified power. It is proposed that such definitions be contained in an element named baseUnits. This element would contain elements for each base unit in the definition. Each base unit would be noted in a baseUnit element. This element would have the following attributes:

prefix: One of the codes for the multiplicative prefixes defined in table 1. If omitted there is no prefix.
unit: One of the unit codes defined in the appendix. This attribute is required.
numerator: The numerator of the exponent to raise the unit and prefix to. The value should be an integer. If this attribute is omitted the value is assumed to be one.
denominator: The denominator of the exponent to raise the unit and prefix to. The value should be an integer, but must not be zero. If this attribute is omitted the value is assumed to be one.

The baseUnits element is a child of the unit element. Only one baseUnits element per unit element would be allowed.

The proposed markup is best illustrated with a few examples. The text in figure 1 shows the relevant markup for a cubic international foot. Another example (showing the use of rational number exponents) is the markup for centimeters to the three-halves power given in figure 2.

The advantages of the proposed markup can be seen by looking at an example. Figures 3 and 4 both show markup for kilojoules per cubic meter. The markup comes from different sources and uses names in different languages.
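The attribute defaults and the comparison described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the proposal: the function name canonical_base_units is hypothetical, the markup is assumed to carry no XML namespace, the unit code "second" is a placeholder for whatever code the appendix assigns, and reducing exponents to lowest terms is an assumption the text does not state.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET


def canonical_base_units(xml_text):
    """Reduce a baseUnits element to a set of (unit, prefix, exponent) terms."""
    root = ET.fromstring(xml_text)
    terms = set()
    for base_unit in root.findall("baseUnit"):
        unit = base_unit.attrib["unit"]        # required; KeyError if absent
        prefix = base_unit.get("prefix", "")   # omitted means no prefix
        numerator = int(base_unit.get("numerator", "1"))
        denominator = int(base_unit.get("denominator", "1"))
        # Fraction raises ZeroDivisionError for a zero denominator and
        # reduces the exponent to lowest terms (an assumption of this sketch).
        terms.add((unit, prefix, Fraction(numerator, denominator)))
    return frozenset(terms)


# Centimeter per second squared as two sources might write it: element
# order differs and one source spells out the default attribute values.
# The unit code "second" is a placeholder, not taken from the appendix.
SOURCE_A = (
    '<baseUnits>'
    '<baseUnit unit="meter" prefix="c" />'
    '<baseUnit unit="second" numerator="-2" />'
    '</baseUnits>'
)
SOURCE_B = (
    '<baseUnits>'
    '<baseUnit unit="second" numerator="-2" denominator="1" />'
    '<baseUnit unit="meter" prefix="c" numerator="1" denominator="1" />'
    '</baseUnits>'
)

print(canonical_base_units(SOURCE_A) == canonical_base_units(SOURCE_B))
```

Because the result is an unordered set of terms, two definitions compare equal whenever their base units, prefixes, and exponents match, regardless of element order or of names and numeric identifiers carried elsewhere in the unit element.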
The elements defined in this proposal are in boldface text; all other elements are present in the current definition of UnitsML. In the two figures, the existing UnitsML provides names and a numeric identifier, neither of which can be used to compare the units. The numeric identifiers cannot be used because they are tied to specific data sets, and therefore cannot be relied on for inter-comparison. The names cannot be relied on because, in this case, they are in different languages. Even if the same language is used, names are not reliable identifiers because they may be constructed differently for the same unit (e.g., foot-pounds versus pound-feet). Without the information provided in the baseUnits element, it would be impossible to determine if the units are different or the same. In this case it can be seen that the units are the same.

<baseUnits>
  <baseUnit unit="foot" numerator="3" />
</baseUnits>
Figure 1: Representation for a cubic international foot.

<baseUnits>
  <baseUnit unit="meter" prefix="c" numerator="3" denominator="2" />
</baseUnits>
Figure 2: Representation for centimeters to the three-halves power.

Figure 5 shows another unit for energy per volume. Examination of the base units quickly shows that this unit is quite different from that noted in figures 3 and 4.

5 Important conventions

A problem arises when a unit can be expressed in terms of more than one set of base units. In such cases it is possible that two identical units may not be recognized as such. There are two ways this problem can occur:
1. Dimensionless base units such as radians or steradians are omitted.