
CONTRIBUTED RESEARCH ARTICLES 486

Measurement Units in R

by Edzer Pebesma, Thomas Mailund, and James Hiebert

Abstract We briefly review SI units, and discuss R packages that deal with measurement units, their compatibility and conversion. Built upon udunits2 and the UNIDATA udunits library, we introduce the package units that provides a class for maintaining unit metadata. When used in expressions, it automatically converts units, and simplifies units of results when possible; in case of incompatible units, errors are raised. The class flexibly allows expansion beyond predefined units. Using units may eliminate a whole class of potential scientific programming mistakes. We discuss the potential and limitations of computing with explicit units.

Introduction

Two quotes from Cobb and Moore (1997), “Data are not just numbers, they are numbers with a context” and “in data analysis, context provides meaning”, illustrate that for a data analysis to be meaningful, knowledge of the data’s context is needed. Pragmatic aspects of this context include who collected or generated the data, how this was done, and for which purpose (Scheider et al., 2016); semantic aspects concern what the data represent: which aspect of the world the data refer to, when and where they were measured, and what a value of ‘1’ means.

R does allow for keeping some context with data, for instance

• "data.frame" columns must have and "list" elements may have names that can be used to describe context, using free text
• "matrix" or "array" objects may have dimnames
• for variables of class "factor" or "ordered", levels may indicate, using free text, the categories of nominal or ordinal variables
• "POSIXt" and "Date" objects specify how numbers should be interpreted as time or date, with fixed units (second and day, respectively) and origin (Jan 1, 1970, 00:00 UTC)
• "difftime" objects specify how time duration can be represented by numbers, with flexible units (secs, mins, hours, days, weeks); lubridate (Grolemund and Wickham, 2011) extends some of this functionality.

Furthermore, if spatial objects as defined in package sp (Pebesma and Bivand, 2005) have a proper coordinate reference system set, they can be transformed to other datums, or converted to various flat (projected) representations of the Earth (Iliffe and Lott, 2008).

In many cases, however, R drops contextual information. As an example, we look at the annual global land-ocean temperature index¹ since 1960:

> temp_data = subset(read.table("647_Global_Temperature_Data_File.txt",
+     header=TRUE)[1:2], Year >= 1960)
> temp_data$date = as.Date(paste0(temp_data$Year, "-01-01"))
> temp_data$time = as.POSIXct(temp_data$date)
> Sys.setenv(TZ="UTC")
> head(temp_data, 3)
   Year Annual_Mean       date       time
81 1960       -0.03 1960-01-01 1960-01-01
82 1961        0.05 1961-01-01 1961-01-01
83 1962        0.02 1962-01-01 1962-01-01
> year_duration = diff(temp_data$date)
> mean(year_duration)
Time difference of 365.2545 days

Here, the time difference units are reported for the difftime object year_duration, but if we use it in a linear algebra operation

> year_duration %*% rep(1, length(year_duration)) / length(year_duration)
         [,1]
[1,] 365.2545

the unit is dropped.
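
The same loss of metadata can be reproduced without the data file. The following base R sketch uses a short, made-up sequence of yearly dates (an assumption for illustration, not the article's temperature data) and inspects the attributes before and after the matrix product.

```r
# Hypothetical stand-in for the yearly dates used in the article.
dates <- as.Date(paste0(1960:1964, "-01-01"))
year_duration <- diff(dates)

# The difftime object carries its unit as an attribute,
# and mean() keeps both the class and the unit.
units(year_duration)
avg_difftime <- mean(year_duration)
inherits(avg_difftime, "difftime")

# The matrix product operates on the bare numbers and returns a plain matrix.
avg_matrix <- year_duration %*% rep(1, length(year_duration)) /
  length(year_duration)
is.matrix(avg_matrix)
attr(avg_matrix, "units")   # no unit attribute is left on the result
```

Both computations give the same number, but only the first result still records that the number is a count of days.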

Similarly, for linear regression coefficients we see

> coef(lm(Annual_Mean ~ date, temp_data))
 (Intercept)         date
1.833671e-02 4.364763e-05
> coef(lm(Annual_Mean ~ time, temp_data))
 (Intercept)         time
1.833671e-02 5.051809e-10

where the unit of change is in degrees Celsius but either per day (date) or per second (time). For purely mathematical manipulations, R often strips context from numbers when it is carried in attributes, the linear algebra routines being a prime example.

Most variables are somehow attributed with information about their units, which specify what the value 1 of this variable represents. This may be counts of something, e.g. ‘1 apple’, but it may also refer to some physical unit, such as distance in meters. This article discusses how strong unit support can be introduced in R.

¹ Data from http://climate.nasa.gov/vital-signs/global-temperature/

The R Journal Vol. 8/2, December 2016 ISSN 2073-4859

SI

The BIPM (Bureau International des Poids et Mesures) is “the intergovernmental organization through which Member States act together on matters related to measurement science and measurement standards. Its recommended practical system of units of measurement is the International System of Units (Système International d’Unités, with the international abbreviation SI)”².

Base quantity                                  SI base unit
Name                        Symbol             Name        Symbol
length                      l, x, r, etc.      meter       m
mass                        m                  kilogram    kg
time, duration              t                  second      s
electric current            I, i               ampere      A
thermodynamic temperature   T                  kelvin      K
amount of substance         n                  mole        mol
luminous intensity          Iᵥ                 candela     cd

Table 1: Base quantities, SI units and their symbols (from International Bureau of Weights and Measures et al. (2001), p. 23)

International Bureau of Weights and Measures et al. (2001) describe the SI units, where, briefly, SI units

• consist of seven base units (length, mass, time & duration, electric current, thermodynamic temperature, amount of substance, and luminous intensity), each with a name and abbreviation (Table 1)
• consist of derived units that are formed by products of powers of base units, such as ‘m/s²’, many of which have special names and symbols (e.g. angle: 1 rad = 1 m/m; force: 1 N = 1 m kg s⁻²)
• consist of coherent derived units when derived units include no numerical factors other than one (with the exception of ‘kg’³); an example of a coherent derived unit is 1 watt = 1 joule per 1 second
• may contain SI prefixes (k = kilo for 10³, m = milli for 10⁻³, etc.)
• contain special quantities where units disappear (e.g., m/m) or have the nature of a count, in which cases the unit is ‘1’.

² http://www.bipm.org/en/measurement-units/
³ As a base unit, kg can be part of coherent derived units.

Related work in R

Several R packages provide unit conversions. For instance, measurements (Birk, 2016) provides a collection of tools to make working with physical measurements easier. It converts between metric and imperial units, or calculates a dimension’s unknown value from other dimensions’ measurements. It does this by the conv_unit function:

> library(measurements)
> conv_unit(2.54, "cm", "inch")
[1] 1
> conv_unit(c("101 44.32","3 19.453"), "deg_dec_min", "deg_min_sec")
[1] "101 44 19.2000000000116" "3 19 27.1800000000003"
> conv_unit(10, "cm_per_sec", "km_per_day")
[1] 8.64

but uses for instance kph instead of ‘km_per_hour’, and then ‘m3_per_hr’ for flow; unit names seem to come from convention rather than systematic composition.
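
To illustrate what systematic composition could look like, the sketch below builds a speed conversion from separate length and time factors in base R. The helper convert_speed and the two lookup vectors are hypothetical and are not part of the measurements package; the factors themselves are the standard definitions (1 inch = 0.0254 m, 1 day = 86400 s).

```r
# Conversion factors to SI base units (meters and seconds).
length_m <- c(cm = 0.01, m = 1, km = 1000, inch = 0.0254)
time_s   <- c(sec = 1, hr = 3600, day = 86400)

# Hypothetical helper: convert a speed expressed as <length> per <time>
# into another <length> per <time> by going through meters per second.
convert_speed <- function(x, from_len, from_time, to_len, to_time) {
  si <- x * length_m[[from_len]] / time_s[[from_time]]
  si * time_s[[to_time]] / length_m[[to_len]]
}

# Same inputs as the conv_unit() calls in the text.
speed_km_per_day <- convert_speed(10, "cm", "sec", "km", "day")
length_inch <- 2.54 * length_m[["cm"]] / length_m[["inch"]]
```

With this layout any length can be paired with any time unit, so no composite name such as ‘cm_per_sec’ has to be registered separately; the two results agree with the 8.64 and 1 reported by conv_unit up to floating point error.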

Object conv_unit_options contains all 173 supported units, categorized by the physical dimension they describe:

> names(conv_unit_options)
 [1] "acceleration" "angle"        "area"         "coordinate"   "count"
 [6] "duration"     "energy"       "flow"         "length"       "mass"
[11] "power"        "pressure"     "speed"        "temperature"  "volume"
> conv_unit_options$volume
 [1] "ul"        "ml"        "dl"        "l"         "cm3"       "dm3"
 [7] "m3"        "km3"       "us_tsp"    "us_tbsp"   "us_oz"     "us_cup"
[13] "us_pint"   "us_quart"  "us_gal"    "inch3"     "ft3"       "mi3"
[19] "imp_tsp"   "imp_tbsp"  "imp_oz"    "imp_cup"   "imp_pint"  "imp_quart"
[25] "imp_gal"

Function conv_dim allows for the conversion of units in products or ratios, e.g.

> conv_dim(x = 100, x_unit = "m", trans = 3, trans_unit = "ft_per_sec", y_unit = "min")
[1] 1.822689

computes how many minutes it takes to travel 100 meters at 3 feet per second.

Package NISTunits (Gama, 2014) provides fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI and non-SI units, plus unit conversions, based on the data from NIST (National Institute of Standards and Technology). The package provides a single function for every unit conversion; all but 5 of its 896 functions are of the form ‘NISTxxxTOyyy’ where ‘xxx’ and ‘yyy’ refer to two different units. For instance, converting from W m⁻² to W inch⁻² is done by

> library(NISTunits)
> NISTwattPerSqrMeterTOwattPerSqrInch(1:5)
[1] 0.00064516 0.00129032 0.00193548 0.00258064 0.00322580

Both measurements and NISTunits are written entirely in R.

UNIDATA’s udunits library and the udunits2 R package

Udunits, developed by UCAR/UNIDATA, advertises itself on its web page⁴ as: “The udunits package supports units of physical quantities. Its C library provides for arithmetic manipulation of units and for conversion of numeric values between compatible units. The package contains an extensive unit database, which is in XML format and user-extendable.” The R package udunits2 (Hiebert, 2015) provides an R level interface to the most important functions in the C library.

⁴ https://www.unidata.ucar.edu/software/udunits/

The functions provided by udunits2 are

> library(udunits2)
> ls(2)
[1] "ud.are.convertible"  "ud.convert"          "ud.get.name"
[4] "ud.get.symbol"       "ud.have.unit.system" "ud.is.parseable"
[7] "ud.set.encoding"

Dropping the ud prefix, is.parseable verifies whether a unit is parseable

> ud.is.parseable("m/s")
[1] TRUE
> ud.is.parseable("q")
[1] FALSE

are.convertible specifies whether two units are convertible

> ud.are.convertible("m/s", "km/h")
[1] TRUE
> ud.are.convertible("m/s", "s")
[1] FALSE

convert converts units that are convertible, and throws an error otherwise

> ud.convert(1:3, "m/s", "km/h")
[1]  3.6  7.2 10.8

and get.name, get.symbol and set.encoding get name, get symbol or modify encoding of the character unit arguments.

> ud.get.name("kg")
[1] "kilogram"
> ud.get.symbol("kilogram")
[1] "kg"
> ud.set.encoding("utf8")
NULL

Unlike measurements and NISTunits, udunits2 parses units as expressions, and bases its logic upon the convertibility of expressions, rather than the comparison of fixed strings:

> m100_a = paste(rep("m", 100), collapse = "*")
> m100_b = "dm^100"
> ud.is.parseable(m100_a)
[1] TRUE
> ud.is.parseable(m100_b)
[1] TRUE
> ud.are.convertible(m100_a, m100_b)
[1] TRUE

This has the advantage that through complex computations, intermediate objects can have units that are arbitrarily complex, and that can potentially be simplified later on.
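
The difference between comparing unit expressions and comparing fixed strings can be sketched in base R. The sketch assumes that a unit is represented as a scale factor relative to SI base units plus a named vector of base-unit exponents. The helpers make_unit, multiply_units, power_unit, convertible and convert_value are hypothetical names introduced here for illustration; they are not udunits2 functions and make no claim about how the udunits C library is implemented.

```r
# A unit is a scale factor plus named exponents of base units;
# zero exponents are dropped so that e.g. m/m has no dimensions left.
make_unit <- function(scale, dims) list(scale = scale, dims = dims[dims != 0])

# Exponent of base unit k in unit u (0 if absent).
exponent <- function(u, k) if (k %in% names(u$dims)) u$dims[[k]] else 0

# Multiplying units multiplies scales and adds exponents.
multiply_units <- function(a, b) {
  keys <- union(names(a$dims), names(b$dims))
  exps <- vapply(keys, function(k) exponent(a, k) + exponent(b, k), numeric(1))
  make_unit(a$scale * b$scale, exps)
}

# Raising a unit to a power scales all exponents.
power_unit <- function(u, n) make_unit(u$scale^n, u$dims * n)

# Two units are convertible when all base-unit exponents agree.
convertible <- function(a, b) {
  keys <- union(names(a$dims), names(b$dims))
  all(vapply(keys, function(k) exponent(a, k) == exponent(b, k), logical(1)))
}

# Convert values, raising an error for incompatible units.
convert_value <- function(x, from, to) {
  stopifnot(convertible(from, to))
  x * from$scale / to$scale
}

m  <- make_unit(1,    c(m = 1))
dm <- make_unit(0.1,  c(m = 1))
km <- make_unit(1000, c(m = 1))
s  <- make_unit(1,    c(s = 1))
h  <- make_unit(3600, c(s = 1))

# Counterparts of "m*m*...*m" (100 factors) and "dm^100" from the text.
m100_a <- Reduce(multiply_units, rep(list(m), 100))
m100_b <- power_unit(dm, 100)
convertible(m100_a, m100_b)

m_per_s  <- multiply_units(m,  power_unit(s, -1))
km_per_h <- multiply_units(km, power_unit(h, -1))
convertible(m_per_s, s)
convert_value(1:3, m_per_s, km_per_h)
```

Because only the accumulated exponents are compared, the two ways of writing the hundredth power of a length are recognized as compatible even though their textual forms differ.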