<<

Package ‘Taxonstand’

February 15, 2013

Type Package

Title Taxonomic standardization of plant names

Version 1.0

Date 2012-03-21

Author Luis Cayuela

Maintainer Luis Cayuela

Description Automated standardization of taxonomic names and removal of orthograhpic errors in plant species names using the The Plant List (TPL) website (www.theplantlist.org)

License GPL (>= 2)

LazyLoad yes

Repository CRAN

Date/Publication 2012-05-21 08:36:09

NeedsCompilation no

R topics documented:

bryophytes ...... 2 TPL...... 2 TPLck ...... 5

Index 8

1 2 TPL

bryophytes List of bryophytes

Description

bryophytes contain species names for 100 Mediterranean bryophytes

Usage

data(bryophytes)

Format

A data frame with 100 observations on the following 5 variables.

Full.name a character vector a character vector Species a character vector Var a character vector Intraspecific a character vector

Examples

data(bryophytes) str(bryophytes)

TPL Connection to The Plant List (TPL) website to check for the validity of a list of plant names

Description

Connects to TPL and validates the name of a vector of plant species names, replacing synonyms to accepted names and removing orthographical errors in plant names

Usage

TPL(splist, genus = NULL, species = NULL, infrasp = NULL, infra = TRUE, abbrev = TRUE, corr = FALSE, diffchar = 2, max.distance = 1, file = "") TPL 3

Arguments splist A vector of plant names, each element including the genus and specific epithet and, additionally, the infraspecific epithet. genus A vector containing the genera for plant species names. species A vector containing the specific epithets for plant species names. infrasp A vector containing the infraspecific epithets for plant species names (i.e. vari- eties and ). infra Logical. If ’TRUE’ (default) then infraspecific epithets are used to validate the taxonomic status of species names in TPL. abbrev Logical. If ’TRUE’ (default), abbreviations (aff., cf., subsp., var.) and their variants are removed prior to taxonomic standardization. corr Logical. If ’TRUE’, then removal of orthographical errors is performed on spe- cific epithets (only) prior to taxonomic standardization. diffchar A number indicating the maximum difference between number of characters in corrected and original species names. Not used if corr=FALSE. max.distance Maximum distance allowed for a match in agrep function when performing cor- rections of orthographical errors in specific epithets. Not used if corr=FALSE. file Either a character string naming a file or a connection open for writing. "" (default) indicates output to the console.

Details The procedure used for taxonomic standardization is based on function TPLck. If ’infra=FALSE’, then infraspecific epithets are neither considered for species name validation in TPL, nor returned in the output.

Value The function return an object of data.frame with the following components:

$Genus Original genus of species provided as input for taxonomic standardization. $Species Original specific epithet of species provided as input for taxonomic standardiza- tion. $Abbrev Standard annotation used in species epithet, including "cf.", "aff.", "s.l.", and "s.str." and their orthographic variants. $Infraspecific Original intraspecific epithet of species provided as input for taxonomic stan- dardization. If ’infra=FALSE’, this is not shown. $Plant.Name.Index Logical. If ’TRUE’ the name is in TPL. $Taxonomic.status Taxonomic status as in TPL, either ’Accepted’, ’Synonym’ or ’Unresolved’ $ Family name, extracted from TPL for the valid of the name. $New.Genus Genus, extracted from TPL for the valid form of the name. 4 TPL

$New.Species Specific epithet, extracted from TPL for the valid form of the name. $New.Infraspecific Infraspecific epithet, extracted from TPL for the valid form of the name. $Authority A field designating the scientist(s) who first published the name, extracted from TPL for the valid form of the name. $Typo Logical. If ’TRUE’ there was a spelling error in the specific epithet that has been corrected. $WFormat Logical. If ’TRUE’, fields in TPL had the wrong format for information to be automatically extracted as they were not properly tabulated or, alternatively, there was not a unique solutions (see ’note’).

Note Various limitations have been identified to date in the implementation of this function. First, homonyms (i.e. a name for a taxon that is identical in spelling to another such name, but belongs to a different taxon) cannot be identified. Second, if the input infraspecific epithet does not match any of the infraspecific epithets provided in TPL, then the first accepted name with no infraspecific epithet is selected as the best match. If all names provided by TPL have infraspecific epithets and are all synonyms (e.g. Pottia starkeana), then there is no best match and the output will match the original name.

Author(s) Luis Cayuela

References Cayuela, L., Granzow-de la Cerda, I., Albuquerque, F.S. and Golicher, J.D. 2012. Taxonstand: An R package for species names standardisation in vegetation databases. Methods in Ecology and Evolution, under review. Kalwij, J.M. 2012. Review of ’The Plant List, a working list of all plant species’. Journal of Vegetation Science, in press.

See Also See also TPLck.

Examples ## Not run: data(bryophytes)

# Species names in full r1 <- TPL(bryophytes$Full.name[1:20], corr=TRUE) str(r1)

# A separate specification for genera, specific, and infraspecific epithets r2 <- TPL(genus = bryophytes$Genus, species = bryophytes$Species, infrasp = bryophytes$Intraspecific, corr=TRUE) str(r2) TPLck 5

#################################### ### An example using data from GBIF #################################### require(dismo) # Download data containing all records available in GBIF of all species within genus Oreopanax (GBIF table) oreopanax <- gbif("Oreopanax", "*", geo=T) # But a list of species can be also downloaded from GBIF for a defined geographical area

# Names downloaded from GBIF often include the authority. The column names need to be split using the spaces as the split. This will result in multiple columns. We essentially only need the first two columns. sp.list <- do.call("rbind", strsplit(oreopanax$species, split=" ")) sp.list <- as.factor(paste(sp.list[,1], sp.list[,2]))

# Perform taxonomic standardisation on plant names list (TPL table) sp.check <- TPL(levels(sp.list), infra=FALSE, corr=TRUE) head(sp.check)

# Bind GBIF table with TPL table oreopanax$id <- as.numeric(sp.list) sp.check$id <- 1:dim(sp.check)[1] oreopanax.check <- merge(oreopanax, sp.check, by="id", all=T)

## End(Not run)

TPLck Connects to The Plant List (TPL) website and check for the validity of a plant name

Description Connects to TPL and validates the name of a single plant species name, replacing synonyms for accepted names and removing orthographical errors in plant names

Usage TPLck(sp, corr = FALSE, diffchar = 2, max.distance = 1, infra = TRUE)

Arguments sp A character specifying the genus and specific epithet and, additionally, the in- fraspecific epithet. corr Logical. If ’TRUE’, then removal of spelling errors is performed on the specific epithet (only) prior to taxonomic standardization. diffchar A number indicating the maximum difference between number of characters in corrected and original species names. Not used if corr=FALSE. max.distance Maximum distance allowed for a match in agrep function when performing corrections of spelling errors in specific epithets. Not used if corr=FALSE. infra Logical. If ’TRUE’ (default) then infraspecific epithets are used to validate the taxonomic status of species names in TPL. 6 TPLck

Details The procedure searches for a species name in The Plant List (TPL) and provides its taxonomic status. If the status is either ’Accepted’ or ’Unresolved’ (i.e. names for which the contributing data sources did not contain sufficient evidence to decide whether they were ’Accepted’ or ’Syn- onyms’), the function returns the species name unchanged. In those cases where the species name provided as input is recognised as a "Synonym", it is replaced with the current accepted name. Or- thographic errors can be corrected only in specific epithets. By increasing arguments ’diffchar’ and ’max.distance’, larger differences can be detected in typos, but this also increases false positives (i.e. replacement of some names for others that do not really match), so some caution is recom- mended here. If ’infra=FALSE’, then infraspecific epithets are neither considered for species name validation in TPL, nor returned in the output.

Value The function return an object of class data.frame with the following components:

$Genus Original genus of species provided as input for taxonomic standardization. $Species Original specific epithet of species provided as input for taxonomic standardiza- tion. $Infraspecific Original intraspecific epithet of species provided as input for taxonomic stan- dardization. If ’infra=FALSE’, this is not shown. $Plant.Name.Index Logical. If ’TRUE’ the name is in TPL. $Taxonomic.status Taxonomic status as in TPL, either ’Accepted’, ’Synonym’ or ’Unresolved’ $Family Family name, extracted from TPL for the valid form of the name. $New.Genus Genus, extracted from TPL for the valid form of the name. $New.Species Specific epithet, extracted from TPL for the valid form of the name. $New.Infraspecific Infraspecific epithet, extracted from TPL for the valid form of the name. $Authority A field designating the scientist(s) who first published the name, extracted from TPL for the valid form of the name. $Typo Logical. If ’TRUE’ there was a spelling error in the specific epithet that has been corrected. $WFormat Logical. If ’TRUE’, fields in TPL had the wrong format for information to be automatically extracted as they were not properly tabulated or, alternatively, there was not a unique solutions (see ’note’).

Note Various limitations have been identified to date in the implementation of this function. First, homonyms (i.e. a name for a taxon that is identical in spelling to another such name, but belongs to a different taxon) cannot be identified. Second, if the input infraspecific epithet does not match any of the infraspecific epithets provided in TPL, then the first accepted name with no infraspecific epithet is selected as the best match. If all names provided by TPL have infraspecific epithets and TPLck 7

are all synonyms (e.g. Pottia starkeana), then there is no best match and the output will match the original name.

Author(s) Luis Cayuela

References Cayuela, L., Granzow-de la Cerda, I., Albuquerque, F.S. and Golicher, J.D. 2012. Taxonstand: An R package for species names standardisation in vegetation databases. Methods in Ecology and Evolution, under review. Kalwij, J.M. 2012. Review of ’The Plant List, a working list of all plant species’. Journal of Vegetation Science, in press.

See Also See also TPL.

Examples ## Not run: # An accepted name sp1 <- TPLck("Amblystegium serpens juratzkanum") sp1 # An unresolved name sp2 <- TPLck("Bryum capillare cenomanicum") sp2 # A synonym sp3 <- TPLck("Pottia caespitosa") sp3 # A spelling error in specific epithet sp4 <- TPLck("Pohlia longicolla", corr=TRUE) sp4 # A spelling error that is not corrected (’max.distance’ defaults to 1) sp5 <- TPLck("Microbryum curvicollum", corr=TRUE) sp5 # If increasing ’max.distance’, the spelling error is accounted for sp6 <- TPLck("Microbryum curvicollum", corr=TRUE, max.distance=3) sp6

## End(Not run) Index

∗Topic datasets bryophytes,2 ∗Topic vegetation analysis TPL,2 TPLck,5 bryophytes,2

TPL,2, 7 TPLck, 4,5

8