Introduction to Ontologies for Environmental Biology

Barry Smith
http://ontology.buffalo.edu/smith

Finnegans Web
concept, type, class, instance, model, representation, data, process, property

Disciplines here involved

GIS
Ecology
Environmental biology
Various -omics disciplines
Bioinformatics
Medical Informatics
Database science
Semantic webists
...

Part 1: What is an Ontology?

what cellular component? what molecular function?

what biological process?

natural language labels designed for use in annotations, to make the data cognitively accessible to human beings and algorithmically tractable to computers

compare: legends for maps

common legends allow (cross-border) integration

ontologies are legends for data

compare: legends for diagrams

Ramirez et al., Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology, Syst. Biol. 56(2):283–294, 2007

computationally tractable legends

help integrate complex representations of reality
help human beings find things in complex representations of reality
help computers reason with complex representations of reality

ontologies are used to annotate data
but there are two kinds of annotations

names of types

names of instances

A basic distinction

type vs. instance

science text vs. diary
human being vs. Michael Ashburner

Catalog vs. inventory

A  515287  DC3300 Dust Collector Fan
B  521683  Gilmer Belt
C  521682  Motor Drive Belt

Ontology
types
Instances

An ontology is a collection of standardized names for types

We learn about types in reality from looking at the results of scientific experiments captured in the form of scientific theories

Ontologies provide the terminological scaffolding of scientific theories

experiments relate to what is particular
science describes what is general

[Diagram: a hierarchy of types (thing, organism, animal, cat, siamese, frog) with instances beneath it]
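The catalog vs. inventory contrast can be sketched in a few lines of Python. This is only an illustration: the catalog rows come from the slide, while the class names, the `serial` field and the inventory entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartType:
    """A catalog entry: a type of part, not any particular physical item."""
    catalog_no: str
    name: str


@dataclass(frozen=True)
class PartInstance:
    """An inventory entry: one particular item that instantiates a type."""
    serial: str          # hypothetical serial number
    part_type: PartType


# The catalog lists types (rows taken from the slide).
catalog = [
    PartType("515287", "DC3300 Dust Collector Fan"),
    PartType("521683", "Gilmer Belt"),
    PartType("521682", "Motor Drive Belt"),
]

# The inventory lists instances (hypothetical items on a shelf).
inventory = [
    PartInstance("belt-0001", catalog[1]),
    PartInstance("belt-0002", catalog[1]),
    PartInstance("fan-0001", catalog[0]),
]


def instances_of(part_type, items):
    """Return the inventory items that instantiate the given catalog type."""
    return [item for item in items if item.part_type == part_type]


gilmer_belts = instances_of(catalog[1], inventory)
print(len(catalog), len(inventory), len(gilmer_belts))
```

On this picture an ontology plays the role of the catalog, and annotated data play the role of the inventory.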

types vs. their extensions

type

{a,b,c,...} class of instances = a collection of particulars

Extension =def

The extension of a type A is the class of instances of A

(the class of all entities to which the term ‘A’ applies)

types vs. classes

types

{c,d,e,...} classes

types vs. classes

types
extensions ~ defined classes

Defined class =def
member of Abba aged > 50 years
pizza with > 4 different toppings
red wine to serve with fish
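A minimal sketch of the difference between the extension of a type and a defined class, assuming particulars are simply tagged with the types they instantiate. All names and data below are hypothetical.

```python
# Hypothetical particulars, each tagged with the types it instantiates.
particulars = {
    "felix": {"cat", "animal", "organism"},
    "tom": {"cat", "animal", "organism"},
    "kermit": {"frog", "animal", "organism"},
}


def extension(type_name):
    """The class (here a set) of all particulars that instantiate the type."""
    return {name for name, types in particulars.items() if type_name in types}


# A defined class is picked out by an ad hoc condition, not by a type.
# Hypothetical pizzas mapped to their number of toppings.
pizzas = {"margherita": 2, "quattro stagioni": 4, "everything": 7}


def defined_class(items, condition):
    """The set of items whose recorded value satisfies the condition."""
    return {name for name, value in items.items() if condition(value)}


many_toppings = defined_class(pizzas, lambda toppings: toppings > 4)
print(sorted(extension("cat")), sorted(many_toppings))
```

The extension of a type is fixed by what instantiates the type; a defined class is fixed by whatever condition someone chooses to write down.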

Part 2: The OBO Foundry

what cellular component? what molecular function?

what biological process?

The Gene Ontology

Five bangs for your GO buck

2. based in biological science
3. cross-species data comparability (human, mouse, yeast, fly ...)
4. cross-granularity data integration (molecule, cell, organ, organism)
5. cumulation of scientific knowledge in algorithmically tractable form
6. links people to software
7. part of Open Biomedical Ontologies (OBO)

Entry point for creation of web-accessible biomedical data

GO initially low-tech to encourage users
Simple (web-service-based) tools created to support the work of biologists in creating annotations (data entry)
OBO-to-OWL DL converters now making OBO Foundry annotated data immediately accessible to Semantic Web data integration projects

The OBO Foundry

A suite of high quality interoperable reference ontologies to serve the annotation of biomedical data
providing guidelines for those who need to create new ontology resources
http://obofoundry.org

GRANULARITY \ RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

The OBO Foundry building out from the original GO

Simple guidelines

• use singular nouns
• distinguish continuants from occurrents
• distinguish things from their qualities
• distinguish types from their instances
• do not use the weasel word ‘concept’

CRITERIA

• OPENNESS: The ontology is open and available to be used by all.
• FORMAL LANGUAGE: The ontology is in, or can be instantiated in, a common formal language.
• ORTHOGONALITY: The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
• CONVERGENCE: The developers agree to work towards a single ontology for each domain.
• UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
• IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
• VERSIONING: The ontology provider has procedures for identifying distinct successive versions.
• DEFINITIONS: The ontology includes textual definitions for all terms.
• CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
• DOCUMENTATION: The ontology is well-documented.
• USERS: The ontology has a plurality of independent users.
• COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.

http://obofoundry.org/

Foundry ontologies all work in the same way

all are built to represent the types existing in a pre-existing domain and the relations between these types in a way which can support reasoning
– we have data
– we need to make this data available for semantic search and algorithmic processing
– we create a consensus-based ontology for annotating the data
– and ensure that it can interoperate with Foundry ontologies for neighboring domains

Formal-Ontological Relations

is_a
part_of
located_at
depends_on
is_boundary_of
adjacent_to

To support integration of ontologies

relational expressions such as
is_a
part_of
...
should be used in the same way in all ontologies involved

to define these relations properly

we need to take account of both types and instances in reality

Kinds of relations

: Toronto instance_of city

: Toronto part_of Ontario

: waterfall part_of river
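The three examples relate an instance to a type, an instance to an instance, and a type to a type. A minimal sketch of how the type-level case can be defined from instance-level facts, assuming the "all-some" reading used in the OBO Relation Ontology (every instance of A is part of some instance of B, with time left out here). The function names and the waterfall and river data are hypothetical.

```python
# Instance-to-type facts (the Toronto fact is from the slide; the rest is hypothetical).
instance_of = {
    ("Toronto", "city"),
    ("Niagara Falls", "waterfall"),
    ("Niagara River", "river"),
}

# Instance-to-instance parthood facts.
part_of_instance = {
    ("Toronto", "Ontario"),
    ("Niagara Falls", "Niagara River"),
}


def instances(type_name):
    """All recorded instances of a type."""
    return {i for i, t in instance_of if t == type_name}


def part_of_type(a, b):
    """Type-level part_of under the assumed all-some reading.

    True when every recorded instance of `a` is part of some recorded
    instance of `b`. Note that it is vacuously true when `a` has no
    recorded instances.
    """
    return all(
        any((x, y) in part_of_instance for y in instances(b))
        for x in instances(a)
    )


print(part_of_type("waterfall", "river"))
print(part_of_type("river", "waterfall"))
```

Defining the type-level relation through instances is what lets the same relational expression be used in the same way across ontologies.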

is_a

human is_a mammal
all instances of the type human are as a matter of necessity instances of the type mammal
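In terms of extensions, is_a requires that the instances of the child type form a subset of the instances of the parent type. The sketch below checks only that necessary condition against recorded data; the "as a matter of necessity" clause cannot be verified from any finite data set. The function name and the non-human entries are hypothetical.

```python
# Recorded extensions of a few types (partly hypothetical data).
extensions = {
    "human": {"Michael Ashburner", "Barry Smith"},
    "mammal": {"Michael Ashburner", "Barry Smith", "Felix the cat"},
    "frog": {"Kermit"},
}


def consistent_with_is_a(child, parent):
    """True if every recorded instance of `child` is also an instance of `parent`.

    This is a consistency check on the data, not a proof that child is_a parent.
    """
    return extensions[child] <= extensions[parent]


print(consistent_with_is_a("human", "mammal"))
print(consistent_with_is_a("mammal", "human"))
```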

Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of biomedical entities | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

Foundational Model of Anatomy

[Figure: hierarchy from the Foundational Model of Anatomy with is_a and part_of links. Nodes: Anatomical Space, Anatomical Structure, Organ Cavity, Organ Cavity Subdivision, Organ, Organ Part, Organ Component, Organ Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Serous Sac Component, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]

Mature OBO Foundry ontologies now undergoing reform

Cell Ontology (CL)
Chemical Entities of Biological Interest (ChEBI)
Foundational Model of Anatomy (FMA)
Gene Ontology (GO)
Phenotypic Quality Ontology (PaTO)
Relation Ontology (RO)
Sequence Ontology (SO)

Ontologies being built to satisfy Foundry principles ab initio

Ontology for Clinical Investigations (OCI)
Common Anatomy Reference Ontology (CARO)
Ontology for Biomedical Investigations (OBI)
Protein Ontology (PRO)
RNA Ontology (RnaO)
Subcellular Anatomy Ontology (SAO)

Ontologies in planning phase

Biobank/Biorepository Ontology (BrO, part of OBI)
Environment Ontology (EnvO)
Immunology Ontology (ImmunO)
Infectious Disease Ontology (IDO)
Mouse Adult Neurogenesis Ontology (MANGO)

OBO Foundry Success Story

Model organism research seeks results valuable for the understanding of human disease. This requires the ability to make reliable cross-species comparisons, and for this anatomy is crucial. But different MOD communities have developed their anatomy ontologies in uncoordinated fashion.

Ontologies facilitate grouping of annotations

brain 20
hindbrain 15
rhombomere 10

Query brain without ontology: 20
Query brain with ontology: 45
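The jump from 20 to 45 can be reproduced with a small sketch. It assumes, as the slide layout suggests, that rhombomere falls under hindbrain and hindbrain under brain; the `parent` mapping and the function names are hypothetical.

```python
# Number of annotations made directly to each term (figures from the slide).
direct_annotations = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# Assumed hierarchy, child -> parent (e.g. via part_of).
parent = {"hindbrain": "brain", "rhombomere": "hindbrain"}


def descendants(term):
    """All terms that sit below `term` in the hierarchy."""
    found = set()
    for child, par in parent.items():
        if par == term:
            found.add(child)
            found |= descendants(child)
    return found


def query(term, use_ontology):
    """Count annotations for a term, optionally including its descendants."""
    terms = {term}
    if use_ontology:
        terms |= descendants(term)
    return sum(direct_annotations[t] for t in terms)


print(query("brain", use_ontology=False))  # 20
print(query("brain", use_ontology=True))   # 45
```

Without the ontology a query for brain sees only the annotations that literally use the term brain; with it, annotations to hindbrain and rhombomere are grouped in as well.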

CARO – Common Anatomy Reference Ontology

for the first time provides guidelines for model organism researchers who wish to achieve comparability of annotations
for the first time provides guidelines for those new to ontology work

See Haendel et al., “CARO: The Common Anatomy Reference Ontology”, in Burger (ed.), Anatomy Ontologies for Bioinformatics, Springer, in press.

CARO-conformant ontologies already in development:

Fish Multi-Species Anatomy Ontology (NSF funding received)
Ixodidae and Argasidae (Tick) Anatomy Ontology
Mosquito Anatomy Ontology (MAO)
Anatomy Ontology
Xenopus Anatomy Ontology (XAO)

undergoing reform: Drosophila and Zebrafish Anatomy Ontologies

Part 3: The Hole Story

The Ontology of Environments

Initial hypothesis: Environments are holes

environment, place, site, niche, habitat, setting, hole, spatial region, interior, location

Places are holes

[Figure: the OBO Foundry coverage matrix from Part 2 (relation to time against granularity), repeated]

No place for environments

A Neglected Major Category in Ontologies thus far

Things (e.g. organisms)
Qualities / Features
Functions
Processes

Environments = that into which organisms (etc.) fit

[Figure: the OBO Foundry coverage matrix from Part 2, repeated with added vertical labels that appear to read “environments are here”]

Environments are holes in which organisms, cells, molecules ... can live

[Figure: the same matrix with an added POPULATION level of granularity]

environments for populations

Environments are holes

Double Hole Structure of the Occupied Niche

Retainer (a boundary of some surrounding structure)
Medium (filling the environing hole)
Tenant (occupying the central hole)

Tenant, medium and retainer

the medium of the bear’s niche is a circumscribed body of air
medium might be body of water, cytosol, nasal mucosa, epithelium, endocardium, synovial tissue ...

The Empty Niche

Fiat boundary
Physical boundary

Two Types of Boundary

Fiat boundary
Physical boundary

Positive and negative parts

negative part or hole (not made of matter)
positive part (made of matter)

Four Basic Niche Types (Niche as generalized hole)

[Figure: four niche diagrams, numbered 1 to 4]

1: a womb; an egg; a house (better: the interior thereof)
2: a snail’s shell
3: the niche of a pasturing cow
4: the niche around a circling buzzard (fiat boundary)

Types of Niches

a pond, a nest, a cave, a hut, an air-conditioned apartment building

the history of evolution = history of the development of niches

Types of relations for EnvO

in
on (surface of)
surrounds
lives_in
attaches to
realizes
occupies (spatial region)
...

Lexical Semantics

the fruit is in the bowl
the bird is in the nest
the lion is in the cage
the pencil is in the cup
the fish is in the river
the river is in the valley
the water is in the lake
the car is in the garage
the fetus is in the cavity in the uterine lining
the colony of whooping crane is in its breeding grounds

Double Hole Structure

Retainer (a boundary of some surrounding structure)
Medium (filling the environing hole)
Tenant (occupying the central hole)

when a tenant leaves its niche the gap left by the tenant is filled immediately by the surrounding medium

A hole in the ground

Solid physical boundaries at the floor and walls

but with a fiat lid:

hole

Part 4: Not every hole is an environment

An environment is a special kind of (generalized) hole
but what kind?

Elton – niche as role

the ‘niche’ of an animal means its place in the biotic environment, its relations to food and enemies. [...] When an ecologist says ‘there goes a badger’ he should include in his thoughts some definite idea of the animal’s place in the community to which it belongs, just as if he had said ‘there goes the vicar’ (Elton 1927, pp. 63f.)

G.E. Hutchinson: niche as volume in a functionally defined space

the niche = an n-dimensional hypervolume whose dimensions correspond to resource gradients over which species are distributed
G.E. Hutchinson (1957, 1965)

Hypervolume niche = a location in an attribute space

defined by a specific constellation of environmental variables such as degree of slope, exposure to sunlight, soil fertility, foliage density, salinity...

Niche Construction

Lewontin: niches normally arise in symbiosis with the activities of organisms or groups of organisms (“ecosystem engineering”); they are not already there, like vacant rooms in a gigantic evolutionary hotel, awaiting organisms who would evolve into them. (The Triple Helix: Gene, Organism, and Environment)

Part Last: Bringing Together the Spatial and Functional Approaches to Environment Ontology

The environment is not a location in an attribute space, but it must have features which have such a location

Every environment must have some spatial location

The functional niche presupposes the spatial-structural niche
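A minimal sketch of how the two approaches fit together, assuming the hypervolume is simplified to an axis-aligned box (one allowed range per environmental variable) and that each environment is a spatially located site with measured features. Class names, variable names, coordinates and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HypervolumeNiche:
    """A box in attribute space: variable name -> (low, high) range."""
    ranges: dict

    def contains(self, conditions):
        """True if every niche variable is measured and falls within its range."""
        return all(
            var in conditions and low <= conditions[var] <= high
            for var, (low, high) in self.ranges.items()
        )


@dataclass
class Site:
    """A spatially located environment together with its measured features."""
    name: str
    lat: float
    lon: float
    conditions: dict


# Hypothetical functional niche over two environmental variables.
niche = HypervolumeNiche({"slope_deg": (0, 15), "salinity_ppt": (0, 5)})

# Hypothetical spatial environments.
sites = [
    Site("pond", 42.9, -78.8, {"slope_deg": 2, "salinity_ppt": 0.3}),
    Site("salt marsh", 41.3, -70.1, {"slope_deg": 1, "salinity_ppt": 20}),
]

suitable = [site.name for site in sites if niche.contains(site.conditions)]
print(suitable)
```

The niche object by itself has no location; only the sites do, and it is their features that are located in the attribute space.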

Ontology of environment + ontology of associated environmental features

J. J. Gibson’s Ecological Psychology

The terrestrial environment is [best] described in terms of a medium, substances, and the surfaces that separate them. (Gibson 1979, p. 16)

Gibson’s theory of surface layout

‘a sort of applied geometry that is appropriate for the study of perception and behavior’ (1979, p. 33)
ground, open environment, enclosure, detached object, attached object, hollow object, place, sheet, fissure, stick, fiber, dihedral, etc.

Gibson’s theory of surface layout as an anatomy of environments

• systems of barriers, doors, pathways to which the behavior of organisms is specifically attuned
• temperature gradients, patterns of movement of air or water molecules
• water holes, food sources (features)
• apertures (mouths, sphincters ...)

Two sets of issues

Environments, as spatial structures, and their parts

Environmental attributes (qualities, functions), determining multidimensional loci à la Hutchinson

Aim

To define structural properties such as: open, closed, connected, compact, spatial coincidence, integrity, aggregate, boundary
RCC (Region Connection Calculus) plus extensions

Ecological Niche Concepts

niche as particular place or subdivision of an environment that an organism or population occupies
vs.
niche as function of an organism or population within an ecological community

Next steps

Our data needs are to link niche features with geo-locations

Scale: From geographic to microbiological

From locations of organisms/samples, sources of museum artifacts ...
to organism interactions, e.g. on bacterial infection – how the interior of one organism or organism part serves as environment for another organism

Hosts for bacterial infection
(interior of)

lung
blood (bacteremia)
erythrocyte – plasmodium inhabits red blood cells
hepatocyte – plasmodium infects liver cells
macrophage
gut and oral mucosa, nasal mucosa, vaginal mucosa
kidney
bladder

portion of epithelial tissue

C: bacteria (arrows) adhering to and penetrating the epithelial cells (×3,000)
D: abscess (Ab) formation in subepithelial region with a colony of bacteria (arrows) and a red blood cell (RBC) in it (×2,000)

[Figure: the OBO Foundry coverage matrix from Part 2 (relation to time against granularity), repeated]

Environments, environment parts (features), environment qualities

Ontologies needed

Environment – Taxonomy
place, habitat, city, farm, building (interior), oral cavity, uterine cavity, gut ...

Environment part – Anatomy of environments (surface, conduit, entry ...)
city wall, uterine wall, water source, ...

Environment function
protection, supply of food, ...

Environment quality – (Phenotypes)
ambient temperature, salinity, ...
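One way the four proposed ontologies could come together in a single annotation, linked to a geo-location where one exists, is sketched below. This is an assumption about how such records might look, not a description of EnvO: the record layout, field names, sample identifiers, coordinates and measured values are hypothetical, and only the term labels are taken from the lists above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EnvironmentAnnotation:
    """One annotated sample; every field name here is hypothetical."""
    sample_id: str
    environment: str                                # term from the environment taxonomy
    parts: list = field(default_factory=list)       # environment parts (features)
    functions: list = field(default_factory=list)   # environment functions
    qualities: dict = field(default_factory=dict)   # environment quality -> value
    lat: Optional[float] = None                     # geo-location, if any
    lon: Optional[float] = None


annotations = [
    EnvironmentAnnotation(
        "s1", "farm",
        parts=["water source"],
        functions=["supply of food"],
        qualities={"ambient temperature": 18.0, "salinity": 0.2},
        lat=42.9, lon=-78.8,
    ),
    EnvironmentAnnotation(
        "s2", "uterine cavity",
        parts=["uterine wall"],
        functions=["protection"],
        qualities={"ambient temperature": 37.0},
    ),
]


def with_quality_between(items, quality, low, high):
    """Sample ids whose recorded value for `quality` lies in [low, high]."""
    return [
        a.sample_id for a in items
        if quality in a.qualities and low <= a.qualities[quality] <= high
    ]


def geolocated(items):
    """Sample ids that carry a geographic location."""
    return [a.sample_id for a in items if a.lat is not None and a.lon is not None]


print(with_quality_between(annotations, "ambient temperature", 30, 40))
print(geolocated(annotations))
```

The same record shape covers both ends of the scale: a geographic site carries coordinates, while an environment inside a host organism is identified by its anatomical term alone.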