
Can You Reason with Biodiversity? Semantic Data and the Encyclopedia of Life

Nathan Wilson, Marine Biological Laboratory

Text and images with explicit copyrights are licensed under the Creative Commons Attribution-ShareAlike 3.0 License unless otherwise stated.

Outline

• Who am I?
• Why is this question important?
• Define some terms
• Available semantic biodiversity data
• EOL’s efforts
• Challenges
  – Semantic Web Ontologies
  – Peer-Reviewed Descriptions
  – Applications

[email protected] - http://eol.org

Who am I?

• Naturalist
• Mycologist (Mushroom Expert)
• Software Professional (DreamWorks, Apple, SRI)
• Creator of Mushroom Observer (http://mushroomobserver.org)
• Director of Biodiversity Informatics for EOL

Armillaria mellea (Vahl) P. Kumm. California, USA © 1994 Nathan Wilson

EOL - E.O. Wilson’s Vision

“What we need is to get out there and search this little-known planet, and then put all the information that we get on species already known into a single great database, an electronic encyclopedia, with a page that’s indefinitely extensible for each species in turn, and that would be available to anybody, any time, anywhere, single access, on command, free.”

Why is it important to reason with biodiversity?

• Biodiversity in decline
• Computer-aided identification
• Computer-aided species discovery

Amanita vernicoccora Bojantchev & R.M. Davis. California, USA © 1997 Nathan Wilson

Reason == Ask?

Are you poisonous?

Ganoderma lucidum (Curtis) P. Karst. Ontario, Canada © 2013 Eva Skific

Reason ≠ Ask

Are you poisonous?

Ganoderma lucidum (Curtis) P. Karst. Ontario, Canada © 2013 Eva Skific

Reasoning == Computer Inference

• Computer Inference
  – Characters => Name => Edibility

SELECT ?sName ?cName ?poisonous
WHERE {
  ?s :scientificName ?sName ;
     :chineseName ?cName ;
     :poisonous ?poisonous .
  ?s …
}

Ganoderma lucidum, 靈芝 (or 灵芝), No
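The query above can be imitated without a triple store. Below is a minimal Python sketch, assuming the data is held as plain (subject, predicate, object) tuples. The `ex:` subjects, the `:hymenium` character predicate and the second species are hypothetical; they stand in for the character constraints that are cut off on the slide.

```python
# Triples are (subject, predicate, object) tuples.
# Only the Ganoderma lucidum name/edibility values come from the slide; the ex:
# subjects, the :hymenium predicate and ex:species_b are hypothetical.
TRIPLES = [
    ("ex:ganoderma_lucidum", ":scientificName", "Ganoderma lucidum"),
    ("ex:ganoderma_lucidum", ":chineseName", "靈芝"),
    ("ex:ganoderma_lucidum", ":poisonous", "No"),
    ("ex:ganoderma_lucidum", ":hymenium", "pores"),
    ("ex:species_b", ":scientificName", "Hypothetical species B"),
    ("ex:species_b", ":poisonous", "Yes"),
    ("ex:species_b", ":hymenium", "gills"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def characters_to_edibility(triples, characters):
    """Characters => Name => Edibility.

    Keeps subjects that have every requested character value, then returns
    (scientific name, Chinese name, poisonous) rows. As with required
    SPARQL triple patterns, a subject missing any of the three values
    produces no row.
    """
    rows = []
    for subject in sorted({s for s, _, _ in triples}):
        if not all(value in objects(triples, subject, predicate)
                   for predicate, value in characters.items()):
            continue
        for s_name in objects(triples, subject, ":scientificName"):
            for c_name in objects(triples, subject, ":chineseName"):
                for poisonous in objects(triples, subject, ":poisonous"):
                    rows.append((s_name, c_name, poisonous))
    return rows


print(characters_to_edibility(TRIPLES, {":hymenium": "pores"}))
# [('Ganoderma lucidum', '靈芝', 'No')]
```

Filtering on `{":hymenium": "gills"}` returns an empty list, because the hypothetical species B has no Chinese name.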

Reasoning == Computer Inference

• Common Name => Family Name

SELECT ?family
WHERE {
  ?s :chineseName "靈芝" ;
     :genus ?genus .
  ?genus :family ?family .
}

Ganodermataceae
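This query chains three triple patterns through shared variables (?s, ?genus). Here is a minimal Python sketch of that join, assuming a naive nested-loop evaluation over an in-memory list of triples. The `ex:` identifiers and the `:genus` predicate name are hypothetical placeholders.

```python
# Terms starting with "?" are variables; anything else must match exactly.
# The ex: identifiers and the :genus predicate are hypothetical.
TRIPLES = [
    ("ex:ganoderma_lucidum", ":chineseName", "靈芝"),
    ("ex:ganoderma_lucidum", ":chineseName", "灵芝"),
    ("ex:ganoderma_lucidum", ":genus", "ex:Ganoderma"),
    ("ex:Ganoderma", ":family", "Ganodermataceae"),
]


def unify(binding, pattern, triple):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind a fresh variable, or check that an existing binding agrees.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, triples):
    """Naive nested-loop evaluation of a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def family_of(chinese_name):
    patterns = [
        ("?s", ":chineseName", chinese_name),
        ("?s", ":genus", "?genus"),
        ("?genus", ":family", "?family"),
    ]
    return [b["?family"] for b in evaluate(patterns, TRIPLES)]


print(family_of("靈芝"))  # ['Ganodermataceae']
```

Because both scripts are stored as labels, `family_of("灵芝")` gives the same answer. An unknown name simply yields no bindings.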

What is the Semantic Web?

• “A web of data that can be processed directly and indirectly by machines” – Tim Berners-Lee
• Queryable data integrated into the Internet
  – RDF
  – SPARQL
  – OWL
  – Linked Open Data

Sindice – The Semantic Web Index

• http://sindice.com
• Try traditional Chinese name, 靈芝
• Returns 1 result:

Sindice – The Semantic Web Index

• Try simplified Chinese name, 灵芝

7th result mentions Ganoderma

Sindice – The Semantic Web Index

• Try English phonetic name, mushroom:

DBpedia – http://dbpedia.org

• Semanticized Wikipedia

DBpedia – http://dbpedia.org

• Recent evidence indicates that Ganoderma sichuanense is the best name for the traditional lingzhi mushroom

DBpedia – http://dbpedia.org

• Lots of data…

Observable Traits from DBpedia

• From Wikipedia:
  – dbpprop:capshape: “no” or “offset”
  – dbpprop:hymeniumtype: “pores”
  – dbpprop:sporeprintcolor: “brown”
  – dbpprop:stipecharacter: “NA” or “bare”
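Several of these infobox properties carry alternatives (“no” or “offset”), so a consumer cannot simply compare strings for equality. One way to use them is sketched in Python below. It assumes that each “or” is read as a set of allowed values and that a property missing from the record means “unknown” rather than “absent”. Both are interpretive assumptions, not DBpedia semantics.

```python
# Values copied from the slide; reading "or" as a set of alternatives is an
# assumption about how to interpret the infobox.
GANODERMA_LUCIDUM = {
    "dbpprop:capshape": {"no", "offset"},
    "dbpprop:hymeniumtype": {"pores"},
    "dbpprop:sporeprintcolor": {"brown"},
    "dbpprop:stipecharacter": {"NA", "bare"},
}


def consistent(record, observation):
    """True if no observed value contradicts the record.

    A property the record does not mention is treated as unknown, so it
    cannot rule the species out.
    """
    return all(
        prop not in record or value in record[prop]
        for prop, value in observation.items()
    )


print(consistent(GANODERMA_LUCIDUM, {"dbpprop:hymeniumtype": "pores",
                                     "dbpprop:sporeprintcolor": "brown"}))  # True
print(consistent(GANODERMA_LUCIDUM, {"dbpprop:hymeniumtype": "gills"}))     # False
```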

Name Data from DBpedia

• Example names:
  – dbpprop:binomial “Ganoderma lucidum”
  – dbpedia-owl:family dbpedia:
  – rdfs:label and dbpprop:s 灵芝
  – rdfs:label and dbpprop:t 靈芝
• No synonyms, no close relatives
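These labels are enough to resolve the binomial or either Chinese script to one resource. However, nothing links the record to a synonym such as Ganoderma sichuanense. A minimal Python sketch of such a name index follows. The `ex:lingzhi` identifier and the normalisation rule (Unicode NFC, case-folding, collapsed whitespace) are assumptions for illustration.

```python
import unicodedata

# Names from the slide; "ex:lingzhi" is a hypothetical resource identifier.
NAME_TRIPLES = [
    ("ex:lingzhi", "dbpprop:binomial", "Ganoderma lucidum"),
    ("ex:lingzhi", "dbpprop:s", "灵芝"),  # simplified Chinese
    ("ex:lingzhi", "dbpprop:t", "靈芝"),  # traditional Chinese
]


def normalise(name):
    """NFC-normalise, case-fold and collapse internal whitespace."""
    return " ".join(unicodedata.normalize("NFC", name).casefold().split())


def build_index(triples):
    """Map every normalised name to the set of resources that carry it."""
    index = {}
    for subject, _predicate, name in triples:
        index.setdefault(normalise(name), set()).add(subject)
    return index


INDEX = build_index(NAME_TRIPLES)


def resolve(name):
    return INDEX.get(normalise(name), set())


print(resolve("靈芝") == resolve("灵芝") == {"ex:lingzhi"})  # True
print(resolve("Ganoderma sichuanense"))  # set(): no synonym links
```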

Other Traits from DBpedia

• From Wikipedia
• But not very much:
  – dbpprop:ecologicaltype: “saprotrophic” or “parasitic”
  – dbpprop:howedible: “edible”

TranslateWiki – http://translatewiki.net

What is in EOL?

• Name data:
  – Successful search for “Lingzhi”, “灵芝”, “靈芝”
  – Stable URIs for Ganoderma lucidum, G. sichuanense, Ganoderma, Ganodermataceae
  – Multiple classifications
  – Available through the API

Traits from EOL?

• Wikipedia article
• No trait data through the API

What will be coming from EOL?

• http://lod.eol.org (Under construction)
  – Will publish the names data already available through the EOL API to the Linked Open Data Cloud
• Add computable data from content partners

• Seeking funding for TraitBank: A global repository for biodiversity trait data


Challenges

• Getting Trait Data
  – Computable Descriptions
  – Semantic Web Ontologies
  – Peer-Review
  – Definition of a Scientific Name
• Applications
  – Computer-Aided Identification
  – Computer-Aided Species Discovery

Modern Species Description

Cap: 2-30 cm; at first irregularly knobby or elongated, but by maturity more or less fan-shaped; with a shiny, varnished surface often roughly arranged into lumpy "zones"; red to reddish brown when mature; when young often with zones of bright yellow and white toward the margin.

Pore Surface: Whitish, becoming dingy brownish in age; usually bruising brown; with 4-7 tiny (nearly invisible to the naked eye) circular pores per mm; tubes to 2 cm deep.

Stem: Sometimes absent, but more commonly present; 3-14 cm long; up to 3 cm thick; twisted; equal or irregular; varnished and colored like the cap; often distinctively angled away from one side of the cap.

Semantic Web Description

Class: SV1234
    EquivalentTo: Fungus
        and (hasOverallShape some ShelfFungi)
        and (hasHymenophoreShape some Pored)
        and ((hasUpperSurfaceColor some DarkBrown) or (hasUpperSurfaceColor some White) or (hasUpperSurfaceColor some Yellow))
        ...
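To see what checking a specimen against such a description involves, the visible part of SV1234 can be encoded as a small expression tree. This Python sketch makes two simplifying assumptions. First, it is closed-world: a missing fact counts as false, unlike OWL's open-world semantics. Second, `some X` is reduced to “X is among the property's recorded values” rather than an existential over individuals. The `Fungus` conjunct follows the circumscription slide. The elided `...` conjuncts are not modelled, and the specimen is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IsA:      # named class membership, e.g. Fungus
    cls: str


@dataclass(frozen=True)
class Some:     # simplified "prop some Value"
    prop: str
    value: str


@dataclass(frozen=True)
class And:
    parts: tuple


@dataclass(frozen=True)
class Or:
    parts: tuple


@dataclass
class Specimen:
    types: set = field(default_factory=set)
    facts: dict = field(default_factory=dict)  # property -> set of values


def holds(expr, specimen):
    """Closed-world check of a class expression against one specimen."""
    if isinstance(expr, IsA):
        return expr.cls in specimen.types
    if isinstance(expr, Some):
        return expr.value in specimen.facts.get(expr.prop, set())
    if isinstance(expr, And):
        return all(holds(part, specimen) for part in expr.parts)
    if isinstance(expr, Or):
        return any(holds(part, specimen) for part in expr.parts)
    raise TypeError(f"unknown expression: {expr!r}")


# Visible conjuncts of SV1234 from the slide.
SV1234 = And((
    IsA("Fungus"),
    Some("hasOverallShape", "ShelfFungi"),
    Some("hasHymenophoreShape", "Pored"),
    Or((
        Some("hasUpperSurfaceColor", "DarkBrown"),
        Some("hasUpperSurfaceColor", "White"),
        Some("hasUpperSurfaceColor", "Yellow"),
    )),
))

# Hypothetical specimen description.
specimen = Specimen(
    types={"Fungus"},
    facts={
        "hasOverallShape": {"ShelfFungi"},
        "hasHymenophoreShape": {"Pored"},
        "hasUpperSurfaceColor": {"DarkBrown", "Yellow"},
    },
)
print(holds(SV1234, specimen))  # True
```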

Semantic Web Property

hasOverallShape: Definition

Range:
• ShelfFungi: Definition
• StipitateAgaric
• StalkedAscomycete
• …

Semantic Web Ontology

• Collection of Properties and Values
  – hasOverallShape
    – ShelfFungi
    – StipitateAgaric
    – StalkedAscomycete
  – hasHymenophoreShape
    – Pored
  – hasUpperSurfaceColor
    – DarkBrown
    – White
    – Yellow
• And their relationships
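One practical use of such an ontology is to reject assertions that use an unknown property or a value outside that property's range. Below is a minimal Python sketch. It assumes the ranges contain only the values listed on the slide; a real ontology would declare more, plus the relationships between them.

```python
# Property -> allowed values, limited to those listed on the slide.
ONTOLOGY = {
    "hasOverallShape": {"ShelfFungi", "StipitateAgaric", "StalkedAscomycete"},
    "hasHymenophoreShape": {"Pored"},
    "hasUpperSurfaceColor": {"DarkBrown", "White", "Yellow"},
}


def validate(assertions):
    """Return a list of problems with (property, value) assertions."""
    problems = []
    for prop, value in assertions:
        if prop not in ONTOLOGY:
            problems.append(f"unknown property {prop}")
        elif value not in ONTOLOGY[prop]:
            problems.append(f"{value} is not in the range of {prop}")
    return problems


print(validate([("hasOverallShape", "ShelfFungi"),
                ("hasOverallShape", "Pored"),
                ("capColour", "Red")]))
# ['Pored is not in the range of hasOverallShape', 'unknown property capColour']
```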

Peer-Review Process

• More Consensus == More Value
• Review should be online & reasonably fast
• Need review before use
• Measurable collaborative process
• Is there a role for anonymous review?

Definition of a Scientific Name

• Reference
• Latin name
• Type Specimen
• Circumscription
  – Cap: 2-30 cm; at first irregularly knobby or elongated, but by maturity more or less fan-shaped; with a shiny, varnished surface often roughly arranged into lumpy "zones"; red to reddish brown when mature; when young often with zones of bright yellow and white toward the margin. Pore Surface: Whitish, becoming dingy brownish in age; usually bruising brown; with 4-7 tiny (nearly invisible to …

Definition of a Scientific Name

• Reference
• Latin name
• Type Specimen
• Multiple Circumscriptions
  – Cap: 2-30 cm; at first irregularly knobby or elongated, but by maturity more or less fan-shaped; with a shiny, varnished surface often roughly arranged into lumpy "zones"; red to reddish brown when mature; when young often with zones of bright yellow and white toward the margin. Pore Surface: Whitish, becoming dingy brownish in age; usually bruising brown; with 4-7 tiny (nearly invisible …
  – Cap: 5-25 cm; more or less fan-shaped; with a shiny, varnished surface; reddish brown when mature; when young often with zones of bright yellow and white toward the margin. Pore Surface: Whitish, becoming dingy brownish in age; usually bruising to brown; …

Definition of a Circumscription

Identifier: SV1234

Circumscription:
EquivalentTo: Fungus
    and (hasOverallShape some ShelfFungi)
    and (hasHymenophoreShape some Pored)
    and ((hasUpperSurfaceColor some DarkBrown) ...

Unique Name?

• Ganoderma lucidum
• Ganoderma multipileum
• Ganoderma sichuanense
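The “Unique Name?” question can be made concrete. Under the type method, a name applies to a circumscription when the name's type specimen falls inside it. A broad circumscription can therefore attract several names, while a narrow one may attract just one. Below is a minimal Python sketch. The reference and type-specimen identifiers, the second circumscription, and which specimens each circumscription includes are all hypothetical. Choosing among several candidates (for example by priority) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScientificName:
    latin_name: str
    reference: str      # hypothetical citation key
    type_specimen: str  # hypothetical specimen identifier


@dataclass(frozen=True)
class Circumscription:
    identifier: str
    included_specimens: frozenset


NAMES = [
    ScientificName("Ganoderma lucidum", "ref:1", "type:1"),
    ScientificName("Ganoderma multipileum", "ref:2", "type:2"),
    ScientificName("Ganoderma sichuanense", "ref:3", "type:3"),
]


def candidate_names(circumscription, names):
    """Names whose type specimen falls inside the circumscription."""
    return sorted(n.latin_name for n in names
                  if n.type_specimen in circumscription.included_specimens)


# Which type specimens each circumscription includes is invented here.
broad = Circumscription("SV1234", frozenset({"type:1", "type:2", "type:3"}))
narrow = Circumscription("SV5678", frozenset({"type:3"}))

print(candidate_names(broad, NAMES))
# ['Ganoderma lucidum', 'Ganoderma multipileum', 'Ganoderma sichuanense']
print(candidate_names(narrow, NAMES))  # ['Ganoderma sichuanense']
```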

Computer-Aided Identification

• Several existing systems: DELTA, Lucid Keys, MatchMaker (for mushrooms), …
• Existing systems: small number of editors, slow-moving, very little shared vocabulary
• Semantic Web approach promises many contributors, fast-moving, peer-reviewed, shared vocabulary
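An interactive multi-access key performs two repeated steps. It drops taxa that contradict the characters observed so far, then suggests a next character that splits the remaining taxa well. Below is a minimal Python sketch. It assumes a hypothetical character matrix and a simple “smallest worst-case group” heuristic; it is not the scoring used by any of the tools named above.

```python
from collections import Counter

# Hypothetical character matrix: taxon -> {character: state}.
TAXA = {
    "Taxon A": {"hymenium": "pores", "spore_print": "brown", "stipe": "present"},
    "Taxon B": {"hymenium": "gills", "spore_print": "white", "stipe": "present"},
    "Taxon C": {"hymenium": "gills", "spore_print": "brown", "stipe": "present"},
    "Taxon D": {"hymenium": "pores", "spore_print": "white", "stipe": "absent"},
}


def remaining(taxa, observed):
    """Taxa whose recorded states agree with every observed character."""
    return {name: states for name, states in taxa.items()
            if all(states.get(char) == state for char, state in observed.items())}


def best_next_character(taxa, observed):
    """Unasked character whose largest resulting group is smallest."""
    candidates = remaining(taxa, observed)
    unasked = sorted({char for states in candidates.values() for char in states}
                     - set(observed))
    if len(candidates) <= 1 or not unasked:
        return None

    def worst_case(char):
        groups = Counter(states.get(char) for states in candidates.values())
        return max(groups.values())

    # min() keeps the first of tied characters, i.e. the alphabetically first.
    return min(unasked, key=worst_case)


print(best_next_character(TAXA, {}))                    # hymenium
print(best_next_character(TAXA, {"hymenium": "gills"}))  # spore_print
print(sorted(remaining(TAXA, {"hymenium": "gills", "spore_print": "brown"})))
# ['Taxon C']
```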

MushroomObserver.org


Computer-Aided Species Discovery

• Requires understanding recognized species
• Searching for distinctive features
• Sequencing support
• Efficient publication

Psilocybe allenii Borov., Rockefeller & P.G. Werner California, USA © 2009 Alan Rockefeller
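The first two requirements can be combined in a simple check. An observation that fits no recognized species, and shows character states that no recognized species has, is a candidate for closer study. Below is a minimal Python sketch with a hypothetical character table. It only flags candidates; it does not decide that a species is new.

```python
# Hypothetical table of recognized species: name -> {character: state}.
KNOWN = {
    "Species A": {"cap_color": "brown", "bruising": "blue", "habitat": "wood chips"},
    "Species B": {"cap_color": "white", "bruising": "none", "habitat": "dung"},
}


def matching_species(known, observation):
    """Recognized species with no contradicting recorded state."""
    return sorted(
        name for name, states in known.items()
        if all(char not in states or states[char] == state
               for char, state in observation.items())
    )


def distinctive_features(known, observation):
    """Observed states that no recognized species records for that character."""
    return {
        char: state for char, state in observation.items()
        if all(states.get(char) != state for states in known.values())
    }


obs = {"cap_color": "caramel", "bruising": "blue", "habitat": "wood chips"}
if not matching_species(KNOWN, obs):
    print("candidate for study:", distinctive_features(KNOWN, obs))
# candidate for study: {'cap_color': 'caramel'}
```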

Conclusions

• Semantic biodiversity data can make a significant difference to managing biodiversity
• Some semantic biodiversity data is available on the Internet
• EOL is engaged in significantly improving that data
• There is a lot more work to do and we need to do it together

Acknowledgements

• Dr. Shao, Jason Mai & Lee-Sea Chen, Academia Sinica
• Encyclopedia of Life
• Marine Biological Laboratory
• Sloan and MacArthur Foundations
• Dr. Deborah McGuinness, Han Wang & Katie Dunn, Rensselaer Polytechnic Institute
• Mushroom Observer Community

Cantharellus californicus Arora & Dunham California, USA © 2002 Nathan Wilson
