eniS eumC o tnemtrapeD tnemtrapeD tnemtrapeD fo fo fo retupmoC retupmoC retupmoC ecneicS ecneicS ecneicS

a cs sess o a ngo egdelwonK egdelwonK egdelwonK inagro inagro inagro taz taz taz noi noi noi smetsys smetsys smetsys hcus hcus hcus sa sa sa o ds ea s gltoda ruaseht ruaseht i ruaseht i i dna dna dna igolotno igolotno igolotno se se se era era era desu desu desu rof rof rof aA aA l aA l l t t o t - o o - - DD DD DD

O ygolotn ygolotnO ygolotnO secivreS secivreS secivreS rof rof rof nitmon o y bdfi h n vorpmi vorpmi vorpmi gni gni gni eht eht eht ibadnfi ibadnfi l ibadnfi l i l i i yt yt yt fo fo fo tamrofni tamrofni tamrofni .noi .noi .noi 101 101 101 n so p ce tenc et e nma yehT yehT yehT inomrah inomrah inomrah ez ez ez eht eht eht tnetnoc tnetnoc tnetnoc rcsed rcsed rcsed tpi tpi tpi snoi snoi snoi dna dna dna / / /

ewe dan tiliaeoen d vorp vorp vorp edi edi edi ibareporetni ibareporetni l ibareporetni l i l i i yt yt yt ni ni ni dna dna dna neewteb neewteb neewteb 7102 7102 7102 K egdelwon egdelwonK egdelwonK noitazinagrO noitazinagrO noitazinagrO usm ahu mty nitamrofni tamrofni tamrofni noi noi noi smetsys smetsys , smetsys , , hcus hcus hcus sa sa sa muesum muesum muesum nuoJ i nuoJ imouT i nuoJ nen imouT i nen imouT nen srt nloda siabl ltgd ,snoitcelloc ,snoitcelloc ,snoitcelloc latigid latigid latigid ,seirarbil ,seirarbil ,seirarbil dna dna dna enilno enilno enilno .serots .serots .serots S smetsy smetsyS smetsyS fo esu eht etat i l icaf secivres ygolotnO ygolotnO ygolotnO secivres secivres secivres icaf icaf l icaf l i l i i etat etat etat eht eht eht esu esu esu fo fo fo i sess o a ngo egdelwonk egdelwonk egdelwonk inagro inagro inagro taz taz taz noi noi noi smetsys smetsys smetsys ni ni ni rof sdohtem stneserp siseht sihT .snoitacilppa .snoitacilppa .snoitacilppa sihT sihT sihT siseht siseht siseht stneserp stneserp stneserp sdohtem sdohtem sdohtem rof rof rof nto o sca yoon e e e soc soc t soc t t fe- fe- fe- cef cef t cef t i t i i ev ev ev ygolotno ygolotno ygolotno ssecca ssecca ssecca rof rof rof tnetnoc tnetnoc tnetnoc J inuo inuoJ inuoJ nenimouT nenimouT nenimouT n ,secas o arfi ,srexedni srexedni , srexedni , , tamrofni tamrofni tamrofni noi noi noi srehcraes srehcraes , srehcraes , , dna dna dna rplvdnitc lppa lppa lppa taci taci taci noi noi noi srepoleved srepoleved . srepoleved . .

ci lbup a sa dedivorp era sloot depoleved ehT ehT ehT depoleved depoleved depoleved sloot sloot sloot era era era dedivorp dedivorp dedivorp sa sa sa a a a lbup lbup lbup ci ci ci h ea cf et n ,cve a nv l l l gnivi gnivi gnivi bal bal bal ,ecivres ,ecivres ,ecivres dna dna dna yeht yeht yeht icaf icaf l icaf l i l i i etat etat etat eht eht eht nrffi of sioon f s sontlumis lumis lumis suoenat suoenat suoenat esu esu esu fo fo fo seigolotno seigolotno seigolotno morf morf morf id id f id f f tneref tneref tneref dna ,scimonoce ,cisum sa hcus ,sniamod ,sniamod ,sniamod hcus hcus hcus sa sa sa ,cisum ,cisum ,cisum ,scimonoce ,scimonoce ,scimonoce dna dna dna o dteep dhe .eu uirga rga rga luci luci luci erut erut . erut . . A A A dohtem dohtem dohtem i i s i s s detneserp detneserp detneserp rof rof rof oloib fo senh ht g ganam ganam ganam i i i gn gn gn t t t eh eh eh egnahc egnahc s egnahc s s o o o f f f b b b i i o i o o l l l go go go i i c i c a c a a l l l g o n mnx t t t imonoxa imonoxa e imonoxa e s e s s a a a s s s na na na no no no t t o t o o l l l ygo ygo ygo . . .

m S n a O egeloK rof secivre g nO t o nO t l o l ygo ygo eS r v eS r i v c i e s c e s f o r f o lwonK r gde lwonK e gde e rO nag rO i nag z i a t z a i t no i no yS s sme yS t s sme t mi revoC C C o o o v v v e e e r r r i i i am am am g g g e e e : : : nO t o l ygo eS r v i c e s f o r lwonK gde e rO nag i z a t i no yS s sme t slevel eerht pot eht fo yhcrareih tpecnoc ehT ehT ehT tpecnoc tpecnoc tpecnoc yhcrareih yhcrareih yhcrareih fo fo fo eht eht eht pot pot pot eerht eerht eerht slevel slevel slevel KK eht fo o o f f f t t h t h h e e e OKOK OKOK OKOK o o o n n n t t o t o o l l o l o o g g g y y y

B I I I NBS NBS NBS 6-6547-06-259-879 6-6547-06-259-879 6-6547-06-259-879 ( p ( ( p r p r i n r i n i t n t t de de ) de ) ) SSENISUB SSENISUB SSENISUB + + + B I I I NBS NBS NBS 9-5547-06-259-879 9-5547-06-259-879 9-5547-06-259-879 ( ( ( dp dp f dp ) f ) f ) E E E MONOC MONOC MONOC Y Y Y S I I I NSS NSS NSS - L - - L L 9971 9971 - 9971 - - 4394 4394 4394 S I I I NSS NSS NSS 9971 9971 - 9971 - - 4394 4394 4394 ( p ( ( p r p r i n r i n i t n t t de de ) de ) ) TRA TRA TRA + + + S I I I NSS NSS NSS 9971 9971 - 9971 - - 2494 2494 2494 ( ( ( dp dp f dp ) f ) f ) NGISED NGISED NGISED + + + HCRA HCRA HCRA I T I I T T TCE TCE TCE ERU ERU ERU ytisrevinU otlaA otlaA otlaA ytisrevinU ytisrevinU ytisrevinU ceic oohcS oohcS l oohcS l o l o f o f f cS cS i cS i i ecne ecne ecne ECNEICS ECNEICS ECNEICS + + + cec rtpo f tnemtrapeD tnemtrapeD tnemtrapeD fo fo fo retupmoC retupmoC retupmoC ecneicS ecneicS ecneicS T T E T E E ONHC ONHC ONHC L L L GO GO GO Y Y Y www www www . a . a . a a l a t l o t l o t . o f . i f . f i i EVOSSORC VOSSORC VOSSORC RE RE RE COD COD COD T T T RO RO RO A A A L L L +ggfeha*GMFTSH9 +ggfeha*GMFTSH9+ggfeha*GMFTSH9 COD COD COD T T T RO RO RO A A A L L L SNOITATRESSID SNOITATRESSID SNOITATRESSID

SNOITATRESSID SNOITATRESSID SNOITATRESSID ytisrevinU ytisrevinU ytisrevinU otlaA otlaAotlaA

7102 7102 7102 seires noitacilbup ytisrevinU otlaA ytisrevinU noitacilbup seires SNOITATRESSID LAROTCOD SNOITATRESSID 101/7102

egdelwonK rof secivreS ygolotnO secivreS rof egdelwonK smetsyS noitazinagrO smetsyS

nenimouT inuoJ nenimouT

fo rotcoD fo eerged eht rof detelpmoc noitatressid larotcod A larotcod noitatressid detelpmoc rof eht eerged fo rotcoD fo eht fo noissimrep eht htiw ,dednefed eb ot )ygolonhceT( ecneicS )ygolonhceT( ot eb ,dednefed htiw eht noissimrep fo eht ta dleh noitanimaxe cilbup a ta ,ecneicS fo loohcS ytisrevinU otlaA ytisrevinU loohcS fo ,ecneicS ta a cilbup noitanimaxe dleh ta .noon 21 ta 7102 enuJ 61 no loohcs eht fo 1UT llah erutcel eht erutcel llah 1UT fo eht loohcs no 61 enuJ 7102 ta 21 .noon

ytisrevinU otlaA ytisrevinU ecneicS fo loohcS fo ecneicS ecneicS retupmoC fo tnemtrapeD fo retupmoC ecneicS puorG hcraeseR gnitupmoC citnameS gnitupmoC hcraeseR puorG rosseforp gnisivrepuS rosseforp nenövyH oreE nenövyH

rosivda sisehT rosivda nenövyH oreE nenövyH

srenimaxe yranimilerP srenimaxe dnalniF ,älyksävyJ fo ytisrevinU ,nayizreT nagaV rosseforP nagaV ,nayizreT ytisrevinU fo ,älyksävyJ dnalniF dnalerI ,yawlaG dnalerI fo ytisrevinU lanoitaN ,niuqA'd ueihtaM rosseforP ueihtaM ,niuqA'd lanoitaN ytisrevinU fo dnalerI ,yawlaG dnalerI

tnenoppO setatS detinU ,ytisrevinU etatS tneK ,gneZ ieL aicraM rosseforP aicraM ieL ,gneZ tneK etatS ,ytisrevinU detinU setatS

seires noitacilbup ytisrevinU otlaA ytisrevinU noitacilbup seires SNOITATRESSID LAROTCOD SNOITATRESSID 101/7102

© inuoJ nenimouT

NBSI 6-6547-06-259-879 )detnirp( NBSI 9-5547-06-259-879 )fdp( L-NSSI 4394-9971 NSSI 4394-9971 )detnirp( NSSI 2494-9971 )fdp( :NBSI:NRU/if.nru//:ptth 9-5547-06-259-879

yO aifarginU yO iknisleH 7102

dnalniF

tcartsbA otlaA 67000-IF ,00011 xoB .O.P ,ytisrevinU otlaA ,ytisrevinU .O.P xoB ,00011 67000-IF otlaA if.otlaa.www

rohtuA nenimouT inuoJ nenimouT noitatressid larotcod eht fo emaN fo eht larotcod noitatressid smetsyS noitazinagrO egdelwonK rof secivreS ygolotnO secivreS rof egdelwonK noitazinagrO smetsyS rehsilbuP loohcS fo ecneicS tinU tnemtrapeD fo retupmoC ecneicS seireS otlaA ytisrevinU noitacilbup seires LAROTCOD SNOITATRESSID 101/7102 hcraeser fo dleiF fo hcraeser egdelwonK ygolonhceT dettimbus tpircsunaM dettimbus 21 hcraM 7102 etaD fo eht ecnefed 61 enuJ 7102 )etad( detnarg hsilbup ot noissimreP ot hsilbup detnarg )etad( 52 lirpA 7102 egaugnaL hsilgnE hpargonoM elcitrA noitatressid yassE noitatressid

tcartsbA eb nac ,seiralubacov dellortnoc sa hcus ,smetsys noitazinagro egdelwonk rehto dna seigolotnO dna rehto egdelwonk noitazinagro ,smetsys hcus sa dellortnoc ,seiralubacov nac eb a gnisu stnemucod fo stnetnoc eht gnibircsed yB .noitamrofni fo ytilibadnfi eht ecnahne ot desu ot ecnahne eht ytilibadnfi fo .noitamrofni yB gnibircsed eht stnetnoc fo stnemucod gnisu a gnisworb dna hcraes tneicfife edivorp nac smetsys noitamrofni ,ygolonimret dezinomrah ,derahs dezinomrah ,ygolonimret noitamrofni smetsys nac edivorp tneicfife hcraes dna gnisworb gniliaverp eht fo emos evlos ot smia atadatem evitpircsed ticilpxE .stnetnoc eht rof seitilanoitcnuf rof eht .stnetnoc ticilpxE evitpircsed atadatem smia ot evlos emos fo eht gniliaverp dna smynonys fo gnissecorp eht gnidulcni ,senigne hcraes ynam ni hcraes txet lluf ni seussi ni lluf txet hcraes ni ynam hcraes ,senigne gnidulcni eht gnissecorp fo smynonys dna fo ytilibassecorp-enihcam eht selbane sledom niamod sa seigolotno fo esu ehT .smynomoh ehT esu fo seigolotno sa niamod sledom selbane eht ytilibassecorp-enihcam fo eht gnissecorp fo syaw tnegilletni rehto dna ,noitargetni noitamrofni ,gninosaer citnames ,stnetnoc citnames ,gninosaer noitamrofni ,noitargetni dna rehto tnegilletni syaw fo gnissecorp eht .atad laveirter noitamrofni dna gnixedni tnetnoc ni smetsys noitazinagro egdelwonk fo noitazilitu ehT noitazilitu fo egdelwonk noitazinagro smetsys ni tnetnoc gnixedni dna noitamrofni laveirter dna seiduts siseht sihT .esu tneicfife rieht rof sloot detamotua gnidivorp yb detatilicaf eb nac eb detatilicaf yb gnidivorp detamotua sloot rof rieht tneicfife .esu sihT siseht seiduts dna sa smetsys noitazinagro egdelwonk gnisu dna gnihsilbup rof smetsys dna sdohtem levon stneserp levon sdohtem dna smetsys rof gnihsilbup dna gnisu egdelwonk noitazinagro smetsys sa taht smetsys epytotorp gnitaulave dna gningised yb detcudnoc si hcraeser ehT .secivres ygolotno .secivres ehT hcraeser si detcudnoc yb gningised dna gnitaulave epytotorp smetsys taht ngised eht fo selpicnirp eht swollof hcraeser ehT .sesac esu efil-laer ni seigolotno fo esu eht troppus eht esu fo seigolotno ni efil-laer esu .sesac ehT hcraeser swollof eht selpicnirp fo eht ngised .seigolodohtem hcraeser noitca dna ecneics dna noitca hcraeser .seigolodohtem gnimmargorp noitacilppa dna stnenopmoc ecafretni resu sedivorp metsys IKNO detneserp ehT detneserp IKNO metsys sedivorp resu ecafretni stnenopmoc dna noitacilppa gnimmargorp .swoflkrow desab-ygolotno elbane ot snoitacilppa lanretxe otni detargetni eb nac taht secafretni taht nac eb detargetni otni lanretxe snoitacilppa ot elbane desab-ygolotno .swoflkrow .seigolotno fo spuorg resu niam eht fo sdeen eht gnizylana no desab era metsys eht fo serutaef ehT serutaef fo eht metsys era desab no gnizylana eht sdeen fo eht niam resu spuorg fo .seigolotno ,hcraes tpecnoc edulcni swoflkrow desab-ygolotno ni defiitnedi seitilanoitcnuf nommoc ehT nommoc seitilanoitcnuf defiitnedi ni desab-ygolotno swoflkrow edulcni tpecnoc ,hcraes .noitceles dna ,gnisworb dna .noitceles a gnihsilbup dna gniganam rof hcaorppa duolc ygolotnO nepO dekniL eht stneserp siseht ehT siseht stneserp eht dekniL nepO ygolotnO duolc hcaorppa rof gniganam dna gnihsilbup a elpitlum esu ot sresu eht selbane metsys ehT .ecivres ygolotno na ni seigolotno deknilretni fo tes fo deknilretni seigolotno ni na ygolotno .ecivres ehT metsys selbane eht sresu ot esu elpitlum .seigolotno laudividni fo daetsni noitatneserper niamod-ssorc ,elbareporetni ,elgnis a sa seigolotno sa a ,elgnis ,elbareporetni niamod-ssorc noitatneserper daetsni fo laudividni .seigolotno eht ,seirotisoper ygolotno tnereffid ni dehsilbup seigolotno fo esu suoenatlumis eht gnitatilicaf roF gnitatilicaf eht suoenatlumis esu fo seigolotno dehsilbup ni tnereffid ygolotno ,seirotisoper eht .detneserp si hcaorppa yrotisopeR ygolotnO dezilamroN ygolotnO yrotisopeR hcaorppa si .detneserp metsys noitazinagro egdelwonk hcir yllacitnames a gnihsilbup dna gniganam fo esac esu a sA a esu esac fo gniganam dna gnihsilbup a yllacitnames hcir egdelwonk noitazinagro metsys serutalcnemon lacigoloib rof ledom ygolotnO-ateM noxaT eht stneserp siseht eht ,ygolotno na sa na ,ygolotno eht siseht stneserp eht noxaT ygolotnO-ateM ledom rof lacigoloib serutalcnemon fo snoinipo gnireffid dna segnahc fo noitatneserper eht stroppus ledom ehT .snoitacfiissalc dna .snoitacfiissalc ehT ledom stroppus eht noitatneserper fo segnahc dna gnireffid snoinipo fo .stpecnoc cimonoxat .stpecnoc evah siseht siht ni detneserp sdohtem eht gnisu depoleved seigolotno eht dna metsys IKNO ehT IKNO metsys dna eht seigolotno depoleved gnisu eht sdohtem detneserp ni siht siseht evah ecivres bal gnivil a sa dedivorp neeb dedivorp sa a gnivil bal ecivres fi.ikno//:ptth , hcihw sah neeb nur ecnis .8002 ehT ecivres noitamrofni ,srexedni tnetnoc gnidulcni ,seigolotno fo sresu eht rof troppus dna sloot sedivorp sloot dna troppus rof eht sresu fo ,seigolotno gnidulcni tnetnoc ,srexedni noitamrofni .srepoleved noitacilppa dna ,srepoleved ygolotno ,srehcraes ygolotno ,srepoleved dna noitacilppa .srepoleved

sdrowyeK citnames ,bew egdelwonk noitazinagro ,smetsys atadatem ,noitaerc ygolotno secivres )detnirp( NBSI )detnirp( 6-6547-06-259-879 NBSI )fdp( 9-5547-06-259-879 L-NSSI 4394-9971 NSSI )detnirp( 4394-9971 NSSI )fdp( 2494-9971 rehsilbup fo noitacoL fo rehsilbup iknisleH noitacoL fo gnitnirp iknisleH raeY 7102 segaP 891 nru :NBSI:NRU/fi.nru//:ptth 9-5547-06-259-879

ämletsiviiT otlaA 67000 ,00011 LP ,otsipoily-otlaA LP ,00011 67000 otlaA if.otlaa.www

äjikeT nenimouT inuoJ nenimouT imin najriksötiäV imin smetsyS noitazinagrO egdelwonK rof secivreS ygolotnO secivreS rof egdelwonK noitazinagrO smetsyS ajisiakluJ nedieteitsureP uluokaekrok ökkiskY nakiinketoteiT sotial ajraS otlaA ytisrevinU noitacilbup seires LAROTCOD SNOITATRESSID 101/7102 alasumiktuT akkiinketsymäteiT mvp neskutiojrikisäK mvp 7102.30.21 äviäpsötiäV 7102.60.61 äviäpsimätnöym navulusiakluJ äviäpsimätnöym 7102.40.52 ileiK itnalgnE aifargonoM ajriksötiävilekkitrA ajriksötiäveessE

ämletsiviiT ,ajotsanas ajutiollortnok netuk ,äimletenem nesimätsejräj neskymäteit atium aj atioigolotnO aj atium neskymäteit nesimätsejräj ,äimletenem netuk ajutiollortnok ,ajotsanas naalliavuk tölläsis neittnemukod nuK .iskesimatnarap nesimätyöl nodeit äättyäk naadiov äättyäk nodeit nesimätyöl .iskesimatnarap nuK neittnemukod tölläsis naalliavuk -ukah atiakkohet atojrat taviov tämletsejräjoteit ,aaigolonimret äyttetsiänethy ,autteaj ällämättyäk ,autteaj äyttetsiänethy ,aaigolonimret tämletsejräjoteit taviov atojrat atiakkohet -ukah iikryp oteitatem aveliavuk ,yttetise itsesittiisilpskE .nihiötläsis aiskuusillannimiotsuales aj aiskuusillannimiotsuales .nihiötläsis itsesittiisilpskE ,yttetise aveliavuk oteitatem iikryp aj neimyynonys netuk ,aimlegno nuahitsketokok nämättyäk nedienokukah neinom naamesiaktar neinom nedienokukah nämättyäk nuahitsketokok ,aimlegno netuk neimyynonys aj nejötläsis aatsillodham aniellametisäk nenimättyäk nedioigolotnO .nesimioimouh neimyynomoh .nesimioimouh nedioigolotnO nenimättyäk aniellametisäk aatsillodham nejötläsis .äimletenem ätiäkkylä atium aj ninniorgetni nodeit ,nylettääp nesittnames ,nylettisäk nesilleenok ,nylettisäk nesittnames ,nylettääp nodeit ninniorgetni aj atium ätiäkkylä .äimletenem assuah aj assinnioskedni nejötläsis ätsimätnydöyh neimletenem nesimätsejräj neskymäteiT nesimätsejräj neimletenem ätsimätnydöyh nejötläsis assinnioskedni aj assuah neesaakkohet nediin ajulaköyt ajutiositamotua ellijättyäk allamaojrat aattopleh naadiov aattopleh allamaojrat ellijättyäk ajutiositamotua ajulaköyt nediin neesaakkohet äimletsejräj aj äimletenem aisialneduu näätetise aj naatiktut assajriksötiäv ässäT .neesimättyäk ässäT assajriksötiäv naatiktut aj näätetise aisialneduu äimletenem aj äimletsejräj no sumiktuT .aniulevlapaigolotno iskesimesiakluj neimletenem nesimätsejräj neskymäteit nesimätsejräj neimletenem iskesimesiakluj .aniulevlapaigolotno sumiktuT no nedioigolotno tävätside aktoj ,äimletsejräjippyytotorp allamioivra aj allamelettinnuus uttetuetot allamelettinnuus aj allamioivra ,äimletsejräjippyytotorp aktoj tävätside nedioigolotno aj neeteitulettinnuus uutuajon sumiktuT .assiskuapatöttyäk assisilledot ätsimättyäk assisilledot .assiskuapatöttyäk sumiktuT uutuajon neeteitulettinnuus aj .niisiettaairep nedioigolodotem neskumiktutatnimiot nedioigolodotem .niisiettaairep aisillamlejho aj ajettnenopmokämyttiilöttyäk aaojrat ämletsejräj-IKNO yttetise ässöyT yttetise ämletsejräj-IKNO aaojrat ajettnenopmokämyttiilöttyäk aj aisillamlejho nejukluknöyt netsiatsurepaigolotno niiskullevos niisioklu adiorgetni naadiov aktoj ,ajotnipajar aktoj naadiov adiorgetni niisioklu niiskullevos netsiatsurepaigolotno nejukluknöyt netsieksek nedioigolotno neutsurep uttetuetot no teduusianimo nämletsejräJ .iskesimatsillodham nämletsejräJ teduusianimo no uttetuetot neutsurep nedioigolotno netsieksek äisiely ajuttetsinnut atsiuluknöyt atsisiatsurepaigolotnO .neesimättivles nedieprat neimhyräjättyäk nedieprat .neesimättivles atsisiatsurepaigolotnO atsiuluknöyt ajuttetsinnut äisiely .atnilav aj uliales ,ukah neettisäk tavo aiskuusillannimiot tavo neettisäk ,ukah uliales aj .atnilav nejyttetiknil asniisiot ämletenem nevlipaigolotno nemiova nytetiknil näätetise ässöyt ässäT ässöyt näätetise nytetiknil nemiova nevlipaigolotno ämletenem asniisiot nejyttetiknil täjättyäk alluva nämletsejräJ .assulevlapaigolotno neesimesiakluj aj neesimätipälly nedioigolotno neesimätipälly aj neesimesiakluj .assulevlapaigolotno nämletsejräJ alluva täjättyäk anetuusianokok änävätsidhy tala ,anavimiotneethy ,änethy atioigolotno atiesu äättyäk taviov äättyäk atiesu atioigolotno ,änethy ,anavimiotneethy tala änävätsidhy anetuusianokok nesiakianamas nedioigolotno nejutsiakluj assiulevlapaigolotno irE .naajis nedioigolotno netsillire nedioigolotno .naajis irE assiulevlapaigolotno nejutsiakluj nedioigolotno nesiakianamas .ämletenem nulevlapaigolotno nudiosilamron näätetise iskesimattopleh nötyäk iskesimattopleh näätetise nudiosilamron nulevlapaigolotno .ämletenem ätsesimätipälly nämletenem nesimätsejräj neskymäteit naakkir itsesittnames aneskuapatöttyäK itsesittnames naakkir neskymäteit nesimätsejräj nämletenem ätsesimätipälly nenimonoskat netsutikoul aj nejötsimin netsigoloib näätetise ässöyt atsesimesiakluj aj atsesimesiakluj ässöyt näätetise netsigoloib nejötsimin aj netsutikoul nenimonoskat neivaekkiop naatsisiot aj netsotuum nediettisäk netsimonoskat aatsillodham illaM .illamaigolotno illaM aatsillodham netsimonoskat nediettisäk netsotuum aj naatsisiot neivaekkiop .nesimättise netsymekän .nesimättise ässivättetyäk teello tavo taigolotno tytetihek ällimletenem älliytetise ässöyt aj ämletsejräj-IKNO aj ässöyt älliytetise ällimletenem tytetihek taigolotno tavo teello ässivättetyäk assulevlap- bal gnivil bal assulevlap- fi.ikno//:ptth , akoj no tullo assannimiot atsedouv 8002 .neithäl ulevlaP ,elliojikah ,ellijioskedni nodeit netuk ,ellijättyäk nedioigolotno aekut aj ajulaköyt aaojrat ajulaköyt aj aekut nedioigolotno ,ellijättyäk netuk nodeit ,ellijioskedni ,elliojikah .ellijättiheksullevos aj ellijättihek nedioigolotno ellijättihek aj .ellijättiheksullevos

tanasniavA nenittnames ,bew neskymäteit nesimätsejräj ,tämletenem natadatem ,nenimattout tulevlapaigolotno )utteniap( NBSI )utteniap( 6-6547-06-259-879 NBSI )fdp( 9-5547-06-259-879 L-NSSI 4394-9971 NSSI )utteniap( 4394-9971 NSSI )fdp( 2494-9971 akkiapusiakluJ iknisleH akkiaponiaP iknisleH isouV 7102 äräämuviS 891 nru :NBSI:NRU/fi.nru//:ptth 9-5547-06-259-879

Preface

The seeds for this dissertation were planted in 2007 when I started working in the Semanting Computing Research Group (SeCo), at the Department of Computer Science, Aalto University (formerly the Department of Media Technology, Helsinki University of Technology) and the Department of Computer Science, University of Helsinki, Finland. I have had the opportunity to work with a lot of wonderful colleagues and collaborators during these years. I have been enjoying the guidance and support of professor Eero Hyvönen, who has made the work possible by fertilizing my imagination and encouraging me to continue my work. I wish to thank the pre-examiners professor Vagan Terziyan and professor Mathieu d’Aquin for valuable feedback and suggestions for improving this thesis. When I started in SeCo, I was handed to the gentle care of my colleagues Kim Viljanen and Eetu Mäkelä. Kim has been my trusted wingman in building the ONKI service and an elevating mentor—for that I will always be grateful. Eetu has given me valuable guidance and support on methodological and technological decisions. During my doctoral studies, I have been collaborating widely with Matias Frosterus. All the work would have been much less fun without him. I would like to thank Nina Laurenne for introducing me to the world of biological taxonomies, and diving with me in the great ball pool of colorful organisms. I wish to thank Tomi Kauppinen for his fruitful ideas on ontology services and showing by example how to be a productive researcher, and Tuukka Ruotsalo for always emphasizing the importance of scientific rigor in our work. The ontology services presented in the thesis would be of no use without the actual ontologies. I express my gratitude to the head ontologist Katri Seppälä and Tuomas Palonen, who have been working with the ontologies

1 Preface of the KOKO cloud. I wish also thank Osma Suominen, Sini Pessala, Jussi Kurki, Reetta Sinkkilä, and Robin Lindroos for participating in building the tools for supporting the ontology infrastructure, and Mikko Salonoja, Rami Aamulehto, Alex Johansson, and Henri Ylikotila for their work on the user interfaces of the ONKI service. Mikko Koho and Hannu Saarenmaa have been valuable contributors on the biological taxonomies. I am grateful for the support and collaboration of Erkki Heino, Esko Ikkala, Petri Leskinen, Arttu Oksanen, Claire Tamper, and all my other colleagues in SeCo and collaborators I have been working with during the years. The work presented in this thesis has been carried out as part of the National Semantic Web Ontology Project in Finland (FinnONTO, 2003– 2012) and Linked Data Finland project (2012–2014), both funded by the Finnish Funding Agency for Innovation (Tekes). The work has also received funding from Lusto – Finnish Forest Museum (2008), EU project MedIEQ (2006–2008), EU FP7 project SMARTMUSEUM (2008–2010), and EU FI-PPP project ENVIROFI (2011–2013). I have received grants from the Finnish Cultural Foundation (2013) and KAUTE Foundation (2015) for the doctoral research. During the thesis work, I have also worked in the National Gazetteer of Historical Places project (2015–2016), funded by the Finnish Cultural Foundation, the Semantic Finlex project (2015–2017), funded by the Min- istry of Justice and the Ministry of Finance, the Linked Open Data Science Service project (2015–2016), funded by the Ministry of Education and Culture, and the Cultures of Knowledge project (2015–2017), funded by The Andrew W. Mellon Foundation. I have received funding for short term scientific mission from the European Cooperation in Science and Technol- ogy (COST, 2016) and travel grants from the Helsinki Doctoral Programme in Computer Science (2014, 2015). I would like to thank my parents for their support and encouragement on the path I have chosen. Finally, I wish to express my deepest gratitude to Krista for her love and understanding during the long hours.

Helsinki, May 10, 2017,

Jouni Tuominen

2 Contents

Preface 1

Contents 3

List of Publications 5

Author’s Contribution 9

1. Introduction 13 1.1 Background and Research Environment ...... 13 1.2 Objectives and Scope ...... 15 1.3 Research Process and Dissertation Structure ...... 17

2. Theoretical Foundation 19 2.1 Modeling Knowledge Organization Systems ...... 19 2.1.1 Knowledge Organization Systems ...... 19 2.1.2 Interlinked Ontologies ...... 21 2.1.3 Biological Nomenclatures and Taxonomies ...... 22 2.2 Publishing and Using Knowledge Organization Systems . . 25 2.2.1 Ontology Servers ...... 25 2.2.2 Semantic Annotation and Information Retrieval . . . 29

3. Results 33 3.1 Ontology Services (RQ1) ...... 33 3.2 Publishing Multiple Ontologies (RQ2) ...... 36 3.3 Complex Knowledge Organization Systems (RQ3) ...... 39 3.4 Summary ...... 41

4. Discussion 43 4.1 Theoretical Implications ...... 44 4.1.1 Ontology Services (RQ1) ...... 44

3 Contents

4.1.2 Publishing Multiple Ontologies (RQ2) ...... 45 4.1.3 Complex Knowledge Organization Systems (RQ3) . . 46 4.1.4 Summary ...... 47 4.2 Practical Implications ...... 47 4.2.1 Ontology Services (RQ1) ...... 47 4.2.2 Publishing Multiple Ontologies (RQ2) ...... 49 4.2.3 Complex Knowledge Organization Systems (RQ3) . . 50 4.2.4 Summary ...... 50 4.3 Reliability and Validity ...... 50 4.4 Recommendations for Further Research ...... 54

Bibliography 57

Publications 81

4 List of Publications

This thesis consists of an overview and of the following publications which are referred to in the text by their Roman numerals.

I Kim Viljanen, Jouni Tuominen, and Eero Hyvönen. Ontology Libraries for Production Use: The Finnish Ontology Library Service ONKI. In The Semantic Web: Research and Applications: 6th European Semantic Web Conference, ESWC 2009, Heraklion, Crete, Greece, May 31–June 4, 2009, Proceedings, Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Simperl (editors), Lecture Notes in Computer Science, volume 5554, pages 781–795, ISBN 978-3-642-02120-6, Springer- Verlag, June 2009.

II Jouni Tuominen, Matias Frosterus, Kim Viljanen, and Eero Hyvönen. ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies and Ontologies as Services. In The Semantic Web: Research and Applications: 6th European Semantic Web Conference, ESWC 2009, Heraklion, Crete, Greece, May 31–June 4, 2009, Proceedings, Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Simperl (editors), Lecture Notes in Computer Science, volume 5554, pages 781–795, ISBN 978-3- 642-02120-6, Springer-Verlag, June 2009.

III Jouni Tuominen, Tomi Kauppinen, Kim Viljanen, and Eero Hyvönen. Ontology-Based Query Expansion Widget for Information Retrieval. In Proceedings of the 5th International Workshop on Scripting and Develop-

5 List of Publications

ment for the Semantic Web at ESWC 2009, Heraklion, Greece, May 31, Sören Auer, Chris Bizer, and Gunnar Aastrand Grimnes (editors), CEUR Workshop Proceedings, volume 449, pages 52–57, ISSN 1613-0073, online CEUR-WS.org/Vol-449/ShortPaper1.pdf, May 2009.

IV Matias Frosterus, Jouni Tuominen, Sini Pessala and Eero Hyvönen. Linked Open Ontology cloud: managing a system of interlinked cross- domain light-weight ontologies. International Journal of Metadata, Se- mantics and Ontologies, 10, 3, pages 189–201, DOI 10.1504/IJMSO.2015.073879, December 2015.

V Kim Viljanen, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen. Nor- malized Access to Ontology Repositories. In ICSC 2012: 2012 IEEE Sixth International Conference on Semantic Computing, Palermo, Italy, 19-21 September 2012, pages 109–116, ISBN 978-1-4673-4433-3, IEEE, September 2012.

VI Jouni Tuominen, Nina Laurenne, and Eero Hyvönen. Biological Names and Taxonomies on the Semantic Web – Managing the Change in Sci- entific Conception. In The Semantic Web: Research and Applications: 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29 – June 2, 2011, Proceedings, Part II, Grigoris Antoniou, Marko Grobelnik, Elena Simperl, Bijan Parsia, Dimitris Plexousakis, Pieter De Leenheer, and Jeff Pan (editors), Lecture Notes in Computer Science, volume 6644, pages 255–269, ISBN 978-3-642-21063-1, Springer- Verlag, June 2011.

VII Nina Laurenne, Jouni Tuominen, Hannu Saarenmaa and Eero Hyvönen. Making species checklists understandable to machines – a shift from relational databases to ontologies. Journal of Biomedical Semantics, 5, 40, DOI 10.1186/2041-1480-5-40, September 2014.

VIII Jouni Tuominen, Nina Laurenne and Eero Hyvönen. Publishing and Using Plant Names as an Ontology Service. In Proceedings of the first international Workshop on Semantics for Biodiversity at ESWC 2013, Montpellier, France, May 27, Pierre Larmande, Elizabeth Arnaud, Isabelle Mougenot, Clement Jonquet, Therese Libourel, Manuel Ruiz (editors), CEUR Workshop Proceedings, volume 979, ISSN 1613-0073,

6 List of Publications online CEUR-WS.org/Vol-979/WS_s4biodiv2013_paper_2.pdf, May 2013.

7 List of Publications

8 Author’s Contribution

Publication I: “Ontology Libraries for Production Use: The Finnish Ontology Library Service ONKI”

The author contributed significantly to the design and specification of the ONKI library system, and the general requirements of ontology library services. The author was one of the two primary developers of the ONKI SKOS server and the ONKI selector widget, and implemented the ONKI API in the ONKI SKOS server.

Publication II: “ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies and Ontologies as Services”

The author is the first author of the publication. The author was one of the two primary designers and developers of the ONKI SKOS server and the ONKI selector widget, and primary developer of the ONKI API and SKOS support for the system. The author implemented the demonstration search interface for using ontology-based query expansion in the Kantapuu system.

Publication III: “Ontology-Based Query Expansion Widget for Information Retrieval”

The author is the first author of the publication. The author designed and implemented the query expansion functionality of the ONKI selector widget, configured the ontologies used in the use case and implemented the demonstration search interface.

9 Author’s Contribution

Publication IV: “Linked Open Ontology cloud: managing a system of interlinked cross-domain light-weight ontologies”

The author contributed significantly to the principles guiding the building of the ontology cloud and the management process of the cloud. The author designed the publication of the KOKO ontology cloud in the ontology service, and developed the functionalities in the ONKI service to support the processing of the ontology cloud and individual domain ontologies mapped to the general upper ontology.

Publication V: “Normalized Access to Ontology Repositories”

The author participated in the design of the Normalized Ontology Reposi- tory (NOR) approach and its API. The author was the primary developer of the HTTP API used to access the ontologies in the ONKI SKOS server. The author contributed to the design of the ONKI3 user interface and the ontology metadata specification.

Publication VI: “Biological Names and Taxonomies on the Semantic Web – Managing the Change in Scientific Conception”

The author shares the first authorship of the publication. The author was the one of the two primary designers of the Taxon Meta-Ontology (TaxMeOn), responsible for the computer science expertise, and was in charge of the technical implementation of the model. The author committed the technical details related to the use cases.

Publication VII: “Making species checklists understandable to machines – a shift from relational databases to ontologies”

The author contributed to the publication equally as the first author. The author was one of the two primary designers of the taxonomic meta- ontology, and responsible for the computer science expertise, the technical comparison of the LSID and HTTP URI identifiers, and evaluation of the models for managing taxonomic information.

10 Author’s Contribution

Publication VIII: “Publishing and Using Plant Names as an Ontology Service”

The author is the first author of the publication. The author was one of the two primary designers of the ontology model of the vernacular names, re- sponsible for the computer science expertise, published the ontology in the ONKI service, and designed the management process of the nomenclature based on the SAHA metadata editor.

In addition to these publications, the thesis contains references to related work by the author. They include the description of the first version of the ONKI widget [268], the FinnONTO ontology infrastructure [136], the ONKI2 user interface [258], the SPARQL-based ontology service ONKI Light [248], the deployment and further development of the ONKI service by the National Library of Finland [249], and the use of ontology services in the legal domain [87]. The author has also published other work related to ontology services, including using the ONKI service in cultural heritage [133], birdwatch- ing [129], for publishing historical places as on ontology time series [135], and for visualizing automatically enriched ontology relationships based on co-occurrences of concepts in annotations [153]. Furthermore, the author has worked on publishing ontologies in a linked data service [134] and developing a federated ontology service for historical places [130].

11 Author’s Contribution

12 1. Introduction

1.1 Background and Research Environment

As the amount of information in information systems grows, it is harder for the users to find relevant information for their needs [137]. Not only is the information hard to find, but it is not connected to other relevant information—thus getting an extensive understanding about a topic is challenging. These issues are intensified in the massive World Wide Web, which was estimated to contain 11.5 billion indexable web pages already in 2005 [102], and almost 50 billion of them in a more recent study [263]. The full text search employed by many search engines has several limi- tations. A simple search algorithm does not distinguish significant words from non-significant ones in the document, leading to the loss of precision of the search [46]. The issue can be compensated by ordering the search results based on their assessed relevancy for the search task [181], effec- tively displaying the most relevant results first. On the other hand, all significant terms regarding the information content of the document might not appear in the document, decreasing the recall of the search [114]. The Semantic Web1 [28, 77] is an extension of the current World Wide Web, providing technologies for processing information based on their semantics. Managing textual contents on the conceptual level resolves the issues of purely lexical full text search, such as handling synonymy and homonymy. A practical subtopic of the Semantic Web is the Linked Data [27, 112] concept, which is a method for publishing data in an inter- linked way. The semantic interoperability and interlinking of the infor- mation contents in the web changes the nature of the web from the web of documents to the web of data. These technologies enable building of

1http://www.w3.org/2001/sw/

13 Introduction services on top of the integrated datasets, for example, providing users novel search and recommendation interfaces, thus improving information findability. Ontologies are at the core of the Semantic Web infrastructure, as they model the domains of interest in a formal, machine-understandable way. An ontology acts as a shared conceptualization of a domain [98, 35, 244], enabling different parties to use a common language when communicating about the domain [100]. When information objects are described with meta- data [222, 17, 71, 46] referring to concepts of an ontology, machines can interpret their meanings. This semantic content annotation enables, e.g., the integration of heterogeneous data collections and automatic reasoning based on the properties of concepts [143, 239]. An ontology can be built by a domain expert, but also methods for auto- matic and semi-automatic ontology generation exist [176]. Also the content annotation can be done manually or (semi-)automatically [159, 234, 276]. In addition to ontologies, semantically lighter knowledge organization systems (KOS) originating from library and information science, such as subject headings, classifications, and thesauri can be used to harmonize the used terminology in content descriptions and search [124, 95]. KOSs can also be utilized in, e.g., query expansion, cross-language search, and as a navigation aid for accessing contents. Knowledge organization systems can be interlinked in order to facilitate the integration of data described using different KOSs, by utilizing the methods of ontology mapping [127] and matching [233]. An alternative method for using explicit ontology-based metadata for improving the information findability is to use automatic methods analyz- ing the contents of information objects. For example, natural language processing methods can be used to identify the meanings of words based on their context in a text document [53]. However, the strength of using explicit metadata is its applicability also to non-textual objects, such as images and videos, as otherwise text search would not be possible without reliable content analysis methods. For facilitating the use of ontologies, specialized software systems— ontology servers, have been proposed for publishing ontologies and pro- viding services for using them [79, 68, 9, 108, 60]. Most of the ontology server implementations introduced in the Semantic Web research have been designed for developing ontologies, and not for their actual usage, e.g., in content annotation or information retrieval, and therefore lack

14 Introduction crucial functionalities needed in applications [9]. The common function- alities of ontology servers include user interfaces for visualization and browsing of ontologies, and searching for concepts in an ontology. Several implementations also provide application programming interfaces (API) for the programmatic use of the ontologies. In addition to APIs, ontology functionalities may be integrated into client systems with user interface components.

1.2 Objectives and Scope

The aim of this thesis is to provide methods and technological solutions for publishing knowledge organization systems in such a way that they can be utilized cost-effectively in external applications. As a solution, the notion of an ontology service is presented. An ontology service is a software system that can be used by ontology developers for publishing their ontologies, and by content indexers, information searchers, and application developers to use ontologies in their tasks. The user needs for knowledge organiza- tion systems are analyzed, and based on them a set of requirements for functionalities is formulated. As a proof-of-concept system, implementa- tions of such an ontology service are provided and their application to real life cases is reported. The KOSs used in the cases include Finnish and international thesauri, lightweight ontologies covering general and domain-specific concepts, and semantically richer biological nomenclatures and classifications. The objectives of the ontology services presented in this thesis are:

Ontology publication channel. Provide a complete publication work- • flow for the ontology developers to publish an ontology, or a new version of it.

Heterogeneous ontologies. Support for distinct ontology formats by • using a harmonizing data model and configuration options.

Tools for metadata creation. Provide means to ontology-based con- • tent indexing.

Support for distributed content creation. Facilitate content cre- • ation in distributed workflows, where content is curated by independent parties and aggregated into one system, e.g., a web portal.

Facilitate search tasks. Support the use of published ontologies in •

15 Introduction

information retrieval by offering functionalities, e.g., for query expansion.

Multiple ontologies and repositories. The users should be able to ac- • cess multiple ontologies, even originating from different ontology services, simultaneously in a coherent way.

Programmatic access. Applications should be able to use the on- • tologies via application programming interfaces (API) for searching for concepts and getting their properties.

Evaluation by applying into practice. The applicability of the ser- • vices will be tested by building a proof-of-concept system, which is piloted in real life scenarios.

Promote complex KOSs. The system should support not only simple, • but also richer knowledge organization systems.

Based on these objectives that guide the design and implementation of the ontology services this thesis seeks to find solutions for the following research questions (RQ):

1. How can lightweight ontologies be published on the Semantic Web so that they can be utilized in content indexing and information retrieval tasks?

2. How can a collection of independent or interconnected ontologies—in dif- ferent formats and repositories—be published and utilized using shared user interfaces and APIs?

3. How can richer knowledge organization systems, such as biological nomenclatures and classifications, be managed as an ontology and pub- lished using an ontology service?

The research questions are answered with the publications I–VIII. Table 1.1 shows which research questions the individual publications contribute to. The contributions of the publications are summarized in Chapter 3.

1.3 Research Process and Dissertation Structure

The research presented in this thesis has been conducted by applying the methodologies of design science [182, 116, 211] and action research [24, 42,

16 Introduction

Research question PI PII PIII PIV PV PVI PVII PVIII RQ1 x x x RQ2 x x RQ3 x x x

Table 1.1. The relationship between the research questions and the publications.

62]. Design science is a technology-oriented paradigm in the information sys- tems discipline that aims to create things that serve human purposes [182]. The significance of a research is determined by the value or utility it provides—does it work, is it an improvement? Instead of new theories, the outcomes of design science are innovative and useful artifacts, which include constructs, models, methods, and implementations. The process of design science includes two phases: building and evaluation. The nature of design science tends to be applied: it exploits knowledge created by basic research to develop new technologies. However, the created artifact and its working environment might not be well understood, and in such case the artifact itself presents new scientific questions. By designing, building, and applying an artifact, knowledge and understanding about a problem domain and its solution is achieved [116]. As opposed to routine design and systems building work, design science builds novel ways to solve im- portant, unsolved problems or provides more effective or efficient ways to address previously solved problems. The solutions are generalizable and provide new knowledge for the application domain. The applicability of the artifact is evaluated in real-world scenarios by observational, analytical, experimental, testing, or descriptive methods. Complementing the technological aspect of design science, action re- search emphasizes the social elements of information systems research. In an action research setting, scientists and the subjects of the study col- laborate in order to study and solve problems in organizations [24]. The research involves two phases: the diagnostic stage to analyze the current situation and the therapeutic stage to carry out changes to improve the situation. In contrast to case studies, in action research the researcher is involved in the studied phenomenon and the research is carried out in a more rigorous way [23]. The rigor is ensured by following the action research cycle: diagnosis, action planning, action taking, evaluating, and specifying learning. The design of the ontology services presented in this thesis is based on

17 Introduction analyzing the user requirements of ontology users and existing ontology server implementations by a) conducting a literature and system review, b) formulating illustrative use scenarios, and c) running a prototype system as a living lab service, gathering feedback from the actual users of the system. Based on the cyclic nature of design science and action research, similar methods have been used to evaluate the purposefulness of the developed ontology services. The prototype system itself acts as a proof of concept, demonstrating the utility or suitability of the software artifact for the given requirements [210]. Furthermore, using the system in an action research setting in real use cases evaluates the effects of the system use in real-world situations. By basing the functionalities of the system on existing research and illustrative use scenarios, the utility of the system is ensured. This thesis is organized as follows. The theoretical background of the work is presented in Chapter 2. Building on the theory and based on the publications included, the results of the thesis are summarized in Chapter 3. Finally, the implications of the results, the validity of the work, and further research are discussed in Chapter 4.

18 2. Theoretical Foundation

2.1 Modeling Knowledge Organization Systems

2.1.1 Knowledge Organization Systems

Knowledge organization systems (KOS) originate from library and infor- mation science, where they are used as schemes for organizing information and promoting knowledge management [124, 95]. Examples of different types of KOSs include classification schemes, subject headings, authority files, taxonomies, thesauri, and ontologies [124, 119, 238, 95]. They provide a controlled vocabulary in the given domain of interest, and harmonize the terminology used to describe the information items in information collections, e.g., in digital libraries or document databases. In its simplest form, a controlled vocabulary is a list of terms, where each term corresponds to a concept of the domain. It can also include other information about the terms, such as synonyms, descriptions, and source information. A taxonomy arranges the terms in a controlled vocabulary into a hierarchy, aiding the users selecting a suitable term in, e.g., content description or information retrieval [91]. Extending taxonomies, thesauri may include richer information about the terms, such as associative rela- tions between them [91]. Guidelines for creating, displaying, and managing thesauri are documented in international and national standards, such as ISO 25964 [6, 66] and SFS 5471 [2]. Thesauri and other controlled vocabularies are used primarily for im- proving information retrieval [11, 236]. This is accomplished by using the concepts or terms of a thesaurus in content indexing, content searching, or in both of them, thus simplifying the matching of query terms and the indexed resources (e.g., documents) when compared with using free,

19 Theoretical Foundation uncontrolled natural language. The relations of thesauri can be utilized in information retrieval, for example by expanding query terms to more specific terms based on the concept hierarchy. Multilingual thesauri can be used for cross-language search where the information contents or the related metadata are expressed in different language than the one used by the end user. Knowledge Organization Systems, such as thesauri, are of great benefit for the Semantic Web [19, 282, 193, 106, 262, 278], enabling semanti- cally disambiguated data exchange and integration of data from different sources, though not to the same extent as ontologies [240] where the se- mantics of concepts is defined in more refined and machine-understandable ways [232, 10]. Ontologies based on thesaurus-like structures can be called lightweight ontologies [82, 159, 93, 143]. The Simple Knowledge Organization System (SKOS) [188, 19, 189] de- veloped within W3C is a data model and a syntax for expressing concept schemes such as thesauri, and is largely compatible with the ISO 25964 thesaurus standard [66, 139]. SKOS provides a standard way for cre- ating vocabularies and migrating existing vocabularies to the Semantic Web. SKOS solves the problem of diverse, non-interoperable thesaurus representation formats by offering a standard convention for presentation. Existing thesauri can be transformed into SKOS format via conversion processes [261, 245, 193, 237, 282]. When a thesaurus is expressed as a SKOS vocabulary, it can be processed with standard RDF/SKOS tools in a uniform way. There are also methods for converting thesauri into semantically richer OWL ontologies [262, 136, 44, 162]. Compared with SKOS conversion techniques, the OWL-based methods cannot be fully automated, as they re- quire human effort for refining the semantic relations of the concepts [162]. Especially the is-a hierarchy of the ontology needs to be carefully con- structed since the hierarchy of a thesaurus may have been built using a mix of different hierarchical relations [136, 162], which cannot be used as is for, e.g., subclass reasoning. The use of existing thesauri as the basis for ontologies enables the backwards compatibility with legacy data annotated with the thesauri, and facilitates the publication of the data as Linked Data. This thesis seeks to develop publication methods for the cost-effective utilization of KOSs in, e.g., content indexing and information retrieval. In this context, SKOS is applied as a harmonizing model for representing

20 Theoretical Foundation

KOSs.

2.1.2 Interlinked Ontologies

In Linked Data paradigm, entities can be linked on many levels: data instances [112, 140, 103], metadata schema fields [267], and concepts of ontologies [286] can be interlinked to facilitate interoperability between datasets. Linking the data through ontologies allows additional interoper- ability due to the inferred knowledge gained through the shared ontology semantics [121]. When integrating datasets that use different ontolo- gies (or KOSs), the ontologies need to be reconciled. Ontology reconcili- ation [104, 283] is a broad term, covering ontology merging, alignment, and integration. Most of the reconciliation methods are automatic or semi- automatic, which can lead to lower quality [32], especially if the ontologies were originally expert-made [73]. To faciliate the interoperability between different ontologies, there have been efforts to establish guidelines for the creation and management of ontologies, e.g., in the context of the OBO Foundry initiative [235]. The focus in OBO Foundry is on coordinating the development of different ontologies in the biomedical domain under shared principles. General, domain-independent ontology design principles have been proposed by several researchers [99, 260, 101, 191, 273, 89]. Linked Open Vocabularies (LOV) [267] is an effort on building a high quality catalogue of reusable vocabularies, and making their interconnections visible. Instead of KOSs, LOV focuses on metadata schemas. Ontology linking methods are used also in ontology modularization [243, 7], where ontologies are divided into smaller interlinked parts to facilitate distributed development and re-use. There have been several efforts on building a general upper ontology [184] that can be used as a foundational basis for domain ontologies. Some of the upper ontologies have been developed from scratch, such as CYC [170], while, e.g., the Suggested Upper Merged Ontology SUMO [194] was created by merging existing ontologies. For ensuring the consistency of interlinked ontologies, the changes of the ontologies have to be communicated to the dependent ontologies, e.g., by applying methods from the field of ontology evolution [281, 111, 169, 81, 128]. Methods include the detection of changes in an updated ontology by using logs [242, 157, 144] or comparing two versions of the ontology [164, 272]. To facilitate the processing of changes, different change types can be

21 Theoretical Foundation identified [246, 186], or more abstract change patterns can be constructed from atomic changes [160, 157, 144]. There has also been research on the nature of the change types the users are most interested in [37]. Also, the extra challenges of distributed ontology development [160, 241, 175, 146] have to be addressed when operating with interlinked ontologies. This thesis aims to design methods and tools for managing and publishing an interlinked cloud of cross-domain ontologies in such a way that they can be utilized using shared user interfaces and APIs. The approach is based on modularizing the ontology development work into an upper ontology and domain ontologies extending it, and keeping track of their semantic dependencies.

2.1.3 Biological Nomenclatures and Taxonomies

Management of names and taxonomies of organisms in biology is an example use case for the need of a complex KOS that cannot be rep- resented using simple, general KOS presentation languages, such as SKOS. In biology, taxonomy refers to the discipline that identifies, de- scribes, and names groups of organisms (taxa) based on their shared characters [150]. The organism groups are organized into taxonomic hi- erarchies. Taxon names and classifications are important when integrat- ing biological data from multiple sources [225, 201, 145, 253], and are therefore considered central resources, for example in biodiversity man- agement [25, 200, 228, 215, 85, 86, 105, 206, 219]. The changing nature of the names poses challenges for their management [148, 166, 208, 229]. There are several issues that make the biological nomenclatures and classifications a suitable domain for the study of the modeling and pub- lishing a rich KOS as an ontology. 1) Biological names are not stable or reliable identifiers for organisms as they or their meaning change in time. 2) The same name can be used by different authors to refer to different taxa, and a taxon can have more than one name. 3) Taxonomic knowledge is changing all the time and increases due to new research results. The number of new organism names in biology increases by 25,000 every year as new taxa to science are discovered [163]. At the same time, the rate of changes in existing names has accelerated by the implementation of molecular methods suggesting new positions to organisms in taxonomies. 4) The notion of ’species’ in the general case is actually very hard to define precisely. For example, some authors discuss as many as 22 different ways of defining the concept of species [185].

22 Theoretical Foundation

Figure 2.1. A hypothetical example of changes in taxonomic concepts and taxon names in time. First, two separate species A and B partly overlap (a). Then the two species are merged into a single species that has the name A, and the name B becomes a synonym to the name A (b). Finally, the species A is split into two separate species. The name A remains a valid name with a narrower meaning, and the name C is given to the new taxonomic concept. The black squares illustrate the biological characters of organisms, and the ellipses describe the limits of taxonomic concepts.

Although biological naming convention is regulated by nomenclature codes, e.g., International Code of Zoological Nomenclature (ICZN) [138] and International Code of Nomenclature for algae, fungi, and plants (ICN) [187], the names cannot be used as reliable identifiers when re- ferring to taxa due to their ambiguity. Figure 2.1 depicts a typical series of changes in taxonomic concepts and their names. The border between the two species is unclear, and later the two species are merged into a single one, which is finally split into two (or more) species again. The taxon names may or may not change accordingly. Both the taxonomic concepts and their names change over time, and tracing the meaning of a name is impossible without a reference to a study. The reason for the continual changes is that every study has a different set of taxa, biological characters, and methods, and consequently their results are different. A species checklist is a collection of names of organisms of a certain taxonomic group, compiled into a single taxonomic hierarchy by scientific experts. Checklists often present the species occurring in a particular geographical area. Comprehensive reference lists and catalogues of the names have been proposed as a solution to facilitate the access to the names and harmonize their usage [285, 105, 65, 208, 225, 173]. The need for such a list has been recognised, e.g., for vascular plants by the Convention on Biological Diversity (CBD) [5]. There are efforts on curating or aggregating taxon names from authorita- tive sources covering all species groups in the world, such as the Catalogue of Life (CoL) [223], the Encyclopedia of Life (EoL) [207], the Universal

23 Theoretical Foundation

Biological Indexer and Organizer (uBio) NameBank [225], WikiSpecies1, the NCBI Taxonomy database [76], Open Tree Taxonomy [120], GBIF ChecklistBank [94], Index to Organism Names (ION)2, and BioNames [203]. Other efforts are focusing on specific taxonomic groups or regions, such as ZooBank [215], the International Plant Names Index (IPNI) [55], World Register of Marine Species [51], Fauna Europaea [64], and the Atlas of Living Australia3. Berendsohn [25] introduced the concept of a ”potential taxon” to overcome the name ambiguity issues. A potential taxon is a combination of a taxon name and a literature reference to the taxonomic concept, that can be used, e.g., in databases for taxon references, enabling the interlinking of differing taxonomic views [26]. The importance of persistent identifiers for organism names has been further discussed by several researchers [200, 209, 225, 205, 219]. Pullan et al. [214] presented the Prometheus data model for manag- ing taxonomic nomenclature and multiple related classifications sepa- rately, while Ytow et al. have developed the Nomencurator data model for representing and managing taxonomic nomenclature in a relational database [280]. Page [200] presented a simple data model for present- ing taxon names and their relations, using using Life Science Identifiers (LSID) for identifying them. The use of LSIDs has been suggested also by organizations publishing taxonomic data [212, 56, 221], and piloted, for example in the Catalogue of Life database [148]. Further, Schulz et al. [228] presented an ontology model of biological taxa and its application to physical individuals. The model is based on a single unchangeable clas- sification. Franz and Peet [85] formulated the use of semantics in relating taxa to each other, within a single taxonomic hierarchy and between two distinct hierarchies. Franz and Thau [86] evaluated the limitations of applying ontologies to the scientific names and concluded that ontologies should focus either on a nomenclatural point of view or on strategies for aligning multiple taxonomies. As a use case for taxonomic ontologies, Lepage et al. [171] have im- plemented the Avibase database system for managing and organizing taxonomic concepts from major bird taxonomic checklists. There are also practical efforts on publishing taxonomic concepts as Linked Data, such

1http://species.wikimedia.org 2http://www.organismnames.com 3http://www.ala.org.au

24 Theoretical Foundation as Taxonconcept.org4 and Geospecies5 that aim to provide Linked Open Data identifiers for species concepts and link them to related data from different sources. Chawuthai et al. [48] have presented an ontology model for managing the change of taxonomic concepts and publishing them as Linked Data. In addition to data models that are focused on taxonomic information, there are metadata schemas for exchanging biodiversity data on broader scope, such as Darwin Core (DwC) [61, 277], the related Taxonomic Concept Transfer Schema (TCS) [251, 155], created by the Biodiversity Information Standards (TDWG), Access to Biological Collection Data (ABCD) [125], and Biological Collections Ontology (BCO) [271]. Darwin Core has a set of taxonomic extensions, Global Names Architecture (GNA) Profile [220], that introduces properties for richer nomenclatural details of taxa. Also, a semantically refined version of Darwin Core, Darwin-SW [22], has been proposed. Methods of ontology versioning [161], evolution [195], and matching [233, 74] are relevant in management of taxonomic and nomenclatural informa- tion, as there exist multiple views on taxonomy and taxonomic knowl- edge changes as new research results are published. Thus, systems that support the usage of multiple versions of ontologies [128] and con- cepts [252] simultaneously are needed. There are approaches that are focused on life science ontologies, providing support for mapping ontolo- gies [97, 110, 158], and systems for matching taxonomic concepts between databases [115, 202, 148, 51, 36, 218, 266, 84]. This thesis uses biological nomenclatures and classifications as an exam- ple use case of modeling and managing a richer KOS as an ontology. The focus is on practical management of the names and their changes.

2.2 Publishing and Using Knowledge Organization Systems

2.2.1 Ontology Servers

Once an ontology is modeled and serialized in some format, such as SKOS or OWL, it can be published for the wider community to be used as a shared domain model. In order to facilitate the usage of ontologies, on-

4http://www.taxonconcept.org 5http://lod.geospecies.org

25 Theoretical Foundation tology servers [79, 68, 9, 108, 60] have been proposed for publishing on- tologies and vocabularies on the web. Together with ontologies they have been considered a key resource for enabling the vision of the Semantic Web [136, 18, 108, 60]. The motivation for publishing ontologies using ontology servers instead of making them available as mere data files is to support the use of ontologies in applications. Ontology servers can provide ready-to-use services that can be integrated into information systems in a cost-effective way. Without such services, the user organizations have to implement the common functionalities for accessing ontologies in their own systems, leading to redundant work. Parallel terms with ontology server are ontology library, ontology reposi- tory, and ontology service, each with a slightly different emphasis on the topic. In addition, the community of Networked Knowledge Organization Systems (NKOS)6, that aims to develop web-based information services to support the description and retrieval of diverse information resources us- ing KOS, uses the terms terminology registry and terminology services [95]. Common to these systems is that they are intended for publishing, manag- ing, sharing, finding, and reusing ontologies and vocabularies for content indexing, information retrieval, content integration, and other purposes. Traditionally, the main focus in ontology server systems has been in sup- porting ontology development instead of the runtime usage of ontologies such as indexing and ontology-based end-user applications [68, 9]. The features of the systems vary greatly as they are designed for different purposes and based on specific user requirements [60]. An ontology server can support different phases in the ontology lifecycle, which can be defined as 1) design, 2) commit, and 3) runtime [9], or in a more fine-grained way as 1) acquisition, creation, and modification of vocabularies, 2) publication of vocabularies, 3) access, search, and discovery, 4) use, and 5) archiving and preservation of vocabularies [95]. The different lifecycle phases involve different user groups, which can be classified into ontology developers, ontology users, and application developers [60], or similarly in the NKOS community into KOS owners or creators, end users, and system developers [95]. The design phase covers tasks involved in ontology development, such as ontology engineering or editing, storing, versioning, mapping, and publishing. Once an ontology is published, the commit phase refers to the activity where a user is trying to find a suitable ontology for her needs, and needs support for discovering and evaluating

6http://nkos.slis.kent.edu

26 Theoretical Foundation candidate ontologies. In the runtime phase, the user needs tooling for finding concepts for a given task, such as content indexing. Based on a survey of ontology servers, d’Aquin and Noy [60] have identified the key functions of an ontology server to be search, browsing, selecting and evaluating ontologies, and programmatic access to ontologies. Similar features have been recognized also by other researchers [108, 18, 95]. Most ontology servers are web-based systems that catalogue ontologies on a specific domain, such as biomedical sciences [235, 279, 196, 52], agri- culture7, oceanography [224, 167], government8, by a specific organization, such as Library of Congress Linked Data Service9, or with no such restric- tion. They provide access mechanisms to ontologies as user interfaces, APIs, or both of them [9, 60]. The user interfaces typically include search and browsing functionalities for finding ontologies and concepts in them. Search functionalities can be provided as string search based on the labels or other textual properties of the concepts, and utilizing the semantic rela- tions of the concepts. Browsing interfaces typically visualize the structure of an ontology as a hierarchical tree or as a graph, where the concepts are presented as nodes and the relations as arcs between them. Ontology servers can provide users with listings of available ontologies, which are often classified based on different criteria. In some implementations, the set of ontologies can be further filtered and investigated using a faceted search. Some ontology servers have been implemented for publishing a single, specific ontology, such as SUMO Browser [230] and GTAA Browser [40], whereas some systems focus on providing a directory-like listings of ontologies, such as DAML Ontology Library10, Protégé Ontology Library11, OBO Foundry [235], oeGOV, OntologyDesignPatterns.org [33], SHOE ontology library [113], Basel Register of Thesauri, Ontologies & Classifications (BARTOC) [168], and TaxoBank [122]. There are sys- tems that focus on the ontology development and editing functionalities, such as iQvoc [20], PoolParty [226], VocBench [43], TemaTres12, SKOS Shuttle13, TopBraid Enterprise Vocabulary Net [255], SKOS editor [50],

7http://agroportal.lirmm.fr 8http://oegov.us 9http://id.loc.gov 10http://www.daml.org/ontologies/ 11http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library 12http://www.vocabularyserver.com 13http://skosshuttle.ch

27 Theoretical Foundation

PeriodO gazetteer [231], Neologism [21], Open Metadata Registry [213], WebOnto [69], Adapted Ontology Server [49], Ontolingua Server [75], and Medical Ontology Server [92]. Inter-ontology relations are considered important in several ontology servers, such as BioPortal [196], ACOS [172], MMI Ontology Registry and Repository [224], CATCH Vocabulary and alignment repository [264], ROMULUS [156], and Ontohub [190], as they support the creation and representation of concept mappings. There are systems that emphasize community-based aspects, such as uploading, rating and commenting on the ontologies. These ontology servers include, e.g., BioPortal, ACOS, and CupBoard [58]. There exist several search engines that crawl the web for RDF data and index them, such as Swoogle [67], Watson [59], and Sindice [198] for any data, whereas OntoSelect [41] and Falcons [217] are designed especially for ontologies. OntoSearch2 [254] is a similar ontology search engine, but it uses its own repository as opposed to crawling the public web. Such systems can be useful when searching for suitable ontologies to use in applications, and provide an overview of web-published ontologies in a specific domain. Many ontology servers provide APIs for accessing the ontologies, typi- cally for querying for ontologies and/or their concepts, and getting infor- mation about them. There are several specifications for APIs, such as Foundation for Intelligent Physical Agents (FIPA) Ontology Service [4] and Ontology Web Services (OWS) [57]. There are API implementations for processing ontologies in programming languages, e.g., the Java-based SKOS API [151], OWL API14, and the APIs in DOGMA Server [141] and KAON Server [197]. For accessing ontologies on the web, there are several ontology servers that provide Web Service (SOAP) APIs, e.g., SKOS Web Service API [257], OCLC Terminology Services [269], Ontology Lookup Service (OLS) [52], Watson, CATCH Vocabulary and alignment reposi- tory, and NERC Vocabulary Server [167]. A more recent approach is to provide access to ontologies through a RESTful HTTP API, such as in the case of SISSVoc [54], OCLC Terminology Services, Otago Ontology Repository [204], BioPortal, iQvoc, PoolParty, NERC Vocabulary Server, HIVE [96], TemaTres, and SKOS Shuttle. SPARQL [107] is the standard way to provide an application interface to Semantic Web databases, and it can be used also to access ontology repositories. Similarly, general RDF

14http://owlapi.sourceforge.net

28 Theoretical Foundation libraries can be used for processing ontologies, such as Apache Jena15 and RDFLib16. Ontology servers such as OCLC Terminology Services and BioPortal provide widgets, user interface components that can be inte- grated into applications, to enable ontology-based search functionalities in applications. To facilitate simultaneous access to several ontology servers, shared access mechanisms and protocols are needed. Open Ontology Repository (OOR) [18] is an initiative that aims to specify an architecture and in- terfaces for interoperability between ontology repositories. Also other researchers have emphasized the importance of interoperability between ontology repositories [60, 108]. Ontohub is an ontology repository engine that follows the ideas of the OOR initiative, and provides inter-repository access by defining a generalized federation API that needs to be imple- mented in the participating repository or as a wrapper around the legacy API of the repository. Common Ontology API Tasks (OntoCAT) [8] is a programming interface to query multiple ontology repositories seamlessly from an application. The system is based on wrappers that are imple- mented for each supported ontology repository. OLS2OWL [90] is a plugin for Protégé ontology editor enabling simultaneous queries to multiple on- tology servers by using a similar wrapper approach as in OntoCAT. JSKOS- API [168] is a general HTTP API for accessing knowledge organization systems, with methods for concept search and lookup. By implementing the API in multiple ontology servers or using wrappers, it is possible to provide inter-repository search and browsing interfaces. There are also general protocols for accessing knowledge bases, such as the Open Knowledge Base Connectivity (OKBC) [47], and agent communications languages FIPA- ACL [3], the Knowledge Query and Manipulation Language (KQML) [80], and the Semantic Agent Programming Language (S-APL) [152]. Regarding the different aspects of ontology servers, the focus of this thesis is to provide publication channels for ontologies in different formats and support for their runtime use, e.g., in a network of distributed content creation. One of the design principles is the support for the use of multiple ontologies and ontology repositories simultaneously.

15https://jena.apache.org 16https://rdflib.readthedocs.io/en/stable/

29 Theoretical Foundation

2.2.2 Semantic Annotation and Information Retrieval

The typical use cases for ontologies and other knowledge organization sys- tems are semantic annotation and information retrieval. Both these tasks can be facilitated with ontology services. In ontology-based manual an- notation human cataloguers create content descriptions using ontological concepts [159, 199, 16]. The annotation process is usually guided by index- ing guidelines and conventions that may be general [1, 15] or shared by particular disciplines or organizations. The created metadata can be based on a specific schema, which can be a simple collection of key-value-pairs, such as Dublin Core [63], or a semantically richer model with relations between the individual metadata fields [250, 227]. The annotation task can be defined as a process of analyzing the content to be annotated, find- ing relevant ontological concepts from the selected ontologies, and storing them in metadata fields. The process can be streamlined and made more effective with different kinds of automated tools [17]. Based on a user study, Hildebrand et al. [118] have identified the follow- ing use cases of a human annotator for a concept search:

The user already knows the concept she would like to use as a descriptor • for the content, and wants to find the concept from the used thesauri.

The user does not know the most suitable concept for the content descrip- • tion beforehand and she needs to examine the thesauri to find one.

The user suspects that the used thesauri do not contain a concept she • would need for the task, and she needs to ensure this before she adds a new concept into one of the thesauri.

The human annotator’s task to find the best matching concepts for her needs can be aided by providing concept search and browsing functional- ities. These general functionalities can be provided by ontology servers as APIs and user interface components that can be used for integrating them into applications. The importance of such services for thesaurus use, sharing, and interoperability has been emphasized by several re- searchers [95, 183, 216, 284, 30, 119, 124]. Such an approach relates to the notion of service-oriented architecture (SOA) [231, 70, 154], where software components are provided as technology independent services to other applications to be used over a network through a communication protocol.

30 Theoretical Foundation

Regarding the ontology services discussed in the previous section, OCLC Terminology Services provides a widget for querying controlled vocabu- laries and displaying information about their terms in the sidebar of the Internet Explorer browser, and transferring selected terms into a web- based cataloging application [269]. BioPortal provides web widgets that can be integrated into web applications, for concept search, selection, and visualizing ontologies [196]. The use of SKOS Web Service API as web widgets has been demonstrated in the STAR project by providing function- alities for concept search, expansion, and presentation [31]. Hildebrand et al. [117] have implemented a configurable autocompletion search widget for RDF repositories, based on which Amin et al. [14] have conducted a user study on the organization strategies for the autocom- pletion suggestions to provide effective means of navigation and finding relevant terms. Further, Hildebrand et al. [118] performed a user study in which museum professionals used the widget for annotation. They reported on different strategies for matching, sorting, grouping, and displaying con- textual information for the autocompletion suggestions. Malaisé et al. [179] conducted a user study on expert annotators using GTAA Browser, and concluded that the users mainly used the alphabetical search functionality to find relevant concepts, whereas the concept hierarchy browser was not used that much. The ability of the information retrieval systems to produce relevant search results depends on the user’s ability to represent her information needs in a query [270]. If the vocabularies used by the user and the system are not shared, or if the vocabulary is used in different levels of specificity, the search results are usually poor. Query expansion has been proposed to solve these issues and to improve information retrieval by expanding the query with terms related to the original query terms [45]. Query expansion can be based on a corpus, e.g., analyzing the co-occurrences of terms, or on knowledge models, such as thesauri [83, 274] or ontologies [270]. Methods based on knowledge models are especially useful in cases of short, incomplete query expressions with few terms found in the search index [270, 274]. Ontology-based query expansion can be used interactively to guide the user to formulate his query, for example by providing an autocompletion text search for disambiguating and selecting ontological concepts [132], and automatically by adding concepts to the initial query based on their ontological relations [34, 123, 29, 180, 149]. Typical relationships used in

31 Theoretical Foundation query expansion are the synonym, hyponym, hypernym, and associative relations [14, 126, 147, 142, 72]. When considering general associative relations, caution should be exercised as their use in query expansion can lead to an uncontrolled expansion of result sets, and thus to potential loss in precision [256, 126]. In an experiment by Navigli and Velardi [192], it was noted that extracting the query expansion terms from the sense definitions of the query terms from an ontology produced better results than using the taxonomic relations (e.g., synonym, hypernym). Ontology-based query expansion can also be used for cross-language information retrieval [45], and in addition to general-purpose ontologies, domain-specific ontologies can be utilized, e.g., for spatial query expansion [88]. A concrete example of a terminology service with query expansion support is the FACET Web demonstrator [30] that provides a web service and user interface for accessing thesauri. This thesis aims to develop practical solutions and tools for ontology- based content annotation and information retrieval. The common user tasks of such workflows are catered by providing user interface components and APIs that can be integrated into external applications.

32 3. Results

In the following, the results to the research questions of this thesis are presented. Furthermore, the results are reflected against previous research in Chapter 4.

3.1 Ontology Services (RQ1)

The research question 1 concerns publishing thesauri in such a way that they can be easily and cost-effectively used in ontology-based workflows, especially in content indexing and information retrieval.

1. How can lightweight ontologies be published on the Semantic Web so that they can be utilized in content indexing and information retrieval tasks?

The publications I–III provide solutions for this question by presenting ontology services. The main idea is to publish the ontologies not only as data, but as services that can be used by humans and machines for integrating ontology-based functionalities into applications, provided as user interfaces and application programming interfaces (API). The func- tionalities of the developed ontology services were developed based on the analysis of the user groups of ontologies, acting in different phases of the life cycle of an ontology. The main user groups of ontologies were identified as 1) ontology de- velopers, 2) content creators, 3) information searchers, and 4) software developers. In Publication I, their needs for using ontologies are classified into the following tasks.

1. Designing the ontologies. The structure and modeling principles of an ontology are developed based on the analysis of the subject domain

33 Results

and the use cases of the ontology. Ontology developers need tools for creating the ontology, collaborative editing, reuse, and alignment.

2. Populating the ontologies. An ontology may contain a high amount of instances, e.g., people, organizations, and places. To save efforts, they can be harvested from existing sources, or collected from the end users. Tools may provide support for content collection and updating processes.

3. Publishing the ontologies. To promote the use and reuse of an on- tology, the ontology owners can make it accessible to the users. Prior to publishing the ontology, its quality can be ensured with manual and automatic methods.

4. Finding, comparing, and committing to ontologies. When there is a need to use an ontology in an information system, the information architect of the system needs support for selecting a suitable ontology for her needs.

5. Ontology-based semantic application creation. Application de- velopers need to learn, evaluate, and apply methods for integrating ontologies into applications, e.g., by using user interface components and APIs.

6. Ontology-based semantic content creation. The work of content indexers can be facilitated by providing tools, e.g., for browsing the ontolo- gies, searching and selecting concepts, and (semi-)automatic indexing.

7. Ontology-based end-user applications. Application developers can utilize an ontology for building end-user applications, such as semantic portals, to facilitate information findability. The ontology can function as an educational resource for end users to learn about its domain.

The common functionalities for utilizing ontologies in ontology-based applications of different kinds were identified as concept search, browsing, and selection. The developed user interfaces, widgets, and APIs are de- signed to support these basic tasks. The proposed ONKI ontology service is based on several implementations of ontology servers suited for ontologies of different kinds due to distinct needs for accessing them. For example, a natural way to display a thesaurus is a tree-like visualization, whereas the users of geographical ontologies may prefer map-based user interfaces. Publication II presents an ontology server for thesaurus-like, lightweight ontologies, the ONKI SKOS system. The system supports publishing of

34 Results syntactically, semantically, and structurally differing thesaurus formats, with the requirement that the thesaurus has to be in RDF format and have some basic structure including concepts, their labels, and possible inter-concept relations. Existing thesauri in legacy formats can be converted into W3C’s SKOS data model, which acts as a harmonizing model for expressing knowledge organization systems. Such thesauri can be published in the ONKI SKOS server, which then provides functionalities for the users of the ontology. Thus, organizations developing thesauri do not have to implement their own thesaurus-specific publication systems and the users can use different thesauri with shared tools. The real-life benefits of using the ONKI SKOS server have been demonstrated by applying the system to use cases of a) content indexing in health promotion and cultural heritage context, among others, and b) information retrieval in the collections of the Finnish museums of forestry. The ONKI service enables content creation in a distributed network of organizations, where each organization uses shared ontologies and ontology services for accessing them, thus harmonizing the created metadata and facilitating information integration. Publication III gives a detailed view on how the query expansion facil- ities of the ontology service are used in practice in information retrieval scenarios. The query expansion widget uses the semantic relations of the ontology to refine the query with additional query terms to increase the recall of the search. For example, if the user is searching for ””, the query can be expanded to include also ”cats” and ”dogs” based on the concept hierarchy. As a proof of concept for the process of converting a legacy thesauri into the SKOS data model and publishing it in the ONKI SKOS service, the case of Finnish General Thesaurus YSA, is reported. YSA has been developed in the National Library of Finland since 1987 and is widely used in libraries, museums, and archives in Finland. In addition to YSA, over 80 national and international vocabularies, taxonomies, and ontologies are published in the ONKI SKOS system. The user interface of the ONKI service, including the ontology directory, search view, and ontology browser have been developed in an iterative process where different versions of the system have been published and made accessible online as ONKI1 (Publication I), ONKI2 [258], ONKI3 [12], and ONKI Light [248]. The ONKI system has been run as a living lab ontology service since the official announcement in the autumn of 2008,

35 Results and before the successor service, Finto1 of the National Library of Finland, was publicly released in January 2014, the ONKI service had over 10 000 monthly users (excluding the widget and API users), with 400 registered user domains for the widget and API use.

3.2 Publishing Multiple Ontologies (RQ2)

The research question 2 extends the research question 1 by introducing the dimension of multiple ontologies and ontology repositories used simul- taneously.

2. How can a collection of independent or interconnected ontologies—in dif- ferent formats and repositories—be published and utilized using shared user interfaces and APIs?

Such a multi-ontology scenario is prevalent in many cases, for example when indexing content with more than one ontology, or when trying to find and choose a relevant ontology for a specific use case. The publications IV and V present methods for solving issues in such use cases. Publication IV discusses the publication of an ontology cloud consisting of individual, interconnected ontologies. The system is based on an upper ontology and domain-specific ontologies extending it. The publication process involves merging the component ontologies into a single, coherent representation, and using the ONKI service to publish it to end users. The structure of the ontology cloud appears as a single whole to the users, without emphasizing the ontology boundaries. Thus, the users should be able to use it in a straightforward way, for example in content indexing, focusing only on the main task of choosing relevant concepts, not ontologies. The motivation for building such an ontology cloud is to facilitate the data integration of heterogeneous datasets from different domains, using domain-specific ontologies. The approach complements other existing techniques in data integration on the Semantic Web—data entity linking in Linked Open Data (LOD) and metadata schema linking in Linked Open Vocabularies (LOV). The formation of an ontology cloud can be more efficient since the mappings between ontologies can be reused for different datasets. Using an upper ontology as the base for the cloud instead of mapping individual ontologies on a one-to-one basis eliminates redundant

1http://finto.fi

36 Results mapping work. For building such an ontology cloud for end users, ontology development and management processes need to take into account the semantic depen- dencies between the individual ontologies. This includes the identification of conceptually overlapping parts of the ontologies to avoid redundant development work, and the communication of the changes of the upper on- tology to the domain ontologies extending it. The feasibility of the approach was demonstrated in practice by building the ontology cloud KOKO of more than 47,000 concepts, based on the Finnish General Finnish Ontology YSO and 15 domain ontologies. The ontologies were developed by a network of domain experts, where each organization took responsibility for managing a single domain. Based on the experiences in building KOKO, a set of seven principles guiding the building and management process of the cloud, was formed. The principles are general enough to be applied to other ontology clouds as well. The principles aim to streamline the ontology development process and ensure the semantic integrity of the resulting cloud, especially concerning the transitive subclass hierarchies of concepts. The developed approach of the proactive linking of ontologies as part of their development phase instead of mapping them afterwards aims to minimize redundant work and maximize interoperability. Publication V discusses an environment consisting of several ontology repositories, where users need to access the repositories concurrently. The proposed Normalized Ontology Repository (NOR) approach allows access- ing ontology repositories using shared tools and user interfaces based on a) harmonizing the representation of concepts in ontologies by using the SKOS data model, and b) providing a uniform API that encompasses the general ontology access needs, such as functionalities for getting the meta- data of ontologies in the repository, searching for concepts, and querying their properties. Based on the approach, it is possible to give the user an overview of the ontologies available in a set of ontology repositories. This helps the user to choose a suitable ontology repository or a specific ontology for her needs. The user can browse the ontologies using a uniform browser view, eliminating the need for learning to use ontology-specific user interfaces. The NOR API and concept representation is an extra layer on an ontology repository, meaning the functionalities of the repository are not restricted in any way. When the user has found a suitable ontology, she can move from the uniform NOR browser to the possible own, specialized user inter-

37 Results face provided by the underlying ontology repository. This mechanism is motivated because different ontologies might benefit from access mecha- nisms of different kinds, such as user interfaces and APIs. The system also allows the simultaneous usage of public ontology repositories and private repositories of organizations. To demonstrate the feasibility of NOR, it has been applied in real-life use scenarios. The ONKI ontology repository itself uses such an approach for providing a uniform user interface for searching and browsing over 80 vocabularies and ontologies published in the ONKI SKOS backends. The backends encompass ontologies in different RDF-based formats, such as SKOS and RDFS, which are accessible via an HTTP API that provides a concept search functionality and normalizes the representation of the concepts. The ONKI frontend provides users with a possibility to perform a global search to all the ontologies at the same time. The system also in- cludes a directory listing of all the ontologies available and offers a faceted search view to them, so the user can find such ontologies, for example, whose subject domain is ”business” and are published by ”FinnONTO Consortium”. The directory listing is built using a uniform metadata representation of the ontologies in the system. ONKI Widget uses the NOR approach in an even more heterogeneous environment, by providing users with access to not only ONKI SKOS backends, but also to the ONKI Geo ontology server [131], offering access to the geographical ontology of Finnish contemporary place names. The approach has also been tested by implementing a metasearch prototype for accessing the ontologies of ONKI and BioPortal repositories simultaneously, and even for accessing more general data repositories than ontology repositories, the CultureSampo portal [178] and SAHA metadata editor [165]. The relationship between the ontology cloud and NOR methodologies is a complementary one. The ontology cloud enables interoperability on the ontology level, whereas the NOR approach is focused on compatibility on the ontology service level. Building an ontology cloud requires mapping effort between the domain ontologies and the general upper ontology, and thus makes evident the relations between datasets described using different domain ontologies. On the other hand, in the NOR approach the ontologies are presented using a harmonized data model, but there is no need to map the ontologies to each other on the concept level. Thus, the system can be used for simultaneously accessing and processing even a set of mutually unrelated ontologies with shared tools.

38 Results

3.3 Complex Knowledge Organization Systems (RQ3)

The research question 3 concerns the applicability of the Semantic Web technologies to managing richer KOSs as ontologies and publishing them using ontology services.

3. How can richer knowledge organization systems, such as biological nomenclatures and classifications, be managed as an ontology and pub- lished using an ontology service?

As a case study, the publications VI–VIII present the modeling and man- agement of biological names and classifications. The proposed solution is an ontology model for taxonomic concepts and their scientific and vernacu- lar names. Publication VI presents the Taxon Meta-Ontology (TaxMeOn) model which is aimed for the following data: 1) species checklists and map- pings between them, 2) vernacular name collections, and 3) the changes of scientific names and classifications, and differing opinions of taxonomic concepts based on biological research results. The model makes a dis- tinction between the taxonomic concept and its name. Thus, they can be managed separately, and the nomenclatural changes and re-classifications of a concept can be tracked and managed. The model is flexible in a sense that it is designed to be suitable for data with different levels of details. The simplest use case is to express a static list of taxon names, but the model also supports more complex needs of representing the changes of names and taxonomic concepts. As the model is based on Linked Data, it offers possibilities to link divergent data serving divergent purposes and detailed information with more general information. An advantage of the model is its practicality and applicability to real-life use cases. This is demonstrated by applying it into three use cases: 1) publishing biological species checklists in an ontology service (27 lists, over 80,000 names), 2) collaborative management of vernacular names (ca. 26,000 taxa), and 3) management of individual scientific name changes resulting from biological research results (9 genera). Publication VII presents the use case of applying TaxMeOn into modeling and publishing species checklists of scientific names, and compares the ontology model with storing the checklists in a legacy relational database. The model allows mapping of taxa between different checklists (e.g., based on their congruency) and representing and managing the changes in taxo-

39 Results nomic concepts, their classifications, and names in individual checklists. The main advantages of the ontology model as opposed to a traditional database are the linkability to other datasets, extendability of the data model, (re)usability of the data via standard publication mechanisms, and possibility to edit the data with standard RDF tools. This means that the ontologies can be utilized in applications using general ontology services, without the need to implement domain-specific access mechanisms for biological name collections. The species checklists were published in the ONKI ontology service, fa- cilitating their reuse via user interfaces and APIs. The ONKI browser interface can be used for searching and browsing taxa, finding currently valid names, and tracing the temporal changes in the scientific names. The ONKI autocompletion widget provides a way for integrating an access mechanism to checklists into user applications, e.g., a content management system. Furthermore, HTTP and SOAP APIs are available for program- matic access and a SPARQL endpoint for querying the ontologies. Publication VIII gives a more thorough presentation on how the TaxMeOn model can be applied into management of vernacular names. The model provides a solution for managing the approval process of common names, supporting the temporal tracking of their changes via statuses and their time stamps. The system is used by the Finnish Biology Society Vanamo2 to manage the Finnish names of vascular plants in a collaborative way. In the typical workflow, a new common name is first proposed for a plant, after which it can be accepted to be the recommended name, and finally it can be made an alternative if another recommended name is introduced later. We present the complete workflow for managing a vernacular name ontology from a collaborative development of the ontology to publishing it as Linked Open Data and in an ontology service which makes it accessible to the general public. The ontology is available in machine-processable RDF format, with explicit semantics, e.g., the hierarchical relations are set between the plant URIs, facilitating data integration and information retrieval in cases where data is combined from heterogeneous sources. The plant name ontology helps harmonizing the terminology, which in turn enhances communication between various users. Application developers can utilize the ontology by using the plant name URIs for unambiguous referencing to plant species.

2http://www.vanamo.fi

40 Results

The applicability of the TaxMeOn model for its most complex use case, the management of individual scientific name changes based on biological research, has been demonstrated with a test dataset representing changes in the taxonomic classification of Afro-tropical family in Publication VI. The family has gone through numerous taxonomic treat- ments. For example, the position of the species Pterotarsus historio in the taxonomic classification has changed 22 times and at least eight taxonomic concepts are associated to the genus Pterotarsus. The TaxMeOn repre- sentation of the dataset encompasses the different conceptions of a taxon (e.g., Pterotarsus), the temporal order of the changes, and the references to scientific publications whose results justify these changes. Such a detailed information source provides a unified view of a complex taxon, which can be beneficial even to the researchers of biology, as the details of taxa have traditionally been scattered across the original publications, and piecing them together can be difficult and time-consuming. The detailed data can be further linked to other datasets with less taxonomic information, such as species checklists, which provides their users with more precise information.

3.4 Summary

To summarize the results, the research questions are re-visited in this section.

1. How can lightweight ontologies be published on the Semantic Web so that they can be utilized in content indexing and information retrieval tasks?

To faciliate the use of ontologies, they should be published as ontology services to fulfill the needs of different user groups, supporting their work- flows during the phases of the life cycle of an ontology, e.g., in content indexing and information retrieval. For the cost-efficient reuse of the ontologies, user interface components and APIs can be used to integrate ontology-based functionalities into applications. The common function- alities for ontology use include concept search, browsing, and selection. W3C’s SKOS data model can be used for harmonizing different legacy thesauri, which facilitates their publication via shared mechanisms in on- tology services. Ontologies of different kinds might benefit from differing user interfaces, e.g., a thesaurus can be visualized as a tree, whereas a

41 Results geographical ontology might employ a map view.

2. How can a collection of independent or interconnected ontologies—in dif- ferent formats and repositories—be published and utilized using shared user interfaces and APIs?

A set of interconnected ontologies can be combined into an ontology cloud that can be made accessible to end users via an ontology service as a single representation encompassing the different domains of the member ontologies. For building such an ontology cloud, ontology development and management processes need to take into account the semantic depen- dencies between the individual ontologies. The ontology development can be streamlined by formulating practical ontology design principles that guide the work and ensure the consistency of the cloud. The use of multi- ple ontology repositories simultaneously can be accomplished by using a shared upper data model for the ontologies, e.g., SKOS, and providing a shared API for accessing the ontologies. The approach facilitates, e.g., the building of aggregated ontology directories and global search on a network of ontology repositories.

3. How can richer knowledge organization systems, such as biological nomenclatures and classifications, be managed as an ontology and pub- lished using an ontology service?

Designing an ontology model for the specific needs of the domain, and mapping it to a harmonizing data model, e.g. SKOS, allows the publication of the ontology in shared ontology services. This way, the ontology can be accessed with general ontology user interfaces and APIs, without losing the detailed representation of the information. In the case study of biological names and classifications, the main modeling solutions in the TaxMeOn model are the separation of the taxonomic concepts and names, represen- tation of the changes and their temporal order, and mappings between the different conceptions of the taxonomic concepts. The management and publication workflow of the ontology can be implemented using a general RDF editor and ontology service.

42 4. Discussion

Traditional, comparative evaluation of the models, processes, and tools de- veloped in this thesis is difficult, as the systems provide novel solutions for the problems described in Chapter 1. In particular, evaluation of Semantic Web applications is difficult as the usefulness and usability of the systems depend on multiple factors: the quality of the heterogeneous source data used, the underlying search and inference software, and the user inter- face [265]. In more mature fields of computer science, such as in relational databases or text-based information retrieval, established data models and search algorithms are often available to be used as building blocks for cre- ating new methods. In Semantic Web, such existing components are rare, and they usually have to be designed for every application. The state adds complexity to the development process and requires a level of maturity from the system to be properly evaluated. Unless all the components are of high quality, the system is not useful for the user. As the evaluation method in this thesis, the extensive application to prac- tice has been used as a proof of concept with the developed artifacts being adjusted based on real-life experiences in accordance with the principles of action research. Burstein and Gregor [42] have proposed criteria for evaluating systems development research, covering the following aspects: 1) theoretical and practical significance, 2) internal validity, 3) external validity, 4) objectivity, and 5) reliability. In the following, the research of this thesis is evaluated according to these criteria.

43 Discussion

4.1 Theoretical Implications

4.1.1 Ontology Services (RQ1)

In comparison to the earlier ontology server research presented in Chapter 2, the main contribution of the ONKI ontology service model is the tight in- tegration and support for the runtime use of ontologies, focusing on the content indexing and information retrieval use cases. Many previous efforts have been focusing on development and editing capabilities [20, 226, 43, 50, 231, 21, 213, 69, 49, 75, 92] of ontology servers, or merely publishing ontologies via ontology directories [235, 33, 113, 168, 122] or ontology browsers [40, 230] without providing modular components or web services that can be integrated into external applications. The developed system is domain-agnostic, general solution for publishing and using on- tologies, and thus is not restricted to a single KOS or domain, as opposed to several other ontology servers. Compared with general semantic web search engines [67, 59, 198], the ONKI system provides focused support for ontology-based tasks, e.g., by displaying concept hierarchies. Ontology search engines [41, 217, 254], on the other hand, are tools suited especially for finding suitable ontologies, but not for using individual ontologies, e.g., in content indexing. ONKI widget for integrating concept search and selection functionalities into external applications is similar to the widget models and implemen- tations of the OCLC Terminology Services and BioPortal. The ONKI approach is more general than the OCLC widget, as ONKI widget can be integrated directly into the user interface of the application, and is not used as a separate browser toolbar. ONKI widget aims to provide a streamlined user experience, where the ontology-based functionalities are served with as little interference with the original user interface as possible. ONKI widget [268] was published earlier than the BioPortal widget [275], to the best of the author’s knowledge1. The autocompletion component by Hildebrand et al. for general RDF repositories is based on similar ideas as ONKI widget. However, ONKI widget is focused on ontology-based interactions and is packaged as a ready-to-use service. One of the novelties of ONKI widget is the combination of the autocompletion

1The history of the NCBO Widgets (BioPortal) documentation page https://www.bioontology.org/wiki/index.php/NCBO_Widgets dates to 12 May 2009.

44 Discussion search and ontology browsing mechanism. Thus, the user can start the concept search task by typing in a query term, select a matching concept, and further refine the selection by browsing the ontology, e.g., to find more specific concepts based on the concept hierarchy. The system also sup- ports semantic query expansion in similar vein as piloted in the FACET Web demonstrator and STAR project widgets, but has been made publicly available as a widget that can be integrated into external applications. The HTTP and SOAP API of the ONKI service provide high level ab- straction access to ontologies with a compact API specification, focusing on supporting concrete use cases of content indexing and information retrieval. The APIs are web-based, meaning they can be utilized in dis- tributed systems, promoting loosely coupled services and complying with the resource- [78] and service-oriented [70] architectures. The use of the APIs is not tied to a single ontology modeling or programming language, as opposed to implementations such as SKOS API and OWL API, or general RDF libraries, such as Jena and RDFLib. Compared with the general RDF query language SPARQL, ONKI is focused on providing ontology-based functionalities on a more abstract level. The common functionalities of an ontology server identified in this study based on the analysis of the user requirements of KOS—the concept search, browsing, and selection—are supported by previous literature [60, 108, 18, 95]. Also, the decision of using the SKOS vocabulary as the harmonizing model for publishing KOSs is reaffirmed by existing research [261, 237, 54, 50].

4.1.2 Publishing Multiple Ontologies (RQ2)

The ontology cloud model presented in this thesis is based on the idea of expressing multiple interlinked ontologies as a single coherent system, published in an ontology service. Previous research on the inter-ontology support in ontology servers has been focused on the creation and repre- sentation of concept mappings between individual ontologies [196, 172, 224, 264, 156, 190], and not providing them as a shared, easy-to-use, cross- domain ontology for use cases such as content indexing. The model is supported by previous research on upper ontologies [184, 170, 194] and ontology modularization [243, 7], where the ontology content is divided into subsets based on the generality and domain of the concepts. The presented principles and methods for the creation and management of the ontology cloud complement the previous guidelines for ontology

45 Discussion interoperability [235] and general ontology design principles [99, 260, 101, 191, 273, 89], by emphasizing the importance of the consistency of the concept hierarchy. The proposed methods and tools for identifying the overlappings of the participating ontologies and propagating the changes of the upper ontology to the domain ontologies aim to ensure that the ontology cloud is valid, up-to-date, and easy to use. The cloud model is based on utilizing existing legacy thesauri, and as such the system maintains the backward compatibility with existing annotations, providing a cost-efficient way to publish legacy data as Linked Data. The NOR approach of accessing multiple ontology repositories simulta- neously is based on a distributed architecture, whereas many previous ontology server models are centralized services. The system is not tied to a single ontology server implementation, but is based on a shared API instead. The NOR API is comprised of access methods to ontologies and concepts on a high abstraction level, with a focus on the concept search and representation of concept information and ontology metadata. The API is not based on a specific ontology language, as opposed to lower level APIs, such as SKOS API and OWL API. On other hand, in contrast to general knowledge and agent communication languages, such as OKBC, FIPA-ACL, KQML, and S-APL, NOR API is focused on practical use cases of content indexing and information retrieval. To avoid the building of extraneous wrappers, which are used in many federated ontology access systems [8, 90], NOR is designed to be lightweight and simple in order to be easy to implement in ontology servers. The approach of using a highly abstracted API and harmonizing metadata model is similar to the more recent approaches of Ontohub and JSKOS-API, of which the latter uses the same SKOS data model and basic methods of entity search and lookup as NOR API. The ontology metadata used in NOR utilizes the existing metadata models VoID [13], Dublin Core, and FOAF [38], complements them by adding information of the NOR endpoint address, and can be extended with other ontology and dataset description vocabularies, such as Ontology Metadata Vocabulary (OMV) [109] and Data Catalog Vocabulary (DCAT) [174].

4.1.3 Complex Knowledge Organization Systems (RQ3)

The developed TaxMeOn model for managing biological nomenclatures and classifications is an example of how a rich KOS can be modeled and maintained as an ontology and published in an ontology service. Compared

46 Discussion with traditional species databases that aim to aggregate taxon names from various sources and harmonize their usage [223, 207, 225, 76, 94, 215, 64], TaxMeOn provides an explicit data model that is a general solution for managing heterogeneous biological name collections, and not tied to a single database system. As the model is based on Linked Data technologies, the data model can be extended in a flexible way and integrated with other data sources. The use of URIs as global identifiers in TaxMeOn is supported by the previous research which has emphasized the need for persistent identifiers when referring to taxa [25, 200, 209, 225, 205, 219], either in the form of a taxon name combined with a literature reference or a technical string. Compared with previous ontology models on taxonomic information [228, 85, 86, 48], TaxMeOn is focused on practical name management of species checklists, research results, and other nomenclatural collections. The model supports the management of parallel classifications and nomenclat- ural conceptions of taxa in a semantically rich way, whereas some of the previous research have concentrated on modeling a single, unchangeable classification [228, 76]. Many Linked Data publishing projects of biological data and biodiversity data models [61, 251, 125, 220] are not focused on semantic rigor and therefore do not promote the machine processability of the contents optimally.

4.1.4 Summary

The theoretical implications of the methods and tools presented in this thesis are summarized in Table 4.1, by re-visiting the objectives of the ontology services defined in Section 1.2.

4.2 Practical Implications

4.2.1 Ontology Services (RQ1)

ONKI service provides out-of-the-box support for publishing and utilizing SKOS vocabularies and other lightweight ontologies in, e.g., content index- ing, without needing to implement application specific user interfaces for end users. The system caters for many common, sharable tasks in ontology- based applications related to, e.g., concept finding, browsing, selecting, and query expansion. Lots of work and costs can be saved by implementing

47 Discussion

Objective Methodological and technological solutions Ontology publication channel General publication channel for various ontologies, created by different parties, including user-uploaded ontologies and ones fetched from external sources by automated processes. Heterogeneous ontologies Support for ontologies in different RDF-based formats, not tied to a single domain, modeling language, or a publisher. Tools for metadata creation Widgets and APIs that can be integrated into applications to support ontology- based cataloging practices. Support for distributed content creation Public living lab ontology service that can be used by a heterogeneous network of memory organisations for creating interoperable metadata based on shared ontologies. Facilitate search tasks The widgets and APIs support information retrieval by providing query ex- pansion and cross-language search facilities. Multiple ontologies and repositories Publication of interlinked ontologies as a coherent, cross-domain cloud, and ability to perform federated search and uniform access to multiple ontology repositories based on a harmonized data model and shared API. Programmatic access HTTP and SOAP APIs for common ontology-based tasks, including concept search and lookup. Evaluation by applying into practice The feasibility of the developed ontology services has been demonstrated by extensive piloting in diverse use cases. Promote complex KOSs Rich knowledge organization systems can be mapped to a harmonizing data model, such as SKOS, and published in shared ontology services. The man- agement of such a KOS can be realized using a general RDF editor.

Table 4.1. The objectives of the ontology services and the corresponding solutions pre- sented in this thesis. such functionalities in standard ways and providing them for production use as ready-to-use services. In this way, the use patterns of utilizing vocab- ularies in the user interface can be harmonized, which makes the systems easier to learn and use, as there is no need for vocabulary-specific access mechanisms. The ONKI service provides simple, yet powerful APIs and an autocompletion widget for content indexing and query expansion. The services can be used not only in ontology-based applications, but in legacy systems, which can utilize the ontologies in a similar way as traditional thesauri. ONKI has been run as a living lab service since 2008 and has acted with the KOKO ontology cloud as the backbone of the Finnish national ontology infrastructure [136], which aims to enhance the interoperability of the collections of museums, libraries, archives, companies, and other organizations. Also, several international ontologies, such as Iconclass2, Medical Subject Headings (MeSH)3, the United Nations Standard Products and Services Code (UNSPSC)4, and Integrated Public Sector Vocabulary

2http://www.iconclass.nl 3http://www.nlm.nih.gov/mesh/ 4http://www.unspsc.org

48 Discussion

(IPSV)5, have been published in ONKI to facilitate their use in applications. The ONKI service has been used in the distributed ontology-based content creation approach of several semantic portals, such as CultureSampo, HealthFinland [247], and BookSampo [177]. The ontology infrastructure has gained maturity in Finland, and through technology transfer, ONKI’s production level successor, Finto service has been run by the National Library of Finland since 2014 [249].

4.2.2 Publishing Multiple Ontologies (RQ2)

The design principles and tools presented in this thesis for building and managing a cloud of interlinked ontologies have been used to build the cross-domain KOKO cloud of Finnish ontologies. The ontologies are based on existing, established thesauri that have been used in various organi- zations for describing heterogeneous contents. The users of the legacy thesauri have been shifting to use the KOKO cloud, which facilitates the cross-domain interoperability of their datasets, as the novelties of KOKO include the mappings between the individual domain ontologies and the semantic consistency of the concept hierarchies. The maintenance and fur- ther development of the KOKO cloud have been transferred to the National Library of Finland, where it is managed by a network of domain ontology developers using the ontology cloud design principles and tools that are being further developed by the library. The NOR approach of accessing multiple ontology repositories has been used as the internal architecture of the ONKI service, where multiple ontology backend servers are accessed and their information is aggregated by the frontend server. The system enables global search on the distributed ontology network, facilitating the comparison of different ontologies and using multiple ontologies simultaneously, e.g., in content indexing. By publishing ontologies using a shared API and metadata format, the users can access the ontologies using common tools and interfaces, making it easier for them to start using new ontologies and ontology repositories, and integrating them into external applications. The workflow of the users is streamlined as they do not have to access the ontology repositories separately and be familiar with the repository-specific user interfaces, modeling solutions, and other features. For the ontology publishers, NOR aims to increase the visibility of the ontologies as it is easier to incorporate

5http://id.esd.org.uk/IPSV

49 Discussion them into services that aggregate ontologies from different sources, such as ontology directories. The approach is applicable also to other kinds of data sources in addition to ontology repositories.

4.2.3 Complex Knowledge Organization Systems (RQ3)

The developed TaxMeOn model is a practical solution for managing various kinds of biological name collections. Its design principles include using terminology that is established in biology, focusing only on taxonomic information, and supporting data of various levels of granularity and of alternative views, in order to be simple, yet flexible to use. Accompanying the data model, the research presented in this thesis has contributed by providing a collaborative ontology maintenance and publication workflow utilizing existing tools, such as SAHA metadata editor and ONKI ontology service. By following the approach, it is straightforward and cost-efficient to develop and publish new ontologies for public use. The use of HTTP URIs as identifiers instead of LSIDs that are used in many previous taxonomic databases [148, 51, 55, 225, 215, 64] simplifies the publishing process and use of the taxonomic nomenclature. The system relies on standard resolving and locating mechanisms of the web infrastructure, without the need to implement a specialized LSID resolver. As a proof of concept of the ontology model, several species checklists of the Finnish Museum of Natural History have been converted into TaxMeOn ontologies and published in the ONKI service. Different stake- holders, such as environmental authorities or biodiversity researchers, can use the system for cataloging, finding, and integrating information from heterogeneous sources, enabling the use of unambiguous taxon references. The ability to link scientific and vernacular names together is useful espe- cially in the citizen science context and information retrieval by laymen as non-professionals might not be familiar with scientific nomenclature.

4.2.4 Summary

The practical implications of the methods and tools presented in this thesis are summarized in Table 4.2, by discussing how the ontology services and models have been applied in the case studies.

50 Discussion

Feature Application in a case study Creation of interoperable metadata for memory organizations and semantic portals Shared ontologies Semantic interoperability of heterogeneous data sources by creating ontology- based metadata. Support for legacy data By basing the ontologies on existing thesauri backwards compatibility to pre- viously created contents is achieved. Tools for content indexing Support for common cataloging workflows in a cost-effective way, minimizing manual work. Building the Finnish national ontology infrastructure National level ontology services Free public services support the creation of standardized metadata in a shared way by governmental agencies, companies, and other organizations. Formation of the KOKO cloud A single interlinked representation of a set of domain ontologies makes the cross-domain data integration possible, acting as semantic glue. The redun- dant work of ontology developers can be eliminated as the overlapping parts of individual ontologies are identified. Management process of the KOKO cloud The domain ontologies are kept up-to-date regarding the changes of the gen- eral upper ontology, ensuring the validity and consistency of the ontology cloud. Harmonized data model and API Multiple ontologies, possibly originating from different repositories can be used simultaneously with shared tools in ontology-based workflows, e.g., in content indexing. Management and publication of biological name collections Support for diverse name resources The data model can be used for various kinds of name collections, facilitating their interoperability and linking. Temporal management The changes and different interpretations of the names and classifications can be tracked and controlled, supporting the scientific processes of taxonomy. Extendability and external linking The data model can be expanded to support new use cases and linked to exter- nal data sources providing additional information. Tools for complete workflow The processes for creating, managing, publishing, and using name collections are supported by general RDF editors, ontology services, and the web infras- tructure.

Table 4.2. The features of the ontology services and models presented in this thesis and their application in the case studies.

51 Discussion

4.3 Reliability and Validity

The reliability refers to the consistency of the research process and its stability over time and across researchers and methods. The research questions and objectives of this study are stated in Chapter 1, and the research questions are revisited when the results of the study are presented in Chapter 3. The developed models and prototype systems are presented explicitly in Chapter 3, and discussed in more details in the individual publications that are included in the thesis. The objectivity of the research is ensured by describing the research methods, the ontologies used in the study, and the organizations involved. The author of the thesis has no competing interests regarding the research presented and does not recognize personal biases that might affect the process. The internal validity concerns the achievement of the stated objectives of the research and requirements of the developed systems, the alternative methods, and the limitations of the research. The models and systems developed meet the objectives presented in Chapter 1 as discussed in Chapter 3 and Sections 4.1 and 4.2, as evidenced by applying them in real-world use cases. The ONKI service has been used for publishing several ontologies, integrating them into external applications for creating interoperable metadata of distributed collections of museums and other organizations, and run as a living lab for supporting the users, including content indexers, information searchers, ontology developers, and applica- tion developers. The developed ontology models and principles have been used for the creation and management of the ontology cloud KOKO and sev- eral biological nomenclatures. The system demonstrates its applicability to the simultaneous usage of multiple ontologies and ontology reposito- ries. The models and software implementations have been compared with relevant related work. The usability of the user interface of the ONKI system was evaluated and improved by Rami Alatalo [12]. Based on the conducted user survey and interviews focusing on professional content indexers, and heuristic evaluation of the ONKI2 user interface, a new version of the user interface was built. The resulting user interface ONKI3 gained a mean usability score of 48/100 in the System Usability Scale (SUS) [39], based on a user survey. Considering the varying use cases and needs of the target users, the score can be viewed as decent. In September 2011, a user inquiry was conducted for the users of the

52 Discussion

ONKI service using an online questionnaire, consisting of Likert scale and open-ended questions. The inquiry received 107 responses from Finnish li- braries, museums, archives, research institutes, and government agencies. The preliminary results of the inquiry were presented in a FinnONTO project board meeting [259]. Below, key findings regarding the functional- ities of ONKI are summarized, based on the respondents who answered that they use ONKI daily or weekly.

58 % of the users regard the functionalities of ONKI as good or excellent • for their needs, whereas 40 % of them think that the functionalities are poor or satisfactory.

57 % of the users who use the concept search continuously or occasion- • ally regard the functionalities of ONKI as good or excellent for their needs, whereas 38 % of them think that the functionalities are poor or satisfactory.

45 % of the users who the use the ontology browsing continuously or • occasionally regard the functionalities of ONKI as good or excellent for their needs, whereas 36 % of them think that the functionalities are poor or satisfactory.

55 % of the users think that ONKI is as good or better than the VESA • Web Thesaurus Service6, as opposed to 28 % who think that ONKI is worse than VESA.

When comparing ONKI with VESA, it should be noted that the results give insight on the usefulness of the two systems in relation to each other, not on the absolute quality of the systems. Based on the responses, the most used functionality of ONKI is the concept search. Overall, the ONKI service was regarded as important. While the ONKI service can be considered an abstract concept, the concrete implementation introduces some limitations. For example, the KOSs that are to be published in the system need to be serialized in the RDF format. RDF is a non-proprietary format, and converting data into it is straightforward in many cases, but some complex data models might require careful and laborious design work. The proposed content

6A majority of the Finnish libraries, museums, and other memory organizations have used VESA previously to access established Finnish thesauri maintained by the National Library of Finland. VESA has been since replaced by the Finto service.

53 Discussion indexing methods are mainly manual, meaning they require expensive human efforts, though the methods streamline existing manual indexing processes and thus aim to save costs. In addition to manual indexing, the ontologies published in the ONKI system can be used via APIs, which can be used as components in automatic annotation systems. The external validity refers to the generalizability of the research results and congruency with prior theory. The applicability of the developed meth- ods to other settings has been ensured by demonstrating representative, diverse use cases in the proof-of-concept systems. The ONKI service has been used as a publishing platform for national and international thesauri and ontologies, covering different domains, ontology modeling languages, owners, and users. In addition to lightweight ontologies, the ONKI APIs and autocompletion widget have been used for accessing more complex geo- graphical ontologies and biological classifications, with use cases in content indexing and query expansion. The tool that was developed for detecting overlappings between ontologies has been used not only in building the KOKO cloud, but also for generating the first version of a combined ontol- ogy of legal concepts, based on three vocabularies in the field of the Finnish legislation [87]. The NOR approach has been tested in three different use cases, including a demonstration involving an external ontology repository. The TaxMeOn model for biological nomenclatures has been applied into three distinct types of name collections, each with specific requirements. The systems presented in the thesis have been designed by taking the previous research into account. The key functionalities of the ONKI service and NOR API are supported by the previous work on ontology servers, as the user requirements and common tasks in ontology-based workflows identified in this thesis are similar to findings by other researchers. The principles of building the ontology cloud complement existing, general ontology design patterns. The TaxMeOn model aims to be a simple, yet semantically rich solution, taking inspiration both from more theoretical modeling approaches and practical publishing efforts of the previous work.

4.4 Recommendations for Further Research

The research presented in this thesis paves way for future work on several areas. Deeper understanding about the roles and requirements of different ontology user groups and the suitability of the developed interfaces and tools for them would require more thorough user evaluation. The work

54 Discussion includes testing of the user interfaces of the ONKI service and the suit- ability of the APIs for various use cases. The use of ontology services in information retrieval needs to be further studied, as more information is required for the optimal selection of the relations used for expanding the queries, and for improving the user interactions in the ONKI widget. Such insight can be gained by conducting a formal evaluation covering alternative options. Concerning the publication of ontologies, it would be beneficial to make the different versions of an ontology available for the users, not only the latest version as in the ONKI service. By doing so, users would be able to compare the different versions of ontologies, analyze their changes, change frequencies, etc. Access to historical versions of ontologies is needed especially in situations where the user has content that has been indexed with an older version of the ontology and wishes to make it compatible with the current version. Implementing such functionalities would require further research on ontology versioning and visualization methods and their application to concrete use cases. Regarding the management of a cloud of interlinked ontologies, the processes and tooling for tracking and communicating the changes of the upper ontology to domain ontologies need to be refined. In addition, a more formal process for the development and overall coordination of the ontology cloud would further guide the ontology developers and streamline their work. Research on methods for validating the consistency of the ontology cloud is needed in order to ensure the high quality of the cloud, e.g., to avoid logical errors when reasoning over the class hierarchies. The work of developing the management processes of the ontology cloud is currently underway in the National Library of Finland. The NOR API that is used for accessing multiple ontology repositories simultaneously could be extended to allow more functionalities. In order to keep the basic API as simple and cost-effective to implement as possible, the extensions could be defined as optional modules that are not required to be implemented in every ontology repository participating in the NOR network. The possibilities for the extensions include more fine-grained ways to restrict the concept search, support for mappings between on- tologies, and ranking of the search results, e.g., based on the ontology or repository they are included in. To facilitate the development and maintenance of biological nomencla- tures using the TaxMeOn model, user-friendly tools are needed, as the

55 Discussion existing, generic RDF editors do not support efficient management of such complex ontologies. The user interface of the tool should hide the complexi- ties of the data model and present the data in an intuitive and established way to biologists. To this end, research on developing a configurable RDF editor that can be adjusted to different data models would be beneficial. The TaxMeOn model could be extended with new structures to enable a more fine-grained modeling of hybrid taxa and taking into account the distinct features of zoological and botanical nomenclature. The value of the species checklists and name collections published in the ONKI service could be added by developing or using mapping tools to generate links from taxon names to complementing datasets, such as DBpedia.

56 Bibliography

[1] ISO 5963: Documentation – methods for examining documents, determin- ing their subjects, and selecting indexing terms. ISO standard, Interna- tional Organization for Standardization, 1985.

[2] SFS 5471: Guidelines for the establishment and maintenance of Finnish language thesauri. SFS standard, Finnish Standards Association, 1988.

[3] FIPA agent communication language specifications. Tech. rep., Foundation for Intelligent Physical Agents, 2001. http://www.fipa.org/repository/ aclspecs.html.

[4] FIPA ontology service specification. Experimental specification, Founda- tion for Intelligent Physical Agents, 2001. http://www.fipa.org/specs/ fipa00086/XC00086D.html.

[5] Global Strategy for Plant Conservation: Technical Rationale, Justification for Updating and Suggested Milestones and Indicators. Meeting report UNEP/CBD/COP/10/19, Convention on Biological Diversity, 2010.

[6] ISO 25964-1: Information and documentation – thesauri and interoperabil- ity with other vocabularies – part 1: Thesauri for information retrieval. ISO standard, International Organization for Standardization, 2011.

[7] ABBÈS,S.B.,SCHEUERMANN,A.,MEILENDER, T., ANDD’AQUIN,M. Characterizing modular ontologies. In Proceedings of the 6th International Workshop on Modular Ontologies, the 7th International Conference on For- mal Ontology in I nformation Systems (FOIS 2012), Graz, Austria, July 24 (2012), T. Schneider and D. Walther, Eds., CEUR Workshop Proceedings.

[8] ADAMUSIAK, T., BURDETT, T., KURBATOVA, N., VAN DER VELDE, K. J., ABEYGUNAWARDENA, N., ANTONAKAKI,D.,KAPUSHESKY,M.,PARKIN- SON,H., AND SWERTZ,M.A. OntoCAT – simple ontology search and integration in Java, R and REST/JavaScript. BMC Bioinformatics 12 (2011).

[9] AHMAD, M. N., AND COLOMB,R.M. Managing ontologies: A comparative study of ontology servers. In Database Technologies 2007: Proceedings of the Eighteenth Australasian Database Conference (ADC2007), Ballarat, Victoria, Australia, January 29 - February 2 (2007), J. Bailey and A. Fekete, Eds., Australian Computer Society, pp. 13–22.

57 Bibliography

[10] AITCHISON, J., AND DEXTRE CLARKE,S. The Thesaurus: A Histori- cal Viewpoint, with a Look to the Future. Cataloging and Classification Quarterly 37, 3/4 (2004), 5–21.

[11] AITCHISON, J., GILCHRIST,A., AND BAWDEN,D. Thesaurus Construction and Use: A Practical Manual, 4 ed. Europa Publications, London, 2000.

[12] ALATALO,R. Ontologiapalvelun käyttöliittymän jatkokehitys (Further development of an ontology service user interface), Dec. 2010. Bachelor’s thesis. Aalto University, Espoo, Finland.

[13] ALEXANDER,K.,CYGANIAK,R.,HAUSENBLAS,M., AND ZHAO, J. De- scribing linked datasets with the VoID vocabulary. W3C interest group note, World Wide Web Consortium, 03 March 2011. http://www.w3.org/ TR/2011/NOTE-void-20110303/.

[14] AMIN,A.,HILDEBRAND,M., VAN OSSENBRUGGEN, J., EVERS, V., AND HARDMAN,L. List, group or menu: organizing suggestions in autocom- pletion interfaces. Tech. Rep. INS-E0901, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, Netherlands, 2009.

[15] ANDERSON, J. D. Guidelines for indexes and related information retrieval devices. Tech. rep., National Information Standards Organization (NISO), 1997.

[16] ANDREWS, P., ZAIHRAYEU,I., AND PANE, J. A Classification of Semantic Annotation Systems. Semantic Web 3, 3 (2012), 223–248.

[17] BACA,M., Ed. Introduction to Metadata, 3 ed. Getty Publications, Los Angeles, California, 2016. http://www.getty.edu/publications/ intrometadata.

[18] BACLAWSKI,K., AND SCHNEIDER, T. The open ontology repository initia- tive: Requirements and research challenges. In Proceedings of the Work- shop on Collaborative Construction, Management and Linking of Structured Knowledge (CK2009), collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, October 25 (2009), T. Tu- dorache, G. Correndo, N. Noy, H. Alani, and M. Greaves, Eds., CEUR Workshop Proceedings.

[19] BAKER, T., BECHHOFER,S.,ISAAC,A.,MILES,A.,SCHREIBER,G., AND SUMMERS,E. Key choices in the design of Simple Knowledge Organization System (SKOS). Journal of Web Semantics 20 (2013), 35–49.

[20] BANDHOLTZ, T., SCHULTE-COERNE, T., GLASER,R.,FOCK, J., AND KELLER, T. iQvoc – Open Source SKOS(XL) Maintenance and Publishing Tool. In Proceedings of the Sixth Workshop on Scripting and Development for the Semantic Web, co-located with the European Semantic Web Con- ference 2010 (ESWC 2010), Crete, Greece, May 31 (2010), G. A. Grimnes, S. Auer, and G. T. Williams, Eds., CEUR Workshop Proceedings.

[21] BASCA,C.,CORLOSQUET,S.,CYGANIAK,R.,FERNÁNDEZ,S., AND SCHANDL, T. Neologism: Easy vocabulary publishing. In Proceedings of the 4th Workshop on Scripting for the Semantic Web, Tenerife, Spain, June 02 (2008), C. Bizer, S. Auer, G. A. Grimnes, and T. Heath, Eds., CEUR Workshop Proceedings.

58 Bibliography

[22] BASKAUF, S. J., AND WEBB,C.O. Darwin-SW: Darwin Core-based terms for expressing biodiversity data as RDF. Semantic Web 7, 6 (2016), 645–667.

[23] BASKERVILLE,R.L. Distinguishing action research from participative case studies. Journal of Systems and Information Technology 1, 1 (1997), 25–45.

[24] BASKERVILLE,R.L., AND WOOD-HARPER, A. T. A critical perspective on action research as a method for information systems research. Journal of Information Technology 11, 3 (1996), 235–246.

[25] BERENDSOHN, W. G. The concept of "potential taxa" in databases. Taxon 44 (1995), 207–212.

[26] BERENDSOHN, W. G. A taxonomic information model for botanical databases: The IOPI model. Taxon 46, 2 (1997), 283–309.

[27] BERNERS-LEE, T. Linked data, W3C design issues. http://www.w3.org/ DesignIssues/LinkedData.html, 2006.

[28] BERNERS-LEE, T., HENDLER, J., AND LASSILA,O. The semantic web. Scientific American 284, 5 (2001), 34–43.

[29] BHOGAL, J., MACFARLANE,A., AND SMITH, P. A review of ontology based query expansion. Information Processing and Management 43, 4 (2007), 866–886.

[30] BINDING,C., AND TUDHOPE,D. KOS at your Service: Programmatic Access to Knowledge Organisation Systems. Journal of Digital Information 4, 4 (2004).

[31] BINDING,C., AND TUDHOPE,D. Terminology Web Services. Knowledge Organization 37, 4 (2010), 287–298.

[32] BINDING,C., AND TUDHOPE,D. Improving interoperability using vocabu- lary linked data. International Journal on Digital Libraries 17, 1 (2016), 5–21.

[33] BLOMQVIST,E.,GANGEMI,A., AND PRESUTTI, V. Experiments on pattern- based ontology design. In Proceedings of the fifth international conference on Knowledge capture (K-CAP ’09), Redondo Beach, California, USA, Septem- ber 1-4 (2009), ACM, pp. 41–48.

[34] BODNER,R.C., AND SONG, F. Knowledge-Based Approaches to Query Ex- pansion in Information Retrieval. In Advances in Artifical Intelligence: 11th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI ’96 Toronto, Ontario, Canada, May 21–24, 1996 Proceedings (1996), G. McCalla, Ed., Springer-Verlag, pp. 146–158.

[35] BORST, W. N. Construction of engineering ontologies for knowledge sharing and reuse. PhD thesis, University of Twente, Netherlands, Sept. 1997.

[36] BOYLE,B.,HOPKINS, N., LU,Z.,RAYGOZA GARAY, J. A., MOZZHERIN,D., REES, T., MATASCI, N., NARRO,M.L.,PIEL, W. H., MCKAY, S. J., LOWRY, S.,FREELAND,C.,PEET,R.K., AND ENQUIST, B. J. The taxonomic name resolution service: an online tool for automated standardization of plant names. BMC Bioinformatics 14 (2013).

59 Bibliography

[37] BRAUN,S.,SCHMIDT,A.,WALTER,A., AND ZACHARIAS, V. The Ontology Maturing Approach for Collaborative and Work Integrated Ontology Devel- opment: Evaluation Results and Future Directions. In Proceedings of the First International Workshop on Emergent Semantics and Ontology Evolu- tion (ESOE 2007), co-located with ISWC 2007 + ASWC 2007, Busan, Korea, November 12 (2007), L. L. Chen, P. Cudré-Mauroux, P. Haase, A. Hotho, and E. Ong, Eds., CEUR Workshop Proceedings, pp. 5–18.

[38] BRICKLEY,D., AND MILLER,L. FOAF vocabulary specification 0.99. Namespace document, 14 January 2014. http://xmlns.com/foaf/spec/ 20140114.html.

[39] BROOKE, J. SUS - A quick and dirty usability scale. In Usability evaluation in industry, P. W. Jordan, B. Thomas, I. L. McClelland, and B. Weerdmeester, Eds. Taylor & Francis, 1996.

[40] BRUGMAN,H.,MALAISÉ, V., AND GAZENDAM,L. A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs. In Proceedings of the fifth international conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, May 22-28 (2006).

[41] BUITELAAR, P., EIGNER, T., AND DECLERCK, T. OntoSelect: A Dynamic Ontology Library with Support for Ontology Selection. In Proceedings of the 3rd International Semantic Web Conference (ISWC2004), Demo Papers, Hiroshima, Japan, November 7-11 (2004).

[42] BURSTEIN, F., AND GREGOR,S. The Systems Development or Engineer- ing Approach to Research in Information Systems: An Action Research Perspective. In Proceedings of 10th Australasian Conference on Informa- tion Systems, Wellington, New Zealand, December 1-3 (1999), B. Hope and P. Yoong, Eds., pp. 122–134.

[43] CARACCIOLO,C.,STELLATO,A.,RAJBAHNDARI,S.,MORSHED,A.,JO- HANSSEN,G.,JAQUES, Y., AND KEIZER, J. Thesaurus maintenance, alignment and publication as linked data: the AGROVOC use case. In- ternational Journal of Metadata, Semantics and Ontologies 7, 1 (2012), 65–75.

[44] CARDILLO,E.,FOLINO,A.,TRUNFIO,R., AND GUARASCI,R. Towards the reuse of standardized thesauri into ontologies. In Proceedings of the 5th International Conference on Ontology and Semantic Web Patterns (WOP’14), Riva del Garda, Italy, October 19 (2014), V. de Boer, A. Gangemi, K. Janow- icz, and A. Lawrynowicz, Eds., CEUR Workshop Proceedings, pp. 26–37.

[45] CARPINETO,C., AND ROMANO,G. A Survey of Automatic Query Expansion in Information Retrieval. ACM Computing Surveys 44, 1 (2012).

[46] CATHRO, W. Metadata: An Overview. In Proceedings of the Standards Aus- tralia Seminar: Matching Discovery and Recovery, Sydney and Melbourne, Australia, August 14-15 (1997).

[47] CHAUDHRI, V. K., FARQUHAR,A.,FIKES,R.,KARP, P. D., AND RICE, J. P. Open knowledge base connectivity 2.0.3. Proposed specification, OKBC working group, April 9, 1998. http://www.ai.sri.com/~okbc/spec/okbc2/ okbc2.html.

60 Bibliography

[48] CHAWUTHAI,R.,TAKEDA,H.,WUWONGSE, V., AND JINBO, U. Presenting and Preserving the Change in Taxonomic Knowledge for Linked Data. Semantic Web 7, 6 (2016), 589–616.

[49] CHUNG,H.,CHOI, J.-M.,YI, J.-H.,HAN, J., AND LEE,E.-S. A Construc- tion of the Adapted Ontology Server in EC. In Intelligent Data Engineering and Automated Learning — IDEAL 2000. Data Mining, Financial Engi- neering, and Intelligent Agents: Second International Conference Shatin, N.T., Hong Kong, China, December 13–15, 2000 Proceedings (2002), K. S. Leung, L.-W. Chan, and H. Meng, Eds., Springer-Verlag, pp. 355–360.

[50] CONWAY,M.,KHOJOYAN,A.,FANA, F., SCUBA, W., CASTINE,M.,MOW- ERY,D.,CHAPMAN, W., AND JUPP,S. Developing a web-based SKOS editor. Journal of Biomedical Semantics 7, 5 (2016).

[51] COSTELLO, M. J., BOUCHET, P., BOXSHALL,G.,FAUCHALD,K.,GORDON, D.,HOEKSEMA, B. W., POORE,G.C.B., VAN SOEST, R. W. M., STÖHR, S.,WALTER, T. C., VANHOORNE,B.,DECOCK, W., AND APPELTANS, W. Global Coordination and Standardisation in Marine Biodiversity through the World Register of Marine Species (WoRMS) and Related Databases. PLoS ONE 8, 1 (2013).

[52] CÔTÉ,R.G.,JONES, P., APWEILER,R., AND HERMJAKOB,H. The On- tology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics 7 (2006).

[53] COWIE, J., AND LEHNERT, W. Information extraction. Communications of the ACM 39, 1 (1996), 80–91.

[54] COX, S. J. D., YU, J., AND RANKINE, T. SISSVoc: A Linked Data API for access to SKOS vocabularies. Semantic Web 7, 1 (2016), 9–24.

[55] CROFT, J., CROSS, N., HINCHCLIFFE,S.,LUGHADHA, E. N., STEVENS, P. F., WEST, J. G., AND WHITBREAD,G. Plant Names for the 21st Century: The International Plant Names Index, a Distributed Data Source of General Accessibility. Taxon 48, 2 (1999), 317–324.

[56] CRYER, P., HYAM,R.,MILLER,C.,NICOLSON, N., TUAMA,E.O.,PAGE, R.,REES, J., RICCARDI,G.,RICHARDS,K., AND WHITE,R. Adoption of persistent identifiers for biodiversity informatics: Recommendations of the GBIF LSID GUID task group, 6. November 2009. Tech. rep., Global Biodiversity Information Facility (GBIF), Copenhagen, Denmark, 2010. version 1.1, last updated 21 Jan 2010.

[57] DAMERON,O.,NOY, N. F., KNUBLAUCH,H., AND MUSEN,M.A. Accessing and manipulating ontologies using web services. In Proceedings of the ISWC 2004 Workshop on Semantic Web Services: Preparing to Meet the World of Business Applications, Hiroshima, Japan, November 8 (2004), D. Martin, R. Lara, and T. Yamaguchi, Eds., CEUR Workshop Proceedings.

[58] D’AQUIN,M., AND LEWEN,H. Cupboard – a place to expose your ontolo- gies to applications and the community. In The Semantic Web: Research and Applications: 6th European Semantic Web Conference, ESWC 2009 Heraklion, Crete, Greece, May 31–June 4, 2009 Proceedings (2009), L. Aroyo, P. Traverso, F. Ciravegna, P. Cimiano, T. Heath, E. Hyvönen, R. Mizoguchi, E. Oren, M. Sabou, and E. Simperl, Eds., Springer-Verlag, pp. 913–918.

61 Bibliography

[59] D’AQUIN,M.,MOTTA,E.,SABOU,M.,ANGELETOU,S.,GRIDINOC,L., LOPEZ, V., AND GUIDI,D. Toward a new generation of semantic web applications. IEEE Intelligent Systems 23, 3 (2008), 20–28.

[60] D’AQUIN,M., AND NOY, N. F. Where to publish and find ontologies? a survey of ontology libraries. Journal of Web Semantics 11 (2012), 96–111.

[61] DARWIN CORE TASK GROUP. Darwin Core. TDWG current standard, Biodiversity Information Standards (TDWG), 2009. http://www.tdwg.org/ standards/450.

[62] DAVISON,R.M.,MARTINSONS,M.G., AND KOCK, N. Principles of Canonical Action Research. Information Systems Journal 14 (2004), 65–86.

[63] DCMIUSAGE BOARD. DCMI metadata terms. DCMI recommenda- tion, Dublin Core Metadata Initiative, 14.6.2012. http://dublincore.org/ documents/2012/06/14/dcmi-terms/.

[64] DE JONG, Y., VERBEEK,M.,MICHELSEN, V., BJØRN, P. D. P., LOS, W., STEEMAN, F., BAILLY, N., BASIRE,C.,CHYLARECKI, P., STLOUKAL,E., HAGEDORN,G.,WETZEL, F. T., GLÖCKLER, F., KROUPA,A.,KORB,G., HOFFMANN,A.,HÄUSER,C.,KOHLBECKER,A.,MÜLLER,A.,GÜNTSCH, A.,STOEV, P., AND PENEV,L. Fauna Europaea - all European species on the web. Biodiversity Data Journal 2 (2014).

[65] DENGLER, J., BERENDSOHN, W. G., BERGMEIER,E.,CHYTRÝ,M.,DANI- HELKA, J., JANSEN, F., KUSBER, W.-H., LANDUCCI, F., MÜLLER,A., PANFILI,E.,SCHAMINÉE, J., VENANZONI,R., ANDVON RAAB-STRAUBE, E. The need for and the requirements of EuroSL, an electronic taxonomic reference list of all European plants. Biodiversity & Ecology 4 (2012), 15–24.

[66] DEXTRE CLARKE,S.G., AND ZENG,M.L. From ISO 2788 to ISO 25964: The evolution of thesaurus standards towards interoperability and data modeling. Information Standards Quarterly 24, 1 (2012), 20–26.

[67] DING,L.,FININ, T., JOSHI,A.,PAN,R.,COST,R.S.,PENG, Y., RED- DIVARI, P., DOSHI, V., AND SACHS, J. Swoogle: a search and metadata engine for the semantic web. In Proceedings of the thirteenth ACM interna- tional conference on Information and knowledge management (CIKM ’04), Washington, D.C., USA, November 8-13 (2004), ACM, pp. 652–659.

[68] DING, Y., AND FENSEL,D. Ontology Library Systems: The key to suc- cessful Ontology Re-use. In Proceedings of the 1st Semantic Web Working Symposium (SWWS’01), Stanford University, California, USA, July 30 - August 1 (2001), I. F. Cruz, S. Decker, J. Euzenat, and D. L. McGuinness, Eds., pp. 93–112.

[69] DOMINGUE, J. Tadzebao and WebOnto: Discussing, browsing, and editing ontologies on the web. In Proceedings of the 11th Workshop on Knowledge Acquisition, Modeling and Management, Banff, Alberta, Canada, April 18-23 (1998).

[70] DRAHEIM,D. The Service-Oriented Metaphor Deciphered. Journal of Computing Science and Engineering 4, 4 (2010), 253–275.

[71] DUVAL,E. Metadata Standards: What, Who & Why. Journal of Universal Computer Science 7, 7 (2001), 591–601.

62 Bibliography

[72] EFTHIMIADIS, E. N. Interactive query expansion: A user-based evaluation in a relevance feedback environment. Journal of the American Society for Information Science and Technology 51, 11 (2000), 989–1003.

[73] EL JERROUDI,Z., AND ZIEGLER, J. iMERGE: interactive ontology merging. In EKAW 2008: 16th International Conference on Knowledge Engineering and Knowledge Management: Knowledge Patterns, Acitrezza, Italy, Septem- ber 29 - October 2, Poster and Demo Proceedings (2008), pp. 52–56.

[74] EUZENAT, J., AND SHVAIKO, P. Ontology Matching. Springer-Verlag, Berlin Heidelberg, 2007.

[75] FARQUHAR,A.,FIKES,R., AND RICE, J. The Ontolingua Server: a tool for collaborative ontology construction. International Journal of Human- Computer Studies 46, 6 (1997), 707–727.

[76] FEDERHEN,S. The NCBI Taxonomy database. Nucleic Acids Research 40, D1 (2012), 136–143.

[77] FEIGENBAUM,L.,HERMAN,I.,HONGSERMEIER, T., NEUMANN,E., AND STEPHENS,S. The semantic web in action. Scientific American 297, 6 (2007), 90–97.

[78] FIELDING, R. T. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, USA, 2000.

[79] FIKES,R., AND FARQUHAR,A. Distributed Repositories of Highly Expres- sive Reusable Ontologies. IEEE Intelligent Systems 14, 2 (1999), 73–79.

[80] FININ, T., WEBER, J., WIEDERHOLD,G.,GENESERETH,M.,FRITZSON, R.,MCKAY,D.,MCGUIRE, J., PELAVIN,R.,SHAPIRO,S., AND BECK,C. Specification of the KQML agent-communication language – plus example agent policies and architectures. Draft specification, DARPA Knowledge Sharing Initiative External Interfaces Working Group, June 15, 1993. http: //www.csee.umbc.edu/csee/research/kqml/papers/kqmlspec.pdf.

[81] FLOURIS,G.,MANAKANATAS,D.,KONDYLAKIS,H.,PLEXOUSAKIS,D., AND ANTONIOU,G. Ontology change: classification and survey. The Knowledge Engineering Review 23, 2 (2008), 117–152.

[82] FLUIT,C.,SABOU,M., AND VAN HARMELEN, F. Supporting User Tasks through Visualisation of Light-weight Ontologies. Handbook on Ontologies in Information Systems (2003), 1–20.

[83] FOX,E.A. Lexical Relations: Enhancing Effectiveness of Information Retrieval Systems. ACM SIGIR Forum 15, 3 (1980), 5–36.

[84] FRANZ, N. M., CHEN,M.,KIANMAJD, P., YU,S.,BOWERS,S.,WEAKLEY, A.S., AND LUDÄSCHER,B. Names Are Not Good Enough: Reasoning over Taxonomic Change in the Andropogon Complex. Semantic Web 7, 6 (2016), 645–667.

[85] FRANZ, N. M., AND PEET,R.K. Towards a language for mapping relation- ships among taxonomic concepts. Systematics and Biodiversity 7, 1 (2009), 5–20.

63 Bibliography

[86] FRANZ, N. M., AND THAU,D. Biological taxonomy and ontology develop- ment: scope and limitations. Biodiversity Informatics 7 (2010), 45–66.

[87] FROSTERUS,M.,TUOMINEN, J., AND HYVÖNEN,E. Facilitating Re-use of Legal Data in Applications—Finnish Law as a Linked Open Data Service. In Legal Knowledge and Information Systems: JURIX 2014: The Twenty- Seventh Annual Conference (2014), R. Hoekstra, Ed., IOS Press, pp. 115– 124.

[88] FU,G.,JONES,C.B., AND ABDELMOTY,A.I. Ontology Based Spatial Query Expansion in Information Retrieval. In On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE: OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2005, Agia Napa, Cyprus, October 31 - November 4, 2005, Proceedings Part II (2005), R. Meers- man and Z. Tari, Eds., Springer-Verlag, pp. 1466–1482.

[89] GANGEMI,A., AND PRESUTTI, V. Ontology Design Patterns. In Handbook on Ontologies, S. Staab and R. Studer, Eds., 2 ed. Springer-Verlag, 2009.

[90] GARCÍA-CASTRO,A.,GARCÍA-CASTRO,L.,VILLAVECES, J. M., CALDERÓN,G., AND HEPP,M. OLS2OWL. A repository management facility. In Proceedings of the 11th International Protégé Conference, Ams- terdam, Netherlands, June 23-26 (2009).

[91] GARSHOL,L.M. Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it all. Journal of Information Science 30, 4 (2004), 378–391.

[92] GENNARI, J. H., OLIVER,D.E.,PRATT, W., RICE, J., AND MUSEN,M.A. A Web-Based Architecture for a Medical Vocabulary Server. In Proceed- ings of the 19th Annual Symposium on Computer Applications in Medical Care, New Orleans, Lousiana, USA, October 28 - November 1 (1995), R. M. Gardner, Ed., American Medical Informatics Association, pp. 275–279.

[93] GIUNCHIGLIA, F., AND ZAIHRAYEU,I. Lightweight ontologies. In Ency- clopedia of Database Systems, L. Liu and M. T. Özsu, Eds. Springer-Verlag, 2009, pp. 1613–1619.

[94] GLOBAL BIODIVERSITY INFORMATION FACILITY (GBIF). ChecklistBank. http://github.com/gbif/checklistbank. Referenced 12 March 2017.

[95] GOLUB,K.,TUDHOPE,D.,ZENG,M.L., AND ŽUMER,M. Terminology Registries for Knowledge Organization Systems: Functionality, Use, and At- tributes. Journal of the Association for Information Science and Technology 65, 9 (2014), 1901–1916.

[96] GREENBERG, J., LOSEE,R.,PÉREZ AGÜERA, J. R., SCHERLE,R.,WHITE, H., AND WILLIS,C. HIVE: Helping Interdisciplinary Vocabulary Engineer- ing. Bulletin of the American Society for Information Science & Technology 37, 4 (2011), 23–26.

[97] GROSS,A.,PRUSKI,C., AND RAHM,E. Evolution of biomedical ontolo- gies and mappings: Overview of recent approaches. Computational and Structural Biotechnology Journal 14 (2016), 333–340.

[98] GRUBER, T. R. A translation approach to portable ontology specification. Knowledge Acquisition 5, 2 (1993), 199–220.

64 Bibliography

[99] GRUBER, T. R. Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies 43, 5-6 (1995), 907–928.

[100] GUARINO, N., OBERLE,D., AND STAAB,S. What Is an Ontology? In Handbook on ontologies, S. Staab and R. Studer, Eds., 2 ed. Springer-Verlag, 2009, pp. 1–17.

[101] GUARINO, N., AND WELTY,C. Evaluating Ontological Decisions with Ontoclean. Communications of the ACM 45, 2 (2002), 61–65.

[102] GULLI,A., AND SIGNORINI,A. The Indexable Web is More than 11.5 Billion Pages. In Proceedings of the WWW ’05 Special interest tracks and posters of the 14th international conference on World Wide Web, Chiba, Japan, May 10-14 (2005), ACM, pp. 902–903.

[103] HALPIN,H.,HAYES, P. J., MCCUSKER, J. P., MCGUINNESS,D.L., AND THOMPSON,H.S. When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data. In The Semantic Web – ISWC 2010: 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I (2010), P. S. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm, Eds., Springer-Verlag, pp. 305–320.

[104] HAMEED,A.,PREECE,A., AND SLEEMAN,D. Ontology reconciliation. In Handbook on ontologies, S. Staab and R. Studer, Eds. Springer-Verlag, 2004, pp. 231–250.

[105] HARDISTY,A.,ROBERTS,D., AND BIODIVERSITY INFORMATICS COMMU- NITY. A decadal view of biodiversity informatics: challenges and priorities. BMC Ecology 13, 16 (2013).

[106] HARPER,C.A., AND TILLETT,B.B. Library of Congress Controlled Vocabularies and Their Application to the Semantic Web. Cataloging & Classification Quarterly 43, 3-4 (2007), 47–68.

[107] HARRIS,S., AND SEABORNE,A. SPARQL 1.1 query language. W3C recommendation, World Wide Web Consortium, 21 March 2013. http: //www.w3.org/TR/2013/REC-sparql11-query-20130321/.

[108] HARTMANN, J., PALMA,R., AND GÓMEZ-PÉREZ,A. Ontology Repositories. In Handbook on Ontologies, S. Staab and R. Studer, Eds., 2 ed. Springer- Verlag, 2009, pp. 551–571.

[109] HARTMANN, J., PALMA,R.,SURE, Y., SUÁREZ-FIGUEROA,M.C.,HAASE, P., GÓMEZ-PÉREZ,A., AND STUDER,R. Ontology metadata vocabulary and applications. In On the Move to Meaningful Internet Systems 2005: OTM 2005 Workshops: OTM Confederated Internationl Workshops and Posters, AWeSOMe, CAMS, GADA, MIOS+INTEROP, ORM, PhDS, SeBGIS, SWWS, and WOSE 2005, Agia Napa, Cyprus, October 31 - November 4, 2005. Proceedi (2005), R. Meersman, Z. Tari, and P. Herrero, Eds., Springer- Verlag, pp. 906–915.

[110] HARTUNG,M.,GROSS,A., AND RAHM,E. COnto-Diff: Generation of com- plex evolution mappings for life science ontologies. Journal of Biomedical Informatics 46, 1 (2013), 15–32.

65 Bibliography

[111] HARTUNG,M.,TERWILLIGER, J., AND RAHM,E. Recent Advances in Schema and Ontology Evolution. Evolution 141, 3 (2011), 149–190.

[112] HEATH, T., AND BIZER,C. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technol- ogy. Morgan & Claypool, Palo Alto, California, 2011.

[113] HEFLIN, J., AND HENDLER, J. Dynamic Ontologies on the Web. In Pro- ceedings of the 17th National Conference on Artificial Intelligence, Austin, Texas, USA, July 30 - August 3 (2000), H. A. Kautz and B. W. Porter, Eds., AAAI Press / MIT Press, pp. 443–449.

[114] HEFLIN, J. D. Towards the Semantic Web: Knowledge Representation in a Dynamic, Distributed Environment. PhD thesis, University of Maryland, USA, 2001.

[115] HERBERT,K.G.,GEHANI, N. H., PIEL, W. H., WANG, J. T. L., AND WU, C.H. BIO-AJAX: an extensible framework for biological data cleaning. ACM SIGMOD Record 33, 2 (2004), 51–57.

[116] HEVNER,A.R.,MARCH, S. T., PARK, J., AND RAM,S. Design Science in Information Systems Research. MIS Quarterly 28, 1 (2004), 75–105.

[117] HILDEBRAND,M., VAN OSSENBRUGGEN, J., AMIN,A.,AROYO,L.,WIELE- MAKER, J., AND HARDMAN,L. The design space of a configurable auto- completion component. Tech. Rep. INS-E0708, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, Netherlands, 2007.

[118] HILDEBRAND,M.,VAN OSSENBRUGGEN, J., HARDMAN,L., AND JACOBS, G. Supporting subject matter annotation using heterogeneous thesauri, a user study in Web data reuse. Tech. Rep. INS-E0902, Centrum Wiskunde & Informatica, 2009.

[119] HILL,L.,BUCHEL,O.,JANÉE,G., AND ZENG,M.L. Integration of Knowledge Organization Systems into Digital Library Architectures. In 13th ASIST SIGICR Workshop, Reconceptualizing Classification Research (2002), pp. 46–52.

[120] HINCHLIFF,C.E.,SMITH,S.A.,ALLMAN, J. F., BURLEIGH, J. G., CHAUD- HARY,R.,COGHILL,L.M.,CRANDALL,K.A.,DENG, J., DREW, B. T., GAZIS,R.,GUDE,K.,HIBBETT,D.S.,KATZ,L.A.,LAUGHINGHOUSE, H.D.,MCTAVISH, E. J., MIDFORD, P. E., OWEN,C.L.,REE,R.H.,REES, J. A., SOLTIS,D.E.,WILLIAMS, T., AND CRANSTON,K.A. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proceedings of the National Academy of Sciences 112, 41 (2015), 12764–12769.

[121] HITZLER, P., AND VAN HARMELEN, F. A Reasonable Semantic Web. Se- mantic Web 1, 1-2 (2010), 39–44.

[122] HLAVA,M. Developing an eclectic terminology registry. Bulletin of the Association for Information Science and Technology 37, 4 (2011), 19–22.

[123] HOANG,H.H., AND TJOA,A.M. The State of the Art of Ontology-based Query Systems: A Comparison of Existing Approaches. In Proceedings of the International Conference on Computing and Informatics, Kuala Lumpur, Malaysia, June 6-8 (2006), IEEE.

66 Bibliography

[124] HODGE,G. Systems of Knowledge Organization for Digital Libraries: Be- yond Traditional Authority Files. Tech. rep., The Digital Library Federation, Council on Library and Information Resources, 2000.

[125] HOLETSCHEK, J., DRÖGE,G.,GÜNTSCH,A., AND BERENDSOHN, W. G. The ABCD of primary biodiversity data access. Plant Biosystems - An International Journal Dealing with all Aspects of Plant Biology 146, 4 (2012), 771–779.

[126] HOLLINK,L.,SCHREIBER,G., AND WIELINGA,B. Patterns of semantic relations to improve image content search. Journal of Web Semantics 5, 3 (2007), 195–203.

[127] HOOI, Y. K., HASSAN, M. F., AND SHARIFF,A.M. A Survey on Ontology Mapping Techniques. In Advances in Computer Science and its Applications: CSA 2013 (2014), H. Y. Jeong, M. S. Obaidat, N. Y. Yen, and J. J. J. H. Park, Eds., Springer-Verlag, pp. 829–836.

[128] HUANG,Z., AND STUCKENSCHMIDT,H. Reasoning with multi-version ontologies: A temporal logic approach. In The Semantic Web – ISWC 2005: 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, November 6-10, 2005, Proceedings (2005), Y. Gil, E. Motta, V. R. Benjamins, and M. A. Musen, Eds., Springer-Verlag, pp. 398–412.

[129] HYVÖNEN,E.,ALONEN,M.,KOHO,M., AND TUOMINEN, J. BirdWatch — Supporting Citizen Scientists for Better Linked Data Quality for Biodi- versity Management. In Proceedings of the first international Workshop on Semantics for Biodiversity Montpellier, France, May 27 (2013), P. Larmande, E. Arnaud, I. Mougenot, C. Jonquet, T. Libourel, and M. Ruiz, Eds., CEUR Workshop Proceedings.

[130] HYVÖNEN,E.,IKKALA,E., AND TUOMINEN, J. Linked data brokering service for historical places and maps. In Proceedings of the 1st Workshop on Humanities in the Semantic Web co-located with 13th ESWC Conference 2016 (ESWC 2016) Anissaras, Greece, May 29 (2016), A. Adamou, E. Daga, and L. Isaksen, Eds., CEUR Workshop Proceedings, pp. 39–52.

[131] HYVÖNEN,E.,LINDROOS,R.,KAUPPINEN, T., AND HENRIKSSON,R. An Ontology Service for Geographical Content. In Poster Proceedings of the International Semantic Web Conference (ISWC 2007), Busan, Korea, November 11-15 (2007).

[132] HYVÖNEN,E., AND MÄKELÄ,E. Semantic Autocompletion. In The Se- mantic Web – ASWC 2006: First Asian Semantic Web Conference, Beijing, China, September 3-7, 2006, Proceedings (2006), R. Mizoguchi, Z. Shi, and F. Giunchiglia, Eds., Springer-Verlag, pp. 739–751.

[133] HYVÖNEN,E.,MÄKELÄ,E.,KAUPPINEN, T., ALM,O.,KURKI, J., RUOT- SALO, T., SEPPÄLÄ,K.,TAKALA, J., PUPUTTI,K.,KUITTINEN,H.,VIL- JANEN,K.,TUOMINEN, J., PALONEN, T., FROSTERUS,M.,SINKKILÄ, R.,PAAKKARINEN, P., LAITIO, J., AND NYBERG,K. CultureSampo – A National Publication System of Cultural Heritage on the Semantic Web 2.0. In The Semantic Web: Research and Applications: 6th European Seman- tic Web Conference, ESWC 2009 Heraklion, Crete, Greece, May 31–June 4, 2009 Proceedings (2009), L. Aroyo, P. Traverso, F. Ciravegna, P. Cimiano,

67 Bibliography

T. Heath, E. Hyvönen, R. Mizoguchi, E. Oren, M. Sabou, and E. Simperl, Eds., Springer-Verlag, pp. 851–856.

[134] HYVÖNEN,E.,TUOMINEN, J., ALONEN,M., AND MÄKELÄ,E. Linked Data Finland: A 7-star model and platform for publishing and re-using linked datasets. In The Semantic Web: ESWC 2014 Satellite Events: ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers (2014), V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, and A. Tordai, Eds., Springer-Verlag, pp. 226–230.

[135] HYVÖNEN,E.,TUOMINEN, J., KAUPPINEN, T., AND VÄÄTÄINEN, J. Repre- senting and utilizing changing historical places as an ontology time series. In Geospatial Semantics and the Semantic Web: Foundations, Algorithms, and Applications, N. Ashish and A. P. Sheth, Eds. Springer-Verlag, 2011, pp. 1–25.

[136] HYVÖNEN,E.,VILJANEN,K.,TUOMINEN, J., AND SEPPÄLÄ,K. Building a National Semantic Web Ontology and Ontology Service Infrastructure— The FinnONTO Approach. In The Semantic Web: Research and Appli- cations: 5th European Semantic Web Conference, ESWC 2008, Tenerife, Canary Islands, Spain, June 1-5, 2008 Proceedings (2008), S. Bechhofer, M. Hauswirth, J. Hoffmann, and M. Koubarakis, Eds., Springer-Verlag, pp. 95–109.

[137] HÖLSCHER,C., AND STRUBE,G. Web search behaviour of internet experts and newbies. Computer networks 33, 1–6 (2000), 337–346.

[138] ICZN-INTERNATIONAL COMMISSIONON ZOOLOGICAL NOMENCLATURE. International Code of Zoological Nomenclature. International Trust for Zoological Nomenclature, 1999.

[139] ISO TC46/SC9/WG8, AND ISAAC,A. Correspondence between ISO 25964 and SKOS/SKOS-XL Models. Tech. rep., National Information Standards Organization (NISO), 2013.

[140] JAIN, P., HITZLER, P., KUNAL VERMA,YEH, P. Z., AND SHETH,A. Moving beyond sameAs with PLATO : Partonomy detection for Linked Data. In Pro- ceedings of 23rd ACM conference on Hypertext and social media, Milwaukee, USA, June 25-28 (2012), ACM, pp. 33–42.

[141] JARRAR,M., AND MEERSMAN,R. Formal Ontology Engineering in the DOGMA Approach. In On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE: Confederated International Confer- ences CoopIS, DOA, and ODBASE 2002 Proceedings (2002), R. Meersman and Z. Tari, Eds., Springer-Verlag, pp. 1238–1254.

[142] JÄRVELIN,K.,KEKÄLÄINEN, J., AND NIEMI, T. ExpansionTool: Concept- Based Query Expansion and Construction. Information Retrieval 4, 3-4 (2001), 231–255.

[143] JASPER,R., AND USCHOLD,M. A Framework for Understanding and Classifying Ontology Applications. In Proceedings of the Twelfth Workshop on Knowledge Acquisition, Modeling and Management, Banff, Alberta, Canada, October 16-22 (1999).

68 Bibliography

[144] JAVED,M.,ABGAZ, Y. M., AND PAHL,C. Ontology Change Management and Identification of Change Patterns. Journal on Data Semantics 2, 2 (2013), 119–143.

[145] JETZ, W., MCPHERSON, J. M., AND GURALNICK, R. P. Integrating bio- diversity distribution knowledge: Toward a global map of life. Trends in Ecology and Evolution 27, 3 (2012), 151–159.

[146] JIMÉNEZ RUIZ,E.,CUENCA GRAU,B.,HORROCKS,I., AND BERLANGA, R. Supporting concurrent ontology development: Framework, algorithms and tool. Data & Knowledge Engineering 70, 1 (2011), 146–164.

[147] JOHO,H.,COVERSON,C.,SANDERSON,M., AND BEAULIEU,M. Hierar- chical presentation of expansion terms. In Proceedings of the 2002 ACM symposium on Applied computing, Madrid, Spain, March 11-14 (2002), ACM, pp. 645–649.

[148] JONES,A.C.,WHITE, R. J., AND ORME,E.R. Identifying and relating bi- ological concepts in the Catalogue of Life. Journal of Biomedical Semantics 2, 7 (2011).

[149] JOTHILAKSHMI,R.,SHANTHI, N., AND BABISARASWATHI,R. A survey on semantic query expansion. Journal of Theoretical and Applied Information Technology 57, 1 (2013), 128–138.

[150] JUDD, W. S., CAMPBELL,C.S.,KELLOGG,E.A.,STEVENS, P. F., AND DONOGHUE, M. J. Plant Systematics: A Phylogenetic Approach, 4 ed. Sinauer Associates, Sunderland, Massachusetts, 2016.

[151] JUPP,S.,BECHHOFER,S., AND STEVENS,R. A flexible API and editor for SKOS. In The Semantic Web: Research and Applications: 6th European Se- mantic Web Conference, ESWC 2009 Heraklion, Crete, Greece, May 31–June 4, 2009 Proceedings (2009), L. Aroyo, P. Traverso, F. Ciravegna, P. Cimiano, T. Heath, E. Hyvönen, R. Mizoguchi, E. Oren, M. Sabou, and E. Simperl, Eds., Springer-Verlag, pp. 506–520.

[152] KATASONOV,A., AND TERZIYAN, V. Semantic Agent Programming Lan- guage (S-APL): A Middleware Platform for the Semantic Web. In Pro- ceedings of the IEEE International Conference on Semantic Computing 2008 (ICSC 2008), Santa Clara, California, USA, August 4-7 (2008), IEEE, pp. 504–511.

[153] KAUPPINEN, T., KUITTINEN,H.,TUOMINEN, J., SEPPÄLÄ,K., AND HYVÖNEN,E. Extending an Ontology by Analyzing Annotation Co- occurrences in a Semantic Cultural Heritage Portal. In Proceedings of the ASWC 2008 Workshop on Collective Intelligence (ASWC-CI 2008) orga- nized as a part of the 3rd Asian Semantic Web Conference (ASWC 2008), Bangkok, Thailand, February 2-5 (2009), pp. 1–6.

[154] KEEN,M.,ACHARYA,A.,BISHOP,S.,HOPKINS,A.,MILINSKI,S.,NOTT, C.,ROBINSON,R.,ADAMS, J., AND VERSCHUEREN, P. Patterns: Imple- menting an SOA using an enterprise service bus. IBM, 2004.

[155] KENNEDY, J. B., KUKLA,R., AND PATERSON, T. Scientific names are ambiguous as identifiers for biological taxa: their context and definition are required for accurate data integration. In Data Integration in the Life

69 Bibliography

Sciences: Second International Workshop, DILS 2005, San Diego, CA, USA, July 20-22, 2005, Proceedings (2005), B. Ludäscher and L. Raschid, Eds., Springer-Verlag, pp. 80–95.

[156] KHAN,Z.C., AND KEET,C.M. The foundational ontology library RO- MULUS. In Model and Data Engineering: Third International Conference, MEDI 2013, Amantea, Italy, September 25-27, 2013, Proceedings (2013), A. Cuzzocrea and S. Maabout, Eds., Springer-Verlag, pp. 200–211.

[157] KHATTAK,A.M.,LATIF,K., AND LEE,S. Change management in evolving web ontologies. Knowledge-Based Systems 37 (2013), 1–18.

[158] KIRSTEN, T., GROSS,A.,HARTUNG,M., AND RAHM,E. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution. Journal of Biomedical Semantics 2, 6 (2011).

[159] KIRYAKOV,A.,POPOV,B.,TERZIEV,I.,MANOV,D., AND OGNYANOFF,D. Semantic annotation, indexing, and retrieval. Journal of Web Semantics 2, 1 (2004), 49–79.

[160] KLEIN,M. Change Management for Distributed Ontologies. PhD thesis, Vrije Universiteit Amsterdam, Netherlands, Aug. 2004.

[161] KLEIN,M., AND FENSEL,D. Ontology versioning on the Semantic Web. In Proceedings of the 1st Semantic Web Working Symposium (SWWS’01), Stanford University, California, USA, July 30 - August 1 (2001), I. F. Cruz, S. Decker, J. Euzenat, and D. L. McGuinness, Eds., pp. 75–91.

[162] KLESS,D.,JANSEN,L., AND MILTON,S. A content-focused method for re-engineering thesauri into semantically adequate ontologies using OWL. Semantic Web 7, 5 (2016), 543–576.

[163] KNAPP, P., POLASZEK,A., AND WATSON,M. Spreading the word. Nature 446 (2007), 261–262.

[164] KOZAKI,K.,SUNAGAWA,E.,KITAMURA, Y., AND MIZOGUCHI,R. A framework for cooperative ontology construction based on dependency man- agement of modules. In Proceedings of the First International Workshop on Emergent Semantics and Ontology Evolution (ESOE 2007), co-located with ISWC 2007 + ASWC 2007, Busan, Korea, November 12 (2007), L. L. Chen, P. Cudré-Mauroux, P. Haase, A. Hotho, and E. Ong, Eds., CEUR Workshop Proceedings, pp. 33–44.

[165] KURKI, J., AND HYVÖNEN,E. Collaborative metadata editor integrated with ontology services and faceted portals. In Proceedings of the 1st Work- shop on Ontology Repositories and Editors for the Semantic Web, Hersonis- sos, Crete, Greece, May 31 (2010), M. d’Aquin, A. García Castro, C. Lange, and K. Viljanen, Eds., CEUR Workshop Proceedings, pp. 7–11.

[166] LAPP,H.,MORRIS,R.A.,CATAPANO, T., HOBERN,D., AND MORRISON, N. Organizing our knowledge of biodiversity. Bulletin of the American Society for Information Science and Technology 37, 4 (2011), 38–42.

[167] LEADBETTER,A. The NERC Vocabulary Server version 2.0. Tech. rep., British Oceanographic Data Centre, 2012.

70 Bibliography

[168] LEDL,A., AND VOSS, J. Describing Knowledge Organization Systems in BARTOC and JSKOS. In Proceedings of the 12th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, June 22-24 (2016), H. Erdman Thomsen, A. Pareja-Lora, and B. Nistrup Madsen, Eds., pp. 168–178.

[169] LEENHEER, P. D., AND MENS, T. ONTOLOGY EVOLUTION: State of the Art & Future Directions. Ontology Theory Management and Design- Advanced Tools and Models 2, 1 (2010), 1–47.

[170] LENAT,D.B. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM 38, 11 (1995), 33–38.

[171] LEPAGE,D.,VAIDYA,G., AND GURALNICK,R. Avibase - A database system for managing and organizing taxonomic concepts. ZooKeys 135, 420 (2014), 117–135.

[172] LI, Y., THOMPSON,S.,TAN,Z.,GILES, N., AND GHARIB,H. Beyond ontology construction; ontology services as online knowledge sharing com- munities. In Semantic Web - ISWC 2003: Second International Semantic Web Conference, Sanibel Island, FL, USA, October 20-23, 2003, Proceed- ings (2003), D. Fensel, K. Sycara, and J. Mylopoulos, Eds., Springer-Verlag, pp. 469–483.

[173] LUGHADHA, E. N. Towards a working list of all known plant species. Philosophical transactions of the Royal Society of London B: Biological sciences 359, 1444 (2004), 681–687.

[174] MAALI, F., AND ERICKSON, J. Data catalog vocabulary (DCAT). W3C recommendation, World Wide Web Consortium, 16 January 2014. http: //www.w3.org/TR/2014/REC-vocab-dcat-20140116/.

[175] MAEDCHE,A.,MOTIK,B., AND STOJANOVIC,L. Managing multiple and distributed ontologies on the Semantic Web. The International Journal on Very Large Data Bases (The VLDB Journal) 12, 4 (2003), 286–302.

[176] MAEDCHE,A., AND STAAB,S. Ontology learning in the semantic web. IEEE Intelligent Systems 16, 2 (2001), 72–79.

[177] MÄKELÄ,E.,HYPÉN,K., AND HYVÖNEN,E. BookSampo—Lessons Learned in Creating a Semantic Portal for Fiction Literature. In The Semantic Web – ISWC 2011: 10th International Semantic Web Confer- ence, Bonn, Germany, October 23-27, 2011, Proceedings, Part II (2011), L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. Noy, and E. Blomqvist, Eds., Springer-Verlag, pp. 173–188.

[178] MÄKELÄ,E.,RUOTSALO, T., AND HYVÖNEN,E. How to deal with mas- sively heterogeneous cultural heritage data – lessons learned in Culture- Sampo. Semantic Web 3, 1 (2012), 85–109.

[179] MALAISÉ, V., AROYO,L.,BRUGMAN,H.,GAZENDAM,L., DE JONG,A., NEGRU,C., AND SCHREIBER,G. Evaluating a thesaurus browser for an audio-visual archive. In Managing Knowledge in a World of Networks: 15th International Conference, EKAW 2006, Podˇebrady, Czech Republic, October 2-6, 2006, Proceedings (2006), S. Staab and V. Svátek, Eds., Springer-Verlag, pp. 272–286.

71 Bibliography

[180] MANGOLD,C. A survey and classification of semantic search approaches. International Journal of Metadata, Semantics and Ontologies 2, 1 (2007), 23–34.

[181] MANNING,C.D.,RAGHAVAN, P., AND SCHÜTZE,H. Introduction to Information Retrieval. Cambridge University Press, 2008.

[182] MARCH, S. T., AND SMITH, G. F. Design and natural science research on information technology. Decision Support Systems 15, 4 (1995), 251–266.

[183] MARTÍNEZ-GONZÁLEZ,M.M., AND ALVITE-DÍEZ,M.-L. On the evalu- ation of thesaurus tools compatible with the Semantic Web. Journal of Information Science 40, 6 (2014), 711–722.

[184] MASCARDI, V., CORDÌ, V., AND ROSSO, P. A Comparison of Upper Ontolo- gies. In Proceedings of the 8th Workshop Dagli Oggetti agli Agenti: Agenti e Industria: Applicazioni tecnologiche degli agenti software (WOA 2007), Genova, Italy, September 24-25 (2007), pp. 55–64.

[185] MAYDEN,R.L. A hierarchy of species concepts: the denouement in the saga of the species problem. In Species: The Units of Biodiversity, M. F. Oaridge, H. A. Dawah, and M. R. Wilson., Eds. Chapman & Hall, 1997, pp. 381–424.

[186] MAYNARD,D.,PETERS, W., D’AQUIN,M., AND SABOU,M. Change man- agement for metadata evolution. In Proceedings of the International Work- shop on Ontology Dynamics (IWOD-07), the 4th European Semantic Web Conference (ESWC-07), Innsbruck, Austria, June 7 (2007), G. Flouris and M. d’Aquin, Eds., pp. 27–40.

[187] MCNEILL, J., BARRIE, F. R., BUCK, W. R., DEMOULIN, V., GREUTER, W., HAWKSWORTH,D.L.,HERENDEEN, P. S., KNAPP,S.,MARHOLD,K., PRADO, J., PRUD’HOMME VAN REINE, W. F., SMITH, G. F., WIERSEMA, J. H., AND TURLAND, N. J. International Code of Nomenclature for algae, fungi, and plants (Melbourne Code) adopted by the Eighteenth International Botanical Congress Melbourne, Australia, July 2011. Koeltz Scientific Books, Oberreifenberg, 2012.

[188] MILES,A., AND BECHHOFER,S. SKOS simple knowledge organization sys- tem reference. W3C recommendation, World Wide Web Consortium, 18 Au- gust 2009. http://www.w3.org/TR/2009/REC-skos-reference-20090818/.

[189] MILES,A.,MATTHEWS,B.,WILSON,M., AND BRICKLEY,D. SKOS Core: Simple Knowledge Organisation for the Web. In Proceedings of the International Conference on Dublin Core and Metadata Applications 2008, Madrid, Spain, September 12-15 (2005), Dublin Core Metadata Initiative, pp. 3–10.

[190] MOSSAKOWSKI, T., KUTZ,O., AND CODESCU,M. Ontohub: A seman- tic repository for heterogeneous ontologies. In Proceedings of the Theory Day in Computer Science (DACS 2014), satellite workshop of ICTAC 2014, Bucharest, Romania, September 17-19 (2014).

[191] NAGYPÁL,G., AND MÜLLER, W. Ontology Design. In Semantics At Work, Ontology Management – Tools and Techniques, R. Volz, Ed. Lulu Press, 2008.

72 Bibliography

[192] NAVIGLI,R., AND VELARDI, P. An Analysis of Ontology-based Query Expansion Strategies. Information Retrieval (2002), 42–49.

[193] NEUBERT, J. Bringing the "thesaurus for economics" on to the web of linked data. In Proceedings of the Linked Data on the Web Workshop (LDOW2009), Madrid, Spain, April 20 (2009), C. Bizer, T. Heath, T. Berners-Lee, and K. Idehen, Eds., CEUR Workshop Proceedings.

[194] NILES,I., AND PEASE,A. Towards a Standard Upper Ontology. In Pro- ceedings of the international conference on Formal Ontology in Information Systems (FOIS-2001), Ogunquit, Maine, USA, October 17 - 19 (2001), ACM, pp. 2–9.

[195] NOY, N. F., AND KLEIN,M. Ontology evolution: Not the same as schema evolution. Knowledge and Information Systems 6, 4 (2004), 428–440.

[196] NOY, N. F., SHAH, N. H., WHETZEL, P. L., DAI,B.,DORF,M.,GRIFFITH, N., JONQUET,C.,RUBIN,D.L.,STOREY,M.-A.,CHUTE,C.G., AND MUSEN,M.A. BioPortal: Ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research 37, Web Server issue (2009), W170–W173.

[197] OBERLE,D.,STAAB,S.,STUDER,R., AND VOLZ,R. Supporting application development in the semantic web. ACM Transactions on Internet Technology 5, 2 (2005), 328–358.

[198] OREN,E.,DELBRU,R.,CATASTA,M.,CYGANIAK,R.,STENZHORN,H., AND TUMMARELLO,G. Sindice.com: A Document-oriented Lookup Index for Open Linked Data. International Journal of Metadata, Semantics and Ontologies 3, 1 (2008), 37–52.

[199] OREN,E.,MÖLLER,K.H.,SCERRI,S.,HANDSCHUH,S., AND SINTEK,M. What are Semantic Annotations? Tech. rep., DERI Galway, 2006.

[200] PAGE,R.D.M. Taxonomic names, metadata, and the Semantic Web. Biodiversity Informatics 3 (2006), 1–15.

[201] PAGE,R.D.M. Biodiversity informatics: The challenge of linking data and the role of shared identifiers. Briefings in Bioinformatics 9, 5 (2008), 345–354.

[202] PAGE,R.D.M. Linking NCBI to Wikipedia: A wiki-based approach. PLoS Currents (2011), 1–14.

[203] PAGE,R.D.M. BioNames: linking taxonomy, texts, and trees. PeerJ 1 (2013).

[204] PAN, J., CRANEFIELD,S., AND CARTER,D. A lightweight ontology repos- itory. In Proceedings of the second international joint conference on Au- tonomous agents and multiagent systems - AAMAS ’03 (2003), p. 632.

[205] PARR,C.S.,GURALNICK,R.,CELLINESE, N., AND PAGE,R.D.M. Evolu- tionary informatics: Unifying knowledge about the diversity of life. Trends in Ecology and Evolution 27, 2 (2012), 94–103.

[206] PARR,C.S.,SCHULZ,K.S.,HAMMOCK, J., WILSON, N., LEARY, P., RICE, J., AND CORRIGAN,JR., R. J. TraitBank: Practical semantics for organism attribute data. Semantic Web 7, 6 (2016), 577–588.

73 Bibliography

[207] PARR,C.S.,WILSON, N., LEARY, P., SCHULZ,K.S.,LANS,K.,WALLEY, L.,HAMMOCK, J. A., GODDARD,A.,RICE, J., STUDER,M.,HOLMES, J. T. G., AND CORRIGAN,JR., R. J. The Encyclopedia of Life v2: Providing Global Access to Knowledge About Life on Earth. Biodiversity Data Journal 2 (2014).

[208] PATTERSON, D. J., COOPER, J., KIRK, P. M., PYLE,R.L., AND REMSEN, D. P. Names are key to the big new biology. Trends in Ecology and Evolution 25, 12 (2010), 686–691.

[209] PATTERSON, D. J., REMSEN,D.,MARINO, W. A., AND NORTON,C. Taxo- nomic Indexing—Extending the Role of Taxonomy. Systematic Biology 55, 3 (2006), 367–373.

[210] PEFFERS,K.,ROTHENBERGER,M.,TUUNANEN, T., AND VAEZI,R. Design Science Research Evaluation. In Design Science Research in Informa- tion Systems. Advances in Theory and Practice: 7th International Confer- ence, DESRIST 2012, Las Vegas, NV, USA, May 14-15, 2012. Proceedings (2012), K. Peffers, M. Rothenberger, and B. Kuechler, Eds., Springer-Verlag, pp. 398–410.

[211] PEFFERS,K.,TUUNANEN, T., ROTHENBERGER,M.A., AND CHATTER- JEE,S. A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems 24, 3 (2007), 45–77.

[212] PEREIRA,R.,RICHARDS,K.,HOBERN,D.,HYAM,R.,BELBIN,L., AND BLUM,S. TDWG life sciences identifiers (LSID) applicability statement. TDWG draft standard, Biodiversity Information Standards (TDWG), 2009. http://www.tdwg.org/standards/150.

[213] PHIPPS, J., AND HILLMANN,D.I. The Open Metadata Registry: An Update. Bulletin of the Association for Information Science and Technology 37, 4 (2011), 35–37.

[214] PULLAN,M.R.,WATSON, M. F., KENNEDY, J. B., RAGUENAUD,C., AND HYAM,R. The Prometheus Taxonomic Model: A Practical Approach to Representing Multiple Classifications. Taxon 49, 1 (2000), 55–75.

[215] PYLE,R.L., AND MICHEL,E. ZooBank: Developing a nomenclatural tool for unifying 250 years of biological information. Zootaxa, 1950 (2008), 39–50.

[216] QING,Z. Building a platform for linked open thesauri. Library Hi Tech 31, 4 (2013), 620–637.

[217] QU, Y., AND CHENG,G. Falcons Concept Search: A Practical Search Engine for Web Ontologies. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 41, 4 (2011), 810–816.

[218] REES, T. Taxamatch, an algorithm for near (’fuzzy’) matching of scientific names in taxonomic databases. PLoS ONE 9, 9 (2014).

[219] REMSEN,D. The use and limits of scientific names in biological informatics. ZooKeys 550 (2016), 207–223.

74 Bibliography

[220] REMSEN, D. P., DÖRING,M., AND ROBERTSON, T. GBIF GNA profile refer- ence guide for Darwin Core Archive, core terms and extensions. Tech. rep., Global Biodiversity Information Facility (GBIF), Copenhagen, Denmark, 2011. version 1.2, released on 1 April 2011.

[221] RICHARDS,K.,WHITE,R.,NICOLSON, N., AND PYLE,R. A beginner’s guide to persistent identifiers. Tech. rep., Global Biodiversity Information Facility (GBIF), Copenhagen, Denmark, 2011. version 1.0, released on 9 February 2011.

[222] RILEY, J. Understanding Metadata. Primer, National Information Stan- dards Organization, 2017.

[223] ROSKOV, Y., ABUCAY,L.,ORRELL, T., NICOLSON,D.,FLANN,C.,BAILLY, N., KIRK, P., BOURGOIN, T., DEWALT,R.,DECOCK, W., AND DE WEVER, A. Species 2000 & ITIS Catalogue of Life. Digital resource, Species 2000: Naturalis, Leiden, Netherlands, 2017. 27th February 2017, http://www. catalogueoflife.org/col.

[224] RUEDA,C.,BERMUDEZ,L., AND FREDERICKS, J. The MMI Ontology Registry and Repository: A Portal for Marine Metadata Interoperability. In Proceedings of the Oceans Conference 2009, Biloxi, Mississippi, USA, 26–29 October (2009), IEEE, pp. 1854–1859.

[225] SARKAR, I. N. Biodiversity informatics: Organizing and linking informa- tion across the spectrum of life. Briefings in Bioinformatics 8, 5 (2007), 347–357.

[226] SCHANDL, T., AND BLUMAUER,A. PoolParty: SKOS thesaurus manage- ment utilizing linked data. In The Semantic Web: Research and Appli- cations: 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30 – June 3, 2010, Proceedings, Part II (2010), L. Aroyo, G. Antoniou, E. Hyvönen, A. ten Teije, H. Stuckenschmidt, L. Cabral, and T. Tudorache, Eds., Springer-Verlag, pp. 421–425.

[227] SCHREIBER, A. T., DUBBELDAM,B.,WIELEMAKER, J., AND WIELINGA, B. Ontology-based photo annotation. IEEE Intelligent Systems and Their Applications 16, 3 (2001), 66–74.

[228] SCHULZ,S.,STENZHORN,H., AND BOEKER,M. The ontology of biological taxa. Bioinformatics 24, 13 (2008), 313–321.

[229] SEGERS,H., DE SMET, W. H., FISCHER,C.,FONTANETO,D., MICHALOUDI,E.,WALLACE,R.L., AND JERSABEK,C.D. Towards a List of Available Names in Zoology, partim Phylum Rotifera. Zootaxa 3179 (2012), 61–68.

[230] ŠEVCENKOˇ ,M. Online Presentation of an Upper Ontology. In Proceedings of Znalosti 2003, Ostrava, Czech Republic, February 19-21 (2003), pp. 153– 162.

[231] SHAW,R.,RABINOWITZ,A.,GOLDEN, P., AND KANSA,E. A sharing- oriented design strategy for networked knowledge organization systems. International Journal on Digital Libraries 17, 1 (2016), 49–61.

75 Bibliography

[232] SHIRI,A. Thesauri: Introduction and Recent Developments. In Powering search: The role of thesauri in new information environments. Information Today, 2012, pp. 1–41.

[233] SHVAIKO, P., AND EUZENAT, J. Ontology Matching: State of the Art and Future Challenges. IEEE Transactions on Knowledge and Data Engineering 25, 1 (2013), 158–176.

[234] SLIMANI, T. Semantic Annotation: The Mainstay of Semantic Web. Inter- national Journal of Computer Applications Technology and Research 2, 6 (2013), 763–770.

[235] SMITH,B.,ASHBURNER,M.,ROSSE,C.,BARD, J., BUG, W., CEUSTERS, W., GOLDBERG, L. J., EILBECK,K.,IRELAND,A.,MUNGALL, C. J., OBI CONSORTIUM,LEONTIS, N., ROCCA-SERRA, P., RUTTENBERG,A.,SAN- SONE,S.-A.,SCHEUERMANN,R.H.,SHAH, N., WHETZEL, P. L., AND LEWIS,S. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 25, 11 (2007), 1251– 1255.

[236] SOERGEL,D. The Arts and Architecture Thesaurus (AAT): A Critical Appraisal. Visual Resources 10, 4 (1995), 369–400.

[237] SOLOMOU,G., AND PAPATHEODOROU, T. The use of SKOS vocabularies in digital repositories: The DSpace case. In ICSC 2010: 2010 IEEE Fourth International Conference on Semantic Computing, Proceedings, Pittsburgh, Pennsylvania, USA, September 22-24 (2010), IEEE, pp. 542–547.

[238] SOUZA,R.R.,TUDHOPE,D., AND ALMEIDA,M.B. The KOS spectra A tentative typology of knowledge organization systems. In Paradigms and conceptual systems in knowledge organization: Proceedings of the Eleventh International ISKO Conference, Rome, Italy, 23-26 February (2010), C. Gnoli and F. Mazzocchi, Eds., Ergon-Verlag, pp. 122–128.

[239] STAAB,S.,MAEDCHE,A., AND HANDSCHUH,S. An annotation framework for the semantic Web. In Proceedings of the First Workshop on Multimedia Annotation, Tokyo, Japan (2001).

[240] STAAB,S., AND STUDER,R., Eds. Handbook on Ontologies. International Handbooks on Information Systems. Springer-Verlag, Berlin Heidelberg, 2004.

[241] STOJANOVIC,L. Methods and Tools for Ontology Evolution. PhD thesis, University of Karlsruhe, Germany, Aug. 2004.

[242] STOJANOVIC,L.,MAEDCHE,A.,MOTIK,B., AND STOJANOVIC, N. User- driven Ontology Evolution Management. In Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web: 13th Inter- national Conference, EKAW 2002, Sigüenza, Spain, October 1–4, 2002, Proceedings (2002), A. Gómez-Pérez and V. R. Benjamins, Eds., Springer- Verlag, pp. 285–300.

[243] STUCKENSCHMIDT,H., AND KLEIN,M. Integrity and change in modu- lar ontologies. In IJCAI’03: Proceedings of the 18th international joint conference on Artificial intelligence (2003), Morgan Kaufmann Publishers, pp. 900–905.

76 Bibliography

[244] STUDER,R.,BENJAMINS, V. R., AND FENSEL,D. Knowledge engineering: Principles and methods. Data & Knowledge Engineering 25, 1-2 (1998), 161–197.

[245] SUMMERS,E.,ISAAC,A.,REDDING,C., AND KRECH,D. LCSH, SKOS and Linked Data. In Proceedings of the International Conference on Dublin Core and Metadata Applications 2008, Berlin, Germany, September 22-26 (2008), Dublin Core Metadata Initiative, pp. 25–33.

[246] SUNAGAWA,E.,KOZAKI,K.,KITAMURA, Y., AND MIZOGUCHI,R. An environment for distributed ontology development based on dependency management. In The Semantic Web - ISWC 2003: Second International Semantic Web Conference, Sanibel Island, FL, USA, October 20-23, 2003, Proceedings (2003), D. Fensel, K. Sycara, and J. Mylopoulos, Eds., Springer- Verlag, pp. 453–468.

[247] SUOMINEN,O.,HYVÖNEN,E.,VILJANEN,K., AND HUKKA,E. HealthFinland-A national semantic publishing network and portal for health information. Journal of Web Semantics 7, 4 (2009), 287–297.

[248] SUOMINEN,O.,JOHANSSON,A.,YLIKOTILA,H.,TUOMINEN, J., AND HYVÖNEN,E. Vocabulary Services Based on SPARQL Endpoints: ONKI Light on SPARQL. In Poster proceedings of the 18th International Confer- ence on Knowledge Engineering and Knowledge Management (EKAW 2012), Galway, Ireland, October 8-12 (2012).

[249] SUOMINEN,O.,PESSALA,S.,TUOMINEN, J., LAPPALAINEN,M.,NYKYRI, S.,YLIKOTILA,H.,FROSTERUS,M., AND HYVÖNEN,E. Deploying national ontology services: From ONKI to Finto. In Proceedings of the Industry Track at the International Semantic Web Conference 2014, Riva del Garda, Italy, October 19-23 (2014), A. Polleres, A. Garcia, and R. Benjamins, Eds., CEUR Workshop Proceedings.

[250] TAM,A.M., AND LEUNG,C.H.C. Structured Natural-Language Descrip- tion for Semantic Content Retrieval. Journal of the American Society for Information Science and Technology 52, 11 (2001), 930–937.

[251] TAXONOMIC NAMESAND CONCEPTS INTEREST GROUP. Taxonomic con- cept transfer schema (TCS). TDWG current (2005) standard, Biodiversity Information Standards (TDWG), 2006. [http://www.tdwg.org/standards/ 117.

[252] TENNIS, J. T., AND SUTTON,S.A. Extending the Simple Knowledge Organization System for Concept Management in Vocabulary Development Applications. Journal of the American Society for Information Science and Technology 59, 1 (2008), 25–37.

[253] THESSEN,A.E.,CUI,H., AND MOZZHERIN,D. Applications of natural language processing in biodiversity science. Advances in Bioinformatics 2012 (2012).

[254] THOMAS,E.,PAN, J. Z., AND SLEEMAN,D. ONTOSEARCH2: Searching ontologies semantically. In Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions, Innsbruck, Austria, June 6-7 (2007), C. Golbreich, A. Kalyanpur, and B. Parsia, Eds., CEUR Workshop Proceed- ings.

77 Bibliography

[255] TOPQUADRANT. Controlled vocabularies, taxonomies, and thesauruses (and ontologies). Whitepaper, TopQuadrant, 2013.

[256] TUDHOPE,D.,ALANI,H., AND JONES,C. Augmenting thesaurus rela- tionships: Possibilities for retrieval. Journal of Digital Information 1, 8 (2001).

[257] TUDHOPE,D., AND BINDING,C. Toward Terminology Services: Experi- ences with a Pilot Web Service Thesaurus Browser. Bulletin of the American Society for Information Science and Technology 32, 5 (2006), 6–9.

[258] TUOMINEN, J., SALONOJA,M.,VILJANEN,K., AND HYVÖNEN,E. A user interface for ontology repositories. In Proceedings of the 1st Workshop on Ontology Repositories and Editors for the Semantic Web, Hersonissos, Crete, Greece, May 31 (2010), M. d’Aquin, A. García Castro, C. Lange, and K. Viljanen, Eds., CEUR Workshop Proceedings, pp. 17–21.

[259] TUOMINEN, J., AND VILJANEN,K. Ontology services: results of the ONKI user inquiry, presentation at the FinnONTO 2.0 project board meeting 13.12.2011, 2011.

[260] USCHOLD,M., AND GRUNINGER,M. Ontologies: Principles, methods and applications. Knowledge Engineering Review 11, 2 (1996), 93–136.

[261] VAN ASSEM,M.,MALAISÉ, V., MILES,A., AND SCHREIBER,G. A method to convert thesauri to SKOS. In The Semantic Web: Research and Ap- plications: 3rd European Semantic Web Conference, ESWC 2006, Budva, Montenegro, June 11-14, 2006, Proceedings (2006), Y. Sure and J. Domingue, Eds., Springer-Verlag, pp. 95–109.

[262] VAN ASSEM,M.,MENKEN,M.R.,SCHREIBER,G.,WIELEMAKER, J., AND WIELINGA,B. A Method for Converting Thesauri to RDF/OWL. In The Semantic Web – ISWC 2004: Third International Semantic Web Conference, Hiroshima, Japan, November 7-11, 2004, Proceedings (2004), S. A. McIlraith, D. Plexousakis, and F. van Harmelen, Eds., Springer-Verlag, pp. 17–31.

[263] VAN DEN BOSCH,A.,BOGERS, T., ANDDE KUNDER,M. Estimating search engine index size variability: a 9-year longitudinal study. Scientometrics 107, 2 (2016), 839–856.

[264] VAN DER MEIJ,L.,ISAAC,A., AND ZINN,C. A web-based repository service for vocabularies and alignments in the cultural heritage domain. In The Semantic Web: Research and Applications: 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30 –June 3, 2010, Proceedings, Part I (2010), L. Aroyo, G. Antoniou, E. Hyvönen, A. ten Teije, H. Stuckenschmidt, L. Cabral, and T. Tudorache, Eds., Springer-Verlag, pp. 394–409.

[265] VAN OSSENBRUGGEN, J., AMIN,A., AND HILDEBRAND,M. Why evaluating semantic web applications is difficult. In Proceedings of the Fifth Interna- tional Workshop on Semantic Web User Interaction (SWUI 2008), collocated with CHI 2008, Florence, Italy, April 5 (2008), D. Degler, M. Schraefel, J. Golbeck, A. Bernstein, and L. Rutledge, Eds., CEUR Workshop Proceed- ings.

78 Bibliography

[266] VANDEN BERGHE,E.,CORO,G.,BAILLY, N., FIORELLATO, F., ALDEMITA, C.,ELLENBROEK,A., AND PAGANO, P. Retrieving taxa names from large biodiversity data collections using a flexible matching workflow. Ecological Informatics 28 (2015), 29–41.

[267] VANDENBUSSCHE, P.-Y., ATEMEZING,G.A.,POVEDA-VILLALÓN,M., AND VATANT,B. Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the Web. Semantic Web, In press (2016).

[268] VILJANEN,K.,HYVÖNEN,E.,MÄKELÄ,E.,SUOMINEN,O., AND TUOMI- NEN, J. Mash-up ontology services for the semantic web. In Demo track at the European Semantic Web Conference ESWC 2007, Innsbruck, Austria, June 4 (2007).

[269] VIZINE-GOETZ,D.,HOUGHTON,A., AND CHILDRESS,E. Web Services for Controlled Vocabularies. Bulletin of the American Society for Information Science and Technology 32, 5 (2006), 9–12.

[270] VOORHEES,E.M. Expansion using lexical-semantic relations. In SIGIR’94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval (1994), pp. 61–69.

[271] WALLS,R.L.,DECK, J., GURALNICK,R.,BASKAUF,S.,BEAMAN,R., BLUM,S.,BOWERS,S.,BUTTIGIEG, P. L., DAVIES, N., ENDRESEN,D., GANDOLFO,M.A.,HANNER,R.,JANNING,A.,KRISHTALKA,L.,MAT- SUNAGA,A.,MIDFORD, P., MORRISON, N., TUAMA,É.Ó.,SCHILDHAUER, M.,SMITH,B.,STUCKY, B. J., THOMER,A.,WIECZOREK, J., WHITACRE, J., AND WOOLEY, J. Semantics in Support of Biodiversity Knowledge Dis- covery: An Introduction to the Biological Collections Ontology and Related Ontologies. PLoS ONE 9, 3 (2014).

[272] WANG,S.,SCHLOBACH,S., AND KLEIN,M. Concept drift and how to identify it. Journal of Web Semantics 9, 3 (2011), 247–265.

[273] WANG,X.,ALMEIDA, J. S., AND OLIVEIRA,A.L. Ontology design prin- ciples and normalization techniques in the web. In Data Integration in the Life Sciences: 5th International Workshop, DILS 2008, Evry, France, June 25-27, 2008, Proceedings (2008), A. Bairoch, S. Cohen-Boulakia, and C. Froidevaux, Eds., Springer-Verlag, pp. 28–43.

[274] WANG, Y.-C., VANDENDORPE, J., AND EVENS,M. Relational thesauri in information retrieval. Journal of the American Society for Information Science 36, 1 (1985), 15–27.

[275] WHETZEL, P. L., NOY, N. F., SHAH, N. H., ALEXANDER, P. R., NYULAS, C.,TUDORACHE, T., AND MUSEN,M.A. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Research 39, Web Server issue (2011), W541–W545.

[276] WHITE,H.,WILLIS,C., AND GREENBERG, J. HIVEing: The Effect of a Semantic Web technology on Inter-Indexer Consistancy. Journal of Docu- mentation 70, 3 (2014), 307–329.

[277] WIECZOREK, J., BLOOM,D.,GURALNICK,R.,BLUM,S.,DÖRING,M., GIOVANNI,R.,ROBERTSON, T., AND VIEGLAIS,D. Darwin Core: An

79 Bibliography

evolving community-developed biodiversity data standard. PLoS ONE 7, 1 (2012).

[278] WIELINGA, B. J., SCHREIBER, A. T., WIELEMAKER, J., AND SANDBERG, J. A.C. From thesaurus to ontology. In Proceedings of the 1st International Conference on Knowledge Capture: K-CAP 2001, Victoria, Canada, October 21-23 (2001), ACM, pp. 194–201.

[279] XIANG,Z.,MUNGALL,C.,RUTTENBERG,A., AND HE, Y. Ontobee: A linked data server and browser for ontology terms. In Proceedings of the 2nd International Conference on Biomedical Ontology, Buffalo, NY, USA, July 26-30 (2011), O. Bodenreider, M. E. Martone, and A. Ruttenberg, Eds., CEUR Workshop Proceedings.

[280] YTOW, N., MORSE,D.R., AND ROBERTS,D.M. Nomencurator: a nomen- clatural history model to handle multiple taxonomic views. Biological Journal of the Linnean Society 73 (2001), 81–98.

[281] ZABLITH, F., ANTONIOU,G., D’AQUIN,M.,FLOURIS,G.,KONDYLAKIS, H.,MOTTA,E.,PLEXOUSAKIS,D., AND SABOU,M. Ontology evolution: a process-centric survey. The Knowledge Engineering Review 30, 1 (2015), 45–75.

[282] ZAPILKO,B.,SCHAIBLE, J., MAYR, P., AND MATHIAK,B. TheSoz: A SKOS representation of the thesaurus for the social sciences. Semantic Web 4, 3 (2013), 257–263.

[283] ZENG,M.L., AND CHAN,L.M. Trends and issues in establishing interop- erability among knowledge organization systems. Journal of the American Society for Information Science and Technology 55, 5 (2004), 377–395.

[284] ZENG,M.L., AND HODGE,G. Developing a Dublin Core Application Profile for the Knowledge Organization Systems (KOS) Resources. Bulletin of the American Society for Information Science & Technology 37, 4 (2011), 30–34.

[285] ZERMOGLIO, P. F., GURALNICK, R. P., AND WIECZOREK, J. R. A standard- ized reference data set for vertebrate taxon name resolution. PLoS ONE 11, 1 (2016).

[286] ZHAO,L., AND ICHISE,R. Ontology Integration for Linked Data. Journal on Data Semantics 3, 4 (2014), 237–254.

80 Publication I

Kim Viljanen, Jouni Tuominen, and Eero Hyvönen. Ontology Libraries for Production Use: The Finnish Ontology Library Service ONKI. In The Se- mantic Web: Research and Applications: 6th European Semantic Web Con- ference, ESWC 2009, Heraklion, Crete, Greece, May 31–June 4, 2009, Pro- ceedings, Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Simperl (editors), Lecture Notes in Computer Science, volume 5554, pages 781–795, ISBN 978-3-642-02120-6, Springer-Verlag, June 2009.

c 2009 Springer-Verlag Berlin Heidelberg.

Reprinted with permission.

81

Ontology Libraries for Production Use: The Finnish Ontology Library Service ONKI

Kim Viljanen, Jouni Tuominen, and Eero Hyv¨onen

Semantic Computing Research Group (SeCo) Helsinki University of Technology (TKK) and University of Helsinki [email protected] http://www.seco.tkk.fi/

Abstract. This paper discusses problems of creating and using ontol- ogy library services in production use. One approach to a solution is presented with an online implementation—the Finnish Ontology Library Service ONKI— that is in pilot use on a national level in Finland. ONKI contributes to previous research on ontology libraries in many ways: First, mashup and web service support with various tools is provided for cost-efficient utilization of ontologies in indexing and search appli- cations. Second, services covering the different phases of the ontology life cycle are provided. Third, the services are provided and used in real world applications on a national scale. Fourth, the ontology framework is being developed by a collaborative effort by organizations representing different application domains, such as health, culture, and business.

1 Introduction

TheSemanticWeb1 is based on ontologies [1,2,3]. With the help of ontologies, the content and services on the Web can be described with metadata in an explicit, machine “understandable” way which enables, for example, interoper- ability on a semantic level and intelligent semantic searching and browsing of heterogeneous distributed content in semantic portals [4,5,6]. Utilizing ontolo- gies, including thesauri and other vocabularies, in new and existing applications requires efficient tools for finding, managing, searching and browsing ontologies. Ontology library systems2 offer functions for managing, adapting and standard- izing groups of ontologies, for indexing content with ontologies, and for utilizing ontologies in applications [6,7,8,9]. This paper discusses the requirements for ontology library systems from a practical viewpoint, and presents an approach for building such a service. As a concrete result of the research and a case study, the national Finnish Ontology Library Service ONKI3 is presented. ONKI is a major objective of the National 1 http://www.w3.org/2001/sw 2 Ontology library systems are referred to in the literature with terms “ontology servers” and “ontology services”, too. We use the term “ontology library systems” to encompass all these meanings. 3 http://www.yso.fi/

L. Aroyo et al. (Eds.): ESWC 2009, LNCS 5554, pp. 781–795, 2009. c Springer-Verlag Berlin Heidelberg 2009  782 K. Viljanen, J. Tuominen, and E. Hyv¨onen

Semantic Web Ontology project FinnONTO (2003–2010)4 which aims at devel- oping a Semantic Web ontology infrastructure on a national level in Finland [6]. The consortium behind the initiative represents a wide spectrum of functions of the society, including libraries, health organizations, cultural institutions, gov- ernment, media, and education. In the following, we first set requirements for an ontology library system in terms of services at different phases of ontology development and usage. After this, the ONKI system and its services are presented along the same phases, and the implementation is discussed. The system has been used in several case ap- plications that are briefly surveyed next. In conclusion, related work is discussed and contributions of ONKI summarized.

2 Requirements for an Ontology Library Service Requirements for ontology library services can be identified by analyzing the life cycle of an ontology. Based on literature [8,7] and our own work, we identify following parts to be typical in a life cycle of an ontology. 1) Designing the ontologies. The first step in the life cycle of an ontology is to design the structure and modelling principles of the ontology based on analysis of the subject domain and by identifying the business and application problems the ontology is intended to solve. The foundational classes, properties and in- stances of the ontology are created. Sometimes the ontology may also be based on existing vocabularies such as a thesaurus, which are then “ontologised”. The main actors of this phase are workgroups and individual ontologists. Ontology libraries should provide support e.g. for collaborative ontology editing, reuse and alignment [10]. 2) Populating the ontologies. Ontologies may consist of huge amounts of instances such as people, organisations and places. Populating and maintaining the information can either be a one-time effort or constantly contin- uing process even after the ontology has been published. Populating may be e.g. a community-based distributed effort or based on utilizing existing registries and sources as input. Ontology libraries should support such content collection and updating processes. 3) Publishing the ontologies. When an ontology has been created, methods for publishing and promoting it are needed to ensure that the ontology is actively used to achieve the benefits of creating the ontology in the first place. In addition to provide such publishing and promoting mechanisms, the ontology library should also have mechanisms - both manual and automatic - for ensuring the quality of the ontologies to be published. The main actors of this phase are the ontology owners. 4) Finding, comparing and commiting to ontologies. When considering using ontologies for some purpose, the finding of suitable ontologies require support from the ontology library. Typical users of this phase are information architects. 5) Ontology based semantic application creation. Learning, evaluating and implementating ontology services to appli- cations require functionalities from the ontology library service for making this 4 Our work is funded by the National Funding Agency for Technology and Innovation (Tekes) and a consortium of 38 companies and public organizations. http://www.seco.tkk.fi/projects/finnonto/ Ontology Libraries for Production Use 783 process as fluent and easy as possible - and to support the wide usage of ontolo- gies in applications [11,12]. Typical users of this phase are software architects. 6) Ontology based semantic content creation. Ontologies are mostly used for de- scribing and indexing content semantically. Ontology libraries should support the work of content indexers by providing efficient tools and services for e.g. browsing ontologies, finding and fetching concepts for annotating purposes [9] or automatically indexing documents [13]. 7) Ontology-based end-user applications. Ontological content search, semantic browsing, semantic portals are examples of typical ways to provide the end-user with benefits from using ontologies in an applications (see e.g. [4,5,6,14]). Ontology libraries should support creating such end-user applications by providing services for the application builders. Ontol- ogy libraries may also provide services directly to the end-users such as the possibility to learn about some domain with the help of ontologies.

3 Finnish Ontology Library Service ONKI

The Finnish Ontology Library Service ONKI is a pilot system for addressing the requirements of an ontology library service on a national scale, but with the special focus on ontology publishing and using them in content indexing, and information retrieval through both user and application interfaces [14,6]. ONKI contains currently over 40 ontologies from various domain areas (see Table 1). Most of the ontologies are freely available to anybody to test and use in their applications.

3.1 Designing the Ontologies, Populating the Ontologies Ontologies developed within the ONKI framework are mostly created with the Prot´eg´e5 editor. Version management of the ontology files is done with Subver- sion6. In many cases, the ontologies are based on an existing thesaurus or other content which have first been transformed with an custom-made program to OWL and then been refined ontologically by the ontologist using Prot´eg´eand by aligning the ontologies with the Finnish Upper Ontology YSO [6]. Populat- ing the ontologies have been done either with Prot´eg´e by the ontologist(s) or collaboratively with the browser-based annotation editor SAHA [15]. For many ontologies, populating have been done with custom-made programs. Quality of the ontologies is controlled using three ways: gate keeping, quality requirements and training. Gate keeping is practised by selecting only trusted participants in the ontology creation and publishing which include both com- panies, governmental and non-governmental organizations. Typically the main author of any single ontology in ONKI is the leading authority in Finland of the respective domain. Quality requirements enforced in ONKI cover ontology presentation and ontology creation process issues. The ontology should be pre- sented using some RDF-based ontology language, such as SKOS, OWL or RDF 5 http://protege.stanford.edu 6 http://subversion.tigris.org/ 784 K. Viljanen, J. Tuominen, and E. Hyv¨onen

Table 1. A selection of ontologies currently available in ONKI (The amount of concepts consists classes and/or instances, depending on the ontology.)

Ontology Concepts Format Public?

Upper and Holistic Ontologies Holistic Collaborative Finnish Ontology KOKO ca. 30,000 OWL yes General Finnish Upper Ontology YSO 20,649 OWL yes General Finnish Thesaurus YSA 26,633 SKOS yes Wordnet ca. 230,000 SKOS yes Cultural Ontologies Ontology for Museum Domain MAO 6,775 OWL yes Ontology of Applied Arts TAO 29,940 OWL yes Finnish Ontology of Photography VALO 22,596 OWL yes Ontology for music MUSO 21,650 OWL yes Art and Iconography classification Iconclass 26,636 SKOS yes Kaunokki thesaurus for fictive literature 4,373 SKOS yes Music thesaurus MUSA 931 SKOS yes Art & Architecture Thesaurus AAT 27,992 OWL no Agriforest and Natural Science Ontologies Agriforest Ontology AFO 26,612 OWL yes Ontology of Birds AVIO 11,161 SKOS no Ontology of Mammals MAMO 6,059 SKOS no Health Ontologies Medical Subject Headings MeSH 24,355 SKOS yes European Multilingual Thesaurus on Health Promotion HPMULTI 1,271 SKOS yes Business Ontologies, Governmental Ontologies Seafaring thesaurus MESA 1,448 SKOS yes United Nations Standard Products and Services Code UNSPSC 20,794 SKOS no Finnish Governmental Thesaurus VNAS 6,342 SKOS yes Instance Ontologies Finnish Geo-ontology SUO ca. 800,000 OWL yes Finnish Time-Location Ontology SAPO 1,102 OWL yes Getty Thesaurus of Geographic Names TGN (exluding USA) 142,990 OWL no Getty Union List of Artist Names ULAN ca. 100,000 OWL no

Schema. The consistency of the ontologies is checked including syntax check- ing (valid RDF) and conceptual checking manually by the ontologists. One of the most important ways to enforce the quality is that the ontologies in ONKI library are created using a common development process and modelling idea which are supervised by the core YSO developer team [6]. This promotes using compatible development processes in all other ontologies also and thus provides a more compatible collection of ontologies as a result. If possible, ontologies are aligned with a common upper ontology, the Finnish Upper Ontology YSO. This alignment to YSO adds value to YSO, the ontology at hand and the ONKI Li- brary as a whole, because each additional alignment adds new possibilities to find concept relations. To spread good practices and knowledge about the mod- elling methods used in YSO and other relevant ontologies, training is provided to ontology developers. To enforce the reuse of ontologies, the license of the pub- lished ontologies should allow publishing, using and redeveloping the ontology as Ontology Libraries for Production Use 785 freely as possible. The default license used for ONKI ontologies is the Creative Commons license7.

3.2 Publishing the Ontologies

Publishing an ontology in ONKI typically contains the following phases: First, the ontology to be published is added to the Subversion repository and the needed configurations for the ontology are created [16]. Second, a URI normal- ization for the ontology to be published is done where the original URIs are transformed to persistent numeric URIs (PURIs). Instead of (typically) human readable URIs we propose that URIs should not contain any reference to human languages to avoid unnecessary needs for changing the concept URIs e.g. when translating the ontology to some other language8. For example, instead using the URI “myonto:semanticweb” we propose using the URI “myonto:p12345”. Third, if the ontology is published part of the KOKO ontology, the automatic updating of KOKO takes place. Finally, the ontology is added to ONKI and made available via different services such as human user interfaces and machine APIs. If the ontology is maintained in some external system it can be published using ONKI by establishing a publishing pipeline from the external system to ONKI. This method has been used e.g. in publishing the General Finnish Thesaurus YSA, maintained by the National Library of Finland [16]. The thesaurus is fetched each night from the National Library’s server using the MARCXML format9. The content is then transformed to SKOS and finally published in ONKI. ONKI provides also an upload functionality “Your ONKI” for publishing SKOS (or other) ontologies in the library. When an ontology has been uploaded, it is moderated by the server administration and if the content is suitable, it will be added to the library. ONKI quality requirements presented earlier are recommended also for Your ONKI submissions.

3.3 Ontology Discovery, Ontology Library Service Evaluation

To support finding, evaluating and choosing an ontology for specific purposes each ontology is described with metadata including title, description, classifica- tion, version information and available access methods. Depicted in Figure 1 is the main user interface with the list of available ontologies, which also shows the available access methods for each ontology. Access methods are described later, but include e.g. the possibility to browse and search the ontologies. The ontologies can be described and documentedinawiki,partoftheONKIsystem. The list of ontologies is available in RDF for machine usage. When allowed by the publishing license, current and previous versions of ontologies are available for downloading.

7 http://creativecommons.org/ 8 The idea of stable URIs is also discussed in http://www.w3.org/Provider/Style/URI 9 http://www.loc.gov/standards/marcxml/ 786 K. Viljanen, J. Tuominen, and E. Hyv¨onen

Fig. 1. List of ontologies in ONKI. Each ontology contains the links to the available access methods such as the ontology-specific browser.

3.4 Ontology-Based Semantic Content Creation For users creating semantic content, e.g. by describing resources by using onto- logical concepts, ONKI service provides the ONKI Selector, depicted in Figure 2 [9]. With ONKI Selector content creators can find suitable concepts for their annotation tasks. When the ONKI Selector is integrated into a HTML input field, the field turns into a semantic autocompletion search interface. When typ- ing search string into the input field, the matching concepts are returned as a hit list. Desired concepts can be selected from the list and added to the content creation application. Depending on the use case, concept’s URI, label or both of them can be fetched to the application. In combination with the ONKI Selector, domain-specific ONKI browsers can be used to browse the ontologies when searching for suitable concepts. The browsers have a “Fetch Concept” button which returns the selected concept into the content creation application. ONKI SKOS Browser[16] is an ontology

Fig. 2. ONKI Selector Ontology Libraries for Production Use 787

Fig. 3. ONKI SKOS Browser

Fig. 4. ONKI Geo Browser browser for thesaurus-like class ontologies. It supports visualizing and brows- ing of vocabularies conforming to SKOS recommendation, and also RDF(S) and OWL ontologies with additional configuration. ONKI SKOS Browser consists of three main components: 1) concept search with semantic autocompletion,2) concept hierarchy and 3) concept properties,asdepictedinFigure3.ONKIGeo Browser [17] is used for accessing geographical instance data with a map inter- face, as depicted in Figure 4. It provides unambiguous place identifiers (URIs) and coordinates for arbitrary points or polygons to be used in content annota- tion. ONKI People [18] is used for browsing and searching ontologies of persons, organizations, and similar instance registries.

3.5 Ontology-Based End-User Applications

For ontology-based end-user applications ONKI service provides means for find- ing ontological concepts and using them, e.g., in information retrieval tasks. 788 K. Viljanen, J. Tuominen, and E. Hyv¨onen

Compared to a simple free text search field, the ONKI Selector aids user to find query concepts with autocompletion search and ontology browsers. The ONKI Selector is useful even if the application is not ontology-based. In that case the labels of the concepts can be used as query terms. To increase the recall of the information retrieval tasks, ONKI Selector per- forms query expansion by ontological inference. The properties used for per- forming the query expansion can be configured separately for each ontology. In class ontologies, a concept is typically expanded to its subconcepts. The query expansion could also be based on partonomy, associative relationships or other relations between concepts. In geographical instance data, a place instance is ex- panded to places that have historically had overlapping regions with the place. Other possible query expansion methods include partonomy, places with shared regional borders etc.

3.6 Ontology-Based Semantic Application Creation

ONKI services may be integrated into semantic applications at the user inter- face level by using the ready-to-use user interface component ONKI Selector, domain-specific ONKI Browsers, and by using application programming inter- faces (API). ONKI supports the software developer in using ONKI services with the help of helper applications such as the ONKI Selector Builder, depicted in Figure 5, which helps the developer to generate the JavaScript code needed for integrating ONKI Selector into web-based applications. When the desired con- figuration properties have been set in the ONKI Selector Builder, the resulting JavaScript code can be copied into the application being developed.

Fig. 5. ONKI Selector builder Ontology Libraries for Production Use 789

ONKI API provides methods for accessing ontologies, e.g., for searching for concepts, getting metadata of an ontology and performing ontology-based query expansion. ONKI API is implemented as Web Service (SOAP) and JavaScript interface. ONKI API contains the following methods: – search(query, lang, maxHits, type, parent) - for searching for the ontological concepts. Returns a list of matching concepts. – expandQuery(URI, lang, maxHits, type) - for querying for the query expan- sion for a concept. Returns a list of concepts. – getLabel(URI, lang) - for fetching a label for a given concept URI in a given language. – getAvailableLanguages() - for querying for the supported languages of an ontology. Returns a list of language codes. – getAvailableTypeUris() - for querying for the concept types (rdf:type rela- tions) existing in the ontology. Returns a list of URIs. Software developers can also utilize the RDF files of the ontologies published in the ONKI service. All concept and instance URIs are designed so that they function also as URLs. When the URI of a concept is accessed with a web browser, the relevant view is opened in the ONKI browser. This means that the URI itself acts as a functional link when added to a HTML page. In accordance to W3C10, if the URI is accessed with an RDF aware system, the machine readable RDF presentation of the content is returned instead of the ONKI browser’s HTML presentation.

4 Implementation and Usage Statistics

The ONKI service is constituted of a loosely coupled set of independent appli- cations such as the ONKI SKOS and ONKI Geo servers which are combined by using a lightweight facade service made with an Apache web server, Apache rewrite rules and PHP scripts. Back-end ONKI applications conform also to the ONKI API described in section 3.6. Technologies used for implementing the various back-end applications include Java, Semantic Web Framework Jena11, MySQL database, Lucene index12, Direct Web Remoting DWR for AJAX func- tionalities13, Varnish HTTP accelerator14 and shell scripts and Subversion ver- sion management system. The ontologies are presented internally in various RDF formats, typically in SKOS or OWL. With the help of ontology-specific config- urations, the ontologies are served to the user in a uniform way. The ONKI is running as a pilot service publicly available on the web. It was officially launched in September 200815. During year 2008 ONKI had ca. 36,000

10 http://www.w3.org/TR/swbp-vocab-pub/ 11 http://jena.sourceforge.net 12 http://lucene.apache.org 13 http://www.directwebremoting.org 14 http://varnish.projects.linpro.no 15 http://www.youtube.com/watch?v=qG2YhK17ifs 790 K. Viljanen, J. Tuominen, and E. Hyv¨onen

Table 2. ONKI usage statistics for November 2008

Service Hits Human interfaces ONKI-SKOS Browser 89,346 ONKI Selector Widget 54,384 Persistent URI redirects 4,415 ONKI Selector builder 1,103 Web Service builder 1,403 ONKI-IRMA 495 ONKI-Geo Browser 203 ONKI-Geo coordinates Browser 168 Application interfaces Web service calls 87,388 Javascript calls 18,816 Total 257,721 unique visitors and ca. 104,000 visits. 91 organizations outside the research group have registered an access key for using the JavaScript and web service interfaces which of 25 have actually implemented a test application using the components. For an overview, table 2 presents the usage statistics of the main ONKI function- alities for a representative month16. The most popular ontologies are the Medical Subject Headings, the Finnish Upper Ontology YSO, the General Finnish The- saurus YSA and the Ontology for Museum Domain MAO where each ontology got over ten thousand hits during the month.

5 Case Applications Using ONKI

5.1 Content Creation: HealthFinland, CultureSampo and Tilkut HealthFinland and CultureSampo are two major pilot applications of the FinnONTO project [6]. They demonstrate the usage of Semantic Web tech- nologies in the contexts of health promotion and cultural heritage. Both sys- tems uses ONKI as the ontology server for indexing content especially with the the browser-based annotation editor SAHA17 [15]. For example (Figure 6), the Finnish General Ontology YSO [6] has been added as a ONKI Selector compo- nent to SAHA for finding and fetching annotation concepts. The web laboratory Owela18 of the VTT Technical Research Centre of Finland has implemented a service for collecting and sharing text and image clips from the Web19. In the service one can organize the clips into folders and tag them with different categories. The ONKI Selector is used for tagging the clips.

16 The hits represents user activities of the system. Outliers caused e.g. by web crawlers have been removed as far as possible. 17 http://www.seco.tkk.fi/services/saha/ 18 http://owela.vtt.fi/ 19 http://owela.vtt.fi/tilkut Ontology Libraries for Production Use 791

             



 ! 

   "#

         

Fig. 6. The Finnish General Ontology connected to SAHA

5.2 Content Search: Kantapuu.fi and eViikki Kantapuu.fi20 is a web user interface for browsing and searching for collections of Finnish museums of forestry. The collection items are annotated with terms from General Finnish Thesaurus YSA, Thesaurus for Museum Domain MASA and Agriforest Thesaurus. Kantapuu.fi search page is a web form into which query strings are typed as free text. The query strings can be placed into spe- cific fields, e.g. “keywords”, “place of use” or “time of use”. We have created a demonstration page containing a Kantapuu.fi’s search form with integrated ONKI Selectors which can be used for selecting query terms to be used in the Kantapuu.fi search21. The ONKI Selector is used for finding terms from vocab- ularies of the ONKI service. The used vocabularies are the same as those used in the annotation process of the items, or actually their ontologized versions. To find suitable query terms user can utilize the autocompletion search func- tionality or the ONKI Browser. Thus, the user does not need to be familiar with the vocabularies used in the annotations of the items, as in the case of free text search. The ONKI Selector performs query expansion based on the selected query terms. So, for example a query term “animals” would return items anno- tated with term “cats”. When the desired query terms are selected, the actual search to the Kantapuu.fi system can be executed. The ONKI Selector Widget has also been integrated into the Viikki Science Library22 reference database system eViikki23. eViikki is a search interface for the library’s collections, which consist of scientific literature on agriforestry. The ONKI Selector is used for populating the “keywords” field of the search form of eViikki. The fetched concept labels are used in the information retrieval task. Query expansion is not performed currently.

20 http://www.kantapuu.fi/ 21 http://www.yso.fi/lusto 22 http://www.tiedekirjasto.helsinki.fi/english/ 23 http://www-db.helsinki.fi/eviikki/eviikkihaku.html 792 K. Viljanen, J. Tuominen, and E. Hyv¨onen

6 Related Work

Based on reviews on ontology library systems [7,8], the main focus in existing systems tends to be in supporting ontology development and not the runtime usage of ontologies such as indexing and ontology-based end-user applications. Although ONKI provides support for the whole ontology life cycle, a major contribution of ONKI is the support for indexing and other runtime needs. The DAML Ontology Library24 is a classic implementation of an ontology library. The ontologies can be accessed via different categories, such as meta- data describing the ontologies such as keywords, Open Directory category25 and submitting organization. Also information obtained from the ontologies such as names of classes and properties can be used for finding relevant ontologies. The main method for using the ontologies is to download them. In comparison, ONKI provides application support for e.g. adding ontologies as mash-up and web services to applications. The Ontology Library Service ONKI provides methods for finding ontologies and concepts amongst the ontologies published in the centralized service, whereas Swoogle [19] and Watson [20] act as global Semantic Web search engines. They crawl the web and index the RDF files they find. Such search engines can be useful when searching for suitable ontologies to use in applications, providing an overview of ontologies of some domain published on the web. The ONKI Service is based on a different approach. It aims to be a community-based service that gathers together the users of ontologies providing them ready-to-use ontological functionalities which can be integrated into semantic applications. Dameron et al. proposes that ontology services should be provided as Ontology Web Services (OWS) which could be used in applications for automatically find and use ontologies [12]. In ONKI we support the idea of providing application interfaces to the ontology library, but extend the idea to a higher abstraction level by providing also ready-to-use user interface components to avoid duplicated work by re-implementing user interface and visualization functionalities. Faviki26 is a semantic bookmarking service which uses Wikipedia’s27 term identifiers for tagging web pages. In comparison, ONKI is focusing on publishing ontologies and to support the creation of ontology-based indexing, content search and other applications. Freebase28 is a data repository on the web that aggregates information from many sources and provides a single topic and identifier for each logical entity, e.g. a person. One goal of ONKI is also to provide (optimally) single, shared identifiers for ontological concepts which can be used to aggregate distributed content repositories. Freebase is based on a bottom-up approach based on exist- ing information e.g. in Wikipedia whereas ONKI ontologies are (typically) based on top-down analysis of a domain and its relevant concepts. 24 http://www.daml.org/ontologies/ 25 http://www.dmoz.org 26 http://www.faviki.com 27 With the help of DBPedia, http://dpbedia.org 28 http://www.freebase.com Ontology Libraries for Production Use 793

7 Discussion

This paper discussed the requirements of an ontology library system to sup- port the different phases of an ontology life cycle and related user needs for creation, publishing, maintaining and using ontologies. The Finnish national on- tology library service ONKI addresses all phases of the ontology life cycle and contributes especially in providing support for 1) collaborative ontology publish- ing, 2) content indexing, and 3) information searching. The ontology services can be used in external legacy and other applications as ready-to-use functionalities. The new idea here is to support mash-up usage of ontologies in a way similar to Google Maps and other similar services. Our approach of providing an integrable autocompletion widget for external systems is the same as in [21]. The ONKI system supports syntactically, structurally and semantically het- erogeneous content. RDF-based content representations such as RDF Schema, SKOS and OWL can be easily published by the ONKI SKOS server. SKOS gen- erators for especially thesauri presentation formats such as MARCXML, various database schemas and text files has been implemented. Also multilingual content is supported. ONKI has been built for and tested with real world data consist- ing of ontologies, well-known thesauri and registries. The geographical ontology SUO contains over 800,000 places in Finland, which is published using the ONKI Geo server. The typical size of ontologies and thesauri published using the ONKI SKOS server is tens of thousands concept, e.g., YSO contains 20,600 concepts. For testing the scalability of ONKI SKOS, the Wordnet with 230,000 concepts has been successfully presented with the system. The ONKI services have been tested in various ways: the ONKI Selector as part of the SAHA editor in creat- ing content for e.g. HealthFinland and CultureSampo and by external parties in their indexing and search applications. The ONKI Browser has been tested by expert users from e.g. the National Library of Finland. To conclude, this full scale national ontology library service ONKI is novel and has not been done before. There are currently thousands of individual users and hundreds of organizations from different domains testing the ONKI system and using it in pilot applications. The National Library’s commitment to ONKI means that a substantial part of public and private organizations in Finland be- gin to use ONKI – and most promisingly also the KOKO ontology – for indexing and search, but also for publishing and accessing their ontologies and thesauri – and to join the Semantic Web. Future work include continuing observing how the ontology development, pub- lishing and using community continues to build up around ONKI. We are most interested in seeing what kind of new applications will emerge based on ONKI and how well the concept of interlinked ontologies works in practice. DBPedia could be an interesting repository to be published in ONKI. Research topics include developing further methods for supporting community-based ontology development, managing changes in ontologies and utilizing change history and change propagation in e.g. inference and searching. 794 K. Viljanen, J. Tuominen, and E. Hyv¨onen

References

1. Gruber, T.R.: A translation approach to portable ontology spesifications. Knowl- edge Acquisition 5(2), 199–220 (1993) 2. Staab, S., Studer, R.: Handbook on ontologies. Springer, Heidelberg (2004) 3. Fensel, D.: Ontologies: Silver bullet for knowledge management and electronic com- merce, 2nd edn. Springer, Heidelberg (2004) 4. Reynolds, D., Shabajee, P., Cayzer, S.: Semantic Information Portals. In: Proceed- ings of the 13th International World Wide Web Conference on Alternate track papers & posters. ACM Press, New York (2004) 5. Hyv¨onen, E., M¨akel¨a, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., Junnila, M., Kettula, S.: MuseumFinland—Finnish museums on the semantic web. Journal of Web Semantics 3(2), 224–241 (2005) 6. Hyv¨onen, E., Viljanen, K., Tuominen, J., Sepp¨al¨a, K.: Building a national seman- tic web ontology and ontology service infrastructure—the finnonto approach. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 95–109. Springer, Heidelberg (2008) 7. Ding, Y., Fensel, D.: Ontology library systems: The key for successful ontology reuse. In: Cruz, I.F., Decker, S., Euzenat, J., McGuinness, D.L. (eds.) Proceedings of SWWS 2001, The first Semantic Web Working Symposium, Stanford University, California, USA, July 30 - August 1, 2001, pp. 93–112 (2001) 8. Ahmad, M.N., Colomb, R.M.: Managing ontologies: a comparative study of on- tology servers. In: ADC 2007: Proceedings of the eighteenth conference on Aus- tralasian database, pp. 13–22. Australian Computer Society, Inc., Australia (2007) 9. Viljanen, K., Tuominen, J., Hyv¨onen, E.: Publishing and using ontologies as mash- up services. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021. Springer, Heidelberg (2008) 10. Euzenat, J., Shvaiko, P.: Ontology matching. Springer, Heidelberg (2007) 11. M¨akel¨a, E., Viljanen, K., Alm, O., Tuominen, J., Valkeap¨a¨a, O., Kauppinen, T., Kurki, J., Sinkkil¨a, R., K¨ans¨al¨a, T., Lindroos, R., Suominen, O., Ruotsalo, T., Hyv¨anen, E.: Enabling the semantic web with ready-to-use web widgets. In: Pro- ceedings of the First Industrial Results of Semantic Technologies Workshop, ISWC 2007, November 11 (2007) 12. Dameron, O., Noy, N.F., Knublauch, H., Musen, M.A.: Accessing and manipulating ontologies using web services. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298. Springer, Heidelberg (2004) 13. Vehvil¨ainen, A., Hyv¨anen, E., Alm, O.: A semi-automatic semantic annotation and authoring tool for a library help desk service. In: Emerging Technologies for Semantic Work Environments: Techniques, Methods, and Applications. IGI Group, Hershey (2008) 14. Viljanen, K., Tuominen, J., K¨ans¨al¨a, T., Hyv¨onen, E.: Distributed semantic content creation and publication for cultural heritage legacy systems. In: Proceedings of the 2008 IEEE International Conference on Distibuted Human-Machine Systems, Athens, Greece. IEEE Press, Los Alamitos (2008) 15. Valkeap¨a¨a, O., Hyv¨onen, E., Alm, O.: A framework for ontology-based adaptable content creation on the semantic web. J. of Universal Computer Science (2007) 16. Tuominen, J., Frosterus, M., Viljanen, K., Hyv¨onen, E.: ONKI SKOS server for publishing and utilizing skos vocabularies and ontologies as services. In: Aroyo, L., et al. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 768–780. Springer, Heidelberg (2009) Ontology Libraries for Production Use 795

17. Kauppinen, T., Henriksson, R., Sinkkil¨a, R., Lindroos, R., V¨a¨at¨ainen, J., Hyv¨onen, E.: Ontology-based disambiguation of spatiotemporal locations. In: Proceedings of the 1st international workshop on Identity and Reference on the Semantic Web (IRSW2008), Tenerife, Spain, June 1-5 (2008) 18. Kurki, J.: Finding people and organizations on the semantic web. In: AI and Ma- chine Consciousness - Proceedings of the 13th Finnish Artificial Intelligence Con- ference STeP, August 20-22 (2008) 19. Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Reddivari, P., Doshi, V., Sachs, J.: Swoogle: a search and metadata engine for the semantic web. In: CIKM 2004: Proceedings of the thirteenth ACM international conference on Information and knowledge management, pp. 652–659. ACM Press, New York (2004) 20. d’Aquin, M., Baldassarre, C., Gridinoc, L., Sabou, M., Angeletou, S., Motta, E.: Watson: Supporting next generation semantic web applications. In: Proceedings of IADIS International Conference on WWW/Internet (ICWI 2007), Vila Real, Portugal (2007) 21. Hildebrand, M., van Ossenbruggen, J., Amin, A., Aroyo, L., Wielemaker, J., Hard- man, L.: The design space of a configurable autocompletion component. Technical Report INS-E0708, CWI, Amsterdam (November 2007)

Publication II

Jouni Tuominen, Matias Frosterus, Kim Viljanen, and Eero Hyvönen. ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies and Ontolo- gies as Services. In The Semantic Web: Research and Applications: 6th European Semantic Web Conference, ESWC 2009, Heraklion, Crete, Greece, May 31–June 4, 2009, Proceedings, Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Simperl (editors), Lecture Notes in Computer Science, volume 5554, pages 781–795, ISBN 978-3-642-02120-6, Springer-Verlag, June 2009.

c 2009 Springer-Verlag Berlin Heidelberg.

Reprinted with permission.

99

ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies and Ontologies as Services

Jouni Tuominen, Matias Frosterus, Kim Viljanen, and Eero Hyv¨onen

Semantic Computing Research Group (SeCo) Helsinki University of Technology (TKK) and University of Helsinki [email protected] http://www.seco.tkk.fi/

Abstract. Vocabularies are the building blocks of the Semantic Web providing shared terminological resources for content indexing, informa- tion retrieval, data exchange, and content integration. Most semantic web applications in practical use are based on lightweight ontologies and, more recently, on the Simple Knowledge Organization System (SKOS) data model being standardized by W3C. Easy and cost-efficient publi- cation, integration, and utilization methods of vocabulary services are therefore highly important for the proliferation of the Semantic Web. This paper presents the ONKI SKOS Server for these tasks. Using ONKI SKOS, a SKOS vocabulary or a lightweight ontology can be published on the web as ready-to-use services in a matter of minutes. The services include not only a browser for human usage, but also Web Service and AJAX interfaces for concept finding, selecting and transporting resources from the ONKI SKOS Server to connected systems. Code generation ser- vices for AJAX and Web Service APIs are provided automatically, too. ONKI SKOS services are also used for semantic query expansion in in- formation retrieval tasks. The idea of publishing ontologies as services is analogous to Google Maps. In our case, however, vocabulary services are provided and mashed-up in applications. ONKI SKOS was published in the beginning of 2008 and is to our knowledge the first generic SKOS server of its kind. The system has been used to publish and utilize some 60 vocabularies and ontologies in the National Finnish Ontology Service ONKI www.yso.fi.

1 Introduction

Thesauri and other controlled vocabularies are used primarily for improving in- formation retrieval [1]. This is accomplished by using concepts or terms of a thesaurus in content indexing, content searching or in both of them, thus simpli- fying the matching of query terms and the indexed resources (e.g. documents) compared to using natural language. For users, such as content indexers and searchers, to be able to use thesauri, publishing and finding methods for them as well as methods for integrating them with applications [2] are needed [3]. Thesauri are of great benefit for the Semantic Web1, enabling semantically dis- ambiguated data exchange and integration of data from different sources, though 1 http://www.w3.org/2001/sw/

L. Aroyo et al. (Eds.): ESWC 2009, LNCS 5554, pp. 768–780, 2009. c Springer-Verlag Berlin Heidelberg 2009  ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies 769 not to the same extent as ontologies [4] where the semantics of concepts are de- fined in more refined and “machine understandable” ways. Publishing and utilizing thesauri is a laborious task because the representa- tion formats of thesauri and features they provide differ from each other. When utilizing thesauri one has to know how to locate a given thesaurus, and also be familiar with the software the thesaurus is published with. A thesaurus can even be published as a plain text file or even worse, as a paper document, with no proper support for utilizing it. In such a case the users have to implement applications for processing the thesaurus in order to exploit it. Therefore, stan- dard ways for expressing and publishing thesauri would greatly facilitate the publishing and utilizing processes of thesauri. The Simple Knowledge Organization System (SKOS)2 [5] being developed within W3C is a data model and a syntax for expressing concept schemes such as thesauri. SKOS provides a standard way for creating vocabularies and migrating existing vocabularies to the Semantic Web. SKOS solves the problem of diverse, non-interoperable thesaurus representation formats by offering a standard con- vention for presentation. Existing thesauri can be transformed into SKOS format via conversion processes. When a thesaurus is expressed as a SKOS vocabulary, it can be published as a RDF file on the web, allowing the vocabulary users to fetch the files and process them in a uniform way. However, this does not solve the problem of users having to implement their own applications for processing vocabularies. This paper argues that there are many common, sharable tasks in such vocabulary-aware applications related to e.g. term/concept finding, browsing, selecting, fetching, and query expansion. Lots of work and costs can be saved by implementing such functionalities in standard ways [6] and by providing them for production use as ready-to-use services without having to reimplement the functionalities separately for each local application case. In this way, the use patterns of utilizing vocabularies in the user interface can be harmonized, which makes systems easier to learn and use. Ontology servers [7,8] have been proposed for publishing ontologies and vo- cabularies on the Semantic Web. Such servers are used for managing ontologies and offering users access to them. For accessing SKOS vocabularies, there is the Web Service based SKOS API3 developed in the SWAD-Europe project, and the terminology service by Tudhope et al. [9]. There is also ongoing research by the Networked Knowledge Organization Systems/Services (NKOS) community4 that focuses on enabling knowledge organization systems as networked inter- active information services to support the description and retrieval of diverse information resources through the Internet. However, general tools for providing out-of-the-box support for utilizing SKOS vocabularies in, e.g., content indexing, without needing to implement application specific user interfaces for end users do not exist.

2 http://www.w3.org/TR/skos-reference/ 3 http://www.w3.org/2001/sw/Europe/reports/thes/skosapi.html 4 http://nkos.slis.kent.edu/ 770 J. Tuominen et al.

This paper fills this gap by presenting the ONKI SKOS Server for publish- ing and utilizing thesauri and lightweight ontologies. In the following, ways of presenting thesauri on the Semantic Web are first overviewed. After this, the approach and implementation of the ONKI SKOS Server is presented followed by descriptions of use cases of its services. In conclusion, contributions of the work are summarized and directions for further research outlined.

2 Presenting Thesauri on the Semantic Web

W3C’s SKOS data model provides a vocabulary for expressing the basic struc- ture and contents of concept schemes, such as thesauri, classification schemes and taxonomies. The concept schemes are expressed as RDF graphs by using classes and properties defined in the SKOS specification, thus making thesauri compatible with the Semantic Web. SKOS is capable of representing vocabularies which loosely conform to the influential ISO 2788 thesaurus standard [6]. Although semantically richer RDFS/OWL ontologies enable more extensive ways to perform logical inferencing than SKOS vocabularies, in several cases thesauri represented with SKOS are sufficient. In our opinion, the first and the most obvious benefit of using Semantic Web ontologies/vocabularies in content indexing is their ability to disambiguate concept references in a universal way. This is achieved by using persistent URIs as the identification mechanism and is a tremendous advantage when compared with the traditional idea of controlled vocabularies where plain concept labels are used as identifiers. In such a case, identification problems can be encountered e.g. with polysemous and homony- mous labels, when dealing with multi-lingual resources, and with resources whose labels may change as time goes by. For example, human names are transliterated in many ways in different languages, may change due to marriage, a person may have nick names, and so on. In such cases the labels of concepts are not a per- manent identification method, and the references to the concepts may become ambiguous or invalid. URIs provide not only an identification mechanism, but also means for access- ing the concept definitions and thesauri, when using the http URI Scheme5. Such URIs can act as URLs, and with a proper server configuration provide the users additional information about the referred concept6. In addition to these general RDF characteristics, SKOS provides a way for expressing relations between con- cepts suitable for the needs of thesauri, thus providing conceptual context for concepts. In short, using a common representation model such as SKOS for thesauri greatly reduces the cost of (a) sharing thesauri, (b) using different thesauri in conjunction within one application, and (c) development of standard software to process them [6].

5 http://www.iana.org/assignments/uri-schemes.html 6 http://www.w3.org/TR/swbp-vocab-pub/ ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies 771

3 Utilizing Thesauri with ONKI SKOS Services

ONKI SKOS is an ontology server implementation for publishing and utilizing thesauri and lightweight ontologies. It conforms to the general ONKI vision and API, and is thus usable via ONKI ontology services as easily integrable user interface components and APIs [2]. The Semantic Web applications typically use ontologies which are either straightforward conversions of well-established thesauri, application-specific vo- cabularies or semantically richer ontologies, that can be presented and accessed in similar ways to thesauri [3,10]. Since SKOS defines a suitable model for ex- pressing thesauri, it was chosen as the primary data model supported by the ONKI SKOS Server. ONKI SKOS can be used to browse, search and visualize any vocabulary conforming to the SKOS specification, and also RDFS/OWL ontologies with additional configuration. ONKI SKOS does simple reasoning, e.g., transitive closure over class and part-of hierarchies. The implementation has been piloted using various thesauri and ontologies from diverse domains. Piloted ontologies include ontologies for cultural heritage (Art & Architecture Thesaurus AAT7, 28,000 concepts; Iconclass8, 27,000 concepts), health ontologies (Medical Subject Headings MeSH9, 24,000 concepts), business ontologies (United Nations Stan- dard Products and Services Code UNSPSC10, 21,000 concepts), geographical in- stance ontologies (The Getty Thesaurus of Geographic Names TGN11, 143,000 concepts), upper ontologies (General Finnish Upper Ontology YSO [3], 21,000 concepts), governmental ontologies (Finnish Governmental Thesaurus VNAS, 6,300 concepts) and natural science taxonomies (Ontology of Birds of the World AVIO, 11,000 concepts; Ontology of Mammals of the World MAMO, 5,000 con- cepts). However, not all of these ontologies are freely accessible due to licensing issues. When utilizing thesauri represented as SKOS vocabularies and published on the ONKI SKOS server, several benefits are gained. Firstly, SKOS provides a universal way of expressing thesauri. Thus processing different thesauri can be done in the same way, eliminating the use of thesaurus specific processing rules in applications or separate converters between various formats. Secondly, ONKI SKOS provides access to all published thesauri in the same way, so content indexers and end-users do not have to use thesaurus specific implementations of thesaurus browsers and other tools developed by different parties, which is the predominant way. Also, one of the goals of the ONKI ontology services is that all the essential ontologies/thesauri can be found at the same location, thus eliminating the need to search for other thesaurus sources. The typical way to use thesaurus specific publishing systems in content in- dexing and searching is either by using their browser user interface for finding 7 http://www.getty.edu/research/conducting_research/vocabularies/aat/ 8 http://www.iconclass.nl/ 9 http://www.nlm.nih.gov/mesh/meshhome.html 10 http://www.unspsc.org/ 11 http://www.getty.edu/research/conducting_research/vocabularies/tgn/ 772 J. Tuominen et al. desired concepts and then copying and pasting the concept label to the used in- dexing system12, or by using APIs for accessing and querying the thesaurus [9]. Both methods have some drawbacks. The first method introduces rather un- comfortable task of constant switching between two applications and the clumsy copy-paste procedure. The second method leaves the implementation job of the user interface entirely to the parties utilizing the thesaurus. While ONKI SKOS supports both the aforementioned thesauri utilizing meth- ods, in addition, as part of the ONKI ontology services, it provides a lightweight web widget for integrating general thesauri accessing functionalities into web based applications on the user interface level. The ONKI Selector widget de- picted in Figure 1 can be used to search and browse thesauri, fetch URI ref- erences and labels of desired concepts and storing them in a concept collector. Similar ideas have been proposed by Hildebrand et al. [11] for providing search widget for general RDF repositories, and by Vizine-Goetz et al. [12] for provid- ing widget for accessing thesauri through the side bar of the Internet Explorer web browser. When the desired concepts have been selected with the ONKI Selector they can be stored into, e.g., the database of the application by using an HTML form. Either the URIs or the labels of the concepts can be transferred into the application providing support for the Semantic Web and legacy applications. For browsing the context of the concepts in thesauri, the ONKI SKOS Browser can be opened by pressing a button. Desired concepts can be fetched from the browser to the application by pressing the “Fetch Concept” button. Thus, there is no need for copy-paste procedures or user interface implementation projects. For information retrieval use cases, ONKI SKOS provides support for find- ing suitable query terms and expanding them by using ontological inference, e.g., based on the concept subsumption, partonymy or associative relations be- tween concepts. In legacy systems, which do not support URIs, the labels of the concepts can be used as query terms. In this context, ONKI SKOS supports multilingual search. If a thesaurus contains, e.g., English and Finnish labels for the concepts, the relevant query concepts can be searched in English, while in the actual information retrieval task the used query terms are in Finnish. This is useful when the end-user does not understand the language of the terms used in indexing the content of the underlying system. The ONKI SKOS Browser (see Figure 2) is the graphical user interface of ONKI SKOS. It consists of three main components: 1) semantic autocompletion concept search,2)concept hierarchy and 3) concept properties.Whentypingtext to the search field, a query is performed to match the concepts’ labels. The result list shows the matching concepts, which can be selected for further examination. When a concept is selected, its concept hierarchy is visualized as a tree struc- ture. The ONKI SKOS Browser supports multi-inheritance of the concepts (i.e. a concept can have multiple parents). Whenever a multi-inheritance structure is met, a new branch is formed to the tree. This leads to cloning of nodes, i.e.

12 This is the way the Finnish General Thesaurus YSA has been used previously via the VESA Web Thesaurus Service, http://vesa.lib.helsinki.fi/ ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies 773

Fig. 1. ONKI Selector for concept finding

Fig. 2. The ONKI SKOS Browser a concept can appear multiple times in the hierarchy tree. Next to the concept hierarchy tree, the properties of the selected concept are shown in the user inter- face. Mappings and other relations between concepts act as links allowing user to browse thesauri. In context of geographical ontologies, the place instances can be visualized on a Google Maps13 view, as depicted in the Figure 2, if the instances contain coordinate information.

13 http://maps.google.com/ 774 J. Tuominen et al.

The National Finnish Ontology Service ONKI14 provides documentation and tools for software architects developing ontology-based applications and integrat- ing ONKI services into them [13]. With ONKI Selector Builder a developer can set the desired configuration properties for the ONKI Selector and generate the needed JavaScript code to integrate the ONKI Selector into an application. Also, a hands-on demonstration page for integrating the ONKI Selector is available. The page contains a HTML form, into which the user can type HTML/JavaScript code and see how the resulting page will look like in the end-user’s web browser. The Web Service and AJAX interfaces of the ONKI SKOS server can be used for querying for concepts by label matching, performing query expansion, getting label for a given URI and for querying for supported languages and type URIs of a thesaurus. For testing Web Service methods, a demonstration page is provided at the ONKI service. One can try out the methods of the ONKI API and examine the resulting SOAP request and response messages. ONKI SKOS is implemented as a Java Servlet using the Jena Semantic Web Framework15, the Direct Web Remoting library16 and the Lucene17 text search engine.

4 Configuring ONKI SKOS with SKOS Structures

ONKI SKOS supports straightforward loading of SKOS vocabularies with minimal configuration needs. For using other data models than SKOS, vari- ous configuration properties are specified to enable ONKI SKOS to process the thesauri/ontologies as desired. The configurable properties include the proper- ties used in hierarchy generation, the properties used to label the concepts, the concept to be shown in the default view and the default concept type used in restricting the concept search. When the ONKI SKOS Browser is accessed with no URL parameters, in- formation related to the concept configured to be shown as default is shown. Usually this resource is the root resource of the vocabulary, if the vocabulary forms a full-blown tree hierarchy with one single root. In SKOS concept schemes the root resource is the resource representing the concept scheme itself, i.e. the instance of skos:ConceptScheme. The concept hierarchy of a concept is generated by traversing the config- ured properties. In SKOS these properties are skos:narrower and skos:broader and they are used to express the hierarchical relations between concepts. Hier- archical relations between the root resource representing the concept scheme and the top concepts of the concept scheme are defined with the property skos:hasTopConcept. Labels of concepts are needed in visualizing search results, concept hierar- chies, and related concepts in the concept property view. In SKOS the labels are

14 http://www.yso.fi/ 15 http://jena.sourceforge.net/ 16 http://directwebremoting.org/ 17 http://lucene.apache.org/java ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies 775 expressed with the property skos:prefLabel. The label is of the same language as the currently selected user interface language, if such a label exists. Otherwise any label is used. The semantic autocompletion search of ONKI SKOS works by searching for concepts whose labels match the search string. To support this, the labels of the concepts are indexed. The indexed properties can be configured. In SKOS these properties are skos:prefLabel, skos:altLabel and skos:hiddenLabel. When the user searches, e.g., with the search term “cat”, all concepts which have one of the aforementioned properties with values starting with the string “cat” are shown in the search results. The autocompletion search also supports wildcards, so a search with a string “*cat” returns the concepts which have the string “cat” as any part of their label. If a SKOS vocabulary contains concepts that are instances of subclasses of skos:Concept, the search can be limited to certain types of concepts only. To accomplish this, the types of the concepts (which are expressed with the property rdf:type) are indexed. It is also possible to limit the search to a certain subtree of the concept hierarchy by restricting the search to subconcepts of a specific concept. To support this, also the parents of concepts are indexed. Many thesauri include structures for representing categories of concepts. To support category-based concept search in the ONKI SKOS Browser, another search field is provided. When a category is selected from the category search view, the concept search is restricted to the concepts belonging to the selected category. SKOS includes a concept collection structure, skos:Collection,which can be used for expressing such categories. However, skos:Collection is often used for slightly different purposes, namely for node labels18. Thus, instances of skos:Collection are not used for category-based concept search by default.

5 Converting Thesauri to SKOS—Case YSA Publishing a thesaurus in the ONKI SKOS server is straightforward. To load a SKOS vocabulary into the server, only the location path of the RDF file of the vocabulary needs to be configured manually. After rebooting the ONKI SKOS, the RDF file is loaded, indexed and made accessible for users. ONKI SKOS provides the developers of thesauri a simple way to publish their thesauri. There exists quite amount of well-established keyword lists, thesauri and other non-RDF controlled vocabularies which have been used in traditional approaches in harmonizing content indexing. In order to reuse the effort already invested de- veloping these resources by publishing these vocabularies in ONKI SKOS server, conversion processes need to be developed. This idea has also been suggested by van Assem et al. [6] and Summers et al. [14]. We have implemented transforma- tion scripts for, e.g., MARCXML format19, XML dumps from SQL databases and proprietary XML schemas. 18 A construct for displaying grouping concepts in systematic displays of thesauri. They are not actual concepts, and thus they should not be used for indexing. An example node label is “milk by source animal”. 19 http://www.loc.gov/standards/marcxml/ 776 J. Tuominen et al.

An example of the SKOS transformation and publishing process is the case of YSA, the Finnish General Thesaurus20. YSA is developed by the National Library of Finland and is the most widely used controlled vocabulary in Finland. YSA is published in the VESA Web Thesaurus Service, where it receives over 12 million user hits per year. YSA is simultaneously published in the ONKI SKOS, currently in a pilot phase. During this year, the operation of the VESA service will be ended, and the ONKI SKOS becomes the official publishing channel of YSA. To begin the SKOS transformation process, YSA is exported into MARCXML format from the thesaurus management system of the National Library of Fin- land. The resulting constantly up-to-date file is published at the web server of the National Library of Finland, from where it is fetched via OAI-PMH proto- col21 to our server. This process is automated and the new version of the XML file is fetched daily. Instead of fetching the full file every day, the incremental update feature of the OAI-PMH could be used, but we have not tested it. After fetching a new version of the file, the transformation process depicted in Figure 3 is started by loading the MARCXML file (ysa.xml). The Java-based converter first creates the necessary structure and namespaces for the SKOS model utiliz- ing Jena Semantic Web Framework. Next, the relations in YSA are translated into their respective SKOS counterparts,whichisdepictedinFigure4.

Fig. 3. The SKOS transformation process of YSA

A URI for the new concept entry is created through the unique ID in the source file. The preferred and alternative labels are converted straightforwardly from one syntax to another. Similarly the type and scheme definitions are added to the SKOS model. Since the relations in the MARCXML refer not to the identifiers but rather to the labels, the source file is searched for an entry that has the given label and then its ID is recorded for the SKOS relation. MARCXML record for the concept “pistols” can be seen in the left part of Figure 4. The ID of the concept is denoted by the controlfield with tag attribute’s 20 http://www.nationallibrary.fi/libraries/thesauri/ysa.html 21 http://www.openarchives.org/OAI/openarchivesprotocol.html ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies 777

Fig. 4. An example of the SKOS transformation of YSA value “001” (“98138”). This ID value becomes the local name of the URI of the corresponding SKOS concept. The category in which the concept belongs to is depicted with the datafield with tag “072” (“ysa67”). The preferred label is the valueprovidedinthedatafield with tag value “150” (“pistols”). Hierarchical and associative relations are marked with datafield with tag value “550”, while the subfield’s attribute code specifies which one of the relations is in question. Once the SKOS transformation is ready, the converter fetches the labels for the concept categories from a separate file (ysa-groups.owl) - these labels are not included in the MARCXML file. Finally, a RDF file is written and imported into ONKI SKOS.

6 Use Cases of ONKI SKOS

ONKI SKOS has been tested in several content indexing and information re- trieval pilot applications.

6.1 Content Indexing The browser-based annotation editor SAHA22 [15] has been used for creating semantic content for semantic portals. The two major applications demonstrat- ing Semantic Web technologies are HealthFinland23 [16] in the health promotion context, and CultureSampo24 [17] in the cultural heritage context. Both systems use the SAHA editor with the services provided by ONKI SKOS and the Na- tional Finnish Ontology Service ONKI. ONKI Selector has been integrated into SAHA for finding suitable concepts in content indexing. The web laboratory Owela25 of the VTT Technical Research Centre of Finland has implemented a service for collecting and sharing text and image clips from the Web26. In the service one can organize the clips into folders and tag them with

22 http://demo.seco.tkk.fi/saha/?lang=en 23 http://demo.seco.tkk.fi/tervesuomi/ 24 http://www.kulttuurisampo.fi/ 25 http://owela.vtt.fi/ 26 http://owela.vtt.fi/tilkut 778 J. Tuominen et al. different categories. The ONKI Selector is used for tagging the clips keywords from vocabularies published in the ONKI service.

6.2 Information Retrieval Kantapuu.fi27 is a web user interface for browsing and searching for collections of Finnish museums of forestry. The collection items are annotated with Finnish terms from General Finnish Thesaurus YSA, Thesaurus for Museum Domain MASA and Agriforest Thesaurus. Kantapuu.fi search page is a web form into which query strings are typed as free text. The query strings can be placed into specific fields, e.g. “keywords”, “place of use” or “time of use”. We have created a demonstration page containing a Kantapuu.fi’s search form with integrated ONKI Selectors which can be used for selecting query terms to be used in the Kantapuu.fi search28. The ONKI Selector is used for finding terms from vocab- ularies of the ONKI service. The used vocabularies are the same as those used in the annotation process of the items, or actually their ontologized versions. To find suitable query terms user can utilize the autocompletion search func- tionality or the ONKI Browser. Thus, the user does not need to be familiar with the vocabularies used in the annotations of the items, as in the case of free text search. The ONKI Selector performs query expansion based on the selected query terms. So, for example a query term “animals” would return items anno- tated with term “cats”. When the desired query terms are selected, the actual search to the Kantapuu.fi system can be executed. The users of the pilot system have given positive feedback especially on the multilingual search possibility. The ONKI Selector Widget has also been integrated into the Viikki Science Library29 reference database system eViikki30. eViikki is a search interface for the library’s collections, which consist of scientific literature on agriforestry. The literature is annotated with terms from General Finnish Thesaurus YSA and Agriforest vocabulary. The ONKI Selector is used for populating the “keywords” field of the search form of eViikki. The fetched concept labels are used in the information retrieval task. Query expansion is not performed currently.

7 Discussion

The main contribution of this paper is to show, both in principle and as a deployed implemented service in use, how thesauri can be published and utilized easily on the Semantic Web, emphasizing the benefits of the use of W3C’s SKOS data model as a uniform vocabulary representation framework. The ONKI SKOS server is presented as a proof of concept for a cost-efficient thesauri utilization method. By using ONKI SKOS, general thesauri accessing functionalities can be easily integrated into applications without the need for users to reimplement their own user interfaces for this. 27 http://www.kantapuu.fi/ 28 http://www.yso.fi/lusto 29 http://www.tiedekirjasto.helsinki.fi/english/ 30 http://www-db.helsinki.fi/eviikki/eviikkihaku.html ONKI SKOS Server for Publishing and Utilizing SKOS Vocabularies 779

The utilization of the SKOS structures in an ontology server was described in context of the ONKI SKOS server. The case of the Finnish General The- saurus acts as one example on how an existing, widely used thesaurus can be converted into the SKOS format and be published on the ONKI SKOS server. At the moment some 60 international and national vocabularies, taxonomies, and lightweight ontologies have been published online using ONKI SKOS and the number is increasing. Finally, in order to demonstrate end-user benefits of ONKI SKOS services, we provided some real world use cases of ONKI SKOS in content indexing and information retrieval. Future work for ONKI SKOS includes creating a more extensive Web Service interface for supporting, e.g., querying for properties of a given concept and for discovering concepts which are related to a given concept. The starting point for this API will be the SKOS API. Also, version control techniques and other support for vocabulary management (e.g. protocol for communicating vocabu- lary changes to users) would be needed to support the development phase of vocabularies.

Acknowledgements

We thank Ville Komulainen for his work on the original ONKI server. This work is a part of the National Semantic Web Ontology project in Finland31 (FinnONTO) and its follow-up project Semantic Web 2.032 (FinnONTO 2.0, 2008-2010), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 private, public and non-governmental organiza- tions.

References

1. Aitchison, J., Gilchrist, A., Bawden, D.: Thesaurus Construction and Use: A Prac- tical Manual, 4th edn. Europa Publications (2000) 2. Viljanen, K., Tuominen, J., Hyv¨onen, E.: Publishing and using ontologies as mash- up services. In: Proceedings of the 4th Workshop on Scripting for the Semantic Web (SFSW 2008), Tenerife, Spain, June 1-5 (2008) 3. Hyv¨onen, E., Viljanen, K., Tuominen, J., Sepp¨al¨a, K.: Building a national semantic web ontology and ontology service infrastructure—the FinnONTO approach. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 95–109. Springer, Heidelberg (2008) 4. Staab, S., Studer, R. (eds.): Handbook on ontologies. Springer, Heidelberg (2004) 5. Miles, A., Matthews, B., Wilson, M., Brickley, D.: SKOS Core: Simple knowl- edge organisation for the web. In: Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2005), Madrid, Spain, September 12-15 (2005)

31 http://www.seco.tkk.fi/projects/finnonto/ 32 http://www.seco.tkk.fi/projects/sw20/ 780 J. Tuominen et al.

6. van Assem, M., Malais´e, V., Miles, A., Schreiber, G.: A method to convert thesauri to SKOS. In: Sure, Y., Domingue, J. (eds.) ESWC 2006. LNCS, vol. 4011, pp. 95– 109. Springer, Heidelberg (2006) 7. Ding, Y., Fensel, D.: Ontology library systems: The key to successful ontology reuse. In: Proceedings of SWWS 2001, The first Semantic Web Working Symposium, Stanford University, USA, pp. 93–112 (August 2001) 8. Ahmad, M.N., Colomb, R.M.: Managing ontologies: a comparative study of ontol- ogy servers. In: Proceedings of the eighteenth Conference on Australasian Database (ADC 2007), Ballarat, Victoria, Australia, January 30–February 2, pp. 13–22. Aus- tralian Computer Society, Inc (2007) 9. Tudhope, D., Binding, C.: Towards terminology service: experiences with a pilot web service thesaurus browser. In: Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2005), Madrid, Spain, September 12-15, 2005, pp. 269–273 (2005) 10. van Assem, M., Menken, M.R., Schreiber, G., Wielemaker, J., Wielinga, B.: A method for converting thesauri to RDF/OWL. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 17–31. Springer, Heidelberg (2004) 11. Hildebrand, M., van Ossenbruggen, J., Amin, A., Aroyo, L., Wielemaker, J., Hard- man, L.: The design space of a configurable autocompletion component. Technical Report INS-E0708, Centrum voor Wiskunde en Informatica (CWI), Amsterdam (2007) 12. Vizine-Goetz, D., Childress, E., Houghton, A.: Web services for genre vocabularies. In: Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2005), Madrid, Spain, September 12-15 (2005) 13. Viljanen, K., Tuominen, J., Hyv¨onen, E.: Ontology libraries for production use: The Finnish ontology library service ONKI. In: Aroyo, L., et al. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 781–795. Springer, Heidelberg (2009) 14. Summers, E., Isaac, A., Redding, C., Krec, D.: LCSH, SKOS and Linked Data. In: Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2008), Berlin, Germany, September 22-26 (2008) 15. Valkeap¨a¨a,O.,Alm,O.,Hyv¨onen, E.: Efficient content creation on the semantic web using metadata schemas with domain ontology services (system description). In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 819–828. Springer, Heidelberg (2007) 16. Hyv¨onen, E., Viljanen, K., Suominen, O.: HealthFinland—Finnish health infor- mation on the semantic web. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudr´e-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 778–791. Springer, Heidelberg (2007) 17. Hyv¨onen, E., Ruotsalo, T., H¨aggstr¨om, T., Salminen, M., Junnila, M., Virkkil¨a, M., Haaramo, M., M¨akel¨a, E., Kauppinen, T., Viljanen, K.: Culturesampo—Finnish culture on the semantic web: The vision and first results. In: Robering, K. (ed.) Information Technology for the Virtual Museum. LIT Verlag, Berlin (2007)

Publication III

Jouni Tuominen, Tomi Kauppinen, Kim Viljanen, and Eero Hyvönen. Ontology-Based Query Expansion Widget for Information Retrieval. In Pro- ceedings of the 5th International Workshop on Scripting and Development for the Semantic Web at ESWC 2009, Heraklion, Greece, May 31, Sören Auer, Chris Bizer, and Gunnar Aastrand Grimnes (editors), CEUR Workshop Pro- ceedings, volume 449, pages 52–57, ISSN 1613-0073, online CEUR-WS.org/Vol-449/ShortPaper1.pdf, May 2009.

c 2009 Tuominen et al.

Reprinted with permission.

115

Ontology-Based Query Expansion Widget for Information Retrieval

Jouni Tuominen, Tomi Kauppinen, Kim Viljanen, and Eero Hyv¨onen

Semantic Computing Research Group (SeCo) Helsinki University of Technology (TKK) and University of Helsinki http://www.seco.tkk.fi/ firstname.lastname@tkk.fi

Abstract. In this paper we present an ontology-based query expansion widget which utilizes the ontologies published in the ONKI Ontology Service. The widget can be integrated into a web page, e.g. a search system of a museum catalogue, enhancing the page by providing a query expansion functionality. We have tested the system with general, domain- specific and spatio-temporal ontologies.

1 Introduction

In information retrieval systems the relevancy of search results depends on the user’s ability to represent her information needs in a query [1]. If the vocabularies used by the user and the system are not the same ones, or if the shared vocab- ulary is used in different levels of specificity, the search results are usually poor. Query expansion has been proposed to solve these issues and to improve infor- mation retrieval by expanding the query with terms related to the original query terms. Query expansion can be based on corpus, e.g. analyzing co-occurences of terms, or on knowledge models, such as thesauri [2] or ontologies [1]. Methods based on knowledge models are especially useful in cases of short, incomplete query expressions with few terms found in the search index [1, 2]. We have implemented a web widget providing query expansion functionality to web-based systems as an easily integrable service with no need to change the underlying system. The widget uses ontologies to expand the query terms with semantically related concepts. The widget extends the previously devel- oped ONKI Selector widget, which is used for selecting concepts especially for annotation purposes [3]. The user does not have to be familiar with the ontologies used in content annotations by utilizing the autocompletion search feature of the widget, as the system suggests matching concepts as the user is writing the query string. Also, to help the user to disambiguate concepts the ONKI Ontology Browsers [4] can be used to get a better understanding of the semantics of the concepts, e.g. by providing a concept hierarchy visualization. The query expansion widget supports Semantic web and legacy systems1, i.e. either the concept URIs or the concept labels can be used in queries. In 1 By legacy systems we mean systems that do not use URIs as identifiers. legacy systems cross-language search can be performed, if the used ontology contains concept labels in several languages. In addition to the widget, the query expansion service can also be utilized via JavaScript and Web Service APIs. The query expansion widget and the APIs are available for public use as part of the ONKI Ontology Service2 [4]. The JavaScript code needed for integrating the widget into a search system can be generated by using the ONKI Widget Generator3. The contribution of this paper is to present an approach to perform query expansion in systems cost-effectively, not to evaluate how the chosen query ex- pansion methods improve information retrieval in the systems.

2 Ontologies used for Query Expansion

The ONKI query expansion widget can be used with any ontology published in the ONKI Ontology Service. The service contains some 60 ontologies at the time of writing. Users are encouraged to submit their own ontologies to be published in the service by using the Your ONKI Service4. In the following, we describe how we have used different types of ontologies for query expansion.

2.1 Query Expansion with General and Domain-specific Ontologies

For expanding general and domain-specific concepts in queries we have used The Finnish Collaborative Holistic Ontology KOKO5 which consists of The Finnish General Upper Ontology YSO [5] and several domain-specific ontologies expand- ing it. To improve poor search results caused by using vocabularies in different levels of specificity in queries and in the search index we have used the transitive is-a relation (rdfs:subClassOf 6) for expanding the query concepts with their sub- classes. So for example, when selecting a query concept publications, the query is expanded with concepts magazines, books, reports and so on. Using other relations in addition or instead of the is-a relation in query expan- sion might be beneficial. When considering general associative relations, caution should be exercised as their use in query expansion can lead to uncontrolled expansion of result sets, and thus to potential loss in precision [6, 7]. In case of a legacy system (not handling URIs, using labels instead) the use of alternative labels of concepts (synonyms) may improve the search. The relations used in the query expansion of an ontology can be configured when publishing the ontology in the ONKI Ontology Service.

2 http://www.yso.fi/ 3 http://www.yso.fi/onkiselector/ 4 http://www.yso.fi/upload/ 5 http://www.seco.tkk.fi/ontologies/koko/ 6 Defined in the RDFS Recommendation, http://www.w3.org/TR/rdf-schema/ 2.2 Query Expansion with the Spatio-temporal Ontology SAPO

A spatial query can explicitly contain spatial terms (e.g. Helsinki) and spatial relations (e.g. near), but implicitly it can include even more spatial terms that could be used in query expansion [8]. For example, in a query “museums near Helsinki” not only Helsinki is a relevant spatial term, but also its neighboring municipalities. Spatial terms – i.e. geographical places – do not exist just in space but also in time [9, 10]. This is especially true for museum collections where objects have references to places from different times. This sets a requirement to utilize also relations between historical places and more contemporary places in query expansion. To provide these mappings we used a spatio-temporal ontology SAPO (The Finnish Spatio-temporal Ontology) [11]. In SAPO regional overlap mappings are expressed as depicted in Figure 1, where example Turtle RDF7 statements8 express that the region of the latest temporal part of place sapo:Joensuu — i.e. the one valid from the beginning of year 2009 — overlaps the region of the temporal part of sapo:Eno of years 1871– 2008. The temporal part of the place simply means the place during a certain time-period such that different temporal parts might have different extensions (i.e. borders) [11].

sapo:Joensuu(2009-) sapo:Joensuu sapo:begin sapo:unionof "2009-01-01" ; sapo:Joensuu(1848-1953) , sapo:overlaps sapo:Joensuu(1954-2004) , sapo:Eno(1871-2008) , sapo:Joensuu(2005-2008) , sapo:Pyhaselka(1925-2008) , sapo:Joensuu(2009-) ; sapo:Joensuu(2005-2008) . sapo:overlapsAtSomeTime sapo:Eno , sapo:Pyhaselka , sapo:Tuupovaara , sapo:Pielisensuu , sapo:Kiihtelysvaara .

Fig. 2. A place is a union of its tempo- Fig. 1. Overlap mappings between ral parts. Moreover, places may have temporal parts of places. overlapped other places at some time.

For example, the place sapo:Joensuu is a union of four temporal parts, defined in the example depicted in Figure 2. However, annotations of items likely utilize places rather than their temporal parts. For this reason the model uses property sapo:overlapsAtSomeTime to explicate that e.g. a place sapo:Joensuu has — at some point in the history — overlapped together five different places (sapo:Eno and four others). In other words, e.g. at least one temporal part of sapo:Joensuu has overlapped at least one temporal part of sapo:Eno. We have used this more generic property sapo:overlapsAtSomeTime between places for query expansion.

7 http://www.dajobe.org/2004/01/turtle/ 8 The example uses the following prefix - sapo: http://www.yso.fi/onto/sapo/ 3 A Use Case of the Query Expansion Widget

We have created a demonstration search interface9 consisting of the original Kantapuu.fi search form10 and integrated ONKI widgets for query expansion. Kantapuu.fi is a web user interface for browsing and searching for collections of Finnish museums of forestry, using simple matching algorithm of free text query terms with the item index terms. The ontologies used in the query expansion are the same ones as used in annotation of the items11, namely The Finnish General Upper Ontology YSO, Ontology for Museum Domain MAO12 and Agfiforest On- tology AFO13. For expanding geographical places the Finnish Spatio-temporal Ontology SAPO is used. When a desired query concept is selected from the results of the autocomple- tion search of the widget or by using the ONKI Ontology Browser, the concept is expanded. The resulting query expression is the disjunction of the original query concept and the concepts expanding it, formed using the Boolean operation OR. The query expression is placed into a hidden input field, which is sent to the original Kantapuu.fi search page when the HTML form is submitted. An example query is depicted in Figure 3, where the user is interested in old publications from place Joensuu. User has used the autocompletion feature of the widget to input to the keywords field a query term “publicat”, which has been autocompleted to the concept publications, which has been further expanded to its subclasses (their Finnish labels). Similarly, the place Joensuu has been added to the field place of usage and expanded with the places it overlaps. The result set of the search contains four items, from which two are magazines used in place Eno and the rest two are cabinets for books used in place Joensuu. Without using the query expansion the result set would have been empty, as the place Eno and the concept books were not in the original query.

4 Discussion

When implementing the demonstration search interface for the Kantapuu.fi sys- tem with ONKI widgets we faced some challenges. If a query concept has lots of subconcepts, the expanded query string may become inconveniently long, as the concept URIs/labels of the subconcepts are added to the query. This may cause problems because the used HTTP server, database system or other soft- ware components may set limits to the length of the query string. With lengthy queries the system may not function properly or the response times of the system may increase.

9 http://www.yso.fi/kantapuu-qe/ 10 http://www.kantapuu.fi/, follow the navigation link “Kuvahaku”. 11 To be precise, the ontologies are based on thesauri that have been used in annotation of the items. 12 http://www.seco.tkk.fi/ontologies/mao/ 13 http://www.seco.tkk.fi/ontologies/afo/ Fig. 3. Kantapuu.fi system with integrated ONKI widgets.

Future work includes user testing for finding out if users consider the query expansion of the concepts and places useful. Also, systematic evaluation of the search systems used would be essential to find out if the query expansion im- proves the information retrieval, and specifically which semantic relations im- prove the results the most. The user interface of the query expansion widget needs further developing, e.g., the user should be able to select/unselect the suggested query expansion concepts.

Acknowledgements

We thank Ville Komulainen for his work on the original ONKI server and Leena Paaskoski and Leila Issakainen for cooperation on integrating the ONKI query expansion widgets into the Kantapuu.fi system. This work has been partially funded by Lusto The Finnish Forest Museum14 and partially by the IST funded EU project SMARTMUSEUM15 (FP7-216923). The work is a part of the Na- tional Semantic Web Ontology project in Finland16 (FinnONTO) and its follow- up project Semantic Web 2.017 (FinnONTO 2.0, 2008-2010), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 private, public and non-governmental organizations.

References

1. Voorhees, E.M.: Query expansion using lexical-semantic relations. In: Proceed- ings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, Dublin, Ireland (July 3-6 1994) 61–69 2. Wang, Y.C., Vandendorpe, J., Evens, M.: Relational thesauri in information re- trieval. Journal of the American Society for Information Science 36(1) (1985) 15–27 3. Viljanen, K., Tuominen, J., Hyv¨onen,E.: Publishing and using ontologies as mash- up services. In: Proceedings of the 4th Workshop on Scripting for the Semantic Web (SFSW 2008), 5th European Semantic Web Conference 2008 (ESWC 2008), Tenerife, Spain (June 1-5 2008) 4. Viljanen, K., Tuominen, J., Hyv¨onen,E.: Ontology libraries for production use: The Finnish ontology library service ONKI. In: Proceedings of the 6th European Semantic Web Conference (ESWC 2009). (May 31 - June 4 2009) 5. Hyv¨onen,E., Viljanen, K., Tuominen, J., Sepp¨al¨a,K.: Building a national semantic web ontology and ontology service infrastructure—the FinnONTO approach. In: Proceedings of the 5th European Semantic Web Conference (ESWC 2008). (June 1-5 2008) 6. Tudhope, D., Alani, H., Jones, C.: Augmenting thesaurus relationships: Possibili- ties for retrieval. Journal of Digital Information 1(8) (2001) 7. Hollink, L., Schreiber, G., Wielinga, B.: Patterns of semantic relations to improve image content search. Journal of Web Semantics 5(3) (2007) 195–203 8. Fu, G., Jones, C.B., Abdelmoty, A.I.: Ontology-based spatial query expansion in information retrieval. In: In Lecture Notes in Computer Science, Volume 3761, On the Move to Meaningful Internet Systems: ODBASE 2005. (2005) 1466–1482 9. Kauppinen, T., Hyv¨onen,E.: Modeling and reasoning about changes in ontology time series. In Kishore, R., Ramesh, R., Sharman, R., eds.: Ontologies: A Handbook of Principles, Concepts and Applications in Information Systems. Integrated Series in Information Systems, New York, NY, Springer-Verlag, New York (NY) (January 15 2007) 319–338 10. Jones, C., Abdelmoty, A., Fu, G.: Maintaining ontologies for geographical informa- tion retrieval on the web. Volume 2888., Sicily, Italy, Springer Verlag (November 2003) 934–951 11. Kauppinen, T., V¨a¨at¨ainen,J., Hyv¨onen,E.: Creating and using geospatial ontology time series in a semantic cultural heritage portal. In: S. Bechhofer et al.(Eds.): Proceedings of the 5th European Semantic Web Conference 2008 ESWC 2008, LNCS 5021, Tenerife, Spain. (June 1-5 2008) 110–123 14 http://www.lusto.fi 15 http://smartmuseum.eu/ 16 http://www.seco.tkk.fi/projects/finnonto/ 17 http://www.seco.tkk.fi/projects/sw20/ Publication IV

Matias Frosterus, Jouni Tuominen, Sini Pessala and Eero Hyvönen. Linked Open Ontology cloud: managing a system of interlinked cross-domain light- weight ontologies. International Journal of Metadata, Semantics and On- tologies, 10, 3, pages 189–201, DOI 10.1504/IJMSO.2015.073879, December 2015.

c 2015 Inderscience Enterprises Ltd.

Reprinted with permission.

123

Noname manuscript No. (will be inserted by the editor)

Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies

Matias Frosterus Jouni Tuominen Sini Pessala Eero Hyvonen¨ · · ·

the date of receipt and acceptance should be inserted later

Abstract Traditionally the structure of the controlled vo- between the values used in the metadata. Semantic interop- cabularies used for annotation can be utilised for reasoning erability between heterogeneous cross-domain data reposi- for information retrieval. However, this can be problematic tories of distributed content providers has become the crit- when applied in the Linked Data context. Linked data typi- ical challenge for the Semantic Web and Linked Data [13]. cally comes from different organisations and domains with Enabling different thesauri and vocabularies to link to one mutually incompatible vocabularies without explicit links another in meaningful ways would allow different organisa- between them resulting in data silos. This paper argues that tions to benefit from each others’ work by enriching a com- to solve the problem one has to transform the annotation vo- mon pool of linked knowledge [15]. cabularies into a Linked Open Ontology cloud. We present a method for transforming a set of legacy thesauri into a cloud of interlinked ontologies while ensuring the validity 1.1 Why a Linked Open Ontology Cloud? of the transitive subclass relations and the means for main- taining the system when component ontologies are updated. The Linked Data movement1 has focused its efforts on build- Our approach has been used and evaluated in practice build- ing cross-domain interoperability by creating and using (typ- ing a cloud called KOKO of sixteen ontologies, with a total ically) owl:sameAs mappings between the entities (e.g. of 47,000 concepts. KOKO has been published as an ontol- places, persons) in the datasets of the Linked Open Data ogy service and is in use in various organisations for both (LOD) cloud. However, when linking metadata, not only data indexing and semantic search. the data entities but also ontologies used in describing them need to be interlinked for interoperability. This calls for more Keywords Light-weight Ontologies Linked Open · refined ontology alignment techniques [8] maintaining the Ontology Cloud Ontology change propagation SKOS · · · integrity of the concept hierarchies. Ontologisation Cross-domain interoperability · This paper focuses on aligning light-weight domain on- tologies intended for metadata annotations. A light-weight ontology in our terminology is a hierarchy of concepts with 1 Introduction subsumption, partitive, and associative relations like in a tra- ditional thesaurus [2], and can be represented using RDFS2, Libraries, archives, museums, and other organisations have simple OWL3 constructs, or SKOS4. been using classifications and thesauri for content indexing The research hypothesis of this paper is that the LOD (annotation) for a long time. There exist vast amounts of cloud could be complemented by developing one or more high-quality annotations describing vast amounts of docu- light-weight “Linked Open Ontology” (LOO) clouds. The ments and other objects but these metadata descriptions are idea of LOO is to provide a shared cross-domain ontology often divided into silos without machine-traversable links for data annotations based on a set of interlinked domain

Semantic Computing Research Group (SeCo), Aalto University 1 http://linkeddata.org/ P.O. Box 15500, 00076 Aalto, Finland 2 http://www.w3.org/TR/rdf-schema/ http://www.seco.tkk.fi/, firstname.lastname@aalto.fi 3 http://www.w3.org/standards/techs/owl#stds 4 http://www.w3.org/TR/skos-reference/ 2 Matias Frosterus et al. ontologies. This idea is also complementary with the idea of Our approach therefore emphasises the systematic de- ”Linked Open Vocabularies”5 that focus on mapping meta- velopment process of a coherent aggregated ontology from data schemas (e.g. Dublin Core, FOAF, and Bibo) onto each a set of component ontologies that are maintained by differ- other. In our implementation of the idea, we created the LOO ent domain communities. This proactive idea of developing cloud from light-weight ontologies based on existing the- a larger ontology based on different domain ontologies [16] sauri. If the resulting LOO cloud is then mapped to, e.g., is different from traditional ontology mapping, where one DBpedia, legacy data annotated using the original thesauri takes a set of existing, independently developed ontologies can be linked to the LOD cloud quite easily. and tries to map them together afterwards. In contrast to Developing a LOO cloud is in many ways different from simply mapping individual ontologies together we take the linking entities in datasets or elements in metadata schemas. mappings as an integral part of the ontologies. The ontolo- A major difference is that in LOO the linked structure is used gies are considered both as individual entities but also as for reasoning based on the hierarchical subclass relation, the integral parts of the cloud forming a greater whole. This as- backbone of ontologies [31]. This fundamental task requires pect is taken into account at all levels of the development special consideration at the ontology boundaries as other- and publication of the ontologies, thus leading to a different wise cross-domain reasoning and ontology-based query and process and underlying philosophy compared with the usual document expansion [3,17] in applications may fail [16]. approach of linking independently developed ontologies. For example, assume that the concept “Mirror” is present in a given ontology A of daily utensils and has the subclass “Make-up-mirror”: 1.2 A National Effort in Building a LOO Cloud a:Make-up-mirror rdfs:subClassOf a:Mirror. In order to test and evaluate these hypotheses in practice, a In a related ontology B of furniture, the class Mirror is LOO cloud called KOKO of cross-domain ontologies (e.g. used as a subclass of the class Furniture: health, cultural heritage, agriculture, seafaring, government, b:Mirror rdfs:subClassOf b:Furniture. defence) has been realised in Finland on a national level dur- Without context, the concept of mirror looks like the ing the FinnONTO research project (2003–2012). Various same in both ontologies, i.e. libraries, archives, museums, and governmental actors have a:Mirror owl:sameAs b:Mirror. annotated vast amounts of documents, each with their own Reasoning and query expansion works fine in A and B thesauri. The aim of the KOKO cloud is to maintain back- separately, but when using A and B linked together, expand- wards compatibility with the existing annotations while al- ing a search query for “furniture” would return falsely hand- lowing for better interoperability between different datasets. held make-up mirrors in addition to pieces of furniture. A Thus, the system is based on transforming a set of legacy larger context than the concept alone has to be considered thesauri in use into light-weight ontologies and interlinking when linking ontologies. them with each other. Currently, KOKO encompasses six- There are also other difficulties specific to developing a teen ontologised thesauri with more to be integrated into the LOO cloud. For example, the principle of dividing a shared system in the future. The current version of KOKO is a har- concept X into subclasses in different ontologies may be monised global ontology of some 47,000 concepts aligned different. For example, “clothes” can be divided into sub- into a single hierarchy. classes based on the gender or the age of their wearers. Then, In the centre of the KOKO cloud is the General Finnish from a human perspective, X may have a confusing mix- Ontology YSO, based on the General Finnish Thesaurus YSA7, ture of subclasses in the linked ontology, hampering its use which has been developed since 1987 by the National Li- in user interfaces (e.g. as a search facet). Addressing issues brary of Finland and is used across various organisations in like these is hard to automate, and therefore LOO develop- Finland. YSO provides the upper hierarchy, originally in- ment in practice requires more coordinated collaboration be- spired by the ideas of “foundational ontologies”, such as tween the developers of linked ontologies than when linking DOLCE [9], and shared “upper ontologies”, such as IEEE datasets of instances. By collaboration, better quality links SUMO [27]. The idea is to provide the LOO cloud with the can be created and various critical issues of linked data qual- common upper concepts needed in many domains. YSO is ity6 can be addressed. Coordinated collaboration also facili- then complemented by more refined ontologies for specific tates larger scale ontology development, which prevents the domains. These component ontologies are developed by the creation of interoperability problems, and minimises redun- experts responsible for the original thesauri preserving the dant ontology development work in overlapping areas of on- domain know-how while allowing for asynchronous updat- tologies. ing based on the resources of each organisation. From an end user’s point of view, KOKO ontology is seen and used 5 http://lov.okfn.org/dataset/lov/ 6 Cf. e.g. http://pedantic-web.org/fops.html 7 http://vesa.lib.helsinki.fi/ysa/ Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies 3 as a single ontology without boundaries; for the ontology the process has been applied to practice. Since the tradi- developers, domain boundaries are needed in order to di- tional, comparative evaluation of the process is difficult, the vide the responsibilities of distributed ontology work based extensive application to practice has been used as a proof on domain expertise needed in different parts of the KOKO of concept with the process being adjusted based on real- cloud. life experiences in accordance with the principles of action KOKO ontology was originally published in the ontol- research [5]. The main user groups for this have been the ogy library system ONKI8 [36,37] operating as a living lab ontology developers on the one hand, and the systems that research environment. Over the years ONKI has been in- KOKO has been integrated into on the other hand. In the fi- tegrated into, e.g. several museums, libraries, and web por- nal Section 6, related work is presented and the contributions tals. In 2013 the National Library of Finland launched a joint of the paper are summarised. project with the Ministry of Finance and the Ministry of Ed- ucation and Culture to build a permanent, national ontology service Finto9 based on the ONKI system [35]. The National 2 A Model for Managing a Linked Ontology Cloud Library also took on the responsibility of coordinating fur- ther development of the KOKO cloud. Finto ontology ser- Our ontology development work started by a field study on vice provides a centralised publication channel for the on- how thesauri in use are actually developed. The result was tologies with common interfaces for accessing them in vari- that thesauri are typically developed by independent expert ous applications. groups focusing on concepts in their own domains of inter- est, with little collaboration between the groups. The situa- tion seems to be more or less similar in other countries, too. 1.3 Challenges in Developing a Linked Ontology Cloud Obviously, this model leads to redundant work in develop- ing overlapping areas of thesauri and, at the same time, to in- Several issues need to be taken into account when mov- teroperability problems between the thesauri, since different ing from developing individual thesauri into developing and parties define their concepts without considering each oth- maintaining a cloud of interlinked ontologies. In this paper, ers’ choices. To address these problems we propose a more the following challenges are discussed. coordinated collaborative model for developing a linked on- tology cloud, which is depicted in Fig. 1. Note that in the – Creation of ontologies. How is an existing legacy the- following discussion the bolded numbers in parentheses re- saurus transformed into an ontology? How is a new on- fer to the numbered parts in the figure. tology mapped into the linked ontology system? How are the URIs of the concepts formed? – Development of ontologies. How are the overlapping 2.1 Ontology Creation Phase parts of ontologies recognised in order to minimise the duplicate work of the ontology developers? How are the First, existing thesauri (1) and ontologies (2) are selected changes in one ontology communicated to other related for building blocks of the ontology cloud. A thesaurus is ontologies? How are the errors and other quality issues converted into RDF format using a shared ontology schema recognised in a system of several ontologies? (3) and aligned with a general upper ontology (GUO) (4). – Publication of ontologies. How should the linked ontol- Aligning domain ontologies with a GUO forms the basis for ogy system be presented to the end user in order to make interoperability by providing a complete concept hierarchy it comprehensible? What kind of services for using the and is much easier to maintain than direct, pair-wise map- ontologies are needed for different user groups? pings between domain ontologies [12]. This idea was sug- gested, e.g. by the IEEE Standard Upper Ontology (SUO)10 1.4 Structure of the Paper working group. The alignment can be done in a semi-automatic fash- This paper is divided into four main parts. Section 2 presents ion by first generating equivalency mappings automatically a model for creating and updating a linked ontology cloud and then correcting them manually. In addition to equiva- and lists a set of seven principles guiding the process. The lency mappings, subsumption and partitive relations might following Sections 3–5 describe the development cycle of be used in the case of light-weight ontologies. For exam- the cloud in more detail with an emphasis on how a sin- ple, the concept “antique furniture” in a museum domain gle domain ontology is handled by the process. Throughout, ontology may be aligned with the concept “furniture” in the the KOKO cloud provides an illustrative use case on how upper ontology using a subclass-of relation. In order to cre- ate a complete, fully connected linked ontology, all concepts 8 http://onki.fi/ 9 http://finto.fi/en/ 10 http://suo.ieee.org/ 4 Matias Frosterus et al. of a domain ontology should be mapped to the GUO either tologies, but only those related to them via equivalency, sub- directly or through other concepts in the domain ontology. sumption or partitive relations. The transformation is discussed in more detail in [16]. In our case study, a natural basis for a GUO was the General Finnish Thesaurus YSA that was transformed into the Gen- 2.3 Cloud Publication Phase eral Finnish Ontology YSO. already as a reference in many specialised thesauri. When a development cycle of an ontology cloud has been completed, its logical consistency and other quality aspects should be validated (8) making sure that the resulting ontol- ogy adheres to the constraints of the properties and classes used. Some of the problems encountered can be fixed au- tomatically [34] (e.g. overlaps in disjoint semantic relations and cycles in the concept hierarchy) but they should nonethe- less be communicated to the developers. However, automatic validation has difficulty in finding problems on the semantic level, which usually requires manual checking. Finally, the ontology cloud can be published as services for humans and machines, e.g. via user interfaces, APIs, and downloadable files (9).

2.4 Principles for Building a Linked Ontology Cloud

In our case study, the KOKO cloud based on the sixteen ontologised thesauri presented in Table 1 was created. Be- low, the lessons learned during the work are summarised as a seven-point list of practical building principles. We con- sider the proposed principles novel, as we are not aware of Fig. 1 The model of Linked Ontology Cloud formation and manage- previous ontology design patterns focused on managing an ment ontology cloud as a whole.

I The ontology cloud consists of one general upper on- tology and several domain ontologies that are linked to the upper ontology with subsumption, equivalency, as- 2.2 Ontology Integration Phase sociative and partitive relations. Reason: This means that the domain ontologies do not The ontology integration phase begins after the domain on- have to be linked to each other pairwise, as the upper tologies have been aligned with the GUO (5). There may ontology acts as semantic glue for joining all the on- be mutually overlapping parts between the domain ontolo- tologies together. Shared concepts are included in the gies because the alignments were made between the domain upper ontology. The idea is to minimise the links be- ontologies and the GUO only. To facilitate the integration of tween the domain ontologies, which simplifies their de- domain ontologies (DO) (6), processes, and tools for discov- velopment since a given developer needs only concern ering overlapping parts of the ontologies are needed. Based herself with her own domain ontology and the GUO. on the analysis, it is possible to eliminate redundant devel- II Every concept in a domain ontology has a subsumption opment work by deciding between domain ontology devel- or equivalency relation to a concept in the GUO or a opers which ontologies maintain which overlapping parts. subsumption relation to a concept in the same domain Changes in the GUO create pressure for the domain on- ontology. tologies to be updated accordingly to ensure the consistency Reason: This means that every concept in a domain on- of the cloud (7). For example, in our case study system, tology needs to be able to trace a subsumption relation some 2,000 changes are made annually in the upper ontol- to a concept in the general upper ontology. This en- ogy YSO. The changes should be taken into account in the sures a consistent concept hierarchy for the whole cloud development process of the domain ontologies. Fortunately, and that domain ontologies can not define new top-level not all changes in the GUO are relevant to all domain on- concepts. Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies 5

Name of the Domain Number of the other domain ontology directly. The use of broader ontology concepts and associative relations extends the target domain on- YSO General upper ontology 27,200 tology but does not affect its semantics (strongly), whereas (GUO) the equivalency relation would possibly introduce new AFO Agriculture and forestry 7,000 broader concepts to concepts in the target ontology and JUHO Government 6,300 thus redefine their semantics. VI The GUO and the domain ontologies use a shared on- KAUNO Literature 5,000 tology schema and are based on domain-independent KITO Literary research 850 standards. KTO Linguistics 900 Reason: Thus the ontologies can be easily merged and KULO Cultural research 1,500 used together with various data sources outside of those LIITO Economics 3,000 directly linked to the cloud, using standard software, e.g. SKOS or RDFS/OWL tools. MAO Museum artefacts 6,800 VII The resulting ontology cloud should be logically con- MERO Seafaring 1,300 sistent, e.g. by ensuring the integrity of the concept sub- MUSO Music 1,000 sumption hierarchy over ontology boundaries (since tran- PUHO Military 2,000 sitivity is assumed). TAO Design 3,000 Reason: This allows for reasoning and query expansion over the whole cloud. TERO Health 6,500 TSR Working and employment 5,100 VALO Photography 2,000 3 Creating a Thesaurus-based Ontology for the Linked Table 1 The ontologies comprising the LOO cloud KOKO Ontology Cloud

The creation of a cloud of linked ontologies begins with the formation of the general upper ontology. This ontology III If a concept in a domain ontology has an equivalency to provides a completely connected hierarchy of general con- a concept in the GUO, it may not have broader concepts cepts including the topmost division, e.g. between abstract, in the domain ontology, which lack an equivalency re- endurant and perdurant concepts [9]. This forms the basic lation to a concept in the GUO. structure for the domain ontologies to map into, thus saving This is needed to avoid having dependencies Reason: them from having to repeat the higher parts of the hierarchy. from the GUO towards a domain ontology by forbid- ding domain ontologies from introducing broader con- cepts to concepts in the GUO (through inference over equivalency relation). Otherwise domain ontology de- velopers could propagate contradictions or unwanted concept (re)definitions into the GUO, especially in cases where more than one domain ontology is involved. IV The domain ontologies are focused on a clearly bounded domain and are as self-contained as possible. Reason: This allows the domain ontology developer to concentrate on the area of her expertise. This also min- imises dependencies between domain ontologies and facilitates ontology development work. V A concept in a domain ontology may not have an equiv- alency to a concept in another domain ontology. A con- cept in a domain ontology may have an associative re- lation or have a broader concept in another domain on- tology at the discretion of the developers. Fig. 2 Ontology Creation Phase Reason: Dependencies between domain ontologies can- not always be avoided due to inherent relations across the borders of different domains. This means that the Fig. 2 depicts the process of creating a new domain on- developers need to monitor the changes to the other do- tology for the cloud of linked ontologies, which begins with main ontology but this is allowed since it does not affect the conversion (3) of the domain thesaurus (2) from a legacy 6 Matias Frosterus et al. format into RDF, typically SKOS. This is often a straight- mentMakerLight13, which could be utilised for the initial forward operation but in some cases the original thesaurus mapping. can include relations that are not easy to convert to SKOS. Finally, the URI scheme of the ontology needs to be con- In these cases it can be a good idea to retain the original re- sidered. A good starting point is the URI design principles lations in the form of a temporary predicate which can then presented by W3C in Cool URIs for the Semantic Web14. be harmonised in the publication phase of the process. Human-readable meaning-bearing URIs pose difficulties in In the FinnONTO case, we used ad-hoc scripts for the that they are language-dependent and, most importantly, in transformation since the domain ontologies were in differ- the case where further development ends up changing the ent formats. We also transformed them originally into a cus- label of the concept, the persistent URI can not keep up tom OWL-based format since at the beginning of the project with the changes, thus gradually leading to the degenera- SKOS had not yet been established as a standard way of rep- tion of the mapping between the labels and URIs. Since we resenting light-weight RDF ontologies. We continued this are building on thesauri that have been in active use and de- practice in part because of existing tools, such as Proteg´ e´11, velopment for long, we must also consider the long term ef- but also because some thesauri used relations not present in fects of our current decisions. Therefore, we decided to use SKOS. An example of this was the Agriforest12 thesaurus of language- and meaning-neutral URIs where the local name agriculture and forestry, which uses different relations to dif- consisting of a letter and a string of numbers (e.g. p3612). ferentiate between names of concepts that were derived from Since ontologies evolve with new concepts and URIs different sources. These the were preserved for the benefit of introduced and old ones possibly deleted, a system (7) is the ontology developers but were then combined into a com- needed for tracking the use of the URIs. In FinnONTO, we 15 mon predicate for publication. created our own tool for this called Purify . Purify har- monises all local names in a given namespace to the letter The next step of the process is the automatic mapping followed by a number format. The tool keeps a log of the (4) to the GUO (1). This can be done roughly through string mappings between the possible temporary URIs used during comparison between labels or, alternatively, by also utilising the early stages of the development since human-readable the structure of the ontologies, depending on the nature of URIs were found to be useful at the beginning of the ontol- the original thesaurus. Since different thesauri may have dif- ogisation. Finally, Purify makes sure that no two resources ferent labelling conventions (for example plural vs. singular get the same URI and that if the URI generation needs to be forms) some sort of stemming or lemmatisation may also be repeated, the result will be the same every time. needed for the label comparison. The result is a preliminary With the first version of a domain ontology completed mapping which then needs to be checked manually in the (8) and successfully mapped to the GUO, the next step is to next part (5 and 6) by a domain expert ontologist since the integrate it into the ontology cloud proper. Table 1 lists all aim is to produce as good and reliable a result as possible. the ontologies completed for our LOO cloud KOKO at the Aside from checking the mapping, the human development moment. The name of the ontology is followed by a descrip- part also entails the ontologisation work proper, which needs tion of its domain and the number of concepts. to be done when changing vaguely defined terms into more precise ontology concepts [16,21]. In many cases, a term in the original thesaurus becomes several concepts in the on- tology depending on whether the term has multiple mean- 4 Maintaining a Cloud of Interlinked Ontologies ings. When a single term corresponds to several concepts, There are two main concerns when integrating and manag- we have added a qualifier in parentheses to the preferred la- ing domain ontologies in a cloud: how to manage the domain bel for each concept in order to make it easier to differentiate ontologies with respect to one another and how to manage between them. For example, the ambiguous keyword ”child” their relation to GUO. This phase is depicted in Fig. 3, con- could be split into concepts ”child (age)” and ”child (family tinuing from Fig. 2, and is explained in depth below. relation)”. In the FinnONTO project, we did the preliminary map- ping between a domain ontology and the GUO by label com- 4.1 Avoiding Overlap Between Domain Ontologies parison using lemmatisation and custom scripts, while most of the actual ontologisation work was done by the experts The Principles IV and V presented in Section 2 posit that in from the organisations that maintained the thesauri. Now order to keep relations between domain ontologies to a min- there exist specialised ontology matching tools, such as Agree- imum, the domains covered need to be precisely set. With

13 https://github.com/AgreementMakerLight 11 http://protege.stanford.edu/ 14 http://www.w3.org/TR/cooluris/ 12 http://www-db.helsinki.fi/agri/agrisanasto/Welcome eng.html 15 http://puri.onki.fi/info/ Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies 7

c) The concepts that overlap are really the same concept and the ontology developers wish for it to remain present in both (or all) of the ontologies. A note should be made that if the concept is changed or developed further, the other developers should be informed as needed. d) The concepts that overlap are actually different concepts but might share a preferred label. In this case, the labels should be differentiated from one another if possible so as to avoid confusion when the ontologies are used to- gether. Great care should be exercised when choosing option c) since it clashes with Principle V and can easily lead to in- creasing complexity in development. This should be avoided as much as possible, since one of the main goals of the pre- sented system is to make the asynchronous development of ontologies possible and having the same concept in two do- Fig. 3 Ontology Integration Phase main ontologies means that both sets of developers need to agree on possible further development. minimal links between domain ontologies, they can be de- In our case study, we implemented a tool called KOAN, veloped independently from one another, but it also means based on simple label matching, for finding ontology over- that the curation of the ontological domains is an on-going laps since the light-weight ontologies do not offer much struc- process. Overlaps can come into being especially in the bor- ture to use as a basis for the discovery task. Furthermore, der areas of two domains where a given set of concepts can identical labels pose the most difficulty for the annotators not be unequivocally said to belong to either one of two dif- using the ontology since they would need to decide between ferent ontology domains. different concepts with the same names based on their place In Fig. 3, we can see that the process of discovering these in the hierarchy. Having concepts with different labels that overlaps between a given domain ontology (1) and the rest end up being the same is much less of a problem since, when of the domain ontologies in the cloud (3) makes use of a the mistake is discovered, it is relatively easy to combine the tool (4) to help the domain ontology developers in finding annotations made using the duplicated concepts. and even analysing the overlaps. This tool can be based on When applied to KOKO, KOAN found lots of overlap- string matching similar labels and reporting on the poten- ping concepts even between ontologies of seemingly very tially overlapping concepts, but it can also make use of the distinct domains. Table 2 shows some of the comparison re- ontological structure and relations to find overlapping con- sults between ontologies by showing the per cent amount of cepts [8]. overlap. For instance, from the first row, second cell, we can Once an overlap between two or more domain ontolo- see that the agriculture and forestry domain ontology AFO gies has been discovered, a dialogue (5) should be started contains 8 % of the concepts of the government domain on- between the developers of those ontologies. Its possible end tology JUHO. Comparing with Table 1 we can see that even results are as follows: ontologies from seemingly very distinct domains can have a lot of overlap between them. Maintaining the overlaps in a) The concepts that overlap are really the same concept and multiple places at the same time creates a lot of unnecessary the developers of the domain ontologies and the GUO work and can lead to inconsistencies in the transitive rela- can agree that the concept is general enough to be in- tions. Our aim is to implement a systematic process for the cluded in the GUO. In this case, the concept can be re- elimination of these overlaps. moved from the domain ontologies. In addition to overlapping concepts, a domain ontology b) The concepts that overlap are really the same concept but may have a concept with an associative relation or a broader the developers can agree that the concept is not needed concept in another domain ontology because cross-domain in both (or all) of the domain ontologies and agree on a relations cannot always be avoided. For example, in our use single domain ontology that should host the concept in case, the museum domain ontology MAO contains the con- question and handle changes and development further on cept ”catapults”, which is a subclass of the concept “weapons” that concept and its subconcepts. This is most common in GUO. On other hand, the military domain ontology PUHO in situations where the concept is clearly in the domain has the concept “single-shot weapons”, which could be used of one ontology and only included in the other due to as the superclass of “catapults” for more refined semantics. historical reasons, for example. However, using relations between domain ontologies may 8 Matias Frosterus et al.

Ontology AFO JUHO KAUNO MERO TERO main ontology in question. A set of criteria for estimating AFO 8% 2% 3% 25% relevance should take into account the differences in ontolo- gies and the relations between them as well as the likely JUHO 7% 16% 5% 40% changes that are going to occur in development. KAUNO 2% 12% 1% 28% In the FinnONTO project, we created a change propaga- MERO 0% 1% 0% 2% tion tool MUTU and a set of accompanying relevance crite- TERO 23% 41% 36% 13% ria as described in [29]. The basic idea is that a change in the GUO is likely to be relevant to domain ontology developers Table 2 Number of overlapping concepts in five domain ontologies of KOKO if it concerns a concept that has been directly linked to from the domain ontology (a connecting concept). If a concept in the GUO has been marked as equivalent or as a superclass to be a justification of moving such cross-domain concepts (“single-a concept in the domain ontology, any change to it is likely shot weapons” in the example) into GUO in order to min- to be of interest to the domain ontology developers. Further- imise dependencies between domain ontologies. more, if the concept hierarchy of an ancestor changes for a connecting concept in GUO, this is likely to be relevant to the domain ontology developer due to the transitive nature 4.2 Keeping the Domain Ontologies up to Date Regarding of the relation. If a concept is removed, rare though that is, GUO that is deemed as interesting due to the fact that this con- cept could have been in use by the domain ontology users The second part of the cloud management process, as de- but has not been duplicated to the ontology itself. Finally, if picted on the right hand side of Fig. 3, is the handling of the a concept has been added that has the same label as a con- changes in the upper ontology. As an implication of Prin- cept already existing in the domain ontology, this is always ciple II, the domain ontologies react to the changes in the relevant. upper ontology. In other words, when the GUO developers MUTU simply finds out the changes between the new release a new version (6), the domain ontologies need to be version of the GUO and the one that was used for the map- potentially updated. ping of the domain ontology, lists these changes according The structure of the cloud aims to allow for the asyn- to the type of the change and relevance, and adds helper chronous updating of the domain ontologies, which can re- classes to the development version of the domain ontology sult in long intervals between the updates of a given domain for grouping the concepts in the ontology editor suite that ontology. Additionally, the ontologies are often developed in need to be checked by the developer (8). MUTU also allows different organisations, and using separate ontology editors the domain ontology developer to configure it according to creates a challenge in how to communicate the changes be- her needs based on, e.g., a specific priority of languages or tween the developers. The two main approaches to convey- on blocking changes from certain properties deemed irrele- ing the changes are push and pull synchronisation [6]. The vant to the domain ontology development. push version propagates the changes in one ontology imme- diately to the other ontologies, whereas in the pull version the change listing is requested when needed by the ontology 5 Publishing a Linked Open Ontology Cloud developer. The push approach is good for situations where the ontologies are updated frequently, so that the ontology Once the individual component ontologies have been devel- developer can quickly ensure the consistency of the ontol- oped and the links between them have been curated, the on- ogy after the changes. However, this approach is challeng- tology cloud needs to be published. The aim is to provide ing if the ontology reacting to changes is update infrequently the users with a single, unified whole that can be used es- due to, e.g., lack of resources. Then it would be preferable sentially as a single ontology like any other. The process that the changes could be acquired when needed and not of publishing a Linked Open Ontology Cloud is depicted in propagated immediately. A long update interval also means Fig. 4. that the amount of changes can build up over time and when the update process of the domain ontology is started, the number of changes that need to be checked can be in the 5.1 Validation, Merging and Publication thousands. In order to ease the work of the domain ontology devel- The publication phase starts when a new version of a do- oper, a tool (7) needs to be used for propagating the changes. main ontology (1) is ready and the developer wants to pub- It would also be beneficial to order or categorise the changes lish it in the LOO cloud. The first step is to clean up the of the GUO somehow according to their relevance to the do- ontology for publication (2), by removing structures needed Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies 9 only in the development of the ontology. For example, tem- per language one of the labels is arbitrarily selected for use porary concepts that are used for grouping concepts that are as skos:prefLabel while the others are simply converted under development and editorial notes for internal use by into skos:altLabels. A check is performed in order to de- the ontology developers can be removed. Similarly, tempo- tect overlaps in disjoint label properties (skos:prefLabel, rary ontology-specific predicates should be converted to the skos:altLabel, and skos:hiddenLabel) and the super- common schema. fluous ones are removed. As a best practice, the cycles in the concept hierarchy are removed, and finally extra whites- paces are removed from concept labels. The violations are also reported to the ontology developer so he may fix the problems in a controlled way instead of relying on the auto- matic fixing procedure. We also used Skosify to convert the ontologies from the custom OWL format to SKOS for pub- lication. SKOS was chosen due to the amount of tools avail- able and for its suitability for our use case of light-weight ontologies. After the validation, the new version of the domain on- tology is merged (6) with the GUO (3) and the other domain ontologies (4). The idea is to build a single representation of the linked ontology cloud (7) by merging equivalent con- cepts and giving them a single URI. The differing ontology schemas are harmonised so that the end user does not get confused with the ontology-specific structures and naming conventions. When annotating using the linked ontology cloud, the Fig. 4 Cloud Publication Phase annotator should be making choices between concepts as opposed to ontologies. To this end, the cloud is merged (6) into a single whole (7) by taking all the concepts linked In the FinnONTO case, the domain ontologies were de- with skos:exactMatch relations between them (marked as veloped in a Proteg´ e´ project with the general upper ontology equivalent) and combining them. In case a clearer division included. To help the ontology developer to focus on the between the development version of individual ontologies concepts of the domain ontology, the topmost concepts of and the published version of the cohesive cloud needs to be the domain ontology (the ones that do not have subclass-of made, new URIs can also be assigned to the concepts of the relation to a concept in the domain ontology) are connected cloud. Here care needs to be taken in order to ensure that the with subclass-of relation to a temporary concept acting as a same concepts always map to the same URIs even in cases root concept for the ontology. This root concept is removed where that concept might at first originate from the upper in the clean up process as the goal is to present the resulting ontology and later have been moved to a domain ontology ontology cloud to the end users as a single complete hierar- or vice versa. chy with a single root concept. In FinnONTO, we gave all the concepts in the KOKO Before the new domain ontology version is merged into cloud new URIs in a new namespace. The combination of the cloud it is validated (5) for compliance with Principle VII, the ontologies listed in Table 1 resulted in a combined cloud e.g. by checking the logical consistency and spotting the vi- of some 47,000 concepts. In order to allow the end user to olations of best practices. The results of the validation are select only from a single domain, we assigned domain ontol- communicated to the ontology developer who can then fix ogy specific concept types as subclasses of skos:Concept. the problems. A validator may also fix some of the prob- Once the ontology cloud is merged, it is ready for pub- lems automatically, e.g. by removing cycles in the concept lication (8). In accordance with the “open” part of the name hierarchy. LOO, the cloud is published as Linked Data [13], so that in- For the validation of the ontologies we are using the dividual concept URIs can be referenced from various datasets Skosify tool [34], which converts RDFS/OWL/SKOS vo- and URIs are resolvable to information about the concepts cabularies into proper SKOS format. Overlaps in disjoint se- in human- and machine-readable forms. The LOO cloud is mantic relations (skos:related and skos:broaderTransitivepublished) for humans via user interfaces for searching, brows- are checked and inconsistencies are corrected automatically ing and visualising the ontologies, and as widgets for inte- by removing skos:related relations in problematic cases. grating ontologies into applications. For machine use, addi- In cases where a concept has more than one skos:prefLabel tional REST and Web Service APIs are provided to facilitate 10 Matias Frosterus et al. even deeper integration of the ontologies. Finally, the ontol- been integrated into the collection management system Kauko. ogy cloud is published as a SPARQL endpoint enabling ad The first large scale system using KOKO was the seman- hoc queries for more complex needs. tic cultural heritage portal CultureSampo19 [25] aggregating These services were provided by the ONKI Ontology contents from various data sources. KOKO is a basis of the Service [37], part of which was further developed by the deployed BookSampo20 [24] literature portal of the Finnish National Library of Finland in the form of Finto thesaurus public libraries with some 60,000 monthly end users on the and ontology service. These services act as repositories for Web. vocabularies, thesauri, and ontologies, providing support for At the moment, Finnish archives are integrating KOKO publishing, finding and accessing them. The main user groups ontologies via Finto into their common search registry ser- of ONKI and Finto are ontology developers, content index- vice for metadata on both digital as well as conventional ers and information searchers. Ontology developers need a documents with the AHAA project [19]. A recent company way to visualise the ontology they are developing, and es- pilot user of KOKO has been the Swedish language divi- pecially in the context of the LOO cloud, where the domain sion of the national public service broadcasting company of ontology developers map their ontologies only to the gen- Finland, Svenska YLE. They have used KOKO in the anno- eral upper ontology, there is a need for getting an overview tation of their web content and have complemented it with of the whole cloud. This way the developers can see how Freebase21 for instance references, such as people and or- their domain ontologies are situated in the LOO cloud and ganisations. The pilot has been a success and they are con- discover possible overlaps with other domain ontologies. sidering adopting the system in the Finnish part of YLE as Content indexers and information searchers need ways well as their archive. of finding ontologies and concepts suited for their needs. The multi-disciplinary nature of KOKO means that it The Linked Open Ontology Cloud approach eliminates the is especially suited to organisations that deal with cross- need for finding ontologies and making selections between domain contents such as media organisations, museums, li- them, as one ontology system covering all domains of life braries, and archives. Furthermore, using the concepts from aims to fulfill the needs of everyone. For finding suitable domain ontologies links the content to more in-depth data concepts, ONKI/Finto service provides an ontology browser from the specialist organisations that originally developed which visualises the ontologies as a tree hierarchy and shows the domain ontologies for their data annotation needs. other relations between concepts. The user may use auto- completion search for finding concepts based on their labels. In addition to dereferenceable URIs, machines are served 6 Conclusions and Discussion with REST API, and a SPARQL endpoint16. The general ontology service software powering Finto is developed cur- We described a process for creating and managing a Linked rently by the National Library of Finland as an open source Open Ontology Cloud, based on existing legacy thesauri. To project Skosmos17 and can be freely used to set up a similar conclude, we compare our approach to related work, sum- service. The ontology cloud is published under a permissive marise our contributions, and suggest future work. Creative Commons Attribution license18.

6.1 Related Work 5.2 Use Cases Our work on linking ontological concepts used in annota- During the FinnONTO research project, the adoption of KO- tions complements the idea of the Linked Open Data cloud, KO was hampered by the uncertainty regarding its future. where emphasis is on data entity linking using (typically) With the ontology service project of the National Library owl:sameAs properties [11]. We also complement the work of Finland, the development and publication of KOKO was on Linked Open Vocabularies (LOV)22, which focuses on given sustainable governmental resources thus securing its metadata schema linking based on property hierarchies rather future. However, due to the length of the process of securing than linking domain ontologies. Linking the data through funding, the wide-spread adoption of KOKO is only begin- ontologies allows additional interoperability due to the in- ning. ferenced knowledge gained through the shared ontology se- KOKO and ONKI/Finto have been in daily use in many mantics [14]. Our approach follows this principle and pro- museums, such as the Espoo City Museum, where it has vides a framework for interlinked ontology cloud develop-

16 http://api.finto.fi/sparql; The KOKO cloud is available in the 19 http://www.kulttuurisampo.fi/ named graph http://www.yso.fi/onto/koko/. 20 http://www.kirjasampo.fi/ 17 http://github.com/NatLibFi/Skosmos/ 21 https://www.freebase.com/ 18 http://creativecommons.org/licenses/by/3.0/ 22 http://lov.okfn.org/dataset/lov/ Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies 11 ment. Backwards compatibility with existing annotations is have been presented in the context of the OBO Foundry retained by basing the ontologies on legacy thesauri in use. initiative [30]. However, in OBO Foundry the focus is on Another example of highly linked ontologies can be found coordinating the development of different ontologies under in BioPortal23, an ontology repository that has features sup- shared principles, but not on merging them into a single on- porting collaborative (inter-)ontology development. In addi- tology cloud and the challenges therein. General, domain- tion to uploading new ontologies to BioPortal, users can also independent ontology design principles have been proposed create and upload mappings between the concepts of dif- by several researchers [10,38]. Our model uses their ideas as ferent ontologies [28]. The mappings can be used to bridge a foundation, e.g. by organising orthogonal concept domains overlapping ontologies or to extend general ontologies with into separate ontologies and supporting ontologies of differ- specialised ones. Users can comment and create discussion ent granularity levels in the form of a GUO and the domain threads on ontologies, their parts, and mappings, thus sup- ontologies extending it. The principles presented in this pa- porting a collaborative and open inter-ontology development per extend the general ontology design principles by cover- process. However, a set of mapped ontologies does not ap- ing modelling issues related to the management of changes pear as a single whole to the user though one can browse in a linked ontology cloud. the ontologies by following the mappings between concepts. Keeping domain ontologies up to date with the changes Moreover, the mappings are not utilised in the ontology de- of the GUO is closely related to the topic of ontology evo- velopment process to propagate the changes of an (upper) lution, which concentrates on addressing the changes in on- ontology to ontologies extending it, as they are in our LOO tologies over time. Different change types have been listed model. in [33], whereas [20] considers more abstract change pat- In order to move from a group of ontologies into a co- terns constructed from atomic changes. According to [4] herent ontology system, the ontologies need to be recon- users would have liked to see explicitly when a concept cre- ciled. Ontology reconciliation [12] is a broad term, cover- ated by them had been modified, indicating that not all the ing ontology merging, alignment, and integration. Most of changes occurring in an ontology are equally relevant to all the reconciliation methods are automatic or semi-automatic, developers providing the impetus for our work on building which can lead to lower quality, especially if the ontologies a set of priorities for different changes. The detection of were originally expert-made [7]. Our approach emphasises changes in an updated ontology can be done using logs [32] the importance of manual development work in the recon- or by comparing two versions of the ontology [22], which ciliation process. was the approach we chose. Additionally, the extra chal- In ontology modularisation [1], ontologies are divided lenges of distributed ontology development have been ad- into smaller interlinked parts to facilitate distributed devel- dressed in [20,23,18]. In our approach there is no assump- opment and re-use. Our approach merges ontologies based tion that all the ontology developers use the same ontology on existing thesauri to form a Linked Open Ontology Cloud, editor, as the support tools are implemented separately from while the development of the individual ontologies contin- the ontology development suite. ues in a modularised way. Furthermore, we present the re- sulting interlinked cloud as single whole so that the end 6.2 Contributions and Future Work users do not have to make selections between ontologies. There have been several previous efforts on building a We presented the idea of the Linked Open Ontologies (LOO) general upper ontology [26] that can be used as a founda- for fostering data integration. LOO complements dataset en- tional basis for domain ontologies. Some of the upper on- tity linking in LOD and metadata schema linking in LOV. tologies have been developed from scratch, while, e.g. the The realisation of the LOO cloud was achieved by facili- Suggested Upper Merged Ontology SUMO [27] was cre- tating an environment of interconnected but easily manage- ated by merging existing ontologies. We used the General able set of cross-domain ontologies allowing for distributed Finnish Ontology YSO, transformed from an existing gen- ontology development. This can be more efficient since the eral thesaurus YSA, as the upper ontology in our Linked mappings between ontologies can be re-used for different Open Ontology Cloud. In contrast to the previous work on datasets. upper ontologies, which focuses on the creation of an upper Furthermore, we presented a detailed description of man- ontology, our work emphasises the model for managing the aging the overlaps between domain ontologies and the in- domain ontologies as part of the ontology cloud and keep- consistencies resulting from the asynchronous updating of ing them up to date and synchronised with the changes of the GUO compared with the domain ontologies. Finally, we the upper ontology. implemented a LOO in practice with sixteen ontologies and Related to the principles of forming and managing the a full set of tools for the development cycle from a set of ontology cloud presented in this paper, similar guidelines thesauri to a fully realised Linked Open Ontology Cloud. 23 http://bioportal.bioontology.org/ Based on this work, we accompanied the LOO model with 12 Matias Frosterus et al. a set of seven principles that guide the process of building Knowledge Engineering and Knowledge Management (EKAW and managing an ontology cloud. The principles are general 2008), pages 52–56, 2008. enough to be applied to other ontology clouds in addition to 8. J. Euzenat and P. Shvaiko. Ontology Matching. Springer–Verlag, 2007. the one we have built. 9. A. Gangemi, N. Guarino, C. Masolo, A. Oltramari, and L. Schnei- A central challenge in the linking process is in main- der. Sweetening ontologies with dolce. In Proceedings of the 13th taining the integrity of the transitive relations across differ- International Conference on Knowledge Engineering and Knowl- edge Management. Ontologies and the Semantic Web, EKAW ’02, ent ontologies and we have achieved this through the use pages 166–181. Springer-Verlag, 2002. of a general upper ontology acting as a central hub for the 10. T. Gruber. Toward principles for the design of ontologies used for linking of various domain ontologies. We consider proactive knowledge sharing. International Journal of Human-Computer linking as an integral part of the development process itself Studies, 43(5-6):907–928, 1995. 11. H. Halpin, P. J. Hayes, J. P. McCusker, D. L. McGuinness, and as opposed to simply mapping independently developed on- H. S. Thompson. When owl:sameAs isn’t the same: An analysis of tologies. identity in linked data. In P. F. Patel-Schneider, Y. Pan, P. Hitzler, The model was piloted by implementing it into prac- P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm, editors, tice in extensive scale in various organisations encompass- The Semantic Web – ISWC 2010, volume 6496 of Lecture Notes in Computer Science, pages 305–320. Springer Berlin Heidelberg, ing many different domains. The model evolved based on 2010. lessons learned during the multi-year implementation and 12. A. Hameed, A. Preece, and D. Sleeman. Ontology reconciliation. development process. Feedback was gathered from ontology In S. Staab and R. Studer, editors, Handbook on ontologies, pages developers as well from the systems integrating the linked 231–250. Springer-Verlag, Germany, 2004. 13. T. Heath and C. Bizer. Linked Data: Evolving the Web into a ontology cloud. Based on the success of the prototype, the Global Data Space (1st edition). Morgan & Claypool, Palo Alto, model is now being applied on a national scale in archives, California, 2011. libraries, and museums, as well as in ministries and other 14. P. Hitzler and F. van Harmelen. A reasonable semantic web. Se- governmental agencies. mantic Web Journal, 1(1-2):39–44, 2010. 15. E. Hyvonen.¨ Publishing and using cultural heritage linked data For our future research, we are focusing on building bet- on the semantic web. Morgan & Claypool, Palo Alto, California, ter tools and refining the processes for tracking and com- October 2012. municating the changes in the GUO to the domain ontology 16. E. Hyvonen,¨ K. Viljanen, J. Tuominen, and K. Seppal¨ a.¨ Build- developers. To this end, we are also devising a more formal ing a National Semantic Web Ontology and Ontology Service Infrastructure—The FinnONTO Approach. In Proceedings of the administrative process for the development and overall co- European Semantic Web Conference (ESWC 2008), pages 95–109, ordination of the KOKO cloud. Since it is a joint operation 2008. by over a dozen different organizations the particulars of the 17.K.J arvelin,¨ J. Kekal¨ ainen,¨ and T. Niemi. ExpansionTool: administration need to be both flexible yet well-defined. Concept-based query expansion and construction. Information Re- trieval, 4(3/4):231–255, 2001. 18. E. Jimenez´ Ruiz, B. Cuenca Grau, I. Horrocks, and R. Berlanga. Supporting concurrent ontology development: Framework, algo- References rithms and tool. Data & Knowledge Engineering, 70(1):146–164, 2011. 1. S. B. Abbes,` A. Scheuermann, T. Meilender, and M. d’Aquin. 19. J. Kilkki, O. Hupaniittu, and P. Henttonen. Towards the new era Characterizing modular ontologies. In Proceedings of the 6th In- of archival description – the Finnish approach. In Proceedings of ternational Workshop on Modular Ontologies. CEUR Workshop the International Council on Archives Congress, August 2012. Proceedings, http://CEUR-WS.org, July 2012. 20. M. Klein. Change Management for Distributed Ontologies. PhD 2. J. Aitchison, A. Gilchrist, and D. Bawden. Thesaurus Construc- thesis, Vrije Universiteit Amsterdam, 2004. tion and Use: A Practical Manual. Aslib IMI, 2000. 21. D. Kless, S. Milton, and E. Kazmierczak. Relationships and relata 3. J. Bhogal, A. Macfarlane, and P. Smith. A review of ontology in ontologies and thesauri: Differences and similarities. Applied based query expansion. Information Processing & Management, Ontology, 7(4):401–428, 2012. 43(4):866–886, 2007. 22. K. Kozaki, E. Sunagawa, Y. Kitamura, and R. Mizoguchi. A 4. S. Braun, A. Schmidt, A. Walter, and V. Zacharias. The Ontology framework for cooperative ontology construction based on depen- Maturing Approach to Collaborative and Work Integrated Ontol- dency management of modules. In Proceedings of the Interna- ogy Development: Evaluation Results and Future Directions. In tional Workshop on Emergent Semantics and Ontology Evolution Proceedings of the International Workshop on Emergent Seman- (ISWC2007), pages 33–44, 2007. tics and Ontology Evolution at (ISWC 2007), pages 5–18, 2007. 23. A. Maedche, B. Motik, and L. Stojanovic. Managing multiple 5. F. Burstein and S. Gregor. The systems development or engineer- and distributed ontologies on the Semantic Web. The Interna- ing approach to research in information systems: An action re- tional Journal on Very Large Data Bases (The VLDB Journal), search perspective. In Proceedings of the 10th Australasian Con- 12(4):286–302, 2003. ference on Information Systems, pages 122–134. Victoria Univer- 24.E.M akel¨ a,¨ K. Hypen,´ and E. Hyvonen.¨ BookSampo—lessons sity of Wellington, New Zealand, 1999. learned in creating a semantic portal for fiction literature. In Pro- 6. P. Deolasee, A. Katkar, A. Panchbudhe, K. Ramamritham, and ceedings of ISWC-2011, Bonn, Germany. Springer–Verlag, 2011. P. Shenoy. Adaptive Push-Pull: Disseminating Dynamic Web 25.E.M akel¨ a,¨ T. Ruotsalo, and E. Hyvonen.¨ How to deal with mas- Data. In Proceedings of the 10th international conference on sively heterogeneous cultural heritage data—lessons learned in World Wide Web (WWW 2001), pages 265–274, 2001. CultureSampo. 3(1), 2012. 7. Z. El Jerroudi and J. Ziegler. iMERGE: Interactive Ontology 26. V. Mascardi, V. Cord`ı, and P. Rosso. A comparison of upper Merging. In Proceedings of the International Conference on ontologies. In Proceedings of the 8th Workshop Dagli Oggetti Linked Open Ontology Cloud – Managing a System of Interlinked Cross-domain Light-weight Ontologies 13

agli Agenti: ”Agenti e Industria: Applicazioni tecnologiche degli agenti software” (WOA 2007), pages 55–64, September 2007. 27. I. Niles and A. Pease. Towards a standard upper ontology. In Pro- ceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS 2001), pages 2–9. ACM, October 2001. 28. Natalya F. Noy, Nicholas Griffith, and Mark A. Musen. Collect- ing community-based mappings in an ontology repository. In Pro- ceedings of the 7th International Semantic Web Conference (ISWC 2008), pages 371–386. Springer–Verlag, October 2008. 29. S. Pessala, K. Seppal¨ a,¨ O. Suominen, M. Frosterus, J. Tuominen, and E. Hyvonen.¨ MUTU: An analysis tool for maintaining a sys- tem of hierarchically linked ontologies. In Proceedings of the Workshop on Ontologies come of Age Workshop (ISWC 2011), 2011. 30. B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck, A. Ireland, C. J. Mungall, The OBI Consortium, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S.-A. San- sone, R. H. Scheuermann, N. Shah, P. L. Whetzel, and S. Lewis. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11):1251– 1255, 2007. 31. S. Staab and R. Studer, editors. Handbook on Ontologies (2nd Edition). Springer–Verlag, 2009. 32. L. Stojanovic, A. Maedche, B. Motik, and N. Stojanovic. User- driven ontology evolution management. In Proceedings of the International Conference on Knowledge Engineering and Knowl- edge Management (EKAW 2002), pages 133–140, 2002. 33. E. Sunagawa, K. Kozaki, Y. Kitamura, and R. Mizoguchi. An en- vironment for distributed ontology development based on depen- dency management. In Proceedings of the 2nd International Se- mantic Web Conference (ISWC 2003), pages 453–468. Springer– Verlag, 2003. 34. O. Suominen and E. Hyvonen.¨ Improving the quality of SKOS vocabularies with Skosify. In Proceedings of the 18th Inter- national Conference on Knowledge Engineering and Knowledge Management (EKAW 2012), pages 383–397. Springer-Verlag, Oc- tober 2012. 35. O. Suominen, S. Pessala, J. Tuominen, M. Lappalainen, S. Nykyri, H. Ylikotila, M. Frosterus, and E. Hyvonen.¨ Deploying national ontology services: From ONKI to Finto. In Proceedings of the In- dustry Track at the International Semantic Web Conference 2014. CEUR Workshop Proceedings, http://CEUR-WS.org, Octo- ber 2014. 36. J. Tuominen, M. Frosterus, K. Viljanen, and E. Hyvonen.¨ ONKI SKOS server for publishing and utilizing SKOS vocabularies and ontologies as services. In Proceedings of the 6th European Se- mantic Web Conference (ESWC 2009), pages 768–780. Springer– Verlag, 2009. 37. K. Viljanen, J. Tuominen, and E. Hyvonen.¨ Ontology libraries for production use: The Finnish ontology library service ONKI. In Proceedings of the ESWC 2009, Heraklion, Greece, pages 781– 795. Springer–Verlag, 2009. 38. X. Wang, J. S. Almeida, and A. L. Oliveira. Ontology design prin- ciples and normalization techniques in the web. In Proceedings of the 5th International Workshop on Data Integration in the Life Sciences (DILS 2008), pages 28–43. Springer–Verlag, June 2008.

Publication V

Kim Viljanen, Jouni Tuominen, Eetu Mäkelä and Eero Hyvönen. Normalized Access to Ontology Repositories. In ICSC 2012: 2012 IEEE Sixth Interna- tional Conference on Semantic Computing, Palermo, Italy, 19-21 September 2012, pages 109–116, ISBN 978-1-4673-4433-3, IEEE, September 2012.

c 2012 IEEE.

Reprinted with permission.

139

Normalized Access to Ontology Repositories

Kim Viljanen, Jouni Tuominen, Eetu Makel¨ a¨ and Eero Hyvonen¨ Semantic Computing Research Group (SeCo) Aalto University, School of Science, and University of Helsinki http://www.seco.tkk.fi/, firstname.lastname@aalto.fi

Abstract—Ontology repositories, such as NCBO Bioportal, For example, if the user is creating metadata about an article ONKI and Cupboard, help finding and using ontologies on about fishes, she could use an ontology repository containing the Semantic Web. However, currently each ontology repository an ontology about fishes to find out the correct URI for the constitutes a separate island with its own user interface, APIs, users, ontology languages and set of ontologies. Because there concept ”fish”, and then use this identifier as the value in her is not a universal way to access all ontology repositories, doing metadata. global search, browsing, and inference over all available ontology In addition to formal ontologies, a vast amount of other repositories turns out to be technically difficult and is generally kinds of concept collections of various degrees of formality not done. Ontologies are not reused as much as they could and exist that could be useful as identifiers for the Semantic hence the full potential of ontologies is not achieved. To address the problem, we propose the Normalized Ontology Repository Web. We call them informal ontologies. One example of such (NOR) approach to make the ontology repositories universally an informal ontology is Wikipedia, where the URL of each accessible while maintaining their unique functionalities and Wikipedia page correspond to a concept. For example the BBC strengths. The SKOS language is used as the lowest common is using the Wikipedia identifiers for interlinking content [8] denominator for presenting the ontologies. In addition, a simple which is based on RDF representation of the Wikipedia, the API for searching and accessing the ontologies is defined. As a proof-of-concept evaluation, we present three case implemen- DBpedia [9]. Other examples of informal ontologies include tations to demonstrate the NOR approach: 1) the distributed registries maintained by libraries, such as books and people architecture of the ONKI repository, 2) the metasearch for ONKI (e.g. ULAN4), and identifiers maintained by other organiza- and NCBO Bioportal, and 3) publishing informal ontological tions, such as locations (e.g. GeoNames5). In addition, many concept collections as NOR end-points, demonstrated with the websites and their underlying content management systems semantic portal CultureSampo and the metadata editor SAHA. use site specific categories and other types of concept collec- I.INTRODUCTION tions that could be useful for others, too. For example, both ebay6 and Amazon7 have extensive product categorizations. Ontologies and ontology repositories have been considered A problem of current ontology repositories is that each to be a key resource for enabling the vision of the Semantic system constitutes a separate island with its own functionalities Web [1]–[3]. Ontology repositories are used for publishing, and its own set of ontologies [10]. Each system has its sharing and reusing ontologies and vocabularies for content own user-interface, own API, and support different ontology indexing, information retrieval, content integration, and other languages. This limits the user from using efficiently different purposes. Current implementations of ontology repositories 1 ontology repositories together because, for example, searching include for example the NCBO BioPortal [4], the Finnish for a specific concept simultaneously from many repositories Ontology Library Service ONKI2 [5], Cupboard [6], the forth- 3 is not possible but requires visiting each repository separately. coming Open Ontology Repository (OOR) [2], and there are As a solution to the problem, we propose a universal access many other systems, too [3]. method to ontology repositories based on a normalized pre- An ontology is a shared specification of a conceptualization, sentation of the ontology content and a shared API. We call defining concepts of a specific area of interest to allow this the Normalized Ontology Repository (NOR) approach. In sharing of knowledge [7]. Ontologies typically contain textual addition to ontology repositories, we argue that it is relevant to information about the concepts, relations between the concepts consider non-ontological concept collections also as valuable and, perhaps most importantly, define the unique identifiers sources for concept identifiers. Therefore, we suggest that (the URI in the context of the Semantic Web) of the concepts. the NOR approach could be used for publishing such non- With the help of the identifiers, the concepts can be referred to, ontological sources, too. for example, as values in metadata. Therefore, one typical use In the following, we first discuss why there is a need for a case for ontology repositories is to support the user in finding multitude of ontology repositories with different functionalities relevant ontological concepts from the underlying ontologies. instead of just creating one application or web service to With the help of concept search and browsing functionalities, address all the needs. Then we present the NOR approach the user can find the best matching concepts for her needs. 4http://www.getty.edu/research/tools/vocabularies/ulan/ 1http://bioportal.bioontology.org/ 5http://www.geonames.org 2http://onki.fi 6http://www.ebay.com 3http://ontolog.cim3.net/cgi-bin/wiki.pl?OpenOntologyRepository 7http://www.amazon.com for creating universal access to different ontology repositories. practical. In addition, due to business reasons, some ontologies After this, three implementations of the NOR approach are are not published as files but only as a service. In these described. Finally, related work is presented, the results of cases, the ontology is only available via the specific API or a this paper are summarized and discussed. user-interface, but can not be uploaded to a shared ontology repository. II.MOTIVATION Security or business reasons may require using internal To motivate our work, we discuss the following questions: ontology repositories. For example, security reasons of an Is there a need for different ontology repositories? Would organization may require that selected ontologies are avail- simultaneous access to ontology repositories be useful? Do able only for internal use or that the server logs of who existing Semantic Web technologies and practices address the checked what ontological concepts remain confidential. Such needs of ontology repositories? requirements can be addressed with private, internal ontology repositories that are fully controlled by the organization itself. A. One Size Does Not Fit All All concept collections are not ontology repositories. As Instead of using a multitude of different implementations we pointed out in the introduction, a vast amount of informal for ontology repositories, one could argue that a single imple- ontological concept collections exists that are useful as identi- mentation could address all the different needs of the users, fiers for the Semantic Web, such as the Wikipedia, the various and the needs created by the different types of ontologies and registries maintained by libraries and categorizations located in application domains. In addition, there could be a single global various websites, such as ebay and Amazon. A single ontology ontology repository that would contain all ontologies. While repository system most probably will not replace all of the both scenarios could theoretically be possible, we argue that different systems that are used for maintaining such informal there are a multitude of challenges in such approach due to ontologies and vocabularies of various degrees of formality. the following reasons: Different ontologies and user needs require different func- B. Simultaneous Access to Repositories tionalities. For example, the ONKI ontology repository sup- The typical way to use ontology repositories and ontologies ports different types of ontologies and implements different in applications is that the application developers choose a visualizations, such as a geographical map interface for ge- specific ontology repository and specific ontologies in advance ographical ontologies and a tree visualization for concept which are then used in the application. The end-user does not hierarchies, as depicted in Fig. 1. This is done to address have to bother what ontology repository or ontology to choose the different needs of different ontologies such as general (ab- when using the application. We argue that the practice of using stract) ontologies versus geographical ontologies. For example, only one repository leads up to the following problems. the BioPortal has been designed originally to address the needs The optimal concept or ontology might not be found. of the biomedical domain. Also different ontology languages Because searching simultaneously ontology repositories is not require different technical implementations to maximize the possible, the end-user or the application developer might not benefits of the given formalism. notice that there exists a suitable ontology or ontological concept for one’s needs. Due to this, the quality of ontology- enabled applications and metadata may decrease when the perfect concepts are not found and used. Also redundant new ontologies and concepts may created if ontology developers do not find existing ontologies, which makes it more difficult to maintain the interoperability of data, because data origi- nating from different sources is described using a different (redundant) ontology, which means that the ontologies must first be interlinked before they can be used for interlinking the underlying data. High quality ontologies might be underused. The more a specific ontology is used, the more established it is to act as the de facto standard for representing concepts and metadata Fig. 1. Some of the user interfaces of the ONKI repository, including of its specific domain. This also increases the ontology de- ontology listing, map visualization for geographical ontologies, and concept velopers and publishers benefits for creating and maintaining hierarchy visualization. the ontology. If a specific ontology repository is not found by the potential users, then also the high quality ontologies Some ontologies are not available as files but only as contained in the repository will be underused, and the benefits services. Typically, ontologies are published as files that can from creating the ontology decreases. In addition, as pointed be uploaded to ontology repositories. For example, if the out before, informal ontological concept collections might ontology changes constantly or if the size of the ontology not be considered as valuable sources for concept identifiers is substantial, publishing the content as a file may not be such as the Wikipedia or subject heading thesauri maintained by libraries because they are not published in an existing is not based on Semantic Web technologies. Making SPARQL ontology repository or made available using standard Semantic queries require also advance knowledge on what ontology Web formats. Publishing them as ontology repositories would language has been used to be able to make a matching query increase the benefits of existing work. and to interpret the result. Ontologies are not interlinked between repositories. The To conclude our analysis, in the foreseeable future there will ontology repositories are currently not acting as model citizens be many different ontology repositories. Accessing simultane- of the Semantic Web, since their ontological content is not ously these repositories would be useful and is not solved in interlinked between repositories. That is, the ontologies and an optimal way with current technologies. the ontology repositories do not implement and follow the Linked Data [11] practices at the moment. For example, III.THE NORMALIZED ONTOLOGY REPOSITORIES (automatic) linking to relevant concepts in other ontologies As a solution to the problems presented above, we propose could help the users to find the best ontologies and concepts the Normalized Ontology Repository (NOR) approach. NOR for each need. consist of 1) a normalized presentation for ontology concepts, Publishing same ontologies in many repositories creates making thus the different ontology language schemas inter- challenges. Some ontologies may be published in many On- operable, and 2) a simple API for accessing the ontology tology Repositories because they have been considered to be repository. useful by the repository publisher, or because they have been uploaded by the ontology developer to as many repositories A. Normalized Representation of Ontological Concepts as possible to reach as many users as possible. Maintaining Ontologies are presented using different ontology lan- the same ontology in many repositories may lead to redundant guages, such as OWL, RDFS and SKOS, and there exists many work because new versions of the ontology has to be updated informal ontologies, too. From the interoperability point of in all repositories. Also, some repositories may contain older view, this creates a problem, because each ontology language versions of the ontology which creates compatibly problems must handled as a separate case. In the worst case, an when using metadata based on the ontologies. application developer have to handle ontologies presented in Internal ontology repositories require maintenance. Inter- many different languages to build an application that utilizes nal, potentially confidential, ontology repositories may con- ontologies. Due to this, for example, the ONKI repository has tain public ontologies. Maintaining the public ontologies to a rule-based configuration language to adjust the system to the latest versions require additional work. To avoid this, support ontologies represented in various kinds of RDF based simultaneously access to both internal and public ontology languages. repositories would be beneficial. To avoid complicated mappings and inference of hierarchi- cal and other relations, we propose that each ontology reposi- C. Shortcomings of Existing Technologies tory should provide a normalized, dumbed down presentation General Semantic Web search engines, such as of the ontology concepts in addition to the native format 8 9 Swoogle [12] and Sindice [13] are not focused on of the ontology. As the normalization language we suggest ontologies but provide general search of all kind of RDF using the RDF based Simple Knowledge Organization System 10 data. Ontology search engines, such as Falcons [14] and (SKOS)12, which is a RDF based language for presenting Ontosearch2 [15] address ontology specific needs, but do thesauri, classification schemes, subject heading systems and not address the problem of accessing informal ontology taxonomies within the framework of the Semantic Web. SKOS repositories. In addition, all of the previously mentioned is by design intended to serve as a common denominator be- search engines are based on crawling the ontology sources, tween different modeling approaches and therefore we decided which means, that they are may not always be up-to-date. In to use it compared to other alternatives, such as OWL or addition, ontologies that are only available as services, via an RDFS. API or user-interface, may not be indexed. Hiding ontological details makes it easier for the applica- Ontologies are represented using various languages, such as tions using the NOR compatible ontology repositories. After the Semantic Web languages RDFS, OWL and SKOS, Com- finding an interesting concept, the user can be directed to mon Logic, Excel, HTML, database tables, and application the specific Ontology Repository with its full functionality specific languages. A shared practice is missing on how to for using the specific ontology. Our intention is to make it publish ontologies on the Semantic Web [2]. easier to access the basic information of ontological concepts 11 SPARQL is the standard way to provide an application in an unified way, not to restrict the user from using the interface to Semantic Web databases, and it can be used original, full-blown ontology languages and functionalities of also to access ontology repositories. However, implementing the underlying ontology repositories for specific needs. a SPARQL end-point can be difficult if the underlying system In practice, a NOR compatible ontology repository must provide a concept lookup method: 8http://swoogle.umbc.edu 9 concept?uri=[concept identifier] http://sindice.com • 10http://ws.nju.edu.cn/falcons 11http://www.w3.org/TR/rdf-sparql-query/ 12http://www.w3.org/2004/02/skos/ which returns the normalized SKOS version of the given concepts presented using a JSON based response format. Other concept, identified by the concept URI. For example, to get result languages and formats may be considered in the future, the normalized concept representation of yso:p907 from the but we deemed this representation to be simpler than, for ONKI ontology repository, the lookup request URL is: example, representing the same information as an ordered list http://onki.fi/nor/concept?uri=http%3A%2F%2F in RDF. www.yso.fi%2Fonto%2Fyso%2Fp907 For example, a search for “fish” to the ONKI ontology repository is done with the following URL: which returns the following SKOS representation13 of the given concept followed by the (optional) native representation: http://onki.fi/nor/search?q=fish&l=en # Namespace declarations omitted The system responds with the following result: # Normalized SKOS representation begins {"concept-label" : "fish", a skos:Concept; "concept-label-language" : "en", skos:prefLabel "fish"@en, "kala"@fi; "concept" : skos:broader "http://www.yso.fi/onto/yso/p907", ; "http://onki.fi/nor/concept?uri=http%3A #additional properties omitted %2F%2Fwww.yso.fi%2Fonto%2Fyso%2Fp907", #link to the native concept format "native-concept" : nor:describes yso:p907 "http://www.yso.fi/onto/yso/p907", . "ontology-abbreviation" : "yso", # Native representation begins (optional) "ontology-label" : "Finnish General Upper yso:p907 Ontology", a ysometa:Concept; "ontology-label-language" : "en", ysometa:prefLabel "fish"@en, "kala"@fi; "ontology-uri" : #...additional properties omitted "http://www.yso.fi/onto/yso" . } ... The SKOS presentation above describes key information ], "metadata" : {"containingHitsAmount" : 50, about the given concept (yso:p907) such as the labels (in "moreHitsAmount" : 1467} English “fish”, in Finnish “kala”), and the URL to the nor- } malized broader concept yso:p6580 (foods). In addition, the In the result, concept is the URI of the concept, normalized- native representation yso:p907 is also presented as part of the concept is the URL of the normalized representation of the normalized concept lookup response. given concept, and native-concept is the URL to the native To avoid cluttering the native presentation by adding addi- representation of the concept. tional RDF triplets to it, the native and normalized formats are kept apart from each other with the following RDF property14: C. Ontology Repository Metadata nor:describes • To find NOR compatible ontology repositories, a list of The property is used for referring to the native concept repositories that conform to the NOR principles would be help- presentation from the normalized SKOS representation. To ful. However, to avoid the problems of centralized systems, we avoid making unintended conclusions, we did not use, for do not require ontology repositories to publish information example, the owl:sameAs property which would have meant about themselves to any specific registries. that the normalized and the native presentations would refer To help finding suitable repositories and ontologies for one’s to the same thing, which may not be true. need, we suggest that the NOR ontology repositories publish Finally, in some cases the ontology repository publisher may metadata about the available ontologies using the following have decided to use SKOS as the native representation for method: the concepts. If so, the nor:describes relation and the native ontologies representation can be omitted. • which returns metadata about the ontologies in the repos- B. Concept Search itory and the NOR end-point URL of each ontology. The metadata of the ontologies can be represented using, for exam- To make searches to a NOR compatible ontology repository ple, the Vocabulary of Interlinked Datasets (voiD)15 metadata we define the following method: language. Additional information about the ontology, such search?q=[query]&l=[language] • as the title and description, may be expressed using e.g. The search method is used for finding concepts matching the the Dublin Core metadata schema, the Ontology Metadata given query string and language. Currently, the query string Vocabulary (OMV)16, and the upcoming Catalogue Vocabulary can only contain a keyword, but in future the query language (dcat)17. may be extended. The method returns a list of matching 15http://rdfs.org/ns/void# 13presented using the RDF Turtle syntax 16http://omv2.sourceforge.net 14nor namespace: http://purl.org/finnonto/schema/nor 17http://www.w3.org/egov/wiki/Data Catalog Vocabulary To express the URL of the NOR end-point for a given The NOR approach generalizes and unifies experiences ontology, we define the following RDF property: gained from our work on the ONKI repository and the ONKI nor:endpoint API. Therefore, a majority of the following case studies are • For example, in the case of the ONKI ontology repository, based on ONKI, viewed from the NOR perspective. The the ontology metadata is available at: functionalities of the NOR API is a subset of the ONKI API’s http://onki.fi/nor/ontologies functionalities, however, a key difference is that the ONKI API represents the SKOS description of a concept in an ONKI which returns following metadata (excerpt): specific JSON format to avoid the overhead of parsing RDF # Namespace declarations omitted in the ONKI frontend, but in the NOR API we propose using RDF to represent this information. a void:Dataset ; dc:title "Finnish General A. NOR as an Internal Architecture Upper Ontology"@en ; dc:creator The ONKI SKOS ontology server [16] has been used for ; dc:license ; Service ONKI [5], which has been running as a pilot service foaf:homepage from September 2008. The system is in living lab use with ca. , 18 nor:endpoint . 10 000 unique human visitors monthly , and there are over ... 300 registered users of the APIs and widgets. Even though ONKI SKOS supports especially vocabularies presented in The metadata can be used for creating for example a cata- SKOS, the server can be used for publishing ontologies pre- logue of NOR compatible ontology repositories and concept sented in RDF(S) and restricted OWL. To access the multitude collections. In addition, this could be used for implementing of ONKI SKOS servers, the ONKI system implements a front- a metasearch service to search simultaneously multiple under- end service for making metasearches to the ONKI SKOS and lying NOR compatible ontology repositories. other ONKI back-ends using a shared HTTP API (see Fig. 2). D. General Remarks on the API The back-ends and their respective ontologies are described We argue, that a simple HTTP API is easy to implement with metadata to enable the front-end to locate the available both for the ontology repository developers and for the de- ontologies and to display information about the ontologies to velopers that want to access the NOR compatible ontology the users. repositories. In addition, a simple API is easy to implement Searching for concepts using an ONKI backend server even if the underlying ontology repository is not based on is done with its HTTP API method search, which returns RDF but is, for example, a relational database of people, which concepts matching to the query string in a specified language. could be highly relevant to publish as a NOR endpoint. Thus, The getFullPresentation method returns all information about compared to e.g. using the RDF query language SPARQL, the a given concept, such as the preferred and alternative labels, simple API approach makes it easier for both publishers and the transitive parent concept tree, and the related concepts. users to benefit of the NOR network. Independently of the language each ontology is presented in, This does not limit, however, the underlying ontology each concept is always returned in a uniform SKOS inspired repositories from implementing in addition, for example, a JSON format which describe the normalized basic information SPARQL end-point. A key idea behind NOR is that the of the given concept. native functionalities of the underlying ontology repositories Building on this underlying distributed architecture, three are available for users that need more functionalities than what clients have been designed and implemented. The ONKI3 19 the simple NOR API and normalized presentation can provide. Browser is a metasearch and browsing user interface for The proposed API is summarized in the Table I. accessing the ONKI SKOS and other back-end servers. For example, making a global query to all ontology servers can TABLE I be done. Also, a directory listing of the ontologies in the THE NORMALIZED ONTOLOGY REPOSITORY API. ONKI Ontology Repository is provided based on the metadata Method name Parameters Return value about the published ontologies. The ONKI3 user interface was 20 concept concept identifier normalized concept represen- mostly implemented using PHP . tation Another client is the JavaScript-based ONKI Selector wid- search query string, matching concepts get [17] for adding ontological concept search to HTML language forms. The third client is a simple URI resolver for deref- ontologies - ontology metadata erencing the end-user’s ontology concept URI requests to a suitable representation provided via the ontology repository IV. CASE STUDIESAND EVALUATION network, such as HTML or RDF.

To analyze the idea of NOR, we have implemented three 18Measured with Google Analytics. proof-of-concept prototypes which will be presented and dis- 19http://onki.fi/en/browser cussed in the following. 20http://www.php.net/ Fig. 2. The ONKI architecture is based on a distributed metasearch approach.

The loosely coupled ONKI architecture has turned out to be a flexible and modularized approach for implementing an ontology repository consisting of multiple back-end ontology servers. The normalized representation of the underlying on- tology repositories have made it easy to implement a user- interface for accessing all underlying repositories. Making multiple HTTP requests to back-end servers may be slow in the worst case, but in our test implementation this lag has not Fig. 3. The user-interface of the ONKI-BioPortal metasearch prototype. been a problem.

B. Searching Simultaneously BioPortal and ONKI C. NOR for Informal Ontological Concept Collections To test the NOR approach in a distributed setting of multiple Besides Ontology Repositories, applications often need to independent ontology repositories, we implemented a proof- refer also to informal ontological concept collections, such of-concept metasearch prototype to search simultaneously the as authority or place databases. However, the functionalities ONKI SKOS [16] servers described above and the NCBO required for such data sources are usually very similar to those BioPortal [4]. The NCBO BioPortal is an open repository of required for ontology repositories. For example, in an editor biomedical ontologies and it has been used for publishing over environment, similar semantic autocompletion search func- 200 ontologies [4]. BioPortal provides functionalities, such as tionalities are used for both ontological and non-ontological concept and ontology search and browsing, peer reviewing of concept collections, along with the same functionality for the ontologies, and support for creating and viewing mappings describing and visualizing the possible choices returned from between ontologies. such a search. Informal ontological concept collections also The ONKI-BioPortal metasearch prototype allows the user often change more rapidly than their ontological counterparts, to find the relevant concepts from the participating ontology so it makes even more sense to access the original system repositories, without having to know in advance which repos- through programmatic APIs than exporting and publishing the itory to make the search to. data in a ontology repository. In order to test how the NOR Since the ONKI front-end [5] was already designed using approach fared in the context of such informal ontological a metasearch approach, the ONKI-BioPortal prototype was concept collections, the ONKI API was implemented in two implemented by creating a wrapper for BioPortal which imple- applications: the semantic portal CultureSampo [18] and the ments the ONKI API’s search and concept lookup (getFull- SAHA metadata editor [19]. Both are Semantic Web applica- Presentation) methods. When calling the wrapper, it makes tions, but their focus is not on ontologies but to display and requests to BioPortal, parses BioPortal’s XML messages, edit all kinds of semantic data. and transforms them to the ONKI JSON format. Since the For CultureSampo, the ONKI API was actually imple- BioPortal API does not contain a concept lookup method that mented to benefit those using SAHA to edit data. This was would return all information about the specific concept with because the CultureSampo database contains, for example, a single request, multiple HTTP REST requests have to be a large number of places, people and organizations that are made to get all the needed information about a concept. useful to people indexing new content. For added freedom, Fig. 3 presents the ONKI-BioPortal search prototype user- the CultureSampo ONKI API was parametrized, so that the interface displaying the result for a metasearch query for types of objects that search operations return can be specified “fish product”. The result consists of 22 hits which are found dynamically. This way, one can say for example that they from the BioPortal and the ONKI SKOS back-ends. The want an autocompletion facility of all the organizations, all hits originating from BioPortal are labeled as ”BioPortal” the places, or all the historical events in CultureSampo. for demonstrating purposes, but in actual use, instead of While SAHA was already a client to the ONKI API of the ”BioPortal” the name of the ontology should be displayed. ONKI Ontology Repository and CultureSampo, the API was implemented also into SAHA itself. This was done to make the goal of publishing both ontology repositories and informal possible the creation of a network of dynamically updated, col- ontological concept collections as uniform services. laboratively curated concept collections. The multiple projects Ontology Repositories such as BioPortal and Cupboard using the SAHA editor to index content often need to add support publishing interlinked ontologies, but the ontologies new places, organizations or people to their list of reference have to be uploaded into a centralized service for a global values. However, until now, these have all resided in the search. On the other hand, the OOR [2] initiative intends to private data spaces of the different projects using SAHA. Now, design an Ontology Repository framework that addresses the the intention is to move these created concepts into SAHA needs of all users, and includes an inter-repository content projects of their own, so that one SAHA project will hold change protocol to keep the different OOR repositories up to collaboratively curated place database, while another contains date. In contrast to these, the NOR approach does not restrict a database of organizations and people. These can then be the ontology publishers in where to publish the ontologies or linked through the ONKI APIs to each other, as well as to the what software to use. Instead, the ontologies can be published primary indexing projects. In this way, the various projects using an ontology service that is optimized for the specific can start to directly benefit each other. ontology and the user’s needs. If the organization wants to promote and make their ontologies available to the NOR users, V. RELATED WORK they can implement the NOR API to make their repository compatible with other NOR repositories. If needed, the NOR The work is partially based on our previous work on the API of a repository can be restricted to selected users or made national ontology library ONKI [5], [20] and is related to publicly available for anybody. 21 the open ontology repository (OOR) initiative which aims at Compared to the Open Knowledge Base Connectivity developing an interoperability infrastructure for ontologies [2]. (OKBC) specification27 and the agent communications lan- Compared to more general methods of accessing RDF guages FIPA-ACL28 and KQML29, which all can also be used 22 data, such as SPARQL and Linked Data [11], the NOR to access ontological information, the NOR approach is more approach focuses on ontologies. For example, when searching focused on the specific use-cases of finding ontologies and for concepts with the NOR API, one does not need to know ontology concepts, and to get relevant information about them. what RDF properties are used in the data to express the labels. The OntoCAT is a programming interface to query multiple In addition, the ontology repositories can be optimized to ontology repositories seamlessly from an application [10]. A respond quickly to specific API queries. A normalized presen- wrapper is implemented for each supported ontology repos- tation of ontological concepts (SKOS) could, however, also be itory, such as the NCBO BioPortal. In comparison, to avoid beneficial for querying the data via SPARQL, and browsing wrappers, the NOR approach is based on defining a shared, the ontology repositories as linked data. For example, one does unified representation for the ontology repositories. A well- not have to know which specific hierarchical relation (e.g. known limitation of wrappers is that changes in the underlying rdfs:subClassOf or skos:broader) has been used, because the representation often breaks the wrapper. normalized hierarchical relation is constant. APIs for accessing ontologies and vocabularies published VI.DISCUSSION by other authors previously include the SKOS API23 and This paper argues that ontology repositories should be made the OWL API24. Compared to them, the NOR approach pro- accessible using a shared API that would provide a simple but vides a higher abstraction, independent from specific ontology universal methods for accessing the repositories in a uniform languages, and a lightweight and simple API. Compared to way. In addition, the ontologies should be presented using a the APIs of BioPortal [4], Swoogle25 [12], Watson26 [21], normalized concept representation. ONKI SKOS and others, the NOR API focuses on a few The NOR approach has been evaluated with three case stud- basic methods that reflects the basic functionality of ontology ies: The ONKI ontology repository case study demonstrates repositories, e.g. concept search. using the NOR approach for building an ontology service The Ontosearch2 [15] does a automatic complexity reduc- consisting of over 70 underlying back-ends with over 10 000 tion of ontologies to ensure answering the ontology search unique monthly users. The NCBO BioPortal and ONKI case queries within a specific time limit. This automatic approach study demonstrates using the NOR approach for creating a however require using the OWL ontology language which is global search and browsing user-interface for accessing inde- a limitation since many ontologies are not presented in that pendent distributed ontology repositories. Finally, the SAHA specific language. In contrast, the NOR approach is based on metadata editor and the CultureSampo semantic portal case defining the normalized language and the simple API with study demonstrates that the NOR approach can be used for accessing non-ontological concept collections. 21http://ontolog.cim3.net/cgi-bin/wiki.pl?OpenOntologyRepository The outcome of this work is that the NOR approach is 22http://www.w3.org/TR/rdf-sparql-query/ feasible for providing a unified access to a multitude of 23http://www.w3.org/2001/sw/Europe/reports/thes/skosapi.html 24http://owlapi.sourceforge.net/ 27http://www.ai.sri.com/ okbc/spec.html 25http://swoogle.umbc.edu/ 28http://www.fipa.org/repository/aclspecs.html 26http://watson.kmi.open.ac.uk/ 29http://www.cs.umbc.edu/csee/research/kqml/ ontology repositories. This makes it possible to provide for [5] K. Viljanen, J. Tuominen, and E. Hyvonen,¨ “Ontology libraries for pro- example global search and global browsing functionalities duction use: The Finnish ontology library service ONKI,” in Proceedings of the ESWC 2009, Heraklion, Greece. Springer–Verlag, 2009. to a collection of separate underlying ontology repositories. [6] M. d’Aquin and H. Lewen, “Cupboard - a place to expose your At the same time, the NOR does not restrict the individual ontologies to applications and the community,” in Proceedings of the ontology repository providers from creating advanced ontol- ESWC 2009. Heraklion, Greece: Springer–Verlag, June 2009, pp. 913– 918. ogy, business, and user specific implementations because the [7] T. R. Gruber, “A translation approach to portable ontology specifica- relation between the normalized representation and the native tions,” Knowledge Acquisition, vol. 5, no. 2, pp. 199–220, 1993. representation is kept intact. [8] G. Kobilarov, T. Scott, Y. Raimond, S. Oliver, C. Sizemore, M. Smethurst, C. Bizer, and R. Lee, “Media meets semantic web – The NOR approach allows the ontology user to find relevant how the bbc uses dbpedia and linked data to make connections,” in concepts and ontology repositories in cases where the correct Proceedings of the ESWC 2009, Heraklion, Greece. Springer–Verlag, ontology repository is not known in advance or when many 2009. [9] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, ontology repositories are used simultaneously. After finding and S. Hellmann, “Dbpedia a crystallization point for the web of data,” the relevant repository, the user may access the underlying Journal of Web Semantics: Science, Services and Agents on the World ontology repository for full-blown functionalities. For organi- Wide Web, no. 7, p. 154165, 2009. [10] T. Adamusiak, T. Burdett, N. Kurbatova, K. J. van der Velde, zations that maintain an internal ontology repository, the NOR N. Abeygunawardena, D. Antonakaki, M. Kapushesky, H. Parkinson, approach makes it possible to make simultaneous queries to and M. Swertz, “Ontocat – simple ontology search and integration repositories outside the organization. For the ontology pub- in java, r and rest/javascript,” BMC Bioinformatics, vol. 12, no. 1, p. 218, 2011. [Online]. Available: http://www.biomedcentral.com/1471- lishers, implementing the NOR API increases the findability 2105/12/218 of the ontologies and therefore the benefits of publishing the [11] C. Bizer, R. Cyganiak, and T. Heath, “How to publish linked data on the ontology in the first place. web,” http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/, July 27 2007. Future work includes developing further the API and its [12] L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari, methods to support, for example, restricting queries to a V. C. Doshi, and J. Sachs, “Swoogle: A Search and Metadata Engine for specific ontology, specific subpart of the ontology or to a the Semantic Web,” in Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management. ACM Press, November specific concept type. The normalized concept representations 2004. could be improved by introducing links between ontologies in [13] E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, and the spirit of Linked Data. Such mappings between ontologies G. Tummarello, “Sindice.com: a document-oriented lookup index for open linked data.” IJMSO, vol. 3, no. 1, pp. 37–52, 2008. [Online]. could be produced potentially automatically by creating a Available: http://dblp.uni-trier.de/db/journals/ijmso/ijmso3.html matching application on the top of the NOR compatible [14] Y. Qu and G. Cheng, “Falcons concept search: A practical search ontology repositories. NOR based metasearch would benefit engine for web ontologies,” IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 41, no. 4, pp. 810–816, 2011. from a ranking algorithm for ordering the results originating [15] E. Thomas, J. Z. Pan, and D. Sleeman, “ONTOSEARCH2: Searching from different underlying ontology repositories. Finally, to Ontologies Semantically,” in Proceedings of the OWLED 2007 Workshop evaluate the full potential of the approach, formal and informal on OWL: Experiences and Directions, 2007, online publication: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol- ontology repositories should implement the NOR API. 258/. Acknowledgements: This work is part of the National Se- [16] J. Tuominen, M. Frosterus, K. Viljanen, and E. Hyvonen,¨ “ONKI SKOS mantic Web Ontology project in Finland30 (FinnONTO, 2003- server for publishing and utilizing SKOS vocabularies and ontologies as services,” in Proceedings of the ESWC 2009, Heraklion, Greece. 2012), funded mainly by the National Technology and Inno- Springer–Verlag, 2009. vation Agency (Tekes) and a consortium of 38 organizations. [17] K. Viljanen, J. Tuominen, and E. Hyvonen,¨ “Publishing and using We thank Osma Suominen, the Semantic Computing Research ontologies as mash-up services,” in Proceedings of the 4th Workshop on Scripting for the Semantic Web (SFSW2008), 5th European Semantic Group, and the OOR network for fruitful discussions. Web Conference 2008 (ESWC 2008), June 1-5 2008. [18] E. Hyvonen,¨ E. Makel¨ a,¨ T. Kauppinen, O. Alm, J. Kurki, T. Ruotsalo, REFERENCES K. Seppal¨ a,¨ J. Takala, K. Puputti, H. Kuittinen, K. Viljanen, J. Tuominen, T. Palonen, M. Frosterus, R. Sinkkila,¨ P. Paakkarinen, J. Laitio, and [1] E. Hyvonen,¨ K. Viljanen, J. Tuominen, and K. Seppal¨ a,¨ “Building a K. Nyberg, “CultureSampo – Finnish culture on the semantic web 2.0. national semantic web ontology and ontology service infrastructure— Thematic perspectives for the end-user,” in Proceedings, Museums and the FinnONTO approach,” in Proceedings of the ESWC 2008, Tenerife, the Web 2009, Indianapolis, USA, April 15-18 2009. Spain. Springer–Verlag, 2008. [19] J. Kurki and E. Hyvonen,¨ “Collaborative metadata editor integrated [2] K. Baclawski and T. Schneider, “The open ontology repository initiative: with ontology services and faceted portals,” in Workshop on Ontology Requirements and research challenges,” in Proceedings of Workshop Repositories and Editors for the Semantic Web (ORES 2010), the on Collaborative Construction, Management and Linking of Structured Extended Semantic Web Conference ESWC 2010, Heraklion, Greece. Knowledge at the ISWC 2009, Washington DC., USA, October 2009. CEUR Workshop Proceedings, http://ceur-ws.org/, June 2010. [3] M. d’Aquin and N. F. Noy, “Where to publish and find ontologies? [20] K. Viljanen, J. Tuominen, M. Salonoja, and E. Hyvonen,¨ “Linked A survey of ontology libraries,” Web Semantics: Science, Services and open ontology services,” in Workshop on Ontology Repositories and Agents on the World Wide Web, vol. 11, pp. 96–111, Mar. 2012. Editors for the Semantic Web (ORES 2010), the Extended Semantic Web [Online]. Available: http://dx.doi.org/10.1016/j.websem.2011.08.005 Conference ESWC 2010. CEUR Workshop Proceedings, http://ceur- [4] N. F. Noy, N. H. Shah, P. L. Whetzel, B. Dai, M. Dorf, N. Griffith, ws.org/, June 2010. C. Jonquet, D. L. Rubin, M.-A. Storey, C. G. Chute, and M. A. Musen, [21] M. d’Aquin, M. Sabou, E. Motta, S. Angeletou, L. Gridinoc, V. Lopez, “BioPortal: ontologies and integrated data resources at the click of a and F. Zablith, “What can be done with the semantic web? an overview mouse,” Nucleic Acids Research, vol. 37, no. Web Server issue, pp. of watson-based applications,” in 5th Workshop on Semantic Web 170–173, 2009. Applications and Perspectives, SWAP 2008, 2008.

30http://www.seco.tkk.fi/projects/finnonto/ Publication VI

Jouni Tuominen, Nina Laurenne, and Eero Hyvönen. Biological Names and Taxonomies on the Semantic Web – Managing the Change in Scientific Con- ception. In The Semantic Web: Research and Applications: 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29 – June 2, 2011, Proceedings, Part II, Grigoris Antoniou, Marko Grobelnik, Elena Simperl, Bijan Parsia, Dimitris Plexousakis, Pieter De Leenheer, and Jeff Pan (editors), Lecture Notes in Computer Science, volume 6644, pages 255–269, ISBN 978-3-642-21063-1, Springer-Verlag, June 2011.

c 2011 Springer-Verlag Berlin Heidelberg.

Reprinted with permission.

149

Biological Names and Taxonomies on the Semantic Web – Managing the Change in Scientific Conception

Jouni Tuominen, Nina Laurenne, and Eero Hyv¨onen

Semantic Computing Research Group (SeCo) Aalto University School of Science and the University of Helsinki [email protected] http://www.seco.tkk.fi/

Abstract. Biodiversity management requires the usage of heterogeneous biological information from multiple sources. Indexing, aggregating, and finding such information is based on names and taxonomic knowledge of organisms. However, taxonomies change in time due to new scientific findings, opinions of authorities, and changes in our conception about life forms. Furthermore, organism names and their meaning change in time, different authorities use different scientific names for the same taxon in different times, and various vernacular names are in use in different languages. This makes data integration and information retrieval dif- ficult without detailed biological information. This paper introduces a meta-ontology for managing the names and taxonomies of organisms, and presents three applications for it: 1) publishing biological species lists as ontology services (ca. 20 taxonomies including more than 80,000 names), 2) collaborative management of the vernacular names of vascu- lar plants (ca. 26,000 taxa), and 3) management of individual scientific name changes based on research results, covering a group of . The applications are based on the databases of the Finnish Museum of Natural History and are used in a living lab environment on the web.

1 Introduction Exploitation of natural resources, urbanisation, pollution, and climate changes accelerate the extinction of organisms on Earth which has raised a common concern about maintaining biodiversity. For this purpose, management of in- formation about plants and animals is needed, a task requiring an efficient us- age of heterogeneous, dynamic biological data from distributed sources, such as observational records, literature, and natural history collections. Central re- sources in biodiversity management are names and ontological taxonomies of organisms [1,19,20,3,4]. Animal ontologies are stereotypical examples in the se- mantic web text books, but in reality semantic web technologies have hardly been applied to managing the real life taxonomies of biological organisms and biodiversity on the web. This paper tries to fill this gap.1 1 We discuss the taxonomies of contemporary species, not ’phylogenetic trees’ that model evolutionary development of species, where humans are successors, e.g., of dinosaurs.

G. Antoniou et al. (Eds.): ESWC 2011, Part II, LNCS 6644, pp. 255–269, 2011. c Springer-Verlag Berlin Heidelberg 2011  256 J. Tuominen, N. Laurenne, and E. Hyv¨onen

Managing taxonomies of organisms provides new challenges to semantic web ontology research. Firstly, although we know that lions are carnivores, a subclass of mammals that eat other animals, the notion of ’species’ in the general case is ac- tually very hard to define precisely. For example, some authors discuss as many as 22 different definitions of the notion of species [16]. Secondly, taxonomic knowl- edge changes and increases due to new research results. The number of new or- ganism names in biology increases by 25,000 every year as new taxa to science are discovered [11]. At the same time, the rate of changes in existing names has accel- erated by the implementation of molecular methods suggesting new positions to organisms in taxonomies. Thirdly, biological names are not stable or reliable iden- tifiers for organisms as they or their meaning change in time. Fourthly, the same name can be used by different authors to refer to different taxa (units of classifi- cation that commonly have a rank in the hierarchy), and a taxon can have more than one name without a consensus about the preferred one. As a result, biological texts are written, content is indexed in databases, and information is searched for using different names and terms from different times and authorities. In biological research, scientific names are used instead of com- mon names, but in many applications vernacular names in different languages are used instead. Data fusion is challenging and information retrieval without deep biological knowledge is difficult. We argue that a shared system for publishing and managing the scientific and vernacular names and underlying conceptions of organisms and taxonomies is needed. From a research viewpoint, such a system is needed to index research results and to find out whether a potential new species is already known under some name. Biological information needed by environmental authorities cannot be properly indexed, found or aggregated unless the organism names and iden- tifiers are available and can be aligned. For amateur scientists and the public, aligning vernacular names to scientific names and taxonomies is often a prereq- uisite for successful information retrieval. This paper presents a meta-ontology and its applications addressing these problems. Our research hypothesis is that semantic web technologies are useful in practise in modelling change in the scientific perception of biological names and taxonomies, for creating a platform for collaboratively managing scientific knowledge about taxonomies, and for publishing taxonomies as ontology services for indexing and information retrieval purposes in legacy systems. In the following, biological classification systems are first discussed and a meta-ontology TaxMeOn for defining such systems is presented [13]. Three use case applications of the meta-ontology are then discussed: a system for manag- ing vascular plant names collaboratively (26,000 species) based on the SAHA metadata editor [12], application of the ONKI ontology service [25] for publish- ing taxonomic species lists on the semantic web (over 80,000 taxa of mammals, birds, butterflies, wasps, etc.), and a more focused application for managing the names and scientific findings of the Afro-tropical beetle family Eucnemidae. Fi- nally, contributions of our work are summarised, related work discussed, and directions for further research are outlined. Biological Names and Taxonomies on the Semantic Web 257

2 Biological Names and Taxonomies

The scientific name system is based on the Linnean binomial name system where the basic unit is a species. Every species belongs to some genus and every genus belongs to a higher taxon. A scientific name often has a reference to the original publication where it was first published. For example, the scientific name of the bumblebee, Apis mellifera Linnaeus, 1758, means that Linnaeus published the description of the bumblebee in 1758 (in Systema Naturae 10th edition) and that bumblebee belongs to the genus Apis. The upper levels of the taxonomic hierarchy do not show in a scientific name. A confusing feature of scientific names is that the meaning of the name may change although the name remains the same. Taxon boundaries may vary according to different studies, and there may be multiple simultaneous views of taxon limits of the same organism group. For example, a genus may be delimited in three ways and according to each view different sets of species are included in the genus as illustrated in Fig. 1. These differing views are taxonomic concepts. The usage of the correct name is not enough, and Berendsohn [1] suggested that taxonomic concepts should be referred to by an abbreviation sec (secundum) after the authors name to indicate in which meaning the name is used. The nature of a biological name system is a change, as there is no single in- terpretation of the evolution. Typically there is no agreement if the variation observed in an organism is taxon-specific or shared by more than one taxon, which makes the name system dynamic. For example, the fruit fly Drosophila melanogaster was shifted into the genus Sophophora, resulting in a new name combination Sophonophora melanogaster [7]. The most common taxonomic changes and their implications to the scientific names are the following: 1) A species has been shifted to another genus - the genus name changes. 2) One species turns out to be several species - new species are described and named, and the old name remains the same with a narrower interpretation. 3) Several species are found to be just one species - the oldest name is valid and the other names become its synonyms.

Taxonomic concept 1

Taxonomic concept 2 Taxonomic concept 3

Fig. 1. A genus is delimited in three different ways according to three different studies. Black squares indicate species. 258 J. Tuominen, N. Laurenne, and E. Hyv¨onen

Species lists catalogue organisms occurring in a certain geographical area, which may vary from a small region to global. Often species lists contain valid taxon names with author information and synonyms of the valid names. They are snapshots of time and used especially by environmental authorities. The problem with species lists is that not all organism groups are catalogued and changes are not necessarily recorded in the species lists. Traditionally printed lists tend to be more detailed than online lists and their status is higher. Species lists often follow different hierarchies and species may be associated with different genera according to the person who published the list. The hier- archy in a species list is a compromise that combines several studies, and the author can subjectively emphasise a view that he/she wishes. A taxon may also have different taxonomic ranks in literature, for example the same taxon can occur both as a species and a subspecies. Common names tend to have regional variation and they do not indicate hierarchy unlike scientific names. Vernacular names have an important role in everyday language, but due to the variation and vagueness, they have little relevance in science. Vernacular names are used mainly in citizen science.

3 TaxMeOn – Meta-ontology of Biological Names

We have developed a meta-ontology for managing scientific and vernacular names. The ontology model consists of three parts that serve different purposes: 1) name collections, 2) species lists, and 3) name changes resulting from re- search. These parts are manageable separately, but associations between them are supported. Being a meta-ontology, TaxMeOn defines classes and properties that can be used to build ontologies. The ontologies can be used for creating semantic metadata for describing e.g. observational data or museum collections. TaxMeOn is based on RDF using some features of the OWL. The model contains 22 classes and 53 properties (61 including subproperties), of which ten classes and 15 properties are common to all the three parts of the model2. The core classes of TaxMeOn express a taxonomic concept, a scientific name, a taxonomic rank, a publication, an author, a vernacular name, and a status of a name. Taxonomic ranks are modelled as classes, and individual taxa are instances of them, for example the species forest fir forrestii (belongs to the genus Abies) is an instance of the class Species. The model contains 61 taxonomic ranks, of which 60 are obtained from TDWG Taxon Rank LSID Ontology3.Inorderto simplify the management of subspecific ranks, an additional class that combines species and taxonomic levels below it was created. References embody publications in a broad sense including other documented sources of information, for instance minutes of meetings. Bibliographic informa- tion can be associated to the reference according to the Dublin Core metadata standard. In biology, author names are often abbreviated when attached to taxon

2 The TaxMeOn schema is available at http://schema.onki.fi/taxmeon/ 3 http://rs.tdwg.org/ontology/voc/TaxonRank Biological Names and Taxonomies on the Semantic Web 259 names. The TaxMeOn model supports the referring system that is typical to bi- ology. Some of the properties used in TaxMeOn are part-specific as the uses of the parts differ from each other. For instance, the property that refers to a ver- nacular name is only available in the name collection part as it is not relevant in the other parts of the model. The most distinctive feature of the research part [14] is that a scientific name and taxonomic concepts associated to it are separated, which allows detailed management of them both. In the name collection and species list parts, a name and its taxonomic concepts are treated as a unit. Different statuses can be as- sociated to names, such as validity (accepted/synonym), a stage of a naming process (proposed/accepted) and spelling errors. The model has a top-level hierarchy that is based on a rough classification, such as the division of organism classes and orders. Ontologies that are gener- ated using TaxMeOn, can be hung on the top-level classification. A hierarchy is created using the transitive isPartOfHigherTaxon relation, e.g. to indicate that the species forrestii belongs to the genus Abies. Taxon names that refer to the same taxon can occur as different names in the published species lists and different types of relations (see Table 1) can be set between the taxa. Similarly, research results of phylogenetic studies can be mapped using the same relations. The relations for mapping taxa are divided on the basis of attributes of taxa (intensional) or being a member of a group (ostensive). If it is known that two taxa have an association which is not specified, a class is provided for expressing incomplete information (see the empty ellipse in Fig. 2). This allows associations of taxa without detailed taxonomic knowledge, and especially between taxa originating from different sources.

Table 1. Mapping relations used in species lists and research results. The three rela- tions can be used as intensional and/or ostensive, using their subproperties.

Relation Description congruent with taxon taxonomic concepts of two taxa are equal is part of taxon a taxonomic concept of a taxon is included in a taxonomic concept of another taxon overlaps with taxon taxonomic concepts of two taxa overlap

In TaxMeOn, a reference (an author name and a publication year) to the original publication can be attached to a name. A complete scientific name is atomised into units that can be combined in applications by traversing the RDF graph by utilising the isPartOfHigherTaxon and publishedIn relations. Name collections. Scientific names and their taxonomic concepts are treated as one unit in the name collection, because the scope is in vernacular names. The model supports the usage of multiple languages and dialects of common names. There may be several common names pointing to the same taxon, and typically one of them is recommended or has an official status. Alternative names are expressed defining the status using the class VernacularNameStatus and 260 J. Tuominen, N. Laurenne, and E. Hyv¨onen references related to the changes of a name status can be added. This allows the tracking the temporal order of the statuses. The model for vernacular names is illustrated in Fig. 2. Species lists. Species lists have a single hierarchy and they seldom include ver- nacular names. Species lists have more relevance in science than name collections, but they lack information about name changes and a single list does not express the parallel or contradictory views of taxonomy which are crucial for researchers. Synonyms of taxa are typically presented and the taxonomic concept is included in a name like in a name collection. Taxa occurring in different species lists can be mapped to each other or to research results using the relations in Table 1. In addition, a general association without taxonomic details can be used (see Fig. 2). Biological research results. In biological research results a key element is a taxonomic concept that can have multiple scientific names (and vice versa). In- stead of names, taxonomic concepts are used for defining the relations between taxa. The same relations are applied here as in the speies list part (see Ta- ble 1). The latest research results often redefine taxon boundaries, for example a split of taxa narrows the original taxonomic concept and the meaning of the name changes although the name itself may remain the same. The new and the old concepts are connected into a temporal chain by instantiation of a change event. In Fig. 3 the concept of the beetle genus Galba is split into the concepts of the Balgus and Pterotarsus. The taxon names are shown inside the ellipses

Pine trees Abies

rdfs:label

C. Coltman Rogers chengii

hasVernacular VernacularName Name TaxonInNameCollection Synonym Genus

isPartOfHigher Reference forrestii TaxonInNameCollection

Taxon Species list hasVernacular AcceptedVernacularName hasVernacular NameStatus VernacularName TaxonInNameCollection Name TaxMeOn refersTo Taxon Species rdfs:label Forrest fir

rdfs:seeAlso Research results VernacularName Reference Accepted˘ AlternativeVernacularName 1919 Chundian dc:title lengshan Wikipedia Gard. Chron., III, 65:150

Fig. 2. An example of vernacular names in a name collection. The ellipses represent instances of TaxMeOn classes and literals are indicated as boxes. Other parts of the model are connected to the example taxon in the box with dotted line, in which the empty ellipse illustrates a general representation of a taxon. Biological Names and Taxonomies on the Semantic Web 261

Schenkling Fleutiaux1928 Fleutiaux, Crowson, Cobos, 1920 1945 1967 1961

PublicationPublication Publication Publication Publication

Eucnemidae Elateridae

Eucnemidae Elateridae Elateridae 5a iPOHT 5b iPOHT 5c iPOHT Balgus congruent Balgus isOlder Balgus isPartOf isPartOf Taxon Than HigherTaxon HigherTaxon 1 2 3 after potential before Galba Pterotarsus Galba Split changeIn Relation Eucnemidae Eucnemidae Eucnemidae Eucnemidae Taxonomic Concept publishedIn after iPOHT iPOHT iPOHT tuberculata historio 4a 4b 4c 4d Pterotarsus Galbites Pterotarsus Galbites Publication changeIn changeIn changeIn Taxonomic Taxonomic Taxonomic Publication Publication Concept Concept Concept Lameere, 1900 Publication Publication Publication Publication Guerin- Guerin-Meneville, Meneville, 1830 1831 Fleutiaux, Fleutiaux, Fleutiaux, Muona, Guerin-Meneville, 1920 1918 1945 1987 1838. In the illustrations of the book were publishe later and Galba tuberculata had a name Pterotarsus marmorata

Fig. 3. An example of name changes and taxonomies of eucnemid beetles based on research results. The ellipses represent instances of TaxMeOn classes. Taxonomic hier- archies are expressed with the isPartOfHigherTaxon (iPOHT) relations, and the name change series of taxa are illustrated with a darker colour. The following abbreviations are used for the change types: S = Split of taxa, NC = Name change, TCC = Taxon concept change and CH = Change in hierarchy. The meaning of the numbers: 1) The species description of Galba tuberculata was originally published in 1830, but the illus- trations of the book were published in 1838. However, in the illustrations G. tuberculata appeared with the name Pterotarsus marmorata (conflicting information). 2) Mean- while, in 1831, the same taxon was independently described as Pterotarsus historio (independent events). 3) Lameere was confused by the two independently published works and changed the name to Galba in 1900 (uncertain relation between 2 and 3). 4a) Fleutiaux split the genus Galba into two genera. The name Galba was changed into Pterotarsus as there turned out to be a crustacean genus Galba (S, NC,TCC). 4b) Fleutiaux re-examined the genus and concluded that it is new to science and de- scribed it as Galbites (NC, TCC). 4c) Later Fleutiaux changed his mind and renamed the genus as Pterotarsus again (NC, TCC). 4d) Muona discovered that Fleutiaux was originally right and renamed the genus as Galbites (NC, TCC). 5a) When Galba was split, a part of its species were shifted into the genus Balgus that was described as new to science at the same time. Balgus was placed in the family Eucnemidae (CH). 5b) And changed into the family Throscidae (CH). This was originally published in a monthly magazine in the 1950’s, but the magazines were published as a book in 1967 which is most commonly cited. 5c) Balgus was changed into the family Elateridae in 1961 (CH and conflict in publication years). 262 J. Tuominen, N. Laurenne, and E. Hyv¨onen representing taxonomic concepts in order to simplify the presentation. Other change types are a lump of taxa, a change in taxon boundaries and a change in a hierarchy. These changes lead to the creation of a new instance of a taxonomic concept in order to maintain the traceable taxon history. An instantion of a new concept prevents evolving non-existing name combinations and artificial classi- fications. For instance, a species name is not associated with a genus name in which it has never been included. The status of a scientific name may change in time as an accepted name may become a synonym. Multiple statuses can be attached to a name, but according to the nomenclatural rules only one of them is accepted at time. The temporal order of the statuses can be managed according to the same idea as in the name collections part.

4 Use Cases

We have applied the TaxMeOn ontology model to three use cases that are based on different needs. The datasets include a name collection of common names of vascular plants, several species lists of different animal groups and a collection of biological research results of Afro-tropical beetles. The use cases were selected on the basis of the active usage of the data (vernacular names), usefulness to the users (species lists), and the taxonomic challenges with available expertise (scientific names based on research results). The datasets used are depicted in Table 2.

4.1 Collaborative Management of Vascular Plants Names The biological name collection includes 26,000 Finnish names for vascular plants that are organised into a single hierarchy. A deeply nested hierarchy is not nec- essary here as the classification used is robust, containing only three taxonomic ranks. The need is to maintain the collection of the common names and to man- age the name acceptance process. The number of yearly updates exceeds 1,000. The typical users of the name collection are journalists, translators and other non-biologists who need to find a common name for a scientific name. The name collection of vascular plants is managed in SAHA4 [12]. SAHA is a simple, powerful and scalable generic metadata editor for collaborative content creation. Annotation projects can be added into SAHA by creating the metadata schema for the content and loading it into SAHA. The user interface of SAHA adapts to the schema by providing suitable forms for producing the metadata. The values of the properties of the schema can be instances of classes defined in the schema, references to ontologies or literals. The annotations created us- ing SAHA are stored in a database, from which they can be retrieved for use in semantic applications. SAHA also provides a SPARQL endpoint for making queries to the RDF data.

4 http://demo.seco.tkk.fi/saha/VascularPlants/index.shtml Biological Names and Taxonomies on the Semantic Web 263

Table 2. Datasets TaxMeOn has been applied to. Vascular plants are included in the name collection, the false click beetles are biological research results, and all other datasets are based on species lists.

Taxon group Region Publ. years # of taxa Vascular plants World constantly 25726 updated Long-horn beetles Scandinavia, 1939, 1960, 1979, 205, 181, 247, (Coleoptera: Cerambycidae) Baltic countries 1992, 2004, 2010, 269, 300, 297, 2010 1372 Butterflies and moths Scandianavia, 1962, 1977, 1996, 313, 256, 265, (Lepidoptera) North-West 2002, 2008 4573, 12256, Russia, Estonia 3244, 3251, 3477 Thrips (Thysanoptera) Finland 2008 219 Lacewings and scorpionflies Finland 2008 113 (Neuroptera and Mecoptera) True bugs (Hemiptera) Finland 2008 2690 Flies (Diptera: Brachycera) Finland 2008 6373 Parasitic wasps Finland 1995, 1999, 1999, 282, 398, 919, (Hymenoptera: Ichneumoidae) 2000, 2003 786, 733 Bees and wasps Finland 2010 1048 (Hymenoptera: Apoidea) Mammals World 2008 6062 Birds World 2010 12125 False click beetles Afrotropics – 9 genera (Coleoptera: Eucnemidae)

New scientific species names are added by creating a new instance of the Species class and then adding the other necessary information, such as their status. Similarly, a higher taxon can be created if it does not already exist, and the former is linked to the latter with the isPartOfHigherTaxon relation. SAHA has search facilities for querying the data, and a journalist writing a non- scientific article about a house plant, for example, can use the system for finding a common name for the plant.

4.2 Publishing Species Lists as Ontology Services The users of species lists are ecologists, environmental authorities and ama- teurs searching for the correct scientific name occurring in a certain geograph- ical area. In this use case ca. 20 published species lists obtained from the tax- onomic database of the Finnish Museum of Natural History5 containing more than 80,000 names were converted into TaxMeOn ontologies. In addition, seven regional lists of long-horn beetles (cerambycids) with 100 species are available from the years 1936–2010. The various names meaning the same taxon were mapped by an expert. The most common differences between the lists are a shift of a genus for a species, a change in a hierarchy and/or in a name status. Similarly, ca. 150 species of butterfly names from five lists were mapped. 5 http://taxon.luomus.fi/ 264 J. Tuominen, N. Laurenne, and E. Hyv¨onen

Currently, the mapped beetle names are published as services for humans and machines in the ONKI Ontology Service6 [25]. The ONKI Ontology Service is a general ontology library for publishing ontologies and providing functionalities for accessing them, using ready-to-use web widgets as well as APIs. ONKI sup- ports content indexing, concept disambiguation, searching, and query expansion. Fig. 4 depicts the user interface of the ONKI server [24]. The user is browsing the species lists of cerambycid beetles, and has made a query for taxon names starting with a string “ab”. The selected species abdominalis has been described by Stephens in 1831, and it occurs in the species list Catalogue of Palaearctic Coleoptera, published in the year 2010 [15]. The species abdominalis belongs to the subgenus and genus Grammoptera. The taxonomy of the family Cerambyci- dae is visualised as a hierarchy tree. The same species also occurs in other species lists, which is indicated by congruentWithTaxonOst relation. Browsing the taxa reveals varying taxon names and classifications. For example, the Grammoptera (Grammoptera) abdominalis has a subgenus in this example, but the rank sub- genus does not exist in the other lists of cerambycid. Also, the synonyms of the selected taxon are shown (analis, femorata, nigrescens and variegata).

Fig. 4. The species of abdominalis shown in the ONKI Browser

The ONKI Ontology Services can be integrated into applications on the user interface level (in HTML) by utilising the ONKI Selector, a lightweight web widget providing functionalities for accessing ontologies. The ONKI API has

6 http://demo.seco.tkk.fi/onkiskos/cerambycids/ Biological Names and Taxonomies on the Semantic Web 265 been implemented in three ways: as an AJAX service, as a Web Service, and as a simple HTTP API. The ONKI Ontology Service contains several ontologies covering different fields and is a part of the FinnONTO project [6] that aims to build a national ontology infrastructure. The Finnish Spatio-temporal Ontology (SAPO) [8], for example, can be used to disambiguate geographical information of observational data. Combining the usage of species ontologies and SAPO, extensive data har- monisation is avoided as both taxon names and geographical names change in time.

4.3 Management of Individual Scientific Names

The use case of scientific names is the Afro-tropical beetle family Eucnemidae, which consists of ca. nine genera that have gone through numerous taxonomic treatments. Also, mistakes and uncertain events are modelled if they are rel- evant to name changes. For example, the position of the species Pterotarsus historio in taxonomic classification has changed 22 times and at least eight tax- onomic concepts are associated to the genus Pterotarsus [17]. Fig. 3 illustrates the problematic nature of the beetle group in a simplified example. A compara- ble comparable situation concerns most organism groups on Earth. Due to the numerous changes in scientific names, even researchers find it hard to remember them and this information can only be found in publications of taxonomy. The option of managing individual names is advantageous as it completes the species lists and allows the mapping of detailed taxonomic information to the species lists. For example, environmental authorities and most biologists prefer a simple representation of species lists instead of complicated change series.

5 Discussion

We have explored the applicability of the semantic web technologies for the management needs of biological names. Separating taxonomic concepts from scientific and vernacular names is justified due to the ambiguity of the names referring to taxa. This also enables relating relevant attributes separately to a concept and to a name, although it is not always clear to which of these an attribute should be linked and subjective decisions have to made. The idea of the model is simplicity and practicality in real-world use cases. The fruitfulness lays in the possibilities to link divergent data serving diver- gent purposes and in linking detailed information with more general information. For example, a common name of a house plant, a taxonomic concept that ap- pears to be a species complex (a unit formed by several closely related species) and the geographical area can be linked. The most complex use case is the management of scientific name changes of biological research results. The main goal is to maintain the temporal control of the name changes and classifications. The instantiation of taxon names and concepts lead to a situation in which they are hard to manage when they form a 266 J. Tuominen, N. Laurenne, and E. Hyv¨onen long chain. Every change increases the number of instances created. Proteg´e7 was used for editing the ontologies, although managing names is quite inconvenient because they are shown as an alphabetically ordered flat list, not as a taxonomic hierarchy. As Proteg´e is rather complicated for a non-expert user, the metadata editor SAHA was used for maintaining the continuous changes of common names of plants. The simplicity of SAHA makes it a suitable option for ordinary users who want to concentrate on the content. However, we noticed that some useful features are missing from SAHA. The visualisation of a nested hierarchy would help users to compare differing classifications. In many biological ontologies the ’subclass of’ relation is used for expressing the taxon hierarchies. However, in the TaxMeOn model we use the isPartHigh- erTaxon relation instead. If the ’subclass of’ relation was used to express the taxonomic hierarchy, a taxon would incorrectly be an instance of the higher taxon ranks, e.g., a species would be an instance of the class Genus.Thiswould lead to a situation in which queries for genera also return species.

5.1 Related Work

NCBO BioPortal8 and OBO Foundry9 have large collections of life science on- tologies mainly concentrating on biomedicine and physiology. The absence of taxonomic ontologies is distinctive which may indicate the complexity of the bi- ological name system. The portals contain only three taxonomic ontologies (Am- phibian taxonomy, Fly taxonomy and Teleost taxonomy) and one broader clas- sification (NCBI organismal classification). The taxonomic hierarchy is defined using the rdfs:subClassOf relation in the existing ontologies. Taxonconcept.org10 provides Linked Open Data identifiers for species concepts and links data about them originating from different sources. All names are expressed using literals and the following taxonomic ranks are included: a combination of a species and a genus, a class and an order. Parallel hierarchies are not supported. Geospecies11 uses the properties skos:broaderTransitive and skos:narrowerTransitive to ex- press the hierarchy. Page [19] discusses the importance of persistent identifiers for organism names and presents a solution for managing names and their synonyms on the semantic web. The taxon names from different sources referring to the same taxon are mapped using the owl:sameAs relation which is a strong statement. Hierarchy is expressed using two different methods in order to support efficient queries. Schulz et al. [20] presented the first ontology model of biological taxa and its application to physical individuals. Taxa organised in a hierarchy is thoroughly discussed, but the model is static and based on a single unchangeable taxonomy.

7 http://protege.stanford.edu/ 8 http://bioportal.bioontology.org/ 9 http://www.obofoundry.org/ 10 http://www.taxonconcept.org/ 11 http://lod.geospecies.org/ Biological Names and Taxonomies on the Semantic Web 267

Despite recognising the dynamic nature of taxonomy and the name system, the model is not applicable in the management of biological names as such. Franz and Peet [3] enlighten the problematic nature of the topic by describing how semantics can be applied in relating taxa to each other. They introduce two essentially important terms from philosophy to taxonomy to specify the way, in which differing classifications that include different sets of taxa can be compared. An ostensive relation is specified by being a member of a group and intensional relations are based on properties uniting the group. These two fundamentally different approaches can be used simultaneously, which increases the information content of the relation. Franz and Thau [4] developed the model of scientific names further by eval- uating the limitations of applying ontologies. They concluded that ontologies should focus either on a nomenclatural point of view or on strategies for align- ing multiple taxonomies. Tuominen et al. [23] model the taxonomic hierarchy using the skos:broader property, and preferred scientific and common names of the taxa are represented with the property skos:prefLabel and alternative names with skos:altLabel.The property rdf:type is used to indicate the taxonomic rank. This is applicable to relatively simple taxonomies such as species lists, but it does not support ex- pressing more elaborate information (changes in a concept or a name). The Darwin Core (DwC) [2] is a metadata schema developed for observation data by the TDWG (Biodiversity Information Standards). The goal of the DwC is to standardise the form of presenting biological information in order to enhance the usage of it. However, it lacks the semantic aspect and the terms related to biological names are restricted due to the wide and general scope of the DwC. The scope of the related work presented above differs from our approach as our focus is on practical name management and retrieval of names. Research on ontology versioning [10] and ontology evolution [18] has focused on finding mappings between different ontology versions, performing ontology refinements and other changes in the conceptualisation [9,21], and in reasoning with multi-version ontologies [5]. There are similarities in our problem field, but our focus is to support multiple parallel ontologies interpreting the domain differently, not in versioning or evolution of a specific ontology. For example, there is no single taxonomy of all organisms, but different views of how they should be organised into hierarchies. A similar type of an approach for managing changes and parallel views of concepts has been proposed by Tennis and Sutton [22] in the context of SKOS vocabularies. However, TaxMeOn supports richer ways of expressing informa- tion, e.g. for managing changes of taxon names and concepts separately.

5.2 Future Work The model will be tested using different datasets to ensure its applicability. Cur- rently, the research results part covers animal names, but will be expanded to plant names as well. The lack of user-friendly tools is obvious and the metadata 268 J. Tuominen, N. Laurenne, and E. Hyv¨onen editor SAHA is planned to be expanded to respond to the needs. Describing evolutionary trees and their information content is a challenging application area as phylogenetics produces name changes.

Acknowledgements. This work is part of the National Semantic Web Ontology project in Finland12 (FinnONTO, 2003-2012), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 organizations. We thank Jyrki Muona, Hans Silfverberg, Leo Junikka and Juhana Nieminen for their collaboration.

References

1. Berendsohn, W.: The concept of ”potential taxon” in databases. Taxon 44, 207–212 (1995) 2. Darwin Core Task Group. Darwin core. Tech. rep (2009), http://www.tdwg.org/standards/450/ 3. Franz, N., Peet, R.: Towards a language for mapping relationships among taxo- nomic concepts. Systematics and Biodiversity 7(1), 5–20 (2009) 4. Franz, N., Thau, D.: Biological taxonomy and ontology development: scope and limitations. Biodiversity Informatics 7, 45–66 (2010) 5. Huang, Z., Stuckenschmidt, H.: Reasoning with multi-version ontologies: A tem- porallogicapproach.In:Gil,Y.,Motta,E.,Benjamins,V.R.,Musen,M.A.(eds.) ISWC 2005. LNCS, vol. 3729, pp. 398–412. Springer, Heidelberg (2005) 6. Hyv¨onen, E., Viljanen, K., Tuominen, J., Sepp¨al¨a, K.: Building a national semantic web ontology and ontology service infrastructure – the FinnONTO approach. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 95–109. Springer, Heidelberg (2008) 7. ICZN. Opinion 2245 (case 3407) drosophila fall´en, 1823 (insecta, diptera): Drosophila funebris fabricius, 1787 is maintained as the type species. Bulletin of Zoological Nomenclature 67(1) (2010) 8. Kauppinen, T., V¨a¨at¨ainen, J., Hyv¨onen, E.: Creating and using geospatial ontology time series in a semantic cultural heritage portal. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 110–123. Springer, Heidelberg (2008) 9. Klein, M.: Change Management for Distributed Ontologies. Ph.D. thesis, Vrije Universiteit Amsterdam (August 2004) 10. Klein, M., Fensel, D.: Ontology versioning on the Semantic Web. In: Proceedings of the International Semantic Web Working Symposium (SWWS), July 30 – August 1, pp. 75–91. Stanford University, California (2001) 11. Knapp, S., Polaszek, A., Watson, M.: Spreading the word. Nature 446, 261–262 (2007) 12. Kurki, J., Hyv¨onen, E.: Collaborative metadata editor integrated with ontology services and faceted portals. In: Workshop on Ontology Repositories and Edi- tors for the Semantic Web (ORES 2010), the Extended Semantic Web Confer- ence ESWC 2010, CEUR Workshop Proceedings, Heraklion, Greece (June 2010), http://ceur-ws.org/

12 http://www.seco.tkk.fi/projects/finnonto/ Biological Names and Taxonomies on the Semantic Web 269

13. Laurenne, N., Tuominen, J., Koho, M., Hyv¨onen, E.: Modeling and publishing biological names and classifications on the semantic web. In: TDWG 2010 Annual Conference of the Taxonomic Databases Working Group (September 2010); poster abstract 14. Laurenne, N., Tuominen, J., Koho, M., Hyv¨onen, E.: Taxon meta-ontology TaxMeOn – towards an ontology model for managing changing scientific names in time. In: TDWG 2010 Annual Conference of the Taxonomic Databases Working Group (September 2010); contributed abstract 15. L¨obl, I., Smetana, A.: Catalogue of Palearctic Coleoptera Chrysomeloidea, vol. 6. Apollo Books, Stenstrup (2010) 16. Mayden, R.L.: A hierarchy of species concepts: the denouement in the saga of the species problem. In: Claridge, M.F., Dawah, H.A., Wilson, M.R. (eds.) Species: The Units of Biodiversity Systematics Association Special, vol. 54, pp. 381–424. Chapman and Hall, London (1997) 17. Muona, J.: A revision of the indomalesian tribe galbitini new tribe (coleoptera, eucnemidae). Entomologica Scandinavica. Supplement 39, 1–67 (1991) 18. Noy, N., Klein, M.: Ontology evolution: Not the same as schema evolution. Knowl- edge and Information Systems 6(4) (2004) 19. Page, R.: Taxonomic names, metadata, and the semantic web. Biodiversity Infor- matics 3, 1–15 (2006) 20. Schulz, S., Stenzhorn, H., Boeker, M.: The ontology of biological taxa. Bioinfor- matics 24(13), 313–321 (2008) 21. Stojanovic, L.: Methods and Tools for Ontology Evolution. Ph.D. thesis, University of Karlsruhe, Germany (2004) 22. Tennis, J.T., Sutton, S.A.: Extending the simple knowledge organization system for concept management in vocabulary development applications. Journal of the American Society for Information Science and Technology 59(1), 25–37 (2008) 23. Tuominen, J., Frosterus, M., Laurenne, N., Hyv¨onen, E.: Publishing biological classifications as SKOS vocabulary services on the semantic web. In: TDWG 2010 Annual Conference of the Taxonomic Databases Working Group (September 2010); demonstration abstract 24. Tuominen, J., Frosterus, M., Viljanen, K., Hyv¨onen, E.: ONKI SKOS server for publishing and utilizing SKOS vocabularies and ontologies as services. In: Aroyo, L.,Traverso,P.,Ciravegna,F.,Cimiano,P.,Heath,T.,Hyv¨onen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 768–780. Springer, Heidelberg (2009) 25. Viljanen, K., Tuominen, J., Hyv¨onen, E.: Ontology libraries for production use: The finnish ontology library service ONKI. In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano, P., Heath, T., Hyv¨onen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 781–795. Springer, Heidelberg (2009)

Publication VII

Nina Laurenne, Jouni Tuominen, Hannu Saarenmaa and Eero Hyvönen. Making species checklists understandable to machines – a shift from re- lational databases to ontologies. Journal of Biomedical Semantics, 5, 40, DOI 10.1186/2041-1480-5-40, September 2014.

c 2014 Laurenne et al.

Reprinted with permission.

167

Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 http://www.jbiomedsem.com/content/5/1/40 JOURNAL OF BIOMEDICAL SEMANTICS

RESEARCH Open Access Making species checklists understandable to machines – a shift from relational databases to ontologies Nina Laurenne1*†, Jouni Tuominen1†, Hannu Saarenmaa2 and Eero Hyvönen1

Abstract Background: The scientific names of plants and animals play a major role in Life Sciences as information is indexed, integrated, and searched using scientific names. The main problem with names is their ambiguous nature, because more than one name may point to the same taxon and multiple taxa may share the same name. In addition, scientific names change over time, which makes them open to various interpretations. Applying machine-understandable semantics to these names enables efficient processing of biological content in information systems. The first step is to use unique persistent identifiers instead of name strings when referring to taxa. The most commonly used identifiers are Life Science Identifiers (LSID), which are traditionally used in relational databases, and more recently HTTP URIs, which are applied on the Semantic Web by Linked Data applications. Results: We introduce two models for expressing taxonomic information in the form of species checklists. First, we show how species checklists are presented in a relational database system using LSIDs. Then, in order to gain a more detailed representation of taxonomic information, we introduce meta-ontology TaxMeOn to model the same content as Semantic Web ontologies where taxa are identified using HTTP URIs. We also explore how changes in scientific names can be managed over time. Conclusions: The use of HTTP URIs is preferable for presenting the taxonomic information of species checklists. An HTTP URI identifies a taxon and operates as a web address from which additional information about the taxon can be located, unlike LSID. This enables the integration of biological data from different sources on the web using Linked Data principles and prevents the formation of information silos. The Linked Data approach allows a user to assemble information and evaluate the complexity of taxonomical data based on conflicting views of taxonomic classifications. Using HTTP URIs and Semantic Web technologies also facilitate the representation of the semantics of biological data, and in this way, the creation of more “intelligent” biological applications and services. Keywords: Scientific name, Taxonomic concept, LSID, HTTP URI, Ontology, Semantic web, Linked data, Species checklist

Background makes data reuse and integration a challenge for both Research on biodiversity requires integrating data from human users and machines. distributed heterogeneous sources, such as scientific lit- Scientific names are important for interlinking infor- erature, observations, and biomedical resources. Data is mation about taxa in all fields of the Life Sciences. A often presented using a variety of terms, vocabularies, and taxon is a group of one or more organisms whose mem- languages, which presents a barrier to interoperability and bers are considered evolutionarily related to one another; a taxon typically has a name and rank, i.e., a species, genus, *Correspondence: [email protected] etc. Taxon names are especially necessary when indexing †Equal contributors biological information and cataloguing biodiversity. The 1Semantic Computing Research Group (SeCo), Department of Media Technology, Aalto University, P.O. Box 15500, 00076 Aalto, Espoo, Finland nature of names, whether important or problematic, has Full list of author information is available at the end of the article recently been re-examined by several researchers [1-6].

© 2014 Laurenne et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 2 of 13 http://www.jbiomedsem.com/content/5/1/40

Difficulties arise when a particular taxon can be referred This means that the interoperability of DwC records is to using multiple names, since scientists’ opinions differ not achieved if the elements are not used in a consis- on how evolutionary units should be organised into classi- tent way. For example, a taxon name may be a literal fications. Also, researchers may use the same name with a value or referred to using a URI. Darwin Core Archive different meaning when referring to taxa. Well-conducted (DwC-A) [18] is a data standard for producing a self- taxonomic studies may be 250 years old and still use- contained dataset for sharing species-related data, such ful but in most cases, the perceived boundaries of taxa as occurrence records and checklists. The CSV (Comma- have been revised several times after the original publica- Separated Values) data files of an archive are organised tion. Contrary to popular belief, a generally agreed-upon, in a star-like manner, with one core data file and possible single taxonomy of organisms does not exist, and this extensions, e.g., for vernacular names or distribution data. fact is directly reflected in the scientific naming system The scope of biomedical resources differs from check- through the various usages of names. For a taxonomist, lists because the focus is on a gene or a cell level. Nev- a scientific name is a label that mirrors an evolution- ertheless, the name question remains relevant because ary hypothesis that is under continuous testing. There scientific names are used for linking information. Cur- will never be a commonly agreed upon single taxonomy rently, the National Center for Biotechnology Information and there will always be multiple competing current tax- (NCBI) [19] provides a single robust consensus hierar- onomic views. Nevertheless, efforts are made to provide chy of taxa constructed by experts, but NCBI ambi- usable taxonomies for non-taxonomists. tiously seeks to build a topology based on monophyletic Checklists are species catalogues where taxa are organ- groups, i.e., taxa derived from a common ancestor. NCBI ised hierarchically according to an author’s current view of allows flexibility in the acceptance of informal names a classification. The coverage of a species checklist varies and surrogate names can be used when contributing from a geographically limited area to a worldwide list, data and searching for taxa [5]. The majority of the and it typically focuses on a particular organismal group. submitted DNA sequences do not have a binominal sci- An author’s view of research results is thus inevitably entific name because specimens are not identified into emphasised, which opens the lists to interpretation if a species level at the time of submission or only sur- they lack sufficient taxonomic details. A regional species rogate names are used [5]. The significance of DNA listindexestaxaofagivenarea,butitcanalsocontain sequence data is increasing due to the rapid development additional information. For example, Fauna Europaea [7] of molecular methods that are applied in constructing and the Atlas of Living Australia [8] provide distribution evolutionary hypotheses and barcoding biodiversity. Con- maps and visualisation tools. The database Encyclope- sequently, descriptions of new species based on molecular dia of Life (EoL) [9] covers the whole world and has a evidence result in an increased number of species in considerable amount of species information. Also, unlike checklists. most resources, it supports multiple classifications since A major source for ambiguity in scientific names is that data providers can upload differing taxonomies into the they change over time. One of the most common types of system. change concerns a Linnean binominal name combination. Checklists were previously only published in journals The genus of a binominal name changes when a species is (static lists), but up-to-date checklists (dynamic lists) are moved to another genus. For example, the parasitic wasp increasingly available on the web. For example, the most species moscaryi once belonged to the genus Tetraconus, notable database, Catalogue of Life (CoL) [10], aims to but as a result of a taxonomic revision that synonymised include all known species and currently contains nearly two genera, its new name combination is Monomachus 1,352,112 species from 132 taxonomic datasets (2013 moscaryi [20]. Synonymisation happens due to assess- Annual Checklist). The database of zoological names ments of the identity of types (i.e., typically a physical ZooBank [11] currently has 101,777 nomenclatural acts. specimen to which a scientific name is attached). If two The Global Biodiversity Information Facility (GBIF) [12] or more taxa are lumped, the older name remains valid has made an effort to stabilise name usage by setting up but with a changed taxonomic circumscription, and the a Checklist Bank [13] for storing names and information more recent names become its synonyms. Consequently, about them. The widely used Taxonomic Concept Trans- there is more than one name pointing to the taxon, and fer Schema (TCS) [14] specifies the format (XML), in the taxonomic concept associated with the older name which taxonomic information is presented when exchang- changes. The opposite situation is the split of taxa, where ing data. Darwin Core (DwC) [15,16], created by Biodi- one taxon is divided into two or more taxa. The diver- versity Information Standards (TDWG) [17], is a stan- gence between a name and its meaning is characteristic dardised form of presenting biological information. The of taxonomy, because a scientific name does not neces- metadata elements in DwC are not strictly defined as sarily change despite the fact that taxon boundaries are the format and the element values are not fully specified. redefined. Researchers can also classify the same species Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 3 of 13 http://www.jbiomedsem.com/content/5/1/40

in various ways, thus leading to the existence of multiple part expresses the authority, and the fourth specifies the name combinations. namespace (which specifies the type of an LSID, e.g., sci- Berendsohn [21] introduced the concept of a poten- entific name, living thing, picture, or museum specimen), tial taxon, which is a scientific name with information the fifth points to the object ID, and the optional sixth part on a circumscription. He proposed the term “secundum” is for versioning information. An LSID can be accommo- (abbr. “sec”) be attached to a scientific name when refer- dated to a single name or to a set of taxonomic details, ring to a particular taxonomic circumscription. This was depending on its purpose [2,24]. For example, identi- a concrete suggestion on how to interlink differing tax- fiers are given to scientific names in the World Register onomic views while continuing to retain the adequate of Marine Species [25], but in the Catalogue of Life [10] taxonomic information in databases [22]. Having infor- they are given to taxonomic concepts. The Universal Bio- mation on circumscriptions in databases is an improve- logical Indexer and Organizer (uBio) [26] has 11,106,374 ment, but machine-readable semantics need to be used in namebank records where LSIDs are used for referring to order to enhance the machine-processability of taxonomic taxonomic concepts [6]. Also, an RDF (Resource Descrip- information. tion Framework) representation [27] is provided but some Inthispaperwepresenttwomodelsfordescribingtax- of the essential information is expressed as literals (a onomic information in a machine-processable way. The classification, taxonomic rank and a typing of resources) first model describes species checklists as a relational instead of URIs, which hampers machine-processability. database and the second one is further developed repre- The data carried by an LSID is obtained using a specific sentation of taxonomic information using Semantic Web resolver. Locating the resolver via the Domain Name Sys- technologies. We explore the reasons for moving away tem(DNS)oftheInternetrequiresthattheresolverbe from relational databases towards semantic technology, configured in a DNS SRV record (DNS service record) of and we also discuss options for managing scientific names the domain used as the authority part of an LSID. LSIDs as they change over time. canalsobeusedwithoutaresolveriftheyarepresentedas HTTP URIs using an LSID HTTP proxy. According to the Towards semantic handling of biological names TDWG guidelines for using identifiers, an LSID resolver A biologist understands the semantics of scientific names should return metadata about the requested resource in by reading scientific literature, but computers require RDF form [27]. The application of LSIDs in the Catalogue explicit identifier systems and data models to process of Life is thoroughly discussed by Jones et al. [2]. GBIF has semantics. It is obvious that persistent identifiers for taxa published recommendations for the adoption of LSIDs should be used instead of ambiguous name strings to and HTTP URIs [28,29]. increase the processability of scientific names. Using iden- The URN scheme applied to LSIDs is a URI scheme tifiers allows information to be connected unambiguously, standardised by the Internet Assigned Numbers Author- which enables interoperability between systems. Further- ity (IANA) [30]. HTTP is also a URI scheme, but there more, there is a need to interlink taxa between the differ- is a fundamental difference between URNs and HTTP ent versions of checklists as they are updated. Otherwise, URIs. HTTP URIs are based on the DNS, where the global data indexed using an earlier version of a checklist can- uniqueness of identifiers is guaranteed by the DNS infras- not categorically be found using a later version of the tructure, which also facilitates addressing and retrieving checklist. information about HTTP URIs. In contrast to URNs, sep- arate web services are not necessary to manage identifier Recognising taxa using identifiers creation or resolve them for data retrieval because these The most commonly used identifiers in biology are Life functions are already available in the infrastructure of the Science Identifiers (LSID) [23]. An LSID consists of six web. As a result, HTTP URIs are used as the identifier parts (Figure 1): the first two indicate that the type of mechanism for the Semantic Web and Linked Data [31]. URN (Uniform Resource Name) is an LSID, the third In addition, the form of an HTTP URI is flexible because

Figure 1 The structure of an LSID. An LSID of a cerambycid beetle species obtained from the Catalogue of Life database. Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 4 of 13 http://www.jbiomedsem.com/content/5/1/40

it does not have strictly defined parts like LSIDs. HTTP [37-41] and the ONKI ontology service [42]. The most URIs allow linking data across the web on the basis of the comprehensive of them is the NCBI Organismal clas- meaning of concepts that are identified with HTTP URIs, sification [41], which contains more than 352,000 taxa which enables the creation of the Web of Data. in a single hierarchy. Common to the classifications in LSIDs were the first attempt to solve the name problem, the NCBO BioPortal is that the hierarchy is constructed but due to the rapid development of Semantic Web tech- using subclassOf (isA) relations and presented in the OBO nologies, the trend now favours standardised web technol- ontology language [43]. TaxonConcept.org [44] tackles ogy. The main differences between LSIDs and HTTP URIs the name problem of taxonomic information in prac- are presented in Table 1. The technology applied does not tice and shows how to publish the information as Linked solve the problem of the divergence between a name and Open Data. It also demonstrates how data from external its meaning, but it does provide an appropriate solution sources are integrated and investigates how to combine for publishing and interlinking data in an interoperable taxonomic concepts with specimen data. However, some way on the web. of the important information about names are described Both LSID- and HTTP URI-based checklists can be as literals, e.g., the classification of taxa. Also, the taxo- published for humans via a user interface and for nomic change types are not described (split or lump of machines as APIs (Application Programming Interface) to taxa). The Taxonomic Meta-Ontology TaxMeOn [45,46] provide access to the data in multiple ways. For example, aims to respond to the practical needs of managing bio- the user interface can be used to check a valid name for a logical names over time, and it links taxonomic infor- taxon and browse a classification. The same information mation to names. This meta-ontology differs in that it can be obtained using a specialised API, but more general offers a greater level of detail and supports differing query interfaces can also be provided. In Linked Data, an classifications. API for reading the RDF description or a human-readable An increasing number of ontologies are available and HTML page for a resource is typically provided, as well therefore ontology evolution has become an important as a general purpose endpoint service that can be queried issue. The world – and our conceptualisation of it – is using the Semantic Web query language SPARQL. In addi- continually changing, which makes ontology versioning tion, checklists can be made available as downloadable essential [45,47,48]. Existing data that refer to a concept files [31]. should be kept consistent when its meaning changes or when it is removed from an ontology. Data described Semantic modelling of taxonomies using different versions of an ontology then can be inte- On the Semantic Web, taxonomies are represented using grated by utilising mappings (alignments) between the RDF resources, i.e., entities with URI identifiers, and ontology versions [49]. Khattak et al. [50] document ontol- explicit relations between them. A relatively new approach ogy evolution by keeping a log of changes in concepts. is to express taxonomic information as an ontology. The Small changes in an ontology are grouped into sets, which first ontology model for a taxonomic classification was can later be used to revert to previous stages. An alter- presented by Schulz et al. [34], with taxa organised into native solution is to recognise concept changes instead of a single hierarchy. Franz and Peet [35] and Franz and versioning an ontology. Wang et al. [51] show how the Thau [36] have offered further insight into the issues of changes in concepts and their impacts can be identified taxonomic ontology modelling. So far, a few taxonomic automatically by comparing the concepts both extension- ontologies have been published in the NCBO BioPortal ally and intensionally in cases where they do not have fixed identifiers. Table 1 The main differences between LSIDs and HTTP URIs Methods Life science HTTP URIs In order to develop two models for presenting taxonomic identifiers information in a machine-processable way, four design Standardised by Object Management Internet Engineering principles were applied to satisfy the following conditions: Group [32] Task Force [33] 1. useasfewtermsaspossibletoexpressasmuch Reuse existing Defines a new Uses an URI schemes URN subscheme existing scheme information as possible in the schema of the model. The taxonomic terminology and its usage is Data retrieval/ Specific resolving Uses existing dereferenceability service needed web technology established in biology, and the terms are used in (DNS, web servers) consistent way. As few new terms as possible are Structure of identifier Strict Flexible introduced. 2. focus on a restricted domain, that is, scientific Linked Data compatibility No Yes species checklists including all taxonomic Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 5 of 13 http://www.jbiomedsem.com/content/5/1/40

information and excluding any other taxon-related expressing level in a hierarchical classification (species, information (e.g., distribution). genus, etc.). A taxonomic hierarchy between scientific 3. support information on various levels of granularity, names is constructed using a hierarchical part-of relation. as the source material is heterogeneous in its level of An LSID that is obtained from an external source can detail. be assigned to a taxon concept as an attribute. Common 4. accept all views of taxonomy equally legitimate names in multiple languages can be connected to the con- regardless of the time they were disseminated. cept, but no taxonomic rank can be specified for them. In order to recognise the orthographic variants of scien- The focus of the models is in representing the taxo- tific names, LSIDs are accommodated to the names as nomic relations between taxa in a single checklist (clas- well. sification, synonymy), in different checklists (mapping AnewLSIDisgiventoaconceptifitchanges,suchasa taxonomic concepts) and in individual versions of a check- taxonomic change, an addition or removal of a synonym, list (managing taxonomic changes). or a change in relations between taxa. An LSID is assigned The datasets utilised in the study consist of 20 published to a new taxon when added to a dynamic checklist. LSIDs species checklists that cover mainly northern European are versioned in the case of minor changes using the mammals, birds and several groups of and assem- optional part of the identifier. The decision whether to ble ca. 78,000 taxon names (Additional file 1). Two models create a new object identifier of an LSID or a new version are applied to the same datasets. Name mappings between is made by a maintainer. the checklists are provided for eight families of papilionoid Taxa can be searched using a complete or partial sci- and hesperioid butterflies. entific name via a user interface, and the system returns a currently valid name and its synonyms. If the taxon Results is found in other checklists, their interrelations are also Taxonomic database described. The information is also provided as an RDF The main elements of the Taxonomic Database (Figure 2) representation for machine consumption. Only the latest [52] are a binominal scientific name and a taxonomic versions of dynamic checklists can be seen in the system. concept that connects the names that refer to the same However, older ones are stored internally in the database. taxon. Each concept is identified with a concept LSID. In Taxonomic concepts are linked on the basis of their addition, three other attributes are assigned to the sci- equivalence at a species level, but at higher levels the entific name: 1) a reference to the original publication alignment of taxa is based on the species content. For (author name and year of publication) in which the taxon instance, two species that have the same name and the description was first published, 2) a status of a name indi- same authorship citation are linked as congruent by cating its validity in the checklist, and 3) a taxonomic rank default, but two genera are linked as congruent only if the species belonging to the genera are the same. The rea- sons for treating species and taxa above the species level differently are debated in the Discussion.

Taxonomic meta-ontology TaxMeOn is an ontology schema for biological names, and here we present the part that describes species checklists. The model is based on RDFS (RDF Schema) and some fea- tures of OWL (Web Ontology Language); it contains 12 classes with 49 subclasses (excluding 61 subclasses of the class TaxonomicRank) and 28 properties. The core classes and their relations are illustrated in Figure 3. The class TaxonInChecklist represents both a scientific Figure 2 A simplified structure of the relational taxonomic name and its concept. The relation rdfs:label expresses the database. The boxes illustrate the tables of the database, and the unominal name of a taxon which is 1) the last epithet of a lines present the relations between them. LSIDs are given to name combination, or 2) a name of a taxon at higher lev- taxonomic concepts and scientific names (illustrated with a darker els, e.g., a family. The taxonomic hierarchy is constructed colour). Taxonomic concepts are linked to each other using the using the relation isPartOfHigherTaxon. relations described in Table 3 and each concept is linked to a scientific name. External LSIDs and common names are connected to The author references are presented in two ways: the concepts. An author reference, validity, and a taxonomic rank are assigned to the scientific names. 1. The property hasScientificNameAuthorship expresses the author of the original publication (if the Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 6 of 13 http://www.jbiomedsem.com/content/5/1/40

Figure 3 The core classes of the taxonomic meta-ontology. The classes are illustrated with ellipses (colours are to improve the readability of the figure). The arrows indicate relations (properties) between the classes. The subclass relations are indicated with lighter-coloured arrows and a few examples of the subclasses. To demonstrate how the TaxMeOn model is applied, an example taxon depicted using dotted lines is illustrated. The example taxon is an instance of the class TaxonInChecklist and of a specific taxonomic rank. The properties associated with the example taxon are marked with dotted-line arrows. The properties with literal values are not shown in the figure.

full reference of the original publication is not example as Turtle [54] presentation is in Additional provided in a checklist). file 2. 2. The properties publishedIn and In Figure 3, the relation hasStatus is associated with the publishedOriginallyIn refer to the publication. class TaxonInChecklist and indicates: 1) the nomenclatu- ral status of a name (nomen alternativum, nomen correc- The way the taxonomic authority information is worded tum, etc.), 2) the orthographic variants (altered spelling, differs between zoology and botany. Author names are incorrect spelling, etc.), and 3) the current opinion of often abbreviated in diverse ways in zoology; for exam- a taxonomic concept (valid, synonym, etc.). Modelling ple, both L. and Linn. stand for Linnaeus. In botany, the the changes is further discussed in the Discussion. Other abbreviations are standardised, but if a species is shifted important properties and their explanations are listed in into another genus, a new author name is catenated into Table 2. the author reference (unlike in zoology). For instance, Lin- The taxonomic concepts are mapped using the rela- naeus first described the species Bassia scoparia in the tions described in Table 3. An additional relation isAs- genus Chenopodium and later A.J. Scot shifted it into the sociatedWithTaxon is provided for linking concepts in genus Bassia. The order of multiple authors comes out in taxonomically unresolved cases. The relation describes the literal, i.e., (L.) A.J. Scot. an undetermined connection between taxa, which is use- A binominal name combination of a species with a refer- ful if deeper expertise is not available when mapping the ence to the original author (e.g., Arhopalus ferus (Mulsant, concepts. 1839)) is formed by traversing the RDF graph where a The taxa can be mapped to an external source as shown genusnameisobtainedusingtheisPartOfHigherTaxon below, where the genera Arhopalus are mapped congru- relation and the other parts of the name from the lit- ently between two checklists. erals. The literal completeTaxonName is for facilitating the usage of the model for humans, and is generated from a genus name, a species name, and an author ref- @prefix cerambycids: . erence. Dublin Core attributes [53] are supported (e.g., @prefix taxmeon: . bibliographical details). Figure 4 presents an example cerambycids:p2090 taxmeon:congruentWithTaxonInt of the species Arhopalus ferus which was described by . Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 7 of 13 http://www.jbiomedsem.com/content/5/1/40

Figure 4 Core taxonomic information represented according to the taxonomic meta-ontology. Ferus is described by Mulsant in 1839 and it belongs to the genus Arhopalus.

Table 2 The core properties of the Taxonomic The URI of the scientific name and its concept (Tax- Meta-Ontology and their explanations onInChecklist) is duplicated when there is a taxonomic, Property Explanation nomenclatural, or hierarchical change. In this way, a par- Citation-related properties ticular taxon can be explicitly referred to at a particular occursInChecklist Reference to a species checklist time. The old and the new URIs are connected with the auctorumYear The year of original publication relations described in Table 3. Temporal management completeAuctorumString Author name(s) expressed according is based on the time stamps of scientific names’ taxo- to the established practices of taxonomy nomic status in dynamic checklists. In static checklists, the temporal order of the taxon instances is traced by the Name-related properties publication year of the checklist. hasNonvalidName Expresses synonyms, homonyms Two examples of concept mapping and taxonomic and orthographic variants of a valid changes are presented below. Each scientific name is given scientific name a new URI in each static checklist. Different URIs for the hasVernacularName The common name equivalents for the scientific names same scientific name enable the presentation of alterna- hasNomenclaturalCode Specifies the set of rules that are applied tive classifications and different sets of taxonomic details. (ICN [55] or ICZN [56]) The first example presents four cases presented in static hasVernacularNameStatus Expresses whether a common name is checklists: accepted or an alternative one rdf:type Expresses the hierarchical level in 1. Two species of long-horn beetles, pubescens a classification. The ranks are obtained Fabricius, 1787 and revestita Linnaeus, 1767 belong from TDWG Taxon Rank LSID Ontology [57]. to the genus Leptura Linnaeus, 1758 in the checklist Every taxon is an instance of a specific that was published in 1992 [58]. taxonomic rank and the class TaxonInChecklist (Figure 3). 2. Both species belong to the genus Pedostrangalia See other properties in the Results section, in subsection Taxonomic Sokolow, 1758 in the checklist published in Meta-Ontology. 2011 [59]. Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 8 of 13 http://www.jbiomedsem.com/content/5/1/40

Table 3 The relations used for mapping underlying taxonomic concepts Relation between taxa Intensive Ostensive Notation Properties Congruent Share the same characters Share the same species A = B Symmetric, transitive Part of All characters of a taxon are All species are included in A ⊂ B Non-symmetric, transitive included in another taxon another taxon Overlap At least one character is shared At least one species is shared A ∩ B =∅, A = B Symmetric, non-transitive between taxa, but not all of them between taxa, but not all of them The division into intensional and ostensive relations [35] is only available in TaxMeOn (not in the Taxonomic Database).

3. The species L. aethiops Poda, 1761 remained in the where NUMBER is a randomly generated unique identi- genus Leptura while two other species were shifted fier for the checklist data. For the authors and TaxMeOn, in 2011 [59]. the LOCAL_ID is human-readable. The number of RDF 4. Pedostrangalia was a synonym for Leptura in triples after the data conversion (TaxMeOn) is over 1,2 1992 [58]. million. The details are presented in Additional file 1. TaxMeOn is applied in a broader context as one of the The corresponding RDF representation is presented in use cases of the European research program, the “Envi- Additional file 3. ronmental Observation Web and its Service Applications The second example describes a fictitious dynamic within the Future Internet (ENVIROFI)” [63] which aims checklist with three artificial taxa. The species bus and cus to harmonise biodiversity observation data gathered from belonged to the genus Aus in 2012. Later, these two species heterogeneous sources. were synonymised and bus remained a valid name while cus became its synonym. The URIs of the scientific names are duplicated in order to: 1) preserve the name combina- Discussion tions of the genus Aus (i.e., the lower-level classifications), Identifiers should not embed semantics according to the and 2) present a change in taxonomic concepts and in sta- recommendations of GBIF [28,29], a practical approach tus of the species bus and cus. The corresponding RDF to ensure the persistence of the identifiers should the representation is presented in Additional file 4. concepts change. In practise, it is helpful if URIs are intu- The checklists are managed using the scalable generic itively understandable to some degree when reading RDF. metadata editor SAHA [60], but more complex taxonomic Here, human-readable checklist identifiers are embedded information of the scientific names is managed using in the namespace of the URIs in the data, which is justified the ontology editor Protégé [61]. The species ontologies because the namespaces are permanent. The local names are accessible with several user interfaces and APIs via of the URIs, however, do not carry meaning. The identi- the Finnish Ontology Library Service ONKI [42,62]. The fiers of the classes and properties in ontology models and ONKI browser is used for searching and browsing taxa, schemas are typically human-readable, as is the case in finding currently valid names, and tracing the tempo- TaxMeOn. ral changes in scientific names. The ONKI service also The HTTP URIs used in the data and in TaxMeOn act as provides an autocompletion widget which can be inte- locators for relevant metadata, that follows the best prac- grated into user applications, e.g., a content management tices of Linked Data [31]. The metadata is presented as an system. ONKI provides HTTP and SOAP APIs for pro- HTML page to humans and in RDF format to machines grammatic access and a SPARQL endpoint for querying via content negotiation. theontologies.ThechecklistsinONKIarethesameasin the Taxonomic Database described earlier. Comparison of the two models The HTTP URIs were generated for the data The differences between the Taxonomic Database and the resources in the following form: http://www.yso.fi/onto/ Taxonomic Meta-Ontology are summarised in Table 4. CHECKLIST_ID/LOCAL_ID where CHECKLIST_ID is The Taxonomic Database is a relational database, and a human-readable identifier for a checklist (or a group therefore its structure is strictly specified in a database of checklists, if there is more than one checklist about schema. The advantage of RDF-based TaxMeOn is that the same group) and LOCAL_ID is a local identifier it can easily be extended by adding new classes and for a resource (e.g., scientific name, taxonomic status). properties. Global identifiers (URIs) are given to taxa in Similarly, the URIs of the authors have namespace, with TaxMeOn which allows publishing them as Linked Data the CHECKLIST_ID replaced with the string “author”. and linking and re-using heterogeneous data on the web. The URIs of TaxMeOn are constructed in the same TaxMeOn can also be utilised via standard SPARQL query way, but the CHECKLIST_ID is replaced with the string language and additional APIs. In contrast to the RDF “taxmeon”. LOCAL_ID is in the form “p[NUMBER]”, model, linking other datasets to the Taxonomic Database Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 9 of 13 http://www.jbiomedsem.com/content/5/1/40

Table 4 A comparison of the features of the taxonomic Also in TaxMeOn, the versioning of a static check- database and the taxonomic meta-ontology list is the solution for managing names over time given Taxonomic TaxMeOn its simplicity in comparison to modelling the changes database (Figure 5). Consequently, a large number of URIs are Technology created, which is impracticable for a maintainer if spe- Structure easily No Yes cial tools are not developed. Updating taxonomic changes extensible in a dynamic checklist requires the duplication of the Global linkability No Yes URIs at the species and genus level so that the situation to other contents before and after can be presented and interlinked. This Public interfaces Simple search API, HTTP and SOAP APIs, step is especially necessary if there is a change in a tax- LSID resolver Linked Data, onomic concept. The whole upper classification is not SPARQL endpoint duplicated because that would generate a large number Need of a resolver Yes No of URIs, and here we are more interested in names than Content editing Web interface SAHA [60], Protégé [61] classifications. A machine does not understand that there was a taxo- Content nomic change if the change is not modelled. Two alterna- Granularity of Low High tive ways of describing the changes are demonstrated in taxonomic information Figure 5. One approach is to present a change in a tax- Linking additional No Yes onomy, classification, or nomenclature by forming a class scientific publications that describes the change type (Figure 5A). The situa- Treatment of botanical Identical Not identical tion is described before and after the change, and the two and zoological names instances are connected with relevant relations (Table 3). Semantics applied to No Yes In this way, it is possible to refer to a taxonomic concept at author names a particular time. An alternative approach is to represent Tracking temporal Publication year Versioning of checklists the relations as instances (Figure 5B). The relations are changes of a checklist (static) and duplication ordered temporally by assigning them a time stamp. If the of taxa (dynamic) URIs assigned to the concepts are not duplicated, then it is not possible to refer to a taxonomic concept at a partic- ular time. This might be practical in some cases, because or re-using its data is not straightforward because the anewURIisassignedonlytogenuinelynewinforma- data can only be accessed with a separate LSID resolver tion (new hierarchical relations). The former alternative and a simple search API. The datasets of TaxMeOn can is included in TaxMeOn, and the latter can be used if the be edited with standard RDF tools, such as ontology edi- model is extended with an additional class that describes tors, whereas the Taxonomic Database is managed with the relations between taxa. its own web interface. TaxMeOn supports more detailed taxonomic information than the Taxonomic Database, for Mapping taxonomic concepts example nomenclatural treatments. It also allows link- A species checklist is an understandable way of presenting ing taxa to additional scientific publications and applying information to non-taxonomists, but unfortunately only a semantics to authors instead of presenting them as simple small proportion of species are catalogued, and they cover strings. Moreover, TaxMeOn provides versatile methods only limited geographical areas. Moreover, the informa- for managing dynamic checklists by representing tempo- tion is often insufficient because name combinations are ral changes of taxonomic concepts. not necessarily listed. Cross-linking taxon names between checklists helps a user to piece together the changes in Managing changes in time scientific names and determine the approximate number In the Taxonomic Database, the goal was to create connec- of taxonomic treatments (none vs. many). Linking higher tions between the scientific names of published checklists taxa between checklists is rather artificial because the tax- where the timeline is evident due to the year of publica- onomic concepts are seldom referenced. The problem is tion. Less emphasis was placed on dynamic lists. However, therefore how to reconcile the differing classifications of evincing temporality is achieved by tracking changes in regional checklists. A pragmatic option is to compare the dynamic lists, an activity that requires: 1) keeping a log of species included in a higher taxon. However, this approach taxa removals and additions, 2) creating a new version of a fails to distinguish taxonomy and regionality, leading to a checklist when taxa are removed or added, and 3) linking situation where the occurrence of a new species in a cer- older LSIDs to the new ones. An original link to a genus tain area changes the existing relations between the higher should be kept if a species is shifted into another genus. taxa of checklists. Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 10 of 13 http://www.jbiomedsem.com/content/5/1/40

Figure 5 Taxonomic changes in relation to time presented in RDF. (A) The change is modelled as an instance. (B) The relation is modelled as an instance. The instances are depicted as lighter-coloured ellipses and literals as rectangles.

The challenges of concept mapping have been discussed the information on circumscriptions. We decided to use by many researchers [2,35,36,64], and it is suggested that ostensive relations as our default when mapping the con- it should be stated whether comparisons are based on cepts at the species level because the nature of the check- being a member of a group or on characters that unite list can be interpreted as ostensive because they present a the group [35]. In the Taxonomic Database, higher taxa classification. However, intensional relations can be set if are not only aligned on the basis of underlying taxo- there is information about the underlying taxonomic con- nomic concepts, but the occurrence of species are also cepts. The comparison of higher taxa (above the species taken into account due to the lack of information about level) is always based on the species (see the discussion taxonomic concepts in checklists. Higher taxa of the Tax- of the Taxonomic Database above). The use of osten- onomic Database are not mapped with the CoL’s taxa sive relations (Table 3) differs slightly from Franz and identifiers because only the part-of relation could be used Peet’s [35], which is explained by the difference of the data (because a regional species list is always part of a world- (phylogenies vs. species checklists). wide list). Instead, external identifiers (CoL) are treated Linking the taxonomic concepts automatically is a as additional information about the taxonomic concepts. quick way of handling datasets. Automatic mapping Despite the discrepancy between taxonomy and region- immediately links new content to existing without time- ality, a non-taxonomist is more likely to be interested in consuming work by experts that could be done later. A the species inhabiting a certain geographical area than in general taxon class (TaxonGeneral)representsataxonata those found in the entire world. On the other hand, a high level of abstraction, and an instance of it is generated maintainer decides how the model is applied. The taxa in for all taxa. If the taxa share the same name and author- both models are mapped equivalently, but TaxMeOn sup- ship, then they will be automatically mapped to the same ports more than one way of expressing a relation between instance of the class TaxonGeneral. The idea is to keep taxa (Table 3), which benefits users with differing needs the machine-generated mappings separate from the man- and levels of expertise. ual ones. The advantage is that if the mappings are used Franz and Peet [35] present how phylogenetic relation- in information retrieval, then search results can be clas- ships are described using ostensive (i.e., based on being sified according to reliability. Mistakes generated in auto- a member of a group) and intensional (i.e., based on mated work are inevitable, but most links are likely to be characters) relations simultaneously, which increases the correct due to the non-specific nature of the class Taxon- semantic precision of the relations between the concepts. General. Different levels of abstraction increase a model’s In species checklists, there is no satisfactory solution to flexibility. For instance, the International Federation of defining relations at the species level. If ostensive rela- Library Associations and Institutions’ (IFLA) [65] Func- tions are used, there is an assumption that the species tional Requirements for Bibliographic Records (FRBR) have subspecies; however, most species do not have entity-relationship model [66], which is used in online any subspecies. Applying intensional relations assumes library catalogues, represents the products of intellectual that the circumscriptions are known; species lists lack or artistic endeavour at four levels of abstraction. Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 11 of 13 http://www.jbiomedsem.com/content/5/1/40

Challenges advantageous when reusing and integrating data as well Detailed information is considered more reliable and as deepening the level of biological information. Linked therefore more likely to be linked to other content than Data efficiently prevents the formation of silos, where vague information. However, most taxonomic informa- distributed information is not interlinkable. The advan- tion in checklists is inaccurate in one way or another. tages of using a Semantic Web approach are presented in Therefore the data model should support the expression Table 4. of information at various levels of detail, resulting in the Linked Data increases the utility of data gathered from complexity of an ontology model. For instance, a taxo- multiple sources, as their reliability is easier to evaluate. nomic author citation can include a set of bibliographical For example, the existence of multiple classifications usu- details or it simply can be an abbreviation of a name. ally indicates that a taxonomic group is complex and many Our aim was to create a practical model that suits diverse opinions of it exist. Traditional databases are not compat- situations, but there is a clear trade-off between prac- ible as such with Linked Data, and they tend to be used ticality and complexity. Combining the scientific name internally by organisations rather than shared. The struc- and its concept into a single unit in TaxMeOn increases ture of a relational database has to be strictly specified in simplicity but decreases the granularity of information. advance because it cannot be easily changed later, unlike The biggest obstacle in using Semantic Web technolo- Linked Data, which is more extensible. gies is the lack of suitable tools. Few ontology editors Thenextchallengeistodevelopamodelthataddresses are available. The most commonly used editor is Protégé, both zoological and botanical nomenclatures that are which is not practicable in this case because taxonomic independent of one another and separated by distinct classifications cannot be viewed hierarchically unless the features. We aim to develop an ontology model that cov- rdfs:subClassOf relation is used. ers both nomenclatures without losing the practicality. In the real world, scientists who study the evolu- Applying Semantic Web technologies is a promising step tionary relationships of organisms are often unaware of in enhancing the linkability of biological contents and the advance of biodiversity informatics, or they simply distributing environmentally important information. ignore it because they evaluate the usefulness of available resources on the basis of content. Misleading or insuf- Availability of supporting data ficient information in databases that is copied from one The datasets are accessible in the Taxonomic Database, place to another does not encourage scientists to con- http://taxon.luomus.fi, and in the ONKI Ontology Ser- tribute or follow best practices. Taxonomists cannot be vice, http://onki.fi. The ontology schema of the TaxMeOn expected to follow what happens in biodiversity informat- model is available at: http://schema.onki.fi/taxmeon/. ics because it might not be their field of interest. However, it would be very helpful if they were willing to report mistakes in content, though it would be frustrating if Additional files the corrections were not made. One can debate whether harmonising names is realistic due to the fact that sci- entific names are constantly changing. However, applying Additional file 1: Datasets included in the study. Additional file 2: Core taxonomic information of a checklist semantics to the content better enables the presenta- expressed in RDF. tion of parallel views reflecting the nature of research. Additional file 3: Alternative classifications in static checklists Databases and ontologies might not be useful for tax- expressed in RDF. onomists because they rely on scientific publications, and Additional file 4: A synonymisation of taxa in a dynamic checklist are familiar with their own subject. Regardless, their input expressed in RDF. is fundamentally important for non-taxonomists, because the need exists for reliable taxonomic information. In Abbreviations LSID: Life science identifier; HTTP URI: Hypertext transfer protocol uniform general, maintaining and updating ontologies is complex resource identifier; TaxMeOn: Taxonomic meta-ontology; EoL: Encyclopedia of compared to databases. The work is worthwhile, though, life; CoL: Catalogue of life; GBIF: Global biodiversity information facilities; TCS: because it facilitates interoperability and the semantically Taxonomic concept transfer schema; XML: Extensible markup language; DwC: Darwin core; TDWG: Biodiversity information standards; DwC-A: Darwin core enriched processing of content, and brings expert knowl- archive; CSV: Comma-separated values; NCBI: National center for edge into wider use in the environmental and biological biotechnology information; URN: Uniform resource name; uBio: Universal sciences. biological indexer and organisator; RDF: Resource description framework; DNS: Domain name system; DNS SRV record: DNS service record; IANA: Assigned numbers authority; API: Application programming interface; HTML: Hypertext Conclusions markup language; SPARQL: SPARQL protocol and RDF query language; NCBO Semantic Web technologies provide a suitable way to BioPortal: National center for biomedical ontology BioPortal; OBO ontology language: Open biomedical ontologies ontology language; RDFS: RDF describe species checklists because they enable the schema; OWL: Web ontology language; ICN: International code of compatibility with Linked Data. This compatibility is nomenclature for algae, fungi, and plants; ICZN: International code of Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 12 of 13 http://www.jbiomedsem.com/content/5/1/40

zoological nomenclature; SOAP: Simple object access protocol; ENVIROFI: The 21. Berendsohn WG: The concept of “potential taxa” in databases. Taxon environmental observation web and its service applications within the future 1995, 44:207–212. internet; IFLA: International federation of library associations and institutions; 22. Berendsohn WG: A taxonomic information model for botanical FRBR: Functional requirements for bibliographic records. databases the IOPI Model. Taxon 1997, 46:283–309. 23. Object Management Group (OMG): Life sciences identifiers final Competing interests adopted specification. 2004. [http://www.omg.org/cgi-bin/doc?dtc/ The authors declare that they have no competing interests. 04-05-01] 24. Page RDM: Taxonomic names, metadata, and the semantic web. Authors’ contributions Biodiversity Inform 2006, 3:1–15. NL was responsible for the content of the Taxonomic Database and designed 25. World register of marine species. [http://www.marinespecies.org] the Taxonomic Meta-Ontology with JT. HS designed the structure of the 26. uBio. [http://www.ubio.org/] Taxonomic Database of the Finnish Museum of Natural History. EH conceived 27. TDWG Globally Unique Identifiers Task Group (GUID): TDWG life the study and participated in its design and coordination and helped draft the science identifiers (LSID) applicability statement. 2007. manuscript. All authors read and approved the final manuscript. [http://www.tdwg.org/fileadmin/subgroups/guid/ LSID_Applicability_Statement_draft.pdf] Acknowledgements The development of the Taxonomic Database was funded by the Nordic 28. Cryer P, Hyam R, Miller C, Nicolson N, Tuama EO, Page R, Rees J, Riccardi G, Council. The Taxonomic Meta-Ontology has been developed as part of the Richards K, White R: Adoption of persistent identifiers for biodiversity FinnONTO project, which built a national Semantic Web infrastructure in informatics: Recommendations of the GBIF LSID GUID task group, 6. Finland. Nationally important ontologies are made available as Linked Data in November 2009. Tech. rep., Global Biodiversity Information Facility the FinnONTO project. We thank Arto Mertaniemi, Jyrki Muona and Hans (GBIF), Copenhagen, Denmark, 2010. [Version 1.1, last updated 21 Jan Silfverberg for collaboration. The two anonymous reviewers are 2010] acknowledged for their comments. 29. Richards K, White R, Nicolson N, Pyle R: A beginner’s guide to persistent identifiers. Tech. rep., Global Biodiversity Information Facility (GBIF), Author details Copenhagen, Denmark 2011. [Version 1.0, released on 9 February 2011] 1Semantic Computing Research Group (SeCo), Department of Media 30. Internet Assigned Numbers Authority (IANA): Uniform resource Technology, Aalto University, P.O. Box 15500, 00076 Aalto, Espoo, Finland. identifier (URI) schemes. [http://www.iana.org/assignments/uri- 2Digitarium, University of Eastern Finland, P.O. Box 111, 80101 Joensuu, Finland. schemes/uri-schemes.xhtml] 31. Heath T, Bizer C: Linked Data: Evolving the Web into a Global Data Space. Received: 14 April 2013 Accepted: 26 August 2014 Palo Alto, California: Morgan & Claypool; 2011. [Synthesis Lectures on the Published: 8 September 2014 Semantic Web: Theory and Technology] 32. Object management group (OMG). [http://www.omg.org] References 33. Internet engineering task force (IETF). [http://www.ietf.org] 1. Patterson DJ, Cooper J, Kirk PM, Pyle RL, Remsen DP: Names are key to 34. Schulz S, Stenzhorn H, Boeker M: The ontology of biological taxa. the big new biology. Trends Ecol Evol 2010, 25(12):686–691. Bioinformatics 2008, 24(13):i313–i321. 2. Jones AC, White RJ, Orme ER: Identifying and relating biological 35. Franz NM, Peet RK: Towards a language for mapping relationships concepts in the catalogue of life. J Biomed Semantics 2011, 2:7. among taxonomic concepts. Syst Biodivers 2009, 7:5–20. 3. Parr CS, Guralnick R, Cellinese N, Page RDM: Evolutionary informatics 36. Franz NM, Thau D: Biological taxonomy and ontology development: unifying knowledge about the diversity of life. Trends Ecol Evol 2012, scope and limitations. Biodivers Inform 2010, 7:45–66. 27(2):94–103. 37. NCBO BioPortal. [http://bioportal.bioontology.org] 4. Segers H, de Smet WH, Fischer C, Fontaneto D, Michaloudi E, Wallace RL, 38. Amphibian taxonomy (ATO). [http://purl.bioontology.org/ontology/ Jersabek CD: Towards a list of available names in Zoology, partim ATO] Phylum Rotifera. Zootaxa 2012, 3179:61–68. 39. Fly taxonomy (FBsp). [http://purl.bioontology.org/ontology/FB-SP] 5. Federhen S: The NCBI taxonomy database. Nucleic Acids Res 2012, 40. Teleost taxonomy (TTO). [http://purl.bioontology.org/ontology/TTO] 40(Database issue):D136–D143. 41. NCBI organismal classification (NCBITaxon). [http://purl.bioontology. 6. Sarkar IN: Biodiversity informatics: organizing and linking org/ontology/NCBITaxon] information across the spectrum of life. Brief Bioinform 2007, 42. Viljanen K, Tuominen J, Hyvönen E: Ontology libraries for production 8(5):347–357. use: the finnish ontology library service ONKI. In Proceedings of the 6th 7. Fauna Europaea. [http://www.faunaeur.org] European Semantic Web Conference (ESWC): May 31–June 4 2009, Heraklion, 8. Atlas of living Australia. [http://www.ala.org.au] Greece. Edited by Aroyo L, Traverso P, Ciravegna F, Cimiano P, Heath T, 9. Encyclopedia of life. [http://eol.org] Hyvönen E, Mizoguchi R, Oren E, Sabou M, Simperl E. Berlin Heidelberg: 10. Catalogue of life. [http://www.catalogueoflife.org] Springer–Verlag; 2009:781–795. 11. ZooBank. [http://iczn.org/content/about-zoobank] 43. OBO flat file format 1.4 syntax and semantics [WORKING DRAFT]. 12. Global Biodiversity information facility (GBIF). [http://www.gbif.org] 2011. [http://purl.obolibrary.org/obo/oboformat/spec.html]. [Mungall C, 13. Checklist bank. [https://github.com/gbif/checklistbank] Ruttenberg A, Horrocks I, Osumi-Sutherland D (editors)]. 14. Taxonomic Names and Concepts Interest Group: Taxonomic concept 44. TaxonConcept.org. [http://www.taxonconcept.org] transfer schema. 2005. [http://www.tdwg.org/standards/117] 45. Tuominen J, Laurenne N, Hyvönen E: Biological names and taxonomies 15. Darwin Core Task Group: Darwin core. 2009. [http://rs.tdwg.org/dwc] on the semantic web – managing the change in scientific 16. Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, conception. In Proceedings of the 8th Extended Semantic Web Conference Robertson T, Vieglais D: Darwin core: an evolving community- (ESWC): May 29–June 2 2011; Heraklion, Greece. Edited by Antonio G, developed biodiversity data standard. PLoS ONE 2012, Grobelnik M, Simperl E, Parsia BD, Plexousakis D, Leenheer P, de Pan JZ. 7:e29715. Berlin Heidelberg: Springer–Verlag; 2011:255–269. 17. Biodiversity information standards (TDWG). [http://www.tdwg.org] 46. Tuominen J, Laurenne N: Taxonomic meta-ontology TaxMeOn 18. Remsen D, Döring M, Robertson T: GBIF GNA profile reference guide specification. 2013. [http://schema.onki.fi/taxmeon] for darwin core archive, core terms and extensions. Tech. rep., Global 47. Kirsten T, Gross A, Hartung M, Rahm E: GOMMA: a component-based Biodiversity Information Facility (GBIF), Copenhagen, Denmark, 2011. infrastructure for managing and analyzing life science ontologies [Version 1.2, released on 1 April 2011] and their evolution. J Biomed Semantics 2011, 2:6. 19. NCBI national center for biotechnology information. 48. Maynard D, Peters W, d’ Aquin M, Sabou M: Change Management for [http://www.ncbi.nlm.nih.gov] Metadata Evolution. In Proceedings of the International Workshop on 20. Johnson NF, Musetti L: Genera of the parasitoid wasp family Ontology Dynamics (IWOD), the 4th European Semantic Web Conference monomachidae (hymenoptera: diaprioidea). Zootaxa 2012, (ESWC): 7 June 2007, Innsbruck, Austria. Edited by Flouris G, d’ Aquin M; 3188:31–41. 2007:27–40. Laurenne et al. Journal of Biomedical Semantics 2014, 5:40 Page 13 of 13 http://www.jbiomedsem.com/content/5/1/40

49. Euzenat J, Shvaiko P: Ontology Matching. Berlin Heidelberg: Springer–Verlag; 2007. 50. Khattak AM, Latif K, Lee S: Change management in evolving web ontologies. J Knowledge-Based Syst 2013, 37:1–18. 51. Wang S, Schlobach S, Klein M: Concept drift and how to identify it. J Web Semantics 2011, 9(3):247–265. 52. Taxonomic database. [http://taxon.luomus.fi] 53. DCMI Usage Board: DCMI metadata terms. 2012. [http://www. dublincore.org/documents/dcmi-terms/] 54. Beckett D, Berners-Lee T: Turtle – Terse RDF triple language. 2011. [http://www.w3.org/TeamSubmission/turtle/] 55. McNeill J, Barrie FR, Buck WR, Demoulin V, Greuter W, Hawksworth DL, Herendeen PS, Knapp S, Marhold K, Prado J, Prud’homme van Reine WF, Smith GF, Wiersema JH, Turland N: International Code of Nomenclature for algae, fungi, and plants (Melbourne Code), adopted by the Eighteenth International Botanical Congress Melbourne, Australia, July 2011. Königstein: Koeltz Scientific Books; 2012. [Regnum Vegetabile]. 56. International Comission on zoological nomenclature (ICZN). [http://iczn.org] 57. TDWG Biodiversity Information Standards: TDWG taxon rank LSID ontology. 2007. [http://rs.tdwg.org/ontology/voc/TaxonRank] 58. Silferberg H: Enumeratio Coleopterorum Fennoscandiae, Daniae at Baltiae. Helsinki, Finland: Helsingin Hyönteisvaihtoyhdistys; 1992. 59. Silfverberg H: Enumeratio renovata Coleopterorum Fennoscandiae, Daniae et Baltiae. Sahlbergia 2011, 16(2):1–144. 60. Kurki J, Hyvönen E: Collaborative Metadata editor integrated with ontology services and faceted portals. In Proceedings of the 1st Workshop on Ontology Repositories and Editors for the Semantic Web (ORES) the 7th Extended Semantic Web Conference (ESWC): 31 May 2010, Heraklion, Greece. Edited by d’Aquin M, García Castro A, Lange C, Viljanen K: CEUR Workshop Proceedings; 2010:7–11. 61. Protégé ontology editor. [http://protege.stanford.edu] 62. Finnish ontology library service ONKI. [http://onki.fi] 63. The environmental observation web and its service applications within the future internet (ENVIROFI). [http://www.envirofi.eu] 64. Kennedy J, Kukla R, Paterson T: Scientific names are ambiguous as identifiers for biological taxa: their context and definition are required for accurate data integration. In Proceedings of the 2nd International Conference on Data Integration in the Life Sciences (DILS) 20–22 July 2005; San Diego, California. Edited by Ludäscher B, Raschid L. Berlin Heidelberg: Springer–Verlag; 2005:80–95. 65. International federation of library associations and institutions (IFLA). [http://www.ifla.org/] 66. IFLA Study Group on the Functional Requirements for Bibliographic Records: Functional requirements for bibliographic records : final report. München, Germany: K.G. Saur; 1998. [UBCIM publications; new series, vol 19].

doi:10.1186/2041-1480-5-40 Cite this article as: Laurenne et al.: Making species checklists understandable to machines – a shift from relational databases to ontologies. Journal of Biomedical Semantics 2014 5:40.

Submit your next manuscript to BioMed Central and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit

Publication VIII

Jouni Tuominen, Nina Laurenne and Eero Hyvönen. Publishing and Us- ing Plant Names as an Ontology Service. In Proceedings of the first inter- national Workshop on Semantics for Biodiversity at ESWC 2013, Montpel- lier, France, May 27, Pierre Larmande, Elizabeth Arnaud, Isabelle Mougenot, Clement Jonquet, Therese Libourel, Manuel Ruiz (editors), CEUR Workshop Proceedings, volume 979, ISSN 1613-0073, online CEUR-WS.org/Vol-979/WS_s4biodiv2013_paper_2.pdf, May 2013.

c 2013 Tuominen et al.

Reprinted with permission.

183

Publishing and Using Plant Names as an Ontology Service

Jouni Tuominen, Nina Laurenne, and Eero Hyv¨onen

Semantic Computing Research Group (SeCo) Aalto University School of Science, Dept. of Media Technology, and University of Helsinki, Dept. of Computer Science http://www.seco.tkk.fi, [email protected]

Abstract. Animals and plants are referred to using scientific or com- mon names depending on the expertise of an audience or a source of data. The names change in time and therefore their usage as identifiers as such is problematic. We present a solution for managing and using plant names as an ontology. The ontology is based on the TaxMeOn meta-ontology for biological names. In order to refer to organisms un- ambiguously and publish information as Linked Data on the web, the names are given URIs. The ontology is developed collaboratively and it supports the approval process and temporal tracking of the common names. We introduce an ontology service of plant names for end-users and provide user interfaces and APIs for integrating the ontology into applications.

1 Introduction

The scientific names of plants and animals have a major role when indexing, querying, and integrating information about species. Biologists use scientific names although the vast majority of people use the common name equivalents. Contrary to common belief, neither the scientific nor common names identify organisms unambiguously as one name may point to multiple species and one species may have multiple names. New research results change the name combination of the scientific names be- cause taxa are constantly split and lumped. For example, if a species is changed into another genus, the name combination changes accordingly. Approximately 25,000 new species descriptions are published in thousands of journals annu- ally [6] which makes it hard for researchers to keep up-to-date the biodiversity of the nature. Not all organisms need a common name but still there is huge work to be done in developing the vernacular nomenclature and in terms of es- tablished names, the dialect expressions remarkably expand the spectrum of the biological names. The international commissions of the nomenclatures (IBC, ICZN) specify the rules how the scientific names should be used in various taxonomic treatments. The nomenclatures of plants and animals are independent of each other and the rules are applied only to the scientific names. The common names are not regulated but they also change in time because there is often a need to update the common names at intervals. The changing nature of the names poses challenges for their management [5, 10, 13]. The diversity of the names causes problems when combining data from het- erogeneous sources, e.g., observational records, literature and museum collec- tions [11, 9]. The data cannot be easily integrated if a taxon is referred to using multiple names and vice versa the existence of homonyms (the same name refers to multiple taxa) causes errors when merging the data. Comprehensive reference lists and catalogues of the names have been pro- posed as a solution to facilitate the access to the names [1, 10]. However, this is not enough because the biological names ought to be machine-processable in order to refer to them unambiguously and semantically enrich the biological con- tents. Ontologies remarkably increase the re-use and utilization of the available resources which minimizes the amount of manual work when harmonizing data. We present an ontology model for managing the common names of organisms and linking them to the scientific names. The model supports temporal tracking of name changes and an approval process of the common names. The model is used for maintaining and publishing plant names in Finnish as an ontology. The ontology is published as Linked Open Data [3] and can be used as an ontology service.

2 Ontology Model

TaxMeOn1 [14] is an RDF-based meta-ontology for modeling and managing bi- ological names and classifications. TaxMeOn introduces classes and properties for expressing biological names as ontologies. The model consists of three parts according to the level of taxonomic details, which are common names, species checklists, and detailed taxonomic information respectively. In this paper, the focus is on the common names although many of the classes and properties are common to all three parts. The simplified structure of the model is presented in Fig. 1, where the core classes are Scientific name, Common name and their statuses. The status of the Scientific name indicates if a name is an accepted or a synonymized one, etc. The synonyms are linked to an accepted name. The hier- archical structure is constructed setting relations between the Scientific names. The Common names (in one or more languages) that refer to the same taxon are connected through a Scientific name. The model also allows mapping the scientific names to each other based on the underlying taxonomic concepts (con- gruence, overlap, part-of, general association). A taxonomic rank expresses the hierarchical level in a classification (e.g., a species, a genus) and it is specified for every scientific name. The taxonomic ranks are presented as a separate vo- cabulary which contains 61 ranks, of which 60 are obtained from TDWG Taxon Rank LSID Ontology2. In order to avoid the complex details of the botanical and

1 http://schema.onki.fi/taxmeon/ 2 http://rs.tdwg.org/ontology/voc/TaxonRank Fig. 1. The ontology model of the common names of organisms. The ellipses represent classes and the arrows depict relations between the classes. horticultural nomenclatures, the species level and the taxonomic levels below it are treated as one unit. The approval process of the common names is the following: first, a new name is proposed; then the name becomes accepted; and finally, the name may become an alternative, if there is a new accepted name for the same plant. The model allows the maintainers to propose a new name which then can be com- mented by the other maintainers until the name finally gets accepted, rejected or synonymized. The temporal management of the names is based on time stamps which are given to the statuses of the names in the approval process. If a name is given a new status, the old status is not removed from the system. This makes it possible to track the chain of changes of the names and to see the period of time period when a particular name was accepted.

3 Managing Plant Names as an Ontology

We applied the TaxMeOn ontology model to a database of the Finnish names of plants maintained by the Finnish Biology Society Vanamo3. The original database contained nearly 26,000 plant names in Finnish in a single classification. The taxa were divided into three taxonomic levels (a species, a genus and a family) but it is possible to specify more taxonomic levels in the current ontology. The database of the plant names was converted into RDF format based on the TaxMeOn ontology model. The ontology is managed in the metadata editor SAHA4 [7] by the Vanamo association. Currently, the ontology contains 21,797 species, and the number of updates exceeds one thousand names yearly. The

3 http://www.vanamo.fi 4 http://www.seco.tkk.fi/services/saha/ utilization of the ontology facilitates the management of the names because the approval process is integrated into the ontology. The association has an active role in developing new Finnish names for plants and the public availability of the ontology releases voluntary based work for more relevant activities than responding to various queries by journalists, translators etc. The development of the new names is based on the needs, therefore the coverage of the taxa is not systematically or geographically restricted into any particular plant group or a region. The browser-based SAHA editor allows collaborative editing of the ontology, providing the simultaneous access of multiple users and a chat functionality. The TaxMeOn model has been extended to support the management of the ontology in SAHA, by adding a property indicating the current status of the processing of a proposed common name. If a new name is suggested for a species, a maintainer can add it into the ontology and mark it as “in process”. The proposed but not yet processed names can be found easily at later stages of the process.

4 Using Plant Names as an Ontology Service

The ontology is published as Linked Open Data in the Finnish Ontology Library Service ONKI5 [15], as part of the Finnish semantic web infrastructure project FinnONTO6 [4]. The ONKI service provides user interfaces and APIs for ac- cessing and using the plant names in applications. For example, end-users can browse and search the ontology to find a common name for a taxon that they know only by the scientific name. The ONKI selector widget can be integrated into legacy CMS systems to provide an autocomplete and URI fetching features to support the annotation of plant related information. One of the advantages of the ontology service is that the end-users can now access the ontology themselves. Users are directed to the ONKI service via search engines, and they have adopted the service by extending Wikipedia articles about plant species with links to Finnish plant names in ONKI. End-users actively send feedback, comments and corrections to the maintainers, which help them to improve the quality of the content. The ontology is also accessible as a SPARQL endpoint. An example query below shows how the accepted Finnish common names of species (and taxa below it) that belong to a genus “Quercus” (oak) can be retrieved:

PREFIX rdf: PREFIX rdfs: PREFIX xsd: PREFIX taxmeon: PREFIX taxonomic-ranks:

SELECT ?vernacularName WHERE { ?species taxmeon:isPartOfHigherTaxon ?genus . ?genus rdf:type taxonomic-ranks:Genus . ?genus rdfs:label "Quercus"^^xsd:string .

5 http://onki.fi/en/browser/overview/kassu 6 http://www.seco.tkk.fi/projects/finnonto/ ?species taxmeon:hasVernacularName ?vernacularNameRes . ?vernacularNameRes taxmeon:hasVernacularNameStatus ?status . ?status rdf:type taxmeon:AcceptedVernacularName . ?vernacularNameRes rdfs:label ?vernacularName . FILTER langMatches(lang(?vernacularName), "fi") }

The result of the query is a list of the Finnish names of oak species, such as the sessile oak and white oak. The query demonstrates the use of the ontology for cross-language query expansion. Currently, the plant name ontology is used by several cultural museums and libraries for annotating their collections. The ontology is also applied as a use case in the EU FP project ENVIROFI7 which focuses on the environmental usage area of the Future Internet. The ontology is used in the project as a conceptual hub for referring to the plants in the observational data on biodiversity. The ontology has been extended with the English and German names of plants used in the project pilots (these names are not available in the ONKI ontology service).

5 Discussion

5.1 Related Work

The importance of persistent identifiers for organism names and solutions for managing them on the semantic web have been discussed by several workers. Page [8] presented how taxon names are modeled as semantic metadata in RDF form. Taxon names are identified with using Life Science Identifiers (LSID) and the names are connected using taxonomic relations. Taxon names that are ob- tained from various data sources and which refer to the same taxon are mapped using the owl:sameAs relation. Schulz et al. [12] presented the first ontology model of biological taxa and its application to physical individuals. The model is based on a single unchangeable classification. Franz and Thau [2] evaluated the limitations of applying ontologies to the scientific names and concluded that ontologies should focus either on a nomenclatural point of view or on strategies for aligning multiple taxonomies. The Darwin Core (DwC)8 is a metadata schema developed for taxon oc- currence data by the TDWG (Biodiversity Information Standards). The goal of DwC is to standardize the form of how biological information is presented. However, it lacks the semantic aspect and when it comes to the names, the scope of DwC is quite general. Taxonconcept.org9 provides Linked Open Data identifiers for species concepts and links data from different sources. All the names of species are expressed using literals. Also, the machine-processability is weakened by the usage of literal values for expressing the hierarchies. The data contains scientific and common names, and taxonomic statuses.

7 http://www.envirofi.eu 8 http://www.tdwg.org/standards/450/ 9 http://www.taxonconcept.org Many existing databases aim to be comprehensive online catalogues that aggregate individual species checklists, such as the Catalogue of Life (CoL)10 and The International Plant Names Index (IPNI)11. The IPNI database contains only scientific names, but the Catalogue of Life also includes their taxonomic statuses and common names. They both provide the names in a machine-processable form, as RDF conforming to the TDWG Taxonomic Concept Transfer Schema (TCS)12 using LSIDs as identifiers of the names [5]. In the Catalogue of Life the requirement to use a separate LSID resolver for fetching metadata about an LSID prevents the Linked Data compatibility of the dataset. The IPNI database provides an LSID proxy that allows Linked Data compatibility. In the IPNI database, the hierarchy is not expressed explicitly in the RDF (e.g., the genus of a species is shown only in the binomial name literal). There are several other plant name databases available on the web, e.g., the Royal Horticultural Society Horticultural Database13, The Plant List14 and the Euro+Med PlantBase15. Most available resources contain the scientific names, but in few, the common names are included. Common to these systems is that they are intended for human usage, and they are not available in a machine- processable form with unique name identifiers.

5.2 Contributions and Future Work

Most of the related work concentrate on the scientific names, but our focus is on the common names. The common names expand the cross-domain use of the on- tology because they are in wider spectrum of use than the scientific ones. The on- tology is available in machine-processable RDF format, with explicit semantics, e.g., the hierarchical relations are set between the plant URIs, and the statuses of names are supported. The TaxMeOn model provides a solution for managing the approval process of common names, supporting the temporal tracking of the name changes via statuses and their time stamps. The model connects together different names of a taxon facilitating data integration and information retrieval in cases where data is combined from heterogeneous sources. We have also demonstrated the complete workflow from a collaborative devel- opment of an ontology to publishing it as Linked Open Data and as an ontology service which makes it accessible to the general public. The plant name ontol- ogy helps harmonizing the terminology which in turn enhances communication between various users. Application developers can utilize the ontology by using the plant name URIs for unambiguous referencing to plants species. Currently, hybrid taxa are modeled in the ontology in the same way as the ordinary species. An idea for the future development is to extend the model to

10 http://www.catalogueoflife.org 11 http://www.ipni.org 12 http://www.tdwg.org/standards/117/ 13 http://apps.rhs.org.uk/horticulturaldatabase 14 http://www.theplantlist.org 15 http://www.emplantbase.org support the representation of hybrid names at a deeper level. Another area for development is to link the scientific names of plants to their author URIs in DBpedia, connecting the ontology to the Linked Data Cloud (LOD). Ontologies are a bridge between experts and ordinary people in communica- tion and popularizing science. Additionally, the Linked Data approach provides a way how to easily extend an ontology with additional information which in turn increases the information value of contents.

Acknowledgments This work is part of the National Semantic Web Ontol- ogy project in Finland FinnONTO (2003-2012), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 public organi- zations and companies, and the EU FP project The Environmental Observation Web and its Service Applications within the Future Internet (ENVIROFI). We thank Leo Junikka and Arto Kurtto for their collaboration.

References

1. Dengler, J., Berendsohn, W.G., Bergmeier, E., Chytr´y, M., Danihelka, J., Jansen, F., Kusber, W.H., Landucci, F., M¨uller, A., Panfili, E., Schamin´ee, J.H.J., Venan- zoni, R., von Raab-Straube, E.: The need for and the requirements of EuroSL, an electronic taxonomic reference list of all european plants. Biodiversity & Ecology 4, 15–24 (2012) 2. Franz, N., Thau, D.: Biological taxonomy and ontology development: scope and limitations. Biodiversity Informatics 7, 45–66 (2010) 3. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1–136, Morgan & Claypool (2011) 4. Hyv¨onen, E., Viljanen, K., Tuominen, J., Sepp¨al¨a,K.: Building a national semantic web ontology and ontology service infrastructure—the FinnONTO approach. In: Proceedings of the ESWC 2008, Tenerife, Spain. Springer–Verlag (2008) 5. Jones, A.C., White, R.J., Orme, E.R.: Identifying and relating biological concepts in the Catalogue of Life. Biomedical Semantics 2(7) (2011) 6. Knapp, S., Polaszek, A., Watson, M.: Spreading the word. Nature 446, 261–262 (2007) 7. Kurki, J., Hyv¨onen, E.: Collaborative metadata editor integrated with ontology ser- vices and faceted portals. In: Workshop on Ontology Repositories and Editors for the Semantic Web (ORES 2010), the Extended Semantic Web Conference ESWC 2010, Heraklion, Greece. CEUR Workshop Proceedings, http://ceur-ws.org (2010) 8. Page, R.: Taxonomic names, metadata, and the semantic web. Biodiversity Infor- matics 3, 1–15 (2006) 9. Page, R.D.M.: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Briefings in Bioinformatics 9(5), 345–354 (2008) 10. Patterson, D.J., Cooper, J., Kirk, P.M., Pyle, R.L., Remsen, D.P.: Names are key to the big new biology. Trends in Ecology & Evolution 25(12), 686–691 (2010) 11. Sarkar, I.N.: Biodiversity informatics: organizing and linking information across the spectrum of life. Briefings in Bioinformatics 8(5), 347–357 (2007) 12. Schulz, S., Stenzhorn, H., Boeker, M.: The ontology of biological taxa. Bioinfor- matics 24(13), 313–321 (2008) 13. Segers, H., de Smet, W.H., Fischer, C., Fontaneto, D., Michaloudi, E., Wallace, R.L., Jersabek, C.D.: Towards a list of available names in zoology, partim phylum rotifera. Zootaxa 3179, 61–68 (2012) 14. Tuominen, J., Laurenne, N., Hyv¨onen, E.: Biological names and taxonomies on the semantic web – managing the change in scientific conception. In: Proceedings of the ESWC 2011, Heraklion, Greece. pp. 255–269. Springer–Verlag (2011) 15. Viljanen, K., Tuominen, J., Hyv¨onen, E.: Ontology libraries for production use: The Finnish ontology library service ONKI. In: Proceedings of the ESWC 2009, Heraklion, Greece. pp. 781–795. Springer–Verlag (2009) eniS eumC o tnemtrapeD tnemtrapeD tnemtrapeD fo fo fo retupmoC retupmoC retupmoC ecneicS ecneicS ecneicS

a cs sess o a ngo egdelwonK egdelwonK egdelwonK inagro inagro inagro taz taz taz noi noi noi smetsys smetsys smetsys hcus hcus hcus sa sa sa o ds ea s gltoda ruaseht ruaseht i ruaseht i i dna dna dna igolotno igolotno igolotno se se se era era era desu desu desu rof rof rof aA aA l aA l l t t o t - o o - - DD DD DD

O ygolotn ygolotnO ygolotnO secivreS secivreS secivreS rof rof rof nitmon o y bdfi h n vorpmi vorpmi vorpmi gni gni gni eht eht eht ibadnfi ibadnfi l ibadnfi l i l i i yt yt yt fo fo fo tamrofni tamrofni tamrofni .noi .noi .noi 101 101 101 n so p ce tenc et e nma yehT yehT yehT inomrah inomrah inomrah ez ez ez eht eht eht tnetnoc tnetnoc tnetnoc rcsed rcsed rcsed tpi tpi tpi snoi snoi snoi dna dna dna / / /

ewe dan tiliaeoen d vorp vorp vorp edi edi edi ibareporetni ibareporetni l ibareporetni l i l i i yt yt yt ni ni ni dna dna dna neewteb neewteb neewteb 7102 7102 7102 K egdelwon egdelwonK egdelwonK noitazinagrO noitazinagrO noitazinagrO usm ahu mty nitamrofni tamrofni tamrofni noi noi noi smetsys smetsys , smetsys , , hcus hcus hcus sa sa sa muesum muesum muesum nuoJ i nuoJ imouT i nuoJ nen imouT i nen imouT nen srt nloda siabl ltgd ,snoitcelloc ,snoitcelloc ,snoitcelloc latigid latigid latigid ,seirarbil ,seirarbil ,seirarbil dna dna dna enilno enilno enilno .serots .serots .serots S smetsy smetsyS smetsyS fo esu eht etat i l icaf secivres ygolotnO ygolotnO ygolotnO secivres secivres secivres icaf icaf l icaf l i l i i etat etat etat eht eht eht esu esu esu fo fo fo i sess o a ngo egdelwonk egdelwonk egdelwonk inagro inagro inagro taz taz taz noi noi noi smetsys smetsys smetsys ni ni ni rof sdohtem stneserp siseht sihT .snoitacilppa .snoitacilppa .snoitacilppa sihT sihT sihT siseht siseht siseht stneserp stneserp stneserp sdohtem sdohtem sdohtem rof rof rof nto o sca yoon e e e soc soc t soc t t fe- fe- fe- cef cef t cef t i t i i ev ev ev ygolotno ygolotno ygolotno ssecca ssecca ssecca rof rof rof tnetnoc tnetnoc tnetnoc J inuo inuoJ inuoJ nenimouT nenimouT nenimouT n ,secas o arfi ,srexedni srexedni , srexedni , , tamrofni tamrofni tamrofni noi noi noi srehcraes srehcraes , srehcraes , , dna dna dna rplvdnitc lppa lppa lppa taci taci taci noi noi noi srepoleved srepoleved . srepoleved . .

ci lbup a sa dedivorp era sloot depoleved ehT ehT ehT depoleved depoleved depoleved sloot sloot sloot era era era dedivorp dedivorp dedivorp sa sa sa a a a lbup lbup lbup ci ci ci h ea cf et n ,cve a nv l l l gnivi gnivi gnivi bal bal bal ,ecivres ,ecivres ,ecivres dna dna dna yeht yeht yeht icaf icaf l icaf l i l i i etat etat etat eht eht eht nrffi of sioon f s sontlumis lumis lumis suoenat suoenat suoenat esu esu esu fo fo fo seigolotno seigolotno seigolotno morf morf morf id id f id f f tneref tneref tneref dna ,scimonoce ,cisum sa hcus ,sniamod ,sniamod ,sniamod hcus hcus hcus sa sa sa ,cisum ,cisum ,cisum ,scimonoce ,scimonoce ,scimonoce dna dna dna o dteep dhe .eu uirga rga rga luci luci luci erut erut . erut . . A A A dohtem dohtem dohtem i i s i s s detneserp detneserp detneserp rof rof rof oloib fo senh ht g ganam ganam ganam i i i gn gn gn t t t eh eh eh egnahc egnahc s egnahc s s o o o f f f b b b i i o i o o l l l go go go i i c i c a c a a l l l g o n mnx t t t imonoxa imonoxa e imonoxa e s e s s a a a s s s na na na no no no t t o t o o l l l ygo ygo ygo . . .

m S n a O egeloK rof secivre g nO t o nO t l o l ygo ygo eS r v eS r i v c i e s c e s f o r f o lwonK r gde lwonK e gde e rO nag rO i nag z i a t z a i t no i no yS s sme yS t s sme t mi revoC C C o o o v v v e e e r r r i i i am am am g g g e e e : : : nO t o l ygo eS r v i c e s f o r lwonK gde e rO nag i z a t i no yS s sme t slevel eerht pot eht fo yhcrareih tpecnoc ehT ehT ehT tpecnoc tpecnoc tpecnoc yhcrareih yhcrareih yhcrareih fo fo fo eht eht eht pot pot pot eerht eerht eerht slevel slevel slevel KK eht fo o o f f f t t h t h h e e e OKOK OKOK OKOK o o o n n n t t o t o o l l o l o o g g g y y y

B I I I NBS NBS NBS 6-6547-06-259-879 6-6547-06-259-879 6-6547-06-259-879 ( p ( ( p r p r i n r i n i t n t t de de ) de ) ) SSENISUB SSENISUB SSENISUB + + + B I I I NBS NBS NBS 9-5547-06-259-879 9-5547-06-259-879 9-5547-06-259-879 ( ( ( dp dp f dp ) f ) f ) E E E MONOC MONOC MONOC Y Y Y S I I I NSS NSS NSS - L - - L L 9971 9971 - 9971 - - 4394 4394 4394 S I I I NSS NSS NSS 9971 9971 - 9971 - - 4394 4394 4394 ( p ( ( p r p r i n r i n i t n t t de de ) de ) ) TRA TRA TRA + + + S I I I NSS NSS NSS 9971 9971 - 9971 - - 2494 2494 2494 ( ( ( dp dp f dp ) f ) f ) NGISED NGISED NGISED + + + HCRA HCRA HCRA I T I I T T TCE TCE TCE ERU ERU ERU ytisrevinU otlaA otlaA otlaA ytisrevinU ytisrevinU ytisrevinU ceic oohcS oohcS l oohcS l o l o f o f f cS cS i cS i i ecne ecne ecne ECNEICS ECNEICS ECNEICS + + + cec rtpo f tnemtrapeD tnemtrapeD tnemtrapeD fo fo fo retupmoC retupmoC retupmoC ecneicS ecneicS ecneicS T T E T E E ONHC ONHC ONHC L L L GO GO GO Y Y Y www www www . a . a . a a l a t l o t l o t . o f . i f . f i i EVOSSORC VOSSORC VOSSORC RE RE RE COD COD COD T T T RO RO RO A A A L L L +ggfeha*GMFTSH9 +ggfeha*GMFTSH9+ggfeha*GMFTSH9 COD COD COD T T T RO RO RO A A A L L L SNOITATRESSID SNOITATRESSID SNOITATRESSID

SNOITATRESSID SNOITATRESSID SNOITATRESSID ytisrevinU ytisrevinU ytisrevinU otlaA otlaAotlaA

7102 7102 7102