Semantic Archive Integration for Holocaust Research. The EHRI Research Infrastructure

Umanistica Digitale - ISSN:2532-8816 - n.4, 2019
A. van Nispen, L. Jongma – Holocaust and World War Two Linked Open Data Developments in the Netherlands
DOI: http://doi.org/10.6092/issn.2532-8816/9049

Vladimir Alexiev, Ivelina Nikolova and Neli Hateva
Ontotext Corp., Sofia, Bulgaria

Abstract. The European Holocaust Research Infrastructure (EHRI) is a large-scale EU project that involves 23 institutions and archives working on Holocaust studies, from Europe, Israel and the US. In its first phase (2011-2015) it aggregated archival descriptions and materials on a large scale and built a Virtual Research Environment (portal) for Holocaust researchers based on a graph database. In its second phase (2015-2019), EHRI-2 seeks to enhance the gathered materials using semantic approaches: enrichment, co-referencing, interlinking. Semantic integration involves four of the 14 EHRI-2 work packages and helps integrate databases, free text, and metadata to interconnect historical entities (people, organizations, places, historic events) and create networks. We will present some of the EHRI-2 technical work, including critical issues we have encountered.

EHRI (European Holocaust Research Infrastructure) is a European project with 23 participants, including research institutes, archival institutions and IT companies in Europe, Israel and the United States. In the first phase of the project (EHRI-1, 2011-2015), a large-scale collection of archival descriptions and materials on the Shoah was started and integrated into the Virtual Research Environment (EHRI Web Portal), which is based on a graph database. In the second phase of the project (EHRI-2, 2015-2019), the aim is to enhance the collected materials using semantic approaches (enrichment, interlinking, co-referencing).
Four work packages (of the 14 in the whole project) are involved in this activity, all working on the integration of databases, texts and metadata in order to connect various entities (persons, organizations, places, historical events) and thereby create knowledge networks. This paper presents the data integration activities carried out by the EHRI-2 work packages, also giving space and emphasis to the critical issues encountered in the course of the work.

Introduction

The European Holocaust Research Infrastructure (EHRI) is a large-scale EU project that involves 23 institutions working on Holocaust studies, from Europe, Israel and the US. In its first phase (2011-2015) EHRI aggregated archival descriptions and materials on a large scale and built a Virtual Research Environment (https://portal.ehri-project.eu/) for Holocaust researchers based on a graph database (neo4j) [3]. EHRI results were presented in several Holocaust-related magazines [2], [10] and conferences [5], [14].

In its second phase (2015-2019), EHRI-2 seeks to enhance the gathered materials using semantic approaches: enrichment, co-referencing, interlinking, geo-mapping, named entity recognition, topic extraction and mapping, etc. Four of the 14 EHRI-2 work packages (WP10, WP11, WP13, WP14) use Semantic Integration approaches, which help integrate databases, free text, and metadata to interconnect historical entities (people, organizations, places, historic events) and create networks. In detail:

• WP10 (EAD) converts archival descriptions from various formats to standard EAD XML; transports EADs using OAI PMH or ResourceSync; ingests EADs to the EHRI database; enables use cases such as synchronization; co-referencing of textual Access Points to proper thesaurus references.
• WP11 (Authorities and Standards) consolidates and enlarges the EHRI authorities to render the indexing and retrieval of information more effective.
It addresses Access Points in ingested EADs (normalization of Unicode, spelling, punctuation; deduplication; clustering; co-referencing to authority control); Subjects (deployment of a Thesaurus Management System in support of the EHRI Thesaurus Editorial Board); Places (co-referencing to Geonames); Camps and Ghettos (integrating data with Wikidata); Persons, Corporate Bodies (using USHMM HSV and VIAF); semantic (conceptual) search including hierarchical query expansion; interconnectivity of archival descriptions; permanent URLs; metadata quality; EAD RelaxNG and Schematron schemas and validation, etc.
• WP13 (Data Infrastructures) builds up domain knowledge bases from institutional databases by using deduplication, semantic data integration, semantic text analysis. It provides the foundation for research use cases on Jewish Social Networks and their impact on the chance of survival.
• WP14 (Digital Historiography Research) works on semantic text analysis (semantic enrichment), text and entity similarity, geo-mapping. It develops Digital Historiography researcher tools, including Prosopographical approaches.

First we present two examples of using semantic linking (the EHRI Blog and CDEC collection), which provide motivation for part of the EHRI technical work. Then we present the work packages. We conclude with a summary, lessons learned and outstanding challenges.

The EHRI Document Blog

The EHRI Document Blog (https://blog.ehri-project.eu/) was started as a space to share ideas about Holocaust-related archival documents, and their presentation and interpretation using digital tools. EHRI researchers document their activities and experiment with different ways to explain and show digital archival content. The blog is a showcase of using novel approaches for digital archival research.
The blog also provides inspiration to the technical work packages (described below) as to which functionalities and analyses can be useful to researchers, in order to automate part of their work and allow them to analyze amounts of data so large that this was not previously possible.

Picture 1: EHRI Document Blog. Courtesy EHRI 2016

The Value of Linking

An early inspiration, which helped convince the EHRI Project Management Board of the value of interlinking, was the work performed by the Contemporary Jewish Documentation Center Foundation (CDEC), Milan, in linking their internal databases (archival materials and persons) to Linked Open Data (LOD). For example, one of the 9k person records CDEC has is about Primo Levi, an Italian Jewish chemist, memoirist, short story writer, novelist, essayist, and Auschwitz survivor. This person is present in 50 wikipedias (articles about him, e.g. see Primo Levi on en.wikipedia¹), 11 wikiquote sites (his sayings), and 2 wikisource sites (his books). There is quite a lot of structured Linked Open Data (LOD) about him: Primo Levi on wikidata² (or see Primo Levi on Reasonator,³ which is a nice reading interface):

• Names in a number of languages (including Russian, Chinese, Korean, Arabic): Primo Lévi | Primo Michele Levi | Primo levi | Primo M.

