The National Library of Catalonia and the Digital Heritage
Ciro Llueca, Biblioteca de Catalunya
[email protected]
www.bnc.cat
Copyright Group EBLIDA Seminar, Granada, April 26th 2007

Digital Heritage
Unesco 2003
Digital heritage embraces cultural, educational, scientific and administrative resources, as well as technical, legal, medical and other information created digitally, or converted into digital form from existing analogue resources.

National Library of Catalonia: Strategy
The mission of the Biblioteca de Catalunya is
to collect, preserve and disseminate the Catalan bibliographic production and that related to the Catalan linguistic area;
to watch over the preservation and dissemination of the bibliographic heritage;
and to maintain its status as a centre for consultation and universal scientific research.

Bibliographic production in 2007
Books, Leaflets, Printed sheets, Periodical publications, Engravings, Maps and plans, Posters, Postcards, “Slides”, Prints or sound recordings, Movies...
+
Websites: e-books, e-News, Weblogs, Webcams, on-line votes, Chats, Corporate websites, e-commerce, Personal webs, Digital articles and papers, software documentation...

National Library of Catalonia: Projects focused on digital heritage
Retrospective
ARCA (Old Serials Repository)
* RACO (Open Serials Repository)
CLACA (Catalan Classics Portal)
Google Book Search
Born digital
PADICAT (Catalan Web Archive)

National Library of Catalonia: ARCA (Old Serials Repository)
ARCA (Arxiu de Revistes Catalanes Antigues)
Full online access & search
Digitizing from closed collections
50 titles, 50.000 pages (goal 2009: 300 titles)
Cooperating with several public & university libraries
Copyright: Public Domain
http://mdc.cbuc.cat/
http://www.bnc.cat/digital/arca/

National Library of Catalonia: RACO (Open Serials Repository)
* RACO (Revistes Catalanes amb Accés Obert)
Full online access & search
Digitizing and publishing in pdf the “press copy” from live collections
100 titles
Cooperating with university libraries
Copyright: Agreements with publishers (universities, professional associations, etc.)
http://www.raco.cat/

National Library of Catalonia: CLACA (Catalan Classics Portal)
CLACA (Clàssics Catalans)
Full online access & search
Digitizing from own holdings (books, articles, letters, graphics, audio & video, etc.)
Thematic collections: Jacint Verdaguer (poet)
Copyright: Public Domain
http://www.bnc.es/fons/claca.php

National Library of Catalonia: Google Book Search
Full online access & search
X00.000 titles
Cooperating with 4 Libraries in Catalonia
Copyright: Public domain
http://books.google.com/

National Library of Catalonia: PADICAT (Catalan Web Archive)
PADICAT (Patrimoni Digital de Catalunya = Digital Heritage of Catalonia)
Full online access & search
100.000 websites (goal 2009)
Cooperating with a technical partner (CESCA), and 1.000 web producers (goal 2009)
Copyright…
http://www.padicat.cat/

Born digital heritage is substantial

Born digital heritage is ephemeral
1999, 2001, 2003, 2006

Experiences in Digital preservation
Germany, 1997
Australia, 1996
Austria, 1999
Canada, 1994
Denmark, 1998
USA, 2000
Estonia, 2004
Finland, 1997
France, 2000
Greece, 2003
Iceland, 1997
Japan, 2002
Lithuania, 2002
Norway, 2001
New Zealand, 1999
Netherlands, 1995
Quebec, 2000
United Kingdom, 2004
Czech Republic, 2001
Sweden, 1996

Experiences in Digital preservation: Internet Archive
1996
55 billion Web pages
http://www.archive.org/

Experiences in Digital preservation: Australia
1996
Selective archive
26.000 webs, 33 million files
http://pandora.nla.gov.au/

Experiences in Digital preservation: Sweden
1996
Mass compilation
347.000 webs, 306 million files
http://www.kb.se/kw3/ENG/

Experiences in Digital preservation: Denmark
1998
Hybrid system
600.000 webs approx.
http://netarchive.dk/index-en.php

Models: advantages & inconveniences

Exhaustive or integral
Automatic mass compilation according to domain, topic or language
Rich and appropriate collection
Short-time evident results
Low cost
Excludes invisible Internet
Irregular collection
Limited access to results

Selective
Selective compilation according to the topic, national interest or geographic criterion
Well-balanced collection
Maximum access to results
Strategic model
Partial view
Out of context
High cost

Hybrid
A combination of the previous models. It is the approach that most projects are tending towards.
Rich and well-balanced collection
Appropriate
Strategic model with evident results
Includes invisible Internet
Maximum access (in regard to author’s copyright)
High cost, similar to Selective model

PADICAT: Strategy
The mission of the PADICAT project is
to collect, preserve and disseminate the Catalan digital production and that related to the Catalan linguistic area;
to watch over the preservation and dissemination of the digital heritage;
and to maintain its status as a centre for consultation and universal scientific research.

PADICAT: Exhaustive captures
• Webs under .CAT domain
• Webs located in Catalan hosts
• Webs in Catalan language under other domains (.ES, .ORG, .NET, .COM, .INF, ...)
• Webs created by Catalan people
• Other web sites related to Catalonia, not included by the previous criteria
• 472 webs * 917 captures * 9,5 M files * 235 GB

PADICAT: Selective captures
• Selection of 1.000 representative institutions of the Catalan society
• City councils and local governments
• Political Parties and Trade Unions
• Professional Associations
• Universities and Culture
• Business
170 agreements from 09/11/2006
Institutions shown on the slide: Ajuntament d'Arenys de Mar; Associació de Mestres Rosa Sensat; Universitat Politècnica de Catalunya; Col·legi de Pedagogs de Catalunya; Federació Catalana de Cineclubs; Ajuntament de Cerdanyola del Vallès; Ajuntament de Cervera; Col·legi de Biòlegs de Catalunya; Ajuntament de Girona; Petita i Mitjana Empresa de Catalunya (PIMEC); Fundació Joan Brossa; MACBA (Museu d’Art Contemporani de Catalunya); Ajuntament de Manlleu; Tricicle Companyia Teatral S.L; Casal Lambda; Ajuntament de Mollet del Vallès; Reial Club Deportiu Espanyol; Ajuntament de Montcada i Reixac; Col·legi de Periodistes de Catalunya; Ajuntament d'Olot; Universitat de Barcelona; Dagoll Dagom; Ajuntament de Ripoll; Ajuntament de Sabadell; Col·legi de Fisioterapeutes de Catalunya; Universitat Pompeu Fabra; Vilaweb; Ajuntament de Tarragona; Museu Episcopal de Vic; Ajuntament de Terrassa; Ajuntament de Torroella de Montgrí; Universitat Oberta de Catalunya; Ajuntament de Tremp; Ajuntament de Vilanova i la Geltrú; Associacions; Unió de Consumidors de Catalunya (UCC)

PADICAT: Focused action: elections 2006
• Political Parties (CIU, PSC, ERC, ICV, PP + Partit Blau, Lliga antitaurina, Escons insubmisos…)
• Candidates (Montilla, Carod, Saura, Sirera…)
• Foundations (Jordi Pujol, Rafael Campalans…)
• Parliament + Administration (eleccions2006.cat…)
• Personal blogs (ciberpolítica de Joselito…)
• Mass media (e-noticies.com, vilaweb…)
652 captures from 83 different webs

PADICAT: Repository and description
• Description, through a permanent identifier and Dublin Core metatags (descriptive and technical)
• Repository, in ARC format
• Preservation, through current and future technologies (migration, emulation…)
96% of files belong to standard formats (html, jpeg, gif, pdf…)

PADICAT: Access
• On-line, open-access: www.padicat.cat
• Search by keyword, phrase and metatags
• Legal aspects related to open access or access restricted to the library LAN, author’s copyright, limitations for copy and reproduction…
Better “I’m sorry” than “please help us”

PADICAT: Accés (Access)

PADICAT: Highlights
• Beginning: June 2005, project: 2006-2008
• Hardware: 4 servers ProLiant DL360 G4p, Robot Scalar i2000, 10 TB (+ annual growth 10 TB)
• Software: Heritrix, BAT, NutchWAX, WERA, Wayback
• Budget: 1.000.000 € (2006-08)
• Members of the International Internet Preservation Consortium
• Professionals involved: 7 (librarians/technicians) + BC and CESCA teams (CESCA: the Centre de Supercomputació de Catalunya)

PADICAT: January 2009
• Pioneer project in Spain and reference in Europe (regional web) and Latin-America
• Several editions from 100.000 websites
• 50 million files, in 30 TB
• Agreements with 1.000 institutions
• Online access to the major part of the collection

PADICAT: April 2007

Thanks for your attention!
www.padicat.cat
Biblioteca de Catalunya
Ciro Llueca, [email protected]
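
As an illustration of the exhaustive-capture criteria listed on the PADICAT slide, the following is a minimal Python sketch, not the project's actual crawler configuration. The function names, the boolean flags and the returned labels are all hypothetical. Only the .CAT domain rule can be checked from the URL itself; the other criteria (Catalan host, Catalan language, Catalan author, relation to Catalonia) are assumed here to be supplied by some earlier, unspecified analysis step.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_cat_domain(url: str) -> bool:
    """Return True when the URL's host sits under the .cat top-level domain."""
    # urlsplit lowercases the hostname; it is None when the URL has no scheme/host.
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host.endswith(".cat")


def capture_reason(
    url: str,
    hosted_in_catalonia: bool = False,   # hypothetical flag: host located in Catalonia
    in_catalan: bool = False,            # hypothetical flag: content in Catalan
    catalan_author: bool = False,        # hypothetical flag: created by Catalan people
    related_to_catalonia: bool = False,  # hypothetical flag: otherwise related to Catalonia
) -> Optional[str]:
    """Return the first matching criterion, in the order of the slide, or None."""
    if is_cat_domain(url):
        return "under .CAT domain"
    if hosted_in_catalonia:
        return "located in a Catalan host"
    if in_catalan:
        return "Catalan language under another domain"
    if catalan_author:
        return "created by Catalan people"
    if related_to_catalonia:
        return "other site related to Catalonia"
    return None  # out of scope for an exhaustive capture


if __name__ == "__main__":
    for candidate in ("http://www.padicat.cat/", "http://www.bnc.es/fons/claca.php"):
        print(candidate, "->", capture_reason(candidate))
```

Checking the criteria in the order of the slide means a site is labelled with the most specific rule it satisfies, which mirrors the wording "not included by the previous criteria" in the last bullet.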