Capturing and Replaying Streaming Media in a Web Archive – a British Library Case Study
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Arhiviranje Weba
Arhiviranje weba Lučić, Lucija Master's thesis / Diplomski rad 2020 Degree Grantor / Ustanova koja je dodijelila akademski / stručni stupanj: University of Zagreb, University of Zagreb, Faculty of Humanities and Social Sciences / Sveučilište u Zagrebu, Filozofski fakultet Permanent link / Trajna poveznica: https://urn.nsk.hr/urn:nbn:hr:131:448437 Rights / Prava: In copyright Download date / Datum preuzimanja: 2021-09-25 Repository / Repozitorij: ODRAZ - open repository of the University of Zagreb Faculty of Humanities and Social Sciences SVEUČILIŠTE U ZAGREBU FILOZOFSKI FAKULTET ODSJEK ZA INFORMACIJSKE I KOMUNIKACIJSKE ZNANOSTI SMJER ARHIVISTIKA Ak. god. 2019./2020. Lucija Lučić Arhiviranje weba Diplomski rad Mentor: prof. dr. sc. Hrvoje Stančić Zagreb, 2020. Izjava o akademskoj čestitosti Izjavljujem i svojim potpisom potvrđujem da je ovaj rad rezultat mog vlastitog rada koji se temelji na istraživanjima te objavljenoj i citiranoj literaturi. Izjavljujem da nijedan dio rada nije napisan na nedozvoljen način, odnosno da je prepisan iz necitiranog rada, te da nijedan dio rada ne krši bilo čija autorska prava. Također izjavljujem da nijedan dio rada nije korišten za bilo koji drugi rad u bilo kojoj drugoj visokoškolskoj, znanstvenoj ili obrazovnoj ustanovi. Sadržaj 1. Uvod ....................................................................................................................................... 3 2. Što je arhiviranje weba? ..................................................................................................... -
Web Archiving Supplementary Guidelines
LIBRARY OF CONGRESS COLLECTIONS POLICY STATEMENTS SUPPLEMENTARY GUIDELINES Web Archiving Contents I. Scope II. Current Practice III. Research Strengths IV. Collecting Policy I. Scope The Library's traditional functions of acquiring, cataloging, preserving and serving collection materials of historical importance to Congress and the American people extend to digital materials, including web sites. The Library acquires and makes permanently accessible born digital works that are playing an increasingly important role in the intellectual, commercial and creative life of the United States. Given the vast size and growing comprehensiveness of the Internet, as well as the short life‐span of much of its content, the Library must: (1) define the scope and priorities for its web collecting, and (2) develop partnerships and cooperative relationships required to continue fulfilling its vital historic mission in order to supplement the Library’s capacity. The contents of a web site may range from ephemeral social media content to digital versions of formal publications that are also available in print. Web archiving preserves as much of the web‐based user experience as technologically possible in order to provide future users accurate snapshots of what particular organizations and individuals presented on the archived sites at particular moments in time, including how the intellectual content (such as text) is framed by the web site implementation. The guidelines in this document apply to the Library’s effort to acquire web sites and related content via harvesting in‐house, through contract services and purchase. It also covers collaborative web archiving efforts with external groups, such as the International Internet Preservation Consortium (IIPC). -
Web Archiving and You Web Archiving and Us
Web Archiving and You Web Archiving and Us Amy Wickner University of Maryland Libraries Code4Lib 2018 Slides & Resources: https://osf.io/ex6ny/ Hello, thank you for this opportunity to talk about web archives and archiving. This talk is about what stakes the code4lib community might have in documenting particular experiences of the live web. In addition to slides, I’m leading with a list of material, tools, and trainings I read and relied on in putting this talk together. Despite the limited scope of the talk, I hope you’ll each find something of personal relevance to pursue further. “ the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use International Internet Preservation Coalition To begin, here’s how the International Internet Preservation Consortium or IIPC defines web archiving. Let’s break this down a little. “Collecting portions” means not collecting everything: there’s generally a process of selection. “Archival format” implies that long-term preservation and stewardship are the goals of collecting material from the web. And “serving the archives for access and use” implies a stewarding entity conceptually separate from the bodies of creators and users of archives. It also implies that there is no web archiving without access and use. As we go along, we’ll see examples that both reinforce and trouble these assumptions. A point of clarity about wording: when I say for example “critique,” “question,” or “trouble” as a verb, I mean inquiry rather than judgement or condemnation. we are collectors So, preambles mostly over. -
Cultural Heritage Digitisation, Online Accessibility and Digital Preservation
1 Cultural heritage Digitisation, online accessibility and digital preservation REPORT on the Implementation of Commission Recommendation 2011/711/EU 2013-2015 Cover image: Albert Edelfelt’s 'The Luxembourg Gardens, Paris', Finnish National Gallery. Source: europeana.eu Back cover image: Raphael's 'Sposalizio della Vergine', Pinacoteca di Brera (Milano). Source: europeana.eu Page | 2 EUROPEAN COMMISSION Directorate-General for Communications Networks, Content and Technology Page | 3 Implementation of Commission Recommendation on the digitisation and online accessibility of cultural material and digital preservation Progress report 2013-2015 Working document June 2016 Table of contents EXECUTIVE SUMMARY ............................................................................................................................ 6 1. DIGITISATION: ORGANISATION AND FUNDING ................................................................................ 10 1.1. Planning and monitoring digitisation ......................................................................................... 10 1.1.1. Schemes, quantitative targets and allocated budgets ........................................................ 11 Page | 4 1.1.2 National and European overviews of digitised cultural material ........................................ 14 1.2 Public - private partnerships ....................................................................................................... 16 1.3 Use of Structural Funds .............................................................................................................. -
情報管理 O U R Nal of Information Pr Ocessing and Managemen T December
JOHO KANRI 2009 vol.52 no.9 http://johokanri.jp/ J情報管理 o u r nal of Information Pr ocessing and Managemen t December 世界の知識の図書館を目指すInternet Archive 創設者Brewster Kahleへのインタビュー Internet Archive aims to build a library of world knowledge An interview with the founder, Brewster Kahle 時実 象一1 TOKIZANE Soichi1 1 愛知大学文学部(〒441-8522 愛知県豊橋市町畑町1-1)E-mail : [email protected] 1 Faculty of Letters, Aichi University (1-1 Machihata-cho Toyohashi-shi, Aichi 441-8522) 原稿受理(2009-09-25) (情報管理 52(9), 534-542) 著者抄録 Internet ArchiveはBrewster Kahleによって1996年に設立された非営利団体で,過去のインターネットWebサイトを保存し ているWayback Machineで知られているほか,動画,音楽,音声の電子アーカイブを公開し,またGoogleと同様書籍の電 子化を行っている。Wayback Machineは1996年からの5,000万サイトに対応する1,500億ページのデータを保存・公開し ている。書籍の電子化はScribeと呼ばれる独自開発の撮影機を用い,ボストン公共図書館などと協力して1日1,000冊の ペースで電子化している。電子化したデータを用いて子供たちに本を配るBookmobileという活動も行っている。Kahle氏 はGoogle Book Searchの和解に批判的な意見を述べているほか,孤児著作物の利用促進やOne Laptop Per Child(OLPC)運 動への協力も行っている。 キーワード Webアーカイブ,Wayback Machine,書籍電子化,Google Book Search,新アレキサンドリア図書館,Open Content Alliance,Open Book Alliance 1. はじめに Googleと同様書籍の電子化を行っている。インター ネットが一般に使えるようになったのが1995年で Internet Archive注1)はBrewster Kahle(ケールと発 あるから,Internet Archiveはインターネットとほぼ 音する)によって1996年に設立された非営利団体 同時に誕生したことになる。現在年間運営費は約 である。過去のインターネットW e bサイトを保存 1,000万ドルであり,政府や財団の補助や寄付で運 しているWayback Machine1)で知られているほか, 営している。この(2009年)5月にKahle氏(以下敬 534 JOHO KANRI 世界の知識の図書館を目指すInternet Archive 2009 vol.52 no.9 http://johokanri.jp/ J情報管理 o u r nal of Information Pr ocessing and Managemen t December 称略)を訪ね,インタビューを行ったので報告する A O Lに売却した。その売却益によって翌年I n t e r n e t (写真1)。 Archiveを立ち上げたのである。 K a h l eは1982年 に マ サ チ ュ ー セ ッ ツ 工 科 大 学 (Massachusetts Institute of Technology: MIT)のコン 2. Internet Archiveの事業 ピュータ科学工学科を卒業した。 2000年前エジプトのアレキサンドリアには当時 2.1 Wayback Machine 世界最大の図書館があり,パピルスに書かれた書物 I n t e r n e t A r c h i v eのホームページのU R Lはw w w . -
The UK Web Archive As a Source for the Contemporary History of Public Health Downloaded from Martin Gorsky*
Social History of Medicine Vol. 28, No. 3 pp. 596–616 Sources and Resources Into the Dark Domain: The UK Web Archive as a Source for the Contemporary History of Public Health Downloaded from Martin Gorsky* Summary. With the migration of the written record from paper to digital format, archivists and histor- http://shm.oxfordjournals.org/ ians must urgently consider how web content should be conserved, retrieved and analysed. The British Library has recently acquired a large number of UK domain websites, captured 1996–2010, which is col- loquially termed the Dark Domain Archive while technical issues surrounding user access are resolved. This article reports the results of an invited pilot project that explores methodological issues surrounding use of this archive. It asks how the relationship between UK public health and local government was represented on the web, drawing on the ‘declinist’ historiography to frame its questions. It points up some difficulties in developing an aggregate picture of web content due to duplication of sites. It also highlights their potential for thematic and discourse analysis, using both text and image, illustrated through an argument about the contradictory rationale for public health policy under New Labour. at London School of Hygiene & Tropical Medicine on March 8, 2016 Keywords: methodology; websites; local government; public health Introduction: the Dark Domain Archive Public health researchers have long been interested in the internet as a medium of communi- cation, focusing inter alia on web tools for data gathering, on dissemination of health mes- sages and on the networking potential which the World Wide Web provides.1 Facilities for the capture of websites, for example, are now routinely included in qualitative data analysis software such as NVivo. -
Web Archiving in the UK: Current
CITY UNIVERSITY LONDON Web Archiving in the UK: Current Developments and Reflections for the Future Aquiles Ratti Alencar Brayner January 2011 Submitted in partial fulfillment of the requirements for the degree of MSc in Library Science Supervisor: Lyn Robinson 1 Abstract This work presents a brief overview on the history of Web archiving projects in some English speaking countries, paying particular attention to the development and main problems faced by the UK Web Archive Consortium (UKWAC) and UK Web Archive partnership in Britain. It highlights, particularly, the changeable nature of Web pages through constant content removal and/or alteration and the evolving technological innovations brought recently by Web 2.0 applications, discussing how these factors have an impact on Web archiving projects. It also examines different collecting approaches, harvesting software limitations and how the current copyright and deposit regulations in the UK covering digital contents are failing to support Web archive projects in the country. From the perspective of users’ access, this dissertation offers an analysis of UK Web archive interfaces identifying their main drawbacks and suggesting how these could be further improved in order to better respond to users’ information needs and access to archived Web content. 2 Table of Contents Abstract 2 Acknowledgements 5 Introduction 6 Part I: Current situation of Web archives 9 1. Development in Web archiving 10 2. Web archiving: approaches and models 14 3. Harvesting and preservation of Web content 19 3.1 The UK Web space 19 3.2 The changeable nature of Web pages 20 3.3 The evolution of Web 2.0 applications 23 4. -
'Non-Published' Outputs of Research Datacite
Towards systemic solutions for preservation and archiving, citation and referencing systems for data and web-based ‘non-published’ outputs of research DataCite Data sharing and re-use are becoming increasingly central to the research process, meaning that it is vital to have effective tools to find, access and use that data. DataCite is a global network of national libraries, data centres and other research organisations that work to increase the recognition of data as a legitimate, citable contribution to scholarly record. DataCite provides Digital Object Identifiers (DOIs) for data sets and other non-traditional research outputs. DOI assignment helps to make data persistently identifiable and citable. DataCite chooses to work with DOIs because they have number of features that make them more suitable than other persistent identifiers1. For example, they are already well-established for research publications, recognised as an ISO standard and are centrally managed by the International DOI Foundation (IDF). The British Library is one of the founding members of DataCite and provides UK-based organisations with the ability to mint DOIs for their data. The British Library provides a web interface for DOI creation and the Metadata Store, which is a centralised database that makes data easy to find and access. The infrastructure and governance provided by the IDF and DataCite means that there are standards, policies and guidelines that allow for data citation to take place. However, this technical infrastructure cannot function in isolation. The participating universities and research organisations have to have their own repository infrastructure, senior level buy-in, policies and skilled staff in place to ensure that they can meet the requirements for providing persistent access to data. -
Web Archives for the Analog Archivist: Using Webpages Archived by the Internet Archive to Improve Processing and Description
Journal of Western Archives Volume 9 Issue 1 Article 6 2018 Web Archives for the Analog Archivist: Using Webpages Archived by the Internet Archive to Improve Processing and Description Aleksandr Gelfand New-York Historical Society, [email protected] Follow this and additional works at: https://digitalcommons.usu.edu/westernarchives Part of the Archival Science Commons Recommended Citation Gelfand, Aleksandr (2018) "Web Archives for the Analog Archivist: Using Webpages Archived by the Internet Archive to Improve Processing and Description," Journal of Western Archives: Vol. 9 : Iss. 1 , Article 6. DOI: https://doi.org/10.26077/4944-b95b Available at: https://digitalcommons.usu.edu/westernarchives/vol9/iss1/6 This Case Study is brought to you for free and open access by the Journals at DigitalCommons@USU. It has been accepted for inclusion in Journal of Western Archives by an authorized administrator of DigitalCommons@USU. For more information, please contact [email protected]. Gelfand: Web Archives for the Analog Archivist Web Archives for the Analog Archivist: Using Webpages Archived by the Internet Archive to Improve Processing and Description Aleksandr Gelfand ABSTRACT Twenty years ago the Internet Archive was founded with the wide-ranging mission of providing universal access to all knowledge. In the two decades since, that organization has captured and made accessible over 150 billion websites. By incorporating the use of Internet Archive's Wayback Machine into their workflows, archivists working primarily with analog records may enhance their ability in such tasks as the construction of a processing plan, the creation of more accurate historical descriptions for finding aids, and potentially be able to provide better reference services to their patrons. -
Wayback Machine Instructions
Revitalizing Old Links You will need: An Internet connection The URL of an article or other content formerly posted on a Phoenix Media Group website Step 1: Copy the old URL of your selected content. Step 2: Navigate to the WayBack Machine: https://archive.org/web Step 3: Input the copied URL of your content into the search box on the Wayback Machine home page. Hit Enter or click “Browse History”. Step 4: The WayBack Machine will retrieve a page that displays the original URL of your content, the number of times that content has been captured by the WayBack Machine, and a calendar. If you would like to access the most recently captured version of the content, click the most recent “snapshot” date on the calendar (dates on which the content was captured are marked with blue circles). If the content has been edited or changed and you would like to access an older saved version, navigate through the calendar to your desired “snapshot” date. Select the most recent save date to view the most recently saved Select the year, then version of the month and date to view content. an older version of the content. Black bars indicate years in which the content was captured. Step 5: The WayBack Machine will redirect you to the preserved snapshot of the page from the date you selected. Please note that some original content, such as advertisements, JavaScript or Flash videos, may not be saved. This URL directs to the saved version of the material; replace any outdated links to the content with this URL. -
The Future of Web Citation Practices
City University of New York (CUNY) CUNY Academic Works Publications and Research John Jay College of Criminal Justice 2016 The Future of Web Citation Practices Robin Camille Davis CUNY John Jay College How does access to this work benefit ou?y Let us know! More information about this work at: https://academicworks.cuny.edu/jj_pubs/117 Discover additional works at: https://academicworks.cuny.edu This work is made publicly available by the City University of New York (CUNY). Contact: [email protected] This is an electronic version (preprint) of an article published in Behavioral and Social Sciences Librarian. The article is available online at http://dx.doi.org/10.1080/01639269.2016.1241122 (subscription required). Full citation: Davis, R. “The Future of Web Citation Practices (Internet Connection Column).” Behavioral & Social Sciences Librarian 35.3 (2016): 128-134. Web. Internet Connection The Future of Web Citation Practices Robin Camille Davis Abstract: Citing webpages has been a common practice in scholarly publications for nearly two decades as the Web evolved into a major information source. But over the years, more and more bibliographies have suffered from “reference rot”: cited URLs are broken links or point to a page that no longer contains the content the author originally cited. In this column, I look at several studies showing how reference rot has affected different academic disciplines. I also examine citation styles’ approach to citing web sources. I then turn to emerging web citation practices: Perma, a “freemium” web archiving service specifically for citation; and the Internet Archive, the largest web archive. Introduction There’s a new twist on an old proverb: You cannot step in the same Web twice. -
The Dbpedia Wayback Machine ∗
The DBpedia Wayback Machine ∗ Javier D. Fernández Patrik Schneider Jürgen Umbrich Vienna University of Vienna University of Vienna University of Economics and Business, Economics and Business, Economics and Business, Vienna, Austria Vienna, Austria Vienna, Austria [email protected] [email protected] [email protected] ABSTRACT Web and provides these time-preserved website snapshots DBpedia is one of the biggest and most important focal point for open use. The wayback mechanism offers access to the of the Linked Open Data movement. However, in spite of its history of a document and enables several new use cases and multiple services, it lacks a wayback mechanism to retrieve supports different research areas. For instance, one can ex- historical versions of resources at a given timestamp in the tract historical data and capture the “Zeitgeist” of a culture, past, thus preventing systems to work on the full history or use the data to study evolutionary patterns. Previous of RDF documents. In this paper, we present a framework works on providing time-travel mechanism in DBpedia, such that serves this mechanism and is publicly offered through as [6], focus on archiving different snapshots of the dumps and serve these by means of the memento protocol (RFC a Web UI and a RESTful API, following the Linked Open 4 Data principles. 7089) , an HTTP content negotiation to navigate trough available snapshots. Although thisa perfectly merges the ex- isting time-travel solution for WWW and DBpedia dumps, 1. INTRODUCTION it is certainly restricted to pre-fetched versions than must DBpedia is one of the biggest cross-domain datasets in be archived beforehand.