
Hi! Good morning, all, and thanks for joining us. I’m Karl Blumenthal. I’m a web archivist for the Internet Archive’s “Archive-It” service and partnership community. And to begin our discussion of collaborative web archiving I’d like to introduce a little bit of web archiving’s history and how in fact it was collaboration among many different archivists, technologists, and organizations that made the practice what it is today, and indeed how the lessons learned from that early collaboration are just as vital and important to new web archivists and their subjects today as they ever were, which I think Amy and Sam can then demonstrate in even more living color.

So before we dig any deeper into this topic we can first just agree on some specific terminology. What we mean when we say “web archiving” is something like this: it’s the process of collecting, preserving, and ultimately enabling end-user patron access to materials originally published to the web. There are myriad reasons to perform this labor, but in general, you may find: that the materials you have traditionally collected in print, bound and serial forms, have increasingly shifted to a web-based publishing paradigm--that local organization or academic department might no longer send you their materials on paper but instead may share it all online; and indeed your organization itself may need to meet its own records retention mandate by preserving materials only published to its website or even the website itself; increasingly web archiving is a means to preserve and provide enduring access to events and conversations that exist entirely online, like movements with social media presences. And whatever their specific goal, each web archivist engaged in this work as a result mitigates the threat of what we call Link Rot, the loss of content found at the other end of live links, such as increasingly appear in the citations of journal articles, book chapters, even court decisions, so that everyone can still find what they’re looking for online instead of those universally dreaded “404: File not found” error messages.

This work takes different precise forms, based on the needs and goals at hand, but more often than not it looks generally something like this: an archivist or selector of some kind at a computer terminal identifies a website or web page that they want to collect; they acquire it using software--not always but most frequently a web crawler--which deposits it into local or networked storage--again not always but most commonly in the form of what we call a Web ARChive or “WARC” file; that file can thereafter be read through browser-based software that knows how to interpret WARC files--just as you would use Microsoft Word or the like to read a DOC file--and renders them the way we would expect to see and browse through them as they appeared at the time that they were archived.
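To make the idea of a WARC file a little more concrete, here is a minimal Python sketch of what one record inside such a file roughly looks like: a block of named header fields, a blank line, and then the captured HTTP response. It is a simplified illustration only, not how any particular crawler is implemented; the function names `build_warc_record` and `read_warc_record` are hypothetical, and a real WARC file holds many such records, often compressed, with more header fields than shown here.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple


def build_warc_record(target_uri: str, http_payload: bytes, captured_at: datetime) -> bytes:
    """Wrap one captured HTTP response in a simplified WARC 'response' record.

    captured_at is assumed to already be in UTC.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",
    ]
    # Header lines end with CRLF; a blank line separates them from the payload,
    # and two CRLFs close the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + http_payload + b"\r\n\r\n"


def read_warc_record(record: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split a single record back into its header fields and its payload."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields: Dict[str, str] = {}
    for line in lines[1:]:  # skip the "WARC/1.0" version line
        name, _, value = line.partition(": ")
        fields[name] = value
    length = int(fields["Content-Length"])
    return fields, rest[:length]


# Illustrative capture of a made-up page.
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
when = datetime(2018, 3, 24, 12, 0, 0, tzinfo=timezone.utc)
record = build_warc_record("http://example.org/", payload, when)
fields, body = read_warc_record(record)
print(fields["WARC-Target-URI"], fields["WARC-Date"], len(body))
```

The point of the sketch is only that the record keeps the original address, the capture time, and the full server response together, which is what lets replay software show the page as it appeared when it was archived.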

Now, if you’ve never done any web archiving yourself but you’ve heard of it before, it might be because of my organization, the Internet Archive. Since 1996 we’ve been a non-profit digital library based in San Francisco--this is our actual headquarters in an old converted (but not very converted!) Christian Science church near Golden Gate Park. And from here we host and serve millions of books, movies, audio recordings…

...software and games. Increasingly we’re collecting born-digital artifacts and even the broader software environments that are necessary to access them like we might have originally done at a school computer lab thirty years ago.

But we’re likely most universally known to this day for the Wayback Machine. That’s our web-spanning archive--one of those rendering tools that I mentioned a moment ago--at archive.org/web, which provides access to the web as we’ve collected and preserved it since 1996.

And you can find all sorts of interesting stuff in there! Say for instance I wanted to know what the Archivists Round Table of Metropolitan New York was up to in 2000, just from the comfort of my laptop or tablet: I can do that now.

New England Archivists, I see you too. All the way back to 1996, this time! So the Wayback Machine is obviously pretty useful in its way; it’s a vital resource to historians of technology and the web and its cultures in particular, of course. Increasingly though it’s also become singularly important to journalists, activists, government watchdogs, artists, and really anyone invested in not throwing web-published information down the memory hole when it becomes politically or socially inconvenient to provide. I’ll talk a little bit further about that one in just a moment. But in the meantime really it’s proven its value in countless individual stories of writers long since disconnected from their old and shuttered blog sites, students and teachers looking for that long lost course syllabus -- all of those things that deserved but didn’t have an archive. There are myriad stories like these and they begin to stretch the Wayback Machine to its boundaries. The trove is vast, to be sure--some 500+ billion individual URLs preserved over the last 22 years. And still, with that vast extent, it can present an incomplete or unsatisfying record. And in fact you can summarize its limitations as a function of not having enough archivists. What we long had here was a reflection of as much as we could gather from the entire expanse of the web by using highly automated web crawling tools. What it lacks then is depth--oftentimes we don’t have more than a site’s homepage--or coherence--there’s no regular, predictable frequency at which you can rely on seeing captures of the URL most important to you; to this day the only really reliable way to find a preserved URL is to go straight to it; there’s no arrangement or description of holdings.
These are all perfect demonstrations of what distinguishes a back-up, like Wayback, from an archive, like the people in this room would create--something that has been critically curated to contain the right content, in an order and with the context needed to understand what it is and means.

By about 2005 we’d found that there was a critical mass of archivists who needed something similarly modeled but under their own intellectual control. And with 10 pilot partners to help build and beta test it we developed what came to be known as “Archive-It,” a suite of software and a non-profit partnership model that enabled archivists to create their own web archives--their own miniature versions of the Wayback Machine, if you like, but they could enrich that corpus too in the process--with the tools to decide precisely what gets acquired and when, to arrange and describe those holdings and to ultimately enable access to them through a front-end web-based interface, and to maintain and manage multiple redundant copies of those WARC files in their network and locally for preservation purposes, since as we know “lots of copies keeps stuff safe.” We started with the geographically, thematically, and professionally diverse group that you see here.

Just for example our early partners in state archives and libraries succeeded in collecting and preserving government records that had no print analog, such as the website of the Bob McDonnell campaign and later administration in Virginia, over on the left. The later web archive of the Governor’s administration got a lot of attention and use some years later, as you might imagine. And these archivists were even in fact the first to help us at the Internet Archive build out our capacity for capturing social media. The State Archives and Library of North Carolina, for instance, were keen to preserve Governor Bev Perdue’s communication with constituents over social media like Twitter since she was likewise among the first to really heavily invest in those notoriously ephemeral or walled-off platforms to reach them. You can now find these and many more captures in the Wayback Machine, and even better you can browse through or search for them on our website, archive-it.org. Just like a real archive!
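The capture addresses cited later in this session (for example https://wayback.archive-it.org/2410/20120623040942/http://www.odi.umd.edu/divtimeline/index.html) appear to follow a readable pattern: a collection number, a 14-digit capture timestamp, and then the original URL. Here is a small Python sketch that pulls those three parts apart. The pattern is an assumption inferred from the addresses quoted in this session rather than from official documentation, and the names `Capture` and `parse_capture_url` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Capture(NamedTuple):
    collection_id: str     # the Archive-It collection number in the address
    captured_at: datetime  # when the capture was made (read from the 14 digits)
    original_url: str      # the live-web address that was archived


# Assumed shape: https://wayback.archive-it.org/<collection>/<YYYYMMDDhhmmss>/<original URL>
CAPTURE_PATTERN = re.compile(
    r"^https?://wayback\.archive-it\.org/(\d+)/(\d{14})/(.+)$"
)


def parse_capture_url(capture_url: str) -> Capture:
    """Split a capture address into collection, timestamp, and original URL."""
    match = CAPTURE_PATTERN.match(capture_url)
    if match is None:
        raise ValueError(f"Not a recognized capture address: {capture_url}")
    collection_id, stamp, original_url = match.groups()
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return Capture(collection_id, captured_at, original_url)


capture = parse_capture_url(
    "https://wayback.archive-it.org/2410/20120623040942/"
    "http://www.odi.umd.edu/divtimeline/index.html"
)
print(capture.collection_id, capture.captured_at.isoformat(), capture.original_url)
```

Reading an address this way tells you which collection a capture belongs to and exactly when it was made, which is part of the context that a curated collection adds.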

It worked! From a few pilot partners the community of web archivists has grown to now--since I made this graph a couple of weeks ago it topped 600 different organizations and institutions. Each one contributes to the self-sustaining non-profit model by paying to keep the lights on--that’s what it takes to keep copies of their collections online from our data centers in the Bay Area, or through tool development to make sure that our crawlers and rendering tools are the best that they can be, or by doing further outreach of their own. From the smallest one- or two-person non-profit to the big R1 universities, each in the process is empowered to fulfill its own specific mission by using the shared technology stack to collect those unique materials that it knows best, be that the institutional records of a small town or municipality...

...or J.R.R. Tolkien fandom, if you happen to be Marquette University. It’s a model that’s worked for services with similar goals like HathiTrust and which we’ve seen increasingly in the digital preservation realm, as archivists across institutions decide that it’s really better to share infrastructure than to always roll their own.

Very early in Archive-It’s life though, there began interest in not only sharing the hardware and software to build web archives, but indeed the appraisal and selection responsibilities; to use good old fashioned archival practice, documentation strategy in this case, to again make web archiving achievable at what would otherwise be an unattainable scale. Most vividly you can see how this brought archivists together in rapid response efforts--breaking news events, the records of which were posted to this very fragile internet infrastructure and have no long term owner or steward. Roger Christman of the Library of Virginia just this month wrote a great (and short) blog post on this topic and in particular the 2007 shooting incident that inspired his and his regional collaborators’ efforts for the National Council on Public History. I recommend it.

Closer to home: by April 2013--we’re coming up on the fifth anniversary--we the Archive-It staff were regularly creating collections through our platform that multiple partners and even subject and other experts, or the lay public outside of our immediate community, could contribute to, such as this, the Boston Marathon Bombing collection--an example of an event so affected by the web this time that its role really needed to be preserved in that medium for future understanding. Just on a personal note I’ll say too that from my experience on things like the Pulse Nightclub collection, this can as you might imagine be some of the more emotionally laborious work done in digital archives these days; the weight of these acquisitions can be too much for any one archivist to bear after a while, if you know what I mean, so it’s definitely better to do with friends.

We’ll talk much more about rapid response practices and strategies when the Web Archiving Section meets at SAA this year, so do please join us for that if you need to prepare. Documentation strategy, in the meantime, is maturing in web archiving beyond triage needs. In 2017 for instance we saw a great volume of interest in archivists working together and with their patrons to prevent the loss of web-native resources with the transition in the federal government. And after several DataRescue guerrilla archiving events, a core of Archive-It partners met at our annual meeting in order to first start discussing and imagining what a long term model for collecting climate data and related resources would look like if each could take responsibility for a thematically complementary sector and make it part of their routine collecting process. I recommend reading Laura Alagna’s Archive-It blog post about this idea if you want to jump in and participate yourself.

A lot of collaboration within the Archive-It community just naturally extends from long-standing partnerships in libraries and archives though--simply extending it into this new, additional collecting area. I feel like I can’t escape it myself since my first professional work in archives was done for the Tri-College Consortium of Bryn Mawr, Haverford, and Swarthmore Colleges, and yes these three share an Archive-It account and its appraisal and selection needs instead of running their own. And I learned web archiving for myself on a short-term gig for the New York Art Resources Consortium, or NYARC, the partnership between art and museum libraries in New York that likewise makes web archives together just as it otherwise seeks to collaboratively steward, for instance, the exhibition history of New York City in other media. This, for example, is a capture that NYARC made of the now closed Laurel Gitlen gallery on Manhattan’s Lower East Side. Shameless self-promotion alert here, but: for a comprehensive description of the whole web archiving lifecycle and how NYARC’s approach builds collaboration among its members into each stage, you can check out the Art Libraries Journal article that I co-authored with NYARC’s Sumitra Duncan. It’s openly available as a pre-print from the super amazing LISSA open archive of LIS scholarship.

And of course I’d be remiss if I didn’t likewise recommend a similar open access article by the member institutions of the Kansas Archive-It Consortium, or KAIC(!), in the Journal of Western Archives. It’s easy to find because I stole the name of this presentation--Collaboration made it happen!--wholesale from them and from that article. KAIC consists of four university archives or manuscript libraries and the state historical society, responsible for captures like this one, and divvies up responsibilities for Kansan collecting areas, but also provides a nice model of intra-community education and training, as members regularly meet to discuss strategy and learn from one another’s experience archiving websites that they all understand and value.

Which kind of brings us back to the original idea: empowering sometimes hyperlocal collecting with vastly shared infrastructure and technical resources. This has enabled web archiving in lots of academic and government archives--these still make up the vast majority of Archive-It partners and web archivists in general. But we’ve found over the years that it hasn’t been enough to kickstart the collecting efforts in the local history collections of independent historical societies, art and museum libraries, and public libraries across the country. And this is not because those institutions don’t see the value, that they don’t want to preserve web-published local history, but stretched thin as they are and perceiving a relatively higher barrier to access when it comes to learning whole new technologies and appraisal challenges, they haven’t taken that first meaningful step to get started with web archiving. So, in the spirit of cohorts like KAIC, what we’re doing now at the Internet Archive is bringing together these stewards from across the country to strategize together, to learn from one another’s experience, and to bring the practice directly to their stakeholders--to educate their patrons and donors about this field so that they can participate with them in it. With generous funding from IMLS and the Kahle Austin Foundation, we started the Community Webs project in 2017. Community Webs brought together stewards from 27 public libraries across the country--here they are at their kickoff in San Francisco in November--libraries from the multi-dozen branch Queens to the little Patagonia Public Library, serving a community of 900 in Arizona. And what they’ve gotten since is customized training, which will soon be available to all as open educational resources, Archive-It accounts and all of the good data storage and direct technical support that those come with.
And at this point many of them are transitioning to their public outreach and stakeholder engagement phases--putting on public programs, workshops, or just working one-on-one with people in their community who own or can nominate important local history resources to be collected and preserved. In the coming weeks we’ll start to publish some of their individual stories on the Archive-It blog, so keep an eye on that one if you’re interested in emulating the model in your library or with a community of peers.

And Amy can speak in much more detail about this from the archivist’s perspective, but one thing that I love about the Community Webs project is that it prominently recenters the web archiving conversation on archivists working directly with community members in the selection, acquisition, and preservation of their creations and voices online, lest we ever focus too much on the technology’s capacity to just capture everything and sort through the ethical implications only later. Just one such effort in this realm that I want to highlight before we move on: the community web archive facilitated by Patrick Wallace and his team at Middlebury College, which employs students directly in the curation of their archive. Patrick wrote an absolute fire blog post on this for us called “Unauthorized voices in the archive” last year and I can’t recommend it highly enough if student presence online is now or is becoming part of your collecting scope.

Many hands and eyes: Campus collaboration through web archives

Amy Wickner NEA / ART Spring 2018 https://osf.io/jge85/

Today, I’ll discuss some examples of using web archives as a way to spark and strengthen collaboration around archival issues. Each example highlights a different way in which web archives and archiving extended my understanding of a core archival activity, as well as responsibilities that creators, subjects, and users of web archives have to one another.

creators, users, subjects … https://osf.io/ex6ny/

I gave a talk at Code4Lib last month about DIY web archiving and how that particular community of library and archives technologists have a stake in web archiving as creators, users, and subjects of web archives. This is not that talk. However, I do want to re-emphasize the trifecta of creators, users, and subjects to move away from an archivist-archived dichotomy and also to recognize that these are roles or personas that people, including archivists, take on at different times, often in overlapping ways.

Collaborating for appraisal

As a content warning, in this section I will mention several acts of hate and violence that have taken place on the University of Maryland, College Park, and other campuses. Last May 20, Richard Collins III, a senior at Bowie State University, an HBU in Bowie, Md., was stabbed and killed by a stranger while visiting friends at the University of Maryland, College Park campus. His killer, a College Park student, was charged with a hate crime last October based in part on digital evidence such as smartphone data and social media activity. At the time of the murder, which received a great deal of media coverage, he was reported to have participated in white supremacist Facebook groups. In the days following, UMD students began conversations on Twitter about Lt. Collins’ murder, but also numerous other incidents on campus throughout 2016 and 2017 in which members of the university community put racial, ethnic, and ableist slurs and hate symbols in public places. They talked about being made to feel unsafe and unwelcome through microaggressions by their peers, their instructors, administrators, and police of multiple jurisdictions. Some used the hashtag #feartheturtle to organize the Twitter dialogue around this topic. #feartheturtle is ordinarily used to rally around sports teams; our mascot is the terrapin. Consider the poignancy and pointedness of co-opting athletics boosterism to tell personal stories about the violence of institutionalized racism.

Image description: Archive-It capture of an article published in The Diamondback (University of Maryland student newspaper) titled, “UMPD is working with FBI to determine whether Saturday’s homicide was a hate crime.”

We in Special Collections and University Archives share a building with the Maryland Institute for Technology in the Humanities, or MITH, and the program for African American History, Culture, and Digital Humanities, or AADHUM. As this conversation emerged over #feartheturtle, Catherine Knight Steele, who directs AADHUM, was very interested in collecting those tweets to document what was happening. Ed Summers, a developer at MITH, set up a running collection of tweets that used #feartheturtle and/or mentioned Richard Collins. Capturing media coverage of the event was another interest for Catherine, who’s a communications scholar. So, to protect against link rot in the tweet dataset, and to approximate a kind of community appraisal, Ed shared with me a list of top URLs shared in those tweets, which I added and crawled as seeds in Archive-It and also recorded with WebRecorder.
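As a rough illustration of how a list of top shared URLs might be derived from a tweet collection, here is a minimal Python sketch. It is not a description of the tooling actually used at MITH: the tweet structure (a dictionary with a `urls` list) and the function name `top_shared_urls` are hypothetical, and the sample addresses are made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_shared_urls(tweets: Iterable[Dict], limit: int = 10) -> List[Tuple[str, int]]:
    """Count how many tweets shared each URL and return the most shared ones."""
    counts: Counter = Counter()
    for tweet in tweets:
        # Count a URL once per tweet, even if the tweet repeats it.
        for url in set(tweet.get("urls", [])):
            counts[url] += 1
    return counts.most_common(limit)


# Made-up sample data standing in for a collected tweet dataset.
sample_tweets = [
    {"text": "...", "urls": ["https://news.example.org/story-1"]},
    {"text": "...", "urls": ["https://news.example.org/story-1", "https://blog.example.org/post"]},
    {"text": "...", "urls": ["https://news.example.org/story-1"]},
    {"text": "...", "urls": ["https://blog.example.org/post"]},
    {"text": "...", "urls": ["https://paper.example.org/editorial"]},
    {"text": "...", "urls": []},
]

for url, shares in top_shared_urls(sample_tweets, limit=2):
    print(shares, url)
```

The resulting short list is the kind of thing that could then be reviewed by people and added as seeds, so that the counting informs appraisal without replacing it.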

Image description: Screenshot of a Google Sheet titled, “#feartheturtle.”

Conversations we are still having

What happens to the tweet data?

Who else is creating online documentation?

How can we connect these web archives with the University’s past, present, future?

Big questions remain about this material. What happens to the tweet data? The tweet IDs -- not the entire and extensive metadata of each tweet -- are stored privately at MITH and, given that they were collected without consent, a larger conversation about privacy and access is merited should Special Collections and University Archives, my department, one day take over stewardship.

The top links shared are almost entirely news sites of various kinds; but who else is creating web-based material that reflects or responds to these student concerns? As we think about approaching these creators to explain why we are interested in working with them to build web archives, I’m thinking in large part of our students, staff, faculty, and other community members of color; but should I also be looking to document online hate speech around these incidents? I certainly read the comments on news coverage and made sure to capture them. But would it be dishonest to take some kind of “both sides” approach, or to document expressions of hate as isolated incidents? These are not questions I can answer on my own.

Whatever comes of asking them, it’s going to mean reconfiguring our normal approach to web archives, which at Maryland is mostly kind of Jenkinsonian: if we have your records, we’ll crawl your website, no questions asked. Other repositories have more of an acquisitive approach, in line with the collecting tradition of American historical societies and libraries. In reflecting on the “rapid response” nature of some web archiving, Roger Christman at the Library of Virginia recently wrote about the need for a collaborative, public history approach to documenting events like the 2007 shooting at Virginia Tech. This means understanding that what we see as “history” or “events” are ongoing, everyday realities for the people impacted. It means incorporating multiple perspectives on what to document, including looking outside of department and institution for collaborators and ways to contextualize what has happened. Building these relationships, earning trust, and sharing appraisal power is going to mean a major reconsideration of how we appraise in general.

https://wayback.archive-it.org/2410/20120623040942/http://www.odi.umd.edu/divtimeline/index.html; http://cdm16064.contentdm.oclc.org/cdm/ref/collection/p266901coll7/id/2614

Going further, how can the web archives we have connect students with the University's past, present, and future? What we do right now is document official messaging. The administration did this nice, triumphalist kind of timeline in 2004 or 2005. In 2008, undergraduates in a UMD history class spent two semesters researching and writing a report on the role of slavery in the university's founding and growth. Interestingly, that’s not in our web archives. I also see our campus today struggling to help students of color feel safe -- not in a “no one will disagree with you” way but in a basic, fundamental, “you do not need to perpetually fear for your physical and mental well-being” way. Of course there is plenty of on that score, including by the very people charged with supporting diversity and inclusion on campus. If anyone in the room has read Sara Ahmed, for example, this will not surprise you. Could it be that, having done a certain amount of digging into the past, this campus now finds itself incapable of going further?

Image description: (left) Archive-It capture of a web page titled, “University of Maryland Diversity Timeline,” with sections titled “Timeline Overview,” “1856-1906,” “1907-1956,” and “1957-Present”; (right) cover page of Knowing Our History: African American Slavery & the University of Maryland.

http://wayback.archive-it.org/2410/20170808213340/https://umd.edu/umdreflects

Thinking now about how web archives could help bring such an investigation forward into the present. The past year has seen public fora, meetings, communications and miscommunications stemming from Lt. Collins’ murder and the campus community response. The President and University Senate appointed a task force with numerous subcommittees and a full slate of charges, one outcome of which has been a Hate-Bias Protocol for incident reporting and investigation. The website you see here, crawled last June, has since been rebranded to say We Are UMD, which is an interesting rhetorical shift.

Image description: Archive-It capture of a web page titled, “UMD Reflects.”

http://wayback.archive-it.org/2410/20170620152554/http://www.dbknews.com/protect-umd-demands/

The web archives we have cannot say for 100% sure how these administrative moves come to be. They mostly document official language and news media responses. However, we can do better at documenting a number of angles and perspectives on the process, including how the outputs of governance respond to community positions and vice versa. We can create a framework that supports examining and pushing at a variety of university discourses.

Image description: Archive-It capture of a web page titled, “64 Demands for New Programs, Resources and Initiatives.”

Collaborating for learning & teaching

Part of building such a framework is supporting the growth of responsible, critical archivists. I’ll next talk about two examples of learning & teaching with web archiving in the classroom. I was lucky enough to be introduced to Archive-It as a student in Dr. Kari Kraus’ Intro to Digital Humanities class. We read stories that prefigured emerging technologies, and played with the technologies by using them to compose reading responses. So for example we created a diorama of 3D-printed creatures in response to the Philip K. Dick stories, “The Preserving Machine” and “Pay for the Printer” …

Image description: (left) Photo of a 3D printer in action; (right) photo of a red and white 3D printed crayfish.

← https://wayback.archive-it.org/5556/20150407203651/https://fallingdust.com/promo/

https://wayback.archive-it.org/5556/20150407203552/https://fallingdust.com/ →

… and, for a final project, documented an alternate reality, citizen science game called DUST by crawling various aspects of its web presence. As with each of these exercises, learning with web archiving was a way to make the web strange (again), to poke and prod at how things are put together and with those findings develop a sense of how a website performs or should perform in a preserved state. These are the experiential, some might even say affective elements of appraisal that are really hard to put into words, much less a collection policy. Web archiving has made me think that a truly equitable profession or practice is one that opens experiences. We should all get to enjoy futzing with regular expressions if that’s what we want out of working in archives.

Image description: (left, back) Archive-It capture of the dynamic trailer for DUST, an alternate reality citizen science game; (right, front) Archive-It capture of the homepage for DUST.

Teaching with web archiving

Developing a documentation strategy

Phronesis (practical wisdom)

Technical constraints

Archival theory

I’m now in a position to be teaching with web archiving, this time in a class called “Documentation, collection, and appraisal of records.” I’m co-teaching with Dr. Ricky Punzalan at the UMD iSchool, and we have a class of 12 master’s students focusing on archives. For their penultimate assignment, we’re going to ask them to develop a compressed documentation strategy. They’ll identify a topic, place, or event; key stakeholder communities around that topic; and get a sense of who else might be web archiving on the issue. Based on those findings, which are meant to approximate and appraise the universe of records creators, they’ll identify potential websites, social media feeds, and other web-based material that could form part of a collection. They’ll winnow that list down further, test crawl and scope in Archive-It, and practice recording with WebRecorder, comparing results in terms of performance. The final products will be a web archive collection and written reflection.
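For anyone unsure what “scoping” a crawl means in practice, here is a minimal Python sketch of the general idea: deciding, for each URL a crawler discovers, whether it belongs in the collection. It is a simplified illustration and does not reproduce Archive-It’s actual scoping rules or syntax; the function `in_scope`, the seed host, and the block patterns are all hypothetical.

```python
import re
from typing import Iterable, Set
from urllib.parse import urlparse


def in_scope(url: str, seed_hosts: Set[str], block_patterns: Iterable[str]) -> bool:
    """Return True if a discovered URL should be captured.

    A URL is kept when its host is a seed host (or a subdomain of one)
    and it matches none of the blocking regular expressions.
    """
    host = urlparse(url).hostname or ""
    on_seed_host = any(host == seed or host.endswith("." + seed) for seed in seed_hosts)
    if not on_seed_host:
        return False
    return not any(re.search(pattern, url) for pattern in block_patterns)


# Hypothetical scope: one seed site, minus an endlessly paginated calendar.
seeds = {"example.org"}
blocks = [r"/calendar/"]

for candidate in [
    "https://www.example.org/news/2018/protest-coverage",
    "https://www.example.org/calendar/2018/04/10",
    "https://ads.example.com/banner",
]:
    print(in_scope(candidate, seeds, blocks), candidate)
```

Test crawling is largely a matter of seeing which URLs a rule set like this lets in or keeps out, and then adjusting it until the capture matches what was appraised as worth collecting.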

Unfortunately, I don’t yet have lessons learned to share, as the assignment goes live April 10. But I anticipate that, at various times during the four weeks they have with this assignment, students will likely find that their interactions with technology push appraisal a little bit out of the picture. It’s hard to learn a new technology at the same time that you’re learning new archival techniques like which of 20-plus ways to understand archival value seem appropriate for a given project, and how to be a good collaborator. It’s also hard to evaluate work that’s 90% new to you. What is quality assurance? What does quality even look like? I want our students to one day be able to see the fine line they walk between practiced judgement, technical constraints, and archival theory -- it just may not happen this semester.

Collaborating for outreach

In addition to teaching and learning with web archives, I’m looking at how they behave as potential vehicles for outreach. I’ll give two examples, one of which is so far successful, and one that definitely did not work out.

archivable websites, records management, sustainability https://www.lib.umd.edu/terps-publish

Terps Publish is a cross-departmental collaboration within the University Libraries, now in its second year. It was established by my colleague Kate Dohe in Digital Programs and Initiatives. Terps Publish is a spring event that includes a roundtable for student publishers on campus to compare notes and learn from one another, followed by a recruitment fair for anyone interested in promoting or joining a student publication. This year we’re looking ahead to try and keep this engagement going year-round. One idea is offering an incentive for students to produce content for a shared resource, like a UMD Student Publishers’ Guide or How to Run a Student Publication at Maryland. We’re noticing, in conversation with the students who are helping plan the event, that succession planning, continuity, and archiving are major hurdles for publications at any stage of development. As part of co-producing this guide with whichever students are interested, I’m working on an accessible introduction to web archiving, guide to building archivable websites, and some general advice for managing digital stuff. I’m also trying to gauge the utility of offering Web Archiving as a Service -- WAaaS -- and how to weigh the irritation that phrase causes me against its potential as a lever for introducing helpful organizational practices. Rather than a repository, could the University Archives be a partner in student publishing, the way Digital Programs & Initiatives already is?

Image description: Logo for Terps Publish, featuring a terrapin with eyeshade using a typewriter. https://twitter.com/HornbakeLibrary/status/965683390723870721; https://twitter.com/PrangeColl/status/966387182884806656

Next, I’d like to say a few words about the exhibit that wasn’t. My department recently hosted a pop-up museum about “activism centered around people of color,” in partnership with a campus-wide initiative called Rise Above -isms. If you’re not familiar, a pop-up museum lasts for a short period of time, maybe an afternoon, and asks visitors to bring an object, write a brief description, and stick around to see and discuss the other contributions. Rightly concerned that people would not find their way to the museum location, nor bring objects to share, my University Archives colleagues also developed a miniature exhibit of material from existing collections: publications by black student groups; photographs of campus protests in the 60s, 70s, 80s, and 90s; yearbooks; signs from the 2017 women’s march, things like that. I decided to see if we had comparable material in our web archives, which date back to 2009. Surely there had been campus protests in the intervening 9 years.

Image description: (left) Flyer promoting a “Pop Up Museum Celebrating Activism”; (right) photo of participants in the pop-up museum. https://wayback.archive-it.org/2410/20170814094121/http://wmuc.umd.edu/node/657

I did find evidence of numerous protests, mostly through articles in the student newspaper. People marched on the administration building to protest campus police use of pepper spray and night sticks, and in support of reinstating an assistant provost for equity and diversity. Members of Occupy Wall Street ate at our food co-op on their way from New York to DC. The rally promoted here was a response to police brutality in 2014.
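For anyone who wants to cite or sort captures like this one, the capture address itself carries useful context. The sketch below is a minimal illustration, assuming the pattern visible in the URL on this slide (a collection number, then a 14-digit capture timestamp, then the original address); the function name is hypothetical and nothing here is an official Archive-It tool.

```python
from datetime import datetime

# Assumed prefix, taken from the capture URL shown on the slide.
WAYBACK_PREFIX = "https://wayback.archive-it.org/"


def parse_archive_it_capture(capture_url):
    """Split a capture URL into (collection ID, capture time, original URL)."""
    if not capture_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not an Archive-It Wayback URL")
    # Remainder looks like <collection>/<timestamp>/<original URL>
    remainder = capture_url[len(WAYBACK_PREFIX):]
    collection, timestamp, original = remainder.split("/", 2)
    # Timestamps are assumed to be YYYYMMDDhhmmss
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return collection, captured_at, original


collection, captured_at, original = parse_archive_it_capture(
    "https://wayback.archive-it.org/2410/20170814094121/http://wmuc.umd.edu/node/657"
)
print(collection, captured_at.isoformat(), original)
```

Separating the capture date from the original address makes it easier to say when something was collected, which is a different question from when it was published.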

But right away I noticed how useless our web archives currently are for understanding university history, especially when you don’t know much about how the university or its websites are organized. Subjects, where assigned, exist at collection rather than seed or document level; search results are voluminous and hard to parse; we have no research guides; there’s no guarantee that anything is there to be found, nor warning that something is not there but should be. In web archives it may be more complicated than ever to understand the limits of what was collected, and the difference between something that was not collected versus never existed in the first place. The Wayback Machine banner helps a bit if you’re already familiar, directing users to all publicly available captures of a given URL. However, these user interface features don’t resolve the challenges facing, for example, a student looking for information about campus activism of the past, why it took place, and evidence of its outcomes. Should someone in the future want to know more about Richard Collins, #feartheturtle, and the Joint President/Senate Inclusion & Respect Task Force, what do we want to be able to tell them? And very little of this complexity seems practical or even responsible to try and build into a pop-up exhibit, yanked almost entirely out of context.

Image description: Archive-It capture of a WMUC College Park Radio web page titled, “HIPHOP YOGA LIVE! POLICE PROTEST! NONVIOLENT THRU DOPE MUSIC!”

Collaborating for access

So I think there is a real call for web archivists to partner with people who know landscapes and subjects pretty deeply, and this is where that trifecta of creators, users, and subjects becomes important once again. I want to end with two ideas that I feel specifically demand this kind of collaboration.

Critical web archives curriculum

Partners who …

● know the landscape / topic of documentation
● understand community needs around web archives
● will help think through ...

Who put this here? Why? Who did the work? What is not here? How could things be different?

First, let’s look for partners not just as selectors or contributors, but perhaps as co-developers of curricula for navigating web archives. Such curricula would need to address, for example, how web archives work; how they’re created; how to use them; where else they exist; and what’s not in them. We’ll need to find partners who understand both the superficial and deep needs with which different individuals or communities might approach web archives. What are they looking for? Do they care specifically about the web-ness of these archives? What additional knowledge, challenges, or expectations might they bring with them? The curriculum might also look at how it could all be done differently, to imagine web archives that meet your particular community’s needs.

Honest description for web archives

Jennifer Douglas, “Toward More Honest Description,” The American Archivist 79, no. 1 (Spring/Summer 2016): 26-55. doi:10.17723/0360-9081.79.1.26.

How are web archives shaped by ...

● the process of archiving?
● custodians & intermediaries?
● archivists?

How can description …

● help surface and connect web archives collections?
● reveal decision-making, mistakes, constraints, concealed contexts?

As defined by Jennifer Douglas, honest description critically assesses and reveals how archives are shaped by the process of archiving, by custodians and intermediaries, and by archivists. Description helps surface and link collections, but it can also make explicit the decision-making, mistakes, constraints, and other often concealed contexts of documentation. Honest description, well maintained, is needed to support ethical web archives.

Honest description anticipates and answers questions like “Who put this here and why.” It addresses gaps and the reasons they exist. It contextualizes captured websites archivally, which is to say, linked together and with documented provenance that includes the influence of processes and actors in stewardship. Honest description can exist at multiple levels, from document to seed to collection to web archives portal to workflows and manuals. It’s iterative, takes time and many hands and eyes. Something I’m working on now and hope to pilot locally is a workshop in which participants critically read, rework, and write web archives metadata, to collaboratively develop capacity and guidance for more honest description.

First: apologies for the title. I thought, and thought, and thought, and I so tried to be clever, but this is what I came up with.

I’m Samantha Abrams, and I am the Web Resources Collection Librarian -- or, a web archivist -- for Ivy Plus Libraries. Big thank you to both Karl and Amy for their fantastic presentations, and to each of you for joining us. We’re also incredibly grateful to all of those who equipped this room, and made this conference happen.

So, first up: What is Ivy Plus Libraries? Ivy Plus Libraries is a collaborative partnership best known, perhaps, for its resource-sharing initiative BorrowDirect, but the partnership also supports additional collaborative efforts, including: a collective collections tool (think: a shared catalog), progress toward collective collection analysis (what are we spending, and where), and a collective collections e-book pilot. Member libraries of Ivy Plus Libraries are also part of additional collaborations, including, but not limited to: HathiTrust, the International Internet Preservation Consortium, the Association of Research Libraries, ReCAP, and more. (Some of our member Libraries, like Chicago, say, belong to similar academic partnerships.)

Additional notes about this slide: I’ve highlighted the Massachusetts Institute of Technology and Stanford University because these organizations do not currently participate in the Ivy Plus Libraries Web Collecting Initiative. Selectors at these institutions did participate in the Pilot Program, and Selectors may choose to continue to submit websites and build collections with their colleagues, but it’s understood that these institutions are not formally recognized as participants in the Program. (For the record, to be recognized as an official Program within Ivy Plus Libraries, at least ten institutions must participate in your initiative.) The Web Collecting Program currently operates on a three-year schedule.

Before we continue: My perspective is certainly privileged, and I am indeed here speaking about work done by and at thirteen incredibly well-resourced institutions. To present our work as the one-and-done solution for all libraries around the globe would be, at best, disingenuous. I will say that, from my perspective, web archiving might be one of those areas with difficulties and growing pains across the board, and we’ve done our best to tackle those issues through collaboration. I hope that what’s at the heart of our strategy — our ideas and our workflows, for starters — will be of use to, and make sense for, institutions facing similar obstacles.

The Web Collecting Program: The Web Collecting Program was first established as a Mellon-funded project in 2013, which led to two pilot collections (which I’ll introduce in a moment), and the hiring of a full-time Web Resources Collection Librarian (/waves), and a part-time Bibliographic Assistant. When considering the Program, I encourage participants to remember three basic things: (1) that this is a collaborative effort; the Program is designed to reach across institutional lines and bolster collaborative collection building and thinking; (2) these are thematic collections, not institutional collections (so we’re not collecting .edu content); and (3) that content should be freely available, which means not hidden behind paywalls or logins. Overall, the program seeks original, substantial content.

The Web Resources Collection Librarian: The Ivy Plus Libraries Web Resources Collection Librarian works closely with Ivy Plus Libraries stakeholders (buzzword!) to help coordinate the collaborative web collection development Program, while also performing much of the work of building the shared collections, including managing permission requests, harvesting / crawling, quality assurance, description and organization, assessment, and outreach for public use. I started with Ivy Plus Libraries in May of this year; this is a three-year term position.

I am, thankfully, supported by a Bibliographic Assistant named Jean Park, who helps with a lot of quality assurance. (Side note: despite my long list of duties, quality assurance takes up a large chunk of my day-to-day work. Jean and I don’t necessarily do this collaboratively, though we have considered it, so I won’t discuss our workflow in this presentation. I would, however, be happy to answer a question about quality assurance later on.)

So, the pilot collections! I introduce these to illustrate the breadth of the Program. The subject experts upon whom the Program relies vary across the spectrum, and that means our collections vary from one subject to another. One of our very first collections was --

The Contemporary Composers Web Archive: a collection developed by the Ivy Plus Libraries Music Librarians, which serves as an extension of an existing Ivy Plus Libraries collaborative collection development agreement which identified approximately 1,800 globally based composers of sufficient importance to have their published printed works collected at a comprehensive level by at least one participating institution. The Collection started with fifty-one seeds, and now sits at about 650 public seeds, and will top out at around 1,000 seeds (not all identified composers have a web presence, for various reasons).

The Collection features a large number of multimedia items, including (but not limited to): YouTube and Vimeo videos; photo galleries; SoundCloud players; downloadable scores, biographies, music samples, and tons of proprietary software (!!!). This greatly contributes to the Collection’s large size.

The Collaborative Architecture, Urbanism, and Sustainability Web Archive: built by the Ivy Plus Libraries Art & Architecture Librarians. Unlike the Composers collection, it is nomination-based and continues to evolve as Selectors continue to nominate seeds. The Collection is less media-based, and consists of more organizational pages. The most common ‘media’ found in the Collection comes in the form of downloadable PDFs and YouTube videos.

Topics covered in this collection include: land conservation, architecture, historic preservation, neighborhood associations, open spaces, climate change, resilient cities, etc.

Our first non-pilot collection!

The Global Webcomics Web Archive: The Global Webcomics Web Archive aims to preserve webcomics and websites belonging to webcomic creators throughout the world in order to assure the continuing availability of these important, and potentially ephemeral, documents for use by researchers and scholars. This initiative intends to preserve webcomics and websites in a wide variety of styles, subjects and themes, in many different languages, created by a diverse group of artists.

We have several additional collections in progress, on topics including: hyper-local politics, banking statistics, Latin American Contemporary Art, Geoscience publications, and more. Our Selectors, who I’ll talk about more in a moment, have really embraced this work, and the content of each collection shines because of it.

So, how it all works.

First, I work with — and talk to — Selectors. (At your institution, these individuals may be called subject specialists, bibliographers, curators, etc.) When I talk with Selectors many of them feel overwhelmed by the prospect of participating in this Project, and often think that I’m asking them to collect the entire Internet (which is not true). Instead, I encourage them to focus on:

● Existing collection policies. What do they already select? How might those subjects, themes, mandates, etc., be enhanced by consulting the live web? And which of those subjects already exist exclusively on the live web?

● Existing Ivy Plus Libraries efforts. Do Selectors participate in any physical or born-digital collaborative collecting efforts? Would it make sense to create a web collection that mirrors these efforts?

● Gaps in collecting. What are we not collecting because it doesn’t exist in print — period? What about organizations and people who cannot afford to disseminate information in print, via vendors, etc.? Where on the live web does this information exist?

● Researchers and literature. Do researchers and students ask Selectors questions that lead them to the live web? Are researchers and students consulting reports, statistics, and publications that exist on organizational websites? If these websites did not exist, where would students and researchers find this information?

● Social media. Is there an anniversary, event, social movement, discovery, breaking news event, etc., that has an associated #hashtag? When you read through that #hashtag, do you notice — and follow — links mentioned within? This may be a good place to start a collection.

A point I really want to drive home here today is that this doesn’t have to be complicated: At the suggestion of a colleague from Brown, I created a shared spreadsheet that Selectors can use to brainstorm ideas. A problem often vocalized by people across Ivy Plus Libraries is their desire to participate in the program, but their limited reach within the partnership. Some librarians within Ivy Plus Libraries fit into existing groups (the Ivy Plus Libraries Music Librarians and the Ivy Plus Libraries Art & Architecture Librarians come to mind), but others don’t. And the ones who don’t felt intimidated by the prospect of a cold email to a colleague they’ve never met. This spreadsheet enables conversation between those who have expressed interest, and eases the burden of reaching out to a random colleague. For most Selectors, this is where their involvement in the Program begins.

And then a Google Form: Again, this doesn’t have to be complicated. Once Selectors are confident in their ideas, they’re invited to submit a formal collection proposal. Formal, of course, is a heavy word: because we want to keep the barrier to entry low, we ask Selectors to submit some basic information about their idea. Things like: What’s your name? and How can we get ahold of you? We also ask Selectors to describe the scope of their collection and provide a brief rationale behind their idea. The rationale helps us consider and prioritize which collections to build first: it’s true that the entire web is ephemeral, but some sites, perhaps those political in nature, may be more at risk than others.

Once Selectors submit their nominations: I get an email, and I pass the nomination on to what’s called the Web Advisory Committee. I report up to the Web Advisory Committee, which reviews proposals. The Committee keeps specifics to a minimum, and instead focuses on the expertise of Selectors within Ivy Plus Libraries; those who understand a specific collecting area are the strength of this program. Once a collection is approved by the Web Advisory Committee, it’s sent up to the Collection Development Group. The Collection Development Group represents all thirteen Ivy Plus Libraries, and members of the Group have final say about all new collections. What, admittedly, can feel bureaucratic has helped our Program: Ivy Plus Libraries works with a limited Archive-It budget (1.5 TB), and the Collection Development Group helps us prioritize collecting.

Then, we build! Clearly, and for better or worse, we do everything in Google.

Selectors drop seeds directly into our shared spreadsheet and, for all new collections, also help create metadata. Which is huge! Metadata creation was not written into the original Program charge, but because we want people to use and access our collections, we create it anyway. (More on that in a second.) Sheets allow Selectors to create and define metadata fields, and no two collections are the same. This is vital, too, because I am not a subject specialist and while I can do some basic metadata, I often need help identifying, say, languages, countries of publication, and so on. (An extra bonus: Archive-It allows me to upload this spreadsheet directly into our collections, which means I don’t have to create the same metadata twice.)

We create an individual record for every seed Ivy Plus Libraries crawls: Every seed gets (Dublin Core) metadata in Archive-It, and every seed gets a MARC record, which is uploaded to WorldCat and can be downloaded by participating Libraries. Archive-It metadata is sometimes created collaboratively (as seen in our Webcomics collection), but often created solely by the Web Resources Collection Librarian. MARC records are created solely by the Web Resources Collection Librarian (with additional help provided occasionally by Columbia staff). In the coming years, the Program hopes to distribute this work across the partnership, though we have yet to figure out how that might work.
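Because the same shared spreadsheet feeds the upload, a quick check before uploading can catch gaps that are tedious to fix later. The following is only a sketch of that idea, not our actual workflow: the column names (`url`, `title`) and both function names are hypothetical, and it assumes the sheet has been exported as CSV.

```python
import csv
import io

# Hypothetical required columns; in practice Selectors define fields per collection.
REQUIRED_FIELDS = ("url", "title")


def load_seed_rows(csv_text):
    """Parse CSV text into dicts with lowercased headers and trimmed values."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {
            key.strip().lower(): (value or "").strip()
            for key, value in raw.items()
            if key is not None  # ignore cells beyond the header row
        }
        if any(row.values()):  # skip rows that are entirely empty
            rows.append(row)
    return rows


def find_problems(rows):
    """Return (seed_number, message) pairs for missing fields and duplicate URLs."""
    problems = []
    seen_urls = set()
    for number, row in enumerate(rows, start=1):
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                problems.append((number, f"missing {field}"))
        url = row.get("url")
        if url:
            if url in seen_urls:
                problems.append((number, f"duplicate url: {url}"))
            seen_urls.add(url)
    return problems
```

Running `find_problems(load_seed_rows(exported_text))` on an export returns a list that can be pasted back to Selectors, which keeps the fixing in the spreadsheet where they already work.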

(Now, how, and in which format, to catalog websites is up for debate.)

Every program has its challenges!

I’ve been fortunate enough to work with individuals willing to embrace the challenges of web archiving head on. Of course, when one problem is solved, another is created -- this is, perhaps, the truest thing I know about libraries. For one: I think that Ivy Plus Libraries does a lot well, but because I am centrally located at Columbia, we do grapple with the fact that Columbia takes a lot more on than other Ivy Plus Libraries do. There are Human Resources considerations, paychecks, days off, snow days, etc. My supervisor also works at Columbia, even though he’s not dedicated to the project outside of that role. This is all worth considering when thinking about a project of your own.

Archive-It, of course, provides storage for our collections, but: Ivy Plus Libraries is interested in a second method.

We’re not sure what this would look like, and it almost becomes another question about division of labor. With digital preservation, there are costs and burdens to consider -- for instance, which institution signs the contract with whichever service we select?

And, finally, we don’t move fast.

This is, perhaps, our greatest obstacle. Due to the consideration paid to each new collection — a process that sometimes takes months — we can’t react to, say, breaking news, which is a lot of what makes up web archiving. But we’ve tried to embrace other options: for instance, can we teach Selectors to use tools like Webrecorder to capture fast-breaking news? Or, can we teach Selectors to branch out even further — to reach out to colleagues like those at the Internet Archive or Rhizome, or those with the ability to move a bit faster, and collaborate even beyond the walls of Ivy Plus Libraries?