<<

Internet 20 years / 20 petabytes Archive Digital Storage

● Lots of hard drives ● 56.36 petabytes of disk space ● 21.93 petabytes of archival content ● Stored redundantly in different physical locations DIY storage cluster ● Archival on-disk format ● Efficient (cost + energy) Upload

● 8M Texts ● 2.4M Audio recordings ● 140K Concerts ● 1.9M ● 100K Software items ● … and your stuff! Upload

● S3-compatible API ● API ● `pip install internetarchive` ● ://archive.org/help/abouts3.txt Upload

Using IA S3 to back distributed art projects Physical Storage Upload

● Started in 1996 with Alexa crawls ● Updated in realtime at 2000 urls/sec ● 475,951,398,000 captures ● 10.5 petabytes of WARC data ● Intl Internet Preservation Consortium ● Open source on .com/internetarchive Realtime Wayback

Deleted within two hours: Realtime Wayback - Save Page Now Save Page Now Wayback Under the Hood

● 10.5 petabytes of WARC data ● 2nd Level CDX Index: 20TB compressed ● 1st level: 13GB. Fits in core. Wayback APIs

● Availability and CDX APIs: build your own tools on top of Wayback ● Open source tools to read/write WARCs

● WARC Support in : wget --warc-file

● 2.5 million books with Scribe bookscanner ● 1000 books/day ● 35 scanning centers on 5 continents ● 800 million images ● Open-sourced gphoto Canon 1D/5D drivers Books Books Books Books Television Television

● Archive-It ● Foundation Housing ● OPDS ● Audio Books ● crawler ● ATM ● IIPC ● seeds ● JSMESS emulator ● Petabox ● Bookmobile ● Lending Library ● Physical Archive ● Reader ● Listening Rooms ● Reader Privacy ● ● Live Archive ● Researcher Access ● CD archiving ● LP ● S3-like API ● Community Wireless ● Metadata API ● Scribe and TTScribe ● Credit Union ● Microfilm scanning ● Software preservation ● DAISY / Print- ● Microfiche ● Television Archive Disabled ● Netlabels ● V2 site ● Distributed Web ● Newspaper digitization ● VHS digitization ● Film preservation ● Wayback Machine Questions?

● developers.archive.org ● we love volunteers! ● we are hiring engineers! ● archive.org/jobs