Data Stewardship and the Decentralized Web

DANIELLE ROBINSON, PhD
Co-Executive Director at Code for Science & Society
@daniellecrobins @codeforsociety

Code for Science & Society
Supporting open source in the public interest

Code for Science & Society
Civic tech + Scholarly research + New media + Open source + Equity, support, inclusion = CS&S community

Sharing experiences
Bringing:
- Knowledge of decentralized computing, data collection & management
Seeking:
- Better understanding of needs, challenges of your community

What is the future of data stewardship?
- Bringing together leaders, stakeholders
- Design a cooperative data preservation network
- Push for ‘FAIR’ and save libraries money
(Adam Brock)

1. Data on the web
2. A new model of data stewardship
3. Prototyping decentralized preservation
4. Reimagine data on the web

1. Data on the web

Across domains, data live online
- Early work of a writer
- Government data
- Newspaper archives
- Your family photos
- Scientific data

Data transparency: Inconsistent practices across domains

Many data publishing options
https://www.ohsu.edu/xd/education/library/data/share-and-archive/index.cfm

Siloed info, centralized gatekeepers control access
(Doc Searls)

https://imgflip.com/memegenerator/Picard-Wtf

Distributed beginnings
http://som.csudh.edu/fac/lpress/history/arpamaps/

(Clark Boyd)
Web centralization
It’s easier to manage and monetize a silo
Image courtesy of Beaker Browser

“We embed values into our technology whether we are aware of it or not”
- Stephen Whitmore (@noffle), Digital Democracy
See also the work of Safiya Noble
https://blog.datproject.org/2018/03/05/css-community-call-03-2018/

In the centralized web
- We trust the server to locate, not change objects
- Silos are the natural state
- Data may be in multiple silos

Today’s web relies upon
- URLs to identify location of objects
- Ability to change information without changing location
- Aggregating content for discovery

Today’s web lacks
- Persistent identifiers
- Transparent change log
- Links between silos

“The internet is a terribly unstable way to keep information available”
- Laurie Allen, Penn Libraries' Assistant Director for Digital Scholarship

“Federal data ≅ website”
https://www1.ncdc.noaa.gov/pub/data/

Why are federal data ≅ webpages?
To find an object online:
1. Discover the link
2. Link still works
3. Trust the info at the link
https://www1.ncdc.noaa.gov/pub/data/annualreports
https://www.slideshare.net/shefw/save-the-data-the-role-of-librarians-in-datarescue-collaborations

Link rot: When links fail
Content drift: When referenced content is changed
Link rot + content drift = Reference rot
M. Klein, several papers and talks, links at end

The Internet is broken and we are using it to access and distribute all of human knowledge ¯\_(ツ)_/¯

The web is being reimagined
it's all about Rock (:

What’s important to you?
(romana klee)

2. A new model of data stewardship

Preservation starts here

“Sharing [and preserving] research data is not well understood, incentivized, or accessible”
- Daniella Lowenberg, Research Data Specialist, Product Manager of @uc3dash, California Digital Library
https://medium.com/@UC3CDL/we-are-talking-loudly-and-no-one-is-listening-a108248693f7 / csv

screenshot from https://peerj.com/preprints/2588/
Preservation requires custody
(seagen)

Centralized model requires custody to provide access
Image courtesy of Beaker Browser

Web accessible objects

Via Agency
Is custody required?
(#WOCinTech Chat)

“Preservation in place… Bring preservation services to the content”
- Stephen Abrams, Preservation without Possession, California Digital Library
https://figshare.com/articles/Preservation_without_possession_Content-addressable_identifiers_for_post-custodial_preservation/5844369

Cooperative of trusted entities
Sharing data and costs
Image courtesy of Beaker Browser

SangyaPundir / www.force11.org/group/fairgroup/fairprinciples

Leverage existing infrastructure
www.force11.org/group/fairgroup/fairprinciples

Visions are nice!
(Peter Miller)

Now let’s get real
(vladeb)

3. Prototyping decentralized preservation

Multiple decentralized approaches
- Blockchain
- Peer-to-peer
BTC Keychain / Danilo / http://www.ala.org/tools/future/trends/blockchain / https://gist.github.com/mafintosh/bd9e6d350ebf02441c9707c5f799d05b

Centralized “hub and spoke” model
Data stored at central location, accessed by independent users
Image courtesy of Beaker Browser

Decentralized models
Data persistently identified, networked ability to scale
Image courtesy of Beaker Browser

Peer-to-peer public technology
https://github.com/mafintosh/bws-2017

What’s Dat?
Persistent identifiers + Network of peers
https://github.com/datproject/docs/blob/master/papers/dat-paper.pdf

Dat + scholarly data =
- Automate preservation, versioning
- Find data across storage locations
- Spread cost burden across network
- Foundational links between silos

Reimagine data preservation
(俍宏葉)

It’s all about TRUST
Image courtesy of Beaker Browser

… and I trust LIBRARIES
Image courtesy of Beaker Browser

Building a prototype
(Eran Sandler)

Start with data creation
Dr. Dannise V. Ruiz-Ramos describes sea star genome annotation pipeline

Dat in the Lab lessons:
- Leverage existing workflows
- Automate data versioning, preservation
- Link researchers to library
Now linking libraries to each other
https://blog.datproject.org/tag/science/

Prototype: CDL - IA - SDSC
- CDL’s DASH corpus (<5 TB)
- Copied to IA and SDSC
- Deal with technical hurdles (S3)
- Next: Monitoring dynamic information

Every institution contributes
- Storage, bandwidth
- Metadata on their collection
- Commitment to preserve their collection to the network

Any user can access
- Information on library collections
- History of objects
- Whole or partial data sets from the network

4. Reimagine data on the web

What’s important to you?
www.liveoncelivewild.com

Discussion:
● What are the data types that your organization is responsible for?
● How are those data created, stored, used? When do they come to you?
● Who interacts with data? How do they interact with it?
● How are equity, justice addressed (or not) in data stewardship plans?
● What are your concerns around long term preservation of data?

Cool project alert!
The Data to Policy Project (D2P) is an initiative to engage students with their community’s needs through course-based assignments, which culminate in data-driven policy proposals to local governments and agencies.
https://library.auraria.edu/d2pproject/about

Thank you to the Western States Government Information Conference Planning Committee

DANIELLE ROBINSON, PhD
Co-Executive Director at Code for Science & Society
@daniellecrobins @codeforsociety
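
Sketch: link rot, content drift, and content-derived identifiers

The deck contrasts today's location-based URLs with persistent identifiers and defines reference rot as link rot plus content drift. The following is a minimal Python sketch of that idea, assuming an identifier derived from a SHA-256 hash of the bytes. The function names, the in-memory `web` dictionary, and the example.org URL are all hypothetical and for illustration only; the sketch does not attempt to reproduce Dat's own addressing scheme.

```python
import hashlib
from typing import Dict, Optional


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves, not from where they live."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def check_reference(recorded_id: str, fetched: Optional[bytes]) -> str:
    """Classify a reference given what its location returns today."""
    if fetched is None:
        return "link rot"        # the location no longer resolves
    if content_id(fetched) != recorded_id:
        return "content drift"   # same location, different bytes
    return "ok"


# Hypothetical stand-in for the web: location -> bytes currently served there.
url = "https://example.org/data/report.csv"
web: Dict[str, bytes] = {url: b"year,value\n2017,42\n"}

# A citation records the content-derived identifier at the time of reference.
recorded = content_id(web[url])
status_unchanged = check_reference(recorded, web.get(url))

# The publisher edits the file in place without changing its location.
web[url] = b"year,value\n2017,43\n"
status_changed = check_reference(recorded, web.get(url))

# The publisher removes the file entirely.
del web[url]
status_missing = check_reference(recorded, web.get(url))

print(status_unchanged, status_changed, status_missing)
```

Because the identifier depends only on the bytes, any holder of a copy can recompute it and confirm that the copy matches what was originally cited, regardless of which silo serves it.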
