Managing sensitive data and authorship in Humanities and Social Sciences

Louise Corti, Collections Development and Producer Support

ODIN conference, Cologne, October 2013

Overview

• Introducing the UK Data Service

• Our data portfolio and users

• Citation, impact measurement and DOIs

• Challenges for citation

The UK Data Archive

• Based at the University of Essex, since 1967

• 45 years of selecting, ingesting, curating and providing access to social science data

• The Archive is designated as a Place of Deposit by The National Archives

• Data and data support services for higher and further education for research, teaching and learning

• Recently attained the highest information security standard, ISO 27001

SISTER DATA ARCHIVES

• Council of European Social Science Data Archives (CESSDA)

• ICPSR (USA): Inter-University Consortium for Political and Social Research

• ADA: Australian Social Science Data Archive

What is the UK Data Service?

• Comprehensive data resource funded by the UK Economic and Social Research Council

• Single virtual point of access to a wide range of secondary data for social science research (Directed from Essex)

• Offer promotion, support, training and guidance

What does the UK Data Service do?

• Put together a collection of the most valuable data
• Preserve data for the long term for future research purposes
• Make the data and documentation available for reuse
• Provide data management advice for data creators
• Provide training and support for users of the service
• Bring together owners, producers and users
• Demonstrate impact through evidence of usage
• Easy access through the website: ukdataservice.ac.uk

Who is our service for?

• Data for secondary analysis, research, policy making
• Teaching and learning

• Academic researchers and students
• Government analysts
• Charities and foundations
• Business consultants
• Independent research centres
• Think tanks

Our data portfolio

• Over 6,000 datasets in the collection
• 230 new datasets added each year

• Official agencies, mainly central government
• International statistical time series
• Individual academics' research grants
• Market research agencies
• Public records/historical sources
• Access to international data via links with other data archives worldwide

UK survey series

• High-quality repeated cross-sectional surveys
• Individual or household level data
• Cover many topics including health, work, crime, social attitudes, family expenditure, living costs, housing etc.

• British Crime Survey
• Health Survey for England
• British Social Attitudes
• Annual Population Survey …

Cross-national surveys and macro databanks

• Eurobarometers

• European Social Survey

• European Values Survey

• International Social Survey Programme
• Time series data aggregated to country/region
• International governmental organisations (IMF, OECD, IEA)

Longitudinal studies

• British Household Panel Survey and Understanding Society
• Understanding Society (2009-)
• English Longitudinal Study of Ageing
• Families and Children Study
• Growing Up in Scotland
• Longitudinal Study of Young People in England

UK census data

• 1971-2011 census data
• Baseline for other statistics
• Detailed combinations of characteristics
• Small geographies
• Census outputs
• Aggregate data
• Boundary data
• Flow data
• Microdata

Business data

• Collected through a wide range of surveys and administrative sources:

• Productivity, innovation, workforce skills, earnings
• International trade, foreign direct investment
• Research and development
• Business demography
• Industrial relations

Qualitative data

• Interviews, focus groups
• Essays, diaries, open-ended survey questions
• Observations, case notes etc.

• Family Life and Work Experience before 1918, Middle and Upper Class Families in the Early 20th Century, 1870-1977
• Gender Difference, Anxiety and the Fear of Crime, 1995
• Mothers Alone: Poverty and the Fatherless Family, 1955-1966

Usage of data

• Operate a spectrum of access
• Web download under End User Licence
• Permission only via Special Licence access
• ‘Approved researcher’ access via remote secure access

• Over 22,000 registered users
• Approximately 60,000 downloads worldwide p.a.
• 3,000+ user support queries

• End User Licence includes:
• Appropriate data usage
• Full citation of data and informing us of re-use
• We have always provided a citation format

Evidence of access and re-use

User access information
• Collect user information and ‘projects’ upon registration
• Collate data and documentation download statistics
• Users can share project information for others to see
• Report data access stats on demand

Usage information
• Email all users every 6 months after registration about activity
• Manually add all research output references to the data record
• Reporting rate of publications is poor!
• Prior to DOIs, we scanned the citation literature for dataset mentions: very manual and unreliable, and datasets were poorly cited

Impactful case studies of use

• Identify and seek out case studies of re-use: research or teaching
• Very successful!

• 125 case studies in our database
• Can help provide impact stories for data owners/producers and users
• And can inspire others!
• Some are harvested by ESRC for their website
• Often include ongoing work: no need to wait for publications

Our persistent identifiers approach

• Our data collections are not digital objects

• Need to capture changes made to data
• Versioning data in a commonly understood manner
• Needed a rule-based definition of a ‘significant’ change

• Integrate processes with digital preservation activities & work flows

• In 2011 we assigned DataCite DOIs to all of our collections
• Mint and update DOIs with our metadata management infrastructure
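
Once each collection carries a DOI, a citation can end with a link that resolves through doi.org. The sketch below is illustrative only: the element order (creator, year, title, publisher, identifier) follows the general pattern recommended by DataCite and is not necessarily the Archive's own citation format, and the collection details and the DOI are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCollection:
    """Minimal citation metadata for a data collection (illustrative fields)."""
    creator: str
    year: int
    title: str
    publisher: str
    doi: str  # hypothetical DOI, not a real identifier


def format_citation(collection: DataCollection) -> str:
    """Build a citation string that ends with a resolvable DOI link."""
    return (
        f"{collection.creator} ({collection.year}): {collection.title}. "
        f"{collection.publisher}. https://doi.org/{collection.doi}"
    )


# All values below are made up for the example.
example = DataCollection(
    creator="Example Survey Team",
    year=2013,
    title="Example Household Survey, 2011",
    publisher="Example Data Archive",
    doi="10.1234/example-0001-1",
)
citation = format_citation(example)
print(citation)
```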

Recording significant change

• Approx. 15% of UKDA data collections are altered within the first year after first publication

• We distinguish between major and minor changes to a data collection: high impact vs. low impact

• DOI allocated to a metadata instance of a data collection
• DOIs resolve to a jump page pointing to all external instances
• New DOI = high-impact change, with explicit logging

• Access provided only to the most up-to-date version of the data

Major changes – high impact

• New variable added
• New labels/value codes added
• Weighting variables reconstructed
• Wrong data supplied (e.g., March not April)
• Mis-coded data (e.g., Don’t know/Refused confused)
• Change in format (file migration)
• Significant changes in documentation
• Change in access conditions
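
A rule-based definition of this kind can be sketched in a few lines. The example below is a hedged illustration, not the Archive's implementation: the high-impact categories are taken from the list above, while the `TYPO_FIX` category, the function names and the version-suffix DOI scheme (`...-1`, `...-2`) are assumptions made for the example.

```python
from enum import Enum
from typing import Iterable


class Change(Enum):
    # High-impact changes, as listed on the slide above
    NEW_VARIABLE = "new variable added"
    NEW_LABELS = "new labels/value codes added"
    WEIGHTS_RECONSTRUCTED = "weighting variables reconstructed"
    WRONG_DATA = "wrong data supplied"
    MISCODED_DATA = "mis-coded data"
    FORMAT_CHANGE = "change in format (file migration)"
    DOCUMENTATION_CHANGE = "significant changes in documentation"
    ACCESS_CHANGE = "change in access conditions"
    # Hypothetical low-impact change, added only for illustration
    TYPO_FIX = "typo corrected in a label"


HIGH_IMPACT = frozenset(Change) - {Change.TYPO_FIX}


def is_high_impact(changes: Iterable[Change]) -> bool:
    """A set of changes is high impact if any single change is."""
    return any(change in HIGH_IMPACT for change in changes)


def next_doi(current_doi: str, changes: Iterable[Change]) -> str:
    """Return a new DOI for high-impact changes, else keep the current one.

    Assumes the DOI ends in '-<integer version>'; this suffix scheme is
    an assumption of the sketch.
    """
    if not is_high_impact(changes):
        return current_doi
    base, _, version = current_doi.rpartition("-")
    return f"{base}-{int(version) + 1}"


doi = "10.1234/example-0001-1"  # hypothetical DOI
print(next_doi(doi, [Change.TYPO_FIX]))                       # low impact
print(next_doi(doi, [Change.TYPO_FIX, Change.NEW_VARIABLE]))  # high impact
```

Keeping the rule as a fixed set of named categories makes each decision easy to log explicitly alongside the DOI that results from it.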

Raising awareness in the social sciences

• ESRC funding for a short-term project on citation
• Advocacy for best practice in citing research data
• Audiences: professional organisations; academic publishers and journal editors; researchers and postgraduates
• Key activities: data citation principles for social sciences; personal communications; events with BL DataCite, JISC and the wider PI community; outreach through Doctoral Training Centres

Demonstrating impact with citation

• Assuming better use of DOIs…

• Starting to search for use of our DOIs – Google

• Automate this process and compile reports; promote

• Gather data citation statistics from the Thomson Reuters Data Citation Index. We are one of the first 20 feeder repositories, but our own access is limited!

• Work with BL DataCite and ODIN to gain connectivity between identifiers & outputs – early adopters

CHALLENGES FOR THE FUTURE

• Citing parts (fragments) of data collections: single files, subsets of quantitative data, extracts of textual data

• ESRC project Digital Futures will enable extract-level citation within a web-based browsing system
• Using rich, highly structured XML metadata
• GUIDs for everything

UK QualiBank: resolving citation objects

• Will enable extract-level citation

• Citation object and citation format created on the fly – using GUIDs and URIs

• URI resolves directly to the data extract

• Some more sensitive collections will be closed, so cannot resolve to data

• As yet uncertain of the relationship to our collection-level DOIs

CONTACT

UK Data Service
University of Essex
Wivenhoe Park
Colchester
Essex CO4 3SQ

T +44 (0)1206 872001
E [email protected]