Managing sensitive data and authorship in Humanities and Social Sciences
Louise Corti Collections Development and Producer Support
ODIN conference, Cologne, October 2013
Overview
• Introducing the UK Data Service
• Our data portfolio and users
• Citation, impact measurement and DOIs
• Challenges for social science citation
The UK Data Archive
• Based at the University of Essex, since 1967
• 45 years of selecting, ingesting, curating and providing access to social science data
• Designated as a Place of Deposit by The National Archives
• Data and data support services for higher and further education for research, teaching and learning
• Recently attained certification to ISO 27001, the international standard for information security management
[Photos: University of Essex; The Archive]
SISTER DATA ARCHIVES
• Council of European Social Science Data Archives (CESSDA)
• Inter-University Consortium for Political and Social Research (ICPSR), USA
• Australian Social Science Data Archive (ADA)
What is the UK Data Service?
• Comprehensive data resource funded by the UK Economic and Social Research Council
• Single virtual point of access to a wide range of secondary data for social science research (Directed from Essex)
• Offer promotion, support, training and guidance
What does the UK Data Service do?
• Put together a collection of the most valuable data
• Preserve data for the long term for future research purposes
• Make the data and documentation available for reuse
• Provide data management advice for data creators
• Provide training and support for users of the service
• Bring together owners, producers and users
• Demonstrate impact through evidence of usage
• Easy access through the website: ukdataservice.ac.uk
Who is our service for?
• Data for secondary analysis, research, policy making
• Teaching and learning
• Academic researchers and students
• Government analysts
• Charities and foundations
• Business consultants
• Independent research centres
• Think tanks
Our data portfolio
• Over 6,000 datasets in the collection
• 230 new datasets added each year
• Official agencies, mainly central government
• International statistical time series
• Individual academics' research grants
• Market research agencies
• Public records/historical sources
• Access to international data via links with other data archives worldwide
UK survey series
• High quality repeated cross-sectional surveys
• Individual or household level data
• Cover many topics including health, work, crime, social attitudes, family expenditure, living costs, housing etc.
• Labour Force Survey
• British Crime Survey
• Health Survey for England
• British Social Attitudes
• Annual Population Survey …
Cross-national surveys and macro databanks
• Eurobarometers
• European Social Survey
• European Values Study
• International Social Survey Programme
• Time series data aggregated to country/region
• International governmental organisations (IMF, OECD, IEA, World Bank)
Longitudinal studies
• British Household Panel Survey
• Understanding Society (2009-)
• English Longitudinal Study of Ageing
• Families and Children Study
• Growing Up in Scotland
• Longitudinal Study of Young People in England
UK census data
• 1971-2011 census data
• Baseline for other statistics
• Detailed combinations of characteristics
• Small geographies
• Census outputs:
  • Aggregate data
  • Boundary data
  • Flow data
  • Microdata
Business data
• Collected through a wide range of surveys and administrative sources:
  • Productivity, innovation, workforce skills, earnings
  • International trade, foreign direct investment
  • Research and development
  • Business demography
  • Industrial relations
Qualitative data
• Interviews, focus groups
• Essays, diaries, open-ended survey questions
• Observations, case notes etc.
• Family Life and Work Experience before 1918, Middle and Upper Class Families in the Early 20th Century, 1870-1977
• Gender Difference, Anxiety and the Fear of Crime, 1995
• Mothers Alone: Poverty and the Fatherless Family, 1955-1966
Usage of data
• Operate a spectrum of access:
  • Web download under End User Licence
  • Permission only via Special Licence access
  • ‘Approved researcher’ access via remote secure access
• Over 22,000 registered users
• Approximately 60,000 downloads worldwide p.a.
• 3,000+ user support queries
• End User Licence includes:
  • Appropriate data usage
  • Full citation of data and informing us of re-use
• Have always provided a citation format
Evidence of access and re-use
User access information
• Collect user information and ‘projects’ upon registration
• Collate data and documentation download statistics
• Users can share project information for others to see
• Report data access stats on demand
Usage information
• Email all users every 6 months after registration about activity
• Manually add all research output references to the data record
• Reporting rate of publications is poor!
• Prior to DOIs, we scanned the citation literature for dataset mentions: very manual and unreliable, and data were poorly cited
Impactful case studies of use
• Identify and seek out case studies of re-use: research or teaching
• Very successful!
• 125 case studies in our database
• Can help provide impact stories for data owners/producers and users
• And can inspire others!
• Some are harvested by ESRC for their website
• Often include ongoing work, so no need to wait for publications
Our persistent identifiers approach
• Our data collections are not digital objects
• Need to capture changes made to data
• Versioning data in a commonly understood manner
• Needed rule-based definition of a ‘significant’ change
• Integrate processes with digital preservation activities & workflows
• In 2011 we assigned DataCite DOIs for all of our collections
• Mint and update DOIs with our metadata management infrastructure
Recording significant change
• Approx. 15% of UKDA data collections are altered within the first year after first publication
• We distinguish between major (high impact) and minor (low impact) changes to a data collection
• DOI allocated to a metadata instance of a data collection
• DOIs resolve to a jump page pointing to all external instances
• A high-impact change triggers a new DOI, with explicit logging
• Provide access only to the most up-to-date version of the data
Major changes – high impact
• New variable added
• New labels/value codes added
• Weighting variables reconstructed
• Wrong data supplied (e.g., March not April)
• Mis-coded data (e.g., Don’t know/Refused confused)
• Change in format (file migration)
• Significant changes in documentation
• Change in access conditions
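The rule-based distinction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Archive's actual implementation: the names `Change`, `HIGH_IMPACT` and `Collection` are hypothetical, the `OTHER` catch-all stands in for minor changes (which the slides do not enumerate), and `doi_version` is just a counter standing in for the minting of a new DOI.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Change(Enum):
    """Change types; the first eight mirror the high-impact list above."""
    NEW_VARIABLE = "new variable added"
    NEW_LABELS = "new labels/value codes added"
    WEIGHTS_RECONSTRUCTED = "weighting variables reconstructed"
    WRONG_DATA = "wrong data supplied"
    MISCODED_DATA = "mis-coded data"
    FORMAT_CHANGE = "change in format (file migration)"
    DOCUMENTATION_CHANGE = "significant changes in documentation"
    ACCESS_CHANGE = "change in access conditions"
    OTHER = "other (assumed low impact)"  # hypothetical catch-all for minor changes


# Every listed change type except the catch-all counts as high impact.
HIGH_IMPACT = frozenset(Change) - {Change.OTHER}


@dataclass
class Collection:
    """Hypothetical record for one data collection and its change log."""
    title: str
    doi_version: int = 1  # stand-in for the currently minted DOI
    log: list = field(default_factory=list)

    def record(self, change: Change, when: date) -> bool:
        """Log a change explicitly; return True if it calls for a new DOI."""
        high = change in HIGH_IMPACT
        if high:
            self.doi_version += 1  # a real system would mint a new DOI here
        self.log.append((when.isoformat(), change.value, "high" if high else "low"))
        return high


study = Collection("Example survey")
study.record(Change.OTHER, date(2013, 1, 10))        # low impact: DOI unchanged
study.record(Change.NEW_VARIABLE, date(2013, 3, 5))  # high impact: new DOI needed
print(study.doi_version, len(study.log))             # prints "2 2"
```

Keeping the high-impact set as data rather than scattered conditionals means the definition of a 'significant' change can be reviewed and amended in one place.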
Raising awareness in the social sciences
• ESRC funding for short-term project on citation
• Advocacy for best practice in citing research data
• Audiences:
  • Professional organisations
  • Academic publishers and journal editors
  • Researchers and postgraduates
• Key activities:
  • Data citation principles for social sciences
  • Personal communications
  • Events with BL DataCite, JISC and wider PI community
  • Outreach through Doctoral Training Centres
Demonstrating impact with citation
• Assuming better use of DOIs…
• Starting to search for use of our DOIs – Google
• Automate this process and compile reports; promote
• Gather data citation statistics from Thomson Reuters Data Citation Index. We are one of the first 20 feeder repositories, but our own access is limited!
• Work with BL DataCite and ODIN, as early adopters, to gain connectivity between identifiers & outputs
CHALLENGES FOR THE FUTURE
• Citing parts (fragments) of data collections:
  • Single files
  • Subsets of quantitative data
  • Extracts of textual data
• ESRC project Digital Futures will enable extract level citation within a web-based browsing system
• Using rich highly structured XML metadata
• GUIDs for everything
UK QualiBank
Resolving citation objects
• Will enable extract level citation
• Citation object and citation format created on the fly, using GUIDs and URIs
• URI resolves directly to the data extract
• Some more sensitive collections will be closed, so cannot resolve to data
• As yet uncertain of relationship to our collection-level DOIs
CONTACT
UK Data Service
University of Essex
Wivenhoe Park
Colchester
Essex CO4 3SQ
T +44 (0)1206 872001
E [email protected]