Gathering Data
Total Page:16
File Type:pdf, Size:1020Kb
Access Dataset transfer Databases Collaborate Web-based file sharing Automated ingest Collaborative sites and management Acquire Analyse HPC Cloud Virtual labs Technical advice Costing Research Grant assistance Data Life Cycle Comprehend Plan Visualisation facilities Publish Manage Institutional repository Reuse Archive RDM support Gathering Data Emerging Researcher Series Thursday, 23 February 2017 Jason van Rooyen - eResearch Analyst Niklas Zimmer – Head: Digital Library Services Data Storage Gathering Data 23 February 2017 Managing volumes – challenges at scale • Movement Lee Lab: • from processing to storage • Findability and recoverability • context (metadata) • Privacy & access • Infrastructure • Local, central • Support, admin • Backup, security • Education in best practise and tools • Costs • Consumer/enterprise, lifecycle Gathering Data Gathering Data 23 February 2017 Storage considerations for research data • Hardware ownership: • Local vs hosted o Start-up costs o Operating costs including staff o Replacement • Data ownership/location: • Local vs. hosted o Privacy and POPI o Big data? • Business models of cloud providers: • Data egress/egress costs • Preservation Selecting an IDR for UCT Gathering Data 26 January 2017 Scale of research data at UCT • Sources: instruments, processing, collaborationsLee Lab: • 400 TB allocated Storage Provisioned • Average allocated vs. provision ration 2:1 arceibo (74 TB) • Current rate 40 TB/m provisioned astronomy • 90 TB fast parallel storage on HPC (fhgfs) CASA (74 TB) 400 Uptake Rate 350 300 SATVI 250 TB (70 TB) 200 150 100 50 0 2014/09/18 2014/12/27 2015/04/06 2015/07/15 2015/10/23 Gathering Data Gathering Data 23 February 2017 Research data storage allocations: Storage allocation in Terabytes (TB) 80 73 73 70 70 70 60 48 50 40 Storage TB Storage in 30 25 20 20 10 10 8 10 6 4 5 5 5 5 5 3 2 0.5 1 1 1 1 1 1 0 All Departments & Units Gathering Data 23 February 2017 Future demand – data deluge Lee Lab: Gathering Data Gathering Data 23 February 2017 UCT storage services • Convenient Lee Lab: • User / IT • Easily accessible • Shareable • Applicable • Fast HPC • Archival • Collaborative • Secure • Backups • Private Gathering Data 23 February 2017 Data management and planning services • Planning assistance Lee Lab: • Costing • DMP team and tool • Funder guidelines • Data policy • Acquisition / ingest • Tools (iRods) • Support • Training • Compliance monitoring • eRA integration • Institutional repositories • Collaboration spaces Gathering Data 23 February 2017 UCT Research data storage Gathering Data Gathering Data 23 February 2017 Accessing your Research data storage • A mapped network drive on your Windows, Linux or Apple computer • Web browser using NextCloud to access all your external storage, eg. Research Data, Drop Box, One Drive, Google Drive, etc. • Storage available in increments of 1 TB • Data accessible from anywhere in the world Gathering Data Gathering Data 23 February 2017 Tools for accessing & moving my data Windows Linux Apple Mac TeraCopy SCP SCP RichCopy 4 RSync RSync FastCopy UltraCopier UltraCopier XXCopy FastCopy FastCopy FreeFileSync FreeFileSync FreeFileSync Windows Explorer RoboCopy (Command Line) FileZilla Gathering Data Gathering Data 23 February 2017 How to I request data storage? • Email: - [email protected] or visit the UCT eResearch web site at www.eresearch.uct.ac.za • We can provide guidance on data volumes and growth (and data structure) • Cost – R240 per TB per month or R2880 per year. • Please contact us for more information on storage requirements exceeding one year for a discounted rate. Gathering Data Gathering Data 23 February 2017 UCT Institutional Repository? • Initial evaluations are complete and implementation will commence soon Gathering Data Gathering Data 23 February 2017 Automated Ingest Gathering Data 23 February 2017 Case Study: - EMU & SBRU Prof. Sewell Image acquisition Gathering Data Gathering Data 23 February 2017 Computing environment by: Ashley Rustin Gathering Data Gathering Data 23 February 2017 Myami Acquisition architecture web server and database (virtual centOS – LAMP) control PC (win) data TF20 processing data and \\researchdata\EMU processing (NFS) view Leginon processing (Ubuntu) data acquisition user’s machine HEX processing Appion processing (SLES) (Ubuntu) Gathering Data Gathering Data 23 February 2017 Case Study – Molecular & Cell Biology: Prof Nicola Illing • Nikon Ti Fluorescent Microscope for live cell imaging • eResearch supported storage , remote data analysis and collaboration Gathering Data Gathering Data 23 February 2017 Case Study - Molecular and Cell Biology Gathering Data Gathering Data 23 February 2017 Centre for Infectious Disease Epidemiology and Research Prof Andrew Boulle • eResearch assisted with setting multiple databases and processing and data analysis servers. • Also provisioned research data storage and collaboration Gathering Data Gathering Data 23 February 2017 Case StudyStudy: - - CIDER EMU & SBRU Gathering Data Gathering Data 23 February 2017 Case Study: - Digital Pathology Prof Jane Yeats • Olympus VS120 Virtual Slide Microscope digitizing specimens. • eResearch provide advice on sizing & specifying server hardware and storage. Installed and configured server software - storage , remote data analysis and collaboration Gathering Data Gathering Data 23 February 2017 Case Study – Integrative Bio-Medical Science Gathering Data Gathering Data 23 February 2017 Case Study – Integrative Bio-Medical Science • Dr Mhlanga is building several super- Dr Musa Mhlanga resolution microscopy • eResearch provided advice on sizing & specifying server hardware and storage. • Installed and configured a server for remote access and processing, allowing international peer reviewers to access , process and verify data used for a paper submitted to Nature. Gathering Data Gathering Data 23 February 2017 Digital Scholarship Gathering Data 23 February 2017 Digital Scholarship Digital scholarship • What is digital scholarship? • Support provided by Digital Library Services • Good (digital) research practice around (y)our assets: • Metadata • File naming conventions / Folder structures • Open (non-proprietary) file formats • Putting (y)our research funding to good use: • Digitising existing (primary) research materials • Creating ‘born-digital’ research materials • Connecting your findings and (y)our data: • Open Science Framework (OSF) Gathering Data Gathering Data 23 February 2017 Digital Scholarship What is digital scholarship? http://hpc.uct.ac.za/ “Digital scholarship is the use of digital evidence and method, digital authoring, digital publishing, digital curation and preservation, and digital use and reuse of scholarship .” (Abby Smith Rumsey, 2011) • Digital scholarship harnesses digital tools and strategies in research, teaching, learning and publishing, such as: • Data analysis & visualisation (e.g. GIS spatial data) • Text mining (e.g. Historical studies, Law, Linguistics etc.) • OA publishing : • UCT Libraries OpenUCT • Digitisation & Metadata (e.g. Special / primary collections): • UCT Libraries Digital Library Services • UCT Libraries Special Collections • Data Management Planning (funding > DAM > reusability etc.): • UCT Libaries DMPonline Gathering Data Gathering Data 23 February 2017 Digital Scholarship A proposed typology of digital scholarship Adapted from: William Thomas (February 28, 2015) http://hpc.uct.ac.za/ Digital Projects or Interactive Scholarly Works Digital Narratives Thematic Research Collections Type of Data Homogeneous, Primary Heterogeneous, Primary Integrated, Layered Components APIs, Scripting Schema, Data Models Analysis, Modules Organization Hypothesis Theme or Subject Criticism Scope Tightly-defined Capacious Problem-oriented Interpretative Nature Query-based Affordances Multi-modal Character Procedural Inquiry Open Ended User-directed, Hypertextual ORBIS Valley of the Shadow The Differences Slavery Made Visualizing Emancipation Whitman Archive Gilded Age Plains City Railroaded Mapping the Republic of Letters Who Shot Liberty Valance? Examples Digital Gazetteer of Hearing the Music of the Who Killed William Robinson? the Song Dynasty Hemispheres (resource): UCT Libraries Special Collections Struggle for dignity in Cape The Programming Historian Humanitec Digital showcase Town's informal settlements Gathering Data Gathering Data 23 February 2017 Digital Scholarship Support provided by Digital Library Services: http://hpc.uct.ac.za/ • Research Data Management : • Our RDM services assist you with organising, managing, and curating your research data to facilitate its preservation and access for present and future use. Together with the eResearch Centre and ICTS, we give you access to the datasets and tools you need to complete your research. • Digitisation : • Our digitisation unit offers digitisation, project management, curation, and preservation services for a wide variety of audio- visual, photographic and paper documents to enable and support long-term preservation of, and access to, digital collections. We digitise objects according to international archival preservation and access standards . We also transcode, edit, and curate born- digital objects. Gathering Data Gathering Data 23 February 2017 Digital Scholarship Support provided by Digital Library Services: http://hpc.uct.ac.za/ Digital Scholarship Workshops hosted by UCT Libraries: UCT Libraries, in conjunction with UCT e-Research Centre, Oxford University and University of Hertfordshire, hosted a Digital Scholarship