Objectives Overview of Data Management Tools
Total Page:16
File Type:pdf, Size:1020Kb
Data Management: Collecting, Protecting and Managing Data Presented by Andre Hackman Johns Hopkins Bloomberg School of Public Health Department of Biostatistics July 17, 2014 Objectives • Review of available database and data management tools and their capabilities • Demonstrate examples of good data management • Discuss strategies and good practice for data security and backup • Discuss data conversion and transfer options Overview of Data Management Tools Types of data management tools to consider Type of tool Examples Spreadsheet Microsoft Excel Google Docs Spreadsheet Open Office Calc Desktop database Microsoft Access Filemaker Pro Specialized databases Epi Info, CSPro, EpiData Web database REDCap OpenClinica Mobile data collection Open Data Kit, CommCare, Magpi, iFormBuilder, 1 Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected • Is it longitudinal data with many time points? • What is the expected number of records? Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes: mobile, kiosk, and computer entry. • How will data flow between them? Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes: mobile, kiosk, and computer entry. – Complexity of the data entry forms 2 Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes: mobile, kiosk, and computer entry. – Complexity of the data entry forms – Multiple users access • Can you control and restrict access? – Restrict data views by site – Control whether users have full or read-only access Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes: mobile, kiosk, and computer entry. – Complexity of the data entry forms – Multiple users access – Merging/importing/exporting data capabilities Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes: mobile, kiosk, and computer entry. – Complexity of the data entry forms – Multiple users access – Merging/importing/exporting data capabilities – Audit capabilities 3 Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes such as mobile, kiosk, and computer entry. – Complexity of the data entry forms – Multiple users access – Merging/importing/exporting data capabilities – Audit capabilities – Reporting capabilities during and after the data collection. Built-in or separate tool? • During: – What data is collected/missing? – Running validation checks on data – Interim looks at the trends • After: – Summaries – Analytic results of study Choosing a data management tool… Pick the tool that is appropriate for the data needs of the project • What will be the challenges you face with each tool? – Complexity and scale of the data being collected – Mixture of collection modes such as mobile, kiosk, and computer entry. – Complexity of the data entry forms – Multiple users access – Merging/importing/exporting data capabilities – Audit capabilities – Reporting capabilities during and after the data collection. Built-in or separate tool? Spreadsheets • “Lowest common denominator” • Format that all users can open and edit files • Often used by labs to output their results • Disadvantages – Data can be corrupted by single column sorts or accidental keystrokes – Data entry interface is limited. – Sometimes program tries to “assist” the user with data – Not designed for multiple user entry (unless you are using a web-based spreadsheet) • Helpful spreadsheet data management commands – Transpose data (Paste Special, transpose) – Remove duplicates (Data, Remove Duplicates wizard) – Match merge data (VLOOKUP) – Concatenation/substrings (CONCATENATE, SUBSTRING) – Data Validation (Data, Data Validation) – Use “Forms” for data entry (Excel: Add button to Quick Access Toolbar) – Use “Table” to maintain integrity (Insert, Table) – Use Filter to quickly subset your data 4 Spreadsheets Data management tips Transpose data (Copy, Paste Special, Transpose) From columns to rows Concatenation Create new fields by concatenating fields together Data Validation For coded fields, create validation to ensure consistent data. Spreadsheets Data management tips Remove duplicates tool To ensure unique records, use the Remove Duplicates option. Be careful if not checking entire records. “Forms” tool for data entry Some spreadsheets allow you to automatically create data entry form. Form is simple but often better than trying to enter data directly in spreadsheet Desktop databases • Microsoft Access and FileMaker Pro commonly used • Includes database, forms, reports, queries, custom programming • Easy to initially set up using wizards • Designed for simultaneous access from multiple users • Disadvantages – By default, limited to local network access – Built-in wizards can make it easy to start but tasks beyond wizards require training – Scalability can be an issue if amount of data is large 5 Desktop databases Microsoft Access All of the database elements are in one file Desktop databases Microsoft Access Data entry forms Desktop databases Microsoft Access Queries 6 Desktop databases Microsoft Access Reports Desktop Database: Microsoft Access (“Access 2007: The Missing Manual, Matthew MacDonald, O'Reilly Media, Inc., 2006) Specialized databases • Epi Info, CSPro, Epidata are common • Includes database, forms, reports, queries, custom programming, analysis tools • Quickly set up database by building dictionary/forms • Some include mapping capability to map your data. • Some can be networked for multi-user access but aren’t truly multi-user. • Disadvantages – Can require significant training time (CSPro for example) – Tend to be Windows only applications 7 Specialized databases EpiData Specialized databases Epi Info – Form Design Specialized databases Epi Info – Mapping 8 Overview of Data Management Tools • The web database application – REDCap and OpenClinica are examples – Include database, forms, reports, queries, user/site management, audit trails, backup and security – Automatically build forms from codebooks – Accessible from a web browser – no additional software needed – Scale for very large studies and multiple sites – Most have large user communities as a resource – Disadvantages • Features that YOU want may not exist yet • Some changes require approval by system admin http://www.project-redcap.org/ Project Home View in REDCap 9 Entering Data in REDCap Exporting Data in REDCap Exporting Data in REDCap 10 Setting User Rights in REDCap Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android (phones and tablets) are main options. – Data is entered directly on a smart phone or tablet – Data can be stored locally and transmitted manually OR it can be transferred in real time to a central database. – Smart phones work well for simple question types – Tablets are better suited for complex question types with many answers OR the ability to see previous answers – Considerations: • If data is stored on the device, is it protected in case of theft, damage, etc. • Do you need a mobile data connection to transfer data on-demand or will it be transferred via WiFi or USB? • Battery life of the device. • Where will the data ultimately be stored? Central server, website? 11 Mobile data collection • Options for mobile data collection – iFormbuilder • https://www.iformbuilder.com/ – doForms • http://www.doforms.com/ – Magpi • https://www.magpi.com – Formhub (for Android only) • https://www.formhub.org/ – Enketo • https://enketo.org/ – Open Data Kit (ODK) (for Android only) • This is what our Center currently uses for mobile data collection. Open Data Kit (ODK) Sample Screens Open Data Kit (ODK) Sample Screens 12 Form & Database Design • Recommendations for fields and codings on forms – Make codes for missing and refused data fields. • A common convention is “9” for missing but only if it is not a valid value • Problematic if the 9’s get included in the analysis • Sometimes need to distinguish between – “missing – not obtained”, – “don’t know”, – “refused to answer” Form & Database Design • Recommendations for fields and database design – Make codes for missing and refused data fields. – Use multiple identifiers on each form as a cross check • You don’t need to save all of them each time. It is simply a cross- check. 13 Form & Database Design • Recommendations for fields and database design – Make codes for missing and refused data fields. – Consider requiring that all fields are entered unless they are