ABSTRACT
BACKGROUND
  Figure 1. Steps in completing a systematic review
  Specific Aims
PARTICIPATION OF PATIENTS AND OTHER STAKEHOLDERS
  Impact of Stakeholder Engagement on Project
METHODS
  Aim 1: Developing DAA
  Aim 2: Conducting a Randomized Controlled Trial to Evaluate DAA
  Table 1. Assignment of 24 Pairs of Data Abstractors to 6 Sequences and 48 Articles
  Figure 2. Screenshot from the Baseline Tab of a data abstraction form used during the DAA trial
  Aim 3: Disseminating the Study Findings
RESULTS
  Aim 1: Developing DAA
  Figure 3. Screenshot showing how DAA displays the source document in HTML format (right) adjacent to the data abstraction form in the data abstraction system (SRDR, left)
  Aim 2: Conducting a Randomized Controlled Trial to Evaluate DAA
  Figure 4. Participant flow during the DAA trial
  Table 2. Baseline Characteristics of All 52 Participants in the DAA Trial
  Table 3. Baseline Characteristics of All 52 Participants in the DAA Trial by Level of Experience With Data Abstraction
  Table 4. Proportion of Errors by Data Abstraction Approach, Type of Error, Type of Data Item, and Systematic Review Topic
  Table 5. Proportion of Errors Across All Approaches, by Type of Error, Type of Data Abstracted, and Systematic Review Topic
  Table 6. Between-Approach Comparisons of Error Proportions by Type of Data Abstracted
  Table 7. Auto-recorded Time Spent (in minutes) by Data Abstraction Approach, Type of Data Item, and Systematic Review Topic
  Table 8. Self-recorded Time (in minutes) Spent by Data Abstraction Approach, Step of Data Abstraction, and Systematic Review Topic
  Table 9. Self-recorded Time (in minutes) Spent Across All Approaches, by Step of Data Abstraction and Systematic Review Topic
  Table 10. Between-Approach Comparisons of Auto-recorded Time by Type of Data Abstracted
  Table 11. Between-Approach Comparisons of Self-recorded Time Across All Topics
  Aim 3: Disseminating the Study Findings
  Table 12. Considerations When Selecting Data Abstraction Approaches During Systematic Reviews
DISCUSSION
  Error Proportions Observed and Context for Study Results
  Differences in Error Proportions and Time Among Data Abstraction Approaches
  Possible Reasons for Higher Error Proportions With DAA
  Subpopulation Considerations
  Value of Using DAA and Implications for Future Research
  Challenges With Independent Dual Data Abstraction Plus Adjudication
  Implications and Uptake of Study Results
  Study Limitations and Strengths
CONCLUSIONS
REFERENCES
RELATED PUBLICATIONS
  In Preparation
  Published
ACKNOWLEDGMENTS
APPENDICES
  Appendix 1: Published paper describing the technical details of DAA
  Appendix 2: Survey instrument
  Appendix 3: Summary of survey responses, by level of experience with data abstraction
  Appendix 4: Published paper describing the DAA trial protocol



ABSTRACT

Background: In systematic reviews, data abstraction, a predominantly manual process, is labor intensive and error prone. Current standards for abstraction rest on a weak evidence base.


Objectives: Aim 1. Develop Data Abstraction Assistant (DAA), a software tool to identify and track the location of data in articles and to automatically enter data into the Systematic Review Data Repository.

Aim 2. Conduct a randomized controlled trial to evaluate the comparative effectiveness of 3 approaches—(A) DAA-facilitated single data abstraction plus verification, (B) single data abstraction plus verification, and (C) independent dual data abstraction plus adjudication—on the accuracy and efficiency of data abstraction.

Aim 3. Disseminate DAA, study findings, and a decision tool to enable users to better understand the trade-offs between accuracy and efficiency when selecting abstraction approaches during systematic reviews.

Methods: For aim 1, we designed DAA to be a user-friendly platform that would indicate the source of abstracted data and be compatible with various data abstraction systems. We surveyed early users of DAA regarding its user-friendliness. For aim 2, we conducted an online, randomized, crossover trial with 26 pairs of data abstractors. Pairs abstracted data from 6 articles, 2 under each approach. Outcomes were (1) proportion of data items abstracted constituting an error (compared with an answer key), and (2) time taken to complete abstraction. For aim 3, we disseminated DAA to various stakeholders.


Results: Aim 1. Using DAA, abstractors flag specific locations in source documents, thereby creating potentially permanent linkages between abstracted information and its source. When users click on existing flags, DAA scrolls the screen to the exact highlighted location of the source text. Among the 52 surveyed early users of DAA, 83% reported that using DAA was very or somewhat easy; 71% were very or somewhat likely to use DAA; and 87% were very or somewhat likely to recommend DAA to others.

Aim 2. Although overall mean error proportions were similar among the 3 approaches (A, 17%; B, 16%; C, 15%), A was associated with 8% higher odds of errors than B (odds ratio [OR], 1.08; 95% CI, 0.99-1.17) and 12% higher odds of errors than C (OR, 1.12; 95% CI, 1.03-1.22). Approach A had more errors in data items related to study outcomes or results (41%) than approaches B (36%; OR, 1.30; 95% CI, 1.09-1.56) and C (31%; OR, 1.52; 95% CI, 1.27-1.82). Approach A took 20 minutes more (95% CI, 1-40 minutes) to implement than B and 46 minutes less than C (95% CI, 26-66 minutes).


Aim 3. We published manuscripts, made conference presentations, and developed considerations for selecting from available abstraction approaches.

Conclusions: Our findings suggest independent dual abstraction is necessary for outcomes and results data; a verification approach is sufficient for other data. By linking abstracted data with their exact source, DAA provides an audit trail crucial for reproducible research. Reviewers should choose their data abstraction approach on the basis of the inevitable trade-off between saving time and minimizing errors.

Limitations: Currently, DAA is limited to flagging entire lines (not individual words) and cannot flag image-based text. The error proportions, although consistent with those reported in previous studies, might have been inflated due to abstractors’ unfamiliarity with DAA and with the review topics selected for abstraction.



BACKGROUND

Systematic reviews are research studies in which explicit methods are used to identify, appraise, and synthesize the research evidence addressing a research question.1 The steps in completing a systematic review include preparing the topic and formulating the research question, searching for studies, screening studies for inclusion, abstracting data from relevant individual studies, analyzing the data, synthesizing the evidence, and reporting the findings (Figure 1).2 The validity of the systematic review findings is contingent on collecting accurate and complete data from reports of relevant studies, a process known as data abstraction (or data extraction).

Figure 1. Steps in completing a systematic reviewa

aAdapted from Wallace et al 2013.3

As a predominantly manual process, data abstraction is inefficient, being both labor intensive and error prone. Errors during data abstraction are common and have been well documented in the literature.4-6 Buscemi et al4 estimated that the proportion of errors, which they defined as “any small discrepancy from the reference standard,” was approximately 30% for single abstraction, regardless of the level of data abstractor experience. Abstraction errors occur when data abstractors either omit to abstract or incorrectly abstract information present in the article. When Gøtzsche et al5 examined 27 meta-analyses (ie, statistical combinations of abstracted results from studies) across a range of topics, they were unable to replicate the results of 37% of meta-analyses. In another study, Jones et al6 documented abstraction errors in 20 of 42 systematic reviews (48%); in all cases, the errors changed the summary meta-analytic results, although none changed the systematic review conclusions.

Current recommended approaches for reducing errors in data abstraction fall into 2 categories: (1) abstraction by 1 person, followed by checking of the abstraction by a second person (ie, single abstraction plus verification); and (2) independent abstraction by 2 people followed by resolution of any discrepancies (ie, independent dual abstraction plus adjudication). Buscemi et al4 found an absolute error proportion of 17.7% for single abstraction plus verification and 14.5% for independent dual abstraction plus adjudication (an absolute difference of 3.2% and a relative difference of 21.7%), but the independent dual abstraction plus adjudication approach took approximately 50% longer.

To our knowledge, only the Buscemi et al study4 has examined the trade-offs between single abstraction plus verification and independent dual abstraction plus adjudication, and that study focused on a single systematic review topic with only 4 data abstractors; therefore, current standards for data abstraction rest on a weak evidence base. Major sponsors and producers of systematic reviews (eg, Agency for Healthcare Research and Quality [AHRQ] Evidence-based Practice Centers [EPCs], Centre for Reviews and Dissemination [CRD]) and organizations that develop methodology standards for systematic reviews (eg, AHRQ, Cochrane, Institute of Medicine [IOM; now named National Academy of Medicine]) made inconsistent recommendations for approaches to reducing errors in data abstraction.1,2,7,8 Because “so little is known about how best to optimize accuracy and efficiency,”1 the IOM Committee stopped short of recommending independent dual abstraction for all data elements. Instead, it recommended “at minimum, use two or more researchers, working independently, to extract quantitative and other critical data from each study.”1 Thus, although the IOM recommended independent dual abstraction for “critical data,” an important gap in our current methodological understanding of data abstraction remains. The recommendation for critical data could represent unnecessary work or, conversely, the IOM’s implicit recommendation that a single person could abstract noncritical data could represent an opportunity for error. The Patient-Centered Outcomes Research Institute (PCORI) endorses the IOM standards for conducting systematic reviews in general but noted that “Dual screening and data abstraction are desirable, but fact-checking may be sufficient. Quality control procedures are more important than dual review per se.”9

Computer-aided abstraction could potentially make the abstraction process more efficient and more accurate by facilitating the location and tracking of key information in articles. In recent years, several web-based data abstraction systems, such as the Systematic Review Data Repository (SRDR),10,11 Covidence, EPPI-Reviewer,12 DistillerSR, and Doctor Evidence, have been developed for creating data abstraction forms and for receiving and organizing the data collected. Although these data abstraction systems can record which source documents are used for data abstraction, they do not track the specific locations and context of relevant pieces of information in these often-lengthy documents. The ability to track the specific location and context of abstracted data in source documents would record initial data abstraction and likely facilitate data verification and adjudication. This would likely promote the validity of the systematic review findings, save time, and advance the transparency and reproducibility of the systematic review enterprise.

Specific Aims

We had 3 specific aims for this work:

Aim 1: Develop Data Abstraction Assistant (DAA), a software tool to identify and track the location of data in articles and to automatically enter data into SRDR.

Aim 2: Conduct a randomized controlled trial (RCT) to evaluate the comparative effectiveness of 3 approaches—(A) DAA-facilitated single data abstraction plus verification, (B) single data abstraction plus verification, and (C) independent dual data abstraction plus adjudication—on the accuracy and efficiency of data abstraction.


Aim 3: Disseminate DAA, study findings, and a decision tool to enable users to better understand trade-offs between accuracy and efficiency when selecting data abstraction approaches during systematic reviews.



PARTICIPATION OF PATIENTS AND OTHER STAKEHOLDERS

We partnered with 13 stakeholders, including patients, systematic reviewers, clinical trialists, practice-guideline developers, policy makers, and industry representatives. These stakeholders, recruited through the core investigative team, represented the following broad set of domains of expertise: patient advocacy, data mining, machine learning, natural language processing, health informatics, systematic review methodology, methodology, patient-centered outcomes research, epidemiology, biostatistics, medicine, educational outreach, public policy, and regulatory science. We decided on and achieved this balance to reflect the multi-stakeholder representation of the systematic review enterprise.1

Impact of Stakeholder Engagement on Project

We engaged with the entire investigative team of stakeholders via conference calls every 3 months. During these calls, we discussed progress, issues, and challenges; explored possible solutions; and laid out next steps. Specific areas where stakeholder engagement was particularly helpful were (1) refining the desired features of the DAA software; (2) providing feedback on the features and functioning of the DAA software; (3) refining the design details of the DAA trial; (4) developing the strategy for recruiting participants for the DAA trial; (5) suggesting additional analyses for the DAA trial; (6) interpreting the findings of the DAA trial; and (7) disseminating DAA and results of the DAA trial via stakeholder networks and publications in peer-reviewed journals.

Specific Contributions of Our Patient Stakeholders

In addition to the aforementioned contributions of all stakeholders, the patient stakeholders made specific contributions to this project. For aim 2, our patient stakeholders (Vernal Branch, Sandra Walsh, and Elizabeth Whamond) helped us select 4 systematic reviews that address patient-important conditions. Then, to identify the outcomes for data abstraction during the DAA trial, we worked with the patient stakeholders and selected patient-centered outcomes that had the maximum number of studies. For aim 3, the patient stakeholders helped us disseminate our work through useful comments on the manuscripts and presentations. Aim 1, which involved technical software development, did not fit within the patient stakeholders’ time constraints.



METHODS

We first developed and implemented DAA, and then conducted an RCT to evaluate it, randomly assigning pairs of data abstractors to different sequences of abstracting data under 3 different data abstraction approaches. We analyzed and compared the proportion of errors and time taken to complete data abstraction under the 3 approaches. We developed a set of considerations to guide selection of data abstraction approaches during systematic reviews.

Aim 1: Developing DAA

Appendix 1 contains our published paper that describes the technical details of DAA.13

Three Essential Features for DAA

We identified the following 3 essential features for DAA:

1. A platform to indicate the source of abstracted information: The major impetus behind the development of DAA was to create a platform where data abstractors could indicate the source of information by placing flags at, or pinpointing, specific locations in source documents (eg, journal articles), thereby creating a potentially permanent linkage (ie, tracking) between abstracted information and its source.

2. Compatibility with a variety of data abstraction systems: Systematic reviewers usually use a data abstraction system to help extract, manage, and archive the primary study data abstracted during the review. Examples of data abstraction systems include SRDR, Covidence, and DistillerSR. DAA’s main purpose is to contain information that links individual abstracted data items to specific locations in source documents. To make DAA compatible with a variety of data abstraction systems, we designed the DAA platform to be distinct from the data abstraction system. This distinction is attained by keeping separate the process of linkage with an item on the data abstraction form (in the data abstraction system) and the process of capturing and navigating to the location of information (in the source document). Details about the technical implementation of this separation are available in a data repository.

3. User-friendliness: To make navigation easy and fast, we developed DAA to be user-friendly and menu driven. When abstracting data, the data abstractor can see DAA as integrated into the data abstraction system, side by side on the same screen (ie, a split-screen view).


How DAA Functions Behind the Scenes

DAA works behind the scenes through 3 steps:

1. Converting documents from portable document format (PDF) to hypertext markup language (HTML) format
2. Transmitting the HTML version of the source document to the data abstraction system
3. Displaying and allowing for annotation of the HTML version of the source document in the data abstraction system

We examined the accuracy of content and formatting of the conversion of PDF to HTML via visual inspection. We developed a process that establishes linkage between abstracted data and its source as follows: (1) use of a unique identifier (ID 1) that denotes a specific data item in the data abstraction system (SRDR in this project); (2) use of a unique identifier (ID 2) that denotes a specific location in the HTML file of a specific source document; and (3) creation of a record of the mapping between IDs 1 and 2. Each mapping record is also provided a unique identifier (ID 3).
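As an illustration only, the three-identifier linkage described above can be sketched in Python. The class, method, and identifier names below are hypothetical and do not reflect DAA's actual implementation; the sketch assumes only what the text states, namely that each mapping record ties one data item (ID 1) to one document location (ID 2) and carries its own identifier (ID 3).

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Mapping:
    mapping_id: int    # "ID 3": identifies the mapping record itself
    item_id: str       # "ID 1": a data item in the data abstraction system
    location_id: str   # "ID 2": a location in the HTML source document


class LinkStore:
    """Hypothetical in-memory store of item-to-location links."""

    def __init__(self):
        self._next_id = count(1)
        self._records = []

    def flag(self, item_id, location_id):
        """Record that `location_id` supports the answer to `item_id`."""
        record = Mapping(next(self._next_id), item_id, location_id)
        self._records.append(record)
        return record

    def locations_for(self, item_id):
        """Return every flagged location for one data item, in flagging order."""
        return [r.location_id for r in self._records if r.item_id == item_id]


# Made-up identifiers: one data item may be linked to several locations.
store = LinkStore()
store.flag("item-sample-size", "doc42#line-118")
store.flag("item-sample-size", "doc42#line-305")
store.flag("item-mean-age", "doc42#line-131")
```

Keeping the mapping as its own record, rather than storing a location inside the abstraction form, is one way to keep the linkage independent of any particular data abstraction system.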

How DAA Functions at the Data Abstractor End

DAA is designed to assist with data abstraction, a step that is carried out after the set of relevant studies for the systematic review is identified (Figure 1). Data abstractors can interface directly with DAA by logging into the password-protected DAA web application and uploading study documents (as PDF files). This uploading process can be centrally managed by the project lead if more protected governance of the data abstraction and document management process is desired.

Once source documents are uploaded as PDF files, they are converted into HTML format and organized into document stores, which are groups or collections of source documents. DAA assigns each document store a security token, allowing access to the HTML files from any systematic review project that the data abstractor is working on. Using SRDR as an example, upon logging in, SRDR requires the data abstractor to provide the security token to access the data abstractor’s document stores. After the data abstractor selects the document store and, subsequently, a source document in HTML format, DAA transmits the HTML file to SRDR. Once DAA transmits the HTML to SRDR, SRDR displays the HTML in an area adjacent to the abstraction form (ie, the split-screen view).

Survey of Early Users of DAA

We surveyed early users of DAA (all 52 individual data abstractors enrolled in the DAA trial described in aim 2) regarding their opinions about the user-friendliness of DAA. After each data abstractor completed data abstraction for the DAA trial, we asked them to complete a brief survey designed using Qualtrics. We asked questions related to the data abstractor’s self-reported ease with completing each of the following tasks: (1) opening source documents in split-screen view in SRDR, (2) scrolling between pages of a source document, (3) placing flags in a source document, and (4) clicking on existing flags to automatically navigate to the relevant location in the source document. We also asked data abstractors to assess the overall ease of using DAA and to indicate the DAA feature they liked the most. Finally, we asked data abstractors about their likelihood of using DAA in the future and of recommending that others use it in the future (Appendix 2 contains the survey instrument and Appendix 3 presents a summary of the survey responses).

Aim 2: Conducting a Randomized Controlled Trial to Evaluate DAA

Appendix 4 contains our published paper describing the protocol for the DAA trial.14

Study Population

We recruited individuals who met each of the following criteria (based on self-report): at least 20 years of age, self-reported proficiency with reading scientific articles in English, completed data abstraction for at least 1 journal article for a systematic review in any field, and provided informed consent. We used 4 strategies to recruit potential participants: (1) emails to students who registered for courses in systematic review methods through Johns Hopkins Bloomberg School of Public Health (JHBSPH) and Brown University, (2) emails to faculty and staff at the Johns Hopkins EPC and the Brown EPC, (3) advertising on the SRDR website, and (4) advertising through patient organizations such as Consumers United for Evidence-based Healthcare and Cochrane Consumer Network.

To mimic how individuals are often paired for data abstraction in systematic reviews, we formed pairs consisting of 1 less-experienced data abstractor and 1 more-experienced data abstractor. On the basis of the results of a pilot study,14,15 we determined that the number of published systematic reviews authored, dichotomized at fewer than 3 vs 3 or more, was best able (ie, had the highest area under the curve) to classify abstractors into less or more experienced with abstraction.

Approaches Compared

We compared 3 abstraction approaches:

Approach A: DAA-facilitated Single Abstraction Plus Verification. In approach A, which used DAA, the less-experienced data abstractor in a pair completed the abstraction for an article first. The more-experienced data abstractor verified the information. The less-experienced data abstractor was instructed to place a flag identifying each location within the source document (eg, journal article) supporting the answer to every question on the abstraction form. DAA allowed multiple locations in the document to be flagged for a given question. Once the initial abstraction had been completed, the more-experienced data abstractor was given access to the data abstraction form with the abstracted data in SRDR, together with the flagged locations in the documents. The more-experienced data abstractor could change any of the less-experienced data abstractor’s responses as the former considered appropriate (verification) and, if desired, request discussion with the less-experienced data abstractor (data adjudication).

Approach B: Single Abstraction Plus Verification. Approach B did not use DAA. As in approach A, the less-experienced data abstractor in a pair completed the abstraction form first. The more-experienced data abstractor then verified the information abstracted by the less-experienced partner.


Approach C: Independent Dual Abstraction Plus Adjudication. Approach C also did not use DAA. The 2 data abstractors in a pair each abstracted data independently for the assigned articles using the abstraction form in SRDR. The 2 data abstractors informed each other when they had completed their independent abstractions, and they developed a plan for adjudication (eg, video call, phone call, in-person meeting). The data abstractors compared their abstractions and addressed any discrepancies in the abstracted data (data adjudication).

Randomization and Allocation Concealment

Data abstractors were randomly assigned in pairs. Each pair completed abstraction for 6 articles, 2 under each of the 3 aforementioned approaches. Three different pairs abstracted data for each article. To maximize efficiency, we used a crossover design (Table 1), such that each pair of data abstractors implemented all 3 approaches being evaluated, with the intent of estimating differences within pairs. The 6 possible sequences were AABBCC, AACCBB, BBCCAA, BBAACC, CCAABB, and CCBBAA. The DAA trial protocol14 (Appendix 4) and Table 1 describe the randomization schema in detail.
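The 6 sequences are the 3! orderings of the 3 approaches, with each approach applied to 2 consecutive articles. The Python sketch below enumerates them and shows one possible way to allocate pairs in shuffled blocks of 6. The blocked scheme, the function name, and the seed are assumptions made for illustration; the trial's actual random order was generated in R, and its schema is described in the protocol.

```python
import random
from itertools import permutations

APPROACHES = "ABC"

# Each sequence repeats every approach twice, in one of the 3! = 6 orders.
SEQUENCES = ["".join(a * 2 for a in order) for order in permutations(APPROACHES)]


def assign_sequences(n_pairs, seed):
    """Assign pairs to sequences in shuffled blocks of 6 (an assumed scheme)."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_pairs:
        block = SEQUENCES[:]   # one copy of every sequence per block
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_pairs]


# With 24 pairs, blocking gives every sequence exactly 4 pairs.
allocation = assign_sequences(24, seed=1)
```

Blocking is used here only because it keeps the sequences balanced across pairs, which suits a crossover design that estimates differences within pairs.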


Table 1. Assignment of 24 Pairs of Data Abstractors to 6 Sequences and 48 Articlesa

aA, B, and C denote 3 different approaches for data abstraction; see Aim 2: Approaches Compared. Note: Random sequence is the permuted arrangement of 3 different approaches for data abstraction. For example, sequence 1 indicates that data abstractors collect data from 6 unique articles using approaches A, A, B, B, C, and C, in that order.


The senior statistician (C.H.S.) used the R statistical environment to generate the random order. To maintain allocation concealment, we kept the project director (I.J.S.), who was responsible for pairing data abstractors and communicating the randomized sequence to the pair, unaware of the next sequence to be assigned. When a given pair was ready to be randomized, the project director contacted and received from the senior statistician via email the random sequence to which the pair was to be assigned.

Masking

It was not feasible to mask data abstractors, because the data abstractors needed to be aware of the abstraction approach in order to abstract data. It is possible that the lack of masking of data abstractors might have caused some bias, but we do not anticipate that this had a meaningful impact on our results, and we are not able to surmise the direction of any bias. The project director was not masked, because he needed to be aware of the sequence of assigned approaches in order to allocate articles and follow data abstractors through the trial. However, the project director played no part in recording the data for either of the trial’s outcomes (ie, errors and time). The lack of masking is unlikely to have influenced our results.

Follow-up and Retention of Participants

To maximize retention, the project director maintained regular email contact with data abstractors throughout the trial. We provided each data abstractor US$250 as compensation for participation in the trial only after abstraction for all 6 articles had been completed (ie, there was no partial or interim compensation). As a result of these efforts and the commitment of the participants, all participants completed the trial; we had no missing data.

Evaluative Framework

We identified 48 journal articles from 4 systematic reviews reporting results of RCTs (12 articles per systematic review) for use in the trial (Table 1). To ensure the systematic reviews and the outcomes were relevant to patients, our 3 patient co-investigators were involved in the selection of systematic reviews and outcomes. In cases where a systematic review included more than 12 articles, we selected 12 articles that reported the largest number of outcomes. The topics addressed in the selected systematic reviews were (1) multifactorial interventions to prevent falls in older adults16; (2) proprotein convertase subtilisin/kexin type 9 (PCSK9) antibodies for adults with hypercholesterolemia17; (3) interventions to promote physical activity in cancer survivors18; and (4) omega-3 fatty acids for adults with depression.19

Data Collection, Management, and Monitoring

We collected all data through websites, SRDR (the data abstraction system), and DAA. We developed and pilot tested a data abstraction form in SRDR with recommended data elements compatible with each of the 4 systematic review topics (the forms are publicly available). Each form comprised predominantly multiple-choice or numeric entry data items. We organized the data elements into separate “tabs” in SRDR: Design Tab (study design, risk of bias), Baseline Tab (characteristics of participants by study arm at baseline), Outcomes Tab (list of outcomes reported in the article), and Results Tab (quantitative results data). We combined data from the Outcomes and Results Tabs in the analysis. The forms had a median of 145 data items (range, 106-187 data items) for analysis. The total number of data items abstracted varied between articles, depending on the systematic review topic, number of outcomes, and the amount of information available in each article.

Figure 2 displays an example screenshot of 2 items (pertaining to sample size and study participant age) in the Baseline Tab of the form used during the DAA trial.


Figure 2. Screenshot from the Baseline Tab of a data abstraction form used during the DAA trial

Abbreviation: DAA, Data Abstraction Assistant.


Study Outcomes

The 2 primary outcomes for the DAA trial were the proportion of data items abstracted that constitute an error (hereafter referred to as “error proportions,” for simplicity) and the time taken to complete abstraction (by both data abstractors, including adjudication). In approaches A and B, we used the verified data from the senior data abstractor’s form as the final answers; in approach C, we used data from the senior data abstractor’s form after adjudication by both data abstractors. For each approach, we compared the final answers from the pair with an answer key generated using data independently abstracted and adjudicated by 2 investigators with extensive experience with data abstraction for systematic reviews (T.L. and I.J.S.). We also manually double-checked each data item that had an error proportion ≥50% in case its corresponding answer key value needed correction.

All errors were ascertained by a computer program that automatically compared the selected or entered value of a given data item with the answer-key value for that data item. We defined an error as any discrepancy or difference between an entry for a data item and the answer key value for that data item. We were interested in abstraction errors resulting from either omission or incorrect abstraction. If participants abstracted more data items than were in the answer key, the additional data items were discarded and not considered errors.
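The comparison rule described above can be sketched as follows. This is not the trial's program: the function name, the dictionary representation, and the example values are hypothetical, and using the number of answer-key items as the denominator is an assumption of the sketch.

```python
def error_proportion(answer_key, abstracted):
    """Share of answer-key items whose abstracted value differs from the key.

    Items missing from `abstracted` count as errors (omission). Items in
    `abstracted` that are absent from the answer key are ignored.
    """
    if not answer_key:
        raise ValueError("answer key is empty")
    errors = sum(
        1
        for item, correct in answer_key.items()
        if item not in abstracted or abstracted[item] != correct
    )
    return errors / len(answer_key)


# Made-up example: one incorrect value, one omission, one extra item.
key = {"n_randomized": 120, "mean_age": 71.4, "blinded": "yes", "dropouts": 8}
final_answers = {
    "n_randomized": 120,
    "mean_age": 74.1,       # incorrect abstraction
    "blinded": "yes",
    "funding": "industry",  # not in the answer key, so discarded
}
proportion = error_proportion(key, final_answers)  # 2 errors out of 4 items
```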

The total time taken to complete abstraction for a given article was defined as the sum of the time taken (in minutes) for initial abstraction(s) plus subsequent verification or adjudication. Because we summed the time spent by data abstractors in a pair, the times technically refer to person-minutes. We asked each data abstractor to record the time spent (in minutes) on each step of data abstraction for each article: initial abstraction, verification, and adjudication (self-recorded time). These data were recorded using the online survey tool Qualtrics. The study data abstraction system (ie, SRDR) also automatically recorded time.
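As a small worked example of the person-minutes arithmetic, the sketch below sums the minutes recorded by each abstractor for each step of one article. The record layout and all numbers are made up for illustration and are not trial data.

```python
def total_person_minutes(records):
    """Sum recorded minutes over every abstractor and step for one article."""
    return sum(minutes for _abstractor, _step, minutes in records)


# Single abstraction plus verification: one initial abstraction, one verification.
verification_example = [
    ("less experienced", "initial abstraction", 90),
    ("more experienced", "verification", 35),
]

# Independent dual abstraction plus adjudication: both abstract, both adjudicate.
adjudication_example = [
    ("less experienced", "initial abstraction", 90),
    ("more experienced", "initial abstraction", 70),
    ("less experienced", "adjudication", 25),
    ("more experienced", "adjudication", 25),
]

verification_total = total_person_minutes(verification_example)
adjudication_total = total_person_minutes(adjudication_example)
```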

To assess the accuracy of our assessment of time, we compared self-recorded time with auto-recorded time, and noted that self-recorded time was consistently shorter than auto-recorded time, partly because the auto-recording clock continued to count time when data abstractors took a break. Our primary analysis of time focuses on the auto-recorded time.


As a post hoc secondary objective of the DAA trial, which is not part of the contractual deliverables for this project, we evaluated bias of meta-analytic summary statistics constructed using various possible results data abstracted for 2 outcomes, compared with data from the answer key.

Analytical and Statistical Approaches

Overview. We conducted all analyses according to the intention-to-treat (ITT) principle, using all pairs who contributed data. In addition, we conducted a per-protocol analysis, using only the pairs who properly completed abstraction (ie, ignoring 2 pairs in whom protocol violations occurred). We computed summary error proportions and time statistics by approach, by systematic review topic, and by type of question (ie, questions in the Design Tab, Baseline Tab, or the Outcomes and Results Tabs).

Statistical Models. We used 2-level mixed models to compare the times and error proportions of the 3 data abstraction approaches. Analyses for the times used a linear mixed model and those for error proportions used a binomial generalized linear mixed model. The first level described variation within pairs of data abstractors across the 6 articles abstracted by each pair; the second level described variation between pairs. Factors investigated at the first level included the 3 approaches as well as indicators for the approach used on the first and last article abstracted by each pair. We included these 2 indicator variables to investigate learning effects (ie, whether time and error proportions tended to decrease as data abstractors abstracted additional articles). Factors at the second (pair) level included the systematic review from which the articles were abstracted and the sequence in which the pair abstracted data. We considered the pair as a random effect by including a random intercept in the first-level model. We also explored interactions of approach with sequence, systematic review, and first and last articles reviewed. Because all participants completed all abstractions, there was no need for additional analyses to deal with missing data.
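One way to write down the 2-level models described above is sketched below. The notation is an assumption introduced here for illustration and is not taken from the trial protocol: $i$ indexes the articles abstracted by pair $j$; $A_{ij}$ and $B_{ij}$ indicate approaches A and B (approach C is taken as the reference); $F_{ij}$ and $L_{ij}$ indicate the first and last article abstracted by the pair; $\mathbf{z}_j$ collects the pair-level factors (systematic review topic and sequence); and $u_j$ is the random intercept for the pair.

```latex
% Sketch only: symbols are illustrative, not the authors' notation.
% Requires amsmath. Approach C is assumed to be the reference category.
\begin{align*}
\eta_{ij} &= \beta_0 + \beta_A A_{ij} + \beta_B B_{ij}
            + \gamma_F F_{ij} + \gamma_L L_{ij}
            + \boldsymbol{\delta}^{\top}\mathbf{z}_j + u_j,
  \qquad u_j \sim N(0, \sigma_u^2) \\
\text{Time (linear mixed model):}\quad
  & y_{ij} = \eta_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\text{Errors (binomial mixed model):}\quad
  & e_{ij} \sim \mathrm{Binomial}(n_{ij}, p_{ij}),
  \qquad \operatorname{logit}(p_{ij}) = \eta_{ij}
\end{align*}
```

Under this parameterization, $\beta_A$ in the time model is the adjusted mean difference in minutes between approaches A and C, and $\exp(\beta_A)$ in the error model is the adjusted odds ratio for approach A versus approach C, where $e_{ij}$ is the number of errors among the $n_{ij}$ data items on a form.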

Exploring the Impact of Errors on Meta-analysis. When different data abstractors abstract different values for an estimate of effect or fail to abstract a value at all, meta-analytic summaries using the different abstractions might differ if the errors made involve values used in the meta-analysis. To explore the potential impact of errors on meta-analyses, we identified 2 outcomes for meta-analysis, 1 continuous (from topic 2) and 1 binary (from topic 1), which each had 5 or more studies reporting results for that outcome.

Some studies reported arm-level results, some reported between-arm results, and some reported both; we instructed data abstractors to abstract all results for each outcome of interest. Accordingly, we conducted each meta-analysis using 2 methods, 1 based on arm-level results (method 1) and the other on between-arm results (method 2). With each method, if some studies reported only 1 type of result, we used it instead. For instance, in a meta-analysis using estimates of effect derived from between-arm data, missing between-arm results were computed with arm-level results, provided these were available.

By the design of the DAA trial, each article in the meta-analysis had 3 pairs of data abstractors. We carried out all possible combinations of meta-analyses formed by randomly choosing for each study in the meta-analysis 1 of the 3 data abstractor pairs who had abstracted results for the given outcome. For each combination of abstractions, we carried out a random-effects meta-analysis using the DerSimonian and Laird method20 and recorded the mean treatment effect (mean difference [MD] for the continuous outcome and risk ratio for the binary outcome) along with the between-study variance and the I² statistic. We examined the distribution of each statistic and compared it with the estimate from using data from the answer key.
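The enumeration described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the trial: each study contributes a list of 3 candidate abstractions, each a hypothetical `(effect, variance)` tuple (a mean difference, or a log risk ratio for a binary outcome); `None` stands for a pair that failed to abstract the result; and dropping such a study from that combination is an assumption made here. The pooled estimate uses the standard DerSimonian and Laird moment estimator of the between-study variance.

```python
import itertools


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled effect, tau^2, I^2 in %)."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return pooled, tau2, i2


def enumerate_meta_analyses(abstractions):
    """Run one meta-analysis per combination of one abstraction per study."""
    results = []
    for combo in itertools.product(*abstractions):
        usable = [c for c in combo if c is not None]  # assumed handling of omissions
        if len(usable) < 2:
            continue  # pooling needs at least 2 studies
        effects, variances = zip(*usable)
        results.append(dersimonian_laird(effects, variances))
    return results


# Hypothetical abstractions: 3 studies, each abstracted by 3 pairs.
demo = [
    [(-1.2, 0.04), (-1.2, 0.04), (-1.0, 0.04)],
    [(-0.8, 0.09), None, (-0.8, 0.09)],
    [(-1.5, 0.06), (-1.5, 0.06), (-1.5, 0.16)],
]
results = enumerate_meta_analyses(demo)
print(len(results), "combinations")
```

With 3 candidate abstractions per study, the number of combinations is 3 raised to the number of studies, so the distribution of each statistic can be enumerated exhaustively for a meta-analysis of modest size and then compared with the value obtained from the answer key.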

Conduct of the Study

The DAA trial was approved by the IRBs at JHBSPH (dated July 13, 2015; IRB no. 00006521) and Brown University (dated August 21, 2015). Online informed consent for participation in the trial was obtained from every participant via the DAA trial consent website (


Aim 3: Disseminating the Study Findings

Because there are various stakeholders in the systematic review enterprise, our target audience for dissemination activities includes, among others, PCORI investigators, Cochrane systematic review authors, AHRQ EPCs, guideline producers, US Preventive Services Task Force, Centers for Disease Control and Prevention, National Institute for Health and Clinical Excellence in the United Kingdom, Center for Reviews and Dissemination in the United Kingdom, Blue Cross Blue Shield, Hayes Inc., industry, academic institutions, and individual users.

Specific Dissemination Strategies of the DAA Software and Findings of the DAA Trial

We adopted a multipronged dissemination strategy to ensure that DAA software reaches various systematic review stakeholders. First, we developed DAA to be open source, open access, and free of charge for any future systematic reviews. This development model promotes sharing of original source code and will allow modification and improvement of the software by the community and the general public. Second, we are publishing multiple manuscripts in peer-reviewed journals. Third, we have presented information about DAA and its features at scientific conferences, where we networked with individuals and entities who are likely to use DAA in their systematic reviews.21-24 Finally, our 4 data collection forms (1 for each systematic review in the DAA trial) contain common data items that can be readily adapted for any future systematic reviews. We have made these forms publicly available ( Systematic reviewers can use the forms entirely or in part and can modify questions as desired.

Developing a Set of Considerations to Guide Selection of Data Abstraction Approaches

We examined the trade-offs in accuracy versus efficiency for the various data abstraction approaches as well as the resource needs for each approach. We developed a set of considerations for stakeholders to understand the pros and cons of the various choices that they will inevitably have to make in selecting an approach for data abstraction during systematic reviews.


RESULTS

Aim 1: Developing DAA

DAA Demonstration and Source Codes

DAA records mappings between abstracted data elements and their corresponding locations in source documents.13 Examples of locations include a specific line or paragraph of text, a figure, and a row in a table in a journal article or any report about the study. Mapping a single data element to multiple locations is also supported. Clicking on established mappings automatically loads the source document side by side with the data abstraction form in split-screen view, scrolls the location into view, and highlights the relevant text. Data abstractors can then use a mouse and drag a flag from any item on the data abstraction form to any desired location on the adjacent HTML (Figure 3).

Results of Survey of Early Users of DAA

All 52 data abstractors who participated in the DAA trial (aim 2) completed the survey (Appendix 3). Most data abstractors (n = 43 of 52; 83%) found using DAA to be either very or somewhat easy overall. Opening source documents in split-screen view and scrolling between pages of a source document were reported to be easy by 83% and 69% of data abstractors, respectively. Among those who placed flags initially (ie, less-experienced data abstractors), 62% agreed that doing so was easy. Among those who clicked on existing flags (ie, more-experienced data abstractors), 73% agreed that doing so was easy.


Figure 3. Screenshot showing how DAA displays the source document in HTML format (right) adjacent to the data abstraction form in the data abstraction system (SRDR, left)a

aA demonstration video is available at The source code of DAA (Data Abstraction Assistant) can be found at The SRDR (Systematic Review Data Repository Code) source code that has the DAA implementation can be found at These code repositories include documentation to assist in setting up DAA and SRDR server instances.

When asked about use of DAA for data abstraction in the future, 65% of less-experienced and 77% of more-experienced data abstractors stated they would be very or somewhat likely to use it. Similarly, 80% of less-experienced and 93% of more-experienced data abstractors stated that they would be very or somewhat likely to recommend that others use it (see Appendix 3 for detailed breakdown of responses). When asked to name their favorite DAA feature, 54% of all data abstractors chose the ability to click on existing flags marking information sources (73% of more-experienced data abstractors named this feature), 19% of data abstractors chose the ability to open a document in split-screen view, and 17% chose the ability to place flags on the PDF (23% of less-experienced data abstractors named this feature).


Aim 2: Conducting a Randomized Controlled Trial to Evaluate DAA

Between March 18, 2016, and February 1, 2017, we screened 160 potential data abstractors for eligibility and randomly assigned 52 (n = 26 pairs) (Figure 4).

Figure 4. Participant flow during the DAA trial

Abbreviation: DAA, Data Abstraction Assistant.

We enrolled 26 pairs instead of the 24 pairs planned in the design, because 3 protocol violations required replacing 2 of the pairs. The first 2 violations (nos. 1 and 2) occurred because the first data abstractor in the pair forgot to place flags during data abstraction under approach A. Protocol violation no. 3 occurred because the project director assigned 2 incorrect studies to a pair. Two protocol violations (nos. 2 and 3) occurred in the same pair. After discussing the issues with the entire investigative team, we enrolled 2 additional pairs of data abstractors (pairs 25 and 26) to replace the 2 pairs in whom the violations occurred.

All participants completed the DAA trial by April 3, 2017. We did not encounter any missing data and analyzed data from all 52 participants under the ITT principle. We conducted a per-protocol sensitivity analysis by replacing the 2 pairs in whom the protocol violations occurred with the 2 added pairs. Because of the crossover design, we present the baseline characteristics of all 52 participants by sequence (Table 2). In brief, most participants were between 20 and 40 years old, reflecting the populations from which we recruited, and most participants had abstracted data within the past 6 months. Most participants (90%) had abstracted data from 10 or more studies, and all participants had previously received some form of training in systematic reviews. Nearly all participants characterized their level of experience as “somewhat/moderately experienced” or “very experienced.”

Table 2. Baseline Characteristics of All 52 Participants in the DAA Trial

Values are No. (%) of participants in each random sequence.

| Characteristic | AABBCC (n = 8) | BBCCAA (n = 8) | CCAABB (n = 8) | AACCBB (n = 10) | BBAACC (n = 10) | CCBBAA (n = 8) |
|---|---|---|---|---|---|---|
| Age range, y | | | | | | |
| 20-29 | 3 (38) | 3 (37) | 7 (88) | 6 (60) | 5 (50) | 5 (63) |
| 30-39 | 2 (25) | 5 (63) | — | 4 (40) | 4 (40) | 2 (25) |
| 40-49 | 1 (13) | — | — | — | — | 1 (13) |
| 50-59 | 2 (25) | — | — | — | 1 (10) | — |
| 60-69 | — | — | — | — | — | — |
| ≥70 | — | — | 1 (13) | — | — | — |
| No. of articles abstracted | | | | | | |
| 1-9 | — | — | 2 (25) | 1 (10) | 1 (10) | 1 (13) |
| 10-19 | — | 3 (38) | — | — | 2 (20) | 3 (38) |
| ≥20 | 8 (100) | 5 (63) | 6 (75) | 9 (90) | 7 (70) | 4 (50) |
| No. of systematic reviews published | | | | | | |
| 0 | 1 (13) | 4 (50) | 3 (38) | 4 (40) | 4 (40) | 3 (38) |
| 1-2 | — | — | 1 (13) | 1 (10) | 1 (10) | 1 (13) |
| 3-5 | 3 (38) | 2 (25) | 2 (25) | 4 (40) | 2 (20) | 1 (13) |
| ≥6 | 4 (50) | 2 (25) | 2 (25) | 1 (10) | 3 (30) | 3 (38) |
| Last time abstracting data | | | | | | |
| Within the last 6 mo | 7 (88) | 7 (88) | 8 (100) | 8 (100) | 7 (88) | 7 (88) |
| ≥6 mo ago | 1 (13) | 1 (13) | — | — | 1 (13) | 1 (13) |
| Training in systematic reviews^a | | | | | | |
| No training | — | — | — | — | — | — |
| Took a systematic review methods course | 5 (63) | 5 (63) | 3 (38) | 7 (70) | 7 (70) | 7 (88) |
| Attended a systematic review workshop | 3 (38) | 3 (38) | 1 (13) | 2 (20) | 3 (38) | 2 (25) |
| Received on-the-job training | 5 (63) | 4 (50) | 7 (88) | 5 (50) | 7 (70) | 5 (63) |
| Received other forms of training | 2 (25) | 2 (25) | 2 (25) | 3 (38) | 1 (10) | 1 (13) |
| Self-rated level of experience | | | | | | |
| Slightly experienced | — | 1 (13) | — | 1 (10) | 1 (10) | 1 (13) |
| Somewhat/moderately experienced | 4 (50) | 3 (38) | 2 (25) | 5 (50) | 6 (60) | 6 (75) |
| Very experienced | 4 (50) | 4 (50) | 6 (75) | 4 (40) | 3 (30) | 1 (13) |
| Primary professional status | | | | | | |
| Faculty | 3 (38) | 1 (13) | 1 (13) | 2 (20) | 3 (38) | — |
| Doctoral student | 1 (13) | 2 (25) | 2 (25) | 3 (30) | 2 (20) | 2 (25) |
| Master’s student | 2 (25) | 2 (25) | 1 (13) | 2 (20) | 2 (20) | 1 (13) |
| Staff | 1 (13) | 3 (38) | 3 (38) | — | 2 (20) | 3 (38) |
| Other | 1 (13) | — | 1 (13) | 3 (30) | 1 (10) | 2 (25) |

Abbreviation: DAA, Data Abstraction Assistant.
a Participants could select all options that apply, so the percentages add up to more than 100%.

We also present the baseline characteristics of participants by the classified level of experience with data abstraction (Table 3).


Table 3. Baseline Characteristics of All 52 Participants in the DAA Trial by Level of Experience With Data Abstraction

Values are No. (%).

| Characteristic | Less experienced (n = 26) | More experienced (n = 26) | Overall (N = 52) |
|---|---|---|---|
| Demographics | | | |
| Age category, y | | | |
| 20-29 | 18 (69) | 11 (42) | 29 (56) |
| 30-39 | 7 (27) | 10 (39) | 17 (33) |
| 40-49 | 0 (0) | 2 (8) | 2 (4) |
| 50-59 | 1 (4) | 2 (8) | 3 (6) |
| 60-69 | 0 (0) | 0 (0) | 0 (0) |
| ≥70 | 0 (0) | 1 (4) | 1 (2) |
| Current professional status | | | |
| Master’s student | 7 (27) | 3 (12) | 10 (19) |
| Doctoral student | 7 (27) | 5 (19) | 12 (23) |
| Staff | 4 (15) | 8 (31) | 12 (23) |
| Faculty | 3 (12) | 7 (27) | 10 (19) |
| Other | 5 (19) | 3 (12) | 8 (16) |
| Affiliation^a | | | |
| Brown University | 1 (4) | 8 (31) | 9 (17) |
| Johns Hopkins University | 24 (92) | 9 (35) | 33 (65) |
| Other | 0 (0) | 7 (27) | 7 (14) |
| Unclear | 1 (4) | 2 (8) | 3 (6) |
| Training | | | |
| Type of training received^b | | | |
| Systematic review methods workshop | 3 (12) | 11 (42) | 14 (27) |
| Systematic review course | 20 (77) | 14 (54) | 34 (65) |
| On-the-job training | 10 (39) | 23 (89) | 33 (64) |
| Other | 2 (8) | 9 (35) | 11 (21) |
| Prior experience with data abstraction for systematic reviews | | | |
| No. of articles abstracted | | | |
| 1-9 | 5 (19) | 0 (0) | 5 (10) |
| 10-19 | 7 (27) | 1 (4) | 8 (15) |
| ≥20 | 14 (54) | 25 (96) | 39 (75) |
| No. of systematic reviews published | | | |
| 0 | 19 (73) | 0 (0) | 19 (37) |
| 1-2 | 7 (27) | 0 (0) | 7 (14) |
| 3-5 | 0 (0) | 11 (42) | 11 (21) |
| ≥6 | 0 (0) | 15 (58) | 15 (29) |
| Self-assessment of prior experience | | | |
| Slightly experienced | 3 (12) | 1 (4) | 4 (8) |
| Somewhat/moderately experienced | 18 (69) | 8 (31) | 26 (50) |
| Very experienced | 5 (19) | 17 (65) | 22 (42) |

Abbreviation: DAA, Data Abstraction Assistant.
a Based on email addresses and project director’s familiarity with the participant’s current or past affiliation(s).
b Participants could select all that apply, so the percentages add up to more than 100%.



Error Proportions. Table 4 provides the error proportions observed during the DAA trial by data abstraction approach (A, B, and C), type of error (error of omission, incorrect abstraction, and total errors), type of data abstracted (study design, baseline characteristics, outcomes/results, and all types of data), and systematic review topic (1, 2, 3, 4, and all topics). Table 5 reports these data aggregated across all approaches.

Across all approaches, the proportion of errors committed by pairs per abstraction form was 16% (range, 2%-33%; see Table 5). These proportions were similar among data abstraction approaches: 17% (range, 6%-33%) for approach A, 16% (range, 4%-33%) for approach B, and 15% (range, 2%-30%) for approach C (Table 4). Error proportions were much higher for data items related to outcomes/results (36%) than for data items related to study design (15%) or baseline characteristics (10%; see Table 5). For data items related to outcomes/results, error proportions were higher for approach A (41%) than for approach B (36%) or approach C (31%; see Table 4). Differences among approaches were smaller for data items related to study design (ranging between 13% and 17%) and absent for baseline characteristics (10% for every approach). Error proportions did not vary as much by systematic review topic. Errors of omission were less common than incorrect abstractions, except among the outcomes/results data items, for which a large majority of the errors were omissions.


Table 4. Proportion of Errors by Data Abstraction Approach, Type of Error, Type of Data Item, and Systematic Review Topic

Error columns are mean % (range); No. of fields is mean (range). A = approach A^a; B = approach B^b.

| Data type and topic | A: Total errors | A: Errors of omission | A: Incorrect abstractions | A: No. of fields | B: Total errors | B: Errors of omission | B: Incorrect abstractions | B: No. of fields |
|---|---|---|---|---|---|---|---|---|
| Study Design | | | | | | | | |
| Topic 1^d | 21 (7-49) | 0 (0-0) | 21 (7-49) | 42 (37-43) | 14 (7-19) | 0 (0-0) | 14 (7-19) | 42 (37-43) |
| Topic 2^e | 18 (9-30) | 0 (0-0) | 18 (9-30) | 45 (42-46) | 13 (0-21) | 0 (0-0) | 13 (0-21) | 45 (42-46) |
| Topic 3^f | 12 (5-20) | 0 (0-0) | 12 (5-20) | 45 (42-46) | 12 (2-20) | 0 (0-0) | 12 (2-20) | 45 (42-46) |
| Topic 4^g | 17 (2-48) | 0 (0-0) | 17 (2-48) | 46 (42-48) | 15 (2-36) | 0 (0-0) | 15 (2-36) | 46 (42-48) |
| All topics | 17 (2-49) | 0 (0-0) | 17 (2-49) | 45 (37-48) | 13 (0-36) | 0 (0-0) | 13 (0-36) | 45 (37-48) |
| Baseline Characteristics | | | | | | | | |
| Topic 1 | 11 (3-19) | 0 (0-0) | 11 (3-19) | 62 (59-65) | 10 (0-20) | 1 (0-9) | 9 (0-20) | 62 (59-65) |
| Topic 2 | 11 (0-35) | 0 (0-0) | 11 (0-35) | 76 (63-84) | 11 (0-34) | 0 (0-0) | 11 (0-34) | 76 (63-84) |
| Topic 3 | 9 (0-24) | 0 (0-0) | 9 (0-24) | 73 (64-81) | 9 (0-26) | 0 (0-0) | 9 (0-26) | 72 (64-81) |
| Topic 4 | 9 (0-33) | 0 (0-0) | 9 (0-33) | 52 (45-57) | 11 (0-27) | 0 (0-0) | 11 (0-27) | 51 (45-57) |
| All topics | 10 (0-35) | 0 (0-0) | 10 (0-35) | 65 (45-84) | 10 (0-34) | 0 (0-9) | 10 (0-34) | 65 (45-84) |
| Outcomes and Results | | | | | | | | |
| Topic 1 | 48 (9-95) | 44 (9-76) | 4 (0-19) | 24 (10-38) | 37 (10-65) | 35 (0-65) | 2 (0-17) | 24 (10-38) |
| Topic 2 | 42 (0-86) | 32 (0-86) | 9 (0-37) | 31 (7-52) | 41 (6-95) | 28 (0-95) | 13 (0-86) | 31 (7-52) |
| Topic 3 | 40 (0-100) | 36 (0-100) | 4 (0-23) | 22 (7-43) | 35 (7-100) | 32 (0-100) | 3 (0-27) | 23 (7-43) |
| Topic 4 | 35 (8-100) | 22 (0-100) | 13 (0-71) | 13 (3-25) | 31 (0-100) | 27 (0-100) | 4 (0-40) | 11 (3-25) |
| All topics | 41 (0-100) | 33 (0-100) | 8 (0-71) | 22 (3-52) | 36 (0-100) | 31 (0-100) | 5 (0-86) | 22 (3-52) |
| All data items | | | | | | | | |
| Topic 1 | 20 (6-28) | 8 (2-17) | 11 (3-20) | 150 (131-162) | 16 (7-21) | 8 (0-13) | 8 (4-13) | 144 (123-168) |
| Topic 2 | 17 (6-24) | 6 (0-14) | 11 (6-16) | 166 (128-181) | 16 (7-33) | 6 (0-23) | 10 (3-20) | 161 (123-181) |
| Topic 3 | 14 (8-30) | 6 (0-15) | 8 (2-15) | 155 (133-187) | 14 (4-32) | 6 (1-15) | 8 (1-18) | 148 (123-177) |
| Topic 4 | 17 (9-33) | 7 (4-15) | 11 (1-24) | 130 (116-142) | 18 (6-25) | 7 (4-10) | 11 (2-21) | 122 (106-137) |
| All topics | 17 (6-33) | 7 (0-17) | 10 (1-24) | 150 (116-187) | 16 (4-33) | 6 (0-23) | 9 (1-21) | 143 (106-181) |

Table 4. Proportion of Errors by Data Abstraction Approach, Type of Error, Type of Data Item, and Systematic Review Topic (cont’d)

Error columns are mean % (range); No. of fields is mean (range). All columns refer to approach C^c.

| Data type and topic | Total errors | Errors of omission | Incorrect abstractions | No. of fields |
|---|---|---|---|---|
| Study Design | | | | |
| Topic 1^d | 18 (7-35) | 0 (0-0) | 18 (7-35) | 42 (37-43) |
| Topic 2^e | 10 (2-23) | 0 (0-0) | 10 (2-23) | 45 (42-46) |
| Topic 3^f | 10 (4-21) | 0 (0-0) | 10 (4-21) | 45 (42-46) |
| Topic 4^g | 17 (4-36) | 0 (0-0) | 17 (4-36) | 46 (42-48) |
| All topics | 14 (2-36) | 0 (0-0) | 14 (2-36) | 45 (37-48) |
| Baseline Characteristics | | | | |
| Topic 1 | 7 (0-14) | 0 (0-0) | 7 (0-14) | 62 (59-65) |
| Topic 2 | 15 (0-41) | 0 (0-0) | 15 (0-41) | 76 (63-84) |
| Topic 3 | 10 (1-24) | 0 (0-0) | 10 (1-24) | 72 (64-81) |
| Topic 4 | 8 (0-20) | 0 (0-0) | 8 (0-20) | 52 (45-57) |
| All topics | 10 (0-41) | 0 (0-0) | 10 (0-41) | 65 (45-84) |
| Outcomes and Results | | | | |
| Topic 1 | 43 (4-100) | 40 (0-100) | 3 (0-14) | 24 (10-38) |
| Topic 2 | 35 (7-86) | 33 (0-86) | 2 (0-14) | 31 (7-52) |
| Topic 3 | 21 (0-60) | 17 (0-53) | 5 (0-57) | 23 (7-43) |
| Topic 4 | 29 (0-100) | 28 (0-100) | 1 (0-8) | 11 (3-25) |
| All topics | 31 (0-100) | 29 (0-100) | 2 (0-57) | 22 (3-52) |
| All data items | | | | |
| Topic 1 | 16 (8-27) | 8 (0-18) | 9 (2-16) | 144 (123-156) |
| Topic 2 | 16 (8-30) | 6 (0-12) | 10 (4-21) | 161 (128-179) |
| Topic 3 | 12 (2-19) | 4 (0-13) | 9 (2-15) | 149 (123-177) |
| Topic 4 | 17 (6-25) | 7 (4-15) | 10 (2-20) | 123 (106-142) |
| All topics | 15 (2-30) | 6 (0-18) | 9 (2-21) | 144 (106-179) |

a Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification.
b Approach B was single abstraction plus verification.
c Approach C was independent dual abstraction plus adjudication.
d Topic 1: Multifactorial interventions to prevent falls in older adults.16
e Topic 2: PCSK9 (proprotein convertase subtilisin/kexin type 9) antibodies for adults with hypercholesterolemia.17
f Topic 3: Interventions to promote physical activity in cancer survivors.18
g Topic 4: Omega-3 fatty acids for adults with depression.19


Table 5. Proportion of Errors Across All Approaches, by Type of Error, Type of Data Abstracted, and Systematic Review Topic

Error columns are mean % (range); No. of fields is mean (range). All columns refer to all approaches combined^a.

| Data type and topic | Total errors | Errors of omission | Incorrect abstractions | No. of fields |
|---|---|---|---|---|
| Study Design | | | | |
| Topic 1^b | 18 (7-49) | 0 (0-0) | 18 (7-49) | 42 (37-43) |
| Topic 2^c | 14 (0-30) | 0 (0-0) | 14 (0-30) | 45 (42-46) |
| Topic 3^d | 11 (2-21) | 0 (0-0) | 11 (2-21) | 45 (42-46) |
| Topic 4^e | 16 (2-48) | 0 (0-0) | 16 (2-48) | 46 (42-48) |
| All topics | 15 (0-49) | 0 (0-0) | 15 (0-49) | 45 (37-48) |
| Baseline Characteristics | | | | |
| Topic 1 | 9 (0-20) | 0 (0-9) | 9 (0-20) | 62 (59-65) |
| Topic 2 | 12 (0-41) | 0 (0-0) | 12 (0-41) | 76 (63-84) |
| Topic 3 | 9 (0-26) | 0 (0-0) | 9 (0-26) | 72 (64-81) |
| Topic 4 | 9 (0-33) | 0 (0-0) | 9 (0-33) | 52 (45-57) |
| All topics | 10 (0-41) | 0 (0-9) | 10 (0-41) | 65 (45-84) |
| Outcomes and Results | | | | |
| Topic 1 | 43 (4-100) | 40 (0-100) | 3 (0-19) | 24 (10-38) |
| Topic 2 | 39 (0-95) | 31 (0-95) | 8 (0-86) | 31 (7-52) |
| Topic 3 | 32 (0-100) | 28 (0-100) | 4 (0-57) | 23 (7-43) |
| Topic 4 | 32 (0-100) | 26 (0-100) | 6 (0-71) | 12 (3-25) |
| All topics | 36 (0-100) | 31 (0-100) | 5 (0-86) | 22 (3-52) |
| All data items | | | | |
| Topic 1 | 17 (6-28) | 8 (0-18) | 9 (2-20) | 146 (123-168) |
| Topic 2 | 17 (6-33) | 6 (0-23) | 11 (3-21) | 163 (123-181) |
| Topic 3 | 14 (2-32) | 5 (0-15) | 8 (1-18) | 151 (123-187) |
| Topic 4 | 17 (6-33) | 7 (4-15) | 10 (1-24) | 125 (106-142) |
| All topics | 16 (2-33) | 6 (0-23) | 10 (1-24) | 145 (106-187) |

a Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification; approach B was single abstraction plus verification; approach C was independent dual abstraction plus adjudication.
b Topic 1: Multifactorial interventions to prevent falls in older adults.16
c Topic 2: PCSK9 (proprotein convertase subtilisin/kexin type 9) antibodies for adults with hypercholesterolemia.17
d Topic 3: Interventions to promote physical activity in cancer survivors.18
e Topic 4: Omega-3 fatty acids for adults with depression.19


Between-Approach Comparisons of Errors. We fit a variety of models to compare error proportions among the 3 data abstraction approaches. Error proportions varied by abstraction approach, by the sequence in which approaches were undertaken, by review topic, and by the order in which articles were abstracted (error proportions were generally higher for the first article abstracted and lower for the last article). The only interaction with approach that we found was with sequence, and it was difficult to interpret. Table 6 presents comparisons of error proportions between approaches using a model based on the DAA trial design that adjusted for sequence, systematic review topic, and indicators for the approach used on the first and last article abstracted by each pair.

Table 6. Between-Approach Comparisons of Error Proportions by Type of Data Abstracteda

| Tab | A^b vs C^c: Adj. OR | A vs C: 95% CI | A vs C: P | B^d vs C: Adj. OR | B vs C: 95% CI | B vs C: P | A vs B: Adj. OR | A vs B: 95% CI | A vs B: P |
|---|---|---|---|---|---|---|---|---|---|
| Study Design | 1.30^e | 1.11-1.53 | .002 | 0.99 | 0.83-1.17 | .87 | 1.32^e | 1.12-1.55 | .001 |
| Baseline Characteristics | 1.02 | 0.87-1.20 | .83 | 1.05 | 0.89-1.23 | .59 | 0.97 | 0.83-1.14 | .74 |
| Outcomes and Results | 1.52^e | 1.27-1.82 | <.0001 | 1.17 | 0.97-1.40 | .10 | 1.30^e | 1.09-1.56 | .004 |
| All data items | 1.12^e | 1.03-1.22 | .01 | 1.04 | 0.95-1.13 | .41 | 1.08 | 0.99-1.17 | .09 |

Abbreviation: Adj. OR, adjusted odds ratio.
a The model that did not include indicators for the approach used on the first and last article abstracted by each pair rendered similar findings.
b Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification.
c Approach C was independent dual abstraction plus adjudication.
d Approach B was single abstraction plus verification.
e Significant at 0.05 level.


Overall, across all types of data items, although the crude error proportions were similar (17% for A, 16% for B, and 15% for C; see Table 4), approach A was associated with a statistically significant 12% higher odds of errors than approach C (OR, 1.12; 95% CI, 1.03-1.22) and with an 8% higher odds of errors than approach B that was not statistically significant (OR, 1.08; 95% CI, 0.99-1.17; see Table 6). The majority of these between-approach differences arose from the data items related to outcomes/results and study design, where, compared with approach C, approach A was associated with 52% (OR, 1.52; 95% CI, 1.27-1.82) and 30% (OR, 1.30; 95% CI, 1.11-1.53) higher odds of errors, respectively. Approach A also was associated with statistically significantly higher odds of errors than approach B in these 2 types of data items. Approaches B and C were associated with similar odds of errors in data items related to study design, but approach B was associated with marginally higher odds of errors in data items related to outcomes/results than approach C (OR, 1.17; 95% CI, 0.97-1.40; P = .10), a difference that did not reach significance at the 0.05 level. No between-approach differences were observed in data items related to baseline characteristics.
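The percentages quoted above follow directly from the adjusted odds ratios in Table 6: an odds ratio of 1.12 corresponds to (1.12 − 1) × 100 = 12% higher odds, and a comparison is significant at the 0.05 level when its 95% CI excludes 1. A small sketch of that arithmetic follows; the function name is a hypothetical helper introduced here, not part of the trial analysis.

```python
def describe_odds_ratio(odds_ratio, ci_low, ci_high):
    """Return (percent change in odds, whether the 95% CI excludes 1)."""
    percent_change = round((odds_ratio - 1.0) * 100)
    ci_excludes_one = ci_low > 1.0 or ci_high < 1.0
    return percent_change, ci_excludes_one


# Values for all data items, taken from Table 6.
a_vs_c = describe_odds_ratio(1.12, 1.03, 1.22)  # approach A vs approach C
a_vs_b = describe_odds_ratio(1.08, 0.99, 1.17)  # approach A vs approach B
print(a_vs_c, a_vs_b)
```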

Time

Mean times for data abstraction during the DAA trial, as captured by auto-recorded time, were generally longer than those captured by self-recorded time. Across all approaches, the mean times per abstraction were 136 minutes (range, 39-399 minutes) and 107 minutes (range, 30-285 minutes), as captured by the auto- and self-recorded times, respectively.

Auto-recorded Time. Mean times for data abstraction during the DAA trial, as captured by auto-recorded time, were longer for approach C (172 minutes; range, 48-399 minutes) than for approach A (128 minutes; range, 41-350 minutes) and approach B (107 minutes; range, 39-341 minutes; see Table 7). Some systematic review topics took longer to abstract than others. Auto-recorded time clocks were not able to differentiate between initial abstraction versus adjudication or verification; however, they recorded times by type of data item. Regardless of the abstraction approach, data abstractors spent between 2 and 3 times more time on data items related to study design or outcomes/results than on items related to baseline characteristics. Abstracting data related to study design and baseline characteristics took slightly longer using approach A than approach B, even though both approaches involved verification.

Table 7. Auto-recorded Time Spent (in minutes) by Data Abstraction Approach, Type of Data Item, and Systematic Review Topic

Values are mean (range), min.

| Data type and topic | Approach A^a | Approach B^b | Approach C^c | All approaches |
|---|---|---|---|---|
| Study Design | | | | |
| Topic 1^d | 46 (21-107) | 36 (19-59) | 51 (22-70) | 44 (19-107) |
| Topic 2^e | 61 (17-111) | 58 (10-232) | 50 (36-85) | 56 (10-232) |
| Topic 3^f | 54 (17-148) | 37 (9-82) | 84 (43-145) | 58 (9-148) |
| Topic 4^g | 63 (16-199) | 41 (16-81) | 63 (24-166) | 55 (16-199) |
| All topics^h | 56 (16-199) | 43 (9-232) | 63 (22-166) | 54 (9-232) |
| Baseline Characteristics | | | | |
| Topic 1 | 9 (5-18) | 7 (4-15) | 15 (5-32) | 10 (4-32) |
| Topic 2 | 27 (8-66) | 14 (3-24) | 29 (16-78) | 24 (3-78) |
| Topic 3 | 19 (6-52) | 11 (3-20) | 28 (15-47) | 19 (3-52) |
| Topic 4 | 23 (4-155) | 9 (5-19) | 23 (8-75) | 18 (4-155) |
| All topics^h | 20 (4-155) | 10 (3-24) | 24 (5-78) | 18 (3-155) |
| Outcomes and Results | | | | |
| Topic 1 | 29 (5-68) | 27 (10-82) | 75 (16-165) | 44 (5-165) |
| Topic 2 | 44 (11-111) | 46 (9-96) | 55 (27-138) | 48 (9-138) |
| Topic 3 | 43 (10-128) | 58 (8-244) | 69 (20-140) | 57 (8-244) |
| Topic 4 | 27 (5-69) | 33 (8-97) | 72 (13-245) | 44 (5-245) |
| All topics^h | 36 (5-128) | 41 (8-244) | 68 (13-245) | 48 (5-245) |
| All data items | | | | |
| Topic 1 | 98 (44-170) | 84 (39-194) | 162 (48-243) | 114 (39-243) |
| Topic 2 | 146 (50-290) | 132 (44-341) | 150 (93-267) | 143 (44-341) |
| Topic 3 | 134 (46-350) | 118 (40-311) | 199 (113-310) | 151 (40-350) |
| Topic 4 | 132 (41-326) | 96 (42-172) | 174 (51-399) | 134 (41-399) |
| All topics^h | 128 (41-350) | 107 (39-341) | 172 (48-399) | 136 (39-399) |

a Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification.
b Approach B was single abstraction plus verification.
c Approach C was independent dual abstraction plus adjudication.
d Topic 1: Multifactorial interventions to prevent falls in older adults.16
e Topic 2: PCSK9 (proprotein convertase subtilisin/kexin type 9) antibodies for adults with hypercholesterolemia.17
f Topic 3: Interventions to promote physical activity in cancer survivors.18
g Topic 4: Omega-3 fatty acids for adults with depression.19
h Total times for All topics are greater than the sum of the Design, Baseline, and Outcomes and Results Tabs because they also incorporate time spent on the other tabs in SRDR (Systematic Review Data Repository), ie, the Key Questions, Publications, Arms, and Finalize Tabs.


Self-recorded Time. Mean times for data abstraction during the DAA trial, as captured by self-recorded time, were similar between the 2 verification approaches (90 minutes [range, 39-229 minutes] for approach A and 90 minutes [range, 30-285 minutes] for approach B). The mean time was longer for independent dual abstraction plus adjudication (approach C: 142 minutes; range, 59-256 minutes; see Table 8). Across all abstraction approaches, approximately 60% of the time was spent on initial abstraction and approximately 40% on adjudication or verification. Some systematic review topics took longer to abstract than others. Table 9 reports these data aggregated across approaches.


Table 8. Self-recorded Time (in minutes) Spent by Data Abstraction Approach, Step of Data Abstraction, and Systematic Review Topic

Values are mean (range), min, by step of data abstraction.

Approach A^a

| Topic | Initial abstraction | Verification | Adjudication | Total |
|---|---|---|---|---|
| Topic 1^d | 51 (36-80) | 24 (11-38) | 5 (0-15) | 80 (61-105) |
| Topic 2^e | 70 (20-210) | 30 (19-60) | 4 (0-30) | 103 (48-229) |
| Topic 3^f | 61 (20-149) | 31 (10-75) | 4 (0-21) | 96 (45-224) |
| Topic 4^g | 45 (19-72) | 31 (18-65) | 5 (0-20) | 81 (39-145) |
| All topics | 56 (19-210) | 29 (10-75) | 5 (0-30) | 90 (39-229) |

Approach B^b

| Topic | Initial abstraction | Verification | Adjudication | Total |
|---|---|---|---|---|
| Topic 1^d | 44 (20-97) | 23 (5-41) | 8 (0-31) | 75 (39-145) |
| Topic 2^e | 57 (20-140) | 26 (2-50) | 9 (0-30) | 92 (35-172) |
| Topic 3^f | 70 (18-210) | 22 (10-41) | 20 (0-80) | 112 (30-285) |
| Topic 4^g | 46 (18-113) | 27 (10-55) | 8 (0-28) | 81 (43-136) |
| All topics | 54 (18-210) | 25 (2-55) | 12 (0-80) | 90 (30-285) |

Approach C^c

| Topic | Initial abstraction | Verification | Adjudication | Total |
|---|---|---|---|---|
| Topic 1^d | 82 (40-132) | 3 (0-10) | 47 (22-87) | 132 (65-229) |
| Topic 2^e | 90 (38-182) | 0 (0-0) | 48 (28-90) | 138 (76-255) |
| Topic 3^f | 95 (69-145) | 4 (0-42) | 62 (24-145) | 161 (98-227) |
| Topic 4^g | 64 (32-153) | 4 (0-20) | 66 (24-190) | 135 (59-256) |
| All topics | 83 (32-182) | 3 (0-42) | 56 (22-190) | 142 (59-256) |

a Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification.
b Approach B was single abstraction plus verification.
c Approach C was independent dual abstraction plus adjudication.
d Topic 1: Multifactorial interventions to prevent falls in older adults.16
e Topic 2: PCSK9 (proprotein convertase subtilisin/kexin type 9) antibodies for adults with hypercholesterolemia.17
f Topic 3: Interventions to promote physical activity in cancer survivors.18
g Topic 4: Omega-3 fatty acids for adults with depression.19


Table 9. Self-recorded Time (in minutes) Spent Across All Approaches, by Step of Data Abstraction and Systematic Review Topic

Values are mean (range), min, across all approaches^a, by step of data abstraction.

| Topic | Initial abstraction | Verification | Adjudication | Total |
|---|---|---|---|---|
| Topic 1^b | 59 (20-132) | 17 (0-41) | 20 (0-87) | 96 (39-229) |
| Topic 2^c | 72 (20-210) | 19 (0-60) | 20 (0-90) | 111 (35-255) |
| Topic 3^d | 75 (18-210) | 19 (0-75) | 29 (0-145) | 123 (30-285) |
| Topic 4^e | 52 (18-153) | 21 (0-65) | 27 (0-190) | 99 (39-256) |
| All topics | 64 (18-210) | 19 (0-75) | 24 (0-190) | 107 (30-285) |

a Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification; approach B was single abstraction plus verification; approach C was independent dual abstraction plus adjudication.
b Topic 1: Multifactorial interventions to prevent falls in older adults.16
c Topic 2: PCSK9 (proprotein convertase subtilisin/kexin type 9) antibodies for adults with hypercholesterolemia.17
d Topic 3: Interventions to promote physical activity in cancer survivors.18
e Topic 4: Omega-3 fatty acids for adults with depression.19

Between-Approach Comparisons for Time. We fit a variety of models for the self- and auto-recorded times. Both sets of times varied by abstraction approach, by sequence of approaches, and by the order in which articles were abstracted, but not by review topic. Again, because of difficulties with interpretation, we ignored interactions between approach and the order in which articles were abstracted. Table 10 reports comparison data of auto-recorded time between approaches using a model based on the DAA trial design that adjusted for sequence, systematic review topic, and indicators for the approach used on the first and last article abstracted by each pair. Table 11 presents similar comparison data for self-recorded time. Irrespective of which time was used, the comparisons between approaches rendered similar findings.

When considering total time spent on studies, approach A took statistically significantly less time than approach C by both methods of time recording: by 46 minutes (95% CI, 26-66 minutes) using auto-recorded time and by 53 minutes (95% CI, 39-66 minutes) using self-recorded time. Approaches A and B did not differ on self-recorded time but differed in favor of B on auto-recorded time by 20 minutes (95% CI, 1-40 minutes). Approach B also took statistically significantly less time than approach C: by 66 minutes (95% CI, 47-86 minutes) using auto-recorded time and by 52 minutes (95% CI, 39-66 minutes) using self-recorded time.

When considering time spent by type of data abstracted (Table 10), approach A took statistically significantly less time than approach C for data items related to outcomes/results, but not for study design and baseline characteristics. Approach B took statistically significantly less time than approach C for each type of data item. Approach A took longer than approach B for data items related to study design and for baseline characteristics, but not for data items related to outcomes/results.


Table 10. Between-Approach Comparisons of Auto-recorded Time by Type of Data Abstracted^a

Adjusted mean differences are in minutes.

| Tab | A^b – C^c: Adj. MD | A – C: 95% CI | A – C: P | B^d – C: Adj. MD | B – C: 95% CI | B – C: P | A – B: Adj. MD | A – B: 95% CI | A – B: P |
|---|---|---|---|---|---|---|---|---|---|
| Design Tab | –7.2 | –17.7 to 3.3 | .18 | –20.2^e | –31.2 to –9.2 | .0003 | 13.4^e | 3.0 to 23.9 | .01 |
| Baselines Tab | –4.2 | –9.8 to 1.5 | .15 | –13.8^e | –19.4 to –8.2 | <.0001 | 9.7^e | 4.0 to 15.3 | .0008 |
| Outcomes and Results | –33.3^e | –45.7 to –20.9 | <.0001 | –27.4^e | –39.8 to –15.0 | <.0001 | –5.9 | –18.3 to 6.5 | .35 |
| All data items | –45.9^e | –65.5 to –26.3 | <.0001 | –66.1^e | –85.7 to –46.5 | <.0001 | 20.2^e | 0.6 to 39.8 | .04 |

Abbreviation: Adj. MD, adjusted mean difference.
a The model that did not include indicators for the approach used on the first and last article abstracted by each pair rendered similar findings, except that the comparison between approaches A and B for all data items was not statistically significant.
b Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification.
c Approach C was independent dual abstraction plus adjudication.
d Approach B was single abstraction plus verification.
e Significant at 0.05 level.
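As a rough consistency check on Table 10, a reported 95% CI can be converted back to an approximate standard error and a 2-sided P value. The sketch below assumes a normal approximation, which may differ from the reference distribution used in the trial's model, and the function name `p_from_ci` is illustrative rather than part of the study's analysis code.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def p_from_ci(md, lower, upper):
    """Approximate 2-sided P value from a mean difference and its 95% CI."""
    se = (upper - lower) / (2 * Z_975)  # CI half-width divided by z
    z = md / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Adjusted mean differences (minutes) and 95% CIs copied from Table 10.
rows = {
    "A - C, Design Tab": (-7.2, -17.7, 3.3),
    "A - B, all data items": (20.2, 0.6, 39.8),
}

for label, (md, lower, upper) in rows.items():
    excludes_zero = lower > 0 or upper < 0
    p_value = p_from_ci(md, lower, upper)
    print(f"{label}: P ~ {p_value:.2f}, CI excludes 0: {excludes_zero}")
```

Under this approximation, both rows give P values close to the reported .18 and .04, and a comparison is flagged as significant at the 0.05 level exactly when its CI excludes 0.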

Table 11. Between-Approach Comparisons of Self-recorded Time Across All Topics^a

Adjusted mean differences are in minutes.

| Tab | A^b – C^c: Adj. MD | A – C: 95% CI | A – C: P | B^d – C: Adj. MD | B – C: 95% CI | B – C: P | A – B: Adj. MD | A – B: 95% CI | A – B: P |
|---|---|---|---|---|---|---|---|---|---|
| All data items | –52.7^e | –66.2 to –39.2 | <.0001 | –52.4^e | –65.9 to –39.0 | <.0001 | –0.3 | –13.7 to 13.2 | .97 |

Abbreviation: Adj. MD, adjusted mean difference.
a The model that did not include indicators for the approach used on the first and last article abstracted by each pair rendered similar findings.
b Approach A was DAA (Data Abstraction Assistant)–facilitated single abstraction plus verification.
c Approach C was independent dual abstraction plus adjudication.
d Approach B was single abstraction plus verification.
e Significant at 0.05 level.


Sensitivity Analyses

For both error proportions and time, per-protocol sensitivity analyses returned results similar to those of the main ITT analyses (results not shown).

Impact of Errors on Meta-analysis

Continuous Outcome (low-density lipoprotein–cholesterol [LDL-C] level absolute change from baseline to 12 weeks). Eight of the 12 studies in systematic review topic 2 (ie, PCSK9 antibodies for adults with hypercholesterolemia17) provided sufficient data for a meta-analysis for this outcome, when comparing evolocumab 420 mg (a PCSK9 antibody) and placebo. With 3 data abstractors for each study and 8 studies, there were 3^8 = 6561 possible combinations of data that could be used for this meta-analysis. Because data abstractors sometimes omitted an outcome (ie, failed to abstract any data for an outcome) or did not abstract sufficient data for a meta-analysis (eg, abstracted mean without measures of precision), the mean number of studies per meta-analysis was 5.67 and 4.67 for methods 1 and 2, respectively.

When using data from the answer key, the pooled MD in the continuous outcome (ie, LDL-C level absolute change from baseline to 12 weeks) using analysis method 1 was –2.08 mmol/L (95% CI, –2.48 to –1.68). When using the resampling meta-analysis, we found that the mean of the MDs was of slightly smaller magnitude (–1.92 mmol/L) and ranged from –2.27 mmol/L to –1.63 mmol/L. Although the objective of the DAA trial was to compare 3 data abstraction approaches, in the resampling meta-analysis, there were only 3 combinations that had the same approach for all 8 sampled studies, 1 combination for each abstraction approach. Compared with the answer key (–2.08 mmol/L), the magnitude of the MD was slightly lower for approach A (–1.89 mmol/L), similar for approach B (–2.11 mmol/L), and moderately lower for approach C (–1.68 mmol/L). Despite the smaller mean number of studies, the MDs for analysis method 2 were very similar to those for method 1.
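The counts in this resampling exercise can be reproduced by enumeration. The following is a minimal sketch, assuming each of the 8 studies contributes data from exactly 1 of the 3 approaches in any given combination; the variable names are illustrative and the pooling step itself is not shown.

```python
from itertools import product

APPROACHES = ("A", "B", "C")
N_STUDIES = 8  # studies with sufficient data for the LDL-C meta-analysis

# Each combination picks one approach's abstracted data for every study.
combinations = list(product(APPROACHES, repeat=N_STUDIES))

# Combinations that use the same approach for all studies.
uniform = [combo for combo in combinations if len(set(combo)) == 1]

print(len(combinations))  # 6561
print(len(uniform))       # 3
```

Only the 3 uniform combinations isolate a single approach, which is why the resampling results mostly describe mixtures of approaches rather than a head-to-head comparison.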


Binary Outcome (having at least 1 fall by 12 months). Ten of the 12 studies in systematic review topic 1 (ie, multifactorial interventions to prevent falls in older adults16) provided sufficient data for a meta-analysis for this outcome when comparing physical activity and usual care. With 3 data abstractors for each study and 10 studies, there were 3^10 = 59 049 possible combinations of data that could be used for this meta-analysis. We considered a random sample of 10 000 of these possible combinations. There was a mean of 6.65 studies per meta-analysis for both analysis methods 1 and 2.
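One way to draw such a sample without materializing all combinations is to sample integer indices and decode each one into a per-study choice of approach. This is a sketch only: the report does not state whether sampling was with or without replacement (without replacement is assumed here), and the seed and helper name `decode` are arbitrary.

```python
import random

APPROACHES = ("A", "B", "C")
N_STUDIES = 10
N_SAMPLES = 10_000


def decode(index, n_studies=N_STUDIES):
    """Map an integer in [0, 3**n_studies) to one approach per study (base-3 digits)."""
    combo = []
    for _ in range(n_studies):
        index, digit = divmod(index, len(APPROACHES))
        combo.append(APPROACHES[digit])
    return tuple(combo)


rng = random.Random(2018)  # arbitrary seed, used only to make the sketch repeatable
total = len(APPROACHES) ** N_STUDIES  # 59 049 possible combinations
sampled = [decode(i) for i in rng.sample(range(total), N_SAMPLES)]

print(total, len(sampled))
```

Each sampled tuple would then determine which abstractor's data feed the meta-analysis for each study.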

When using data from the answer key, the relative risk (RR) of the binary outcome (ie, having at least 1 fall by 12 months) using analysis method 1 was 0.93 (95% CI, 0.84-1.03). When using the resampling meta-analysis, the mean of the RRs rendered a slightly larger effect (0.91) and ranged from 0.86 to 0.97. Compared with the answer key (0.93), the RR for approach A was similar (0.94), slightly stronger for approach B (0.91), and slightly weaker for approach C (0.96). The RRs using analysis method 2 were very similar to those for analysis method 1.

Aim 3. Disseminating the Study Findings

Table 12 presents a set of considerations to guide systematic reviewers in their choice of approach to data abstraction. We published manuscripts in peer-reviewed journals describing the features and functioning of DAA14 and the protocol for the DAA trial.14 We also are preparing a separate manuscript that describes the primary results of the DAA trial. We presented DAA and the DAA trial design at the Cochrane Colloquium in 201615 and at the Global Evidence Summit in 2017.22 We presented the results of the DAA trial at the annual meeting of the Society for Research Synthesis Methodology in 201823 and at the Cochrane Colloquium in 2018.24 We responded to a manuscript about reducing research waste in systematic reviews.25


Table 12. Considerations When Selecting Data Abstraction Approaches During Systematic Reviews

Task: Data abstraction system
Guidance: Use electronic data abstraction systems where possible. The system chosen should be able to implement best practices of form development, enhance open science and reproducibility, and reduce research waste.

Task: Form development
Guidance:
• Pilot test form.
• Provide clear instructions.
• Provide definitions to clarify terms.
• Minimize open-ended questions.
• Use existing templates, existing (and common) data items, tailoring questions to specific topics as needed.

Task: Training and composition of data abstractor team
Guidance: Conduct regular and ongoing training to reinforce methods and prevent inconsistencies in interpretation of data items.

Task: Data abstraction approach (directly informed by findings of the DAA trial)
Guidance:
• Avoid single data abstraction to minimize errors.
• Single abstraction plus verification (without using DAA) leads to a similar amount of errors overall as independent dual abstraction plus adjudication but takes substantially less time. Single abstraction plus verification may lead to more errors than independent dual abstraction plus adjudication for data items related to outcomes and results.
• DAA-facilitated single abstraction plus verification could be considered for the following reasons:
  1. It has similar overall error proportions to independent dual abstraction plus adjudication.
  2. It takes substantially less time than independent dual abstraction plus adjudication.
  3. It has the potential to promote reproducible science through creation of permanent linkages between abstracted data and their sources (something that single abstraction plus verification without DAA does not do). This could facilitate the updating of systematic reviews and sharing of previously abstracted data for other purposes.
  4. It can contribute to evaluating and advancing the use of various automated and semiautomated natural-language processing and machine-learning tools for systematic review production.
• Regardless of the approach chosen, pay careful attention to data items that are more prone to errors (eg, outcomes and numeric results) and those that are subjective and require judgment (eg, risk of bias). These types of data items may benefit from independent dual abstraction plus adjudication.

Task: Managing abstracted data
Guidance:
• Anticipate challenges associated with the complexities of data management, especially for large systematic reviews, and plan accordingly.
• Decide whether calculation-type questions should be dealt with during data abstraction or centrally during data management.

Abbreviation: DAA, Data Abstraction Assistant.


DISCUSSION

We developed DAA to assist the data abstraction process during systematic reviews and tested DAA using a randomized crossover trial conducted online. We found that although the overall error proportions were similar among the 3 data abstraction approaches tested in the DAA trial (range, 15%-17%), DAA-assisted single abstraction plus verification (approach A) was associated with higher odds of errors than were the other 2 approaches, especially for data items related to study outcomes and results. The overall and data type-specific error proportions for single abstraction plus verification (approach B) were similar to those for independent dual data abstraction plus adjudication (approach C). Regardless of the abstraction approach, certain types of data items (namely, outcomes/results) were more prone to errors, and a large proportion of errors in numeric results were omissions, because data abstractors missed certain outcomes. Approach A took substantially less time than approach C, but longer than approach B on auto-recorded time.

Error Proportions Observed and Context for Study Results

There are several possible reasons for the relatively high error proportions observed in our study. First, the error proportions might be higher for studies in which the quality of reporting was poor, a factor we did not evaluate in this study. Ambiguity of reporting poses challenges for data abstraction. In addition, accurate abstraction of certain data items requires a nuanced understanding of methodological and statistical concepts related to study design and analysis. Although we required as an eligibility criterion for the DAA trial that participants have experience with data abstraction, we did not evaluate data abstractor expertise related to statistics or clinical trial methodology. We also did not require data abstractors to have knowledge related to the content of the topics of the reviews. In addition, data abstractors were not involved in conceiving the systematic reviews, such as protocol development, screening of studies, and form design and testing. By participating in these activities, data abstractors develop domain knowledge and become familiar with relevant concepts, terminology, measures, and methods. As such, it is possible that the data abstractors in the DAA trial were less familiar with the data items to be abstracted than might be expected of data abstractors working on real-life systematic reviews. Finally, although we tried to make the questions and instructions on the data abstraction forms as clear as possible, we did not intervene to improve the quality of data abstraction midway, as might be attempted in real-life systematic reviews through regular and ongoing training and group discussions.

The highest proportions of errors were observed for data items related to outcomes and results, and most of these errors arose because of omissions, either of entire outcomes or specific fields within outcomes. The proportions of errors were also high for data items that require judgment (eg, risk of bias). Because opinions may vary even among the most experienced data abstractors, the so-called errors in data items that require judgment may simply reflect a range of views and interpretations. Quality assurance procedures, including development of detailed protocols and data abstraction instructions, and regular and ongoing training of data abstractors,26 should focus on these areas to minimize errors.

Differences in Error Proportions and Time Among Data Abstraction Approaches

The overall error proportions were similar among the 3 approaches in the DAA trial (range, 15%-17%). However, when focusing on data items related to outcomes and results, we noted that DAA-assisted single abstraction plus verification (approach A) was associated with a higher proportion of errors (41%) than single abstraction plus verification without DAA (approach B; 36%) and independent dual abstraction plus adjudication (approach C; 31%). However, the 2 verification approaches (A and B) required considerably less time (almost 1 hour less per article) than independent dual data abstraction plus adjudication (C). This translated into a time saving of more than one-third. The independent nature of abstraction in approach C, coupled with both abstractors having to spend time adjudicating their data, likely led to approach C taking the longest time. This suggests that precautions such as independent dual data abstraction plus adjudication may be most important for data needed for meta-analysis but may not be necessary for all items.
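The "more than one-third" figure can be checked against the unadjusted all-topic means of self-recorded total time in Table 8. The sketch below is only an arithmetic illustration; the report may have based its statement on the adjusted differences instead.

```python
# Mean self-recorded total time per article across all topics, in minutes (Table 8).
mean_total_minutes = {"A": 90, "B": 90, "C": 142}

savings = {}
for approach in ("A", "B"):
    saved = mean_total_minutes["C"] - mean_total_minutes[approach]
    savings[approach] = saved / mean_total_minutes["C"]
    print(f"Approach {approach}: {saved} min less than C ({savings[approach]:.1%} of C's time)")
```

With these means, each verification approach saves 52 minutes per article, or roughly 37% of the time taken by approach C.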


Possible Reasons for Higher Error Proportions With DAA

The higher error proportions with DAA than with the other 2 approaches may arise from several sources. First, DAA is a new software application that was tested using data abstractors who were naive to using it. Although we provided data abstractors with training videos for using DAA, some of the errors might be related to abstractors being unfamiliar with a new technology. Second, we did not monitor whether DAA was being used as intended. When placing and reviewing flags, it is possible that abstractors flagged only the first instance of relevant information for a given data item in an article and missed other locations in the article that might have provided relevant information. It is also possible that DAA verifiers were anchored to what had already been flagged and, therefore, were particularly prone to missing information that was not flagged. Regarding these first 2 factors, it should be noted that when using DAA, it is good practice to flag all locations that contain relevant information for a given data item, and we had instructed abstractors accordingly. In addition, there is generally a delay before human performance with a new tool can be expected to peak. With diffusion of innovation, adequate training, and integration of the tool with other enhancements, such as the use of machine learning and natural-language processing to locate items in text, we surmise that these error proportions may become lower over time. Third, outcomes and results (the type of data items with the highest error proportions) are often reported in tables and figures. It is possible that the format of such tables and figures in some articles used in the DAA trial did not allow for appropriate flagging of specific-enough pieces of text to answer specific data items, thereby limiting the value of using DAA.

Subpopulation Considerations

The consideration of subpopulations is not applicable because the interventions in the DAA trial (ie, data abstraction approaches) were not expected to affect the health of data abstractors.

Value of Using DAA and Implications for Future Research

To the extent that DAA is used appropriately, it has the potential to promote reproducible science through the creation of permanent linkages between abstracted data and their sources. This facilitates the updating of systematic reviews and sharing of previously abstracted data for other purposes. DAA also can contribute to evaluating the performance of various automated or semiautomated tools that facilitate data abstraction during systematic reviews. These tools use natural-language processing and machine-learning approaches to assist with data abstraction.27 Most existing tools focus on automating the abstraction of data items, such as number, age, and sex of participants; number of recruiting centers; intervention groups; and outcomes.28 A few tools can abstract information about study objectives and certain aspects of study design (eg, study duration, participant flow) and risk of bias.28,29 However, to date, most of the data items typically abstracted during systematic reviews, including outcomes and results needed for meta-analyses, have not been explored for automated abstraction. Before automated tools for text identification and highlighting can achieve the goals set for their use, their performance should be evaluated using a common data set. The linkages created by DAA can facilitate this evaluation and provide lessons about how these tools can fit into existing systematic review workflows.

Challenges With Independent Dual Data Abstraction Plus Adjudication

The 2 verification approaches required considerably less time (almost 1 hour less per article) than did independent abstraction. In addition, our findings suggest that, for approach C, the step of adjudication (ie, by 2 data abstractors after the initial independent abstraction) took about two-thirds as much time as the initial abstraction by 1 data abstractor (Table 8). When adjudicating, data abstractors had to reorient themselves to the content of the article, identify discrepancies on their data collection forms in the data abstraction system (in our case, SRDR), and discuss the discrepant fields to arrive at consensus. Such reorientation would likely take longer as the time between initial abstraction and the adjudication session(s) increases. The time taken for adjudication would possibly be less if SRDR could automate the comparison step in identifying discrepancies. Such a data comparison tool, now available in the newly launched SRDR Plus, was unavailable during the DAA trial.
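To illustrate what automating the comparison step could look like, the sketch below lists the fields on which 2 abstractors' forms disagree, so that adjudication can start from the discrepancies. It is a hypothetical illustration, not the SRDR Plus implementation; the function name, field names, and values are invented for the example.

```python
def find_discrepancies(form_a, form_b):
    """Return {field: (value_a, value_b)} for fields where two abstractors disagree."""
    discrepancies = {}
    for field in sorted(set(form_a) | set(form_b)):
        value_a = form_a.get(field)  # None marks a field the abstractor omitted
        value_b = form_b.get(field)
        if value_a != value_b:
            discrepancies[field] = (value_a, value_b)
    return discrepancies


# Hypothetical abstracted values for one article.
abstractor_1 = {
    "n_randomized": 120,
    "outcome_assessor_masked": "Not reported",
    "ldl_c_mean_change": -2.1,
}
abstractor_2 = {
    "n_randomized": 120,
    "outcome_assessor_masked": "No",
}

for field, (a, b) in find_discrepancies(abstractor_1, abstractor_2).items():
    print(f"{field}: abstractor 1 = {a!r}, abstractor 2 = {b!r}")
```

Treating an omitted field as a discrepancy matters here because omissions of outcomes accounted for a large share of the errors observed in the trial.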


Implications and Uptake of Study Results

The findings of the DAA trial fill a critical methodological gap in our current understanding of data abstraction best practices, as revealed in a 2017 systematic review30 and the 2011 IOM standards for systematic reviews.1 Both documents identified only 1 study (by Buscemi et al4) that had compared verification with independent abstraction. In the Buscemi et al4 study, the absolute error proportions were similar between the 2 approaches (17.7% for verification and 14.5% for independent abstraction). These proportions are consistent with the error proportions in our study (range, 15%-17%). However, the main conclusion (that the verification approach resulted in more errors than did independent dual abstraction) in the Buscemi et al4 study was based on a relative difference of a 21.7% lower error proportion for independent abstraction (P = 0.02).4 Our study may now be added to the information needed to determine best practices for systematic reviews.

Study Limitations and Strengths

The current version of the DAA software created in aim 1 has some limitations. Currently, the smallest unit of text that can be highlighted as source material for a given data item is an entire line in a paragraph in the source document. Also, DAA currently does not allow the highlighting of text in image-based tables and figures. We are continuing to develop and refine DAA to address these limitations.

The DAA trial itself (aim 2) had some limitations. First, we evaluated as a test case DAA’s compatibility with only 1 data abstraction system (ie, SRDR). Second, certain questions on the data abstraction forms might have been reasonably interpreted differently by different pairs of data abstractors, leading to multiple acceptable answers. For example, in instances when an article provided no or ambiguous information about masking of outcome assessors, the distinction between “No,” “Not reported,” and “Not applicable” might not be readily apparent. This might have artificially inflated the error proportions for such questions.

This study also has several strengths. First, we designed DAA to be open access, open source, and free. To our knowledge, DAA is the only software application that enables tracking of the source of abstracted data. To the extent that DAA is used as intended, it has the potential to promote reproducible science through the creation of permanent linkages between abstracted data and their sources. Such links may facilitate the updating of reviews and sharing of previously abstracted data for other purposes. Second, DAA is compatible with a wide range of data abstraction systems. Third, related to the DAA trial, we used a rigorous and efficient crossover design with random allocation and allocation concealment, testing the effectiveness of DAA-assisted single abstraction plus verification vis-à-vis 2 standard approaches to data abstraction. The studies included for data abstraction in the trial covered a range of topics and examined a range of outcomes. Fourth, we included 52 data abstractors in the trial, and each completed all steps of the trial; we had no missing data. The generalizability of the trial is likely high because of the broad eligibility criteria for data abstractors from multiple locations and organizations with various types of backgrounds and levels of experience with data abstraction for systematic reviews. Fifth, we obtained from all 52 participants, on their completion of activities in the trial, their opinion about the user friendliness of DAA and suggestions for its improvement (which we are incorporating). Once we incorporate these suggestions and make other improvements, we will make DAA available to the public, initially through SRDR and then for use with other data abstraction systems. Sixth, we collaborated with multiple stakeholders, including patients, in developing DAA and designing, conducting, analyzing, and disseminating the results of the DAA trial. Finally, our multipronged dissemination strategy likely will ensure that the DAA software reaches various systematic review stakeholders.



CONCLUSIONS

Because data abstraction is still largely a manual process, errors in data abstraction are almost inevitable and, in some cases, quite frequent. Users of systematic reviews, including patients, clinicians, guideline developers, and others, should be aware that systematic reviews may sometimes be based on inaccurately abstracted data. However, on the basis of findings from this study, we do not know and cannot predict how the conclusions of an individual systematic review and meta-analysis might be affected by data abstraction errors.

Systematic reviewers should always adopt quality assurance procedures during data abstraction, develop detailed protocols and instructions, and regularly train data abstractors. Such efforts should focus on areas where error proportions are particularly high, such as data items related to study outcomes and results.

In summary, considering accuracy and efficiency together, our findings suggest independent dual abstraction plus adjudication is necessary for outcomes and results data during systematic reviews; a verification approach is sufficient for other types of data. By linking abstracted data with their exact source, DAA provides an audit trail that is crucial for reproducible research and complete transparency. Reviewers should choose their data abstraction approach on the basis of the inevitable trade-off between saving time and minimizing errors.



ACKNOWLEDGMENTS

We are grateful to the 52 data abstractors who participated in the Data Abstraction Assistant (DAA) trial as well as the 98 individuals who provided consent but were not needed for participation. We are grateful to the patient stakeholders who participated as collaborators on this project: Vernal Branch (public policy manager and patient advocate), Sandra A. Walsh, BS (California Breast Cancer Organizations), and Elizabeth J. Whamond (Cochrane Consumer Network). Whenever possible, we used verbatim text from the published protocol and other manuscripts emanating from this project.

PCORI funded this work under contract no. ME-1310-07009.

DAA investigators: Joseph Lau, MD (Brown University School of Public Health); Kay Dickersin, MA, PhD (Johns Hopkins Bloomberg School of Public Health); Jesse A. Berlin, ScD (Johnson & Johnson); Vernal Branch (Public Policy Manager and Patient Advocate); Bryant T. Smith, MPH, CPH (Brown University School of Public Health); Simona Carini, MA (University of California, San Francisco, School of Medicine); Wiley Chan, MD (Kaiser Permanente Northwest); Berry De Bruijn, MSc, PhD (National Research Council Information and Communications Technologies Portfolio, Canada); Byron C. Wallace, PhD (Northeastern University College of Computer and Information Science); Susan M. Hutfless, MS, PhD (Johns Hopkins School of Medicine); Ida Sim, MD, PhD (University of California, San Francisco, School of Medicine); M. Hassan Murad, MD, MPH (Mayo Clinic); Sandra A. Walsh, BS (California Breast Cancer Organizations); Elizabeth J. Whamond (Cochrane Consumer Network).


