Comprehensive Inventory of Research Networks

Clinical Data Research Networks, Patient-Powered Research Networks, and Patient Registries

Acknowledgements

This report was prepared by researchers based at the University of California, San Diego, RAND Corporation, and San Francisco State University on behalf of PCORI. PCORI would like to thank the team for its thorough report, delivered on a quick timeline. PCORI notes that any networks omitted from this report and limitations in the amount of detail provided on each network have resulted from the tight turnaround time required for this report. Information on the original Request for Proposal for the Comprehensive Inventory of Networks is available at pcori.org. The report was submitted February 19, 2013, published June 12, 2013, and revised July 30, 2013.

Principal Investigator: Lucila Ohno-Machado, University of California San Diego

Researchers:

University of California San Diego, Division of Biomedical Informatics: Neda Alipanah, Michele E. Day, Robert El-Kareh, Seena Farzaneh, Patricia Freeland, Adela Grando, Hyeon-eui Kim

RAND Corporation: Daniella Meeker

San Francisco State University: Katherine Kim

CDRN, PPRN, Patient Registries: Taxonomy and Comprehensive Inventories

DISCLAIMER

All statements in this publication, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI) or its Board of Governors. This publication was developed through a contract to support PCORI’s research agenda, and PCORI has not peer-reviewed or edited the content. The publication is being made available free of charge for the information of the scientific community and general public as part of PCORI’s ongoing research programs. Questions or comments may be sent to PCORI at [email protected] or by mail to 1828 L St., NW, Washington, DC 20036.

Executive Summary

Objective

This summary provides a lay description of our methods, our key findings about patient engagement, and the final products (taxonomy and comprehensive inventories). We were tasked with developing a taxonomy and comprehensive inventories of three types of collaboratives, namely clinical data research networks (CDRNs), patient-powered research networks (PPRNs), and patient registries, based on 22 criteria defined by the Patient-Centered Outcomes Research Institute (PCORI).

Methods

We translated the 22 criteria into interview questions (see the Appendix for the original 22 criteria alongside our reworded criteria) and defined CDRNs, PPRNs, and patient registries using the characteristics listed in Table 1.

Table 1. Characteristics used to classify networks into CDRNs, PPRNs, and patient registries.

Collaborative: Characteristics

CDRN
• Provides researchers with access to aggregate data such as counts and descriptive statistics (in some cases, patient-level data are provided)
• Includes multiple healthcare institutions and/or research organizations
• Has the ability to extract all data, i.e., does not only extract data based on a specific condition or disease

PPRN
• Provides patients with access to patient-provided data and/or their own genetic data
• Enables patient-patient interactions
• Has the ability to involve physicians and researchers

Patient Registry
• Provides researchers with patient-level data
• Has a specific condition or disease focus
• Sometimes provides patient contact information to researchers
• Sometimes allows patients to contact researchers
• No patient-patient interactions
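The Table 1 characteristics can be read as a rough decision rule. The sketch below is a hypothetical Python illustration, not the procedure the authors used (they classified networks by reviewing websites, documents, and interviews); the boolean field names and the order of the checks are assumptions. PPRNs are tested first because patient-patient interaction is the feature Table 1 explicitly rules out for patient registries.

```python
from dataclasses import dataclass


@dataclass
class Collaborative:
    """Hypothetical yes/no features distilled from the Table 1 characteristics."""
    multi_institution: bool            # multiple healthcare institutions and/or research organizations
    extracts_all_data: bool            # extraction not limited to one condition or disease
    patients_access_own_data: bool     # patient-provided and/or own genetic data
    patient_patient_interaction: bool  # patients can interact with each other
    condition_specific: bool           # specific condition or disease focus
    patient_level_data_to_researchers: bool


def classify(c: Collaborative) -> str:
    # PPRN: patients see their own data and interact with one another.
    if c.patients_access_own_data and c.patient_patient_interaction:
        return "PPRN"
    # CDRN: multi-institution network able to extract all data, not one condition.
    if c.multi_institution and c.extracts_all_data:
        return "CDRN"
    # Patient registry: condition-focused, patient-level data, no patient-patient interaction.
    if (c.condition_specific and c.patient_level_data_to_researchers
            and not c.patient_patient_interaction):
        return "Patient registry"
    return "Unclassified"
```

In practice several collaboratives in this report sit near the boundaries, so a rule like this would only be a first pass before manual review.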

We identified CDRNs, PPRNs, and patient registries by consulting experts, browsing the Internet and funded research projects on NIH RePORTER, searching for citations, and reading through PCORI’s RFIs. Initially, we focused on CDRNs that covered at least one million lives and PPRNs that covered at least 10,000 individuals with a particular condition (or 1,000 for rare diseases, which the NIH defines as affecting fewer than 200,000 individuals in the United States). However, our search turned up some networks that have existed for only a few years and therefore cover fewer than one million lives (e.g., Community Health Applied Research Network, or CHARN) or fewer than 10,000 individuals (e.g., Cancer Commons). Although these networks do not meet the covered-lives criterion, including them gives a richer picture of the types of existing networks, so we expanded our parameters to include them. The CDRNs, PPRNs, and patient registries included in this report are listed in Tables 2, 3, and 4, respectively. We also categorized the PPRNs into tiers:

• Tier 1 – meets minimum population criterion and characteristics from Table 1
• Tier 2 – new and does not yet meet minimum population criterion
• Tier 3 – meets minimum population criterion and collects data, but not necessarily for research
• Tier 4 – meets minimum population criterion but does not collect any data for research (e.g., message boards)
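The population thresholds and PPRN tiers described above can be expressed as a small decision procedure. This is a minimal Python sketch under stated assumptions: the record fields (`is_new`, `meets_table1`, `collects_data`) are hypothetical, the order of the checks is one reading of the tier definitions, and a small network that is not new falls outside the four tiers and returns `None`.

```python
from dataclasses import dataclass
from typing import Optional

# Initial population thresholds from the Methods section.
CDRN_MIN_LIVES = 1_000_000
PPRN_MIN_INDIVIDUALS = 10_000
PPRN_MIN_INDIVIDUALS_RARE = 1_000
RARE_DISEASE_US_LIMIT = 200_000  # NIH: fewer than 200,000 affected people in the US


@dataclass
class PPRNRecord:
    """Hypothetical fields; not part of the report's data model."""
    name: str
    individuals: int     # individuals with the condition in the network
    us_affected: int     # people affected by the condition in the US
    is_new: bool         # network has existed for only a few years
    meets_table1: bool   # has the PPRN characteristics listed in Table 1
    collects_data: bool  # collects any patient data (for research or not)


def cdrn_meets_population(covered_lives: int) -> bool:
    return covered_lives >= CDRN_MIN_LIVES


def pprn_meets_population(p: PPRNRecord) -> bool:
    rare = p.us_affected < RARE_DISEASE_US_LIMIT
    minimum = PPRN_MIN_INDIVIDUALS_RARE if rare else PPRN_MIN_INDIVIDUALS
    return p.individuals >= minimum


def assign_tier(p: PPRNRecord) -> Optional[int]:
    if not pprn_meets_population(p):
        # Tier 2 covers only new networks; older small networks fall outside the scheme.
        return 2 if p.is_new else None
    if p.meets_table1:
        return 1
    if p.collects_data:
        return 3  # collects data, but not necessarily for research
    return 4      # e.g., message boards with no research data collection
```

For example, a hypothetical network of 1,500 patients with a condition affecting 50,000 people in the US clears the rare-disease threshold of 1,000 and, if it has the Table 1 characteristics, lands in Tier 1.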

We created a preliminary taxonomy structure based on an initial assessment of how the criteria could be grouped by subject. This structure evolved as we conducted our research so that the taxonomy would better represent the distinguishing features of the networks and registries. We collected data from each CDRN, PPRN, and patient registry’s website, from articles obtained through that website, and, when available, from the RFI sent by PCORI. We also conducted 48 phone interviews (see Tables 2, 3, and 4). Note that figures and tables in the inventory pages were taken from documents generated by the respective network or registry.

Table 2. CDRNs included in this inventory. Underlined name indicates that information from PCORI’s RFI was incorporated in the inventory. “E-mail” in Interview Date column indicates that questions were answered through e-mail and a phone interview was declined.

# | CDRN Name | Website | Interview Date
1 | Association of Asian Pacific Community Health Organizations (AAPCHO) | http://www.aapcho.org/ | 2/7/13
2 | Breast Cancer Surveillance Consortium (BCSC) | http://breastscreening.cancer.gov/ | 2/4/13
3 | Cancer Research Network (CRN) | http://crn.cancer.gov | 2/1/13
4 | Connecticut Center for Primary Care (CCPC) | http://www.centerforprimarycare.org/ | 1/30/13
5 | CER2 | Not available | 2/15/13
6 | CERTAIN | http://www.becertain.org | 2/8/13
7 | Community Health Applied Research Network (CHARN) | http://www.kpchr.org/CHARN | 1/29/13
8 | Children’s Hospital of Philadelphia Research Consortium (CHOP-PeRC) | http://www.research.chop.edu | 2/12/13
9 | Distributed Ambulatory Research in Therapeutics Network (DARTNet) | http://www.dartnet.info/ | 1/31/13
10 | Electronic Medical Records and Genomics (eMERGE) Network | http://emerge.mc.vanderbilt.edu/emerge-network | 1/30/13
11 | HMO Research Network (HMORN) | http://www.hmoresearchnetwork.org/ | 2/5/13
12 | HOspital Medicine Reengineering Network (HOMERuN) | Not available | 2/11/13
13 | Mini-Sentinel | http://www.mini-sentinel.org/ | 1/29/13
14 | The National Dental Practice-Based Research Network | http://nationaldentalpbrn.org/ | E-mail*
15 | Pediatric Emergency Care Applied Research Network (PECARN) | http://www.pecarn.org | 1/29/13
16 | Pediatric Health Information System (PHIS+) | Not available | 2/1/13
17 | SCAlable National Network for Effectiveness Research (SCANNER) | http://scanner.ucsd.edu | 1/25/13
18 | Society for Vascular Surgery Vascular Quality Initiative (SVS VQI) | http://www.vascularqualityinitiative.org | 1/28/13
19 | UC-Research eXchange (UCReX) | http://www.ucrex.org | 2/8/13
20 | Wisconsin Network for Health Research (WiNHR) | https://ictr.wisc.edu/WiNHR | 2/11/13

*Information forwarded did not answer the criteria. No response to follow-up requests for additional information.

Table 3. PPRNs included in this inventory. “E-mail” in Interview Date column indicates that questions were answered through e-mail and phone interviews were declined.

# | PPRN Name | Tier | Website | Interview Date
1 | 23andMe | Tier 1 | http://www.23andme.com | 1/29/13
2 | Association of Cancer Online Resources (ACOR) | Tier 4 | http://www.acor.org/ | No response
3 | Dr. Susan Love Research Foundation’s Love/Avon Army of Women | Tier 1 | http://www.armyofwomen.org/ | 2/19/13
4 | Asthmapolis | Tier 2 | http://asthmapolis.com/ | 2/15/13
5 | BRIDGE | Tier 2 | http://sagebridge.org | Declined
6 | Cancer Commons | Tier 2 | http://www.cancercommons.org | 2/18/13
7 | Crohnology | Tier 2 | http://crohnology.com/ | 2/5/13
8 | Collaborative Chronic Care Network (C3N) | Tier 1 | http://c3nproject.org/ | 2/4/13
9 | DIYgenomics | Tier 1 (population unknown) | http://www.diygenomics.org/ | 2/15/13
10 | Genomera | Tier 1 | http://genomera.com/ | 2/11/13
11 | Glu | Tier 1 | http://www.myglu.org | E-mail
12 | Inspire | Tier 1 | http://www.inspire.com | No response
13 | Insulindependence | Tier 3 | http://www.insulindependence.org | 2/6/13
14 | International Waldenstrom’s Macroglobulinemia Foundation | Tier 1 | http://www.imwf.com | 2/12/13
15 | MDJunction | Tier 4 | http://www.mdjunction.com/ | No response
16 | MedHelp | Tier 3 | http://www.medhelp.org/ | 2/1/13
17 | PatientsLikeMe | Tier 1 | http://www.patientslikeme.com | 1/15/13
18 | Personal Genome Project | Tier 2 | http://www.personalgenomes.org/ | No response
19 | Quantified Self | Tier 4 | http://quantifiedself.com/ | 2/8/13
20 | TuDiabetes | Tier 1 | http://www.tudiabetes.org/ | 2/15/13

Table 4. Patient registries included in this inventory. Underlined name indicates that information from PCORI’s RFI was incorporated in the inventory. “E-mail” in Interview Date column indicates that questions were answered through e-mail and phone interviews were declined.

# | Patient Registry Name | Website | Interview Date
1 | Autism Genetic Resource Exchange (AGRE) | http://agre.autismspeaks.org | 2/14/13
2 | Autism Treatment Network | http://www.autismspeaks.org/science/resources-programs/autism-treatment-network | E-mail
3 | Be the Match Bone Marrow Donor Registry | http://marrow.org | 2/19/13
4 | Breast Cancer Family Registry (BCFR) | http://epi.grants.cancer.gov/CFR/about_breast.html | 2/13/13
5 | BreastCancerTrials.org (BCT) | https://www.breastcancertrials.org | 2/11/13
6 | California Cancer Registry (CCR) | http://www.ccrcal.org | 2/13/13
7 | California Immunization Registry (CAIR) | http://cairweb.org/ | E-mail
8 | California Joint Replacement Registry (CJRR) | http://www.caljrr.org | 2/6/13
9 | The Colon Cancer Family Registry (CCFR) | http://epi.grants.cancer.gov/CFR/about_colon.html | 2/13/13
10 | Cystic Fibrosis Patient Registry | http://www.cff.org/LivingWithCF/QualityImprovement/PatientRegistryReport/ | 1/30/13
11 | Kaiser Permanente Total Joint Replacement Registry | Not available | No response
12 | Life Raft Group | http://liferaftgroup.org/ | 1/31/13
13 | Multi-Institutional Consortium for Comparative Effectiveness Research in Prevention and Treatment of Diabetes Mellitus (SUPREME-DM) | http://www.supreme-dm.org | No response
14 | MURDOCK | https://www.murdock-study.com/ | 2/13/13
15 | New York State Congenital Malformations Registry | http://www.health.ny.gov/diseases/congenital_malformations/cmrhome.htm | No response
16 | Physician-Hospital Organization (PHO) | Not available | 2/11/13
17 | Reg4ALL | https://www.reg4all.org/ | 2/15/13
18 | ResearchMatch | http://www.researchmatch.org | 2/4/13
19 | United Network for Organ Sharing (UNOS) | http://www.unos.org | 2/21/13
20 | Utah Population Database | http://www.huntsmancancer.org/research/shared-resources/utah-population-database/overview | 2/15/13

Final products

Taxonomy

The final taxonomy (excerpts shown in Figure 1, panels A–E) organizes the 22 criteria into three top-level classes: Network Characteristics, Evidence of Clinical Studies Capacity, and Data Processing (Figure 1A). Each class is broken down into subclasses; for example, Clinical Focus is a subclass of the top-level class Network Characteristics (Figure 1B). Where possible, we added further subclasses to capture more specific details from the answers we gathered. The entire taxonomy structure is provided in the Appendix. Each subclass is annotated with the number and text of its criterion (Figure 1C). We considered these annotations to represent the instances of the ontology classes, where the instances are the answers to the criteria gathered during our data collection process (Figure 1D). We also included a Resource Description Framework (RDF) file, which defines the classes and subclasses in the taxonomy, their annotations, and their hierarchy (Figure 1E).

Figure 1. Taxonomy. A–C: OWL file viewed with Protégé¹. D: Answers gathered for each criterion. E: Resource Description Framework².

[Panels A–C: views of the taxonomy in Protégé.]

Panel D (example):
Criteria | Answer
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Medically underserved populations of Asian Americans, Native Hawaiians, and other Pacific Islanders

[Panel E: RDF excerpt annotating criteria 1.e.i and 1.e.i.1.]
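To make the class hierarchy and criterion annotations concrete, here is a minimal Python sketch that writes an excerpt of the taxonomy as RDF in N-Triples form using only the standard library. The namespace `http://example.org/pcori-taxonomy#`, the CamelCase class names, and the use of `rdfs:comment` as the annotation property are assumptions for illustration; the report does not specify the IRIs or annotation properties used in its OWL/RDF file.

```python
# A minimal sketch: emit an excerpt of the taxonomy as RDF N-Triples.
BASE = "http://example.org/pcori-taxonomy#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"  # assumed annotation property

# (class name, parent class or None, criterion annotations)
TAXONOMY = [
    ("NetworkCharacteristics", None, []),
    ("EvidenceOfClinicalStudiesCapacity", None, []),
    ("DataProcessing", None, []),
    ("ClinicalFocus", "NetworkCharacteristics", [
        "1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)?",
        "1.e.i.1. What does the network focus on?",
    ]),
]


def iri(name: str) -> str:
    return f"<{BASE}{name}>"


def literal(text: str) -> str:
    # N-Triples string literals must escape backslashes and double quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(taxonomy) -> str:
    lines = []
    for name, parent, notes in taxonomy:
        lines.append(f"{iri(name)} <{RDF_TYPE}> <{OWL_CLASS}> .")
        if parent is not None:
            lines.append(f"{iri(name)} <{SUBCLASS_OF}> {iri(parent)} .")
        for note in notes:
            lines.append(f"{iri(name)} <{COMMENT}> {literal(note)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(TAXONOMY))
```

Each class becomes an `owl:Class`, the hierarchy is carried by `rdfs:subClassOf`, and the criterion text hangs off the class as an annotation; the answers gathered for each network (panel D) would be added as separate instance triples.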

Comprehensive Inventories

A comprehensive inventory of each CDRN, PPRN, and patient registry is included in the Appendix. Each inventory uses the same format, with criteria in the left column and answers in the right column.

1 Protégé is an open-source framework for modeling taxonomies; it is available for download at http://protege.stanford.edu/.
2 The Resource Description Framework file can be viewed with a text editor.

Key Findings

To assess the level of patient engagement in the networks and registries, we examined our data at three levels: governance, study, and data. Although our evaluation does not cover the full spectrum of patient engagement, we used the following criteria to determine whether patients are directly involved at each level.

• Governance
  o 1.g.i. Are patients involved in the decision-making process on the use of data they provided to the network?
• Study
  o 1.f. Does the network use informed consent forms?
    ▪ 1.f.i. Do patients consent to the broad³ … or specific use of their electronic data?
    ▪ 1.f.ii. Do patients consent to the broad … or specific use of their biological specimens?
    ▪ 1.f.iii. Can patients be re-contacted for consent for a new study?
• Data
  o 1.g.ii.1. What are the sources of self-reported data?
  o 1.g.ii.2. What are the sources of health care-derived data?

We found that patient involvement in the decision-making process on the use of their data is high in PPRNs (17 out of 20) and relatively low in CDRNs (5 out of 20) and patient registries (6 out of 20) (Figure 2). Examples of patient involvement in decision-making include serving as members of an advisory board, controlling how much data are shared through privacy settings, and owning their data and deciding how much of it to contribute. We also analyzed whether informed consent forms are used (Figure 3), whether consent covers broad or specific use of the patient’s data (Figure 4), and whether a patient can be re-contacted for a new study (Figure 5). As Figure 3 shows, all three types of collaboratives tend to engage patients in studies through informed consent forms. Consent within CDRNs tends to cover specific use of data, whereas consent within PPRNs tends to cover broad use. The data used are almost exclusively health care-derived in CDRNs and mostly self-reported in PPRNs (Figure 6).

3 We define broad to mean that data may be analyzed for other research.
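The proportions behind the patient-involvement comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the “Yes” counts quoted in the text (5, 17, and 6); as in the text, the denominator is all 20 collaboratives of each type, including any whose answer was not available.

```python
# "Yes" counts for criterion 1.g.i (patients involved in decision-making), as quoted in the text.
YES_COUNTS = {"CDRNs": 5, "PPRNs": 17, "Patient registries": 6}
TOTAL_PER_TYPE = 20  # each inventory covers 20 collaboratives


def percent(count: int, total: int = TOTAL_PER_TYPE) -> int:
    """Share of collaboratives answering 'Yes', rounded to a whole percent."""
    return round(100 * count / total)


if __name__ == "__main__":
    for kind, yes in YES_COUNTS.items():
        print(f"{kind}: {yes}/{TOTAL_PER_TYPE} = {percent(yes)}%")
```

This gives 25% for CDRNs, 85% for PPRNs, and 30% for patient registries.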

Figure 2. Counts of patient involvement in the decision-making process on the use of his or her data. [Pie charts, “Patients involved in decision-making,” for CDRNs, PPRNs, and registries; categories: Yes, No, Not available.]

Informed Consent

Figure 3. Counts of each collaborative type using informed consent. [Pie charts for CDRNs, PPRNs, and registries; categories: Yes, No, Not available.]

Figure 4. Counts of patient consent to the broad or specific use of his or her electronic data or biological specimens. Counts of “not available” and “not applicable” are not depicted. [Bar chart, “Patients consent for use of data,” with two groups: Use of Electronic Data (Broad, Specific, Both) and Use of Biological Data (Broad, Specific); bars for CDRNs, PPRNs, and registries.]

Figure 5. Counts of whether patients can be re-contacted for a new study. [Pie charts, “Patients can be re-contacted,” for CDRNs, PPRNs, and registries; categories: Yes, No, Not available.]

Types of Data Used

Figure 6. Counts of each collaborative type using health care-derived and/or self-reported data. [Bar chart; categories: Health Care-Derived, Self-Reported, Both; bars for CDRNs, PPRNs, and registries.]

Taken together, these results suggest that, beyond enrolling patients as research subjects, meaningful patient engagement throughout the research process is largely missing. This gap may reflect differing perspectives on data sharing: for example, patients with rare diseases want scientists to share data so that they need not participate in so many studies, while scientists are reluctant to share raw, unanalyzed data or to analyze data collected by another group (see “Families Push for New Ways to Research Rare Diseases” from the Wall Street Journal).

Issues and Gaps

We found that demographic information is not represented consistently across the networks and registries. In addition, some network and registry information was confidential, not provided, or difficult to interpret from interviews (e.g., budget, and which elements would be considered metadata). Because of time constraints, interviewees were not given the opportunity to review the inventories. The information in this report represents our interpretation of the answers provided by interviewees and therefore may contain errors. We removed from our list a few registries whose websites did not contain enough information to answer the criteria and whose listed contacts did not respond to our requests for an interview, e.g., Alzheimer Disease Patient Registry (http://www.washington.edu/research/centers/146), BioSense (http://www.cdc.gov/biosense/), and National Spina Bifida Patient Registry (http://www.cdc.gov/ncbddd/spinabifida/NSBPRregistry.html).

Contents

Executive Summary
I. Taxonomy
II. Original 22 Criteria with Reworded Criteria
III. Inventories of CDRNs
IV. Inventories of PPRNs
V. Inventories of Patient Registries

I. Taxonomy

1. Network Characteristics
   a. Patient Population
      i. Number of Lives Covered (1.a)
      ii. Demographics
         1. Racial/Ethnic (1.b.i.1)
         2. Geography (1.b.i.2)
         3. Age (1.b.i.3)
         4. Gender (1.b.i.4)
   b. Clinical Focus (1.e.i, 1.e.i.1)
   c. Finances
      i. Total annual budget (1.c.i)
      ii. Total annual cost of network infrastructure and maintenance (1.c.i.1, 1.c.iii)
      iii. Total annual cost of conducting studies (1.c.i.2)
      iv. Sources of funding (1.c.ii)
   d. Years in existence (1.d)
   e. Clinical data
      i. Electronic Data
         1. Source
            i. Self-reported data (1.g.ii.1)
            ii. Health care-Derived data (1.g.ii.2)
            iii. Data collected in clinical trials (1.g.ii.3)
         2. Type (4.f)
      ii. Biospecimen
         1. Source
            i. Biobank (3.a)
            ii. Collected by the network for research (3.d)
         2. Type (3.b)
   f. Policies
      i. Patient-related policies
         1. Type of Consent (1.f)
            a. No consent required (1.f)
            b. Broad use of electronic data (1.f.i)
            c. Broad use of biosamples (1.f.ii)
         2. Governance involvement mechanisms (1.g.i, 1.g.i.1)
         3. Re-contact for new study needed (1.f.iii.1)
      ii. Data sharing
         1. Requirements for institutional investigators to collaborate with each other (1.g.iii.1.a)
         2. Requirements for sharing outside the network (1.g.iii.1.b)
         3. Policies for protecting proprietary data (1.g.iii.1.c)
   g. Healthcare organizations engagement (2.c.i)
      i. Mechanisms of participation (2.c.ii)
   h. Methods for Data Security (4.a)
2. Evidence of Clinical Studies Capacity
   a. Publications
      i. Evidence of clinical care or quality improvement (1.a.iii.1)
      ii. Studies published in peer-reviewed journals (2.a)
      iii. Evidence of longitudinal follow-up studies (2.b.i)
      iv. Evidence of randomized controlled trials (2.d.i.1)
   b. Study type
      i. New studies in the same or different condition from the clinical focus (1.a.iii)
      ii. Longitudinal follow-up (2.b)
         1. From existing reports by passively reviewing the data (2.b.ii)
            a. Using mechanisms to standardize data elements (2.b.ii.1)
      iii. Randomized controlled trials using network data (2.d.i)
      iv. Analysis of biospecimens
         1. Analysis of biospecimens from biobanks (3.c)
         2. Analysis of biospecimens collected by the network (3.d.i)
         3. Results linkable to patient outcomes (3.d.ii)
      v. Clinical care delivery (1.a.iii)
      vi. Quality improvement (1.a.iii)
3. Data processing
   a. Harmonization
      i. Query distribution via central hub (4.b.i)
         1. Architecture (4.b.ii)
      ii. Standardized terminologies adopted (4.c.i, 4.c.ii)
      iii. Common data model used (4.d.i, 4.d.ii)
         1. Data mapping and transformation mechanism (4.d.iii)
      iv. Metadata collected (4.e.i)
         1. Description (4.e.i.1)
   b. Extraction
      i. Natural language processing (4.g.i)
         1. Approaches (4.g.ii)
   c. Aggregation
      i. Before it leaves the local site (4.h.i)
      ii. Transformation method (4.h.ii)
   d. Statistical analysis
      i. Applications (4.i)
   e. Integration for longitudinal analysis (4.j.i)
      i. Tools used (4.j.ii)

II. Original Criteria with Reworded Criteria

Criteria Listed in RFP | Reworded Criteria
1. Number of covered lives | 1.a. How many people does the network cover or involve?
11. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | 1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures
20. Reusability (is the network available for new studies in the same or a different condition, or is it restricted to a single study?) | 1.a.ii.1. Can the network be used for new studies in the same or a different condition?
21. Ability of the network to perform quality improvement and assist in clinical care delivery | 1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement?
21. Ability of the network to perform quality improvement and assist in clinical care delivery | 1.a.iii.1. What is the evidence?
2. Demographics: describe the covered population in terms of racial/ethnic groups | 1.b.i.1. Demographics: racial/ethnic
2. Demographics: describe the covered population in terms of geography | 1.b.i.2. Demographics: geography
2. Demographics: describe the covered population in terms of age | 1.b.i.3. Demographics: age
2. Demographics: describe the covered population in terms of gender | 1.b.i.4. Demographics: gender
17. Total annual budget | 1.c.i. What is the total annual budget?
17. proportions dedicated to maintenance and infrastructure | 1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance?
17. proportions dedicated to conduct of studies | 1.c.i.2. How much of that budget is dedicated to conducting studies?
17. current source(s) of funding | 1.c.ii. What are the current sources of funding?
16. Annual cost of maintaining and updating network | 1.c.iii. How much does it cost each year to maintain and update the network?
18. Years in existence | 1.d. How many years has this network existed?
3. Specify the clinical characteristics, such as disease, condition, or treatment focus, if any | 1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)?
3. Specify the clinical characteristics, such as disease, condition, or treatment focus, if any | 1.e.i.1. What does the network focus on?
combination of 4. and 5. | 1.f. (Y/N) Does the network use informed consent forms?
4. Whether patient consent for broad use of electronic data is present and currently in effect | 1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data?
4. Whether patient consent for broad use of biological specimens is present and currently in effect | 1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens?
5. Whether patient consent for re-contact is present and currently in effect | 1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study?
6. Are patients involved in governance of the uses of network data? | 1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network?
6. If so, how? | 1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process?
7. Sources of electronic data: claims; registry data; electronic health record (EHR) data (which EHR vendor?); and the capacity to link with pharmacy and diagnostic databases, especially imaging- and lab-based | 1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)
7. Sources of electronic data: claims; registry data; electronic health record (EHR) data (which EHR vendor?); and the capacity to link with pharmacy and diagnostic databases, especially imaging- and lab-based | 1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)
7. Sources of electronic data: claims; registry data; electronic health record (EHR) data (which EHR vendor?); and the capacity to link with pharmacy and diagnostic databases, especially imaging- and lab-based | 1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)
8. Data sharing policy, including existence of requirements for collaboration with institutional investigators | 1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data
8. Data sharing policy, including existence of requirements for collaboration with institutional investigators | 1.g.iii.1.b. Policies for sharing data outside the network
8. Data sharing policy, including policies in place to protect proprietary data | 1.g.iii.1.c. Policies for protecting proprietary data
19. Exemplar studies (at least three with publications in peer-reviewed literature, if available) | 2.a. Three most recent (or high impact) studies published in peer-reviewed journals
9. Evidence of capacity to conduct, and experience in conducting, longitudinal follow-up for clinical outcomes | 2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up?
9. evidence of the capacity to analyze data from longitudinal follow-up | 2.b.i. What is the evidence?
10. Are there passive means of determining follow-up and ongoing observation? | 2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it?
10. If so, describe any standardization of data elements | 2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?)
12. Extent to which the network benefits from the support of, or active involvement from, a healthcare delivery system | 2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network?
12. Extent to which the network benefits from the support of, or active involvement from, a healthcare delivery system | 2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.)
13. Past performance conducting randomized controlled trials (cluster, individual) using the database | 2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network?
13. Past performance conducting randomized controlled trials (cluster, individual) using the database | 2.d.i.1. What is the evidence?
14. Present availability of biospecimens/biobank | 3.a. (Y/N) Does the network have biobanks?
14. detail on type of biospecimens (such as DNA, RNA, protein, and other biomarkers) collected | 3.b. What types of biospecimens are collected?
14. for what types of analysis | 3.c. What types of analysis are done on them?
15. Prior experience in collecting biospecimens for research purposes | 3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes?
15. Prior experience in analyzing biospecimens for research purposes | 3.d.i. What types of analyses do they conduct on them?
15. capacity to link biospecimens [sic] to patient outcomes | 3.d.ii. Were they able to link the analysis/research results back to patient outcomes?
22.a. Does the network manage security | 4.a. What type of security technology does the network use?
22.a. Does the network manage query distribution via a central hub? | 4.b.i. (Y/N) Are queries distributed via a central hub?
22. a. Please describe in brief. | 4.b.ii. What is the architecture of the query distribution?
22. b. Does the network use standardized terminologies (ie, ICD-9, SNOMED, etc.)? | 4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)?
22. b. If so, please provide information on which terminologies are used. | 4.c.ii. Which terminologies?
22. c. Does the network use a common data model (CDM)? | 4.d.i. (Y/N) Does the network use a common data model (CDM)?
22. c. If so, please provide information on which CDM is used | 4.d.ii. Which CDM is used?
22. c. If so, please provide information on how the data is transformed and mapped to the model. | 4.d.iii. How are the data transformed and mapped?
22. d. Is metadata routinely collected? | 4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)?
22. d. If so, please list key metadata elements collected. | 4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?)
22. e. Please list the types of data that are being collected or access and incorporated into the network (eg, EHR data, claims, patient-reported outcomes, etc). | 4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.).
22. f. Are you conducting natural language processing? | 4.g.i. (Y/N) Does the network use natural language processing?
22. f. If so, which application or approach are you using? | 4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used?
22. g. Is data aggregated before it leaves the local site and shared with the network? | 4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network?
22. g. Please describe in brief how the data is transformed and when it leaves control of the local site. | 4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)?
22. h. Does the network provide data analysis tools for researchers? Please describe in brief. | 4.i. What data (statistical) analysis tools, if any, are available for researchers through the network?
22. i. Are IT or informatics tools used to integrate administrative, billing, and/or clinical records data into patient-level longitudinal data? | 4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?)
22. i. If so, which informatics tools? | 4.j.ii. What informatics tools are used?

III. Inventories of CDRNs

Association of Asian Pacific Community Health Organizations (AAPCHO)

1.a. How many people does the network cover or involve?
450,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures
EMRs are currently being installed in all clinics, the network was recently awarded an NIH CBPR grant, new clinical health sites are being added to the network, and the network partners with CHARN, N^2, and NACHC to increase capacity for research.
1.a.ii.1. Can the network be used for new studies in the same or a different condition?
Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement?
Yes
1.a.iii.1. What is the evidence?
This network helps doctors make decisions about clinical care by incorporating enabling services data and social determinants of health data. It supports culturally efficient and effective care that advances health and reduces disparities, and it integrates essential enabling services (e.g., interpretation, eligibility assistance) that facilitate access to care.
1.b.i.1. Demographics: racial/ethnic
High concentrations of medically underserved Asian Americans, Native Hawaiians, and other Pacific Islanders (66%)
1.b.i.2. Demographics: geography
California, Hawaii, Washington, New York, Massachusetts, Minnesota, Illinois, Florida, and the Republic of the Marshall Islands
1.b.i.3. Demographics: age
0-2: 5.8%; <15: 23.2%; 15-64: 67.5%; >65: 9.3%
1.b.i.4. Demographics: gender
Male: 32%; Female: 58%
1.c.i. What is the total annual budget?
$666,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance?
$250,000-350,000
1.c.i.2. How much of that budget is dedicated to conducting studies?
$250,000
1.c.ii. What are the current sources of funding?
Bureau of Primary Health Care, ARC funded project (N^2), Health Resources and Services Administration (HRSA), The California Endowment, Centers for Disease Control and Prevention, Gilead Sciences, National Institutes of Health (NIH), New York University Center for the Study of Asian American Health, Office of Minority Health

1.c.iii. How much does it cost each year to maintain and update the network?
Included in the amount of the annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed?
25
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)?
Yes
1.e.i.1. What does the network focus on?
Medically underserved populations of Asian Americans, Native Hawaiians, and other Pacific Islanders
1.f. (Y/N) Does the network use informed consent forms?
No - IRB approval and waivers of authorization are required for research studies
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data?
Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens?
Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study?
No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network?
Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process?
There is a Community IRB, which includes members of the patient population. Community stakeholders also collaborated with the network to develop the Criteria for Community Engagement in Research, which include principles of community involvement, alignment with community mission, equity, and community accountability.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)
Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)
EHR (NextGen, Centricity)

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)
Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data
Researchers must file IRB application forms, a data request form, memoranda of understanding, a business associate agreement, and a data use agreement.
1.g.iii.1.b. Policies for sharing data outside the network
The network does not currently share data outside the network. If data were shared, the same IRB application process used for sharing within the network would be required.
1.g.iii.1.c. Policies for protecting proprietary data
Each health center can see only its own patient-level data. All other visible data are aggregated.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals
1) Chang Weir, R., & Law, H. Enabling Services Health Information Exchange at Hawaii Community Health Centers: Evaluation Report. Association of Asian Pacific Community Health Organizations, February 2012.

2) Chang Weir, R., Law, H., Valle-Perez, M., & Ayson, A. The Pacific Innovation Collaborative Health Information Technology: A report highlighting the development of the PIC data repository and report manager. Association of Asian Pacific Community Health Organizations, October 2011.

3) Chang Weir, R., Law, H., Oneha, M., Lee, S., & Chien, A. (Under Review). Impact of a Pay for Performance Program to Improve Emergency Department Utilization at Community Health Centers Serving Asian American, Native Hawaiian, and Other Pacific Islander Communities. Submitted February 2013 to Journal of Health Care for the Poor and Underserved.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up?
Yes
2.b.i. What is the evidence?
Chang Weir, R., Law, H., Oneha, M., Lee, S., & Chien, A. (Under Review). Impact of a Pay for Performance Program to Improve Emergency Department Utilization at Community Health Centers Serving Asian American, Native Hawaiian, and Other Pacific Islander Communities. Submitted February 2013 to Journal of Health Care for the Poor and Underserved.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it?
Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?)
Whenever possible, the network codes lists in the same manner in which they are reported to UDS (Uniform Data System, the Health Resources and Services Administration reporting system).
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network?
Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.)
Clinics give access to patient EHRs and to data on other patient enabling services.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network?
No
2.d.i.1. What is the evidence?
Not applicable
3.a. (Y/N) Does the network have biobanks?
No
3.b. What types of biospecimens are collected?
Not applicable
3.c. What types of analysis are done on them?
Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes?
No
3.d.i. What types of analyses do they conduct on them?
Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes?
Not applicable
4.a. What type of security technology does the network use?
Not available
4.b.i. (Y/N) Are queries distributed via a central hub?
Yes
4.b.ii. What is the architecture of the query distribution?
When a health center statistician logs onto the network, they can see the data and ask for customized reports to be sent to them. External collaborators would submit a query to the website and, if approved, would get the data returned in a standard SQL format.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)?
Yes

4.c.ii. Which terminologies?
ICD-9, SNOMED
4.d.i. (Y/N) Does the network use a common data model (CDM)?
Yes

4.d.ii. Which CDM is used?
Not available
4.d.iii. How are the data transformed and mapped?
Not available
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)?
Yes

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data dictionary?)
The network uses a home-grown data dictionary that includes social determinants of health. There is also a change log and a limited level of versioning.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes).
EHR, lab, pharmacy, ER/urgent care, specialty/referral, and data on non-clinical support services, including case management assessment, case management treatment or planning, referrals, interpretation, transportation, eligibility assistance, health education, and outreach services

4.g.i. (Y/N) Does the network use natural language processing?
No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used?
Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network?
Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)?
The health plan and health center data are aggregated together at the regional level and then forwarded to the central hub.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network?
Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?)
No

4.j.ii. What informatics tools are used?
Not applicable
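The access rule stated under 1.g.iii.1.c is that each health center sees only its own patient-level data, and all other visible data are aggregated. The minimal Python sketch below illustrates that rule. The record fields, the center and condition labels, and the choice of per-center condition counts as the aggregate are hypothetical assumptions for illustration only. This report does not describe how AAPCHO actually implements the rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    health_center: str  # hypothetical site identifier
    patient_id: str     # hypothetical patient-level key
    condition: str      # hypothetical coded condition


def visible_data(records, requesting_center):
    """Return (own patient-level rows, aggregated counts for other centers).

    Hypothetical rule modeled on the policy above: the requesting center
    sees its own rows in full; every other center appears only as counts
    per (center, condition) pair.
    """
    own_rows = [r for r in records if r.health_center == requesting_center]
    other_counts = Counter(
        (r.health_center, r.condition)
        for r in records
        if r.health_center != requesting_center
    )
    return own_rows, dict(other_counts)


if __name__ == "__main__":
    demo = [
        Encounter("HC-A", "p1", "diabetes"),
        Encounter("HC-A", "p2", "asthma"),
        Encounter("HC-B", "p3", "diabetes"),
        Encounter("HC-B", "p4", "diabetes"),
    ]
    own, others = visible_data(demo, "HC-A")
    print(own)
    print(others)
```

In this example, a statistician at HC-A would see HC-A's two patient-level rows. For HC-B, they would see only a count of two diabetes encounters.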

Breast Cancer Surveillance Consortium (BCSC)

1.a. How many people does the network cover or involve?
2,300,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures
A current project evaluates performance characteristics of standard and advanced breast imaging technologies based on breast cancer risk and specific subgroups (e.g., age, race/ethnicity, breast density), as these technologies disseminate into community practices. The BCSC will use existing and new data collected from the 6 current BCSC breast imaging registries. Collaborations using the data will be possible in the future.
1.a.ii.1. Can the network be used for new studies in the same or a different condition?
Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement?
Yes
1.a.iii.1. What is the evidence?
The BCSC has worked collaboratively with the American College of Radiology (ACR) to develop common data forms that collect patient and radiology information. This collaboration has improved the quality of the mammography data collected and of the data within the BCSC. BCSC sites provide reports to participating facilities that include the volume of mammograms read, true positives, false positives, and other data. Radiologists use this information for quality improvement and in their Mammography Quality Standards Act (MQSA) compliance activities.
1.b.i.1. Demographics: racial/ethnic
Total population: White (Non-Hispanic): 70%; Hispanic: 7.3%; Black (Non-Hispanic): 5.6%; Asian/Pacific Islander: 6%; American Indian/Alaskan Native: 0.9%; Mixed/Other/Unknown: 10.2%
1.b.i.2. Demographics: geography
Sites: Carolina Mammography Registry (Chapel Hill, NC), Vermont Breast Cancer Surveillance System (Burlington, VT), Group Health (Seattle, WA), San Francisco Mammography Registry (San Francisco, CA), New Hampshire Mammography Network (Lebanon, NH), New Mexico Mammography Project (Albuquerque, NM), Colorado Mammography Project (Golden, CO)
1.b.i.3. Demographics: age
Ages 18 and up, but the vast majority of patients are over age 40
1.b.i.4. Demographics: gender
Female: 100%
1.c.i. What is the total annual budget?
$1,500,000, dedicated to the statistical coordinating center
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance?
Not available
1.c.i.2. How much of that budget is dedicated to conducting studies?
Not available
1.c.ii. What are the current sources of funding?
National Cancer Institute contract
1.c.iii. How much does it cost each year to maintain and update the network?
Not available
1.d. How many years has this network existed?
15
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)?
Yes
1.e.i.1. What does the network focus on?
Mammography performance, performance of new breast imaging technologies (e.g., breast MRI), effectiveness of breast imaging by patient and provider factors, and biological measures of risk
1.f. (Y/N) Does the network use informed consent forms?
Yes - varies by registry. Some registry sites obtain informed consent, while other sites obtain a waiver of informed consent at the time of the patient's mammogram.
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data?
At some registry sites, patients consent to broad use of their data
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens?
At some registry sites, patients consent to broad use of their data
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study?
Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network?
No

1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)
Self-reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)
Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)
Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data
A potential investigator submits a concept proposal form specifying the scientific idea. The steering committee reviews it and, if it approves, the researcher works with an analyst at the statistical coordinating center to write a full proposal. The steering committee reviews the full proposal and approves it if it is feasible.
1.g.iii.1.b. Policies for sharing data outside the network
A potential investigator submits a concept proposal form specifying the scientific idea. The steering committee reviews it and, if it approves, the researcher works with an analyst at the statistical coordinating center to write a full proposal. The steering committee reviews the full proposal and approves it if it is feasible.
1.g.iii.1.c. Policies for protecting proprietary data
None. No data that are considered sensitive are released.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals
1) Henderson LM, Hubbard RA, Onega TL, Zhu W, Buist DS, Fishman P, Tosteson AN. Assessing health care use and cost consequences of a new screening modality: the case of digital mammography. Med Care. 2012 Dec;50(12):1045-52.

2) Onega T, Smith M, Miglioretti DL, Carney PA, Geller BA, Kerlikowske K, Buist DS, Rosenberg RD, Smith RA, Sickles EA, Haneuse S, Anderson ML, Yankaskas B. Radiologist agreement for mammographic recall by case difficulty and finding type. J Am Coll Radiol. 2012 Nov;9(11):788-94.

3) James TA, Mace JL, Virnig BA, Geller BM. Preoperative needle biopsy improves the quality of breast cancer surgery. J Am Coll Surg. 2012 Oct;215(4):562-8.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up?
Yes
2.b.i. What is the evidence?
The Breast Cancer Surveillance Consortium (BCSC) has the nation's largest longitudinal collection of mammography data from breast cancer screening in community practice.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it?
Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?)
When a new question is added to the patient questionnaire, the statistical coordinating center makes sure that the question is asked the same way at each site so that the data can be combined across sites. When there are two different sources of data on the same construct, the coordinating center creates a new data element populated only by the new data and retains the old data elements populated by the old data. The statistical coordinating center then creates a computed variable that harmonizes the two.
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network?
Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.)
The data that the BCSC collects from women and radiologists/facilities are linked to cancer outcomes data from population-based cancer and pathology registries. This linkage occurs at each site. Three sites are linked to registries within NCI's Surveillance, Epidemiology, and End Results (SEER) Program: Group Health Cooperative, the New Mexico Mammography Project, and the San Francisco Mammography Registry. The Colorado Mammography Project is linked to its statewide pathology registry. The Carolina Mammography Registry, New Hampshire Mammography Network, and Vermont Breast Cancer Surveillance System collect benign and malignant breast pathology reports from laboratories in their defined regions and also link to their respective state cancer registries.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network?
Yes
2.d.i.1. What is the evidence?
In one study, radiologists were randomized to receive an intervention intended to improve their interpretive performance in reading mammograms.
3.a. (Y/N) Does the network have biobanks?
Yes
3.b. What types of biospecimens are collected?
Breast tissue biopsies

3.c. What types of analysis are done on them?
Type: total mastectomy, partial mastectomy, core biopsy, fine needle aspiration
Guidance: clinical palpation, ultrasonography, stereotaxis, needle localized, mammographic
Pathologic variables:
Histologic type: ductal, lobular, other special types; grade; estrogen and progesterone receptor status
Staging: tumor size, number of positive lymph nodes, distant metastasis (American Joint Committee on Cancer TNM stage), extent of disease (SEER)
Histopathology: atypical hyperplasia (ductal and/or lobular), ductal hyperplasia, fibroadenoma, phyllodes tumor, other benign, normal, inconclusive
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes?
No
3.d.i. What types of analyses do they conduct on them?
Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes?
Not applicable
4.a. What type of security technology does the network use?
All data are encrypted.
4.b.i. (Y/N) Are queries distributed via a central hub?
Yes
4.b.ii. What is the architecture of the query distribution?
A query is sent to the coordinating center, where an analyst runs the query for the researcher and sends the results back to him or her.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)?
No

4.c.ii. Which terminologies?
Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)?
Yes

4.d.ii. Which CDM is used?
Home-grown common data model
4.d.iii. How are the data transformed and mapped?
Local sites collect the data and code them according to the data dictionary. The data are encrypted and uploaded to the SSP. The coordinating center then receives the data, decrypts them, and processes the raw data files for data quality. Finally, data from all the sites are pooled to create the computed variable versions that are used in analysis.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)?
Yes

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data dictionary?)
Home-grown data dictionary
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes).
• Demographics, risk factors, clinical history
• Mammography examinations: indication, assessment, recommendation, breast density
• Facilities: services, technologies, characteristics
• Tumor registries and pathology labs: breast and ovarian cancer, tumor characteristics, benign breast disease, treatment
• Vital statistics: death date and cause of death
4.g.i. (Y/N) Does the network use natural language processing?
Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used?
Not available
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network?
No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)?
Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network?
SAS code, Stata scripts, R code

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?)
Yes

4.j.ii. What informatics tools are used?
Not applicable
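Row 2.b.ii.1 describes a harmonization step with three parts: keep the old data element, add a new element populated only by the new data, and derive a computed variable from the two. That step might look roughly like the Python sketch below. The family-history item, its categories, the mapping table, and the rule that the new item takes precedence when both are filled are all hypothetical choices made for illustration. None of them is taken from the BCSC data dictionary, which this report does not reproduce.

```python
from typing import Optional

# Hypothetical mapping from a newer, more detailed questionnaire item
# onto the coarser yes/no coding used by the older item.
NEW_TO_COMMON = {
    "first_degree_relative": "yes",
    "second_degree_relative_only": "yes",
    "none": "no",
}


def harmonized_family_history(old_value: Optional[str],
                              new_value: Optional[str]) -> Optional[str]:
    """Computed variable derived from two retained source elements.

    The old and new data elements are never modified; this function only
    derives a single harmonized value. Assumption: if both are populated,
    the new element takes precedence.
    """
    if new_value is not None:
        return NEW_TO_COMMON.get(new_value)  # None if the category is unmapped
    if old_value is not None:
        return old_value  # old item is already coded yes/no
    return None  # neither element populated


if __name__ == "__main__":
    for old, new in [("yes", None), (None, "none"), (None, None)]:
        print(old, new, harmonized_family_history(old, new))
```

Because the raw elements are left untouched and the harmonized value is derived on demand, the mapping can be revised later without altering the data collected at each site.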

Connecticut Center for Primary Care (CCPC)

1.a. How many people does the network cover or involve?
364,293
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures
CCPC collaborates with DARTNet
1.a.ii.1. Can the network be used for new studies in the same or a different condition?
Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement?
Yes
1.a.iii.1. What is the evidence?
The Children's Fund of Connecticut and the Child Health and Development Institute of Connecticut (CHDI) have approved a strategic implementation plan to explore the engagement of commercial insurers in CHDI's work underway with the Connecticut Department of Social Services and the HUSKY insurance program. This plan includes support for targeted strategies in primary care that, when implemented, will improve care and outcomes for children and, among other things, ensure their readiness for school, efficient use of health and other services, and overall improved health status. The targeted strategies for which CHDI is currently seeking support include:
• Universal developmental screening at 9, 18, and 24 to 30 months of age
• Reimbursement for care coordination services performed in the primary care setting
• Expanded capacity of pediatric primary care to address behavioral health issues

1.b.i.1. Demographics: racial/ethnic
Total population: White (Non-Hispanic): 92.23%; Hispanic: 5.78%; Black: 4.76%; Asian: 2.55%; American Indian/Alaska Native: 0.27%; Pacific Islander/Hawaii Island: 0.19%
1.b.i.2. Demographics: geography
Connecticut
1.b.i.3. Demographics: age
Total population: 0-19: 103,848; 20-24: 26,670; 25-29: 16,041; 30-34: 15,425; 35-39: 16,186; 40-44: 21,214; 45-49: 24,834; 50-54: 27,525; 55-59: 25,918; 60-64: 21,768; 65-69: 18,246
1.b.i.4. Demographics: gender
Males: 47.6%; Females: 53.4%
1.c.i. What is the total annual budget?
Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance?
Not available
1.c.i.2. How much of that budget is dedicated to conducting studies?
$400,000. The Primary Care Summit brings in $40,000-50,000; there is no core sustainable infrastructure.
1.c.ii. What are the current sources of funding?
U.S. Department of Education, U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality, Connecticut Department of Public Health, University of Connecticut, Donaghue Foundation, Commonwealth Fund, American Academy of Pediatrics, public contributors
1.c.iii. How much does it cost each year to maintain and update the network?
Included in the amount of the annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed?
11
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)?
Yes
1.e.i.1. What does the network focus on?
Primary care
1.f. (Y/N) Does the network use informed consent forms?
Yes - for studies that involve direct interactions with patients (e.g., a survey)
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data?
Broad consent, or the IRB grants waivers for specific studies because the data are de-identified

1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens?
Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study?
No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network?
Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process?
Three community members with non-medical backgrounds are on the Board of Directors.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)
Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)
EHR (Allscripts Enterprise)
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)
Collected in clinical trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data
Institutional agreements and business associate agreements; data are always de-identified
1.g.iii.1.b. Policies for sharing data outside the network
Multiple collaborative studies with other networks that require data use agreements. The database is not open to the public.
1.g.iii.1.c. Policies for protecting proprietary data
Data are de-identified by CCPC
2.a. Three most recent (or high impact) studies published in peer-reviewed journals
All articles written thus far were for conferences
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up?
Yes
2.b.i. What is the evidence?
CCPC is currently applying for grants to extend the database to look at patient outcomes over time.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it?
Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?)
Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network?
Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.)
By giving access to EHRs and by enrolling patients in research studies when they come to the healthcare organization to be seen by a physician
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network?
No
2.d.i.1. What is the evidence?
Not applicable
3.a. (Y/N) Does the network have biobanks?
No
3.b. What types of biospecimens are collected?
Not applicable
3.c. What types of analysis are done on them?
Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes?
No
3.d.i. What types of analyses do they conduct on them?
Not applicable

3.d.ii. Were they able to link the analysis/research results back to patient outcomes?
Not applicable
4.a. What type of security technology does the network use?
ProHealth maintains a dedicated secure computer center in its corporate office, a certified co-location facility for business continuity, a SAN for hourly data backup, and a VMware server environment for application recovery. A full fiber optic WAN connects each site to the central facility. All ProHealth clinical encounters are processed through a central administrative system that includes Microsoft Business Intelligence solutions for analysis of administrative and clinical data. This informatics capability runs on a dedicated integrated SQL data repository and a SharePoint communication platform.

4.b.i. (Y/N) Are queries distributed via a central hub?
Yes
4.b.ii. What is the architecture of the query distribution?
The researcher asks CCPC's Principal Investigator (PI) for information. The data from the 8 sites all go into a common clinical repository, and the PI strips identifiers from and de-identifies the information. The PI writes the SQL code and returns it to the researcher.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)?
Yes

4.c.ii. Which terminologies?
ICD-9, CPT-4
4.d.i. (Y/N) Does the network use a common data model (CDM)?
No

4.d.ii. Which CDM is used?
Not applicable
4.d.iii. How are the data transformed and mapped?
Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)?
No

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data dictionary?)
Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes).
EHR and all-payer claims data, surveys

4.g.i. (Y/N) Does the network use natural language processing?
No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used?
Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network?
Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)?
Proprietary third-party software called FollowMyHealth by Jardogs
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network?
SAS code

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?)
Yes

4.j.ii. What informatics tools are used?
Linked together by the practice management system
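Row 4.b.ii describes the CCPC query path: site data are pooled into a common clinical repository, de-identified, and queried with SQL that the PI writes. The sketch below approximates that path with Python's built-in sqlite3 module. The table layout, column names, and ICD-9 codes are hypothetical. Dropping only the name and MRN columns is also a simplification, since formal de-identification (for example, under the HIPAA Safe Harbor method) removes many more identifiers.

```python
import sqlite3


def build_repository(rows):
    """Pool rows from all sites into one in-memory table.

    Hypothetical stand-in for the common clinical repository; each row is
    (site_id, patient_name, mrn, birth_year, icd9).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE encounters ("
        "site_id INTEGER, patient_name TEXT, mrn TEXT, "
        "birth_year INTEGER, icd9 TEXT)"
    )
    conn.executemany("INSERT INTO encounters VALUES (?, ?, ?, ?, ?)", rows)
    # Simplified de-identification: the view exposes no name or MRN columns.
    conn.execute(
        "CREATE VIEW deidentified AS "
        "SELECT site_id, birth_year, icd9 FROM encounters"
    )
    return conn


def patients_with_code(conn, icd9_code):
    """Example query of the kind a PI might write for a researcher:
    encounter counts per site for one ICD-9 code, run only against
    the de-identified view."""
    return conn.execute(
        "SELECT site_id, COUNT(*) FROM deidentified "
        "WHERE icd9 = ? GROUP BY site_id ORDER BY site_id",
        (icd9_code,),
    ).fetchall()
```

Routing every researcher query through the view, never the base table, keeps the identifier-stripping step in one place. That mirrors the single point of de-identification described for CCPC.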

CERTAIN

1.a. How many people does the network cover or involve?
3,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures
Covers mostly surgical care and outcomes
1.a.ii.1. Can the network be used for new studies in the same or a different condition?
Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement?
Yes
1.a.iii.1. What is the evidence?
Adoption of Laparoscopy for Elective Colorectal Resection: A Report from the Surgical Care and Outcomes Assessment Program. J Am Coll Surg. 2012 Jun;214(6):909-18.
1.b.i.1. Demographics: racial/ethnic
White: 61.6%; Hispanic: 4.6%; Black/African American: 2.9%; American Indian/Alaska Native: 1.0%; Asian: 9.6%; Pacific Islander/Hawaiian: 13.7%; Other: 10.9%

1.b.i.2. Demographics: geography
Not available
1.b.i.3. Demographics: age
<18: 1.7%; 18-30: 15.4%
1.b.i.4. Demographics: gender
Female: 54.5%; Male: 45.5%
1.c.i. What is the total annual budget?
$2,300,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance?
$500,000-800,000/yr
1.c.i.2. How much of that budget is dedicated to conducting studies?
$500,000-800,000/yr
1.c.ii. What are the current sources of funding?
AHRQ, Life Sciences Discovery Fund, Nestle Foundation
1.c.iii. How much does it cost each year to maintain and update the network?
$500,000-800,000/yr
1.d. How many years has this network existed?
2
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)?
Yes
1.e.i.1. What does the network focus on?
Improving patient surgical outcomes
1.f. (Y/N) Does the network use informed consent forms?
Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data?
Specific - consent is obtained when identified patient-level data are used for specific studies, not when de-identified data are used for quality improvement studies
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens?
Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study?
Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network?
Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process?
The Patient Advisory Groups bring together patients, advocates, or advocacy organizations to provide a patient perspective to researchers and clinicians in multiple CERTAIN research studies. Patient Advisory Groups, or individual patient advisors within the groups, routinely provide feedback on research questions, research materials, ways to maximize patient participation and the benefit of research participation to individual patients, interpretation of study findings, and development of publicly released information, documents, or tools to share with other patients broadly.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)
Not applicable

1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR (e.g., EPIC, MEDITECH); data extracted from skilled nursing systems and/or doctor's offices
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Other groups may request either SCOAP (quality improvement) or CERTAIN (research) data through existing data use policies and application procedures. These will soon be posted to the CERTAIN website; in the interim, initial inquiries may be submitted to the CERTAIN Program Director.
1.g.iii.1.b. Policies for sharing data outside the network | Same as 1.g.iii.1.a.
1.g.iii.1.c. Policies for protecting proprietary data | CERTAIN employs rigorous processes for ensuring the protection of all patient data collected for research purposes. A unique study code is assigned to each study participant and is used on all study-related data collection documents and analyses. A master list of codes and identifiers is maintained in a secured, password-protected spreadsheet on the research data computers. Only select research personnel directly involved in conducting study procedures have access to the master list. These persons have signed a Confidentiality Agreement. The link between the subject identifiers and unique study code will be maintained for the duration of the study and destroyed once all data points have been analyzed.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
    1) Progress in the Diagnosis of Appendicitis: A Report from Washington State's Surgical Care and Outcomes Assessment Program. Ann Surg. 2012 Oct;256(4):586-94.
    2) Adoption of Laparoscopy for Elective Colorectal Resection: A Report from the Surgical Care and Outcomes Assessment Program. J Am Coll Surg. 2012 Jun;214(6):909-18.
    3) β-blocker continuation after noncardiac surgery: a report from the surgical care and outcomes assessment program. Arch Surg. 2012 May;147(5):467-73.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Spine SCOAP: for the Spine SCOAP module, the Patient Voices Project is capturing patient-reported outcomes (PROs) using the Oswestry Disability Index and the Neck Disability Index, two validated instruments for assessing functional outcomes as reported by patients. Questionnaires are currently administered in the 30 days following the surgical procedure and then bi-annually through 5 years after the procedure date.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Researchers can review standardized quarterly reports.
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Sharing data from EHRs
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | CERTAIN has both the allocation and concealment methods needed to perform such randomization adequately, and a broad enough population of hospitals/clinics, providers, and patients to identify and match them accordingly.
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | No
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | No
4.a. What type of security technology does the network use? | As described under 1.g.iii.1.c, only select research personnel who have signed a Confidentiality Agreement have access to the master list, and the link between subject identifiers and unique study codes is maintained for the duration of the study and destroyed once all data points have been analyzed. Data gathered for research purposes are entered and analyzed on password-protected computers belonging to the research center, and only research personnel have access to these computers. Domain passwords must be at least 8 characters long, conform to complexity rules, and be changed at least every 120 days. All laptops are encrypted using PGP Whole Disk Encryption (PGP Corp., Menlo Park, CA 94025). All computing systems are configured with active anti-virus software, host-based firewalls, and automatic installation of critical operating system patches and updates. Anti-virus software is configured to update daily. The host-based firewalls restrict inbound connections to only the subnets where department workforce reside or that are needed for firewall administration; the firewall rule set on the dedicated server is further restricted to the network subnets used by research personnel. On the file server, all project data will be located in a folder structure with access rights controlled by domain security groups whose membership is restricted to selected workforce.
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | ICD-9, CPT, LOINC, UMLS, HL7
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Home-grown CDM
4.d.iii. How are the data transformed and mapped? | All data arrive in the same form from each site, based on the normalized ad hoc extraction.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data dictionary?) | The network uses its own home-grown method, normalizing the data ad hoc rather than post hoc; i.e., standards were defined at the beginning to keep data consistent across all sites.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Demographics, pre-hospital conditions, medications, lab work, discrete operative decision making, post-operative outcomes up to 12 months, surveys up to 3 years
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | The network has its own team developing home-grown NLP algorithms.
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Data are aggregated and sent to the centralized data warehouse.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | CER tools are available.
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | A home-grown systematic matching algorithm
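
The CERTAIN data-protection policy above (a unique study code per participant, a separately held master list linking codes to identifiers, and destruction of that link once analysis is complete) follows a common pseudonymization pattern. The Python sketch below illustrates that pattern under stated assumptions: the class `StudyCodeRegistry`, the code format, and the in-memory master list are hypothetical, since the report says only that CERTAIN keeps its master list in a password-protected spreadsheet and does not describe its code format.

```python
import secrets


class StudyCodeRegistry:
    """Hypothetical pseudonymization registry (illustrative, not CERTAIN's tooling)."""

    def __init__(self, prefix="S"):
        self.prefix = prefix
        self._code_to_id = {}  # master list: study code -> participant identifier
        self._id_to_code = {}  # reverse index so each participant gets exactly one code
        self._link_destroyed = False

    def _check_link(self):
        if self._link_destroyed:
            raise RuntimeError("link between study codes and identifiers was destroyed")

    def enroll(self, participant_id):
        """Return the participant's study code, assigning a random one if new."""
        self._check_link()
        if participant_id in self._id_to_code:
            return self._id_to_code[participant_id]
        while True:
            # Random code: it cannot be derived from the identifier itself.
            code = f"{self.prefix}-{secrets.token_hex(4).upper()}"
            if code not in self._code_to_id:
                break
        self._code_to_id[code] = participant_id
        self._id_to_code[participant_id] = code
        return code

    def re_identify(self, code):
        """Map a code back to an identifier (restricted to authorized staff)."""
        self._check_link()
        return self._code_to_id[code]

    def destroy_link(self):
        """Discard the master list once all data points have been analyzed."""
        self._code_to_id.clear()
        self._id_to_code.clear()
        self._link_destroyed = True


# Usage: study data carry only the code, never the identifier.
registry = StudyCodeRegistry()
code = registry.enroll("MRN-0001")  # hypothetical identifier
record = {"study_code": code, "readmitted_30d": False}
```

Keeping the master list in a separate object mirrors the separation the policy describes: analysts handle only records keyed by study code, while re-identification requires access to the registry, which stops working once `destroy_link()` is called.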

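Two of the controls in the CERTAIN security answer (4.a) can be expressed as simple checks: the domain password rule (at least 8 characters, complexity rules, changed at least every 120 days) and host-based firewall rules that accept inbound connections only from approved subnets. The sketch below is illustrative only. It assumes a complexity rule of "at least three of four character classes" (the report does not define CERTAIN's complexity rules) and uses hypothetical private subnets. In practice these rules are enforced by the domain controller and firewall, not by application code.

```python
import ipaddress
import string
from datetime import date, timedelta

MIN_LENGTH = 8                          # stated policy
MAX_PASSWORD_AGE = timedelta(days=120)  # stated policy
MIN_CHARACTER_CLASSES = 3               # assumption: complexity rule not specified

# Hypothetical subnets standing in for "where department workforce reside".
ALLOWED_SUBNETS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def meets_complexity(password):
    """Check the minimum length and the number of character classes used."""
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return len(password) >= MIN_LENGTH and sum(classes) >= MIN_CHARACTER_CLASSES


def password_expired(last_changed, today):
    """True if the password is older than the maximum allowed age."""
    return today - last_changed > MAX_PASSWORD_AGE


def inbound_allowed(source_ip):
    """Allow an inbound connection only from an approved subnet."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_SUBNETS)


# A password last changed on 1 Jan 2013 is overdue by 2 May 2013 (121 days).
print(password_expired(date(2013, 1, 1), date(2013, 5, 2)))  # True
```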
CER2

Note: This is a fairly new CDRN comprising five already established health organizations.

Criteria | Answers
1.a. How many people does the network cover or involve? | 800,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Not available
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Fiks AG, Grundmeier RW, Margolis B, Bell LM, Steffes J, Massey J, Wasserman RC. Comparative effectiveness research using the electronic medical record: an emerging area of investigation in pediatric primary care. J Pediatr. 2012;160:719-724.
1.b.i.1. Demographics: racial/ethnic | Confidential
1.b.i.2. Demographics: geography | Confidential
1.b.i.3. Demographics: age | Confidential
1.b.i.4. Demographics: gender | Confidential
1.c.i. What is the total annual budget? | $1,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Too complex to break down
1.c.i.2. How much of that budget is dedicated to conducting studies? | Part of the $1 million
1.c.ii. What are the current sources of funding? | Health Resources and Services Administration Maternal and Child Health Bureau and the Eunice Kennedy Shriver National Institute of Child Health & Human Development
1.c.iii. How much does it cost each year to maintain and update the network? | Too complex to break down
1.d. How many years has this network existed? | 6 months
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Mainly children with chronic conditions, as well as less common but prevalent conditions
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable. The network as a whole does not use consent forms because the data it collects are limited data sets and therefore do not require consent. If more specific patient-level data are needed for a study, a consent form will be developed and used.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR and claims data
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Data Use Agreements are needed for investigators to gain access to the data collected by the network.
1.g.iii.1.b. Policies for sharing data outside the network | Currently, there are no policies for sharing outside the network.
1.g.iii.1.c. Policies for protecting proprietary data | All data captured are de-identified to HIPAA limited data set status.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Participating health organizations provide access to EHR data and also participate in research studies.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Received funding to begin a randomized controlled trial in 3 years
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | No
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | The data are aggregated at a central site, and the investigator then queries that site.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | NDC, LOINC, SNOMED-CT
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data dictionary?) | A proprietary vendor performs the standardization for the network.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Demographics, conditions, medications
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | The network is working toward using NLP applications and approaches.
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Data are aggregated at the data center.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Not applicable
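
CER2 reports that all captured data are held as HIPAA limited data sets, which is why access depends on Data Use Agreements rather than individual consent. Strictly speaking, a limited data set is still protected health information. Under the HIPAA Privacy Rule (45 CFR 164.514(e)), it must exclude 16 categories of direct identifiers, such as names, street addresses, telephone numbers, Social Security numbers, and medical record numbers, but it may keep dates, city, state, and ZIP code. The Python sketch below shows that filtering step on a flat record. It maps one hypothetical field name to each excluded category. It illustrates the idea and is not a compliance tool; CER2's actual processing, done by a proprietary vendor, is not described in the report.

```python
# Hypothetical field names, one per excluded direct-identifier category.
DIRECT_IDENTIFIER_FIELDS = {
    "name",
    "street_address",   # town/city, state, and ZIP code may be kept
    "phone",
    "fax",
    "email",
    "ssn",
    "mrn",
    "health_plan_id",
    "account_number",
    "license_number",
    "vehicle_id",
    "device_id",
    "url",
    "ip_address",
    "biometric_id",
    "photo",
}


def to_limited_data_set(record):
    """Drop direct identifiers; keep everything else, including dates and ZIP."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}


# Hypothetical input record.
record = {"name": "Jane Doe", "mrn": "123456", "zip": "92093",
          "visit_date": "2013-02-19", "diagnosis_icd9": "250.00"}
print(to_limited_data_set(record))
# {'zip': '92093', 'visit_date': '2013-02-19', 'diagnosis_icd9': '250.00'}
```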

Community Health Applied Research Network (CHARN)

Criteria | Answers
1.a. How many people does the network cover or involve? | 519,636
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | CHARN has just partnered with the National Dental Practice-Based Research Network, which will allow CHARN to study dental practices and policies related to its patient population. "CHARN currently has both patient-level and visit-level data from our patients from 2008-2010 and will be expanding that range from 2006-2012. CHARN is currently creating registries to assist in the research process."
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | The ePro Project is facilitating engagement between providers and patients by determining patients' preferences, risk behaviors, and symptoms and making this information available to the provider during the encounter. Patients enter information into a touch-screen tablet while waiting for their provider appointment. CHARN has previously demonstrated that patients are more willing to report inadequate medication adherence, substance use, sexual risk behavior, and other potentially socially undesirable behaviors on the tablet than to providers, even when the patient knows the provider will receive the results. Collecting information on the tablets enables more comprehensive capture of patient-reported data, supporting better patient-provider communication and clinical care.
1.b.i.1. Demographics: racial/ethnic | Race: White: 314,487 (60.5%); Black/African American: 94,849 (18.3%); American Indian/Alaska Native: 4,500 (0.9%); Asian/NHOPI: 91,092 (17.5%); Multi-racial: 4,964 (1.0%); Other: 26,848 (5.2%); No race indicated (missing): 57,328 (11.0%). Ethnicity: Hispanic or Latino: 242,960 (46.8%); Not Hispanic or Latino: 94,221 (18.1%); Missing (reported unknown): 92,566 (17.8%); Missing (left blank): 89,889 (17.3%)
1.b.i.2. Demographics: geography | Association of Asian Pacific Community Health Node: New York, Hawaii, California; Alliance of Chicago Community Health Services Node: Illinois, North Georgia, Arizona, California; Fenway Health Node: Maryland, South Carolina, Massachusetts; Oregon Community Health Information Center, Inc. Node: Oregon
1.b.i.3. Demographics: age | Less than 18: 155,531 (29.9%); 18-25: 72,827 (14.0%); 26-39: 113,334 (21.8%); 40-64: 144,935 (27.9%); 65-79: 26,867 (5.2%); 80 and older: 6,141 (1.2%)
1.b.i.4. Demographics: gender | Male: 217,169 (41.8%); Female: 302,311 (58.2%); Transgender: 125 (0.0%); Unknown or missing: 31 (0.0%)
1.c.i. What is the total annual budget? | $10,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Almost all of the annual budget is directed toward infrastructure building.
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | HRSA
1.c.iii. How much does it cost each year to maintain and update the network? | Included in the amount of the annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed? | 3
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Primary care in safety-net populations, especially focusing on cardiovascular disease, diabetes, dyslipidemia, hypertension, hepatitis A and B, and AIDS and AIDS-related conditions
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Not applicable; no studies have involved direct patient contact
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | APCHO, OCHIN, and ALLIANCE have data sharing agreements within their network; i.e., they agree to share limited data sets without needing to go through specific consent. Any Community Health Center (CHC) or node can choose to participate in any project, and express consent is given for specific projects. If a CHC is participating in a project, a representative of the CHC is involved in that project.
1.g.iii.1.b. Policies for sharing data outside the network | Has not been addressed yet
1.g.iii.1.c. Policies for protecting proprietary data | Data ownership resides with the Community Health Centers (CHCs); the coordinating center does nothing to any data without the express consent of the CHCs. CHCs own their data.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | All of the CHCs are responsible for Uniform Data System (UDS) reporting, so CHARN's code lists follow the same conventions as UDS reporting whenever possible. UDS is a HRSA reporting system to which all Health Centers must contribute data. CHARN captures race and ethnicity data using criteria from the 2010 U.S. Census. CHARN is using the newly mandated HIV variables, such as sexual orientation, that will be added to health center EHRs in the future.
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By giving access to EHRs
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Data from the Community Health Centers (CHCs) are de-identified and loaded into a predefined SQL database, which is uploaded to and posted on a website secured with 128-bit encryption. Two employees at the node level have access to those data, and a few network-level employees have access to individual files. They can retrieve files from the website by secure file transfer and place them in a shared file service on their network, for a particular project only.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Version one of the data hub is a series of disease cohorts whose data are structured into an SQL database. The CHCs populate the SQL structure locally and then upload it to the coordinating center, where the data are combined into a centralized resource; queries can be made locally or centrally. Version two is not restricted to particular cohorts: data on medications, procedures, specified labs, patient characteristics, and encounter characteristics are captured over a 5-year period.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | ICD-9, SNOMED
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | A home-grown SQL structure is pushed out to the nodes by the central hub, and the SQL comes back to the central hub with those common data elements.
4.d.iii. How are the data transformed and mapped? | SQL fields are populated by the nodes and then sent to the central hub.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data dictionary?) | Home grown, using a data dictionary and a data submissions document
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | All medical encounters (visits, emails, and phone calls), medications ordered, lab results, and diagnoses if the patient had one of the seven CHARN-related conditions of interest: diabetes, cardiovascular disease, HIV, Hepatitis B and C, hypertension, and dyslipidemia
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Based on the needs of the researcher
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable
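
CHARN's version-one data hub, described above under 4.b.ii, 4.d.ii, and 4.d.iii, pushes one home-grown SQL structure to every node. Each Community Health Center populates that structure locally, and the coordinating center combines the uploaded copies into a central resource, with queries possible at either level. The sketch below mimics that flow with Python's built-in `sqlite3` module. The table name, columns, node names, and sample rows are hypothetical; CHARN's actual schema is documented only in its own data dictionary.

```python
import sqlite3

# Hypothetical common structure pushed from the hub to every node.
SCHEMA = """
CREATE TABLE encounter (
    node TEXT NOT NULL,
    patient_key TEXT NOT NULL,      -- de-identified key, unique within a node
    encounter_date TEXT NOT NULL,
    encounter_type TEXT NOT NULL,   -- e.g., visit, email, phone
    icd9_code TEXT
)
"""


def build_node(node_name, rows):
    """A node creates the pushed schema and populates it locally."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany(
        "INSERT INTO encounter VALUES (?, ?, ?, ?, ?)",
        [(node_name, *row) for row in rows],
    )
    return db


def combine_at_hub(node_dbs):
    """The coordinating center loads each node's upload into one resource."""
    hub = sqlite3.connect(":memory:")
    hub.execute(SCHEMA)
    for db in node_dbs:
        hub.executemany(
            "INSERT INTO encounter VALUES (?, ?, ?, ?, ?)",
            db.execute("SELECT * FROM encounter").fetchall(),
        )
    return hub


# Hypothetical rows; 250.00 and 401.9 are ICD-9 codes for diabetes
# and essential hypertension.
node_a = build_node("NodeA", [("p1", "2012-03-01", "visit", "250.00"),
                              ("p2", "2012-03-02", "phone", "401.9")])
node_b = build_node("NodeB", [("p1", "2012-04-10", "visit", "250.00")])
hub = combine_at_hub([node_a, node_b])

# The same query runs centrally or locally: patients with a diabetes code.
query = """SELECT node, COUNT(DISTINCT patient_key) FROM encounter
           WHERE icd9_code LIKE '250%' GROUP BY node ORDER BY node"""
print(hub.execute(query).fetchall())     # [('NodeA', 1), ('NodeB', 1)]
print(node_a.execute(query).fetchall())  # [('NodeA', 1)]
```

Because every node shares one schema, the hub can concatenate uploads without any field mapping. That is the main benefit of pushing a common structure out, rather than transforming heterogeneous extracts after they arrive.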

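The CHARN demographic breakdowns above report each category as a count and as a share of the 519,636 patients covered (1.a). One quick way to check such figures is to recompute each percentage from its count and confirm that exhaustive categories add up to the total. The minimal Python sketch below does this for the ethnicity and gender breakdowns, using only the counts reported above.

```python
TOTAL = 519_636  # patients covered by CHARN (1.a)

ETHNICITY = {
    "Hispanic or Latino": 242_960,
    "Not Hispanic or Latino": 94_221,
    "Missing (reported unknown)": 92_566,
    "Missing (left blank)": 89_889,
}

GENDER = {
    "Male": 217_169,
    "Female": 302_311,
    "Transgender": 125,
    "Unknown or missing": 31,
}


def shares(counts, total):
    """Percentage of the total for each category, rounded to one decimal."""
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


for name, counts in [("Ethnicity", ETHNICITY), ("Gender", GENDER)]:
    # Exhaustive, mutually exclusive categories should add up to the total.
    assert sum(counts.values()) == TOTAL, name
    print(name, shares(counts, TOTAL))
```

Both breakdowns sum exactly to 519,636, and the recomputed shares match the percentages reported in the table.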
Children's Hospital of Philadelphia Research Consortium (CHOP-PeRC)

Criteria | Answers
1.a. How many people does the network cover or involve? | 204,827
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | 41 active studies involving asthma, obesity, ADHD, depression, and autism
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Fiks AG, Alessandrini EA, Luberti AA, Ostapenko S, Zhang X, Silber JH. Identifying factors predicting immunization delay for children followed in an urban primary care network using an . Pediatrics. 2006 Dec;118(6):e1680-6.
1.b.i.1. Demographics: racial/ethnic | White: 55.75%; Black/African American: 27.91%; American Indian/Alaskan Native: 0.09%; Asian: 2.66%; Native Hawaiian/Other Pacific Islander: 0.02%; Two or More: 0.14%; Missing/Unknown: 13.41%
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | < 1: 7.07%; 1-6: 38.76%; 7-12: 31.16%; 13 or more: 23.21%
1.b.i.4. Demographics: gender | Male: 50.5%; Female: 49.5%
1.c.i. What is the total annual budget? | $25,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $241,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $56,000
1.c.ii. What are the current sources of funding? | National Institutes of Health, foundation, state, and/or institutional grants
1.c.iii. How much does it cost each year to maintain and update the network? | $241,000
1.d. How many years has this network existed? | 11
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Clinical decision support for studying a variety of chronic childhood conditions
1.f. (Y/N) Does the network use informed consent forms? | Yes. The CHOP Institutional Review Board (IRB) manages all issues of informed consent and ethics.
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Specific
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not available
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | PeRC is governed under a single institutional structure, which means a single IRB and the ability to easily study network-wide interventions.
1.g.iii.1.b. Policies for sharing data outside the network | CHOP enters into Data Use Agreements (DUAs) with organizations that wish to collaborate and share data.
1.g.iii.1.c. Policies for protecting proprietary data | Information is de-identified.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
    1) Fiks AG, Localio AR, Alessandrini EA, Asch DA, Guevara JP. Shared Decision Making in Pediatrics: A National Perspective. Pediatrics. 2010;126:306-314.
    2) Fiks AG, Mayne S, Localio AR, Alessandrini EA, Guevara JP. Shared Decision Making, Health Care Expenditures and Utilization Among Children with Special Health Care Needs. Pediatrics. 2012;129:99-107.
    3) Fiks AG, Mayne S, Localio R, Feudtner C, Alessandrini EA, Guevara JP. Shared decision making and behavioral impairment: a national study among children with special health care needs. BMC Pediatrics. 2012;12:153.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Power TJ, Mautone JA, Manz PH, Frye L, Blum NJ. Managing attention-deficit/hyperactivity disorder in primary care: a systematic analysis of roles and challenges. Pediatrics. 2008 Jan;121:e65-e72.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By providing EHR access and participating in research studies
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? |
    1) Fiks AG, Hunter KF, Localio AR, Grundmeier RW, Bryant-Stephens T, Luberti AA, Bell LM, Alessandrini EA. Impact of Electronic Health Record-Based Primary Care Clinical Alerts on Influenza Vaccination for Children and Adolescents with Asthma: A Cluster Randomized Trial. Pediatrics. 2009;124:159-169.
    2) Bell LM, Grundmeier R, Localio R, Zorc J, Fiks A, Zhang X, Guevara J, Bryant-Stephens T, Swietlik M. Electronic Health Record Based Decision Support to Improve Asthma Care: A Cluster Randomized Trial. Pediatrics. 2010;125:e770-e777.
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Blood
3.c. What types of analysis are done on them? | Whole genome sequencing
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Whole genome sequencing
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes
4.a. What type of security technology does the network use? | Multi-layer approach: edge protection from attack, internal segregation including access control, and ongoing monitoring
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | No
4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | All data available in the EHR, including demographics, medications, conditions, vitals, procedures, etc.
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | A home-grown approach
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | SAS code, SPSS scripts, R
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Not applicable

Cancer Research Network (CRN)

Criteria | Answers
1.a. How many people does the network cover or involve? | 10,966,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Covers cancer studies that also involve other risk factors in addition to cancer. The network also has over 106 active studies.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Potter MB, Somkin CP, Ackerson LM, Gomez V, Dao T, Horberg MA, Walsh JME. "The FLU-FIT program: an effective colorectal cancer screening program for high volume flu shot clinics." Am J Manag Care 17(8):577-83, 2011
1.b.i.1. Demographics: racial/ethnic | White: 87%; African American: 2%; Asian American: 3%; American Indian: < 1%; Hispanic: 8%
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | <= 24: 29%; 25-44: 24%; 45-64: 27%; 65-74: 9%; >= 75: 11%
1.b.i.4. Demographics: gender | Male: 49%; Female: 51%
1.c.i. What is the total annual budget? | $3,300,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $660,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $1,980,000
1.c.ii. What are the current sources of funding? | National Cancer Institute
1.c.iii. How much does it cost each year to maintain and update the network? | $660,000
1.d. How many years has this network existed? | 13
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Mainly cancer research and treatment, but the network also conducts research on other health factors
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific. If a researcher is conducting a study on a new intervention, an additional patient consent form is needed for that specific study.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Specific
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable

1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Researchers within the network propose a research study and describe the data elements they would like to collect. Each site then determines which data at its local site match those criteria and collects those data.
1.g.iii.1.b. Policies for sharing data outside the network | Public-use data are not shared; data are shared through scientific institutions.
1.g.iii.1.c. Policies for protecting proprietary data | No patient identifiers are transmitted outside the network; data are encrypted and de-identified.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) The prevalence of obesity and obesity-related health conditions in a large, multiethnic cohort of young adults in California. Koebnick C, Smith N, Huang K, Martinez MP, Clancy HA, Kushi LH. Ann Epidemiol. 2012 Sep;22(9):609-16.
2) Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm. Strauss JA, Chao CR, Kwan ML, Ahmed SA, Schottinger JE, Quinn VP. J Am Med Inform Assoc. 2012 Aug 2.
3) Factors associated with inadequate colorectal cancer screening with flexible sigmoidoscopy. Laiyemo AO, Doubeni C, Pinsky PF, Doria-Rose VP, Sanderson AK 2nd, Bresalier R, Weissfeld J, Schoen RE, Marcus PM, Prorok PC, Berg CD. Cancer Epidemiol. 2012 Aug;36(4):395-9.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | The prevalence of obesity and obesity-related health conditions in a large, multiethnic cohort of young adults in California. Koebnick C, Smith N, Huang K, Martinez MP, Clancy HA, Kushi LH. Ann Epidemiol. 2012 Sep;22(9):609-16.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Giving access to EHR data
2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Specimens are analyzed for genetic markers; biopsies, tissue blocks, and microdissections
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes (recurrence was examined)
4.a. What type of security technology does the network use? | caBIG® Data Sharing and Security Framework (DSSF), used as a decision support tool to facilitate data sharing by determining which data can be shared and under which type of access and data security controls. This requires assessing the sensitivity of the data using the Framework's four elements: Economic/Proprietary/IP Value; Privacy/Confidentiality/Security Considerations; IRB or Institutional Restrictions; and Sponsor Restrictions.
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable

4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes
4.c.ii. Which terminologies? | ICD-9, CPT, RxNorm
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | HMORN Virtual Data Warehouse
4.d.iii. How are the data transformed and mapped? | A research team identifies the data elements, which are then sent to a central location.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | caGrid standard service metadata. Data services expose standard data service metadata (DomainModel), which details not only the UML classes exposed by the service but also their relationships, such as associations and inheritance. This information describes the logical model over which data service queries are executed.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics and vital signs; enrollment in the health care plan; utilization, including inpatient and outpatient visits, emergency department visits, long-term care admissions, home health visits, and communications with health professionals via phone; diagnoses, procedures, and lab tests/results; incident cancer; pharmacy data; provider information; census data; birth and death data; outside claims; patient scheduling; deaths; cost
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Clinical Text Analysis and Knowledge Extraction System (cTAKES)
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not available
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | SAS scripts
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Not applicable

Distributed Ambulatory Research in Therapeutics Network (DARTNet)
Umbrella network for: the AAFP Electronic National Quality Improvement and Research Network (eNQUIRENet), the Collaborative National Network Examining Comparative Effectiveness Trials (CoNNECT), the South Texas Ambulatory Research Network (STARNet), ProHealth, the Scalable Architecture for Federated Therapeutic Inquiries Network (SAFTINet), the Upstate New York Practice Based Research Network (UNYNet), the Washington, Alaska, Montana, and Idaho Area PBRN (WAMI), the Free Clinic Research & Educational Engagement Network (FREENet), and the Minnesota Academy of Family Physicians Research Network (MAFPRN)

Criteria | Answers
1.a. How many people does the network cover or involve? | 5,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | DARTNet is conducting three simultaneous projects focusing on asthma
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Clinicians can join eLearning Alliance clinical practice communities and Methods and Research communities to learn about new approaches to care and EHRs. The network also presents clinical data with the goal of informing best practices for care. The network's projects aim to disseminate tested clinical decision support algorithms and encourage workflow sharing among groups and non-members for quality improvement.
1.b.i.1. Demographics: racial/ethnic | Most offices in the network are private practices, so they do not collect information on race/ethnicity.
1.b.i.2. Demographics: geography | California, Colorado, Kansas, Missouri, Arkansas, Illinois, Indiana, Tennessee, Florida, North Carolina, Virginia, Ohio, Pennsylvania, New Jersey, New York, Connecticut, New Hampshire, Vermont, Minnesota, Wyoming, Alaska, Montana, and Idaho
1.b.i.3. Demographics: age | 65 and older: 25%; Under 18: 15%; Average age: 45
1.b.i.4. Demographics: gender | Male: 42%; Female: 58%
1.c.i. What is the total annual budget? | $2,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Unknown, because the network was formed from multiple networks joined together as DARTNet in December 2011
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | CTSA, internal funding from clinical organizations, NIH, revenue from sharing data outside the network
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 6
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | No
1.e.i.1. What does the network focus on? | Not applicable
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients are involved in advisory board activities of member networks
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable

1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Researchers who are members of the subnetworks that have donated data are not charged for queries, but researchers are charged for analysis.
1.g.iii.1.b. Policies for sharing data outside the network | Services and data may be purchased through the DARTNet website.
1.g.iii.1.c. Policies for protecting proprietary data | Not available
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Rates of 5 Common Antidepressant Side Effects Among New Adult and Adolescent Cases of Depression: A Retrospective US Claims Study. Anderson HS, Pace WD, Libby AM, West DR, Valuck RJ. Clinical Therapeutics. 2012;34(1):113-123.
2) Enhancing Electronic Health Record Measurement of Depression Severity and Suicide Ideation: A Distributed Ambulatory Research in Therapeutics Network (DARTNet) Study. Valuck RJ, Anderson HO, Libby AM, Brandt E, Bryan C, Allen RR, Staton EW, West DR, Pace WD. J Am Board Fam Med. 2012 Sep;25(5):582-93.
3) An assessment of the Hawthorne Effect in practice-based research. Fernald DH, Coombs L, DeAlleaume L, West D, Parnes B. J Am Board Fam Med. 2012 Jan-Feb;25(1):83-6.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Data are pulled every 3 months to follow patients with chronic kidney disease
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Giving access to EHR data and claims data
2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | This network has used advanced methods for cluster randomized trials in which numerous outcome variables and practice-level variables are included.
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not available
4.a. What type of security technology does the network use? | Data are queried via a secure web portal; permission from each practice is required each time to make data available to DARTNet; databases reside at individual practices, which are responsible for their own firewalls; the limited data set that sits on the grid node operates within the TRIAD system run by Ohio State; three-level security logins are required.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes

4.b.ii. What is the architecture of the query distribution? | There are three different methods for query distribution: (1) all the data are locally mapped and crosswalked into the OMOP standards and sent back to the central hub; (2) data are standardized and pulled by a third party and sent back to the central hub; (3) a clinic standardizes its own data: ROSITA converts the data and standardizes them to OMOP, the data are placed in a local OMOP data structure behind the clinic's own firewall, and the results are then sent back to the central hub.
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes
4.c.ii. Which terminologies? | SNOMED, ICD-9, LOINC, CPT, First Databank
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Observational Medical Outcomes Partnership (OMOP)
4.d.iii. How are the data transformed and mapped? | The ROSITA mapping system takes the file in and performs record linkage if data from the same set of patients are being loaded from multiple sources (e.g., EHR and claims). It recodes the source values into standardized concept IDs (using the OMOP V4 Vocabulary and local mapping), strips direct patient identifiers, and outputs a limited data set to the grid node housed at each site, where the data are available for query.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | EHR data, insurance claims data, patient-reported outcomes data, clinician-reported outcomes data
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not available
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | SPSS scripts, SAS code
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Not applicable
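The ROSITA answer for DARTNet (4.d.iii) describes three steps: linking records for the same patient across sources, recoding source values into standardized concept IDs, and stripping direct identifiers to produce a limited data set. The minimal Python sketch below illustrates those steps under stated assumptions; it is not ROSITA's implementation. The record layout, the exact-match linkage key `mrn`, the identifier list, and the toy `VOCAB_MAP` with placeholder concept IDs are all hypothetical; a real deployment would draw concept IDs from the OMOP V4 Vocabulary and local mapping tables.

```python
# Minimal sketch of a ROSITA-style load step (illustrative only, not ROSITA's code).
# Assumptions: records are dicts, patients are linked by exact match on a
# hypothetical "mrn" field, and VOCAB_MAP is a toy stand-in for the OMOP
# vocabulary plus local mappings.

# Toy map: (source vocabulary, source code) -> standard concept ID (placeholder IDs).
VOCAB_MAP = {
    ("ICD9", "250.00"): 1001,
    ("NDC", "12345-6789"): 2001,
}

# Direct identifiers dropped before output (illustrative, not a complete HIPAA list).
DIRECT_IDENTIFIERS = {"mrn", "name", "ssn", "phone"}


def link_records(*sources):
    """Group records from several sources (e.g., EHR and claims) by patient key."""
    linked = {}
    for source in sources:
        for rec in source:
            linked.setdefault(rec["mrn"], []).append(rec)
    return linked


def to_limited_dataset(linked):
    """Recode source codes to concept IDs and strip direct identifiers."""
    rows = []
    # Assign surrogate person IDs in a stable (sorted) order of patient keys.
    for person_id, key in enumerate(sorted(linked), start=1):
        for rec in linked[key]:
            # Keep non-identifying fields (e.g., dates, which a limited data set may retain).
            row = {k: v for k, v in rec.items()
                   if k not in DIRECT_IDENTIFIERS and k not in ("vocab", "code")}
            row["person_id"] = person_id
            # Concept ID 0 marks an unmapped source value ("No matching concept" in OMOP).
            row["concept_id"] = VOCAB_MAP.get((rec["vocab"], rec["code"]), 0)
            row["source_value"] = rec["code"]
            rows.append(row)
    return rows


if __name__ == "__main__":
    ehr = [{"mrn": "A1", "name": "Pat One", "vocab": "ICD9", "code": "250.00", "date": "2012-01-05"}]
    claims = [
        {"mrn": "A1", "vocab": "NDC", "code": "12345-6789", "date": "2012-01-10"},
        {"mrn": "B2", "vocab": "ICD9", "code": "401.9", "date": "2012-02-01"},
    ]
    for row in to_limited_dataset(link_records(ehr, claims)):
        print(row)
```

Exact-key linkage keeps the sketch short; the survey answer does not say how ROSITA matches patients, and linking EHR and claims records in practice often requires matching on several fields.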

Electronic Medical Records and Genomics (eMERGE) Network

Criteria | Answers
1.a. How many people does the network cover or involve? | 310,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Covers genetic studies and phenotype studies on certain conditions
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | See Table 1
1.b.i.2. Demographics: geography | See Table 1
1.b.i.3. Demographics: age | See Table 1
1.b.i.4. Demographics: gender | See Table 1
1.c.i. What is the total annual budget? | $6,775,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | $500,000 per research group over a 3-year period
1.c.ii. What are the current sources of funding? | RFA-HG-11-022 [grants.nih.gov]: The Electronic Medical Records and Genomics (eMERGE) Network, Phase II - Pediatric Study Investigators (U01); RFA-HG-10-010 [grants.nih.gov]: The Electronic Medical Records and Genomics (eMERGE) Network, Phase II - Coordinating Center (U01); RFA-HG-10-009 [grants.nih.gov]: The Electronic Medical Records and Genomics (eMERGE) Network, Phase II - Study Investigators (U01)
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 5
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | More than a dozen phenotypes currently being investigated, including multiple sclerosis, Crohn's disease, and atrial fibrillation
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not available
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad. If a patient agrees to take part in the biobank, some of their genetic and health information might be placed into one or more scientific databases.
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | If a researcher would like to use a patient's biospecimen for a study, the researcher must contact the patient, who may opt in to or out of that study.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR, biobanks, and genetic data

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | The eMERGE Network is open to all academic, government, and private sector scientists who are interested in participating in an open process to facilitate genomic research in biorepositories with electronic medical records and application of genomic results to clinical care, and who agree.
1.g.iii.1.b. Policies for sharing data outside the network | To maximize these collaborations between sites, participating institutions had to develop Data Use Agreements in order to share de-identified research data, including the HIPAA-defined limited data sets, with other sites within the Consortium.
1.g.iii.1.c. Policies for protecting proprietary data | Data Use Agreement and publication policy; all data are de-identified once they leave the site.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Validation and discovery of genotype-phenotype associations in chronic diseases using linked data. Pathak J, Kiefer R, Freimuth R, Chute C. Stud Health Technol Inform. 2012;180:549-53.
2) Gene-centric meta-analyses of 108 912 individuals confirm known body mass index loci and reveal three novel signals. Guo Y, Lanktree MB, Taylor KC, Hakonarson H, Lange LA, Keating BJ; IBC 50K SNP array BMI Consortium. Hum Mol Genet. 2013 Jan 1;22(1):184-201.
3) Large-scale gene-centric meta-analysis across 32 studies identifies multiple lipid loci. Asselbergs FW, Guo Y, van Iperen EP, Sivapalaratnam S, Tragante V, Lanktree MB, Lange LA, Almoguera B, Appelman YE, Barnard J, Baumert J, Beitelshees AL, Bhangale TR, Chen YD, Gaunt TR, Gong Y, Hopewell JC, Johnson T, Kleber ME, Langaee TY, Li M, Li YR, Liu K, McDonough CW, Meijs MF, Middelberg RP, Musunuru K, Nelson CP, O'Connell JR, Padmanabhan S, Pankow JS, Pankratz N, Rafelt S, Rajagopalan R, Romaine SP, Schork NJ, Shaffer J, Shen H, Smith EN, Tischfield SE, van der Most PJ, van Vliet-Ostaptchouk JV, Verweij N, Volcik KA, Zhang L, Bailey KR, Bailey KM, Bauer F, Boer JM, Braund PS, Burt A, Burton PR, Buxbaum SG, Chen W, Cooper-Dehoff RM, Cupples LA, deJong JS, Delles C, Duggan D, Fornage M, Furlong CE, Glazer N, Gums JG, Hastie C, Holmes MV, Illig T, Kirkland SA, Kivimaki M, Klein R, Klein BE, Kooperberg C, Kottke-Marchant K, Kumari M, LaCroix AZ, Mallela L, Murugesan G, Ordovas J, Ouwehand WH, Post WS, Saxena R, Scharnagl H, Schreiner PJ, Shah T, Shields DC, Shimbo D, Srinivasan SR, Stolk RP, Swerdlow DI, Taylor HA Jr, Topol EJ, Toskala E, van Pelt JL, van Setten J, Yusuf S, Whittaker JC, Zwinderman AH; LifeLines Cohort Study, Anand SS, Balmforth AJ, Berenson GS, Bezzina CR, Boehm BO, Boerwinkle E, Casas JP, Caulfield MJ, Clarke R, Connell JM, Cruickshanks KJ, Davidson KW, Day IN, de Bakker PI, Doevendans PA, Dominiczak AF, Hall AS, Hartman CA, Hengstenberg C, Hillege HL, Hofker MH, Humphries SE, Jarvik GP, Johnson JA, Kaess BM, Kathiresan S, Koenig W, Lawlor DA, März W, Melander O, Mitchell BD, Montgomery GW, Munroe PB, Murray SS, Newhouse SJ, Onland-Moret NC, Poulter N, Psaty B, Redline S, Rich SS, Rotter JI, Schunkert H, Sever P, Shuldiner AR, Silverstein RL, Stanton A, Thorand B, Trip MD, Tsai MY, van der Harst P, van der Schoot E, van der Schouw YT, Verschuren WM, Watkins H, Wilde AA, Wolffenbuttel BH, Whitfield JB, Hovingh GK, Ballantyne CM, Wijmenga C, Reilly MP, Martin NG, Wilson JG, Rader DJ, Samani NJ, Reiner AP, Hegele RA, Kastelein JJ, Hingorani AD, Talmud PJ, Hakonarson H, Elbers CC, Keating BJ, Drenos F. Am J Hum Genet. 2012 Nov 2;91(5):823-38.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | https://www.zotero.org/groups/emerge_network/items/collectionKey/NUV7UTBP
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By signing the DUA, a healthcare organization can participate in using eMERGE for research purposes as well as providing genomic and EHR data
2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | RBC count, hemoglobin level, mean corpuscular volume, mean corpuscular hemoglobin, RBC distribution width, and erythrocyte sedimentation rate; DNA, plasma, serum, and neuroimaging
3.c. What types of analysis are done on them? | Genomic analyses, complete blood counts, chemistry panel, B12, thyroid stimulating hormone
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes

34 Criteria Answers 3.d.i. What types of analyses do they conduct Genomic analyses, complete blood counts, chemistry panel, B12, thyroid stimulating hormone on them? 3.d.ii. Were they able to link the analysis/research results back to patient No outcomes? 4.a. What type of security technology does The security technology is different for each of the local sites and therefore cannot be assessed the network use? 4.b.i. (Y/N) Are queries distributed via a Yes central hub? 4.b.ii. What is the architecture of the query Researchers login to a web portal but can only obtain record counts of patients within the network based on ICD-9 and distribution? demographics 4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, Yes SNOMED, etc.)?

4.c.ii. Which terminologies? ICD-9/10, RxNorm, CPT, LOINC, SNOMED-CT, caDSR, NCI
4.d.i. (Y/N) Does the network use a common data model (CDM)? Yes

4.d.ii. Which CDM is used? CDISC SDTM
4.d.iii. How are the data transformed and mapped? caBIG
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Yes

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) eleMAP allows researchers to harmonize their local phenotype data dictionaries to existing metadata and terminology standards such as caDSR, NCIT, and SNOMED-CT.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). Demographics, medical conditions, medications, vitals, and genetic data

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? clinical Text Analysis and Knowledge Extraction System (cTAKES), Health Information Text Extraction (HITEx), NegEx, ConText, National Library of Medicine's MetaMap, MedEx, SecTag, Stanford Named Entity Recognizer (NER), Stanford CoreNLP
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Data are aggregated based on the researcher's request, as established in the DUA.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) Yes

4.j.ii. What informatics tools are used? PheWAS methods leverage EHR billing data (ICD-9 codes) to derive case and control populations. Using these data, a large number of disease phenotypes can be investigated simultaneously against a specified variant or variants.
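The PheWAS approach in 4.j.ii can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory layout (patient ID mapped to billed ICD-9 codes plus a variant-carrier flag), a hypothetical two-phenotype code map, and an assumed case rule (at least two matching billing codes for a case, none for a control, anything in between excluded). This is not eMERGE's implementation; published PheWAS analyses typically use curated code groupings and regression models rather than the crude odds ratio shown here.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical phenotype definitions (phenotype -> ICD-9 codes); illustrative only.
PHENOTYPES: Dict[str, Set[str]] = {
    "type_2_diabetes": {"250.00", "250.02"},
    "hypothyroidism": {"244.9"},
}


def classify(billing_codes: List[str], phenotype_codes: Set[str],
             min_case_codes: int = 2) -> Optional[str]:
    """Assumed rule: >= min_case_codes matching billed codes -> case,
    no matching codes -> control, anything in between -> excluded (None)."""
    hits = sum(1 for code in billing_codes if code in phenotype_codes)
    if hits >= min_case_codes:
        return "case"
    if hits == 0:
        return "control"
    return None


def phewas(patients: Dict[str, Tuple[List[str], bool]],
           phenotypes: Dict[str, Set[str]] = PHENOTYPES) -> Dict[str, dict]:
    """Test every phenotype against one variant with a 2x2 table.

    patients maps a patient ID to (billed ICD-9 codes, carries_variant).
    """
    results = {}
    for name, codes in phenotypes.items():
        counts: Counter = Counter()
        for billing_codes, carrier in patients.values():
            status = classify(billing_codes, codes)
            if status is not None:
                counts[(status, carrier)] += 1
        a, b = counts[("case", True)], counts[("case", False)]
        c, d = counts[("control", True)], counts[("control", False)]
        # Odds ratio (a*d)/(b*c); left undefined when a denominator cell is empty.
        odds_ratio = (a * d) / (b * c) if b and c else None
        results[name] = {"counts": (a, b, c, d), "odds_ratio": odds_ratio}
    return results
```

Because every phenotype is scored against the same variant in one loop, adding a phenotype means adding one entry to the code map, which is the "many phenotypes simultaneously" property the answer describes.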

Table 1

*Table from http://www.genome.gov/27540473

HMO Research Network (HMORN)
Umbrella network for: Cancer Research Network (CRN), Cardiovascular Research Network (CVRN), Diabetes Research Network, Accelerating Change and Transformation in Organizations and Networks (ACTION II), Developing Evidence to Improve Decisions about Effectiveness (DEcIDE) Network, Medical Exposure in Pregnancy Risk Evaluation Program (MEPREP), Mental Health Research Network (MHRN), Mini-Sentinel, and Multi-Institutional Consortium for Comparative Effectiveness Research in Prevention and Treatment of Diabetes Mellitus (SUPREME-DM).
Criteria Answers
1.a. How many people does the network cover or involve? 18,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: In two years, HMORN increased from 11 to 19 sites, and from 10 million to 18 million covered lives.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? Transforming Primary Care Study: Evaluating the Spread of Group Health's Medical Home (PI: Robert J. Reid; 7,018 followed for two years). An evaluation of the effects of the patient-centered medical home model of primary care on patients' experiences, quality, clinician burnout, and total costs; results showed improvements in patients' experiences, quality, and clinician burnout, with estimated total savings of $10.30 per patient per month. Citations: Reid, Fishman et al. 2009; Reid, Coleman et al. 2010
1.b.i.1. Demographics, racial/ethnic: See Table 1
1.b.i.2. Demographics, geography: 19 research centers: Denver-Boulder-Colorado Springs; Atlanta; Central Texas; Hawaii; Northwest Oregon-Southwest Washington; Sacramento-San Francisco Bay Area; New Mexico; Washington-Northern Idaho; Wisconsin; Northeast and Central Pennsylvania; Southeast Michigan; Minneapolis-St. Paul; Massachusetts-New Hampshire-Maine; Massachusetts; Los Angeles County-Orange County-San Diego County; Wisconsin-Minnesota-North Dakota-Idaho; Tel Aviv (Israel); Maryland-Virginia-District of Columbia; Northern California
1.b.i.3. Demographics, age: See Table 1
1.b.i.4. Demographics, gender: See Table 1
1.c.i. What is the total annual budget? See Table 2
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? See Table 2
1.c.i.2. How much of that budget is dedicated to conducting studies? See Table 2
1.c.ii. What are the current sources of funding? NIH, CDC, AHRQ, Community Benefit Funds
1.c.iii. How much does it cost each year to maintain and update the network? Included in the amount of annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed? 18
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? No
1.e.i.1. What does the network focus on? Not applicable
1.f. (Y/N) Does the network use informed consent forms? Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Specific
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) EHR (Epic, Care Plus, Allscripts, Cattails MD, ICT, Next Gen)

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Data Use Agreements
1.g.iii.1.b. Policies for sharing data outside the network: Data are shared on a case-by-case basis. Typically, the outside researcher needs to collaborate with a researcher who is part of the network.
1.g.iii.1.c. Policies for protecting proprietary data: Patient data are de-identified.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals:
1) PS2-51: Utilization Quality Assurance: Are We Better Yet? Donald Bachman, Terry Field, Christine Bredfeldt, Mark Hornbrook, Alan Bauck, Heather Tavel, Lucas Ovans, Debbie Godwin, and Dean Kjar. Clinical Medicine & Research. August 1, 2012; vol. 10, no. 3: 195-196.
2) PS2-58: A Survey of HMORN VDW Tumor Data Sources. Rick Krajenta, Dustin Key, and Amy Butani. Clinical Medicine & Research. August 1, 2012; vol. 10, no. 3: 197.
3) PS2-61: Establishment of a Cohort of Women to Study the Effect of Cervical Procedures on Reproductive Health Outcomes. Erin Masterson, Sheila Weinmann, Allison Naleway, Meredith Vandermeer, Tracy Dodge, Bhakti Arondekar, Jovelle Fernandez, Shanthy Krishnarajah, Geeta Swamy, and Evan Myers. Clinical Medicine & Research. August 1, 2012; vol. 10, no. 3: 180.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? Adult Changes in Thought (ACT) Study, Eric B. Larson. About 2,000 participants at any one time (new participants are enrolled as others die), followed since 1994; total enrollment to date exceeds 4,000, including more than 400 research-quality autopsy specimens. An ongoing longitudinal study following adults over age 65 to identify risk factors for cognitive decline with aging and related conditions, such as Alzheimer's disease. Citations: Gray, Anderson et al. 2008; Breitner, Haneuse et al. 2009; Ehlenbach, Hough et al. 2010; Gray, Walker et al. 2011; Trittschuh, Crane et al. 2011
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) Access to EHRs and to administrative, laboratory, and pharmacy data
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Yes
2.d.i.1. What is the evidence? Effect of Massage on Chronic Low Back Pain, Daniel C. Cherkin. 400 patients followed for one year; the first study to compare structural and relaxation (Swedish) massage for chronic low back pain. The randomized controlled trial found that both types of massage worked well, with few side effects. Citations: Cherkin, Sherman et al. 2009; Cherkin, Sherman et al. 2011
3.a. (Y/N) Does the network have biobanks? Yes
3.b. What types of biospecimens are collected? Blood, serum, and DNA samples
3.c. What types of analysis are done on them? DNA sequence analysis to identify genetic variants
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? Yes
3.d.i. What types of analyses do they conduct on them? DNA sequence analysis to identify genetic variants; three biomarkers were examined as potential predictors of future diagnosis
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Yes
4.a. What type of security technology does the network use? Data are stored locally; computerized datasets are stored behind separate security firewalls.
4.b.i. (Y/N) Are queries distributed via a central hub? No

4.b.ii. What is the architecture of the query distribution? For multi-site studies that use data from the standardized Virtual Data Warehouse, efficiencies are achieved by sharing data extraction code that has been written and validated at a single site, then deployed at other sites to be run against local Virtual Data Warehouse files. Data management staff at all sites work closely with site investigators to refine data queries and prepare analytic data sets.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED, etc.)? Yes

4.c.ii. Which terminologies? NDC, ICD-9/10, CPT-4, DRG, ISO, HCPCS, LOINC
4.d.i. (Y/N) Does the network use a common data model (CDM)? Yes

4.d.ii. Which CDM is used? Virtual Data Warehouse
4.d.iii. How are the data transformed and mapped? Data are stored locally and are mapped when data extraction code is sent to the local sites.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Yes

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). EHR/EMR records, health plan claims, medical charts, lab data, clinical registries, biospecimen resources, patient-reported outcomes, data on health care cost, utilization, and benefit designs, pharmacy data, survey data, clinical trials data, cancer registries, Medicare/Medicaid, vital records

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Machine learning, logistic regression, support vector machines
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Based on the researcher's query
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Virtual Data Warehouse SAS macros

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) Yes

4.j.ii. What informatics tools are used? Not applicable
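The Virtual Data Warehouse pattern in 4.b.ii (extraction code written and validated once, then run by each site against its own local files) can be sketched in Python as below. HMORN sites actually share SAS programs; the table and field names (`diagnosis`, `person_id`, `dx`) and both functions are hypothetical. The sketch returns only per-site counts, in line with the answer to 4.h.i that data are aggregated before leaving the local site.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical local table layout: {"diagnosis": [{"person_id": ..., "dx": ...}, ...]}
LocalVDW = Dict[str, List[dict]]


def extraction_program(local_vdw: LocalVDW) -> Counter:
    """Shared extraction code (hypothetical): distinct persons per diagnosis code."""
    seen = set()
    counts: Counter = Counter()
    for row in local_vdw["diagnosis"]:
        key = (row["person_id"], row["dx"])
        if key not in seen:  # count each person at most once per code
            seen.add(key)
            counts[row["dx"]] += 1
    return counts


def run_distributed(program: Callable[[LocalVDW], Counter],
                    sites: Dict[str, LocalVDW]) -> Tuple[Dict[str, Counter], Counter]:
    """Run the same program at every site and pool the returned aggregates.

    In practice each call would execute at a different site; here all sites are
    simulated in one process, and only the per-site Counters are combined.
    """
    per_site = {name: program(vdw) for name, vdw in sites.items()}
    pooled: Counter = Counter()
    for counts in per_site.values():
        pooled.update(counts)
    return per_site, pooled
```

The design point is that only the program travels between sites; the local tables never leave, and a bug fixed once in the shared program is fixed everywhere.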


Table 1. Population Characteristics (last updates range, 2010-2012)

Health plans (columns, in order; NA = not available): EH FCHP GH GHS HFHS HP HPHC KPCO KPGA KPHI KPMA KPNC KPNW KPSC LCF MC MHS PAMF S&W
Age
% ≤ 17 yrs: 25 19 16 19 16 25 22 21 22 20 20 22 22 25 38 21 41 21 22
% 18–44 yrs: 33 31 40 27 39 39 34 37 35 36 35 34 36 26 32 32 39 33 60
% 45–64 yrs: 29 35 26 34 29 34 30 32 30 32 29 31 28 21 26 19 27 28
% 65+ yrs: 15 19 18 16 22 5 4 15 9 15 12 13 14 11 15 20 5 12 17
Race (note 8)
% American Indian/Alaska Native: 4 <1 1 0 1 1 0 1 0 1 <1 <1 1 <1 NA <1 0 <1 0
% Asian: <1 3 4 0 3 5 5 3 2 38 9 17 5 NA <1 0 32 1 10
% Native Hawaiian or Other Pacific Islander: <1 0 0 1 0 0 0 0 0 33 NA 4 0 NA <1 0 NA 0
% Black or African American: <1 2 2 1 38 10 12 4 18 1 37 8 3 10 NA <1 0 2 6
% White: 97 87 33 98 52 59 83 57 18 27 42 51 87 37 NA 68 95 51 45
% Other or unknown: <1 0 60 0 0 25 0 36 62 0 2 0 5 3 100 30 5 <1 45
Ethnicity
% ethnicity known: 54, 50, and 68 for three plans; not specified for the other 16
% known Hispanic or Latino ethnicity: - 8 2 1 1 2 4 10 2 4 19 5 41 40 2 0 14 7
Member Retention
% enrolled at 1 yr: n/a 95 82 82 99 88 85 91 82 85 67 87 83 90 80 88 99 90 84
% enrolled at 3 yrs: n/a 92 63 64 86 70 54 66 57 72 39 75 67 76 51 82 98 79 65
% enrolled at 5 yrs: n/a 92 52 47 63 55 45 54 43 63 27 66 59 66 40 70 98 68 53

8 May exceed 100% if multiple responses were allowed at collection; 'other' may include persons reporting multiple races.

Health Plan Acronyms: EH = Essentia Health; FCHP = Fallon Community Health Plan; GHC = Group Health; GHS = Geisinger Health System; HFHS = Henry Ford Health System; HP = HealthPartners; HPHC = Harvard Pilgrim Health Care; KPCO = Kaiser Permanente Colorado; KPGA = Kaiser Permanente Georgia; KPHI = Kaiser Permanente Hawaii; KPMA = Kaiser Permanente Mid-Atlantic; KPNC = Kaiser Permanente Northern California; KPNW = Kaiser Permanente Northwest; KPSC = Kaiser Permanente Southern California; LCF = Lovelace Clinic Foundation; MCRF = Marshfield Clinic; MHS = Maccabi Healthcare Services; PAMF = Palo Alto Medical Foundation; S&W = Scott & White Healthcare


Table 2
Sites (columns, in order): GHRI GHS HFHS HPRF HPHC KPCO KPGA KPHI KPNC KPNW LCF MCRF MPCI S&W
Began: 1983 2003 1983 1989 1969 1987 1998 1991 1961 1964 1990 1959 1996 1985
Capability rows (marked by site with check marks): Research clinic; Survey department; Facility that can do research lab tests; Facility that can fill research prescriptions
2010 funding, all sources ($ millions)§§: 43.3 10.5 52.4 17.0 32.1 16.6 3.3 4.4 94.4 35.3 5.7 31.9 4.3 13.1
2010 federal funding, %: 82 16 50 64 84 54 44 62 69 76 91 32 72 22
PI FTE: 32 10 82 23 36 10 6 5 48 31 7 31 26 23
Investigator-initiated clinical trials (avg/year): 1-10 >10 1-10 1-10 0 0 1-10 0 >10 >10 0 1-10 1-10 >10
Total clinical trials (avg/year): <50 50+ 50+ <50 <50 50+ <50 <50 50+ 50+ 0 50+ <50 50+

§§ Revenue/expense.

Hospital Medicine Reengineering Network (HOMERuN)

Criteria Answers
1.a. How many people does the network cover or involve? 50,000 patients per year; 1,200 patients in the Transitions of Care project
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: Although HOMERuN is currently using its data for one project, it receives data for all patients admitted to network hospitals.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? Transitions of Care project data are being used locally, and sites have discussed increasing post-discharge acute clinic care follow-ups, as well as issues surrounding decreased access to pre- and post-acute care.
1.b.i.1. Demographics, racial/ethnic: Not available
1.b.i.2. Demographics, geography: Not available
1.b.i.3. Demographics, age: Not available
1.b.i.4. Demographics, gender: Not available
1.c.i. What is the total annual budget? $1,600,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? $800,000
1.c.i.2. How much of that budget is dedicated to conducting studies? $800,000
1.c.ii. What are the current sources of funding? Association of American Medical Colleges
1.c.iii. How much does it cost each year to maintain and update the network? Included in the amount of annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed? 4
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Discharge care coordination and a readmission review at all 13 sites to determine preventability
1.f. (Y/N) Does the network use informed consent forms? Yes; the UCSF IRB approves informed consent documentation.
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Collected in Clinical Trials

1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Data Use Agreements, Business Associate Agreements
1.g.iii.1.b. Policies for sharing data outside the network: This network does not share data, and does not plan to share data, outside the network.
1.g.iii.1.c. Policies for protecting proprietary data: All proprietary data are the property of UCSF.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals: Publications under review:
1) Preventability of Readmissions in a National Sample of Patients: Preliminary Results from the Hospital Medicine Reengineering Network (HOMERuN). Auerbach AD, Mourad M, Maselli J, Sehgal N, Lindenauer PK, Kim C, Robinson E, Ruhnke G, Metlay J, Herzig S, Vasilevskis E, Kripalani S, Williams M, Fletcher G, Critchfield J, Schnipper J.
2) Primary Care Physician and Hospitalist Perceptions of Causes of Readmissions: Preliminary Results from the Hospital Medicine Reengineering Network (HOMERuN). Auerbach AD, Mourad M, Maselli J, Sehgal N, Lindenauer PK, Kim C, Robinson E, Ruhnke G, Metlay J, Herzig S, Vasilevskis E, Kripalani S, Williams M, Fletcher G, Critchfield J, Schnipper J.
3) The Hospital Medicine Reengineering Network (HOMERuN): A Learning Organization Focused on Improving Hospital Care. Auerbach AD, Patel MS, Metlay JP, Schnipper JL, Williams MV, Robinson EJ, Kripalani S, Lindenauer PK.

2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? No
2.b.i. What is the evidence? Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) No standardization has been needed thus far.
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) Healthcare organizations allow researchers to include their patients in research and to interact with patients for purposes of the clinical trial while the patient is in the hospital.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? No
2.d.i.1. What is the evidence? Not applicable
3.a. (Y/N) Does the network have biobanks? No
3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? No
3.d.i. What types of analyses do they conduct on them? Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not applicable
4.a. What type of security technology does the network use? All data are stored using the security technology of the UCSF secure data center.
4.b.i. (Y/N) Are queries distributed via a central hub? Yes
4.b.ii. What is the architecture of the query distribution? The query is submitted to the principal investigator, who compiles the data set and sends it back to the researcher as an Excel, CSV, SAS, or Stata dataset.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED, etc.)? Not for the current project, but the network is moving toward using ICD-10.

4.c.ii. Which terminologies? Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? No
4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? No

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). EHR data (for initial review only), patient chart review, surveys of physicians, patient interviews

4.g.i. (Y/N) Does the network use natural language processing? No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Data are transformed based on the researcher's data needs.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? SAS scripts

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) No

4.j.ii. What informatics tools are used? Not applicable

Mini-Sentinel

Criteria Answers
1.a. How many people does the network cover or involve? 126,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: Conducts studies involving drug, vaccine, and medical device safety
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? No
1.a.iii.1. What is the evidence? Not applicable
1.b.i.1. Demographics, racial/ethnic: See Table 1
1.b.i.2. Demographics, geography: Not available
1.b.i.3. Demographics, age: See Table 1
1.b.i.4. Demographics, gender: See Table 1
1.c.i. What is the total annual budget? $14,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? Confidential
1.c.ii. What are the current sources of funding? FDA
1.c.iii. How much does it cost each year to maintain and update the network? Confidential
1.d. How many years has this network existed? 4
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Mainly drugs, vaccines, other biologics (such as blood products), and medical devices
1.f. (Y/N) Does the network use informed consent forms? No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) EHR, claims data
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Data Partners may use their own original source data, transformed into Mini-Sentinel Common Data Model format, for other purposes, such as research, as long as they comply with applicable state and federal laws and regulations, including HIPAA and the Common Rule.

1.g.iii.1.b. Policies for sharing data outside the network: Data Use Agreements are not required for Mini-Sentinel activities. However, Collaborators and the Mini-Sentinel Coordinating Center, including all its components, may use data obtained from sources other than their own institution (referred to as "outside source data") only in the conduct of Mini-Sentinel activities for Mini-Sentinel's public health purposes. Such data may not be reused, re-disclosed, altered, or sold for any purposes other than those defined in the base contracts and subsequent task order contracts.
1.g.iii.1.c. Policies for protecting proprietary data: Direct patient identifiers may be used by Data Partners when necessary to gather additional clinical and demographic information or to link their data to data from other sources, as required by specific projects. Direct patient identifiers are stripped before data are shared with the Operations Center.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals:
1) Nguyen, M., Ball, R., Midthun, K., & Lieu, T. A. (2012). The Food and Drug Administration's Post-Licensure Rapid Immunization Safety Monitoring program: strengthening the federal vaccine safety enterprise. Pharmacoepidemiology and Drug Safety, 21, 291-297.
2) Fireman, B., Toh, S., Butler, M. G., Go, A. S., Joffe, H. V., Graham, D. J., ... & Selby, J. V. (2012). A protocol for active surveillance of acute myocardial infarction in association with the use of a new antidiabetic pharmaceutical agent. Pharmacoepidemiology and Drug Safety, 21, 282-290.
3) Lopez, M. H., Holve, E., Sarkar, I. N., & Segal, C. (2012). Building the Informatics Infrastructure for Comparative Effectiveness Research (CER): A Review of the Literature. Medical Care, 50, S38-S48.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? Validation of Severe Liver Injury Cases
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) By providing EHR data and DUAs
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? No
2.d.i.1. What is the evidence? Not applicable
3.a. (Y/N) Does the network have biobanks? No
3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? Yes
3.d.i. What types of analyses do they conduct on them? Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not available
4.a. What type of security technology does the network use? All current implementations using PopMedNet are NIST 800-53 Rev. 2 / FISMA compliant and have successfully passed a full audit of the hosting facility, application, and operations procedures. The Application Portal is hosted in a two-server configuration. One server (the Portal Web server) runs the application and services all application requests that come in via the Web; it runs the Portal application under IIS and ASP.NET. The second server (the Portal Database server) houses the Portal Database in an MS SQL Server 2008 instance. There is no connection from the Portal Database server to the Web; all requests are made via the Portal Web server.
4.b.i. (Y/N) Are queries distributed via a central hub? Yes
4.b.ii. What is the architecture of the query distribution? The Data Mart Client polls the portal for queries awaiting execution, downloads each query, executes it, and manages the workflows associated with query execution (Administrator inbox, notification, workflow processing, etc.). The Data Mart executes the query directly via an ODBC connection; it is not passed off to another service. Queries can be reviewed before local execution, and results reviewed before release. The system does not require an open port and is not designed to be fully synchronous, although all query fulfillment steps can be automated.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED, etc.)? Yes

4.c.ii. Which terminologies? ICD-9/10/11, NDC, LOINC, SNOMED-CT, CPT-4, HCPCS, HCPCS Level III, CPT Cat II, CPT Cat III

4.d.i. (Y/N) Does the network use a common data model (CDM)? Yes

4.d.ii. Which CDM is used? Mini-Sentinel
4.d.iii. How are the data transformed and mapped? Mini-Sentinel uses SAS macro toolkits to extract data from the EHR/EMR at the local site.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Yes

4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?): Each query allows the requester to describe the nature of the query. System metadata include the requester's name and contact information, his/her role in the system, the query description, and which other sites also received the query. The Data Mart Administrator can see the query parameters and results before they are uploaded to the portal.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.): Enrollment, Demographic, Medication, Encounter, Diagnosis, Procedures, Labs, Vitals

4.g.i. (Y/N) Does the network use natural language processing? No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Data are aggregated after they leave the local site, depending on the permissions of the user: some users may view only aggregated results, while others may view site-specific results.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? SAS toolkits are available for users of the Mini-Sentinel network.

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?): Yes

4.j.ii. What informatics tools are used? ETL tools are used to load the data
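The pull-based query distribution described for PopMedNet under 4.b.i and 4.b.ii can be illustrated with a minimal Python sketch. In that design, the Data Mart Client polls the portal and downloads queued queries. The Data Mart Administrator reviews each query before local execution and each result before release, and no open inbound port is needed. The sketch is not PopMedNet code:

- the `Portal` and `Query` classes, `poll_once`, and the approval callbacks are hypothetical;
- an in-memory SQLite database stands in for the site data mart, which the report says is queried over ODBC;
- the `encounter` table in the demo is invented for illustration and is not the Mini-Sentinel CDM.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Query:
    query_id: int
    sql: str          # hypothetical: real networks distribute richer query specifications
    description: str  # the requester's description of the query


@dataclass
class Portal:
    """In-memory stand-in for the central portal (hub); hypothetical API."""
    pending: List[Query] = field(default_factory=list)
    uploaded: Dict[int, list] = field(default_factory=dict)

    def fetch_pending(self) -> List[Query]:
        # The site pulls queued work; the hub never opens a connection to the site.
        queries, self.pending = self.pending, []
        return queries

    def upload_result(self, query_id: int, rows: list) -> None:
        self.uploaded[query_id] = rows


def poll_once(
    portal: Portal,
    mart: sqlite3.Connection,
    approve_query: Callable[[Query], bool],
    approve_result: Callable[[Query, list], bool],
) -> None:
    """Run one polling cycle of a hypothetical Data Mart Client."""
    for query in portal.fetch_pending():
        # Administrator review before local execution (rejected queries are simply dropped here).
        if not approve_query(query):
            continue
        # Execute directly against the local data mart (SQLite here, ODBC in the report).
        rows = mart.execute(query.sql).fetchall()
        # Administrator review before results are released to the portal.
        if approve_result(query, rows):
            portal.upload_result(query.query_id, rows)


if __name__ == "__main__":
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE encounter (patid INTEGER, enc_type TEXT)")
    mart.executemany(
        "INSERT INTO encounter VALUES (?, ?)", [(1, "IP"), (2, "AV"), (3, "AV")]
    )
    portal = Portal(pending=[Query(
        1,
        "SELECT enc_type, COUNT(*) FROM encounter GROUP BY enc_type ORDER BY enc_type",
        "Encounter counts by type",
    )])
    # Auto-approving both review steps mirrors a fully automated configuration.
    poll_once(portal, mart, lambda q: True, lambda q, rows: True)
    print(portal.uploaded)  # {1: [('AV', 2), ('IP', 1)]}
```

Because the site initiates every exchange, the hub never connects into the site, which matches the report's statement that no open port is required. Approval callbacks that always return True correspond to the fully automated mode the report mentions. A production client would also run `poll_once` on a schedule and record rejected queries rather than silently dropping them.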

Table 1. Snapshot of the Mini-Sentinel Distributed Database Demographic Table in Extract 1 (Unique Individuals = 83,003,100)

*Table from http://www.mini-sentinel.org/work_products/Data_Activities/Mini-Sentinel_Year-1-Data-Quality-and-Characterization-Procedures-and-Findings-Report.pdf, page 41
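Answer 4.h.ii in the Mini-Sentinel table above describes when and for whom results are aggregated. Results are aggregated after they leave the local site, and what a user sees depends on that user's permissions: some users see only aggregated results, while others see site-specific results. A minimal sketch of that rule follows, assuming results arrive as per-site count tables. The role names `aggregate_only` and `site_detail` and the data layout are hypothetical and are not taken from Mini-Sentinel or PopMedNet.

```python
from collections import Counter
from typing import Dict

# Hypothetical layout: site name -> {result category: count}
SiteResults = Dict[str, Dict[str, int]]


def view_results(results: SiteResults, role: str) -> SiteResults:
    """Return the view of one query's results that a user with `role` may see."""
    if role == "site_detail":
        # Permitted users see each responding site's counts separately.
        return {site: dict(counts) for site, counts in results.items()}
    if role == "aggregate_only":
        # Other users see only counts summed across all responding sites.
        total: Counter = Counter()
        for counts in results.values():
            total.update(counts)
        return {"ALL_SITES": dict(total)}
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    results = {"site_a": {"F": 3, "M": 2}, "site_b": {"F": 1}}
    print(view_results(results, "aggregate_only"))  # {'ALL_SITES': {'F': 4, 'M': 2}}
```

Keeping the per-site results intact and deriving the aggregate view on request reflects the report's point that the data are not pre-aggregated at the site. The sketch does not model how permissions are assigned.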

The National Dental Practice-Based Research Network (NDPBRN)

Criteria Answers
1.a. How many people does the network cover or involve? 40,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: The network supports four main types of studies: retrospective studies using dental records, observational studies of routine care activities, case-control studies, and clinical trials comparing alternative treatment strategies.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? The network allows participating dentists to compare their results with the aggregated results of other practices, and it tracks whether member practices have implemented the approaches disseminated by research. 1) Riley JL III, Gordan VV, Rindal DB, Fellows JL, Qvist V, Sager P, Foy P, Williams OD, Gilbert GH for The National Dental PBRN Collaborative Group. Components of patient satisfaction with a dental restorative visit: results from The Dental Practice-Based Research Network. Journal of the American Dental Association 2012; 143(9):1002-1010.
1.b.i.1. Demographics (racial/ethnic): Not available
1.b.i.2. Demographics (geography): United States: Alabama, California, Colorado, Delaware, District of Columbia, Florida, Georgia, Illinois, Kentucky, Louisiana, Maine, Massachusetts, Michigan, Minnesota, Mississippi, New Jersey, New Mexico, North Carolina, Ohio, Oregon, Pennsylvania, South Carolina, Tennessee, Texas, Washington, Wisconsin; divided into 6 regional nodes (the Western Region, the Midwest Region, the Northeast Region, the Southwest Region, the South Central Region, and the South Atlantic Region)
1.b.i.3. Demographics (age): Not available
1.b.i.4. Demographics (gender): Not available
1.c.i. What is the total annual budget? $66,800,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? Not available
1.c.ii. What are the current sources of funding? National Institute of Dental and Craniofacial Research (NIDCR)
1.c.iii. How much does it cost each year to maintain and update the network? Not available
1.d. How many years has this network existed? 1
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Dental practice-based research
1.f. (Y/N) Does the network use informed consent forms? Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life): Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data): EHR

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life): Collected in clinical trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: A researcher from the network submits a protocol concept to the network. Once a protocol concept has been approved, a study team is formed. This team administers the study through protocol development, feasibility and pilot testing, data collection, data analysis, and study closure.
1.g.iii.1.b. Policies for sharing data outside the network: Data are not shared outside the network.
1.g.iii.1.c. Policies for protecting proprietary data: Practitioners sign a confidentiality agreement and receive Human Subjects Training and HIPAA training.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals:
1) Houston TK, DeLaughter KL, Ray MN, Gilbert GH, Allison JJ, Kiefe CI, Volkman JE for the National Dental PBRN Collaborative Group. Impact of a web-assisted tobacco quality improvement intervention of subsequent smoker behavior: a National Dental PBRN study. BMC Oral Health 2013; accepted for publication.
2) Blue CM, Funkhouser DE, Riggs S, Rindal DB, Worley D, Pihlstrom DJ, Gilbert GH for the National DPBRN Collaborative Group. Utilization of non-dentist providers and attitudes toward new provider models: findings from the National Dental Practice-Based Research Network. Journal of Public Health Dentistry 2013; accepted for publication.
3) Ray MN, Allison JJ, Coley HL, Williams JH, Kohler C, Gilbert GH, Richman JS, Kiefe CI, Sadasivam RS, Houston TK for the National DPBRN Collaborative Group. Variations in tobacco control in National Dental PBRN practices: the role of patient and practice factors. Special Care in Dentistry 2013; accepted for publication.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? Houston TK, Coley HL, Sadasivam RS, Ray MN, Williams JH, Allison JJ, Gilbert GH, Kiefe CI, Kohler C for The DPBRN Collaborative Group. Impact of content-specific email reminders on provider participation in an online intervention: a Dental PBRN study. Studies in Health Technology and Informatics 2010; 160: 801-805.

2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?): Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.): Dentists have three options for participation once they have joined the network: 1. informational (receive newsletters and correspondence only); 2. limited (also participate in questionnaires); or 3. full (also participate in in-office clinical studies).
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Yes
2.d.i.1. What is the evidence? Houston TK, Richman JS, Ray MN, Allison JJ, Gilbert GH, Shewchuk RM, Kohler CL, Kiefe CI, for The DPBRN Collaborative Group. Internet-delivered support for tobacco control in dental practice: randomized controlled trial in The Dental PBRN. Journal of Medical Internet Research 2008; 10(5): e38.
3.a. (Y/N) Does the network have biobanks? No
3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? No
3.d.i. What types of analyses do they conduct on them? Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not applicable
4.a. What type of security technology does the network use? Not available
4.b.i. (Y/N) Are queries distributed via a central hub? Not available
4.b.ii. What is the architecture of the query distribution? Not available

4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? Not available

4.c.ii. Which terminologies? Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? Not available

4.d.ii. Which CDM is used? Not available
4.d.iii. How are the data transformed and mapped? Not available
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Not available

4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?): Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.): Demographics, treatments, procedures, medications, and surveys collected in clinical trials

4.g.i. (Y/N) Does the network use natural language processing? Not available

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not available
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? Not available
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Not available
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? SAS scripts and SPSS code

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?): Not available

4.j.ii. What informatics tools are used? Not available
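The NDPBRN answer to 1.g.iii.1.a describes a fixed study lifecycle. A network researcher submits a protocol concept, and a study team is formed once the concept is approved. The team then carries the study through protocol development, feasibility and pilot testing, data collection, data analysis, and study closure.

The hypothetical Python sketch below models that sequence as a simple state machine. The stage names paraphrase the report. The strictly linear ordering (no skipped or repeated stages) is an assumption, since the report does not say how the network handles exceptions.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    CONCEPT_SUBMITTED = 1
    CONCEPT_APPROVED = 2  # approval is the point at which a study team is formed
    PROTOCOL_DEVELOPMENT = 3
    FEASIBILITY_AND_PILOT = 4
    DATA_COLLECTION = 5
    DATA_ANALYSIS = 6
    STUDY_CLOSURE = 7


class Study:
    """Hypothetical model of one network study moving through its lifecycle."""

    def __init__(self, concept_title: str) -> None:
        self.concept_title = concept_title
        self.stage = Stage.CONCEPT_SUBMITTED
        self.team: List[str] = []

    def approve(self, team: List[str]) -> None:
        """Approve the submitted concept and record the study team."""
        if self.stage is not Stage.CONCEPT_SUBMITTED:
            raise ValueError("only a submitted concept can be approved")
        if not team:
            raise ValueError("an approved concept needs a study team")
        self.team = list(team)
        self.stage = Stage.CONCEPT_APPROVED

    def advance(self) -> Stage:
        """Move to the next stage; the ordering is assumed to be strictly linear."""
        if self.stage is Stage.CONCEPT_SUBMITTED:
            raise ValueError("the concept must be approved first")
        if self.stage is Stage.STUDY_CLOSURE:
            raise ValueError("the study is already closed")
        self.stage = Stage(self.stage.value + 1)
        return self.stage


if __name__ == "__main__":
    study = Study("Example protocol concept")
    study.approve(["principal investigator", "network coordinator"])  # hypothetical team
    while study.stage is not Stage.STUDY_CLOSURE:
        print(study.advance().name)
```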

Pediatric Emergency Care Applied Research Network (PECARN)

Criteria Answers
1.a. How many people does the network cover or involve? 1,200,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: The network collaborates with 18 hospital emergency departments, where children are treated for acute illnesses and injuries across a wide spectrum of conditions, from the most common to the very rare.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? Stanley R, Lillis K, Zuspan SJ, Lichenstein R, Ruddy RM, Gerardi MJ, Dean JAM, and the Pediatric Emergency Care Applied Research Network. Development and implementation of a performance measure tool in an academic pediatric research network. Controlled Clinical Trials 2010
1.b.i.1. Demographics (racial/ethnic): White (Non-Hispanic): 33%; Hispanic: 21%; Black or African American: 38%; Asian: 1%; American Indian or Alaskan Native: 0%; Native Hawaiian or Other Pacific Islander: 0%; Other: 4%; Multiple Races: 1%
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): 0: 17%; 1: 13%; 2: 10%; 3: 7%; 4: 6%; 5: 5%; 6: 4%; 7: 4%; 8: 4%; 9: 3%; 10: 3%; 11: 3%; 12: 3%; 13: 3%; 14: 3%; 15: 3%; 16: 3%; 17: 3%; 18: 1%
1.b.i.4. Demographics (gender): Male: 53%; Female: 47%
1.c.i. What is the total annual budget? $5,280,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Percentage of the total annual budget
1.c.i.2. How much of that budget is dedicated to conducting studies? $5,000,000
1.c.ii. What are the current sources of funding? HRSA/MCHB/EMSC funds the infrastructure. External grants are funded by NICHD, NHLBI, CDC, NIH-Eunice Kennedy Shriver National Institute of Child Health & Human Development, AHRQ, NIAAA, and HRSA/MCHB/EMSC.

1.c.iii. How much does it cost each year to maintain and update the network? Percentage of the total annual budget
1.d. How many years has this network existed? 11
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Pediatric emergency care
1.f. (Y/N) Does the network use informed consent forms? Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Specific. Patients consent to specific use of their data based on the IRB protocol submitted by the researcher; consent forms change from study to study.

1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Specific. Patients consent to specific use of their data based on the IRB protocol submitted by the researcher; consent forms change from study to study.
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life): Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data): EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life): Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Investigators request the use of a specific dataset by submitting a formal request that includes a research plan describing the proposed research, a signed Research Data Use Agreement (RDUA), and either approval from the researcher's IRB for use of the dataset or documentation that use of public data sets is exempt from IRB review under institutional policy. The data coordinating center disseminates the dataset after receiving these items.
1.g.iii.1.b. Policies for sharing data outside the network: Investigators request the use of a specific dataset by submitting a formal request that includes a research plan describing the proposed research, a signed Research Data Use Agreement (RDUA), and either approval from the researcher's IRB for use of the dataset or documentation that use of public data sets is exempt from IRB review under institutional policy. The data coordinating center disseminates the dataset after receiving these items.
1.g.iii.1.c. Policies for protecting proprietary data: Data collected in this project do not include names, but they do include enough identifying information (such as date of birth, gender, and zip code) that project investigators must protect their confidentiality in accordance with privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA).
2.a. Three most recent (or high impact) studies published in peer-reviewed journals:
1) Holmes JF, Lillis K, Monroe D, Borgialli D, Kerrey BT, Mahajan P, Adelgais K, Ellison AM, Yen K, Atabaki S, Menaker J, Bonsu B, Quayle KS, Garcia M, Rogers A, Blumberg S, Lee L, Tunik M, Kooistra J, Kwok M, Cook LJ, Dean JM, Sokolove PE, Wisner DH, Ehrlich P, Cooper A, Dayan PS, Wootton-Gorges S, Kuppermann N, Pediatric Emergency Care Applied Research Network (PECARN). Identifying Children at Very Low Risk of Clinically Important Blunt Abdominal Injuries. Annals of Emergency Medicine, available online 1 Feb 2013, ISSN 0196-0644, 10.1016/j.annemergmed.2012.11.009.
2) Pemberton VL, Browning B, Webster A, Dean JM, Moler FW. Therapeutic hypothermia after pediatric cardiac arrest trials: the vanguard phase experience and implications for other trials. Pediatr Crit Care Med. 2013 Jan;14(1):19-26.

3) Shaw KN, Lillis KA, Ruddy RM, Mahajan PV, Lichenstein R, Olsen CS, Chamberlain JM. Reported Medication Events in a Paediatric Emergency Research Network: Sharing to Improve Patient Safety. Emerg Med J. 2012
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? Holmes JF, Borgialli DA, Nadel FM, Quayle KS, Schamban N, Cooper A, Schunk JE, Miskin ML, Atabaki SM, Hoyle JD, Dayan PS, Kuppermann N, and the TBI Study Group for the PECARN. Do children with blunt head trauma and normal cranial CT scans require hospitalization for neurological observation? Ann Emerg Med 2011.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?): Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.): Giving access to EHRs and providing biospecimens

2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Yes
2.d.i.1. What is the evidence? Corneli HM, Zorc JJ, Mahajan P, Shaw KN, Holubkov R, Reeves SD, Ruddy RM, Malik B, Nelson KA, Bregstein JS, Brown KM, Denenberg MN, Lillis KA, Cimpello LB, Tsung JW, Borgialli DA, Baskin MN, Teshome G, Goldstein MA, Monroe D, Dean JM, Kuppermann N; Bronchiolitis Study Group of the Pediatric Emergency Care Applied Research Network (PECARN). A multicenter, randomized, controlled trial of dexamethasone for bronchiolitis. N Engl J Med. 2007 Jul 26;357(4):331-9.
3.a. (Y/N) Does the network have biobanks? Yes
3.b. What types of biospecimens are collected? Blood
3.c. What types of analysis are done on them? Exploring the differences in host responses to bacterial vs. non-bacterial infections in young febrile infants by quantifying changes in host gene mRNA expression (transcriptional biosignatures)
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? Yes
3.d.i. What types of analyses do they conduct on them? Exploring the differences in host responses to bacterial vs. non-bacterial infections in young febrile infants by quantifying changes in host gene mRNA expression (transcriptional biosignatures)
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? No
4.a. What type of security technology does the network use? The Data Coordinating Center (DCC) is housed in a building with 24-hour on-site security guards. The DCC coordinates network infrastructure and security with the Health Sciences Campus (HSC) information systems at the University of Utah. This provides the DCC with effective firewall hardware, automatic network intrusion detection, and the expertise of dedicated security experts working at the University. User authentication is centralized with two Windows 2003-2008 domain servers. Communication over public networks is encrypted with virtual point-to-point sessions using Secure Sockets Layer (SSL) or virtual private network (VPN) technologies, both of which provide at least 128-bit encryption. All of the DCC Web-based systems use the SSL protocol to transmit data securely over the Internet.
4.b.i. (Y/N) Are queries distributed via a central hub? No
4.b.ii. What is the architecture of the query distribution? Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? Yes

4.c.ii. Which terminologies? ICD-9/10, CPT
4.d.i. (Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? No

4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?): Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.): Site, Patient ID, Date of Birth, Gender, Race, Ethnicity, Zip Code, Triage Category, Chief Complaint, Procedure Codes, Diagnosis Codes, E-Code, Payer Type (Insurance), ED Disposition, Date/Time (Triage Date/Time and Discharge Date/Time), Mode of Arrival

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not available
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? The data are aggregated before they leave the site. The data sets are put on CD or DVD along with the Data Dictionary and then sent to the researcher.

4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? SAS or Excel

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?): No

4.j.ii. What informatics tools are used? Not applicable
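PECARN's answer to 4.h.ii says that data are aggregated before they leave the site and are shipped on CD or DVD together with a data dictionary. The report does not say which fields are used for grouping, so the sketch below is hypothetical. It picks `site`, `triage_category`, and `disposition` from the field list in 4.f and collapses visit-level records into counts, which drops patient-level fields such as the patient ID. It then emits a CSV file body and a matching data dictionary; the dictionary descriptions are illustrative, not PECARN's.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical grouping fields, chosen from the 4.f field list.
GROUP_FIELDS = ("site", "triage_category", "disposition")

# Hypothetical data dictionary shipped alongside the aggregated file.
DATA_DICTIONARY: Dict[str, str] = {
    "site": "Participating emergency department (site code)",
    "triage_category": "Triage category assigned on arrival",
    "disposition": "ED disposition",
    "visit_count": "Number of ED visits in this group",
}

Row = Tuple[str, str, str, int]


def aggregate_visits(visits: Iterable[dict]) -> List[Row]:
    """Collapse visit-level records into one count per group.

    Only the grouping fields survive; patient-level fields such as
    patient ID, date of birth, and zip code are never copied out.
    """
    counts = Counter(tuple(v[f] for f in GROUP_FIELDS) for v in visits)
    return [(*key, n) for key, n in sorted(counts.items())]


def to_csv(rows: List[Row]) -> str:
    """Render aggregated rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([*GROUP_FIELDS, "visit_count"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    visits = [
        {"patient_id": "p1", "site": "A", "triage_category": "2", "disposition": "admit"},
        {"patient_id": "p2", "site": "A", "triage_category": "2", "disposition": "admit"},
        {"patient_id": "p3", "site": "B", "triage_category": "3", "disposition": "discharge"},
    ]
    print(to_csv(aggregate_visits(visits)), end="")
    # site,triage_category,disposition,visit_count
    # A,2,admit,2
    # B,3,discharge,1
```

Writing the dictionary keys to match the CSV header keeps the shipped file and its documentation in step. How the real data sets are grouped, and whether any further disclosure controls are applied, is not stated in the report.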

Pediatric Health Information System+ (PHIS+)

Criteria Answers
1.a. How many people does the network cover or involve? 6,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: The FURTHER platform is scalable to allow the addition of new hospitals and data types. PHIS+ will augment the Children's Hospital Association's existing database, PHIS, with laboratory and radiology data for children seen in the ambulatory and inpatient departments of six large children's hospitals.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? "Merging of the National Cancer Institute–funded cooperative oncology group data with an administrative data source to develop a more effective platform for clinical trial analysis and comparative effectiveness research: a report from the Children's Oncology Group" R. Aplenc, B. T. Fisher, Y. S. Huang, Y. Li, T. A. Alonzo, R. B. Gerbing, M. Hall, D. Bertoch, R. Keren, A. E. Seif, L. Sung, P. C. Adamson, A. Gamis. Pharmacoepidemiology and Drug Safety Supplement: Methods for Developing and Analyzing Clinically Rich Data for Patient-Centered Outcomes Research, Volume 21, Issue Supplement S2, pages 37–43, May 2012
1.b.i.1. Demographics (racial/ethnic): See Table 1
1.b.i.2. Demographics (geography): Laboratory and radiology data for PHIS+ come from Children's Hospital of Philadelphia (CHOP), Cincinnati Children’s Hospital Medical Center (CCHMC), Children’s Hospital Boston (CHB), Children’s Hospital of Pittsburgh (CHP), Primary Children’s Medical Center, Intermountain Healthcare (Salt Lake City) (PCMC), and Seattle Children’s Hospital (SCH). Administrative data also come from all 43 PHIS hospitals.
1.b.i.3. Demographics (age): Most patients are ages 0-18
1.b.i.4. Demographics (gender): Females: 159,663; Males: 645,255
1.c.i. What is the total annual budget? $9,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? $600,000
1.c.i.2. How much of that budget is dedicated to conducting studies? $2,900,000
1.c.ii. What are the current sources of funding? Agency for Healthcare Research and Quality (AHRQ)
1.c.iii. How much does it cost each year to maintain and update the network? Included in the amount of the annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed? 3
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Pediatrics
1.f. (Y/N) Does the network use informed consent forms? No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life): Not applicable

1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data): EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life): Not applicable

1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Each of the hospitals that sends data to the Children's Hospital Association (CHA) has a business associate agreement with CHA. The entire patient record (with PHI) is therefore sent to CHA. When researchers pull the data, however, they sign data use agreements with CHA and receive limited data sets. These contain masked MRNs and account numbers that allow researchers to follow patients longitudinally.

(1) Business Associate Agreement (BAA) Between Hospitals and CHA: To facilitate matching of PHIS+ clinical data with corresponding administrative data shared with CHA through PHIS, hospital clinical data sent to CHA contain patient identifiers such as medical record number, hospital billing number, and date of service. To authorize the sharing of data with identifiers, a business associate agreement (BAA) was employed between each hospital and CHA. This BAA was already in place as a result of the 6 hospitals' participation in PHIS.

(2) Data Use Agreement Between CHA and University of Utah BMIC: CHA drafted a data use agreement governing the sharing of de-identified hospital clinical data with the University of Utah BMIC. Under the agreement, CHA sends de-identified clinical data (as limited data sets) to BMIC, which uses the data to test and refine its mapping software. BMIC then sends the mapped results back to CHA. The only personal identifiers contained in the limited data sets are dates of service. This data use agreement is needed until CHA assumes responsibility for mapping the clinical data sent from the hospitals.

(3) Data Use Agreement Between CHA and Participating Hospitals: After PHIS+ is established, hospitals that want to receive limited data sets for research will sign a separate DUA for these data. CHA drafted a data use agreement to govern the delivery of PHIS+ data to hospital investigators.
1.g.iii.1.b. Policies for sharing data outside the network: No outside researcher has access to PHIS+ data. To access these data, the researcher's hospital must contribute its own data.
1.g.iii.1.c. Policies for protecting proprietary data: Researchers sign DUAs promising not to attempt to identify any of the patients. Additionally, hospitals cannot be identified in the research.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals:
1) S. P. Narus, R. Srivastava, R. Gouripeddi, O. E. Livne, P. Mo, J. P. Bickel, D. de Regt, J. W. Hales, E. Kirkendall, R. L. Stepanek, J. Toth, and R. Keren, (2011). “Federating Clinical Data from Six Pediatric Hospitals: Process and Initial Results from the PHIS+ Consortium,” AMIA Annu Symp Proc, vol. 2011, pp. 994–1003.
2) R. Gouripeddi, P. Warner, P. Mo, J. E. Levin, R. Srivastava, S. S. Shah, D. de Regt, E. Kirkendall, J. Bickel, E. K. Korgenski, M. Precourt, R. L. Stepanek, J. A. Mitchell, S. P. Narus, R. Keren, (2012). Federating Clinical Data from Six Pediatric Hospitals: Process and Initial Results for Microbiology from the PHIS+ Consortium. AMIA 2012 Annual Symposium Proceedings, November 3-7, 2012, Proposal ID: AMIA-0205-A2012.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence?
1) Prolonged Intravenous Therapy Versus Early Transition to Oral Antimicrobial Therapy for Acute Osteomyelitis in Children, Theoklis Zaoutis, A. Russell Localio, Kateri Leckerman, Stephanie Saddlemire, David Bertoch and Ron Keren, Pediatrics 2009;123:636
2) Reflux related hospital admissions after fundoplication in children with neurological impairment: retrospective cohort study, Rajendu Srivastava, Jay G Berry, Matt Hall, Earl C Downey, Molly O’Gorman, J Michael Dean, Douglas C Barnhart, BMJ 2009;339:b4411
3) Hospital-Level Compliance With Asthma Care Quality Measures at Children’s Hospitals and Subsequent Asthma-Related Outcomes, Rustin B. Morse, Matthew Hall, Evan S. Fieldston, Gerd McGwire, Melanie Anspacher, Marion R. Sills, Kristi Williams, Naomi Oyemwense, Keith J. Mann, Harold K. Simon, Samir S. Shah, JAMA. 2011;306(13):1454-1460
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?): A unique patient identifier permits longitudinal tracking of individual patients.

2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Giving access to EHRs and to data from inpatient, emergency department, observation, and outpatient care settings
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Not applicable
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Child Health Corporation of America (CHCA) uses real-time security scans with an intrusion prevention system (IPS), which identifies network security threats and blocks ("shuns") traffic to prevent damage to the organization or compromise of data. The service analyzes global and local sensor data in real time to identify hostile activity and other threats; an automated update engine then issues commands to the firewall to block attacks. CHCA uses a clustered firewall system in conjunction with the IPS to provide layered defenses against unauthorized access to data assets. The CHCA application architecture isolates the databases from the SSL and SFTP processes, the ETL processes, and the web collection tools. Data gathered through web collection tools are SSL-encrypted in transit as they pass from the local device through the web server to the application server. No CHCA-developed web collection tool creates local copies of data on the local device. Database tape backups are additionally encrypted with 256-bit AES.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | With FURTHeR, on-the-fly query capability is replaced with a data file adapter. FURTHeR typically aggregates and stores translated query results in a temporary, in-memory database for presentation and analysis by the investigator for the duration of the user's session. PHIS+ has added software that allows the in-memory database to instantiate a Hibernate object that can be persisted to a physical, JDBC-compliant database. A special adapter also parses the text batch files.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | LOINC, SNOMED, HL7, RxNorm, CPT
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Using the Federated Utah Research and Translation Health electronic Repository (FURTHeR), the data are translated from the original source system to a common database with a tool developed by the Biomedical Informatics Core in Utah using the Regenstrief LOINC Mapping Assistant (RELMA).
4.d.iii. How are the data transformed and mapped? | All laboratory and radiology data from the six hospitals are pulled, sent to CHA, and run through filters. If statisticians at CHA have mapped a particular element to a corresponding data element in the common database, that data element will map. Everything that does not get mapped is put in a "bin" of unmapped data. These data can be used later: if statisticians choose to add data elements, the unmapped data are waiting to be remapped using those additional elements.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | See Table 1
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | EHR data, radiology data, laboratory data, administrative data
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | An adaptation of the clinical information extraction system Textractor
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | When a researcher submits a query, a simple command-line interface initiates the process by pointing the data file adapter to the correct sample file and configuration file and invoking the FURTHeR application. The translation engine marshals the raw lab data into the FURTHeR lab object and translates all local codes to the standard terminologies (using the code associations in the terminology server). Unrecognized codes and malformed input data are flagged to a log file for manual review. An output adapter takes each translated lab result and inserts it into a MySQL database via a Java Hibernate object.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | SPSS code, SAS scripts, Stata code, and R code
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Clinical Transaction Code System by Thomson Reuters
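The PHIS+ mapping workflow described in 4.d.iii and 4.h.ii (translate local codes through a terminology lookup, flag unrecognized or malformed input for manual review, keep unmapped records in a "bin" for later remapping, and load translated results into a relational database) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the FURTHeR implementation: the `Translator` class, its record fields (`local_code`, `value`), and the `lab_result` table are hypothetical, a Python dictionary stands in for the terminology server, and the standard-library `sqlite3` module stands in for the MySQL database and Java Hibernate object the report mentions.

```python
from __future__ import annotations

import logging
import sqlite3
from dataclasses import dataclass, field

# Unrecognized codes and malformed records are written to this log for manual review.
logger = logging.getLogger("lab_translation")


@dataclass
class Translator:
    # Hypothetical stand-in for the terminology server: local code -> standard code.
    code_map: dict[str, str]
    # The "bin" of records whose local code has no mapping yet.
    unmapped: list[dict] = field(default_factory=list)

    def translate(self, record: dict) -> dict | None:
        code, value = record.get("local_code"), record.get("value")
        if code is None or value is None:
            logger.warning("malformed record: %r", record)
            return None
        if code not in self.code_map:
            logger.warning("unmapped local code %s; record binned", code)
            self.unmapped.append(record)
            return None
        return {"standard_code": self.code_map[code], "value": value}

    def add_mapping(self, local_code: str, standard_code: str) -> list[dict]:
        """Register a new mapping and retry every binned record."""
        self.code_map[local_code] = standard_code
        pending, self.unmapped = self.unmapped, []
        remapped = []
        for record in pending:
            result = self.translate(record)  # still-unmapped records are re-binned
            if result is not None:
                remapped.append(result)
        return remapped


def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Insert translated results (sqlite3 stands in for MySQL/Hibernate)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lab_result (standard_code TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO lab_result VALUES (:standard_code, :value)", rows
    )
    conn.commit()
```

Keeping unmapped records rather than discarding them is what allows a mapping added later to be applied retroactively, as the report describes for additional data elements.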

Table 1. A subset of the Lab Sample 1 metadata fields and their descriptions.

SCAlable National Network for Effectiveness Research (SCANNER)

Criteria | Answers
1.a. How many people does the network cover or involve? | Not available
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | SCANNER is designed to be scalable so that additional studies can be added using the existing technology
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes; SCANNER is designed to be study-agnostic
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | In theory, this should be the case, but there is no evidence yet
1.b.i.1. Demographics: racial/ethnic | Caucasian: 8%; African American: 0.94%; American Indian/Eskimo: 0.048%; Asian/Pacific Islander: 1.17%; Hispanic: 2.47%; Hispanic/Latino: 2.54%; Multi-Racial: 2.92%; Non-Hispanic: 10.56%
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | < 18: 4.6%; 18-30: 12.2%; 31-50: 30%; 51-70: 32%; > 70: 21.2%
1.b.i.4. Demographics: gender | Male: 47%; Female: 53%
1.c.i. What is the total annual budget? | $2,769,968
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Agency for Healthcare Research and Quality
1.c.iii. How much does it cost each year to maintain and update the network? | Not applicable until after the network is deployed
1.d. How many years has this network existed? | Not applicable until after the network is deployed
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Comparative effectiveness research: (1) medication surveillance: old vs. new antiplatelet medications (patients with acute coronary syndrome) and old vs. new anticoagulant medications (patients with atrial fibrillation and patients with venous thromboembolism); (2) medication therapy management in patients with diabetes and hypertension
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific; patients consent on a study-by-study basis
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of self-reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of health care-derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of clinical trial data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | IRB approval is needed for all institutions; IRB approval and a data sharing agreement are needed for the VA (the VA's data sharing agreement is called a CRADA)
1.g.iii.1.b. Policies for sharing data outside the network | Sharing is allowed with approved IRB and data use agreements (for limited data sets or identified data)
1.g.iii.1.c. Policies for protecting proprietary data | Data are HIPAA-compliant, and limited data sets are shared only with an approved IRB in place
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Giving access to EHRs
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Not applicable
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Not applicable
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Planned: two-factor authentication and study-based authorization
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Queries are distributed via a central hub through a portal; the distribution architecture is hub-and-spoke
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | ICD-9, RxNorm, LOINC, CPT
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Observational Medical Outcomes Partnership (OMOP)
4.d.iii. How are the data transformed and mapped? | SQL scripts, ETL
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | If studies require introducing additional concepts to the OMOP vocabulary, the OMOP vocabulary is augmented.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics, encounters, procedures, medications, labs, vitals, and conditions
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | If data are aggregated, options include summary statistics from regressions executed locally on each node.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Options include custom implementations of multivariate statistics, as well as features of the Weka package, which is installed on every node
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | ETL and source data management are left to the discretion of each site.
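SCANNER's answers describe hub-and-spoke query distribution (4.b.ii) and, where data are aggregated, summary statistics from regressions run locally on each node (4.h.ii). The report does not say which statistics are exchanged. One well-known option, assumed here purely for illustration, is for each node to return the sufficient statistics for ordinary least squares, which the hub pools to obtain exactly the fit it would get from the combined patient-level data. The function names and the single-predictor model are hypothetical simplifications, not SCANNER's software.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics for y = a + b*x; the only data a node sends to the hub."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float


def local_summary(xs: list[float], ys: list[float]) -> Summary:
    """Runs inside a node, next to the patient-level data."""
    return Summary(
        n=len(xs),
        sx=sum(xs),
        sy=sum(ys),
        sxx=sum(x * x for x in xs),
        sxy=sum(x * y for x, y in zip(xs, ys)),
    )


def distribute(query, nodes: dict) -> list:
    """Hub-and-spoke: send the same query to every node and collect each reply."""
    return [query(node_data) for node_data in nodes.values()]


def hub_fit(summaries: list[Summary]) -> tuple[float, float]:
    """Pool node summaries and solve ordinary least squares for (a, b)."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sx for s in summaries)
    sy = sum(s.sy for s in summaries)
    sxx = sum(s.sxx for s in summaries)
    sxy = sum(s.sxy for s in summaries)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("predictor has no variance in the pooled data")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b
```

For example, with two hypothetical nodes whose data follow y = 1 + 2x:

```python
nodes = {"site_a": {"x": [0, 1, 2], "y": [1, 3, 5]},
         "site_b": {"x": [3, 4], "y": [7, 9]}}
summaries = distribute(lambda d: local_summary(d["x"], d["y"]), nodes)
print(hub_fit(summaries))  # (1.0, 2.0)
```

For multivariable models the same idea generalizes to each node sending the matrix products X^T X and X^T y; in either case no patient-level row leaves the node.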

Society for Vascular Surgery Vascular Quality Initiative (SVS VQI)

Criteria | Answers
1.a. How many people does the network cover or involve? | 65,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Covers vascular procedure studies at multiple hospitals
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes, in the same condition
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Use of protamine sulfate to reverse heparin anticoagulation during carotid endarterectomy (Stone et al., J Vasc Surg, 2010)
1.b.i.1. Demographics: racial/ethnic | White: 88.6%; Black or African American: 7.9%; Hispanic or Latino: 3.1%; Asian: 0.6%; American Indian or Alaska Native: 0.2%; Native Hawaiian or Other Pacific Islander: 0.1%; Unknown/Other: 2.6%; More than 1 race: 0.1%
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | 69.2 +/- 11 years (range 15-103)
1.b.i.4. Demographics: gender | Male: 63.5%; Female: 36.5%
1.c.i. What is the total annual budget? | $2,100,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $1,500,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $650,000
1.c.ii. What are the current sources of funding? | Participating center annual subscription fees
1.c.iii. How much does it cost each year to maintain and update the network? | $1,500,000
1.d. How many years has this network existed? | 2
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Vascular surgery procedures
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of self-reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of health care-derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of clinical trial data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Participants who want to be involved with VQI must sign agreements with M2S (network security provider) and the SVS PSO (Patient Safety Organization). Once these agreements have been signed and approved, the participant must pay annual fees.
1.g.iii.1.b. Policies for sharing data outside the network | Participants who want to be involved with VQI must sign agreements with M2S (network security provider) and the SVS PSO (Patient Safety Organization). Once these agreements have been signed and approved, the participant must pay annual fees.
1.g.iii.1.c. Policies for protecting proprietary data | The network uses an AHRQ-listed Patient Safety Organization; all data stored in the network are de-identified.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Nolan BW, De Martino RR, Goodney PP, Schanzer A, Stone DH, Butzel D, Kwolek CJ, Cronenwett JL; Vascular Study Group of New England. Comparison of carotid endarterectomy and stenting in real world practice using a regional quality improvement registry. J Vasc Surg. 2012;55:990-6.
2) Simons JP, Schanzer A, Nolan BW, Stone DH, Kalish JA, Cronenwett JL, Goodney PP; Vascular Study Group of New England. Outcomes and practice patterns in patients undergoing lower extremity bypass. J Vasc Surg. 2012;55:1629-36.
3) Wallaert JB, Nolan BW, Adams J, Stanley AC, Eldrup-Jorgensen J, Cronenwett JL, Goodney PP. The impact of diabetes on perioperative outcomes following lower-extremity bypass surgery. J Vasc Surg. 2012;56:1317-23.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Simons JP, Schanzer A, Nolan BW, Stone DH, Kalish JA, Cronenwett JL, Goodney PP; Vascular Study Group of New England. Outcomes and practice patterns in patients undergoing lower extremity bypass. J Vasc Surg. 2012;55:1629-36.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By providing claims data and manually entering data into a web form
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | PATHWAYS is a cloud-based platform that stores information directly in a database at a central data warehouse managed by M2S. Unique username-password combinations authenticate users and permit access only to the appropriate content. All passwords are stored using a one-way hash function with a custom salt, which ensures that the user is the only person who knows his or her password. Passwords expire every 180 days and cannot be reused for five generations. PATHWAYS also automatically logs users out of their session after 15 minutes of inactivity. To protect accounts from malicious attacks, users are locked out of the system after five consecutive unsuccessful log-in attempts; the database manager must then unlock the account before the user can log in again. PATHWAYS uses 256-bit SSL encryption, the same technology used by online banking and financial institutions, as well as healthcare providers, to protect their customers' personal information. M2S registry users do not interface directly with the database server; instead, they connect to the registry through a separate "proxy" server. This proxy server filters all communication between the clients and the database and prevents unauthorized users from accessing the registry data. Communication from authorized users is relayed by the proxy server to the database through M2S's internal firewall. Registry data are never stored on the proxy server, which greatly reduces the possibility of data being lost, stolen, or accessed by an unauthorized party. PATHWAYS protects PHI by preventing the browser from caching sensitive data. Furthermore, PATHWAYS does not require ActiveX or Java plug-ins to run and never writes PHI to the user's computer.
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | CPT, ICD-9
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Home grown; they use a data dictionary
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographic, risk factor, major outcome, and complication data
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | The data are aggregated before they leave the site and are then sent electronically and securely to the researcher requesting the data.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable
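The PATHWAYS account controls listed under 4.a (salted one-way password hashing, 180-day expiry, no reuse across five generations, lockout after five consecutive failed log-ins until an administrator unlocks the account, and a 15-minute inactivity timeout) can be expressed as a small policy object. This is a sketch, not M2S code: the report names no hash algorithm, so PBKDF2-HMAC-SHA256 from Python's `hashlib` is an assumed choice, and the iteration count, the `Account` class, and all method names are illustrative.

```python
import hashlib
import hmac
import secrets

PASSWORD_MAX_AGE = 180 * 24 * 3600  # passwords expire every 180 days (seconds)
HISTORY_DEPTH = 5                   # the last five passwords cannot be reused
MAX_FAILURES = 5                    # lock after five consecutive failed log-ins
IDLE_TIMEOUT = 15 * 60              # log out after 15 minutes of inactivity


def hash_password(password: str, salt: bytes) -> bytes:
    # Assumed algorithm; the iteration count is illustrative, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class Account:
    def __init__(self, password: str, now: float):
        self.history: list[tuple[bytes, bytes]] = []  # (salt, digest), newest last
        self.failures = 0
        self.locked = False
        self._store(password, now)

    def _store(self, password: str, now: float) -> None:
        salt = secrets.token_bytes(16)  # fresh random salt per password
        self.history.append((salt, hash_password(password, salt)))
        self.history = self.history[-HISTORY_DEPTH:]  # keep five generations
        self.changed_at = now

    def change_password(self, new: str, now: float) -> None:
        if any(hmac.compare_digest(hash_password(new, s), d) for s, d in self.history):
            raise ValueError("password was used within the last five generations")
        self._store(new, now)

    def login(self, password: str, now: float) -> bool:
        if self.locked:
            return False  # an administrator must call unlock() first
        salt, digest = self.history[-1]
        if hmac.compare_digest(hash_password(password, salt), digest):
            self.failures = 0  # only consecutive failures count
            return True
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.locked = True
        return False

    def expired(self, now: float) -> bool:
        return now - self.changed_at > PASSWORD_MAX_AGE

    def unlock(self) -> None:
        self.locked = False
        self.failures = 0


def session_active(last_activity: float, now: float) -> bool:
    return now - last_activity < IDLE_TIMEOUT
```

Because only a salt and a digest are stored, the system can verify a password without being able to recover it, which is the property the report relies on when it says only the user knows the password.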

UC-Research eXchange (UCReX)

Criteria Answers 1.a. How many people does the network 11,800,000 cover or involve? 1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or No procedures 1.a.ii.1. Can the network be used for new No - Not yet studies in the same or a different condition? 1.a.iii. (Y/N) Is there evidence from the past that show the network can be used for clinical No care delivery or quality improvement? 1.a.iii.1. What is the evidence? Not applicable 1.b.i.1. Demographics: racial/ethnic Not available 1.b.i.2. Demographics: geography Not available 1.b.i.3. Demographics: age Not available 1.b.i.4. Demographics: gender Not available 1.c.i. What is the total annual budget? $1,000,000 1.c.i.1. How much of that budget is dedicated $1,000,000 to infrastructure and maintenance? 1.c.i.2. How much of that budget is dedicated Not available to conducting studies? 1.c.ii. What are the current sources of NCATS and NIH funding? 1.c.iii. How much does it cost each year to $1,000,000 maintain and update the network? 1.d. How many years has this network 2 existed? 1.e.i. (Y/N) Does the network have a focus No - none yet (i.e., topic area or purpose)? 1.e.i.1. What does the network focus on? Not applicable 1.f. (Y/N) Does the network use informed No consent forms? 1.f.i. Do patients consent to the broad (meaning data may be analyzed for other Not applicable research) or specific use of their electronic data? 1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other Not applicable research) or specific use of their biological specimens? 1.f.iii. (Y/N) Can patients be re-contacted for No consent for a new study? 1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the No data they provided to the network? 1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in Not applicable the decision-making process? 1.g.ii.1. What are the sources of Self-Reported data collected in the network? 
(e.g., conditions, medications, medication Not applicable adherence, procedures, labs/imaging, health- related quality of life) 1.g.ii.2. What are the sources of Health care- Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, EHR pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) 1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab Not applicable orders, diagnostic results, imaging data, biospecimen, health-related quality of life) 1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with Does not share data outside network each other using the data

67 Criteria Answers 1.g.iii.1.b. Policies for sharing data outside Does not share data outside network the network Queries return only aggregate counts. 1.g.iii.1.c. Policies for protecting proprietary Aggregate numbers are blurred (or obfuscated), so that the counts returned are an estimate of the number of patients data meeting the queried upon criteria at each institution. No personally identifiable patient information ever leaves an individual institution. 2.a. Three most recent (or high impact) None studies published in peer-reviewed journals 2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple No values rather than one time) follow-up? 2.b.i. What is the evidence? Not applicable 2.b.ii. (Y/N) Can researchers conduct follow- up or ongoing observation from existing No reports by passively reviewing data rather than actively pulling it? 2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers Not applicable standardize survey type questions over a period of time?) 2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively Yes participating or engaging in research activities conducted by the network? 2.c.ii. How? (Examples: by referring patients, Provide access to EHR giving access to EHRs, etc.) 2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the No network? 2.d.i.1. What is the evidence? Not applicable 3.a. (Y/N) Does the network have biobanks? No 3.b. What types of biospecimens are Not applicable collected? 3.c. What types of analysis are done on them? Not applicable 3.d. (Y/N) Do researchers in the network No collect biospecimens for research purposes? 3.d.i. What types of analyses do they conduct Not applicable on them? 3.d.ii. Were they able to link the analysis/research results back to patient Not applicable outcomes? Each user of the system needs to be authenticated at their individual institution to verify employment and faculty status. 
4.a. What type of security technology does the network use? | All communications are encrypted using standards approved by the W3C. Institution-specific user log-in credentials never leave an individual institution. Users must register the topics they would like to query with the Data Steward application. The Data Steward administrator manually reviews all query requests to ensure compliance. Actual query histories are logged and audited regularly to ensure that there have been no violations of the Terms and Conditions.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | The database can be queried either by ICD-9 codes for diagnoses or by demographics (no standardized terminologies used), and results are returned as counts.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | LOINC, SNOMED-CT, CPT-4, ICD-9, UCUM, RxNorm, HL7
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | i2b2
4.d.iii. How are the data transformed and mapped? | SQL scripts, ETL
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Standardized data types and date ranges
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Age, ethnicity, gender, language, marital status, race, religion, diagnosis, and procedure data
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Only counts are aggregated locally and then sent to the central node.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | None
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Custom ETL tool
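The query model reported for this network (a central hub distributes a query by ICD-9 code or by a demographic attribute, and each site aggregates locally and returns only a count) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the site names, the example ICD-9 codes, and the `local_count` and `hub_query` functions are all hypothetical, and the sketch uses plain Python rather than the network's actual i2b2-based implementation.

```python
from dataclasses import dataclass


# Hypothetical patient-level record held at one site; in this sketch it never leaves the site.
@dataclass(frozen=True)
class PatientRecord:
    patient_id: str
    gender: str
    icd9_codes: frozenset


def local_count(records, icd9_code=None, gender=None):
    """Run a query at one site and return only the number of matching patients."""
    count = 0
    for r in records:
        if icd9_code is not None and icd9_code not in r.icd9_codes:
            continue
        if gender is not None and r.gender != gender:
            continue
        count += 1
    return count


def hub_query(sites, **criteria):
    """Central hub: send the same query to every site and collect per-site counts and a total."""
    per_site = {name: local_count(records, **criteria) for name, records in sites.items()}
    return per_site, sum(per_site.values())


if __name__ == "__main__":
    # Illustrative data only: two hypothetical sites with made-up patients.
    sites = {
        "site_a": [
            PatientRecord("a1", "F", frozenset({"250.00"})),
            PatientRecord("a2", "M", frozenset({"401.9"})),
        ],
        "site_b": [
            PatientRecord("b1", "F", frozenset({"250.00", "401.9"})),
        ],
    }
    per_site, total = hub_query(sites, icd9_code="250.00")
    print(per_site, total)  # {'site_a': 1, 'site_b': 1} 2
```

In this sketch only integer counts cross the site boundary; the patient-level records stay local, which mirrors the answer above that data are aggregated before they leave the local site.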

Wisconsin Network for Health Research (WiNHR)

Criteria | Answers
1.a. How many people does the network cover or involve? | 4,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The network consists of hospitals across the state of Wisconsin and conducts studies on a multitude of conditions.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Confidential
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? | Confidential
1.c.ii. What are the current sources of funding? | Confidential
1.c.iii. How much does it cost each year to maintain and update the network? | Confidential
1.d. How many years has this network existed? | 8
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | No. The network does not have a focus because it covers a wide range of hospitals across the state and sees a large population.
1.e.i.1. What does the network focus on? | Not applicable
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Specific
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play, and through what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of self-reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of health care-derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of clinical trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | The institutions in the network currently participate in the Wisconsin Institutional Review Board (IRB) Consortium (WIC). This provides a shared vision of human subjects protection priorities as well as shared Standard Operating Procedures.
1.g.iii.1.b. Policies for sharing data outside the network | To use data within the WiNHR network, a researcher must have at least two WiNHR sites that have agreed to participate before obtaining the data.
1.g.iii.1.c. Policies for protecting proprietary data | All data are HIPAA compliant and de-identified.
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | 1) WMJ. 2011 Apr;110(2):68-73. "The differential diagnosis of pulmonary blastomycosis using case vignettes: a Wisconsin Network for Health Research (WiNHR) study." Baumgardner DJ, Temte JL, Gutowski E, Agger WA, Bailey H, Burmester JK, Banerjee I. 2) WMJ. 2009 Dec;108(9):453-8. "The Wisconsin Network for Health Research (WiNHR): a statewide, collaborative, multidisciplinary, research group." Bailey H, Agger W, Baumgardner D, Burmester JK, Cisler RA, Evertsen J, Glurich I, Hartman D, Yale SH, DeMets D.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Not available
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Giving access to EHR data
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Genetic analyses
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes
4.a. What type of security technology does the network use? | Source data are protected and managed by the OnCore security framework.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Each integration channel is supported through a set of internal APIs. The APIs can be used either to automate the process of determining what data become part of OnCore or, if necessary, to allow users to review the data and manually determine what should be transferred to OnCore.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | ICD-9
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | XMAPS
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics, labs, pathology, vitals, medications, conditions, procedures, and treatments
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (e.g., machine learning, rule-based) are being used? | Not available
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | None
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Built-in protocol in OnCore

IV. Inventories of PPRNs

23andMe

Criteria | Answers
1.a. How many people does the network cover or involve? | 150,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | 23andMe mainly conducts studies on Parkinson's disease, but its researchers also conduct other genetic studies using customers' data.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Mostly White; 10,000 African Americans (Roots into the Future project)
1.b.i.2. Demographics: geography | NY and CA
1.b.i.3. Demographics: age | 30-65
1.b.i.4. Demographics: gender | Male: 60%; Female: 40%
1.c.i. What is the total annual budget? | $50,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Factored into the staffing costs
1.c.i.2. How much of that budget is dedicated to conducting studies? | Roots into the Future project
1.c.ii. What are the current sources of funding? | Venture funding, subscription costs
1.c.iii. How much does it cost each year to maintain and update the network? | Factored into the staffing costs
1.d. How many years has this network existed? | 4
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Parkinson's disease, sarcoma, and myeloproliferative neoplasms
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play, and through what mechanism? How are they involved in the decision-making process? | Patients control their privacy settings and consent status.
1.g.ii.1. What are the sources of self-reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of health care-derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of clinical trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Currently does not share data with institutional investigators
1.g.iii.1.b. Policies for sharing data outside the network | Does not share data outside of the network
1.g.iii.1.c. Policies for protecting proprietary data | Investigators do not have access to personally identifying "Registration Information."
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | None, but research findings are posted on the 23andMe website: https://www.23andme.com/about/factoids/
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By referring patients to studies (especially Parkinson's disease)
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | DNA sequencing
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | No
4.a. What type of security technology does the network use? | Security audits, Telepost Kabel-Service (TKS) protocol, Transport Layer Security (TLS) protocol
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics, genetic data, conditions, medications, outcomes
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (e.g., machine learning, rule-based) are being used? | Customized scripts are used to extract drug names from free text.
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | The data are already located at one site.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | R, Python scripts
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

ACOR

Criteria | Answers
1.a. How many people does the network cover or involve? | Not available
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Users interested in a new condition can start a new list by contacting the list coordinator.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Not available
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Not available
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 18
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Virtual cancer support groups
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No. Studies are conducted using data collected by this website.
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play, and through what mechanism? How are they involved in the decision-making process? | ACOR requires only a user's e-mail address and name. Users decide what other personally identifying information they choose to share on a message board.
1.g.ii.1. What are the sources of self-reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of health care-derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of clinical trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Information collected about patients may be used by researchers in aggregate form only; all surveys initiated by a third party must be approved by ACOR before being posted.
1.g.iii.1.b. Policies for sharing data outside the network | Information collected about patients may be used by researchers in aggregate form only; all surveys initiated by a third party must be approved by ACOR before being posted.
1.g.iii.1.c. Policies for protecting proprietary data | Proprietary data are released only in aggregate form.
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | Not applicable
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Not applicable
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Not available
4.b.ii. What is the architecture of the query distribution? | Not available
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No
4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Not available
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Name, e-mail, list subscriptions, disease subtopic interests
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

The Dr. Susan Love Research Foundation's Love/Avon Army of Women

Criteria | Answers
1.a. How many people does the network cover or involve? | 371,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Has matched women volunteers to 71 studies
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | No
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | See Table 1
1.b.i.2. Demographics: geography | See Table 3
1.b.i.3. Demographics: age | See Table 2
1.b.i.4. Demographics: gender | Male: 0.3%; Female: 99.7%
1.c.i. What is the total annual budget? | $300,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not applicable
1.c.ii. What are the current sources of funding? | Avon Foundation for Women
1.c.iii. How much does it cost each year to maintain and update the network? | $250,000
1.d. How many years has this network existed? | 5
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Primarily matching women who would like to participate in breast cancer studies with researchers
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play, and through what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of self-reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of health care-derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of clinical trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Researchers must register and submit an application to Army of Women with information about themselves (including a CV) and about the study they would like to conduct. They must also submit IRB documentation. The Science Advisory Committee reviews the application; once it is approved, the researcher is assigned two Army of Women advocates who aid in the research.
1.g.iii.1.b. Policies for sharing data outside the network | Researchers must register and submit an application to Army of Women with information about themselves (including a CV) and about the study they would like to conduct. They must also submit IRB documentation. The Science Advisory Committee reviews the application; once it is approved, the researcher is assigned two Army of Women advocates who aid in the research.
1.g.iii.1.c. Policies for protecting proprietary data | None
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By referring patients
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Hosted on a secure website protected by firewalls
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No
4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Name, age, city, and state of residence
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Data are aggregated when making presentations at scientific conferences.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Army of Women Demographics (February 20, 2013)

Table 1. Ethnicity

Ethnicity | # Members | % of Total
Caucasian | 321,000 | 86.35%
African American | 12,776 | 3.44%
Hispanic/Latina | 12,344 | 3.32%
Asian | 4,031 | 1.08%
Native American | 1,697 | 0.46%
Pacific Islander | 648 | 0.17%
Other | 5,945 | 1.6%
None Selected | 13,286 | 3.57%

Table 2. Year of Birth (Ordered by Year)

Year | # Women | % of Total
1910 | 6 | 0%
1912 | 1 | 0%
1913 | 1 | 0%
1914 | 1 | 0%
1916 | 2 | 0%
1917 | 2 | 0%
1918 | 2 | 0%
1919 | 3 | 0%
1920 | 44 | 0.01%
1921 | 35 | 0.01%
1922 | 40 | 0.01%
1923 | 68 | 0.02%
1924 | 79 | 0.02%
1925 | 118 | 0.03%
1926 | 154 | 0.04%
1927 | 205 | 0.06%
1928 | 270 | 0.07%
1929 | 297 | 0.08%
1930 | 452 | 0.12%
1931 | 531 | 0.14%
1932 | 660 | 0.18%
1933 | 731 | 0.2%
1934 | 1002 | 0.27%
1935 | 1290 | 0.35%
1936 | 1555 | 0.42%
1937 | 1956 | 0.53%
1938 | 2472 | 0.67%
1939 | 2844 | 0.77%
1940 | 3587 | 0.96%
1941 | 4362 | 1.17%
1942 | 5524 | 1.49%
1943 | 6022 | 1.62%
1944 | 6079 | 1.64%
1945 | 6519 | 1.75%
1946 | 8743 | 2.35%
1947 | 9959 | 2.68%
1948 | 10011 | 2.69%
1949 | 9887 | 2.66%
1950 | 10102 | 2.72%
1951 | 10445 | 2.81%
1952 | 10804 | 2.91%
1953 | 10775 | 2.9%
1954 | 10932 | 2.94%
1955 | 10798 | 2.9%
1956 | 10467 | 2.82%
1957 | 10768 | 2.9%
1958 | 10269 | 2.76%
1959 | 9904 | 2.66%
1960 | 9595 | 2.58%
1961 | 9240 | 2.49%
1962 | 8802 | 2.37%
1963 | 8774 | 2.36%
1964 | 8471 | 2.28%
1965 | 7822 | 2.1%
1966 | 7385 | 1.99%
1967 | 7235 | 1.95%
1968 | 7253 | 1.95%
1969 | 7402 | 1.99%
1970 | 7496 | 2.02%
1971 | 6936 | 1.87%
1972 | 6309 | 1.7%
1973 | 5914 | 1.59%
1974 | 5949 | 1.6%
1975 | 5765 | 1.55%
1976 | 5599 | 1.51%
1977 | 5712 | 1.54%
1978 | 5464 | 1.47%
1979 | 5437 | 1.46%
1980 | 5234 | 1.41%
1981 | 5038 | 1.36%
1982 | 4500 | 1.21%
1983 | 4106 | 1.1%
1984 | 3698 | 0.99%
1985 | 3135 | 0.84%
1986 | 2553 | 0.69%
1987 | 2051 | 0.55%
1988 | 1677 | 0.45%
1989 | 1346 | 0.36%
1990 | 4056 | 1.09%
1991 | 550 | 0.15%
1992 | 292 | 0.08%
1993 | 111 | 0.03%
1994 | 41 | 0.01%
1995 | 2 | 0%

Table 3. Members by State

State | # Women | % of Total
California | 38814 | 10.44%
None Selected | 30460 | 8.19%
New York | 22562 | 6.07%
Florida | 21744 | 5.85%
Texas | 18913 | 5.09%
Pennsylvania | 14218 | 3.82%
Illinois | 13266 | 3.57%
Massachusetts | 12426 | 3.34%
Michigan | 12208 | 3.28%
Ohio | 12098 | 3.25%
Virginia | 12093 | 3.25%
New Jersey | 11198 | 3.01%
Georgia | 10622 | 2.86%
North Carolina | 10461 | 2.81%
Maryland | 9921 | 2.67%
Washington | 7956 | 2.14%
Colorado | 7542 | 2.03%
Arizona | 7234 | 1.95%
Wisconsin | 7226 | 1.94%
Indiana | 6513 | 1.75%
Minnesota | 6092 | 1.64%
Missouri | 5780 | 1.55%
Connecticut | 5766 | 1.55%
Tennessee | 5207 | 1.4%
Oregon | 4937 | 1.33%
South Carolina | 4625 | 1.24%
Alabama | 3716 | 1%
Iowa | 3661 | 0.98%
Kentucky | 3562 | 0.96%
Kansas | 3090 | 0.83%
Maine | 3087 | 0.83%
New Hampshire | 2679 | 0.72%
Oklahoma | 2607 | 0.7%
Louisiana | 2401 | 0.65%
Nevada | 2167 | 0.58%
New Mexico | 2134 | 0.57%
Rhode Island | 2046 | 0.55%
Nebraska | 2041 | 0.55%
Arkansas | 1987 | 0.53%
Idaho | 1840 | 0.49%
Utah | 1750 | 0.47%
West Virginia | 1579 | 0.42%
Mississippi | 1396 | 0.38%
Delaware | 1320 | 0.36%
Vermont | 1148 | 0.31%
Montana | 1074 | 0.29%
District of Columbia | 1068 | 0.29%
Alaska | 1021 | 0.27%
Hawaii | 787 | 0.21%
Ontario | 780 | 0.21%
South Dakota | 618 | 0.17%
North Dakota | 591 | 0.16%
Wyoming | 535 | 0.14%
British Columbia | 336 | 0.09%
Alberta | 221 | 0.06%
Puerto Rico | 189 | 0.05%
Quebec | 88 | 0.02%
Nova Scotia | 75 | 0.02%
Manitoba | 64 | 0.02%
Saskatchewan | 60 | 0.02%
New Brunswick | 47 | 0.01%
AE (Armed Forces Europe) | 29 | 0.01%
Newfoundland | 16 | 0%
Prince Edward Island | 12 | 0%
AP (Armed Forces Pacific) | 7 | 0%
Yukon Territory | 6 | 0%

Asthmapolis

Criteria | Answers
1.a. How many people does the network cover or involve? | 1,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The network has just signed contracts that will double the number of users in March 2013.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | http://www.asthmapolis.com/wp-content/uploads/2012/12/Quality-Measures-White-Paper.pdf
1.b.i.1. Demographics: racial/ethnic | High proportions of Hispanic and African American participants
1.b.i.2. Demographics: geography | Louisville, KY; Sacramento, CA; Florida; Boston; Hawaii; Seattle
1.b.i.3. Demographics: age | Ages 5 and older
1.b.i.4. Demographics: gender | Same distribution as the census
1.c.i. What is the total annual budget? | Not available (spending increases ~10% per month)
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Walgreens is a major funder
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | Since summer 2012
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Improving the management of asthma
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific. The patient decides to share specific information with their care team, family, or friends, and can grant or remove access to their data instantly through their personal profile.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | The patient owns and controls their data. Asthmapolis is opt-in, and the patient can change their preferences about which data, if any, they share.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Collected in Clinical Trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Business associate agreements, IRB, and advisory board activities
1.g.iii.1.b. Policies for sharing data outside the network | Business associate agreements, IRB, and advisory board activities
1.g.iii.1.c. Policies for protecting proprietary data | The partner hospitals own the data; the patients own the data.

2.a. Three most recent (or high impact) studies published in peer-reviewed journals | Van Sickle D, Magzamen S, Truelove S, Morrison T. "Remote Monitoring of Inhaled Bronchodilator Use and Weekly Feedback About Asthma Management: An Open-Group Short-Term Pilot Study of the Impact on Asthma Control." PLoS One. 2013;XX(XX):XX-XX. Publication is under embargo.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Asthmapolis brings remote monitoring to asthma epidemiology by providing the first real-time geospatial view of where asthma symptoms are occurring and where asthma inhalers are used.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | For McKesson or Allere healthways, Asthmapolis replaces their disease management systems. Dignity Health Systems and the VA in Seattle use Asthmapolis for their COPD patients. Asthmapolis then uses the data collected at these sites for research.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | Currently underway at Dignity Health Systems in Sacramento, CA.
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Multiple networks: a frequency-hopping protocol for Bluetooth and SSL encryption for all data; everything runs on Amazon Web Services.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | The SQL server pulls the data recorded by the apps and the web. No data are held locally on the phone or personal computer. A provider or researcher logs onto a secure portal with a secure login and downloads the needed data from the website.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Not available
4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Not available
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Activity limitations, symptom triggers, day-to-day burden and management, inhaler medications (daily and as-needed medications), frequency, time, and location of rescue medication use, diagnostic results
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Sometimes; data are shown in aggregate form to the patients themselves
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Based on who the information is being given to (researcher, patient, doctor)
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | BETA program of about 100 users
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | MongoDB, using C++
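Items 4.f and 4.h.i above describe rescue-inhaler events (time and location of use) that are shown to patients in aggregate form, and the 2.a pilot study refers to weekly feedback. The Python sketch below illustrates one way such events could be rolled up into weekly counts before display. The `RescueEvent` record, its fields, and the `weekly_counts` function are hypothetical; the inventory does not describe how Asthmapolis actually aggregates its data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RescueEvent:
    """Hypothetical record of one rescue-inhaler use (time and location)."""
    timestamp: datetime
    latitude: float
    longitude: float


def weekly_counts(events):
    """Roll individual events up into counts keyed by (ISO year, ISO week).

    Only the aggregate counts are returned, not the individual time/location rows.
    """
    counts = Counter()
    for event in events:
        iso = event.timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    events = [
        RescueEvent(datetime(2013, 1, 7, 8, 30), 38.58, -121.49),
        RescueEvent(datetime(2013, 1, 9, 22, 5), 38.58, -121.49),
        RescueEvent(datetime(2013, 1, 15, 7, 0), 38.56, -121.47),
    ]
    print(weekly_counts(events))  # {(2013, 2): 2, (2013, 3): 1}
```

ISO weeks are used here so that a week spanning New Year's Day is counted once; for example, an event on 31 December 2012 falls in ISO week 1 of 2013.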

BRIDGE

Criteria | Answers
1.a. How many people does the network cover or involve? | 923
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Currently focusing on only three areas: Breast Cancer, Fanconi Anemia, and Real Names (Amyotrophic Lateral Sclerosis (ALS) and Parkinson's)
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Confidential
1.b.i.2. Demographics: geography | Confidential
1.b.i.3. Demographics: age | Confidential
1.b.i.4. Demographics: gender | Confidential
1.c.i. What is the total annual budget? | Confidential
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? | $200,000
1.c.ii. What are the current sources of funding? | Confidential
1.c.iii. How much does it cost each year to maintain and update the network? | Confidential
1.d. How many years has this network existed? | 9 months
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Breast Cancer, ALS, Parkinson's, Fanconi Anemia
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad. Portable Legal Consent (PLC) is a standardized informed consent system for anyone who has obtained data relevant to their health and would like to donate those data for research purposes. PLC runs volunteers through a short process in which they learn about informed consent, sign an IRB-approved informed consent form, and then upload the data they have chosen to donate. The existing PLC system does not transmit “identified” data; donors must indicate that they understand there are some risks of re-identification and harm in volunteering for donation. For the purposes of the RNDP, the PLC will need to be rewritten to recognize that all RNDP participants will willingly provide their own names and genomic data.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad. The same Portable Legal Consent (PLC) process described for electronic data (1.f.i) applies.
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Users decide what information to provide when signing up and whether to participate in answering surveys.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable

1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Portable Legal Consent (PLC), as described under 1.f.i. Volunteers sign an IRB-approved informed consent form and upload the data they choose to donate. The existing PLC system does not transmit “identified” data; it will need to be rewritten for the RNDP, whose participants will willingly provide their own names and genomic data.
1.g.iii.1.b. Policies for sharing data outside the network | Not available
1.g.iii.1.c. Policies for protecting proprietary data | Personal information is collected and encrypted using the SSL protocol. Personal information may be used or disclosed without separate consent to provide information about BRIDGE or other issues of interest, to inform users about new studies of interest, or to meet legal requirements. BRIDGE does not sell personal information without prior written consent.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Blood
3.c. What types of analysis are done on them? | Whole genome sequence for each patient (from whole blood); serial-draw whole blood transcriptomics data; serial-draw blood serum proteomics data; serial-draw blood serum metabolomics data
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Sequencing
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | No
4.a. What type of security technology does the network use? | Web 2.0; the secure server software receives encrypted information through the Secure Sockets Layer (SSL).
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | REST
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No
4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics, conditions, medications, and genomic data
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | The REST querying system aggregates the data.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | R, Bioconductor, GenePattern
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Collaborative Chronic Care Network (C3N)

Criteria | Answers
1.a. How many people does the network cover or involve? | 15,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Approximately 10 studies are currently underway. The network is actively engaged in quality improvement and has automated population management and pre-visit planning tools that provide real-time clinical information and decision support.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Crandall WV, Boyle BM, Colletti RB, Margolis PA, Kappelman MD. Development of process and outcome measures for improvement: lessons learned in a quality improvement collaborative for pediatric inflammatory bowel disease. Inflamm Bowel Dis. 2011 Oct;17(10):2184-91.
1.b.i.1. Demographics: racial/ethnic | White: 85%; African American: 10%; Other: 5%
1.b.i.2. Demographics: geography | National (27 states), plus 1 site in London, England
1.b.i.3. Demographics: age | 0 to 14 years: 40%; 15 to 17 years: 35%; >17 years: 25%
1.b.i.4. Demographics: gender | Male: 55%; Female: 45%
1.c.i. What is the total annual budget? | $2,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $30,000-$100,000/year
1.c.i.2. How much of that budget is dedicated to conducting studies? | $925,500
1.c.ii. What are the current sources of funding? | AHRQ Enhanced Registries grant and NIH-funded Transformative TR01 grant
1.c.iii. How much does it cost each year to maintain and update the network? | $1,200,000
1.d. How many years has this network existed? | 5
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Focuses primarily on Crohn's disease
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable. Under the new protocol, the entire population is included for clinical and QI purposes, with no consent required. The protocol also covers consent for research and for limited research datasets (i.e., datasets containing dates). The new IRB protocol includes provisions for transferring data from legacy patients based on local IRB review.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients can change privacy settings and choose how much information they provide when registering and/or updating their condition status.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR and Registry data

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Policy is a work in progress; not yet available
1.g.iii.1.b. Policies for sharing data outside the network | Policy is a work in progress; not yet available
1.g.iii.1.c. Policies for protecting proprietary data | Queries return only aggregate counts. Aggregate numbers are blurred (obfuscated), so that the counts returned are an estimate of the number of patients meeting the query criteria at each institution. No personally identifiable patient information ever leaves an individual institution.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Colletti RB, Baldassano RN, Milov DE, Margolis PA, Bousvaros A, Crandall WV, Crissinger KD, D'Amico MA, Day AS, Denson LA, Dubinsky M, Ebach DR, Hoffenberg EJ, Kader HA, Keljo DJ, Leibowitz IH, Mamula P, Pfefferkorn MD, Qureshi MA. Variation in care in pediatric Crohn disease. J Pediatr Gastroenterol Nutr. 2009 Sep;49(3):297-303.
2) Kappelman MD, Crandall WV, Colletti RB, Goudie A, Leibowitz IH, Duffy L, Milov DE, Kim SC, Schoen BT, Patel AS, Grunow J, Larry E, Fairbrother G, Margolis P. Short pediatric Crohn's disease activity index for quality improvement and observational research. Inflamm Bowel Dis. 2011 Jan;17(1):112-7.
3) Burt RS, Meltzer DO, Seid M, Borgert A, Chung JW, Colletti RB, Dellal G, Kahn SA, Kaplan HC, Peterson LE, Margolis P. What's in a name generator? Choosing the right name generators for social network surveys in healthcare quality and safety research. BMJ Qual Saf. 2012 Dec;21(12):992-1000.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | The registry contains data from all visits for each patient, including standardized process-of-care measures and outcome measures. Beginning in January 2013, patient-reported outcomes will also be measured.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | The network is actively collaborating with 50 clinical sites, many of which are large academic medical centers, including most of the largest children's hospitals. Senior leadership is involved at many, if not all, of these sites.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | The network has conducted a pilot center-based study involving planned allocation of 30+ centers to different combinations of chronic care management approaches. The network is currently supporting a project that randomizes treatments for individual patients as part of N-of-1 trials. Randomization of patients for RCTs has not been undertaken but is feasible.
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Connections to the web-based registry front end are encrypted using SSL. Users are given a unique username and password that must be changed periodically and must meet certain requirements. The servers are located in the data center of the hospital (Cincinnati Children's), which is physically secured. The servers are on a protected network that is firewalled off from the hospital network and the internet. Access is controlled via an identity and access management appliance. Non-date PHI elements are stored in an encrypted database.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Users log into the SHRINE web portal and enter a query based on conditions and demographics. The query is then sent to the participating sites, each of which aggregates its data and returns the count.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | LOINC, RxNorm
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | i2b2
4.d.iii. How are the data transformed and mapped? | Using the SHRINE network
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Centers are provided with a case report form and asked to modify their clinical visit forms to capture the necessary data in the medical record. Centers with an EMR can configure a form to capture the data directly at the point of care; these data can then be extracted from the EMR and uploaded to the registry. We are pushing for a model with one form for each of the major EMR vendors used by ImproveCareNow centers (Epic, Cerner, GE), with one mapping per vendor. This mapping already exists for Epic and is in progress for Cerner and GE. Centers that are not live with, or do not have, an EMR can abstract the data and perform double data entry into the registry web forms.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Registry data, EMR data, patient-reported outcomes, daily symptoms, disease activity indices, PROMIS short-form survey, PDSQL, remote sensors, custom SMS queries
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Exploring these approaches
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | The data are aggregated at a central site.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | The SHRINE statistical tool; data extracts are also available for SAS.
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Not available
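The C3N answers to 1.g.iii.1.c and 4.b.ii describe a SHRINE-style design: a hub sends a query to each participating site, each site evaluates it against its own patient-level data, and only a blurred aggregate count comes back, so no patient-level information leaves the institution. The Python sketch below illustrates that flow. SHRINE's actual obfuscation rules are not described in this inventory; the uniform integer noise of ±3 and the suppression threshold of 10 are assumptions chosen only for illustration, and `Site`, `local_count`, and `hub_query` are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Patient = dict  # hypothetical patient-level row, kept at the site


@dataclass
class Site:
    """A participating institution; its patient rows never leave this object."""
    name: str
    patients: List[Patient] = field(default_factory=list)


def local_count(site: Site, criteria: Callable[[Patient], bool],
                rng: random.Random, noise: int = 3,
                threshold: int = 10) -> Optional[int]:
    """Evaluate the query locally and release only a blurred aggregate count.

    Assumed policy (illustrative only): add uniform integer noise in
    [-noise, +noise], then suppress any result below `threshold`.
    """
    true_count = sum(1 for p in site.patients if criteria(p))
    blurred = max(0, true_count + rng.randint(-noise, noise))
    return blurred if blurred >= threshold else None


def hub_query(sites: List[Site], criteria: Callable[[Patient], bool],
              seed: Optional[int] = None) -> Tuple[Dict[str, Optional[int]], int]:
    """Send one query to every site and sum the counts that were released."""
    rng = random.Random(seed)
    per_site = {s.name: local_count(s, criteria, rng) for s in sites}
    total = sum(c for c in per_site.values() if c is not None)
    return per_site, total


if __name__ == "__main__":
    site_a = Site("Site A", [{"dx": "crohns"}] * 50 + [{"dx": "uc"}] * 50)
    site_b = Site("Site B", [{"dx": "crohns"}] * 2)
    per_site, total = hub_query([site_a, site_b],
                                lambda p: p["dx"] == "crohns", seed=7)
    print(per_site, total)  # Site B's small count is suppressed (None)
```

Noise blurs every released count, while the threshold hides rare cells that noise alone might not protect. The sketch calls the sites sequentially in one process; a real network would send the query to each institution over the network.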

Cancer Commons

Criteria | Answers
1.a. How many people does the network cover or involve? | 100
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Working on studies with the Melanoma Research Foundation, the Melanoma Research Alliance, the Lung Cancer Foundation, and the Lung Cancer Alliance
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | $2,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $200,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | None
1.c.ii. What are the current sources of funding? | Pharmaceutical companies, Stand Up To Cancer, SEED Philanthropy
1.c.iii. How much does it cost each year to maintain and update the network? | $200,000
1.d. How many years has this network existed? | 6 months
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Building a community that brings patients, physicians, and researchers together
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Specific
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients control how much of their data are shared.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Data are shared with others under a Data Use Agreement.
1.g.iii.1.b. Policies for sharing data outside the network | Data are not yet shared outside the network.
1.g.iii.1.c. Policies for protecting proprietary data | Only de-identified data are stored.

96 Criteria Answers 2.a. Three most recent (or high impact) None studies published in peer-reviewed journals 2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple Not available values rather than one time) follow-up? 2.b.i. What is the evidence? Not available 2.b.ii. (Y/N) Can researchers conduct follow- up or ongoing observation from existing Not available reports by passively reviewing data rather than actively pulling it? 2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers Not available standardize survey type questions over a period of time?) 2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively Yes participating or engaging in research activities conducted by the network? 2.c.ii. How? (Examples: by referring patients, Refer patients, provide EHR, and participate in research giving access to EHRs, etc.) 2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the No network? 2.d.i.1. What is the evidence? Not applicable 3.a. (Y/N) Does the network have biobanks? No 3.b. What types of biospecimens are Not applicable collected? 3.c. What types of analysis are done on them? Not applicable 3.d. (Y/N) Do researchers in the network Yes collect biospecimens for research purposes? 3.d.i. What types of analyses do they conduct Genomic sequencing to determine the subtype of cancer the patient has on them? 3.d.ii. Were they able to link the analysis/research results back to patient Not available outcomes? 4.a. What type of security technology does Third party cloud server that is HIPAA-Compliant the network use? 4.b.i. (Y/N) Are queries distributed via a No central hub? 4.b.ii. What is the architecture of the query Not applicable distribution? 4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, Yes SNOMED, etc.)?

4.c.ii. Which terminologies? Not available 4.d.i.(Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable 4.d.iii. How are the data transformed and Not applicable mapped? 4.e.i. (Y/N) Does the network collect additional fields to help with analysis and Yes interpretation (metadata)?

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a Have a home grown mapping tool way to map back to standards? (Data Dictionary?) 4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient- Demographic data, cancer sub-type, treatments, biomarkers, outcomes reported outcomes, etc.).

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, Not available etc.) or approaches (examples are machine learning, rule-based) are being used?

97 Criteria Answers 4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with No the network? 4.h.ii. How are the data transformed (i.e., based on what criteria are the data Not applicable aggregated)? 4.i. What data (statistical) analysis tools, if any, are available for researchers through the Not applicable network?

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, No billing, and clinical records kept in individual places or lumped in with patient-level data?)

4.j.ii. What informatics tools are used? Not applicable

Crohnology

Criteria | Answers
1.a. How many people does the network cover or involve? | 2,800
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The network began with a few hundred users and currently has 2,800.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | United States: 75%; Europe, Australia, Asia, and South Africa: 25%
1.b.i.3. Demographics: age | See Chart 1
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Budget for 3 software developers
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | None
1.c.ii. What are the current sources of funding? | Y Combinator, angel investors
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 2
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Crohn's disease and colitis
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients decide what personally identifying information to provide to the website.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Data sharing has not taken place, but in the future Crohnology would like to share data with institutional collaborators.
1.g.iii.1.b. Policies for sharing data outside the network | No patient information has been shared outside the network for research purposes yet, but Crohnology would like to share network data outside the network in the future.
1.g.iii.1.c. Policies for protecting proprietary data | No proprietary data
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | No published studies
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | No standardization is necessary yet because the PPRN is so new
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | The server is kept in a secure location in Colorado.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Database queries are written in SQL, and the output format is determined by the researcher (e.g., graphic visualizations, Excel spreadsheets).
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No
4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?) | Homegrown standards
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Patient's birthday, date of diagnosis, and dates of treatment use combined with overall user self-reported wellness scores; daily self-rated health rating; treatments patients are considering taking; patient's supplements, treatments, and food (each one rated by the self-reported overall wellness scores of users while taking it, and rated on a 1-to-5-star scale for quality of the treatment)
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Chart 1

DIYgenomics

Criteria | Answers
1.a. How many people does the network cover or involve? | Not available
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The website is designed to allow users to create their own studies based on diseases and conditions of interest to them.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Not available
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 3
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Preventive medicine through crowdsourced health research studies, especially focusing on health risk, drug response, and athletic performance.
1.f. (Y/N) Does the network use informed consent forms? | Yes: first when users register with the system, and second when a user joins a study (each study has its own consent process).
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Either: users can decide whether to share specific data on a study-by-study basis or to share their data broadly
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Users can initiate studies on this website and become their principal investigators. Users can also contribute data to studies if they choose to join as participants.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Collected in clinical trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Study results are always reported back to the study participants; there are no data use agreements because institutional investigators are not involved in the studies.
1.g.iii.1.b. Policies for sharing data outside the network | Only data that have been approved by users can be shared outside the network
1.g.iii.1.c. Policies for protecting proprietary data | Data are protected by Genomera's privacy policy.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Swan M. Scaling crowdsourced health studies: the emergence of a new form of contract research organization. Personalized Medicine. 2012 Mar;9(2):223-234.
2) Swan M, Hathaway K, Hogg C, McCauley R, Vollrath A. Citizen science genomics as a model for crowdsourced preventive medicine research. J Participat Med. 2010 Dec 23;2:e20.
3) Swan M. Emerging patient-driven health care models: an examination of health social networks, consumer personalized medicine and quantified self-tracking. Int J Environ Res Public Health. 2009;6(2):492-525.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Highest-level encryption; the browser connection uses secure HTTP (HTTPS)
4.b.i. (Y/N) Are queries distributed via a central hub? | Principal investigator-users sign on to the Genomera platform and can download the data in formats including CSV or JSON.
4.b.ii. What is the architecture of the query distribution? | When a participant agrees to participate in a study, the participant's data generated from the study go to the study's data collection and to the user's profile. The researcher can see the data flowing into their data collection portal and also has access to links for downloading the de-identified data from their study.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | ICD-9/10 and HL7 codes will be used after Meaningful Use Stage 2
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | E-mail, username, and optionally real name, current city of residence, birthdate, and gender (ancestry record field as an experiment). A free-form field for describing interests (examples include genetics, omega-3, and sleep). Specialized data types, such as a genome file, can also be submitted. Each study has one or more data collection instruments: device-reported data (Zeo sleep monitor), lab-reported data (urine analysis), and user-reported data (examples include demographic surveys and morning and evening evaluations).
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Varies based on the study
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Genomera

Criteria | Answers
1.a. How many people does the network cover or involve? | Thousands
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The website is designed to allow users to create their own studies based on diseases and conditions of interest to them.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Angel investors and venture capitalists
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 3
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | The network focuses on democratizing the process of conducting research by allowing individuals who are not academic researchers to team up with other users to conduct clinical-style research studies.
1.f. (Y/N) Does the network use informed consent forms? | Yes: first when users register with the system, and then for each study, which has its own consent process
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Either: users can decide whether to share specific data on a study-by-study basis or to share their data broadly
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Users can initiate studies on this website and become their principal investigators. Users can also contribute data to studies if they choose to join as participants.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Collected in clinical trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Study results are always reported back to the study participants; there are no data use agreements because institutional investigators are not involved in the studies.
1.g.iii.1.b. Policies for sharing data outside the network | Only data that have been approved by users can be shared outside the network
1.g.iii.1.c. Policies for protecting proprietary data | Data are protected by Genomera's privacy policy.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | Swan M. Crowdsourced health research studies: an important emerging complement to clinical trials in the public health research ecosystem. J Med Internet Res. 2012 Mar-Apr;14(2):e46. (Reviewed by Paul Wicks, Thomas Pickard, and Ute Francke.)
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | http://genomera.com/studies/aging-risk-reduction-for-common-aging-conditions-through-monitoring-intervention
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | No standardization done
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Referring patients to Genomera for studies
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Highest-level encryption; the browser connection uses secure HTTP (HTTPS)
4.b.i. (Y/N) Are queries distributed via a central hub? | Principal investigator-users sign on to the Genomera platform and can download the data in formats including CSV or JSON.
4.b.ii. What is the architecture of the query distribution? | When a participant agrees to participate in a study, the participant's data generated from the study go to the study's data collection and to the user's profile. The researcher can see the data flowing into their data collection portal and also has access to links for downloading the de-identified data from their study.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes
4.c.ii. Which terminologies? | ICD-9/10 and HL7 codes will be used after Meaningful Use Stage 2
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | E-mail, username, and optionally real name, current city of residence, birthdate, and gender (ancestry record field as an experiment). A free-form field for describing interests (examples include genetics, omega-3, and sleep). Specialized data types, such as a genome file, can also be submitted. Each study has one or more data collection instruments: device-reported data (Zeo sleep monitor), lab-reported data (urine analysis), and user-reported data (examples include demographic surveys and morning and evening evaluations).
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Varies based on the study
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Glu

Criteria | Answers
1.a. How many people does the network cover or involve? | 25,883
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Mainly involves patients with type 1 diabetes
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes, but only within the same condition
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | White (non-Hispanic): 81%; Black (non-Hispanic): 5%; Hispanic or Latino: 8%; Native Hawaiian/Other Pacific Islander: 1%; Asian: 1%; American Indian/Alaskan Native: 1%; Other: 3%
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | < 6: 49%; 6-13: 27%; 13-18: 24%; 18-26: 15%; 26-31: 4%; 31-50: 13.3%; 50-65: 8.31%; ≥ 65: 2.74%
1.b.i.4. Demographics: gender | Male: 50%; Female: 50%
1.c.i. What is the total annual budget? | Confidential
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? | Confidential
1.c.ii. What are the current sources of funding? | Helmsley Charitable Trust
1.c.iii. How much does it cost each year to maintain and update the network? | Pays other sites $75 per patient to update data manually
1.d. How many years has this network existed? | 1
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | An online community for people with type 1 diabetes
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients can provide as much or as little information as they feel comfortable with. However, any information provided up to the point the patient stops providing information will remain in the database indefinitely to be used for research.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Researchers will be able to request to use the information for their research. This might involve the Glu team performing analyses on the information and giving the researcher the results. It also could involve giving the researcher a dataset or information. There may be a charge to researchers when they request analyses of the information or a dataset. This charge is intended to cover the costs involved in collecting, storing, processing, and analyzing the information Glu members provide.
1.g.iii.1.b. Policies for sharing data outside the network | Researchers will be able to request to use the information for their research. This might involve the Glu team performing analyses on the information and giving the researcher the results. It also could involve giving the researcher a dataset or information. There may be a charge to researchers when they request analyses of the information or a dataset. This charge is intended to cover the costs involved in collecting, storing, processing, and analyzing the information Glu members provide.
1.g.iii.1.c. Policies for protecting proprietary data | All information provided to researchers is de-identified and sometimes aggregated.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No; the first RCT will start in May 2013
2.d.i.1. What is the evidence? |
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | DNA, RNA, peripheral blood mononuclear cells (PBMC), serum, and plasma
3.c. What types of analysis are done on them? | Metabolic measures including HbA1c, glucose, and C-peptide; immune and genetic measures such as HLA typing and diabetes-related autoantibodies
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Metabolic measures including HbA1c, glucose, and C-peptide; immune and genetic measures such as HLA typing and diabetes-related autoantibodies
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes
4.a. What type of security technology does the network use? | Data are entered on the Jaeb Center for Health Research's secure website through an SSL-encrypted connection. The Jaeb Center websites are maintained on Unix and Linux servers running Apache web server software and on a Windows server running IIS, all with strong encryption. The study website is password-protected and restricted to users who have been authorized by the Jaeb Center to gain access.
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes

4.c.ii. Which terminologies? RxNorm, MEDRA

110 Criteria Answers 4.d.i.(Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable 4.d.iii. How are the data transformed and Not applicable mapped? 4.e.i. (Y/N) Does the network collect additional fields to help with analysis and No interpretation (metadata)?

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a Not applicable way to map back to standards? (Data Dictionary?) Nickname, mixology, My story/quote, E-mail, Designation(you have type 1 or are a caregiver), Date of Birth, Country, Terms and Conditions Consent, Data Use Consent, Gender, Race/Ethnicity, Zip code, Age of diagnosis, diagnosis scenario, 4.f. List the types of data that are being insulin delivery method, other, information about when you developed diabetes collected or accessed and incorporated into how your diabetes has been treated, blood sugar measurements, problems related to your diabetes, other medical the network (e.g., EHR data, claims, patient- problems you may have, blood tests that have been done, medicines that you take, whether anyone else in the family has reported outcomes, etc.). diabetes, your education level (such as whether you completed high school), your family income level, what type of health insurance you have, if any, how you feel about your diabetes, problems in your life, information about your lifestyle, such as how much you exercise 4.g.i. (Y/N) Does the network use natural language processing? No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, Not applicable etc.) or approaches (examples are machine learning, rule-based) are being used? 4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with No the network? 4.h.ii. How are the data transformed (i.e., based on what criteria are the data Not applicable aggregated)? 4.i. What data (statistical) analysis tools, if any, are available for researchers through the SAS scripts network?

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, No billing, and clinical records kept in individual places or lumped in with patient-level data?)

4.j.ii. What informatics tools are used? Not applicable

Inspire

Criteria | Answers
1.a. How many people does the network cover or involve? | 315,274
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Not available
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Not available
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Not available
1.a.iii.1. What is the evidence? | Not available
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Not available
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 7
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Not available
1.e.i.1. What does the network focus on? | Not available
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients do not have to provide their information. To interact with other users and post content, they have to provide additional information. Also, the information in posts becomes public, so users are in control of what they post.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Not available
1.g.iii.1.b. Policies for sharing data outside the network | Not available
1.g.iii.1.c. Policies for protecting proprietary data | Not available
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | Not available
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one-time) follow-up? | Not available
2.b.i. What is the evidence? | Not available
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Not available
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | Not available
3.a. (Y/N) Does the network have biobanks? | Not available
3.b. What types of biospecimens are collected? | Not available
3.c. What types of analysis are done on them? | Not available
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Not available
3.d.i. What types of analyses do they conduct on them? | Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not available
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Not available
4.b.ii. What is the architecture of the query distribution? | Not available
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Not available
4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Not available
4.d.ii. Which CDM is used? | Not available
4.d.iii. How are the data transformed and mapped? | Not available
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Not available
4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data dictionary?) | Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Your inspiration, photo, relationship status, birthday, gender, zip code and country of residence, interests
4.g.i. (Y/N) Does the network use natural language processing? | Not available
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not available
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Not available
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not available
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not available
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?) | Not available
4.j.ii. What informatics tools are used? | Not available

Insulindependence

Criteria | Answers
1.a. How many people does the network cover or involve? | 15,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The network focuses mainly on fitness and recreation for patients with diabetes but does want to facilitate research studies using the data collected
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | No
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Grassroots, corporate sponsorship, Helmsley Charitable Trust
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 7
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Diabetes self-management outreach; not specifically a research network/organization
1.f. (Y/N) Does the network use informed consent forms? | No (not the site itself)
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable. If a research investigator would like individuals from the site to participate in a study, it is that researcher's responsibility to obtain consent from the patient.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable. If a research investigator would like individuals from the site to participate in a study, it is that researcher's responsibility to obtain consent from the patient.
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Not available
1.g.iii.1.b. Policies for sharing data outside the network | Not available
1.g.iii.1.c. Policies for protecting proprietary data | "When you submit information via our Websites, we take efforts to protect your information both online and offline. Please keep in mind, however, that whenever you give out personal information online, such information is not always secure in transit. While we strive to protect your privacy and secure your information, we cannot guarantee the security of information sent over the Internet, and you disclose such information at your own risk."
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one-time) follow-up? | No, but the network is in the process of making this possible in the future
2.b.i. What is the evidence? |
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Fairway Technologies is the security technology company supporting the website and database
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No
4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Demographics, medications, conditions, devices used, health status
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

International Waldenstrom's Macroglobulinemia Foundation

Criteria | Answers
1.a. How many people does the network cover or involve? | 4,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Conducts studies mainly on Waldenstrom's macroglobulinemia
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | No
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | $500,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | A percentage of $500,000/year
1.c.i.2. How much of that budget is dedicated to conducting studies? | $500,000
1.c.ii. What are the current sources of funding? | Confidential
1.c.iii. How much does it cost each year to maintain and update the network? | A percentage of $500,000/year
1.d. How many years has this network existed? | 18
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Focuses mainly on Waldenstrom's macroglobulinemia
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable. Researchers wanting to use patient data from the Registry must obtain consent from the patients themselves for a specific study.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | By providing as much information about their condition and health status
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | For an institutional investigator to use the data, they must apply for a grant through IWMF and then go through a review process before being given access to the data to conduct their study
1.g.iii.1.b. Policies for sharing data outside the network | For an investigator to use the data, they must apply for a grant through IWMF and then go through a review process before being given access to the data to conduct their study
1.g.iii.1.c. Policies for protecting proprietary data | All stored data are de-identified
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | Xu L, Hunter ZR, Yang G, Zhou Y, Cao Y, Liu X, Morra E, Trojani A, Greco A, Arcaini L, Varettoni M, Brown JR, Tai YT, Anderson KC, Munshi NC, Patterson CJ, Manning R, Tripsas C, Lindeman NI, Treon SP. MYD88 L265P in Waldenstrom's Macroglobulinemia, IgM Monoclonal Gammopathy, and other B-cell Lymphoproliferative Disorders using Conventional and Quantitative Allele-Specific PCR. Blood. 2013 Jan 15.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one-time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Researchers at these organizations participate in ongoing research with IWMF
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | No
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Not available
4.b.ii. What is the architecture of the query distribution? | Not available
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Not available
4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Not available
4.d.ii. Which CDM is used? | Not available
4.d.iii. How are the data transformed and mapped? | Not available
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Not available
4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data dictionary?) | Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Blood properties, trends, treatments, demographics
4.g.i. (Y/N) Does the network use natural language processing? | Not available
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not available
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Not available
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not available
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in separate places or combined with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

MDJunction

Criteria Answers 1.a. How many people does the network Visited by 16,000,000 in the past year cover or involve? 1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or There is a link for users to apply to start their own support groups procedures 1.a.ii.1. Can the network be used for new Yes studies in the same or a different condition? 1.a.iii. (Y/N) Is there evidence from the past that show the network can be used for clinical No care delivery or quality improvement? 1.a.iii.1. What is the evidence? Not applicable 1.b.i.1. Demographics: racial/ethnic Not available 1.b.i.2. Demographics: geography Not available 1.b.i.3. Demographics: age Not available 1.b.i.4. Demographics: gender Not available 1.c.i. What is the total annual budget? Not available 1.c.i.1. How much of that budget is dedicated Not available to infrastructure and maintenance? 1.c.i.2. How much of that budget is dedicated Not available to conducting studies? 1.c.ii. What are the current sources of Private philanthropists funding? 1.c.iii. How much does it cost each year to Not available maintain and update the network? 1.d. How many years has this network 8 existed? 1.e.i. (Y/N) Does the network have a focus Yes (i.e., topic area or purpose)? 1.e.i.1. What does the network focus on? Support groups 1.f. (Y/N) Does the network use informed No consent forms? 1.f.i. Do patients consent to the broad (meaning data may be analyzed for other Not applicable research) or specific use of their electronic data? 1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other Not applicable research) or specific use of their biological specimens? 1.f.iii. (Y/N) Can patients be re-contacted for No consent for a new study? 1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the Yes data they provided to the network? 1.g.i.1. What are the roles patients play and in what mechanism? 
How are they involved in Patients choose what information to share on the message board the decision-making process? 1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication Self-Reported adherence, procedures, labs/imaging, health- related quality of life) 1.g.ii.2. What are the sources of Health care- Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, Not applicable pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) 1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab Not applicable orders, diagnostic results, imaging data, biospecimen, health-related quality of life) 1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with No data are offered to investigators each other using the data 1.g.iii.1.b. Policies for sharing data outside No data are offered to investigators the network

121 Criteria Answers 1.g.iii.1.c. Policies for protecting proprietary No proprietary data are collected data 2.a. Three most recent (or high impact) None studies published in peer-reviewed journals 2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple No values rather than one time) follow-up? 2.b.i. What is the evidence? Not applicable 2.b.ii. (Y/N) Can researchers conduct follow- up or ongoing observation from existing No reports by passively reviewing data rather than actively pulling it? 2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers Not applicable standardize survey type questions over a period of time?) 2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively No participating or engaging in research activities conducted by the network? 2.c.ii. How? (Examples: by referring patients, Not applicable giving access to EHRs, etc.) 2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the No network? 2.d.i.1. What is the evidence? Not applicable 3.a. (Y/N) Does the network have biobanks? No 3.b. What types of biospecimens are Not applicable collected? 3.c. What types of analysis are done on them? Not applicable 3.d. (Y/N) Do researchers in the network No collect biospecimens for research purposes? 3.d.i. What types of analyses do they conduct Not applicable on them? 3.d.ii. Were they able to link the analysis/research results back to patient Not applicable outcomes? 4.a. What type of security technology does Not available the network use? 4.b.i. (Y/N) Are queries distributed via a Not available central hub? 4.b.ii. What is the architecture of the query Not available distribution? 4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, No SNOMED, etc.)?

4.c.ii. Which terminologies? Not applicable 4.d.i.(Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable 4.d.iii. How are the data transformed and Not applicable mapped? 4.e.i. (Y/N) Does the network collect additional fields to help with analysis and No interpretation (metadata)? 4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a Not available way to map back to standards? (Data Dictionary?) 4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient- Name, e-mail reported outcomes, etc.).

4.g.i. (Y/N) Does the network use natural language processing? Not available

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, Not available etc.) or approaches (examples are machine learning, rule-based) are being used?

122 Criteria Answers 4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with No the network? 4.h.ii. How are the data transformed (i.e., based on what criteria are the data Not applicable aggregated)? 4.i. What data (statistical) analysis tools, if any, are available for researchers through the None available network?

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) No

4.j.ii. What informatics tools are used? Not applicable

MedHelp

Criteria and Answers
1.a. How many people does the network cover or involve? 12,000,000 site visitors monthly
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: Yes. Conditions are added to the site based on which conditions receive the most hits on Google.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? No
1.a.iii.1. What is the evidence? Not applicable
1.b.i.1. Demographics (racial/ethnic): Not available
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): Not available
1.b.i.4. Demographics (gender): Not available
1.c.i. What is the total annual budget? Confidential
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? 80% of the total budget
1.c.i.2. How much of that budget is dedicated to conducting studies? Confidential
1.c.ii. What are the current sources of funding? Confidential
1.c.iii. How much does it cost each year to maintain and update the network? Confidential
1.d. How many years has this network existed? 19
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? No
1.e.i.1. What does the network focus on? Not applicable
1.f. (Y/N) Does the network use informed consent forms? Yes. Users consent by signing a disclaimer.
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Patients decide what data they want to share with the public and with their healthcare providers.
1.g.ii.1. What are the sources of Self-Reported data collected in the network (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)? Self-reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)? Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)? Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Data shared outside the PPRN contain no personally identifiable information. Third parties must agree that they will not attempt to make this information personally identifiable. The user has the option of granting their physician or hospital access to personally identifiable information.
1.g.iii.1.b. Policies for sharing data outside the network: Investigators from outside the network follow the same data sharing procedures as investigators inside the network.
1.g.iii.1.c. Policies for protecting proprietary data: The data contain no personally identifiable information.

2.a. Three most recent (or high-impact) studies published in peer-reviewed journals: Hagan JC 3rd, Kutryb MJ. Cataract and intraocular implant surgery concerns and comments posted at two internet eye care forums. Mo Med. 2009 Jan-Feb;106(1):78-82.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? No
2.b.i. What is the evidence? Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items (e.g., how do researchers standardize survey-type questions over a period of time)? Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? No
2.c.ii. How (e.g., by referring patients, giving access to EHRs)? Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? No
2.d.i.1. What is the evidence? Not applicable
3.a. (Y/N) Does the network have biobanks? No
3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? No
3.d.i. What types of analyses do they conduct on them? Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not applicable
4.a. What type of security technology does the network use? Data are encrypted
4.b.i. (Y/N) Are queries distributed via a central hub? Yes
4.b.ii. What is the architecture of the query distribution? When a researcher submits a query, the MedHelp team queries its database and sends the results back to the researcher.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? Not available

4.c.ii. Which terminologies? Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? No

4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards (data dictionary)? JSON
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes): Consumer-collected data from condition-specific health applications and personal health records (PHRs)

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers) or approaches (e.g., machine learning, rule-based) are being used? Confidential

4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? The data are aggregated based on the needs of the researcher.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Google Analytics and homegrown analysis tools

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) No

4.j.ii. What informatics tools are used? Not applicable
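MedHelp's answers describe a centralized, hub-mediated query model: researchers do not query the data directly; the MedHelp team runs each submitted query against its own database and returns results aggregated according to the researcher's needs. The Python sketch below illustrates that pattern under stated assumptions. The record fields, the `QueryRequest` type, the `ALLOWED_GROUP_BY` whitelist, and the `run_hub_query` function are all hypothetical and do not describe MedHelp's actual schema or tooling, and the records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical fields a researcher may group results by. Identifiers such as
# user_id are deliberately excluded, so only aggregate counts leave the hub.
ALLOWED_GROUP_BY = {"condition", "app"}


@dataclass(frozen=True)
class Record:
    """One consumer-collected record held centrally by the hub (hypothetical schema)."""
    user_id: str
    condition: str
    app: str  # condition-specific health application that produced the record


@dataclass(frozen=True)
class QueryRequest:
    """What a researcher submits to the hub team (hypothetical format)."""
    condition: str  # restrict results to records about this condition
    group_by: str   # field to break the counts down by


def run_hub_query(store: List[Record], request: QueryRequest) -> Dict[str, int]:
    """Run a researcher's query inside the hub and return only aggregate counts."""
    if request.group_by not in ALLOWED_GROUP_BY:
        raise ValueError(f"grouping by {request.group_by!r} is not permitted")
    matching = [r for r in store if r.condition == request.condition]
    # Count matching records per value of the requested field; row-level
    # records never leave this function.
    return dict(Counter(getattr(r, request.group_by) for r in matching))


# Made-up records standing in for the hub's central database.
store = [
    Record("u1", "diabetes", "glucose_tracker"),
    Record("u2", "diabetes", "glucose_tracker"),
    Record("u3", "diabetes", "diet_log"),
    Record("u4", "migraine", "headache_diary"),
]

print(run_hub_query(store, QueryRequest(condition="diabetes", group_by="app")))
```

Restricting `group_by` to a whitelist is one simple way to keep identifiers such as `user_id` out of the returned results; the sketch does not attempt other safeguards, such as suppressing small counts.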

PatientsLikeMe

Criteria and Answers
1.a. How many people does the network cover or involve? 202,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: In April 2011, the platform expanded so that any patient with any condition (or multiple conditions) could use the system. To date, over 2,000 conditions are registered on the system. There are over 30,000 patients with fibromyalgia or MS; over 10,000 patients with major depressive disorder, chronic fatigue syndrome, or generalized anxiety disorder; and over 5,000 with epilepsy, type 2 diabetes, Parkinson's disease, ALS, panic disorder, social anxiety disorder, PTSD, or rheumatoid arthritis. There are also substantial numbers of patients with rare conditions: for example, over 2,000 with kidney transplant, over 1,000 with cystic fibrosis, over 400 with primary lateral sclerosis, Devic's neuromyelitis optica, or progressive muscular atrophy, over 300 with polycystic kidney disease or idiopathic pulmonary fibrosis, and over 60 with the orphan disease alkaptonuria.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? Three peer-reviewed studies. A study in ALS, MS, Parkinson's, HIV, fibromyalgia, and mood disorders suggested a number of patient-reported benefits from using the system, including better understanding of their condition and symptoms, better quality of life, and better medication adherence (Wicks P, Massagli M, Frost J, Brownstein C, Okun S, Vaughan T, Bradley R, Heywood J (2010) Sharing Health Data for Better Outcomes on PatientsLikeMe, Journal of Medical Internet Research, 12(2):e19).

A second study replicated these findings in epilepsy and also found some evidence of better clinical outcomes (e.g., fewer ER admissions and fewer seizures), as well as a "dose-effect curve" of benefits against social interactions on the site (Wicks P, Keininger DL, Massagli MP, de la Loge C, Brownstein C, Isojarvi J, Heywood JA (2012) Perceived benefits of sharing health data between people with epilepsy on an online platform, Epilepsy & Behavior, 23:16-23).

An additional study assessing quality of care in epilepsy was used by the American Academy of Neurology to update how it trains neurologists and in a submission to the National Quality Forum on quality of care in epilepsy (Wicks P & Fountain NB (2012) Patient assessment of physician performance of epilepsy quality-of-care measures, Neurology Clinical Practice, 2:335-345).
1.b.i.1. Demographics (racial/ethnic): Among those reporting race: White 85%; Black or African-American 4%; Mixed race 4%; Prefer not to answer 4%; Asian 3%; American Indian or Alaskan Native 1%; Native Hawaiian or other Pacific Islander <1%.

Among those reporting ethnicity: Non-Hispanic 83%; Prefer not to answer 10%; Hispanic 6%.
1.b.i.2. Demographics (geography): Among those reporting location: USA 80%; UK 6%; Canada 5%; Australia 2%; 184 other countries 1% or less each.
1.b.i.3. Demographics (age): Among those reporting age: <10: 1%; 11-20: 2%; 21-30: 13%; 31-40: 21%; 41-50: 27%; 51-60: 23%; 61-70: 11%; 71-80: 3%; 81-90: <1%; 91+: <1%.
1.b.i.4. Demographics (gender): Among those reporting gender: Female 72%; Male 28%.
1.c.i. What is the total annual budget? Confidential

1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? Confidential
1.c.ii. What are the current sources of funding? The PatientsLikeMe Research Team has received research funding from Abbott, Acorda, The AKU Society, AstraZeneca, Avanir, Biogen, Boehringer Ingelheim, Genzyme, Johnson & Johnson, Merck, National Institutes of Health, Novartis, The Robert Wood Johnson Foundation, Sanofi, and UCB.
1.c.iii. How much does it cost each year to maintain and update the network? Confidential
1.d. How many years has this network existed? Founded in 2004; the ALS community launched in 2006.
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Any illness or medical condition, but with a historical emphasis on neurological conditions (e.g., ALS, Parkinson's, MS, epilepsy) and serious or disabling medical conditions (e.g., organ transplants, HIV, mood disorders, fibromyalgia)
1.f. (Y/N) Does the network use informed consent forms? No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Broad. Patients are told up front, when they sign up, that their information will be used for studies and will also be sold to partner companies for research purposes. In addition, when additional information is collected via surveys, there may be additional informed consent language specified by the respective partner companies' and institutions' IRBs.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Patients are involved according to how much information they provide and share in their online profiles. Users of PatientsLikeMe can opt in to sharing their profile and information with the public. Approximately 14% of users share their information in a manner accessible to the public; the remainder keep their data visible only to other members of the community.
1.g.ii.1. What are the sources of Self-Reported data collected in the network (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)? Primarily self-reported, but starting to include sensor data (e.g., voice, devices)
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)? Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)? PatientsLikeMe has partnered with the University of Michigan to assist in an ongoing clinical trial by providing aggregate-level statistics on patient-reported data through the PLM platform for all individuals who join PLM through the clinical trial; conducting yearly surveys of all individuals who join PLM through the clinical trial; and providing individual-level patient-reported data for all individuals who join PLM through the clinical trial and have given PLM permission to provide that individual-level data to the University of Michigan for purposes of the clinical trial. Additionally, a free and publicly available tool allows members (and non-members) to easily access all the trials registered on ClinicalTrials.gov. If they provide demographic information such as age, sex, and location, and the name of their condition, they are shown the trials most relevant to them.
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Institutional investigators need to contact the PatientsLikeMe research team and provide an initial research proposal. If the team feels that the research project will be interesting and beneficial to its users, PatientsLikeMe will assist in writing a grant proposal and help describe what it does for a local IRB.
1.g.iii.1.b. Policies for sharing data outside the network: "Please write to the research team with your initial research proposal. If we think a research project has the potential to benefit our users we would be happy to assist you in writing a grant proposal and helping to describe what we do for your local Internal Review Board (IRB). The proportion of funding we would receive depends on a number of factors including the contribution of our staff to the design, the difficulty of accessing the specific population of interest, and the source of funding."
1.g.iii.1.c. Policies for protecting proprietary data: Outside of what the user shares with the public and/or within the network, PatientsLikeMe does not share any personally identifying information.
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals:
1) Nakamura C, Bromberg M, Bhargava S, Wicks P, Zeng-Treitler Q. Mining Online Social Network Data for Biomedical Research: A Comparison of Clinicians' and Patients' Perceptions About Amyotrophic Lateral Sclerosis Treatments. J Med Internet Res 2012;14(3):e90

2) Bove R, Secor E, Healy BC, Musallam A, Vaughan T, Glanz BI, Weiner HL, Chitnis T, Wicks P, de Jager PL. Evaluation of an online platform for multiple sclerosis research: Patient description, validation of severity scale, and exploration of BMI effects on disease progression. PLoS ONE 2013;8(3):e59707

3) Wicks P, Vaughan TE, Massagli MP, Heywood J. Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm. Nature Biotechnology 2011;29:411-414
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? http://www.patientslikeme.com/research

2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items (e.g., how do researchers standardize survey-type questions over a period of time)? Standardized questionnaires, such as the EQ-5D, have been used (with permission from the licensors) to ensure comparability of populations across multiple studies.
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How (e.g., by referring patients, giving access to EHRs)? The Veterans Administration (VA) is conducting a research study called "Policy for Optimal Epilepsy Management" (POEM) that refers veterans with seizures to PLM to establish whether the platform helps improve self-efficacy. In another study, movement disorders specialists at Johns Hopkins are offering telemedicine consultations to PLM members with Parkinson's disease, using the information in their profiles to enhance the consultation. Results from both studies are anticipated in 2013/14.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Yes
2.d.i.1. What is the evidence? Clinical trial investigators have (unwittingly) had their patients sign up for PatientsLikeMe. This was described in a paper (Heywood, Vaughan, Wicks (2012) Waiting for p<0.05, Figshare, http://dx.doi.org/10.6084/m9.figshare.96802).
3.a. (Y/N) Does the network have biobanks? No

3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? No
3.d.i. What types of analyses do they conduct on them? Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not applicable
4.a. What type of security technology does the network use? "We follow the best practices in security (as per HIPAA Security Compliance). We use a respected, secure hosting provider for the site, which has signed a HIPAA compliance agreement and earned SAS Type II certification. We also use state of the art firewalls for our production servers, and our systems have been developed to prevent the most common security vulnerabilities. For secure browsing, we use 128-bit SSL encryption using Verisign certificates. Finally, when we do any testing and development work to the site, we use sanitized versions of the site, with all personally identification information stripped out."
4.b.i. (Y/N) Are queries distributed via a central hub? No
4.b.ii. What is the architecture of the query distribution? Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? Yes
4.c.ii. Which terminologies? Multiple UMLS terminologies, including SNOMED-CT, ICD-10, ICF, HL7, MedDRA, unifying grammar, and the internal PatientsLikeMe Patient Vocabulary
4.d.i. (Y/N) Does the network use a common data model (CDM)? No
4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards (data dictionary)? The PatientsLikeMe Patient Vocabulary is a homegrown repository of symptoms, conditions, side effects, and treatments. It maps patient-entered terminology to standardized vocabularies, including ICD-10, SNOMED-CT, and MedDRA.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes): Biographical information (e.g., photograph, biography, gender, age, location [city, state, and country], general notes); condition/disease information (e.g., diagnosis date, first symptom, family history); treatment information (e.g., treatment start dates, stop dates, dosages, side effects, treatment evaluations); symptom information (e.g., severity, duration); primary and secondary outcome scores over time (e.g., ALSFRS-R, MSRS, PDRS, FVC, PFRS, Mood Map, Quality of Life, weight, InstantMe); laboratory results (e.g., CD4 count, viral load, creatinine); genetic information (e.g., information on individual genes and/or entire genetic scans); individual and aggregated survey responses; and information shared via free-text fields (e.g., the forum, treatment evaluations, surveys, annotations, journals, feeds, adverse event reports)
4.g.i. (Y/N) Does the network use natural language processing? NLP has been used for adverse event detection. NLP and machine learning will be used by the end of 2013 for various purposes.
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers) or approaches (e.g., machine learning, rule-based) are being used? Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Publicly shared data are reported in aggregate, based on demographic distribution by treatment and/or condition.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) No
4.j.ii. What informatics tools are used? PatientsLikeMe developed a "User Voice Dashboard" in which data not previously captured in its databases are triaged by a clinical team (RNs, PharmDs). These data are curated using internal data integrity conventions and informatics science.
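PatientsLikeMe reports two related practices: its homegrown Patient Vocabulary maps patient-entered terms to standardized vocabularies, and publicly shared data are reported in aggregate by demographics, treatment, and/or condition, with data not previously captured triaged by a clinical team. The Python sketch below combines these steps under stated assumptions. The synonym table, the `Member` record, and the `normalize_condition` and `aggregate_public` functions are hypothetical; only the ICD-10 category codes G35 (multiple sclerosis) and G20 (Parkinson's disease) are real, and the member records and treatment names are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical patient-language synonyms mapped to ICD-10 category codes.
# A real patient vocabulary would also carry SNOMED-CT and MedDRA mappings.
PATIENT_TERMS_TO_ICD10 = {
    "ms": "G35",
    "multiple sclerosis": "G35",
    "parkinsons": "G20",
    "parkinson's": "G20",
    "parkinson's disease": "G20",
}


@dataclass(frozen=True)
class Member:
    """A member-entered record (hypothetical fields)."""
    condition_text: str    # condition exactly as the member typed it
    treatment: str
    gender: str
    shares_publicly: bool  # member opted in to public sharing


def normalize_condition(text: str) -> Optional[str]:
    """Map free-text patient terminology to an ICD-10 code, or None if unmapped."""
    return PATIENT_TERMS_TO_ICD10.get(text.strip().lower())


def aggregate_public(members: List[Member]) -> Tuple[Counter, List[str]]:
    """Count publicly shared members by (ICD-10 code, treatment, gender).

    Terms the vocabulary cannot map are returned separately so that they
    can be reviewed by hand instead of being silently dropped.
    """
    counts: Counter = Counter()
    review_queue: List[str] = []
    for m in members:
        code = normalize_condition(m.condition_text)
        if code is None:
            review_queue.append(m.condition_text)
            continue
        if m.shares_publicly:  # only opted-in data enter the public report
            counts[(code, m.treatment, m.gender)] += 1
    return counts, review_queue


members = [
    Member("MS", "drug A", "F", True),
    Member("multiple sclerosis", "drug A", "F", True),
    Member("Parkinsons", "drug B", "M", True),
    Member("MS", "drug A", "M", False),          # not shared publicly: excluded
    Member("tingly feet", "drug C", "F", True),  # unmapped: sent for review
]

counts, review_queue = aggregate_public(members)
for key, n in sorted(counts.items()):
    print(key, n)
print("needs review:", review_queue)
```

Keeping unmapped terms in a review queue loosely mirrors the manual triage PatientsLikeMe describes for its "User Voice Dashboard"; the actual curation process is not specified in this report.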

Personal Genome Project

Criteria and Answers
1.a. How many people does the network cover or involve? 2,428
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: Lunshof JE, Bobe J, Aach J, Angrist M, Thakuria JV, Vorhaus DB, Hoehe MR, Church GM. Personal genomes in progress: from the human genome project to the personal genome project. PMID: 20373666; PMCID: PMC3181947
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? Ball MP, et al. A public resource facilitating clinical use of genomes. Proc Natl Acad Sci USA. 13 July 2012. doi:10.1073/pnas.1201904109
1.b.i.1. Demographics (racial/ethnic): Information not available in aggregate form
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): Information not available in aggregate form
1.b.i.4. Demographics (gender): Information not available in aggregate form
1.c.i. What is the total annual budget? Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? Not available
1.c.ii. What are the current sources of funding? Not available
1.c.iii. How much does it cost each year to maintain and update the network? Not available
1.d. How many years has this network existed? 12
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Personal genomic sequencing
1.f. (Y/N) Does the network use informed consent forms? Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Broad
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)? Self-reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)? EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)? Collected in clinical trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: All data are publicly available
1.g.iii.1.b. Policies for sharing data outside the network: All data are publicly available

1.g.iii.1.c. Policies for protecting proprietary data: All data are publicly available
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals:
1) Ball MP, et al. A public resource facilitating clinical use of genomes. Proc Natl Acad Sci USA. 13 July 2012. doi:10.1073/pnas.1201904109
2) Church GM. The Personal Genome Project. Molecular Systems Biology 1:2005.0030

2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? No
2.b.i. What is the evidence? Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Yes
2.b.ii.1. How do researchers standardize those data items (e.g., how do researchers standardize survey-type questions over a period of time)? Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? No
2.c.ii. How (e.g., by referring patients, giving access to EHRs)? Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Not available
2.d.i.1. What is the evidence? Not available
3.a. (Y/N) Does the network have biobanks? Yes
3.b. What types of biospecimens are collected? Tissue samples
3.c. What types of analysis are done on them? Creation of cell lines, transformation into somatic cell-derived stem cells, DNA sequencing, gene expression, and the identification of bacteria and viruses in the specimen sample
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? Yes
3.d.i. What types of analyses do they conduct on them? The study of biological characteristics, including DNA, RNA (gene expression), physical traits, biochemical traits, and the presence and characteristics of microorganisms and viruses in the specimen
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Yes
4.a. What type of security technology does the network use? Not available
4.b.i. (Y/N) Are queries distributed via a central hub? Not available
4.b.ii. What is the architecture of the query distribution? Not available
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? Not available

4.c.ii. Which terminologies? Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Not available

4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards (data dictionary)? Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes): Biometric data, conditions, medications, allergies, family members; imaging data, EHR, procedures, test results, immunizations; data from 23andMe, surveys, enrollment history, cell lines, genomic and phenotypic data; Complete Genomics (CGI) sample, weight, fat mass, red blood cell count, white blood cell count, total PSA, total protein, RDW, platelet count, pH, occult blood, non-HDL cholesterol, nitrite, neutrophils, MPV, monocytes, MCV, LDL cholesterol, ketones, hyaline casts, hemoglobin, glucose, reflexive urine culture, sodium, triglycerides, calcium, AST, demographic information

4.g.i. (Y/N) Does the network use natural language processing? Not available

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers) or approaches (e.g., machine learning, rule-based) are being used? Not available
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Genome-Environment-Trait Evidence

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) Not available

4.j.ii. What informatics tools are used? Not available

Quantified Self

Criteria and Answers
1.a. How many people does the network cover or involve? 16,000 signed up for Meetups
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: The Meetup groups sponsored by Quantified Self are open to any citizen scientist (amateur or nonprofessional scientist) who would like to attend or present at a meeting.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? No
1.a.iii.1. What is the evidence? Not applicable
1.b.i.1. Demographics (racial/ethnic): Not available
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): Not available
1.b.i.4. Demographics (gender): Not available
1.c.i. What is the total annual budget? Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? Not available
1.c.ii. What are the current sources of funding? Autodesk, Intel, 23andMe, Scanadu
1.c.iii. How much does it cost each year to maintain and update the network? Not available
1.d. How many years has this network existed? Not available
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Fostering self-tracking and self-experimentation on health behaviors, conditions, etc.
1.f. (Y/N) Does the network use informed consent forms? No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Patients create a profile for themselves at Meetup.com so they can see where Quantified Self meetings are taking place. They can also decide who can see their Meetup profile information.
1.g.ii.1. What are the sources of Self-Reported data collected in the network (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life)? Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data)? Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life)? Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Quantified Self does not hold users' data
1.g.iii.1.b. Policies for sharing data outside the network: Quantified Self does not hold users' data
1.g.iii.1.c. Policies for protecting proprietary data: Quantified Self does not hold users' data

133 Criteria Answers 2.a. Three most recent (or high impact) None studies published in peer-reviewed journals 2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple Yes - Please note in this case researchers are citizen scientists (amateur or nonprofessional scientist) values rather than one time) follow-up? 2.b.i. What is the evidence? Not available 2.b.ii. (Y/N) Can researchers conduct follow- up or ongoing observation from existing Yes, researchers are not third parties but rather citizen scientists, i.e., the users themselves reports by passively reviewing data rather than actively pulling it? 2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers No study has required standardization standardize survey type questions over a period of time?) 2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively No participating or engaging in research activities conducted by the network? 2.c.ii. How? (Examples: by referring patients, Not applicable giving access to EHRs, etc.) 2.d.i. (Y/N) Have there been any randomized control trials using the data collected in the No network? 2.d.i.1. What is the evidence? Not applicable 3.a. (Y/N) Does the network have biobanks? No 3.b. What types of biospecimens are Not applicable collected? 3.c. What types of analysis are done on them? Not applicable 3.d. (Y/N) Do researchers in the network No collect biospecimens for research purposes? 3.d.i. What types of analyses do they conduct Not applicable on them? 3.d.ii. Were they able to link the analysis/research results back to patient Not applicable outcomes? 4.a. What type of security technology does Not available the network use? 4.b.i. (Y/N) Are queries distributed via a No central hub? 4.b.ii. What is the architecture of the query Not applicable distribution? 4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, No SNOMED, etc.)?

4.c.ii. Which terminologies? Not applicable 4.d.i.(Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable 4.d.iii. How are the data transformed and Not applicable mapped? 4.e.i. (Y/N) Does the network collect additional fields to help with analysis and No interpretation (metadata)?

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a Not applicable way to map back to standards? (Data Dictionary?) 4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient- Data are not collected by the network reported outcomes, etc.).

4.g.i. (Y/N) Does the network use natural language processing? No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, Not applicable etc.) or approaches (examples are machine learning, rule-based) are being used?

134 Criteria Answers 4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with No the network? 4.h.ii. How are the data transformed (i.e., based on what criteria are the data Not applicable aggregated)? 4.i. What data (statistical) analysis tools, if any, are available for researchers through the Yes - Creating tools to help users studying themselves make sense of their data -- data aggregation systems network?

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, No billing, and clinical records kept in individual places or lumped in with patient-level data?)

4.j.ii. What informatics tools are used? Not applicable

TuDiabetes.org with TuAnalyze

Criteria | Answers
1.a. How many people does the network cover or involve? | English (tudiabetes.org): 27,000; Spanish (tuesdiabetes.org): 20,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Yes. Studies are survey based and are added routinely; TuAnalyze has the capacity to cover users internationally as well as nationally.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not collected
1.b.i.2. Demographics: geography | US: 60+%; US with Canada, UK, India, and Australia: 90% of members
1.b.i.3. Demographics: age | Average age in the mid-40s; 80% of members between ages 35 and 65
1.b.i.4. Demographics: gender | Female: 60%
1.c.i. What is the total annual budget? | $70,000 to $75,000 (the Diabetes Hands Foundation receives $600,000)
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $65,000 to $70,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $5,000
1.c.ii. What are the current sources of funding? | Diabetes Hands Foundation
1.c.iii. How much does it cost each year to maintain and update the network? | Included in the amount of the annual budget dedicated to infrastructure and maintenance
1.d. How many years has this network existed? | 5
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Type 1 and type 2 diabetes
1.f. (Y/N) Does the network use informed consent forms? | No for TuDiabetes.org; yes for TuAnalyze
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific, meaning users choose what data may be seen by researchers
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients control what information they make public to other users, to the Internet community, and to researchers
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Collected in clinical trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | If a patient makes data public, researchers at Children's Hospital Boston can see them because they operate the TuAnalyze site. If data have been marked private, the researchers cannot see them individually; they can see data that users have marked private only in aggregate form.
1.g.iii.1.b. Policies for sharing data outside the network | A researcher approaches TuDiabetes.org with a ".edu" e-mail address and proof that the survey has been approved by the researcher's home IRB. If TuDiabetes.org also approves the survey, TuDiabetes.org (TUD) allows the researcher to post it on the website and sends e-mails inviting users to take it.
1.g.iii.1.c. Policies for protecting proprietary data | Data marked private can be viewed only in aggregate form
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | Weitzman ER, Adida B, Kelemen S, Mandl KD. Sharing data for public health research by members of an international online diabetes social network. PLoS ONE. 2011;6(4):e19256. doi:10.1371/journal.pone.0019256
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | None
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | NING platform and network; IP blocking to prevent spammers
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Site administrators are responsible for querying the database for TuDiabetes. Children's Hospital Boston researchers are responsible for querying the database for TuAnalyze and sending that information to the researcher.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED, etc.)? | No
4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | TuDiabetes asks for type of diabetes, how long the user has had it, type of therapy (optional), an A1C question (optional), location, name, and e-mail. TuAnalyze conducts a survey that serves as metadata for all other surveys the user fills out while using TuAnalyze; this survey asks for name, type of diabetes, type of therapy, and an A1C question.
4.g.i. (Y/N) Does the network use natural language processing? | Yes
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | TuDiabetes has entered an agreement with another company to assign key terms to an open-ended field on the website that asks users, "What do you want to get out of the community?"
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Aggregated based on the researcher's needs
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable
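The TuAnalyze visibility rule described in rows 1.g.iii.1.a and 1.g.iii.1.c (public data visible individually, private data visible only in aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual TuAnalyze implementation: the record type `A1cReport`, its fields, and the function `researcher_view` are hypothetical, and a count plus a mean stand in for whatever aggregates the site actually reports.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class A1cReport:
    # Hypothetical record: one self-reported A1C value shared by a user.
    user_id: str
    a1c: float
    is_public: bool


def researcher_view(reports):
    """Return what a researcher may see under the assumed rule.

    Public records are returned one by one; private records are never
    returned individually and only contribute to aggregate statistics.
    """
    public_rows = [r for r in reports if r.is_public]
    private_values = [r.a1c for r in reports if not r.is_public]
    aggregate = {
        "private_count": len(private_values),
        # Guard against an empty list: statistics.mean raises on no data.
        "private_mean_a1c": mean(private_values) if private_values else None,
    }
    return public_rows, aggregate


# Usage example with made-up values.
reports = [
    A1cReport("u1", 7.2, True),
    A1cReport("u2", 6.5, False),
    A1cReport("u3", 8.5, False),
]
rows, agg = researcher_view(reports)
print([r.user_id for r in rows])  # ['u1']
print(agg)  # {'private_count': 2, 'private_mean_a1c': 7.5}
```

Keeping the private values inside the function and returning only summary numbers mirrors the policy's intent that individual private entries never leave the site in row form.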

V. Inventory of Patient Registries

Autism Genetic Resource Exchange (AGRE)

Criteria | Answers
1.a. How many people does the network cover or involve? | 10,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | None
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Wall DP, Dally R, Luyster R, Jung JY, Deluca TF. Use of artificial intelligence to shorten the behavioral diagnosis of autism. PLoS One. 2012;7(8):e43855.
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | $800,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $400,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $400,000
1.c.ii. What are the current sources of funding? | National Institutes of Health (NIH)
1.c.iii. How much does it cost each year to maintain and update the network? | A percentage of the $400,000
1.d. How many years has this network existed? | 15
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Autism research involving families with two or more children with autism
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | The Principal Investigator must obtain IRB approval or exemption and then sign the AGRE Researcher Distribution Agreement.
1.g.iii.1.b. Policies for sharing data outside the network | Investigators go through a rigorous approval process, obtaining IRB approval and signing an agreement with AGRE.
1.g.iii.1.c. Policies for protecting proprietary data | AGRE has a series of protocols that protect the PHI housed in its database. Data are de-identified.
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | 1) Martin LA, Horriat NL. The effects of birth order and birth interval on the phenotypic expression of autism spectrum disorder. PLoS One. 2012;7(11):e51049. doi:10.1371/journal.pone.0051049. Epub 2012 Nov 30. PMID:23226454 2) Skafidas E, Testa R, Zantomio D, Chana G, Everall IP, Pantelis C. Predicting the diagnosis of autism spectrum disorder using gene pathway analysis. Mol Psychiatry. 2012 Sep 11. doi:10.1038/mp.2012.126. [Epub ahead of print] PMID:22965006 3) Hall D, Huerta MF, McAuliffe MJ, Farber GK. Sharing Heterogeneous Data: The National Database for Autism Research. Neuroinformatics. 2012 May 24. [Epub ahead of print] PMID:22622767
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Norris M, Lecavalier L, Edwards MC. The Structure of Autism Symptoms as Measured by the Autism Diagnostic Observation Schedule. J Autism Dev Disord. 2011 Aug 20.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | LCL DNA, transformed cell lines, serum, plasma, whole blood
3.c. What types of analysis are done on them? | Whole genome scan and fine mapping, high-density SNP
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not available
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED, etc.)? | Yes
4.c.ii. Which terminologies? | The Diagnostic and Statistical Manual of Mental Disorders (DSM)
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Genotype data, phenotype data, clinical data, medical data, demographic data
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Autism Treatment Network

Criteria | Answers
1.a. How many people does the network cover or involve? | 1,550
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Conducts studies within the realm of children with autism
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes, but within the same condition
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | White: 73%; African American: 6%; Asian: 5%; Latino: 10%
1.b.i.2. Demographics: geography | US and Canada
1.b.i.3. Demographics: age | < 5: 45%; 5-7: 20%; 7+: 32%
1.b.i.4. Demographics: gender | Male: 84%; Female: 16%
1.c.i. What is the total annual budget? | $4,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | $4,800,000
1.c.ii. What are the current sources of funding? | Health Resources and Services Administration, Maternal and Child Health Bureau
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 4
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Providing self-management support, shared decision-making, delivery system design, decision support, and coordination of care
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad. Patients consent only to have their de-identified data included in the patient registry.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Specific. There is a separate informed consent form.
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Registry data and EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Investigators within the network have access to the data
1.g.iii.1.b. Policies for sharing data outside the network | Clinics and/or researchers must first submit an application for a "Custom Form." Once the application is approved, the "Custom Form" can be filled out and submitted for approval. As soon as the form is approved, they have access to the registry data.
1.g.iii.1.c. Policies for protecting proprietary data | Stored data are de-identified
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals | 1) Coury D. Very little high-quality evidence to support most medications for children with autism spectrum disorders. J Pediatr. 2011;159(5):872-3. 2) Coury DL. Review: little evidence of clear benefit for most medical treatments for children with autism spectrum disorders. Evid Based Ment Health. 2011;14(4):105. Epub 2011 Sep 30. 3) Goldman S, McGrew S, Johnson K, Richdale A, Clemons T, Malow B. Sleep is associated with problem behaviors in children and adolescents with Autism Spectrum Disorders. Res Autism Spectr Disord. 2011;5(3):1223-1229. doi:10.1016/j.rasd.2011.01.010
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Not available
2.b.i. What is the evidence? | Not available
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) | Developed a set of proprietary or "custom" forms to be used across the clinics
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Providing EHR data
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Blood and urine
3.c. What types of analysis are done on them? | Fragile X testing and genotyping
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | None yet
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Current security protocols
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED, etc.)? | Yes
4.c.ii. Which terminologies? | DSM-IV (Diagnostic and Statistical Manual of Mental Disorders)
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics, clinical data, medications, conditions, outcomes
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable
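The two-stage "Custom Form" process described in row 1.g.iii.1.b (apply, receive approval, submit the form, receive approval, then gain access) can be modeled as an ordered sequence of steps. The sketch below is illustrative only and assumes the steps must happen strictly in that order; the names `Step` and `CustomFormRequest` are hypothetical and do not describe any actual Autism Treatment Network system.

```python
from enum import Enum


class Step(Enum):
    # Assumed order, following the ATN description of the process.
    APPLICATION_SUBMITTED = 1
    APPLICATION_APPROVED = 2
    FORM_SUBMITTED = 3
    FORM_APPROVED = 4  # registry data access begins after this step


class CustomFormRequest:
    """Hypothetical record of one clinic's or researcher's data request."""

    def __init__(self, requester):
        self.requester = requester
        self.completed = []

    def record(self, step):
        # Refuse further steps once the final approval has been recorded.
        if len(self.completed) == len(Step):
            raise ValueError("request is already complete")
        expected = Step(len(self.completed) + 1)
        # Refuse any step that skips ahead or repeats an earlier one.
        if step is not expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)

    @property
    def has_registry_access(self):
        return bool(self.completed) and self.completed[-1] is Step.FORM_APPROVED


# Usage example with a made-up requester.
req = CustomFormRequest("example clinic")
req.record(Step.APPLICATION_SUBMITTED)
req.record(Step.APPLICATION_APPROVED)
print(req.has_registry_access)  # False
req.record(Step.FORM_SUBMITTED)
req.record(Step.FORM_APPROVED)
print(req.has_registry_access)  # True
```

Encoding the order in the enum values keeps the rule in one place: a request cannot reach registry access without passing both approvals.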

144 Be The Match Bone Marrow Donor Registry

Criteria Answers 1.a. How many people does the network 10,000,000 cover or involve? 1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or None procedures 1.a.ii.1. Can the network be used for new Yes studies in the same or a different condition? 1.a.iii. (Y/N) Is there evidence from the past that show the network can be used for clinical Not available care delivery or quality improvement? 1.a.iii.1. What is the evidence? Not available 1.b.i.1. Demographics: racial/ethnic Not available 1.b.i.2. Demographics: geography Not available 1.b.i.3. Demographics: age Not available 1.b.i.4. Demographics: gender Not available 1.c.i. What is the total annual budget? $23,000,000 1.c.i.1. How much of that budget is dedicated Not available to infrastructure and maintenance? 1.c.i.2. How much of that budget is dedicated Not available to conducting studies? 1.c.ii. What are the current sources of U.S. Government Funding funding? 1.c.iii. How much does it cost each year to Not available maintain and update the network? 1.d. How many years has this network 16 existed? 1.e.i. (Y/N) Does the network have a focus Yes (i.e., topic area or purpose)? 1.e.i.1. What does the network focus on? Improving transplantation of marrow for patients who have Leukemia or Lymphoma 1.f. (Y/N) Does the network use informed Yes consent forms? 1.f.i. Do patients consent to the broad (meaning data may be analyzed for other Specific - The donor or patient is given details about the type of data asked for or sample needed and the purpose of the research) or specific use of their electronic research. data? 1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other Specific - The donor or patient is included in the research only if he or she agrees and signs a consent form. research) or specific use of their biological specimens? 1.f.iii. (Y/N) Can patients be re-contacted for Yes consent for a new study? 1.g.i. 
(Y/N) Are patients involved in the decision-making process on the use of the No data they provided to the network? 1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in Not applicable the decision-making process? 1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication Not applicable adherence, procedures, labs/imaging, health- related quality of life) 1.g.ii.2. What are the sources of Health care- Derived data collected in the network? (e.g., Other - Transplant Centers, Donor Centers, Cord Blood Banks, Collection Centers, Apheresis Centers, Laboratories, coded diagnostics, pharmacy orders, Repositories pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) 1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab Not applicable orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Any research project that involves the donors or patients also is reviewed and approved by their Institutional Review 1.g.iii.1.a. Data use and sharing policies for Board (IRB) before the research begins. The IRB continues to oversee each project until it is complete. IRB members are institutional investigators to collaborate with doctors, ethicists, and people of the community who have no stake in the research. The IRB exists to protect the rights of each other using the data our donors and patients who participate in research.

1.g.iii.1.b. Policies for sharing data outside the network | Any research project that involves the donors or patients is also reviewed and approved by their Institutional Review Board (IRB) before the research begins. The IRB continues to oversee each project until it is complete. IRB members are doctors, ethicists, and people of the community who have no stake in the research. The IRB exists to protect the rights of our donors and patients who participate in research.
1.g.iii.1.c. Policies for protecting proprietary data | When a donor joins the Be The Match Registry®, he or she gives a swab of cheek cells or a blood sample and is assigned a donor ID number. The blood or cell sample is labeled only with the donor ID number and is tested for the donor's tissue type. The only time the blood or cell sample and ID number are ever linked with a donor's name is when it is necessary to contact a donor to ask for more testing because he or she matches a patient. All staff and subcontractors that provide services for Be The Match, such as storing blood and cell samples, are required by law and contract to keep donor-identifying information private.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | 1) Lenalidomide after stem-cell transplantation for multiple myeloma. McCarthy PL, Owzar K, Hofmeister CC, Hurd DD, Hassoun H, Richardson PG, Giralt S, Stadtmauer EA, Weisdorf DJ, Vij R, Moreb JS, Callander NS, Van Besien K, Gentile T, Isola L, Maziarz RT, Gabriel DA, Bashey A, Landau H, Martin T, Qazilbash MH, Levitan D, McClune B, Schlossman R, Hars V, Postiglione J, Jiang C, Bennett E, Barry S, Bressler L, Kelly M, Seiler M, Rosenbaum C, Hari P, Pasquini MC, Horowitz MM, Shea TC, Devine SM, Anderson KC, Linker C. New England Journal of Medicine 366(19):1770-1781
2) Costs and cost-effectiveness of hematopoietic cell transplantation. Preussler JM, Denzen EM, Majhail NS. Biology of Blood & Marrow Transplantation 18(11):1620-1628

3) A combined DPA1~DPB1 amino acid epitope is the primary unit of selection on the HLA-DP heterodimer. Hollenbach JA, Madbouly A, Gragert L, Vierra-Green C, Flesch S, Spellman S, Begovich A, Noreen H, Trachtenberg E, Williams T, Yu N, Shaw B, Fleischhauer K, Fernandez-Vina M, Maiers M. Immunogenetics 64(8):559-569
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Outcomes after matched unrelated donor versus identical sibling hematopoietic cell transplantation in adults with acute myelogenous leukemia. Wael Saber, Shaun Opie, J. Douglas Rizzo, Mei-Jie Zhang, Mary M. Horowitz, Jeff Schriber. Blood. 2012 April 26; 119(17): 3908–3916. Prepublished online 2012 February 10. doi: 10.1182/blood-2011-09-381699
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | By using Health and Human Services standards and noting the dates of change, but mostly by correcting the data elements by hand
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By referring patients and participating in research
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | Acute graft-versus-host disease biomarkers measured during therapy can predict treatment outcomes: a Blood and Marrow Transplant Clinical Trials Network study. John E. Levine, Brent R. Logan, Juan Wu, Amin M. Alousi, Javier Bolaños-Meade, James L. M. Ferrara, Vincent T. Ho, Daniel J. Weisdorf, Sophie Paczesny. Blood. 2012 April 19; 119(16): 3854–3860. Prepublished online 2012 March 1. doi: 10.1182/blood-2012-01-403063
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Whole blood, cryopreserved whole blood, plasma, blood spotted on filter paper, peripheral blood mononuclear cells (PBMC) viable and non-viable, B-lymphoblastoid cell lines (B-LCL) viable and non-viable, granulocytes, serum, DNA, whole genome amplified DNA
3.c. What types of analysis are done on them? | Human Leukocyte Antigen characteristics
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Human Leukocyte Antigen characteristics
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | A preliminary search is done using donor HLA characteristics. A more formal search can then be done by entering the patient's name, HLA, disease, etc., which generates a report sorted by match rank. The search also links to other registries' databases.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No

4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No

4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographics, condition, HLA

4.g.i. (Y/N) Does the network use natural language processing? | No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No

4.j.ii. What informatics tools are used? | Not applicable

Breast Cancer Family Registry (BCFR)

Criteria | Answers
1.a. How many people does the network cover or involve? | See Table 1
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The BCFR conducts special recruitment initiatives, including initiatives to recruit Ashkenazi families and racial and ethnic minorities, to further broaden its study of breast cancer.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not available
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Ontario Cancer Center (Canada), University of Southern California Consortium, University of Melbourne (Australia), Hawaii Cancer Registry, Mayo Clinic (Rochester, MN), Fred Hutchinson Cancer Research Center (Seattle, WA)
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | See Table 1
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | National Cancer Institute (NCI)
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | Not available
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Breast cancer in families
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not available
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Not available
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Data Use Agreements and Data Submission Agreements are required from sites that send data.
1.g.iii.1.b. Policies for sharing data outside the network | Outside investigators must collaborate with a member of the consortium.

1.g.iii.1.c. Policies for protecting proprietary data | The individual clinical sites own the data.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | 1) A meta-analysis of genome-wide association studies of breast cancer identifies two novel susceptibility loci at 6q14 and 20q11. Hum Mol Genet. 2012 Dec 15;21(24):5373-84.

2) Better cancer biomarker discovery through better study design. Eur J Clin Invest. 2012 Dec;42(12):1350-9.
3) Risk of Asynchronous Contralateral Breast Cancer in Noncarriers of BRCA1 and BRCA2 Mutations With a Family History of Breast Cancer: A Report From the Women's Environmental Cancer and Radiation Epidemiology Study. J Clin Oncol. 2013 Feb 1;31(4):433-9.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Genes, Environment and Breast Cancer Risk: the 15 Year Follow-Up of the Prof-SC (http://maps.cancer.gov/overview/DCCPSGrants/abstract.jsp?applId=8196169&term=CA159868)
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Healthcare organizations agree to share patient data with the data coordination center.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Not available
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Blood/buccal samples, cell lines, tumor material
3.c. What types of analysis are done on them? | Not available
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not available
4.a. What type of security technology does the network use? | Data are held at the Georgetown UIS Laurel Data Center. Users must authenticate, and the backend is only available to programmers. The NetID system at Georgetown requires that the principal investigator at Georgetown approve everyone who receives an ID to the database. Data are not sent via e-mail or transferred on hard drives.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | A project concept is submitted to the steering committee. If it is approved, the data coordination center sends the investigator a link to the data request form. The coordination center processes the data request by querying the central database, puts the results into the format that the investigator requests, and posts them on its website. The investigator logs into the website and downloads the data.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Not available

4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes

4.d.ii. Which CDM is used? | Home grown
4.d.iii. How are the data transformed and mapped? | Common data elements were created by the central hub working group, and the query is sent to the individual sites. The data elements are captured and sent back to the central hub.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Home grown standards

4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Previous cancer diagnoses in the patient and the patient's parents, siblings, and children (all cancers except non-melanoma skin cancers and cervical carcinoma in situ); dates of all cancer diagnoses and deaths; demographics, race/ethnicity, religion; personal history of cancer, breast and ovarian surgeries, radiation exposure, smoking and alcohol consumption, menstrual and pregnancy history, breast-feeding, hormone use, weight, height, and physical activity; frequency of food consumption and portion size; a 30 ml blood sample; paraffin blocks are requested for individuals with a history of breast or ovarian cancer
4.g.i. (Y/N) Does the network use natural language processing? | No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No, but they are de-identified
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | R code and SAS scripts

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No

4.j.ii. What informatics tools are used? | Not applicable

Table 1. Family Recruitment

*Table from http://epi.grants.cancer.gov/CFR/about_breast.html

BreastCancerTrials.org (BCT)

Criteria | Answers
1.a. How many people does the network cover or involve? | 29,977
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Each week, 4-5 new clinical trials for breast cancer are added to the site.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No. BCT.org does not track whether a patient signed up for a study or what the results of that study were.
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | White (Non-Hispanic): 89%; White (Hispanic): 3%; Asian: 4%; African-American: 3%; American Indian/Alaskan Native: 0.6%; Pacific Islander: 0.03%
1.b.i.2. Demographics: geography | See Table 1
1.b.i.3. Demographics: age | Total patients: < 30: 1.5%; 30-39: 10%; 40-49: 32%; 50-59: 38%; 60-69: 16%; 70-79: 2.9%; 80: 0.3%
1.b.i.4. Demographics: gender | Female: 80%; Male: 20%
1.c.i. What is the total annual budget? | $350,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $25,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $60,000
1.c.ii. What are the current sources of funding? | Safeway Food Stores, California Endowment, research and collaboration with community-based organizations, CA Breast Cancer Research Program, individual donors
1.c.iii. How much does it cost each year to maintain and update the network? | This amount is included in the portion of the budget dedicated annually to infrastructure and maintenance.
1.d. How many years has this network existed? | 5
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Breast Cancer Clinical Trials
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Users have complete control over what is contained in their "Health History" and with whom it can be shared. BCT never shares a user's personal health information with any individual or organization without the user's explicit permission.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | BCT will only release patient information to Trial Site Network sites that users have explicitly requested BCT to contact on their behalf. BCT requires that all BCT Trial Site Network sites agree to protect the privacy and security of BCT-referred patient health information as they would their own patient records and in full compliance with their institution's HIPAA policies and procedures. Furthermore, BCT requires that research sites only permit individuals who have been authorized by a designated BCT liaison to log onto BCT and view patient records.
1.g.iii.1.b. Policies for sharing data outside the network | Data are not shared outside the network unless a patient allows the registry to connect him or her to researchers using the SecureConnect program.
1.g.iii.1.c. Policies for protecting proprietary data | All data sharing is patient directed; data can be shared on behalf of the patient only through SecureConnect.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | No studies published
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Not available
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Healthcare organizations conduct the trials to which BCT connects its users.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Not applicable
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | All patients and researchers have user IDs and passwords.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | SecureConnect: If a trial appears in the matching service and the patient wants to participate, the patient notifies BCT that he or she would like to participate in the trial. BCT then sends a notification to the researcher saying that the patient is interested in the trial. The researcher can then log on to the BCT site, see the patient's medical history, and decide whether to contact the patient.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No

4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No

4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Pathology reports detailing precisely what the pathologist saw in the tumor tissue, breast cancer staging information, imaging reports (such as mammograms, ultrasounds, bone scans, CT, MRI, and PET scans), and breast cancer treatment or survivorship plans.

4.g.i. (Y/N) Does the network use natural language processing? | No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Based on criteria of the clinical trial
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No

4.j.ii. What informatics tools are used? | Not applicable

Table 1. Geographical Distribution of BreastCancerTrials Users in the United States

*Submitted by interviewee from BreastCancerTrials.org

California Cancer Registry (CCR)

Criteria | Answers
1.a. How many people does the network cover or involve? | 1,277,200
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The registry mainly conducts studies involving cancer research.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Not available
1.b.i.1. Demographics: racial/ethnic | See Table 1
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | See Table 1
1.c.i. What is the total annual budget? | $1,200,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Percentage of the annual budget
1.c.i.2. How much of that budget is dedicated to conducting studies? | Percentage of the annual budget
1.c.ii. What are the current sources of funding? | Centers for Disease Control and Prevention (CDC) and Surveillance, Epidemiology and End Results (SEER)
1.c.iii. How much does it cost each year to maintain and update the network? | Percentage of the annual budget
1.d. How many years has this network existed? | 5
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Cancer Research
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Cancer researchers must go through a rigorous process to access any CCR data. The CCR will only release patient contact information to qualified researchers under tightly controlled circumstances where the research has first been approved by the California State Committee for the Protection of Human Subjects (CPHS) Institutional Review Board. CPHS evaluates research proposals to ensure that patients' rights are protected and that the research is justified. Additionally, a federally approved Institutional Review Board (IRB) at the researcher's institution must approve the research proposal. This IRB also ensures that patient rights are monitored and protected.

1.g.iii.1.b. Policies for sharing data outside the network | Cancer researchers must go through a rigorous process to access any CCR data. The CCR will only release patient contact information to qualified researchers under tightly controlled circumstances where the research has first been approved by the California State Committee for the Protection of Human Subjects (CPHS) Institutional Review Board. CPHS evaluates research proposals to ensure that patients' rights are protected and that the research is justified. Additionally, a federally approved Institutional Review Board (IRB) at the researcher's institution must approve the research proposal. This IRB also ensures that patient rights are monitored and protected.
1.g.iii.1.c. Policies for protecting proprietary data | Safeguards are in place to protect the data, but not all HIPAA identifiers are removed.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | 1) Y. Zak, K. F. Rhoads and B. C. Visser. Predictors of Surgical Intervention for Hepatocellular Carcinoma: Race, Socioeconomic Status, and Hospital Type. Arch Surg. 2011;146(7):778-84.

2) H. Zheng, W. Zhang, J. Z. Ayanian, L. B. Zaborski and A. M. Zaslavsky. Profiling Hospitals by Survival of Patients with Colorectal Cancer. Health Serv Res. 2011;46(3):729-46.

3) M. Cockburn, P. Mills, X. Zhang, J. Zadnick, D. Goldberg and B. Ritz. Prostate Cancer and Ambient Pesticide Exposure in Agriculturally Intensive Areas in California. Am J Epidemiol. 173(11):1280-8.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Providing EHR data
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Barracuda system, RSA for two-factor authentication, IP filtering, external and internal firewalls
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes

4.c.ii. Which terminologies? | ICD-9, SEER ICD-O
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes

4.d.ii. Which CDM is used? | North American Association of Central Cancer Registries (NAACCR) Data Model
4.d.iii. How are the data transformed and mapped? | Code crosswalks allow data to be mapped and transformed from the source.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes

157 Criteria Answers 4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a North American Association of Central Cancer Registries Data Standards way to map back to standards? (Data Dictionary?) 4.f. List the types of data that are being collected or accessed and incorporated into Patient’s name, address at time of diagnosis, sex, race, and age at diagnosis, type of cancer (such as breast cancer) and the network (e.g., EHR data, claims, patient- stage of disease at time of diagnosis, whether the patient had surgery, radiation, or chemotherapy as the first course of reported outcomes, etc.). treatment.

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, Not available etc.) or approaches (examples are machine learning, rule-based) are being used? 4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with No the network? 4.h.ii. How are the data transformed (i.e., based on what criteria are the data Not applicable aggregated)? 4.i. What data (statistical) analysis tools, if any, are available for researchers through the The SEER*Stat tool provided by SEER National Cancer Institute network?

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, No billing, and clinical records kept in individual places or lumped in with patient-level data?)

4.j.ii. What informatics tools are used? Not applicable

Table 1

* Table from http://www.ccrcal.org/pdf/Reports/ACS_2012.pdf, page 23
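The California Cancer Registry's answer to 4.d.iii says only that code crosswalks map and transform data from the source. As a rough illustration of what a crosswalk does, the Python sketch below translates fields through per-field lookup tables and flags values it cannot map. The field name `sex`, the local codes, and the target codes are hypothetical placeholders, not values copied from the North American Association of Central Cancer Registries data standards.

```python
# Minimal sketch of a code crosswalk. All codes below are hypothetical
# placeholders, not values from any registry data dictionary.

# Per-field lookup tables: local source code -> target registry code.
CROSSWALKS = {
    "sex": {"M": "1", "F": "2", "U": "9"},
}

UNMAPPED = "UNMAPPED"


def apply_crosswalk(record: dict, crosswalks: dict = CROSSWALKS) -> dict:
    """Return a copy of `record` with crosswalked fields translated.

    Fields that have no crosswalk are copied unchanged; values missing from
    a field's lookup table are replaced with UNMAPPED for manual review.
    """
    translated = {}
    for field, value in record.items():
        table = crosswalks.get(field)
        translated[field] = value if table is None else table.get(value, UNMAPPED)
    return translated


if __name__ == "__main__":
    source = {"patient_id": "A001", "sex": "F", "site": "breast"}
    print(apply_crosswalk(source))
    # {'patient_id': 'A001', 'sex': '2', 'site': 'breast'}
```

Flagging unmapped values, rather than silently passing them through, is one reasonable design choice: it surfaces source codes that the crosswalk does not yet cover so they can be reviewed.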

California Immunization Registry (CAIR)

Criteria | Answers
1.a. How many people does the network cover or involve? | 12,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The registry's target is to cover 62 percent of children under the age of 6 years. The registry plans to expand to allow schools to access immunization information electronically.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | The registry provides reminders when an immunization is due or overdue, consolidates immunizations into a single record, provides current recommendations and information on new vaccines, helps identify high-risk and under-immunized populations, and generates a variety of reports, including coverage reports (e.g., HEDIS).
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | All counties in California except Imperial County
1.b.i.3. Demographics: age | More heavily weighted toward ages 0-18 years, although all ages are included
1.b.i.4. Demographics: gender | Comparable to the gender composition of the state of California
1.c.i. What is the total annual budget? | $2,600,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $2,600,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $0
1.c.ii. What are the current sources of funding? | All federal; not further specified
1.c.iii. How much does it cost each year to maintain and update the network? | This amount is included in the annual budget dedicated to infrastructure and maintenance.
1.d. How many years has this network existed? | 15
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Immunization records for residents of California
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable; covered by HIPAA, which allows collection of data that are required by law to be sent to the database, but a disclosure is shared with all parents.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | None
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | None
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Data Exchange Agreement between doctors and the registry. Additionally, epidemiologists have internal access during outbreak investigations.
1.g.iii.1.b. Policies for sharing data outside the network | The only outside access is by health plans, for HEDIS determinations.
1.g.iii.1.c. Policies for protecting proprietary data | User registry access agreements define conditions for data usage.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | Freeman VA, DeFriese GH. The Challenge and Potential of Childhood Immunization Records. Annu Rev Public Health. 2003;24:227-46.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Health care providers and public health departments link their EHR systems with CAIR and update patients' immunization records in the system.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Secure File Transfer Protocol (SFTP)
4.b.i. (Y/N) Are queries distributed via a central hub? | Data are sent via Secure File Transfer Protocol (SFTP).
4.b.ii. What is the architecture of the query distribution? | No querying system, because the network uses an SFTP server
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes
4.c.ii. Which terminologies? | HL-7
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Not available
4.d.iii. How are the data transformed and mapped? | Through an export process that retrieves immunization data from the clinic's EHR system and then exports it to CAIR in HL-7 or flat-file format.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Immunization records from EHRs
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | HL-7
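CAIR's answer to 4.d.iii describes an export process that retrieves immunization data from a clinic's EHR and sends it to the registry in HL-7 or flat-file format. A minimal Python sketch of the flat-file path follows. The `Immunization` record, its field names, the sample values, and the pipe delimiter are all hypothetical; CAIR's actual file layout is set by the registry, and the SFTP upload step is not shown.

```python
# Sketch of a flat-file export of immunization rows pulled from an EHR.
# The record layout, field names, and "|" delimiter are hypothetical.
import csv
import io
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Immunization:
    patient_id: str        # clinic's internal identifier (hypothetical)
    vaccine_code: str      # vaccine code; sample value is illustrative only
    administered_on: date
    lot_number: str


def export_flat_file(rows, out) -> int:
    """Write a header line plus one delimited line per immunization to `out`.

    Returns the number of immunization rows written (header excluded).
    """
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow([f.name for f in fields(Immunization)])
    count = 0
    for row in rows:
        writer.writerow([row.patient_id, row.vaccine_code,
                         row.administered_on.isoformat(), row.lot_number])
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    n = export_flat_file(
        [Immunization("P001", "08", date(2013, 1, 15), "LOT123")], buffer)
    print(n)
    print(buffer.getvalue(), end="")
```

Writing to any file-like object (here an in-memory `io.StringIO`) keeps the sketch easy to test; a real export would write to a file and hand it to the separate SFTP transfer step.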

California Joint Replacement Registry (CJRR)

Criteria | Answers
1.a. How many people does the network cover or involve? | 11 hospitals and 54 surgeons
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Mainly conducts studies involving joint replacement procedures and outcomes
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Only in the same condition
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | http://www.caljrr.org/pdf/Rationale_for_CJRR.pdf
1.b.i.1. Demographics: racial/ethnic | Confidential
1.b.i.2. Demographics: geography | Confidential
1.b.i.3. Demographics: age | Confidential
1.b.i.4. Demographics: gender | Confidential
1.c.i. What is the total annual budget? | Confidential
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? | Confidential
1.c.ii. What are the current sources of funding? | California HealthCare Foundation and Pacific Business Group on Health
1.c.iii. How much does it cost each year to maintain and update the network? | Confidential
1.d. How many years has this network existed? | 3
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Hip and knee joint replacement
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad; patients give their surgeon and hospital permission to share information about them, their surgery, and how they felt before and after it with the database.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Participants own the data from their own institutions even after those data have been contributed to the CJRR. Specific terms of use for the data provided by a participant are outlined in Business Associate Agreements and Participation Agreements agreed upon by each participating site and the CJRR.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | The registry is new and does not yet allow others to access the data.
1.g.iii.1.b. Policies for sharing data outside the network | The registry is new and does not yet allow others to access the data.
1.g.iii.1.c. Policies for protecting proprietary data | To protect patients' Social Security numbers (SSNs), special software scrambles each SSN before any information is sent to the CJRR, creating a new number to track each patient. Only this scrambled number, not the SSN, is saved in the registry database, and only the hospital where the patient received care can match the SSN to the scrambled code; the CJRR cannot do this matching. Data are stored on dedicated servers with physical and electronic protections, and the registry verifies that all communications with it come from valid ("authenticated") sources.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Patients complete a survey before surgery and at several points afterward (6 months, one year). The survey collects information that only the patient knows, such as whether they can walk better after surgery and whether they are free from pain. It takes about 20 minutes to complete, and the questions do not require long answers. Participants fill out the surveys through a secure online application reached from an e-mail link sent by their hospital or surgeon.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | By referring patients
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | No
4.a. What type of security technology does the network use? | Data are stored at a data center that is not accessible via the web or online. Sites use SFTP to upload data to the database. Users must contact the registry and complete an access process to obtain the data.
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes
4.c.ii. Which terminologies? | ICD-9
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Home grown
4.d.iii. How are the data transformed and mapped? | Not available
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | A data dictionary
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | The hospital where the surgery took place; the surgeon; the specific type of implant received; which side of the body was operated on; the medications given before and after surgery; other selected information that can affect the results of surgery, such as the patient's age and whether the patient has other conditions like diabetes or heart disease; information from the patient about how they felt before and after surgery ("patient-reported outcomes"), collected through surveys filled out on a secure website before surgery and at a few times afterward (e.g., six months and one year); and the patient's scrambled Social Security number, which identifies the patient as ...
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Data are aggregated, but at the patient level, and can be identified or de-identified depending on what the researchers requested. The data are then sent to the researcher on an encrypted disk.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Not available
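The CJRR answer to 1.g.iii.1.c says that software at the hospital scrambles each SSN into a tracking number that only the hospital can match back. One common way to get that property is a keyed hash (HMAC-SHA256) whose secret key stays at the hospital. The Python sketch below assumes that technique; the source does not say which algorithm CJRR's software actually uses, and the function names are hypothetical.

```python
# Sketch of keyed SSN pseudonymization using HMAC-SHA256 (an assumed
# technique; the source does not say which algorithm CJRR's software uses).
import hashlib
import hmac
import secrets

DIGITS = "0123456789"


def pseudonymize_ssn(ssn: str, site_key: bytes) -> str:
    """Return a stable 64-character hex pseudonym for an SSN.

    The same SSN and key always give the same pseudonym, so the registry can
    link a patient's records over time without ever seeing the SSN.
    """
    normalized = "".join(ch for ch in ssn if ch in DIGITS)  # drop dashes/spaces
    if len(normalized) != 9:
        raise ValueError("expected a 9-digit SSN")
    return hmac.new(site_key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


def matches(ssn: str, pseudonym: str, site_key: bytes) -> bool:
    """Hospital-side check: recompute the pseudonym and compare in constant time."""
    return hmac.compare_digest(pseudonymize_ssn(ssn, site_key), pseudonym)


if __name__ == "__main__":
    hospital_key = secrets.token_bytes(32)  # stays at the hospital; never sent to the registry
    token = pseudonymize_ssn("123-45-6789", hospital_key)
    print(len(token))                                  # 64
    print(matches("123456789", token, hospital_key))   # True
```

The key is what makes this work: there are at most one billion nine-digit SSNs, so an unkeyed hash such as plain SHA-256 could be reversed by hashing every candidate, whereas without the hospital's key the registry cannot recompute or match the codes.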

The Colon Cancer Family Registry (CCFR)

Criteria | Answers
1.a. How many people does the network cover or involve? | See Table 1
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | CCFR conducts its enrollment efforts in phases. In Phase I recruitment (1998-2002), population-based sampling ranged from all incident cases of colorectal cancer to a subsample based on age at diagnosis and/or family cancer history. During Phase II (2002-2007), population-based recruitment targeted cases diagnosed before the age of 50 years, which are more likely to be attributable to genetic factors.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Fox Chase Cancer Center (Philadelphia, PA), Columbia University (New York), University of Utah, University of Melbourne (Australia), Ontario Cancer Center (Canada), Northern California Cancer Center (Fremont), University of California, Irvine
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | See Table 1
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | National Cancer Institute
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | Not available
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Colon cancer in families
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not available
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Not available
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Data Use Agreements and Data Submission Agreements with sites that send data. Within the consortium, investigators collaborate freely.
1.g.iii.1.b. Policies for sharing data outside the network | Outside investigators must collaborate with a member of the consortium.
1.g.iii.1.c. Policies for protecting proprietary data | The individual clinical sites own the data.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | 1) Peters U, Hutter CM, Hsu L, Schumacher FR, Conti DV, Carlson CS, Edlund CK, Haile RW, Gallinger S, Zanke BW, Lemire M, Rangrej J, Vijayaraghavan R, Chan AT, Hazra A, Hunter DJ, Ma J, Fuchs CS, Giovannucci EL, Kraft P, Liu Y, Chen L, Jiao S, Makar KW, Taverna D, Gruber SB, Rennert G, Moreno V, Ulrich CM, Woods MO, Green RC, Parfrey PS, Prentice RL, Kooperberg C, Jackson RD, Lacroix AZ, Caan BJ, Hayes RB, Berndt SI, Chanock SJ, Schoen RE, Chang-Claude J, Hoffmeister M, Brenner H, Frank B, Bézieau S, Küry S, Slattery ML, Hopper JL, Jenkins MA, Le Marchand L, Lindor NM, Newcomb PA, Seminara D, Hudson TJ, Duggan DJ, Potter JD, Casey G. Meta-analysis of new genome-wide association studies of colorectal cancer risk. Hum Genet. 2012 Feb;131(2):217-34.
 | 2) Adams SV, Newcomb PA, Burnett-Hartman AN, White E, Mandelson MT, Potter JD. Circulating 25-hydroxyvitamin-D and risk of colorectal adenomas and hyperplastic polyps. Nutr Cancer. 2011 Apr;63(3):319-26.
 | 3) Bertuccio P, La Vecchia C, Silverman DT, Petersen GM, Bracci PM, Negri E, Li D, Risch HA, Olson SH, Gallinger S, Miller AB, Bueno-de-Mesquita HB, Talamini R, Polesel J, Ghadirian P, Baghurst PA, Zatonski W, Fontham ET, Bamlet WR, Holly EA, Lucenteforte E, Hassan M, Yu H, Kurtz RC, Cotterchio M, Su J, Maisonneuve P, Duell EJ, Bosetti C, Boffetta P. Cigar and pipe smoking, smokeless tobacco use and pancreatic cancer: an analysis from the International Pancreatic Cancer Case-Control Consortium (PanC4). Ann Oncol. 2011 Jun;22(6):1420-6.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Lowery JT, Marcus A, Kinney A, Bowen D, Finkelstein DM, Horick N, Garrett K, Haile R, Sandler R, Ahnen DJ. The Family Health Promotion Project (FHPP): design and baseline data from a randomized trial to increase colonoscopy screening in high risk families. Contemp Clin Trials. 2012 Mar;33(2):426-35. doi: 10.1016/j.cct.2011.11.005. Epub 2011 Nov 12.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Healthcare organizations agree to share patient data with the data coordination center.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | Lowery JT, Marcus A, Kinney A, Bowen D, Finkelstein DM, Horick N, Garrett K, Haile R, Sandler R, Ahnen DJ. The Family Health Promotion Project (FHPP): design and baseline data from a randomized trial to increase colonoscopy screening in high risk families. Contemp Clin Trials. 2012 Mar;33(2):426-35. doi: 10.1016/j.cct.2011.11.005. Epub 2011 Nov 12.
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Blood/buccal samples, cell lines, tumor material
3.c. What types of analysis are done on them? | Blood sample separation and aliquoting (or tissue sectioning)
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not available
4.a. What type of security technology does the network use? | Data are held at the Georgetown UIS Laurel Data Center. Authentication for users and the backend is only available to programmers. The NetID system at Georgetown requires that the PI at Georgetown approve everyone who receives an ID to the database. Data are not sent via e-mail or transferred on hard drives.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | A project concept is submitted to the steering committee. If it is approved, the data coordination center sends the investigator a link to the data request form. The coordination center then processes the data request by querying the central database, puts the results into the format the investigator requests, and posts them on its website. The investigator logs into the website and downloads the data.
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Not available
4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Yes
4.d.ii. Which CDM is used? | Home grown
4.d.iii. How are the data transformed and mapped? | Common data elements were created by the central hub working group. The query is sent to the individual sites, and the data elements are captured and sent back to the central hub.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Home grown standards
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Information on the number, sex, and birthdates of first-degree relatives (parents, siblings, and children), their cancer history, vital status, and, if deceased, date of death. All cancers, except for nonmelanoma skin cancers, were recorded with dates of diagnoses. Information on established and suspected risk factors for colorectal cancer, including medical history and medication use, reproductive history (for female participants), physical activity, demographics, alcohol and tobacco use, race and ethnicity, and limited dietary data; blood and paraffin-embedded tumor tissue.
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No, but data are de-identified
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | R code and SAS scripts
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Table 1. Family Recruitment

*Table from http://epi.grants.cancer.gov/CFR/about_colon.html
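The CCFR answer to 4.b.ii describes a request lifecycle: a concept goes to the steering committee, approved requests receive a data request form, the coordination center queries the central database and posts formatted results, and the investigator downloads them. The Python sketch below models that sequence as a small state machine that rejects out-of-order steps. The state names, the `DataRequest` class, and the assumption that a concept is either approved or rejected are hypothetical modeling choices, not a description of CCFR software.

```python
# Hypothetical model of the CCFR data-request lifecycle described in 4.b.ii.
from enum import Enum, auto


class RequestState(Enum):
    CONCEPT_SUBMITTED = auto()  # project concept sent to the steering committee
    REJECTED = auto()           # concept not approved
    FORM_SENT = auto()          # coordination center sent the data request form link
    PROCESSING = auto()         # central database queried, output formatted
    POSTED = auto()             # formatted data placed on the website
    DOWNLOADED = auto()         # investigator logged in and downloaded the data


# Each state maps to the set of states allowed to follow it.
TRANSITIONS = {
    RequestState.CONCEPT_SUBMITTED: {RequestState.FORM_SENT, RequestState.REJECTED},
    RequestState.FORM_SENT: {RequestState.PROCESSING},
    RequestState.PROCESSING: {RequestState.POSTED},
    RequestState.POSTED: {RequestState.DOWNLOADED},
    RequestState.REJECTED: set(),
    RequestState.DOWNLOADED: set(),
}


class DataRequest:
    """Tracks one investigator's request and refuses out-of-order steps."""

    def __init__(self, investigator: str) -> None:
        self.investigator = investigator
        self.state = RequestState.CONCEPT_SUBMITTED
        self.history = [self.state]

    def advance(self, new_state: RequestState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    request = DataRequest("example investigator")
    for step in (RequestState.FORM_SENT, RequestState.PROCESSING,
                 RequestState.POSTED, RequestState.DOWNLOADED):
        request.advance(step)
    print([state.name for state in request.history])
    # ['CONCEPT_SUBMITTED', 'FORM_SENT', 'PROCESSING', 'POSTED', 'DOWNLOADED']
```

Keeping the allowed transitions in one table makes the workflow easy to audit: for example, data cannot be posted for a concept the steering committee has not approved.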

Cystic Fibrosis Patient Registry

Criteria Answers 1.a. How many people does the network 27,000 cover or involve? 1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or Patient data includes any new treatments or studies that the cystic fibrosis (CF) patient is participating in procedures 1.a.ii.1. Can the network be used for new Yes studies in the same or a different condition? 1.a.iii. (Y/N) Is there evidence from the past that show the network can be used for clinical Yes care delivery or quality improvement? Annual center-level reports inform healthcare professionals of their current practice patterns and clinical outcomes, and allow comparisons to the national averages. Patient data are continually updated and it allows the healthcare community 1.a.iii.1. What is the evidence? to see a comprehensive medical description of the CF population as a whole, to see the impact of specific treatments, and gauge the care of the CF patients based on the data. 1.b.i.1. Demographics: racial/ethnic See Table 1 1.b.i.2. Demographics: geography Not available 1.b.i.3. Demographics: age See Table 1 1.b.i.4. Demographics: gender See Table 1 1.c.i. What is the total annual budget? Not available 1.c.i.1. How much of that budget is dedicated Not available to infrastructure and maintenance? 1.c.i.2. How much of that budget is dedicated Not available to conducting studies? 1.c.ii. What are the current sources of Cystic Fibrosis Foundation funding? 1.c.iii. How much does it cost each year to Not available maintain and update the network? 1.d. How many years has this network 57 existed? 1.e.i. (Y/N) Does the network have a focus Yes (i.e., topic area or purpose)? 1.e.i.1. What does the network focus on? Cystic Fibrosis 1.f. (Y/N) Does the network use informed Yes consent forms? 1.f.i. 
Do patients consent to the broad (meaning data may be analyzed for other Specific - Patients must sign an informed consent to participate in the registry and then an additional consent for any study research) or specific use of their electronic they participate in. data? 1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other Broad research) or specific use of their biological specimens? 1.f.iii. (Y/N) Can patients be re-contacted for Yes consent for a new study? 1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the No data they provided to the network? 1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in Not applicable the decision-making process? 1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication Self-Reported adherence, procedures, labs/imaging, health- related quality of life) 1.g.ii.2. What are the sources of Health care- Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, EHR pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) 1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab Not applicable orders, diagnostic results, imaging data, biospecimen, health-related quality of life) 1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with Not available each other using the data

1.g.iii.1.b. Policies for sharing data outside the network | Center-level data are available publicly on the CF Foundation website (www.cff.org).
1.g.iii.1.c. Policies for protecting proprietary data | Not available
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Yen EH, Quinton H, Borowitz D. Better Nutritional Status in Early Childhood is Associated with Improved Clinical Outcomes and Survival in Patients with Cystic Fibrosis. J Pediatr. 2012 Oct 11. [Epub ahead of print]
2) Quon BS, Psoter K, Mayer-Hamblett N, Aitken ML, Li CI, Goss CH. Disparities in Access to Lung Transplantation for Cystic Fibrosis Patients by Socioeconomic Status. Am J Respir Crit Care Med. 2012 Sep 13. [Epub ahead of print]
3) Quon BS, Mayer-Hamblett N, Aitken ML, Goss CH. Risk of Post Lung Transplant Renal Dysfunction in Adults with Cystic Fibrosis. Chest. Published online before print January 5, 2012. doi: 10.1378/chest.11-1926.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Patient data are updated and forwarded to the registry after each visit, and patients fill out an annual questionnaire.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Over 100 participating clinics update and send patient data to the registry.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Yes
2.d.i.1. What is the evidence? | Knowles MR, Hohneker KW, Zhou Z, et al. A Controlled Study of Adenoviral-Vector-Mediated Gene Transfer in the Nasal Epithelium of Patients with Cystic Fibrosis. N Engl J Med. September 1995.
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Blood, urine, stool, and tissue from CF clinical trials
3.c. What types of analysis are done on them? | Spirometry, exacerbations, blood inflammatory mediators, LRT microbiology, growth, sweat chloride, sputum inflammatory mediators
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Spirometry, exacerbations, blood inflammatory mediators, LRT microbiology, growth, sweat chloride, sputum inflammatory mediators
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | The researcher submits the data request to the Cystic Fibrosis Foundation via a central hub and, if the request is approved, the data are returned to the researcher.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Not available

4.c.ii. Which terminologies? | Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? | Not available
4.d.ii. Which CDM is used? | Not available
4.d.iii. How are the data transformed and mapped? | Not available
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Not available
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data dictionary?) | Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | State of residence, height, weight, gender, CF mutations, lung function test results from pulmonary function tests, medication use, complications (problems) related to CF
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Statistical process control charts, GeneGo's MetaMiner CF
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

Table 1

*Table from http://www.cff.org/UploadedFiles/research/ClinicalResearch/2011-Patient-Registry.pdf, page 26.

Kaiser Permanente Total Joint Replacement Registry

Criteria | Answers
1.a. How many people does the network cover or involve? | 160,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | The registry adds 17,000 new patients each year.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | The registry documents surgical techniques and implant characteristics; characterizes patients undergoing joint replacements and the relationships between these characteristics and techniques/implant selection; compares incidence rates and variations in clinical care; identifies relationships between variations in practice and short-term outcomes; and identifies risk factors associated with joint replacement revisions. In addition:
- The registry helps Kaiser Permanente immediately identify and notify patients about recalled or defective implants before an official recall notice.
- The registry has successfully monitored and identified two recalls and advisories.
- Information sharing from the registry prevented 16 revisions.
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Southern California, Northern California, Washington, Oregon, Idaho, Hawaii, Colorado, Georgia, Ohio, Maryland, District of Columbia, Virginia
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Kaiser Permanente Integrated Health Plan
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 12
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Joint Replacements
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR

1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Not available
1.g.iii.1.b. Policies for sharing data outside the network | Data are not shared outside the network.
1.g.iii.1.c. Policies for protecting proprietary data | HIPAA compliant
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | Not available
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Not available
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | The registry consists of patients who have had a joint replacement at the Kaiser Permanente healthcare organization.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Data are collected from the individual sites and stored at a central hub (Clarity database), where they can be queried using SAS and merged into an SQL database with a Microsoft Access front-end application.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes

4.c.ii. Which terminologies? | ICD-9-CM
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data dictionary?) | Homegrown core data standards
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Patient (e.g., age, gender, and diagnoses), procedure (e.g., operative date, laterality, surgical approach), hospital admission (e.g., length of stay, discharge disposition), implant and fixation information (e.g., manufacturer, catalog, and lot numbers), and outcome variables including complications (e.g., surgical site infections, VTE), revisions, re-operations, hospital readmissions, and death
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Data are aggregated based on encounter or transaction.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | SAS scripts, Crystal Reports
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | SQL scripts

Life Raft Group

Criteria | Answers
1.a. How many people does the network cover or involve? | 1,500
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Can cover additional lives in the tissue bank and registry
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Dumont AG, Rink L, Godwin AK, Miettinen M, Joensuu H, Strosberg JR, Gronchi A, Corless CL, Goldstein D, Rubin BP, et al. A nonrandom association of gastrointestinal stromal tumor (GIST) and desmoid tumor (deep fibromatosis): case series of 28 patients. Ann Oncol. 2012;23(5):1335-1340.
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | $2,835,317
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | $2,126,487
1.c.ii. What are the current sources of funding? | Not available
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 10
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Gastrointestinal Stromal Tumors (GIST)
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients may decide to contribute as little or as much information as they feel comfortable with, ranging from their e-mail address, symptoms, and date of diagnosis to full contributions to the tissue bank.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Collected in Clinical Trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Nine research team members must agree to collaborate on results and share tissue in order to receive funding from Life Raft Group.
1.g.iii.1.b. Policies for sharing data outside the network | Researchers have access to a de-identified, HIPAA-compliant tissue bank by signing a data use agreement (DUA). Stanford University's IRB handles research data requests because the tissue is stored in its Microarray Database.

1.g.iii.1.c. Policies for protecting proprietary data | All tissue and pathology reports are de-identified after being processed by Oregon Health Sciences University (HSU). All information about the patient is HIPAA compliant.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | Not available
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Researchers are currently studying a particular metabolic pathway using tissue from the tissue bank and matching it with de-identified patient data. The publication should be released in a few months.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | The same data elements are collected over time.
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | No
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Not applicable
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | Tumor tissue, paraffin-based
3.c. What types of analysis are done on them? | Tissue undergoes mutational testing at Oregon HSU and is then processed into a tissue microarray at Stanford University.
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Mutational testing
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Yes
4.a. What type of security technology does the network use? | Researchers are given access to a cordoned-off portion of the electronic registry that includes only de-identified patient data. Only the patient registry supervisor can match patient identifying information to other information. The registry is housed on a separate server at Life Raft Group.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | The de-identified patient record is sent to the researcher by Life Raft Group, and the patient's matching tissue is sent by Stanford. Researchers typically ask for data based on 1 or 2 criteria; information can be provided electronically in a spreadsheet or as a hard copy.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | No

4.c.ii. Which terminologies? | Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data dictionary?) | The registry uses homegrown standards and a data dictionary.
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Conditions, medications, procedures, and health-related quality of life (all updated after each doctor appointment), registry data, pathology reports, biospecimens
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | SPSS code
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No
4.j.ii. What informatics tools are used? | Not applicable

MURDOCK

Criteria | Answers
1.a. How many people does the network cover or involve? | 9,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | With guidance from a nationally recognized group of epidemiologists and the MURDOCK Study Leadership group, the Kannapolis-based team will begin recruiting a representative sample of the local population into the MURDOCK Study Community Registry and Biorepository in January 2013.
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | No
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Hispanic: 9%; African American: 13%
1.b.i.2. Demographics: geography | North Carolina: Kannapolis and Cabarrus Counties
1.b.i.3. Demographics: age | Median age is 55
1.b.i.4. Demographics: gender | Male: 25%; Female: 65%
1.c.i. What is the total annual budget? | $2,000,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $400,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | $1,600,000
1.c.ii. What are the current sources of funding? | Mr. David H. Murdock
1.c.iii. How much does it cost each year to maintain and update the network? | Included in the annual budget
1.d. How many years has this network existed? | 4
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Improving disease characterization at the molecular level
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Broad
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Broad
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes, up to 4 times a year
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Registry participants serve on registry boards; volunteers from the community recruit registry patients at locations around the community.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-Reported
1.g.ii.2. What are the sources of Health Care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Collected in Clinical Trials
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | A proposal form is reviewed by the leadership group on an ad hoc basis; if the proposal is approved, investigators work with study personnel. An agreement is in place that results are returned to the study and that publications must identify the MURDOCK Study.
1.g.iii.1.b. Policies for sharing data outside the network | A research proposal is submitted and the leadership team decides how to proceed. A budget is generated, and from there the process parallels the policy for institutional investigators.

1.g.iii.1.c. Policies for protecting proprietary data | All managed through consent or investigator agreements
2.a. Three most recent (or high impact) studies published in peer-reviewed journals |
1) Bhattacharya S, Dunham AA, Cornish MA, Christian VA, Ginsburg GS, Tenenbaum JD, Nahm ML, Miranda ML, Califf RM, Dolor RJ, Newby LK. The Measurement to Understand Reclassification of Disease of Cabarrus/Kannapolis (MURDOCK) Study Community Registry and Biorepository. Am J Transl Res. 2012;4(4):458-470.
2) Tenenbaum JD, Christian V, Cornish MA, Dolor RJ, Dunham AA, Ginsburg GS, Kraus VB, McHutchison JG, Nahm ML, Newby LK, Svetkey LP, Udayakumar K, Califf RM. The MURDOCK Study: a long-term initiative for disease reclassification through advanced biomarker discovery and integration with electronic health records. Am J Transl Res. 2012;4(3):291-301.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | MURDOCK is designed to be a population-based, longitudinal health study. Registry participants commit to yearly follow-up exams. Researchers are currently following these cohorts in studies.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | By versioning and making electronic notations
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | The study uses health sites, where some staff enroll patients in the study. Sites also give access to their patients' EHRs.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | (1) plasma, n=16 (500 µL each); (2) buffy coat; (3) serum, n=10 (500 µL each); (4) environmental serum, n=1 (3 mL each); (5) whole blood, n=2 (2 mL each); (6) PAXgene RNA, n=3; (7) urine, n=4 (10 mL each)
3.c. What types of analysis are done on them? | Analyses depend on the research being conducted.
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | Yes
3.d.i. What types of analyses do they conduct on them? | Proteomic analysis or genomic testing
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Papers addressing this link to patient outcomes are forthcoming.
4.a. What type of security technology does the network use? | Data reside on Duke servers, behind firewalls.
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes. The MURDOCK Integrated Data Repository (MIDR) houses all the clinical data from early projects of the MURDOCK studies, plus study metadata, consent data, omics and imaging metadata, biospecimen data, and EHR data.
4.b.ii. What is the architecture of the query distribution? | Queries are distributed via a web-based querying system called the Registry Query Interface (RQI). Datasets are stored at their original sites and can be sent via secure FTP to the MURDOCK database for researchers to access.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? | Yes

4.c.ii. Which terminologies? | RxNorm, ICD-9, SNOMED, UMLS
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No
4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes
4.e.i.1. What standards, possibly homegrown, are used? If homegrown, is there a way to map back to standards? (Data dictionary?) | Data dictionary
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes). | Environmental exposures, personal and family history of disease, patient-reported outcomes (including a series of NIH PROMIS questions), EHR data, longitudinal outcomes assessment, biobanked samples, and additional data collected for particular cohorts: MS, severe acne, physical performance, a memory health screener for the over-55 cohort, and individuals over the age of 100 for genome sequencing
4.g.i. (Y/N) Does the network use natural language processing? | No
4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, various parsers) or approaches (e.g., machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before they leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Data are aggregated at the central hub (not at the site level) for reporting purposes.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Registry Query Interface (homegrown)
4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes
4.j.ii. What informatics tools are used? | Multiple systems integrate these data.

New York State Congenital Malformations Registry

Criteria Answers 1.a. How many people does the network 9,584 cover or involve? 1.a.i. Evidence of capacity for expansion to Data entry method allows for new types of congenital malformation information to be entered. By law, newly diagnosed cover additional lives, diseases, conditions, or patients must be added to the registry. procedures 1.a.ii.1. Can the network be used for new Yes studies in the same or a different condition? 1.a.iii. (Y/N) Is there evidence from the past that show the network can be used for clinical Yes care delivery or quality improvement? The registry ensures that families of children identified in the registry locate available resources so that each child can 1.a.iii.1. What is the evidence? maximize his or her development. The registry also assists in identifying families of children with specific malformations who may be invited to participate in research studies.

1.b.i.1. Demographics: racial/ethnic | See Table 1

1.b.i.2. Demographics: geography | See Table 1
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | See Table 1
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Not available
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | 32
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Congenital malformations in children diagnosed before age 2 in New York State
1.f. (Y/N) Does the network use informed consent forms? | No consent; physicians are required by law to add patient data
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | New York State Registry: physicians and hospitals send reports over the Internet using the New York State Department of Health's (NYSDOH) Health Provider Network (HPN).
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable

1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | All investigators are outside the institution and must follow the policies listed for data sharing outside the network, filing a report using the New York State Department of Health's (NYSDOH) Health Provider Network (HPN) website.
1.g.iii.1.b. Policies for sharing data outside the network | Researchers must fill out a data request form. Families of registered patients are never contacted without the prior consent of the Department of Health's Institutional Review Board and notification of the patient's physician.
1.g.iii.1.c. Policies for protecting proprietary data | Data collected by the registry can be used only for surveillance and to facilitate epidemiologic research into the prevention of environmental diseases, as prescribed by Public Health Law 206(1J). Confidentiality of all data reported to the Registry is strictly maintained by Department of Health staff and rigorously safeguarded by Section 206(1J), which specifically prohibits the release of personal identifiers.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | 1) Lin S, Herdt-Losavio M, Gensburg L, Marshall E, Druschel C. "Maternal asthma, asthma medication use and the risk of congenital heart defects." Birth Defects Research, Part A 2009; 85(2):161-168.
2) Kumar J, Gordillo R, Kaskel FJ, Druschel CM, Woroniecki RP. "Increased Prevalence of Renal and Urinary Tract Anomalies in Children with Congenital Hypothyroidism." The Journal of Pediatrics 2009; 263-266.
3) Wang Y, Tao Z, Cross PK, Le LH, Steen PK, LaSelva nee-Babcock GD, Druschel CM, Hwang SA. "Development of a Web-based Integrated Birth Defects Surveillance System in New York State." Journal of Public Health Management and Practice 2008; 14(6):E1-E10.

2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No

2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Physicians are required by law to submit information on patients diagnosed with a congenital malformation.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | Not available
2.d.i.1. What is the evidence? | Not available
3.a. (Y/N) Does the network have biobanks? | Yes
3.b. What types of biospecimens are collected? | DNA samples
3.c. What types of analysis are done on them? | Chromosomal studies reporting the karyotype
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | An ID and password are needed to input and review patient information. Physicians can see information only for patients whose information they input. The browser must support 128-bit SSL encryption.
4.b.i. (Y/N) Are queries distributed via a central hub? | Not applicable
4.b.ii. What is the architecture of the query distribution? | Not available
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes

4.c.ii. Which terminologies? | ICD-9-CM, ICD-10-CM, British Pediatric Association (BPA)
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No

4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable

4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Congenital anomalies, fetal alcohol syndrome, amniotic bands, congenital infections (including rubella, cytomegalovirus, toxoplasmosis, and herpes simplex), lipoma, benign neoplasm of skin, hemangioma of skin, umbilical hernia, accessory auricle, other specified anomalies of ear, unspecified anomaly of ear, branchial cleft cyst, other specified anomalies of face and neck, other unspecified anomalies of face and neck, single umbilical artery, embryonic cyst of cervix, vagina, and external female genitalia, imperforate hymen, dermatoglyphic anomalies, vascular hamartomas, congenital pigmentation anomalies of skin, other anomalies of skin, specified anomalies of hair, specified anomalies of nails, specified anomalies of breast, other specified anomalies of integument, unspecified anomalies of the integument
4.g.i. (Y/N) Does the network use natural language processing? | No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes

4.j.ii. What informatics tools are used? | No tools are used; information is entered into the patient-level data form

Table 1

*Table from http://www.health.ny.gov/diseases/congenital_malformations/2007/section1.htm

Physician-Hospital Organization (PHO)

Criteria | Answers
1.a. How many people does the network cover or involve? | 200,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Web-based asthma registry with longitudinal patient tracking/reporting and transparent comparative practice-level and network-level data for key process and outcome measures
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Mandel KE. Aligning rewards with large-scale improvement. JAMA. 2010 Feb 17;303(7):663-4.
1.b.i.1. Demographics: racial/ethnic | Not available
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Not available
1.c.i. What is the total annual budget? | $1,400,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | $200,000
1.c.i.2. How much of that budget is dedicated to conducting studies? | Percentage of the annual budget
1.c.ii. What are the current sources of funding? | Not available
1.c.iii. How much does it cost each year to maintain and update the network? | $200,000
1.d. How many years has this network existed? | 16
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Children with asthma
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable. The quality improvement initiative falls under "operations," so patient consent is not required. Business associate agreements are in place between each primary care practice and the PHO. Primary care practices issue a notice of privacy practices document to patients/families.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | No formal policies exist; these decisions would be made by the primary care independent practice association (IPA) Board and the PHO Board.
1.g.iii.1.b. Policies for sharing data outside the network | No formal policies exist; these decisions would be made by the primary care independent practice association (IPA) Board and the PHO Board.
1.g.iii.1.c. Policies for protecting proprietary data | No formal policies exist; these decisions would be made by the primary care independent practice association (IPA) Board and the PHO Board.

2.a. Three most recent (or high impact) studies published in peer-reviewed journals | 1) “Aligning Rewards with Large-Scale Improvement” (JAMA, 2010)
2) “Planning a Registry: Managing Care and Quality Improvement for Chronic Diseases” (Agency for Healthcare Research and Quality: “Registries for Evaluating Patient Outcomes: A User’s Guide, 2nd Edition,” 2010)
3) “Pay for Performance Alone Cannot Drive Quality” (Archives of Pediatrics and Adolescent Medicine, 2007)
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | Yes
2.b.i. What is the evidence? | Mandel KE. Aligning rewards with large-scale improvement. JAMA. 2010 Feb 17;303(7):663-4.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | Yes
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Healthcare organizations refer patients, give access to EHR data, and participate in research activities
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not supported; trials would require approval from the IPA Board, the PHO Board, and an IRB
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | HIPAA security and privacy protections
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes

4.c.ii. Which terminologies? | ICD-9, CPT
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No

4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | No

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Demographic data captured in the web-based registry/database: date of birth, address (including zip code), and payor. Data collected at the point of care from patients/parents and providers. Admission and ED/urgent care visit data.

4.g.i. (Y/N) Does the network use natural language processing? | No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | Yes

4.j.ii. What informatics tools are used? | Primary care practice billing systems: structured queries are submitted to the PHO via secure email file transfer.

Reg4ALL

Criteria | Answers
1.a. How many people does the network cover or involve? | 0 (user enrollment will begin on Feb. 28th)
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Not applicable
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Not applicable
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Not applicable
1.a.iii.1. What is the evidence? | Not applicable
1.b.i.1. Demographics: racial/ethnic | Not applicable
1.b.i.2. Demographics: geography | Not applicable
1.b.i.3. Demographics: age | Not applicable
1.b.i.4. Demographics: gender | Not applicable
1.c.i. What is the total annual budget? | Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? | Not available
1.c.ii. What are the current sources of funding? | Sanofi's Collaborate Activate Innovation Challenge
1.c.iii. How much does it cost each year to maintain and update the network? | Not available
1.d. How many years has this network existed? | The network goes online on Feb. 28th
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | No
1.e.i.1. What does the network focus on? | Not applicable
1.f. (Y/N) Does the network use informed consent forms? | Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Specific
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Participants control what data they store, with whom they share it, and for what purposes their information is used
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Any research study must first meet all of clinicaltrials.gov's requirements. Users decide whether they want to be discoverable for research. A researcher sees aggregate data when querying for study participants. Reg4All sends users all the information from clinicaltrials.gov and the IRB approval; the participant then decides whether to make themselves available for the study, at which point the user's identifying information is shared with the researcher.

1.g.iii.1.b. Policies for sharing data outside the network | No data are shared outside the network
1.g.iii.1.c. Policies for protecting proprietary data | No data are shared outside the network
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | No publications
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | The registry has not existed long enough to need to standardize data over time
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | Outreach organizations, Genetic Alliance, and Sanofi will refer patients to Reg4All
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Not available
4.b.i. (Y/N) Are queries distributed via a central hub? | Yes
4.b.ii. What is the architecture of the query distribution? | Approved researchers send a query for the audience they want to connect with, and the results are presented back to them in aggregate format.
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes

4.c.ii. Which terminologies? | ICD-9/10
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No

4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? | Yes

4.e.i.1. What standards, possibly home grown, are used? If home grown, is there a way to map back to standards? (Data Dictionary?) | NIH common data elements (CDE) codes
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). | Surveys asking common health questions, NIH common data elements (CDE) in a survey format, disease-specific data, clinical data sets uploaded from participants' EHRs, data from groups like the Personal Genome Project, and biobanked tissue (2014)

4.g.i. (Y/N) Does the network use natural language processing? | No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? | Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? | Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? | When a researcher first searches for potential participants, they can see counts of participants in the registry who meet the research criteria.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? | Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) | No

4.j.ii. What informatics tools are used? | Not applicable
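The Reg4All answers describe a two-stage flow: a researcher's query returns only an aggregate count of discoverable participants who meet the study criteria, and identifying information is released only after a participant opts in to a specific study. The Python sketch below illustrates that flow under stated assumptions; the `Participant` fields, the criteria predicate, and the function names are hypothetical and are not Reg4All's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Participant:
    """Hypothetical registry record; Reg4All's real schema is not described in this report."""
    participant_id: str
    name: str
    conditions: Set[str]
    discoverable: bool  # participant has chosen to be discoverable for research
    opted_in_studies: Set[str] = field(default_factory=set)


Criteria = Callable[[Participant], bool]


def count_matches(registry: List[Participant], criteria: Criteria) -> int:
    """Stage 1: the researcher sees only an aggregate count, never identities."""
    return sum(1 for p in registry if p.discoverable and criteria(p))


def release_identities(registry: List[Participant], criteria: Criteria,
                       study_id: str) -> List[Dict[str, str]]:
    """Stage 2: identifying data are shared only for participants who opted in to this study."""
    return [
        {"participant_id": p.participant_id, "name": p.name}
        for p in registry
        if p.discoverable and criteria(p) and study_id in p.opted_in_studies
    ]


if __name__ == "__main__":
    registry = [
        Participant("p1", "Ana", {"asthma"}, True, {"S1"}),
        Participant("p2", "Ben", {"asthma"}, True),
        Participant("p3", "Cy", {"diabetes"}, True, {"S1"}),
    ]
    has_asthma: Criteria = lambda p: "asthma" in p.conditions
    print(count_matches(registry, has_asthma))             # 2
    print(release_identities(registry, has_asthma, "S1"))  # only p1
```

Keeping the count query and the identity release as separate functions mirrors the separation the survey answers describe: aggregate results for searching, and per-study consent before any identifying data leave the registry.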

ResearchMatch

Criteria | Answers
1.a. How many people does the network cover or involve? | 31,806
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures | Enables volunteers to be matched with researchers for a wide variety of studies involving different diseases and conditions
1.a.ii.1. Can the network be used for new studies in the same or a different condition? | Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? | Yes
1.a.iii.1. What is the evidence? | Not available
1.b.i.1. Demographics: racial/ethnic | White: 79%; Black or African American: 11%; Hispanic or Latino: 6%; Asian: 4%; American Indian or Alaska Native: 1%; Multi-Racial: 3%; Other: 2%
1.b.i.2. Demographics: geography | Not available
1.b.i.3. Demographics: age | Not available
1.b.i.4. Demographics: gender | Male: 28%; Female: 72%
1.c.i. What is the total annual budget? | Confidential
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? | Confidential
1.c.i.2. How much of that budget is dedicated to conducting studies? | Confidential
1.c.ii. What are the current sources of funding? | NIH, Clinical and Translational Science Award (CTSA), and National Center for Advancing Translational Sciences (NCATS)
1.c.iii. How much does it cost each year to maintain and update the network? | Confidential
1.d. How many years has this network existed? | 3
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? | Yes
1.e.i.1. What does the network focus on? | Matching volunteers with researchers for studies
1.f. (Y/N) Does the network use informed consent forms? | No
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? | Not applicable
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? | Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? | Yes; it is the researcher's responsibility to re-contact the patient
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? | Yes
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? | Patients may enter as much or as little information as they like. If a patient decides to stop participating in ResearchMatch, they can remove their profile, and their information will no longer be shared or available.
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) | Self-reported
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) | Not applicable
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) | Not applicable

1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data | Researchers from any participating institution may use ResearchMatch to recruit study participants. Researchers must first agree to ResearchMatch’s rules of use, including maintaining volunteers’ confidentiality and stipulating that all activity will be approved by local IRBs. After creating a ResearchMatch profile (contact information plus username/password), the researcher must electronically submit an IRB approval letter for at least one actively recruiting study. Researchers’ access requests are automatically routed to the appropriate institutional liaisons, who confirm the requests’ legitimacy and accuracy using local IRB approval letters. Once access is approved, the liaison sets an access expiration date that corresponds to the study’s local IRB expiration date; the liaison can extend the access expiration date on receiving proof that the local IRB has extended its approval. ResearchMatch allows more than one authorized researcher to access the same protocol (e.g., a principal investigator plus multiple study coordinators). Access by other researchers in the same study requires permission from the principal investigator and the institutional liaison.
1.g.iii.1.b. Policies for sharing data outside the network | The researcher must belong to a network (CTSA) institution; ResearchMatch is in the process of expanding outside the network.
1.g.iii.1.c. Policies for protecting proprietary data | No one has access to a user's data unless the user gives permission, via their account settings, to share their identifying information with researchers. No staff members or liaisons have access to volunteers' data.
2.a. Three most recent (or high impact) studies published in peer-reviewed journals | None
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? | No
2.b.i. What is the evidence? | Not applicable
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? | No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey type questions over a period of time?) | Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? | Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) | They provide a liaison who works with ResearchMatch and coordinates with the researchers at the local site as well as with the local IRB
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? | No
2.d.i.1. What is the evidence? | Not applicable
3.a. (Y/N) Does the network have biobanks? | No
3.b. What types of biospecimens are collected? | Not applicable
3.c. What types of analysis are done on them? | Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? | No
3.d.i. What types of analyses do they conduct on them? | Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? | Not applicable
4.a. What type of security technology does the network use? | Data are encrypted at rest. The application is written in the PHP scripting language and is hosted on VUMC's primary Apache web server. The back-end database is a MySQL server maintained on a separate server, which houses all data related to the registry. All research subject recruitment data sent between the web server and browsers are encrypted using Secure Sockets Layer (SSL). Any record fields identified as health information (HI) are encrypted before being stored in the database (encryption at rest) to ensure maximum data security. Both the web and database servers are secured and firewall protected. Inputs are also filtered for web attacks such as cross-site scripting and SQL injection.
4.b.i. (Y/N) Are queries distributed via a central hub? | No
4.b.ii. What is the architecture of the query distribution? | Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (i.e., ICD-9, SNOMED, etc.)? | Yes

4.c.ii. Which terminologies? | UMLS
4.d.i. (Y/N) Does the network use a common data model (CDM)? | No

4.d.ii. Which CDM is used? | Not applicable
4.d.iii. How are the data transformed and mapped? | Not applicable

4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? No

4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data Dictionary?) Not applicable
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). Geographical data, demographic data (age, height, weight, body mass index, gender, race, ethnicity, tobacco use, multiple birth status), medical conditions

4.g.i. (Y/N) Does the network use natural language processing? No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Not applicable

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) No

4.j.ii. What informatics tools are used? Not applicable
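The ResearchMatch answer to 4.a says inputs are filtered against cross-site scripting and SQL injection. The sketch below shows the two standard defences for those attacks: parameterized queries and HTML escaping of echoed input. It assumes Python's built-in sqlite3 as a stand-in for the registry's MySQL back end, and the table and function names are hypothetical; the report does not describe how ResearchMatch actually implements its filtering.

```python
import html
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema; sqlite3 stands in for the registry's MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE volunteers (id INTEGER PRIMARY KEY, name TEXT, diagnosis TEXT)"
    )
    return conn


def add_volunteer(conn: sqlite3.Connection, name: str, diagnosis: str) -> None:
    # The ? placeholders pass values to the driver as data, not as SQL text,
    # so input such as "x'); DROP TABLE volunteers; --" cannot change the query.
    conn.execute(
        "INSERT INTO volunteers (name, diagnosis) VALUES (?, ?)", (name, diagnosis)
    )


def names_with_diagnosis(conn: sqlite3.Connection, diagnosis: str) -> list:
    # Parameterized lookup: the diagnosis string is never spliced into the SQL.
    rows = conn.execute(
        "SELECT name FROM volunteers WHERE diagnosis = ?", (diagnosis,)
    )
    return [row[0] for row in rows]


def render_name(name: str) -> str:
    # Escape HTML metacharacters before echoing user input back into a page,
    # so a submitted <script> tag is displayed as text instead of executed.
    return "<li>" + html.escape(name) + "</li>"
```

In use, a name such as `Robert'); DROP TABLE volunteers; --` is stored and returned verbatim while the table survives, and `render_name("<script>alert(1)</script>")` yields `<li>&lt;script&gt;alert(1)&lt;/script&gt;</li>`. Placeholders are generally preferred over pattern-based input filtering because they separate code from data without attempting to enumerate every malicious string.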

SUPREME-DM

Criteria Answers
1.a. How many people does the network cover or involve? 1,300,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: None
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes, but only for studies involving diabetes mellitus
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Not available
1.a.iii.1. What is the evidence? Not available
1.b.i.1. Demographics (racial/ethnic): Not available
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): Not available
1.b.i.4. Demographics (gender): Not available
1.c.i. What is the total annual budget? Not available
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? Not available
1.c.ii. What are the current sources of funding? Not available
1.c.iii. How much does it cost each year to maintain and update the network? Not available
1.d. How many years has this network existed? Not available
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Diabetes mellitus
1.f. (Y/N) Does the network use informed consent forms? Not available
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Not available
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not available
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Not available
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? Not available
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not available
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Not available
1.g.iii.1.b. Policies for sharing data outside the network: Not available
1.g.iii.1.c. Policies for protecting proprietary data: Not available

2.a. Three most recent (or high-impact) studies published in peer-reviewed journals:
1) Nichols GA, Desai J, Elston Lafata J, Lawrence JM, O'Connor PJ, Pathak RD, Raebel MA, Reid RJ, Selby JV, Silverman BG, Steiner JF, Stewart WF, Vupputuri S, Waitzfelder B. Construction of a Multisite DataLink Using Electronic Health Records for the Identification, Surveillance, Prevention, and Management of Diabetes Mellitus: The SUPREME-DM Project. Preventing Chronic Disease 2012; 9:110311. DOI: http://dx.doi.org/10.5888/pcd9.110311
2) Desai JR, Wu P, Nichols GA, Lieu TA, O'Connor PJ. Diabetes and asthma case identification, validation, and representativeness when using electronic health data to construct registries for comparative effectiveness and epidemiologic research. Medical Care 2012 Jul; 50 Suppl:S30-5. PMID: 22692256
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? Nichols GA, Desai J, Lawrence JM, Reid R, Schroeder EB, Steiner JF, Vupputuri S, Yan X, for the SUPREME-DM Study Group. 5-Year incidence of diabetes among 6.7 million adult HMO members: The SUPREME-DM project. Diabetes 2012; 61(Suppl 1):A356.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? Not available
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) Not available
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) About 11 healthcare organizations provide demographic data, clinical data elements, and EHR data.
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Not available
2.d.i.1. What is the evidence? Not available
3.a. (Y/N) Does the network have biobanks? Not available
3.b. What types of biospecimens are collected? Not available
3.c. What types of analysis are done on them? Not available
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? Not available
3.d.i. What types of analyses do they conduct on them? Not available
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not available
4.a. What type of security technology does the network use? Not available
4.b.i. (Y/N) Are queries distributed via a central hub? No
4.b.ii. What is the architecture of the query distribution? Not applicable
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? Not available

4.c.ii. Which terminologies? Not available
4.d.i. (Y/N) Does the network use a common data model (CDM)? Yes

4.d.ii. Which CDM is used? HMORN Virtual Data Warehouse
4.d.iii. How are the data transformed and mapped? The data are mapped and transformed locally at each site into its own Virtual Data Warehouse.
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Not available

4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data Dictionary?) Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). EHR, including demographic and clinical data elements

4.g.i. (Y/N) Does the network use natural language processing? Yes

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? MediClass
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? Yes
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Programs are typically distributed via e-mail or by posting them to a secure website. They must be manually downloaded, approved by the individual site for execution, and run by personnel at the sites; the results are then returned manually. Thus, site personnel retain complete control over their local data.
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? SAS scripts

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) Not available

4.j.ii. What informatics tools are used? Not available
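The SUPREME-DM answers to 4.d.iii and 4.h.ii describe a distributed pattern: each site maps its own data into a common model, runs a centrally written program locally, and returns only aggregated results. The sketch below illustrates that pattern in Python under stated assumptions. The network itself distributes SAS programs against the HMORN Virtual Data Warehouse; the two-field "common record", the site column names (`mrn`, `icd9`, `patient`, `diag`), and all function names here are hypothetical and do not reproduce the VDW specification. The case definition (any ICD-9 code beginning with 250, the diabetes mellitus category) is likewise only illustrative.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical common-model record with just the two fields this sketch needs.
CommonRecord = Dict[str, str]


def map_site_a(raw: dict) -> CommonRecord:
    # Site A keeps patient and diagnosis under its own local column names.
    return {"person_id": raw["mrn"], "dx_code": raw["icd9"]}


def map_site_b(raw: dict) -> CommonRecord:
    # Site B uses different names and pads codes with whitespace.
    return {"person_id": raw["patient"], "dx_code": raw["diag"].strip()}


def distributed_program(records: Iterable[CommonRecord]) -> Dict[str, int]:
    """The program each site runs locally on its mapped data.

    Only an aggregate count leaves the site, never patient-level rows.
    """
    cases = {r["person_id"] for r in records if r["dx_code"].startswith("250")}
    return {"diabetes_cases": len(cases)}


def run_at_site(
    raw_rows: List[dict], mapper: Callable[[dict], CommonRecord]
) -> Dict[str, int]:
    # Local step: map site data into the common model, then run the program.
    return distributed_program(mapper(row) for row in raw_rows)


def combine(site_results: List[Dict[str, int]]) -> Dict[str, int]:
    # Coordinating-centre step: it only ever sees per-site aggregates.
    return {"diabetes_cases": sum(r["diabetes_cases"] for r in site_results)}
```

Because only counts are returned, a person enrolled at two sites would be counted twice when the aggregates are summed; avoiding that requires some form of cross-site linkage, which this sketch deliberately does not attempt.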

United Network for Organ Sharing (UNOS)

Criteria Answers
1.a. How many people does the network cover or involve? 1,000,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: None
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? No
1.a.iii.1. What is the evidence? Not applicable
1.b.i.1. Demographics (racial/ethnic): Not available
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): Not available
1.b.i.4. Demographics (gender): Not available
1.c.i. What is the total annual budget? $42,762,536
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? Not available
1.c.i.2. How much of that budget is dedicated to conducting studies? Not available
1.c.ii. What are the current sources of funding? HRSA, computer registration fees when patients are listed for transplants, data services
1.c.iii. How much does it cost each year to maintain and update the network? Not available
1.d. How many years has this network existed? 13
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Organ donation and transplants
1.f. (Y/N) Does the network use informed consent forms? No, not for the purpose of being entered into the registry. UNOS has an IRB exemption.
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Specific consent is necessary in extenuating situations, for example, a patient who wants to receive an organ from an expanded criteria donor.
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not applicable
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? No
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) Other: clinical information, medical history, and treatment information entered into the UNOS system manually by the participating hospitals
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: Data use agreements
1.g.iii.1.b. Policies for sharing data outside the network: Data use agreement; researchers who are not members of UNOS are charged for the data they receive
1.g.iii.1.c. Policies for protecting proprietary data: UNOS does not store patient identity information in its database

2.a. Three most recent (or high-impact) studies published in peer-reviewed journals: Not available
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Not available
2.b.i. What is the evidence? Not available
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it?
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) By entering data about donors and candidates via a web-based application run by UNOS
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Not available
2.d.i.1. What is the evidence? Not available
3.a. (Y/N) Does the network have biobanks? No
3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? No
3.d.i. What types of analyses do they conduct on them? Not applicable
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Not applicable
4.a. What type of security technology does the network use? UNOS monitors emerging threats and vulnerabilities using physical and automated tools and audits performed by internal and external personnel, with the goal of zero security incidents and minimal interruption to service. Future improvements to security are based on a process-driven analysis of emerging security threats and vulnerabilities, realistic assessment of the risk, implementation of controls to mitigate the risk, and regular testing of the controls to assure proper operation.
4.b.i. (Y/N) Are queries distributed via a central hub? Yes
4.b.ii. What is the architecture of the query distribution? A researcher submits a data request, and the Research Department at UNOS returns the data in the form of a report or a research dataset.
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? No

4.c.ii. Which terminologies? Not applicable
4.d.i. (Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable
4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Yes

4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data Dictionary?) UNOS uses a data dictionary
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). Clinical information, medical history, treatment information

4.g.i. (Y/N) Does the network use natural language processing? Not available

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not available
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? SAS scripts

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) No

4.j.ii. What informatics tools are used? Not applicable
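UNOS reports that participating hospitals enter donor and candidate data through a web application and that a data dictionary defines the collected fields (4.e.i.1). One common use of a data dictionary at the point of entry is to check each submitted record against the dictionary's field definitions. The sketch below shows that idea, assuming a dictionary that declares each field's type, allowed codes or numeric range, and whether it is required. The three fields and their rules are hypothetical; the report does not describe the contents of the UNOS data dictionary or how UNOS applies it.

```python
from typing import Any, Dict, List

# Hypothetical data-dictionary entries; real UNOS fields and rules are not in the report.
DATA_DICTIONARY: Dict[str, Dict[str, Any]] = {
    "blood_type": {"type": str, "allowed": {"A", "B", "AB", "O"}, "required": True},
    "age_years": {"type": int, "min": 0, "max": 120, "required": True},
    "organ": {
        "type": str,
        "allowed": {"kidney", "liver", "heart", "lung", "pancreas", "intestine"},
        "required": True,
    },
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors: List[str] = []
    for field, rule in DATA_DICTIONARY.items():
        if field not in record:
            if rule.get("required"):
                errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        # Coded fields must use one of the dictionary's permitted values.
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value!r} not an allowed code")
        # Numeric fields must fall inside the declared range.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above {rule['max']}")
    # Fields the dictionary does not define are flagged rather than silently stored.
    for field in record:
        if field not in DATA_DICTIONARY:
            errors.append(f"{field}: not defined in data dictionary")
    return errors
```

Returning a list of messages, rather than raising on the first problem, lets an entry form show every issue with a submission at once.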

Utah Population Database

Criteria Answers
1.a. How many people does the network cover or involve? 6,500,000
1.a.i. Evidence of capacity for expansion to cover additional lives, diseases, conditions, or procedures: Supports a wide variety of research across different conditions
1.a.ii.1. Can the network be used for new studies in the same or a different condition? Yes
1.a.iii. (Y/N) Is there evidence from the past that shows the network can be used for clinical care delivery or quality improvement? Yes
1.a.iii.1. What is the evidence? Coffield JE, Metos JM, Utz RL, Waitzman NJ. "A multivariate analysis of federally mandated school wellness policies on adolescent obesity." J Adolesc Health. 2011 Oct;49(4):363-70.
1.b.i.1. Demographics (racial/ethnic): Not available
1.b.i.2. Demographics (geography): Not available
1.b.i.3. Demographics (age): Not available
1.b.i.4. Demographics (gender): Not available
1.c.i. What is the total annual budget? $1,500,000
1.c.i.1. How much of that budget is dedicated to infrastructure and maintenance? A percentage of the $1.5 million
1.c.i.2. How much of that budget is dedicated to conducting studies? A percentage of the $1.5 million
1.c.ii. What are the current sources of funding? NIH, Huntsman Cancer Institute
1.c.iii. How much does it cost each year to maintain and update the network? A percentage of the $1.5 million
1.d. How many years has this network existed? 30
1.e.i. (Y/N) Does the network have a focus (i.e., topic area or purpose)? Yes
1.e.i.1. What does the network focus on? Biomedical, cancer, and other health-related research across the state of Utah
1.f. (Y/N) Does the network use informed consent forms? Yes
1.f.i. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their electronic data? Not available
1.f.ii. Do patients consent to the broad (meaning data may be analyzed for other research) or specific use of their biological specimens? Not available
1.f.iii. (Y/N) Can patients be re-contacted for consent for a new study? Yes
1.g.i. (Y/N) Are patients involved in the decision-making process on the use of the data they provided to the network? No
1.g.i.1. What are the roles patients play and in what mechanism? How are they involved in the decision-making process? Not applicable
1.g.ii.1. What are the sources of Self-Reported data collected in the network? (e.g., conditions, medications, medication adherence, procedures, labs/imaging, health-related quality of life) Not applicable
1.g.ii.2. What are the sources of Health care-Derived data collected in the network? (e.g., coded diagnostics, pharmacy orders, pharmacy fulfillment, procedures, lab orders, diagnostic results, imaging data) EHR
1.g.ii.3. What are the sources of Clinical Trials data collected in the network? (e.g., coded diagnostics, drug information, procedures, lab orders, diagnostic results, imaging data, biospecimen, health-related quality of life) Not applicable
1.g.iii.1.a. Data use and sharing policies for institutional investigators to collaborate with each other using the data: All research projects must have IRB and RGE (Resources for Genetic and Epidemiologic Research) approval
1.g.iii.1.b. Policies for sharing data outside the network: All research projects must have IRB and RGE approval

1.g.iii.1.c. Policies for protecting proprietary data: HIPAA, data use agreement
2.a. Three most recent (or high-impact) studies published in peer-reviewed journals:
1) Hawkes JE, Cassidy PB, Manga P, Boissy RE, Goldgar D, Cannon-Albright L, Florell SR, Leachman SA. "Report of a novel OCA2 gene mutation and an investigation of OCA2 variants on melanoma risk in a familial melanoma pedigree." J Dermatol Sci. 2013 Jan;69(1):30-7. doi: 10.1016/j.jdermsci.2012.09.016.
2) Hurdle JF, Haroldsen SC, Hammer A, Spigle C, Fraser AM, Mineau GP, Courdy SJ. "Identifying clinical/translational research cohorts: ascertainment via querying an integrated multi-source database." J Am Med Inform Assoc. 2013 Jan 1;20(1):164-71. doi: 10.1136/amiajnl-2012-001050.
3) Xu J, Lange EM, Lu L, Zheng SL, Wang Z, Thibodeau SN, Cannon-Albright LA, Teerlink CC, Camp NJ, Johnson AM, Zuhlke KA, Stanford JL, Ostrander EA, Wiley KE, Isaacs SD, Walsh PC, Maier C, Luedeke M, Vogel W, Schleutker J, Wahlfors T, Tammela T, Schaid D, McDonnell SK, Derycke MS, Cancel-Tassin G, Cussenot O, Wiklund F, Gronberg H, Eeles R, Easton D, Kote-Jarai Z, Whittemore AS, Hsieh CL, Giles GG, Hopper JL, Severi G, Catalona WJ, Mandal D, Ledet E, Foulkes WD, Hamel N, Mahle L, Moller P, Powell I, Bailey-Wilson JE, Carpten JD, Seminara D, Cooney KA, Isaacs WB; International Consortium for Prostate Cancer Genetics. "HOXB13 is a susceptibility gene for prostate cancer: results from the International Consortium for Prostate Cancer Genetics (ICPCG)." Hum Genet. 2013 Jan;132(1):5-14. doi: 10.1007/s00439-012-1229-4.
2.b. (Y/N) Have researchers conducted studies that involve longitudinal (multiple values rather than one time) follow-up? Yes
2.b.i. What is the evidence? Brown SM, Jones JP, Aronsky D, Jones BE, Janspa MJ, Dean NC. "Relationships among initial hospital triage, disease progression and mortality in community-acquired pneumonia." Respirology. 2012 Nov;17(8):1207-13. doi: 10.1111/j.1440-1843.2012.02225.x.
2.b.ii. (Y/N) Can researchers conduct follow-up or ongoing observation from existing reports by passively reviewing data rather than actively pulling it? No
2.b.ii.1. How do researchers standardize those data items? (e.g., how do researchers standardize survey-type questions over a period of time?) Not applicable
2.c.i. (Y/N) Are healthcare organizations (hospitals, outpatient centers) actively participating or engaging in research activities conducted by the network? Yes
2.c.ii. How? (Examples: by referring patients, giving access to EHRs, etc.) Giving access to EHR data
2.d.i. (Y/N) Have there been any randomized controlled trials using the data collected in the network? Yes
2.d.i.1. What is the evidence? Not available
3.a. (Y/N) Does the network have biobanks? No
3.b. What types of biospecimens are collected? Not applicable
3.c. What types of analysis are done on them? Not applicable
3.d. (Y/N) Do researchers in the network collect biospecimens for research purposes? Yes
3.d.i. What types of analyses do they conduct on them? Genome sequencing, identification of biomarkers
3.d.ii. Were they able to link the analysis/research results back to patient outcomes? Yes
4.a. What type of security technology does the network use? Firewalls, HIPAA review
4.b.i. (Y/N) Are queries distributed via a central hub? Yes
4.b.ii. What is the architecture of the query distribution? Not available
4.c.i. (Y/N) Does the network use standardized terminologies (e.g., ICD-9, SNOMED)? Yes

4.c.ii. Which terminologies? CPT, ICD-9, Diagnosis Related Group (DRG) codes
4.d.i. (Y/N) Does the network use a common data model (CDM)? No

4.d.ii. Which CDM is used? Not applicable
4.d.iii. How are the data transformed and mapped? Not applicable

4.e.i. (Y/N) Does the network collect additional fields to help with analysis and interpretation (metadata)? Yes

4.e.i.1. What standards, possibly home-grown, are used? If home-grown, is there a way to map back to standards? (Data Dictionary?) Not available
4.f. List the types of data that are being collected or accessed and incorporated into the network (e.g., EHR data, claims, patient-reported outcomes, etc.). Family history (Genealogy File and Ancestral File); cancer records (Utah Cancer Registry, Cancer Data Registry of Idaho); vital records (birth and death certificates, marriage and divorce records); Utah driver licenses; Social Security Death Index; voter registration; patient visits (demographic information, facility code, admission date, discharge date and status, principal diagnosis code, other diagnosis codes, CPT-4 or principal procedure codes, other procedure codes, procedure coding method, total charges, primary payer, secondary payer, third payer); claims data (hospital code, principal diagnosis and principal procedure codes, up to eight other diagnosis and other procedure codes, an external injury E-code, admit and discharge information, mortality risk codes, and payer category)
4.g.i. (Y/N) Does the network use natural language processing? No

4.g.ii. What applications (e.g., UIMA, cTAKES, NegEx, MetaMap, many different parsers, etc.) or approaches (examples are machine learning, rule-based) are being used? Not applicable
4.h.i. (Y/N) Are data aggregated before the data leave the local site and are shared with the network? No
4.h.ii. How are the data transformed (i.e., based on what criteria are the data aggregated)? Not applicable
4.i. What data (statistical) analysis tools, if any, are available for researchers through the network? Kinclass and Dynaped

4.j.i. (Y/N) Are administrative, billing, and/or clinical records integrated into longitudinal patient-level data? (Are administrative, billing, and clinical records kept in individual places or lumped in with patient-level data?) Yes

4.j.ii. What informatics tools are used? Not applicable
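The Utah Population Database reports that administrative, billing, and clinical records are integrated into longitudinal patient-level data (4.j.i), drawing on sources such as hospital discharges, cancer registries, and vital records (4.f). The sketch below shows what that integration produces once records from different sources share a person identifier: a per-person timeline ordered by date. It assumes linkage has already happened upstream; the report does not say how UPDB links records, and the `Event` type, source labels, and function name are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    person_id: str  # assumed already-linked identifier (hypothetical)
    when: date
    source: str     # e.g. "hospital_discharge", "cancer_registry", "death_certificate"
    detail: str


def build_timelines(*sources: List[Event]) -> Dict[str, List[Event]]:
    """Merge events from several source files into one timeline per person."""
    timelines: Dict[str, List[Event]] = defaultdict(list)
    for events in sources:
        for event in events:
            timelines[event.person_id].append(event)
    # Order each person's events chronologically so follow-up reads forward in time.
    for events in timelines.values():
        events.sort(key=lambda e: e.when)
    return dict(timelines)
```

For example, a cancer registry entry, a later hospital discharge, and a death certificate for the same person come back as a single chronologically ordered list. Keeping the source label on each event preserves provenance, which matters when the same clinical fact is recorded differently in different files.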
