
Feature

HOW A FIELD BUILT ON DATA SHARING BECAME A TOWER OF BABEL

The immediate and open exchange of information was key to the success of the Human Genome Project 20 years ago. Now the field is struggling to keep its data accessible. By Kendall Powell

In July 2000, David Haussler remembers crying as he watched the first fully assembled human genome streaming across his computer screen. He and Jim Kent, a graduate student at the time, built the first-ever web-based tool for exploring the three billion letters of the human genome. They had published the rough draft of the genome on the Internet a mere 11 days after finishing the herculean task of stitching it all together. That task had been assigned to them as part of the Human Genome Project (HGP), the international collaboration that had been working towards this goal for a decade. It would still be several months before the group published its analysis of the genome in the pages of Nature¹, but the data were ready to share.

“There it was, going out into the whole world,” recalls Haussler, scientific director of the University of California Santa Cruz Genomics Institute. Soon, every person in the world could explore it on the web, chromosome by chromosome, gene by gene, base by base.

It was a historic moment, says Haussler. Before the HGP launched in the early 1990s, “there had not been a serious discussion about data sharing in biomedical research”, Haussler says. “The standard was that a successful investigator held onto their own data as long as they could.”

That standard clearly wouldn’t work for such a large and collaborative effort. If countries or scientists hoarded the data they were producing, it would derail the project. So in 1996, the HGP researchers got together to lay out what became known as the Bermuda Principles. Under these principles, all parties agreed to make the human genome sequences available in public databases, ideally within 24 hours: no delays, no exceptions.

Fast-forward two decades, and the field is bursting with genomic data. This is thanks to improved technology both for sequencing whole genomes and for genotyping them by sequencing a few million select spots to quickly capture the variation within. These efforts have produced genetic readouts for tens of millions of individuals, and they sit in data repositories around the globe. The principles laid out during the HGP were later adopted by journals and funding agencies. They meant that anyone should be able to access the data created for published genome studies and use them to power new discoveries.

If only it were that simple. The explosion of data led governments, funding agencies, research institutes and private research consortia to develop their own custom-built databases for handling the complex and sometimes sensitive data sets. And the patchwork of repositories, with various rules for access and no standard data formatting, has led to a “Tower of Babel” situation, says Haussler.

Although some researchers are reluctant to share genome data, the field is generally viewed as generous compared with other disciplines. Still, the repositories meant to foster sharing often present barriers to those uploading and downloading data. Researchers tell tales of spending months or years tracking down data sets, only to find dead ends or unusable files. And journal editors and funding agencies struggle to monitor whether scientists are sticking to their agreements.

Many scientists are pushing for change, but it can’t come fast enough. Clinical genomicist Heidi Rehm says the field has come to recognize that big scientific advances require vast amounts of genomic data linked to disease and health-trait data. “But it isn’t compatible and shareable,” says Rehm, based at Massachusetts General Hospital in Boston and the Broad Institute in Cambridge. “How do we get everyone in the world (patients, clinicians and researchers) to share?”

Barriers everywhere

Sequencing the human genome made it easier to study diseases associated with mutations in a single gene. These are known as Mendelian disorders and include non-syndromic hearing loss² (see page 218). But identifying the genetic roots of more common complex diseases required the identification of multiple genetic risk factors throughout the genome. Such diseases include cardiovascular disease, cancer and other leading causes of death.

To do this, researchers in the mid-2000s began comparing the genotypes of individuals with and without a specific disease or condition, in samples ranging from thousands to hundreds of thousands of people. The approach is known as genome-wide association studies, or GWAS.

The approach proved popular: more than 10,700 GWAS have been conducted since 2005. And that has produced oceans of data, says Chiea Chuen Khor, a group leader at the Genome Institute of Singapore, who studies the genetic basis of glaucoma. For example, says Khor, a study of 10,000 people that looks at 1 million genetic markers in each would generate a spreadsheet with 10 billion entries.

Most of these individual-level genomic data now live in ‘controlled-access’ databases. These were set up to deal with the sticky legal and ethical concerns that come with genomic data that have been linked to personal information. This ‘phenotype data’ can include health-care records, disease status or lifestyle choices. Even in anonymized data sets, it’s technically possible that individuals can be reidentified. So, controlled-access databases vet the researchers seeking access and ensure that the data are used only for the purposes that participants consented to.

The US National Institutes of Health (NIH) requires its grant recipients to place GWAS data into its official repository, the Database of Genotypes and Phenotypes, or dbGaP.

198 | Nature | Vol 590 | 11 February 2021
©2021 Springer Nature Limited. All rights reserved.

Illustration by Ana Kova

European researchers can deposit data into the European Genome-phenome Archive (EGA), housed at the European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK. Other large generators of genomic data also operate their own controlled-access databases. These include the for-profit company 23andMe in Sunnyvale, California, and the non-profit Genomics England in London.

But uploading data into some of these repositories often takes a long time. As a result, says Khor, the data are often “minimal and sparse”, because researchers are depositing just what’s required to be compliant.

Sometimes the data get stored in more than one place, and that creates other challenges. Rasika Mathias is a genetic epidemiologist at Johns Hopkins University in Baltimore, Maryland, who studies the genetics of asthma in people of African ancestry. She says that decentralization is a huge problem. She is part of TOPMed, a precision-medicine programme run by the NIH’s National Heart, Lung, and Blood Institute. It consists of more than 155,000 research participants across more than 80 studies. It shares its data in several repositories, including dbGaP and some university-based portals.

“It’s a remarkable resource,” says Mathias.

[…] UK Biobank, which holds genomic data on 500,000 people, are still invaluable. Mathias is fiercely protective of the participants in TOPMed and sees merit in the protection that controlled access provides. Like many, she would like to see the repositories better resourced. But, she says, “I am an advocate for the checks and balances”.

And others are happy to have access, even if it is hard to obtain. “It’s out of our scope to generate that amount of data,” says Melanie Bahlo, who runs a statistical-genetics lab at the Walter and Eliza Hall Institute of Medical Research in Melbourne, Australia. Her lab is more than willing to wade through the digital paperwork to use dbGaP, and has done so for more than ten projects. She also recently spent a fruitless six months chasing after a data set. It was supposed to be publicly available through a research institute’s data portal, but wasn’t.

“Nothing is harder than getting data out of dbGaP and EGA,” says Khor, “unless it’s getting it from a researcher who is unwilling to share.”

The sharing police

Twenty years on from the HGP, there is no specific universal policy that says research groups have to share their human-genome […]

Sussman explains that the journal’s editors will work through data-sharing obstacles with authors on a case-by-case basis to find solutions. This can go as far as asking authors to reapply for approval from their institutional review board. Editors might also ask authors to go back to participants to reobtain their consent, or to rerun an analysis after removing unshareable data. The journal has turned away authors who state upfront that they cannot share data. “The community and the funders demand this transparency and reproducibility,” she says.

But even when authors do agree to share data, editors and reviewers have limited ability to confirm that it is being done. They might not have the time (or the access to controlled-access databases) to check data quality, formatting or completeness.

Trenkmann says funders should require researchers to have a concrete data-sharing plan from the outset of a project. This could help to shift attitudes so that researchers see sharing as a duty, she says.

An NIH-wide data-sharing policy to be implemented in January 2023 does just that. It requires all NIH grant applicants to put a Data Management and Sharing (DMS) Plan into their grant proposals. It also allows researchers to allocate some of their budget to the task.