
SCIENTISTS DECONSTRUCT CANCER COMPLEXITY THROUGH GENOME AND TRANSCRIPTOME ANALYSIS

At Cold Spring Harbor Laboratory, scientists used SMRT® Sequencing to decode one of the most challenging cancer cell lines ever encountered. Along the way, they built a portfolio of open-access analysis tools that will help researchers everywhere make structural variation discoveries with long-read sequencing data.

When Mike Schatz realized a few years ago that his PacBio® System had reached the throughput needed to process human genomes, he decided to give it a real challenge: the incredibly complicated, massively rearranged SK-BR-3 breast cancer cell line. The genome consists of 80 chromosomes, and that’s just the tip of the complexity iceberg.

“We were really interested in sequencing a human genome that would be maximally impactful and that was aligned with our research interest in cancer genomes, where it’s been well documented that structural variations play a major role,” says Schatz, now an associate research professor of computer science at Johns Hopkins University and an adjunct associate professor of quantitative biology at Cold Spring Harbor Laboratory, where the analysis took place. He notes that despite its importance, structural variation has not been thoroughly studied because short-read sequencers cannot reliably identify these large genomic elements. “One of the really special properties about the PacBio Sequencer is in addition to being able to call SNPs or small variants, we also get to look for large variants such as structural variation,” he says.

But as Schatz and his collaborators at Cold Spring Harbor Laboratory and the Ontario Institute for Cancer Research delved into this work, they realized that existing variant callers were tailored to short-read data. To make the most of the large amount of long-read information they were generating, the team wrote a suite of new analysis tools optimized for SMRT Sequencing data. “The tools catering to short-read data just aren’t made to capture the awesome information that we can now take advantage of,” says Maria Nattestad, a graduate student in Schatz’s lab who wrote several of the new algorithms. “Building our own tools was really the only way to go here.”

Those tools, which are especially important for understanding structural variation, are now being publicly released to fuel further SMRT Sequencing studies of human genomes. Also coming out soon is the team’s detailed analysis of the SK-BR-3 genome and transcriptome, which includes a high-quality assembly as well as a new understanding of gene fusions, the evolutionary history of this cell line, and more.

‘Genome Gymnastics’

SK-BR-3 wasn’t chosen at random. The Her2-amplified breast cancer line was originally isolated in 1970 and has been used extensively in preclinical research for Her2-targeting therapeutics, such as Herceptin. Cited in hundreds of papers, the cell line is dramatically rearranged, possibly because it came from a lung metastasis rather than from a primary tumor.

The cell line was also well known at Cold Spring Harbor, where scientists had used it for other projects and were already familiar with its peculiarities. The widespread rearrangements and fusions appealed to Schatz, who believes that such complication may be characteristic of some cancers. “That’s really been the driver behind why we’re so interested to study it,” he says, noting that these extremely rearranged cancer genomes tend to be associated with the poorest patient outcomes.

De novo sequencing and assembly were the first steps in making sense of the SK-BR-3 genome. With 72-fold SMRT Sequencing coverage, “we got an outstanding assembly of this genome even though it’s so complicated,” Schatz says, citing a contig N50 size of 2.5 Mb compared to a state-of-the-art short-read assembly with a contig N50 of just 3 kb. “That’s nearly a thousand-fold more contiguous going from short-read to long-read assemblies, and it’s through that improved assembly that the majority of structural variants were detected.”

Using custom-built analysis tools including variant callers Sniffles, by Schatz lab member Fritz Sedlazeck, and Assemblytics, by Nattestad, the scientists found more than 10,000 structural variants in the SK-BR-3 genome ranging in size from 50 bases to millions of base pairs long. “That’s an important result, because other studies of cancer genomes with short-read technology only capture a small fraction of that,” Schatz says.

Assemblytics, an assembly-based variant caller, detected insertions and deletions of hundreds of novel Alu elements throughout the genome, among other structural variants. “That’s a good confirmation that we’re seeing biologically relevant results,” Nattestad says.

Another major discovery involved meticulously characterizing the complicated process that led to
the cell line’s Her2 oncogene amplification. The background of the amplification had never been determined, but a new tool called SplitThreader unfolded the region’s history. “We see that there’s been this fascinating set of processes where that gene region has been fused into other chromosomes, and from there have been a whole series of further amplifications and inversions,” Schatz says. “We call that genome gymnastics. It was a very sophisticated process that resulted in this current state.”

SplitThreader works by “threading through the genome and finding the likely variants that are connected to each other” to identify the most likely evolutionary path, says Nattestad, who developed the tool. “It shows how the different amplifications and variants have come together to amplify the Her2 oncogene.” By reconstructing the history of regions like this, she adds, it may eventually be possible to find entirely new classes of cancer variations.

Two Variants, One Fusion

The team also used the Iso-Seq™ method to analyze the full transcriptome of SK-BR-3, finding as much complexity at the RNA level as they saw in the DNA. “In the Iso-Seq analysis, we see many tens of thousands of novel isoforms,” Schatz says. “That’s a really strong testament to the long reads, which fully capture an isoform in one sequence, unlike short reads, where you have to infer isoform structure.” Her2, for instance, proved to be one of the most complicated genes at the transcriptome level, with dozens of novel isoforms detected. “That was all resolved through PacBio Iso-Seq and had never been documented before,” he adds.

Gene fusions were a major focus of the team’s transcriptome analysis. In addition to validating many fusions previously reported in this cell line, they also found several novel ones, including some from the translocation between chromosome 8 and the Her2 oncogene on chromosome 17.

Perhaps more importantly, the researchers discovered the genetic mechanism responsible for some gene fusions that had always been a mystery: scientists could see them in RNA, but no direct link between the genes had ever been found in the DNA.

“We were able to figure out why this happened, and it’s because there’s more than one fusion in the genome necessary for bringing these two genes right next to each other,” Nattestad says. “In four cases, we found fusions that take place through two variants. You go from one gene into an intermediate sequence through one variant, span along there for a little while, split off again through a second variant, and then you hit the next gene.”

The process perfectly explains these gene fusions, but was not previously discovered because scientists didn’t have access to full-length isoforms of the gene product or to the high-quality genome sequence made possible by the long reads. “If you only look at one variant at a time, you don’t capture much of the complexity or much of the biological story of what this cancer is undergoing,” Nattestad adds. Using the Iso-Seq analysis, on the other hand, “shows us the sheer complexity of this genome.”

Moving Forward

Conquering SK-BR-3 has Schatz and his team eager to tackle additional cancer genomes, both from cell lines and from patient samples. “We broke into a lot of new territory through this project, especially at the level of algorithmics where we invented entirely new structural variation callers and new systems for analyzing them,” Schatz says. “Now we’re really excited to apply this to additional samples and see if the complicated oncogene amplifications that have taken place in SK-BR-3 are also driving mechanisms in other cancers.”

“We’ll continue to focus on samples that have complicated structural variation because that’s where a lot of our strength lies,” Nattestad says, noting that BRCA1-mutated cancers tend to fit those criteria.

Future projects will rely on SMRT Sequencing; indeed, the Schatz lab is anticipating installation of the higher-throughput Sequel™ System to meet demand. “We’re at the point where, if you’re trying to assemble a genome de novo, be it for cancer or model organisms, PacBio has really emerged as the best technology,” Schatz says.

Mike Schatz, Ph.D., is a computational biologist and an expert at large-scale computational examination of DNA sequencing data.

Maria Nattestad is a graduate student in the Schatz lab.

www.pacb.com/cancer
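
For readers less familiar with the contig N50 figures Schatz cites (2.5 Mb for the long-read assembly versus 3 kb for the short-read one), the short Python sketch below shows one common way the statistic is computed and checks the fold difference. The function name and the toy contig lengths are illustrative assumptions, not data or code from the SK-BR-3 project.

```python
def contig_n50(lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (NOT SK-BR-3 data): 300 bases in five contigs.
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = contig_n50(toy_contigs)

# Fold difference between the two N50 values quoted in the article.
long_read_n50 = 2_500_000   # 2.5 Mb
short_read_n50 = 3_000      # 3 kb
fold_improvement = long_read_n50 / short_read_n50

print(f"toy N50: {toy_n50}")
print(f"long-read vs short-read N50: about {fold_improvement:.0f}-fold")
```

On the quoted figures the ratio comes to roughly 833, consistent with the "nearly a thousand-fold" description.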

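
The two-variant fusions Nattestad describes (leave one gene through a first variant, travel along an intermediate sequence, then reach the second gene through a second variant) can be illustrated with a small search over breakpoint pairs. This is a minimal sketch under simplifying assumptions: it ignores strand and orientation, uses a hypothetical `max_span` distance cutoff, and all gene names, variant names, and coordinates are invented. It is not the SplitThreader algorithm.

```python
from collections import namedtuple

# A structural variant joins two breakpoints: (chrom_a, pos_a) and (chrom_b, pos_b).
Variant = namedtuple("Variant", "name chrom_a pos_a chrom_b pos_b")
Gene = namedtuple("Gene", "name chrom start end")


def touches(gene, chrom, pos):
    """True if the breakpoint falls inside the gene's interval."""
    return gene.chrom == chrom and gene.start <= pos <= gene.end


def orientations(variant):
    """Return (near_end, far_end) pairs for both directions of travel."""
    a = (variant.chrom_a, variant.pos_a)
    b = (variant.chrom_b, variant.pos_b)
    return [(a, b), (b, a)]


def two_variant_links(gene1, gene2, variants, max_span=1_000_000):
    """Find pairs of variants that chain gene1 to gene2 through an
    intermediate sequence no longer than max_span bases (assumed cutoff)."""
    links = []
    for v1 in variants:
        for near1, far1 in orientations(v1):
            if not touches(gene1, *near1):
                continue
            for v2 in variants:
                if v2 is v1:
                    continue
                for near2, far2 in orientations(v2):
                    same_chrom = near2[0] == far1[0]
                    close = abs(near2[1] - far1[1]) <= max_span
                    if same_chrom and close and touches(gene2, *far2):
                        links.append((v1.name, v2.name))
    return links


# Hypothetical toy data, chosen only to exercise the search.
gene_a = Gene("GENE_A", "chrX1", 100, 200)
gene_b = Gene("GENE_B", "chrX2", 5000, 6000)
variants = [
    Variant("v1", "chrX1", 150, "chrX3", 1000),   # GENE_A -> intermediate
    Variant("v2", "chrX3", 1500, "chrX2", 5500),  # intermediate -> GENE_B
    Variant("v3", "chrX4", 10, "chrX5", 20),      # unrelated
]
links = two_variant_links(gene_a, gene_b, variants)
print(links)
```

Looking at either variant alone would show no direct connection between the two genes; only the chained pair does, which is the point Nattestad makes about examining one variant at a time.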
For Research Use Only. Not for use in diagnostic procedures. © Copyright 2016, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacificbiosciences.com/licenses.html. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.

PN: CS115- 033116