Whole Genome Sequencing (WGS) 101
Total Page:16
File Type:pdf, Size:1020Kb
Whole Genome Sequencing (WGS) 101 nderstanding the genetics of bacteria has been they serve functions other than describing the U a priority of the industry for many years. sequence of a gene. Bacteria also contain plasmids, Genetic typing reveals whether bacteria have the which can be thought of as mini-chromosones, that genes for certain traits, such as antibiotic resistance, can move from one bacterium to another and often or have the genes that make them pathogenic, i.e., carry genes coded for antibiotic resistance and capable of making people sick. Genetic typing also virulence. Together the chromosome and plasmids, if can show how closely related bacteria are to one any, constitute a bacterial genome. To sequence the another, which can help determine the source of whole genome means to identify all the DNA bases pathogenic bacteria. Genetic typing has commonly from the chromosome and plasmids in the correct been done using pulsed-field gel electrophoresis order, and then, by comparing to a previously (PFGE), a method that uses enzymes to break the sequenced genome, to identify what genes are bacterial DNA into pieces of different lengths. The present and what they code for that give the pieces of DNA are then separated on a gel and the bacteria specific characteristics. different lengths form different bands on the gel. PFGE banding patterns tell us how similar bacteria Whole genome sequencing is currently done with are to one another. If two bacteria have the same two main technologies. Short-read technology gives banding pattern, they are considered identical and continuous bases of sequence up to 300 bases. may have the same source. But PFGE testing doesn’t Long-read technology gives thousands of bases of provide as much information as the whole DNA sequence. Short-read technology is popular genome sequence. Thus, as the ability to sequence because it’s faster and lower cost. Long-read the whole genome has become more cost effective technology is more expensive, so it is not used for over the last decade, researchers and public health routine WGS-based typing. However, long-read agencies are beginning to use whole genome technology can recover the complete genome and in sequencing (WGS) technology for genetic typing of the correct order. Short-read technology provides a bacteria, including pathogens relevant to food majority of the genome; however, when put safety. A commonly used definition for WGS is the together, regions are missing or can’t be resolved, process of using a modern DNA sequencing so gaps are introduced that split the genome into platform with the goal of sequencing the majority of smaller pieces. Once the initial sequence is an organism’s genome. obtained, data analysis and quality control software help obtain the final sequence. In contrast to humans and other organisms that have multiple linear chromosomes, the genome of most The main objective of WGS analysis is the bacteria form a single circular chromosome. Four identification of genomic differences between possible DNA molecules, called nucleotides or bases bacterial strains. The main difference of interest is (A, G, T, C), are strung together to code for genes. when there is a single nucleotide (A, G, T or C) As an example, an E. coli chromosome is about 4.7 change between two genomes when compared at million bases which code for about 4,300 genes. To the same location. These are called single put this in perspective, human genomes have more nucleotide polymorphisms (SNPs). The number of than 3 billion bases and 20,000 genes. All genomes SNPs indicates how closely related two bacterial also contain regions of “non-coding” DNA because strains are. The number of SNPs required before two strains are considered “different” is still under interviews linking illnesses with consumption of a debate because the use of WGS for the purpose of certain food) are still essential to conclusively genetic typing is still evolving as more is learned identify the source of a foodborne disease about its advantages and limitations. Data analysis outbreak. Rigorous comparisons have yet to be programs that consider the number of SNPs and conducted to assess how often strains with identical their location in the genome are used to draw WGS can be found in widely separate locations relatedness trees that provide a visual of how and products that are unrelated, which emphasizes closely related different strains are. These trees are the continued need for epidemiological data in usually referred to as phylogenies. outbreak investigations. Further research on how to decide if two strains are the same is a critical need, While WGS-based methods should allow for especially in food-associated environments, in order significant improvement in foodborne disease to facilitate improved interpretation of WGS data outbreak detection and source traceback compared in the context of foodborne disease outbreak to PFGE, strong epidemiological data (patient investigations. Writers: Tommy Wheeler, [email protected] Henk den Bakker, [email protected] Martin Wiedmann, [email protected] Reviewers: Zaid Abdo, [email protected] Jim Bono, [email protected] .