Whole Genome Sequencing as a Tool to Study the Genomic Landscape of Pathogens

Dissertation by

Sharif Matouq Ahmed Hala

In Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy

King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia

© June 2020 Sharif Matouq Ahmed Hala All Rights Reserved

2

EXAMINATION COMMITTEE

The dissertation of Sharif Hala is approved by the examination committee.

Committee Chairperson: Prof. Arnab Pain Committee Members: Prof. Jasmeen Merzaban, Prof. Niveen Khashab, Dr Michael Carr [External]

3

ABSTRACT

Whole-Genome Sequencing as a Tool to Study the Genomic Landscape of Pathogens

Sharif Hala

In healthcare settings and beyond, the ESKAPE pathogens (Enterococcus faecium,

Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) among other pathogens exchange antibiotic resistance and virulence factors and emerge as new infectious clones. According to the

Saudi General Authority for Statistics (stats.gov.sa), Saudi Arabia is a country where more than 27 million pilgrims meet in annual continual mass-gathering events. This massive influx of people could introduce novel pathogens to the community that could not necessarily be detected with traditional culture-dependent clinical microbiological tests.

Conventional clinical microbiology and environmental pathogen detection methods have had many limitations and narrow search scope. These methods can only target known and culturable pathogens. Over the past decade, applications of next-generation sequencing

(NGS) and bioinformatics tools have revolutionized the way pathogens are detected and their relevant phenotypes such as clonal types, antibiotic resistance are predicted to aid in clinical decision making as additional practice to traditional clinical microbiology-based testing protocols.

The aim of this study was to apply whole-genome sequencing (WGS) and bioinformatic analysis tools on clinical samples and bacterial isolates in order to pave the way for transforming current clinical microbiology practices in a tertiary referral hospital in Jeddah,

Saudi Arabia.

4

My attempt to utilize WGS as a tool on pathogenic strains in this study combined with the clinical data has resulted in discovering a silent outbreak of an emerging hypervirulent strain of Klebsiella pneumoniae (Chapter 2). Analysis of the strains antimicrobial profiles genetically has yielded the first characterization of a misidentified Klebsiella quasipneumoniae harboring plasmid-mediated carbapenemases of Klebsiella pneumoniae carbapenemases (KPC) (Chapter 3). Similarly, I was able to study mobile colistin resistance genes in the isolates and identify a novel occurrence of mcr-1 and mcr-8 (Chapter

4). I applied clinical metagenomic protocol on an intestinal biopsy of an inflammatory bowel disease patient with Crohn’s disease, where I identified an association of three co- occurring and an actively replicating non-tuberculosis mycobacteria (Chapter 5). The deployment of whole-genome sequencing and metagenomic in infectious disease surveillance and diagnostics could prove beneficial in limiting epidemics and detect transmission patterns of antimicrobial-resistant genes.

5

ACKNOWLEDGEMENTS

First of all, I would like to thank ALLAH almighty, the most merciful and compassionate, for his support, help and generosity.

Then I want to highlight that all this work would not have been possible, were it not for the help, guidance, and support of numerous magnificent individuals.

I would like to express my gratitude to the committee chair, and my supervisor Prof. Arnab

Pain, for all his efforts and lessons, I feel truly blessed to have him as my teacher as well as the dean of the BESE and the great committee members who graciously agreed to examine me during the tough times of the current pandemic.

I am grateful to all the pathogen genomics laboratory, where I was given the golden opportunity to do this incredible clinical and genomic work.

A special thanks are due to my mother “Tawadod Al Eitany” and my father “Matouq Hala” for their continuous prayers and support during my studies. I would like to thank my beautiful wife Duaa Ahmed, and my friends for their continuous support throughout my studies and helping me reach my dream.

I would like to thank each member of KAUST staff for their aid during and every aspect of my Ph.D. project; they have made the difference in advancing all the projects. I would like to thank the committee of the IBEC in KAUST for their approval of the projects.

In the end, I would like to take this opportunity to express my gracious gratitude towards all those who have helped me make these projects possible.

6

TABLE OF CONTENTS

EXAMINATION COMMITTEE ...... 2

ABSTRACT ...... 3

ACKNOWLEDGEMENTS ...... 5

TABLE OF CONTENTS ...... 6

LIST OF ABBREVIATIONS...... 9

LIST OF ILLUSTRATIONS ...... 10

LIST OF TABLES ...... 11

Chapter 1: Introduction ...... 12 1.1 The Threat of Antimicrobial Resistance ...... 12 1.2 Defining Resistance and the Current Threat ...... 12 1.3 Local and Global Surveillance ...... 14 1.4 Applications of Whole-Genome Sequencing (WGS) ...... 17 1.4.1 AMR Surveillance Using WGS ...... 19 1.4.2 Metagenomics ...... 29 1.4.3 Bioinformatics Applications in Clinical Metagenomics ...... 32 1.4.4 Other Issues of Using WGS in Clinical Metagenomics for Diagnosis ...... 36 1.5 Research Aims and Objectives ...... 38

Chapter 2: An Outbreak of Hypervirulent and Multidrug-Resistant Clone (ST2096) of Klebsiella pneumoniae Silently Transmitting in a Saudi Arabia Western Region Hospital: A Clinical and Molecular Surveillance Study ...... 46 2.1 Summary ...... 47 2.1.1 Background and Purpose ...... 47 2.1.2 Methods...... 47 2.1.3 Findings...... 47 2.2 Research in Context ...... 48 2.2.1 Review and Evidence ...... 48 2.2.2 Added Value ...... 48 2.2.3 Repercussions ...... 49 2.3 Introduction ...... 50 2.4 Methods ...... 53 2.4.1 Surveillance and Outbreak Identification ...... 53 2.4.2 Data Collection ...... 53 2.4.3 Bacterial Phenotype ...... 54 2.4.4 DNA Extraction and Whole-Genome Shotgun Sequencing ...... 55 2.4.5 Data Availability ...... 55 2.4.6 Genome Assembly and Characterization ...... 55 2.4.7 Single Nucleotide Polymorphism (SNP) Detection and Phylogenetic Analysis ...... 56

7

2.4.8 Transmission Maps ...... 57 2.4.9 Statistical Analysis ...... 57 2.4.10 Ethics ...... 57 2.4.11 Role of Funding Source ...... 58 2.5 Results ...... 59 2.6 Discussion ...... 62

Chapter 3: First Report of Klebsiella quasipneumoniae Harboring blaKPC-2 in Saudi Arabia ...... 74 3.1 Abstract ...... 75 3.1.1 Background ...... 75 3.1.2 Methods ...... 75 3.1.3 Results...... 75 3.1.4 Conclusion ...... 76 3.2 Background ...... 77 3.3 Materials and Methods ...... 78 3.3.1 Clinical Setting and the Case ...... 78 3.3.2 Genome Sequencing, Assembly, and Analyses ...... 79 3.3.3 Phylogenetic Analysis and Annotation ...... 80 3.4 Results ...... 81 3.5 Discussion ...... 82

Chapter 4: Co-occurrence of mcr-1 and mcr-8 genes in a multidrug-resistant Klebsiella pneumoniae from a 2015 clinical isolate in Saudi Arabia ...... 95 4.1 Introduction ...... 96 4.2 Materials and Methods ...... 97 4.3 Results and Discussion ...... 99

Chapter 5: Crohn’s Disease Patient Infected with Multiple Co-occurring Nontuberculous Mycobacteria ...... 107 5.1 Case ...... 108

Chapter 6: Discussion & Conclusion ...... 112 6.1 Infectious Disease Surveillance and WGS ...... 113 6.1.1 Clonal Lineages of Hypervirulent and MDR K. pneumoniae ...... 114 6.1.2 Studying the Transmission Patterns of ST2096 ...... 115 6.2 Unique Co-occurrence of mcr Genes in K. pneumoniae ...... 115 6.3 Emergence of Carbapenem-Hydrolyzing β-lactamase KPC ...... 117 6.4 Clinical Metagenomics may Enable Broader Pathogen Detection...... 118 6.4.1 Mycobacteria Associated with Crohn’s Disease ...... 119 6.5 Future Experiment and Lessons for Pathogen Identification ...... 120 6.6 Future Perspectives: ...... 121

APPENDICES ...... 125

8

Chapter 2 ...... 125

Chapter 4 ...... 126 Supplementary Material ...... 126

Chapter 5 ...... 131 Supplementary Data ...... 131

9

LIST OF ABBREVIATIONS

AMR Antimicrobial Resistance AST Antimicrobial Susceptibility Testing blaKPC β-lactamases Klebsiella pneumoniae carbapenemase gene blaNDM β-lactamases New Delhi metallo-beta-lactamase gene blaOXA β-lactamases oxacillinases gene blaSHV β-lactamases sulfhydryl variable gene blaCTX-M β-lactamases cefotaxime-Munich gene blaTEM β-lactamases Temoniera gene BLAST Basic local alignment search tool CARD Comprehensive antibiotic resistance database CC Clonal complex CDC US Centers for Disease Control and Prevention CG Clonal group cgMLST Core Genome Multi-Locus Sequence Typing CLSI Clinical and laboratory standards institute CRE Carbapenem-Resistant DNA Deoxyribonucleic acid ESBL Extended Spectrum β-Lactamase EUCAST European committee on antimicrobial susceptibility testing FDA Federal Drug Administration GLASS Global Antimicrobial Resistance Surveillance System HAI Hospital-acquired infections HCW Healthcare workers hvKP Hypervirulent (Hypermucoviscous) K. pneumoniae ICU Intensive care unit Inc Incompatibility group IPC Infection prevention and control KPC Klebsiella pneumoniae carbapenemase MAG Metagenomic assembled genome mcr Mobile colistin resistance gene MDR Multidrug resistant MENA Middle East and North Africa MERS-COV Middle East respiratory syndrome coronavirus MIC Minimum inhibitory concentration MLST Multi-locus sequence typing MNGHA The Ministry of National Guard Health Affairs NCBI National Center for Biotechnology Information NGS Next-generation sequencing NTM Non-tuberculosis mycobacteria OXA Oxacillinase types PDR Pandrug resistant SNP Single nucleotide polymorphism ST Sequence type wgMLST Whole-Genome Multi-Locus Sequence Typing WGS Whole-Genome Sequencing WHO The World Health Organization XDR Extensively drug resistant

10

LIST OF ILLUSTRATIONS Figure 1.1 ...... 16 Figure 1.2: ...... 17 Figure 1.3: ...... 18 Figure 1.4: ...... 23 Figure 1.5: ...... 30 Figure 1.6: ...... 32 Figure 1.7: ...... 35 Figure 2.1: ...... 66 Figure 2.2: ...... 67 Figure 2.3: ...... 68 Figure 2.4: ...... 69 Figure 2.5: ...... 70 Figure 2.6: ...... 71 Figure 3.1: ...... 88 Figure 3.2: ...... 90 Figure 3.3: ...... 91 Figure 4.1: ...... 103 Figure 4.2: ...... 104

11

LIST OF TABLES

Table 1.1: ...... 14 Table 1.2: ...... 27 Table 3.1: ...... 86 Table 3.2: ...... 87 Supplementary Table 2.1: ...... 125 Supplementary Table 2.2: ...... 125 Supplementary Table 3.1: ...... Error! Bookmark not defined. Supplementary Table 3.2: ...... Error! Bookmark not defined. Supplementary Table 3.3: ...... Error! Bookmark not defined. Supplementary Table 3.4: ...... Error! Bookmark not defined. Supplementary Table 3.5: ...... Error! Bookmark not defined. Supplementary Table 4.1...... 131

12

Chapter 1: Introduction

1.1 The Threat of Antimicrobial Resistance

Bacterial antimicrobial resistance (AMR) is a challenging threat for both developed and developing countries equally. In Europe, there was an estimated 25,000 infection-related deaths and evaluated to cost over 1.6 billion euro annually which will exponentially increase each year [1]. Similarly, in the United States, 23,000 mortalities occur annually, resulting from over 2 million infected cases costing over 55 billion US dollars [1, 2].

Globally, more than 700,000 people die annually associated with infective pathogens carrying AMR genes, and the number has been forecasted to increase to 10 million annual deaths by the year 2050 [3]. In particular, ESKAPE pathogens (Enterococcus faecium,

Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) have been instigating major health outbreaks worldwide through hospital-acquired infections (HAI) [4] and community-acquired infections [5, 6]. The decline in developing novel antibiotics and the notable increase in resistance to current antibiotics demonstrate a persisting challenge at hand. Therefore, the global action plan on AMR adopted in 2015 by all countries initiated by the World Health

Organization (WHO) among other entities declared the need for a national action plan [7].

Every participating country agreed to establish a local, national action plan to implement relevant policies that will aid in prevention, control, and monitoring AMR pathogens. In

March 2015, Saudi Arabia joined the fight against AMR, and the Saudi Ministry of Health developed a national action plan published by WHO [8].

1.2 Defining Resistance and the Current Threat

13

Bacterial pathogens differ in their defining terminology depending on the number of antibiotics agents and categories of resistance. Internationally accepted definitions are multidrug‐resistant (MDR), extensively drug‐resistant (XDR), and pandrug‐resistant

(PDR) organisms [9]. In Enterobacteriaceae, the criteria for the terminologies with regards to resistance to a defined set of antibiotics are listed in (Table.1) [9]:

1. MDR: resistant to ≥1 agent in ≥3 different antimicrobial categories listed in

(Table.1).

2. XDR: resistant to ≥1 agent in all of the antimicrobial agents listed in (Table.1)

except ≤2 categories.

3. PDR: resistant to all of the antimicrobial agents listed in (Table.1).

MDR pathogens are harmful even at low quantity since they can spread by several means in the health-care and community settings. High-risk resistant clones are the result of the frequent and sporadic use of antimicrobial agents, which in turn have left a few options of treatment regimens available as the last-resort antibiotics such as carbapenems, mainly β- lactam, followed by polymyxins (mainly colistin) [10]. Eventually, Gram-negative pathogens have developed resistance and left few treatment options available, including the last-resort antibiotics. Multiple carbapenem-resistant Enterobacteriaceae (CRE) and colistin-resistant phenotypes reports have been accumulating from both clinical cases and environmental samples [11]. In a short period, CREs and colistin-resistant pathogens have been associated with a high mortality rate, such as the outbreak of Klebsiella pneumoniae carbapenemase (KPC) reported in 1996 [12]. Reports of AMR genetic factors of the simillar category started surfacing in the Middle East and North Africa (MENA) region such as New Delhi metallo-β-lactamase-1 (NDM-1) in CRE pathogens and mobile colistin-

14 resistant gene (mcr) variants [8, 13]. Thus, the ESKAPE pathogens and their AMR profile have been on the WHO highest priority watch list [7]. Currently, multiple efforts are ongoing to identify the emerging resistance genes that confer resistance toward last resort antibiotics to repurpose mixtures of available drugs in the fight against MDR pathogens

[14].

Antimicrobial category Antimicrobial agent Gentamicin Tobramycin Aminoglycosides Amikacin Netilmicin Ceftaroline (approved only for , Klebsiella Anti‐MRSA cephalosporins pneumoniae, Klebsiella oxytoca) Antipseudomonal penicillins + β‐lactamase Ticarcillin‐clavulanic acid inhibitors Piperacillin‐tazobactam Ertapenem Imipenem Carbapenems Meropenem Doripenem Non‐extended spectrum cephalosporins; 1st and Cefazolin 2nd generation cephalosporins Cefuroxime Cefotaxime or ceftriaxone Extended‐spectrum cephalosporins; 3rd and 4th generation cephalosporins Ceftazidime Cefepime Cefoxitin Cephamycins Cefotetan Fluoroquinolones Ciprofloxacin Folate pathway inhibitors Trimethoprim‐sulphamethoxazole Glycylcyclines Tigecycline Monobactams Aztreonam Penicillins Ampicillin Amoxicillin‐clavulanic acid Penicillins + β‐lactamase inhibitors Ampicillin‐sulbactam Phenicols Chloramphenicol Phosphonic acids Fosfomycin Polymyxins Colistin Tetracycline Tetracyclines Doxycycline Minocycline Table 1.1: Antimicrobial categories and agents used to define MDR, XDR, and PDR for Enterobacteriaceae [9].

1.3 Local and Global Surveillance

Surveillance systems have been evolving beyond collecting clinical laboratory data into isolate-based metadata, which includes the clinical genetics combined with epidemiological information [15]. Global Antimicrobial Resistance Surveillance System

(GLASS), is the first collective effort to standardize antimicrobial resistance surveillance

15 and share data [15]. Nevertheless, there is a need for additional comprehensive genetic surveillances at health centers and mass-gathering locations to limit the threat of emerging high-risk clones and outbreaks [16, 17]. Examples of such outbreaks were as recent as the

Middle East respiratory syndrome coronavirus (MERS-CoV), which emerged in 2012 in

Saudi Arabia, the Bacillus anthracis bioterrorism-associated outbreak in the United States, and the Zika virus in the Americas in 2015/2016 [17] and the ongoing pandemic of Severe

Acute Respiratory Syndrome-Coronaviruses-2 (SARS-CoV-2), which started in Wuhan

City, China, in December 2019 [18]. According to the general authority for statistics in

Saudi Arabia, over 27 million pilgrims from around the world visit the two Holy Mosques in Mecca and Madinah for Umra and Hajj in 2018 [19]. The Saudi Ministry of Health has formed multiple partnerships with world-leading public health agencies such as the United

States Center for Disease Control and Prevention (CDC), and the United Nations health agency. Together and in accordance with the WHO action plan, the ministry established several entities in charge of detecting, controlling and reducing the impact of emerging diseases [20-22]. Consequently, in 2016 the Gulf Cooperation Council (GCC) has launched a strategic plan for combating antimicrobial resistance [8]. According to the tertiary centers in Saudi Arabia, the highest number of Gram-negative reported are Klebsiella pneumoniae and Acinetobacter baumannii [23]. Advanced surveillance reports of a Gram- negative organism, such as K. pneumoniae in Saudi Arabia, should depend on a comprehensive combination of clinical microbiology and molecular methods. There have been only a handful of studies performed on carbapenem-resistance in combination with colistin-resistance using genetic analysis in Saudi Arabia (Figure 1.2). Nonetheless, local clinical and microbiological laboratories still depend only on conventional microbial

16 detection methods to identify both environmental and clinical pathogens (Figure 1.1) [24].

Existing clinical surveillance and discovery systems detect pathogens either by the gold- standard culturing, organism-specific gene amplification, or protein markers (Figure 1.1)

[25]. The requested tests for a target pathogen is usually based on assumptions driven by symptoms leading to a narrow scope search [26]. Even though, less than 5% of the known bacterial strains could be cultured for clinical detection and many bacterial pathogens are unculturable [27, 28]. Therefore, genetic identification could provide aid to clinical laboratory identification.

FIGURE 1.1: This figure illustrates the laborious and time-consuming analysis of current practice in pathogen detection built on human interpretation and one test per one bug method [25].

17

K.pneumoniae Publications in Saudi Arabia

120

100

80

60

40

20

0 Surveilance Research Review Case Study Descriptive/Letter

FIGURE 1.2: A literature review of K. pneumoniae published articles in Pubmed with adding a filter of Saudi Arabia from 1985-2018 yielded 204 papers with a focus on surveillance of hospital clinical data.

1.4 Applications of Whole-Genome Sequencing (WGS)

Whole-genome sequencing (WGS) is a combined laboratory and bioinformatic process used to determine complete (or near complete) genomes, including coding and non-coding regions of a single or multiple organisms. WGS is an invaluable tool for practical and comprehensive pathogen identification and monitoring. Recent advancements in second and third-generation shotgun sequencing allowed for the identification of pathogen genomes and predicted their AMR genes of both known and unknown infections [26]. With the price of WGS dropping to an economically acceptable level and the ease of the sampling protocols, multiple healthcare centers around the world have begun including

WGS as an option to support their clinical sample analysis of difficult to diagnose cases with the etiologic agent (Figure 1.3). However, there is no set protocol for sample

18 preparation and data analysis as yet [26]. The capability of WGS to rapidly sequence pathogens and share data could permit faster responses to infectious outbreaks and it is expected to be the new gold standard for infectious disease diagnostics and epidemiology

[26]. If current bioinformatic analysis bottlenecks, speed of analysis, and ethical issues are resolved, it is highly likely that routine sequencing for precision medicine will be included in the clinical microbiological tests around the world very soon [26]. This will also be important in the identification of clonal diversity of ESKAPE pathogens, causing outbreaks.

A B

FIGURE 1.3: (A) In 2019, the National Human Genome Research Institute (NHGRI) in the United States produced an analysis of cost per raw megabase of DNA sequence over time. The y-axis is a logarithmic scale and the x-axis shows years of data collection. The out-pacing of Moore's Law from January 2008 is evident which was the time when sequencing centres transitioned from Sanger-based to next-generation DNA sequencing technologies. Moore's Law describes a trend in the computer hardware to double processing speed every two years which is a comparator which is widely-regarded as an efficient improvement in technology and the figure shows the dramatic reduction in sequencing costs making clinical applications far more feasible. (B) the picture shows the direct and straightforward workflow of NGS from samples [29, 30].

K. pneumoniae is one of the critical drug-resistant organisms that has been classified as an urgent threat to human health by the WHO, the CDC, and the UK Department of Health

[31]. K. pneumoniae, one of the ESKAPE pathogens, and a major opportunistic pathogen in Saudi Arabian healthcare facilities causes fatal infections [13]. Therefore, it is essential

19 to survey the prevalence and the emergence of Klebsiella-resistance profiles, especially to carbapenemases and colistin, in Saudi Arabia. In this study, we will focus on K. pneumoniae AMR and clonal diversity as it has been among the highest reported pathogens related to nosocomial infections in our hospital and in the region [32].

1.4.1 AMR Surveillance Using WGS Effective infection control and AMR surveillance can directly benefit from a comprehensive information resource on the emergence, spread, and persistence of local clones of AMR compared to data from internationally circulating clones. On the other hand, WGS could be applied to the patient sample directly to study the metagenome and identify the causative agents for suspected infections. Metagenomics can be defined as the sequencing of all possible organisms from all domains of life (archaea, bacteria, and eukaryote) and could also identify quasi-life such as DNA and RNA viruses using metatranscriptomics [33]. Data resulting from single-isolate sequencing surveillance or metagenomics explains the clonal diversity of an emerging AMR pathogen through studying multi-locus sequencing type (MLST) and phylogenetic analysis. Moreover, the sequenced data can identify resistance and virulence factors, as well as novel mutations in

AMR genes in strains such as the high-risk clones of K. pneumoniae.

1.4.1.1 Klebsiella pneumoniae

Klebsiella pneumoniae, from the Enterobacteriales order, is part of the Enterobacteriaceae family comprises 51 genera and 238 species, including Escherichia and Shigella genera, amongst many [34]. The bacteria was first identified from examining airways of patients by Edwin Klebs in 1875, who died from pneumonia, and later documented by Carl

Friedlander in 1882 [35]. Klebsiella spp. are among the most common opportunistic HAI, accounting for about one-third of overall Gram-negative infections globally [36]. It is

20 involved in extra-intestinal infections, including cystitis, pneumonia, urinary tract infections (UTI), surgical wound infections, and life-threatening septicemia [37]. AMR to various β-lactam antibiotics was first found in K. pneumoniae due to the production of extended-spectrum β-lactamases (ESBLs) [12]. ESBL-producing K. pneumoniae have additionally been reported to be resistant to various other antibiotics, including quinolones and other carbapenems [6, 9]. Since then, K. pneumoniae has been a major public health concern, and a significant threat linked to many outbreaks [38]. Some of the AMR genes linked to K. pneumoniae are major β-lactamases mainly: KPC, NDM, oxacillinase types

(OXA), and their variants [34]. K. pneumoniae has been discovered to undergo large-scale chromosomal recombination and exchange of plasmids with other bacteria to adapt a range of virulence factors and AMR genes [39].

Plasmids are the gateway to exchanging a broad array of AMR, virulence, and toxin genes, which can be directly linked to the clinical presentation of infections [38]. The size and copy number of plasmids can sometimes dictate their capabilities of carrying specific genes

[40]. With the rapid increase of long-read and shotgun WGS, many plasmid classification systems have been proposed, including Inc group, relaxase groups, and plasmid MLST

(pMLST) [41]. Plasmids classification based on their stability during transmission is evaluated by exploring their incompatibility categories. Incompatibility grouping represents “the inability of two plasmids to coexist stably over a number of generations in the same bacterial cell line” such that plasmids incompatible to each other are assigned the same group. The most used method covering both virulence and AMR plasmids, in this study, we will be referring to plasmid in our K. pneumoniae isolates by Inc groups classification as characterized by the National Collection of Type Cultures (London,

21

United Kingdom). The Inc group plasmids have been known to carry multiple AMR genes such as the β-lactamases KPC gene (blaKPC) that were found mainly in IncN, IncFII and the mobile colistin resistance gene (mcr-1) were found mainly in IncX4, and IncI2 [42, 43].

The Inc groups discovered in K. pneumoniae and was associated with carrying virulence and AMR genes are IncA/C, B/O, L/M, HI1, HI2, I, N, F, X and P [44]. However, plasmids frequently encompass a fluctuating number of transposons and different insertion sequences that could introduce variations in the Inc of the plasmid or introduce different

AMR genes in different Inc group plasmids, eventually making it difficult to assign correct classification [43]. These variations directly affect the detection of specific plasmids using polymerase chain reaction (PCR)-based approaches in epidemiological studies [43].

Generally, mobile genetic elements are important to identify, and plasmid-mediated horizontal transfer of the AMR genes makes it difficult to control the spread of hospital outbreaks of MDR K. pneumoniae [45]. Thus, WGS offers the benefits of tracking both chromosomal and plasmid-mediated AMR through whole-genome MLST (wgMLST)

[38]. WGS analysis of drug-resistant isolates can interpret the transmission mechanisms associated with resistance by correlating the information with the clinical data.

1.4.1.2 Clonal Diversity Using WGS

Clonal diversity of infective pathogenic strains of K. pneumoniae has been identified through MLST, core genome MLST (cgMLST) and wgMLST to be given a specific sequence type (ST) then placed in clonal complex (CC) lineages and clonal groups (CG) according to their genetic relatedness [38]. The analysis of the chromosomal genome of an isolate from a clonal subpopulation can correctly identify the cgMLST. The definition of

CC refers to CG of two or more independent STs, assigned by MLST, of the same organism

22 sharing identical alleles at six or more loci [46]. Fundamentally, MLST is assigned according to 500 bp sequences extracted from the seven house-keeping genes, gapA, infB, mdh, pgi, phoE, rpoB, and tonB [47]. Each allele (unique sequence variant) is allocated a different number, and every seven allelic combination is allocated a different ST. Thus, a change in a single nucleotide in a housekeeping gene affects the designation of a novel allele for that gene that consecutively results in a new mixture of alleles and, subsequently, a new ST. The cgMLST uses a similar allelic mixture method except with a considerably more extensive set of genes. The pattern of cgMLST assignment depends on the complexity of the dataset used, and the existing scheme uses about 1,143 core genes and hence higher accuracy than MLST [48].

The majority of the “high-risk” STs have been the result of combined global surveillance studies, which identified many examples such as K. pneumoniae CC258 as a primary AMR carrier [38]. However, current local surveillance reports in Saudi Arabia and the GCC region does not reflect a similar pattern of high-risk STs to the ones reported from AMR strains in the USA and Europe and yet such clones and AMR can spread globally (Figure

1.4) [13]. In China, ST11 is a common AMR carrying high-risk clone that was found spreading into several European countries [47]. The most dominant CC in the MENA region for K. pneumoniae is yet to be defined and it may require multiple extensive cohort studies to represent the emerging STs and link them genetically.

23

Global Clonal Diversity

3161 3033 2714 2542 2287 2077 1962 1824 1741 1626 1486 1440 1322 1180 1065 985 914 873 834 774 656 580

525 Sequence Sequence Type 493 471 433 409 384 342 321 283 258 219 187 133 101 48 36 26 15 1 1 10 100 1000 10000 Log10 of K. pneumoniae isolate numbers

FIGURE 1.4: Global MLST sequences overview of K. pneumoniae downloaded from NCBI and filtered for 50x depth sequenced on the Illumina platform (Illumina, USA). The highest reported sequence of the downloaded set was ST258, followed by ST307, ST512, and ST15. The sequences were analyzed using Kleborate [49].

24

1.4.1.3 PHYLOGENOMIC AND PHYL ODYNAMIC ANALYSIS

Phylogenetic analysis is a crucial tool to study the past evolutionary history of a pathogen based on the present genomic sequencing data [50]. The WGS data of a pathogen provides insights into a pathogen’s origin, evolution, and transmission routes leading to the discovery of the index cases of outbreaks before before epidemics/pandemics arise

[51]. Through studying the probabilities of genetic mutations of a pathogen it is possible to understand changes in the population dynamics of outbreaks [51]. A so-called

‘probabilistic modeling’ approach for outbreak detection, which is not a novel method, has resurfaced in the past couple of years due to the use of the Bayesian inference model. This model integrates multiple sources of data from a suspected outbreak, including pathogen sequencing data, length of patients stay, and the specific time of pathogen isolation [52].

Multiple infectious disease studies are applying the Bayesian framework on bacterial [53], and viral [54] outbreaks to determine super-spreaders and the index case. However, it requires WGS data, which currently is scarce on strains circulating in Saudi Arabia.

The combination of phylogenetic evolutionary parameters, epidemiology, and target pathogen infection dynamics are referred to as phylodynamic in our study following the term set by Grenfell et al. [55]. Phylodynamics will allow us to discover the geographical path of transmission, the potential population size as well as pathogen genealogy, including epidemic caused by AMR and virulence factors. For example, K. pneumoniae was divided originally into three phylogroups (KpI, KpII, and KpIII), since all three phylogroups were pathogenic to humans. However, using WGS and genomics more in-depth analysis of several K. pneumoniae clones suggested that each phylogroup is a distinct species.

Therefore, currently, K. pneumoniae (KpI), K. quasipneumoniae (KpII) (17), and K.

25 variicola (KpIII) are distinct species [38]. Phylogenetic data resulting from WGS have been aiding in tracking the evolution of outbreak causing CREs and indicate high-risk clones associated with AMR and virulence [38].

1.4.1.4 AST and Virulence Detection by WGS

The use of antibiotics indiscriminately by clinicians with insufficient information on the pathogen AMR profiles can lead to the selection of microbes with new AMR genes.

Treatment with antibiotics has to be governed by the antimicrobial stewardship, which relys on the updated surveillance information of all the current local AMR profiles of pathogens [4]. Although the guidelines of antibiotic management are rigorously updated, empirical treatment with an antibiotic in Saudi Arabia appears to be surpassing the international baseline [56]. Comprehensive clinical data could control excessive usage and provide sufficient information on the AMR profile of infected cases in a complete microbiological report [26]. Rapid reporting is an essential part of the clinical microbiology laboratory service. The conventional methods of diagnosing bacterial infections and identifying the AMR profiles currently used are hindered by the limited ability of bacterial growth in cultures followed by a restricted range of antimicrobial susceptibility-testing

(AST) [4]. The current AST methods, include various methods such as disc diffusion method (DDM) or Epsilometer test (E-test), and Matrix-Assisted Laser Desorption

Ionization Time-of-Flight (MALDI-TOF) Mass Spectrometry i.e., VITEK 2 (bioMérieux,

USA). When conventional methods are compared to WGS, all three methods among other methods are indeed cost-effective and robust for clinical work but suffer from issues with inaccuracy, broad-scope identification and could not be employed for all AST [26]. For example, guidelines now restrict the use of disk diffusion or E-test to measure colistin

26 phenotype but approve broth microdilution (BMD) [57]. To treat CRE pathogens, usually colistin drug is regarded as the last-resort antibiotic available. However, colistin is a complex mixture ~30 compunds of which two are the principal components, colistin A and colistin B with molecular weights of 1169 and 1155 g/mol, respectively [57]. The rapid increase in the prevalence of CREs prompted the reconsideration of colistin as a valid therapeutic option for critically ill patients regardless of their serious side-effects, i.e., nephrotoxicity and neurotoxicity issues [12]. Colistin resistance in K. pneumoniae has been reported from numerous regions (Table.2), including Europe, North America, South

America, Asia, and South Africa [12]. Reports of MDR and XDR from the Middle East and North Africa are rare and likely underreported [12]. In the GCC region, blaNDM variants are dominant with sporadic cases of blaKPC [58]. Unfortunately, colistin resistance phenotypes have not been regularly defined and the information on the emerging colistin resistance genes in the regions are lacking.

The disk diffusion method and the gradient diffusion were found not performing to the expected standards in measuring hard to diffuse molecules such as colistin [59]. However, despite its validity for colistin testing, BMD is not a favorable method in the clinical laboratory because it is a time-consuming and tedious protocol [60]. The clinical and laboratory standards institute (CLSI) and the European committee on antimicrobial susceptibility testing (EUCAST) method recommendations for colistin testing stipulate that endpoints must be read by eye [59]. The CLSI has approved clinical breakpoints for colistin against the Enterobacteriaceae in 2019 and fixed intermediate and resistant categories. However, the colistin susceptible category has not been set due to a lack of clinical evidence, as mentioned in the CLSI meeting minutes

27

(https://clsi.org/meetings/ast/ast-meeting-files-resources/). Since profiling of last-resort antibiotics, such as carbapenemases and colistin, resistance is critical in choosing the correct treatment [61]. Recently, CLSI-EUCAST has joined forces and issued warnings to highlight the limitation of DDM, E-test, and other commercially available products and emphasize that BMD remains the only valid method for measuring colistin [59, 62]. BMD is not readily available in all clinical laboratories in the MENA region. Therefore, there is a need for an unconventional routine method that utilize readily available components at low complexity and low cost. Presently, we may be able to rely on the accuracy of WGS in identifying the genotype of colistin resistance until an approved and accurate antimicrobial susceptibility testing method is available.

# Gene Name Function 1 mgrb Represses PhoP/PhoQ signaling 2 phoP transcriptional regulatory protein 3 phoQ Sensor protein 4 pmrA/BasR Regulate transcription – part of the phosphorelay signal transduction system 5 pmrB/BasS sensor protein 6 pmrC Hydrolase (Phosphoethanolamine transferase EptA) (PmrC protein) 7 crrABC Glucose-specific phosphotransferase enzyme IIA component 8 mcr mcr family phosphoethanolamine--lipid A transferase

Table 1.2: List of known genes and their functions associated with colistin-resistant in some gram- negative bacteria. Most of the genes (1-7) have been known to be chromosomal and only known mobile gene is the mobile colistin resistance mcr [10, 63-65].

Current practice in clinical microbiology does not involve testing for virulent factors, except for the rarely reported string test, where mucoid bacterial colonies form a viscus string that stretches more than 5 mm in length are considered as hypermucoidy [66].

Virulence factors in bacteria increase pathogen survivability, adhesion and eventually transmission to multiple hosts [67]. Hypervirulent (hypermucoviscous) K. pneumoniae

28

(hvKP) clones containing AMR genes are considered superbugs [1, 68]. They are an emerging variant of Klebsiella that could cause pyogenic liver abscess and metastatic spread causing high mortality infections even in individuals with no comorbidities [69].

Only a handful of clones of hvKP have been characterised. These clones contain serum- resistant capsules, such as a K1 capsule in the ST23 clone, and a K2 capsule in ST86 clone, which were usually found to carry other virulence genes on Inc plasmids [70]. WGS enabled defining hvKP-specific clones and their virulence factors. Identifying these factors such as aerobactin-deficient (ΔiucA), Yersinia high-pathogenicity island (encodes yersiniabactin) and iucABCD-iutA, and rmpA1/rmpA2 could result in a significant impact on saving patients’ lives [71]. There are many conserved virulence factors in K. pneumoniae, such as lipopolysaccharide and mrk operon encoding type 3 fimbriae [72].

Both biofilm and fimbriae types have been associated with colonization and enhanced virulence and infection [73]. The presence of rmpA1/rmpA2 was found to affect the conserved factors' expression, resulting in the hypermucoviscous capsule phenotype [74].

Classical serological methods have been used to detect the 130 capsular polysaccharide or

K antigen as well as the 10 types of O antigens, yet WGS provided higher accuracy [75].

These are repeating polysaccharides part of the lipopolysaccharide if overexpressed by rmpA1/rmpA2 could affect the immune system stopping the complement-mediated killing of hvKP and resulting in an endotoxic shock [76]. Another group of virulence factors are , which can be acquired through mobile elements carrying genes of aerobactin, yersiniabactin, salmochelin, and rmpA1/rmpA2 in hvKP [77]. The worst possible clone is a hvKP clone that has multiple AMR genes of last-resort antibiotics. Not only will it be hard to treat and will undoubtedly be fatal, it also can spread these genes through horizontal

29 gene transfer to other pathogens in other patients and survive in the environment longer.

Discovery of these genes and the strains carrying them through using WGS metagenomic analysis of patient samples and environmental surveillance is a necessity equal to defining

AMR genes in order to improve patient care and save lives.

1.4.2 Metagenomics Shotgun metagenomics data allows for a holistic overview for the host, pathogens and commensal microorganisms that dictate a specific microbiome [26]. Metagenomic sequencing has unlimited potential in combining diagnosis, discovery, understanding the spread of disease. In recent years, reports characterizing AMR directly from clinical samples using WGS metagenome has been increasing [78]. In other sectors, including food safety, public health, environmental pathogen monitoring, sewage quality and treatment, as well as soil, metagenomic studies have been employed to report microbial biodiversity and identify AMR carrying pathogens [79-81]. Metagenomics also play a crucial role in the perception and progression of outbreaks-causing pathogenic clones to aid in preventing epidemics (Figure 1.5) [82]. Gilmour et al., illustrated an example of how high-throughput metagenomic data analysis detect and subtype clinical strains and link them to an outbreak of pathogens in meat products [83]. Integration of the epidemiological data and local WGS clinical clones could provide an up-to-date understanding of current infectious diseases spreading in the region and the spread of AMR and virulence genes. This integration is directly linked to the growth of epidemiological information that directs the explanation of possible transmission and the optimal type of antibiotic regimens.

30

FIGURE 1.5: K. pneumoniae clones movement between different niches within the environment, and hosts exchanging AMR genes. [84].

1.4.2.1 Clinical Metagenomics

Current practice includes an in vitro culture, which requires labor/technical skills, and exhibits high specificity but low sensitivity [85]. Routine clinical microbiology needs extraction of pathogen DNA and running a targeted PCR based on a presumptive identification. Even though there are recent advances in detection, include multiplex PCR assays that may allow for the detection of multiple pathogens (Figure 1.6) [86] Yet, identification is still limited to an error-prone list of organisms that produce false negatives if the target sequence is mutated [24]. Culture-based diagnostics is still the gold standard; however, delay in diagnosing an infection or, even worse, reporting false positives or false negatives have had an overwhelming effect on patients [87]. Blood cultures' accuracy has been limited by contamination, which is expected to be less than 3% and it can take up to seven days of sampling [88]. However, current reports have ranged between 0.6%-12.5%

[89]. Algorithms generated Greub and co-workers (Figure 1.6) describe a comparison between conventional microbiological samples and clinical genomics. This contrast is especially important when facing the diagnostics of slow-growing pathogens [90].

31

Previously, clinical metagenomics was limited to 16S and 18S rRNA amplicon analysis to identify bacteria and fungi, respectively [85]. Targeted-amplicon analyses are suitable to study and catalog variations in the bacterial alpha and beta diversity [91]. The analysis of the entire microbiome, as well as the human host genome and transcriptome in patient samples gives a comprehensive overview that will guide better diagnosis.

The road towards casting a wide discovery net by using WGS metagenomics has some challenges and great rewards. WGS and metagenomic analysis directly from the patient’s sample have been helping diagnostics by producing essential information for formulating a clinical metagenomic-dependent protocol and identifying disease causes within hours.

We observed locally in The Ministry of National Guard Health Affairs (MNGHA) hospitals that limitations in clinical microbiology methods could present discovery of false- positive pathogens in multiple patients causing perception about a pseudo-outbreak.

Depending on the geographical location and clinical services available, ranges of intensive care unit (ICU) patients and post-surgery suffer from sepsis or septic shocks due to the unknown cause of their infection [92, 93]. Rapid and accurate discovery of the infectious agents is a need that has not been globally met yet and many protocols are still under evaluation. Routine metagenomics-guided analytics is not yet available in many hospitals, especially in Saudi Arabia, including many tertiary hospitals such as King Khalid Hospital,

MNGHA, Jeddah. There is direct financial support from the Saudi government to move forward in the field of precision medicine [94].

The platform for metagenomic sampling and analysis differs as well as the pipelines used to extract useful information. Subsequently, for clinical metagenomics to be approved by the Federal Drug Administration (FDA) it has to show high specificity and sensitivity, a

32 unified protocol, low cost, and less detection time from sample to diagnosis. Moreover, metagenomics is increasingly being exploited in diagnostic microbial applications, and in making synthetic drugs, vaccine plans as well as monitoring airborne pathogens in the clinical environment [95].

FIGURE 1.6: Comparison between traditional methods of microbial detection which can require propagation of microorganisms which is time consuming (a) and clinical genomics (b) The time factor illustrates how clinical genomics can cast a wider net reaching a faster diagnostics, regardless of the sample origin [90].

1.4.3 Bioinformatics Applications in Clinical Metagenomics WGS protocols and bioinformatics applications for single organisms and metagenomics tools are continuously changing as new and specialized tools are regularly emerging.

Therefore, it is tough to standardize analyses across all laboratories and samples locally and globally [96]. The output reads of WGS machines such as Illumina platforms (i.e.,

MiSeq/HiSeq) (Illumina, USA) produces a large quantity of short-read (shotgun) sequencing data (fastq file) that requires sophisticated computational pipelines for single

33 organism or metagenome analysis. Another recent alternative long-read sequencing platform that has been utilized for the metagenomic diagnostic is the MinION from Oxford

Nanopore Technologies (Oxford, UK). This instrument is considered faster and more cost- effective, yet it is still considered an under development platform due to its higher base- calling error rate when compared to Illumina shotgun sequencing. Further, an accurate analysis requires high coverage and depth as well as a defined bioinformatics pipeline that is optimized and validated for clinical samples. WGS coverage can be described as the average number of identified copies for a genome fragment in sequencing is presented in the data to generate read numbers with a specific size in an ideal genome [97]. Typically, for single organism analysis of the WGS, the number of reads calculation requires an adequate coverage, however, for metagenomics the coverage and abundance is calculated using gamma approximation method commonly known as the concept of bins [98].

The leading issue preventing the use of WGS metagenomic applications in outbreak investigations is the relative inability to obtain sufficient coverages of pathogen genomes and without chemically depleting host genome. Currently, sequencing coverage can be determined using multiple methods, including rarefaction curves, which is the number of reads generated versus the sequencing coverage, and plotting the annotated operational taxonomic units versus coverage [98]. The outcome of the rarefaction curve can indicate the likelihood of detection of new species in the sample [98]. A higher species discovery rate will reflect in an increased ability to define pathogens. For example, an accepted sequence read depth required for identifying pathogens from blood, to be diagnostically useful, would require large amounts of data [99]. The ratio in some clinical samples was estimated to be 1010:1 human: pathogen genome per each ml of blood in infected hosts

34

[99]. In some cases, alternative methods such as enrichment of pathogens’ DNA, may be useful to reduce the cost of sequencing, however, it may introduce bias to the analysis [99].

Other trials underwent human genome depletion from substantial amounts of chosen samples to increase the depth of microbiota in the samples [87]. For example, viral pathogens can sometimes integrate into host DNA and depletion of the host DNA, which sometimes accounts for more than 98% of the reads, which may result in misidentification.

Even if the host genome was not depleted, most current pipelines could identify and virtually remove host (i.e., human) reads; however, this would be a high-cost sequencing protocol. However, in our study, applying available enrichment and host depletion protocols was not applicable since the samples we acquired in many projects yielded low genome quantities, as low as 100 pg per sample in some cases. In low quantity samples, there is a need for PCR amplification step, which may introduce bias to the results.

A proposed pipeline for the clinical metagenomic sample contains many steps, such as trimming the data by removing both the adapter sequence and low quantity and complexity reads but differ in the ways of identification, which is heavily dependent on the availability of curated databases [100]. The most favored pipelines for diagnostics of pathogens apply either reference-guided assembly of the data combined with target mapping or short signature sequences for a single or group of microbes in hierarchically clustered sequence datasets [26]. The assembly method is much slower than the signature- based search but significantly more accurate [101]. Both methods are limited by the coverage of the desired sequence to establish a valid identification [26]. Therefore, to combine clinical prognosis with limited genome coverage using a fast pipeline, the concept of clinicogenomics was proposed (Figure 1.7a) [102]. In order to apply clinicogenomics in

35 precision medicine, Chiu et al. proposed a sequencing council review board to approve all sequencing results [26]. The council might suggest a specific extraction protocol or suspect a type of pathogen even if it presented at low coverage. With the help of the sequencing council, the clinicogenomics concept could provide a fast, although less accurate, and acceptable diagnosis of unknown infections (Figure 1.7b).

FIGURE 1.7: The clinicogenomics concept provides a distinct difference between a true metagenomics research study and a modified clinically-explained fast metagenomics analysis approach which is applied for cases with fast diagnostics of suspected infections [103]. operational taxonomic unit (OTU)

The alignment of multiple species to identify a homologous DNA sequences with high confidence depends on the level of curation of the chosen database used and the genome coverage. The basic local alignment search tool (BLAST) to the National Center for

Biotechnology Information (NCBI) is the most commonly used uncurated database. For

AMR, the comprehensive antibiotic resistance database (CARD) among many other tools is considered as one of the major AMR databases used [104]. Nowadays, many online web- services provide a simple uploading service, which can be used by any clinicians without the need for a bioinformatics expertise, i.e., MG-RAST for metagenomics [105],

36

Taxonomer [106] web-tools. There are multiple examples of WGS and metagenomic analyses that accept or contain a curated database for clinical samples, such as MetaPhlan2

[107], ClinVar [108], and CosmosID [109] for online interactive analysis. There are many examples of current bioinformatics tools explicitly made for diagnostics of pathogen identification tasks such as SURPI [110] and others for efficient basic data management, including Galaxy [111] and Geneious [112].

In our K. pneumoniae project, we used SRST2 and Kleborate from Holt’s laboratory [49].

The tools provided an in-depth analysis of multiple genetic factors of Klebsiella pathogenicity, virulence, cgMLST, and K/O antigen types [38, 113]. There are also specialized pipelines for the targeted disease. For example, a recent manuscript by Sohn et al. delivered a simple equation termed it the sepsis indicating quantifier (SIQ) score that allows for unambiguous identification of different types of bacteria that match blood culture count [85]. Implementing the SIQ score among other tools in a WGS diagnostic tool could provide an enhanced outcome from analysing clinically relevant WGS data.

Combining all the available protocols and analysis illustrates that predictions of a specific pathogens presence and also antimicrobial resistance genes are possible and beneficial using shotgun WGS and metagenomics of patient samples. There are multiple platforms to choose from, but nothing yet is considered as a one-size-fits-all. Thus, the expectations for the WGS platforms and bioinformatics analysis development are highly anticipated to surpass all the technical and ethical hurdles.

1.4.4 Other Issues of Using WGS in Clinical Metagenomics for Diagnosis Implementation of WGS in clinical diagnostics is a challenge that involves customization of multiple quality management, ethical approvals of metagenomic samples that include host genomic material, and development of monitoring procedures [114]. The WGS

37 protocols are continuously changing and new methods for DNA extraction, library preparation, novel sequencing platform, and computational pipelines are being developed

[114]. Typically clinically-relevant protocols have to be validated from multiple independent laboratories and hierarchically clustered sequence datasets before applying it in real life clinical diagnostics [26]. Presently, the guidelines for using of sequencing for clinical diagnosis in infectious diseases is undergoing constant validation and development despite the ambiguous rules that have been proposed in May 2016, by the USA FDA who published the Infectious Disease Next Generation Sequencing Based Diagnostic Devices:

Microbial Identification and Detection of Antimicrobial Resistance and Virulence Markers guidelines [114].

Moreover, infection prevention and control (IPC) guidelines do not yet contain sequencing as a valid method and may require multiple validation points to be included [115]. Locally, in many parts of Saudi Arabia, a clinical information system has not been implemented yet, which in turn adds another hurdle to WGS-based phylodynamic studies and may need to be urgently addressed.

Herein, epidemiological and outbreak studies, such as our study, require detailed patients’ data that may not be available, which renders the sequencing data unusable for a comprehensive conclusion. In our study, in particular, we faced multiple challenges, including educating the infectious disease clinicians on benefits of WGS as a diagnostic tool, formalizing a new ethical process and documents, proposing a method of transferring the clinical samples, failure-mode of unsuccessful sequence reads [116] and effects analysis of each protocol, and independent confirmation of unforeseen findings. Sample collection and ethical approval for our project extended the project by an extended period

38 due to its novelty in the region. Our application was the first of its kind for both Institutional review board (IRB) at the MNGHA hospital and the Institutional Biosafety and Bioethics

Committee (IBEC) in King Abdullah University of Science and Technology (KAUST) requesting approval of both WGS projects and clinical metagenomic data analysis. In our outbreak study, we met direct resistance to obtaining any rectal swaps from hospital staff due to definite refusal from the IPC committee as this would need staff education and establishing de-colonisation protocols in the hospital. It has been quite exciting and rewarding to apply novel methods such as WGS in Saudi Arabia. Moreover, the discoveries resulted from our study were of value to the clinical laboratory and the community of Saudi

Arabia and the world.

1.5 Research Aims and Objectives

Most clinical staff testimonies in relation to epidemics believe that and outbreaks start at hospitals [8, 37, 38, 117]. Investigating the effectiveness of using WGS and metagenomics applications on clinical isolates and the clinical environment can be employed to characterize pathogens, AMR, and virulence. The reservoirs of infection are often challenging to detect, and the level of perception and microbiological methods of identification are not yet advanced enough to suggest probable transmission events.

Thus, the introduction of WGS protocols could have great potential in unveiling the biodiversity of pathogens, and aid in diagnostics leading to outbreak prevention. In order to improve the service and advance pathogen detection, our study employed a combination of WGS, metagenomics, and epidemiological data collection during a four-year collection of clinical isolates of K. pneumoniae, and unknown clinical cases, which resulted in the following projects:

39

1. A clinical and molecular surveillance study of an outbreak of hypervirulent and multidrug-resistant clone (ST2096) of Klebsiella pneumoniae silently transmitting in a Saudi Arabia western region hospital (Chapter 2). 2. Identification and report of the first case of misidentified Klebsiella

quasipneumoniae harboring blaKPC-2 in Saudi Arabia (Chapter 3). 3. Defined the first case of a co-occurrence of mcr-1 and mcr-8 inside a Klebsiella pneumoniae isolated from an oncology patient that was not exposed to colistin (Chapter 4). 4. Applied metagenomic sequencing on a biopsy from a Chron’s disease patient to reveal three co-occurring Nontuberculous Mycobacteria (NTM) species that were not identified through conventional methods. (Chapter 5).

These projects are presented in each chapter in the form of a set of 4 manuscripts – two of which have already been published and another two are about to be submitted for publication. The outcomes of this study have directly reflected on the practice and policy at the MNGHA hospitals, identified novel AMR variants in Saudi Arabia, and will be an example to build on for applying WGS data in clinical practice in Saudi Arabia and the

GCC.

40

BIBLIOGRAPHY

1. Ahmad, M. and A.U. Khan, Global economic impact of antibiotic resistance: a review. Journal of global antimicrobial resistance, 2019. 2. Olaitan, A.O., et al., Worldwide emergence of colistin resistance in Klebsiella pneumoniae from healthy humans and patients in Lao PDR, Thailand, Israel, Nigeria and France owing to inactivation of the PhoP/PhoQ regulator mgrB: an epidemiological and molecular study. International journal of antimicrobial agents, 2014. 44(6): p. 500-507. 3. Kostyanev, T., M.J. Bonten, and H. Goossens, Since the discovery of the first antibiotic molecules and their use in clinical practice for the treatment of infections in the 1940s, antibiotic resistance has evolved and has been growing to reach its current endemic proportions. Antimicrobial resistance is currently recognised as a public health threat on a global scale, causing, according to some, at least 700000 deaths every year, which is predicted to increase to 10 million deaths per year in 2050 if no measures to contain antimicrobial resistance are taken [1]. A Review on Antimicrobial Resistance publication suggested 10 crucial recommendations in their final report to tackle antimicrobial resistance, including an emphasis on the importance of increased awareness, infection prevention, better use of existing treatments, and reducing. Anti-infectives and the Lung: ERS Monograph, 2017. 75: p. 289. 4. Jagadevi, B.S., et al., Clinical Importance of Emerging ESKAPE Pathogens and Antimicrobial Susceptibility Profile from a Tertiary Care Centre. Int. J. Curr. Microbiol. App. Sci, 2018. 7(5): p. 2881-2891. 5. Kontopoulou, K., et al., Hospital outbreak caused by Klebsiella pneumoniae producing KPC-2 β- lactamase resistant to colistin. Journal of Hospital Infection, 2010. 76(1): p. 70-73. 6. Liang, Z., et al., Molecular basis of NDM-1, a new antibiotic resistance determinant. PLoS One, 2011. 6(8): p. e23606. 7. World Health Organization, Worldwide country situation analysis: response to antimicrobial resistance: summary. 2015, World Health Organization. 8. Balkhy, H.H., et al., The strategic plan for combating antimicrobial resistance in Gulf Cooperation Council States. 2016, Elsevier. 9. Magiorakos, A.P., et al., Multidrug‐resistant, extensively drug‐resistant and pandrug‐resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance. Clinical microbiology and infection, 2012. 18(3): p. 268-281. 10. Grégoire, N., et al., Clinical Pharmacokinetics and Pharmacodynamics of Colistin. Clinical Pharmacokinetics, 2017: p. 1-20. 11. Reardon, S., Resistance to last ditch antibiotic has spread further than anticipated. Nature, 2017. 10. 12. Ah, Y.-M., A.-J. Kim, and J.-Y. Lee, Colistin resistance in Klebsiella pneumoniae. International journal of antimicrobial agents, 2014. 44(1): p. 8-15. 13. uz Zaman, T., et al., Clonal diversity and genetic profiling of antibiotic resistance among multidrug/carbapenem-resistant Klebsiella pneumoniae isolates from a tertiary care hospital in Saudi Arabia. BMC infectious diseases, 2018. 18(1): p. 205. 14. Peyclit, L., S.A. Baron, and J.M. Rolain, Drug Repurposing to Fight Colistin and Carbapenem- Resistant Bacteria. Front Cell Infect Microbiol, 2019. 9: p. 193. 15. Tornimbene, B., et al., WHO global antimicrobial resistance surveillance system early implementation 2016–17. The Lancet Infectious Diseases, 2018. 18(3): p. 241-242. 16. Pittet, D., et al., The World Health Organization guidelines on hand hygiene in health care and their consensus recommendations. Infection Control & Hospital Epidemiology, 2009. 30(7): p. 611-622. 17. Rantsiou, K., et al., Next generation microbiological risk assessment: opportunities of whole genome sequencing (WGS) for foodborne pathogen surveillance, source tracking and risk assessment. International journal of food microbiology, 2017. 18. Organization, W.H., Coronavirus disease 2019 (COVID-19): situation report, 57. 2020. 19. Statistics, T.G.A.f., Umrah Statistics Bulletin. 2018. 20. Liljander, A., et al., MERS-CoV Antibodies in Humans, Africa, 2013-2014. Emerg Infect Dis, 2016. 22(6).

41

21. Kuehn, B.M., WHO: More than 7 million air pollution deaths each year. JAMA, 2014. 311(15): p. 1486. 22. Al-Tawfiq, J.A., K. Hinedi, and Z.A. Memish, Systematic review of the prevalence of Mycobacterium tuberculosis resistance in Saudi Arabia. J Chemother, 2015. 27(6): p. 378-82. 23. Leangapichart, T., et al., Emergence of drug resistant bacteria at the Hajj: A systematic review. Travel Medicine and Infectious Disease, 2017. 18: p. 3-17. 24. Gargis, A.S., L. Kalman, and I.M. Lubin, Assuring the Quality of Next-Generation Sequencing in Clinical Microbiology and Public Health Laboratories. Journal of Clinical Microbiology, 2016: p. JCM. 00949-16. 25. Almonacid, D.E., et al., Correction: 16S rRNA gene sequencing and healthy reference ranges for 28 clinically relevant microbial taxa from the human gut microbiome. PloS one, 2019. 14(2): p. e0212474. 26. Chiu, C.Y. and S.A. Miller, Clinical metagenomics. Nat Rev Genet, 2019. 20: p. 341-355. 27. Wang, S.-S., et al., Diversity of culture-independent bacteria and antimicrobial activity of culturable endophytic bacteria isolated from different Dendrobium stems. Scientific reports, 2019. 9(1): p. 1-12. 28. Wade, W., Unculturable bacteria—the uncharacterized organisms that cause oral infections. Journal of the Royal Society of Medicine, 2002. 95(2): p. 81-83. 29. Almonacid, D.E., et al., 16S rRNA gene sequencing and healthy reference ranges for 28 clinically relevant microbial taxa from the human gut microbiome. PloS one, 2017. 12(5): p. e0176555. 30. Wetterstrand, K., DNA sequencing costs: Data. National Human Genome Research Institute (NHGRI), 2016. 31. Kidd, T.J., et al., A Klebsiella pneumoniae antibiotic resistance mechanism that subdues host defences and promotes virulence. EMBO molecular medicine, 2017. 9(4): p. 430-447. 32. Jabbour, J.-F. and S.S. Kanj, Emergence of Antimicrobial Resistance in the Arab Countries. APUA/ISAC Merger Announcement, 2019: p. 4. 33. Hugenholtz, P. and G.W. Tyson, Microbiology: metagenomics. Nature, 2008. 455(7212): p. 481. 34. Octavia, S. and R. Lan, The family enterobacteriaceae. The Prokaryotes: Gammaproteobacteria, 2014: p. 225-286. 35. Friedländer, C., Ueber die Schizomyceten bei der acuten fibrösen Pneumonie. Virchows Archiv, 1882. 87(2): p. 319-324. 36. Navon-Venezia, S., K. Kondratyeva, and A. Carattoli, Klebsiella pneumoniae: a major worldwide source and shuttle for antibiotic resistance. FEMS microbiology reviews, 2017. 41(3): p. 252-275. 37. Tan, H.-L., et al., Genome sequence of an extensively drug-resistant strain of Klebsiella pneumoniae, strain YN-1, with carbapenem resistance. Genome announcements, 2015. 3(1): p. e01279-14. 38. Holt, K.E., et al., Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proceedings of the National Academy of Sciences, 2015. 112(27): p. E3574-E3581. 39. DeLeo, F.R., et al., Molecular dissection of the evolution of carbapenem-resistant multilocus sequence type 258 Klebsiella pneumoniae. Proceedings of the National Academy of Sciences, 2014. 111(13): p. 4988-4993. 40. Llosa, M., et al., Bacterial conjugation: a two‐step mechanism for DNA transport. Molecular microbiology, 2002. 45(1): p. 1-8. 41. Carattoli, A., et al., PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrobial Agents and Chemotherapy, 2014: p. AAC. 02412-14. 42. Zhang, H., et al., Expression characteristics of the plasmid-borne mcr-1 colistin resistance gene. Oncotarget, 2017. 8(64): p. 107596. 43. Brandt, C., et al., Assessing genetic diversity and similarity of 435 KPC-carrying plasmids. Scientific reports, 2019. 9(1): p. 1-8. 44. Mutai, W.C., et al., Plasmid profiling and incompatibility grouping of multidrug resistant serovar Typhi isolates in Nairobi, Kenya. BMC research notes, 2019. 12(1): p. 422. 45. Chen, C., et al., Tracking carbapenem-producing Klebsiella pneumoniae outbreak in an intensive care unit by whole genome sequencing. Frontiers in cellular and infection microbiology, 2019. 9: p. 281.

42

46. Ewers, C., et al., Clonal spread of highly successful ST15-CTX-M-15 Klebsiella pneumoniae in companion animals and horses. Journal of Antimicrobial Chemotherapy, 2014. 69(10): p. 2676- 2680. 47. Protonotariou, E., et al., Emergence of Klebsiella pneumoniae ST11 co-producing NDM-1 and OXA-48 carbapenemases in Greece. Journal of global antimicrobial resistance, 2019. 19: p. 81-82. 48. Zhou, H., et al., Defining and evaluating a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Klebsiella pneumoniae. Frontiers in microbiology, 2017. 8: p. 371. 49. Lam, M., et al., Kleborate: comprehensive genotyping of Klebsiella pneumoniae genome assemblies. 2018. 50. Jarvis, P., B. Holland, and J. Sumner, Phylogenetic Invariants and Markov Invariants. 2017. 51. Kühnert, D., C.-H. Wu, and A.J. Drummond, Phylogenetic and epidemic modeling of rapidly evolving infectious diseases. Infection, genetics and evolution, 2011. 11(8): p. 1825-1841. 52. De Maio, N., C.-H. Wu, and D.J. Wilson, SCOTTI: efficient reconstruction of transmission within outbreaks with the structured coalescent. PLoS computational biology, 2016. 12(9): p. e1005130. 53. Cella, E., et al., Multi-drug resistant Klebsiella pneumoniae strains circulating in hospital setting: whole-genome sequencing and Bayesian phylogenetic analysis for outbreak investigations. Scientific reports, 2017. 7(1): p. 3534. 54. Alkhamis, M.A., et al., Applications of Bayesian phylodynamic methods in a recent US porcine reproductive and respiratory syndrome virus outbreak. Frontiers in microbiology, 2016. 7: p. 67. 55. Grenfell, B.T., et al., Unifying the epidemiological and evolutionary dynamics of pathogens. science, 2004. 303(5656): p. 327-332. 56. Balkhy, H.H., et al., Antimicrobial consumption in three pediatric and neonatal intensive care units in Saudi Arabia: 33-month surveillance study. Annals of clinical microbiology and antimicrobials, 2019. 18(1): p. 20. 57. Lo-Ten-Foe, J.R., et al., Comparative evaluation of the VITEK 2, disk diffusion, Etest, broth microdilution, and agar dilution susceptibility testing methods for colistin in clinical isolates, including heteroresistant Enterobacter cloacae and Acinetobacter baumannii strains. Antimicrobial agents and chemotherapy, 2007. 51(10): p. 3726-3730. 58. Hala, S., et al., First report of Klebsiella quasipneumoniae harboring bla KPC-2 in Saudi Arabia. Antimicrobial Resistance & Infection Control, 2019. 8(1): p. 203. 59. Testing, E.C.o.A.S., EUCAST warnings concerning antimicrobial susceptibility testing products or procedures. European Committee on Antimicrobial Susceptibility Testing, Växjö, Sweden: www. eucast. org/ast_of_bacteria/warnings, 2018. 60. Hanson, C.W. and W.J. Martin, Modified agar dilution method for rapid antibiotic susceptibility testing of anaerobic bacteria. Antimicrobial agents and chemotherapy, 1978. 13(3): p. 383-388. 61. Morrill, H.J., et al. Treatment options for carbapenem-resistant Enterobacteriaceae infections. in Open forum infectious diseases. 2015. Oxford University Press. 62. Group, C.-E.P.B.W., Recommendations for MIC determination of colistin (polymyxin E) as recommended by the joint CLSI-EUCAST Polymyxin Breakpoints Working Group. EUCAST http://www. eucast. org/fileadmin/src/media/PDFs/EUCAST_files/General_documents/Recommendations_for_MIC_d etermination_of_colistin_March_2016. pdf. Accessed, 2017. 4. 63. Sonnevend, Á., et al., Plasmid-mediated colistin resistance in Escherichia coli from the Arabian Peninsula. International Journal of Infectious Diseases, 2016. 50(Supplement C): p. 85-90. 64. Olaitan, A.O., et al., Acquisition of extended-spectrum cephalosporin-and colistin-resistant Salmonella enterica subsp. enterica serotype Newport by pilgrims during Hajj. International journal of antimicrobial agents, 2015. 45(6): p. 600-604. 65. Leangapichart, T., et al., Acquisition of mcr-1 Plasmid-Mediated Colistin Resistance in Escherichia coli and Klebsiella pneumoniae during Hajj 2013 and 2014. Antimicrobial agents and chemotherapy, 2016. 60(11): p. 6998-6999. 66. Wiskur, B.J., J.J. Hunt, and M.C. Callegan, Hypermucoviscosity as a virulence factor in experimental Klebsiella pneumoniae endophthalmitis. Investigative ophthalmology & visual science, 2008. 49(11): p. 4931-4938. 67. Foreman-Wykert, A.K. and J.F. Miller, Hypervirulence and pathogen fitness. Trends in microbiology, 2003. 11(3): p. 105-108.

43

68. Lai, Y.-C., M.-C. Lu, and P.-R. Hsueh, Hypervirulence and carbapenem resistance: two distinct evolutionary directions that led high-risk Klebsiella pneumoniae clones to epidemic success. Expert review of molecular diagnostics, 2019. 19(9): p. 825-837. 69. Shon, A.S. and T.A. Russo, Hypervirulent Klebsiella pneumoniae: the next superbug? Future microbiology, 2012. 7(6): p. 669-671. 70. Lee, C.-R., et al., Antimicrobial resistance of hypervirulent Klebsiella pneumoniae: epidemiology, hypervirulence-associated determinants, and resistance mechanisms. Frontiers in cellular and infection microbiology, 2017. 7: p. 483. 71. Russo, T.A., et al., Aerobactin mediates virulence and accounts for the increased production under iron limiting conditions by hypervirulent (hypermucoviscous) Klebsiella pneumoniae. Infection and immunity, 2014: p. IAI. 01667-13. 72. Martin, R.M. and M.A. Bachman, Colonization, infection, and the accessory genome of Klebsiella pneumoniae. Frontiers in cellular and infection microbiology, 2018. 8: p. 4. 73. Connell, H., et al., Fimbriae—Mediated Adherence Induces Cosal Inflammation and Bacterial Clearance, in Toward anti-adhesion therapy for microbial diseases. 1996, Springer. p. 73-80. 74. Yu, W.-L., et al., Comparison of prevalence of virulence factors for Klebsiella pneumoniae liver abscesses between isolates with capsular K1/K2 and non-K1/K2 serotypes. Diagnostic microbiology and infectious disease, 2008. 62(1): p. 1-6. 75. Follador, R., et al., The diversity of Klebsiella pneumoniae surface polysaccharides. Microbial genomics, 2016. 2(8). 76. Alexander, H.R., et al., Recombinant interleukin-1 receptor antagonist (IL-1ra): effective therapy against gram-negative sepsis in rats. Surgery, 1992. 112(2): p. 188-194. 77. Lam, M.M., et al., Frequent emergence of pathogenic lineages of Klebsiella pneumoniae via mobilisation of yersiniabactin and colibactin. bioRxiv, 2017: p. 098178. 78. Forbes, J.D., et al., Metagenomics: the next culture-independent game changer. Frontiers in microbiology, 2017. 8: p. 1069. 79. Brown, E., et al., Use of Whole-Genome Sequencing for Food Safety and Public Health in the United States. Foodborne pathogens and disease, 2019. 80. Edge, T.A., et al., The ecobiomics project: Advancing metagenomics assessment of soil health and freshwater quality in Canada. Science of The Total Environment, 2019: p. 135906. 81. Hendriksen, R.S., et al., Global monitoring of antimicrobial resistance based on metagenomics analyses of urban sewage. Nature communications, 2019. 10(1): p. 1124. 82. Snitkin, E.S., et al., Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Science translational medicine, 2012. 4(148): p. 148ra116- 148ra116. 83. Gilmour, M.W., et al., High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC genomics, 2010. 11(1): p. 120. 84. Wyres, K.L. and K.E. Holt, Klebsiella pneumoniae as a key trafficker of drug resistance genes from environmental to clinically important bacteria. Current opinion in microbiology, 2018. 45: p. 131-139. 85. Grumaz, S., et al., Next-generation sequencing diagnostics of bacteremia in septic patients. Genome Medicine, 2016. 8(1): p. 73. 86. Walker, B., et al., Cost-Effectiveness Analysis of Multiplex PCR with Magnetic Resonance Detection versus Empiric or Blood Culture-Directed Therapy for Management of Suspected Candidemia. Journal of clinical microbiology, 2016. 54(3): p. 718-726. 87. Planes, A.M., et al., Evaluation of the usefulness of a quantitative blood culture in the diagnosis of catheter-related bloodstream infection: Comparative analysis of two periods (2002 and 2012). Enfermedades infecciosas y microbiologia clinica, 2016. 88. Baron, E.J., et al., A guide to utilization of the microbiology laboratory for diagnosis of infectious diseases: 2013 recommendations by the Infectious Diseases Society of America (IDSA) and the American Society for Microbiology (ASM) a. Clinical infectious diseases, 2013. 57(4): p. e22- e121. 89. Gupta, K., et al., International clinical practice guidelines for the treatment of acute uncomplicated cystitis and pyelonephritis in women: a 2010 update by the Infectious Diseases Society of America and the European Society for Microbiology and Infectious Diseases. Clinical infectious diseases, 2011. 52(5): p. e103-e120.

44

90. Cartwright, M., et al., A broad-spectrum infection diagnostic that detects pathogen-associated molecular patterns (PAMPs) in whole blood. EBioMedicine, 2016. 9: p. 217-227. 91. Hugerth, L.W. and A.F. Andersson, Analysing microbial community composition through amplicon sequencing: from sampling to hypothesis testing. Frontiers in Microbiology, 2017. 8: p. 1561. 92. Long, Y., et al., Diagnosis of Sepsis with Cell-free DNA by Next-Generation Sequencing Technology in ICU Patients. Archives of Medical Research, 2016. 47(5): p. 365-371. 93. Leibovici, L., et al., Septic shock in bacteremic patients: risk factors, features and prognosis. Scandinavian journal of infectious diseases, 1997. 29(1): p. 71-75. 94. Jacob, M., et al., Metabolomics toward personalized medicine. Mass spectrometry reviews, 2019. 38(3): p. 221-238. 95. Vincenzi, B., et al., Procalcitonin as diagnostic marker of infection in solid tumors patients with fever. Scientific Reports, 2016. 6. 96. Maljkovic, I.B., et al., Next Generation Sequencing and Bioinformatics Methodologies for Infectious Disease Research and Public Health: Approaches, Applications, and Considerations for Development of Laboratory Capacity. The Journal of infectious diseases, 2019. 97. Wooley, J.C., A. Godzik, and I. Friedberg, A primer on metagenomics. PLoS computational biology, 2010. 6(2): p. e1000667. 98. Almeida, O.G. and E.C. De Martinis, Bioinformatics tools to assess metagenomic data for applied microbiology. Applied microbiology and biotechnology, 2019. 103(1): p. 69-82. 99. Nieman, A., et al., A prospective multicenter evaluation of direct molecular detection of blood stream infection from a clinical perspective. BMC Infectious Diseases, 2016. 16(1): p. 314. 100. Long, S.W., et al., Whole-genome sequencing of human clinical Klebsiella pneumoniae isolates reveals misidentification and misunderstandings of Klebsiella pneumoniae, Klebsiella variicola, and Klebsiella quasipneumoniae. mSphere, 2017. 2(4): p. e00290-17. 101. Breitwieser, F.P., J. Lu, and S.L. Salzberg, A review of methods and databases for metagenomic classification and assembly. Briefings in bioinformatics, 2017. 102. Westblade, L.F., et al., Role of clinicogenomics in infectious disease diagnostics and public health microbiology. Journal of clinical microbiology, 2016: p. JCM. 02664-15. 103. Campone, M., et al., Prediction of metastatic relapse in node-positive breast cancer: establishment of a clinicogenomic model after FEC100 adjuvant regimen. Breast cancer research and treatment, 2008. 109(3): p. 491-501. 104. Crofts, T.S., A.J. Gasparrini, and G. Dantas, Next-generation approaches to understand and combat the antibiotic resistome. Nature Reviews Microbiology, 2017. 105. Wilke, A., et al., The MG-RAST metagenomics database and portal in 2015. Nucleic acids research, 2015. 44(D1): p. D590-D594. 106. Flygare, S., et al., Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling. Genome biology, 2016. 17(1): p. 111. 107. Truong, D.T., et al., MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature methods, 2015. 12(10): p. 902. 108. Landrum, M.J., et al., ClinVar: public archive of interpretations of clinically relevant variants. Nucleic acids research, 2015. 44(D1): p. D862-D868. 109. Hasan, N.A., et al., Microbial community profiling of human saliva using shotgun metagenomic sequencing. PLoS One, 2014. 9(5): p. e97699. 110. Naccache, S.N., et al., A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome research, 2014. 111. Blankenberg, D., et al., Galaxy: a web‐based genome analysis tool for experimentalists. Current protocols in molecular biology, 2010. 89(1): p. 19.10. 1-19.10. 21. 112. Kearse, M., et al., Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012. 28(12): p. 1647-1649. 113. Inouye, M., et al., SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome medicine, 2014. 6(11): p. 90. 114. GUIDANCE, D., Infectious Disease Next Generation Sequencing Based Diagnostic Devices: 2 Microbial Identification and Detection 3 of Antimicrobial Resistance and 4 Virulence Markers 5.

45

115. Chen, D.S., et al. 531. Practical and Evidence-Based Considerations for Implementation of Bacterial Whole-Genome Sequencing Within Longitudinal Infection Control Practice. in Open Forum Infectious Diseases. 2019. Oxford University Press US. 116. Yang, G.S., et al., High-throughput sequencing: a failure mode analysis. BMC genomics, 2005. 6(1): p. 2. 117. Zhang, Y., et al., Emergence of a hypervirulent carbapenem-resistant Klebsiella pneumoniae isolate from clinical infections in China. Journal of Infection, 2015. 71(5): p. 553-560.

46

Chapter 2: An Outbreak of Hypervirulent and Multidrug-Resistant Clone (ST2096) of Klebsiella pneumoniae Silently Transmitting in a Saudi Arabia Western Region Hospital: A Clinical and Molecular Surveillance Study

In the following project, I have collected the samples and their related clinical information, obtained the ethical approval, as well as carried out the microbiological and molecular experiments done on all the isolates. I have written the paper, analysed the transmission pattern, and produced figures 2, 3, 4, and 5. I have done part of the bioinformatic analysis that includes identification of antibiotic resistance, virulence factors, and the phylogenetic tree of figure 1A. Finally, I have discussed and presented the outcome to the infection control unit at the hospital to propose the outcomes of figure 5.

Hala S1,2,3,4, Ghazzali R1, Antony CP1,5, Alshehri M2,3,4, Ben-Rached F1, Alsaedi A2,3,4, Al-ahmadi G2,3,4, Kaaki M2,3,4, Alazmi M1,5, Alhaj-Hussein T B2,3,4, Yasen M2,3,4, M. Zowawi H2,3,4,7,8, Alghoribi M2,3,4, Althaqafi O A2,3,4, Al-Amri A2,3,,4, Pain A1,8,9

1 Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering, King Abdullah University Of Science And Technology, Jeddah Makkah, Saudi Arabia 2 King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia 3 King Abdullah International Medical Research Centre, Saudi Arabia 4 Ministry of National Guard Health Affairs, Saudi Arabia 5 College of Computer Science and Engineering, University of Hail, Hail, Saudi Arabia 6 Red Sea Research Centre, Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Saudi Arabia 7 The University of Queensland, UQ Centre for Clinical Research, Herston Queensland, Australia 8 Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, N20 W10 Kita-ku, Sapporo, Japan 9 Nuffield Division of Clinical Laboratory Sciences (NDCLS), University of Oxford, Oxford, OX3 9DU, UK

Key Points

Whole-genome sequencing, in combination with both epidemiological data and infection control, permitted to reveal an outbreak of the high-risk ST2096 clone in Saudi Arabia, its transmission patterns, and defining the infection sources.

47

2.1 Summary

2.1.1 Background and Purpose. The nosocomial dissemination of high-risk multi-drug resistant (MDR) Klebsiella pneumoniae strains has been a growing worldwide threat in healthcare facilities. In this study, we identified an outbreak of an MDR and hypervirulent Klebsiella pneumoniae clone.

2.1.2 Methods. We collected 235 MDR Klebsiella pneumoniae isolates from 220 patients of King Abdulaziz medical city within the Ministry of National Guard - health affairs (MNGHA) hospital (Jeddah, Saudi Arabia) between 2014-2018. The isolates were tested for phenotypic resistance profiles using clinical and microbiological methods. In parallel, the strains were sequenced at high-depth using Illumina shotgun whole-genome sequencing (WGS) to identify their antibiotic resistance profiles, virulence factors, clonal relationships, and transmission patterns in silico. Core-genome multilocus sequence typing (cgMLST) and molecular assay results were analyzed along with the clinical metadata and infection control strategies to classify the outbreak sequence types that were responsible for disease transmission.

2.1.3 Findings. This study identified K. pneumoniae ST2096 as an emerging MDR strain, which eventually became an extensively-drug resistant (XDR), and identified as a vastly communicable hypervirulent clone. ST2096 clone has been silently transmitting within the hospital and was found to be associated with high mortality (59%). The transmission of ST2096 predominated over other sequence types until the end of our collection period in March 2018. We identified the outbreak index case to have arrived at the hospital in December 2016. The index case carrying the first hypervirulent K. pneumoniae ST2096 isolate had been transferred between several wards for treatments and clinical-related tests. Consequently, the transmission network model illustrated the spread of ST2096 from ward A to the intensive care unit and other wards that host long-term care patients within the hospital. The transmission network analysis and the genetic distance between isolates suggested the involvement of different unsampled strains that could have been colonizing hospital staff or contaminating the hospital environment.

2.1.4 Interpretation. The emerging hypervirulent ST2096 clone poses a substantial threat, and addressing its transmission is of vital importance in limiting the spread of infections nationally and internationally. Update of the guidelines and control procedures ought to be applied to avoid further spread of such isolates.

Keywords. Nosocomial outbreak; clonal complex; Klebsiella pneumoniae; hypervirulent; multi-drug resistant; infection control; whole-genome sequencing, Multi-Locus Sequence Type, Transmissibility

48

2.2 Research in Context

2.2.1 Review and Evidence We searched PubMed without any filters or restrictions for published reports, without any filters or restrictions, that was published, and our searches contained the terms “Klebsiella pneumoniae Saudi

Arabia”, “Hypervirulent and Klebsiella pneumoniae and antimicrobial resistance”, “Klebsiella pneumoniae and ST2096 and antimicrobial resistance”, “Hypervirulent Klebsiella pneumoniae and virulence plasmid”, “ST2096 and Klebsiella pneumoniae and virulence plasmid”,

“Hypervirulent Klebsiella pneumoniae and virulence ”, and “Klebsiella pneumoniae and outbreak”.

There were no publications associated with Saudi Arabia or ST2096. Subsequently, when we searched only for “Klebsiella pneumoniae and ST2096”, one paper from India was found, which discussed the molecular characterization of colistin-resistant K. pneumoniae and the clonal relationships among

Indian isolates that were collected between 2016-2017. The study identified the clonal group CG14 to be the predominant one rather than ST258, which is known internationally as the most prevalent high- risk clone. However, the authors of the study did not attempt to link the molecular resistance mechanisms with phylogeny. Additionally, they neither characterized the percentage occurrence of multi-drug resistant (MDR) isolates for an extended time-period nor the presence of virulence factors in the genomes of their sampled strains.

2.2.2 Added Value

To date, whole-genome sequencing (WGS), identifying hypervirulence and drug resistance genes, and wgMLST have not been applied in combination to understand the transmission of nosocomial infections in the Middle East and North Africa (MENA) region. No previous report exists that uncovers and defines the phenotypic and genetic basis of phenotypic drug resistance mechanisms of K. pneumoniae in the region and compares it to the global high-risk clonal groups. Our results show that MDR ST2096 hypervirulent K pneumoniae strains have caused high mortality in long-term and intensive care unit

(ICU) patients in a tertiary Saudi hospital. A recent local cross-sectional analysis, from another city branch of our hospital in Saudi Arabia, did not identify ST2096. The infection control procedures ought to be updated and include wgMLST surveillance for pathogens with potentially high prevalence. Long-

49 term and ICU patients may potentially be at the highest risk of contracting a recurrent infection from the same strain.

2.2.3 Repercussions

According to the Saudi Census (stats.gov.sa), More than 27 million pilgrims visit the holy city of

Makkah annually for Hajj and Umrah pilgrims, and MNGHA is the main tertiary hospital on the road to this destination. The high-risk ST2096 hypervirulent and MDR K. pneumoniae strains have been causing a considerable threat to patients’ safety in the hospital. Hence, being a highly transmissible clone, ST2096, could spread from Indian pilgrims and ex-pats coming to Saudi Arabia and eventually into other countries through mass-gathering events. Continuous surveillance and reporting of high-risk clones and the ESKAPE pathogens in such critical locations are essential to prevent further dissemination in hospital and environmental settings.

50

2.3 Introduction

Nosocomial infections caused by multidrug-resistant (MDR) Klebsiella pneumoniae strains represent a significant public health threat [1]. Outbreak strain transmissions were mainly correlated with asymptomatic gastrointestinal colonization of health-care workers (HCW) or hospital environments that are propelled by inherited selective advantages [2]. The long-term (LT) care wards and intensive care units (ICU) mainly host immunocompromised, neonatal, and elderly patients in close proximity for an extended period, and therefore are prone to contracting communicable infections such as MDR K. pneumoniae [2, 3]. Infection prevention and control (IPC) policies emphasize the importance of device-associated and nosocomial infections, especially in the ICU [4]. Uninterrupted continuous transmissions have been the foundation of outbreaks with limited therapeutic options resulting in a 30-50% global mortality [5].

MDR K. pneumoniae is listed as one of the top priority pathogens associated with the urgent warning of antibiotic resistance threats by the World Health Organization (WHO) and the US Centers for Disease Control and Prevention (CDC) [6]. In Arab-league countries, MDR K. pneumoniae is among the top infectious organisms reported; however, detailed phenotyping and linking that to strain sequence types, genomic, phylogenetic, and phylodynamic data are lacking or hardly discussed [7]. Although rarely described, K. pneumoniae has been shown to have the ability to combine both multiple virulence factors and gaining resistance to multiple antibiotics [8]. Reports of K. pneumoniae outbreaks in Saudi Arabia have indicated the presence of few unique multi- locus sequencing types (MLST) strains such as ST29 and ST14 as opposed to globally reported high-risk clones such as ST258 [9].

51

MLST has been a valuable tool in tracking high-risk clonal groups (CG) and clonal complexes (CC) [10]. Nonetheless, the high abundance, the resistance profile versatility, combined with mutations in the housekeeping genes of K. pneumoniae clones directly affect correct sequence typing classification with a conventional molecular typing method or scheme [10]. Accurate clonal relatedness combined with molecular characterization and identification of antimicrobial resistance (AMR) markers determine high-risk sequence types and their CC within a region. Few reports have identified an abundance of the clonal complex CC14 represented by sequence type ST14 in the

MENA region [11]. A study in India of blood samples between 2016 and 2017 identified

ST231, ST14, and ST2096 as the highest prevalent sequence types associated with colistin resistance [12]. Both ST14 and ST2096 are part of the CC14, which is known to be a widespread MDR CC [8]. Only a few reports have discussed ST2096 and presented evidence of its capability in harboring carbapenemases, among other resistance and virulence genes [12, 13].

Epidemic K. pneumoniae has been associated with both hypervirulent and MDR clonal lineages. Hypermucoviscous and hypervirulent K. pneumoniae strains gained the ability to obtain iron and generate capsular material through siderophore gene clusters and virulence plasmids containing rmpA and rmpA2 genes, which enable these bacteria to survive longer and thus cause higher mortality [14]. Although there have been claims about surface capsular (K) loci (KL1 and KL2) associating with hypervirulent strains

[15], recent studies suggest that other types of K antigen serotypes are may also be involved in virulence [16]. An alarming increase of outbreaks is associated with MDR and hypervirulent sequence types, which presents a significant challenge for IPC [17].

52

The molecular epidemiology and clonal diversity of K. pneumoniae in the Arabian

Peninsula have not been thoroughly investigated [11]. WGS-based surveillance serves as a powerful tool allowing for tracking resistance genes, virulence factors as well as for identifying phylogenetic patterns. WGS enabled the use of core-genome MLST

(cgMLST), identification of polymorphisms, and extrachromosomal DNA with higher accuracy and efficiency [18]. Genomic analyses also allow the identification of novel genes that are associated with resistance to last-resort drugs such as β-lactams and colistin.

Here, we applied WGS as a surveillance method to understand the primary source behind the spread of Klebsiella spp. in our hospital in the western region of Saudi Arabia, and we combined the K. pneumoniae clinical metadata with WGS and clonal lineages to study the association between resistance patterns and disease. We characterized STs of K. pneumoniae along with the AMR and virulence markers of 235 isolates collected between

November 2014- March 2018 at the Ministry of National Guard Health Affairs (MNGHA),

Jeddah, Saudi Arabia. We discovered a predominantly fatal and silent outbreak of a dominant and emerging strain “ST2096” spreading through LT and ICU patients. The transmission reconstruction revealed an ongoing outbreak and allowed us to improve IPC procedures within our hospital. We aimed to determine the origin and the molecular basis of the transmission to reflect it on introducing updated and revised IPC policies and protocols in MNGHA and other Saudi clinics.

53

2.4 Methods 2.4.1 Surveillance and Outbreak Identification As part of the hospital surveillance program, we collected Klebsiella spp. isolates from 235 patients between November 2014 to March 2018 with their ethically approved epidemiological data. The isolates were collected from King Abdulaziz Medical City

Jeddah (KAMC-J), MNGHA, Jeddah in Saudi Arabia. The medical city has a tertiary care hospital, which has over 600 beds and adult ICUs for both medical and surgical patients with a capacity of 85 beds. The hospital hosts a large population of adult LT patients spread in 20 wards, with each having the possibility of single, double, and quadruple beds per room. Each ward has an assigned staff on rotation per ward and gender-specific shared bathrooms and waiting areas. The facilities provide services for more than 125,000 Saudi

National Guard soldiers, hospital employees, and their families within the western region of Saudi Arabia. This center is the first point of contact for an estimate of more than 27 million pilgrims during Hajj and Umrah (major mass-gathering events) on their way to and back from Makkah city annually [19].

2.4.2 Data Collection We collected multiple epidemiological data from patients with clinically diagnosed MDR

Klebsiella spp., including the length of hospital stay, location, sampling date, and sample type. The collection was a result of increased incidences of Gram-negative related infections and not related to any ongoing outbreak. Patients were defined as infected or colonized by an MDR Klebsiella-related infection by the molecular microbiology laboratory at the hospital. All the isolates patients’ data were collected under the supervision of an infectious disease (ID) consultant or ID fellow physician to ensure

54 clinical association of antibiotic treatment choice, hospital stay, co-morbidities, sepsis score, and demographical data.

2.4.3 Bacterial Phenotype Our original collection was more than 600 isolates from cryopreserved culture plates; however, many isolates yielded no growth or had no metadata available to us. We identified

235 MDR Klebsiella spp. related infections based on the clinical laboratory tests performed between 2014-2018. They were isolated from patients from various specimen types after cultures, from blood, sputum, urine, wound swabs, and other clinical specimens, on Blood agar and MacConkey agar (Saudi prepared media laboratory SPML, Saudi Arabia).

Combined with the automated mass spectrometry microbial identification system VITEK

MS (bioMérieux, France), further identification of single isolates using Sanger sequencing

(Applied Biosystems, USA) of conserved 16S rRNA gene(s) has been performed. Positive virulence for each isolate was initially determined using the string test of a vicious string

> 5 mm in length to be considered as hyper-mucoid and possibly hypervirulent.

Antimicrobial susceptibility test (AST) were carried out using VITEK2 rapid identification

(bioMérieux, France). The isolates with a recovery of 90% or more were included and were only included if they were marked in hospital records as MDR. The resistance profile was determined based on exhibiting resistance to three or more antibiotic classes. To further confirm AMR and other antibiotic-resistant, we employed alternative ASTs such as agar disc diffusion (BioRad, France), E-test (bioMérieux, France) and Micronaut broth microdilution (Merlin-Diagnostika, Germany) as per the manufacturers’ instructions and according to the European Committee on Antimicrobial Susceptibility Testing (EUCAST) recommendations (http://www.eucast.org/) [20]. All isolates were tested three independent

55 times (n=3), according to the Centers for Disease Control and Prevention (CDC) recommendations and guidelines [21].

2.4.4 DNA Extraction and Whole-Genome Shotgun Sequencing DNA was extracted using single colony boiling and GenElute Bacterial Genomic DNA Kit

(Sigma-Aldrich, Germany). DNA was sheared by sonication to reach 300-500bp product size (Covaris, USA). Manual and automated libraries were prepared using NEBNext Ultra

II DNA Library Prep Kit for Illumina (NEB, UK) using BioMek FXP liquid handling automation (Becman Corporation, USA). Samples were sequenced in multiple pools using

HiSeq4000 (Illumina, USA) to produce 150bp paired-end reads.

2.4.5 Data Availability Raw read data were deposited in the European Nucleotide Archive (ENA) under the study accession PRJEB36683.

2.4.6 Genome Assembly and Characterization Raw paired-end reads from Illumina HiSeq4000 were adapter- and quality-trimmed

(QC30) and phiX-cleaned using bbduk tool from the BBtools suite v37.31 (Bushnell, B. http://sourceforge.net/projects/bbmap/). The processed reads were then assembled on

Spades v3.11.1 [22] using the default parameters. The quality of the assemblies was evaluated on Quast [23], and completeness and contamination of the assemblies were confirmed on CheckM [24]. To verify that 90% or more of each K. pneumoniae genome is retrieved, sequencing depth had to be at least 50X sequencing coverage, Picard-tools

(http://broadinstitute.github.io/picard/) was first used to remove the sequencing duplicates

56 and then the resulting bam file was analyzed using bedtools genomecov v2.25.0 [25].

Kleborate v0.3.0 (https://github.com/katholt/Kleborate) was used to screen the assemblies for the whole genome multi-locus sequence type (wgMLST), antibiotic resistance genes, virulence, and hypermucoidy loci, and to ascertain that the species-level taxonomic assignment for all the generated assemblies was K. pneumoniae. Plasmid groups were classified based on the Inc typing of Gram-negative bacterial pathogens.

2.4.7 Single Nucleotide Polymorphism (SNP) Detection and Phylogenetic Analysis

The Binary Alignment Map (BAM) files that were obtained after running the above Picard- tools step on the sequenced genomes in this study, as well as K. pneumoniae genomes from public databases (n=3701 in total), were further fed into the GATK v3.7 pipeline [26] to call SNPs by using K. pneumoniae strain KPM_nasey (GenBank accession:

GCA_000826605.1) as the reference. VCFtools v0.1.12b [27] was used to merge all resulting vcf files from GATK into one file to filter out the non-CDs regions. This vcf file was subsequently loaded on SVAMP [28] to generate a multi-sequence alignment file

(containing a total of 874,381 SNP positions), which was then used to calculate a phylogenetic tree on FastTree v2.1.10 by using the default parameters [29]. Moreover, we downloaded, analyzed, and screened the sequences of 3,987 K. pneumoniae available in the National Center for Biotechnology Information (NCBI) GenBank database (July 2018)

(Figure 2.1). The resulting tree was visualized and edited on iTOL v3 [30].

57

2.4.8 Transmission Maps We implemented genomic inferences to analyze the possibility and source of the outbreak.

We used Structured COalescent Transmission Tree Inference (SCOTTI) plugin, a Bayesian approach in BEAST analysis, to reconstruct the transmission map of the suspected outbreak

[31]. The transmission events were predicted by analysis using the SCOTTI script which is part of the BEAST tool (https://github.com/Taming-the-BEAST/SCOTTI-Tutorial/).

Ninety-nine genome sequences of ST2096 K. pneumoniae isolates from 64 patients were included in the analysis along with the time of sampling, and the length of stay in the hospital. The transmission map identified the probability of transmission and the likelihood of the original host causing the outbreak. The duration of infections and the force of the bottleneck were set at 2 million iterations and the default settings on the application.

2.4.9 Statistical Analysis Chi-square test of independence of variables in a contingency table was used to calculate the P-value of independence of the observed frequencies compared with the expected frequencies, which were computed based on the marginal sums under the assumption of independence. To calculate the P-value, we used a function called chi2_contingency in

Python. The Ishikawa diagram was used to conduct root cause analysis of the IPC possible outbreak causations.

2.4.10 Ethics The Institute Review Board committees ethically approved the project at both MNGHA under the approval number (RJ17-023-J) and at King Abdullah University for Science and

Technology with approval number (17IBEC38_Pain).

58

2.4.11 Role of Funding Source The study was supported by the baseline funding of Prof. Arnab Pain BAS/1/1020-01-01 at King Abdullah University of Science and Technology (KAUST). The corresponding authors had full access to all the isolates data with full responsibility for analysis and publication.

59

2.5 Results

The cgMLST analysis of the strains unveiled several high-risk clones such as ST-14 and

ST-101. However, the highest reported sequence type found in 99 out of a total of the 235 isolates was the ST2096, which is part of CC14. This ST was not observed to be represented among the global KP sequence data (Figure 2.1). This exclusivity was also evident in the phylogenetic analysis of the study isolates (Figure 2.2). The timeline of the patients, from whom ST2096 KP strains were originally isolated from has provided evidence of possible continuous transmission. The strain appeared in December 2016 in the middle of our collection of 2014-2018 and seemed to continue transmitting until the end of our collection period in March 2018. The index case was identified both genetically and by tracking the timeline to an elderly burn victim who was transferred from another hospital in a different city and was placed in the assigned burn victim ward for one week. The patient was transferred three times to three separate wards. These wards were ICU, oncology, and one of the LT care-wards. Following, the patient died due to a stroke associated with septic shock. The average age of the infected patients, including patients with co-morbidities, in this outbreak, was above 60 years. The highest reported number of isolates were from respiratory pneumonia, followed by urinary tract infection, then blood/wounds (Figure

2.3). The timeline of each isolated case illustrated frequent patient transfer between wards.

ST2096 infections resulted in 59% mortality that was caused mainly by septic shock or urosepsis. Assessment of the map of interfacility transmissions indicated higher incidences among the ICU or LT care units during their hospital admission (Figure 2.4). We hypothesize that continuous transmission occurred because of either the high frequency of

60 patient transfers, patients’ interactions, HCW colonization, and the inherent hypervirulent nature of the ST2096 strains.

The ST2096 strains were phenotypically MDR with high resistance to carbapenems (90% of the strains) and with some being resistant to colistin (30% of the strains) (see also Figure

2.2). The frequency of AMR genes was significantly unusual for the majority of antibiotic classes tested (Table.1). AMR distribution pattern p-value was significantly uncommon, demonstrating that the ST2096 clone gained and lost multiple resistance genes during the outbreak transmission, which was explained by the multiple small and contained outbreak events (Figure 2.2). The original index case strain displayed hypervirulent phenotype and carried siderophores as well as rmpA2. The hypervirulence genotype was represented among 85% of the outbreak isolates. The ST2096 isolates were highly virulent and harbored multiple plasmid incompatibility types known to carry virulence factors (Figure

2.2). We used Kleborate scoring system [32] as a representative measurement for assessing the level of virulence and resistance in the ST2096 isolates. The isolates contained a unique capsular locus type KL64. The cgMLST and phylogenetic tree confirmed the plausibility of a simulated ST2096 outbreak (Figure 2.1).

Contrary to the global cgMLST patterns, no known internationally-relevant clones such as

ST258 and ST512 were identified in our collection. The global phylogenetic analysis of the combined local surveillance isolates from MNGHA K. pneumoniae strains along with a global collection of the K. pneumoniae illustrated a unique cluster pattern (Figure 2.1).

The 99 (ST2096) outbreak strains were phylogenetically distant from the majority of the global Klebsiella phylogeny (Figure 2.1). The interpretation of the simulated molecular clock and transmission maps have revealed a high probability of multiple smaller outbreaks

61 clustering in different wards between patients of ICU and LT care wards (Figure 2.4 and

Figure 2.5). Direct transmission analysis confirmed the occurrence of the continuation of these smaller outbreak clusters throughout 1.3 years of the collected K. pneumoniae isolates between 2016-2018. Indirect transmission map illustrated weaker probability linkage between these direct outbreak clusters of infection, suggesting “unsampled” isolates being the missing link of unsampled but mutating and growing, which could be asymptomatic colonization in healthy HCWs.

62

2.6 Discussion

In the 4-year surveillance of MDR Klebsiella pathogens within our hospital, we discovered a silently transmitting K. pneumoniae outbreak of a unique strain type of ST2096.

Analyzing the dissemination and transmission patterns of K. pneumoniae revealed a highly adaptable and transmissible ST2096 superbug. We discovered an unreported pattern of virulence and resistant genes spreading within our hospital, including carbapenem and colistin-resistant elements, which resulted in over 60% mortality. Therefore, ST2096 should be regarded as a high-risk clone and could result in a severe and fatal epidemic in the region.

The strain was not identified in the collection until December 2016 and remained throughout the surveillance period until March 2018. The initial diagnosis written in patients' files indicated the majority to be a nosocomial infection. A large portion of the isolates was defined as highly virulent and resistant strains, with the dominant sequence type being ST2096 of the CC14, followed by the high-risk clone ST14 of the same clonal group. Observationally, it is clear that increased virulence and the introduction of antibiotic resistance often occur almost concurrently in nosocomial outbreaks. The majority of the isolates were resistant to second and third-generation antibiotics with multiple known drug- resistant plasmids. The inc groups identified, including IncF, Col, IncH, and IncL, were known to carry virulence and resistance strains in Asia and South Asia [33]. Nonetheless, our results are evident of the existence of these mosaic plasmids carrying virulence and

AMR as well as associated with nosocomial infections and outbreaks in Saudi Arabia.

Subsequently, the isolates were a mixture of MDR and eventually became extensively-drug resistant (XDR) with concurrent hypervirulent factors, which caused high mortality

63 associated with septic shock. The surface antigen KL64 reported in the strains has been found to be associated with virulence and cell surface adhesion [34]. The ST2096 clone persisted in the hospital environment, infecting ICU and LT patients.

The transmission map identified the most probable original/index case as isolate 240, the locations of the outbreak, and the spreaders of the ST2096 strain during 1.3 years. Mainly, frequent transfer of patients within several wards and mixing of patients in LT units and

ICU may have permitted for ST2096 to spread. The strain’s virulence phenotype allows for more prolonged survival of the organism due to its improved ability to adhere to surfaces, and therefore, it is highly plausible for this outbreak to be linked to non-technical equipment or medical devices used within these locations. The root of this outbreak was identified as a burn victim who was transferred to four different wards within one month, including a burn unit followed by the ICU and LT wards. Even though our transmission maps suggested a root and clusters of transmission, the recurrent infections may have been through ST2096 colonizing a host (i.e., HCW) for an extended period asymptomatically causing more infections. We considered the source of infection could be a result of our index case transferring from another remotely located hospital, where less stringent IPC procedures are followed. Such policies and procedures have been deliberated to implement a much stringent isolation policy for all transferred patients with suspected bacterial infections in order to limit future outbreaks. The outline of ST2096 resistance, and virulence profile demonstrates a real threat of the emergence of a high-risk clone similar to ST14. Phylogenetically, ST2096 and ST14 are closely related, and they represented the highest prevalence in our hospital. Analyzing the outcome of this outbreak suggests

ST2096 accumulated greater resistance becoming XDR pathogen while maintaining its

64 virulence. This increase in resistance is worrisome for patient safety and needs to be addressed before the strain becomes Pandrug resistant with hypervirulence phenotype. This transmission is a significant cause for concern that such a strain, if not eliminated, could cause untreatable infections within local hospitals.

Even though the IPC team in our hospital enforces the international guidelines, they lack the necessary tools to reflect in updating their practice and change the outcome of such outbreaks (Figure 2.6). The hospital has been working on updating their IPC policy to focus on identifying the outbreak source. Limited antimicrobial compounds remain active against

XDR ST2096 isolates that continue to acquire different AMR combinations. These drugs have become the last resort treatment option for patients within the ST2096 infectious outbreak. The presentation of the outbreak clusters indicates that contact isolation of patients infected with MDR Gram-negative was effective in limiting the spread of the infection in certain wards; however, it did not prevent endemicity of ST2096. A multifaceted intervention was implemented featuring three critical phases of investigating the source, applying molecular techniques, and updating IPC procedures. First, a hospital- wide screening of all MDR K. pneumoniae isolates from the high-risk clones and isolating the source was established. Second, we implemented a molecular screening protocol of all environmental and patient isolated MDR Enterobacteriaceae, using primers specific to

ST2096, with further rigorous isolation protocols. Lastly, the disinfection of all affected wards and monthly surveillance using WGS for MDR isolates will be in considered until the end of 2020.

Our study has some limitations relating to sample size due to our inability to recover genomic DNA from all stored isolates collected over a period of 3.4 years. Moreover, many

65 strains were not included due to improper documentation and missing information required for drawing a probability of transmission maps. Although this study was supposed to be a genomic snapshot of Klebsiella isolates, our collection should have been more systematic, and it should have included environmental and hospital staff isolates to enhance and complete the surveillance procedures. Nonetheless, our initial objective was to focus on the types of resistances and virulence factors in the collected isolates. Consequently, the discovery of a large number of ST2096 has directed us to pursue the outbreak scenario and utilize our high-throughput genomic data for its full characterization.

In conclusion, combining genomic and epidemiological data allowed us to uncover issues that may have aided in the spread of the outbreak. Understanding the transmission pattern combined with identifying the cgMLST, has permitted us to focus on the strains to identify their pathogenic component of virulent and resistant. The ST2096 strains have been major contributors to the high mortality within the ICU, and LT care wards, yet they remained causing infections until the last day of our collection. Continuous investigations are needed to determine the pattern of dissemination and improve IPC procedures. These results have endorsed for a joint effort between the IPC team, hospital management, and clinical microbiology to unite in order to trace and eliminate this high-risk clone. Importantly, it is critical to educate HCWs on increasing awareness and monitoring by conducting competency assessments for better IPC practices.

66

Contributions S.H., A.S., A.A., and A.P. conceived the study; S.H., B.A., K.M., A.A., M.A. A.O., and Y.M. collected the isolates and performed clinical, microbiological experiments and analysis. S.H., M.A., M. ALG and G.A. interpreted the clinical data; S.H. and B.R.F. performed the microbiological and molecular experiments; S.H. and M.A. performed the laboratory characterization experiments; S.H., R.G., and C.P.A performed the bioinformatics analysis; S.H., and M.AZ. performed the statistical analyses; S.H., H.Z., C.P.A, MALG, B.F. and A.P. prepared the figures and wrote or revised the manuscript.

Conflict of Interest The authors have no conflicts to declare.

Acknowledgments We would like to express our appreciation to the infectious disease department and the pathology laboratory at the King Abdulaziz Medical City at the Ministry of the National Guards – Health Affairs western region for their support in this study. We would also like to thank the Bioscience Core Lab (BCL) at King Abdullah University of Science and Technology (KAUST) for their help with the sequencing operations. We also thank Malak Haidar and Sara Mfarrej in KAUST for their technical assistance and advice. This study was supported by baseline funding (BAS/1/1020-01-01) from KAUST to AP.

FIGURE 2.1: Distribution of local K. pneumoniae isolates and their global distribution (A) A phylogenetic tree of the 235 MNGHA K. pneumoniae isolates from whole-genome 874,381 SNPs data. The circle in-between figure A and B represent the cluster of the 99 isolates of ST2096 as well as the closely related ST14 isolates. (B) Combined phylogeny of the global and local K. pneumoniae isolates using maximum likelihood trees inferred from core genome SNPs of the sequencing data (n=3,987) (the black color represents the K. pneumoniae isolates from around the world (n=3,701) (1,000 bootstraps). (C) Distribution of multi-locus sequence type (MLST) of all the MNGHA Saudi hospital showing dominant strains of ST2096 and the high-risk clone ST14; red color represents ST2096 from MNGHA (n=99), and blue color represents other sequence type isolates from MNGHA (n=136).

67

A C

O th e r S e q u e n ce T y p e s

6 9 0 2 T S

ST307

ST 147

S T1 01

ST14

B

FIGURE 2.2: Characteristics of the genomic, clinical, and phenomic of ST2096 strains. (A) Maximum likelihood phylogeny based on core genome SNPs of the ST2096 (n=99) K. pneumoniae isolates subset against their antimicrobial resistance phenotypes and genetic profile of outbreak strain. (B) Plasmid incompatibility groups, isolate collection locations, case mortality status, virulence, and resistance score according to Kleborare tool score (Holt, K https://github.com/katholt/Kleborate) [35] as seen below, (C) phenotypic resistance using broth microdilution are all denoted by colored squares as shown.

68

FIGURE 2.3: Distribution of patients' diagnosis (left) and collection site (right) among the whole collection of MDR K. pneumoniae between 2014-2018. Most of the patients are

69 above 60-year-old indicating co-morbidity association with infection and nosocomial transmission. (n=235)

FIGURE 2.4: (A) and (C) Beanbag graphs of the probabilities of ST2096 indirect and direct transmission among patients, respectively. Patient locations are inside the circles, isolate numbers are around the edge of the circle, and the gradient in the color intensity of the circles (from light to dark) represents the possibility of being super infector, and the darkest green circle represents the original (index) transmitter or the root of the outbreak. An arrow is present from one patient to another, and the thickness and color of the arrow indicate the probability of indirect transmission. The data generated in this figure is a result of sequencing data, sampling time, length of stay, and the patients' location to simulate direct transmission. (B) Timeline of the patients infected with ST2096 divided into sections. Gray, pink, orange, and yellow-colored circles shared among all figures refering to the mini outbreak clusters.

70

FIGURE 2.5: Indirect transmission link between ST2096 Isolates (n=99). Maypole maximum clade credibility tree based on maximum clade credibility tree of the phylodynamic data of ST2096 (n=99), the color of the node indicates the transmission location and unsampled label indicates possible environmental or hospital staff colonization isolate that was not included. Branch colors indicate potential patient spreaders, and the branch width indicates the probability of the transmission [31].

71

FIGURE 2.6: Root-cause analysis of the outbreak. Six different potential causes shared the outcome resulting in the K. pneumoniae outbreak. A significant emphasis is on the causes related to HCW and patients’ interactions.

72

BIBLIOGRAPHY 1. Agarwal, M., S. Shiau, and E.L. Larson, Repeat gram-negative hospital-acquired infections and antibiotic susceptibility: a systematic review. Journal of infection and public health, 2018. 11(4): p. 455-462. 2. Snitkin, E.S., et al., Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Science translational medicine, 2012. 4(148): p. 148ra116- 148ra116. 3. Rodrigues, C., et al., Long-term care facility (LTCF) residents colonized with multidrug-resistant (MDR) Klebsiella pneumoniae lineages frequently causing infections in Portuguese clinical institutions. Infection Control & Hospital Epidemiology, 2017. 38(9): p. 1127-1130. 4. Alshamrani, M.M., et al., Burden of healthcare-associated infections at six tertiary-care hospitals in Saudi Arabia: A point prevalence survey. Infection Control & Hospital Epidemiology, 2019. 40(3): p. 355-357. 5. Xu, L., X. Sun, and X. Ma, Systematic review and meta-analysis of mortality of patients infected with carbapenem-resistant Klebsiella pneumoniae. Annals of clinical microbiology and antimicrobials, 2017. 16(1): p. 18. 6. Shrivastava, S.R., P.S. Shrivastava, and J. Ramasamy, World health organization releases global priority list of antibiotic-resistant bacteria to guide research, discovery, and development of new antibiotics. Journal of Medical Society, 2018. 32(1): p. 76. 7. Moghnieh, R.A., et al., Epidemiology of common resistant bacterial pathogens in the countries of the Arab League. The Lancet Infectious Diseases, 2018. 18(12): p. e379-e394. 8. Holt, K.E., et al., Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proceedings of the National Academy of Sciences, 2015. 112(27): p. E3574-E3581. 9. uz Zaman, T., et al., Clonal diversity and genetic profiling of antibiotic resistance among multidrug/carbapenem-resistant Klebsiella pneumoniae isolates from a tertiary care hospital in Saudi Arabia. BMC infectious diseases, 2018. 18(1): p. 205. 10. Zhou, H., et al., Defining and evaluating a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Klebsiella pneumoniae. Frontiers in microbiology, 2017. 8: p. 371. 11. Moubareck, C.A., et al., Clonal emergence of Klebsiella pneumoniae ST14 co-producing OXA-48- type and NDM carbapenemases with high rate of colistin resistance in Dubai, United Arab Emirates. International journal of antimicrobial agents, 2018. 52(1): p. 90-95. 12. Shankar, C., et al., Molecular characterization of colistin-resistant Klebsiella pneumoniae & its clonal relationship among Indian isolates. The Indian Journal of Medical Research, 2019. 149(2): p. 199. 13. Wyres, K.L., et al., Genomic surveillance for hypervirulence and multi-drug resistance in invasive Klebsiella pneumoniae from south and southeast Asia. Genome Medicine, 2020. 12(1): p. 1-16. 14. Li, X., et al., Characterization of clinically relevant strains of extended-spectrum β-lactamase- producing Klebsiella pneumoniae occurring in environmental sources in a rural area of China by using whole-genome sequencing. Frontiers in microbiology, 2019. 10: p. 211. 15. Yeh, K.-M., et al., Capsular serotype K1 or K2, rather than magA and rmpA, is a major virulence determinant for Klebsiella pneumoniae liver abscess in Singapore and Taiwan. Journal of clinical microbiology, 2007. 45(2): p. 466-471. 16. Wyres, K.L., et al., Genomic surveillance for hypervirulence and multi-drug resistance in invasive Klebsiella pneumoniae from south and southeast Asia. BioRxiv, 2019: p. 557785. 17. Gu, D., et al., A fatal outbreak of ST11 carbapenem-resistant hypervirulent Klebsiella pneumoniae in a Chinese hospital: a molecular epidemiological study. The Lancet infectious diseases, 2018. 18(1): p. 37-46. 18. Ruppitsch, W., et al., Defining and evaluating a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Listeria monocytogenes. Journal of clinical microbiology, 2015. 53(9): p. 2869-2876. 19. Zumla, A. and Z.A. Memish, Risk of antibiotic resistant meningococcal infections in Hajj pilgrims. 2019, British Medical Journal Publishing Group.

73

20. Testing, E.C.o.A.S., Antimicrobial susceptibility testing of colistin—problems detected with several commercially available products. EUCAST warnings concerning antimicrobial susceptibility testing products or procedures. Växjö: EUCAST, 2017. 21. So, M., Antimicrobial Stewardship in Patients with Hematological Malignancies: Key Considerations. Current Treatment Options in Infectious Diseases, 2019. 11(2): p. 161-176. 22. Bankevich, A., et al., SPAdes: a new genome assembly algorithm and its applications to single- cell sequencing. Journal of computational biology, 2012. 19(5): p. 455-477. 23. Gurevich, A., et al., QUAST: quality assessment tool for genome assemblies. Bioinformatics, 2013. 29(8): p. 1072-1075. 24. Parks, D.H., et al., CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome research, 2015. 25(7): p. 1043-1055. 25. Quinlan, A.R. and I.M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010. 26(6): p. 841-842. 26. McKenna, A., et al., The Genome Analysis Toolkit: a MapReduce framework for analyzing next- generation DNA sequencing data. Genome research, 2010. 20(9): p. 1297-1303. 27. Petr, D., et al., 1000 Genomes Project Analysis Group. The Variant Call Format and VCF tools. Bioinformatics, 2011. 28. Naeem, R., et al., SVAMP: sequence variation analysis, maps and phylogeny. Bioinformatics, 2014. 30(15): p. 2227-2229. 29. Price, M.N., P.S. Dehal, and A.P. Arkin, FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular biology and evolution, 2009. 26(7): p. 1641-1650. 30. Letunic, I. and P. Bork, Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research, 2016. 44(W1): p. W242- W245. 31. De Maio, N., C.-H. Wu, and D.J. Wilson, SCOTTI: efficient reconstruction of transmission within outbreaks with the structured coalescent. PLoS computational biology, 2016. 12(9): p. e1005130. 32. Lam, M., et al., Kleborate: comprehensive genotyping of Klebsiella pneumoniae genome assemblies. 2018. 33. Ragupathi, N.K.D., et al., Plasmid profiles among some ESKAPE pathogens in a tertiary care centre in south India. The Indian Journal of Medical Research, 2019. 149(2): p. 222. 34. Dong, N., et al., Genome analysis of clinical multilocus sequence Type 11 Klebsiella pneumoniae from China. Microbial genomics, 2018. 4(2). 35. Lam, M., et al., Kleborate: comprehensive genotyping of Klebsiella pneumoniae genome assemblies. 2018.

74

Chapter 3: First Report of Klebsiella quasipneumoniae Harboring blaKPC-2 in Saudi Arabia

In the following project, I have written the manuscript and prepared the tables and figures included. I have carried out all the microbiological, molecular, and sequencing work for this isolate. The co-authors prepared the initial analysis and I carried out the analysis of the isolate sequence and plasmids to identify AMR genes and confirm the identification of the resistance profile presented herein.

Sharif Hala1,2,3, Chakkiath Paul Antony1,7, Mohammed Alshehri,2,3, Abdulhakeem O. Al Thaqafi3,4, Asim Alsaedi3,4, Areej Mufti3, Mai Kaaki3, Baraa T. Alhaj-Hussein3, Hosam M. Zowawi2,3,4,5, Abdulfattah Al-Amri3*, Arnab Pain1,6*

1Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Saudi Arabia. 2Clinical Microbiology Department, King Abdullah International Medical Research Centre - Ministry of National Guard Health Affairs, Saudi Arabia 3King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia 4WHO Collaborating Centre for Infection Prevention and Control, and GCC Center for Infection Control, Riyadh, Saudi Arabia 5The University of Queensland, UQ Centre for Clinical Research, Herston, Queensland, Australia 6Global Station for Zoonosis Control, Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, Sapporo, Japan 7Present address: Red Sea Research Center, Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Saudi Arabia.

*Correspondence

DOI: https://doi.org/10.1186/s13756-019-0653-9

75

3.1 Abstract

3.1.1 Background Nosocomial infections caused by multi-drug resistant Enterobacteriaceae are a global public health threat that ought to be promptly identified, reported, and addressed accurately. Many carbapenem-resistant Enterobacteriaceae-associated genes have been identified in Saudi Arabia but not the endemic Klebsiella pneumoniae carbapenemases

(KPCs), which are encoded by blakpc-type genes. KPCs are known for their exceptional spreading potential.

3.1.2 Methods We collected n=286 multi-drug resistant (MDR) Klebsiella spp. isolates as part of screening for resistant patterns from a tertiary hospital in Saudi Arabia between 2014-2018.

Antimicrobial susceptibility testing was carried out using both VITEK II and the broth microdilution of all collected isolates. Detection of resistance-conferring genes was carried out using Illumina whole-genome shotgun sequencing and PacBio SMRT sequencing protocols.

3.1.3 Results A Carbapenem-resistant Enterobacteriaceae (CRE) Klebsiella quasipneumoniae subsp. similipneumoniae strain was identified as a novel ST-3510 carrying a blaKPC-2 carbapenemase encoding gene. The isolate, designated as NGKPC-421, was obtained from shotgun Whole Genome Sequencing (WGS) surveillance of 286 MDR Klebsiella spp. clinical isolates. The NGKPC-421 isolate was collected from a septic patient in late 2017 and was initially misidentified as K. pneumoniae. The sequencing and assembly of the

NGKPC-421 genome resulted in the identification of a putative ~39.4 kb IncX6 plasmid harboring a blaKPC-2 gene, flanked by transposable elements (ISKpn6-blaKPC-2–ISKpn27).

76

3.1.4 Conclusion This is the first identification of a KPC-2-producing CRE in the Gulf region. The impact on this finding is of major concern to the public health in Saudi Arabia, considering that it is the religious epicenter with a continuous mass influx of pilgrims from across the world.

Our study strongly highlights the importance of implementing rapid sequencing-based technologies in clinical microbiology for precise taxonomic classification and monitoring of antimicrobial resistance patterns.

Keywords

MDR, Klebsiella quasipneumoniae, carbapenemases, KPC-2, blakpc-2, Tn3

77

3.2 Background

Beta-lactam antibiotics such as penicillins, cephalosporins, and carbapenems represent up to 60% of the available treatment options for antibiotic-resistant Gram- negative bacteria [1]. These antibiotics can be hydrolyzed by the production of extended- spectrum beta-lactamases (ESBL) and carbapenemases [2]. Carbapenem-resistant

Enterobacteriaceae (CRE) is known to cause a variety of nosocomial infections that are associated with high mortality rates [3, 4]. Many CRE-associated genes have been identified, including the endemic Klebsiella pneumoniae carbapenemases (KPCs) and the New Delhi Metallo-beta-lactamases (NDMs) [3, 5]. Rapid identification and reporting of such markers could aid in minimizing the horizontal transfer of these genes to other bacterial species and ultimately preventing the spread of antibiotic resistance [6,

7]. The KPC-type enzyme, which is encoded by blakpc-type genes, are hard to detect by routine susceptibility screening and are known for their exceptional dissemination potential in various CREs [8]. To date, 24 variants of the blakpc-type genes have been reported from different countries, including the USA, Poland, South Korea, Malaysia, and Thailand [9-12]. These variants are typically identified within a Tn3-family of transposable elements capable of transferring resistance genes at high frequency [13].

Klebsiella pneumoniae was initially classified into three genetically closely- related phylogroups, and more recently, they have been classified into three distinct species: K. pneumoniae [KPI], K. quasipneumoniae [KpII], and K. variicola [KpIII] [14].

This distinction was possible through the comparison of their core genomes and not through conventional Multilocus Sequence Typing (MLST) and capsule genotyping [13,

14]. Due to significant overlap in their biochemical profiles, phenotypic testing using

78 traditional clinical microbiological assays is incapable of accurately differentiating between Klebsiella spp. leading to false reporting of clinically isolated K. quasipneumoniae to be K. pneumoniae [13].

K. quasipneumoniae was initially known as a commensal intestinal colonizer.

However, recent genomics-driven studies have documented it as an etiologic agent in a number of clinical Klebsiella-related infection cases [13, 15-17]. Clearly, such findings would underline the importance and usefulness of comparative genomics in defining the relatedness and differences between such strains, which could ultimately aid in reaching an accurate and prompt diagnosis.

Here, we report the first blakpc-2 harboring plasmid in a K. quasipneumoniae strain isolated from a tertiary care hospital in the Gulf states (GCC) region, identified by whole- genome sequencing (WGS) approach to detect multi-drug resistant (MDR) markers in a large collection of Klebsiella spp. isolates.

3.3 Materials and Methods

3.3.1 Clinical Setting and the Case As part of a more extensive study aiming to characterize Klebsiella spp. in a tertiary hospital in King Abdulaziz Medical City (KAMC) in Jeddah, Saudi Arabia, we collected

MDR isolates (n=286) between 2014-2018. All isolates were subjected to routine identification by using VITEK MS and VITEK II systems (bioMérieux Inc., Durham, NC) and were recorded in the laboratory information system as K. pneumoniae. Following the initial analysis of shotgun sequencing of the 286 MDR Klebsiella clinical isolates by

Illumina sequencing technology (Illumina, USA) (data not shown), we identified a single isolate (NGKPC-421) as K. quasipneumoniae, which previously was falsely identified as

79

Klebsiella pneumoniae by the clinical microbiology laboratory in the hospital. This isolate was obtained from a male patient in his mid-60s with chronic comorbidities admitted due to trauma-associated airway obstruction in November 2017, which promptly required a

Percutaneous Tracheostomy. The patient developed nosocomial infection one-week post- surgery, and his health deteriorated gradually with symptoms mainly indicative of pneumonia. The first positive culture from the patient’s tracheostomy wound was identified as MDR K. pneumoniae in December 2017 using VITEK II and VITEK MS. The infection persisted for a couple of months without improvement despite the antibiotic treatment and eventually caused septic shock and subsequent death. Antibiotic susceptibility testing

(AST) and Minimum Inhibitory Concentration (MIC) of the NGKPC-421 isolate was conducted using the automated MICRONAUT system (Merlin Diagnostika, Germany).

The AST result was interpreted using the resistance breakpoints defined by the EUCAST

[18]. The resistance phenotype reported by both VITEK II and MICRONAUT methods were in complete agreement (Table 3.1).

3.3.2 Genome Sequencing, Assembly, and Analyses Genomic DNA of the NGKPC-421 strain was extracted from a single colony and

DNA library was prepared using the NEBNext Ultra II kit for Illumina (New England

BioLabs, UK). Paired-end shotgun WGS was performed on Illumina HiSeq 4000 platform

(Illumina, CA, USA). To obtain long sequence reads for assembling the complete genome, we also sequenced the high molecular-weight genomic DNA on a Pacbio RS II platform

(Pacific Biosciences, CA, USA). The raw PacBio reads were first converted to fastq format by using bamtools v2.3.0 [19] and then assembled on Canu v1.6 [20] by using the default assembly settings. The resulting assembly file was further polished with Pilon v1.20 [21] by supplying the sorted bam file that was generated from bwa v0.7.17 [22] mapping of

80

Illumina reads to the initial Canu assembly. The gfa output file from the Canu-assembly of

NGKPC-421 was further loaded on Bandage [23] to visualize the connections between contigs so as to identify the putative chromosome and plasmid sequence-containing contigs. All contig sequences that showed no connections to the chromosome (potentially extrachromosomal in nature) were probed for the presence of signature genes that are commonly found associated with bacterial plasmids i.e. genes that encode for plasmid replication (Rep) or conjugative transfer (Tra) proteins. These identified putative plasmids that were then further analyzed on pMLST v1.4 and PlasmidFinder v2.1 web-tools [24] web-tools. Antibiotic resistance gene identification was generated with the Kleborate tool v0.3.0 [25] and Resfinder [26]. Confirmation of the gene annotations was made using the

PATRIC online database [27].

3.3.3 Phylogenetic Analysis and Annotation The final assembly was used to assign core genome MLST (cgMLST) by using the

K. quasipneumoniae BIGSdb database (http://bigsdb.pasteur.fr/Klebsiella/Klebsiella.html)

[28, 29]. Available assembled complete genomes of [KPII-A], [KPII-B] K. quasipneumoniae reference sequences were downloaded from the NCBI database

(November 2018) and aligned with identified K. quasipneumoniae (NGKPC-421) genome using progressiveMauve [30]. Alignment-based phylogenetic tree was constructed of K. quasipneumoniae subspecies (CP014154/HKUOPA4, CP034136/G747,

CP034129/G4584, CP031257/L22, CP030171/A708, CP029597/ATCC 700603,

CP023480/KPC-142 CP029443/CAV1947, CP029432/CAV2018, CP014156/HKUOPL4,

CP014155/HKUOPJ4, CP012300/HKUOPLC, CP012252/HKUOPLA, and

CP0023478/HKUOPA4) as well as outgroup strains of [KPI] K. pneumoniae

81

(CP013322/CAV1193) and [KPIII] K. variicola (CP000964/342). Locally collinear blocks (LCBs) larger than 500bp were selected from the full alignment file using a Perl script stripSubsetLCBs [30]. The core alignment file in .xmfa format was then converted into the .fasta format using the Perl script xmfa2fasta

(https://github.com/kjolley/seq_scripts/blob/master/xmfa2fasta.pl). The core alignment (4.25 Mb) was further inspected visually for any misaligned regions. To further ensure the bacterial strain type, a phylogenetic tree was generated using RAxML v8.2.3

[31] with the GTRGAMMA model, bootstrapping (1,000 replicates), and the best maximum likelihood tree inference. Additionally, a core-genome-SNP-based tree was constructed on Parsnp v1.2 from the Harvest suite [32] by using the recombination filtration (-x) and curated genome directory (-c) parameters.

3.4 Results

Analysis of the WGS data from the K. pneumoniae surveillance study identified one isolate (NGKPC-421) as K. quasipneumoniae subsp. similipneumoniae [KpII-B] that was originally misidentified as K. pneumoniae in the hospital laboratory. As shown in Fig.

1, the phylogenetic comparison of the NGKPC-421 isolate with 16 publicly available strains of K. quasipneumoniae 10 [KpII-B], and 4 [KpII-A], as well as K. pneumoniae

[KpI], and K. variicola [KpIII], revealed a close clustering of NGKPC-421 isolate with the ATCC strain 700603 [KpII-B], originally isolated from Australia in 2016. A separate core-genome-SNP-based tree that was constructed with Parsnp also showed identical topology (Supplementary-Fig.1). Our cgMLST analysis defined NGKPC-421 isolate as a novel K. quasipneumoniae subsp. similipneumoniae assigned as ST-3510.

82

Searching for genetic markers encoding antibiotic resistance showed the presence of the KPC-2 gene in the NGKPC-421 isolate. The presence and location of the KPC-2 gene were identified and confirmed as a result of the assembly of 6 contigs comprised of one large chromosome (~5.26 Mb) and four putative circular plasmids of varying sizes

(Table 3.2). Further analysis of the complete genome also showed the presence of other antibiotic resistance genes and virulence factors (Table 3.2).

Four putative plasmids identified in the NGKPC-421 isolate were annotated as pKPC-421, pBK30683-like, pK245, and a potentially novel plasmid pKq-NGSA-1 (Fig. 2).

The plasmid pKPC-421 (~39.4 Kb) belongs to the Inc6X group, where the blakpc-2 gene is flanked by genes belonging to the Tn3-based transposon family (ISKpn6 and ISKpn27)

(Fig. 3). Notably, a public database search of the ISKpn6- blakpc-2-ISKpn27 harboring sequences showed its presence on several plasmids across numerous bacterial species previously reported from many countries but not from the Gulf region (Fig. 3). The largest plasmid pKq-NGSA-1 (~204 Kb), represented by two contigs, was found to be an IncFII(k), harboring the majority of resistance genes, including blaOXA-1, blaCTX-M-15, and blaTEM-1B in multiple copies (Table 3.2). The second-largest plasmid pBK30683-like (~136 Kb), found as a single contig, had no known antibiotic resistance genes present. The fourth plasmid pK245 (~135 Kb) belonged to the IncR group and was found to harbor blaCTX-M-14b (Table

3.2).

3.5 Discussion

Although K. quasipneumoniae was initially considered as a benign human gut commensal bacteria, recent reports suggest its involvement in potentially fatal infections

83

[15, 16, 33]. Despite being rarely documented, K. quasipneumoniae appears to be capable of undergoing homologous recombination with other Klebsiella spp., allowing it to exchange both chromosomal and plasmid-borne antibiotic resistance genes [14].

Additionally, KPC-positive isolates co-producing Metallo-β-lactamase enzymes are known to be associated with diverse mobile genetic elements causing large outbreaks in

China, Europe, and the USA [34-37]. Our data demonstrate that blakpc-2, in addition to other carbapenemase resistance genes, is present in the Gulf region and that K. quasipneumoniae subsp. similipneumoniae could be misidentified as K. pneumoniae in clinical laboratories.

This finding should encourage further surveillance within Saudi Arabia and neighboring countries for additional occurrences of such resistance markers. This is particularly of epidemiological significance because the blakpc-2 was found in the IncX6 plasmid (pKPC-421) that is known to circulate among Enterobacteriaceae species [38].

The blakpc-2 was found to be flanked by transposons, which could have major implications on the spread of antibiotic resistance within the region. Furthermore, the isolate carried other plasmids (pKq-NGSA-1, pBK30683, and pK245) containing various other genes that encode for antibiotic resistance. While such genes are known to be present in clinical CRE- associated K. pneumoniae strains [5, 39, 40], none of the plasmids included here were reported nor genetically characterized prior to this study in the Gulf region [41-44].

Although the presence of various β-lactamase genes among Enterobacteriaceae such as blashv-§, blaTEM-1, blactx-M-1, blandm-1 and blaoxa-48 in Saudi Arabia and the Gulf region has been previously reported [41-44], though this is the first report of a KPC-type producing K. quasipneumoniae. The discovery of a misidentified MDR K. quasipneumoniae isolate is worrisome, suggesting the need to update identification

84 methods and the consideration of the implementation of rigorous molecular surveillance

[45]. The data reported in this study support the need to investigate the emergence and spread of KPC-producing strains from the region. Furthermore, the inability of conventional laboratory techniques to detect the KPC-2 gene in this CRE isolate increases the need for routine reconnaissance using WGS. Considering the importance of

Saudi Arabia as a host of one of the largest recurring mass gathering events (i.e. Hajj and

Umra pilgrimage) with 7 million pilgrims from more than 180 countries [46], infection control efforts are essential so as to monitor the potential spread of such KPC-producing

Enterobacteriaceae. Current conventional molecular microbiology laboratory diagnostic methods are sufficient for diagnosing infectious diseases and WGS may be used as an additional confirmatory tool [47]. Although performing WGS techniques would essentially require access to expensive equipment and reagents in addition to computational infrastructures and relevant expertise; our study re-emphasizes that the introduction of clinical genomics-driven surveillance methods in the GCC region could aid in accurate surveillance and confirmed diagnoses of etiologic agents and the determination of their antibiotic resistance markers. This is of particular public health relevance for monitoring the potential spread of KPC-producing Enterobacteriaceae during mass gathering events such as Hajj and Umrah in Saudi Arabia.

85

Abbreviations ESBL: Extended-spectrum beta-lactamase; CRE: Carbapenem-Resistant Enterobacteriaceae; KPC: Klebsiella pneumonia Carbapenemase; WGS: Whole Genome Shotgun Sequencing; MIC: Minimum Inhibitory Concentration Acknowledgment We thank the infectious disease department and the pathology laboratory at King Khaled hospital in the Ministry of the National Guards – Health Affairs western region for their support of surveillance research. We also would like to thank the Bioscience core lab in KAUST for their help with the sequencing. We also thank Fathia Ben Rached, Malak Haidar, Qingtian Guan, and Sara Mfarrej from KAUST Microbial Genomics Laboratory for their technical help. We thank the team of curators at the Institut Pasteur MLST system (Paris, France) for importing novel alleles, profiles and/or isolates at http://bigsdb.pasteur.fr. Funding This study was supported by baseline funding (BAS/1/1020-01-01) from KAUST to AP. Availability of Data and Materials The raw reads and assembly files from this study are publicly available at the European Nucleotide Archive (ENA) under the study accession number PRJEB30599 http://www.ebi.ac.uk/ena/data/view/PRJEB30599 and at the Zenodo open- access repository- https://doi.org/10.5281/zenodo.2583210 Competing Interests The authors declare that they have no competing interests. Consent for Publication Not applicable. Ethics Approval and Consent to Participate The project was approved by the IRB committees at NGHA (RJ17-023-J) and KAUST (17IBEC38_Pain). Authors' Contributions S.H., A.S., A.T., B.H., and M.A. collected the clinical data and clinical interpretation; S.H., A.M, M.K., and M.A. performed the biochemical experiments; S.H. performed the wet-lab characterization experiments ; S.H. and C.P.A performed the bioinformatics analysis; S.H., A.A., and A.P. conceived the study; S.H., H.Z., C.P.A and A.P. prepared figures and wrote or revised the manuscript.

86

Table 3.1: The AST profile and MIC of the NGKPC-421 isolate.

Antimicrobial agent MIC (mg/L) VITEK II¶ Micronaut¶ Amikacin 4 S S Amoxicillin-clavulanate 32 R - Ampicillin 32 R R Cefazolin 30 R R Cefepime 4 R R Cefoxitin 30 - R Cefotaxime 2 R R Ceftazidime 64 R R Ceftriaxone 6 R - Chloramphenicol 4 - S Ciprofloxacin 0.5 R R Colistin 1 - S Fosfomycin <32 - S Gentamicin >16 R - Imipenem 4 R R Levofloxacin 1 - R Meropenem 8 R R Piperacillin 16 R R Piperacillin-tazobactam 64/4 - R Temocillin <32 - S Tigecycline 0.25 S S Trimethoprim-sulfamethoxazole 4/76 R R  VITEK II and MICRONAUT assays were repeated (n=2) and (n=3), respectively.

R indicates resistant and S indicates susceptible to a given antibiotic based on the EUCAST guidelines [18] and (-) indicates the undetermined antimicrobial sensitivity test for a given antimicrobial agent.

87

Table 3.2: Antibiotic resistance genes on the assembled chromosome and plasmid

sequences in K. quasipneumoniae NGKPC-421 strain (ENA ERS3013985).

Name MLST/pMLST Size (Kb) CDS/GC content Resistance genes Predicted phenotype

Chromosome ST3510 5255.70 5816/57.35 blaOKPB-3 β-lactam oqxA Fluoroquinolone oqxB Fluoroquinolone pKPC-421 plasmid IncX6 39.40 58/47.32 blakpc-2 β-lactam pBK30683-like plasmid IncFII 136.16 199/55.48 n/a n/a pKq-NGSA-1 plasmid IncFII(k) 204.00 303/53.31 aac(3)lla Aminoglycoside aac(3’’)lb Aminoglycoside aac(6’)lb-cr Fluoroquinolone aac(6)ld β-lactam

blaOXA-1 Phenicol

blaCTX-M-15

blaTEM-1B Fluoroquinolone qnrB1 Sulphonamide catB3 Sul2 Tetracycline tet(A) Trimethoprim drA14 pK245 plasmid IncR 134.81 188/53.68 blaCTX-M-14b β-lactam  Assembly of K. quasipneumoniae NGKPC-421 sequences resulted in 6 contigs

comprising of 1 chromosome and 4 putative circular plasmids (Note: 2 contigs represent

the pKq-NGSA-1 plasmid).

88

STRAIN LOCATION USA UNITED KINGDOM CHINA UNITED KINGDOM UNITED KINGDOM UNITED KINGDOM BRAZIL AUSTRALIA SAUDI ARABIA CHINA USA USA CHINA CHINA CHINA Bootstrap CHINA 100 64 CHINA

FIGURE 3.1: The phylogenetic tree based on segmental alignments of chromosomal sequences.

The phylogenetic tree was constructed using RAxML (maximum likelihood) using reference sequences downloaded from NCBI database, including K. quasipneumoniae subspecies as well as strains of K. pneumoniae [KPI] (CAV1193) and K. variicola [KPIII] and viewed with the iTOL online tool v 4.4.2 [48]. Bootstrap values are represented by the size of the circles on each node. The NGKPC-421 [KPII-B] is most closely related to the ATCC strain 700603 [KpII-B], originally isolated from Australia. K. quasipneumoniae subspecies (CP014154/HKUOPA4, CP034136/G747,

CP034129/G4584, CP031257/L22, CP030171/A708, CP029597/ATCC 700603,

CP023480/KPC-142 CP029443/CAV1947, CP029432/CAV2018, CP014156/HKUOPL4,

CP014155/HKUOPJ4, CP012300/HKUOPLC, CP012252/HKUOPLA, and

CP0023478/HKUOPA4) as well as outgroup strains of [KPI] K. pneumoniae

(CP013322/CAV1193) and [KPIII] K. variicola (CP000964/342).

89

Func onal Categories Predicted protein Predicted protein Resistance gene Transposable element with known function with unknown function

90

FIGURE 3.2: Schematic gene organization of all 4 putative circular plasmids in NGKPC-

421. (a) pKPC-421 plasmid, (b) pKq-NGSA-1 plasmid, (c) pK245 plasmid, and (d) pBK30683 plasmid.

91

Name Species Size pMLST Accession (Kb) Number ISKpn6 blakpc-2 ISKpn27 tnpR pKPC-421 Klebsiella quasipneumoniae 39.40 IncX6 ERZ801715 ISKpn6 blakpc-2 ISKpn27 tnpR pCRE3-KPC Citrobacter freundii 47.13 IncR CP024709 ISKpn6 blakpc-2 blaTEM-1 ISKpn27 tnpR pGN2-KPC Proteus mirabilis 46.32 IncX6 MF156710 ISKpn6 blakpc-2 blaTEM-1 ISKpn27 tnpR pGN26-KPC Serratia marcescens 46.29 IncX6 MF156711 ISKpn6 blakpc-2 blaTEM-1 ISKpn27 tnpR pGN28-KPC Morganella morganii 46.12 IncX6 MF156712

3,141bp

FIGURE 3.3: Schematic of gene arrangements of the blakpc-2 flanked by Tn3-based transposon in plasmids across different species. IS481 and IS1182 family of genes harboring transposase and resolvase genes and insertion sequences, i.e., ISKpn6, and ISKpn27 in 5 plasmids compared to the pKPC-421 of K. quasipneumoniae. This fragment has been previously identified in various plasmids in several Gram-negative species from China. The cassette of SKpn6-blakpc-2-ISKpn27 can be found in similar IncX6 plasmids (included in Figure 3.2) as well as multiple types of plasmids across various bacterial species.

92

BIBLIOGRAPHY 1. Wong, D.R. and D. van Duin, Novel Beta-Lactamase Inhibitors: Unlocking Their Potential in Therapy. Drugs, 2017. 77(6): p. 615-628. 2. Zowawi, H.M., et al., β-Lactamase production in key gram-negative pathogen isolates from the Arabian Peninsula. Clinical microbiology reviews, 2013. 26(3): p. 361-380. 3. Bar-Yoseph, H., et al., Risk factors for mortality among carbapenem-resistant enterobacteriaceae carriers with focus on immunosuppression. Journal of Infection, 2018. 4. Rees, C.A., et al., Detection of high-risk carbapenem-resistant Klebsiella pneumoniae and Enterobacter cloacae isolates using volatile molecular profiles. Scientific reports, 2018. 8(1): p. 13297. 5. Krapp, F., et al. Case Report of an Extensively Drug-Resistant Klebsiella pneumoniae Infection With Genomic Characterization of the Strain and Review of Similar Cases in the United States. in Open forum infectious diseases. 2018. Oxford University Press US. 6. Goren, M.G., et al., Transfer of carbapenem-resistant plasmid from Klebsiella pneumoniae ST258 to Escherichia coli in patient. Emerging infectious diseases, 2010. 16(6): p. 1014. 7. Zowawi, H.M., Antimicrobial resistance in Saudi Arabia: An urgent call for an immediate action. Saudi medical journal, 2016. 37(9): p. 935. 8. Vaidya, V.K., Horizontal transfer of antimicrobial resistance by extended-spectrum β lactamase- producing enterobacteriaceae. Journal of laboratory physicians, 2011. 3(1): p. 37. 9. Baraniak, A., et al., Comparative population analysis of Klebsiella pneumoniae with extended- spectrum β-lactamases colonizing patients in rehabilitation centers in four countries. Antimicrobial agents and chemotherapy, 2013: p. AAC. 02571-12. 10. Machulska, M., et al., KPC-2-producing Klebsiella pneumoniae ST11 in a Children's Hospital in Poland. Polish journal of microbiology, 2017. 66(3): p. 401-405. 11. Lee, M.Y., et al., High prevalence of CTX-M-15-producing Klebsiella pneumoniae isolates in Asian countries: diverse clones and clonal dissemination. International journal of antimicrobial agents, 2011. 38(2): p. 160-163. 12. Ramos, A.C., et al., Evaluation of a rapid immunochromatographic test for detection of distinct variants of Klebsiella pneumoniae carbapenemase (KPC) in Enterobacteriaceae. Journal of microbiological methods, 2017. 142: p. 1-3. 13. Long, S.W., et al., Whole-genome sequencing of human clinical Klebsiella pneumoniae isolates reveals misidentification and misunderstandings of Klebsiella pneumoniae, Klebsiella variicola, and Klebsiella quasipneumoniae. mSphere, 2017. 2(4): p. e00290-17. 14. Holt, K.E., et al., Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proceedings of the National Academy of Sciences, 2015. 112(27): p. E3574-E3581. 15. Breurec, S., et al., Liver abscess caused by infection with community-acquired Klebsiella quasipneumoniae subsp. quasipneumoniae. Emerging infectious diseases, 2016. 22(3): p. 529. 16. Brinkac, L.M., et al., Emergence of New Delhi Metallo-ß-Lactamase (NDM-5) in Klebsiella quasipneumoniae from Neonates in a Nigerian Hospital. 2019. 17. Nicolás, M.F., et al., Comparative genomic analysis of a clinical isolate of Klebsiella quasipneumoniae subsp. similipneumoniae, a KPC-2 and OKP-B-6 beta-lactamases producer harboring two drug-resistance plasmids from southeast Brazil. Frontiers in microbiology, 2018. 9: p. 220. 18. Testing, E.C.o.A.S., EUCAST. Breakpoint tables for interpretation of MICs and zone diameters. Version 5.0. 2015. 2015. 19. Barnett, D.W., et al., BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 2011. 27(12): p. 1691-1692. 20. Koren, S., et al., Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research, 2017: p. gr. 215087.116. 21. Walker, B.J., et al., Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS one, 2014. 9(11): p. e112963. 22. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform. bioinformatics, 2009. 25(14): p. 1754-1760.

93

23. Wick, R.R., et al., Bandage: interactive visualization of de novo genome assemblies. Bioinformatics, 2015. 31(20): p. 3350-3352. 24. Carattoli, A., et al., PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrobial Agents and Chemotherapy, 2014: p. AAC. 02412-14. 25. Wick, R.R., et al., Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction for Klebsiella genomes. Journal of clinical microbiology, 2018: p. JCM. 00197-18. 26. Zankari, E., et al., Identification of acquired antimicrobial resistance genes. Journal of antimicrobial chemotherapy, 2012. 67(11): p. 2640-2644. 27. Wattam, A.R., et al., PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic acids research, 2013. 42(D1): p. D581-D591. 28. Diancourt, L., et al., Multilocus sequence typing of Klebsiella pneumoniae nosocomial isolates. Journal of clinical microbiology, 2005. 43(8): p. 4178-4182. 29. Ruppitsch, W., et al., Defining and evaluating a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Listeria monocytogenes. Journal of clinical microbiology, 2015. 53(9): p. 2869-2876. 30. Darling, A.E., B. Mau, and N.T. Perna, progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PloS one, 2010. 5(6): p. e11147. 31. Stamatakis, A., RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 2014. 30(9): p. 1312-1313. 32. Treangen, T.J., et al., The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome biology, 2014. 15(11): p. 524. 33. Shankar, C., et al., Whole genome analysis of hypervirulent Klebsiella pneumoniae isolates from community and hospital acquired bloodstream infection. BMC microbiology, 2018. 18(1): p. 6. 34. Maltezou, H., et al., Outbreak of infections due to KPC-2-producing Klebsiella pneumoniae in a hospital in Crete (Greece). Journal of Infection, 2009. 58(3): p. 213-219. 35. Steinmann, J., et al., Outbreak due to a Klebsiella pneumoniae strain harbouring KPC-2 and VIM- 1 in a German university hospital, July 2010 to January 2011. Eurosurveillance, 2011. 16(33): p. 19944. 36. Yang, J., et al., A nosocomial outbreak of KPC‐2‐producing K lebsiella pneumoniae in a C hinese hospital: dissemination of ST11 and emergence of ST37, ST392 and ST395. Clinical Microbiology and Infection, 2013. 19(11): p. E509-E515. 37. Woodford, N., et al., Outbreak of Klebsiella pneumoniae producing a new carbapenem- hydrolyzing class A β-lactamase, KPC-3, in a New York medical center. Antimicrobial agents and chemotherapy, 2004. 48(12): p. 4793-4799. 38. Li, B., et al., Dissemination of KPC-2-encoding IncX6 plasmids among multiple Enterobacteriaceae species in a single Chinese hospital. Frontiers in microbiology, 2018. 9: p. 478. 39. Curiao, T., et al., Emergence of bla KPC-3-Tn 4401 a associated with a pKPN3/4-like plasmid within ST384 and ST388 Klebsiella pneumoniae clones in Spain. Journal of antimicrobial chemotherapy, 2010. 65(8): p. 1608-1614. 40. Chen, Y.-T., et al., Complete nucleotide sequence of pK245, a 98-kilobase plasmid conferring quinolone resistance and extended-spectrum-β-lactamase activity in a clinical Klebsiella pneumoniae isolate. Antimicrobial agents and chemotherapy, 2006. 50(11): p. 3861-3866. 41. uz Zaman, T., et al., Multi-drug carbapenem-resistant Klebsiella pneumoniae infection carrying the OXA-48 gene and showing variations in outer membrane protein 36 causing an outbreak in a tertiary care hospital in Riyadh, Saudi Arabia. International Journal of Infectious Diseases, 2014. 28: p. 186-192. 42. Memish, Z.A., et al., Molecular characterization of carbapenemase production among gram- negative bacteria in Saudi Arabia. Microbial Drug Resistance, 2015. 21(3): p. 307-314. 43. Zaman, T.U., et al., Insertion element mediated mgrB disruption and presence of ISKpn28 in colistin-resistant Klebsiella pneumoniae isolates from Saudi Arabia. Infection and drug resistance, 2018. 11: p. 1183. 44. Abdalhamid, B., et al., First description of methyltransferases in extensively drug-resistant Klebsiella pneumoniae isolates from Saudi Arabia. Journal of medical microbiology, 2017. 66(7): p. 859-863.

94

45. Wyres, K.L. and K.E. Holt, Klebsiella pneumoniae population genomics and antimicrobial- resistant clones. Trends in microbiology, 2016. 24(12): p. 944-956. 46. Elachola, H., et al., A crucial time for public health preparedness: Zika virus and the 2016 Olympics, Umrah, and Hajj. The Lancet, 2016. 387(10019): p. 630-632. 47. Dunne, W., L. Westblade, and B. Ford, Next-generation and whole-genome sequencing in the diagnostic clinical microbiology laboratory. European journal of clinical microbiology & infectious diseases, 2012. 31(8): p. 1719-1726. 48. Letunic, I. and P. Bork, Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research, 2016. 44(W1): p. W242- W245.

95

Chapter 4: Co-occurrence of mcr-1 and mcr-8 genes in a multidrug- resistant Klebsiella pneumoniae from a 2015 clinical isolate in Saudi Arabia

In the following project, I have written the majority of the manuscript except for the bioinformatics and protein structure methodologies and prepared figure 1. I have carried out all the microbiological, molecular, cloning, and sequencing of the isolate. I have produced the AMR genes and helped in the bioinformatics and phylogenetic analysis.

Authors Hala S1,2,3,4, Antony CP1,5, Momin A A6, Alshehri M2,3,4, Ben-Rached F.1, Al-Ahmadi G4, Zakri S2,3,4, Baadhaim M2,3,4, Alsaedi A2,3,4, Al Thaqafi OA2, Arold ST6, Al-Amri A2,3.4, Pain A1,7,8#

Affiliations 1 Pathogen Genomics Laboratory, Division of Biological and Environmental Sciences and Engineering (BESE), Thuwal, Makkah, Saudi Arabia 2 King Saud bin Abdulaziz University for Health Sciences, Jeddah, Makkah, Saudi Arabia 3 King Abdullah International Medical Research Centre, Jeddah, Makkah, Saudi Arabia 4 Ministry of National Guard Health Affairs, Jeddah Makkah, Saudi Arabia 5 Red Sea Research Centre, Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Saudi Arabia 6 Computational Bioscience Research Centre (CBRC), Division of Biological and Environmental Sciences and Engineering (BESE), King Abdullah University of Science And Technology (KAUST), Thuwal, Makkah, Saudi Arabia 7 Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, N20 W10 Kita-ku, Sapporo, Japan 8 Nuffield Division of Clinical Laboratory Sciences (NDCLS), University of Oxford, Oxford, OX3 9DU, UK

4.1 Abstract Background: Colistin represents the last-resort antibiotics used for treating carbapenem-resistant Enterobacteriaceae-related infections. Colistin resistance has been reported in Saudi Arabia in multiple- drug resistant (MDR) Gram-negative bacteria as higher percentage than expected. Objectives: To discover unique mcr variants in Gram-negative MDR bacteria in a tertiary hospital in Saudi Arabia during a genomic surveillance study on bacterial strains collected over four years. Patients and methods: We conducted a surveillance of 233 Klebsiella pneumoniae isolates during 2014-18 by whole genome sequencing. One of the isolates (NGKP-54) was cultured from the sputum of a female patient with pneumonia in her late-60s who was being treated for Non-Small-Cell Lung Carcinoma in 2015. Results: We produced the complete genome of the NGKP-54 strain using a combination of short and long reads WGS and performed in silico multilocus sequence typing (MLST). The NGKP-54 isolate was assigned a novel sequence type (ST3513) that is a single-locus variant of ST1425. The NGKP-54 isolate was resistant to aminoglycoside, cephalosporin, and fluoroquinolone, as determined by conventional microbiological protocols, and to colistin by broth microdilution with MIC of 4 μg/ml. Analysis of the complete genome revealed the presence of three colistin-resistance determining loci, including mcr-1, mcr- 8, and a partial mcr-8 gene fragment. Structural predictions and complementation experiments support that the MCR-1 and MCR-8 are likely to be functional. Conclusions: We have identified the earliest reported clinical isolate harboring mcr-8 in a colistin-resistant K. pneumoniae strain in Saudi Arabia. The strain also harbored a co-occurring mcr-1 in the same strain despite the patient not receiving colistin as a treatment. KEYWORDS Klebsiella pneumoniae, antimicrobial resistance, colistin, mcr-8, mcr-1, multidrug resistance

96

4.1 Introduction

Polymyxin E (colistin) is among the last-resort antibiotics used to treat carbapenem- resistant Enterobacteriaceae-related infections [1]. Colistin had been discontinued for human use since it was found to be associated with nephrotoxicity and neurotoxicity, but since the 1990s, clinicians were required to reconsider the clinical value of a modified version as a last resort antibiotic. Currently, single plasmid-mediated mobile colistin resistance (mcr) gene is recognized as a global threat, but multiple mcr genes combination in the same pathogen are rarely identified [2]. Membrane binding domains in MCR proteins are necessary for membrane charge modifications involved in the colistin resistance mechanism [3]. Wang et al. discussed the first mcr-8 containing isolate of Klebsiella pneumoniae with a high minimum inhibitory concentration (MIC) to colistin in a Chinese hospital in 2016, which was prior to the discovery of the mcr-8 gene in 2017 from a pig farm in China related to colistin usage in animals [4]. Typically, mcr-variants are investigated through molecular identification tools or next-generation sequencing (NGS), which is not usually implemented in clinical molecular biology laboratories. Such studies identified a close evolutionary relationship between mcr and phosphoethanolamine transferase (EptA) of Neisseria that is known to induce colistin resistance [5]. Thus, the discovery of multiple variants of mcr genes or the possibilities of identification of co- occurrence in the clinic has been possible through epidemiological surveillance of disease- causing pathogens. Therefore, continuous monitoring and identification of clinical cases that harbor unique mcr-variants is essential for understanding and monitoring colistin resistant bacteria. In this study, we have identified a colistin-resistant multidrug-resistant

(MDR) Klebsiella pneumoniae (Kp) clinical isolate (NGKP-54 ) carrying two variants of

97 mcr genes (mcr-1, and mcr-8) and a partial mcr-8 fragment. We assessed the level of resistance and the potential functional role of these mcr variants in inducing colistin resistance.

4.2 Materials and Methods

The Kp NGKP-54 was identified from an epidemiological survey using a large cohort of

MDR pathogens. The isolate was first cultured in 2015 from a sputum sample of a female patient in her late-60s who was treated for pneumonia and had gone through seven cycles of pemetrexed for Non-Small-Cell Lung Carcinoma (NSCLC) at King Abdulaziz

Medical City (KAMC), Jeddah, Saudi Arabia. The patient received Azithromycin and

Piperacillin/Tazobactam, for an extended period due to mixed reoccurring infections that included MDR Kp. The infection pattern of the patient has multiple strains with similar resistance patterns. Initially, the phenotypic resistance profile of the NGKP-54 strain was analyzed on a VITEK II system (bioMérieux Inc., Durham, NC) for multiple antimicrobial resistance (AMR) classes except for colistin. Colistin was measured by broth microdilution (BMD) Micronaut-S and Micronaut MIC-Strip Colistin kits (Merlin-

Diagnostika, Germany), as per the manufacturer’s instructions, and it was interpreted as per the EUCAST guidelines on breakpoints [6]. The control strains used in tandem during testing were Proteus vulgaris ATCC 49132 and Escherichia coli ATCC 25922 between these methods.

In order to produce a complete genome, we sequenced the NGKP-54 strain using a paired-end short-read shotgun sequencing using Illumina Hiseq4000 (Illumina, USA).

The genomic DNA from a single colony of the NGKP-54 strain was extracted for the preparation of a library using the NEBNext Ultra II kit for Illumina (New England

98

BioLabs, UK). NGKP-54 strain was also sequenced using long-read sequencing with high molecular-weight genomic DNA using the Pacbio RS II platform (Pacific

Biosciences, CA, USA). The genome was assembled with Canu v1.6 [7] using the default assembly settings, and the resulting assembly was error-corrected using Illumina reads using Pilon [8]. The polished canu-assembly of NGKP-54 was analyzed using Bandage

[9] to visualize the contigs and identify the core genome and plasmids (ENA accession number PRJEB36000). The core genome MLST (cgMLST) of NGKP-54 was assigned as a novel sequence type (ST3513) that is a single-locus variant of ST-1425 by using the Kp

BIGSdb database (http://bigsdb.pasteur.fr/Klebsiella/Klebsiella.html) [10]. All contig sequences that showed no connections to the chromosome (potentially extrachromosomal in nature) were probed for the presence of signature genes that are commonly found associated with bacterial plasmids, i.e. genes that encode for plasmid replication (Rep) or conjugative transfer (Tra) proteins. The coverages for the NGKP54 chromosome and plasmids were NGKP54 Chromosome: 34.67X, pNGHA-mcr-1: 106.40X, and pNGHA-

54: 64.79X.

The assembly was compared to all available mcr sequences available in NCBI (in total 67 sequences, accessed October 2019). All sequences were annotated and aligned with

MCR-1, MCR-8 and the partial MCR-8 fragment based on amino acid sequences. To analyze the phylogenetic aspects of the identified MCR-variants in NGKP-54, all available MCR in Resfinder alongside EptA (MG1655) (Supplementary Table.5) were aligned (Supplementary Figure.1). The three-dimensional structures for MRC-8 was modeled using SwissModel [11] based on the 2.8Å crystal structure of the lipooligosaccharide phosphoethanolamine transferase A EptA (PDBID: 5FGN).

99

Each mcr variants (mcr-1, mcr-8) coding region were amplified using the corresponding mcr variant targeted primers (Supplementary Table.3) and cloned into the pUC19 vector

(Thermo Fisher Scientific, USA). Three recombinant plasmids were created expressing mcr variant genes using Escherichia coli DH5α under the control of the isopropyl-β-d- thiogalactopyranoside-induced (IPTG) promoter, and MIC values for colistin were determined by manual and Micronaut MIC-Strip Colistin BMD. The amplified sequences were tested using polymerase chain reaction (PCR) amplification and Sanger sequencing method before cloning each of the complete mcr-1, mcr-8 into separate pUC-19 vectors

(Thermo Fisher Scientific, USA) as well as an empty pUC-19 vector into Escherichia coli

TOP10. The recombinant plasmids were expressed under the control of the IPTG promoter.

Plasmid extraction from each cloned isolate was made using midi-prep kit (Qiagen, USA) as per the manufacturer’s instructions and Sanger sequencing using the corresponding primers was carried out for confirmation with negative controls (Supplementary Table.2).

4.3 Results and Discussion

The ST-1425 is known to frequently accompany the production of many carbapenemases in the Middle East [12]. The general AMR resistance pattern was in an agreement between both methods (Supplementary Table.1&2). The MIC of colistin resistance of the

NGKP-54 with the combined mcr-1 and mcr-8 was 4 μg/ml and this was much lower when compared to the MIC of +mcr-8 clone 16 μg/ml (Supplementary Table.2). This may suggest an antagonistic function between mcr-variants co-occurring in the same organism. The negative effect of co-existing colistin resistance genes has been shown before between chromosomally encoded mgrb and mcr-1 in KP [13]. The sequence analysis identified one complete mcr-8 (1898 bp) coding region and another highly

100 similar partial and fragmented mcr-8 region (633 bp) on the same IncHI2 plasmid

(pNGHA54) (Figure.1a, 1b). The plasmid containing mcr-8 was compared to the original mcr-8 plasmid (pKP91) described previously (Figure.1c) [4]. Within the same isolate, a complete copy of mcr-1 (1826 bp) was identified on an IncI1 plasmid (pNGHA-mcr-1).

Protein structure, MRC-8 shares a sequence identity of 36.67% to EptA (modeling score

GMQE for MCR-8 was 0.71; Figure.2). Phosphatidylethanolamine (PE) and n-dodecyl β-

D-maltoside (DDM) were reported to similarly bind to EptA, MCR-1, and MCR-2 with

PE binding to EptA with an affinity of -9.9 kcal/mol [14]. Our docking studies suggested that PE can bind to MCR-8 and EptA with affinities of -12.8 kcal/mol and -12.6 kcal/mol, whereas DDM can bind to MCR-8 and EptA with the affinities of -11.4 kcal/mol and -

10.7 kcal/mol respectively. The PE-recognizing cavity of EptA, MCR-1, and MCR-2 is divided into two distinct subparts, (i) a 5-residue Zn-centered motif (E244, T283, H388,

D463, and H464) and (ii) a 7-residue PE-interacting domain (N106, T110, E114, S328,

K331, H393, and H476) [14] (Figure.2 G, H). The PE binding pocket for MCR-8 showed that the binding pocket for PE and Zn2+ is highly conserved in MCR-8 (Figure.2 H, K).

The phylogenetic relations between EptA with MCR-8 and MCR-5 revealed a closer evolutionary relationship than other MCR protein variants (Supplementary Figure.1).

The difference in the MIC measurements in the isolate with the combinations of mcr-1 and mcr-8 could play a role in the level of lipopolysaccharide (LPS)-modification to maintain the fitness of the organism during treatment [15]. During the treatment process of NSCLC, the patient was not treated with colistin during her lengthy infection of MDR NGKP-54, which is worrisome. Recent studies on the effect of pemetrexed treatment on the gut of cancer patient illustrated an increase in the abundance of Enterobacteriaceae, which could

101 enhance the exchange of resistance genes and improve the selection of such novel resistance patterns [16]. MDR pathogens can acquire resistance genes through mobile genetic elements or selective pressure [5]. This has been previously demonstrated in

Raoultella ornithinolytica that contained a partial mcr-8.4 with an 83% query coverage

(Supplementary Figure.1) [17]. The microbiome of the patient could contain commensal bacteria with intrinsic colistin-resistance, which have been suggested to be reservoirs for mcr-like genes [18]. The toxic stress of cancer treatment may affect pathogens infecting cancer patients [15]. To our knowledge, this is the first and earliest clinical report of mcr-

8 in the Middle East and North Africa (MENA) and the first global co-occurrence of such a combination of mcr variants in a single isolate.

102

Acknowledgments We wish to thank the Infectious Disease department and the Pathology Laboratory at King Khaled hospital in the Ministry of the National Guards – Health Affairs Western region for their support of our research. We also wish to acknowledge the Bioscience core laboratory at KAUST for their help with the sequencing operations. We thank Malak Haidar from KAUST Microbial Genomics Laboratory for the technical help. We also acknowledge help from the team of curators at the Institut Pasteur MLST system (Paris, France) for importing novel alleles, profiles, and/or isolates at http://bigsdb.pasteur.fr.

Funding The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) through the baseline fund BAS/1/1020-01-01 to AP and BAS/1/1056-01-01 to STA, and the Award No. URF/1/1976-25 from the Office of Sponsored Research (OSR).

Availability of Data and Materials The raw reads and/or assembly files (chromosome plus plasmids) from this study are publicly available at the European Nucleotide Archive (ENA) under the study accession number PRJEB36000 http://www.ebi.ac.uk/ena/data/view/PRJEB36000 and at the Zenodo open-access repository- https://doi.org/10.5281/zenodo.3752068

Ethics Approval and Consent to Participate The Institute Review Board (IRB) committee has approved the project at the MNGHA with the number (RJ17-023-J), and Institutional Biosafety and Bioethics Committee (IBEC) at KAUST has approved the project with the number (17IBEC38_Pain).

103

FIGURE 4.1: Schematic presentation of the two circular plasmids in NGKP-54 and Comparison between the first published plasmid carrying mcr-8 (pKP91) and NGKP- 54 mcr-8 plasmid (pNGHA54) (a) Schematic illustrating the two plasmids carrying mcr- 8, the plasmid carrying the first discovered mcr-8 (pKP91) (MG736312.1) [19] and the plasmid identified in our strain carrying a complete and a fragmented partial mcr-8. The comparison is showing complete and partial similarities in relation to the original mcr-8 gene. These figures were visualized using Circos Perl-native [20], (b) pNGHA-mcr-1 plasmid containing mcr-1 colistin resistance gene among other resistance-carrying marker genes, and (c) pNGHA-54 plasmid containing a complete mcr-8 gene, a fragmented partial sequence of mcr-8 and an eptA gene. The red arrows point to antibiotic resistance marker genes, and blue arrows are plasmid coding regions.

104

A DDM D PE G PE- interacting residues J Zn2+ - interacting residues

S325 H383 K328 H378 D452

A H465 t E114 T280

p

E N106 E240 H453 T110

B E H K H388

H471

8 H383 E115 D458

R

C K331 T282 M T111 N107 E243

FIGURE 4.2: Structural comparison of MCR-8, partial MCR-8, and lipooligosaccharide phosphoethanolamine transferase EptA. SwissModel was used to predict 3D models of MCR-8 and partial MCR-8 using PDB 5FGN as a template. The ligand structure for PE and DDM were downloaded from the ChemSpider database. The molecular docking studies were performed using SwissDock [21]. Structural analyses were carried out on PyMol [22] Left: Structural models show conserved binding sites between EptA domains: transmembrane and soluble periplasmic domains in comparison to MCR-8 and partial MCR-8. 3D model of full-length MCR-8 (blue) superimposed on the 3D model of the fragment (green). Right: PE binding cavity of A: EptA, and B: MCR-8 showing DDM bound in the cavity. PE binding cavity of D: EptA and E: MCR-8 showing PE bound in the cavity. The ligands are presented as cyan sticks. Important PE binding residues are displayed as sticks for G: EptA, and H: MCR8 . Important Zn2+ binding residues shown as sticks for J: EptA, and K: MCR-8. Zn2+ is shown as an orange sphere.

105

BIBLIOGRAPHY 1. Collignon, P.C., et al., World Health Organization ranking of antimicrobials according to their importance in human medicine: a critical step for developing risk management strategies to control antimicrobial resistance from food animal production. Clinical Infectious Diseases, 2016. 63(8): p. 1087-1093. 2. Li, R., et al., Identification of a novel hybrid plasmid coproducing MCR-1 and MCR-3 variant from an Escherichia coli strain. Journal of Antimicrobial Chemotherapy, 2019. 74(6): p. 1517- 1520. 3. Xu, Y., et al., An evolutionarily conserved mechanism for intrinsic and transferable polymyxin resistance. Mbio, 2018. 9(2): p. e02317-17. 4. Wang, X., et al., Emergence of a novel mobile colistin resistance gene, mcr-8, in NDM-producing Klebsiella pneumoniae. Emerging microbes & infections, 2018. 7(1): p. 122. 5. Sun, J., et al., Towards understanding MCR-like colistin resistance. Trends in microbiology, 2018. 26(9): p. 794-808. 6. Testing, E.C.o.A.S., Antimicrobial susceptibility testing of colistin—problems detected with several commercially available products. EUCAST warnings concerning antimicrobial susceptibility testing products or procedures. Växjö: EUCAST, 2017. 7. Koren, S., et al., Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research, 2017: p. gr. 215087.116. 8. Walker, B.J., et al., Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS one, 2014. 9(11): p. e112963. 9. Wick, R.R., et al., Bandage: interactive visualization of de novo genome assemblies. Bioinformatics, 2015. 31(20): p. 3350-3352. 10. Ruppitsch, W., et al., Defining and evaluating a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Listeria monocytogenes. Journal of clinical microbiology, 2015. 53(9): p. 2869-2876. 11. Waterhouse, A., et al., SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic acids research, 2018. 46(W1): p. W296-W303. 12. Ahn, C., et al., OXA-48-producing Enterobacteriaceae causing bacteremia, United Arab Emirates. International Journal of Infectious Diseases, 2015. 30: p. 36-37. 13. Zhang, H., et al., MCR-1 gene has no effect on colistin resistance when it coexists with inactivated mgrB gene in Klebsiella pneumoniae. Microbial Drug Resistance, 2018. 24(8): p. 1117-1120. 14. Sun, J., et al., Deciphering MCR-2 colistin resistance. MBio, 2017. 8(3): p. e00625-17. 15. Hwang, K.-E., et al., Inhibition of autophagy potentiates pemetrexed and simvastatin-induced apoptotic cell death in malignant mesothelioma and non-small cell lung cancer cells. Oncotarget, 2015. 6(30): p. 29482. 16. Pensec, C., et al., Impact of chemotherapy on the intestinal microbiome and epithelial barrier in PDX models of lung cancer. 2019, AACR. 17. Wang, X., et al., Emergence of Colistin Resistance Gene mcr-8 and Its Variant in Raoultella ornithinolytica. Front Microbiol, 2019. 10: p. 228. 18. Kieffer, N., P. Nordmann, and L. Poirel, Moraxella species as potential sources of MCR-like polymyxin resistance determinants. Antimicrobial agents and chemotherapy, 2017. 61(6): p. e00129-17.

106

19. Wang, X., et al., Emergence of colistin resistance gene mcr-8 and its variant in Raoultella ornithinolytica. Frontiers in microbiology, 2019. 10: p. 228. 20. Krzywinski, M., et al., Circos: an information aesthetic for comparative genomics. Genome research, 2009. 19(9): p. 1639-1645. 21. Grosdidier, A., V. Zoete, and O. Michielin, SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic acids research, 2011. 39(suppl_2): p. W270-W277. 22. Yuan, S., et al., PyMOL and Inkscape bridge the data and the data visualization. Structure, 2016. 24(12): p. 2041-2042.

107

Chapter 5: Crohn’s Disease Patient Infected with Multiple Co- occurring Nontuberculous Mycobacteria

In the following project, I have written the ethical and consent to collected the biopsy sample, discussed and collect the clinical data and outcomes with the medical team, as well as wrote the majority of the manuscript with the exception of the bioinformatics methodologies. I have carried out all the molecular and sequencing protocols.

Sharif Hala1,2,3,4*, Chakkiath Paul Antony 1,5*, Qingtian Guan 1, Mohammed Alshehri 2,3,4, Asim Alsaedi 2,3,4, Alaa Alsharief 2,3,4, Abdulfattah Al-Amri 2,3,4, Arnab Pain 1, 6,7# 1Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal-Jeddah, Saudi Arabia 2King Saud bin Abdulaziz University for Health Sciences, Jeddah, Saudi Arabia 3King Abdullah International Medical Research Centre, Jeddah, Saudi Arabia 4Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia 5Red Sea Research Centre, Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Saudi Arabia 6Center for Zoonosis Control, Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, N20 W10 Kita-ku, Sapporo, Japan. 7Nuffield Division of Clinical Laboratory Sciences (NDCLS), The John Radcliffe Hospital, University of Oxford, Headington, Oxford, OX3 9DU, United Kingdom

*These authors contributed equally to this article  For correspondence

Conflict of Interest The authors have no conflicts to declare. Author Contribution: SH and CPA contributed equally to the manuscript; SH, AbA, and AsA collected and interpreted the clinical data; SH and MA performed the biochemical experiments; SH performed the laboratory characterization experiments; AlA analysed the medical imaging data; CPA and QG performed the bioinformatics analysis; SH, AbA, AsA and AP conceived the study; and SH, CPA, AlA, QG and AP prepared the figures and wrote or revised the manuscript. Funding This study was supported by baseline funding (BAS/1/1020-01-01) from KAUST to AP. Ethics Approval and Consent to Participate The project was approved by the Institutional Biosafety and Bioethics Committee (IBEC) KAUST (17IBEC38_Pain) and informed patient consent was obtained.

DOI: https://doi.org/10.1093/ibd/izaa100

108

5.1 Case

Inflammatory bowel disease (IBD), defined as chronic inflammation of the gastrointestinal tract, is a condition that encompasses Crohn’s disease (CD) and ulcerative colitis [1]. The diagnostic procedure for IBD always involves a challenging infection assessment1. Identification of the microbes present and their pathogenic potential are essential for choosing a treatment plan. Gastrointestinal infections caused by nontuberculous mycobacteria (gNTM) are challenging to diagnose based on only disease symptoms. Diagnosis is especially difficult in the case of IBD patients suspected of having CD. Diagnostic studies that investigated symptom similarities have suggested the potential need for investigating mycobacteria-related infections in IBD cases [2, 3]. Both CD and mycobacterial infections have been associated with the development of granuloma and branching fistula. However, conventional methods are likely insufficient for the identification of NTM associated with CD [4]. Furthermore, the diagnosis and treatment of mycobacterial infection in an IBD patient could depend on the type of mycobacteria and the presence of certain chromosomally encoded virulence factors, such as the toxin- antitoxin (TA) elements parD, hipB, mazEF, and VapB [5]. TA loci have been implicated in mediating bacterial adaptive responses resulting in the development of latent infections and virulence [5]. NTM infections are exacerbated by their intrinsic drug resistance, their TA mechanisms and the lack of adequate or timely identification [5].

In 2018, a 42-year-old Saudi male (resident of Western Province, Saudi Arabia) presented to the emergency room at King Abdulaziz Medical City (KAMC) Hospital in Jeddah with hypoglycaemia, hypovolemia, and a decreased level of consciousness. He was admitted for three months and had a history of vomiting, diarrhoea, chest tightness, and abdominal pain for two weeks along with joint pain, decreased oral intake, weight loss, subjective fever, and night sweats for 3 months. In addition to computed tomography (CT) of the abdomen and pelvis, magnetic resonance imaging (MRI) of the perianal area was performed. The patient was tested for mycobacteria using an interferon-gamma release assay (IGRA) (QuantiFERON) and acid-fast bacillus (AFB) smear and culture. Eventually, the patient was subjected to an ileo-colonoscopy procedure based on a suspicion of IBD. The biopsy from the terminal ileum was subjected to shotgun metagenomic sequencing on a HiSeq

109

4000 platform (Illumina, USA) followed by metagenomic assembly, binning and analyses (see Methods in Supplementary Data). MRI of the perianal area showed complex branching perianal fistulae with fluid collections (Supplementary Figure S1). There was mild inflammation with cobblestoning at the terminal ileum, suggesting CD. Histopathological examination of the biopsy revealed acute ileitis but was negative for microorganisms and granuloma. A repeat of the AFB smear along with culturing from the extracted biopsies presented negative results. The metagenomic sequencing results revealed a unique co-occurrence of three different NTM species: M. smegmatis (32.8x average sequencing coverage), M. riyadhense (29.7x), and M. kansasii (12.4x) (Figure 5.1 and Supplementary Table S1). Antibiotic resistance genes were detected in the M. smegmatis metagenome-assembled genome (MAG) (Bin 1), while no resistance genes were found in the other two MAGs (Figure 5.1 and Supplementary Table S1). The distribution of the TA profiles of the NTMs indicated that they were highly similar to human-infecting NTMs that were sequenced previously (Supplementary Figure S2). The comparison of the TA systems of clinically relevant infective strains and that of our isolates illustrated similarities in the presence of virulence factors, thereby suggesting pathogenicity.

The detection of three co-occurring and actively replicating pathogenic mycobacteria within an immunocompetent CD patient is particularly unique and requires further investigation. Metagenomic analysis allowed us to identify the first case of M. riyadhense colonizing the gastrointestinal tract. We recommend the shotgun metagenomic sequencing approach as a valuable diagnostic tool for the accurate identification of any infectious agent(s) in suspected IBD patients. The raw reads and MAGs from this study are publicly available at the European Nucleotide Archive (ENA) under study accession number PRJEB30883.

110

FIGURE 5.1 Metagenome binning plot (total number of read pairs in shotgun library=27.3 Mio) and characteristics of bacterial metagenome-assembled genomes (MAGs). Each dot represents a contig, and all contigs are coloured according to their taxonomic identity.

111

BIBLIOGRAPHY 1. Katz, J.A., G. Melmed, and B.E. Sands, The FACTS ABOUT Inflammatory Bowel Diseases. Crohn’s & Colitis Foundation of America, New York, 2011. 2. Graham, D.Y., D.C. Markesich, and H.H. Yoshimura, Mycobacteria and inflammatory bowel disease: results of culture. Gastroenterology, 1987. 92(2): p. 436-442. 3. Jayanthi, V., et al., Does Crohn's disease need differentiation from tuberculosis? Journal of gastroenterology and hepatology, 1996. 11(2): p. 183-186. 4. Almadi, M.A., et al., New insights into gastrointestinal and hepatic granulomatous disorders. Nature Reviews Gastroenterology & Hepatology, 2011. 8(8): p. 455. 5. Gerdes, K., S.K. Christensen, and A. Løbner-Olesen, Prokaryotic toxin–antitoxin stress response loci. Nature Reviews Microbiology, 2005. 3(5): p. 371-382.

112

Chapter 6: Discussion & Conclusion

In this chapter, I will be including a collective summary of the work presented in this thesis.

The incidences of MDR infections are on the rise in hospitals worldwide, and the existing methods of pathogen identification based on traditional culture-based microbiological protocols are suboptimal and often time-consuming. Slow reporting and diagnosis of an infectious agent could warrant the overuse of broad-spectrum antibiotics leading to poor patient outcomes and prolonged hospital stays. Long-term hospitalization has been reported to have an increased carriage of MDR Enterobacteriaceae in upwards of 50% [1], which is considerably higher than the general population [2]. Over the last two decades, the progress towards high-throughput microbial identification technologies, such as WGS, leading to the generation of microbial genomes databases are now considered the new gold standard. Recent developments in WGS and metagenomic diagnostics have transformed the way in which the identification of pathogen’s AMR genes, virulence factors, sequence types, and transmission patterns are determined. However, WGS and metagenomic diagnostics have not yet been commonly adopted in clinical microbiology in many countries, including Saudi Arabia. WGS and metagenomics can be utilized for the adoption of prophylactic and early pathogen detection interventions, which will aid in reducing and even preventing potential outbreaks.

Therefore, it is essential to produce impactful data that can be directly associated with current infectious agents’ issues, their transmission patterns, and updating clinicians on the type of AMR genes driving successful infective lineages. These studies herein have notably contributed to many aspects of clinical microbiology in providing: 1) a greater

113 understanding of K. pneumoniae clonal transmission pattern and its position of recurrent infections in ICU and LT care patients in Saudi Arabia, 2) a protocol for tracing CREs dissemination locally and link it with the global picture, 3) an illustrative reporting of a novel co-occurrence of mobile genetic elements causing colistin resistance as it is regarded as the last resort antibiotic, and 4) an example of metagenomic applications on hard to solve cases associated with infective agents.

6.1 Infectious Disease Surveillance and WGS

The WHO have highlighted the ESKAPE pathogens as a high priority that requires continuous surveillance and immediate control strategies [3]. Beyond pathogen identification and a collection of resistance phenotype, the clinical microbiology laboratory provides limited information. Nevertheless, ESKAPE pathogens carry a highly diverse

AMR and virulence gene repertoires that could cause invasive nosocomial infections.

Therefore, we collected a large number of isolates from the MNGHA, Jeddah branch between November 2014 to March 2018 of ESKAPE pathogens to apply WGS to study their clonal lineages, AMR, and virulence markers. Since the preliminary epidemiological reports of the highest infective agents were K. pneumoniae, I opted to focus on identifying the factors causing its successful persistence and transmission. I chose to analyse K. pneumoniae isolates in-depth and link the clinical, and the epidemiological data to provide a road map for future work on the ESKAPE pathogens from multiple hospitals.

114

6.1.1 Clonal Lineages of Hypervirulent and MDR K. pneumoniae MDR K. pneumoniae is part of the highest reported nosocomial-causing ESKAPE pathogens in Saudi Arabia, which results in high mortality [4-8]. There has been great number of reports globally, describing several high-risk clones and ST of K. pneumoniae

[9]. Among these STs are ST258 [10, 11], ST14 [12], as well as other types with less prevalent STs, including ST101, ST48, and ST15 [13]. Our local surveillance of the 235

K. pneumoniae has identified 99 of the isolates belonging to the same clone as ST2096

(Chapter 2). These 99 isolates contained hypervirulence factors, which allowed them to survive on surfaces and to colonize hosts for an extended period. The ST2096 isolates were also MDR and XDR, carrying various AMR genes against last-resort antibiotics such as carbapenems and colistin. We noticed a strong association between the presence of ST2096 and the high mortality rate (59%) in cases associated with K. pneumoniae combined with continuous transmission of nosocomial infections, which prompted outbreak signs.

Additionally, the first detected isolates were in the middle of our surveillance collection period in late 2016 and continued until the end of our collection in early 2018, suggesting this was a vastly communicable clone of K. pneumoniae. A similar outbreak pattern of hypervirulent and lethal K. pneumoniae ST11 was presented recently in China [14]. This outbreak was assessed using WGS from 21 carbapenem-resistant ST11 K pneumoniae strains from five patients. The 99 isolates ST2096 had a similar profile of hypervirulence and carried a large number of AMR genes to the ST11 outbreak, but the number of continuously identified ST2096 isolates was much higher (99 isolates). Thus, it was a necessity to investigate further the transmission, phylodynamic, phylogenetic, and

AMR profiles of our ST2096 and informed the hospital of this recurrent outbreak.

115

6.1.2 Studying the Transmission Patterns of ST2096 The pattern of transmission of ST2096 revealed multiple mini incidences of outbreak within a continuous transmission over 1.3 years (Chapter 2). I utilized the SCOTTI [15] tool to reconstruct the transmission map of ST2096, which benefits from a large number of repeated sampling. The same tool was used in investigating the neonatal unit K. pneumoniae outbreak in Nepal [16]. This tool combines the timeline of infection and length of patients’ stay with the WGS data of ST2096 isolates. The phylodynamic transmission maps of direct spread were able to guide us into the index case responsible for the outbreak.

The phylogenetic data from the indirect transmission analysis provided further information on the possibility of long-term colonization of the organism as well as reinfection instances.

Moreover, since the surveillance isolates were collected over 3.4 years, the WGS analysis provided a comprehensive list of the expected AMR and virulence factors in our hospital.

The collection had a noticeable recurrent increase in the status of the isolates from MDR to XDR and a high exchange of known resistance mobile elements. WGS was used as a tool that enabled the hospital to discover a silent outbreak of K. pneumoniae successfully through exploring the genomic landscape of pathogens. In turn, the genomic data filled the knowledge gap which, when linked with the epidemiological and the clinical data, improved the IPC policies and procedures followed in the hospital. Previous studies reporting outbreaks to improve IPC have had a positive effect on patient outcomes and lowered nosocomial infections [17-19].

6.2 Unique Co-occurrence of mcr Genes in K. pneumoniae

116

The initial analysis of the surveillance samples from the MNGHA AMR profiles revealed that 34% of the 235 isolates were phenotypically resistant to colistin antibiotics. Since colistin is considered as the last-resort antibiotic for the treatment of MDR K. pneumoniae, further investigation of these colistin resistance isolates was a priority. Especially that colistin MIC could not be measured using the conventional methods, and the MIC could only be determined through BMD according to EUCAST and CLSI warnings. Herein, I was able to apply both BMD phenotypic and WGS genotypic analyses on the 235 surveillance isolates. The colistin study verified the ability to complement the phenotype and MIC with a genetic marker for colistin and revealed mobile colistin markers in the collection (Chapter 3). The WGS analysis supported the hypothesis that mcr gene variants are abundant [20], however, it also revealed yet other novel variants of mcr in Saudi Arabia

[21].

Nonetheless, through deep sequencing by combining short-read and long-read sequencing assemblies I was able to detect a unique co-occurrence of complete mcr-1 and mcr-8 as well as a partial fragment of mcr-8 in one of the isolates from 2015 (Chapter 2). Both mcr-

8 genes were presented on the same plasmid. As a result of the combined assemblies, I was able to generate completed, closed, and circular plasmid assemblies from the isolate, which permitted accurate analysis of the variable mcr regions encoded by the plasmid.

Importantly, this unique multiple co-occurrence has never been reported in the world. The colistin resistance K. pneumoniae that hosted these mcr variants was isolated from the sputum of an NSCLC patient with pneumonia infection among the 2015 collection, which makes it the earliest clinical sample discovered carrying of mcr-8 [22].

117

Moreover, exposing mcr genes presence alerted the clinicians to the possibility of transmission in other pathogens in our hospital, which may cause significant issues in treating nosocomial infections. The presence of other mcr variants was observed in other isolates of our surveillance collection, which could be an indicator of a mobile colistin resistance element being exchanged within the hospital, and that is alarming. Furthermore,

WGS of the MDR K. pneumoniae was necessary for establishing colistin profile, even in patients who were not treated with it (Chapter 2). WGS data and analysis were able to detect fragment of mcr-8 gene and allowed us to show that such mcr genes can drive resistance even at partial presence. I believe the results obtained in this study (Chapter 2) will benefit IPC in updating the current identification protocols to include various types of colistin AMR genes at our hospital. These findings could not have been possible without the utilization of WGS, which enabled the detection of rare and unique AMR mobile variations.

6.3 Emergence of Carbapenem-Hydrolyzing β-lactamase KPC

Since Saudi Arabia is considered the religious epicenter with a continuous mass influx of pilgrims from across the world, identification, and reporting of AMR factors is essential for public health and safety measures. Especially AMR genes resistant to beta-lactam antibiotics, which represent the majority of available treatments of Gram-negative bacterial infections. Additionally, false reporting of the infection agent is known to occur by routine, conventional phenotypic testing, especially in closely related species such as K. pneumoniae and K. quasipneumoniae (Chapter 3). The KPC-type enzymes encoded by blaKPC-type genes are difficult to detect by routine screening and have been known for their

118 high transmission potential in various CREs due to transposons Tn4401 family [43].

Further investigation of our surveillance strains from MNGHA using WGS with the focus on carbapenemases led to identifying the first blaKPC-2 harboring plasmid in a K. quasipneumoniae strain in Saudi Arabia and the GCC region (Chapter 3). Mobile genetic elements, in particular plasmids, are characteristically highly challenging to identify and assemble correctly as a result of repeat regions. However, I was able to surpass this hurdle by combining short and long reads assemblies, which produced higher-quality assemblies of the chromosomal and plasmid genomes. My cgMLST study of NGKPC-421 isolate, which carried the blaKPC-2, identified it as a novel K. quasipneumoniae subsp. similipneumoniae strain that was assigned ST-3510. The blaKPC-

2 was discovered flanked by transposons in an IncX6 plasmid (pKPC-421), which can circulate among other MDR Enterobacteriaceae species rapidly. Detection of a misidentified MDR K. quasipneumoniae isolate was troublesome. It suggested the need for an urgent update of the surveillance methods by applying high-throughput WGS for accurate pathogen identification to avoid the emergence and spread of MDR strains in the region..

6.4 Clinical Metagenomics may Enable Broader Pathogen Detection

In addition to surveillance of MDR Gram-negative isolates and AMR phenotypes and genotypes, I chose to explore the possibility of implementing metagenomics protocol

(Chapter 5) in our local hospitals targeting hard-to-diagnose cases such as IBD cases of

Crohn’s disease (Chapter 5). Metagenomic analysis directly from the patient’s sample has been assisting in generating the necessary information for detecting microbes associated

119 with the disease. It was discussed in detail by Chiu lab [23-25]. I observed in MNGHA-

Jeddah hospital limitations in clinical microbiology methods in producing false-negative

(Chapter 2, 3, and 4). Therefore, applying metagenomics directly to patient samples could be used as supportive evidence towards an accurate diagnosis and identification.

6.4.1 Mycobacteria Associated with Crohn’s Disease By using the metagenomic approach of directly sequencing patient samples, I was able to investigate a latent infection caused by three co-occurring NTM species, M. smegmatis, M. riyadhense, and M. kansasii, in a Crohn’s disease patient that were not identified by conventional diagnostics methods (Chapter 5). Typically, gastrointestinal NTM infections are hard to diagnose and differentiate from IBD, especially Crohn’s, because of disease symptom similarities. Differentiation based on macroscopic and microscopic investigations can be challenging, which directly affect the treatment plan. In 2018, a 42- year-old Saudi male presented to the emergency room in our local hospital was suspected of having Crohn’s disease. However, the patient test results did not confirm nor deny an association of NTM to his symptom. This uncertainty prevented his treating physicians from providing the best course for his treatment and required further investigations.

Additionally, this approach assisted to further investigate the TA and AMR patterns amongst the co-occurring mycobacteria. Further phylogenetic analysis identified the M. kansasii genome in the patient to be associated with a strain isolated from tap water [26]. I observed this case as an opportunity to assist the medical team by applying metagenomic analysis on a biopsy sample of the patient and discovered the co-occurrence of the three

NTM species. Interestingly, the metagenomic analysis allowed us to identify the first case of M. riyadhense infecting the gastrointestinal tract. This case shows the importance of

120 employing metagenomic methodology for the identification of suspected or unknown infections, which provided strong evidence of its validity as a valuable diagnostic tool in our hospitals.

6.5 Future Experiment and Lessons for Pathogen Identification

This work has revealed the need to implement WGS and metagenomics in clinical settings and illustrated the importance of the need for genomic surveillance of the ESKAPE pathogens. It was shown that such surveillance combined with epidemiological data could reveal silent and lethal outbreak strains of K. pneumoniae in a tertiary hospital. Although literature did not mention strain ST2096 as hypervirulent high-risk, our data provided evidence that it has been causing fatal nosocomial infections for more than 1.3 years. This report has been used to guide IPC, in need to include metagenomic surveillance and the necessity to identify the MLST of the highest reported bacterial pathogens.

Even though the identification of AMR genotype is not yet sufficient enough to guide treatment decisions without the use of conventional phenotypic analysis, in cases of colistin resistance and unique beta-lactamases, the strong evidence of WGS-based technology indicates that it can be an effective tool for an early detection reporting of resistance. WGS could be used as a powerful personalized medicine tool to amend antimicrobial treatment for the individual patient. For example, it has been challenging to determine an accurate colistin MIC in hospitals, which dictates its usage as a last-resort antibiotic. WGS and metagenomics could be applied in such cases to study such XDR pathogens and identify any possible AMR factors and hence provide the correct treatment and improve patient safety. While this study elucidated a significant fatal outbreak of ST2096 and previously

121 undetected AMR genes to carbapenems and colistin, it was focused on K. pneumoniae and did not include examples of other ESKAPE pathogens. The future experiment will include multiple hospitals in Saudi Arabia and multiple organisms of the ESKAPE pathogens.

These types of studies would allow us to uncover the transmission pattern of AMR genes and monitor the dominant high-risk clones.

6.6 Future Perspectives:

The outcome of the current study was focused only on K. pneumoniae as it was reported to be among the highest pathogen-causing infections in the MNGHA-western region.

Throughout this project, I have obtained more than 1600 isolates from various ESKAPE pathogens that I will be focusing on applying a similar protocol and analysis on each type associating it with the clinical data. I aim to aid with metagenomic protocol in diagnosing unknown infections in the near future, especially with unknown fever, as it is the highest reported undiagnosed and infection-related symptom. I am also introducing third- generation sequencing to our hospital to be used in urgent cases and for confirmation, as illustrated in chapter 3. The projects' outcomes in chapters 2 and 5 have been used to educate the infectious disease doctors in MNGHA and are looking to apply the technology to their patients.

The aim is to produce enough data to convince the government and hospital staff to invest in WGS as a diagnostic tool. However, with the current Coronavirus disease (COVID-19) pandemic, the Saudi Ministry of Health, along with multiple laboratories across tertiary hospitals in Saudi Arabia, have turned their focus into molecular microbiology and sequencing. Therefore, I am optimistic that repeating our study on multiple bacterial pathogens and aiding in future unknown infections will be more plausible. I have started

122 collecting various types of samples from COVID-19 patients to apply the protocols mentioned in this thesis. Therefore, identifying viral genomic sequences and analyzing novel mutations for a global phylogenetic analysis is essential to aid in understanding the spread of infections and emergence of mutations. This will provide further valuable information such as the transmission pattern. The pattern of the COVID-19 tranmission

WGS I am hoping that my future WGS projects will aid in linking Saudi Arabia's data with the global AMR networks.

123

BIBLIOGRAPHY 1. Ludden, C., et al., Colonisation with ESBL-producing and carbapenemase-producing Enterobacteriaceae, vancomycin-resistant enterococci, and meticillin-resistant Staphylococcus aureus in a long-term care facility over one year. BMC infectious diseases, 2015. 15(1): p. 168. 2. Karanika, S., et al., Fecal colonization with extended-spectrum beta-lactamase–producing Enterobacteriaceae and risk factors among healthy individuals: a systematic review and metaanalysis. Reviews of Infectious Diseases, 2016. 63(3): p. 310-318. 3. Shrivastava, S.R., P.S. Shrivastava, and J. Ramasamy, World health organization releases global priority list of antibiotic-resistant bacteria to guide research, discovery, and development of new antibiotics. Journal of Medical Society, 2018. 32(1): p. 76. 4. Dat, V.Q., et al., Bacterial bloodstream infections in a tertiary infectious diseases hospital in Northern Vietnam: aetiology, drug resistance, and treatment outcome. BMC infectious diseases, 2017. 17(1): p. 493. 5. Balkhy, H.H., et al., The strategic plan for combating antimicrobial resistance in Gulf Cooperation Council States. 2016, Elsevier. 6. uz Zaman, T., et al., Clonal diversity and genetic profiling of antibiotic resistance among multidrug/carbapenem-resistant Klebsiella pneumoniae isolates from a tertiary care hospital in Saudi Arabia. BMC infectious diseases, 2018. 18(1): p. 205. 7. Zaman, T.U., et al., Insertion element mediated mgrB disruption and presence of ISKpn28 in colistin-resistant Klebsiella pneumoniae isolates from Saudi Arabia. Infection and drug resistance, 2018. 11: p. 1183. 8. Balkhy, H.H., et al., Ten-year resistance trends in pathogens causing healthcare-associated infections; reflection of infection control interventions at a multi-hospital healthcare system in Saudi Arabia, 2007–2016. Antimicrobial Resistance & Infection Control, 2020. 9(1): p. 21. 9. Woodford, N., J.F. Turton, and D.M. Livermore, Multiresistant Gram-negative bacteria: the role of high-risk clones in the dissemination of antibiotic resistance. FEMS microbiology reviews, 2011. 35(5): p. 736-755. 10. Marsh, J.W., et al., Evolution of Outbreak-Causing Carbapenem-Resistant Klebsiella pneumoniae ST258 at a Tertiary Care Hospital over 8 Years. mBio, 2019. 10(5): p. e01945-19. 11. Soria-Segarra, C., et al., Tracking KPC-3-producing ST-258 Klebsiella pneumoniae outbreak in a third-level hospital in Granada (Andalusia, Spain) by risk factors and molecular characteristics. Molecular Biology Reports, 2020. 47(2): p. 1089-1097. 12. Arena, F., et al., Large oligoclonal outbreak due to Klebsiella pneumoniae ST14 and ST26 producing the FOX-7 AmpC β-lactamase in a neonatal intensive care unit. Journal of clinical microbiology, 2013. 51(12): p. 4067-4072. 13. Marcade, G., et al., The emergence of multidrug-resistant Klebsiella pneumoniae of international clones ST13, ST16, ST35, ST48 and ST101 in a teaching hospital in the Paris region. Epidemiology & Infection, 2013. 141(8): p. 1705-1712. 14. Gu, D., et al., A fatal outbreak of ST11 carbapenem-resistant hypervirulent Klebsiella pneumoniae in a Chinese hospital: a molecular epidemiological study. The Lancet infectious diseases, 2018. 18(1): p. 37-46. 15. De Maio, N., C.-H. Wu, and D.J. Wilson, SCOTTI: efficient reconstruction of transmission within outbreaks with the structured coalescent. PLoS computational biology, 2016. 12(9): p. e1005130. 16. Stoesser, N., et al., Genome sequencing of an extended series of NDM-producing Klebsiella pneumoniae isolates from neonatal infections in a Nepali hospital characterizes the extent of community-versus hospital-associated transmission in an endemic setting. Antimicrobial agents and chemotherapy, 2014. 58(12): p. 7347-7357. 17. Decraene, V., et al., A Large, Refractory Nosocomial Outbreak of Klebsiella pneumoniae Carbapenemase-Producing Escherichia coli Demonstrates Carbapenemase Gene Outbreaks Involving Sink Sites Require Novel Approaches to Infection Control. Antimicrobial Agents and Chemotherapy, 2018. 62(12). 18. Mutters, N.T., et al., Improvement of infection control management by routine molecular evaluation of pathogen clusters. Diagnostic microbiology and infectious disease, 2017. 88(1): p. 82-87.

124

19. Li, M., et al., Infection-prevention and control interventions to reduce colonisation and infection of intensive care unit-acquired carbapenem-resistant Klebsiella pneumoniae: a 4-year quasi- experimental before-and-after study. Antimicrobial Resistance & Infection Control, 2019. 8(1): p. 8. 20. Wise, M.G., et al., Prevalence of mcr-type genes among colistin-resistant Enterobacteriaceae collected in 2014-2016 as part of the INFORM global surveillance program. PLoS One, 2018. 13(4). 21. Hadjadj, L., et al., Study of mcr-1 gene-mediated colistin resistance in Enterobacteriaceae isolated from humans and animals in different countries. Genes, 2017. 8(12): p. 394. 22. Wang, X., et al., Emergence of a novel mobile colistin resistance gene, mcr-8, in NDM-producing Klebsiella pneumoniae. Emerging microbes & infections, 2018. 7(1): p. 1-9. 23. Naccache, S.N., et al., A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res, 2014. 24(7): p. 1180-92. 24. Chiu, C.Y. and S.A. Miller, Clinical metagenomics. Nat Rev Genet, 2019. 20: p. 341-355. 25. Gu, W., S. Miller, and C.Y. Chiu, Clinical metagenomic next-generation sequencing for pathogen detection. Annual Review of Pathology: Mechanisms of Disease, 2019. 14: p. 319-338. 26. Tagini, F., et al., Phylogenomics reveal that Mycobacterium kansasii subtypes are species-level lineages. Description of Mycobacterium pseudokansasii sp. nov., Mycobacterium innocens sp. nov. and Mycobacterium attenuatum sp. nov. International journal of systematic and evolutionary microbiology, 2019. 69(6): p. 1696-1704.

125

APPENDICES Chapter 2

Statistical analysis

Drug Frequency Resistant Susceptible Intermediate P-value Amikacin 87 12 - <0.001 Cefotaxim 94 2 3 <0.001 Colistin 30 69 - <0.001 Fosfomycin 7 92 - <0.001 Ciproflaxacin 98 1 - <0.001 Temocillin 89 10 - <0.001 Chloramphenicol 49 50 - 1 Ceftazidim 64 30 5 <0.001 Ceftazidime\Avibactam 3 96 - <0.001 Meropenem 88 10 1 <0.001 Piperacillin\Tazobectam 99 0 - <0.001 Ceftolozane\Tazobactam 92 7 - <0.001 Imipenem 66 12 21 <0.001 Trimethoprim\Sulfamethoxazol 92 7 - <0.001

Supplementary Table 2.1: The frequency and the P-value calculated for each resistance profile for all isolate based on resistant, susceptible, or intermediate phenotype according to EUCAST breakpoints. The total number of isolates is 99.

number of resistant genes 6 7 8 10 11 12 13 14 15 16 17 18 19 21 Period 1 2 2 0 1 0 0 0 3 0 2 5 3 1 1 2 1 1 1 0 0 1 8 1 0 1 3 3 0 0 3 1 1 0 0 0 0 6 0 4 2 5 1 0 0 4 0 1 2 0 1 0 3 2 1 2 6 1 0 0 5 0 0 0 0 0 0 0 2 2 1 5 6 4 0 P-value 0.0306

Supplementary Table 2.2: Shows the frequency distribution of the isolates based on the number of resistant genes and the collection date of the isolate. We clustered the isolates into five equally-distributed periods based on the collection date. P-value is calculated based on the distribution of the number of isolates’ frequency within one period. Each isolate has the number of resistant genes. The total number of isolates is 99.

126

Chapter 4

Supplementary Material

Antimicrobial MIC Interpretation (mg/L) Amikacin 4 Sensitive Cefepime 8 Resistance Ceftazidime 1 Sensitive Ciprofloxacin 2 Sensitive Colistin 8 Resistance Gentamicin > 16 Resistance Imipenem < 0.25 Sensitive Meropenem < 0.25 Sensitive Piperacillin/Tazobactam 64/4 Resistance Tigecycline 0.25 Sensitive Trimethoprim/Sulfamethoxazole 4/76 Resistance Levofloxacin 2 Resistance Fosfomycin 16 Sensitive Temocillin 128 Resistance Cefotaxim 2 Resistance Ceftolozane/tazobactam 8/4 Resistance Piperacillin < 0.25 Sensitive Supplementary Table 1: Characteristics and antibiotic resistance profiles and MIC of NGKP-54

Type/location Resistance Gene Antibiotics Chromosomal blaSHV-40 Beta-lactam blaSHV-56 blaSHV-79 blaSHV-85 blaSHV-89 fosA5 Fosfomycin oqxA Quinolone IncHI2 mcr-8 Colistin IncI1 mcr-1 Colistin sul1 Sulfonamide erm(B) Macrolide tet(x) Tetracycline aadA12 Aminoglycoside aac(3)-IIa Aminoglycoside IncR blaTEM-1B Beta-lactamas blaTEM-1D

127

dfrA14 Trimethoprim

aph(3')-Ia Aminoglycoside IncFIB(K) mph(A) Macrolide IncFII(K) aph(3')-Ia Aminoglycoside Supplementary Table 2: Resistance genes and their corresponding locations

Primer name The Sequence mcr-8-Long-541-F GCAGGGTGATGCGCTTATTG mcr-8-Long-541-R GGAAGACAGTGGTGTGTGGT mcr-8-Long-445-F CTCCACCGAAACTGGTGGTT mcr-8-Long-445-R ATAAGCGCATCACCCTGCAT mcr-8-Long-166-F ACTACCCTGCATGTTCTCGC mcr-8-Long-166-R ATCTGTGGGTACTCGCTTGC mcr-1-205-F TCCAAAATGCCCTACAGACC mcr-1-205-R GCCACCACAGGCAGTAAAAT mcr-1-899-F GCGAGTGTTGCCGTTTTCTT mcr-1-899-R AAGCGATCCAGCGTATCCAG

Supplementary Table.3: Primers list used for fragment amplification of the colistin mcr variants.

Type + mcr-1 + mcr-1 + mcr-8 + mcr-8 E. coli E. coli + Partial mcr-8 (KpNG-54) K. pneumoniae “MicroNaut” Resistant Resistant Resistant broth 4 ug/ml > 4 ug/ml > 16 ug/ml microdilution Manual Resistant Resistant Resistant broth 4 ug/ml > 4 ug/ml > 16 ug/ml microdilution

Supplementary Table.4: The resistance profile of the NGKP-54 and the cloned variants (n=6). Two different methods confirmed the colistin resistance profile in the original NGKP-54 isolate K. pneumoniae and three separate cloned E. coli +mcr-1, +mcr- 8 as well as a sensitive control. We noticed that combined variants decreased the MIC of colistin in the original isolate when compared to +mcr-8.

128

Protein Accession Number MCR-1.1 WP_049589868.1 MCR-1.2 WP_065274078.1 MCR-1.3 WP_077064885.1 MCR-1.4 WP_076611062.1 MCR-1.5 WP_076611061.1 MCR-1.6 WP_077248208.1 MCR-1.7 WP_085562392.1 MCR-1.8 WP_085562407.1 MCR-1.9 WP_099982800.1 MCR-1.10 WP_096807442.1 MCR-1.11 WP_099982815.1 MCR-1.12 WP_104009850.1 MCR-1.13 WP_109545056.1 MCR-1.14 WP_109545052.1 MCR-1.15 WP_116786830.1 MCR-1.16 WP_136512110.1 MCR-1.17 WP_136512111.1 MCR-1.18 WP_106743337.1 MCR-1.19 WP_129336087.1 MCR-1.20 WP_140423329.1 MCR-1.21 WP_140423330.1 MCR-1.22 WP_148044477.1 MCR-2.1 WP_065419574.1 MCR-2.2 WP_078254299.1 MCR-2.3 WP_094323230.1 MCR-3.1 WP_039026394.1 MCR-3.2 WP_094315354.1 MCR-3.3 WP_099982814.1 MCR-3.4 WP_065804663.1 MCR-3.5 WP_089613755.1 MCR-3.6 WP_042649074.1 MCR-3.7 WP_099156047.1 MCR-3.8 WP_099156048.1 MCR-3.9 WP_099156049.1 MCR-3.10 WP_099982820.1 MCR-3.11 WP_102607465.1 MCR-3.12 WP_109545070.1 MCR-3.13 WP_111273842.1 MCR-3.14 WP_111273843.1 MCR-3.15 WP_111273844.1 MCR-3.16 WP_111273845.1 MCR-3.17 WP_111273846.1 MCR-3.18 WP_111273847.1 MCR-3.19 WP_087879616.1 MCR-3.20 WP_065801616.1 MCR-3.21 WP_094312656.1 MCR-3.22 WP_094308975.1 MCR-3.23 WP_094313523.1 MCR-3.24 WP_094321595.1 MCR-3.25 WP_103252528.1 MCR-3.26 WP_140423331.1 MCR-3.27 WP_017778762.1 MCR-3.28 WP_150823496.1 MCR-3.29 WP_136512112.1 MCR-3.30 WP_140423332.1 MCR-4.1 WP_099156046.1 MCR-4.2 WP_109545058.1 MCR-4.3 WP_011638903.1

129

MCR-4.4 WP_109545055.1 MCR-4.5 WP_109545054.1 MCR-4.6 WP_116786828.1 MCR-5.1 WP_053821788.1 MCR-5.2 WP_109545057.1 MCR-5.3 WP_114699278.1 MCR-5.4 WP_148044478.1 MCR-6.1 WP_099982813 MCR-7.1 WP_104009851 MCR-8.1 WP_114699275.1 MCR-8.2 WP_072310976.1 MCR-8.3 WP_150823497.1 MCR-9.1 WP_001572373 EpTA WP_000919299.1 “MCR-like” Z1140 AAG55285.1

Supplementary Table 5: ENA accession numbers for the proteins used for phylogenetic analysis.

STRAIN LOCATION USA UNITED KINGDOM 130 CHINA UNITED KINGDOM UNITED KINGDOM UNITED KINGDOM BRAZIL AUSTRALIA SAUDI ARABIA CHINA USA USA CHINA CHINA CHINA Bootstrap CHINA 100 64 CHINA

Supplementary Figure.1 A phylogenetic tree based on alignments of MCR amino acid sequences. The rooted protein sequence-based phylogenetic tree using NGKP-54 mcr variant genes, mcr homologs, eptA and Z1140 as internal control. Each clade represented a distinct cluster of mcr subgroup. The tree is rooted using a PE transferas that was experimentally confirmed to have no function (1). Sequences were aligned using MUSCLE (2) and created with the GTRGAMMA model, bootstrapping (1,000 replicates) was created using RAxML (maximum likelihood) and viewed with the iTOL online tool v 4.4.2 (3). MCR-8 is placed in a subclade neighboring the chromosome-encoded colistin-resistant reference assembly MG1655 potentially implies parallel evolutionary paths for the both genes complete and partial MCR-8. MCR-1 gene is placed. The tree is drawn to scale of 0.2 with branch lengths measured in the number of substitutions per site and all the protein accession numbers are included in the (Supplementary material). Blue circles represent bootstrap.

131

Chapter 5

Supplementary Data

Supplementary Table 4.1. (a) Small subunit (SSU) ribosomal RNA (rRNA) genes reconstructed from metagenomic library reads (27.3 Mio paired-end Illumina reads) using phyloFlash v3.3b1 (prefixed with PF) and targeted read-mapping and assembly of Mycobacterium spp. 16S rRNA genes (prefixed with Map). a) Mapping percentages. Phylotype Length (bp) Coverage (X) Closest hit on SILVA/EZBioCloud database (% identity) PF_spades_1 1859 212.6 Homo sapiens (99.8%) PF_spades_2 1520 30.4 M. smegmatis NCTC8159 (100%) Map_spades_3 788 3.9 M. riyadhense DSM45176 (99.9%) Map_spades_4 382 13.5 M. kansasii NLA001001166 (100%) PF_spades_5 1474 5.5 Klebsiella pneumoniae ATCC43816 (99.9%) PF_spades_6 1505 4.2 Cutibacterium acnes DSM1897 (99.9%) b) Assembly metrics for mycobacterial MAGs. Bin1 Bin2 Bin3 Total length (bp) 6109547 6145425 6145349 Number of contigs 54 170 170 Largest contig 311,560 209,821 171,227 N50 163,530 64,151 57,896 N75 108,959 29,873 32,653 L50 15 30 37 L75 26 63 72 Number of Ns 0 0 0 c) Acquired antibiotic resistance genes (ARGs) that were identified in the mycobacterial MAGs MAG ARG Identity on ResFinder Phenotype Bin1 (M. smegmatis) tet(V) 100.00% Tetracycline resistance erm(38) 99.91% Macrolide resistance aac(2’)-Id 99.84% Aminoglycoside resistance Bin2 (M. riyadhense) None identified - - Bin3 (M. kansasii) None identified - -

132

d e f

Supplementary Figure 4.1: (Top) Three consecutive coronal CT images (a, b, and c) after oral and intravenous contrast administration (from anterior to posterior) demonstrate three areas of circumferential wall thickening (skip lesions) involving the ileal loops (red arrows). There is a fistulous tract between two skip lesions in the terminal ileum (white arrow). There is another fistulous tract between the terminal ileum and the sigmoid colon (black arrow). Note the air pockets within the bilateral obturator internus muscles indicating the presence of abscesses related to complex perianal disease (green arrows). (Bottom) Three selected coronal images along the axis of the anal canal (from anterior to posterior) of an MRI (d, e, and f) T2 sequence with fat suppression. There is a trans- sphincteric fistulous tract (white arrows) that extends from the left side of the anal canal to the left ischio-anal fossa that extends superiorly to the bilateral obturator internus muscles, forming fluid collections/abscesses with air-fluid level (green arrows). Similar changes, which are partially seen in these images, are present in the right side. Note the posterior tract/branching fistula (red arrows) that terminates in the left gluteal muscles, forming a third fluid collection (blue arrows).

133

Supplementary Figure 4.2. (a)Phylogenomic tree of mycobacterial MAGs (highlighted) along with Mycobacterium spp. genomes that are publicly available. Phylogenomic tree reconstruction for the Mycobacterial MAGs and the publicly available genomes of Mycobacterium spp. were performed with GToTree using the default settings and the Bacterial.hmm (74 single-copy-genes [SCGs]) [1]. Olsenella umbonata was used as the outgroup. Node support values (SH-like) are indicated. (b) Core-genome SNP tree for M. kansasii MAG (highlighted) and publicly available M. kansasii genomes along with their type number (I-VI), country, year and source of isolation. (c) Hierarchical clustering of the presence (black) and absence (white) of M. tuberculosis H37Rv toxin/antitoxin orthologues in MAGs of M. smegmatis (Bin 1), M. riyadhense (Bin 2) and M. kansasii (Bin 3). The orange-coloured blocks indicate only one of the toxin or antitoxin gene orthologues present from a given pair of the toxin/antitoxin system. The names of the toxin/antitoxin systems are shown for each row on the right. Our NTM MAGs were compared to M. tuberculosis H37Rv, M. riyadhense MR226, M. kansasii 12,478, and M. smegmatis MC2_155 strains [2]. We selectively chose TA to represent similarities between species, as they have the potential to be useful diagnostic markers for distinguishing pathogenic species associated with infections [3]. a

134

b

135

C

5

5

v

6

1

)

R

)

2

_

2

7

1

2

2

n

3

n

)

i

R

i

8

C

3

H

7

B

B

n

(

M

(

M

i

4

s

i

2

e

e

s

s

B

s

i

i

s

s

(

1

o

t

t

n

n

l

i

i

a

a

i

i

e

e

u

s

s

h

h

c

m

m

a

a

r

d

d

g

g

s

s

e

e

e

a

a

n

n

b

y

y

a

a

m

m

i

i

u

s

s

k

k

r

r

t

.

.

.

.

.

.

.

M

M

M

M

M

M M

136

Methods

DNA extraction, library preparation and sequencing

Genomic DNA extraction was done on the biopsy bits (~30 μg), which was first homogenized in lysis buffer for 30 minutes at 37°C. Proteinase K (Qiagen, USA) was added to the lysate and the protocol of the extraction followed Qiagen Blood and Tissue protocol (Qiagen, USA). DNA library was prepared according to the NEBNext Ultra II kit for Illumina (New England BioLabs, UK). The library quality was tested on the Agilent 2100 Bioanalyzer system (Agilent, USA) and paired-end sequencing was performed on Illumina HiSeq 4000 platform (Illumina, CA, USA).

Reconstruction of SSU rRNA genes

Raw reads from the sequence library were screened for SSU rRNA genes using the ‘almosteverything’ flag on phyloFlash v3.3 [4]. Since the 16S rRNA gene of Mycobacterium smegmatis was reconstructed by the tool, the publicly available Mycobacterium spp. 16S rRNA gene sequences were downloaded from the NCBI database and a more targeted read-mapping and reassembly approach was performed using BBMap v37.31 (Bushnell, B.-http://sourceforge.net/projects/bbmap/) and Spades v3.11.1 [5], respectively. The taxonomic identities of the reconstructed SSU rRNA genes were then confirmed by performing searches on EZBioCloud [6] and BLAST nt database (https://blast.ncbi.nlm.nih.gov/Blast.cgi). All bacterial SSU rRNA gene sequences obtained in this study were deposited in GenBank under the accession numbers MK418686-MK418688 and MK418801-MK418802.

Reads processing, metagenomic assembly and binning

Metagenome was assembled with Megahit v1.1.4 [7], following phiX decontamination, quality filtering (QV=2), Illumina adapter-trimming and error correction of reads with the BBDuk tool from the BBMap v37.31 suite. The metagenomic assembly was visualized on gbtools [8] and binned on Metabat 2.12.1 [9] using default parameters. The obtained genomic bins were reassembled with Spades, using default parameters, following re- mapping of Illumina reads to the bins using BBMap with 0.98 minimum identity. The genomic completeness and contamination of the metagenome-assembled-genomes (MAGs) were estimated on CheckM [10] and the quality of the assemblies was evaluated on Quast [11]. The taxonomic identities of the MAGs were further ascertained by running GTDB-tk (https://github.com/Ecogenomics/GTDBTk) and also, by performing Tetra correlation searches (TCS) on JSpeciesWS [12]. For bacterial MAG assemblies that were missing the SSU rRNA genes, the corresponding gene sequences were manually added. The average sequencing coverages for each of the MAGs were additionally estimated on BBMap using the ‘covstats’ function. The draft MAGs and metagenomic library reads were deposited in European Nucleotide Archive (ENA) under the study accession PRJEB30883.

Identification of ARGs and genomic/phylogenomic analyses

137

Acquired ARGs in the Mycobacterial MAGs/bins were identified on ResFinder [13] database (January, 2019). Phylogenomic tree reconstruction for the Mycobacterial MAGs and the publicly available genomes of Mycobacterium spp. were performed with GToTree [14] using the default settings and the Bacterial.hmm (74 single-copy-genes [SCGs]). Parsnp from the Harvest suite [15] was used to construct a core genome SNP tree for the Bin3 MAG (M. kansasii) and the publicly available M. kansasii genomes (types I-VI). Trees were visualized and manually edited on iTOL [16]. Replication rates for the individual MAGs were determined on iRep [17] after ensuring the absence of contigs with unusually high or low coverage values and manually refining the assemblies (i.e., elimination of contigs < 5Kb) and further re-confirming on CheckM that these refined MAGs were possessing genomic completeness >75%.

138

BIBLIOGRAPHY

1. 1. Lee MD. GToTree: a user-friendly workflow for phylogenomics. BioRxiv 2019:512491. 2. 2. Guan Q, Garbati M, Mfarrej S, et al. Mycobacterium riyadhense clinical isolates provide insights into ancestry and adaptive evolution in tuberculosis. bioRxiv 2019:728923. 3. 3. Rahman SA, Ahmad J, Ehtesham NZ, et al. Genetic markers for diagnosis of tuberculosis caused by mycobacterium tuberculosis: Google Patents, 2016. 4. 4. Gruber-Vodicka HR, Seah BK, Pruesse E. phyloFlash—Rapid SSU rRNA profiling and targeted assembly from metagenomes. bioRxiv 2019:521922. 5. 5. Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology 2012;19:455-477. 6. 6. Yoon S-H, Ha S-M, Kwon S, et al. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. International journal of systematic and evolutionary microbiology 2017;67:1613. 7. 7. Li D, Luo R, Liu C-M, et al. MEGAHIT v1. 0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 2016;102:3-11. 8. 8. Seah BK, Gruber-Vodicka HR. gbtools: interactive visualization of metagenome bins in R. Frontiers in microbiology 2015;6:1451. 9. 9. Kang DD, Froula J, Egan R, et al. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 2015;3:e1165. 10. 10. Parks DH, Imelfort M, Skennerton CT, et al. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome research 2015;25:1043- 1055. 11. 11. Gurevich A, Saveliev V, Vyahhi N, et al. QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013;29:1072-1075. 12. 12. Richter M, Rosselló-Móra R, Oliver Glöckner F, et al. JSpeciesWS: a web server for prokaryotic species circumscription based on pairwise genome comparison. Bioinformatics 2016;32:929-931. 13. 13. Zankari E, Hasman H, Cosentino S, et al. Identification of acquired antimicrobial resistance genes. Journal of antimicrobial chemotherapy 2012;67:2640-2644. 14. 14. Lee MD. GToTree: a user-friendly workflow for phylogenomics. Bioinformatics 2019;35:4162-4164. 15. 15. Treangen TJ, Ondov BD, Koren S, et al. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome biology 2014;15:524. 16. 16. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research 2016;44:W242-W245. 17. 17. Brown CT, Olm MR, Thomas BC, et al. Measurement of bacterial replication rates in microbial communities. Nature biotechnology 2016;34:1256. 18. 1. Xu Y, Wei W, Lei S, Lin J, Srinivas S, Feng Y. 2018. An evolutionarily conserved mechanism for intrinsic and transferable polymyxin resistance. Mbio 9:e02317-17. 19. 2. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32:1792-1797.

139

20. 3. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research 44:W242-W245.