Operational impact of library development on the identification of organisms in the family Bacillaceae in pharmaceutical manufacturing using MALDI-TOF mass spectrometry backed up by sequencing

Christine E. Farrance, Sunhee Hong and Prasanna D. Khot

1 ABSTRACT

The family Bacillaceae comprises a wide phylogenetic diversity of approximately 70 genera. Members are commonly found in pharmaceutical manufacturing facility microflora. Most form spores; thus, they are prevalent contaminants and resistant to disinfection. Even as MALDI-TOF-MS (MALDI) is becoming widely available as an identification method, little has been published regarding its efficacy for the Bacillaceae. Over 7 years, Charles River has expanded our MALDI library with nearly 400 entries that now represent 67.9% of the total entries for the family Bacillaceae. In this retrospective study, we analyzed the identifications of approximately 14,200 isolates spanning 20 genera from the Bacillaceae that were submitted for identification by MALDI. If MALDI failed to provide an identification, the isolates were identified by 16S rDNA sequencing. The operational impact of using MALDI for identification was calculated for factors such as sequencing as a backup identification system, additional MALDI library development, sample preparation methods, and desired taxonomic resolution. In addition, the impact of capturing strain diversity within the MALDI library and the option to discriminate closely related species using protein-coding gene sequencing were reviewed. This study demonstrated the operational benefit of MALDI library development, backed up with sequence-based identification, for accurate and rapid identification.

2 STUDY COHORT

• Analyzed 14,181 isolates from family Bacillaceae submitted for MALDI backed by sequencing from 2011 to 2018 by 662 customers in North America and Europe
• These isolates represent 213 species or species groups of the family Bacillaceae spanning 20 genera

Top Six Frequently Identified Genera, with Frequently Identified Species Within the Genera
• Bacillus: B. cereus group, B. licheniformis, B. circulans
• Lysinibacillus: L. fusiformis, L. boronitolerans, L. massiliensis
• Geobacillus: G. stearothermophilus, G. kaustophilus, G. thermodenitrificans
• Oceanobacillus: O. profundus, O. sojae, O. caeni
• Psychrobacillus: P. insolitus, P. psychrodurans, P. psychrotolerans
• Virgibacillus: V. halodenitrificans, V. pantothenticus, V. proomii

3 STUDY DESIGN

• Retrieved data from CRL’s AccuPRO-ID® service, which is MALDI-TOF MS backed up by sequencing
• Retrieved ID results for each species until 200 samples were reached for frequently identified species or up to 100 for less frequently identified species
• Tracked reasons for No ID by MALDI (Transfer-to-Sequencing):
  • Due to top species scores too close (unresolvable to single species)
  • Due to No Match (lack of library coverage)
  • Due to No Spectra (suboptimal sample preparation)
• Calculated ID rate with and without the impact of targeted library development

Workflow of MALDI Backed By Sequencing (AccuPRO-ID Service)
• MALDI-Direct: score ≥1.75, report ID; score <1.75, proceed to MALDI-Extract
• MALDI-Extract: score ≥1.75, report ID; score <1.75, transfer to sequencing
• Seq (16S): any level ID, report ID
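The AccuPRO-ID workflow described in the study design can be read as a three-step triage. The following is a minimal Python sketch of that reading, not the service's actual implementation. The 1.75 cutoff and the order of steps come from the poster; the function name `triage`, the return strings, and the use of `None` to model a sample that yields no spectra are assumptions made for illustration.

```python
from typing import Optional

# Species-level cutoff taken from the poster's workflow diagram.
CRL_THRESHOLD = 1.75


def triage(direct_score: Optional[float],
           extract_score: Optional[float] = None) -> str:
    """Return the workflow step that would report the ID.

    Hypothetical helper: a score of None stands for "no usable spectra",
    which is an assumption of this sketch, not a detail from the poster.
    """
    # Step 1: MALDI-Direct; report if the score reaches the cutoff.
    if direct_score is not None and direct_score >= CRL_THRESHOLD:
        return "report ID (MALDI-Direct)"
    # Step 2: MALDI-Extract; same cutoff applied to the extraction result.
    if extract_score is not None and extract_score >= CRL_THRESHOLD:
        return "report ID (MALDI-Extract)"
    # Step 3: everything else is transferred to 16S sequencing.
    return "transfer to sequencing (16S)"
```

Under these assumptions, an isolate scoring 1.60 by direct transfer and 1.80 after extraction is reported at the extraction step, while one scoring below 1.75 at both steps is transferred to 16S sequencing.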

4 RESULTS

Impact of Targeted Library Development and Optimized Threshold (n=14,181)
[Figure: bar chart of ID rate, Out-of-the-Box Bruker System versus With CRL Library Development. Additional plotted values without surviving labels: 46.1%, 30.0%.]
• ID rate, out-of-the-box system: 23.8%
• ID rate, with library development: 84.2%
• ID’d with 16S Seq: 15.8%

MALDI Library Coverage for Family Bacillaceae
[Figure: chart of library entries for the family, split 67.9% and 32.1%.]

Reasons for No ID by MALDI (n=2233, 15.8%)
[Figure: chart with categories Unresolvable Species, No Spectra, and Lack of Library Coverage. Plotted values: 44.8%, 39.9%, 15.3%; the category assignment did not survive extraction.]

MALDI Score Key
• Species ID: Bruker ≥2.0; CRL ≥1.75
• Low confidence ID: Bruker ≥1.7 and <2.0; CRL N/A
• No ID: Bruker <1.7; CRL <1.75

Impact of Strain Diversity in the MALDI Library
[Figure: horizontal bars on a 0% to 100% axis, one segment per library entry (MSP).]
One MSP (Entry) Predominates
• Bacillus idriensis: A1, A2, B1
• Bacillus horneckiae: A1, A2, A3, A4, B1, B2
• Lysinibacillus massiliensis: A1, A2, A3, B1
Diversity Is Important
• Bacillus niabensis: A1, A2, A3, A4, A5
• Bacillus nealsonii: A1, A2, A3, A4, A5, A6, B1
• Geobacillus stearothermophilus: A1, A2, A3, B1, B2, B3, B4

Species Resolution
[Figure: panels labeled 16S rRNA gene and gyrB gene.]
• MALDI: Bacillus australimaris / pumilus / safensis / zhangzhouensis; 16S rRNA gene: Bacillus pumilus / safensis / altitudinis; protein-coding gene: Yes (gyrB gene)
• MALDI: Bacillus altitudinis; 16S rRNA gene: Bacillus pumilus / safensis / altitudinis; protein-coding gene: Not needed
• MALDI: Bacillus aryabhattai / megaterium; 16S rRNA gene: Bacillus aryabhattai / megaterium; protein-coding gene: Yes (recA gene)

5 CONCLUSIONS

• Targeted library development is critical to improving ID rate
• Capturing strain diversity during library development is necessary
• Sequencing backup is very effective to ID isolates and resolve species MALDI cannot
• Specialized sample preparation may be needed for some species that do not generate spectra on MALDI by routine methods
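The MALDI Score Key in the results gives two sets of cutoffs, one for the out-of-the-box Bruker interpretation and one for the CRL optimized threshold. The sketch below encodes only the thresholds as printed in the key, to show how the same score can fall into different categories under each. The function name `classify` and the key names `"bruker"` and `"crl"` are hypothetical labels chosen for this illustration.

```python
def classify(score: float, key: str) -> str:
    """Map a MALDI score to a category under one of the two score keys.

    Thresholds are copied from the poster's MALDI Score Key; the function
    and key names are illustrative assumptions.
    """
    if key == "bruker":
        if score >= 2.0:
            return "Species ID"
        if score >= 1.7:
            return "Low confidence ID"
        return "No ID"
    if key == "crl":
        # The CRL key has no low-confidence band (listed as N/A).
        return "Species ID" if score >= 1.75 else "No ID"
    raise ValueError(f"unknown score key: {key!r}")
```

For example, a score of 1.80 falls in the low-confidence band under the Bruker key but counts as a species-level ID under the CRL key, whereas 1.72 is low confidence under the Bruker key and No ID under the CRL key.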