iAM.AMR

Sep 01, 2021

Welcome

1 Background

2 Project

3 Scope


CHAPTER 1

Background

Antimicrobial resistance (AMR) refers to the ability of microorganisms to withstand the effects of antimicrobials (to which they were formerly susceptible). In short, AMR reduces (or eliminates) our ability to treat certain types of infections. While the primary driver (cause) of AMR is human antimicrobial use (AMU), we’re also concerned with animal AMU in the agri-food production system; AMU may increase AMR in zoonotic pathogens, and Canadians may be exposed to these pathogens through the consumption of microbially-contaminated foods.


CHAPTER 2

Project

The goal of the iAM.AMR project is to quantify the relative contribution of each bug-drug-commodity combination to Canadians’ overall exposure to resistant pathogens arising from the agri-food production system. To that end, we searched the literature to identify factors affecting resistance, and used these factors in an integrated assessment model (IAM) framework to characterize the prevalence of AMR along the farm-to-fork continuum. Ultimately, we describe the number of servings at risk for each bug-drug-commodity combination. This framework allows us to achieve a secondary objective: to understand how broad changes (e.g. encouraging a practice, or withdrawing an antimicrobial) can influence the entire agri-food system.


CHAPTER 3

Scope

The iAM.AMR project focuses on four [food-animal species | commodities]:
• broiler chicken | chicken
• swine | pork
• dairy cattle or beef cattle | beef
• turkey | turkey

The iAM.AMR project focuses on four microbes:
• E. coli
• Salmonella spp.
• Campylobacter spp.
• Enterococcus spp.

The iAM.AMR project focuses on resistance to drugs of human importance, including:
• fluoroquinolones
• third-generation cephalosporins

See the Model Directory repository for an up-to-date list of models and locations.

3.1 Getting Started

Welcome to the iAM.AMR project! The first step in getting started is to explore this documentation; get to know the project in-depth here!

Then, check out our Start-Here GitHub repository. GitHub is where we host our models, store more of our documentation, and do our model development. Finally, fetch and run the models!

• For the Public
• For New Collaborators

3.1.1 For the Public

At present, the models are not being openly circulated. Please contact Brennan Chapman to be granted access to the model repositories. See the Model Directory repository for an up-to-date list of models and locations.

3.1.2 For New Collaborators

The first step is to complete the onboarding survey. Then, access the iAM.AMR GitHub Organization (requires approval and login). If you can’t access these links, contact Brennan Chapman. Note that it may take up to 48 hours after completing the onboarding survey to be added. Before you reach out, ensure you’ve accepted all invites at our GitHub Organization’s invite page and checked the Notifications section. See our Start Here repository for more information. See the Model Directory repository for an up-to-date list of models and locations.

3.1.3 Important Links

See the public and private iAM.AMR directories.

3.2 Background

3.2.1 Antimicrobial Resistance (AMR)

Antimicrobial resistance (AMR) refers to the ability of microorganisms, such as bacteria, fungi, or viruses, to withstand (or partially withstand) the effects of an antimicrobial to which they were formerly susceptible. While AMR may occur naturally as a result of the evolutionary process, the development and spread of AMR in microbial populations has been accelerated by the sustained anthropogenic use of antimicrobials. At present, an estimated 700,000 persons die each year as a result of antimicrobial resistant infections [1] – a number which is expected to grow as the prevalence of resistance continues to increase at a rate that far outpaces our ability to develop new antimicrobial therapies.

3.2.2 AMR and the Agri-food Production System

In addition to human and veterinary medicine, antimicrobials are used in livestock and agricultural production (together, agri-food production) to reduce the occurrence of disease and increase yield. While antimicrobial use (AMU) in human medicine is recognized as the primary driver of anthropogenic resistance development, AMU and other resistance-promoting practices in the agri-food production system are of particular concern with respect to human health, given the ease with which resistant pathogens may transfer between animals, humans, and their environments along the farm-to-fork continuum. Despite the risk posed by these pathogens, there remain a number of significant knowledge gaps in our understanding of the processes governing the development and persistence of AMR in the agri-food production system.

[1] O’Neill, J. Tackling drug-resistant infections globally: final report and recommendations. Rev. Antimicrob. Resist. 84 (2016). doi:10.1016/j.jpha.2015.11.005

3.2.3 Integrated Assessment Modelling

Integrated assessment modelling differs from traditional risk modelling approaches in that it (generally) does not seek to develop numerical answers to specific questions; while integrated assessment models (iAMs) are simplifications of reality, they are not designed to simplify systems to the point of solution. Rather, iAMs are designed to integrate vastly different forms and scales of information, from traditional and non-traditional stakeholders, into a single framework through which users can address broad and complex questions. The output of an iAM, while often unrealistic or nonsensical in terms of a specific numerical value, is designed to increase the users’ understanding of the direction and magnitude of changes resulting from perturbations to a large, complex system. In the context of integrated assessment modelling’s original application – climate change science – these perturbations are often characterized as inadvertent consequences of human actions. In the context of generalized risk assessment, these perturbations may also take the form of strategic interventions, designed to achieve risk reductions throughout (or more commonly, within a manageable subsection of) the system.

3.3 Project Structure

3.3.1 Goals

The overall goal of the iAM.AMR project is to elucidate and quantify the relative contributions of specific agri-food commodities and related environmental exposure pathways to Canadians’ overall exposure to antimicrobial resistant bacteria arising from the agri-food production system. To meet this goal, we endeavor to:
• create a conceptual model describing the agri-food production system, the drivers of AMR within this system, and the effects of AMR within this system (and beyond) on human, animal, and environmental health.
• collect data from national AMR and AMU surveillance programmes, associated research projects, and the scientific literature to describe the ecology and epidemiology of AMR in this system historically, and at present.
• quantify the individual effect of each driver of resistance – and identify omitted drivers – through a comprehensive literature search.
• integrate the conceptual model, epidemiological data, and the effect of each driver within a standardized modelling framework.
• use the developed mathematical model(s) to understand how drivers, such as AMU and Canadian production practices, affect human exposure to antimicrobial resistant bacteria arising from the agri-food production system.
• engage industry stakeholders to inform identified knowledge gaps, communicate high-risk practices, and provide recommendations that consider each of human, animal, and environmental health.
• use these findings to inform broader AMR human risk-reduction initiatives.

3.3.2 Overview

An overview of the entire project is available on Kumu, via the AMR Org Chart. To edit or contribute, please contact Brennan Chapman.

Scope

To reduce the scope of the project to a manageable size, the three most common food-animal species and the three enteric bacteria most commonly isolated from those species were selected as the core areas of focus. The host species are chicken, cattle, and swine; the bacterial species are E. coli, Salmonella spp., and Campylobacter spp. While the primary human exposure route is assumed to be consumption of the corresponding agri-food products (chicken, beef, and pork), additional focus has been placed on environmental exposure routes (e.g. through the consumption of leafy greens or root vegetables grown in manure-amended soils). Additional food-animal and bacterial species of interest include turkeys and Enterococcus spp., which may be explored later as the project progresses.

Organization via stories

The models are organized primarily by ‘stories’, or bug-drug-host combinations of particular interest. Existing stories and their corresponding models are available in the iAM.AMR GitHub repository.

Literature Search

The iAM.AMR models are informed by a single, all-encompassing literature search. A description of the literature search and associated products is provided on the literature search main page.

The CEDAR database

CEDAR, the Collection of Epidemiologically Derived Associations with Resistance, is a Microsoft Access database, designed to house data extracted in support of the iAM.AMR project and associated activities. The studies identified through the literature search are reviewed, and data extracted as per a number of criteria outlined on the CEDAR main page.

The sawmill package

The sawmill package is an R package which processes queries extracted from the CEDAR database. The package is fully documented in-line using Roxygen2, and is outlined on the sawmill main page.

Model Building

Once the data are extracted from the literature, collated in – and queried from – the database, and processed by the R package, they are included in a model built around a robust model framework. The iAM.AMR models are designed to predict the frequency of exposure of Canadians (and specifically, Ontarians) to antimicrobial-resistant bacteria from agri-food products. To do so, the models follow agri-food production along the farm-to-fork continuum – where possible, from birth (or hatch), through rearing, slaughter, processing, retail, and finally human consumption. The models start with a baseline prevalence of resistance, derived from the earliest sampling point we have available – ideally as close to birth (or hatch) as possible. This prevalence of resistance is affected by practices at each stage of production; we term these practices ‘factors’ which affect resistance. These factors are grouped by stage of production: on-farm, at abattoir, and at retail.

Note: The models only include binary factors (i.e. those described by yes or no). This means that a factor may be considered in one of two ways: as a risk (i.e. it increases the prevalence of resistance) or as being protective (i.e. it decreases the prevalence of resistance). It is important to note that this perspective (risk vs. protective) depends on the scenario. For example, changing the litter material of broiler chickens from straw to wood curls may increase the prevalence of resistance (a risk) – but if we have a scenario in which all litter material is already wood curls, this factor is only informative when we flip it, and consider the protective effect of changing the material to straw.

At each stage of production, we take the prevalence of resistance from the previous stage (e.g. baseline for on-farm, on-farm for abattoir, etc.) and update it, considering the effect of each factor (in combination with all others), and how often that factor is implemented in Canadian industry. We then pass this updated prevalence of resistance to the next stage. The calculations used to update the prevalence at each stage are identical, though as you will likely notice, we have more factors at the on-farm stage than any other. After calculating the final prevalence of resistance at retail, we use the prevalence of bacterial contamination at retail (i.e. the recovery rate of the bacteria, derived from national surveillance programmes), estimates of population size, and consumer consumption behaviours, to derive the final output of the model: the number of servings of the agri-food product that are contaminated with antimicrobial-resistant bacteria in a one-week period.
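The stage-by-stage update described above can be sketched in Python. This is an illustrative sketch only, not the project's model code. It assumes that each factor's effect is expressed as an odds ratio, and that a factor's frequency of use is applied as a simple mixture of exposed and unexposed production. All names, factors, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    name: str          # hypothetical label for a binary practice
    odds_ratio: float  # assumed effect measure: >1 is a risk, <1 is protective
    frequency: float   # assumed share of production using the practice (0 to 1)


def apply_odds_ratio(prevalence: float, odds_ratio: float) -> float:
    """Shift a prevalence by an odds ratio and return the new prevalence."""
    if prevalence <= 0.0 or prevalence >= 1.0:
        return prevalence  # odds are undefined at the extremes; leave unchanged
    odds = prevalence / (1.0 - prevalence)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


def update_stage(prevalence: float, factors: list) -> float:
    """Update the incoming prevalence using every factor at one stage."""
    for factor in factors:
        exposed = apply_odds_ratio(prevalence, factor.odds_ratio)
        # Assumption: weight exposed and unexposed production by frequency of use.
        prevalence = factor.frequency * exposed + (1.0 - factor.frequency) * prevalence
    return prevalence


def servings_at_risk(resistance_prevalence: float, contamination_prevalence: float,
                     population: float, servings_per_person_per_week: float) -> float:
    """Assumed final step: contaminated, resistant servings in one week."""
    return (resistance_prevalence * contamination_prevalence
            * population * servings_per_person_per_week)


# Hypothetical factors, grouped by stage of production.
stages = {
    "on-farm": [Factor("Example Risk Practice", 2.0, 0.4),
                Factor("Example Protective Practice", 0.5, 0.2)],
    "abattoir": [Factor("Example Abattoir Practice", 0.8, 0.9)],
    "retail": [],
}

prevalence = 0.10  # hypothetical baseline prevalence of resistance
for stage_name, stage_factors in stages.items():
    prevalence = update_stage(prevalence, stage_factors)
    print(stage_name, round(prevalence, 4))
```

The same `update_stage` function is reused at every stage, mirroring the statement that the calculations are identical across stages; only the list of factors changes.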

3.3.3 Funding and History

Stakeholders from each of the human, animal, and environmental health disciplines are often engaged in addressing the risk posed by AMR in the agri-food production system. A project [1] by an associated team aimed to identify non-traditional stakeholders, who are often overlooked for engagement, but are nonetheless affected by AMR. As part of this project, the team created a large diagram of drivers, included below. The iAM.AMR project was born out of the concept of enumerating these identified pathways. Beginning in 2014, the iAM.AMR project was supported by the Ontario Ministry of Agriculture, Food and Rural Affairs (OMAFRA) New Directions Funding Program (Project ND2013-1967), with a focus on the applicability of the models specifically to Ontario (a focus that remains today). Subsequently, the project has been continued as a sub-project of GRDI-AMR.

GRDI

The Genomics Research and Development Initiative (GRDI) funds genomic research across the federal science portfolio. A specific focus of GRDI is the development of shared priority projects (i.e. projects involving multiple federal departments). The GRDI-AMR project (2016 – 2021) is a nine million dollar shared priority project led by Ed Topp at AAFC, which aims to use genomics to understand how the development of AMR in the agri-food production system impacts human health. The project is broadly divided into five working groups; the iAM.AMR project is a significant, evergreen deliverable from work package five.

3.4 The CEDAR Database

The Collection of Epidemiologically Derived Associations with Resistance or CEDAR database is the central repository of epidemiological data for the iAM.AMR project.

3.4.1 Introduction

[1] Majowicz, S.E., Parmley, E.J., Carson, C. et al. BMC Res Notes (2018) 11: 170. https://doi.org/10.1186/s13104-018-3279-8


Fig. 1: Figure 2 from Majowicz et al. (2018) demonstrating the complexity of the drivers of AMR.


What is a database?

A database is a structured set of data, organized in a way that makes it easy to search for, select, and retrieve specific subsets or combinations of information. There is no one defining characteristic that makes a database a database, but a database is often differentiated from a simpler application by its formal structure, and rigidly defined data relationships.

Tip: We often use the term database to refer to the sum of the data, the data structure, and the software used to create, manipulate, and access the database. However, we can more accurately refer to the data and its structure as the database, and the software as the database management system or DBMS.

Why use a database?

There are numerous benefits of using a database to store large amounts of complex data, most of which become evident when we contrast a database against a spreadsheet or flat-file. Take a look at the table below, which includes demographic and political information for some of Canada’s largest cities (circa 2020). This represents a flat-file approach.

City          Province   Population (2016)   Premier            Party
Toronto       Ontario    5,429,524           Doug Ford          OPC
Montreal      Quebec     3,519,595           François Legault   CAQ
Vancouver     BC         2,264,823           John Horgan        NDP
Calgary       Alberta    1,237,656           Jason Kenney       UCP
Edmonton      Alberta    1,062,643           Jason Kenney       UCP
Winnipeg      Manitoba   711,925             Brian Pallister    PCM
Quebec City   Quebec     705,103             François Legault   CAQ
Hamilton      Ontario    693,645             Doug Ford          OPC
Guelph        Ontario    132,397             Doug Ford          OPC

There are two obvious drawbacks to this approach. The first is practical – this table contains a number of duplicate values, which increase the size of the table, and add opportunities for input error. The second is more conceptual, in that this table has no singular purpose – if you had to title it, what would that title be? When we mix heterogeneous data (i.e. demographic data with political data), we often lose clarity, and forget where we stored data, or with whom that data should be shared.

The alternative is a relational database, which involves organizing our complex data, and defining relationships between the disparate parts. Take a look at the tables below.

ID   City          Population (2016)
01   Toronto       5,429,524
02   Montreal      3,519,595
03   Vancouver     2,264,823
04   Calgary       1,237,656
04   Edmonton      1,062,643
05   Winnipeg      711,925
02   Quebec City   705,103
01   Hamilton      693,645
01   Guelph        132,397


ID   Province   Premier            Party
01   Ontario    Doug Ford          OPC
02   Quebec     François Legault   CAQ
03   BC         John Horgan        NDP
04   Alberta    Jason Kenney       UCP
05   Manitoba   Brian Pallister    PCM

Now each table has a singular theme or purpose, and is clear in the information it conveys. We have fewer error-prone entries (e.g. the names of the premiers), and fewer duplicate datapoints. And by matching the IDs in the two tables, we can recreate the main table if necessary, or share component parts without sharing the entire data collection. The benefits of a database approach are evident, even at this small scale.
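As a minimal sketch of how matching IDs recreates the flat table, the two tables above can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module and joined. CEDAR itself is a Microsoft Access database; SQLite is used here only for illustration, and the table and column names (`city`, `province`, `province_id`) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE province (
    id       TEXT PRIMARY KEY,   -- the ID column of the province table
    province TEXT,
    premier  TEXT,
    party    TEXT
);
CREATE TABLE city (
    city        TEXT PRIMARY KEY,
    population  INTEGER,
    province_id TEXT REFERENCES province(id)  -- the ID column of the city table
);
""")

provinces = [
    ("01", "Ontario", "Doug Ford", "OPC"),
    ("02", "Quebec", "François Legault", "CAQ"),
    ("03", "BC", "John Horgan", "NDP"),
    ("04", "Alberta", "Jason Kenney", "UCP"),
    ("05", "Manitoba", "Brian Pallister", "PCM"),
]
cities = [
    ("Toronto", 5429524, "01"), ("Montreal", 3519595, "02"),
    ("Vancouver", 2264823, "03"), ("Calgary", 1237656, "04"),
    ("Edmonton", 1062643, "04"), ("Winnipeg", 711925, "05"),
    ("Quebec City", 705103, "02"), ("Hamilton", 693645, "01"),
    ("Guelph", 132397, "01"),
]
conn.executemany("INSERT INTO province VALUES (?, ?, ?, ?)", provinces)
conn.executemany("INSERT INTO city VALUES (?, ?, ?)", cities)

# Matching the IDs between the two tables recreates the original flat table.
flat_table = conn.execute("""
    SELECT city.city, province.province, city.population,
           province.premier, province.party
    FROM city
    JOIN province ON city.province_id = province.id
    ORDER BY city.population DESC
""").fetchall()

for row in flat_table:
    print(row)
```

Each premier's name is now stored once, so a correction is made in a single row of `province` rather than in every matching city row.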

What is the terminology?

A relational database is a collection of tables, linked together by relationships.

A table contains data, and consists of rows and columns. The rows – also known as tuples or records – are sets of data related to a single object. These sets consist of multiple, named elements of data, organized into columns – also known as attributes or fields.

A relationship defines how we match data between tables. Often, this matching is done via a unique primary key or ID.

A form is a graphical user interface for entering data into the database.

A query is a request we pass to the database to retrieve a specific subset of records and fields, constrained by criteria we specify.

How do we store a database?

Generally, databases are separated into two parts: a front-end and a back-end. The front-end consists of the user interface, through which we enter, manipulate, and retrieve data. The back-end consists of the data itself, organized into tables and other data storage formats.

Tip: The front-end and back-end can be thought of as a web browser and website respectively; the distributed front-end is used to retrieve and display information from a centralized back-end.

This configuration allows multiple users to simultaneously access and work with the same, always up-to-date set of information. There is no explicit requirement for these parts to be separate; however, combining the files reduces multi-user capability.

3.4.2 Access CEDAR

Locate CEDAR

CEDAR consists of two files:
• the back-end file: CEDAR_forest.accdb
• the front-end file: CEDAR.accdb

You will need both to access CEDAR.

Locate the back-end file CEDAR_forest.accdb

If you are accessing CEDAR from the GoC network, locate CEDAR_forest.accdb in the CEDAR sub-folder of the iAM.AMR project. If you are accessing CEDAR from outside the GoC network, you will need a local copy of CEDAR_forest.accdb.

Locate the front-end file CEDAR.accdb

You can access the front-end file CEDAR.accdb from the private CEDAR GitHub Repository. You can request access to the repository by contacting @chapb. If you have been granted access, you can accept the invite here.

Open CEDAR

Always access CEDAR by opening the front-end file CEDAR.accdb. When you open CEDAR.accdb, you will be presented with a mostly blank screen:

Fig. 2: The launch screen of CEDAR.accdb.

On the left-hand side, the database objects are organized by type (tables, queries, forms) in the Navigation Pane.

What do I do if I get a security warning?

Upon opening CEDAR.accdb, you may see a security warning prompt like one of those shown below. You may also see a security prompt if you are re-linking or using a new version of the CEDAR.accdb file. In all cases, you can simply select Enable Content or Accept/Trust as necessary.


Fig. 3: Example security warning


Fig. 4: Another example security warning

Re-link CEDAR.accdb and CEDAR_forest.accdb

The first time you open CEDAR.accdb (or an updated version of CEDAR.accdb), you must re-link the front-end and back-end databases. If you forget to re-link the databases, opening a database object like a query or form will result in an error message, similar to the one below:

Fig. 5: An example of the error message received when opening a database object from an unlinked front-end.

To re-link the files:
1. Locate the External Data tab in the ribbon (the top, red menu bar), and select Linked Table Manager.
2. On the right-hand side of the Linked Table Manager, use Select All to select all tables.
3. On the right-hand side of the Linked Table Manager, select Relink, and navigate to CEDAR_forest.accdb.

In Access 365, an additional confirmation dialogue is presented: select No. If you select Yes, you will have to confirm each table name manually (by clicking accept through the subsequent dialogues).

Tip: Don’t forget that you will need to re-link the database each time the front-end CEDAR.accdb is updated, or the files are moved.


Fig. 6: The name confirmation dialogue box is only displayed in the latest versions of Access.

3.4.3 Read CEDAR

There are two primary ways to interact with CEDAR: to read reference-level information, and to read factor-level information. Both of these tasks are accomplished via forms, accessible via the Navigation Pane on the left-hand side of the window. To access reference-level information, use the Add or Edit a Reference form. To access factor-level information, use the Add or Edit a Factor form.

3.4.4 Navigating CEDAR

Most navigation in CEDAR is accomplished through the Navigation Pane, where you can select tables, queries, or forms, and the Record Navigation Bar, at the bottom of the screen:

Fig. 7: The Record Navigation Bar is highlighted in red at the bottom of the screen.

You can use the left and right arrows to navigate between records (generally between references), or the right arrow with yellow star to create a new record (generally a new reference). This bar also contains a search feature to quickly find records.


3.5 Getting Started

This section provides instructions for data extraction into CEDAR. If you are not familiar with CEDAR, please review the section on CEDAR.

There are two types of data you will be extracting: reference-level data, which provides context for the outcome, and factor-level data, which describes the outcome itself. These data are entered in a reference-level form, and a factor-level form respectively.

On the Add or Edit a Reference form, you will extract reference-level information such as:
• study location
• study design
• reporting methods

On the Add or Edit a Factor form, you will extract factor-level information such as:
• the exposed and referent groups
• the host, microbe, and resistance tested
• counts, prevalences, or odds ratios describing the effect of the factor

Note: Only counts, prevalences, and odds ratios are extracted. Other result formats, such as relative risks, are not extractable for our purposes.
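The three extractable result formats are related: counts give prevalences, and either can be turned into an odds ratio comparing the exposed group with the referent group. The sketch below shows that arithmetic under the standard definition of an odds ratio. It is not taken from CEDAR or the sawmill package, and the function names are hypothetical.

```python
def odds_ratio_from_counts(exposed_resistant: int, exposed_total: int,
                           referent_resistant: int, referent_total: int) -> float:
    """Odds ratio from counts of resistant isolates in each group."""
    exposed_susceptible = exposed_total - exposed_resistant
    referent_susceptible = referent_total - referent_resistant
    if exposed_susceptible == 0 or referent_resistant == 0:
        # The ratio is undefined when a denominator cell is empty.
        raise ValueError("odds ratio is undefined for these counts")
    return ((exposed_resistant * referent_susceptible)
            / (exposed_susceptible * referent_resistant))


def odds_ratio_from_prevalences(exposed_prevalence: float,
                                referent_prevalence: float) -> float:
    """Odds ratio from the prevalence (0 to 1, exclusive) in each group."""
    exposed_odds = exposed_prevalence / (1.0 - exposed_prevalence)
    referent_odds = referent_prevalence / (1.0 - referent_prevalence)
    return exposed_odds / referent_odds


# Hypothetical factor: 30 of 100 exposed isolates and 10 of 100 referent
# isolates were resistant.
print(odds_ratio_from_counts(30, 100, 10, 100))
print(odds_ratio_from_prevalences(0.30, 0.10))
```

Both calls describe the same hypothetical result, once as counts and once as prevalences, so they yield the same odds ratio up to floating-point rounding.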

3.6 The Add or Edit a Reference Form

3.6.1 Summary

The Add or Edit a Reference Form handles reference-level data extraction.

3.6.2 Main Tab

The Main Tab includes bibliographic information, study identifiers, and reference exclusion status.

Bibliographic Information

The title, author, and publication year should already be extracted. Do not edit these fields; updated bibliographic data is managed in Mendeley. Extract the publisher (journal).

Study Identifiers

Extract the DOI (preferred), or PMID (if available). The BibTeX key is a unique string representing the reference. The BibTeX key may be blank; do not add one here.

Exclusion Status

If the reference should be excluded, set is excluded to TRUE. Additionally, provide an exclusion reason.

3.6.3 Study Design Tab

The Study Design Tab includes study design information, and study reporting characteristics.

Study Design

Select a study design from the dropdown menu. Then, extract additional detail from the text. The detail should include information such as:
• unit selection process
• experimental group allocation
• experimental conditions

Then, extract the sampling method from the text. This can be as simple as the sample type (e.g. “fecal samples collected from fresh pats on the barn floor”), or may include additional information (e.g. “Two clinical swabs per steer were inserted approximately 5 cm into the rectum and rotated until covered with a uniform amount of feces.”) if available.

AST

AST Method

Select the antimicrobial susceptibility testing method used in the study.

Explicit (Enumerated) Breakpoints

A reference contains explicit, enumerated breakpoints when the level(s) at which isolates were considered resistant are reported in the text. Explicit breakpoints are reported differently for different antimicrobial susceptibility testing (AST) methods:
• for dilution-based assays, concentrations will be reported in µg/mL
• for diffusion-based assays, zone diameters will be reported in mm

For example, Awosile et al. (2018) include explicit breakpoints:

    The following antimicrobial agents were tested with the resistance breakpoints presented in parentheses: (32 µg/mL), - (32/16 µg/mL), (32 µg/mL), ceftriaxone (4 µg/mL), (8 µg/mL), ciprofloxacin (1 µg/mL) (32 µg/mL), (16 µg/mL), kanamycin (64 µg/mL), nalidixic acid (32 µg/mL), (64 µg/mL), sulfisoxazole (512 µg/mL), trimethoprim-sulfamethoxazole (4/76 µg/mL), and (16 µg/mL).

But, references that only include reporting to the effect of “the results were interpreted according to CLSI | EUCAST | NARMS | CIPARS guidelines” do not include explicit breakpoints. For example, from Diarra et al. (2007):

    The MIC results were interpreted according to the breakpoints of the CLSI and the 2005 Canadian Integrated Program for Antimicrobial Resistance Surveillance (CIPARS) guidelines.

Or from Cameron-Veas et al. (2018):

    An isolate was identified as susceptible or resistant, based on the epidemiological cut-off value (ECOFF) defined by the European Committee of Antimicrobial Susceptibility Testing (EUCAST, 2016).

Or from Jahanbakhsh et al. (2015), using disk diffusion:

    The strains were recorded as susceptible, intermediate, or resistant according to the zone diameter interpretative standards recommended by Clinical and Laboratory Standards Institute (CLSI) in 2010 (CLSI, 2010) for most of the antimicrobials and in 2008 for ceftiofur (CLSI, 2008).

Tip: Studies that use diffusion-based assays often report the concentration of the disks – this is not an explicit breakpoint. Instead, explicit breakpoints would be given as zone diameters (in mm).
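To illustrate why the two assay types report breakpoints in different units, the sketch below classifies a single isolate under each method. The dilution breakpoints are the named values from the Awosile et al. (2018) quotation above. The comparison directions (resistant at or above the MIC breakpoint, resistant at or below the zone diameter breakpoint) and the zone diameter values are assumptions for illustration; always use the rule stated by the study or guideline.

```python
# Resistance breakpoints in µg/mL, as named in the quotation above.
DILUTION_BREAKPOINTS_UG_PER_ML = {
    "ceftriaxone": 4.0,
    "ciprofloxacin": 1.0,
    "kanamycin": 64.0,
    "nalidixic acid": 32.0,
    "sulfisoxazole": 512.0,
}


def is_resistant_by_mic(antimicrobial: str, mic_ug_per_ml: float) -> bool:
    """Dilution-based assay: compare an MIC against a concentration breakpoint.

    Assumption: an isolate is resistant when its MIC is at or above the breakpoint.
    """
    return mic_ug_per_ml >= DILUTION_BREAKPOINTS_UG_PER_ML[antimicrobial]


def is_resistant_by_zone(zone_diameter_mm: float, resistant_zone_mm: float) -> bool:
    """Diffusion-based assay: compare a zone diameter against a breakpoint in mm.

    Assumption: an isolate is resistant when its inhibition zone is at or below
    the breakpoint (a smaller zone means less inhibition).
    """
    return zone_diameter_mm <= resistant_zone_mm


print(is_resistant_by_mic("ceftriaxone", 8.0))     # MIC above the 4 µg/mL breakpoint
print(is_resistant_by_mic("ciprofloxacin", 0.5))   # MIC below the 1 µg/mL breakpoint
print(is_resistant_by_zone(12.0, 14.0))            # hypothetical 14 mm breakpoint
```

Note that the disk concentration never appears in the diffusion function, which is why a reported disk concentration is not an explicit breakpoint.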

MIC Table

An MIC table contains counts (or frequencies) of isolates that are inhibited at each dilution range. If a factor has an MIC table, it almost always gives explicit breakpoints; these breakpoints are often indicated in the table using a dark line. An example of an MIC table is provided below:

Fig. 8: An example of an MIC table from Avrain et al. (2003).
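When a reference reports an MIC table with an explicit breakpoint, the counts needed for extraction can be tallied from the table. The sketch below uses a hypothetical MIC table (not the one from Avrain et al.) and assumes that isolates at or above the breakpoint are counted as resistant.

```python
def summarize_mic_table(mic_counts: dict, breakpoint_ug_per_ml: float) -> tuple:
    """Return (resistant count, total count, prevalence) from an MIC table.

    mic_counts maps each tested dilution (µg/mL) to the number of isolates
    inhibited at that dilution.
    """
    total = sum(mic_counts.values())
    if total == 0:
        raise ValueError("the MIC table contains no isolates")
    resistant = sum(count for mic, count in mic_counts.items()
                    if mic >= breakpoint_ug_per_ml)
    return resistant, total, resistant / total


# Hypothetical MIC table: dilution (µg/mL) -> number of isolates.
example_table = {0.5: 10, 1: 20, 2: 5, 4: 3, 8: 2}
resistant, total, prevalence = summarize_mic_table(example_table, breakpoint_ug_per_ml=4)
print(resistant, total, prevalence)  # 5 40 0.125
```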

3.6.4 Location Tab

Select the country in which the study was conducted. If the location of study is not explicitly reported, it can be inferred from the PI’s institution. Note that some countries are listed with their full (infrequently used) names; check all variations if you cannot locate a country.

If a more precise location is provided (e.g. state, province, canton), add an entry for each applicable sub-region. If the precise location provided does not map to the specified sub-regions, select Other, and provide more detail. If no precise location is provided, select Other from the sub-region list, and leave the detail blank.

For example, if a study was conducted in Ohio and Michigan, in the United States of America, select “United States of America” as the location, and include two sub-regions, “Ohio” and “Michigan”. If the study was described as being conducted “in the mid-western USA”, select “United States of America” as the location, and include an entry with sub-region “Other”, and detail “mid-west”.

Hint: If the location is not stated in the text, you can infer it from the PI’s home institution.
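The sub-region rules above amount to a small decision procedure, sketched below. The class and function names are hypothetical and do not correspond to CEDAR's table or field names. The set of known sub-regions stands in for the sub-region list offered by the form.

```python
from dataclasses import dataclass, field


@dataclass
class SubRegion:
    name: str
    detail: str = ""


@dataclass
class StudyLocation:
    country: str
    sub_regions: list = field(default_factory=list)


def build_location(country: str, reported_places: list,
                   known_sub_regions: set) -> StudyLocation:
    """Apply the sub-region rules to the places reported in a study."""
    if not reported_places:
        # No precise location: select Other and leave the detail blank.
        return StudyLocation(country, [SubRegion("Other")])
    entries = []
    for place in reported_places:
        if place in known_sub_regions:
            entries.append(SubRegion(place))
        else:
            # Does not map to a listed sub-region: select Other and add detail.
            entries.append(SubRegion("Other", detail=place))
    return StudyLocation(country, entries)


us_states = {"Ohio", "Michigan"}  # stand-in for the form's sub-region list
print(build_location("United States of America", ["Ohio", "Michigan"], us_states))
print(build_location("United States of America", ["mid-west"], us_states))
print(build_location("United States of America", [], us_states))
```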


3.6.5 History Tab

The data extraction process can be broken down into steps; the History tab tracks the progress of the references as they move along each of these steps toward completion. Each time a user completes an activity, they must update the reference history by adding an entry in the tab. This includes when a user completes an activity previously assigned to another user; the user should always add an entry for all completed activities.

Hint: Think of the history tab as tracking milestones; any time the reference reaches a new stage of completion or verification, the history should be updated. This is used to infer the completeness and reliability of the data for downstream activities.

Activities are generic terms for steps in the data extraction process; always select the appropriate, specific step when updating the reference history. In brief, the life cycle of a reference consists of these activities:
• the reference is imported
• the reference is assigned
• the reference is extracted (or extracted in duplicate)
• the reference is reviewed
• the reference is (optionally) signed-off on by a senior user

Status                      Definition
imported                    The reference has been imported into the database from the literature search.
import_single               The reference had been extracted in a previous version (V1) of CEDAR, and was imported here (replicate 1 of 2).
import_dual                 The reference had been extracted in a previous version (V1) of CEDAR, and was imported here (replicate 2 of 2).
import_reviewed             The reference had been extracted in a previous version (V1) of CEDAR, and was imported here (already reviewed).
assigned                    The reference has been assigned to a user for data extraction.
extracted_excluded_single   The reference has been extracted (replicate 1 of 2). Or, the reference has been excluded.
reviewed_single             The reference has been extracted (singular extraction), and a second user has reviewed and corrected any errors or omissions (or concurs the reference should be excluded).
signed_off_single           The reference has been extracted (singular extraction), and a senior user has reviewed and corrected any errors or omissions (or concurs the reference should be excluded).
recheck_single              The reference has been extracted, but upon review the original extractor (select their name here, not yours) must re-check the reference. Check the notes field for details.
extracted_dual              The reference has been extracted in duplicate (replicate 2 of 2).
reviewed_dual               The reference has been extracted in duplicate, all conflicts were resolved.
signed_off_dual             The reference has been extracted in duplicate, all conflicts were resolved by a senior user.
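The history log can be pictured as an append-only list of entries, one per completed activity, where the most recent entry gives the reference's current status. The sketch below is illustrative only: the class and function names are hypothetical, the status names are those listed in the table above, and CEDAR's actual table structure may differ.

```python
from dataclasses import dataclass
from datetime import date

# Status names as listed in the table above.
STATUSES = (
    "imported", "import_single", "import_dual", "import_reviewed", "assigned",
    "extracted_excluded_single", "reviewed_single", "signed_off_single",
    "recheck_single", "extracted_dual", "reviewed_dual", "signed_off_dual",
)


@dataclass
class HistoryEntry:
    status: str
    user: str
    entered_on: date
    notes: str = ""


def add_history_entry(history: list, status: str, user: str,
                      entered_on: date, notes: str = "") -> HistoryEntry:
    """Append one entry for a completed activity, rejecting unknown statuses."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    entry = HistoryEntry(status, user, entered_on, notes)
    history.append(entry)
    return entry


def current_status(history: list):
    """Return the most recent status, or None if nothing has been logged."""
    return history[-1].status if history else None


# Hypothetical life cycle of a singly extracted reference.
reference_history = []
add_history_entry(reference_history, "imported", "user_a", date(2021, 9, 1))
add_history_entry(reference_history, "assigned", "user_a", date(2021, 9, 2))
add_history_entry(reference_history, "extracted_excluded_single", "user_b", date(2021, 9, 6))
add_history_entry(reference_history, "reviewed_single", "user_c", date(2021, 9, 8))
print(current_status(reference_history))  # reviewed_single
```

Because entries are only ever appended, the list also records who completed each step, which is what allows downstream users to judge the completeness and reliability of the data.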

3.6.6 Notes and Issues Tab

The Notes and Issues tab allows users to add notes to the reference, describing issues like:
• problems with data extraction
• additional context
• omitted factors
• level data

Attach a separate note for each concern.

Fig. 9: The life cycle of a reference.

3.7 The Add or Edit a Factor Form

3.7.1 Summary

The Add or Edit a Factor Form handles factor-level data extraction.

3.7.2 Title

Create a title to describe the factor in title case. The title should be simple, direct, and give no experimental context – the title should be generic, so that comparable factors can be easily identified between studies.

Common factor types

Antimicrobial use (AMU)

Where the factor solely describes antimicrobial use, the title should be recorded in the format “[Antimicrobial] Use”, where [Antimicrobial] is the antimicrobial used. The antimicrobial(s) used should be tagged using the AMU field.

Before and after AMU

Where the factor describes the same group of host animals before and after AMU, designate pre-AMU data as representative of the referent group and post-AMU data as representative of the exposed group. Treat the factor exactly like an Antimicrobial Use factor.

Production type

Where the factor describes production type (i.e. a comparison between conventional, and organic, ABF, or free-range production), the title should be recorded as “Production Type”.

Important: These alternative systems are not the same. While all organic is ABF (antibiotic-free), not all ABF is organic. ‘Welfare’ and ‘humane’ production systems are likewise different.


Antimicrobial bans or changes in industry policy

Where the factor describes the effect of a growth promoter ban, or of a related change in industry policy, the title should be recorded as “ Ban”. In this case, the exposed and referent groups should represent pre-ban and post-ban conditions, respectively. Any antimicrobials available for use pre-ban, as well as any available both pre- and post-ban that were specifically mentioned, should be tagged using the AMU field, as if the factor is an Antimicrobial Use factor. Any information about the prevalence of actual growth promoter use pre-ban (and sometimes post-ban) should be provided in the factor description and/or in the reference notes.

3.7.3 Description

Create a description to provide context in sentence case. The description should include relevant experimental conditions not captured elsewhere in data extraction. This includes details such as the identity and quantity of antimicrobials administered, duration of exposure, prior antimicrobial use, etc. For example, the factor titled “Chlortetracycline Use” may have a description: “Chlortetracycline, administered in feed (days 17 - 78, 164 - 206), as Aureomycin 100-g at 11 ppm. Isolates cultured on agar amended with 4 µg/ml TET-HCL.” Note that this is a particularly data-rich example – many factors will not be recorded with that level of detail because it is not reported in the literature.

3.7.4 Host and microbe

Select a host and microbe from the dropdown menus. Once you select a host, you will be able to select a host sub-type. Likewise, once you select a microbe, you will be able to select a microbe sub-type.

Attention: The sub-type will only be shown correctly if the type used in the last record selected is a parent of the sub-type. For example – for a reference with two factors – if the first factor was for cattle, and the second factor for chicken: while the cattle factor is in focus (selected), the cattle sub-type will be shown, but the chicken sub-type will disappear (and vice-versa). This also applies to the microbe and microbe sub-type dropdown, and similarly applies to the AMU field.

Do not be alarmed if the sub-types seem to disappear when extracting from a paper with multiple host or microbe types – the data are still there, but not visible. You can check that the data are still there by selecting the record (by interacting with one of the fields, or clicking in the white-space around the fields).

3.7.5 Location

Location (Loc.) refers to the location of the factor’s data in the text. This is generally a table or figure. However, if the data are in the body of the text, use page (pg.) and paragraph (para.) numbers to indicate the location. Always use the physical page number if available. If only the electronic page number is available (the page in the PDF), use the electronic page number (epg.).

3.7.6 Result

Result refers to the format of the factor’s data. Data are presented in one of several formats:


• as contingency tables (counts of AMR+, AMR-, and totals)
• as prevalence tables (percentages of AMR+, AMR-, and totals)
• as relative risks
• as odds ratios

When multiple data formats are available, we always prefer contingency tables (count data), followed by prevalence tables, and finally odds ratios or relative risk. You only need to extract one format of data for a given factor.

Attention: If extracting an odds ratio, be sure to extract the p-value corresponding to that odds ratio, if provided. For factors defined by odds ratios, p-values cannot be calculated later, unlike factors defined by contingency or prevalence tables.
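The preference for count data can be illustrated with a short Python sketch: an odds ratio or relative risk can be derived from a contingency table, and a prevalence table with totals can be converted back to counts, but the reverse is not possible from an odds ratio alone. The class and function names below (ContingencyTable, from_prevalence) are hypothetical, and the sketch assumes no cell that ends up in a denominator is zero.

```python
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """2x2 counts for one factor (hypothetical container, not a CEDAR type)."""
    exposed_pos: int    # AMR+ in the exposed group
    exposed_neg: int    # AMR- in the exposed group
    referent_pos: int   # AMR+ in the referent group
    referent_neg: int   # AMR- in the referent group

    def odds_ratio(self):
        # Odds of resistance in the exposed group over odds in the referent group.
        return (self.exposed_pos * self.referent_neg) / (
            self.exposed_neg * self.referent_pos
        )

    def relative_risk(self):
        # Prevalence of resistance in the exposed group over the referent group.
        risk_exposed = self.exposed_pos / (self.exposed_pos + self.exposed_neg)
        risk_referent = self.referent_pos / (self.referent_pos + self.referent_neg)
        return risk_exposed / risk_referent


def from_prevalence(exposed_pct, exposed_total, referent_pct, referent_total):
    """Rebuild counts from percentages of AMR+ and group totals (rounded)."""
    exposed_pos = round(exposed_pct / 100 * exposed_total)
    referent_pos = round(referent_pct / 100 * referent_total)
    return ContingencyTable(
        exposed_pos,
        exposed_total - exposed_pos,
        referent_pos,
        referent_total - referent_pos,
    )


# Illustrative numbers only: 30/100 resistant when exposed, 10/100 when not.
table = from_prevalence(30.0, 100, 10.0, 100)
```

With these illustrative numbers, table.odds_ratio() is 27/7 (about 3.86) and table.relative_risk() is 3, up to floating-point rounding.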

3.7.7 Stage

Select both an allocation and observation production stage:
• The allocation stage refers to the production stage at which the exposed and referent groups are effectively established, and where the factor effectuates change.
• The observation stage refers to the production stage at which the effects of the factor are observed, and where sampling was performed.

Tip: A study which involves the retail sampling of organically- and conventionally-raised chicken products to determine the effect of production type would have an allocation stage of Farm, and an observation stage of Retail, as the factor effectuates changes on-farm, but these are measured at retail.

3.7.8 AMR

Select the ingredient to which resistance was assayed. As you begin to type, the field will be auto-completed from the list of available ingredients. If you cannot locate the appropriate ingredient, try exploring the available ingredients.

3.7.9 Exposed and referent groups

Describe both the exposed and referent groups, in title case.

The exposed and referent groups are allocated as described in the literature (i.e. if the authors use ‘wood curl bedding’ as the exposure, and ‘flax bedding’ as the referent, it should be recorded as such). If no allocation is provided, the interventionist practice should be used as the exposure, and the default practice should be used as the referent (i.e. ‘doing something’ is the exposure, ‘doing nothing’ is the referent).

The exception to these rules is Antimicrobial Use. Where the factor describes antimicrobial use – regardless of how the authors allocate the exposed and referent groups – the exposure should always be antimicrobial use, and the referent should always be no use. Additionally, the factor should be recorded in the format “[Antimicrobial] Use” (where [Antimicrobial] is the antimicrobial used), and “No Use”. For example, if a study compares the prevalences of resistance in broilers administered ceftiofur, the exposure should be recorded as “Ceftiofur Use” and the referent as “No Use”.


3.7.10 Result or analysis unit

Select the unit of analysis (i.e. the unit allocated to the exposed and referent groups). Generally, this will be at the isolate or sample level, but some analyses are conducted at the flock, herd or farm levels.

3.7.11 AMU

Select the ingredients used as part of the factor. As you begin to type, the field will be auto-completed from the list of available ingredients. Then, select ‘Add AMU’ to add the ingredient to the list. Likewise, highlight the ingredient and select ‘Delete AMU’ to remove it from the list. Refer to the Selecting an Antimicrobial section for details on how to extract data for factors including multiple ingredients.

Selecting an Antimicrobial

We use the WHO’s ATCvet index as our controlled vocabulary for recording antimicrobial resistance (AMR) and antimicrobial use (AMU). The process of selecting an antimicrobial to describe AMR (i.e. the resistance assayed) is straightforward, owing to the fact only one antimicrobial is assayed at a given time, and there are a limited number of antimicrobials included in most antimicrobial susceptibility tests (ASTs). The process of selecting antimicrobial(s) to describe AMU is more complex, as multiple antimicrobials may be used at a given time, and in a greater number of combinations. Regardless of whether you are selecting an antimicrobial for AMR or AMU, the goal is the same – to find the most appropriate and specific ATCvet code that describes the antimicrobial(s).

Note: You do not need to have direct knowledge of, or work with the ATCvet codes directly. When we say ‘select an ATCvet code’, what we really mean is ‘select the most appropriate ingredient(s), represented in the ATCvet index’.

Below, we use the terms ingredient, antiinfective and antimicrobial, and these are largely interchangeable for our purposes. An ingredient is a generic term for an item described in the index. An antiinfective is an umbrella term for an ingredient with anti-infective properties (e.g. an antimicrobial, antiparasitic, or a compound like copper sulphate that has antimicrobial properties). And an antimicrobial is an ingredient with antimicrobial properties, generally recognized as a ‘drug’. An AST generally includes at least one traditional antiinfective, and may include one or more additional active ingredients (e.g. and copper supplementation) or an adjuvant (e.g. with a beta-lactamase inhibitor).

Hint: When the study uses a drug that specifies a different form than what appears in ATCvet (e.g. -tartrate, -sulfate, -free acid, -chloride, copper, ccfa, etc.), do not attach a note to the reference. Instead, in the factor description field, write “drug administered as X”. e.g. (official name in ATCvet) may be administered as tylosin tartrate.

ATCvet Code Reference

You can explore the ATCvet codes using the Search ATCvet by AM form.


This form allows you to enter a single ingredient, and view all codes where that ingredient is included. Additionally, it will show you the class (level 4 grouping) to which the ingredient belongs, other ingredients in that class, and any combinations in which it may be involved outside of the class (level 3 grouping).

Tip: You can view the entire ATCvet index by opening the table s_atc_vet in the Navigation Pane.

Selecting an ATCvet Code with one ingredient

Select the appropriate ingredient.

Two ingredients

An antiinfective and adjuvant

Select the appropriate combination of ingredients. Generally, the adjuvant is not explicitly listed, but is specified by class. e.g. amoxicillin and clavulanic acid would be recorded as amoxicillin and beta-lactamase inhibitor.

An antiinfective and active ingredient

If the ingredients include an antiinfective and another active ingredient . . .

. . . and the antiinfective and active ingredient are explicitly specified as a combination:
• select the appropriate combination
  – e.g. and

. . . and the antiinfective and active ingredient are not explicitly specified as a combination, but belong to the same class, or level 4 grouping . . .

    . . . and a non-specific class combination exists . . .
    • select the appropriate non-specific combination
      – e.g. and used together would be recorded as chlortetracycline, combinations

    . . . and a non-specific class combination does not exist . . .
    • select the appropriate non-specific combination from the Combinations of Antibacterials level 3 grouping as described below
      – note that this is an uncommon outcome, as most classes include non-specific combinations

. . . and the antiinfective and active ingredient are not explicitly specified as a combination, and do not belong to the same class, or level 4 grouping . . .

    . . . and one of the ingredients is included in the Combinations of Antibacterials level 3 grouping . . .
    • select the appropriate combination
    • additionally, select the individual ingredients
      – e.g. chlortetracycline and sulfamethazine used together would be recorded as tetracyclines, combinations with other antibacterials, chlortetracycline, and sulfadimidine


    . . . and more than one of the ingredients is included in the Combinations of Antibacterials level 3 grouping . . .
    • select the appropriate combination using the order of preference below
    • additionally, select the individual ingredients
      1. quinolones
      2. cephalosporins
      3. macrolides
      4. polymyxines
      5.
      6.
      7. tetracyclines
      8. amphenicols
      9.
      10. sulfonamides
      – e.g. ciprofloxacin and amoxicillin used together would be recorded as quinolones, combinations with other antibacterials (not penicillins, combinations with other antibacterials), ciprofloxacin, and amoxicillin
      – e.g. amoxicillin and chlortetracycline used together would be recorded as penicillins, combinations with other antibacterials (not tetracyclines, combinations with other antibacterials), amoxicillin, and chlortetracycline
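The order-of-preference rule can be sketched as a small lookup in Python. This is an illustration only: the function names are hypothetical, the ranks that are blank in the list above are deliberately left out rather than guessed, and the ciprofloxacin and chlortetracycline pairing is an invented example that uses only classes whose rank is given.

```python
# Ranks as listed in the text; ranks 5, 6 and 9 are blank there and omitted here.
PREFERENCE_RANK = {
    "quinolones": 1,
    "cephalosporins": 2,
    "macrolides": 3,
    "polymyxines": 4,
    "tetracyclines": 7,
    "amphenicols": 8,
    "sulfonamides": 10,
}


def preferred_class(classes):
    """Return the class with the lowest rank number (highest preference)."""
    classes = list(classes)
    unknown = [c for c in classes if c not in PREFERENCE_RANK]
    if unknown:
        raise ValueError(f"no rank known for: {unknown}")
    return min(classes, key=PREFERENCE_RANK.__getitem__)


def record_combination(class_by_ingredient):
    """List the entries to select: the class combination, then each ingredient."""
    top = preferred_class(class_by_ingredient.values())
    combination = f"{top}, combinations with other antibacterials"
    return [combination, *class_by_ingredient]


# Hypothetical pairing chosen because both ranks are given in the text.
selection = record_combination(
    {"ciprofloxacin": "quinolones", "chlortetracycline": "tetracyclines"}
)
```

Here selection holds the quinolones combination followed by the two individual ingredients, mirroring the pattern of the worked examples above.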

Note: Sulfonamides/sulfa drugs are almost always provided as an existing combination - you do not have to select the individual antimicrobials • i.e. sulfamethoxazole and trimethoprim

Idiosyncrasies of the ATCvet index

Common alternative ingredient names

The following ingredients have commonly used alternative names – only the official name is given by ATCvet:

Common Name       ATCvet Name
Cephalothin
Cephradine
Flavomycin        bambermycin
Penicillin G
Penicillin V/K
Sulfamethazine    sulfadimidine
Sulfisoxazole     sulfafurazole
Linco-Spectin     , combinations


Note: When the Penicillin type is not specified, select Penicillin V (phenoxymethylpenicillin).

Always together

Some antimicrobials are always administered in combination, even if the combination isn’t specified in text. If you see one of these antimicrobials, assume they mean this code. These include: • and cilastatin (136)

Order of ingredients

Combinations with sulfonamides are almost always specified with the sulfonamide first
• e.g. sulfadimidine and trimethoprim

Other Additions to the Index

• A2C (a resistance pattern of amoxicillin-clavulanic acid, ceftiofur and cefoxitin)

3.8 Literature Search

Several comprehensive literature searches have been conducted to inform the IAM.AMR models. The first search was conducted in May of 2015, and was subsequently updated in June of 2016. A qualitative description of the included studies is available (see below). As of May of 2019, the search is being repeated with a modified search string, designed to include additional food-animal species and commodities, and additional bacterial species. Additionally, this search excludes human-related factors influencing resistance; human-related factors (e.g. AMU, immune status) will be identified in a separate search at a later date.

3.8.1 Goal of the Literature Search

The goal of the literature search is to identify, from existing literature, factors that potentially influence antimicrobial resistance in the agri-food production system.

3.8.2 What is a ‘Factor’?

The general definition of a ‘factor’ is a “circumstance, fact, or influence that contributes to a result or outcome”. Similarly, the term ‘driver’ is defined as “a factor which causes a particular phenomenon to happen or develop”. Despite their near interchangeability, the terms ‘factor’ and ‘driver’ are often interpreted differently in an epidemiological context; the term ‘factor’ is used more loosely, while ‘driver’ is generally reserved for those ‘factors’ with a purported causative relationship with their outcome. In the context of the IAM.AMR project, we have defined a ‘factor’ as a practice or circumstance which influences the occurrence of AMR. This is an intentionally broad definition that does not consider the concept of causality; we consider any relationship between an exposure and outcome as a factor, whether or not we can elucidate a causal pathway.


Tip: A factor may have a clear, causal relationship, like the relationship between antimicrobial use and resistance. A factor may also have a statistical relationship, but lack a clear causal relationship, like a purported relationship between vacuum packaging and resistance.

3.8.3 The 2015–16 Search

For a complete overview of the search strategy used in 2015–16, and a qualitative description of the captured literature, please see Murphy CP, Carson C, Smith BA, et al. Factors potentially linked with the occurrence of antimicrobial resistance in selected bacteria from cattle, chickens and pigs: A scoping review of publications for use in modelling of antimicrobial resistance (IAM.AMR Project). Zoonoses and Public Health. 2018; 65:957–971. https://doi.org/10.1111/zph.12515. The following extracts are provided by the authors:

Search strategy

Comprehensive literature search strings were developed and pretested in Medline to return records for both human and animal populations of (a) the frequency of antimicrobial use or resistance (results not presented) and (b) the factors potentially associated with antimicrobial use or resistance (results presented herein; Appendix 1). The characterization of associations was broad and was not limited to interpretations of statistical significance, but included non-significant, causal and correlative relationships, or possible spurious findings. The searches included multiple broad and specific search terms for antimicrobial susceptibility, antimicrobial use, and population (animal or human), and specific search terms for Campylobacter species, E. coli, and S. enterica, and searches were not limited to a particular study design (e.g., observational, experimental, field trials, mathematical models). Following the pre-test, Medline and three other databases were searched as follows: Agricola, Centre for Agriculture and Bioscience, and Cumulative Index to Nursing and Allied Health Literature, using database-specific search strings adapted from the initial pretested Medline search string.

Search string

((((Antimicrobial[Title/Abstract] OR Antibiotic[Title/Abstract]) AND (Resistance[Title/Abstract] OR Susceptibility[Title/Abstract])) AND (Blactam$[All Fields] OR (“cephalosporins”[MeSH Terms] OR “cephalosporins”[All Fields] OR “”[All Fields]) OR (“tetracycline”[MeSH Terms] OR “tetracycline”[All Fields]) OR (“quinolones”[MeSH Terms] OR “quinolones”[All Fields] OR “quinolone”[All Fields]) OR (“fluoroquinolones”[MeSH Terms] OR “fluoroquinolones”[All Fields] OR “fluoroquinolone”[All Fields]) OR (“macrolides”[MeSH Terms] OR “macrolides”[All Fields] OR “”[All Fields]) OR (“nalidixic acid”[MeSH Terms] OR (“nalidixic”[All Fields] AND “acid”[All Fields]) OR “nalidixic acid”[All Fields]) OR (“ciprofloxacin”[MeSH Terms] OR “ciprofloxacin”[All Fields] OR (“enrofloxacin”[MeSH Terms] OR “enrofloxacin”[All Fields])))) AND (cow$[Title/Abstract] OR cattle[Title/Abstract] OR beef[Title/Abstract] OR dairy[Title/Abstract] OR pig$[Title/Abstract] OR sow$[Title/Abstract] OR piglet$[Title/Abstract] OR pork[Title/Abstract] OR chicken$[Title/Abstract] OR broiler$[Title/Abstract] OR chick$[Title/Abstract] OR horse$[Title/Abstract] OR turkey$ss[Title/Abstract] OR human$[Title/Abstract] OR foal$[Title/Abstract] OR cat$[Title/Abstract] OR dog$[Title/Abstract] OR sheep[Title/Abstract] OR lamb$[Title/Abstract] OR goat$[Title/Abstract] OR fish[Title/Abstract] OR rabbit$[Title/Abstract] OR people[Title/Abstract] OR adult$[Title/Abstract] OR children[Title/Abstract] OR kid$[Title/Abstract])) AND (E. coli[Title/Abstract] OR Escherichia coli[Title/Abstract] OR Salmonella[Title/Abstract] OR Campylobacter[Title/Abstract])

3.8.4 The 2019 Search

The 2019 search is currently in progress. Please find the instructions for reviewers below.


Search strategy

To Be Updated.

Search string

To Be Updated.

3.9 Primary Screening

3.9.1 Instructions for Title-Abstract Screening using Rayyan

What is title-abstract screening?

After a literature search is conducted in the database(s) of interest, the references are imported into a reference manager and de-duplicated. Then, each reference is screened for inclusion in the review. To expedite the screening process, screening is generally conducted in two phases: title-abstract screening, and full-text screening.

Title-abstract screening is the process of reviewing studies for inclusion based solely upon their title and abstract. Title-abstract screening allows reviewers to rapidly screen out irrelevant references, leaving fewer references for which the full text must be retrieved. Once the full texts of the references included during the title-abstract phase are retrieved, the references are re-screened to determine their ultimate inclusion in the review.

As a reviewer, it is your job to decide, based on the criteria provided, whether a study is potentially relevant, and therefore should be included, or likely irrelevant, and should be excluded.

What software do we use for title-abstract screening?

The systematic review team at PHAC uses DistillerSR, through which the entire review process can be completed. For the initial search, we completed the screening using an Excel spreadsheet. For our updated search, we chose Rayyan [1] – a free, online screening tool – to complete title-abstract screening, because it offered the best combination of data management and collaboration features at no cost. Other screening tools we considered included abstrackr, Covidence, and DistillerSR. abstrackr was rejected because of difficulties importing bibliographic information from RefWorks, while Covidence and DistillerSR were rejected due to cost-prohibitive pricing structures.

Sign up for a Rayyan account

If you do not already have an account at Rayyan (created during the onboarding process), create one on the sign up page. You will not be asked to create a password during sign up; instead, you will be sent a verification link to confirm your email address that will allow you to create a password.

Note: Rayyan will ask for an affiliation; the affiliation provided does not need to match that of your collaborators.

[1] Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):210. Published 2016 Dec 5. doi:10.1186/s13643-016-0384-4


Select a review

When you initially sign in to Rayyan, no reviews will be available. You can create a new review in the ‘My Reviews’ tab, or join an existing review in the ‘Collaboration Reviews’ tab. If you signed up for Rayyan during the onboarding process, you should have access to the reviews within one to two days. If you signed up separately, or do not have access after several days, contact the Literature Search Director for access.

Pretest

The first stage of the review process is the pretest. The pretest serves to allow the reviewers to become familiar with the software and decision process, and to evaluate the training materials and reviewer agreement. The pretest consists of 50 articles, and is designed to highlight any problems with the review protocol or software. Some references with established relevancy (i.e. are known to contain relevant information) have been included to ensure a sufficient number of includes are available in the pretest. Please screen all of the articles in the pretest.

Note: There will be a pretest exit meeting, to ensure lessons learned are incorporated into the full screening process.

Screening

The second stage of the review process is the full screening. This will be opened after the pretest has been completed.

3.9.2 Rayyan Reference

Tip: Recall, to view larger versions of images, right-click on the image and select Open in new tab.

Using the interface

There are three main sections of the Rayyan interface: filters, references, and preview. Rayyan works like an email client; references are selected in the references panel, and are reviewed and acted upon in the preview panel. The filters panel allows for the selection of specific subsets of references, including by year, decision, keywords, etc. You will go through each reference (selected in the references panel), read the title and abstract, and decide if the study should be included or excluded from the review.

Selecting references to review

Unfortunately, Rayyan does not automatically assign references to reviewers; each reviewer must select their own references to screen. Each reference must be screened by two reviewers (i.e. the references are dual-screened). To only show references which have 0 or 1 reviews (and thus require an additional review), use the Maximum collaborator decisions filter in the filters panel to filter the references with At most 1 decision, as shown below:


Fig. 10: The Rayyan interface is comprised of three sections: the filters, references, and preview panels.

Fig. 11: Using the ‘At most 1’ filter limits the returned references to those not already screened in duplicate.


To only show references to which your region has been assigned (e.g. for Ontario and Alberta, to facilitate conflict resolution), select your region (AB or ON) from the “Search Methods” panel, as shown below. The regions are “Uploaded References [AB.txt]” and “Uploaded References [ON.txt]”. The region “Uploaded References [OTHER.txt]” is used for other reviewers.

Fig. 12: Select your region to facilitate later conflict resolution.

You must set these filters each time you open Rayyan. Beyond that, selection of references to screen is up to the participant; you may choose to go in order, or start from anywhere in the filtered list.
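The logic behind the ‘At most 1’ filter can be sketched in Python as follows. This does not use or describe Rayyan's own interface or data model: the dictionary layout, the reviewer names, and the function needs_another_decision are hypothetical, and skipping references the reviewer has already screened is an added assumption.

```python
def needs_another_decision(decisions_by_ref, reviewer, at_most=1):
    """Return reference IDs that still need a decision from `reviewer`.

    `decisions_by_ref` maps a reference ID to a dict of
    {reviewer name: "include" or "exclude"} (hypothetical layout).
    """
    pending = []
    for ref_id, decisions in decisions_by_ref.items():
        # Keep references with 0 or 1 decisions that this reviewer has not made.
        if len(decisions) <= at_most and reviewer not in decisions:
            pending.append(ref_id)
    return pending


# Illustrative data only.
decisions = {
    "ref-001": {},
    "ref-002": {"alice": "include"},
    "ref-003": {"alice": "include", "bob": "exclude"},
}
```

With this data, a reviewer named bob would still need to screen ref-001 and ref-002, while ref-003 has already been dual-screened.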

Making decisions

Rayyan supports three decision states: include, maybe, and exclude.

Important: Please ignore the maybe state – we will not be using maybe in this review.

After reviewing the reference, you will either choose to include the study, or exclude the study and provide a reason for exclusion. The figure below shows an example of five references, each in a different decision state. The first two references are included – the second reference is included with a label. The third and fourth references are excluded – the fourth includes a reason for exclusion. The fifth reference is marked as maybe.

Fig. 13: An example of each inclusion/exclusion decision state in Rayyan.

Note: Labels are akin to reasons for exclusion, but can be used to tag included references. Neither maybe nor labels are used in this review.

How do I decide?

Recall that the goal of this review is to identify any factors that influence the occurrence of resistance in our bacterial species of interest, isolated from our food-animal commodities of interest:


Food Animal Species
• Chicken
• Cattle
• Swine
• Turkey

Bacterial Species
• E. coli
• Salmonella Spp.
• Campylobacter Spp.
• Enterococcus Spp.

Determining whether a reference includes the above populations is the easy part – determining whether the reference reports the effect of a factor is more difficult. You will have to read the title and abstract closely to determine whether or not the reference includes a factor. A key concept to keep in mind is that a reference that includes a factor must include a comparison among two or more groups (or report the prevalence of resistance in two or more groups, from which we can derive a comparison).

We have developed a simple flowchart to assist in the screening process (see Fig. 14). Each question (stage of the flowchart) is designed to be more specific than the last; we can quickly identify studies to be excluded as we work through each of these questions. And by dividing the process into discrete questions, we can define explicit reasons that studies were excluded from the review.

Using reasons

Reasons make it clear why a reference was excluded from the review. The reasons for exclusion are important metrics to report when publishing a literature review. Unfortunately, Rayyan does not include a way of pre-populating reasons; we must add each reason as we find applicable excluded studies. Luckily, once we add a reason, it will remain in the list for easy access. We use the following exclusion reasons:
• 00. Other
• 01. Wrong Commodity
• 02. Wrong Bacterial Species
• 03. No Factor

Tip: We add numbers to the reasons because they are sorted alphabetically; adding numbers ensures they are easily accessible, and listed above the built-in reasons.

Reasons 1 through 3 correspond to the questions in the flowchart, and are organized hierarchically. If more than one reason applies, we use the first reason to exclude the reference.

Hint: For example, if the study assayed Brucella in sheep, the reference is excluded with the reason ‘01. Wrong Commodity’, even though reasons 1 and 2 (and potentially 3) all apply.

If reasons 1 through 3 do not apply, but the reference nonetheless should be excluded, select reason ‘00. Other’.
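The hierarchical use of reasons can be sketched as a short Python function. The function name and its boolean parameters are hypothetical stand-ins for the reviewer's answers to the flowchart questions; only the reason labels come from the list above.

```python
def exclusion_reason(right_commodity, right_species, has_factor,
                     exclude_anyway=False):
    """Return the first applicable exclusion reason, or None to include.

    The checks run in flowchart order, so the first failing question
    decides the reason even when later ones would also apply.
    """
    if not right_commodity:
        return "01. Wrong Commodity"
    if not right_species:
        return "02. Wrong Bacterial Species"
    if not has_factor:
        return "03. No Factor"
    if exclude_anyway:
        # Reasons 1 through 3 do not apply, but the reference is still excluded.
        return "00. Other"
    return None
```

For the Brucella-in-sheep example, exclusion_reason(False, False, False) returns ‘01. Wrong Commodity’, because the commodity question is asked first.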


Fig. 14: The screening process.


Hint: Only use reason ‘00. Other’ where the reference is so unrelated that the flowchart is not applicable. For example, a book ‘The risk of bioterrorism from Salmonella’ may be excluded using reason 0, as there is no commodity, and it is not clear if resistance was even assayed.

Using the keywords highlighting feature

During screening, it may become apparent that the presence or absence of certain keywords predicts reference inclusion or exclusion. For example, an abstract containing the word ‘murine’ is very likely to refer to a study conducted in mice. If your review is focused on outcomes in humans, this reference is likely to be irrelevant, and to be rejected.

Caution: Keywords are simply correlated with reference inclusion or exclusion – you must still review the contents of the abstract in full to make a decision.

Rayyan includes a keywords highlighting feature which allows you to automatically highlight inclusion-related keywords in green, and exclusion-related keywords in red. This highlighting feature is enabled by default, and prepopulated with lists of automatically generated keywords. You will likely have to heavily curate these lists; by default, Rayyan assumes you are conducting a systematic review of clinical trials, and bases the keywords on this assumption.

Tip: To disable this feature, simply toggle the large green ‘Highlights’ button in the preview panel.

You can customize your keywords in the filter panel. Delete the existing keywords from the ‘Keywords for include’ and ‘Keywords for exclude’ lists using the garbage can icon beside each term. To add new keywords, use the ‘Add new’ link in the lists’ titles. Note that the keywords must match exactly – you may need to add singular and plural versions of each keyword. Some recommended keywords are provided below.

Include:
• Food-animal commodity names (e.g. broilers, chicken, cattle, swine, pork, beef, steak)
• Food-animal commodity adjectives (e.g. avian, bovine, porcine)
• Bacterial genus names (e.g. E. coli, Salmonella, Campylobacter, Enterococcus)
• Indications of a factor/comparison (e.g. factor, factors, comparison, effect)

Exclude:
• Unrelated food-animal commodity names or indications of human populations (e.g. dog, fish, aquaculture, public health, hospitalization)
• Unrelated bacterial genus names (e.g. Staphylococcus, Klebsiella)
• Indications of no factor/comparison (e.g. prevalence, surveillance)
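Why singular and plural forms may both be needed can be shown with a small Python sketch of whole-word matching. This is not Rayyan's implementation: the function matched_keywords is hypothetical, the case-insensitive comparison is an assumption, and the sketch handles single-word keywords only (so a term like ‘E. coli’ would need different handling).

```python
import re


def matched_keywords(text, keywords):
    """Return the keywords that appear in `text` as whole words."""
    # Split the text into lowercase alphabetic words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(k for k in keywords if k.lower() in words)


# Illustrative title only.
title = "Resistance in broilers and swine"
hits = matched_keywords(title, ["broiler", "broilers", "swine"])
```

Under whole-word matching, ‘broiler’ does not match the title but ‘broilers’ does, so hits contains only ‘broilers’ and ‘swine’.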

3.9.3 FAQs

What should I do if the abstract is missing from Rayyan?

If the abstract is missing from Rayyan, judge inclusion based upon the title.


What should I do for conference proceedings?

If the reference refers to a conference proceeding, reject the reference unless the abstract indicates the proceedings are of obvious value (e.g. a proceeding for “the symposium on factors influencing antimicrobial resistance in the agri-food system”).

What should I do if I am unsure whether to include or exclude the reference?

Use your best judgement! Because every reference is screened twice (by different reviewers), the likelihood of a reference being erroneously included or excluded is minimized; if your choice was not appropriate, your counterpart is unlikely to make the same mistake.

What happens if the reviewers are in disagreement over the inclusion status of a reference?

If the reviewers are in disagreement (i.e. are in conflict) with respect to the inclusion status of a reference (i.e. one includes, and one excludes the reference), the reviewers will resolve this disagreement at the end of the screening process. If the reviewers cannot come to an agreement (i.e. both maintain that they believe their interpretation to be correct), the conflict will be resolved by the Screening Arbitrator.

3.10 Data Extraction Rules

3.10.1 Data Extraction Rules

Immutable Factors

Immutable factors are defined as those that are not practically modifiable or reproducible. These include factors such as:
• unique comparison groups or locations
  – Barn A vs. Barn B
  – Sweden vs. Switzerland
• breed or type of animal
  – Ross chicks vs. Cobb chicks
  – Swedish Friesian vs. Swiss Holsteins
• life stage or production stage
  – egg vs. chick vs. broiler
  – grow-finish vs. farrow-to-finish
  – farm vs. abattoir vs. retail
• unique years/periods of time
  – 1998 vs. 1999
• size of herd of origin
  – small vs. large


Note that factors which assess the same animals before and after AMU are acceptable. Immutable factors should not be extracted.

Important: If a reason for the comparison is given, such as a growth promoter ban/change in related industry policy, factors comparing unique years/periods of time may be valid for extraction. See Antimicrobial ban factors.

Common Factor Types

To compare conventional, ABF, organic, ‘welfare’ or ‘humane’ production systems, note that these alternative systems are not the same. While all organic is ABF (antibiotic-free), not all ABF is organic. ‘Welfare’ and ‘humane’ production systems are likewise different.

Selective Media

About Selective Media

A growth medium (plural media) or culture medium is a liquid, semi-solid, or solid preparation designed to support the growth of microorganisms. There are many different types of media; some examples related to the iAM.AMR project include MacConkey agar (for gram-negative bacteria) and XLD agar (commonly used for Salmonella spp.). When compounds are added to media to select for the growth of specific organisms, they are referred to as selective media. However, it is important to differentiate the uses of selective media in the context of Antimicrobial Susceptibility Testing (AST):
1. to broadly exclude unwanted organisms
2. to select for desired organisms
3. as a simple AST

Selective media are routinely used to broadly exclude unwanted organisms (purpose 1). For example, MacConkey agar is used to exclude gram-positive organisms when culturing gram-negatives. Similarly, media supplemented with third-generation cephalosporins (3GCs) are used to exclude other, faster-growing organisms while culturing Campylobacter spp.

Note: This is an important example to note. Campylobacter are considered intrinsically resistant to 3GCs. The use of 3GCs is not selecting for Campylobacter based on this resistance – it’s excluding other, non-resistant organisms. This is in contrast to the example below, where fluoroquinolones are used to select for specific Campylobacter.

Selective media can also be used to select for specific organisms based upon a specific trait, such as antimicrobial resistance (purpose 2). In contrast to the previous example, we could add a fluoroquinolone to the medium to select for only fluoroquinolone-resistant Campylobacter – susceptible Campylobacter will be inhibited by the antimicrobial. Finally, selective media can be used as an AST in and of itself (purpose 3). Adding a suspect organism to selective media (and watching to see if it grows) can answer binary questions (e.g. is this organism resistant?). This gives no information on the level of resistance. To determine the level of resistance, this process must be repeated with various levels of selective compound (akin to broth dilution methodologies).


Selective Media in CEDAR

• In the context of data extraction, we only consider “selective media” as used for purpose 2: to select for desired organisms.
• As per our data extraction rules, we prefer results from non-selective media where available.
• There is a separate field for indicating where selective media are used for growth/culture, and for the AST itself (which can also be via selective media, as described above).

Hint: Another way to think of this distinction: we only consider media “selective” in the context of data extraction where the added compound could differentiate between organisms of the same species, based on an acquired resistance.

Binary and Continuous Factors

When a factor is binary (i.e. two discrete outcomes, such as “Yes” and “No”), it shall be extracted. When a factor is continuous (e.g. a one unit increase in the predictor results in an X unit increase in the outcome) it shall not be extracted.

Selecting a Referent Level

Binary factors consist of a referent and exposure group (e.g. your control and exposure groups respectively). The referent should generally be defined as the default practice in industry, or the least interventionist, while the exposure is the less common, or more interventionist approach. See above for more details.

Multiple Discrete Levels (Categories)

When a factor has multiple levels (e.g. low, medium, and high), the factor shall be extracted separately for each level, using the same referent level. For example, for a factor with levels low, medium, and high, the factor is extracted as low vs. medium, and low vs. high. The factor medium vs. high shall not be extracted. The choice of referent level is described above.
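The pairing rule above can be sketched in a few lines of Python. This is only an illustration; the function name `level_comparisons` is hypothetical and is not part of CEDAR or any project tooling.

```python
def level_comparisons(levels, referent):
    """Pair every non-referent level with the single shared referent.

    Hypothetical helper for illustration: each returned tuple is
    (referent, exposure), so exposure-vs-exposure pairs are never produced.
    """
    if referent not in levels:
        raise ValueError(f"referent {referent!r} is not one of the levels")
    return [(referent, level) for level in levels if level != referent]


# Levels from the example in the text, with "low" chosen as the referent.
pairs = level_comparisons(["low", "medium", "high"], referent="low")
print(pairs)
```

Note that the pair (medium, high) never appears, because every comparison shares the same referent.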

Non-informative Levels

An exception is non-informative levels, which shall not be extracted. For example, for a factor with levels ‘red’, ‘blue’, and ‘other’, the factor is only extracted as red vs. blue, because the ‘other’ is not part of a defined set, and cannot be inferred from the comparison. But, where levels are drawn from a defined set, they shall be extracted (these are few and far-between). For example, for a factor with levels ‘summer’, ‘winter’, ‘other’, the factor is extracted as ‘summer’ vs. ‘winter’ and ‘summer’ vs. ‘other’, as the ‘other’ can be inferred.

Factor Data

When multiple data formats are available, we always prefer contingency tables (count data), followed by prevalence tables, and finally odds ratios or relative risk. You only need to extract one format of data for a given factor. If data are presented only as relative risk, do not extract the factor’s data (we cannot use relative risk at this time), but indicate the omission in the notes field.


If data are presented as odds ratios, extract those from univariable analyses, but not those from multivariable analyses. If the results are in log(Odds) or an estimate/coefficient of a logistic regression, recall that the Odds Ratio = e^x, where x is the coefficient.

In cases where there are zero observations of resistance in both the exposed and referent groups, corresponding values may be omitted from tables but still mentioned in-text. Such “non-significant” values should still be extracted.

If a study includes an ‘Intermediate’ category, add the intermediate isolates/prevalence to the resistant category (i.e. we round up intermediate to resistant).
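As a quick illustration of two of the rules above, the sketch below converts a logistic regression coefficient to an odds ratio and folds ‘Intermediate’ counts into the resistant category. The function names are hypothetical and chosen only for this example.

```python
import math


def odds_ratio_from_coefficient(coefficient):
    """Convert a log(Odds) estimate / logistic regression coefficient to an odds ratio."""
    return math.exp(coefficient)


def fold_intermediate(susceptible, intermediate, resistant):
    """Round 'Intermediate' up to resistant; returns (susceptible, resistant) counts."""
    return susceptible, intermediate + resistant


# A coefficient of ln(2) corresponds to an odds ratio of 2.
print(odds_ratio_from_coefficient(math.log(2)))

# 10 susceptible, 3 intermediate, and 7 resistant isolates become 10 and 10.
print(fold_intermediate(10, 3, 7))
```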

Figure Extraction Using WebPlotDigitizer

If the factor data are only available in a figure, it may be possible to extract the data with the use of WebPlotDigitizer. Here is the link to the website https://apps.automeris.io/wpd/. This is what you will see when you go to the website:

Under “File” at the top left corner, click “Load Image(s)”. Select the screenshot image file (.png or .jpg file types) of the figure(s) you want to extract. After selecting the image file(s), you will be asked to select the type of graph.


After selecting the type of graph, you will be shown how to calibrate the figure.

Click the points on the axes, then fine tune using the arrow keys on your keyboard.


To calibrate a scatterplot, select 2D (X-Y) Plot and add two data points on each axis.

You will be asked to enter the values of the points chosen.


For this example, the image was magnified by using the “+” button at the top right corner. The data extraction panel is on the right side of the screen; automatic extraction can be used first. For figures like this example, where 2 colours are used, change colour from foreground to background and adjust the distance to detect smaller differences in colour (70 selected for this example). Click “run” at the bottom.

The manual method can be used to make adjustments as needed (in this example there are many extra points that require deleting).


After editing labels, go to “View Data” where it can be exported as a csv file.

When extracting data using the 2D (X-Y) Plot method, you will notice that there is no option to edit labels.


The number formatting can be adjusted to 2 decimal places by entering 2 for digits and selecting “Fixed” from the dropdown menu. The data can now be downloaded as a csv file.

Since you cannot add labels for the 2D (X-Y) Plot option, it may be useful to sort the data by x value. Order can be left as ascending. Note that the x-axis cannot be properly calibrated with the irregular intervals between the days for scatterplots like the one in this example. Only y-values will be used for the figure extraction.


3D Plots

It is not always feasible to extract 3D plots. WebPlotDigitizer can only extract from a 3D plot when the axes are in the same plane as your computer screen, or where the 3D bars are flush against the front and back of the axis. For example, the figure below can be extracted by manually selecting points, ideally from the top left corner of each 3D bar.


Fig. 15: Extractable 3D Plot. From Figure 1 in “Decontamination treatments can increase the prevalence of resistance to of Escherichia coli naturally present on poultry” by Capita et al., 2013. doi: 10.1016/j.fm.2012.11.011.

However, WebPlotDigitizer is unable to extract 3D plots with distorted perspectives, i.e. where the axes are offset (putting a space between the axis and the bars). Plots like those shown below are thus excluded from extraction.


Fig. 16: Non-Extractable 3D Plot. From “Changes in Antimicrobial Susceptibility of Native Enterococcus faecium in Chickens Fed ” by McDermott et al., 2005. doi: 10.1128/AEM.71.9.4986-4991.2005.

Tip: For more on how to use WebPlotDigitizer, tutorials can be found at https://automeris.io/WebPlotDigitizer/tutorial.html and the user manual can be found at https://automeris.io/WebPlotDigitizer/userManual.pdf.

Resistances and MDR

All factors related to antimicrobial resistance should be extracted, including those related to non-traditional antimicrobials (e.g. ionophores, coccidiostats, and metals). They should be extracted as finely as possible where specified (e.g. ceftiofur-resistance, rather than third-generation cephalosporin resistance).

Multi-drug resistance (MDR) should not be extracted, because the specific combination of resistances is impossible to compare across studies/situations. However, if you are presented with MDR data, it may be possible to tease out antimicrobial-specific data. Before you do - ensure that the individual antimicrobial data

For example, imagine that ‘X’ and ‘Y’ number of isolates were tested for each of the ‘Poor’ and ‘Good’ producers respectively, as in the study below:

Fig. 17: An example of an MDR table using prevalences from Spears (1990).

We can tease out this information by adding up the occurrence of resistance across all profiles, to calculate the number of resistant organisms.

Antimicrobial | AMR+ in Poor Producers | AMR+ in Good Producers
GM | (0.19)(X) + (0.579)(X) + (0.744)(X) | (0.218)(Y) + (0.902)(Y) + (0.451)(Y)
SU | (0.19)(X) + (0.579)(X) + (0.1074)(X) + (0.0992)(X) | (0.218)(Y) + (0.902)(Y) + (0.827)(Y) + (0.0977)(Y)
AM | (0.0165)(X) | (0.0376)(Y)
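The summation across profiles can be sketched as follows. The profile proportions and the isolate count below are made up for illustration (they are not the values from Spears (1990)), and the function name `resistant_counts` is hypothetical.

```python
def resistant_counts(profiles, n_isolates):
    """Sum resistance across MDR profiles for each antimicrobial.

    profiles: list of (set of antimicrobial codes, proportion of isolates
    showing exactly that profile). Returns the estimated number of
    resistant isolates per antimicrobial.
    """
    totals = {}
    for antimicrobials, proportion in profiles:
        for drug in antimicrobials:
            totals[drug] = totals.get(drug, 0.0) + proportion * n_isolates
    return totals


# Hypothetical MDR profiles for one producer group (illustrative values only).
profiles = [
    ({"GM", "SU"}, 0.10),  # resistant to both GM and SU
    ({"SU"}, 0.25),        # resistant to SU only
    ({"AM"}, 0.05),        # resistant to AM only
]
counts = resistant_counts(profiles, n_isolates=200)
for drug in sorted(counts):
    print(drug, round(counts[drug], 2))
```

Each profile contributes to every antimicrobial it contains, which mirrors the way the same prevalence term appears in more than one row of the table above.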


Multiple Measurements

Often, factors may be assessed at multiple time-points. For example, swine may be sampled for resistant organisms at birth, weaning, growing-finishing, and again at abattoir. As a general rule, where the allocation and observation stages are the same, the Measurement Closest to Human Exposure or MCHE should be extracted. Where the allocation and observation stages differ, the MCHE within the allocation stage should be extracted (if available). These rules, and their exceptions, are described below.

Multiple Measurements at a Single Stage

Where multiple measurements are available at a single production stage (i.e. the allocation and observation stages are the same), the measurement closest to human exposure should be extracted, except:
• where there are missing or unavailable data at the time-point closest to human exposure

Example: Resistance was assayed at days 10, 20, and 30 of production for the exposed group, but only at days 10 and 20 for the referent group. Day 20 is extracted.

• where the time-point is not applicable to the Canadian context, e.g. a measurement at >36 days into broiler production, past the point of harvest in Canadian industry.
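A minimal sketch of this selection rule is shown below. It assumes that, within a single production stage, a later sampling day is closer to human exposure; the function name `select_time_point` and the `max_day` parameter are hypothetical.

```python
def select_time_point(exposed_days, referent_days, max_day=None):
    """Pick the latest sampling day measured in BOTH groups.

    Assumption: within one production stage, a later day is closer to
    human exposure. `max_day` optionally drops days that fall outside
    the relevant production window. Returns None if no day qualifies.
    """
    shared = set(exposed_days) & set(referent_days)
    if max_day is not None:
        shared = {day for day in shared if day <= max_day}
    return max(shared) if shared else None


# The example from the text: day 30 is missing for the referent group.
print(select_time_point([10, 20, 30], [10, 20]))

# Hypothetical broiler study sampled past the 36-day window.
print(select_time_point([14, 28, 42], [14, 28, 42], max_day=36))
```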

Multiple Measurements at Farm

Where multiple measurements are available at the on-farm stage for cattle and swine, a measurement should be extracted at the end of each production sub-stage. This includes the following:
• Cattle
  – Stage 1
  – Stage 2
• Swine
  – Stage 1
  – Stage 2
See the production basics section for more information.

Multiple Measurements at Multiple Stages

Sample Type

Where individual fecal samples are available, those are preferable to pooled samples. When a pooled fecal sample can’t be taken directly from the animal, the goal is to obtain the equivalent of a pooled fecal sample. Extract litter/barn floor samples and not water/feed/dirt samples.


Provisional Rules

3. Genomic data – record if AMR prevalence is given, and note the gene in the description (the AMR dropdown can be left empty, although tetA and tetB are available in it); otherwise make a note (e.g. CFU/g, gene copies, etc.).
5. Salmonella species – combine if AMR prevalence is given for more than one Salmonella species.

3.10.2 Common Concerns

What do I do if...

. . . there are no factors to extract
If there are no factors to extract, indicate this using the notes field, and skip the reference.

. . . I’m confused about how to extract a factor
If you’re confused about a factor, reach out on Slack for clarification. Additionally, add a note to indicate why the factor was extracted in that way.

. . . an item I need is missing from a dropdown
If an item is missing from a dropdown (i.e. a non-free-text field), reach out on Slack. If the decision is made to use an alternative item in the list, add a note to justify this replacement.

3.11 Data Extraction Tips & Tricks

3.11.1 Factor-Level Extraction

Defining the factor itself

What do I do if...

. . . it is not clear which factors are considered relevant points of comparison, and which are not
Factors listed here are considered “unmodifiable” or “immutable”, and thus not relevant for our purposes:
• Immutable Factors
For examples of extractable factors, please see:
• Common Factor Types
• Binary and Continuous Factors

. . . I’m confused about how to extract a factor
If you’re confused about a factor, reach out on Slack for clarification. Additionally, add a note to indicate why the factor was extracted in that way.

. . . I want to compare conventional, ABF, organic, ‘welfare’ or ‘humane’ production systems
• Common Factor Types


Selecting exposed and referent groups

Note: Groups are otherwise referred to as levels.

What do I do if...

. . . I am not sure which is which?
• Selecting a Referent Level

. . . a group is not clearly defined (i.e. an “Other” group)?
• Non-informative Levels

. . . there are more than two designated groups in the study?
• Multiple Discrete Levels (Categories)
• Non-informative Levels

Selecting the sample type

Which sample type should be extracted if multiple (i.e. fecal, water, dirt. . . ) sample types are available?

Sample Type

Choosing the microbe subtype

What do I do if...

. . . a microbe subtype is not listed in the dropdown in CEDAR
For Salmonella species:
• Salmonella Species

Factor data

What do I do if...

. . . the data are only available in a figure
If factor data are only available in a figure (i.e. no numbers are given on a graph, or in text), and the numerical value cannot be determined with certainty (i.e. is not zero or 100%), indicate this using the notes field, and skip extracting the factor.

. . . multiple data formats (i.e. a contingency table and a prevalence table) are available for a factor
• Multiple Data Formats

. . . measurements are provided for multiple time points
• Multiple Production Stages
• Multiple Timepoints Within a Single Production Stage
• Multiple Timepoints Within the Farm Stage


. . . the study uses SIR (Susceptible, Intermediate, and Resistant)
If a study includes an ‘Intermediate’ category, add the intermediate isolates/prevalence to the resistant category (i.e. we round up intermediate to resistant).

. . . odds ratios from both multi-variable and univariable analyses are available
• Odds Ratio Extraction

. . . there are zero observations of resistance in both the exposed and referent groups
• Zero Observations of Resistance

. . . the results are in log(Odds) or an estimate/coefficient of a logistic regression
Recall that the Odds Ratio = e^x, where x is the coefficient.

. . . the data are presented only as a relative risk
We cannot use relative risk at this time. Do not extract the factor’s data, but indicate the omission by attaching a note to the associated reference through the Notes and Issues tab.

. . . the study reports multi-drug resistance (MDR)
• MDR Rules

. . . the study reports genomic data on AMR
• Genomic data

3.11.2 General

What do I do if...

. . . there are no factors to extract
If there are no factors to extract, indicate this using the Exclude Extraction Reason field, and skip the reference.

. . . an item I need is missing from a dropdown
If an item is missing from a dropdown (i.e. a non-free-text field), reach out on Slack. If the decision is made to use an alternative item in the list, add a note to justify this replacement.

3.12 Data Extraction Notes

3.12.1 Add or Edit a Factor Form

• Always use the first match to an antimicrobial in the dropdown
• Removing duplicate antimicrobials will result in both being deleted
• Ignore discrepancies in cascading fields
• Duplicating factors DOES NOT duplicate the AMU field


3.12.2 Add or Edit a Reference Form

3.13 Selecting Factors for Models

This section is a guideline for the process of determining which factors are eligible for inclusion in the iAM models. You can begin this process after receiving your timber.

Important: If you make any corrections to the factors in your query of timber (which is likely unavoidable!), please make a note of which ones you have corrected, and always keep a copy of your initial query. As you evaluate each of your factors, whether individually or by discussing with an industry expert, please document these discussions, along with any decisions and rationale regarding inclusion in or exclusion from your model. Eventually, this information will be captured within CEDAR.

3.13.1 Correct Factors

The process of model-building also serves as a chance to correct any errors that were made during the data extraction process. Upon receiving your data query, the first thing you should do is run it through Sawmill. Further steps are outlined below.

Check the scrap pile

Sawmill will produce a file called the scrap pile if it was unable to calculate an odds ratio for one or more of the factors in your timber. If you’re not sure what Sawmill or timber is, please see Sawmill. There’s also a more detailed rundown of the scrap pile here. In some cases, these factors may have been extracted incorrectly, with one or more of the numerical fields necessary for odds ratio calculation missing from the timber. In this case, you will need to check the text of the paper to see whether the key field(s) are available. If they are available, you can simply correct the factor and rerun it through Sawmill. If the paper does not contain the fields you are missing, the factor will have to be excluded.

Check the prevalence table totals

These are singled out for specific correction because a common error made during data extraction was to set the totals in prevalence tables to 100 (to represent 100%), but these are actually meant to capture the total number of isolates/animals/samples (whatever the correct unit of analysis is) in each group.
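One way to surface candidates for this check is to flag any factor whose group totals are exactly 100. The sketch below is illustrative only; the field names (`id`, `exposed_total`, `referent_total`) are hypothetical and do not reflect the actual timber column names. A total of 100 can also be legitimate, so flagged factors still need to be checked against the paper.

```python
def flag_suspect_totals(factors):
    """Return the ids of factors where either group total is exactly 100.

    A total of 100 may mean the extractor recorded "100%" instead of the
    number of isolates/animals/samples, so these deserve a manual check.
    """
    flagged = []
    for factor in factors:
        if factor["exposed_total"] == 100 or factor["referent_total"] == 100:
            flagged.append(factor["id"])
    return flagged


# Hypothetical prevalence-table factors.
factors = [
    {"id": "F1", "exposed_total": 100, "referent_total": 100},
    {"id": "F2", "exposed_total": 48, "referent_total": 52},
    {"id": "F3", "exposed_total": 100, "referent_total": 87},
]
print(flag_suspect_totals(factors))
```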

Check all factors

This may take a while, depending on how many factors you have in your query. However, it is an important step to check that the data extraction was done correctly for each and every field captured.

3.13.2 Check These Common Reasons for Exclusion

Important: Remember to document any factors and papers you are excluding for the reasons outlined below or any other reason.


Factor is not modifiable

While the general practice was to exclude non-modifiable, or immutable factors such as age, location, or breed (read more) at the data extraction phase, it is not always clear whether a factor is modifiable. As such, there may be some non-modifiable factors in your query that you will need to exclude from your model. Questions of whether a factor is modifiable or not are also context-dependent, and sometimes warrant consultation with an expert. Especially when it comes to factors related to management practices, a factor may be theoretically modifiable, but the implementation required may be cost prohibitive such that the factor is not practically modifiable.

Selective media used at isolation

Midway through the data extraction process, a decision was made to exclude factors for which selective media were applied at the isolation step. For the rationale behind this decision, please see here. As this decision was only made midway through the process, it is likely that some factors were extracted that do not meet this criterion.

Appropriate production stage or sub-stage is unclear

It may be difficult to determine the appropriate production stage or sub-stage at which to place a factor in the model. For instance, a study may state that “pigs from farms across the United States were sampled” without specifying whether these are finisher pigs, nursing piglets, sows, or weaned nursery pigs.

Multiple production stages or sub-stages are combined

It is also possible for data to be aggregated across some combination of the farm, abattoir, and/or retail stages. Other common examples of this particular reason for exclusion are:
1. One or both groups compared in the factor are representative of aggregated samples from multiple farm sub-stages (i.e. a conventional vs organic production system factor, where AMR data in each comparison group is representative of both weaned nursery pigs and finishing pigs)
2. Broiler and layer chickens are either aggregated together in both study groups, or one study group is made up of broilers and the other is made up of layers (even if this is not the focus of the comparison, i.e. a production system factor may compare organic broilers with conventional layers)
3. Chickens and turkeys (sometimes referred to collectively as poultry) are aggregated or compared in the same way as the broiler and layer chickens in point 2

Note: This is especially relevant to cattle and swine, as they spend a significantly longer time at the farm stage—long enough for that farm stage to be split into multiple sub-stages.

3.13.3 Check These Other Possible Reasons for Exclusion

Stage of AMR measurement differs from the stage of factor application

Generally speaking, factors where the site of AMR measurement differs from the site of factor application (i.e. antimicrobial use on the farm, sampled at retail) are excluded from our models. There are a few potential exceptions to this rule, however:


Production system factors measured at retail

If a production system factor (say organic vs conventional) is measured via retail meat samples, the factor can be applied at the retail stage in the model. However, if you have factors eligible for modelling, including production system factors that are measured at farm, you should exclude those measured at retail.

Factors applied at farm and measured at abattoir

If a factor is applied at the farm to a group of animals that are then followed to the abattoir for sampling, the factor may be eligible for inclusion in a model. There are a few different possibilities for factors that fall into this category:
• Sampling was performed before any processing effects took place, and samples representative of individual animals (such as caecal swabs or droppings) were taken: the samples are likely representative of the farm stage.
• Sampling was performed before any processing effects took place, and “external” samples were taken (i.e. a hide or skin swab, or a floor swab of the transport truck): the samples are likely representative of the farm and transport stages.
• Sampling was performed after processing (most commonly via carcass swab): the samples are likely representative of the abattoir stage.

Non-specific antimicrobial use factors

Some papers may contain general antimicrobial use factors, where the antimicrobial(s) administered are not specified. If there are factors related to the use of specific antimicrobial(s) (i.e. ceftiofur use) eligible for inclusion in your model, these less well-characterized factors can likely be excluded. Alternatively, these may be run separately from any specific AMU factors.

Factor is not well-characterized

These are factors that are not fully characterized in the paper, where comparison groups may be difficult to interpret. Here are a few examples:

Controlling flies with toxin:
• No info on what the “toxin” is
• The difference between the two groups is not clear: does one control flies with a toxin, while the other does not control flies at all? Or does the other group use an alternative method of control?

Infrequent disinfection vs frequent disinfection:
• How often is “infrequent”? How often is “frequent”?
• What is the disinfection agent/how are the authors defining disinfection?


The resistance outcome is a combination of antimicrobials

With the exception of common combinations, i.e. imipenem and cilastatin, quinupristin and dalfopristin, or sulfamethoxazole and trimethoprim, which should appear as established options in the data extraction AMR dropdown menu, other factors must be associated with individual resistance outcomes to be eligible for inclusion in the iAM models. General resistance, or multidrug resistance, where the resistance outcome is not specified, should also be excluded from models.

Tip: Filter your query on the AMR field, with only blank cells selected. This may identify factors with an unspecific or combination resistance outcome that slipped through the extraction process.

3.13.4 Discuss with an Industry Expert

Relevancy to the Canadian context

As the objective of the iAM project is to produce models that are applicable to the context of the Canadian agri-food industry, this is an important step in the factor selection process. There are two ways a factor may be relevant to the Canadian context:
1. It is used in Canada
2. If approved for use in Canada, its application or use may impact AMR

Factors in the second category will likely be included in the model, but run separately from the factors representative of the typical Canadian industry to explore “what-if” scenarios. These “what-if” scenarios may also include factors not yet approved in Canada, but which have the potential to become relevant through future policy change and are thus worth exploring.

Hint: For food-animal species that spend a longer time at the farm stage before processing (namely cattle and swine), relevancy of a factor may vary between sub-stages of the farm stage. For example, some antimicrobials administered to nursing piglets or weaned nursery pigs may be withdrawn for part or all of the finishing stage due to residue concerns.

Frequency of occurrence

The frequency of occurrence of each factor in the Canadian context should be determined by consulting with an expert, and captured at the frequency node of your model.

3.13.5 If you have too many factors in your model. . .

If you are looking to cut down on the number of factors in your model, or need to due to Analytica constraints, a good place to start is to identify papers that are measuring the same factor, in the same host or host sub-population. For example, you might have two papers measuring the effect of ceftiofur use in piglets. In this case, you may choose to include only the study with the larger sample size, or that was performed in a population more representative of the Canadian context to cut down on factors.

Tip: Your standard error is a proxy for sample size: a large SE is indicative of a small sample size.
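To see why the standard error tracks sample size, consider the standard large-sample formula for the SE of a log odds ratio from a 2×2 contingency table, SE = sqrt(1/a + 1/b + 1/c + 1/d). The sketch below uses that textbook formula with made-up counts; it is not necessarily how Sawmill computes its standard errors.

```python
import math


def log_odds_ratio_se(a, b, c, d):
    """Standard error of ln(OR) for a 2x2 table with cell counts a, b, c, d.

    Uses the usual large-sample (Woolf) formula; all cells must be non-zero.
    """
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Same proportions in both tables, but ten times as many observations in the second.
small_study = log_odds_ratio_se(10, 10, 10, 10)
large_study = log_odds_ratio_se(100, 100, 100, 100)
print(small_study > large_study)
```

The smaller study has the larger standard error, which is why SE can be used to choose between two papers measuring the same factor.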


3.13.6 Other Model Components

The following elements are handled by the iAM.AMR.HUB module:
1. Baseline prevalence and distribution
2. Bacterial recovery at retail
3. Consumption from the Foodbook

But, you should check to ensure that these apply to your model:
1. Check to see if your baselines are informed by actual data or placeholder values.
2. If they are informed by placeholder values: check to see if the placeholders are applicable to your scenario. If you have better estimates of the baseline than the default value(s), perhaps informed by your discussions with an expert, use those instead.

3.13.7 Other Recommendations or Conventions

Analyticar (when we fix it!)

3.14 Meta-analysis Guidelines

3.14.1 Combining Factors with Meta-analysis

Meta-analysis is a statistical approach for combining data from multiple studies, often used to increase statistical power, or resolve uncertainty in effect size or direction. The simplest way to think of a meta-analysis is as a weighted average of the included observations, where the weighting accounts for the statistical properties of the studies. Meta-analysis is used in the iAM.AMR project to derive a single effect estimator where multiple studies, or multiple observations within a study, are available to describe a given factor.
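The “weighted average” idea can be illustrated with a fixed-effect, inverse-variance pooling of log odds ratios. This is a minimal sketch of one common approach; the source does not state which meta-analysis model the project uses, so treat the choice of a fixed-effect model and the function name `pool_fixed_effect` as assumptions.

```python
import math


def pool_fixed_effect(log_odds_ratios, standard_errors):
    """Inverse-variance weighted average of log odds ratios.

    Each observation is weighted by 1 / SE^2, so more precise studies
    contribute more. Returns (pooled log OR, pooled SE).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Two hypothetical observations of the same factor with equal precision.
pooled_log_or, pooled_se = pool_fixed_effect([0.0, 1.0], [0.5, 0.5])
print(math.exp(pooled_log_or), pooled_se)
```

With equal standard errors the pooled estimate is the simple mean of the inputs; with unequal standard errors it moves toward the more precise observation.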

3.14.2 When should meta-analysis be performed?

Meta-analysis must only be performed where the effect measure, and the study populations, are identical or highly similar. Therefore, meta-analysis should never be performed:
• across food-animal species (species)
• across sub-populations of food-animals (such as between piglets and finishing pigs)
• across bacterial species (microbes)
  – including between Campylobacter jejuni and Campylobacter spp.
• across classes of antimicrobials (and sometimes even within an antimicrobial class)
  – see Rules for Combining Resistance Outcomes for more specific guidance
• across classes or sub-classes of antimicrobials
  – see Rules for Combining Resistance Outcomes for more specific guidance
• across production stages (i.e. do not combine a factor applied at the farm with a factor applied at retail)
  – this includes where the effective stage is the same, but the measurement is taken at a different stage.
• across different factors
  – this includes factors that are similar (i.e. two different ceftiofur use factors), but are different due to their underlying methodologies (for example, administration of ceftiofur at a dose of 5 mg/kg vs administration of ceftiofur at a dose of 30 mg/kg).

When a measurement is available for the same factor, the same stage of production, the same food-animal and food-animal sub-population, pathogen, and antimicrobial (or sub-class of antimicrobial), as one or more others, they may be included in one of four types of meta-analysis:

Within Study, Same Antimicrobial

Where multiple measurements are available describing the same factor (with the same experimental conditions), for the same resistance, the measurements should be combined using meta-analysis.

Tip: Two comparable sub-populations comprise the study population (e.g. barn A and barn B), and ceftiofur resistance is assayed for each. Meta-analysis is conducted for these observations.

Within Study, Same Antimicrobial Class (or Sub-Class)

Where multiple measurements are available describing the same factor (with the same experimental conditions), for the same class or sub-class of resistance, the measurements should be combined using meta-analysis.

Tip: Resistance to ceftiofur and ceftriaxone are both included in the assay. Meta-analysis is conducted for these observations, and the resistance is reported at the sub-class level (third-generation cephalosporin resistance).

Resistance to ceftiofur and ceftriaxone are both included in the assay, and there are two comparable sub-populations which comprise the study population. Meta-analysis is conducted for all of these observations, and the resistance is reported at the sub-class level (third-generation cephalosporin resistance).

Across Studies, Same Antimicrobial

Where multiple measurements are available describing the same factor, for the same resistance, and the experimental conditions are comparable, the measurements should be combined using meta-analysis.

Tip: Two studies measure the effect of production type (e.g. organic vs. conventional) on ceftiofur resistance. Meta-analysis is conducted for these observations.

Across Studies, Same Antimicrobial Class (or Sub-Class)

Where multiple measurements are available describing the same factor, for the same class or sub-class of resistance, and the experimental conditions are comparable, the measurements should be combined using meta-analysis.

Tip: Two studies measure the effect of production type (e.g. organic vs. conventional), one on ceftiofur resistance, and the other on ceftriaxone resistance. Meta-analysis is conducted for these observations.


3.14.3 Meta-analysis Rules

Resistance Outcomes

Generally, two or more factors may be combined using meta-analysis if their resistance outcomes belong to the same antimicrobial class or sub-class.

Tip: Check that you have the correct antimicrobial class for each of your resistance outcomes. The ATC vet codes (the classification system we use in CEDAR) sometimes classify them slightly differently than they should be.

The table below lists some common antimicrobials, their antimicrobial class, and whether they may be combined with other antimicrobials via meta-analysis. An “M” in the Meta-analysis Status column means that the antimicrobial may be combined with other antimicrobials marked “M” in the same antimicrobial class (and likely also with other antimicrobials of that class that are not listed here). An antimicrobial that is the only entry for its antimicrobial class, and whose Meta-analysis Status is blank, can likely also be combined with other antimicrobials of that class that are not listed here.

Antimicrobial | Antimicrobial Class | Meta-analysis Status
cefalotin | 1GC | M
[name missing] | 1GC | M
[name missing] | 1GC | M
[name missing] | 3GC | M
[name missing] | 3GC | M
ceftiofur | 3GC | M
ceftriaxone | 3GC | M
[name missing] | 4GC |
[name missing] | aminocyclitol | M
amikacin | aminoglycoside |
[name missing] | aminoglycoside |
gentamicin | aminoglycoside |
kanamycin | aminoglycoside |
[name missing] | aminoglycoside |
streptomycin | aminoglycoside |
[name missing] | aminoglycoside |
chloramphenicol | amphenicol | M
florfenicol | amphenicol | M
imipenem and cilastatin | [class missing] |
cefoxitin | cephamycin |
trimethoprim | diaminopyrimidine |
sulfamethoxazole and trimethoprim | diaminopyrimidine with sulfonamide | M
sulfadiazine and trimethoprim | diaminopyrimidine with sulfonamide | M
ciprofloxacin | fluoroquinolone | M
enrofloxacin | fluoroquinolone | M
marbofloxacin | fluoroquinolone | M
[name missing] | macrolide |
[name missing] | macrolide |
furazolidone | nitrofuran derivatives | M
nitrofurantoin | nitrofuran derivatives | M
amoxicillin | penicillin | M
ampicillin | penicillin | M
amoxicillin and beta-lactamase inhibitor | potentiated penicillin |
nalidixic acid | quinolone |
sulfafurazole | sulfonamide | M
sulfamethoxazole | sulfonamide | M
chlortetracycline | tetracycline | M
oxytetracycline | tetracycline | M
tetracycline | tetracycline | M

Important: For amoxicillin, ampicillin, and piperacillin, verify that the indications in this table are applied only where these antimicrobials are present alone, and not in combinations such as amoxicillin and clavulanic acid, ampicillin and sulbactam, or piperacillin and tazobactam. When present alone, they may be combined via meta-analysis (amoxicillin & ampicillin & piperacillin). They may also be combined when present in combination (e.g. amoxicillin and clavulanic acid & ampicillin and sulbactam). However, an antimicrobial “alone” and a combination should not be combined via meta-analysis (e.g. amoxicillin & amoxicillin and clavulanic acid).

Genomic resistance outcomes

Only resistance outcomes pertaining to the exact same gene may be combined using meta-analysis. Different genes which confer (or may confer) resistance to the same antimicrobial class or individual antimicrobial should not be combined (e.g. tetA and tetB), nor should they be combined with any phenotypic outcomes.

Tip: Gene subgroups (such as blaCTX M1, blaCTX M2) should not be combined with one another.

Different units of analysis

Factors measured using different units of analysis (e.g. isolate and flock) may be combined with meta-analysis.

Production type factors

Factors comparing organic and conventional production may be combined with factors comparing antibiotic-free and conventional production. As all organic production is by default antibiotic-free, but not all antibiotic-free production is organic, the meta-analysis result should be reported as an antibiotic-free vs conventional production comparison.

Important: Please note that definitions of organic and antibiotic-free production vary across studies, especially if those studies were conducted in different countries. For instance, in some cases, antibiotic-free production for swine is defined as no antimicrobials given after weaning (allowing AMU in piglets), while other papers may define antibiotic-free production as no antimicrobials given over the duration of the pigs’ lives. Another example: organic production standards in some countries (for some food-animal commodities) may include stocking density or housing requirements, whereas in other countries, they may not. It is important to make note of these definitions where provided, and only combine factors with similar definitions of the production type. If no definitions are provided in the body of the full-text, beyond general designations of “organic” and “antibiotic-free”, then factors with general designations may be combined together.

Antimicrobial use factors

The following AMU-related factor pairings likely should not be combined using meta-analysis:
1. Different routes of administration (e.g. feed and water)
   • The injection route should not be combined with either the feed or the water route.
   • Feed and water: these have different therapeutic levels in the gut and typically should not be combined with one another. In-feed use is typically for prevention, and involves a low dose, whereas administration via water is mainly used for treatment (involving a higher dose). This may vary across animal species, however, so the dosage (if provided) or indication (if provided, e.g. preventive versus treatment) should be examined first to determine whether a combination is appropriate.

The following AMU-related factor pairings should not be combined using meta-analysis:
1. Subtherapeutic AMU and Therapeutic AMU
2. Therapeutic AMU and Prophylactic AMU (and other similar pairings where the “intent” of the AMU is not the same, including those involving Metaphylactic AMU)
3. Continuous AMU and Pulsed AMU

Note: To make decisions based on the above three pairings, the authors of a paper must have made an explicit designation in their paper as to the type of dosage, intent, or temporal pattern of the AMU (for example, a clear indication of whether a particular dosage is subtherapeutic or not). If numerical values for the dosage are the only information provided, for instance, we would not attempt to classify that ourselves as subtherapeutic, therapeutic, etc.

A good general rule of thumb is to keep any unknown AMU regimes separate from known dose regimes. For instance, a generic “tylosin use (any use)” factor, where no indication is given as to the duration, intent, or dosage of use should not be combined with a “continuous tylosin use” or a “therapeutic tylosin use” factor. However, two generic “tylosin use” factors may be combined.

Feed additive factors

For factors related to the use of feed additives such as probiotics and prebiotics, use caution when combining different brands (check the ingredients first). Generally, different brands of additives should not be combined.

3.14.4 How is the meta-analysis performed?

Please see Adding meta-analysis groupings for instructions on how to prepare your timber for meta-analysis. Our sawmill R package performs meta-analysis using the metafor package. We use a random-effects model. There are a number of ways to estimate heterogeneity:
• Restricted Maximum Likelihood (REML)
  – default, requires convergence (it’s ML, so iterative)
• DerSimonian-Laird
  – an Olaf-approved alternative (non-iterative)

We use REML. We calculate the effect size based on the odds ratio (technically the log-OR) and the SE of the log-OR. For more details on the math behind the meta-analysis go here.
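To make the idea of a random-effects model on the log-OR scale more concrete, here is a minimal base R sketch of the non-iterative DerSimonian-Laird estimator mentioned above. It is an illustration only and not sawmill's code: sawmill relies on metafor with REML, so its numbers will generally differ slightly. The function name dl_random_effects and the input values are hypothetical.

```r
# Minimal sketch of a random-effects meta-analysis on the log-OR scale,
# using the non-iterative DerSimonian-Laird estimator of tau^2.
# Illustration only: sawmill itself relies on metafor with REML.
dl_random_effects <- function(log_or, se_log_or) {
  v <- se_log_or^2                         # within-study variances
  w <- 1 / v                               # inverse-variance (fixed-effect) weights
  k <- length(log_or)                      # number of observations combined
  fixed <- sum(w * log_or) / sum(w)        # fixed-effect pooled log-OR
  q <- sum(w * (log_or - fixed)^2)         # Cochran's Q
  c_term <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (k - 1)) / c_term)   # between-study variance, floored at 0
  w_re <- 1 / (v + tau2)                   # random-effects weights
  pooled <- sum(w_re * log_or) / sum(w_re) # pooled log-OR
  se <- sqrt(1 / sum(w_re))                # SE of the pooled log-OR
  list(log_or = pooled, se_log_or = se, odds_ratio = exp(pooled), tau2 = tau2)
}

# Hypothetical inputs: log-ORs and standard errors from three comparable factors
example_log_or <- log(c(1.8, 2.5, 1.2))
example_se     <- c(0.30, 0.45, 0.25)
result <- dl_random_effects(example_log_or, example_se)
print(result)

# With metafor installed, the analogous calls would be
#   metafor::rma(yi = example_log_or, sei = example_se, method = "DL")
#   metafor::rma(yi = example_log_or, sei = example_se, method = "REML")
```

The pooled estimate is a weighted average of the log-ORs, which matches the "weighted average" intuition given in Combining Factors with Meta-analysis.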

3.15 The sawmill R Package

The sawmill R package processes queries from the CEDAR (Collection of Epidemiologically Derived Factors Associated with Resistance) database, performing quality control, and calculating measures of association (odds ratios). Optionally, it can also perform meta-analysis.

3.15.1 Introduction

Why is sawmill needed?

Each of the iAM.AMR models is informed by one or more queries to the CEDAR database. The exported query results are called timber. Unfortunately, raw timber is not usable as-is, as it lacks key calculated fields (such as the odds ratio), and has not been screened for simple errors.

What exactly does sawmill do, in brief?

sawmill essentially looks at each factor in the timber, checks that the raw data required to calculate an odds ratio and standard error of the log(odds ratio) are available and usable, and then performs those calculations. More details can be found in the sawmill GitHub repository’s latest release notes, as well as in the function help files.

How is sawmill set up?

First and foremost, sawmill is an R package. According to Hadley Wickham and Jennifer Bryan:

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others.

sawmill is set up as a series of functions, each of which performs one or more specific steps in the processing pipeline. A function, according to R - Functions, is:

. . . a set of statements organized together to perform a specific task.

Each function is in an individual R script file, where the name of the script file matches the name of the main function it contains. All script files can be found in the R directory of the sawmill GitHub repository. The pipeline is set in motion by running the main function, start_mill, using the following command:

sawmill::start_mill()

This function calls all other subsequent functions.

Important: Before proceeding, you will need to have both R and RStudio installed on your computer. If you do not have them both installed, please see R and RStudio.


3.15.2 Terminology

In keeping with the logging theme of the sawmill pipeline, the following terminology is used throughout this documentation:

Raw timber: the Excel (.xlsx) file of factors, exported from CEDAR, which acts as the input to sawmill.

Grain: the set of fields used to define a particular factor (for instance, a prevalence table or a contingency table).

Processed timber or planks: the processed .csv file of factors that sawmill provides as an output.

3.15.3 Navigate RStudio

Note: This section provides only a cursory overview of RStudio, focusing on those features necessary for running sawmill. For a more comprehensive overview, see Introduction to RStudio.

RStudio’s interface looks something like this:

Fig. 18: RStudio interface with sawmill loaded.

It can be divided into four key regions.

Console

R commands can be entered here.

Script editor

This is where R scripts can be viewed and edited.


Fig. 19: RStudio console.

Each open script file appears as a tab at the top of this region. To run a block of code, highlight it in the script editor and click the Run button.

Navigation pane

There are two important tabs in this pane: Files and Help. Files shows the contents of the current working directory. Here, users can navigate to R scripts they wish to view in the script editor. Help is where the individual function help files are viewed.

Build tab

When the Build tab is selected, a package can be installed and/or re-loaded using the Install and Restart feature.

3.15.4 How It Works

Acceptable grains

The set of fields used to define a factor (the factor’s grain) varies from reference to reference. Not all grains can be used to calculate an odds ratio and as such, not all are usable by sawmill. The formula for the odds ratio requires a complete contingency table, so any acceptable grain must be able to be converted to the following:


Fig. 20: RStudio script editor.

Fig. 21: RStudio navigation pane.


Fig. 22: RStudio build tab.

Group | AMR+ | AMR-
Exposed | A | B
Referent | C | D

As a result, sawmill is capable of working with the following grains.

Contingency tables

Contingency tables are usable in two different forms.

Group | AMR+ | AMR- | Total
Exposed | A | B |
Referent | C | D |

If AMR- values are not available, totals must be provided.

Group | AMR+ | AMR- | Total
Exposed | A | | nexp
Referent | C | | nref
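As a rough illustration of how either form of contingency table yields an odds ratio, the base R sketch below uses the standard formulas OR = (A × D) / (B × C) and SE(log OR) = sqrt(1/A + 1/B + 1/C + 1/D), recovering B and D from the totals when AMR- counts are absent. The function and argument names are hypothetical and are not sawmill's; sawmill's own implementation may differ in its details.

```r
# Sketch: odds ratio and SE of the log odds ratio from a 2x2 contingency table.
# exp_pos = A, exp_neg = B, ref_pos = C, ref_neg = D (names are hypothetical).
or_from_table <- function(exp_pos, ref_pos, exp_neg = NULL, ref_neg = NULL,
                          n_exp = NULL, n_ref = NULL) {
  # If the AMR- counts are not supplied, recover them from the group totals
  if (is.null(exp_neg)) exp_neg <- n_exp - exp_pos
  if (is.null(ref_neg)) ref_neg <- n_ref - ref_pos
  odds_ratio <- (exp_pos * ref_neg) / (exp_neg * ref_pos)
  se_log_or  <- sqrt(1 / exp_pos + 1 / exp_neg + 1 / ref_pos + 1 / ref_neg)
  list(odds_ratio = odds_ratio, se_log_or = se_log_or)
}

# Hypothetical counts, entered once in each accepted form
full   <- or_from_table(exp_pos = 30, exp_neg = 70, ref_pos = 10, ref_neg = 90)
totals <- or_from_table(exp_pos = 30, ref_pos = 10, n_exp = 100, n_ref = 100)
print(full)
print(totals)
```

Both calls describe the same table, so they return the same odds ratio and standard error.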

Prevalence tables

AMR- prevalences are optional, as they are not used by sawmill.


Group | AMR+ | AMR- | Total
Exposed | P% | (R%) | nexp
Referent | Q% | (S%) | nref

Important: The values in the total column, unlike the other columns, are counts, not percentages. For instance, nexp and nref might represent the total numbers of isolates in each group.
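One plausible way to turn a prevalence table into the contingency table form is to multiply each AMR+ percentage by its group total, as sketched below in base R. This is an assumption about the conversion, not a description of sawmill's internals, and the function name prev_to_counts is hypothetical. Note that the resulting counts are not necessarily whole numbers.

```r
# Sketch: convert AMR+ prevalences (percentages) and group totals (counts)
# into a 2x2 table of counts. The AMR- prevalences are not needed.
prev_to_counts <- function(p_exp, n_exp, p_ref, n_ref) {
  pos_exp <- p_exp / 100 * n_exp   # A: AMR+ count in the exposed group
  pos_ref <- p_ref / 100 * n_ref   # C: AMR+ count in the referent group
  matrix(
    c(pos_exp, n_exp - pos_exp,
      pos_ref, n_ref - pos_ref),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("Exposed", "Referent"), c("AMR+", "AMR-"))
  )
}

# Hypothetical factor: 42.5% of 80 exposed isolates and 20% of 120 referent
# isolates are resistant
counts <- prev_to_counts(p_exp = 42.5, n_exp = 80, p_ref = 20, n_ref = 120)
print(counts)
```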

Odds ratios

Lower CI | OR | Upper CI | Significance Value
oddslo | odds | oddsup | pval

Note: sawmill will not raise an error if the p-value is not provided, but it cannot calculate one for odds ratio grains.
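When only an odds ratio and its confidence limits are reported, the standard error of the log(odds ratio) can be back-calculated from the width of the interval on the log scale. The base R sketch below shows that common approach; it assumes a symmetric (Wald-type) interval on the log scale and a 95% confidence level by default, and whether sawmill does exactly this is an assumption. The function name se_from_ci is hypothetical.

```r
# Sketch: recover SE(log OR) from reported confidence limits.
# Assumes the interval is symmetric on the log scale (Wald-type).
se_from_ci <- function(odds_lo, odds_up, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)          # about 1.96 for a 95% interval
  (log(odds_up) - log(odds_lo)) / (2 * z)
}

# Hypothetical odds ratio grain: OR 2.0 with a 95% CI of 1.1 to 3.6
se_example <- se_from_ci(odds_lo = 1.1, odds_up = 3.6)
print(se_example)
```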

3.15.5 Access sawmill

Locate sawmill

The sawmill R package is available at the iAM.AMR/sawmill GitHub Repository.

Open sawmill

Once at the repository page, scroll down until you see the README.md file (captured in the image below). This README contains important instructions related to sawmill.

Fig. 23: README.md file on GitHub.


Navigate to the Installation and Use section of this file. You can choose either the Bootstrap installation or the Standard installation, depending on your comfort level with R/RStudio and what you intend to use sawmill for.

Attention: Complete steps 1 and 2 of your chosen installation procedure and then return to this documentation. The final steps are related to the use of sawmill and will make more sense upon reading the rest of this page, as well as the related page Processing CEDAR Exports.

3.15.6 Navigate sawmill

Once you have installed sawmill, you may wish to get more familiar with the script files themselves, and/or the accompanying function help files.

Note: This section is largely optional, particularly for those who have chosen the Bootstrap installation procedure, or those not intending to tweak/make development changes to sawmill. However, it is a useful reference, especially the section on Accessing the function help files.

View the R script files

1. Select Files in the Navigation pane
2. Navigate to the directory where the GitHub repository is saved, and open the R directory
3. Open start_mill.R and mill.R in RStudio. These scripts show the order in which the other main functions are called (in other words, the order of the steps (functions) in the processing pipeline).
4. Open any other .R files you would like to examine

Access the function help files

First, select Help in the Navigation pane. Then, enter the following line into the Console:

?function_name()

If that does not work, try entering this line:

?sawmill::function_name()

For example, if you wanted to view the help file for the debark function, you would enter:

?debark()


3.16 Processing CEDAR Exports

3.16.1 Overview

This section provides instructions for processing CEDAR exports (queries, timber), so that they can be used to populate the iAM.AMR models. This processing is performed using the sawmill R package. If you are not familiar with sawmill, please review the section on sawmill, and install it as per the instructions on that page before continuing.

Tip: This section should be read concurrently with the last step of your chosen installation procedure (Bootstrap or Standard): please see the Installation and Use section of the sawmill GitHub repository’s README instruction file.

3.16.2 Raw Timber

CEDAR timber should be in the form of an Excel (.xlsx) file, where each row represents an individual factor. The following table is an example of a properly formatted input timber file (header row and one example factor row are shown).

Attention: The left-to-right order and names of the fields in your input file must match that shown above exactly, otherwise sawmill will raise an error.

Each field has an expected data type, as dictated below. A description of each field is also provided.

Attention: The type of data contained within each of the fields in your input file should match those outlined above, as processing errors can occur otherwise. Please see Warnings due to unexpected data types for more information.

3.16.3 Using sawmill

Changing default values of sawmill arguments

Tip: This sub-section is optional if you have chosen the Bootstrap installation.

Complete descriptions of these arguments and guides as to how they should be changed can be found in the Sawmill Arguments section of the sawmill GitHub repository’s README.md file. To change these arguments, open start_mill.R and mill.R. The default values are specified in each script in a single line of code, as shown for mill.R in the following figure. The argument values can be changed directly in this line of code. For example, if you wanted to change the argument insensible_p_lo to 98, simply replace the 99 after the = sign with 98.

Attention: You must click Install and Restart in the Build tab of RStudio for any changes to the code to take effect.


Fig. 24: Default arguments in sawmill’s mill.R script.

Adding meta-analysis groupings

Upon examining the processed timber, you may wish to group certain factors together for meta-analysis in the raw timber and rerun sawmill.

Attention: Meta-analysis is currently only supported for timber from CEDAR v2.

To add a meta-analysis grouping, make the following changes to the optional meta-analysis fields in the original, raw timber file:
1. ID_meta: assign the same meta-analysis ID to all factors you wish to include in the grouping
2. meta_amr: specify the antimicrobial or class of antimicrobials to which resistance is assayed
3. meta_type: describe the type and level of granularity of the meta-analysis grouping

Tip: The actual meta-analysis ID assigned to a particular grouping is irrelevant, as long as it is consistent across all factors in the grouping.

The table below provides example values for each meta-analysis field, as they might appear for a factor in the raw timber.

Table 2: Meta-analysis Example
ID_meta | meta_amr | meta_type
7 | third-generation cephalosporin | Within Study, Same Antimicrobial Class

All three meta-analysis fields (ID_meta, meta_amr, and meta_type) can simply be left blank for factors that should not be involved in meta-analysis calculations.
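The grouping fields are normally edited directly in the raw timber Excel file. Purely as an illustration of what a correctly grouped file looks like, the sketch below applies the same edits to a small data frame in R. Only ID_meta, meta_amr, meta_type, ID_factor, and factor_title are field names taken from this documentation; the rows and values are hypothetical.

```r
# Sketch: three hypothetical factors, two of which belong to one grouping.
timber <- data.frame(
  ID_factor    = c(101, 102, 103),
  factor_title = c("Ceftiofur use (ceftiofur resistance)",
                   "Ceftiofur use (ceftriaxone resistance)",
                   "Tylosin use"),
  ID_meta      = NA_integer_,     # blank = not part of any meta-analysis
  meta_amr     = NA_character_,
  meta_type    = NA_character_,
  stringsAsFactors = FALSE
)

# Assign the same (arbitrary) meta-analysis ID to every factor in the grouping
in_group <- timber$ID_factor %in% c(101, 102)
timber$ID_meta[in_group]   <- 7L
timber$meta_amr[in_group]  <- "third-generation cephalosporin"
timber$meta_type[in_group] <- "Within Study, Same Antimicrobial Class"

# Quick check: how many factors fall into each grouping (NA = ungrouped)
print(table(timber$ID_meta, useNA = "ifany"))
```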

Running sawmill

Please see the instructions in the Installation and Use section of the GitHub repository’s README.md file. Prompts will appear in the Console as you follow the instructions from GitHub. Enter the information requested by the prompts and select the input timber file from its saved location on your computer. Once sawmill is finished running, it will prompt you to save one or more output files. For each one, you will be prompted to select the save location on your computer.

Important: Save all output files with .csv extensions to prevent errors from occurring.

If errors or warnings appear, please see the following sub-sections.

Caution: You will likely rerun sawmill many times, as deciding which factors to include in a model is an iterative process. You will need to enter the command rm(list = ls()) into the Console before rerunning sawmill. This must be done once for every rerun. This way, variables saved during sawmill’s previous run will not carry over to the new one.

Errors

Errors will stop sawmill from continuing to run, at whichever point in the pipeline they are raised. An error message will appear in the Console, indicating which function caused the error. For example, if the error is raised in the build_chairs function, the message will look something like the following:

Fig. 25: Example error message.

Please note that only the lines beginning with “Error” constitute the actual error message. Although the “Processed function. . . ” lines are also in red text, they should be present in the case of a normal output (i.e. one without errors or warnings).


Important: In the event of an error, please send the error message and input timber file that produced it to the maintainer of sawmill’s GitHub repository.

Warnings

Warnings alert the user to potential problems with the code or input data. Their presence can indicate that sawmill may run into an error at a later step in the processing pipeline, or simply that the current code or input data will produce an incorrect output without further warning. Others may mean nothing; sawmill may continue to execute flawlessly. Warnings do not stop the pipeline at the point they are raised, but they are still worth examining.

Warnings due to unexpected data types

If sawmill detects that one or more cells in the input timber file do not match the expected data types for their respective columns, a warning message will be generated for each mismatching cell. The warning messages are informative; they specify the exact cell addresses within your input file that contain data of the unexpected type. These particular warnings will also generate a prompt asking whether you would like to stop the pipeline and fix your input data, or continue with processing anyway.

Fig. 26: Warning prompt.

Caution: Electing to continue with processing when faced with this prompt can create unwanted/unexpected results, which you may not receive further warning about.

The type of warning received (Coercing or Expecting) can help you decide whether or not you should continue.


Coercing warnings

Coercing warnings appear when R is able to convert the affected cell(s) to the appropriate, expected data type(s). Below is an example of a cell that is likely to produce a coercing warning. This value is in the odds_ratio_up column, so its data type should be numeric. While the value is a number, it is formatted as text (flagged by Excel in the upper left corner of the cell).

Fig. 27: Example of a cell that produces a coercing warning.

Warning messages for coercing warnings appear in the Console and look something like that shown below. The Excel cell shown above produced one of these warnings (the one affecting AE524 / R524C31).

Fig. 28: Coercing warning examples.

If only coercing warnings are present, you can safely choose to continue with processing when faced with the prompt.

Expecting warnings

Expecting warnings appear when R is not able to convert the affected cell(s) to the appropriate, expected data type(s). Below is an example of a cell that is likely to produce an expecting warning. This value is in the prev_table_d column, so its data type should be numeric. However, a text string is present, and it cannot be converted to a numeric data type. Warning messages for expecting warnings appear in the Console and look something like that shown below. The Excel cell shown above produced this warning; it affects cell Z2 / R2C26.


Fig. 29: Example of a cell that produces an expecting warning.

Fig. 30: Expecting warning example.

The implications of expecting warnings vary depending on the columns in which they occur. If the affected cell(s) are in any of the columns specified in the table below, you should stop the pipeline and fix the affected cells. These fields have a direct effect on the odds ratio calculation, so in the event of unexpected data types in any of these, sawmill will typically deem the factor unusable, excluding the row from further processing and writing it to the scrap pile without warning. If the affected cell(s) are in any of the other columns, however, sawmill will simply replace the cell with a value of NA. The factor will not be deleted, and the row will still appear in the processed timber. In cases like this, it is up to the user whether or not to continue with processing when faced with the prompt.
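The difference between a value that can be coerced and one that cannot can be mimicked in base R, as in the sketch below. This is only an analogy for what happens when the Excel file is read: it is not the code path sawmill uses, and the function name to_numeric_with_flags is hypothetical.

```r
# Sketch: text that looks like a number converts cleanly (the "coercing" case),
# while other text cannot be converted and becomes NA (the "expecting" case).
to_numeric_with_flags <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  data.frame(
    raw           = x,
    value         = converted,
    unconvertible = is.na(converted) & !is.na(x),  # text present, but not numeric
    stringsAsFactors = FALSE
  )
}

# Hypothetical cell contents from a numeric column
cells   <- c("2.31", "0.85", "not reported", NA)
checked <- to_numeric_with_flags(cells)
print(checked)
```

Rows flagged as unconvertible are the ones worth fixing in the input file before rerunning, particularly when they sit in a column that feeds the odds ratio calculation.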

Attention: Output fields may still be affected by unexpected data types in these other columns. For instance, the url and html_link output columns are affected by ident_doi (v2)/docID (v1), and sometimes ident_pmid (v2). Also, the identifier output column is affected by ID_factor (v2)/ID (v1) and factor_title (v2)/title (v1).

Other warnings

Every time you execute sawmill, you will likely see a message resembling the following in the Console, once the pipeline has finished and you have saved your processed timber.

Fig. 31: Generic warnings alert.

If you follow the prompt by entering the following into the Console:


warnings()

You will see something closely resembling the following:

Fig. 32: Generic warning messages.

This type of warning can be ignored. It occurs when the significance value (p-value) for the factor is calculated using the Fisher’s exact test. Since the values used in the Fisher’s test must be rounded to the nearest integer, a warning is generated to notify the user that the rounding took place.
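The rounding behaviour can be reproduced outside sawmill with base R's fisher.test(), as in the sketch below. The table values are hypothetical (non-integer counts of the kind that arise when a prevalence is multiplied by a group total); the sketch captures the warning rather than letting it print.

```r
# Sketch: fisher.test() rounds non-integer counts and warns that it did so.
tab <- matrix(c(33.6, 46.4,
                24.2, 95.8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Exposed", "Referent"), c("AMR+", "AMR-")))

captured <- character(0)
res <- withCallingHandlers(
  fisher.test(tab),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))  # keep the warning text
    invokeRestart("muffleWarning")                 # and carry on
  }
)

print(captured)       # the rounding warning
print(res$p.value)    # same p-value as for the rounded table
```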

Attention: If the warning messages are of any other nature than those mentioned, please contact the maintainer of sawmill’s GitHub repository for assistance.

3.16.4 Evaluating the Processed Timber (Planks) and Other Outputs

This section outlines the fields that will be present in the processed timber .csv file. Each row now represents a plank of processed timber, or a factor usable for an iAM.AMR model. An overview of additional output .csv files that may be produced is also provided.

The output .csv files

Processed timber

A processed timber file is produced for each successful run of sawmill. Two types of planks (rows) are present in the following order, from top to bottom:
1. Error-free factors for which an odds ratio and other outputs were successfully calculated
2. Meta-analysis results for each meta-analysis grouping (each unique meta-analysis ID)


Note: Rows containing the results of a meta-analysis will look slightly different (for instance, some fields may have values of NA).

Scrap pile

This file is only provided as an output if there is at least one erroneous factor in the raw timber. The scrap pile contains all erroneous factors, or factors for which an odds ratio and other key outputs were not successfully calculated. Its fields are overall quite similar to those present in the raw timber, with two unique additions:
1. exclude_sawmill: Flagged as TRUE, indicating that the factor was excluded from calculations by sawmill due to errors/missing data
2. exclude_sawmill_reason: A more detailed description of why the factor was not usable

Full meta-analysis results

This file is only provided as an output if there is at least one meta-analysis grouping in the raw timber. Each row represents the results from a single meta-analysis grouping, indicated by the value of ID_meta in the far-left column. The main estimates produced by the meta-analysis calculation (odds ratio, standard error of the log(odds ratio), and p-value) are included in the processed timber. However, the full results produced by metafor (the meta-analysis R package used by sawmill), contain many more fields describing other parameters of the calculation. For a full description of these parameters, please see pg. 241 of the metafor user guide, which is the Value list for rma.uni.

Planks

The following table is an example of processed timber. While all fields present in the input timber are retained in the output, some will have new names: sawmill renames some of the fields to improve uniformity between v1 and v2 outputs. A description of each output field is provided below. The fields which are added by sawmill and thus only appear in the processed timber are also annotated with the function responsible for adding them.

Tip: The odds_ratio, se_log_or, and pval fields are added by the do_MA function in cases where the row contains the results of a meta-analysis.

Tip: The logOR field is only added if there is at least one meta-analysis grouping (one unique meta-analysis ID) in the raw timber.

Checking the validation fields

These are present in the processed timber file.


Low cell count factors

When one or more of the four values in the 2x2 contingency table is equal to zero, sawmill sets the low_cell_count field to True. To avoid divide by zero errors, sawmill increments all four values by 0.5.

Null comparison factors

When the # AMR+ observations in both the exposed and referent groups are equal to zero, sawmill sets the null_comparison field to True. To avoid divide by zero errors, sawmill increments all four values by 0.5. Any null comparison factors also have the low_cell_count field set to True.
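The two flags and the 0.5 increment described above can be sketched in a few lines of base R. The field names low_cell_count and null_comparison come from this documentation; the function name flag_and_correct and its argument names are hypothetical, and sawmill's own implementation may differ.

```r
# Sketch: flag zero cells in a 2x2 table and add 0.5 to all four cells
# so that the odds ratio can still be calculated.
flag_and_correct <- function(exp_pos, exp_neg, ref_pos, ref_neg) {
  cells <- c(exp_pos, exp_neg, ref_pos, ref_neg)
  low_cell_count  <- any(cells == 0)               # at least one zero cell
  null_comparison <- exp_pos == 0 && ref_pos == 0  # no AMR+ in either group
  if (low_cell_count) cells <- cells + 0.5         # increment all four values
  odds_ratio <- (cells[1] * cells[4]) / (cells[2] * cells[3])
  list(low_cell_count = low_cell_count, null_comparison = null_comparison,
       cells = cells, odds_ratio = odds_ratio)
}

# Hypothetical factors
one_zero  <- flag_and_correct(exp_pos = 0, exp_neg = 20, ref_pos = 5, ref_neg = 15)
both_zero <- flag_and_correct(exp_pos = 0, exp_neg = 20, ref_pos = 0, ref_neg = 30)
print(one_zero)
print(both_zero)
```

Because a null comparison always contains zero cells, it is also flagged as a low cell count factor, consistent with the description above.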

CEDAR v2: factors with an insensible_prev_table

Check your output .csv file for rows where the insensible_prev_table field is set to True. These rows likely have data entry errors in the prevalence table columns, as this result indicates that (% AMR+ exposed) + (% AMR- exposed) does not come to approximately 100, and/or that (% AMR+ referent) + (% AMR- referent) does not come to approximately 100.
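A check of this kind can be sketched in base R as follows. The lower bound of 99 matches the default of the insensible_p_lo argument mentioned under Changing default values of sawmill arguments; the upper bound of 101, the function name, and the column names are assumptions made for illustration.

```r
# Sketch: flag prevalence tables whose AMR+ and AMR- percentages do not
# sum to approximately 100 in either group.
is_insensible <- function(p_pos, p_neg, lo = 99, hi = 101) {
  total <- p_pos + p_neg
  total < lo | total > hi   # NA if either percentage is missing
}

# Hypothetical prevalence tables: the second row has a data entry error
prev <- data.frame(
  amr_pos_exp = c(40, 40), amr_neg_exp = c(60, 50),
  amr_pos_ref = c(25, 25), amr_neg_ref = c(75, 75)
)
prev$insensible_prev_table <-
  is_insensible(prev$amr_pos_exp, prev$amr_neg_exp) |
  is_insensible(prev$amr_pos_ref, prev$amr_neg_ref)
print(prev)
```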

3.17 Getting Started in Analytica

3.17.1 What is Analytica?

Analytica (by Lumina Decision Systems [hereafter ‘Lumina’]) is a popular decision modelling tool for creating probabilistic risk assessment models. While Analytica is similar in many ways to a spreadsheet-based modelling tool, it adds two key differentiating features: a flexible and informative user interface, and an ‘intelligent array’ system. The former makes large, complex models easier to understand for end-users and modellers; the latter allows modellers to dynamically resize the model as new information becomes available. Both of these features are desirable for a large-scale model.

3.17.2 Analytica Editions and Versions

Analytica is available in several editions, and is currently (as of Dec 2020) on version 5.4.x.

Editions

Analytica is available in three editions: Analytica Professional, Analytica Enterprise, and Analytica Free 101. The last is – as the name implies – free! Lumina provides a description of the various editions, and a useful comparison of Analytica editions.

Analytica Professional is the standard, fully featured edition of Analytica, which allows users to build models of any size, and address arrays with a maximum length of 32,000 elements.

Analytica Enterprise offers all the functionality of Analytica Professional, while increasing the maximum array length to 100 million elements, adding encryption, and adding the ability to use a performance profiler to identify computationally expensive model elements.

Analytica Free 101 offers the same functionality as Analytica Professional, but limits users to the creation and modification of smaller models, containing no more than 101 nodes. Users can still open and evaluate larger models, but cannot make changes to variables or the underlying model structure.

Tip: You can open and run the iAM.AMR models with Analytica Free 101, but you will not be able to edit them.

All editions can be installed using the same installer, available from Lumina’s download page. We provide more details on editions in the iAM.AMR Tech Tutorial; see the iAM.AMR Team Repo (login required).

Editions and the iAM.AMR project

In short, we avoid using features from the Enterprise edition (not available in Professional) to reduce software cost. The only Enterprise feature relevant to the iAM.AMR models is support for large arrays; we have worked around this limit with some clever coding. We recommend Professional, unless users are involved in framework development.

Versions

Analytica is currently on version 5.4.x. For GoC employees, Shared Services has packaged version 5.1.x.

Versions and the iAM.AMR project

We identify three important milestones: 4.2, 4.6, and 5.0. The changes introduced in versions 4.2 (encryption) and 4.6 (OLE links) are no longer relevant to the iAM.AMR models. Versions 5.0+ enable multi-threaded computation, and may speed up model evaluation. We recommend using version 4.7+.

3.17.3 Install Analytica

Analytica is packaged for GoC computers; however (as always), Shared Services is several versions behind. We can side-load Analytica using the same “extract-and-run” procedure described here.

3.17.4 Learn Analytica

Lumina has provided a number of great first-party resources for getting started with Analytica, including the Analytica User Guide, and a series of Analytica tutorials. These, in addition to the Analytica Wiki, are accessible from the help menu within Analytica. For a general introduction to Analytica, we recommend you get started by reading the User Guide, which is available as a PDF in Analytica’s help menu, or on the Wiki.

Tip: If you’re not a fan of manuals, Lumina has distilled the User Guide into a few key chapters listed here.

There are additional training resources in the iAM.AMR Team Repo (login required).

3.17.5 Search for Analytica-related information

It is somewhat difficult to search for information related to Analytica, given the Cambridge Analytica scandal (the CEO of Lumina addresses the errant relationship here). In Firefox (and other browsers), you can set up shortcuts for specific searches.

Add a Shortcut to Search the Wiki

Search shortcuts are keywords you use to trigger a specific search. For example, if your search shortcut for the Analytica site is “ANA”, and you want to find information on tables, you’d search “ANA tables”.

Firefox

To create a shortcut to use the Analytica Wiki search box, head to the Analytica Wiki and right-click in the search bar, selecting “Add keyword for this search”. Then, select a keyword to use.

Add a Shortcut to Search the Analytica Domain

To rely instead on Google’s indexing of the Analytica site, we can create a site-specific search, excluding the term “Cambridge”.

Firefox

Right-click in the bookmarks bar, and create a bookmark. The location should be set to https://www.google.com/search?q=site%3Aanalytica.com+-cambridge+%s, and the keyword should be set to your desired search shortcut.
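To see what the %s placeholder does, the Python sketch below mimics, roughly, how the browser expands the bookmark when you type your keyword followed by search terms. The function name is hypothetical, and the exact encoding the browser applies to your terms may differ slightly from quote_plus.

```python
from urllib.parse import quote_plus

# Bookmark location from the text above; %s marks where the terms go.
TEMPLATE = "https://www.google.com/search?q=site%3Aanalytica.com+-cambridge+%s"


def expand_keyword_search(terms):
    """Substitute URL-encoded search terms for the %s placeholder."""
    return TEMPLATE.replace("%s", quote_plus(terms))


print(expand_keyword_search("choice function"))
```

The decoded query is `site:analytica.com -cambridge choice function`, which restricts results to the Analytica domain and excludes pages mentioning “Cambridge”.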

3.18 The Basics

Here, we cover the basics as they apply to the iAM.AMR models. We recommend reviewing this section after exploring Getting Started in Analytica, and the User Guide or Wiki; this section is a summary of the key points from the early sections of this material. Additional training resources are available in the iAM.AMR Team Repo (login required). Moreover, there is nothing like first-hand experience. We recommend you try the tutorials, or model a simple, everyday problem for which you may have formerly used a spreadsheet.

3.18.1 Learn Analytica

See: Learn Analytica

3.18.2 Basic Concepts in Analytica

An Analytica model consists of one or more objects. An Analytica object, much like a physical object, has a form (it occupies space), and is characterized by its attributes, such as its name (title), its identifier, its class, and its definition. Each of these attributes contains one or more values; values, whether they are text strings, numbers, or formulae, are the data that modellers and users enter into the model.

The most common object in Analytica is the node (the terms object and node are often, though incorrectly, used interchangeably), and the most important attribute of each node is its definition. The definition is where quantitative data is stored, and where the mathematical relationships between nodes are defined. Other attributes generally contain qualitative data or descriptors (metadata), such as units of measurement.

An influence diagram – the interface you see when you open the model – is a collection of nodes and their connections which serve to communicate the underlying mathematical relationships captured in the model. Because these models are designed to be accessible to users, it is essential that they are as clear and understandable as possible.

Node Types

Analytica has ten different types of nodes, which we differentiate into two groups: basic and complex.

The basic nodes (variable, chance, objective, and constant) are, for the most part, functionally equivalent. Analytica differentiates between these node types (by default colour and shape) solely to convey information to users; generally, a chance node contains a probability distribution, an objective node contains a model output relevant to users, a constant node contains an immutable constant (such as Avogadro’s number), and a variable node contains any data not belonging to one of the aforementioned categories. The choice between these node types is largely stylistic. The upshot of this principle is that while Analytica will in some instances automatically select a node type, modellers may coerce a basic node to any other basic node type which they believe is appropriate.

In contrast, the complex nodes (module, index, function, text, and button) each confer distinct and unique properties, and are therefore only useful in specific situations. Where applicable, they are described in subsequent sections.

Attributes

Each object in Analytica is described by a series of attributes; the minimum set of attributes required to describe an object includes the title, the identifier, and the definition. The title is a non-unique string that appears as a label for the object in an influence diagram. The identifier is a unique, non-space-containing string used to identify or reference the object in functions, formulae, and definitions of subsequent objects. The definition is the content of the object – the main repository for data and data manipulation. Comparing Analytica to a traditional spreadsheet program, the title is equivalent to a label beside a cell, the identifier is equivalent to the cell name (e.g. B4), and the definition is equivalent to the cell content (e.g. “dog”, 18, or “=C4+D4”).

Identifier

An object’s identifier is a unique, non-space-containing string used to identify or reference the object in functions, formulae, and definitions of subsequent objects. Analytica automatically generates this identifier using the title provided; identifiers are limited to 20 characters, cannot start with a number, and cannot contain punctuation or spaces. For example, a node with the title “Frequency Tree” will automatically be given the identifier “Frequency_Tree”. Similarly, a node with the title “Jim’s Favourite Cookies” will automatically be given the truncated identifier “Jim_s_Favourite_Cook”.

Where the generated identifier is already in use, a non-padded number will be appended to the identifier (i.e. a subsequently created node with the title “Frequency Tree” will be assigned the identifier “Frequency_Tree1”). In the event the resulting numbered identifier exceeds 20 characters, the title is further truncated (i.e. a subsequently created node with the title “Jim’s Favourite Cookies” will be assigned the identifier “Jim_s_Favourite_Coo1”).

Note that while capitalization of identifiers is preserved within the definitions and formulae of the model, identifiers are not case-sensitive.
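The identifier rules above can be sketched in Python. This is an approximation written to reproduce the examples in the text, not Analytica's actual algorithm: the function name is hypothetical, each disallowed character is assumed to map to one underscore, and titles that start with a number are not handled.

```python
import re

MAX_LEN = 20  # identifiers are limited to 20 characters


def make_identifier(title, in_use):
    """Derive an identifier from a title, avoiding identifiers in use.

    `in_use` is a set of lower-cased identifiers already taken; it is
    updated in place. Lower-casing reflects that identifiers are not
    case-sensitive.
    """
    # Assumption: every space or punctuation mark becomes an underscore.
    base = re.sub(r"[^A-Za-z0-9_]", "_", title)[:MAX_LEN]
    candidate = base
    n = 0
    while candidate.lower() in in_use:
        n += 1
        suffix = str(n)  # non-padded number
        # Truncate further so the numbered identifier still fits.
        candidate = base[:MAX_LEN - len(suffix)] + suffix
    in_use.add(candidate.lower())
    return candidate


taken = set()
for title in ["Frequency Tree", "Frequency Tree",
              "Jim’s Favourite Cookies", "Jim’s Favourite Cookies"]:
    print(make_identifier(title, taken))
```

Run as written, this prints Frequency_Tree, Frequency_Tree1, Jim_s_Favourite_Cook, and Jim_s_Favourite_Coo1, matching the examples above.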

3.18.3 Lucid Influence Diagrams and Best Practices

Lumina has coined the phrase ‘Lucid Influence Diagrams’ to describe diagrams that follow best practices, and clearly and effectively communicate purpose to users and modellers alike. Several key recommendations from the Analytica Wiki are reproduced here. Of course, none of these are hard and fast rules; use your discretion when applying these conventions, and construct your diagrams as they best make sense to you (and of course, as they best serve your stakeholders’ needs).

Node Titles

Nodes should include descriptive, but succinct titles. Use common abbreviations where appropriate, but balance these choices against usability. For example, a node describing the probability of antimicrobial resistance at retail is better represented as “Prob. Resistance at Retail” than the more succinct, but difficult to understand “Prob. Res. Ret.”.

Node identifiers

Recall that the title of each node (shown by default in the user interface) is distinct from its identifier. A node’s identifier is the true designator of the node – it is the string used to identify the node in functions, formulae, and definitions of subsequent nodes. This identifier can (and should) be more succinct than the title, as it will be repeatedly entered elsewhere in the model (and used solely by model builders, who have a more thorough understanding of the model).

As node identifiers are automatically created, they may be nonsensical or unnecessarily complex. Using our former example, if we had several nodes titled “Jim’s Favourite Cookies” (perhaps within a model of several bakeries), we could easily end up with a series of identifiers “Jim_s_Favourite_Coo1, Jim_s_Favourite_Coo2, Jim_s_Favourite_Coo3” and no idea as to which bakeries these refer.

Therefore, it is best practice to manually edit the identifier after it is generated, using a uniform and consistent naming scheme (e.g. “Jim_Fav_BakeryA”, “Jim_Fav_BakeryB”). The schema developed for the iAM.AMR project is described in the conventions.

Tip: If you are updating the titles of nodes, you can disable the automatic prompt to regenerate the identifier based on the new title in the preferences menu – this can speed up the process of updating the model where the identifier has already been set correctly.

Visual Consistency

Colour, size, and node type can be used to communicate information to the user, but only when these attributes are used consistently. Nodes containing similar data or which perform a similar function should be the same size and shape – larger or more colourful nodes suggest importance and draw attention.

Likewise, the large-scale arrangement of the influence diagram communicates information to the user; influence diagrams tell a story with their structure, and should flow as one would expect – from left-to-right and from top-to-bottom. Nodes should be aligned where possible to reduce visual clutter; horizontal and vertical arrows, which do not intersect, are easier to follow than their askew or tangled counterparts.

Arrows between nodes can be suppressed using individual node style properties (by right-clicking the node, and selecting node style). This is recommended where relationships are implied by positioning or title, and suppression of the links reduces visual clutter. Arrow suppression is especially useful when implementing a User Defined Function (UDF) – the function node will be visually linked to all objects in which the function is called unless output arrows from the function node are disabled.

The text-case used in node titles (and identifiers) should be consistent across the model. While title-case may be more attractive for short titles, sentence case improves readability. Decide on one format, and use it consistently throughout the model. The schema developed for the iAM.AMR project is described in the conventions.

Attributes and Metadata

Recall that the minimum set of attributes required to describe an object includes the title, the identifier, and the definition. However, Analytica also includes a number of built-in attributes to capture metadata, such as the description and unit fields, which should always be completed where possible. Notably, modellers can create their own attributes (or enable lesser-known built-in attributes) to further document their models or add functionality; the attributes panel is available under the object menu in the menu bar.

The built-in Cell Default attribute specifies the value assigned to newly-created cells. This attribute, enabled on a per-node or model-wide basis, replaces Analytica’s default cell value of zero. Setting this attribute is useful where zero values may result in errors during evaluation (e.g. the node is used as a divisor), where the cell is a complex function, or when the cell is otherwise cumbersome to regularly update (e.g. a series of choice functions in a table).

The built-in OnChange attribute, enabled on a per-node basis, specifies an expression to be evaluated or action to be taken any time the definition of the node changes. Importantly, expressions in the OnChange attribute are able to effect changes throughout the model (i.e. global assignment) that are otherwise disallowed by Analytica (other than through a button action). Specifying an OnChange attribute is useful for input validation, or synchronizing multiple nodes.

3.18.4 Indices and Array Abstraction

Indices

Indices are lists, consisting of text strings, non-sequential numbers, or number series, which act as strata for data throughout a model. The simplest way of thinking of an index is as containing the row or column labels of a table – indices delineate data into categories, across which comparisons can be drawn. An example of a simple index is a list of months, which serves as the row or column labels for a table containing data collected on a monthly basis. When defining a list, Analytica presents three options: a list, a list of labels, and a sequence. A list may contain any type of data (string, numeric, etc.). A list of labels can only contain strings – any data entered will be coerced to a string. A sequence is a list of numbers that do not need to be individually specified; where a large list of regularly incremented numbers is required, a sequence is a great shortcut (e.g. a list of numbers from 1 to 100).

Array Abstraction

Indices serve as the basis for Analytica’s ‘Intelligent Array’ system, one of Analytica’s most powerful features. For those readers with experience in programming, array abstraction (Lumina’s terminology for the implementation of the Intelligent Array system) is akin to automatic vectorization of code. In simpler terms, any operation applied to a table or function which includes an index is automatically applied over the entire index.

Let’s return to our example of an index containing a list of months; multiplying a table containing monthly sales data (indexed by the Month Index) by 5 will automatically multiply each cell by 5 – no need to specify the operation for each individual cell.

The true power of array abstraction, however, is Analytica’s ability to match indices, and automatically propagate these indices throughout the model. Let’s look at a different example: calculating the revenue associated with multiple products. Given two tables, containing the number of units sold, and the price per unit, we can calculate the revenue per product with a single multiplicative operation. The number of units for Product A in the first table is multiplied by the price of Product A in the second table (and so on for all products), and the result is a single column table, also indexed by the product names.

Additionally, Analytica can identify where operations occur over two different indices and automatically create a matrix, populated with the cross product of those indices. Expanding on our previous example, we can calculate the profit on each product throughout the year, assuming our profit margin changes as a result of material cost (perhaps we’re a bakery, and the cost of vanilla changes throughout the year). Given two tables, containing the revenue per product, and the margin per month, we can calculate our profit again with a single multiplicative operation. The revenue for each product is automatically multiplied by each month’s margin value, and the result is a matrix, indexed both by product names and months.
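For readers who find code easier to follow than prose, the two bakery examples can be mimicked in Python using dictionaries keyed by index labels. All product names and numbers are hypothetical, and Analytica performs this matching automatically rather than through explicit loops.

```python
# Hypothetical indices and tables for the bakery example.
products = ["Bread", "Cookies", "Cake"]
months = ["Jan", "Feb", "Mar"]

units_sold = {"Bread": 120, "Cookies": 300, "Cake": 45}
unit_price = {"Bread": 4.0, "Cookies": 1.5, "Cake": 20.0}
margin = {"Jan": 0.30, "Feb": 0.25, "Mar": 0.20}

# Both tables share the product index, so cells are matched by label.
# The result is still indexed by product only.
revenue = {p: units_sold[p] * unit_price[p] for p in products}

# The operands have different indices (product and month), so the result
# is indexed by both: one cell per (product, month) combination.
profit = {(p, m): revenue[p] * margin[m] for p in products for m in months}

print(len(revenue), len(profit))
```

The first result has one cell per product, while the second has one cell for every product and month pair, which is how a single operation can add a dimension to a table.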

The rules of array abstraction will become more apparent as you build your models; array abstraction (and the rules that govern it) are some of the more difficult concepts to grasp in Analytica, especially before you’ve had an opportunity to try it yourself. One key thing to remember is that indices are propagated forward in the model, and each index adds a dimension to your table or matrix. Any operation on an object associated with an index will bring that index forward into the calculation. The exceptions to this rule are array-reducing functions; for example, Sum() adds elements of a table along an index (for example, if we wanted the total revenue for all products), reducing the dimensionality of the table by one (i.e. removing the index).
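The effect of an array-reducing function can be sketched in Python as follows. The helper `sum_over` is hypothetical and only imitates the idea of summing along one index; it is not Analytica's Sum() function, and the table values are made up.

```python
def sum_over(table, axis):
    """Sum a table keyed by label tuples along one key position.

    The summed position is dropped from each key, so the result has one
    fewer dimension than the input.
    """
    result = {}
    for key, value in table.items():
        reduced_key = key[:axis] + key[axis + 1:]
        result[reduced_key] = result.get(reduced_key, 0) + value
    return result


# Hypothetical table indexed by (product, month).
sales = {
    ("Bread", "Jan"): 1, ("Bread", "Feb"): 2,
    ("Cake", "Jan"): 10, ("Cake", "Feb"): 20,
}

by_month = sum_over(sales, axis=0)    # product index removed
by_product = sum_over(sales, axis=1)  # month index removed
print(by_month, by_product)
```

Summing over the product position leaves a table indexed only by month, and vice versa.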

3.18.5 Decision Nodes and DetermTables

As you become more familiar with indices and the Intelligent Array system in Analytica, you may notice that the size of tables (and therefore their compute time) increases rapidly – it’s very easy to build a model that will test the limits of your available computational resources. You may also realize that you require user input in the model, in the form of a choice between one or more scenarios.

Choice Functions and Decision Nodes

Decision nodes (and the Choice functions contained therein) address both of these facets of model building by presenting the user with a list of options, and allowing them to select one or all of these options – only the selected options are propagated through the model and evaluated.

The easiest way to understand how Choice functions are implemented in Analytica is to look at the corresponding code:

Choice(INDEX, POSITION, AllowAll)

All of the options presented to the user are specified in an INDEX. The simplest example of an index is one containing the labels “Yes” and “No”. When the user interacts with the choice node and makes a decision, the Choice function stores that decision as the POSITION of that element in the index. In our simple example, if the user chooses “Yes”, and “Yes” is the first element of the index, POSITION = 1; if the user chooses “No”, POSITION = 2. The importance of this concept to end-users is minimal; however, model builders should be aware that we can change the user’s selection programmatically, by updating the POSITION argument of the Choice function. The final argument, AllowAll, is a logical value that specifies whether the user is allowed to choose all of the options, rather than just one.
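The behaviour of the three arguments can be imitated in a few lines of Python. This is a sketch for intuition only, not Analytica code: the function is hypothetical, it returns a list where Analytica returns an index subset, and treating a position of 0 as “all options” is an assumption that should be checked against the Analytica Wiki.

```python
def choice(index, position, allow_all=False):
    """Return the selected option(s) from `index`.

    `position` is 1-based, as in the text above. Assumption: position 0
    stands for "all options" and is only valid when `allow_all` is True.
    """
    if position == 0:
        if not allow_all:
            raise ValueError("selecting all options is not allowed")
        return list(index)
    if not 1 <= position <= len(index):
        raise ValueError("position is outside the index")
    return [index[position - 1]]


answers = ["Yes", "No"]
print(choice(answers, 1))        # user picked the first option
print(choice(answers, 2))        # user picked the second option
print(choice(answers, 0, True))  # user picked every option
```

Changing the position argument is the programmatic equivalent of the user picking a different entry in the menu.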

DetermTables

Recall that an index can be thought of as the rows or columns of a table. What the Choice function actually does is take an existing index, subset it (i.e. select one element from the index), and return that subset as an index. This means we can dynamically resize our tables based upon the choice of the user, reducing computational requirements, and returning data tailored to the user’s choices.

However, we can’t do this with a traditional Table in Analytica. As Analytica will remind you, if you ever go to delete an element from an index, data are lost as the Table shrinks. Instead, we rely on a DetermTable: an object which works exactly like a Table, but dynamically resizes when calculated.

In the example shown below, the DetermTable is indexed by a Decision node, which is set to “Second Quarter”. This means that while the DetermTable contains all of the information necessary to evaluate the whole table, it will only evaluate and return the “Second Quarter” value.

If we wanted to achieve a similar effect using a standard Table, we would need to manually delete and re-add elements of the index, then repopulate the Table – not something that you’d want to do regularly. Moreover, it is not something that end-users could easily accomplish.
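The difference between storing a full table and evaluating only the selected slice can be illustrated in Python. The data and function name are hypothetical, and the sketch only mirrors the idea behind a DetermTable rather than its implementation.

```python
# Hypothetical quarterly values; the full table is always stored.
FULL_TABLE = {
    "First Quarter": 100,
    "Second Quarter": 140,
    "Third Quarter": 90,
    "Fourth Quarter": 120,
}


def evaluate_determ_table(full_table, selected):
    """Return only the cells for the labels chosen in the decision."""
    return {label: full_table[label] for label in selected}


# The decision is set to a single quarter, so only one cell is returned.
result = evaluate_determ_table(FULL_TABLE, ["Second Quarter"])
print(result)
```

Nothing is deleted from the stored table; changing the selection simply changes which cells are returned.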

Tip: You can always use DetermTables in place of standard tables; there is seemingly no downside, and no reconfiguration is required if a Decision node is later included in the model.

A Note on implementation

There are two important things to consider when using a Choice function. The first is that a Choice function can be self-indexed (i.e. the index of choices is specified within the Decision node itself). We generally do not recommend that option, as the index will likely need to be re-used at some point, elsewhere within the model. The second is that there is an additional step when configuring a Decision node using a Choice function with an external index (as described in the previous section). In addition to specifying the external index in the Choice function definition, it must also be specified as the Domain of the Decision node, in the Domain attribute. If the attribute is not specified, Analytica will throw an error. Note that if the Domain attribute is not accessible in the node window, enable the attribute as previously described.

3.19 Models and Modules

3.19.1 About Modules

Models get big fast. This is particularly true of Analytica models, wherein data and model process are coupled to their layouts (recall, Lumina refers to this concept as an Influence Diagram). Accordingly, the Analytica wiki includes a section explicitly addressing working with large models.

In brief, Analytica models can be broken down into smaller parts, called modules. These modules fit together like puzzle pieces – each contains its own nodes and connections that together form a whole. A given module can be shared among multiple models.

Within a model, modules are organized hierarchically, like folders on your desktop. You may have many modules in the root of your model, or choose to add modules within other modules. Or, if you prefer, you may just have one large model with no modules at all (but you will be doing a lot of scrolling!).

A module looks like a standard node, but you’ll be able to identify modules by their thick black borders.

3.19.2 Types of Modules

Standard Modules

Standard modules (or simply, modules) exist within the parent model file as a logical subdivision of your model. For example, you may add a user-interface at the root level of your model, and hide the actual model from view within a module (we do this in the iAM.AMR models). Or, you might put all nodes related to a particular antimicrobial class within a single class module.

Hint: ‘Enter the model’ on the front page of each model is actually a module! We use modules throughout the iAM.AMR models to better organize information and improve user experience.

Filed Modules

Filed modules exist within a separate file (i.e. a separate .ANA file), linked remotely to your parent model file. For example, you may have different filed modules for each animal species (that can be updated by a different subject-matter expert), that are all linked together in a single parent model.

Filed modules offer several important advantages, including re-use across multiple models, and improved collaboration (as everyone can work on their own section). However, it is important to note that you must have all of the files present to run the parent model. Returning to our above example, if you have a parent model “agri-food production”, linked to separate chicken, cattle, and swine filed modules, you must have all four to run the complete model. They must also be stored and distributed in the same relative location (see below).
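Before opening a parent model, it can be handy to confirm that every filed module sits where the parent expects it. The Python sketch below is one way to do that; the function and all file names are hypothetical, and it assumes you already know each module's path relative to the parent model file.

```python
from pathlib import Path


def missing_modules(parent_model, relative_module_paths):
    """List the filed modules that are absent relative to the parent.

    `relative_module_paths` are paths relative to the folder that
    contains the parent model file.
    """
    base = Path(parent_model).parent
    return [rel for rel in relative_module_paths
            if not (base / rel).is_file()]


# Hypothetical layout: one parent model and three filed modules.
missing = missing_modules(
    "models/agri_food_production.ana",
    ["modules/chicken.ana", "modules/cattle.ana", "modules/swine.ana"],
)
print(missing)
```

An empty list means all of the listed files were found; anything else names the files to track down before running the model.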

3.19.3 Linking Modules in Analytica

For a complete explanation of how to link filed modules to parent models in Analytica, see the Analytica Wiki. For a practical example, see the iAM.AMR.HUB section.

Some Important Points

• Changes in a linked module (i.e. a filed module) are propagated in both directions. Do not make an edit within the module in your model that you do not want to propagate to all models linked to that module.
• Generally, you do not want to embed a module, as that breaks the link to the original module file, and no updates or changes will be propagated in either direction.
• For existing links, you will have to re-link the modules if the files have been moved relative to one another. For example, when you download the files from GitHub, you will have to re-link the module when you open the model.

3.20 iAM.AMR Framework

3.20.1 Overview

The iAM.AMR models are organized into stories, or collections of one or more drug-microbe-commodity combinations that together describe an important facet of resistance in the agri-food production system in Canada. Each story model shares the same architecture, which allows us to:

• share common data and functions via linkable modules (small stand-alone models)
• integrate the models to simultaneously evaluate multiple stories

You can think of models and modules fitting together like puzzle pieces; similar models and modules ‘click’ into one another to form a whole.

Attention: As a result of this ‘pluggable’ framework, the iAM.AMR models are often distributed across multiple files – ensure you have all the files you need to run your model.

3.20.2 Implementation in Analytica

3.20.3 iAM.AMR Stories

Naming Scheme

Insert Here.

Story Icons

Insert Here.

Existing Models

• iAM.AMR.3GC
• iAM.AMR.CHI
• iAM.AMR.FQC

3.20.4 iAM.AMR Modules

Naming Scheme

Modules are generally saved with a prefix of iAM.AMR.MOD in the format iAM.AMR.MOD_contents_here.

Model Names

The models are named iAM.AMR.XYZ, where XYZ represents a three-character short-code identifying the model. The code should be relevant to the contents of the model.

• e.g. the iAM.AMR.CHI model focuses on chickens, while the iAM.AMR.3GC model focuses on third-generation cephalosporins.
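A quick way to check file names against these two conventions is a pair of regular expressions, sketched below in Python. The patterns and function name are assumptions based only on the formats described above (letters and digits are assumed for the short-code). Because modules are only “generally” saved with the MOD prefix, exceptions such as the iAM.AMR.HUB filed module will be classified by their three-character code.

```python
import re

# Assumed patterns derived from the naming schemes described above.
MODEL_NAME = re.compile(r"^iAM\.AMR\.[A-Za-z0-9]{3}$")
MODULE_NAME = re.compile(r"^iAM\.AMR\.MOD_\w+$")


def classify_name(name):
    """Return 'module', 'model', or None for a name without extension."""
    if MODULE_NAME.match(name):
        return "module"
    if MODEL_NAME.match(name):
        return "model"
    return None


for name in ["iAM.AMR.CHI", "iAM.AMR.3GC", "iAM.AMR.MOD_contents_here"]:
    print(name, classify_name(name))
```

Names that match neither pattern return None, which is a prompt to rename the file or to document the exception.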

Factor Identifier

The factor identifier is automatically generated by the sawmill R package, in the format A#####_Name_of_Factor where:

• A is either ‘R’ for a standard factor, or ‘M’ for a meta-analysis
• ##### is the factor or meta-analysis number

Where multiple factors inform a single node:

• and one or more of the factors is a meta-analysis, use the meta-analysis identifier with the lowest number
• and none of the factors are meta-analyses, use the identifier with the lowest number
• and all of the factors are meta-analyses, use the identifier with the lowest number

In some instances, it may be appropriate to deviate from this schema – care should be taken to maintain consistency despite these deviations.
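The selection rules for nodes informed by multiple factors can be expressed as a short Python sketch. The function name and the example identifiers are hypothetical; the pattern assumes the number is written with digits only, as the ##### placeholder suggests.

```python
import re

# Assumed pattern for A#####_Name_of_Factor ('R' or 'M', then a number).
FACTOR_ID = re.compile(r"^(?P<kind>[RM])(?P<number>\d+)_(?P<name>.+)$")


def node_identifier(factor_ids):
    """Pick the identifier to use when several factors inform one node.

    Meta-analysis identifiers ('M') take priority; within the chosen
    group, the identifier with the lowest number is used.
    """
    parsed = []
    for factor_id in factor_ids:
        match = FACTOR_ID.match(factor_id)
        if match is None:
            raise ValueError(f"unexpected identifier format: {factor_id}")
        parsed.append((match["kind"], int(match["number"]), factor_id))
    metas = [p for p in parsed if p[0] == "M"]
    pool = metas if metas else parsed
    return min(pool, key=lambda p: p[1])[2]


print(node_identifier(["R00012_Feed", "M00007_Water", "M00002_Litter"]))
```

In the example call, the meta-analysis with the lowest number is chosen even though a standard factor is also present.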

Node Colour

Colour is used to indicate the function and contents of each node. The use of colour in the model should conform to the general scheme:

• Light Grey: Non user-modifiable node that performs intermediate calculations, or which is otherwise exposed to the user via a separate user interface
• Dark Grey: A node containing a list of factors
• Orange: An objective node, containing intermediate or final results of calculations
• Purple: A user interface node
• Blue: A factor node, or a node which contains epidemiological data

Note, the following colour designations are liable to change, as the models are further standardized:

• Pink: A node in which the factor is informed by meta-analysis
• Peach: A node which contains information for multiple bacterial species
• Gold: A node which contains information for multiple bacterial species, informed by meta-analyses

3.21 iAM.AMR.HUB

Recall that the iAM.AMR models are organized into stories – collections of different drug-bug-commodity combinations – that together describe an important facet of resistance in the agri-food production system in Canada. Despite this variability, there are a number of functions and parameters common to all models. To ensure these parameters are consistently available, uniformly implemented, and easily updatable, they are included in a separate filed module, the iAM.AMR.HUB.

Hint: The Hub module is so named for its hub-and-spoke implementation; each story model (spoke) connects back to the central hub, like spokes on a wheel.

Note that because the Hub module is a filed module (i.e. it is stored in a separate module file), you must have a copy of both the Hub module AND the story model to run the story models.

3.21.1 Module Contents

The Hub module contains the following functions:

• Get Data (getData): a function used to select the data to import from the Hub module into a story model
• Counts to Prevalence (countPrevalence): a function to calculate a prevalence, represented by a Beta() distribution, from count data in a countTotal table
• Interleave Tables (Interleave): a function to merge two similarly indexed tables, overwriting the data in the target table with matched data where available

The Hub module supplies the following data:

• CIPARS-derived baseline probabilities of resistance (see: Baseline)
• CIPARS-derived retail microbial recovery rate (see: Recovery Rate)
• commodity consumption rates (see: Consumption Rate)
• population size by region (see: Population)

The Hub module also contains the OR Matrix Library.
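To make the interleaving behaviour concrete, here is a small Python sketch of one plausible reading of “overwriting the data in the target table with matched data where available”. It is not the Hub module's Interleave function; the function name, the use of None to mark unavailable data, and the example values are all assumptions.

```python
def interleave(target, source):
    """Overwrite cells of `target` with matching, available `source` cells.

    Tables are dictionaries keyed by index label. A source cell is used
    only if its label exists in the target and its value is not None.
    """
    merged = dict(target)
    for label, value in source.items():
        if label in merged and value is not None:
            merged[label] = value
    return merged


# Hypothetical tables indexed by region.
defaults = {"Region A": 0.10, "Region B": 0.10, "Region C": 0.10}
overrides = {"Region B": 0.25, "Region C": None, "Region D": 0.40}

print(interleave(defaults, overrides))
```

Here only Region B is overwritten: Region C has no available value in the second table, and Region D does not exist in the target.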

3.21.2 Hub Data

countTotal Data Format

Here, we define a countTotal table as a table of counts of positives, and of total observations of a trial, indexed by the CountTotal index at positions ‘Count’ and ‘Total’ respectively.

The purpose of a countTotal table is to store the number of positive observations and the number of total observations of a trial – across an arbitrary number of additional indices – in a standardized format, for manipulation using array operations. Examples include directly calculating prevalence (Count / Total), or supplying parameters to a distribution (specifically the binomial distribution family, which considers the number of successes in a series of independent experiments). See the Math and Stats Section for more details.

Both the CIPARS-derived baseline probabilities of resistance, and the CIPARS-derived retail microbial recovery rate are specified as countTotal tables.
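A countTotal table and the two uses mentioned above can be sketched in Python. The table contents and function names are hypothetical. The Beta(Count + 1, Total - Count + 1) parameterization is a common choice (it corresponds to a uniform prior) and is assumed here; the parameterization used in the Hub module is documented in the function's own Description attribute.

```python
import random

# Hypothetical countTotal table with one additional index (region).
COUNT_TOTAL = {
    ("Region A", "Count"): 12, ("Region A", "Total"): 100,
    ("Region B", "Count"): 3, ("Region B", "Total"): 40,
}


def prevalence(table, label):
    """Point estimate of prevalence: Count / Total."""
    return table[(label, "Count")] / table[(label, "Total")]


def beta_parameters(count, total):
    """Assumed parameterization: Beta(count + 1, total - count + 1)."""
    return count + 1, total - count + 1


def sample_prevalence(table, label, rng):
    """Draw one prevalence value from the assumed Beta distribution."""
    alpha, beta = beta_parameters(table[(label, "Count")],
                                  table[(label, "Total")])
    return rng.betavariate(alpha, beta)


rng = random.Random(1)  # seeded so repeated runs give the same draws
print(prevalence(COUNT_TOTAL, "Region A"))
print(sample_prevalence(COUNT_TOTAL, "Region A", rng))
```

Storing counts rather than a pre-computed prevalence keeps the sample size available, which is what allows the uncertainty to be represented as a distribution.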

Baseline

The baseline data are stored as counts (as countTotal tables), or as prevalences, in separate tables for each microbe.

Why do we separate baseline data into tables by microbe? Or by any index for that matter – why not use one big table? Analytica supports – and is indeed designed for – highly dimensional tables, and we’ve extolled the virtues of Analytica’s Intelligent Array system throughout this documentation.

We separate them because these tables don’t actually share all of the same indices; the microbial sub-type index changes, depending on the microbe. Recall from our discussion of Analytica’s Intelligent Array system that, when presented with two different indices, Analytica will automatically array-abstract, and fill in and compute values for each combination of index A and index B. If we were to include the Salmonella and Campylobacter sub-type indices in the same table – for example – Analytica would create cells for the intersection of each of these indices, both the parent (i.e. Salmonella or Campylobacter) and the child (i.e. the sub-type). The resultant table would have cells such as “Salmonella jejuni” or “Campylobacter Heidelberg”.

This seems simple enough to address (ignore nonsensical combinations), but each nonsensical slice of the table is replicated for each year, region, animal, and antimicrobial. So it’s not just “Salmonella jejuni”, but “Salmonella Enterica jejuni”, “Salmonella Heidelberg Jejuni”, and so on – this quickly becomes computationally prohibitive.
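The growth described above can be illustrated with a rough cell count. The Python sketch below compares separate per-microbe tables with a single combined table that carries both sub-type indices. All index values are invented for illustration, and the combined-table layout is an assumption about how such a table would be indexed.

```python
# Illustrative (made-up) index values; real indices are larger.
salmonella_subtypes = ["Enteritidis", "Heidelberg", "Typhimurium"]
campylobacter_subtypes = ["jejuni", "coli"]
years = list(range(2010, 2020))
regions = ["Region A", "Region B", "Region C"]
animals = ["chicken", "pork", "beef", "turkey"]
antimicrobials = ["drug 1", "drug 2", "drug 3", "drug 4", "drug 5"]

# Cells contributed by the indices that every table shares.
shared = len(years) * len(regions) * len(animals) * len(antimicrobials)

# Separate tables: each microbe only carries its own sub-type index.
separate_cells = (len(salmonella_subtypes) + len(campylobacter_subtypes)) * shared

# One combined table (assumed layout): a microbe index of size 2 crossed
# with BOTH sub-type indices, so nonsensical combinations get cells too.
combined_cells = 2 * len(salmonella_subtypes) * len(campylobacter_subtypes) * shared

print(separate_cells, combined_cells)
```

Even with these small indices the combined table is more than twice the size of the separate tables, and the gap widens with every microbe and sub-type added.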

Tip: Oftentimes, we may refer to arrays with multiple, non-congruent indices as “jagged”, because they aren’t of consistent size across each other index.

All of the available baseline data is taken from CIPARS surveillance data, available in the annual reports. There is currently no baseline data directly specified as a prevalence.


Recovery Rate

To Do.

Consumption Rates

To Do.

Population

To Do.

3.21.3 Hub Functions

Each function is described within its own Description attribute. Where necessary, further information is provided here.

Get Data

The Get Data function takes a table indexed by the baseline indices, and returns a subset of the data, filtered by the provided story-model indices.

Note: If a main-model index is omitted, the table will be returned indexed by the baseline index.

Count to Prevalence

The Count to Prevalence function converts a countTotal table into a table of prevalences, represented by a Beta distribution. For more details on the use of the Beta() distribution, see the Math and Stats section. By default, the function replaces missing count and total values with ‘1’ and ‘15’ respectively. Alternatively, it can return ‘Null’, or a Beta() distribution specified with alternative default count and total values.

Important: The defaults specified here are not the direct parameters used in the Beta() distribution. See the Math and Stats section for more details.
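As a rough illustration of the default-filling behaviour, the Python sketch below substitutes the default count and total when data are missing, then derives Beta parameters. The parameterization Beta(count + 1, total - count + 1) is an assumption (a common uniform-prior choice) and is not confirmed as the Hub module's approach; the Math and Stats section remains the reference. The function names are hypothetical, and the sketch replaces both values together when either is missing, which is also an assumption.

```python
import random

DEFAULT_COUNT = 1   # default from the text above
DEFAULT_TOTAL = 15  # default from the text above


def beta_parameters(count=None, total=None):
    """Return (alpha, beta) for a cell, filling in defaults if data are missing."""
    if count is None or total is None:
        count, total = DEFAULT_COUNT, DEFAULT_TOTAL
    # ASSUMPTION: uniform-prior parameterization, not necessarily the
    # one used by the Hub module.
    return count + 1, total - count + 1


def sample_prevalence(count=None, total=None, rng=random):
    """Draw one prevalence value from the assumed Beta distribution."""
    alpha, beta = beta_parameters(count, total)
    return rng.betavariate(alpha, beta)


print(beta_parameters())         # parameters for a missing cell
print(beta_parameters(12, 150))  # parameters for an observed cell
```

Note how the defaults (1 and 15) are inputs to the parameter calculation rather than the Beta parameters themselves, which is the distinction the Important note draws.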

Interleave

The Interleave function combines two similarly-indexed tables, overwriting data in <> with <> where matched and available.

Note: Interleave() respects array abstraction; where <> is specified with fewer indices than <>, <> is expanded (abstracted) to fit.

By default, Interleave() replaces all data for which there is a match (<> = True). To replace only existing data (i.e. without filling data gaps), set <> = False.
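The merge behaviour described above can be approximated outside Analytica. The Python sketch below is not the Hub module's implementation: the function name `interleave`, the parameter name `fill_gaps`, and the use of dictionaries with None for missing cells are all assumptions made for illustration.

```python
def interleave(target, source, fill_gaps=True):
    """Overwrite cells in target with matched, available cells from source.

    fill_gaps=True  : replace every matched cell, including empty ones.
    fill_gaps=False : replace only cells that already hold data in target.
    """
    merged = dict(target)
    for key, value in source.items():
        if value is None:
            continue  # nothing available in the source for this cell
        if key not in merged:
            continue  # no match; the result keeps the target's shape
        if merged[key] is None and not fill_gaps:
            continue  # leave the data gap untouched
        merged[key] = value
    return merged


target = {"a": 1, "b": None, "c": 3}
source = {"a": 10, "b": 20, "c": None, "d": 40}

print(interleave(target, source))
print(interleave(target, source, fill_gaps=False))
```

In both calls the unmatched key "d" is ignored and "c" keeps its original value because the source has nothing to offer; the two calls differ only in whether the empty cell "b" is filled.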


3.21.4 Deprecated Functions

To Do.

3.21.5 HUB vs. HUB.GM

To ensure end-users do not accidentally overwrite or change values in the Hub module (and subsequently propagate these changes to all story models), we maintain two different copies of the Hub module: the Gold Master [GM] (iAM.AMR.HUB.GM) and the production copy (iAM.AMR.HUB).

The Gold Master (a term borrowed from audio and software engineering) is – as the name suggests – the master copy of the Hub module. The GM is where all development (additions, deletions, changes) occurs. The production module is a protected and encrypted copy of the mutable (editable) GM module, connected to each of the story models. See more details in the iAM.AMR.HUB repo.

iAM.AMR.HUB
• iAM.AMR.HUB is the main, working copy of the Hub module.
• iAM.AMR.HUB is an encrypted, browse-only (non-editable) copy of iAM.AMR.HUB.GM.
• Download and use iAM.AMR.HUB in your models, or select this module if prompted to locate the Hub module by Analytica.

iAM.AMR.HUB.GM
• iAM.AMR.HUB.GM is the secondary, developer copy of the Hub module.
• iAM.AMR.HUB.GM is an unencrypted, editable copy of iAM.AMR.HUB.
• Download and edit iAM.AMR.HUB.GM to add new features to the module; do not select this module if prompted to locate the Hub module by Analytica.

iAM.AMR.HUB.EX
• An example model demonstrating the use of the Hub module.

Important: The iAM.AMR.HUB module is the module to which the story models are linked. Do not link your story model to the iAM.AMR.HUB.GM module.

What does this mean in practice? To make changes to our Hub module, we first make the changes to the GM. Then, we save a protected copy, overwriting the existing production model. Because the name and location of the Hub module do not change, the story models automatically recognize the new production copy, and any changes are propagated when the story models are opened. You can think of making changes to the Hub module like making changes to a manuscript. All changes are made in Microsoft Word, before creating a PDF to submit to the journal.

3.22 Software and services

The iAM.AMR project relies heavily on free and open-source software. The authors would like to thank the developers who made these projects possible, and recognize the companies which provide paid or closed-source software and services at a discount for academic users. For more information on free and/or open-source projects, see the Open Source Initiative and the Free Software Foundation.


3.22.1 List of software and services

Analytica
“Analytica is a visual software package developed by Lumina Decision Systems for creating, analyzing and communicating quantitative decision models.”

Git
“Git is a distributed version-control system for tracking changes in source code during software development.”

GitHub
“GitHub Inc. is a web-based hosting service for version control using Git.”

Kumu
“Kumu is a powerful data visualization platform that helps you organize complex information into interactive relationship maps.”

Let’s Encrypt
“Let’s Encrypt is a free, automated, and open certificate authority.”

Mendeley
“Mendeley is a desktop and web program produced by Elsevier for managing and sharing research papers, discovering research data and collaborating online.”

R
“R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.”

RStudio
“RStudio is a free and open-source integrated development environment for R, a programming language for statistical computing and graphics.”

Read the Docs
“Read the Docs is a software documentation hosting platform.”

Sphinx
“Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license.”

3.22.2 Management of shared software and services

Collaborative software and online services are usually managed by one or more persons.

GitHub

• The iAM.AMR models are housed in a private repository owned by Brennan Chapman (chapb). Availability of a private repository with more than three collaborators is made possible by GitHub’s academic offerings.
• The documentation is housed in a public repository owned by Brennan Chapman (chapb).

Kumu

• The AMR Org Chart is managed by Brennan Chapman (chapmanb). Availability of private projects is made possible by Kumu’s academic offerings.

Read the Docs

• The documentation is hosted by Read the Docs and is managed by Brennan Chapman (chapb).

Domain Name

• The grdi-amr.com domain is managed – and has been graciously donated by – Brennan Chapman.


3.22.3 Installation on GoC computers

The installation of software on Government of Canada (GoC) computers is managed remotely by Shared Services Canada (SSC) and the National Service Desk (NSD). The GoC’s Acceptable Use Policy clearly states that users should not install unapproved software on GoC machines. Proceed at your own risk.

Tip: The following instructions use C:/myprograms as an example installation directory. You can install these programs in any writable directory.

Mendeley

1. Download the latest version of Mendeley here.
2. Right-click on the installer and select ‘7-Zip > Extract to “Mendeley-Desktop-###-win32”’. This will create a new folder in your current directory.
3. Navigate to the root of your C:/ drive, and if it doesn’t already exist, create a new folder called ‘myprograms’.
4. Move the folder you created in step 2 into the myprograms folder.
5. Navigate to the folder (within C:/myprograms/), and locate the ‘MendeleyDesktop.exe’ executable. Right-click on MendeleyDesktop.exe and select ‘Send to > Desktop’ to create a shortcut.
6. Launch Mendeley from your newly created shortcut.

R and RStudio

Install R

1. Download the latest version of R from the University of Toronto here.
2. Navigate to the root of your C:/ drive, and if it doesn’t already exist, create a new folder called ‘myprograms’.
3. Run the installer and select your preferred language.
4. When prompted, click ‘Next’ to acknowledge the warning about administrator privileges, and ‘Next’ to accept the licensing agreement.
5. Now, select a destination location by using ‘Browse’. Navigate to and select the ‘myprograms’ folder, in the root of your C:/ directory. The installer will automatically append a folder name to this path, according to the R version number. Click ‘Next’.
6. Click ‘Next’ on all subsequent screens to accept the default installation options, and complete the installation.

Install RStudio

1. Download the latest zipped version of RStudio from the downloads page.

Tip: Ensure you download the Windows Vista/7/8/10 zip file, not the .exe installer. These are located under the Zip/Tarball heading.

2. Right-click on the zip file and select ‘Extract All’. This will create a new folder in your current directory.
3. Navigate to the root of your C:/ drive, and if it doesn’t already exist, create a new folder called ‘myprograms’.
4. Move the folder you created in step 2 into the myprograms folder.
5. Navigate to the folder (within C:/myprograms/), and locate the ‘rstudio.exe’ executable within the ‘bin’ folder. Right-click ‘rstudio.exe’, and select ‘Send to > Desktop’ to create a shortcut.
6. Launch RStudio from your newly created shortcut.

Select an R Installation (optional)

Where multiple versions of R are available, or where the installation has not successfully been added to the registry, it may be necessary to select the appropriate (usually the latest) version of R.

Fig. 33: The RStudio R installation selection window.

If you are prompted during RStudio’s installation, choose the most appropriate version of R from the ‘Choose a specific version of R’ dropdown. If there are none listed, use ‘Browse. . . ’ to navigate to the ‘bin’ sub-directory of your installation, and select ‘R.exe’. If you have multiple versions of R installed and you would like to choose a different version after RStudio has been installed, you can make the selection from Tools > Global Options.

3.23 Git et al.

3.23.1 Version Control

Version control is a system that tracks changes to files so users know how, when, and (hopefully) why those changes were made. Whether you know it or not, you likely use an informal version control system in your everyday work – albeit a bad one. Take, for example, manuscript revision – how many times have you seen a naming scheme like this?


• manuscript.docx
• manuscript_bc.docx
• manuscript_bc_cp.docx
• manuscript_bs.docx
• manuscript_bs_cc.docx
• manuscript_af_01_01_2019.docx

Tip: A manuscript hosted in Office 365 can be edited simultaneously by an entire team!

3.23.2 Introduction to Git and GitHub

While common, these informal version control systems lend themselves to errors and omissions. For more complex projects, spread across files and users, a more formal solution is required. Enter Git.

Git

Git is a distributed version control system for tracking changes in source code during software development. In layman’s terms, Git is a system where multiple people can work on plain-text files (see here to find out what plain-text files are), merge and compare changes, and track these changes over time. Git, of course, does a lot more than that, and has incredibly powerful features that make collaborative (and importantly, simultaneous) development much easier than our rename_rename method above. But it’s not an easy system to learn, since most of the work is done on the command line.

Caution: It is important to note that Git is not designed for complex files, like Word documents or Excel spreadsheets. While it may not seem like it, these files have extraordinarily complex structures (a Word document is basically a web-page in disguise!), and the benefits of Git – being able to see small, exact changes – are lost in the noise.

GitHub

Luckily, there’s GitHub Desktop, which wraps Git in a nice, easy-to-use graphical user interface (GUI). Wait, you say! What is GitHub? GitHub is simply a centralized, online repository of your files – it takes Git online, connecting all of your local files and revisions with all of mine (and all of our collaborators’). GitHub Desktop has many features of Git, and the ability to link to GitHub without worrying about things like managing SSH keys. Because we have this GUI available, subsequent sections will gloss over the complexities of Git, and focus solely on practical applications achievable using GitHub and GitHub Desktop.

Getting Started

There are a huge number of great resources available to learn about version control systems, Git, and GitHub. Some of the best include:

• An Introduction to Version Control
• The First Chapters of the Pro Git book
• Hello World! A Simple Getting Started Activity


3.23.3 Installing GitHub Desktop

GitHub Desktop links our local installation of Git with GitHub (we’ll explore these in more depth in subsequent sections). Download GitHub Desktop here, which will include its own, self-contained version of Git. Run the installer and sign in with your GitHub account credentials. When prompted to configure Git, update your name (if incorrect). You can leave the email address as generated, to reduce notifications in your inbox.

Provided you have been added as a collaborator, the iAM.AMR repository should be listed under ‘Your repositories’ on the ‘Let’s get started!’ screen.


To download the models, select the repository, and choose ‘Clone’. Now, select a local path – the directory where you’d like to keep the models – and again, choose ‘Clone’.


The models will then be downloaded into the directory of your choosing. You can close GitHub Desktop, or explore some of the features in subsequent chapters.

3.24 Communications Style Guide

Appending a suffix to ‘model’ requires the insertion of a second ‘l’:

• e.g. ‘modeller’; ‘modelling’

The following should always be hyphenated (or not hyphenated) as follows:

• Model-builder
• End-user
• Farm to fork pathway
• Agri-food chain

3.24.1 Common Definitions

Antimicrobial resistance: An in vitro measure to interpret a bacterium’s ability to resist compounds intended to inhibit bacterial growth and/or kill the bacterium.

Baseline probability (po): The prevalence of antimicrobial resistance at the earliest site of measurement in the defined agri-food chain (e.g., prevalence of antimicrobial resistance in broiler chicks at placement in the barn).


Estimated probability (pE): The estimated probability of antimicrobial resistance leaving a node in the iAM.AMR after adjustment of the baseline probability by the 1) odds ratios, and 2) frequencies of occurrence of factors potentially associated with antimicrobial resistance. Depending on the location of the node in the model, the estimated probability may become the baseline probability of resistance for the next node (e.g., baseline for abattoir node (pA) or baseline for retail node (pR)) or the final probability in the model (FP).

Factor: A measured observation, such as antimicrobial use, different types of management systems, disinfectant use at slaughter plants, and packaging at retail.

iAM.AMR: Integrated Assessment Model for Antimicrobial Resistance.

Integrated assessment model: A modelling approach that synthesizes data from many sources (e.g., different areas of study, varying methods, diverse disciplines, scales of measurement, sources of uncertainty) that intends to address some of the complexity within systems to support decision-making and/or policy interventions.

Local sensitivity analysis: An approach to describe changes in a model outcome(s) at a single node (e.g., estimated probability) with a change (including infinitesimal changes) in one or more of the other input model parameters (e.g., baseline probability, odds ratio).

Node: An element within the iAM.AMR that represents a particular site (e.g., farm) within the system of interest (e.g., broiler chicken production industry).

Scenario: Antimicrobial resistance to an antimicrobial/antimicrobial class in a specific bacterial genus/species in a defined host population (e.g., extended-spectrum cephalosporin-resistant Salmonella from broiler chicken).

Site: A generic location (e.g., farm) in the system of interest (e.g., broiler chicken production industry) that is represented by a node in the iAM.AMR.
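To illustrate the arithmetic behind the Estimated probability definition above, the Python sketch below applies a single odds ratio to a baseline probability using the standard probability-to-odds conversion. It deliberately ignores the frequency of occurrence of the factor, which the iAM.AMR models also account for, so it is a simplified illustration rather than the model's calculation. The function name and the input values are hypothetical.

```python
def adjust_probability(baseline, odds_ratio):
    """Adjust a baseline probability by one odds ratio.

    Converts the probability to odds, multiplies by the odds ratio,
    and converts back to a probability.
    """
    if not 0 < baseline < 1:
        raise ValueError("baseline must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline / (1 - baseline)
    adjusted_odds = baseline_odds * odds_ratio
    return adjusted_odds / (1 + adjusted_odds)


# Illustrative values only: a baseline of 0.20 and an odds ratio of 2.
estimated = adjust_probability(0.20, 2.0)
print(round(estimated, 4))
```

An odds ratio of 1 leaves the baseline unchanged, an odds ratio above 1 raises the estimated probability, and the result always stays between 0 and 1.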

Style Guide

The following should always be capitalized:

• Intelligent Array

The proper names of functions and objects should always be capitalized when referring to a generic function or object:

• e.g. ‘Choice function’; ‘Table()’; ‘Uniform function’

The names of specific objects should always be capitalized and italicized. The type of object is not capitalized or italicized:

• e.g. ‘the Interface index’; ‘the Frequency node’

Key Words

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC 2119:

MUST
This word, or the terms “REQUIRED” or “SHALL”, mean that the definition is an absolute requirement of the specification.

MUST NOT
This phrase, or the phrase “SHALL NOT”, mean that the definition is an absolute prohibition of the specification.

SHOULD
This word, or the adjective “RECOMMENDED”, mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

SHOULD NOT
This phrase, or the phrase “NOT RECOMMENDED” mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

MAY
This word, or the adjective “OPTIONAL”, mean that an item is truly optional. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides).

Common Acronyms

AAFC Agriculture and Agri-Food Canada
AMR Antimicrobial Resistance
AMU Antimicrobial Use
CFIA Canadian Food Inspection Agency
CIPARS Canadian Integrated Program for Antimicrobial Resistance Surveillance
CSS Cascading Style Sheets
GRDI Genomics Research and Development Initiative
HC Health Canada
NSD National Service Desk
PHAC Public Health Agency of Canada
SSC Shared Services Canada
3GC Third-generation Cephalosporins

Study Groups

Study groups should be named as follows:

Referent: The group which represents the default practice in Canadian industry, or the least interventionist.

Comparator: The group which represents the less common, or more interventionist group.

Note: If a study has more than two groups, all groups except the Referent should be titled Comparator 1, Comparator 2, etc. For example, a study examining the effect of AMU on AMR may have multiple comparator groups, each representing a slightly different dosage or treatment regime.

3.25 Documentation

3.25.1 About this Documentation

The iAM.AMR project’s documentation is written in reStructuredText (reST), generated using Sphinx, and hosted by Read the Docs.

reStructuredText

reST, or reStructuredText, is a lightweight markup language originally created to document the programming language Python.

A markup language, such as reST, is a system for creating and formatting complex documents from simple plain-text files.

Tip: Wondering what a plain-text file is? Check out the description here.

In a nutshell, markup languages define document properties (such as titles, chapters, paragraph breaks, bolds and italics) within the text itself, ensuring that no matter how the document is conveyed to the reader (e.g. as a website, as a PDF, or as an e-book), the content will always be structured in the same way.

Note: If you’ve ever used LaTeX or designed a website in HTML, you’ve already used a markup language!

This is in contrast to a ‘What-You-See-Is-What-You-Get’ or WYSIWYG system, such as Microsoft Word or Google Docs, where the content and structure of the document are specified separately and are interpreted in a way unique to the program itself. The drawbacks of such a system are evident if you’re a frequent user of Word and Google Docs (or Pages on a Mac) – if you open a document from one of these programs using another, you’ll often see inexplicable formatting errors caused by differing interpretations of the structure of the document.

reST is relatively simple to learn. reST uses lines of symbols to designate headings, brackets to designate links, and tags to designate more complex document structures. To give you an idea of how simple reST is, the plain-text version of the above section is included below, and reference material for reST is provided in a subsequent section:

iAM.AMR Documentation
=====================

About this Documentation
------------------------
The iAM.AMR project's documentation is written in `reStructuredText <…sourceforge.net/rst.html>`_ (reST), generated using `Sphinx <…org/en/master/index.html>`_, and hosted by `Read the Docs `_.

reStructuredText
~~~~~~~~~~~~~~~~
reST, or `reStructuredText `_, is a lightweight `markup language `_ originally created to document the programming language `Python <…org/>`_.

A markup language, such as reST, is a system for creating and formatting complex documents from simple plain-text files. In a nutshell, markup languages define document properties (such as titles, chapters, paragraph breaks, **bolds** and *italics*) within the text itself, ensuring that no matter how the document is conveyed to the reader (e.g. as a website, as a PDF, or as an e-book), the content will always be structured in the same way.

.. note:: If you’ve ever used `LaTeX `_ or designed a website in `HTML `_, you’ve already used a markup language!

This is in contrast to a ‘What-You-See-Is-What-You-Get’ or WYSIWYG system, such as Microsoft Word or Google Docs, where the content and structure of the document are specified separately and are interpreted in a way unique to the program itself. The drawbacks of such a system are evident if you’re a frequent user of Word and Google Docs (or Pages on a Mac) -- if you open a document from one of these programs using another, you’ll often see inexplicable formatting errors caused by differing interpretations of the structure of the document.

reST is relatively simple to learn. reST uses lines of symbols to designate headings, brackets to designate links, and tags to designate more complex document structures. To give you an idea of how simple reST is, the plain-text version of the above section is included below, and reference material for reST is provided in a subsequent section.

Sphinx and Read the Docs

Sphinx (likewise created to document the programming language Python) is the tool used to convert plain-text reST into the desired output format (e.g. website, PDF, or e-book).

Read the Docs is a free hosting service supported by unobtrusive and ethical ads, which we use to host our documentation online. Read the Docs uses Sphinx to generate the web-pages, and provide PDF versions of the documentation for offline readers.

Version Control and GitHub

The master-copies of the documentation (in reST format), are stored in a GitHub repository, under version control. When Read the Docs detects a change in these files, it automatically rebuilds and updates the website accordingly.

Note: Changes to the documentation require 2 – 10 minutes to propagate from GitHub to Read the Docs.

Tip: If the documentation site does not reflect a recent change, ensure you ‘hard-refresh’ your browser using CTRL+F5 on Chrome or Firefox on Windows, or by following the instructions here.

3.25.2 Editing this Documentation

Note that the use of GitHub is outside of the scope of this section. Please refer to the GitHub-specific documentation.

How to Add a Page

This is how to add a page!

Minor Edits

To make a minor edit or addition, follow the link in the upper-right corner of the target page to GitHub, where you can fork the document, make changes, and submit a pull request.

Major Edits

To make a major edit or addition, we recommend you clone the repository to your workstation and use an IDE such as Visual Studio Code – coupled with a local version of Python and Sphinx – to preview the changes before submitting a pull request.


Installing VSC, Python, and Sphinx

Detailed instructions for installing VSC, Python, and Sphinx are outside the scope of this documentation. In brief:

1. Download and install Visual Studio Code
2. Download and install the latest version of Python
3. Download and install the latest version of Sphinx and the Read the Docs Sphinx Theme
   • using PIP (via Python): pip install -U sphinx sphinx-autobuild sphinx_rtd_theme
4. Enable the Python and reStructuredText extensions in VSC

Tip: If the preview window in VSC displays content without the theme (i.e. colours, formatting), ensure the explorer panel is open to the root directory (where build/ and source/ are) so VSC can locate conf.py that specifies the theme.

3.25.3 Documentation FAQs

How do I view online images at full-size?

To view images on the website at full-size, right-click on the image and select open in new tab or open in new window.

3.25.4 reStructuredText

Guides

• reST Full Specification
• reST Quick Reference
• Sphinx’s reST Primer
• reST Cheatsheet

Document Layout

General

There shall be two blank lines at the start of each document. There shall be three blank lines at the end of each document.

Font

Italic text is specified by surrounding text with one asterisk. Bold text is specified by surrounding text with two asterisks:

*this text is italic*

**this text is bold**


Headings

There should be one blank line between sections of the same level (e.g. H1 – H1) and between a section and a sub-section (e.g. H1 – H2). There should be two blank lines between a sub-section and a greater section (e.g. H2 – H1). There should be no blank line between a heading and the section’s contents, where contents exist:

Section
=======
contents

Sub-section
-----------
contents


Next Section
============

Sub-section
-----------
contents

The following symbols should be used for headings:

H1 ===
H2 ---
H3 ~~~
H4 +++
H5 ^^^

Only H1 and H2 level headings should use Title Case. Sub-headings should use Sentence case.
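The heading conventions above can be captured in a small helper, which is handy when generating reST pages from a script. This Python sketch is not part of the project's tooling; the names `HEADING_SYMBOLS` and `rest_heading` are hypothetical. It relies on the reST rule that an underline must be at least as long as its title.

```python
# Underline symbol for each heading level, following the table above.
HEADING_SYMBOLS = {1: "=", 2: "-", 3: "~", 4: "+", 5: "^"}


def rest_heading(title, level):
    """Return a reST heading: the title followed by a matching underline."""
    if level not in HEADING_SYMBOLS:
        raise ValueError("level must be between 1 and 5")
    underline = HEADING_SYMBOLS[level] * len(title)
    return f"{title}\n{underline}"


print(rest_heading("Section", 1))
print()
print(rest_heading("Sub-section", 2))
```

Generating the underline from the title length avoids the build warnings that appear when an underline is shorter than its heading.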

Heading Labels

To link to a duplicated heading (i.e. two sections in the same document have the same heading), you will need to specify a heading label. Heading labels should be used where the heading is a common word or phrase, or where the heading is known to be repeated later in the document. Heading labels are placed above the heading, with a blank line separating the heading label and heading. Where a heading label is used, two blank lines should come before it, regardless of the heading level.

.. _this_is_a_heading_label:

This is the Heading
-------------------

If there is a duplicated heading, you will receive a build warning regardless of your specified label (as autosectionlabel creates its own labels automatically). The duplicated label will be ambiguous (testing seems to show it will default to the last entry), and therefore not suitable for linking. We use a slightly different format for links to a manually labeled section (we drop the path); see the links section below for more details.

Links


Internal links

:ref:`text `

Internal links to manual labels

:ref:`text `

External links

`text `_

Internal links to downloads

:download:`text `

Admonitions

Admonitions are specially marked topics or notes which appear inline with other content. They can be styled with custom CSS.

Standard

Example:

.. attention:: This is an attention admonition.

Attention: This is an attention admonition.

Caution: This is a caution admonition.

Danger: This is a danger admonition.

Error: This is an error admonition.

Hint: This is a hint admonition.


Important: This is an important admonition.

Note: This is a note admonition.

Tip: This is a tip admonition.

Warning: This is a warning admonition.

Custom

Example:

.. admonition:: This is a Custom Admonition

   And this is its content.

This is a Custom Admonition And this is its content.

References

The quick brown fox jumped over the lazy [#chapman]_ dog.

.. [#chapman] Chapman, B. et al. (2019) The laziness of the common dog. Journal. Issue. DOI.

The quick brown fox jumped over the lazy [1] dog.

Images

.. image:: images/image_name.png
   :height: 100px
   :width: 200 px
   :scale: 50%
   :alt: alternate text
   :align: right

The same fields are applicable for figures.

[1] Chapman, B. et al. (2019) The laziness of the common dog. Journal. Issue. DOI.


Figures

.. figure:: /images/figure_name.png
   :align: center

   This is the descriptive text for the figure.

Text Substitutions

To set up a text substitution, add a block to your conf.py:

rst_prolog = """
.. |placeholder| replace:: Definition
.. |other| replace:: other definition
"""

Both rst_prolog and rst_epilog should enable substitution. The following solution has also been proposed, but is untested:

address = '192.168.1.1'
port = 'port 3333'
rst_prolog = """
.. |address| replace:: {0}
.. |port| replace:: {1}
"""
rst_prolog = rst_prolog.format(address, port)

Then, simply add |placeholder| to your document to access the substitution.
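Where many substitutions are needed, the prolog can be built from a dictionary instead of being written by hand. The Python sketch below is one possible approach for conf.py, not an established project convention. `rst_prolog` is the Sphinx configuration value; the helper `build_rst_prolog` and the dictionary contents are hypothetical.

```python
def build_rst_prolog(substitutions):
    """Build reST substitution definitions from a {name: text} mapping."""
    lines = [
        f".. |{name}| replace:: {text}"
        for name, text in substitutions.items()
    ]
    # A trailing newline keeps the prolog separate from the document body.
    return "\n".join(lines) + "\n"


# Illustrative values only.
rst_prolog = build_rst_prolog({
    "address": "192.168.1.1",
    "port": "port 3333",
})

print(rst_prolog)
```

Printing the result before building the documentation is a quick way to confirm that each definition sits on its own line.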

3.26 Data Harmonization

To ensure CEDAR is interoperable with other databases, existing ontologies were used – where possible – to establish controlled vocabularies.

3.26.1 External Data Sources in CEDAR

Antimicrobials

Use

A list of antimicrobials which may be administered, or to which resistance may be developed.

Description

The list of antimicrobials was generated from the 2020 edition of the WHO Collaborating Centre for Drug Statistics Methodology’s ATCvet system.


According to the WHO CCDSM: ATCvet is a system for the classification of substances intended for therapeutic use in veterinary medicine, and can serve as a tool for the classification of medicinal products. The ATCvet system is based on the same overall principles as the ATC system for substances used in human medicine. In most cases, an ATC code exists which can be used to classify a product in the ATCvet system. The ATCvet code is then created by placing the letter Q in front of the ATC code. In both the ATC and the ATCvet systems, preparations are divided into groups, according to their therapeutic use. First, they are divided into anatomical groups (1st level), on the basis of their main therapeutic use. Within most of the 1st level groups, preparations are subdivided into different therapeutic main groups (2nd level). Two levels of chemical/therapeutic/pharmacological subgroups (3rd and 4th levels) provide further subdivisions. At a 5th level, chemical substances are classified.
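The level structure described above can be sketched in a few lines of Python. This is only an illustration, not part of CEDAR: the function names `to_atcvet` and `atcvet_levels` are hypothetical, and the sketch assumes the usual seven-character ATC layout (letter, two digits, letter, letter, two digits) with the letter Q placed in front.

```python
import re

# Assumed layout of a full (5th-level) ATCvet code: "Q" + ATC code, where the
# ATC code is one letter, two digits, one letter, one letter, two digits.
ATCVET_PATTERN = re.compile(r"^Q[A-Z]\d{2}[A-Z][A-Z]\d{2}$")


def to_atcvet(atc_code: str) -> str:
    """Create an ATCvet code by placing the letter Q in front of an ATC code."""
    return "Q" + atc_code.strip().upper()


def atcvet_levels(atcvet_code: str) -> dict:
    """Return the code prefix that identifies each of the five levels."""
    code = atcvet_code.strip().upper()
    if not ATCVET_PATTERN.match(code):
        raise ValueError(f"not a full ATCvet code: {atcvet_code!r}")
    return {
        "anatomical_group": code[:2],        # 1st level, e.g. QJ
        "therapeutic_main_group": code[:4],  # 2nd level, e.g. QJ01
        "subgroup_3rd_level": code[:5],      # 3rd level
        "subgroup_4th_level": code[:6],      # 4th level
        "chemical_substance": code,          # 5th level
    }


levels = atcvet_levels(to_atcvet("J01DD90"))
print(levels["anatomical_group"], levels["therapeutic_main_group"])
```

Splitting a code into its level prefixes makes it straightforward to group or filter substances by anatomical group or therapeutic main group.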

Data source and notes

The entire catalogue of ATCvet codes was scraped from the WHO CCDSM’s website. All entries in anatomical groups QJ (Antiinfectives for systemic use) and QP (Antiparasitic products, insecticides and repellents) were included in CEDAR.

Where the same chemical substance (drug) appeared multiple times in different anatomical groups (level 1), the order of preference was: QJ > QP > Others.

Where the same chemical substance (drug) appeared multiple times in different therapeutic main groups (level 2), the classification with the broadest scope was preferred. For example, ceftiofur appears in both QJ01 (ANTIBACTERIALS FOR SYSTEMIC USE) and QJ51 (ANTIBACTERIALS FOR INTRAMAMMARY USE). The QJ01 code is preferred, as it has the broadest application (systemic vs. intramammary).

Other antimicrobials that are not included in anatomical groups QJ or QP were added on an ad hoc basis, as needed.

Additionally, several genes associated with resistance were added alongside the antimicrobials, under the classification closest to the resistance they confer. This allows users to select either antimicrobials or genes as the measured resistance outcome.
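The level 1 order of preference (QJ > QP > Others) could be applied along the following lines. This is a sketch under stated assumptions, not the script used to build CEDAR: the function names are hypothetical, and the candidate codes in the usage example are made-up placeholders rather than real ATCvet entries. The level 2 rule (broadest scope) is a judgment call and is deliberately not encoded here.

```python
def level1_rank(atcvet_code: str) -> int:
    """Rank a code by anatomical group: QJ first, then QP, then all others."""
    group = atcvet_code.strip().upper()[:2]
    return {"QJ": 0, "QP": 1}.get(group, 2)


def preferred_code(candidate_codes):
    """Pick one classification for a substance listed under several groups."""
    candidates = list(candidate_codes)
    if not candidates:
        raise ValueError("at least one candidate code is required")
    # min() returns the first of several equally ranked candidates, so ties
    # within a group are left in the order supplied.
    return min(candidates, key=level1_rank)


# Placeholder codes for a single hypothetical substance (not real entries).
print(preferred_code(["QA00XX00", "QP00XX00", "QJ00XX00"]))
```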

Locations

Use

A list of countries, and sub-country regions where the study was conducted or from which the study population was sourced.

Description

The list of locations was obtained from the ISO 3166 standard, Codes for the representation of names of countries and their subdivisions. ISO 3166 is a standard published by the International Organization for Standardization (ISO) that defines codes for the names of countries, dependent territories, special areas of geographical interest, and their principal subdivisions (e.g., provinces or states).

ISO 3166-1 – Codes for the representation of names of countries and their subdivisions – Part 1: Country codes defines codes for the names of countries, dependent territories, and special areas of geographical interest. It defines three sets of country codes:
• ISO 3166-1 alpha-2 – two-letter country codes which are also used to create the ISO 3166-2 country subdivision codes and the Internet country code top-level domains.
• ISO 3166-1 alpha-3 – three-letter country codes which may allow a better visual association between the codes and the country names than the 3166-1 alpha-2 codes.
• ISO 3166-1 numeric – three-digit country codes which are identical to those developed and maintained by the United Nations Statistics Division, with the advantage of script (writing system) independence, and hence useful for people or systems using non-Latin scripts.

ISO 3166-2 – Codes for the representation of names of countries and their subdivisions – Part 2: Country subdivision code defines codes for the names of the principal subdivisions (e.g., provinces, states, departments, regions) of all countries coded in ISO 3166-1.

ISO 3166-3 – Codes for the representation of names of countries and their subdivisions – Part 3: Code for formerly used names of countries defines codes for country names which have been deleted from ISO 3166-1 since its first publication in 1974.

The ISO 3166-1 standard currently comprises 249 countries, 193 of which are sovereign states that are members of the United Nations. Many dependent territories in the ISO 3166-1 standard are also listed as a subdivision of their parent country in the ISO 3166-2 standard.
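To make the relationship between the code sets concrete, the sketch below stores the three ISO 3166-1 codes for one country and checks that an ISO 3166-2 subdivision code starts with the parent country's alpha-2 code. The `Country` class and `subdivision_country` function are hypothetical names used for illustration only; they do not describe how CEDAR stores locations.

```python
import re
from dataclasses import dataclass

ALPHA2 = re.compile(r"^[A-Z]{2}$")
ALPHA3 = re.compile(r"^[A-Z]{3}$")
NUMERIC = re.compile(r"^\d{3}$")
# ISO 3166-2: alpha-2 country code, a hyphen, then one to three
# alphanumeric characters identifying the subdivision.
SUBDIVISION = re.compile(r"^([A-Z]{2})-([A-Z0-9]{1,3})$")


@dataclass(frozen=True)
class Country:
    name: str
    alpha2: str
    alpha3: str
    numeric: str  # kept as text so that leading zeros are preserved

    def __post_init__(self):
        if not (ALPHA2.match(self.alpha2)
                and ALPHA3.match(self.alpha3)
                and NUMERIC.match(self.numeric)):
            raise ValueError(f"malformed ISO 3166-1 codes for {self.name!r}")


def subdivision_country(subdivision_code: str) -> str:
    """Return the alpha-2 country code embedded in an ISO 3166-2 code."""
    match = SUBDIVISION.match(subdivision_code)
    if match is None:
        raise ValueError(f"malformed ISO 3166-2 code: {subdivision_code!r}")
    return match.group(1)


canada = Country("Canada", "CA", "CAN", "124")
print(subdivision_country("CA-ON") == canada.alpha2)
```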

Data source and notes

The UNECE provides a list of the ISO 3166-1 codes.

3.27 Math and Stats

3.27.1 Beta

Prevalence, in the context of epidemiology, is the proportion of a population expressing a characteristic of interest. A deterministic estimate of prevalence can be calculated as k/n, where k is the number of ‘positive’ units and n is the total number of units assayed. Alternatively, we can express our estimate of prevalence probabilistically, using a Beta distribution. In this context, we think of each assay as a Bernoulli trial (and of the full set of n assays as a binomial experiment).
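As a rough illustration of the two approaches, the sketch below compares the deterministic estimate k/n with a Beta description of the same data. It assumes a uniform Beta(1, 1) prior, which gives Beta(k + 1, n - k + 1); the text above does not specify a parameterization, so treat that choice, the function names, and the example counts as assumptions.

```python
import random


def beta_parameters(k, n, prior_alpha=1.0, prior_beta=1.0):
    """Beta parameters after k positives in n assays (uniform prior assumed)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return prior_alpha + k, prior_beta + (n - k)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def sample_prevalence(k, n, draws=10_000, seed=0):
    """Draw plausible prevalence values from the Beta distribution."""
    rng = random.Random(seed)
    alpha, beta = beta_parameters(k, n)
    return [rng.betavariate(alpha, beta) for _ in range(draws)]


k, n = 12, 50  # made-up counts: 12 positive units out of 50 assayed
alpha, beta = beta_parameters(k, n)
samples = sorted(sample_prevalence(k, n))

print("deterministic estimate:", k / n)
print("Beta mean:", beta_mean(alpha, beta))
print("central 95% of draws:", samples[249], "to", samples[9749])
```

Unlike the single value k/n, the draws carry the uncertainty that comes from the number of units assayed: the interval narrows as n grows.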

3.27.2 Meta-analysis

Meta-analysis is a technique for aggregating the findings of related studies. A meta-analysis is essentially a weighted average of the study effects, where:
• Generally, we use an inverse weighting scheme, where studies are weighted by the inverse of their variance
• Higher variance (typically from a smaller sample size) means less certainty, and therefore less weight in the average
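The inverse-variance weighting described above can be sketched as follows. The function name and the two-study example values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def inverse_variance_pool(effects, variances):
    """Pool study effects using weights equal to 1 / variance.

    Returns the pooled effect, its variance, and each study's relative weight.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_variance = 1.0 / total
    relative_weights = [w / total for w in weights]
    return pooled, pooled_variance, relative_weights


# Made-up example: a precise study (variance 0.01) and a noisier one (0.04).
pooled, pooled_variance, relative = inverse_variance_pool([0.2, 0.5], [0.01, 0.04])
print(pooled, pooled_variance, relative)
```

In this example the study with the smaller variance receives four times the weight of the noisier one, so the pooled effect sits much closer to 0.2 than to 0.5.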

Types of Models

A fixed-effects model assumes each study demonstrates a common intervention effect:
• There is a true, constant, but unknown effect - variation arises from sampling error
• Applicable when all possible studies/objects are enumerated
• Goal of internal validity

A random effects model assumes there is a common mean intervention effect:
• There is a mean, unknown effect - variation arises from sampling error and differences between studies
• Useful when we can’t enumerate all studies/objects; or we expect more to arise in future
• Goal of external validity

“the relative weights assigned under random effects will be more balanced than those assigned under fixed effects. As we move from fixed effect to random effects, extreme studies will lose influence if they are large, and will gain influence if they are small.”
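The effect on study weights can be seen in a small numerical sketch. The text above does not name an estimator for the between-study variance, so this example assumes the DerSimonian-Laird method, one common choice; the function names and the three made-up studies are likewise illustrative only.

```python
def dersimonian_laird_tau2(effects, variances):
    """Between-study variance (tau^2) by the DerSimonian-Laird estimator."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    degrees_of_freedom = len(effects) - 1
    if c <= 0:
        return 0.0  # a single study carries no between-study information
    return max(0.0, (q - degrees_of_freedom) / c)


def relative_weights(variances, tau2=0.0):
    """Relative weights; tau2 = 0 gives the fixed-effect weights."""
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up studies, ordered from most precise (largest) to least precise.
effects = [0.10, 0.30, 0.80]
variances = [0.01, 0.04, 0.09]

tau2 = dersimonian_laird_tau2(effects, variances)
print("tau^2:", tau2)
print("fixed-effect weights: ", relative_weights(variances))
print("random-effects weights:", relative_weights(variances, tau2))
```

Adding the same tau^2 to every study's variance shrinks the gap between weights, so the largest study loses relative weight and the smallest gains it, which is the balancing behaviour described in the quotation.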

3.28 Indices and tables

• genindex • modindex • search
