PhUSE 2018

Paper DS04

The implementation of SDTM standards for non-submission purposes

Paul Vervuren, Nutricia Research, Utrecht, the Netherlands Lieke Gijsbers, OCS Life Sciences, 's-Hertogenbosch, the Netherlands

ABSTRACT Recognizing the value of its data, Nutricia Research performed an SDTM conversion project for twenty-one infant nutrition studies. Since 2015, Nutricia Research and OCS Life Sciences have been working together on this project, building expertise in the SDTM conversion of early life nutrition data, and jointly defining some specific SDTM domains. In this paper we will present our approach and collaboration, highlight several examples of tools and solutions used in the implementation, discuss some of the challenges faced, and reflect on the use of SDTM as a standard for non-submission purposes.

INTRODUCTION While clinical studies serve to prove the safety and efficacy of medical products, their data, because of its high intrinsic quality, is of value beyond the lifetime of an individual study or the collection of studies. Applying standards and implementing them in the right way is key to establishing and preserving that value. Anyone who has worked with the SDTM standard (see the third section of this paper for an introduction to SDTM) will probably acknowledge that the fundamental issue in its application is not so much being conformant (i.e. being exact in applying structure and terminology), but making the right implementation decisions. Right means, first of all, ensuring that the data, when mapped to SDTM, keep their original meaning. It also means making the best use of the flexibility of the standard, in order to organize the data in a logical way, facilitating review and analysis, and in agreement with the specific needs of the research area. The latter need explains the rise of SDTM's Therapeutic Area (TA) standards, presenting concepts and examples for a range of therapeutic areas, e.g. Diabetes, Multiple Sclerosis, Virology and Alzheimer's Disease. This emphasis on decisions does not mean that ensuring conformance to the standard is less relevant or that it is without issues or challenges; clearly, resolving and documenting SDTM non-conformities can be quite a task.

Nutricia Research has been conducting an SDTM conversion project involving twenty-one infant nutrition studies. Compared to a submission trajectory this project was different because:
− Substantial tailoring was needed in overall design, data collection modules and CRF systems. Arguably this was significantly greater than you would find in a typical submission.
− Studies and domains were added along the way. As the integrated dataset grew, design solutions and definitions needed to be updated.
− Documentation and conformance requirements were not as explicit and clear up front as in a submission.

This project has been a collaborative effort between Nutricia Research (as sponsor) and OCS Life Sciences (as service provider). Next to the right technical expertise and approach, a fitting collaboration model is key to project success. In this paper we will highlight some typical SDTM domains and the mapping issues encountered, and we will present our key learnings. We hope and expect that broader lessons can be drawn from our work, helping data warehouse projects (e.g. SDTM implementations across compounds or companies) as they are likely to face similar issues. We will start this paper with an overview of the project and will reflect on differences with an SDTM implementation for a submission. Next, we will discuss the evolution of the project, addressing in particular adaptations in the approach and collaboration. Then we'll go into the quality control and validation approach. Subsequently, we will dive into several typical domains and present solutions for the SDTM mapping challenges we faced.

SCOPE, REQUIREMENTS AND APPROACH OF OUR PROJECT This SDTM conversion project involved twenty-one Nutricia Research infant studies. The overall goal of the project was to standardize and pool the data of these studies to build a data warehouse enabling cross-study analyses for exploratory, scientific research purposes in the area of infant nutrition. Selection of the studies was based on business relevance, design characteristics (e.g. randomized controlled, presence of a breastfed reference arm), and size of the study (number of subjects). Infant formula was the treatment intervention in all studies but there was variety across studies in the formula(s) used.


Our way of dealing with standards differed from regulatory (e.g. FDA) submissions in the sense that there was no strict requirement to adhere to submission guidance. Nonetheless, it was thought that conformance to industry standards would bring two main benefits:
1. tools and software that work with the standards will generally only work if (submission) standards are strictly followed; being merely inspired by a standard or loosely following a standard is generally insufficient and may lead to failure to load data or to many conformance errors when using applications built for standard data;
2. exchange of the data at a possible later stage, e.g. for submission or other purposes like internal or external sharing, would be greatly facilitated.
The latter advocates the application of documentation standards like define.xml in addition to the data content standards. There are many aspects to consider when preparing a submission to a regulatory agency like FDA, for instance: dataset structure and content, folder structure of the eCTD (electronic Common Technical Document), documentation, and handling of data validation issues (e.g. see Tinazzi & Marchand, 2017). It is outside the scope of this paper to make a full and detailed comparison between the standards applied in our project and those of a regulatory submission such as an IND or NDA. But let's briefly look at how typical submission standards were applied in our project.

Application of submission components in a regulatory submission (FDA)1,2 versus the Nutricia project:

SDTM datasets
− Regulatory submission: Required for all submitted clinical data; must be SDTM conformant, and all validation issues must be fixed or explained to ensure acceptance by the agency.
− Nutricia project: Desired; SDTM conformant to facilitate use of tools and associated standards.

ADaM datasets
− Regulatory submission: "Should be used to create and to support the results in clinical study reports, Integrated Summaries of Safety (ISS), and Integrated Summaries of Efficacy (ISE), as well as other analyses required for a thorough regulatory review."
− Nutricia project: Not part of the conversion project, but the intended standard for analyses based on the integrated data.

Trial Design tables
− Regulatory submission: "All TD datasets should be included, as appropriate for the specific clinical trial, in SDTM submissions as a way to describe the planned conduct of a clinical trial. Specifically, the Trial Summary (TS) dataset will be used to determine the time of study start."
− Nutricia project: Nice-to-have; not included.

Define.xml
− Regulatory submission: "The data definition file describes the metadata of the submitted electronic datasets and is considered arguably the most important part of the electronic dataset submission for regulatory review."
− Nutricia project: Desired.

Annotated CRF
− Regulatory submission: Required; providing a link between the data collection fields and the SDTM variables and terms.
− Nutricia project: Desired.

Reviewer guides (RG)
− Regulatory submission: Expected with SDTM and ADaM datasets.
− Nutricia project: Nice-to-have; in particular to document data conformance issues and their handling.

Study Data Standardization Plan (SDSP)
− Regulatory submission: Expected; mechanism to assist FDA in identifying potential data standardization issues early in the development program.
− Nutricia project: Not used and not considered relevant at this point.

Legacy Data Conversion Plan and Report
− Regulatory submission: Expected as part of the RG when legacy data was converted to SDTM and ADaM, in order to provide an explanation of the process, record traceability, issues and adjudications.
− Nutricia project: Available in mapping sheets, design documentation and meeting minutes.

SAS transport files
− Regulatory submission: Required.
− Nutricia project: Not required (and not used).

With the exception of transport files and the Study Data Standardization Plan, the various submission standards were either used or considered useful in the context of our project. How standards like Reviewer’s Guides for SDTM and ADaM, and other documentation (e.g. Legacy Data Conversion Plan and Report) can be used in the context of the Nutricia data warehouse will require further thought.

INTRODUCTION TO SDTM AND TO SDTM CONVERSION For readers who are not familiar with SDTM, this section gives an introduction. SDTM, the Study Data Tabulation Model, is one of the CDISC standards. SDTM presents dataset structures for clinical study data. It is intended to organize, 'tabulate', raw clinical data in a standard manner. For derived variables (like 'Change from baseline') supporting statistical analysis, there is a separate model called ADaM (Analysis Data Model), which is closely linked to SDTM and inherits various raw data variables from SDTM for the purpose of clarity and traceability. SDTM development started in 1999 as an FDA submission data standard. Its first production version was released in 2004, with SDTM model v1.0 and SDTMIG v3.1 (the Implementation Guide). See Wood, 2008.

1 See Study Standards resources on the FDA website: www.fda.gov/ForIndustry/DataStandards/StudyDataStandards/default.htm, and, specifically, 'Providing Regulatory Submissions In Electronic Format — Standardized Study Data, Guidance for Industry' www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM292334., and the related 'Study Data Technical Conformance Guide' www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM384744.pdf.
2 See PhUSE CSS work on regulatory deliverables, specifically on reviewer guides, at https://www.phuse.eu/css-deliverables

SDTM has a number of predefined datasets (referred to as domains) with two-character names, for example DM (Demographics), MH (Medical History), AE (Adverse Events), LB (Laboratory Test Results), VS (Vital Signs), CM (Concomitant Medications), EX (Exposure), and several others. These domains contain standard variables that follow naming conventions and datatypes (character, numeric). Here is an example of a DM dataset:

DM dataset example for two subjects with a selection of variables (the full DM domain has twenty-eight standard variables):

STUDYID  DOMAIN  USUBJID     SUBJID  RFSTDTC     RFENDTC     SITEID  AGE  AGEU    SEX  RACE
XYZ01    DM      XYZ0101001  001     2017-01-12  2017-07-21  01      2    MONTHS  F    ASIAN
XYZ01    DM      XYZ0101002  002     2017-09-02  2018-03-21  01      1    MONTHS  M    WHITE

RFSTDTC: Reference Start Date; RFENDTC: Reference End Date.

For some variables, controlled terminology (a codelist) applies. For instance, to conform to SDTM, the individual values of variable SEX need to match the following codelist: F (Female), M (Male), U (Unknown), or UN (Undifferentiated).

The domains are subdivided into four classes called Findings (e.g. VS and LB), Events (e.g. AE and MH), Interventions (e.g. CM and EX) and Special Purpose (e.g. DM). Domains within the Findings, Events and Interventions classes have common logic and naming conventions. As explained below, the standard domains have a fixed set of variables, which can only be extended using common variables for all classes (like special timing variables) and with some class-specific variables. Any raw data that logically belongs to a domain but that cannot be assigned to the default or extended set of class-specific variables can only be added in a separate dataset, called a Supplemental Qualifier dataset. Supplemental Qualifier datasets are linked to the domain to which the additional variables ('qualifiers') belong, for example SUPPDM with DM, SUPPAE with AE, etc. This mechanism can be used for additional variables like population flags, e.g. ITT and PP variables, which are not foreseen in the SDTM model, or for original values of mapped data. A common situation where this is applied is with variable RACE, whose codelist is limited to the following terms:
− BLACK OR AFRICAN AMERICAN
− AMERICAN INDIAN OR ALASKA NATIVE
− ASIAN
− NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER
− WHITE
− OTHER
Depending on the study, there can be additional options for race on the CRF (e.g. Chinese, Japanese), or there can be a free text field to specify the race when 'Other' is selected. In the first situation, one needs to perform a mapping to ensure that the values for RACE conform to SDTM controlled terminology. For example, Chinese will be mapped to RACE='ASIAN'. The original value, 'CHINESE', can be stored in the SUPPDM dataset.
The next example of a SUPPDM dataset shows how this is implemented. In addition to the original race value for subject 001, the dataset holds ITT and PP flags for each of the two subjects.

SUPPDM dataset example for two subjects:

STUDYID  RDOMAIN  USUBJID     QNAM    QLABEL           QVAL     QORIG    QEVAL
XYZ01    DM       XYZ0101001  ITT     Intent to Treat  Y        DERIVED  SPONSOR
XYZ01    DM       XYZ0101001  PP      Per Protocol     Y        DERIVED  SPONSOR
XYZ01    DM       XYZ0101001  RACEOR  Race (original)  CHINESE  CRF
XYZ01    DM       XYZ0101002  ITT     Intent to Treat  Y        DERIVED  SPONSOR
XYZ01    DM       XYZ0101002  PP      Per Protocol     N        DERIVED  SPONSOR
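The race mapping and the creation of such supplemental qualifier records can be sketched as follows. This is an illustrative Python/pandas sketch (the project itself used SAS); the raw variable name RACE_RAW and the mapping table are hypothetical.

```python
import pandas as pd

# Hypothetical raw demographics with one non-standard race value.
raw = pd.DataFrame({
    "STUDYID": ["XYZ01", "XYZ01"],
    "USUBJID": ["XYZ0101001", "XYZ0101002"],
    "RACE_RAW": ["CHINESE", "WHITE"],
})

# Sponsor-defined mapping of collected values to SDTM controlled terminology.
RACE_MAP = {"CHINESE": "ASIAN", "JAPANESE": "ASIAN", "WHITE": "WHITE"}

dm = raw.copy()
dm["RACE"] = dm["RACE_RAW"].map(RACE_MAP)

# Where the mapping changed the value, keep the original in SUPPDM as RACEOR.
changed = dm[dm["RACE"] != dm["RACE_RAW"]]
suppdm = pd.DataFrame({
    "STUDYID": changed["STUDYID"],
    "RDOMAIN": "DM",
    "USUBJID": changed["USUBJID"],
    "QNAM": "RACEOR",
    "QLABEL": "Race (original)",
    "QVAL": changed["RACE_RAW"],
    "QORIG": "CRF",
})
```

Subject 001 ends up with RACE='ASIAN' in DM and one SUPPDM record holding the original value 'CHINESE'; subject 002 needs no supplemental record.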

It is obvious that engineering will be needed the moment SUPPDM is used for reporting and analysis. That is, the dataset will need to be transposed in such a way that ITT, PP and RACEOR become variables/columns, and a merge with the DM dataset needs to take place in order to process ITT, PP and RACEOR in combination with DM variables.

CDISC provides extensive guidance and examples in the SDTM Implementation Guide (SDTMIG). For each domain, specifications are provided on the variables included (see the example below for DM from the SDTMIG v3.2). Each SDTM domain has a set of variables that are classified as either Required, Expected or Permissible. Both Required (Req) and Expected (Exp) variables must be in a dataset in order to qualify the dataset as SDTM, while Permissible (Perm) variables can be omitted from the dataset if they are not relevant to the data. In contrast to Required variables, Expected variables are allowed to have missing values.

The standard domains cover all areas of data typically collected in a clinical study, but there can and usually will be some sets of raw data that do not fit a standard domain (e.g. Gastrointestinal Tolerance). There can also be practical reasons to consider the creation of a custom domain, e.g. to split up a (large) raw lab dataset into different custom lab datasets to facilitate handling and analysis. SDTM accommodates this by allowing new, custom domains to be created based on the general classes Findings, Events and Interventions.
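The back-transformation mentioned above, transposing SUPPDM so that QNAM values become columns and merging the result onto DM, can be sketched in pandas (illustrative only; the project itself used SAS):

```python
import pandas as pd

dm = pd.DataFrame({
    "STUDYID": ["XYZ01", "XYZ01"],
    "USUBJID": ["XYZ0101001", "XYZ0101002"],
    "SEX": ["F", "M"],
    "RACE": ["ASIAN", "WHITE"],
})

suppdm = pd.DataFrame({
    "USUBJID": ["XYZ0101001", "XYZ0101001", "XYZ0101001",
                "XYZ0101002", "XYZ0101002"],
    "QNAM": ["ITT", "PP", "RACEOR", "ITT", "PP"],
    "QVAL": ["Y", "Y", "CHINESE", "Y", "N"],
})

# Turn each QNAM into a column, one row per subject...
wide = suppdm.pivot(index="USUBJID", columns="QNAM", values="QVAL").reset_index()

# ...and merge back onto DM so ITT, PP and RACEOR can be analysed
# alongside the DM variables.
dm_full = dm.merge(wide, on="USUBJID", how="left")
```

Note that QVAL is always character, so a character-to-numeric conversion may still be needed for numeric qualifiers after the transpose.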

For each standard domain, requirements and considerations are elaborated in SDTMIG in a separate ‘Assumptions’ section, e.g. for DM (only the first three points given):

5.1.1.1 ASSUMPTIONS FOR DEMOGRAPHICS DOMAIN MODEL
1. Investigator and site identification: Companies use different methods to distinguish sites and investigators. CDISC assumes that SITEID will always be present, with INVID and INVNAM used as necessary. This should be done consistently and the meaning of the variable made clear in the define.xml.
2. Every subject in a study must have a subject identifier (SUBJID). In some cases a subject may participate in more than one study. To identify a subject uniquely across all studies for all applications or submissions involving the product, a unique identifier (USUBJID) must be included in all datasets. Subjects occasionally change sites during the course of a clinical trial. The sponsor must decide how to populate variables such as USUBJID, SUBJID and SITEID based on their operational and analysis needs, but only one DM record should be submitted for the subject. The Supplemental Qualifiers dataset may be used if appropriate to provide additional information.
3. Concerns for subject privacy suggest caution regarding the collection of variables like BRTHDTC. This variable is included in the Demographics model in the event that a sponsor intends to submit it; however, sponsors should follow regulatory guidelines and guidance as appropriate.

These Assumptions illustrate that decisions need to be made when implementing the standard, and that raw data variables may have to be reorganized and sometimes transformed to fit the standard. The complexity of the standard is also illustrated by the mapping example of race where we saw that a CRF entry of ‘CHINESE’ cannot be simply stored in variable RACE of DM, but that it needs to be mapped while the original value is put in a separate dataset.

Such transformations may seem strange and unexpected at first, but they are the logical consequence of the SDTM standard's rules and constraints. The implication of a deviation, e.g. storing 'CHINESE' in DM.RACE, would not be a loss of data integrity, but a non-conformance. Non-conformance may lead to error messages when loading data into SDTM review tools, to compatibility issues when study data is pooled, and to non-acceptance by regulatory agencies when submitted. So conformance is key.

Another example where transformations may be needed is when a set of related raw data variables belongs to different observation classes, e.g. Events and Findings. The SDTMIG describes a mock scenario of migraine data recorded in a diary, where the date and time of migraine occurrence (an event) are represented in the Clinical Events (CE) domain, and the associated scoring of symptoms is mapped to the Findings About (FA) domain. In scenarios like this, the raw data records are often combined into one dataset because they originate from a single CRF or diary. Splitting these data up into separate domains requires a mechanism to link records that belong together. This is achieved by adding variables with a unique key to both datasets, and using a special standard dataset called RELREC to describe this key and the type of relationship.

It can be concluded that SDTM provides the flexibility and the guidance to hold a wide variety of clinical data while allowing customization and tailoring to suit sponsor requirements and analysis needs. Depending on the structure and qualities of the raw data, a smaller or greater effort will be needed to ensure SDTM conformance. Data that was collected with SDTM in mind (which is the case when using CDISC's data collection standard, CDASH) requires only a limited translation. However, when standardizing legacy studies that each have different dataset structures, naming conventions and terminologies, substantial transformations as well as mapping decisions are likely to be needed on a study-by-study basis (and decisions may need to be re-evaluated as more studies are added). This is referred to as legacy data conversion. On top of that, integrating different studies (data warehousing) will call for additional standardization, e.g. the harmonization of (lab) test categories, treatment arms, visit schedules, units, etc.
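The RELREC mechanism for the migraine scenario can be sketched as follows. This is a simplified illustration; the link variable values (MIG01), RELID value and the exact records are hypothetical, not taken from the SDTMIG.

```python
import pandas as pd

# Hypothetical diary data split into an Events record (CE) and Findings About
# records (FA), linked through a shared group identifier in --LNKID.
ce = pd.DataFrame({
    "STUDYID": ["XYZ01"], "DOMAIN": ["CE"], "USUBJID": ["XYZ0101001"],
    "CETERM": ["MIGRAINE"], "CELNKID": ["MIG01"],
})
fa = pd.DataFrame({
    "STUDYID": ["XYZ01", "XYZ01"], "DOMAIN": ["FA", "FA"],
    "USUBJID": ["XYZ0101001", "XYZ0101001"],
    "FATEST": ["Severity", "Nausea"], "FALNKID": ["MIG01", "MIG01"],
})

# RELREC declares, once per dataset, which key joins the related records.
relrec = pd.DataFrame({
    "STUDYID": ["XYZ01", "XYZ01"],
    "RDOMAIN": ["CE", "FA"],
    "USUBJID": ["", ""],           # blank: relationship holds for all subjects
    "IDVAR": ["CELNKID", "FALNKID"],
    "IDVARVAL": ["", ""],
    "RELTYPE": ["ONE", "MANY"],    # one CE record relates to many FA records
    "RELID": ["MIGSYMP", "MIGSYMP"],
})

# A consumer of the data can then rejoin the related records via the keys:
linked = ce.merge(fa, left_on=["USUBJID", "CELNKID"],
                  right_on=["USUBJID", "FALNKID"], suffixes=("_CE", "_FA"))
```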


As to the usage of the SDTM datasets: while SDTM was designed to support data review and analysis, some datasets require back-transformation before usage, as is the case with SUPP-- datasets, or with related records subdivided between a Findings About and an Events (or Interventions) dataset. Next to structural transformation, character-to-numeric conversion may be needed (as SUPP-- datasets only contain character variables), depending on the variable. All in all, the implementation of SDTM and SDTM legacy conversion can be a complex process that requires a deep understanding of the standard's principles, flexibility and guidance, as well as recognition of a sponsor's own needs, incorporating those needs while respecting the SDTM conformance requirements.

Note that the technical storage format, e.g. Excel, XML, ASCII text file, or SAS dataset, is not defined in or dictated by SDTM. The submission standard is the SAS transport file (.xpt file), a binary file type that is not human-readable and requires appropriate software.

EVOLUTION OF PROJECT APPROACH AND COLLABORATION The arrangement between Nutricia Research and OCS Life Sciences for this project was an agreement where OCS would conduct the SDTM legacy conversion and generate the following deliverables:

− Designs of the SDTM datasets
− Mapping sheets
− Annotated CRFs
− SAS programs performing the conversion
− SDTM datasets
− Validation documentation
− Data exception report

The project encompassed twenty-one studies. In the first phase of the project the following modules were selected: demographics and subject characteristics, vital signs, adverse events, gastrointestinal tolerance, and concomitant medication. The main assumption underlying this arrangement was that the OCS project team would be able to perform the design and mapping independently and that questions and issues could be mostly handled via email. However, at an early stage we discovered that there were implementation choices, data issues and exceptions whose resolution was a matter of preference. In most cases, OCS could define the options but final decisions needed to be taken by Nutricia. Having had several 'implementation deep-dives' triggered by mapping questions, we decided to organize these meetings between the OCS project team and Nutricia domain experts from Data Management and Statistical Programming on a regular basis. Thus, we changed the approach from a delivery model to a delivery-and-collaboration model. An additional advantage of this adjusted approach, and the fact that meetings happened at Nutricia offices, was that specialists for specific areas (e.g. for microbiology data) could be quickly and easily involved in the process. In support of this collaboration we established a secure shared drive for the exchange of (draft) deliverables. This greatly helped the combined teams to efficiently exchange information and work in progress. This exchange may seem a trivial matter, but it was not so much the technical provision itself as the open exchange of draft work and proposals that made the difference. In a second phase, the project was extended to include new domains. The idea in principle was to convert all data to SDTM.
At the same time there was a need to be cost-efficient and make decisions based on the potential value of individual variables. Because of the great variability in general design, CRF design, dataset structure and naming conventions across studies, it was difficult to make a quick assessment of the number of possible target domains and data that studies had in common at a more granular level (variables, lab tests, etc.). To enable such an assessment, a raw data inventory was made and the results were put in a spreadsheet. The spreadsheet held an overview of all studies, source datasets/variables and their preliminary mapping. This was established based on the review of the protocol (in particular for non-CRF data), as well as the CRFs and was verified against the datasets (which also acted as a check on the completeness of the transfer of raw data from Nutricia to OCS). This resulted in a full picture of the available raw data, which allowed us to make informed decisions on the scope and priorities, and it made estimations of work breakdown and required effort more transparent. Later on, this also became a useful search tool for users of the data.


Raw data inventory
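The core of such an inventory, one row per study/dataset/variable with a column for the preliminary SDTM mapping, can be sketched as follows. The study names, dataset names and variables here are invented for illustration; in practice the input would be read from the transferred study files.

```python
import pandas as pd

# Hypothetical raw datasets from two legacy studies, each with its own
# naming conventions (the real inventory covered twenty-one studies).
studies = {
    "STUDY01": {"demog": ["SUBJID", "SEX", "RACE"], "vitals": ["SUBJID", "WEIGHT"]},
    "STUDY02": {"dm": ["PATNO", "GENDER"], "vs": ["PATNO", "WT", "LENGTH"]},
}

rows = []
for study, datasets in studies.items():
    for dataset, variables in datasets.items():
        for var in variables:
            rows.append({"STUDY": study, "DATASET": dataset, "VARIABLE": var,
                         # preliminary target domain, filled in during review
                         "TARGET_DOMAIN": ""})

inventory = pd.DataFrame(rows)
# inventory.to_excel("raw_data_inventory.xlsx", index=False)  # export for review
```

Once the TARGET_DOMAIN column is filled in, the same sheet doubles as a search tool and as a cross-check that all transferred raw data is accounted for.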

QUALITY CONTROL Quality control procedures were put in place to verify the following requirements on the SDTM datasets:
− Traceability: the SDTM datasets and their component parts (variables, value level metadata) can be traced back to the raw source data and mapping decisions
− Reproducibility: the SDTM datasets can be reproduced on Nutricia systems using the conversion programs developed by OCS
− Conformance: the SDTM datasets are SDTM conformant, i.e. they meet the specific design specifications as well as the general SDTM model specifications
− Completeness: the SDTM datasets are complete, i.e. all raw data items that were designed to be mapped are indeed present in the SDTM datasets
− Integrity: data integrity is preserved, i.e. data points are not inadvertently affected (e.g. by truncation) and no records are lost
To verify these requirements a risk-based approach was taken based on the probability (P), impact (I) and non-detectability (ND) of issues, each with a five-level scale (see appendix for definitions). Non-detectability is defined as the likelihood of the issue not being detected before harm occurs. The risk score is defined as P x I x ND, and can range from 1 to 125, as shown in the next table.

Risk score ranges and testing requirements based on Probability, Impact and Non-detectability of issues and errors:

Risk Score Range   Risk Score Category   Testing requirement
>65 – 125          Non-tolerable         Mitigation actions required to bring the risk below 65
>40 – 65           High                  Extensive testing
>20 – 40           Medium                Single testing
1 – 20             Low                   No testing
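The scoring rule above is simple enough to express directly; a minimal sketch (function name and return shape are my own, not from the project):

```python
def risk_category(p: int, i: int, nd: int) -> tuple:
    """Risk score as defined in the paper: P x I x ND, each on a 1-5 scale,
    mapped onto the category thresholds of the table above."""
    score = p * i * nd
    if score > 65:
        category = "Non-tolerable"
    elif score > 40:
        category = "High"
    elif score > 20:
        category = "Medium"
    else:
        category = "Low"
    return score, category
```

For example, the Conformance requirement (P=4, I=3, ND=4) scores 48 and lands in the High category, triggering extensive testing.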

The risk scoring per requirement looked as follows:

Risk scores per requirement of project deliverables:
− Traceability (P=2, I=2, ND=4, score 16): Process is traceable by design (mapping sheet, program log, etc.); no direct impact on analysis results. Any issues will only pop up later (thus quite undetectable). No testing is performed.
− Reproducibility (P=4, I=3, ND=2, score 24): When running programs on a different system issues can easily occur; impact intermediate (also considering that analyses will not be on a critical path), but issues are likely to emerge immediately when running the program. Basic testing performed is to run conversion programs on Nutricia systems, and programs are checked to ensure they meet demands of 'Good Programming Practice'.
− Conformance (P=4, I=3, ND=4, score 48): There are many conformance aspects, which increase the likelihood of issues; impact not as high as incompleteness or integrity issues (also because not intended for submission). Generally not directly detectable. Extensive testing required, via automated (metadata) checks and independent review of design versus datasets.
− Completeness (P=2, I=4, ND=5, score 40): Checks are built into the process to verify that all datasets and variables are mapped. Impact of incompleteness in the SDTMs could be substantial, and may go unnoticed. Checking required of input versus output, in particular on record loss (e.g. when raw data is split across multiple domains).
− Integrity (P=3, I=4, ND=5, score 60): Integrity issues include truncation of text strings and formatting/rounding problems, but also mistakes in derivations. Impact can be high and can go unnoticed in certain cases. Extensive checking required.


Based on this risk assessment, extensive validation testing was performed on the SDTM datasets to ensure SDTM conformance, as well as the completeness and integrity of the data. Basic testing was considered adequate for reproducibility and no testing was done for traceability, in line with the low risk level. Following this overall assessment, a validation procedure was designed to verify conformance, completeness and integrity. This procedure was based on Nutricia's instructions and templates for QC of Statistical Programming activities, but tailored to the specific needs and setting of this project. Since there were specifics and complexities to take into account per study (like differences in data structure and CRF design), QC was performed and documented by study. The overall flow is as follows:
1. Developer sets up a QC plan per study in a spreadsheet. This plan details the specific areas (variables, derivations) that require verification. The plan has one record per program (usually one or two domain datasets are produced per program). In case of complex mappings that involved a special subject matter expert, this expert is planned to perform independent review of the data and the mapping specifications.
2. The QC plan is reviewed by the project lead. Updates are made when needed, and, after approval, execution can start.
3. A QC tracking sheet based on the QC plan is kept to monitor and record progress.
4. Independent tester checks the specific areas in the QC plan as well as the default checks on reproducibility, conformance and integrity (as listed in the QC report template). When applicable the subject matter expert performs the 'expert check'.
5. Any rework is flagged, performed and verified, and documented by the tester in the QC report.
6. After completion, the project lead checks compliance with the QC plan and signs the QC Summary Form, which documents the compliance check.

Checks on conformance, completeness and integrity are integrated in this procedure in the following manner:
− Conformance: automated conformance checks using the Pinnacle21 software are performed by the program developer; exception reports are checked during validation by the independent tester. When applicable an expert review is performed.
− Completeness: the independent tester verifies that all required or expected variables are present, and that the number of records output is as expected (i.e. matching the input 1:1, or logically increased or reduced due to SDTM-based transformations). (Automated processes and checks in the development phase ensure that all variables are in the mapping sheet.)
− Integrity: the independent tester performs a close visual inspection on a selection of records in the SDTM datasets versus their raw data, checking the data variable by variable, looking for possible truncation and other discrepancies. It may be assumed that some potential integrity issues will be triggered by conformance checks (e.g. checks on missing values of required variables, conformance to controlled terminology).
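A few of these completeness and integrity checks lend themselves to automation. The sketch below is illustrative only (it is not the project's actual QC code, which used Pinnacle21 plus SAS-based checks); the function name, parameters and raw variable names are assumptions.

```python
import pandas as pd

def check_conversion(raw, sdtm, required_vars, text_map):
    """Illustrative completeness/integrity checks on one converted dataset:
    required variables present, record count unchanged (assuming a 1:1
    mapping), and no shortening of mapped text variables (crude truncation
    check)."""
    issues = []
    # Completeness: required/expected variables present in the SDTM dataset.
    for var in required_vars:
        if var not in sdtm.columns:
            issues.append(f"missing variable {var}")
    # Completeness: record count input versus output.
    if len(sdtm) != len(raw):
        issues.append(f"record count changed: {len(raw)} -> {len(sdtm)}")
    # Integrity: mapped text values should not be shorter than their source.
    for raw_var, sdtm_var in text_map.items():
        if raw_var in raw.columns and sdtm_var in sdtm.columns:
            if (sdtm[sdtm_var].astype(str).str.len().max()
                    < raw[raw_var].astype(str).str.len().max()):
                issues.append(f"possible truncation in {sdtm_var}")
    return issues

# Usage on a toy example where one variable is missing and one value truncated:
raw = pd.DataFrame({"AETERM_RAW": ["HEADACHE", "GASTROINTESTINAL DISCOMFORT"]})
ae = pd.DataFrame({"AETERM": ["HEADACHE", "GASTROINTESTINAL DIS"]})
issues = check_conversion(raw, ae, ["STUDYID", "AETERM"], {"AETERM_RAW": "AETERM"})
```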

CASE STUDY: LABORATORY FAECAL PARAMETERS Now let's look at a case from this data conversion project. In nutrition studies, laboratory findings such as faecal parameters (stool physiology and microbiology) are common outcome measures. A total of 66 common faecal parameters were present in the data of the studies in scope of this project. Mapping these data presented some challenges and required frequent expert meetings at Nutricia in order to discuss and resolve all issues, and to come to a desired SDTM domain implementation. When looking for a suitable SDTM domain for these data, we initially looked at the standard Findings domains LB (Laboratory Test Results) and MB (Microbiology Test Results), but neither was suitable for these parameters. The LB domain is commonly used for (but is not restricted to) safety lab tests. Variables expected in the LB dataset such as LBORNRLO (Reference Range Lower Limit in Original Unit), LBORNRHI (Reference Range Upper Limit in Original Unit), LBNRIND (Normal Range Indicator), and related range indicators designed to flag out-of-range values, are typical of standard blood biochemistry, haematology and urinalysis parameters. Yet, they are not applicable to the faecal parameters in nutrition studies, as most parameters are used for exploratory data analyses, and values can be highly variable depending on factors such as dietary intake and natural gut flora. This is what an LB dataset record (selection of variables) may look like:

LBTESTCD  LBTEST   LBCAT      LBORRES  LBORRESU  LBORNRLO  LBORNRHI  LBNRIND
ALB       Albumin  CHEMISTRY  30       g/L       35        50        LOW

The MB domain normally describes microbiology specimen findings, including ‘gram stain results’ and ‘organisms found’, for example using MBTEST = “Organism Present”, and variable MBORRES holding the name of the organism:


MBTESTCD  MBTEST              MBORRES                MBMETHOD
GMNROD    Gram Negative Rods  2+ FEW                 GRAM STAIN
ORGANISM  Organism Present    KLEBSIELLA PNEUMONIAE  MICROBIAL CULTURE, SOLID

We concluded the MB domain was not suitable for us, as our data had targeted microorganisms or markers as the topic of measurement, with a quantity as the outcome/result.

While LB could have been used based on the general structure and specifications, we decided to design a custom Findings domain as it would serve the nature of these data better. This way we also avoided many missing values for expected reference range variables. We named the domain "XL - Non-Safety Laboratory Results", storing stool physiology data as well as microbiology data (and extensible if needed to other non-safety lab results). The following core set of variables was included:
• XLTESTCD: Lab Test or Examination Short Name
• XLTEST: Lab Test or Examination Name
• XLORRES: Result or Finding in Original Units
• XLORRESU: Original Unit
• XLCAT: Category for Lab Test
• XLSCAT: Subcategory for Lab Test
• XLMETHOD: Method of Test or Examination

In order to represent all relevant collected laboratory data, several supplemental qualifiers were defined (ending up in SUPPXL):
• XLPROBE: Probe Name
• XLORLQLO: Lower Limit of Quantification in Orig Unit
• XLORLQHI: Upper Limit of Quantification in Orig Unit
• XLORLDLO: Lower Limit of Detection in Orig Unit

As the name suggests, XLPROBE stores the probe name, when applicable to the analysis method. The Lower/Upper Limits of Quantification/Detection store the limits that are specific to the vendor, batch, method and test.

The next step was to create an inventory of all source parameters in order to decide which ones represented the same test and could be mapped to one test code. In several cases it was necessary to retrieve laboratory plans or other documentation to obtain a full specification of a test. This research ultimately resulted in the list of 66 unique tests.

Subsequently, we had to define standardized test codes (--TESTCD) and names (--TEST) for the faecal parameters. Of all 66 parameters, only 4 had matching terms in the NCI Controlled Terminology release of 2018-06-29, so a great many new test codes and names had to be developed by the team. This was typically a task involving expert input and review. Many of the CDISC codelists are extensible, and additional sponsor-defined terms are often defined, in particular for test codes. CDISC provides rules for creating new terminology such as test names (--TEST), for example:
• Test names should not exceed 40 characters.
• Test names should not contain the specimen type.
• Test names should not contain units of measure.
• Test names should not contain methods of measurement.
• Test names should not contain collection timing information.
• For all differential test names, the numerator and denominator are spelled out as fully as possible (given the 40-character limitation) and separated by a forward slash. Do not use the words 'ratio' or 'percentage', as that is made clear in the definition.
For creating test codes (--TESTCD) the following rules apply:
• Test codes should not exceed 8 characters.
• Make use of existing 'Naming Fragments' (e.g. using suffix 'AC' for a parameter containing the word 'acid').
• For all differential test codes, the absolute count is a short-defined term, and the ratio/percentage contains the same short mnemonic for the numerator followed by a second short mnemonic for the denominator. There is no forward slash in the test code.
• For microbiology tests for targeted microorganisms, the first letter of the test code should be the first letter of the genus name.
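The length and slash rules above lend themselves to a simple automated check. The sketch below is our own helper, not official CDISC code; the content rules (no units, methods or specimen types in names) still need human review.

```python
def check_test_term(testcd, test):
    """Return a list of violations of the CDISC naming rules for a
    proposed --TESTCD/--TEST pair (length and forward-slash rules only)."""
    issues = []
    if len(testcd) > 8:
        issues.append("--TESTCD '%s' exceeds 8 characters" % testcd)
    if len(test) > 40:
        issues.append("--TEST '%s' exceeds 40 characters" % test)
    if "/" in testcd:
        issues.append("--TESTCD must not contain a forward slash")
    return issues

# A conforming pair passes:
print(check_test_term("ACAC", "Acetic Acid"))  # []
# The long microbiology terms discussed below do not:
print(check_test_term(
    "CHIS150CLIT135",
    "Clostridium Histolyticum Group and Clostridium Lituseburense Group"))
```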

For several parameters, the guidelines could be applied, for example when using suffix ‘AC’ for acids, and when dealing with relative measurements:


Parameter                                                 --TESTCD  --TEST
Acetic Acid                                               ACAC      Acetic Acid
Total Short Chain Fatty Acids                             SCFA      Short Chain Fatty Acids
%Acetic Acid (relative to Total Short Chain Fatty Acids)  ACACSCFA  Acetic Acid / Short Chain Fatty Acids

However, it was not always possible or sensible to follow the guidelines completely for each parameter. First of all, the extensive microorganism names could not be captured within 8 characters for --TESTCD using the first letter of the genus name without producing duplicates. Neither could the description of the microorganism be captured in fewer than 40 characters (--TEST) without abbreviations. For example, the parameter 'Clostridium Histolyticum Group and Clostridium Lituseburense Group' already contains well over 40 characters, yet this is the minimum description needed for the parameter to be specific. Following the guidelines, the test name for the relative measurement of this parameter should also include a slash ("/") and the base against which the relative measurement was calculated (e.g. 'Clostridium Histolyticum Group and Clostridium Lituseburense Group / Total Bacteria'), resulting in an even longer description.

Together with subject matter experts, multiple approaches for establishing terms in a systematic way were discussed. For the microbiology parameters analysed using probes, the probe names were used to define terms (e.g. CHIS150CLIT135), since they were very specific in defining these parameters, even though methods should normally not be included in the test code/name. For other parameters, where probe names were not applicable due to other methods being used, a systematic coding structure was introduced based on the genus and species names (but not limited to using the first letter) and taking into account whether the parameter targets a group, cluster, species, superspecies etc. (e.g. CAJEJUSP for Campylobacter Jejuni Sp. or BIBIFI for Bifidobacterium Bifidum). Knowing that these data will only be used for internal purposes and not for submission, it was agreed to accept violation of the rule for the maximum length of test codes and names.
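As a sketch of such a systematic coding structure, the two examples above can be reproduced by combining fixed-length fragments of the genus and species names plus an optional level qualifier. The fragment lengths here are our own inference from the examples CAJEJUSP and BIBIFI, not a documented Nutricia rule:

```python
def make_testcd(genus, species, suffix=""):
    """Build a candidate test code from two letters of the genus, four of
    the species, plus an optional qualifier (e.g. 'SP' for species level).
    The fragment lengths are an assumption inferred from the examples."""
    return (genus[:2] + species[:4] + suffix).upper()

print(make_testcd("Campylobacter", "Jejuni", "SP"))  # CAJEJUSP
print(make_testcd("Bifidobacterium", "Bifidum"))     # BIBIFI
```

Codes generated this way would still be subject to expert review for duplicates and for the group/cluster/species distinction mentioned above.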
The 8- and 40-character limitations are the result of the constraints of SAS transport files, in which variable names cannot exceed 8 characters and labels should not be longer than 40 characters. With the adoption of more modern transfer formats (e.g. XML), this limitation is likely to be relaxed in the future.

Since raw lab data usually only contain measurement results, information on methods is often not directly available. This information had to be retrieved from study documentation or obtained by contacting relevant clinical study staff. In other cases the dataset names indicated the method name, e.g. 'QPCR' or 'FISH', standing for 'Real-Time Polymerase Chain Reaction' and 'Fluorescent In Situ Hybridisation' respectively.

A completely different implementation matter was the timing of samples. The visit variables VISITNUM and VISIT are used in all domains to describe the visit associated with the collection of the data. Stool samples, however, are often collected at home and handed in at the next study visit. We used VISITNUM and VISIT to record the timing of sample hand-in at regular study visits, and the planned timepoint variables --TPT, --TPTNUM and --TPTREF (reference time point for --TPT and --TPTNUM) to refer to the moment when the stool sample was planned to be collected. For example, in one study visits occur at baseline (visit 1) and week 4 (visit 2), while stool samples are taken at baseline, week 1, and just before the second visit at week 4:

Week    0 (baseline)  1  4
Visit   Visit 1          Visit 2
Sample  X             X  X

We defined the visit variables VISITNUM and VISIT and the planned timing variables XLTPTNUM/XLTPT and XLTPTREF for this situation as follows:

                    Sample handed in     Sample collected
Sample    VISITNUM  VISIT      XLTPTNUM  XLTPT   XLTPTREF
Stool 1   1         VISIT 1    0         DAY 0   START STUDY
Stool 2   2         VISIT 2    1001      WEEK 1  START STUDY
Stool 3   2         VISIT 2    1004      WEEK 4  START STUDY
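Expressed as plain records (Python is used here purely for illustration; the values are taken from the example above), this shows how the timepoint variables disambiguate samples that were handed in at the same visit:

```python
# The three stool samples from the example above: hand-in timing in
# VISITNUM/VISIT, planned collection timing in XLTPTNUM/XLTPT/XLTPTREF.
stool_samples = [
    {"SAMPLE": "Stool 1", "VISITNUM": 1, "VISIT": "VISIT 1",
     "XLTPTNUM": 0,    "XLTPT": "DAY 0",  "XLTPTREF": "START STUDY"},
    {"SAMPLE": "Stool 2", "VISITNUM": 2, "VISIT": "VISIT 2",
     "XLTPTNUM": 1001, "XLTPT": "WEEK 1", "XLTPTREF": "START STUDY"},
    {"SAMPLE": "Stool 3", "VISITNUM": 2, "VISIT": "VISIT 2",
     "XLTPTNUM": 1004, "XLTPT": "WEEK 4", "XLTPTREF": "START STUDY"},
]

# Samples 2 and 3 were both handed in at visit 2, but the timepoint
# variables preserve when each was planned to be collected:
handed_in_visit2 = [s["XLTPT"] for s in stool_samples if s["VISITNUM"] == 2]
print(handed_in_visit2)  # ['WEEK 1', 'WEEK 4']
```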

All decisions made together with subject matter experts on the custom design and new terminology have been incorporated into a laboratory template at Nutricia. Data from new studies will be collected following the template, which greatly facilitates future data integration with the existing XL dataset.

CASE STUDY: STOOL CHARACTERISTICS AND GASTROINTESTINAL SYMPTOMS

Stool characteristics and gastrointestinal (GI) symptoms present another set of data typical of nutrition research. These data are collected to evaluate product tolerability. The next CRF section gives an example of the components of these data and how they are collected.


While often collected together on one CRF, these data cannot be mapped to a single SDTM domain. Analyzing the various data elements reveals that GI symptoms, while captured as findings, are in fact (pre-specified) events, and thus should be mapped to an Events domain. Stool characteristics are measurements, hence Findings, but they are related to the event of a stool occurrence. We mapped these data to the CE (Clinical Events) and FA (Findings About) domains. The mapping of these data has been extensively described in a paper by Lieke Gijsbers (2017).

In summary, to distinguish between the number of stools counted on a single day, mostly assessed via a diary, and the average daily number of stools, mostly assessed via a CRF, different terms for FATESTCD (COUNT, A_COUNT) and FATEST (Count, Average Count) were used. A similar approach was used for characteristics per stool (e.g. FATESTCD = CONSIS, COLOR, VOLUME) versus general/average stool characteristics (e.g. FATESTCD = A_CONSIS, A_COLOR, A_VOLUME). GI symptoms data were collected as pre-specified events, combining the collection of the absence/presence of the event and the severity of the event. These data fit well with the CE domain. In some studies the frequency of the GI symptoms was also recorded. These additional variables were then mapped to the FA domain. In this case, the data originating from one CRF (and one raw dataset) were thus split into two domains.
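The split described above can be sketched as follows. This is our own illustration, not code from the project: the CRF field names, values and the frequency test code are invented, while CETERM/CEOCCUR/CESEV and FAOBJ/FATESTCD/FATEST/FAORRES are standard SDTM variable names.

```python
# Invented example of one tolerance CRF record combining a pre-specified
# GI symptom (presence/severity) with its frequency.
crf_record = {"SYMPTOM": "VOMITING", "PRESENT": "Y",
              "SEVERITY": "MILD", "FREQ_PER_DAY": "2"}

# Presence and severity of the pre-specified event go to CE:
ce_row = {
    "CETERM":  crf_record["SYMPTOM"],   # reported term for the event
    "CEOCCUR": crf_record["PRESENT"],   # occurrence of pre-specified event
    "CESEV":   crf_record["SEVERITY"],  # severity
}

# The frequency is a finding about that event and goes to FA:
fa_row = {
    "FAOBJ":    crf_record["SYMPTOM"],      # object of the finding
    "FATESTCD": "FREQ",                     # assumed test code for frequency
    "FATEST":   "Frequency",                # assumed test name
    "FAORRES":  crf_record["FREQ_PER_DAY"], # result in original units
}
```

One raw record thus yields one CE row plus, where frequency was collected, one FA row linked by the same symptom term.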

CONCLUSION

CDISC SDTM and related submission standards like define.xml offer a rich and mature framework for the standardization of clinical study data. Even if the purpose of standardization is not a submission, these standards can basically be applied as is. Certain aspects of the submission standards have clearly been designed for data transmission and bulk validation, e.g. the concept of Supplemental Qualifiers and SAS transport files. For daily use the SUPPQUAL structure is not convenient, as transformations are needed to prepare the data for analysis. Nevertheless, with the growing adoption of CDISC standards, we believe it pays off to conform as much as possible, as tools become available for the engineering, validation, review and visualization of SDTM data. We therefore used SDTM conformance as a requirement and implemented SUPPQUAL datasets. We deviated in a limited number of very specific cases where we thought that customization was needed to ensure that the data remain meaningful and user friendly. As explained in this paper, this was the case for special lab tests (faecal parameters), for which we had to define over 60 controlled terms, and for which the limitation of 8 characters for test codes (linked to a transport file constraint) was too stringent. Further, we did not create transport files, as this did not serve our needs. As a result, exceeding the maximum lengths of SDTM test codes and test names did not present an issue.

Looking at the broader set of standards within submissions, like the Reviewer's Guide and the Legacy Data Conversion Plan and Report, we believe these are valuable concepts that can be equally useful for documentation purposes in a non-submission context. However, their implementation may require some customization and perhaps a leaner approach; this is an area that we will need to study further.

The project presented here involved a collaboration between Nutricia Research and OCS Life Sciences.
Early in the process we discovered that more discussion and alignment on implementation choices and decisions was needed than we had initially anticipated. We found an efficient mode of working together: while SDTM knowledge was present on both sides, OCS prepared the implementation scenarios, and the Nutricia Research team selected the option that best fit Nutricia's needs or gave input for alternatives. New domains as well as new studies presented new challenges, and earlier decisions needed to be revisited from time to time. Therefore, we continued this partnership approach throughout the project. Both sides experienced this way of working as engaging, synergetic and efficient, limiting the amount of rework and allowing fast resolution of issues.


ACKNOWLEDGMENTS

We thank Roel Hobo, Elsbeth Verdonk and Heleen de Weerd, from Nutricia Research, and Bas van Bakel, Leah van der Meer and Jules van der Zalm, from OCS Life Sciences, for their contribution to the project. We are grateful for Leah's support in writing this paper.

REFERENCES

1. Angelo Tinazzi and Cedric Marchand. An FDA Submission Experience Using the CDISC Standards. PhUSE 2017, Paper RG04. (www.phusewiki.org/docs/Conference 2017 RG Papers/RG04 Paper NEW.pdf)
2. Lieke Gijsbers. Mapping of Gastrointestinal Tolerance Data to SDTM. PhUSE 2017, Paper PP25. (www.phusewiki.org/docs/Conference 2017 PP Papers/PP25.pdf)
3. Fred Wood. The CDISC Study Data Tabulation Model (SDTM): History, Perspective, and Basics. PharmaSUG 2008, Paper RS10. (www.lexjansen.com/pharmasug/2008/rs/RS10.pdf)

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Paul Vervuren
Company: Nutricia Research
Address: Uppsalalaan 12
City / Postcode: 3584 CT Utrecht
Email:
Web: www.nutriciaresearch.com

APPENDIX

Definition of Probability, Impact and Non-detectability for risk assessment

Probability
1  Issues cannot happen and did not happen in the past
2  Issues are known to happen in exceptional circumstances
3  Issues are known to happen with reasonable frequency
4  Issues happen frequently and re-occur often
5  Issues are happening frequently or are reported often

Possible Impact
1  No possible impact can be identified on the subject/patient, the user, or the data in whatever form
2  Slight delay in the production of results; results are correct
3  No results can be produced, or there is a loss of time before correct results are available
4  Important delays in producing results that might endanger the subject/patient, or slightly incorrect results are produced (e.g. the way they are presented might be misinterpreted)
5  There is a direct safety hazard for the subject/patient; this can be due to corrupted results that are displayed

Non-detectability
1  Any issue or malfunction is immediately detected and reported; any possible malfunction is stopped automatically or eliminated
2  The issue is reported and requires intervention to allow continued functioning
3  The issue generates an alarm or signal, but the intervention needed is not clear and short-cuts are possible, or the user does not see any results appear while waiting for the result
4  The issue generates an alarm or signal but continues anyway and no intervention is requested; in other cases the issue is only detectable a posteriori
5  An issue or malfunction can go completely unnoticed
