DR RISH K PAI (Orcid ID : 0000-0002-1788-475X)

MRS CLAIRE PARKER (Orcid ID : 0000-0002-8587-5441)

DR BRIAN G FEAGAN (Orcid ID : 0000-0002-6914-3822)

DR ROHIT LOOMBA (Orcid ID : 0000-0002-4845-9991)

DR VIPUL JAIRATH (Orcid ID : 0000-0002-1092-0033)

Article type : Original Scientific Paper

TITLE: Standardizing the interpretation of liver biopsies in non-alcoholic fatty liver disease clinical trials

SHORT TITLE: Interpretation of liver biopsies in NAFLD trials

AUTHORS: Rish K. Pai1, David E. Kleiner2, John Hart3, Oyedele A. Adeyi4, Andrew D. Clouston5, Cynthia A. Behling6, Dhanpat Jain7, Sanjay Kakar8, Mayur Brahmania9, Lawrence Burgart10, Kenneth P. Batts11, Mark A. Valasek12, Michael S. Torbenson13, Maha Guindi14, Hanlin L. Wang15, Veeral Ajmera16, Leon A. Adams17, Claire E. Parker18, Brian G. Feagan19, Rohit Loomba20, Vipul Jairath21

AUTHOR AFFILIATIONS: 1Scottsdale, AZ, USA; 2Bethesda, MD, USA; 3Chicago, IL, USA; 4Toronto, ON, Canada; 5Brisbane, QLD, Australia; 6San Diego, CA, USA; 7New Haven, CT, USA; 8San Francisco, CA, USA; 9London, ON, Canada; 10Minneapolis, MN, USA; 11Minneapolis, MN, USA; 12La Jolla, CA, USA; 13Rochester, MN, USA; 14Los Angeles, CA, USA; 15Los Angeles, CA, USA; 16La Jolla, CA, USA; 17Perth, WA, Australia; 18London, ON, Canada; 19London, ON, Canada; 20La Jolla, CA, USA; 21London, ON, Canada.

This is the author manuscript accepted for publication and has undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/APT.15503

This article is protected by copyright. All rights reserved.

WORD COUNT: 3634

CORRESPONDENCE: Dr. Rish K. Pai, Department of Laboratory Medicine and Pathology, Mayo Clinic Arizona, 13400 E. Shea Blvd, Scottsdale, AZ, 85259; Phone: 480-301-4081; Fax: 480-301-9158; E-mail: [email protected]

LIST OF ABBREVIATIONS: NASH-CRN, NASH Clinical Research Network; NAS, NAFLD activity score; SAF score, Steatosis, Activity, Fibrosis score.

SUMMARY

Background: There is substantial variation in how histologic definitions and scoring systems of non-alcoholic fatty liver disease are operationalized.

Aims: Our objective was to develop a consensus-based framework for standardizing histologic assessment of liver biopsies in clinical trials of non-alcoholic fatty liver disease.

Methods: An expert panel of 14 liver pathologists and 3 hepatologists was assembled. Using modified RAND/University of California Los Angeles appropriateness methodology, 130 items derived from literature review and expert opinion were rated by each panel member on a 1 to 9 scale. Disagreement was defined as ≥ 5 ratings in both the lowest (1-3) and highest (7-9) categories. Items were classified as inappropriate (median 1-3.5 without disagreement), uncertain (median 3.5-6.5 or any median with disagreement) or appropriate (median 6.5-9 without disagreement). Survey results were discussed as a group before voting.

Results: Current measures of disease activity and fibrosis may not fully capture important features of non-alcoholic steatohepatitis. It was determined that alternative methods to evaluate ballooning degeneration are needed; however, panellists were uncertain whether portal inflammation, degree of steatosis, and Mallory-Denk bodies are important measures of disease activity. Furthermore, it was felt that current staging systems do not capture the full spectrum of fibrosis in non-alcoholic steatohepatitis. A consensus definition and sub-stages for bridging fibrosis are needed. The severity of perisinusoidal fibrosis should be captured at all stages. Lastly, a method to evaluate features of fibrosis regression should be developed.

Conclusion: The operating properties of the modifications proposed should be evaluated prospectively to determine reliability and responsiveness.

KEYWORDS: Steatohepatitis, fibrosis, steatosis, ballooning degeneration, histopathology

SUMMARY WORD COUNT: 250

INTRODUCTION

Defined as evidence of hepatic steatosis in the absence of secondary causes of fat accumulation in the liver,1 non-alcoholic fatty liver disease affects approximately one-quarter of adults worldwide, and disease-related complications are estimated to be increasing.2-4 Non-alcoholic fatty liver disease describes a spectrum of liver disease ranging from steatosis without hepatocyte injury (non-alcoholic fatty liver) to non-alcoholic steatohepatitis, which is characterized by steatosis coupled with inflammation and hepatocellular ballooning.5,6 Patients with non-alcoholic steatohepatitis have a higher risk of liver-related mortality and of developing cirrhosis than those with non-alcoholic fatty liver.7 In the absence of effective medical treatment, cirrhosis due to non-alcoholic steatohepatitis is anticipated to become the leading cause of liver transplantation worldwide within the next decade.8,9 At present, there are no therapies approved by the U.S. Food and Drug Administration or the European Medicines Agency for the treatment of non-alcoholic steatohepatitis. However, several drugs are in early- or late-stage evaluation. Efficient development of new treatments for non-alcoholic steatohepatitis requires objective measures of disease activity that are valid, reliable and responsive.10

While exploratory blood-based and imaging pharmacodynamic biomarkers have been proposed,11 liver biopsy remains the current gold standard for assessment of non-alcoholic fatty liver disease activity and fibrosis in clinical trials. Although fibrosis is not required for a diagnosis of non-alcoholic steatohepatitis, it is commonly present. Several histologic scoring systems exist for measuring microscopic disease activity and fibrosis. In 1999, Brunt et al. developed a descriptive grading system for non-alcoholic steatohepatitis that separately grades necro-inflammatory activity and stages the extent of fibrosis.12 In 2003, Harrison et al. proposed a modification to the Brunt criteria to enable quantification of disease activity in a randomized controlled trial.13 The Non-Alcoholic Steatohepatitis Clinical Research Network (NASH-CRN) proposed a system for scoring the full spectrum of non-alcoholic fatty liver disease lesions in 2005.14 The non-alcoholic fatty liver disease activity score (NAS) is an unweighted index calculated by summing component scores for features considered potentially reversible in the short term, namely steatosis, lobular inflammation, and hepatocyte ballooning. This system evaluates fibrosis separately. The NAS is the most commonly used index in non-alcoholic steatohepatitis clinical trials, although few studies have formally evaluated its operating properties outside of the NASH-CRN.14-16 In 2011, Younossi et al. correlated various histologic features with liver-related mortality.17 While this comprehensive pathologic evaluation demonstrated a strong correlation between fibrosis and liver-related mortality, its validity for use in clinical trials is unknown. Finally, in 2012, Bedossa et al. described the Steatosis, Activity, Fibrosis (SAF) scoring system to facilitate the classification of liver biopsies taken from morbidly obese patients.18 In this study, an algorithm based on the steatosis and activity scores was proposed to help distinguish non-alcoholic fatty liver from non-alcoholic steatohepatitis.
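As an illustration of the unweighted NAS arithmetic described above, a minimal Python sketch follows. The component ranges (steatosis 0-3, lobular inflammation 0-3, hepatocyte ballooning 0-2, total 0-8) are those of the NASH-CRN system; the class and function names are hypothetical and do not correspond to any published software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NasComponents:
    """NASH-CRN component scores for one biopsy (hypothetical structure)."""
    steatosis: int             # 0-3
    lobular_inflammation: int  # 0-3
    ballooning: int            # 0-2

    def __post_init__(self) -> None:
        # Reject values outside the NASH-CRN component ranges.
        limits = {"steatosis": 3, "lobular_inflammation": 3, "ballooning": 2}
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name} must be in 0..{upper}, got {value}")


def nas(components: NasComponents) -> int:
    """Unweighted sum of the three components (range 0-8).

    Fibrosis is staged separately and is deliberately not part of this sum.
    """
    return (components.steatosis
            + components.lobular_inflammation
            + components.ballooning)


if __name__ == "__main__":
    print(nas(NasComponents(steatosis=2, lobular_inflammation=1, ballooning=1)))  # 4
```

Because the index is unweighted, steatosis (maximum 3) can contribute more to the total than ballooning (maximum 2).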

In late 2018, the U.S. Food and Drug Administration published draft guidance for industry regarding the development of therapies for non-cirrhotic patients with non-alcoholic steatohepatitis.19 Histologic evaluation is regarded as essential for determining both therapeutic efficacy and eligibility for phase IIb and III trials. Specifically, the recommended inclusion criteria are a NAS of at least 4, with ballooning and lobular inflammation subscores of at least 1 each, and a NASH-CRN fibrosis stage of 2 or 3. Three histologic endpoints have been proposed: 1) resolution of steatohepatitis (defined by the U.S. Food and Drug Administration as a NAS lobular inflammation subscore ≤ 1, a ballooning subscore of 0, and any steatosis subscore); 2) improvement in liver fibrosis (defined as a reduction in NASH-CRN fibrosis stage of ≥ 1 and no worsening of steatohepatitis); and 3) both resolution of steatohepatitis and improvement in fibrosis.
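The inclusion criteria and histologic endpoints listed above can be expressed as simple predicates over NAS subscores and NASH-CRN fibrosis stage. The sketch below is illustrative only: all names are hypothetical, NASH-CRN sub-stages 1a-1c are collapsed to 1 for simplicity, and "no worsening of steatohepatitis" is operationalized here, as an assumption, as no increase in any NAS component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiopsyScores:
    steatosis: int             # NAS subscore, 0-3
    lobular_inflammation: int  # NAS subscore, 0-3
    ballooning: int            # NAS subscore, 0-2
    fibrosis_stage: int        # NASH-CRN stage 0-4 (sub-stages 1a-1c collapsed to 1)

    @property
    def nas(self) -> int:
        return self.steatosis + self.lobular_inflammation + self.ballooning


def meets_histologic_inclusion(b: BiopsyScores) -> bool:
    """NAS >= 4, ballooning >= 1, lobular inflammation >= 1, fibrosis stage 2 or 3."""
    return (b.nas >= 4
            and b.ballooning >= 1
            and b.lobular_inflammation >= 1
            and b.fibrosis_stage in (2, 3))


def steatohepatitis_resolved(follow_up: BiopsyScores) -> bool:
    """Lobular inflammation <= 1 and ballooning 0; any steatosis subscore is allowed."""
    return follow_up.lobular_inflammation <= 1 and follow_up.ballooning == 0


def no_worsening_of_steatohepatitis(baseline: BiopsyScores, follow_up: BiopsyScores) -> bool:
    # ASSUMPTION: operationalized as no increase in any NAS component.
    return (follow_up.steatosis <= baseline.steatosis
            and follow_up.lobular_inflammation <= baseline.lobular_inflammation
            and follow_up.ballooning <= baseline.ballooning)


def fibrosis_improved(baseline: BiopsyScores, follow_up: BiopsyScores) -> bool:
    """Reduction of at least one fibrosis stage with no worsening of steatohepatitis."""
    return (baseline.fibrosis_stage - follow_up.fibrosis_stage >= 1
            and no_worsening_of_steatohepatitis(baseline, follow_up))


def combined_endpoint(baseline: BiopsyScores, follow_up: BiopsyScores) -> bool:
    """Endpoint 3: both resolution of steatohepatitis and fibrosis improvement."""
    return steatohepatitis_resolved(follow_up) and fibrosis_improved(baseline, follow_up)


if __name__ == "__main__":
    baseline = BiopsyScores(steatosis=2, lobular_inflammation=2, ballooning=1, fibrosis_stage=3)
    follow_up = BiopsyScores(steatosis=1, lobular_inflammation=1, ballooning=0, fibrosis_stage=2)
    print(meets_histologic_inclusion(baseline), combined_endpoint(baseline, follow_up))  # True True
```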

Given the essential role of histopathology in determining the efficacy of new therapeutic agents for non-alcoholic steatohepatitis, we assembled a multidisciplinary panel of 17 liver pathologists and hepatologists and conducted a two-round consensus process on the histologic assessment of liver biopsies in clinical trials using modified RAND/University of California Los Angeles (UCLA) appropriateness methodology.20 The group identified factors that may lead to suboptimal evaluation of key outcome measures and to false-negative trial results. Standardizing the histologic assessment of liver biopsies in an effort to reduce variability and improve responsiveness has important implications for the large and increasing number of patients with non-alcoholic fatty liver disease participating in clinical trials and being managed in clinical practice.

MATERIALS AND METHODS

Item generation

According to recently published systematic reviews,21,22 histologic outcomes have been reported in 37 randomized, placebo-controlled trials of adult patients with non-alcoholic steatohepatitis (Supplementary Table 1). Of these trials, 30 used the NAS system for assessing disease activity and fibrosis. Three studies used the NAS but evaluated fibrosis with the Ishak staging system.23 The remaining four studies used either a novel scoring system or the Brunt grading and staging system. Other scoring systems, including the SAF scoring system and Goodman scheme, have not been used in clinical trials but were included in the current study based on content validity.

Two reviewers (RKP and CEP) extracted items from the aforementioned scoring systems for inclusion in the survey. Items deemed relevant but not included in published indices were also added. The initial list of survey items was reviewed by VJ, CEP, RKP, and RL.

Expert Consensus Process

Recruitment of Experts

Fourteen experienced liver pathologists and three experienced hepatologists from the United States, Canada, and Australia were invited to participate. Panellists were selected based on their publication record in non-alcoholic steatohepatitis, international reputation in liver pathology, and/or experience in trial design, drug development and clinical epidemiology. These criteria took precedence over global representation. After a list of experts in the above areas was reviewed, final participant selection was performed by RKP, VJ, and RL.

Modified RAND/UCLA appropriateness methodology was used to assess the face validity (the extent to which an item appears to address the concept it purports to measure) and feasibility of items identified in the review. These initial items were circulated to the panellists for review prior to the initial meeting. RAND/UCLA appropriateness methodology employs a modified Delphi panel approach to combine the best available evidence with the clinical experience of relevant experts.20 This process is widely accepted, iterative and evidence-based.

Initial meeting/survey and analysis of survey results

After the initial meeting, items were modified, and additional items were included based on comments from the expert panel. These items were circulated via online survey. Panellists anonymously rated the appropriateness of each item on a scale from 1 to 9 (1 = highly inappropriate, 9 = highly appropriate).

According to the RAND/UCLA manual, each survey item was classified as inappropriate, uncertain or appropriate based on the median panel rating and the degree of panel disagreement (median 1 to 3.5 without disagreement = inappropriate; median 3.5 to 6.5 or any median with disagreement = uncertain; median 6.5 to 9 without disagreement = appropriate).20 Because the RAND/UCLA manual does not explicitly define disagreement for a panel of 17, the disagreement threshold for 14-16 panellists was used: disagreement was considered present when at least five panellists rated an item in the lowest 3-point region (1 to 3) and at least five rated it in the highest 3-point region (7 to 9).
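The classification rule above can be sketched as a small Python function. This is an illustration, not the analysis code used in the study. Because the stated ranges share their endpoints, the sketch assumes, as a labelled choice, that a median of exactly 3.5 or 6.5 is treated as uncertain; the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def classify_item(ratings: Sequence[int], disagreement_threshold: int = 5) -> str:
    """Classify one survey item from panel ratings on the 1-9 scale.

    disagreement_threshold=5 mirrors the 14-16 panellist rule applied to 17 panellists.
    """
    if not ratings or any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must be a non-empty sequence of integers from 1 to 9")
    low = sum(1 for r in ratings if r <= 3)   # ratings in the lowest 3-point region
    high = sum(1 for r in ratings if r >= 7)  # ratings in the highest 3-point region
    disagreement = low >= disagreement_threshold and high >= disagreement_threshold
    m = median(ratings)
    # ASSUMPTION: a median of exactly 3.5 or 6.5 is classified as uncertain.
    if disagreement or 3.5 <= m <= 6.5:
        return "uncertain"
    return "inappropriate" if m < 3.5 else "appropriate"


if __name__ == "__main__":
    # Five low ratings and twelve high ratings: median 8, but disagreement is present.
    print(classify_item([1] * 5 + [8] * 12))  # uncertain
    print(classify_item([1] * 4 + [8] * 13))  # appropriate
```

With all 17 panellists voting, the median is always one of the submitted integer ratings, so the 3.5 and 6.5 boundary choice only matters if an item receives an even number of ratings.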

Second meeting/survey

Results of the initial survey were distributed to panellists and discussed in a moderated teleconference. Areas of disagreement regarding item appropriateness were identified, and panellists were asked to explain the rationale behind their responses. In accordance with RAND/UCLA appropriateness methodology, no attempt was made to force the panel to consensus. The survey was revised based on the second panel meeting to improve clarity, and a second survey was circulated (Supplementary Table 2).

RESULTS

Item Generation and survey

The survey items were grouped according to the following topics: disease definitions and basic principles, histologic scoring systems, steatosis, lobular inflammation, ballooning degeneration, and fibrosis. The initial draft survey included 89 items. During a preliminary teleconference, items were modified, and novel items were added to address the panellists’ comments. The first survey consisted of 121 items. The 17 panellists were asked to rate each item for appropriateness in clinical trials of non-alcoholic fatty liver disease on a scale from 1 to 9 (1 = highly inappropriate, 9 = highly appropriate). After the initial survey a second teleconference was held to discuss the results. A final survey consisting of 130 items (Supplementary Table 2) was then drafted and circulated. Overall, 88 items were regarded as appropriate, 39 as uncertain, and 3 as inappropriate.

Disease definitions, basic principles, and current histologic scoring systems

Items in these sections were focused on specimen adequacy, histologic definitions, and histologic features as they relate to assessment of disease activity. Key items, including the panellist ratings, are shown in Table 1.

The panel determined that liver biopsies should be assessed for adequacy, but there was uncertainty regarding what constitutes an adequate biopsy. Length or length plus width (with > 2.5 cm representing the ideal length and > 1 cm representing the minimum length) and number of portal tracts are among the factors that should be used to determine biopsy adequacy. Adequacy can be qualified as limited if the biopsy is less than or equal to 1 cm, excessively fragmented, and/or subcapsular. Ultimately, the pathologist’s judgement rather than strict criteria should determine biopsy adequacy.
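Because the panel concluded that pathologist judgement, not strict criteria, should determine adequacy, the following Python sketch only flags the stated features that qualify a biopsy as limited; it does not make an adequacy decision. Portal tract count is not modelled because no threshold was specified. All names are hypothetical.

```python
from dataclasses import dataclass

MINIMUM_LENGTH_CM = 1.0  # biopsies of 1 cm or less are qualified as limited
IDEAL_LENGTH_CM = 2.5    # lengths above 2.5 cm are described as ideal


@dataclass(frozen=True)
class CoreDescription:
    """Hypothetical description of a biopsy core for adequacy flagging."""
    length_cm: float
    excessively_fragmented: bool = False
    subcapsular: bool = False


def limited_adequacy_reasons(core: CoreDescription) -> list[str]:
    """Return the stated reasons, if any, for qualifying adequacy as limited."""
    reasons = []
    if core.length_cm <= MINIMUM_LENGTH_CM:
        reasons.append("length 1 cm or less")
    if core.excessively_fragmented:
        reasons.append("excessively fragmented")
    if core.subcapsular:
        reasons.append("subcapsular")
    return reasons


def meets_ideal_length(core: CoreDescription) -> bool:
    return core.length_cm > IDEAL_LENGTH_CM


if __name__ == "__main__":
    core = CoreDescription(length_cm=1.8, subcapsular=True)
    print(limited_adequacy_reasons(core), meets_ideal_length(core))  # ['subcapsular'] False
```

The returned list is intended as an input to the pathologist's overall judgement rather than a substitute for it.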

Disease activity and fibrosis should be measured separately and not combined into a composite index. Haematoxylin & eosin and Masson's trichrome (or other collagen staining protocols) should be considered mandatory for assessment of disease activity and fibrosis. Periodic acid-Schiff diastase, iron, and reticulin stains are not required but may provide useful supplementary information.

Despite the fluctuating disease course in non-alcoholic fatty liver disease, the distinction between steatosis and steatohepatitis is clinically relevant.24 Greater than 5% steatosis in the absence of significant alcohol consumption and other forms of fatty liver disease is the minimal requirement for a diagnosis of non-alcoholic fatty liver disease in a clinical trial setting. However, steatohepatitis can be diagnosed in the presence of less than 5% steatosis when advanced fibrosis is present. Ballooning degeneration plus lobular inflammation is the most appropriate criterion for distinguishing steatosis from steatohepatitis; lobular inflammation alone is insufficient. It is uncertain whether the presence of fibrosis in the baseline biopsy is sufficient to change a diagnosis from steatosis to steatohepatitis. It is also uncertain whether borderline steatohepatitis should be defined as evidence of steatosis and fibrosis without ballooning degeneration. However, borderline steatohepatitis should not be considered a distinct entity.
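The diagnostic rules on which the panel agreed can be summarized as a small decision function. This is a sketch under stated assumptions: items rated uncertain (fibrosis alone, borderline steatohepatitis) are deliberately not encoded; "advanced fibrosis" is left as a boolean input because it is not defined here; and ballooning plus lobular inflammation is assumed to be required even in the low-steatosis case. The function name is hypothetical.

```python
def categorize_biopsy(steatosis_pct: float,
                      ballooning_present: bool,
                      lobular_inflammation_present: bool,
                      advanced_fibrosis: bool) -> str:
    """Hypothetical trial-setting categorization based on the panel's agreed items."""
    if not 0 <= steatosis_pct <= 100:
        raise ValueError("steatosis_pct must be between 0 and 100")
    # Ballooning plus lobular inflammation; lobular inflammation alone is insufficient.
    activity = ballooning_present and lobular_inflammation_present
    if steatosis_pct > 5:
        return "steatohepatitis" if activity else "steatosis"
    # Below the 5% threshold, steatohepatitis may still be diagnosed with advanced fibrosis.
    # ASSUMPTION: ballooning plus lobular inflammation is still required in this case.
    if activity and advanced_fibrosis:
        return "steatohepatitis"
    return "does not meet trial definition of NAFLD"


if __name__ == "__main__":
    print(categorize_biopsy(30, False, True, False))  # steatosis
    print(categorize_biopsy(3, True, True, True))     # steatohepatitis
```

Note that fibrosis without activity does not change the category in this sketch, reflecting the panel's uncertainty on that item rather than a decision against it.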

It is uncertain whether the degree of steatosis and portal inflammation are important measures of disease activity in non-alcoholic fatty liver disease. Importantly, when change in fibrosis and disease activity is assessed using biopsies taken from the same patient at two time points in a clinical trial, the biopsies should be compared directly, provided that the pathologist is blinded to time point and treatment arm.

Panellists were also asked to rate the existing non-alcoholic fatty liver disease scoring indices. The SAF scoring system had the highest appropriateness rating followed by the NAS, although none was universally regarded as optimal.

Steatosis, lobular inflammation, and ballooning degeneration

The NAS, SAF, and Goodman steatosis categories were all considered adequate for discriminating levels of steatosis (Table 2). Steatosis can be measured either as the percentage of hepatocytes containing a visible steatotic droplet or as the percentage of the area of non-fibrotic parenchyma replaced by steatosis. Both methods were considered appropriate by the panellists. Importantly, for both methods, steatosis should be assessed at low to medium magnification. It is uncertain whether microvesicular steatosis, the size of macrovesicular steatotic droplets, and the location of steatosis are important for measuring disease activity in non-alcoholic fatty liver disease.
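The two measurement approaches above differ only in their denominator, and either percentage can then be mapped to a grade. The Python sketch below shows both percentages and the NASH-CRN steatosis grade cut-offs (<5% = 0, 5-33% = 1, >33-66% = 2, >66% = 3). It is illustrative only: the function names are hypothetical, and the counts and areas are assumed to be the pathologist's low- to medium-power estimates rather than morphometric measurements.

```python
def percent_steatotic_hepatocytes(steatotic: int, total: int) -> float:
    """Method 1: percentage of hepatocytes containing a visible steatotic droplet."""
    if total <= 0 or not 0 <= steatotic <= total:
        raise ValueError("require total > 0 and 0 <= steatotic <= total")
    return 100.0 * steatotic / total


def percent_parenchymal_area(steatotic_area: float, nonfibrotic_area: float) -> float:
    """Method 2: percentage of non-fibrotic parenchymal area replaced by steatosis."""
    if nonfibrotic_area <= 0 or not 0 <= steatotic_area <= nonfibrotic_area:
        raise ValueError("require nonfibrotic_area > 0 and 0 <= steatotic_area <= nonfibrotic_area")
    return 100.0 * steatotic_area / nonfibrotic_area


def nas_steatosis_grade(percent: float) -> int:
    """NASH-CRN steatosis grade: <5% -> 0, 5-33% -> 1, >33-66% -> 2, >66% -> 3."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 5:
        return 0
    if percent <= 33:
        return 1
    if percent <= 66:
        return 2
    return 3


if __name__ == "__main__":
    print(nas_steatosis_grade(percent_steatotic_hepatocytes(20, 100)))  # 1
    print(nas_steatosis_grade(percent_parenchymal_area(1.5, 3.0)))      # 2
```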


A focus of lobular inflammation should be specifically defined (Table 3). While the SAF score definition of two or more inflammatory cells (neutrophils, lymphocytes and other mononuclear cells, eosinophils, and microgranulomas) within the sinusoids or surrounding individual hepatocytes was rated as an appropriate definition, some panellists expressed concern regarding this low threshold. Alternate definitions of a focus of lobular inflammation are needed.
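The panellists' concern about the SAF threshold can be made concrete: the number of cells required to call a focus directly changes the foci count that drives the NAS lobular inflammation grade (0 = no foci, 1 = <2 foci, 2 = 2-4 foci, 3 = >4 foci per 200× field). In the Python sketch below, each field is represented, hypothetically, as a list of inflammatory-cell cluster sizes, and `min_cells` stands in for an alternative focus definition; neither the representation nor the names come from the source.

```python
from typing import Sequence

SAF_MIN_CELLS = 2  # SAF definition: two or more inflammatory cells form a focus


def foci_in_field(cluster_sizes: Sequence[int], min_cells: int = SAF_MIN_CELLS) -> int:
    """Count clusters of inflammatory cells that qualify as a focus."""
    return sum(1 for size in cluster_sizes if size >= min_cells)


def nas_lobular_grade(fields: Sequence[Sequence[int]], min_cells: int = SAF_MIN_CELLS) -> int:
    """Map mean foci per 200x field to the NAS lobular inflammation grade (0-3)."""
    if not fields:
        raise ValueError("at least one field is required")
    counts = [foci_in_field(field, min_cells) for field in fields]
    foci_per_field = sum(counts) / len(counts)
    if foci_per_field == 0:
        return 0
    if foci_per_field < 2:
        return 1
    if foci_per_field <= 4:
        return 2
    return 3


if __name__ == "__main__":
    fields = [[2, 2, 2], [2, 3], [5, 2, 2]]
    print(nas_lobular_grade(fields))               # 2 (SAF two-cell threshold)
    print(nas_lobular_grade(fields, min_cells=3))  # 1 (hypothetical stricter threshold)
```

The same slide can move between grades solely because of the focus definition, which is why a consensus definition matters if lobular inflammation is used in trial endpoints.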

To adequately assess hepatocyte ballooning, a consensus definition is needed (Table 4). Multiple definitions of a ballooned hepatocyte exist. Defining a classic ballooned hepatocyte as “generally larger than the surrounding hepatocytes” with “distinctive rarified cytoplasm that is irregularly stranded and clumped” was considered to be most appropriate. It is uncertain whether non-classical ballooned hepatocytes and Mallory-Denk bodies are important for measuring disease activity in non-alcoholic fatty liver disease. While the Goodman, NAS, and SAF grades of ballooning were rated as appropriate, it was recognized that given the importance of ballooning degeneration, an alternate scoring system that more fully evaluates these cells is needed.

Fibrosis

Both the SAF and NASH-CRN indices measure fibrosis in a similar manner. This method of staging is considered adequate for discriminating between levels of fibrosis. However, some deficiencies are apparent in this system. First, when the stage is 2 or higher, the severity of perisinusoidal fibrosis is not captured. Based on the panellist responses, the degree of perisinusoidal fibrosis should be evaluated at these higher stages. Second, there is no consensus on the most appropriate definition of bridging fibrosis. Some panellists regard one definite bridge as sufficient to diagnose bridging fibrosis, whereas others require two or more fibrous bridges. These results suggest that subdividing stage 3 according to the extent of fibrous bridging and nodularity may provide better staging.
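One way to picture the proposed refinements is a fibrosis record that keeps the NASH-CRN stage but also stores perisinusoidal severity at every stage and bridging details at stage 3. The Python sketch below is a hypothetical data structure only: the NASH-CRN stage labels are standard, but the perisinusoidal severity scale and the stage 3 sub-staging fields are placeholders, since the panel did not define these categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NashCrnStage(Enum):
    F0 = "0"
    F1A = "1a"  # mild zone 3 perisinusoidal fibrosis
    F1B = "1b"  # moderate zone 3 perisinusoidal fibrosis
    F1C = "1c"  # portal/periportal fibrosis only
    F2 = "2"    # zone 3 perisinusoidal plus portal/periportal fibrosis
    F3 = "3"    # bridging fibrosis
    F4 = "4"    # cirrhosis


class Perisinusoidal(Enum):
    # HYPOTHETICAL severity scale; categories were not defined by the panel.
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


@dataclass(frozen=True)
class FibrosisAssessment:
    stage: NashCrnStage
    perisinusoidal: Perisinusoidal      # recorded at every stage, including >= 2
    bridge_count: Optional[int] = None  # HYPOTHETICAL stage 3 sub-staging input
    nodularity: bool = False            # HYPOTHETICAL stage 3 sub-staging input

    def __post_init__(self) -> None:
        is_bridging = self.stage is NashCrnStage.F3
        if not is_bridging and (self.bridge_count is not None or self.nodularity):
            raise ValueError("bridge count and nodularity apply only to stage 3")
        if is_bridging and (self.bridge_count is None or self.bridge_count < 1):
            raise ValueError("stage 3 requires at least one fibrous bridge to be recorded")


if __name__ == "__main__":
    record = FibrosisAssessment(NashCrnStage.F3, Perisinusoidal.SEVERE,
                                bridge_count=2, nodularity=True)
    print(record.stage.value, record.perisinusoidal.name, record.bridge_count)  # 3 SEVERE 2
```

Recording the number of bridges, rather than a yes/no bridging call, would let either of the competing bridging definitions be applied retrospectively.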

The modified Ishak fibrosis staging system allows for discrimination of the severity of bridging fibrosis and cirrhosis. However, the Ishak staging system ignores perisinusoidal fibrosis and is therefore not appropriate for use in non-alcoholic fatty liver disease clinical trials. Panellists commented that it is better to integrate sub-staging of bridging fibrosis into staging systems designed for non-alcoholic fatty liver disease. It is uncertain whether a new staging system that separately evaluates the severity of portal and zone 3 fibrosis should be developed for non-alcoholic fatty liver disease. However, a method for evaluating features of regression is needed.

DISCUSSION

Histologic evaluation of liver biopsies from patients with non-alcoholic fatty liver disease has two essential roles in the setting of a clinical trial: to ensure that patients meet the eligibility criteria and to evaluate response to therapy.5 Steatosis, lobular inflammation, and ballooning degeneration are the three main features that pathologists have used to measure disease activity and categorize non-alcoholic fatty liver disease. Of these three features, the degree of steatosis has been found to be the most reproducible, whereas agreement on lobular inflammation and ballooning degeneration is suboptimal,14,16,25,26 which highlights the need for consensus definitions, reader training, and monitoring. The Brunt system12 evaluates these three features together to determine an overall grade, while the Goodman system,27 SAF score18 and NAS14 evaluate them separately. The Harrison system evaluates only lobular inflammation and ballooning degeneration.13 It is important to emphasize that the panellists regarded the measures of disease activity in the NAS, SAF, and Goodman systems as adequate. However, multiple areas of uncertainty were identified, highlighting the need for refinement and standardization of scoring practices.

Ballooning degeneration is regarded as an essential feature in the diagnosis of non-alcoholic steatohepatitis and for determining disease activity. This is consistent with the association between ballooning degeneration and the presence of fibrosis, which in turn predicts adverse liver-related events in longitudinal studies.17,28-31 Resolution of ballooning degeneration is also a key endpoint in clinical trials. For these reasons, ballooning degeneration should be clearly defined. The most appropriate definition is a hepatocyte that is larger than the surrounding hepatocytes with a distinctive, rarified cytoplasm that is irregularly stranded and clumped. Despite the importance of ballooning degeneration, the 3-tier ballooning score comprises only a small portion of the total NAS. The Brunt and Harrison systems incorporate the degree of ballooning into the overall grade but do not score this feature separately. Similar to the NAS, the SAF score uses three tiers for discriminating ballooning, but the criteria are different. The Goodman system has a 4-tier ballooning score but is the least studied of the instruments. Given that ballooning has only been evaluated on a limited scale in published scoring systems, the panellists considered an alternative score that more fully evaluates these cells to be necessary. The NASH-CRN has been using an expanded 5-tier ballooning score in its database since 2010;32 however, reliability data have not been published for this expanded system. Furthermore, one component of this expanded grading system, the presence of "non-classical" ballooning, has not been clearly defined, and the significance of these cells in non-alcoholic fatty liver disease is uncertain. Alternative methods of evaluating ballooning degeneration should be explored in future studies and correlated with clinical outcomes.

Lobular inflammation is considered an important measure of disease activity in non-alcoholic steatohepatitis; however, the presence of lobular inflammation alone is not sufficient to change a diagnosis from steatosis to non-alcoholic steatohepatitis. Despite the lack of data supporting a role for lobular inflammation in predicting adverse outcomes,17,28 the U.S. Food and Drug Administration has recommended that resolution of non-alcoholic steatohepatitis be based on the absence of ballooning degeneration with a NAS lobular inflammation score less than or equal to one. This proposed definition highlights the importance of clearly defining a focus of lobular inflammation in clinical trials. The SAF system defines a focus as two or more inflammatory cells present within the sinusoids or surrounding ballooned or apoptotic hepatocytes. A focus of lobular inflammation is not specifically defined in the NAS, Brunt, or Goodman systems. If lobular inflammation is to be used as a criterion for resolution of non-alcoholic steatohepatitis, future studies are needed to define this feature more clearly and to determine how it should be measured.

While steatosis is used to define non-alcoholic fatty liver disease, it is regarded as the least important measure of disease activity. Nevertheless, steatosis often contributes more to the total NAS than either ballooning or lobular inflammation. It also heavily influences the Brunt grading scheme. The SAF score measures steatosis but separates it from activity (lobular inflammation and ballooning degeneration). Given that steatosis alone has consistently been shown not to predict adverse outcomes,25,27,28,30,33 it may not be necessary to include evaluation of steatosis in an index that measures disease activity in non-alcoholic fatty liver disease.


Numerous other histologic features are variably present in non-alcoholic fatty liver disease including, but not limited to, microvesicular steatosis, megamitochondria, portal inflammation, Mallory-Denk bodies, and acidophil bodies. Of these features, portal inflammation has consistently been shown to be associated with adverse outcomes and fibrosis.17,28,34-36 Portal inflammation is not part of the NAS or SAF systems. The Brunt and Harrison grading systems incorporate this feature into the overall grade. The Goodman scheme evaluates portal inflammation but does not incorporate this feature into the overall index. Future studies should consider refining existing scores to include portal inflammation in disease activity measurements of non-alcoholic fatty liver disease.

Previous studies have demonstrated that fibrosis is the most important feature in predicting adverse liver-related outcomes and all-cause mortality in non-alcoholic fatty liver disease.7,17,27,28,30,33,37,38 Therefore, patients with fibrosis are the target population for clinical trials in non-alcoholic steatohepatitis. Given the seminal importance of fibrosis in predicting liver-related and all-cause mortality, other features of non-alcoholic steatohepatitis have been regarded as less important to evaluate. However, it should be emphasized that fibrosis is the result of hepatocellular injury and not the primary insult in non-alcoholic steatohepatitis. Thus, fibrosis alone, without ballooning degeneration and lobular inflammation, in a baseline biopsy was not regarded as diagnostic of steatohepatitis by the majority of panellists. Moreover, improvements in fibrosis are strongly associated with improvement in disease activity including decreased ballooning degeneration, steatosis, and portal inflammation.34 For this reason, focusing only on a fibrosis endpoint in clinical trials without assessment of the underlying features that contribute to the development of fibrosis, as has been suggested, 27 is not ideal. Disease activity and fibrosis should be separately measured as both are important for evaluation of therapeutic efficacy.

In most adults, fibrosis begins in zone 3 as a result of hepatocellular ballooning degeneration.29,30 Portal and periportal fibrosis subsequently develop in association with portal inflammation and ductular reaction.31,36 Bridging fibrosis and cirrhosis follow. While the NASH-CRN system captures fibrosis progression and was shown to be reproducible in the initial study, there are some gaps in this staging system. Most notably, the definition of bridging fibrosis is unclear. Dividing stage 3 into sub-stages of bridging could help refine how bridging fibrosis is evaluated. Furthermore, the degree of perisinusoidal fibrosis is not captured in the higher stages. The severity of perisinusoidal fibrosis may be physiologically relevant and contribute to portal hypertension in the absence of advanced fibrosis.39 Finally, reproducibility of the NASH-CRN staging system was suboptimal in two subsequent studies.15,40 If fibrosis is divided into additional categories, reproducibility may worsen further, highlighting the need for pathologist training and consensus definitions.

Changes in disease activity and fibrosis scores currently drive assessment of primary outcomes in late-stage clinical trials. This is accomplished by comparing disease activity and fibrosis scores before and after treatment. While this analysis will continue to be essential in the assessment of therapeutic efficacy, panellists also determined that it is appropriate to review paired biopsies from the same patient, procured at screening and after treatment, in a side-by-side analysis with the reader blinded to treatment arm and time point. It may not be necessary to rescore these biopsies using a validated index. Rather, the reader could simply determine whether the biopsies are similar or different in terms of disease activity and fibrosis and, if different, select the biopsy with less activity and/or fibrosis. Such comparisons are used routinely in clinical practice to determine whether disease is improving, stable, or worsening. Direct comparisons could be a more sensitive way to detect efficacy in small numbers of patients during early drug development. The utility of such an analysis could be explored in currently available datasets.
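The blinded side-by-side read described above can be sketched as a randomize-then-unblind procedure: the two slides of a pair are shown under random labels, the reader calls them similar or picks the one with less activity and/or fibrosis, and the call is converted to improved, stable, or worsened only after unblinding. The Python sketch below is a hypothetical illustration; it assumes slide identifiers are coded so that they reveal neither time point nor treatment arm.

```python
import random


def blind_pair(baseline_slide: str, follow_up_slide: str,
               rng: random.Random) -> tuple[dict[str, str], dict[str, str]]:
    """Randomly label a paired biopsy 'A'/'B'; return (reader_view, key).

    ASSUMPTION: slide identifiers are coded and reveal neither time point nor arm.
    """
    order = [("baseline", baseline_slide), ("follow-up", follow_up_slide)]
    rng.shuffle(order)
    reader_view = {"A": order[0][1], "B": order[1][1]}  # shown to the reader
    key = {"A": order[0][0], "B": order[1][0]}          # held back until unblinding
    return reader_view, key


def unblind(key: dict[str, str], reader_call: str) -> str:
    """reader_call is 'similar', or the label of the slide with less activity/fibrosis."""
    if reader_call == "similar":
        return "stable"
    if reader_call not in key:
        raise ValueError("reader_call must be 'similar', 'A' or 'B'")
    return "improved" if key[reader_call] == "follow-up" else "worsened"


if __name__ == "__main__":
    view, key = blind_pair("S-001", "S-002", random.Random(0))
    print(sorted(view))  # ['A', 'B']
```

Randomizing the presentation order for each pair prevents the reader from inferring the time point from position, which is the property the panel's blinding requirement depends on.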

For a histologic index to function as an efficient endpoint in clinical trials, the component items must be clearly defined, and the instrument must be reliable and responsive. It is important to recognize that any improved sensitivity afforded by a modified or novel histologic index could result in reduced specificity and, consequently, the approval of therapies with only minimal clinical benefit. To avoid this outcome, modified and novel histologic indices should be validated by determining the correlation between the instrument and clinically meaningful endpoints (i.e., cirrhosis, hepatocellular carcinoma, liver transplantation, liver-related mortality, and overall mortality). Additionally, histologic indices should correlate with currently available noninvasive measures of disease activity and fibrosis and demonstrate improved responsiveness in the therapeutic arms of clinical trials without a corresponding increase in placebo response rates. Finally, validation of any new or modified index should follow current U.S. Food and Drug Administration guidelines for surrogate endpoint development.41 This will require large-scale collaboration and dataset sharing across industry, academia, and regulatory agencies.

Our study has some limitations. First, the RAND/UCLA method does not force consensus and many items had uncertain ratings after two rounds of voting. Second, the 9-point scale is cumbersome, and the number of items scored may lead to panellist fatigue. Third, no face-to-face meeting occurred. The strength of our study lies in the inclusion of internationally recognized liver pathologists and hepatologists and the adoption of rigorous methodology to minimize bias.

In conclusion, we performed a consensus process using modified RAND/UCLA appropriateness methodology to help standardize histologic assessment of liver biopsies in clinical trials of non-alcoholic fatty liver disease. While areas of agreement were identified, multiple items were regarded as uncertain. Future studies aimed at improving definitions and defining thresholds for histologic features used to measure disease activity and fibrosis are needed in order to facilitate non-alcoholic fatty liver disease drug development.

REFERENCES

1. Chalasani N, Younossi Z, Lavine JE, et al. The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases. Hepatology 2018;67:328-357.
2. Younossi ZM, Koenig AB, Abdelatif D, Fazel Y, Henry L, Wymer M. Global epidemiology of nonalcoholic fatty liver disease-Meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 2016;64:73-84.
3. Younossi Z, Anstee QM, Marietti M, et al. Global burden of NAFLD and NASH: trends, predictions, risk factors and prevention. Nat Rev Gastroenterol Hepatol 2018;15:11-20.

4. Estes C, Anstee QM, Arias-Loste MT, et al. Modeling NAFLD disease burden in China, France, Germany, Italy, Japan, Spain, United Kingdom, and United States for the period 2016-2030. J Hepatol 2018;69:896-904.
5. Siddiqui MS, Harrison SA, Abdelmalek MF, et al. Case definitions for inclusion and analysis of endpoints in clinical trials for nonalcoholic steatohepatitis through the lens of regulatory science. Hepatology 2018;67:2001-2012.
6. Kleiner DE, Makhlouf HR. Histology of nonalcoholic fatty liver disease and nonalcoholic steatohepatitis in adults and children. Clin Liver Dis 2016;20:293-312.
7. Singh S, Allen AM, Wang Z, et al. Fibrosis progression in nonalcoholic fatty liver vs nonalcoholic steatohepatitis: a systematic review and meta-analysis of paired-biopsy studies. Clin Gastroenterol Hepatol 2015;13:643-654.
8. Wong RJ, Aguilar M, Cheung R, et al. Nonalcoholic steatohepatitis is the second leading etiology of liver disease among adults awaiting liver transplantation in the United States. Gastroenterology 2015;148:547-555.
9. Noureddin M, Vipani A, Bresee C, et al. NASH leading cause of liver transplant in women: updated analysis of indications for liver transplant and ethnic and gender variances. Am J Gastroenterol 2018;113:1649-1659.
10. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987;40:171-178.
11. Cheung A, Neuschwander-Tetri BA, Kleiner DE, et al. Defining improvement in nonalcoholic steatohepatitis for treatment trial endpoints: recommendations from the liver forum. Hepatology 2019; e-pub ahead of print.
12. Brunt EM, Janney CG, Di Bisceglie AM, Neuschwander-Tetri BA, Bacon BR. Nonalcoholic steatohepatitis: a proposal for grading and staging the histological lesions. Am J Gastroenterol 1999;94:2467-2474.
13. Harrison SA, Torgerson S, Hayashi P, Ward J, Schenker S. Vitamin E and vitamin C treatment improves fibrosis in patients with nonalcoholic steatohepatitis. Am J Gastroenterol 2003;98:2485-2490.
14. Kleiner DE, Brunt EM, Van Natta M, et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology 2005;41:1313-1321.

15. Juluri R, Vuppalanchi R, Olson J, et al. Generalizability of the nonalcoholic steatohepatitis clinical research network histologic scoring system for nonalcoholic fatty liver disease. J Clin Gastroenterol 2011;45:55-58.
16. Gawrieh S, Knoedler DM, Saeian K, Wallace JR, Komorowski RA. Effects of interventions on intra- and interobserver agreement on interpretation of nonalcoholic fatty liver disease histology. Ann Diagn Pathol 2011;15:19-24.
17. Younossi ZM, Stepanova M, Rafiq N, et al. Pathologic criteria for nonalcoholic steatohepatitis: interprotocol agreement and ability to predict liver-related mortality. Hepatology 2011;53:1874-1882.
18. Bedossa P, Poitou C, Veyrie N, et al. Histopathological algorithm and scoring system for evaluation of liver lesions in morbidly obese patients. Hepatology 2012;56:1751-1759.
19. U.S. Food and Drug Administration (FDA). Guidance for Industry. Noncirrhotic nonalcoholic steatohepatitis with liver fibrosis: developing drugs for treatment. In: Department of Health and Human Services, Center for Drug Evaluation and Research (CDER). Rockville, 2018: https://www.federalregister.gov/documents/2018/12/04/2018-26333/noncirrhotic-nonalcoholic-steatohepatitis-with-liver-fibrosis-developing-drugs-for-treatment-draft.
20. Brook RH. The RAND/UCLA appropriateness method. Santa Monica: RAND Corporation, 1995. https://www.rand.org/pubs/reprints/RP395.html
21. Roskilly A, Taylor E, Jones RL, Rowe I. Fibrosis stage improvement in nonalcoholic steatohepatitis: A systematic review of placebo treated participants in randomized controlled trials. J Hepatol 2018;68:S579-S580.
22. Han MAT, Altayar O, Hamdeh S, et al. Rates of and factors associated with placebo response in trials of pharmacotherapies for nonalcoholic steatohepatitis: systematic review and meta-analysis. Clin Gastroenterol Hepatol 2019;17:616-629.
23. Ishak K, Baptista A, Bianchi L, et al. Histological grading and staging of chronic hepatitis. J Hepatol 1995;22:696-699.
24. McPherson S, Hardy T, Henderson E, Burt AD, Day CP, Anstee QM. Evidence of NAFLD progression from steatosis to fibrosing-steatohepatitis using paired biopsies: implications for prognosis and clinical management. J Hepatol 2015;62:1148-1155.

25. Younossi ZM, Gramlich T, Liu YC, et al. Nonalcoholic fatty liver disease: assessment of variability in pathologic interpretations. Mod Pathol 1998;11:560-565.
26. Jung ES, Lee K, Yu E, et al. Interobserver agreement on pathologic features of liver biopsy tissue in patients with nonalcoholic fatty liver disease. J Pathol Transl Med 2016;50:190-196.
27. Younossi ZM, Stepanova M, Rafiq N, et al. Nonalcoholic steatofibrosis independently predicts mortality in nonalcoholic fatty liver disease. Hepatol Commun 2017;1:421-428.
28. Angulo P, Kleiner DE, Dam-Larsen S, et al. Liver fibrosis, but no other histologic features, is associated with long-term outcomes of patients with nonalcoholic fatty liver disease. Gastroenterology 2015;149:389-397.
29. Gramlich T, Kleiner DE, McCullough AJ, Matteoni CA, Boparai N, Younossi ZM. Pathologic features associated with fibrosis in nonalcoholic fatty liver disease. Hum Pathol 2004;35:196-199.
30. Matteoni CA, Younossi ZM, Gramlich T, Boparai N, Liu YC, McCullough AJ. Nonalcoholic fatty liver disease: a spectrum of clinical and pathological severity. Gastroenterology 1999;116:1413-1419.
31. Richardson MM, Jonsson JR, Powell EE, et al. Progressive fibrosis in nonalcoholic steatohepatitis: association with altered regeneration and a ductular reaction. Gastroenterology 2007;133:80-90.
32. Kleiner DE, Brunt EM, Belt PH, et al. Extending the ballooning score beyond 2: a proposal for a new balloon score. J Hepatol 2015;62:288A.
33. Hagstrom H, Nasr P, Ekstedt M, et al. Fibrosis stage but not NASH predicts mortality and time to development of severe liver disease in biopsy-proven NAFLD. J Hepatol 2017;67:1265-1273.
34. Brunt EM, Kleiner DE, Wilson LA, Sanyal AJ, Neuschwander-Tetri BA. Improvements in histologic features and diagnosis associated with improvement in fibrosis in nonalcoholic steatohepatitis: results from the nonalcoholic steatohepatitis clinical research network treatment trials. Hepatology 2018; e-pub ahead of print.
35. Brunt EM, Kleiner DE, Wilson LA, et al. Portal chronic inflammation in nonalcoholic fatty liver disease (NAFLD): a histologic marker of advanced NAFLD-Clinicopathologic correlations from the nonalcoholic steatohepatitis clinical research network. Hepatology 2009;49:809-820.
36. Gadd VL, Skoien R, Powell EE, et al. The portal inflammatory infiltrate and ductular reaction in human nonalcoholic fatty liver disease. Hepatology 2014;59:1393-1405.
37. Ekstedt M, Hagstrom H, Nasr P, et al. Fibrosis stage is the strongest predictor for disease-specific mortality in NAFLD after up to 33 years of follow-up. Hepatology 2015;61:1547-1554.
38. Vilar-Gomez E, Calzadilla-Bertot L, Wai-Sun Wong V, et al. Fibrosis severity as a determinant of cause-specific mortality in patients with advanced nonalcoholic fatty liver disease: a multi-national cohort study. Gastroenterology 2018;155:443-457.
39. Buzzetti E, Hall A, Ekstedt M, et al. Collagen proportionate area is an independent predictor of long-term outcome in patients with non-alcoholic fatty liver disease. Aliment Pharmacol Ther 2019;49:1214-1222.
40. Pavlides M, Birks J, Fryer E, et al. Interobserver variability in histologic evaluation of liver fibrosis using categorical and quantitative scores. Am J Clin Pathol 2017;147:364-369.
41. U.S. Food and Drug Administration (FDA). Surrogate endpoint resources for drug and biologic development. https://www.fda.gov/drugs/development-resources/surrogate-endpoint-resources-drug-and-biologic-development

TABLES

Table 1. Disease definitions and basic principles

Item | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | Summary Rating (MAD) | Appropriateness

(1.1) All liver biopsies should be assessed for adequacy. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 9 (0.00) | Appropriate

(8.1) H&E and Masson’s trichrome stains should be considered mandatory for assessment of disease activity and fibrosis. | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 3 | 12 | 9 (0.65) | Appropriate

(10.1) In the clinical trial setting, the presence of >5% steatosis is a minimal requirement for a diagnosis of NAFLD (assuming alcohol and other forms of fatty liver disease are excluded). | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 4 | 9 | 9 (0.82) | Appropriate

(11.1.b) The minimum injury needed to distinguish steatosis from steatohepatitis is: Ballooning degeneration plus lobular inflammation. | 1 | 0 | 1 | 1 | 1 | 0 | 2 | 1 | 10 | 9 (1.65) | Appropriate

(11.1.c) The minimum injury needed to distinguish steatosis from steatohepatitis is: Lobular inflammation alone. | 9 | 1 | 0 | 1 | 2 | 3 | 1 | 0 | 0 | 1 (1.94) | Inappropriate

(14.1) Evidence of fibrosis is sufficient to change a diagnosis from steatosis to steatohepatitis. | 5 | 0 | 1 | 3 | 2 | 0 | 1 | 2 | 3 | 4 (2.59) | Uncertain

(16.1) The degree of steatosis is an important measure of disease activity. | 4 | 0 | 2 | 2 | 5 | 2 | 1 | 0 | 1 | 5 (1.76) | Uncertain

(17.1) The degree of hepatocellular ballooning is an important measure of disease activity. | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 13 | 9 (0.47) | Appropriate

(18.1) The degree of lobular inflammation is an important measure of disease activity. | 0 | 0 | 0 | 1 | 3 | 1 | 6 | 0 | 6 | 7 (1.29) | Appropriate

(19.1) The degree of portal inflammation is an important measure of disease activity. | 1 | 1 | 0 | 1 | 3 | 4 | 3 | 3 | 1 | 6 (1.53) | Uncertain

(23.1) Disease activity and fibrosis should be separately measured. | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 15 | 9 (0.24) | Appropriate

(68.1) When assessing for change in fibrosis and disease activity between 2 biopsies in a given subject, a direct comparison of the biopsies should be performed as long as the pathologist/reader is blinded to timepoint and treatment arm. | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 5 | 9 | 9 (0.76) | Appropriate

(26.1) The NAS is an optimal index for measuring NAFLD disease severity. | 1 | 0 | 1 | 3 | 2 | 2 | 2 | 3 | 3 | 6 (1.94) | Uncertain

(33.1) The Brunt system is an optimal index for measuring NAFLD disease severity. | 1 | 1 | 3 | 2 | 4 | 3 | 3 | 0 | 0 | 5 (1.41) | Uncertain

(36.1) The SAF system is an optimal index for measuring NAFLD disease severity. | 0 | 0 | 1 | 1 | 5 | 0 | 4 | 4 | 2 | 7 (1.47) | Appropriate

(44.1) The Goodman system is an optimal index for measuring NAFLD disease severity. | 1 | 2 | 0 | 3 | 6 | 0 | 3 | 2 | 0 | 5 (1.47) | Uncertain

(55.1) The Harrison system is an optimal index for measuring NAFLD disease severity. | 1 | 0 | 4 | 2 | 5 | 2 | 2 | 1 | 0 | 5 (1.35) | Uncertain

Abbreviations: MAD, mean absolute deviation from the median; NAFLD, non-alcoholic fatty liver disease; NAS, NAFLD activity score; SAF, steatosis, activity, fibrosis score.
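The summary ratings in these tables can be recomputed from the vote counts. The sketch below assumes that columns R1 to R9 give the number of panellists who assigned each rating from 1 to 9 (each row sums to 17), takes the median as the summary rating, and computes MAD as the mean absolute deviation from the median, as defined in the table footnote. The classification uses the conventional RAND/UCLA median bands (1-3 inappropriate, 4-6 uncertain, 7-9 appropriate) and omits any disagreement criterion, which is not defined in this section; for the items listed in these tables it reproduces the published labels.

```python
from statistics import median
from typing import List, Sequence, Tuple


def expand_votes(counts: Sequence[int]) -> List[int]:
    """Turn the R1-R9 vote counts into one rating per panellist."""
    if len(counts) != 9:
        raise ValueError("expected counts for ratings 1 through 9")
    return [rating for rating, c in enumerate(counts, start=1) for _ in range(c)]


def summarise(counts: Sequence[int]) -> Tuple[float, float]:
    """Return (median rating, mean absolute deviation from the median)."""
    ratings = expand_votes(counts)
    if not ratings:
        raise ValueError("no votes recorded")
    med = median(ratings)
    mad = sum(abs(r - med) for r in ratings) / len(ratings)
    return med, mad


def classify(med: float) -> str:
    """Conventional RAND/UCLA median bands; disagreement is not assessed here."""
    if med >= 7:
        return "Appropriate"
    if med <= 3:
        return "Inappropriate"
    return "Uncertain"


if __name__ == "__main__":
    items = {
        "8.1": [0, 0, 1, 0, 0, 0, 1, 3, 12],
        "14.1": [5, 0, 1, 3, 2, 0, 1, 2, 3],
        "69.1": [4, 2, 7, 0, 0, 3, 0, 1, 0],
    }
    for item, counts in items.items():
        med, mad = summarise(counts)
        print(item, med, f"({mad:.2f})", classify(med))
    # 8.1 9 (0.65) Appropriate
    # 14.1 4 (2.59) Uncertain
    # 69.1 3 (1.41) Inappropriate
```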

Table 2. Measuring steatosis in NAFLD

Item | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | Summary Rating (MAD) | Appropriateness

(20.1) Steatosis should be reported as percent of hepatocytes with a visible steatotic droplet at low to medium (4X to 10X) power magnification. | 1 | 0 | 0 | 1 | 1 | 2 | 5 | 1 | 6 | 7 (1.53) | Appropriate

(21.1) Steatosis should be reported as percent area of tissue replaced by steatosis evaluated at low to medium power. | 2 | 0 | 1 | 2 | 1 | 0 | 4 | 2 | 5 | 7 (2.12) | Appropriate

(22.1) The size of steatotic droplets should be reported (predominately large droplet, predominately small droplet, or mixed small and large droplet). | 1 | 0 | 0 | 1 | 5 | 3 | 5 | 1 | 1 | 6 (1.29) | Uncertain

(28.1.b) NAS steatosis “grade” categories adequately discriminate levels of steatosis severity (0: <5%, 1: 5-33%, 2: >33%-66%, 3: >66%). | 0 | 0 | 1 | 1 | 1 | 0 | 3 | 6 | 5 | 8 (1.18) | Appropriate

(46.1.b) Goodman steatosis “grade” categories are adequate for discriminating between levels of steatosis (0: 0%, 1: up to <5%, 2: 6-33%, 3: 34%-66%, 4: more than 66%). | 0 | 0 | 0 | 0 | 2 | 4 | 3 | 2 | 6 | 7 (1.29) | Appropriate

(63.1.a) Microvesicular steatosis is of importance for measuring disease activity in NAFLD. | 3 | 1 | 0 | 0 | 10 | 1 | 2 | 0 | 0 | 5 (1.18) | Uncertain

(64.1.a) Location of steatosis is of importance for measuring disease activity in NAFLD. | 2 | 3 | 2 | 1 | 3 | 2 | 2 | 0 | 2 | 5 (2.12) | Uncertain

Abbreviations: MAD, mean absolute deviation from the median; NAFLD, non-alcoholic fatty liver disease; NAS, NAFLD activity score.
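The NAS steatosis categories in item 28.1.b map an estimated percentage directly to a grade. A minimal sketch, assuming the reader has already estimated steatosis as a percentage (of hepatocytes, as in item 20.1, or of tissue area, as in item 21.1) and that the category limits are read as written (5-33% is grade 1; values above 33% up to 66% are grade 2). The function name is hypothetical.

```python
def nas_steatosis_grade(percent: float) -> int:
    """Map estimated steatosis (%) to the NAS steatosis grade of item 28.1.b.

    0: <5%; 1: 5-33%; 2: >33%-66%; 3: >66%.
    """
    if not 0 <= percent <= 100:
        raise ValueError("steatosis must be a percentage between 0 and 100")
    if percent < 5:
        return 0
    if percent <= 33:
        return 1
    if percent <= 66:
        return 2
    return 3


if __name__ == "__main__":
    for estimate in (3, 5, 33, 40, 66, 70):
        print(f"{estimate}% -> grade {nas_steatosis_grade(estimate)}")
    # grades printed: 0, 1, 1, 2, 2, 3
```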

Table 3. Measuring lobular inflammation in NAFLD

Item | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | Summary Rating (MAD) | Appropriateness

(30.1.c) A focus of lobular inflammation should be specifically defined. | 0 | 1 | 0 | 0 | 4 | 1 | 1 | 5 | 5 | 8 (1.53) | Appropriate

(30.1.d) A focus should be defined as 2 or more inflammatory cells (neutrophils, lymphocytes and other mononuclear cells, eosinophils, and microgranulomas) within the lobule present within the sinusoids or surrounding injured hepatocytes (ballooned or apoptotic hepatocytes). | 0 | 2 | 0 | 0 | 3 | 2 | 2 | 4 | 4 | 7 (1.76) | Appropriate

(39.1.c) Lobular inflammation “Grade” categories (based on average of all 20X fields) are adequate for discriminating between levels of lobular inflammation (0: None; 1: ≤ 2 foci; 2: > 2 foci). | 0 | 0 | 0 | 1 | 4 | 1 | 4 | 4 | 3 | 7 (1.29) | Appropriate

(30.1.b) Lobular inflammation “Grade” categories (based on average of all 20X fields) adequately discriminate levels of lobular inflammation severity (0: No foci; 1: <2 foci; 2: 2-4 foci; 3: >4 foci). | 0 | 0 | 0 | 2 | 3 | 0 | 3 | 5 | 4 | 8 (1.41) | Appropriate

(47.1) Goodman lobular inflammation grade categories (0: none, 1: mild, 2: moderate, 3: severe) are adequate for discriminating between levels of severity. | 1 | 0 | 0 | 1 | 2 | 4 | 6 | 1 | 2 | 7 (1.29) | Appropriate

Abbreviations: MAD, mean absolute deviation from the median; NAFLD, non-alcoholic fatty liver disease.
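Items 30.1.b and 39.1.c both grade lobular inflammation from the average number of foci across all 20X fields but draw the category limits differently. The sketch below assumes the reader has already counted foci in each 20X field using the focus definition in item 30.1.d; the function names are hypothetical. An average of exactly 2 foci falls in grade 2 under item 30.1.b but in grade 1 under item 39.1.c.

```python
from statistics import fmean
from typing import Sequence


def mean_foci(foci_per_field: Sequence[int]) -> float:
    """Average number of inflammatory foci per 20X field."""
    if not foci_per_field:
        raise ValueError("at least one 20X field is required")
    return fmean(foci_per_field)


def four_tier_grade(avg: float) -> int:
    """Item 30.1.b: 0 no foci; 1 <2 foci; 2 2-4 foci; 3 >4 foci."""
    if avg == 0:
        return 0
    if avg < 2:
        return 1
    if avg <= 4:
        return 2
    return 3


def three_tier_grade(avg: float) -> int:
    """Item 39.1.c: 0 none; 1 <=2 foci; 2 >2 foci."""
    if avg == 0:
        return 0
    if avg <= 2:
        return 1
    return 2


if __name__ == "__main__":
    fields = [2, 1, 3, 2]  # hypothetical counts from four 20X fields
    avg = mean_foci(fields)
    print(avg, four_tier_grade(avg), three_tier_grade(avg))  # 2.0 2 1
```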

Table 4. Measuring ballooning degeneration in NAFLD

Item | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | Summary Rating (MAD) | Appropriateness

(31.1.a) A strict definition of a ballooned hepatocyte is needed for adequate assessment of this feature. | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 6 | 8 | 8 (0.82) | Appropriate

(31.1.b) Classic ballooned hepatocytes should be defined as hepatocytes that are “generally larger than the surrounding hepatocytes and have distinctive rarified cytoplasm that is irregularly stranded and clumped”. | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 5 | 9 | 9 (0.76) | Appropriate

(59.1.a) Mallory-Denk bodies are of importance for measuring disease activity in NAFLD. | 2 | 1 | 1 | 1 | 1 | 4 | 5 | 1 | 1 | 6 (1.76) | Uncertain

(60.1.c) Non-classic hepatocellular ballooning is an important form of hepatocellular injury in NASH. | 1 | 2 | 2 | 2 | 3 | 4 | 2 | 0 | 1 | 5 (1.65) | Uncertain

(31.1.d) NAS Ballooning “Grade” categories are adequate for discriminating between levels of ballooning degeneration severity (0: none, 1: few balloon cells; 2: many cells/prominent ballooning). | 0 | 1 | 2 | 0 | 3 | 0 | 2 | 6 | 3 | 8 (1.76) | Appropriate

(49.1) Goodman hepatocellular ballooning grade categories (0: none, 1: rare, 2: frequent, 3: severe/numerous) are adequate for discriminating between levels of severity. | 1 | 0 | 0 | 0 | 1 | 5 | 3 | 3 | 4 | 7 (1.41) | Appropriate

(60.1.a) Given the importance of ballooning degeneration, an alternate scoring system is needed that more fully evaluates these cells. | 0 | 0 | 1 | 1 | 2 | 0 | 5 | 2 | 6 | 7 (1.47) | Appropriate

(60.1.b) Alternatively, ballooning degeneration should be measured as: 0: none; 1: rare classic ballooned hepatocytes (one or 2 in the entire biopsy); 2: few individual classic ballooned cells identified every few 20X fields; 3: occasional small clusters of ballooned hepatocytes; 4: large clusters of classic ballooned hepatocytes, visible at low power. | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 4 | 3 | 7 (1.18) | Appropriate

(60.1.f) The NASH-CRN extended ballooning “Grade” categories are adequate for discriminating between levels of ballooning severity and are scored as follows: 0: none; 1: non-classical ballooning (either few or many); 2: few classic balloon cells; 3: many classic balloon cells, but not severe; 4: severe ballooning (many classical balloon cells visible from 4X). | 0 | 1 | 0 | 0 | 5 | 1 | 3 | 4 | 3 | 7 (1.53) | Appropriate

Abbreviations: MAD, mean absolute deviation from the median; NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis; NAS, NAFLD activity score; NASH-CRN, NASH Clinical Research Network.

Table 5. Measuring Fibrosis in NAFLD


Item | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | Summary Rating (MAD) | Appropriateness

(32.1.b) NASH-CRN Fibrosis “Stage” categories are adequate for discriminating between levels of fibrosis (stage 0: none; stage 1a: mild zone 3 perisinusoidal fibrosis; stage 1b: moderate zone 3 perisinusoidal fibrosis; stage 1c: portal/periportal fibrosis; stage 2: perisinusoidal and portal/periportal fibrosis; stage 3: bridging fibrosis; stage 4: cirrhosis). | 1 | 0 | 1 | 1 | 0 | 2 | 4 | 2 | 6 | 7 (1.71) | Appropriate

(65.1.a) In NAS stage 2 and stage 3 fibrosis, the predominant location of fibrosis is important. | 0 | 0 | 0 | 1 | 4 | 0 | 5 | 4 | 3 | 7 (1.24) | Appropriate

(66.1.a) In NAS stage 2 and 3 fibrosis, the severity of central/zone 3 fibrosis is important. | 0 | 0 | 0 | 2 | 4 | 0 | 6 | 2 | 3 | 7 (1.29) | Appropriate

(32.1.d) Only 1 definite bridge is necessary for stage 3 fibrosis in the NASH-CRN system. | 2 | 0 | 0 | 4 | 5 | 0 | 0 | 4 | 2 | 5 (1.88) | Uncertain

(32.1.e) Multiple (2 or more) definite bridges are necessary for bridging fibrosis in the NASH-CRN system. | 3 | 1 | 0 | 1 | 4 | 1 | 2 | 4 | 1 | 5 (2.18) | Uncertain

(77.1) In NASH-CRN Bridging Fibrosis, Stage 3 fibrosis should be divided into: 3a: one definite fibrous bridge, 3b: two or more fibrous bridges without nodule formation, and 3c: complex bridging with rare nodule formation. | 0 | 0 | 1 | 1 | 1 | 3 | 4 | 3 | 4 | 7 (1.35) | Appropriate

(69.1) Modified Ishak fibrosis stage (0: no fibrosis; 1: expansion of some portal areas with or without short fibrous septa; 2: expansion of most portal areas with or without short fibrous septa; 3: expansion of most portal areas with occasional portal to portal bridging; 4: expansion of portal areas with marked bridging (portal-portal and/or portal-central); 5: marked bridging with occasional nodules (incomplete cirrhosis); 6: cirrhosis, probable or definitive) is a suitable index for measuring fibrosis in NAFLD. | 4 | 2 | 7 | 0 | 0 | 3 | 0 | 1 | 0 | 3 (1.41) | Inappropriate

(71.1) A method for evaluating features of regression of fibrosis is needed. | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 5 | 8 | 8 (1.00) | Appropriate

(74.1.a) A new staging system is needed that corresponds to the pathophysiology of steatohepatitis. | 0 | 3 | 0 | 3 | 2 | 1 | 4 | 2 | 2 | 6 (2.00) | Uncertain

Abbreviations: NAFLD, non-alcoholic fatty liver disease; MAD, mean absolute deviation from the median; NASH-CRN, NASH Clinical Research Network; NAS, NAFLD activity score.
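Item 77.1 proposes subdividing NASH-CRN stage 3 by the number of definite bridges and the presence of nodules. A minimal sketch, assuming the pathologist has already judged the biopsy to be stage 3 rather than cirrhosis and records the number of definite fibrous bridges and whether any nodule formation is seen. Treating any nodule formation as 3c is an assumption, since the item describes 3c as complex bridging with rare nodules. The function also adopts the position implied by item 77.1 that a single definite bridge qualifies as stage 3, a point the panel rated uncertain in item 32.1.d. The function name is hypothetical.

```python
def stage3_substage(definite_bridges: int, nodules_present: bool) -> str:
    """Hypothetical encoding of the stage 3 subdivision in item 77.1.

    3a: one definite fibrous bridge
    3b: two or more fibrous bridges without nodule formation
    3c: complex bridging with rare nodule formation
    Assumes the biopsy is already judged to be stage 3, not cirrhosis (stage 4).
    """
    if definite_bridges < 1:
        raise ValueError("stage 3 requires at least one definite fibrous bridge")
    if nodules_present:
        # Assumption: any nodule formation alongside bridging is scored as 3c.
        return "3c"
    return "3a" if definite_bridges == 1 else "3b"


if __name__ == "__main__":
    print(stage3_substage(1, False))  # 3a
    print(stage3_substage(3, False))  # 3b
    print(stage3_substage(4, True))   # 3c
```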

ACKNOWLEDGEMENTS

We would like to thank Leonardo Guizzetti (Robarts Clinical Trials, Inc.) for helping prepare the survey results.

STATEMENT OF INTERESTS

Authors’ declaration of personal interests:

RL has received funding support from NIEHS (5P42ES010337), NCATS (5UL1TR001442), and NIDDK (R01DK106419); RKP has received consulting income from Genentech, Eli Lilly, and Robarts Clinical Trials Inc.; DEK has no relevant conflicts of interest to disclose; JH has no relevant conflicts of interest to disclose; OAA has no relevant conflicts of interest to disclose; ADC has no relevant conflicts of interest to disclose; CAB has received consulting fees from ICON and Covance through Pacific Rim Pathology Group; DJ has no relevant conflicts of interest to disclose; SK has no relevant conflicts of interest to disclose; MB has received speaker fees from Merck; LB has no relevant conflicts of interest to disclose; KPB has no relevant conflicts of interest to disclose; MAV has no relevant conflicts of interest to disclose; MST has no relevant conflicts of interest to disclose; MG has no relevant conflicts of interest to disclose; HLW has no relevant conflicts of interest to disclose; VA has no relevant conflicts of interest to disclose; LAA has no relevant conflicts of interest to disclose; CEP is an employee of Robarts Clinical Trials, Inc; BGF has received grant/research support from AbbVie Inc., Amgen Inc., AstraZeneca/MedImmune Ltd., Atlantic Pharmaceuticals Ltd., Boehringer-Ingelheim, Celgene Corporation, Celltech, Genentech Inc/Hoffmann-La Roche Ltd., Gilead Sciences Inc., GlaxoSmithKline (GSK), Janssen Research & Development LLC., Pfizer Inc., Receptos Inc./Celgene International, Sanofi, Santarus Inc., Takeda Development Center Americas Inc., Tillotts Pharma AG and UCB; consulting fees from Abbott/AbbVie, Akebia Therapeutics, Allergan, Amgen, Applied Molecular Transport Inc., Aptevo Therapeutics, Astra Zeneca, Atlantic Pharma, Avir Pharma, Biogen Idec, BioMx Israel, Boehringer-Ingelheim, Bristol-Myers Squibb, Calypso Biotech, Celgene, Elan/Biogen, EnGene, Ferring Pharma, Roche/Genentech, Galapagos, GiCare Pharma, Gilead, Gossamer Pharma, GSK, Inception IBD Inc, JnJ/Janssen, Kyowa Hakko Kirin Co Ltd., Lexicon, Lilly, Lycera BioTech, Merck, Mesoblast Pharma, Millennium, Nestle, Nextbiotix, Novonordisk, Pfizer, Prometheus Therapeutics and Diagnostics, Progenity, Protagonist, Receptos, Salix Pharma, Shire, Sienna Biologics, Sigmoid Pharma, Sterna Biologicals, Synergy Pharma Inc., Takeda, Teva Pharma, TiGenix, Tillotts, UCB Pharma, Vertex Pharma, Vivelix Pharma, VHsquared Ltd. and Zyngenia; speakers bureau fees from Abbott/AbbVie, JnJ/Janssen, Lilly, Takeda, Tillotts and UCB Pharma; is a scientific advisory board member for Abbott/AbbVie, Allergan, Amgen, Astra Zeneca, Atlantic Pharma, Avaxia Biologics Inc., Boehringer-Ingelheim, Bristol-Myers Squibb, Celgene, Centocor Inc., Elan/Biogen, Galapagos, Genentech/Roche, JnJ/Janssen, Merck, Nestle, Novartis, Novonordisk, Pfizer, Prometheus Laboratories, Protagonist, Salix Pharma, Sterna Biologicals, Takeda, Teva, TiGenix, Tillotts Pharma AG and UCB Pharma; and is the Senior Scientific Officer of Robarts Clinical Trials Inc; RL serves as a consultant or advisory board member for Arrowhead Pharmaceuticals, AstraZeneca, Bird Rock Bio, Boehringer Ingelheim, Bristol-Myers Squibb, Celgene, Cirius, CohBar, Conatus, Eli Lilly, Galmed, Gemphire, Gilead, Glympse bio, GNI, GRI Bio, Intercept, Ionis, Janssen Inc., Merck, Metacrine, Inc., NGM Biopharmaceuticals, Novartis, Novo Nordisk, Pfizer, Prometheus, Sanofi, Siemens, and Viking Therapeutics. In addition, his institution has received grant support from Allergan, Boehringer-Ingelheim, Bristol-Myers Squibb, Cirius, Eli Lilly and Company, Galectin Therapeutics, Galmed Pharmaceuticals, GE, Genfit, Gilead, Intercept, Janssen, Madrigal Pharmaceuticals, Merck, NGM Biopharmaceuticals, NuSirt, Pfizer, Prometheus, and Siemens. He is also co-founder of Liponexus, Inc; VJ has received consulting fees from AbbVie, Takeda, Eli Lilly, Pfizer, Janssen, Ferring, Shire, Merck, GSK, Celltrion, and Robarts Clinical Trials Inc; serves as an advisory board member for AbbVie, Takeda, Janssen, Arena, GSK, Eli Lilly and Ferring; and has received speakers’ bureau fees from Takeda, AbbVie, Janssen, Pfizer, Shire and Ferring.

Declaration of funding interests: This study was supported in part by the Intramural Research Program of the NIH, National Cancer Institute.

AUTHORSHIP STATEMENT

Guarantor of the article: Dr. Rish K. Pai

Specific author contributions: Development of the study concept and design: RKP, BGF, RL, and VJ; Study supervision: RKP, VJ; Participation in the panel: RKP, ADC, CAB, DEK, DJ, HLW, JH, KPB, LB, LAA, MG, MAV, MST, OAA, SK, VA, and MB; Data collection and analysis: RKP and CEP; Drafting of the manuscript: RKP. All authors performed critical revision of the manuscript for intellectual content and approved the final draft.

All authors approved the final version of the manuscript including the authorship list.
