EVALUATION OF CARDIO-METABOLIC EFFECTS OF TREATMENT WITH INCRETIN-BASED THERAPIES IN PATIENTS WITH TYPE 2 DIABETES

Olga Montvida, MSc Student number: 9341625

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Biomedical Sciences Institute of Health and Biomedical Innovation Faculty of Health Queensland University of Technology

2018

ABSTRACT

Type 2 diabetes (T2DM) is a chronic and progressive metabolic disorder with a complex and multifactorial pathophysiology. As patients with T2DM are at increased risk of cardiovascular (CV) complications and mortality, efficient disease management requires a holistic, multi-faceted approach to control blood glucose, blood pressure, lipids, and body weight. While metformin has been suggested as the first-line anti-diabetic drug (ADD), given the progressive nature of the disease, many patients eventually require intensification. International guidelines suggest multiple options for second- and third-line ADDs, including incretin-based therapies: dipeptidyl peptidase 4 inhibitor (DPP-4i) and glucagon-like peptide-1 receptor agonist (GLP-1RA). While current disease management guidelines are primarily based on results from randomised controlled trials that are conducted on a protocol-driven selective patient population, the population-level evaluation of the effectiveness and safety of such therapies in real-world practice would guide patients and their carers in choosing the right therapies for optimum disease management. Clinical studies have evaluated the possible beneficial association of treatment with novel anti-diabetic therapies with CV risk factors; however, real-world evidence on such aspects is scarce.

With a central focus on incretin-based therapies, the aims of this thesis were to explore the real-world patterns of (1) longitudinal changes in ADD choices, (2) population-level glycaemic control and its sustainability, and (3) the long-term cardio-metabolic risk factor burden.

Using a large database of Electronic Medical Records (EMRs) of the United States, six pharmaco-epidemiological and three methodological studies were conducted. A number of important findings were reported in high impact journals including Diabetes Care and Diabetes, Obesity, and Metabolism, with one publication receiving a dedicated review in the Nature Reviews journal.

Extensive methodological and data mining studies were performed to extract reliable data from voluminous EMRs and to develop efficient study designs and analysis approaches. One study was devoted to the data management of prescriptions, specifically to the estimation of treatment duration at individual patient-level accounting for intensifications and alterations with multiple therapies. Methodological challenges associated with robust identification of the patients with T2DM were addressed in a separate study. An exploratory analysis to investigate the mechanisms and patterns of longitudinal missing risk factor data, along with a comparative study of multiple imputation techniques for such data, was conducted to account for the uncertainty due to missing values and to ensure the generalisability of study findings. Advanced statistical methodologies, such as “treatment effects modelling”, were applied throughout the thesis to ensure that robust inferences were drawn in the individual studies.

It was observed that the use of incretin-based drugs has increased since their approval in 2005, in particular the use of DPP-4i as a second-line choice. Patient profiles significantly varied by the class of chosen ADD, for instance GLP-1RA users were younger, had lower HbA1c level, and were more likely to be female, compared to other major ADD users. It was observed that around half of the patients with T2DM do not reach glycaemic targets and clear evidence of therapeutic inertia persists at population-level.

Patients who intensified metformin with incretin-based drugs or thiazolidinedione were more likely to achieve and sustain glycaemic control over 24 months of continuous treatment, compared to those treated with sulfonylurea, the most popular intensification choice. A separate study investigated the outcomes of intensifying GLP-1RA with insulin and reported beneficial cardio-metabolic effects of combining these therapies. Even though the popularity of newer therapeutic classes as second-line options was notably increasing, the longitudinal rates of intensification with a third-line ADD were not reduced significantly at the population level.

Neither glycaemic nor CV risk factor burden significantly improved over the last decade in patients with T2DM, even though most patients were using multiple drugs for glucose, blood pressure and lipid control. The long-term glycaemic burden consistently increased over time, and more than half of the patients with a history of CV disease continued to have uncontrolled blood pressure and lipids post-therapy initiation. Three out of five patients who are already receiving multiple anti-diabetic and cardio-protective drugs were failing to simultaneously control glucose and at least one CV risk factor. Compared to those who initiated second-line ADD with sulfonylurea and insulin, patients who intensified metformin with incretin-based therapies or thiazolidinedione were more likely to achieve simultaneous glucose and CV risk factor control. Treatment with GLP-1RA was associated with lower rates of major adverse macrovascular events, compared to other ADDs.

To conclude, this dissertation provides a detailed exploration of, and valuable insights into, T2DM management in the real-world setting and highlights alarming rates of the existing cardio-metabolic burden at the population level. Incretin-based therapies and thiazolidinedione were found to provide higher chances of sustainable glycaemic and CV risk factor control, and treatment with GLP-1RA appears to have a beneficial association with CV risk, compared to other anti-diabetic treatment options. Nonetheless, proper control in terms of timely intensification with anti-hyperglycaemic, anti-hypertensive, and anti-dyslipidemic therapies, when needed, remains a key aspect to improve long-term outcomes in patients with T2DM.

KEYWORDS

Glucagon-like peptide-1 receptor agonist, dipeptidyl peptidase 4 inhibitor, incretin-based therapy, glycaemic control, cardiovascular risk, macrovascular event, type 2 diabetes mellitus, electronic medical records, longitudinal cohort study.

LIST OF PUBLICATIONS

 The following is a list of published or submitted manuscripts that have been incorporated into this thesis, thereby producing a thesis by publication.

Chapter 4: Olga Montvida, Ognjen Arandjelović, Edward Reiner, and Sanjoy K. Paul. Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records. The Open Bioinformatics Journal, 2017, 10:1-15. DOI: 10.2174/1875036201709010001.x.

Chapter 6: Mayukh Samanta, Olga Montvida, Joanne Tropea, and Sanjoy K. Paul. A comparison of imputation methods for missing risk factor data from large real-world electronic medical records for comparative effectiveness studies. (Submitted)

Chapter 7: Olga Montvida, Jonathan Shaw, John J Atherton, Francis Stringer, Sanjoy K Paul. Long-term Trends in Antidiabetes Drug Usage in the US: Real-world Evidence in Patients Newly Diagnosed With Type 2 Diabetes. Diabetes Care. 2017 Nov 6:dc171414. DOI: 10.2337/dc17-1414.x.

Chapter 8: Olga Montvida, Jonathan Shaw, Lawrence Blonde, Sanjoy K Paul. Long-term sustainability of glycaemic achievements with second-line anti-diabetic therapies in patients with type 2 diabetes: A real-world study. Diabetes, Obesity, and Metabolism. 2018;20:1722–1731. DOI: 10.1111/dom.13288.x.

Chapter 9: Olga Montvida, Sanjoy K Paul. Cardiovascular risk factor burden and safety in patients with type 2 diabetes receiving intensified anti-diabetic and cardio-protective therapies. (Submitted)

 The following is a list of accepted and submitted manuscripts that are highly relevant to the work performed in this thesis and were developed throughout candidature.

Appendix A: Olga Montvida, Kerenaftali Klein, Sudhesh Kumar, Kamlesh Khunti, Sanjoy K. Paul. Addition of or switch to insulin therapy in people treated with glucagon‐like peptide‐1 receptor agonists: A real‐world study in 66 583 patients. Diabetes, Obesity and Metabolism. 2017 Jan 1;19(1):108-17. DOI: 10.1111/dom.12790.x.

Appendix B: Ebenezer S. Owusu Adjah*, Olga Montvida*, Julius Agbeve, Sanjoy K. Paul. Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus. The Open Bioinformatics Journal, 2017, 10: 16-27. DOI: 10.2174/1875036201710010016.x. *Joint first authorship.

Appendix C: Sanjoy K Paul, Jonathan Shaw, Olga Montvida, Kerenaftali Klein. Weight gain in insulin treated patients by BMI categories at treatment initiation: New evidence from real-world data in patients with type 2 diabetes. Diabetes, Obesity and Metabolism. 2016 Dec 1;18(12):1244-52. DOI: 10.1111/dom.12761.x.

Appendix D: Olga Montvida, Jennifer B Green, John Atherton, Sanjoy K Paul. Risk of Pancreatic Diseases by Second-line Drug Class: Real World Evidence in 225,898 Type 2 Diabetes Patients. Diabet Med. 2018 Oct 10. DOI: 10.1111/dme.13835.

 The following is a list of presentations and papers in refereed conference proceedings throughout candidature.

1. Olga Montvida, Sanjoy Paul. Cardiovascular risk factor burden and safety in patients with type 2 diabetes receiving intensified anti-diabetic and cardio-protective therapies. QIMR Berghofer Early Career Researcher Seminar, Brisbane, Australia, 18 May 2018.

2. Olga Montvida, Jonathan Shaw, Sanjoy Paul. Comparative assessment of glycaemic achievements with second-line anti-diabetes therapy intensification – Real world evidence based choices for patients and providers. Annual Meeting of the European Association for the Study of Diabetes (EASD), Lisbon, Portugal, 11-15 September 2017

3. Sanjoy Paul, Jennifer B Green, John Atherton, Olga Montvida. Risk of Pancreatic Diseases by Second-line Anti Diabetes Drug Class: Real World Based Evidence. Annual Meeting of the European Association for the Study of Diabetes (EASD), Lisbon, Portugal, 11-15 September 2017

4. Ebenezer Adjah, Olga Montvida, Kamlesh Khunti, Sanjoy Paul. Interactive changes in cardiovascular risk factors and the long-term cardiovascular risk differ by adiposity levels in incident type 2 diabetes patients: real world study. Annual Meeting of the European Association for the Study of Diabetes (EASD), Lisbon, Portugal, 11-15 September 2017

5. Olga Montvida, Sanjoy Paul. Time to third-line anti-diabetes therapy intensification in patients receiving second-line GLP-1 receptor agonist, DPP-4 inhibitor and Sulfonylurea: A real-world study. The Australian Diabetes Society (ADS) and the Australian Diabetes Educators Association (ADEA) Annual Scientific Meeting 2017, Perth, Australia. 30 August - 1 September 2017.

6. Olga Montvida, Sanjoy Paul. Long-term glycaemic control with incretin-based therapies in patients with type 2 diabetes: real-world study. QIMR Berghofer Early Career Researcher Seminar, Brisbane, Australia, 23 June 2017.

7. Sanjoy K. Paul, Brian L. Thorsted, Michael L. Wolden, Kamlesh Khunti, Olga Montvida. Delay in Treatment Intensification Increases the Risk of Cardiovascular Events in Patients with Type 2 Diabetes. Cardiovascular Research Showcase, Brisbane, Australia, 22 November 2016.

8. Sanjoy Paul, Jonathan Shaw, Olga Montvida, Kerenaftali Klein. Obese Patients Gain Less Weight than Non-obese Patients when Treated with Insulin, with Similar HbA1c Reductions: New Evidence from Real-world Data in Type 2 Diabetes. Annual Meeting of the European Association for the Study of Diabetes (EASD), Munich, Germany, 12-16 Sep, 2016.

9. Olga Montvida, Sanjoy Paul. Addition or Switch to Insulin Therapy in People Treated with GLP-1 Receptor Agonists: A Real World Study in 66,583 Patients. Australian Diabetes Society and Australian Diabetes Educators Association Annual Scientific Meeting, Gold Coast, 24-26 Aug, 2016.

10. Olga Montvida, Sanjoy Paul. Real World Outcomes of Addition or Switch to Insulin Therapy in People Treated with GLP-1 Receptor Agonist. QIMR Berghofer Student Symposium, 15 Jul 2016.

11. Sanjoy K. Paul, Jonathan Shaw, Kerenaftali Klein, Olga Montvida. Obese T2DM Patients Gain Less Weight with Insulin Treatment Compared with Normal and Overweight Patients: New Evidence from Real-World Data. American Diabetes Association (ADA) 76th Scientific Sessions, New Orleans, USA, 10-14 June 2016.

12. Olga Montvida, Kerenaftali Klein, Sanjoy K. Paul. Evaluation of the Cardio-metabolic Effects of Treatment with Incretin-based Therapies in Patients with Type 2 Diabetes. 8th Biennial QIMR Berghofer Student Retreat, Canungra, Australia, 17-18 September 2015.

13. Kerenaftali Klein, Olga Montvida, Sanjoy K Paul. Real World Glucose and Weight Control in Patients Treated with GLP-1 Receptor Agonists, with Addition or Treatment Change to Insulin. Annual Meeting of the European Association for the Study of Diabetes (EASD), Stockholm, Sweden, 14-18 September 2015.

STATEMENT OF ORIGINAL AUTHORSHIP

The work contained in this thesis has not been previously submitted to meet requirements for an award at Queensland University of Technology or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Olga Montvida 6 Nov 2018

QUT Verified Signature

ACKNOWLEDGEMENTS

I would like to express my sincerest and deepest gratitude to my principal supervisor, Professor Sanjoy Ketan Paul. Thank you for trusting in my abilities to conduct this project from the very beginning and for such close mentorship and inspiration over the past three years. There was a lot of hard work, tons of coffee, and a lot of fun. We are no Michelangelo, but the quote reflects it perfectly: “If people knew how hard I worked to achieve my mastery, it wouldn't seem so wonderful after all”.

Deep thanks to my associate supervisors, Professor Ross Young and Professor Louise Hafner for supporting me and advising me from the first day of my enrolment. To all former colleagues at QIMR Berghofer - Kerenaftali Klein, Julius Agbeve, Mayukh Samanta, Margaret Haughton, and Gunter Hartel. You were here to help me during good and bad times, thank you for that. To my PhD buddy Ebenezer Senyo Owusu Adjah, who never refused to share with me his materials, thoughts, and advice.

I gratefully acknowledge the financial support received through the scholarship from Queensland University of Technology. I also want to thank QIMR Berghofer Medical Research Institute for giving me working space and granting a research top-up scholarship. Big thank you to every employee working for these two institutions who ensured that I felt safe and comfortable in a new country.

Finally, thanks to those who are always in my heart – family and friends. Your love contributed to this dissertation much more than you think. Special thanks to my brother, who kept boosting my self-confidence and fighting spirit.

It’s been an amazing journey. With confidence I may now say that a significant (p<0.01 😊) development of myself as a scientist and as a person has been achieved.

TABLE OF CONTENTS

Abstract
Keywords
List of Publications
Statement of Original Authorship
Acknowledgements
List of Figures
List of Tables
List of Supplementary Material
List of Abbreviations
Chapter 1: Introduction
    1.1 Diabetes Mellitus
    1.2 Epidemiology of Diabetes Mellitus
    1.3 Complications of Type 2 Diabetes
    1.4 Treatment of Type 2 Diabetes
    1.5 Incretin-based therapies
    1.6 Glycaemic effects of incretin-based therapies
    1.7 Cardio-metabolic effects of incretin-based therapies
    1.8 Aims and Objectives
    1.9 Methodological Background
    1.10 Thesis structure and logics
Chapter 2: Literature Review
    2.1 Clinical trials
    2.2 Observational studies
    2.3 Conclusions and implications
Chapter 3: Data Description
    3.1 Centricity Electronic Medical Records
    3.2 Medication data
    3.3 Disease data
    3.4 Laboratory, clinical, and anthropometric data
    3.5 Ethics approval
Chapter 4: Medication Data Extraction
Chapter 5: Diabetes Mellitus Cohort
    5.1 Diagnostic codes
    5.2 Supervised machine learning
    5.3 Final cohort
    5.4 Representativeness of diabetes cohort
    5.5 Type 2 diabetes cohort
Chapter 6: Imputation of Longitudinal Observation Data
Chapter 7: Trends in Anti-diabetic Drug Prescribing Patterns
Chapter 8: Glycaemic Control and Sustainability
Chapter 9: Cardio-metabolic Risk Factor Burden and Safety
Chapter 10: Discussion and Conclusions
Bibliography
Appendices

LIST OF FIGURES

Figure 3.1. Schematic representation of the data in CEMR database.
Figure 3.2. Schematic diagram of identifying list of medication keys for
Figure 3.3. Schematic diagram of arranging longitudinal risk factor data.
Figure 5.1. Cohort of patients with T2DM and distribution of identified sub-types.
Figure 5.2. Selected Decision Tree algorithm.

LIST OF TABLES

Table 1.1 Incretin-based approved in the US and EU
Table 1.2 Possible sources of bias in Electronic Medical Record data
Table 2.1 Completed cardiovascular outcome trials for DPP-4i in patients with type 2 diabetes
Table 2.2 Completed cardiovascular outcome trials for GLP-1RA in patients with type 2 diabetes
Table 2.3 Summary of observational CV-outcome studies of treatment with incretin-based therapies
Table 3.1 Therapeutic Class and highest corresponding ATC code
Table 3.2 Diseases, ICD codes, and Weights used to compute Charlson Comorbidity Index.
Table 5.1 Features Selected as Best Diabetes Predictors in CEMR
Table 5.2 Performance of Machine Learning Algorithms on the Training Dataset
Table 5.3 Characteristics of patients with diabetes in the CEMR database and in the National Diabetes Statistics report, 2015
Table 5.4 Baseline characteristics among adults with T2DM
Table 5.5 Exposure to medications any time during available follow-up among adults with T2DM

LIST OF SUPPLEMENTARY MATERIAL

Appendix A: Addition of or switch to insulin therapy in people treated with glucagon‐like peptide‐1 receptor agonists: A real‐world study in 66 583 patients.

Appendix B: Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus.

Appendix C: Weight gain in insulin treated patients by BMI categories at treatment initiation: New evidence from real-world data in patients with type 2 diabetes.

Appendix D: Risk of Pancreatic Diseases by Second-line Drug Class: Real World Evidence in 225,898 Type 2 Diabetes Patients.

LIST OF ABBREVIATIONS

ADA       American Diabetes Association
ADD       Anti-diabetic Drug
ATC       Anatomical Therapeutic Chemical Classification
AUC       Area Under Receiver Operating Characteristic curve
BMI       Body Mass Index
CCI       Charlson Comorbidity Index
CEMR      Centricity Electronic Medical Record
CI        Confidence Interval
CPM       Cardio-protective Medication
CPU       Central Processing Unit
CV        Cardiovascular
CVD       Cardiovascular Disease
DCCT      Diabetes Control and Complications Trial
DM        Diabetes Mellitus
DPP-4     Dipeptidyl Peptidase-4
DPP-4i    Dipeptidyl Peptidase-4 inhibitor
EMA       European Medicines Agency
EMR       Electronic Medical Record
FDA       Food and Drug Administration
GIP       Glucose-dependent Insulinotropic Polypeptide
GLP-1     Glucagon-like Peptide-1
GLP-1RA   Glucagon-Like Peptide-1 Receptor Agonist
HbA1c     Glycated Haemoglobin
HR        Hazard Ratio
ICD-10    International Classification of Diseases 10th Revision
ICD-9     International Classification of Diseases 9th Revision
IDF       International Diabetes Federation
INS       Insulin
LDL       Low-density Lipoprotein
MACE      Major Adverse Cardiovascular Event
MET       Metformin
ML        Machine Learning
NDS       National Diabetes Statistics
OR        Odds Ratio
RCT       Randomised Controlled Trial
SBP       Systolic Blood Pressure
SD        Standard Deviation
SGLT-2i   Sodium Glucose co-Transporter 2 inhibitor
SU        Sulfonylurea
T2DM      Type 2 Diabetes Mellitus
TZD       Thiazolidinedione
UKPDS     The UK Prospective Diabetes Study

Chapter 1: Introduction

1.1 DIABETES MELLITUS

Diabetes mellitus (DM) is a group of metabolic disorders characterised by defects in insulin secretion or action that lead to increased blood glucose levels (hyperglycaemia) [1]. It is a chronic disease with increasing prevalence, currently affecting around 9% of the global adult population [2].

Modern aetiologic classification of DM consists of four categories: Type 1, Type 2, Gestational, and Other Types [1, 3]. Absolute insulin deficiency that results from autoimmune or idiopathic β-cell destruction is usually classified as Type 1 diabetes. Type 2 diabetes mellitus (T2DM) is generally characterised by relative (rather than absolute) insulin deficiency, and it accounts for 90-95% of all DM cases [1, 4]. Gestational diabetes occurs in women during gestation. Other specific types of diabetes caused by genetic defects of β-cell function, diseases of the pancreas, and other aetiologies are grouped together. Prediabetes, or borderline diabetes, refers to blood glucose levels that are higher than normal but do not reach the DM diagnostic threshold [1, 5].

1.1.1 Diagnosis of Type 2 Diabetes

Early symptoms of T2DM are polyuria, polydipsia, and polyphagia. Symptoms may also include fatigue, headaches, trouble concentrating, blurred vision, and weight loss. American Diabetes Association guidelines recommend three tests to diagnose T2DM [1]:

 Fasting Plasma Glucose ≥ 126 mg/dL [7.0 mmol/L],

 2-hour Plasma Glucose ≥ 200 mg/dL [11.1 mmol/L] during Oral Glucose Tolerance Test,

 Glycated Haemoglobin ≥ 6.5% [48 mmol/mol].

For the fasting plasma glucose test, fasting is defined as no caloric intake for at least 8 hours. The oral glucose tolerance test uses a glucose load containing the equivalent of 75 g of anhydrous glucose dissolved in water. The glycated haemoglobin (HbA1c) test reflects average plasma glucose concentrations over approximately 3 months. HbA1c is a measurement of the percentage of haemoglobin A molecules that have formed a stable ketoamine linkage between the amino-terminal valine residue of the beta chain and a glucose moiety [6]. A method that is certified by the National Glycohemoglobin Standardization Program and standardised to the Diabetes Control and Complications Trial (DCCT) assay should be used to perform the HbA1c test. Random plasma glucose of more than 200 mg/dL [11.1 mmol/L] may also be used to diagnose T2DM in patients with classic symptoms of hyperglycaemia [1].

For all tests, it is recommended that a second test (the same or a different one) be conducted immediately with a new blood sample to confirm the diagnosis.
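The three diagnostic thresholds listed above can be sketched as a simple check. This is an illustrative sketch only, not the cohort identification method used in this thesis: the function and parameter names are hypothetical, the mg/dL to mmol/L factor (18.016) is an assumption derived from the molar mass of glucose, and neither the confirmatory second test nor the symptom-based random plasma glucose criterion is modelled.

```python
from typing import Optional

# Assumption: 1 mmol/L of glucose is about 18.016 mg/dL (molar mass 180.16 g/mol).
MGDL_PER_MMOLL = 18.016


def mgdl_to_mmoll(mg_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L, rounded to 1 decimal."""
    return round(mg_dl / MGDL_PER_MMOLL, 1)


def meets_diagnostic_threshold(
    fpg_mgdl: Optional[float] = None,      # fasting plasma glucose
    ogtt_2h_mgdl: Optional[float] = None,  # 2-hour glucose during the OGTT
    hba1c_pct: Optional[float] = None,     # glycated haemoglobin, in %
) -> bool:
    """Return True if any supplied result reaches its threshold quoted in the text."""
    checks = [
        (fpg_mgdl, 126.0),
        (ogtt_2h_mgdl, 200.0),
        (hba1c_pct, 6.5),
    ]
    # Results that were not measured (None) are ignored.
    return any(value is not None and value >= cutoff for value, cutoff in checks)
```

For example, `meets_diagnostic_threshold(fpg_mgdl=130)` evaluates to `True`, while a call with no results supplied evaluates to `False`.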

1.2 EPIDEMIOLOGY OF DIABETES MELLITUS

Global prevalence of DM has increased fourfold during the last 30 years. According to the 2017 worldwide survey conducted by the International Diabetes Federation (IDF), more than 425 million individuals (equivalent to 1 in 11 adults) have diabetes, and 1 in every 2 adults with diabetes is undiagnosed (~212 million) [7, 8]. In the US, the age-adjusted (20-79 years) prevalence of diabetes in 2017 was 10.8%, while about 11.5 million individuals were estimated to have undiagnosed diabetes [8]. More than three quarters of people with DM live in low- and middle-income countries, and most of them are 20 to 64 years old. Over a million children and adolescents have type 1 diabetes.

About 12% of global health expenditure is spent on DM management [8]. In the US, a quarter of total health expenditure was estimated to be spent on DM management [9]. The American Diabetes Association estimated the cost of diagnosed DM at USD 327 billion in 2017: 72% in direct medical costs and 28% in reduced productivity [10]. Among Australians with DM aged 20–65 years, Magliano and colleagues estimated that DM reduced productivity-adjusted life years by 12.2% and 11.0% for men and women, respectively [11]. Bommer and colleagues modelled the economic burden of DM under various scenarios and reported an increase in costs as a share of global GDP from 1.8% (1.7–1.9) in 2015 to 2.2% (2.1–2.2) in 2030 [12].

In 2004, the World Health Organisation (WHO) published an estimate of diabetes prevalence in 2000 and a forecast to 2030: 171 million people in 2000 and an estimated 366 million in 2030 [13]. In practice, these figures proved to be substantial underestimates, as in 2017 there were already 425 million people with DM. The IDF projects the prevalence of diabetes to rise to 642 million by 2040. However, this projection may again be an underestimate, as the IDF extrapolates prevalence for countries with missing data from various less reliable sources [7, 14].

Advances in epidemiological research on DM have led to a better understanding of various risk factors associated with the development of T2DM. The determinants of T2DM consist of many contrasting and interacting genetic, epigenetic and lifestyle factors [14]. The risk of T2DM development increases with age, body mass index (BMI), and a sedentary lifestyle. Also, a high-calorie diet leading to excess body fat, hypertension, and dyslipidaemia is considered to be a major contributor to the disease burden. People with a history of diabetes in first- and second-degree relatives have an increased risk of developing T2DM.

Ethnic minorities in the US and Australia have a higher risk of developing T2DM compared to non-minority individuals [1]. South Asians develop diabetes earlier and at lower BMI levels, compared to Western populations [15, 16]. In India, 72 million people were estimated to have DM in 2017, and 123.5 million were predicted to have DM by 2040. A population-based survey conducted in China in 2010 suggests that about 12% of the adult population had diabetes and about 50% of the total population had pre-diabetes (defined as 2-hour oral glucose tolerance levels 140-199 mg/dL [7.8–11.0 mmol/L], and impaired fasting glucose, defined as fasting glucose levels 100-125 mg/dL [5.6–6.9 mmol/L]) [17].

The estimated number of females (20-79 years) living with DM in 2017 is 204 million. Gestational diabetes, defined as hyperglycaemia onset or first recognition during pregnancy, significantly increases the risk of T2DM development in both the woman and the child. According to IDF 2017 estimates, about 16% of live births had some form of hyperglycaemia in pregnancy, and 1 in every 7 births was affected by gestational diabetes. Compared to women who did not have gestational diabetes, a 7-fold increased risk of developing T2DM was observed in those who did have it [18-20]. In the children of women with gestational diabetes, exposure to intrauterine hyperglycaemia was associated with an 8-fold risk of developing diabetes/prediabetes at 19-27 years of age [21].

The American Diabetes Association estimated that the medical expenditures of people with DM are more than twice what they would be in the absence of DM [10]. The costs of DM present an immense problem to patients, health systems, and the community in general [9]. In the US, people with DM spend on average USD 16,750 per year, 57% of which is attributable to diabetes [10].

1.3 COMPLICATIONS OF TYPE 2 DIABETES

Patients with T2DM are at increased risk of developing a number of comorbidities and life-threatening complications [22]. The short- and long-term complications associated with T2DM are many. Traditionally, macrovascular (cardiovascular) diseases and microvascular diseases have been considered as the primary complications associated with T2DM. While a number of clinical trials and epidemiological outcome studies have established the significantly increased microvascular risk in patients with T2DM [23, 24], the evidence of the long-term macrovascular benefits of tight glucose control in patients with T2DM is less clear [25-27].

1.3.1 Microvascular complications

Microvascular complications of long-term hyperglycaemia occur due to damage to small blood vessels, leading to neuropathy, retinopathy, and nephropathy. Diabetic neuropathy is characterised by progressive loss of nerve fibres affecting the peripheral nerves and the autonomic neurons. It was estimated that 60-70% of people with DM develop some form of neuropathy [28]. Diabetes-associated longstanding peripheral neuropathy increases the chance of foot ulcer (“diabetic foot”), infection and eventual need for limb amputation [29]. Diabetic retinopathy affects the peripheral retina and/or macula, leading to partial or total vision loss. It was estimated that 2.6% of global blindness can be attributed to diabetes [30]. Diabetic nephropathy is characterised by angiopathy of the capillaries in the kidney glomeruli. It is the most common cause of kidney failure in developed countries [28]. The first indication of nephropathy is typically microalbuminuria, which further worsens to albuminuria (at a rate of 2-3% per year), and eventually leads to renal failure [29]. The risk of microvascular complications increases with age, DM duration, and blood glucose level. Randomised controlled trials (RCTs), including the Diabetes Control and Complications Trial (DCCT) and the UK Prospective Diabetes Study (UKPDS), have indicated that tight glycaemic control (HbA1c <7% [53 mmol/mol]) in patients with diabetes reduces the risk of microvascular complications [31-34]. Lowering HbA1c to 6% [42 mmol/mol] is associated with further reductions in the risk of microvascular complications, although to a much smaller extent [1].

1.3.2 Macrovascular complications

Atherosclerosis of large vessels due to long-term hyperglycaemia leads to ischaemic heart disease, cerebrovascular disease (stroke), and peripheral vascular disease. Cardiovascular disease (CVD) is a leading cause of death and disability among people with T2DM [22, 35]. While patients with T2DM have an increased cardiovascular (CV) risk profile, DM is also considered an independent CVD risk factor [36].

In patients with T2DM, CVD develops about 14 years earlier and with greater severity, compared to individuals without diabetes [37-39]. After controlling for traditional CV risk factors, patients with T2DM have more than twice the risk of major CV events compared to the general population [40]. In patients with T2DM, peripheral artery disease (occlusion of the lower-extremity arteries) and heart failure (impaired cardiac pump function) are the most common initial manifestations of CVD [41]. Risk of CVD increases with age and DM duration. Some studies report that the presence of microvascular complications increases the risk of CVD as well [28]. Several large trials (ACCORD, ADVANCE, VADT, DCCT, UKPDS) designed to address CVD-related concerns have shown no beneficial effect of tight glucose control on CV events [33, 42-44]. Nonetheless, follow-up data, meta-analyses, and prospective observational studies suggest positive effects of tighter glycaemic targets on CVD risk, especially in those with shorter DM duration and no history of severe hypoglycaemia [1, 22, 45, 46].

1.4 TREATMENT OF TYPE 2 DIABETES

Glycaemic targets:

According to International and American Diabetes Association guidelines [1, 47], adults with T2DM are recommended to achieve HbA1c < 7% [53 mmol/mol]. Selected adults with a shorter duration of diabetes and no significant CVD may be recommended to maintain HbA1c < 6.5% [48 mmol/mol]. Less stringent targets (e.g. < 8% [64 mmol/mol]) may be appropriate for patients for whom the 7% target is difficult to achieve due to extensive comorbidities, history of severe hypoglycaemia, and/or limited life expectancy.

HbA1c testing is recommended at least twice a year for patients who meet treatment targets, and quarterly for those who do not meet targets or changed therapy.

Lifestyle modifications including dietary considerations and physical activity are initially recommended to prevent or delay T2DM [1]. Metformin (MET), if not contraindicated, is widely accepted as the first-choice anti-diabetic drug (ADD) since it does not cause weight gain or hypoglycaemia and may improve macrovascular outcomes [48]. However, progressive deterioration of diabetes generally leads to the need for further treatment intensification. In patients with newly diagnosed T2DM, the UKPDS study has shown that approximately half of the people maintain acceptable glucose levels after 3 years of monotherapy; after 9 years, however, the proportion declines to only one quarter of patients [49]. Guidelines recommend intensifying anti-diabetic therapy when treatment targets are not met within 3-6 months of monotherapy [1].

Successful treatment of T2DM is generally complicated by treatment-related adverse effects (hypoglycaemia, weight gain) and the progressive nature of the disease. Many patients eventually require therapy intensification with another drug; however, physicians have not reached a consensus on which drug to add [50].

The six common post-metformin second-line therapy intensification options are currently sulfonylurea (SU), thiazolidinedione (TZD), sodium glucose co-transporter 2 inhibitor (SGLT-2i), glucagon-like peptide-1 receptor agonist (GLP-1RA), dipeptidyl peptidase-4 inhibitor (DPP-4i), and insulin (INS); other drugs are recommended under specific conditions. MET, SU, and insulin represent the ‘old agents’; TZD has been used for the last decade, especially in Asian countries. GLP-1RA, DPP-4i, and SGLT-2i represent the ‘novel agents’. All of the agents, used alone or in combination, are associated with different adverse events, including hypoglycaemia (SU and insulin), weight gain (SU, insulin, and TZD), gastrointestinal side effects (MET, GLP-1RA), and increased risk of fractures (TZD) [51, 52].

1.5 INCRETIN-BASED THERAPIES

Glucagon-like peptide-1 (GLP-1) and glucose-dependent insulinotropic polypeptide (GIP) are gut-derived hormones, also called incretins, which induce insulin secretion in a glucose-dependent manner as a response to nutrient ingestion. Additionally, GLP-1 inhibits glucagon secretion, slows gastric emptying, and increases satiety [53]. GLP-1 and GIP are degraded within 2-3 minutes by the enzyme dipeptidyl peptidase-4 (DPP-4).

Incretin-based therapies are represented by two classes: oral DPP-4i and subcutaneous GLP-1RA. The former increases effective levels of incretins by targeting and inactivating DPP-4, while the latter increases insulin release through direct action on GLP-1 receptors [54]. These therapies have been in focus during the last several years because of their unique mechanisms of action [52, 55-59]. GLP-1RAs stimulate insulin secretion and inhibit glucagon release in a strictly glucose-dependent manner. The pancreatic effects also include increased beta-cell proliferation and decreased beta-cell apoptosis [56, 60, 61].

Several GLP-1RAs have been approved for the treatment of patients with T2DM. Exenatide, the first GLP-1RA representative, was approved in April 2005 by the US Food and Drug Administration (FDA), and in October 2006 by the European Medicines Agency (EMA). FDA- and EMA-approved agents are summarised in Table 1.1. GLP-1RAs differ in structure, and may also be distinguished by duration of action: short-acting (once- or twice-daily administration) and long-acting (once-weekly administration). The first DPP-4i, sitagliptin, was approved in October 2006 and March 2007 by the FDA and EMA, respectively [62]. DPP-4is are administered once daily, and all DPP-4is are available in combination with metformin.

Table 1.1 Incretin-based medications approved in the US and EU

GLP-1RA class | DPP-4i class
Exenatide (Byetta®, Bydureon®) | Sitagliptin (Januvia®)
Liraglutide (Victoza®) | Vildagliptin (Galvus®)
Lixisenatide (Lyxumia®) | Saxagliptin (Onglyza®)
Albiglutide (Eperzan®, Tanzeum®) | Linagliptin (Trajenta®)
Dulaglutide (Trulicity®) | Alogliptin (Vipidia®, Nesina®)
Semaglutide (Ozempic®) |

Table source: Scheen, A.J., Cardiovascular outcome studies with incretin-based therapies: comparison between DPP-4 inhibitors and GLP-1 receptor agonists. Diabetes Research and Clinical Practice, 2017. 127: p. 224-237 [63].

1.6 GLYCAEMIC EFFECTS OF INCRETIN-BASED THERAPIES

Incretin-based therapies have demonstrated their ability to significantly reduce glucose levels while maintaining a low risk of hypoglycaemia in patients with T2DM.

GLP-1RAs reduce fasting plasma glucose by 1.4-3.4 mmol/L and HbA1c by 0.8-1.8% [64]. HbA1c reductions are 0.5-1.0% and 1.5-2.0% with short- and long-acting exenatide, respectively, and around 0.8-1.5% with liraglutide [65, 66]. Short-acting agents have a greater effect on postprandial glucose levels, mainly through inhibition of gastric emptying, while long-acting GLP-1RAs have a greater effect on fasting glucose levels, mainly through their insulinotropic and glucagonostatic actions [65, 67]. In a direct head-to-head study, patients treated with liraglutide achieved greater reductions in HbA1c and fasting glucose than those treated with exenatide [67]. In a review of direct head-to-head trials of GLP-1RAs (n=9), Madsbad (2016) reports greater reductions of HbA1c with liraglutide than with exenatide formulations and albiglutide, and no differences in HbA1c reductions between liraglutide and dulaglutide [68].

DPP-4i reduce fasting plasma glucose levels by 1.0-1.4 mmol/L and HbA1c by 0.5-1.1% [64]. HbA1c reductions are 0.6-0.8% with sitagliptin, 0.4-0.8% with saxagliptin, 0.5-0.7% with linagliptin, and 0.5-0.9% with alogliptin [65, 66]. There are no head-to-head trials comparing DPP-4i agents [69]; however, several systematic reviews and meta-analyses reported similar efficacy and safety of DPP-4i agents [70, 71]. Several head-to-head trials were conducted to compare GLP-1RA representatives with DPP-4i agents, in which the GLP-1RA class demonstrated greater glucose reductions than DPP-4i [66].

1.7 CARDIO-METABOLIC EFFECTS OF INCRETIN-BASED THERAPIES

GLP-1 receptors are widely expressed throughout the human body and their presence on coronary artery endothelial cells may benefit ischemic conditioning [72, 73]. Data from animal models, pre-clinical and exploratory studies suggest potential CV benefits by improved endothelial and myocardial function [74-76], improved left ventricular ejection fraction and wall indices [55, 77], decreased levels of inflammatory markers and atherosclerosis [62, 78], and recovery of failing and ischemic hearts [76, 79-81].

DPP-4i were shown to be weight-neutral, and GLP-1RAs were shown to significantly reduce body weight by 2-4 kg over 6 months of therapy [60, 63]. As the T2DM population has an increased burden of obesity, an independent risk factor for CVD, incretin-based therapies represent a favourable therapeutic option compared to agents that cause weight gain. These therapies were also reported to decrease blood pressure (BP) [62, 78], which might be independent of weight reductions [82, 83]. The Liraglutide Effect and Action in Diabetes trials 1-5 demonstrated systolic blood pressure reductions of 3.6-6.7 mmHg [73]. Also, incretin-based therapies demonstrated modest improvements in total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglyceride profiles [62, 66, 67, 84]. However, a small increase in heart rate of 2-4 beats/minute has been associated with GLP-1RA treatment [84, 85].

Nonetheless, the beneficial effects on CV risk suggested by these promising studies in animals and humans have not yet translated into clinical evidence. Chapter 2 presents a summary of current evidence on the association of incretin-based therapies with CV risk.

1.8 AIMS AND OBJECTIVES

Most patients with T2DM require intensified treatment with multiple ADDs, in addition to medications for CV risk factor control, as the disease progresses. While metformin is recommended as the first-line ADD, the guidelines suggest at least six possible options for therapy intensification, where evidence is primarily drawn from RCTs. As a great number of studies report beneficial effects of one medication brand over placebo or another brand, patient and practitioner decisions on intensification therapy have become more complicated than ever. At the same time, patient choices, disease management, and cardio-metabolic outcomes differ significantly in the real-world scenario compared to RCT practices.

The aim of this project was to explore cardio-metabolic effects of treatment with incretin-based drugs, compared to other treatment options in the real-world setting.

Based on large patient-level electronic medical records (EMRs) from primary and ambulatory care systems, the objectives of this real-world data based pharmaco-epidemiological project combine both comparative effectiveness and outcome studies, while addressing a host of methodological challenges in dealing with large EMRs.

Objective 1: To develop and validate data mining techniques to extract and analyse longitudinal prescription data from an EMR database.

Objective 2: To develop algorithms and machine learning techniques to identify disease cohorts from an EMR database.

Objective 3: To explore temporal trends in anti-diabetic drug prescribing and intensification patterns, along with glycaemic levels and comorbidities by class of anti-diabetes drug.

Objective 4: To explore long-term dynamics of glucose control and its sustainability in the following treatment groups:

Group 1: Metformin plus GLP-1RA
Group 2: Metformin plus DPP-4i
Group 3: Metformin plus Insulin
Group 4: Metformin plus Sulfonylurea

Objective 5: To explore long-term burden of blood pressure, low density lipoprotein, and triglycerides in the above-mentioned groups, and the association of such control with risk of major adverse cardiovascular events.

SGLT-2 inhibitors, first approved in 2013, have demonstrated glycaemic and extra-glycaemic benefits. Additionally, some class representatives demonstrated renal protection and association with reduced risk of CV events. However, due to data constraints and limited follow-up time, this therapy group could not be included for robust comparative analyses outlined in Objectives 4 and 5.

1.9 METHODOLOGICAL BACKGROUND

Patient-level data from electronic medical records (EMRs) collected from 1995 to 2016 across all states of the US during routine primary and ambulatory care were used throughout this project. This subsection is devoted to a general description of the role, scientific value, and limitations of evidence based on real-world data. Chapter 3 describes the data in detail.

1.9.1 Scientific value of real-world evidence

According to the FDA, real-world data are defined as any data related to patient health status and/or delivery of health care that are routinely collected from various sources [86]. These sources include EMRs, claims and billing systems, product and disease registries, health-monitoring devices, and health-related applications [87]. Evidence based on analysing such data includes studies on therapeutics, disease-related outcomes, safety, cost-effectiveness, epidemiology, patient care, and delivery systems.

The nature of real-world evidence is very different from that of RCTs, with both important advantages and disadvantages. RCTs are considered the gold standard for testing hypotheses, not only because baseline randomisation supports conclusions of causality, but also because of the tight control over measurement and clinical conduct and the ease of communicating results [88]. Tests of safety and efficacy of an intervention in an RCT are considered to be bias free, and provide a reliable source of internal validity. However, RCTs are often conducted with specific populations and findings may be less generalisable to broader populations, apart from being costly and requiring a long time to complete. Real-world studies provide opportunities to observe health outcomes in populations that are often excluded from RCTs, such as pregnant, older, or co-morbid patients. These studies also allow exploration of research questions that may be unethical to test in RCTs, for instance the outcomes of a delay or failure in treatment intensification [89, 90]. While RCTs are conducted in a specialised environment and assess whether a treatment may work, real-world studies observe whether interventions, as actually used, work in everyday clinical practice. For example, Edelman and colleagues reported that HbA1c reductions observed in RCTs are much higher than in the real-world scenario: 1.25% vs 0.52% for GLP-1RA and 0.68% vs 0.51% for DPP-4i [91, 92]. The authors suggest poor medication adherence as a key driver of this disconnect. Several countries (UK, Sweden, Estonia) have implemented nationwide “birth-to-death” EMRs for nearly every citizen, which brings a unique opportunity to observe population-level behaviour, the effectiveness of changes in health care policies, and health management costs, in addition to population-level safety, effectiveness, and long-term health outcome research [93-96]. Large population-level databases also provide an opportunity to bring together the benefits of EMRs and RCTs by randomising and recruiting patients from EMRs to protocol-driven RCTs in a convenient, fast, and cost-effective manner [87].

The increasing role of real-world data in health care decisions has led the European Union to establish a project to monitor adverse drug reactions (EU-ADR) using EMRs from the Netherlands, Denmark, the United Kingdom, and Italy [97]. Another example of combining national registry data (US, Norway, Denmark, Sweden, Germany, UK) is the CVD-REAL study, which was designed to assess whether positive outcomes observed in the completed RCT are also applicable to the broader population in real-world practice [98].

1.9.2 Limitations and challenges of electronic medical records

Health-related data from EMRs reflect the complex multi-factorial relationships of everyday clinical practice, which brings challenges in the design, analysis, and interpretation of findings [88]. Confounding and data quality limit the ability to conclude direct causation, and in this sense EMR-based studies should be interpreted with caution. EMRs are collected during routine medical care and usually extensively capture demographics, medication prescriptions, diagnoses and procedures, and laboratory and anthropometric measures. However, EMR data are prone to (1) loss of follow-up, (2) misdiagnosis, misclassification, and miscoding, (3) missing data on certain variables, (4) unreliable data on some relevant variables, and (5) biases and confounding.

Follow-up in EMRs may be lost when a patient moves to a different location or transfers out of a practice. While nation-wide EMRs lose follow-up when a patient moves to another country, commercial EMRs are not able to track patient records once the patient moves to a practice that does not contribute data to the particular EMR network. Due to the nature of general practitioner settings, some variables are recorded more often than others. For example, blood pressure is measured at almost every general practitioner encounter because of the relative ease of measurement. At the same time, information on diet and exercise, disease activity progression, or medication dose escalations is entered into EMRs less often. Also, patients are prone to provide unreliable data on drug abuse, smoking habits, and alcohol consumption. Laboratory and anthropometric measures may be conducted on different equipment and may follow different procedures. Miscommunications, errors during the data-entry process, and non-attendance of scheduled visits are part of routine medical care, and introduce additional errors into data from EMRs.

Real-world studies are prone to various sources of bias: some reflect the nature of data collection (e.g. specific insurance or clinic), some may be reduced with careful study design (e.g. immortal time bias), and others may be reduced with advanced data mining and statistical methodologies (e.g. information bias). In a recent publication, Verheij and colleagues discuss possible sources of bias in EMR-based research and categorise them as presented in Table 1.2 [99].

Table 1.2 Possible sources of bias in Electronic Medical Record data

Reimbursement system, pay for performance parameters
Role of general practitioner in the health care system; gatekeeping / nongatekeeping
Professional clinical guidelines
Ease of access by patients to their records
Data sharing between health care providers
Practice workload
Variations between EMR system functionalities and lay-out
Coding systems and thesauruses
Knowledge and education regarding the use of EMR systems
Data extraction tools
Data processing
Research dataset preparation
Research methodologies

Table source: Verheij, A.R., et al., Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse. J Med Internet Res, 2018. 20(5): p. e185. [99]

1.10 THESIS STRUCTURE AND LOGICS

This project is designed as a series of methodological and pharmaco-epidemiological studies that were conducted using a large database of EMRs. Chapter 2 provides a literature review on the association of treatment with incretin-based therapies with CV risk.

Chapters 3-6 are devoted to data science. Chapter 3 introduces the database and basic data management considerations. Chapter 4 describes the algorithm developed to extract and aggregate medication information at individual patient-level. The information on ADD use was obtained for all patients in the database (~34 million patients), and these data were incorporated in the algorithm to identify a robust cohort of patients with diabetes (chapter 5). For this cohort of patients, chapter 6 reports the patterns of missingness in the longitudinal laboratory and anthropometric measures, and compares performance of several multiple imputation techniques.

Chapters 3-6 describe the data groundwork that was essential in order to draw reliable clinical inferences from voluminous and complex EMRs. These methods were part of the data preparation for each pharmaco-epidemiological study described in chapters 7-9 and appendices. Each of these clinical studies has its own design and methodology (data mining and statistical), described separately within the respective chapters. Note that each chapter’s database description is repetitive and presents a compressed version of chapter 3.

Chapter 7 explores longitudinal trends in the use of ADDs, glycaemic control, and patients’ characteristics with respect to the drug initiation order. Chapter 8 focuses on glycaemic control and its sustainability, comparing second-line treatment options. Chapter 9 explores cardio-metabolic risk factor burden at population level and cardio-metabolic risk factor control by class of second-line ADD. It also explores the association of cardio-metabolic risk factor burden with the risk of CV events. Finally, chapter 10 summarises the results, concludes the conducted work, and discusses future directions.

Chapter 2: Literature Review

2.1 CLINICAL TRIALS

Prior to 2008, the approval of new ADDs was based on improvements in glycaemia with detailed investigation of adverse events. The trials were usually 6 months long, and the presence of CVD was often an exclusion criterion [100]. In 2007, a meta-analysis of 43 studies reported a significant increase in the risk of myocardial infarction in patients treated with rosiglitazone (TZD class), and a non-significant increase in the risk of CV death [101]. This controversial publication generated an enormous public reaction, which resulted in the FDA recommending in 2008 that long-term CV safety trials be conducted, or other equivalent evidence provided, to support the CV safety of new anti-diabetic agents. The guidance document suggested a meta-analysis of phase 2 and 3 trials to rule out CV risk as the default option, with an additional CV safety trial needed only when the data are insufficient [100].

2.1.1 Cardiovascular outcome trials

In practice, a large dedicated CV safety trial has been conducted for every novel agent. To date, three large CV outcome trials have been completed for DPP-4i agents (Table 2.1) and four for GLP-1RA agents (Table 2.2).

Table 2.1 Completed cardiovascular outcome trials for DPP-4i in patients with type 2 diabetes

 | SAVOR-TIMI53 | EXAMINE | TECOS
Drug | Saxagliptin | Alogliptin | Sitagliptin
Primary Endpoint | 3-point MACE | 3-point MACE | 4-point MACE
N | 16,492 | 5,380 | 14,671
Follow-up, years | 2.1 | 1.5 | 3
Inclusion:
Minimum Age, years | 40 | 18 | 50
HbA1c, % | ≥6.5 | 6.5-11.0 | 6.5-8.0
Cardiovascular Status | Pre-existing CVD or high CV risk | Acute Coronary Syndrome 15-90 days before | Pre-existing CVD
Mean at baseline:
BMI, kg/m2 | 31.1 | 28.7 | 30.2
Age, years | 65 | 61 | 65.5
HbA1c, % | 8 | 8 | 7.2
Outcome, Hazard Ratio (95% CI):
Primary composite | 1.00 (0.89–1.12) | 0.96 (≤1.16)* | 0.98 (0.89–1.08)
Myocardial infarction | 0.95 (0.80–1.12) | 1.08 (0.88–1.33) | 0.95 (0.81–1.11)
Stroke | 1.11 (0.88–1.39) | 0.95 (≤1.14)* | 0.97 (0.79–1.19)
Heart Failure | 1.27 (1.07–1.51) | 1.07 (0.79–1.46) | 1.00 (0.83–1.20)
CV Mortality | 1.03 (0.87–1.22) | 0.85 (0.66–1.10) | 1.03 (0.89–1.19)
All-cause mortality | 1.11 (0.96–1.27) | 0.88 (0.71–1.09) | 1.01 (0.90–1.14)

*one-sided repeated CI, at an alpha level of 0.01
MACE: major cardiovascular event
3-point MACE: CV death, Myocardial Infarction, or Stroke
4-point MACE: CV death, Myocardial Infarction, Unstable Angina, or Stroke

The trials were designed to assess CV safety of novel agents over placebo in patients with established CVD or high CV risk. With a median follow-up of 1.5-3.8 years and an average of 10,000 patients, all completed RCTs demonstrated CV safety. In the SAVOR-TIMI53 trial, increased rates of hospitalisation for heart failure were observed in the saxagliptin arm compared to placebo, with a HR (95% CI) of 1.27 (1.07, 1.51) [102]. Notably, neither this nor the other CV safety trials included heart failure as a primary or secondary end point [103]. After showing non-inferiority, secondary analyses of the LEADER trial demonstrated superiority of liraglutide over placebo, with a HR (95% CI) for 3-point MACE (CV death, myocardial infarction, or stroke) of 0.87 (0.78, 0.97) [104]. Notably, significantly lower HbA1c (0.4%), body weight (2.3 kg), and systolic blood pressure (1.2 mmHg) were achieved in the liraglutide arm compared to placebo [63].

Table 2.2 Completed cardiovascular outcome trials for GLP-1RA in patients with type 2 diabetes

 | ELIXA | LEADER | SUSTAIN-6 | EXSCEL
Drug | Lixisenatide | Liraglutide | Semaglutide | Exenatide
Primary Endpoint | 4-point MACE | 3-point MACE | 3-point MACE | 3-point MACE
N | 6,068 | 9,340 | 3,297 | 14,752
Follow-up, years | 2.1 | 3.8 | 1.9 | 3.2
Inclusion:
Minimum Age, years | 30 | 50 | 50 | 18
HbA1c, % | ≥7 | ≥7 | ≥7 | 6.5-10.0
Cardiovascular Status | Acute Coronary Syndrome 0-180 days before | Pre-existing CVD or high CV risk | Pre-existing CVD or high CV risk | Pre-existing CVD or high CV risk
Mean at baseline:
BMI, kg/m2 | 30.2 | 32.5 | 31.1 |
Age, years | 60 | 64.3 | 64.6 |
HbA1c, % | 7.7 | 8.7 | 8.7 | 8.0
Outcome, Hazard Ratio (95% CI):
Primary composite | 1.02 (0.89–1.17) | 0.87 (0.78–0.97) | 0.74 (0.58–0.95) | 0.91 (0.83–1.00)
Myocardial infarction | 1.03 (0.87–1.22) | 0.86 (0.73–1.00) | 0.74 (0.51–1.08) | 0.97 (0.85–1.10)
Stroke | 1.12 (0.79–1.58) | 0.86 (0.71–1.06) | 0.61 (0.38–0.99) | 0.85 (0.70–1.03)
Heart Failure | 0.96 (0.75–1.23) | 0.87 (0.73–1.05) | 1.11 (0.77–1.61) | 0.94 (0.78–1.13)
CV Mortality | 0.98 (0.78–1.22) | 0.78 (0.66–0.93) | 0.98 (0.65–1.48) | 0.88 (0.76–1.02)
All-cause mortality | 0.94 (0.78–1.13) | 0.85 (0.74–0.97) | 1.05 (0.74–1.50) | 0.86 (0.77–0.97)

MACE: major cardiovascular event
3-point MACE: CV death, Myocardial Infarction, or Stroke
4-point MACE: CV death, Myocardial Infarction, Unstable Angina, or Stroke
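As a reading aid for the hazard ratios in Tables 2.1 and 2.2, the sketch below classifies an estimate by whether its 95% CI excludes 1, and recovers an approximate two-sided p-value from the reported interval. It assumes a Wald-type interval that is symmetric on the log scale, which is an assumption made here for illustration and not a statement about how each trial computed its intervals. The function names are hypothetical.

```python
import math


def classify_hazard_ratio(lower: float, upper: float) -> str:
    """Classify a hazard ratio by whether its 95% CI excludes 1."""
    if upper < 1.0:
        return "significant risk reduction"
    if lower > 1.0:
        return "significant risk increase"
    return "no significant difference"


def approximate_p_value(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value from a HR and its 95% CI.

    Assumes the CI is symmetric around log(HR), i.e. a Wald interval.
    """
    standard_error = (math.log(upper) - math.log(lower)) / (2 * 1.959964)
    z = math.log(hr) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# (HR, lower, upper) copied from Tables 2.1 and 2.2.
reported = {
    "ELIXA, primary composite": (1.02, 0.89, 1.17),
    "LEADER, primary composite": (0.87, 0.78, 0.97),
    "SUSTAIN-6, primary composite": (0.74, 0.58, 0.95),
    "SAVOR-TIMI53, heart failure": (1.27, 1.07, 1.51),
}

if __name__ == "__main__":
    for label, (hr, lower, upper) in reported.items():
        verdict = classify_hazard_ratio(lower, upper)
        p_value = approximate_p_value(hr, lower, upper)
        print(f"{label}: {verdict} (approximate p = {p_value:.3f})")
```

The classification uses only the interval bounds, so it tests superiority against a null of HR = 1; it does not reproduce the pre-specified non-inferiority analyses of the trials.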

These trials provide very valuable clinical evidence of the CV safety of novel agents. Importantly, none of them was designed to demonstrate CV superiority, and only patients at increased CV risk were recruited in these RCTs; much longer trials would be required for a low-risk population. Also, most of the patients with T2DM were on a background of cardio-protective and lipid-modifying drugs, and those in the placebo group were more likely to receive other (older) agents for treatment intensification.

2.1.2 Non-cardiovascular outcome trials

Analyses of non-CV outcome trials with shorter duration that included patients with much lower CV risk demonstrated CV safety or superiority of incretin-based drugs over comparators. A meta-analysis of 70 trials on DPP-4i with at least 24 weeks of follow-up reported a HR (95% CI) of 0.71 (0.59, 0.86) for major cardiovascular events (MACE) against placebo or other comparators [105]. A meta-analysis of RCTs on patients who used GLP-1RA for a minimum duration of 6 months reported a HR (95% CI) of 0.78 (0.54, 1.13) for MACE against placebo or other comparators [106]. Another recent meta-analysis that included 281 RCTs on treatment with incretin-based drugs for ≥ 12 weeks reported an odds ratio (95% CI) of 0.89 (0.80, 0.99) for the risk of CV events favouring GLP-1RA use against placebo [107]. This meta-analysis also reported an odds ratio (95% CI) of 0.92 (0.83, 1.01) for CV events for DPP-4i against placebo.

In an overview of reviews, Gamble and colleagues assessed the quality of systematic reviews evaluating the safety, efficacy, and effectiveness of incretin-based therapies [108]. A total of 83 pooled treatment effect estimates from 10 systematic reviews on CV outcomes were analysed, although none of the reviews received a high-quality Assessing the Methodological Quality of Systematic Reviews (AMSTAR) score. The study reported that most pooled estimates suggested a potential decreased risk (41 of 45 for DPP-4i and 28 of 38 for GLP-1RA), while only a minority of these (18 of 41 for DPP-4i and 3 of 28 for GLP-1RA) were statistically significant. The authors suggested possible overestimation in the results and possible publication bias in the analysed reviews.

2.2 OBSERVATIONAL STUDIES

Table 2.3 presents a summary of observational studies that explored CV risk of treatment with incretin-based therapies. Overall, conclusions are consistent with CV-outcome trial results – treatment with incretin-based therapies does not increase and possibly reduces CV risk.

Multiple factors such as study design, available data, data management methodology, and statistical approaches make direct comparison of these studies very difficult. Patorno and colleagues (2016) compared CV outcomes of treatment with GLP-1RAs against DPP-4i, SU, and INS under the same study design [109]. GLP-1RA users were matched 1:1 to the other treatment groups (allowing patient overlap), and during 0.8 years of follow-up there were no significant differences in CV risk between the groups. Kannah and colleagues (2016) compared MET+SU, MET+TZD, MET+DPP-4i, and MET+GLP-1RA combinations using a Cox regression approach with the propensity score as an adjustment covariate [110]. While there was no difference in the risk of overall mortality and coronary artery disease between the groups, patients treated with MET+DPP-4i had a higher risk of heart failure than those treated with MET+SU, with a HR (95% CI) of 1.10 (1.04, 1.17). Notably, the methodological approach used in this study is generally not recommended in the statistical literature [111-113]. Zghebi and colleagues (2016), adopting a Cox regression approach weighted with the inverse probability of treatment, observed a non-significant reduction in the risk of major CVD or CV death for second-line DPP-4i users compared to second-line SU users [114]. The same study observed a significant CV risk reduction for second-line TZD users compared to SU users. The most recent observational study of post-metformin second-line GLP-1RA and DPP-4i users reported lower CV risk of treatment with incretin-based therapies compared with SU users, significant for the DPP-4i group but non-significant for the GLP-1RA group [115].
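To illustrate what weighting by the inverse probability of treatment involves, the sketch below computes such weights from treatment indicators and propensity scores. It is a generic illustration and not the implementation used in any of the cited studies: the function name is hypothetical, the propensity scores are toy values, and in practice they would be estimated from baseline covariates (for example with logistic regression) before the weights are passed to a weighted Cox model.

```python
def iptw_weights(treated, propensity, stabilised=True):
    """Return inverse probability of treatment weights.

    treated    -- sequence of 0/1 treatment indicators
    propensity -- sequence of estimated P(treated = 1 | covariates)
    stabilised -- if True, multiply by the marginal probability of the
                  treatment actually received, which reduces weight variance
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    share_treated = sum(treated) / len(treated)
    weights = []
    for indicator, score in zip(treated, propensity):
        if not 0.0 < score < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if indicator == 1:
            weight = 1.0 / score
            if stabilised:
                weight *= share_treated
        else:
            weight = 1.0 / (1.0 - score)
            if stabilised:
                weight *= 1.0 - share_treated
        weights.append(weight)
    return weights


if __name__ == "__main__":
    # Toy data: two treated and two untreated patients.
    toy_treated = [1, 1, 0, 0]
    toy_scores = [0.8, 0.4, 0.5, 0.2]
    print(iptw_weights(toy_treated, toy_scores, stabilised=False))
```

A treated patient with a low propensity score receives a large weight, so the weighted treated group resembles the whole cohort in its measured covariates. This differs from entering the propensity score as an adjustment covariate in the regression model.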

Table 2.3 Summary of observational CV-outcome studies of treatment with incretin-based therapies

Source | Drug; cohort size | Comparator; cohort size | Follow-up (yr) | Conclusion

Comparative effectiveness of incretin-based therapies and the risk of death and cardiovascular events in 38,233 metformin monotherapy users. Gamble et al, 2016 [115] | GLP-1RA (added to MET); 487 | Sulfonylurea (added to MET); 25,916 | 2.7 | non-significant CV risk reduction
 | DPP-4i (added to MET); 6,213 | Sulfonylurea (added to MET); 25,916 | 2.7 | significant CV risk reduction

Comparative risk of major cardiovascular events associated with second-line antidiabetic treatments: a retrospective cohort study using UK primary care data linked to hospitalization and mortality records. Zghebi et al, 2016 [114] | DPP-4i (added to MET); 1,030 | Sulfonylurea (added to MET); 6,740 | 2.4 | non-significant CV risk reduction

Comparative Cardiovascular Safety of Glucagon-Like Peptide-1 Receptor Agonists versus Other Antidiabetic Drugs in Routine Care: a Cohort Study. Patorno et al, 2016 [109] | GLP-1RA (added to MET); 18,658 | DPP-4i (added to MET); 69,807 | 0.8 | no significant difference in CV risk
 | GLP-1RA (added to MET); 14,466 | SU (added to MET); 114,480 | 0.8 | no significant difference in CV risk
 | GLP-1RA (added to MET); 29,343 | Insulin (added to MET); 42,982 | 0.8 | no significant difference in CV risk

The association of the treatment with glucagon-like peptide-1 receptor agonist exenatide or insulin with cardiovascular outcomes in patients with type 2 diabetes: a retrospective observational study. Paul et al, 2015 [116] | GLP-1RA: Exenatide with other OAD; 2,804 | Insulin (concomitant ADD allowed); 28,551 | 3.5 | reduced CV risk
 | GLP-1RA: Exenatide with Insulin; 7,870 | Insulin (concomitant ADD allowed); 28,551 | 3.5 | reduced CV risk

Risk of cardiovascular disease events in patients with type 2 diabetes prescribed the Glucagon-Like Peptide 1 (GLP-1) receptor agonist exenatide twice daily or other glucose-lowering therapies: A retrospective analysis of the lifelink database. Best et al, 2011 [117] | GLP-1RA: Exenatide (concomitant ADD allowed); 21,754 | Other ADD (concomitant ADD allowed); 391,771 | - | reduced CV risk and all-cause hospitalization

Association of Anti-Diabetic Medications Targeting the Glucagon-Like Peptide-1 Pathway and Heart Failure Events in Patients with Diabetes. Velez, 2015 [118] | GLP-1RA or DPP-4i (concomitant ADD allowed); 1,426 | Other ADD (concomitant ADD allowed); 2,798 | 2 | reduced risk of hospitalization for HF, all-cause hospitalization or death

Risk of overall mortality and cardiovascular events in patients with type 2 diabetes on dual drug therapy including metformin: A large database study from the Cleveland Clinic. Kannah et al, 2016 [110] | GLP-1RA (added to MET); 433 | SU (added to MET); 9,419 | 4 | no significant difference in HF events
 | DPP-4i (added to MET); 1,487 | SU (added to MET); 9,419 | 4 | increased risk of HF

Cardiovascular safety of combination therapies with incretin-based drugs and metformin compared with a combination of metformin and sulphonylurea in type 2 diabetes mellitus - a retrospective nationwide study. Mogensen et al, 2014 [119] | GLP-1RA (added to MET); 4,345 | SU (added to MET); 25,092 | 2.3 | non-significant CV risk reduction
 | DPP-4i (added to MET); 11,138 | SU (added to MET); 25,092 | 3 | significant CV risk reduction

Association Between Hospitalization for Heart Failure and Dipeptidyl Peptidase-4 Inhibitors in Patients With Type 2 Diabetes: An Observational Study. Fu et al, 2016 [120] | DPP-4i (concomitant ADD allowed); 109,278 | SU (concomitant ADD allowed); 109,278 | 0.5 | no significant difference in CV risk

Sitagliptin use in patients with diabetes and heart failure: a population-based retrospective cohort study. Weir et al, 2014 [121] | DPP-4i: Sitagliptin + HF (concomitant ADD allowed); 887 | Other ADD + HF (not on Sitagliptin); 6,733 | 1.4 | increased risk of hospitalization for HF

All-cause mortality and cardiovascular effects associated with the DPP-IV inhibitor sitagliptin compared with metformin, a retrospective cohort study on the Danish population. Scheller et al, 2014 [122] | DPP-4i: Sitagliptin (monotherapy); 1,228 | MET (monotherapy); 83,528 | 1.3 | no significant difference in CV risk

Dipeptidyl peptidase-4 inhibitors do not increase the risk of cardiovascular events in type 2 diabetes: a cohort study. Kim et al, 2014 [123] | DPP-4i (concomitant ADD allowed); 39,769 | Other ADD (concomitant ADD allowed); 39,769 | 0.6 | significant CV risk reduction

Sitagliptin and the risk of hospitalization for heart failure: a population-based study. Wang et al, 2014 [124] | DPP-4i (concomitant ADD allowed); 8,288 | Other ADD (concomitant ADD allowed); 8,288 | 1.5 | increased risk of hospitalization for HF

Combination therapy with metformin plus sulphonylureas versus metformin plus DPP-4 inhibitors: association with major adverse cardiovascular events and all-cause mortality. Morgan et al, 2014 [125] | DPP-4i (added to MET); 7,864 | SU (added to MET); 33,983 | - | significant CV risk reduction

2.3 CONCLUSIONS AND IMPLICATIONS

While the completed RCTs clearly demonstrated CV safety of incretin-based therapies in the high-risk population, there is no clear evidence of CV benefits of these therapies. Several meta-analyses of non-CV outcome trials and observational studies supported CV safety of treatment with incretin-based therapies in broader populations. The risk of heart failure with DPP-4i is not completely ruled out, and will remain under more careful monitoring in the future. There is a trend towards CV superiority of treatment with incretin-based therapies, especially with GLP-1RA. While an RCT designed to demonstrate CV superiority of GLP-1RA or DPP-4i over placebo or other comparator is unlikely, a large multi-national observational study with long follow-up could provide strong evidence of comparative CV superiority of one ADD class over others. However, no such study has been conducted for either GLP-1RA or DPP-4i to date.

Given the multi-comorbid profile of patients with T2DM, it is now more urgent to explore whether the introduction of novel drug classes to the market has helped to reduce the glycaemic and CV risk factor burden at the population level. This dissertation is designed to assess such trends and their reflection on the rates of CV events in patients treated with major second-line ADDs.

Chapter 3: Data Description

3.1 CENTRICITY ELECTRONIC MEDICAL RECORDS

Data from Centricity Electronic Medical Records (CEMR) was used in this thesis.

Centricity™ is a brand of 27 healthcare IT software solutions from General Electric Healthcare, which incorporates software for independent physician practices, academic medical centres, hospitals and large integrated delivery networks. It refers to the systematised data collection, storing and secure transmitting of patient health information in a digital format [126].

The Medical Quality Improvement Consortium is a rapidly growing community of over 400 CEMR customers who contribute de-identified clinical data to the CEMR database in order to enable quality improvement, benchmarking, and population-based medical research. The database covers over 35,000 health care providers from all US states, where ~70% are primary care providers.

CEMR database contains patient-level information on demographics (sex, ethnicity, year of birth) and longitudinal entries on anthropometrics, diseases, clinical observations, laboratory results, and medications (Figure 3.1). Variables such as BMI, blood pressure, HbA1c, urine albumin and creatinine, or lipid profiles along with dates and other relevant information are stored in the form of a relational database.

The database extract that captured longitudinal EMRs from January 1995 until October 2014 was used at the initial stages of the project. The database extract that captured patient history for more than 34 million individuals with a mean 3.5 years of follow-up from January 1995 until April 2016, was used to achieve the main results of this project, reported in chapters 7-9.

Figure 3.1. Schematic representation of the data in CEMR database.

Representativeness of CEMR Database

In general, the database is representative of US population in terms of age and ethnic subgroups, however higher proportions of patients from north eastern and mid-western states are represented in the CEMR [127]. The patients’ demographic characteristics in the CEMR database are generally similar to those of the overall US population, with a slight bias towards older, black, female and non-Hispanic. The distribution of CV risk factors was found to be similar to the prospective national health surveys [128]. The representativeness of patients with diabetes is discussed in chapter 5. CEMR has demonstrated its usefulness for various epidemiological and outcome studies. It has been extensively used for academic research worldwide in the fields of diabetes [129-132], CV research [133-135], obesity [136-138], inflammatory diseases [139-142], and other diseases [143-146].

3.2 MEDICATION DATA

Medication data are extensively recorded in the CEMR, including names, doses, the dates of prescriptions, and the number of repeat prescriptions for the whole period of the electronic record (or available follow-up time). CEMR also stores data from patients’ medication lists, which include over-the-counter medications and those received from outside the EMR network. These data contain start/stop dates and specific fields to track treatment alterations.

Dose escalations for individual medications (e.g. increasing doses of MET) are captured. However, the data on dose titration, especially for insulin, are relatively poor in primary care databases. Chapter 4 provides a detailed description of the available data and discusses data-related issues associated with longitudinal information on medication usage from large relational databases such as CEMR. The “chaining” approach (described in chapter 4) was used to extract and aggregate medication information at patient level throughout this project.

3.2.1 Drug identification

As the database captures data from various systems since 1995, medication data were entered in various ways including, but not limited to, Generic Product Identifier codes and the National Drug Codes. CEMR stores the original medication name and, less frequently, the generic name from the EMR source system terminology reference database, as well as normalised names for clinical drugs (RxNorm terminology system), when possible. Therefore, several medication name related fields exist and may include generic name, brand name, free text comments or missing entries. The procedure described below was developed to identify medication keys (unique identifiers) of anti-diabetic and other relevant drugs:

1. Identify highest level in the Anatomical Therapeutic Chemical Classification (ATC) System [147] for a relevant drug category (Table 3.1).

Table 3.1 Therapeutic Class and highest corresponding ATC code

Therapeutic Class | Highest ATC code
Anti-obesity preparations, excluding diet products | A08
Anti-diabetic drug* | A10
Antihypertensive drug | C02
Diuretic drug | C03
Peripheral vasodilator | C04
Beta blocking agent | C07
Calcium channel blocker | C08
Angiotensin-converting-enzyme inhibitor | C09AA and C09B
Angiotensin | C09CA and C09D
Agents acting on the renin-angiotensin system | C09
HMG CoA reductase inhibitors (Statin) | C10AA and C10B
Lipid modifying agents | C10
Antidepressant drugs | N06A
Non-steroidal anti-inflammatory drug | M01A and B01AC06

2. For each therapeutic class obtain a list of generic names browsing lower ATC categories.
3. Search generic names in the official FDA catalogue [148], and create lists of all approved brand names. Link brands of combination products to each generic name.
4. Combine obtained generic and brand names in the list of keywords.
5. Text-mine CEMR to obtain sets of medication keys for each therapeutic class.
6. Manually review obtained lists and exclude inappropriate keys.
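Steps 4 to 6 above can be illustrated with a minimal Python sketch. It is not the code used in this project: the record layout, the medication keys, and the function name `find_medication_keys` are hypothetical, and the exclusion list only mimics the manual review of step 6.

```python
import re

# Hypothetical keyword list for one drug (generic name plus a brand name).
KEYWORDS = ["liraglutide", "victoza"]
# Hypothetical exclusion list standing in for the manual review (step 6).
EXCLUDE = ["saxenda"]


def find_medication_keys(records, keywords, exclude=()):
    """Return sorted medication keys whose name fields mention any keyword.

    records: iterable of (medication_key, name_field_1, name_field_2, ...),
    where name fields may be generic names, brand names, free text or None.
    """
    wanted = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    unwanted = (
        re.compile("|".join(re.escape(k) for k in exclude), re.IGNORECASE)
        if exclude
        else None
    )
    keys = set()
    for key, *name_fields in records:
        text = " ".join(field for field in name_fields if field)
        if wanted.search(text) and not (unwanted and unwanted.search(text)):
            keys.add(key)
    return sorted(keys)


# Illustrative records only; keys and names are invented for this sketch.
records = [
    (101, "LIRAGLUTIDE", "Victoza 18 mg/3 mL pen"),
    (102, None, "victoza"),
    (103, "LIRAGLUTIDE", "Saxenda 3 mg pen"),
    (104, "METFORMIN HCL", None),
]
liraglutide_keys = find_medication_keys(records, KEYWORDS, EXCLUDE)
```

Searching all name fields at once reflects the fact that a given record may carry only a brand name, only a generic name, or free text.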

An illustrative example (steps 1-5) of identifying medication keys for Liraglutide is provided in Figure 3.2.

Therapeutic class “Anti-diabetic drug” included eleven groups: MET, SU, TZD, Alpha glucosidase inhibitor, , Dopamine receptor agonist, Meglitinides, DPP-4i, GLP-1RA, SGLT-2i, and INS. Saxenda (brand of Liraglutide, GLP-1RA) was excluded from the GLP-1RA group as it was approved in 2014 as a weight-lowering medication only [149]. Although Welchol (Colesevelam) was approved for the treatment of T2DM, it is usually prescribed to reduce cholesterol levels; therefore Colesevelam was not considered as an ADD in this project [150].

Angiotensin-converting-enzyme inhibitors, agents acting on the renin-angiotensin system, beta blocking agents, and statins were considered as cardio-protective drugs.


Figure 3.2. Schematic diagram of identifying list of medication keys for Liraglutide.

3.3 DISEASE DATA

CEMR database stores patients’ disease data by means of International Classification of Diseases 9th Revision (ICD-9) codes, International Classification of Diseases 10th Revision (ICD-10) codes, or less frequently with SNOMED Clinical Terms (SNOMED CT) codes. Reliability of diagnosis coding in CEMRs for various diseases has been examined in prior studies [94, 128, 136], therefore diagnostic codes were directly used to identify presence of a disease. Nonetheless, additional advanced techniques were applied to improve the quality of the cohort of patients with diabetes (chapter 5).

The history of disease before baseline of a particular study and disease events during follow-up were constructed using the date of diagnosis of diseases. “Time to event” was calculated as time from baseline till the first available diagnosis date for a particular disease. Disease events included CV disease, chronic kidney disease (CKD) with its stage, cancer, depression and other relevant diseases. Patients with diagnostic codes for bariatric surgery were also identified. CVD was defined as ischaemic heart disease (including myocardial infarction), peripheral vascular/arterial disease, heart failure or stroke.
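The construction of disease history and “time to event” can be sketched as follows. This is a minimal illustration under one stated assumption that the text does not settle: a diagnosis dated on or before baseline counts as history, and only diagnoses strictly after baseline count as follow-up events. Function names and dates are hypothetical.

```python
from datetime import date


def has_history(baseline, diagnosis_dates):
    """True if the disease was diagnosed on or before baseline (assumption)."""
    return any(d <= baseline for d in diagnosis_dates)


def time_to_event_days(baseline, diagnosis_dates):
    """Days from baseline to the first diagnosis after baseline.

    Returns None when no diagnosis is recorded during follow-up.
    """
    events = [d for d in diagnosis_dates if d > baseline]
    if not events:
        return None
    return (min(events) - baseline).days


# Hypothetical patient: one diagnosis before baseline, two during follow-up.
baseline = date(2010, 1, 1)
diagnoses = [date(2009, 6, 1), date(2011, 1, 1), date(2012, 3, 5)]
history = has_history(baseline, diagnoses)
days_to_first_event = time_to_event_days(baseline, diagnoses)
```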

3.3.1 Charlson Comorbidity Index

While controlling for comorbidities that may affect study outcome is essential, adjusting for a large number of possible comorbidities may be problematic from clinical and methodological points of view [151, 152]. Rather than adjusting for the effect of each comorbidity, several methods have been proposed to control for overall comorbidity burden [151, 153, 154]. The Charlson comorbidity index (CCI) is the most widely used comorbidity index in the medical literature [155, 156].

CCI was developed to predict 1-year mortality in a cohort of 604 patients admitted to a New York teaching hospital during 1 month in 1984. The validation of CCI was performed on a cohort of 685 breast cancer patients admitted to a Connecticut teaching hospital from 1962 to 1969 [151, 152]. Weights (1, 2, 3, or 6) for CCI score computation were created by assessing adjusted hazard ratios for each predefined comorbidity from a Cox proportional hazards regression model [151, 157].

Since CCI was introduced, it has been extensively validated in cohorts with different diseases. Also, numerous adaptations of this index were developed, including adaptations for administrative databases [152, 158, 159]. In this project, the algorithm recommended by Quan and colleagues [160] was used to identify ICD-9 and ICD-10 codes for diseased cohorts (except diabetes, Table 3.2). Quan and colleagues expanded ICD-9 codes of Deyo CCI [161] and identified corresponding ICD-10 codes for each comorbidity. Multiple physicians were actively involved through all stages of the algorithm development and reached consensus on the final lists.

To follow the algorithm proposed by Quan and colleagues, SNOMED CT codes were translated to ICD-10 codes using the mapping released in July 2016 by US National Library of Medicine (version 20160301) [162]. CCI score at baseline was calculated using original weights, as presented in Table 3.2.

Table 3.2 Diseases, ICD codes, and Weights used to compute Charlson Comorbidity Index

Disease | ICD-9 | ICD-10 | Weight
Myocardial Infarction | 410.x, 412.x | I21.x, I22.x, I25.2 | 1
Congestive heart failure | 398.91, 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, 404.93, 425.4–425.9, 428.x | I09.9, I11.0, I13.0, I13.2, I25.5, I42.0, I42.5–I42.9, I43.x, I50.x, P29.0 | 1
Peripheral vascular disease | 093.0, 437.3, 440.x, 441.x, 443.1–443.9, 47.1, 557.1, 557.9, V43.4 | I70.x, I71.x, I73.1, I73.8, I73.9, I77.1, I79.0, I79.2, K55.1, K55.8, K55.9, Z95.8, Z95.9 | 1
Cerebrovascular disease | 362.34, 430.x–438.x | G45.x, G46.x, H34.0, I60.x–I69.x | 1
Dementia | 290.x, 294.1, 331.2 | F00.x–F03.x, F05.1, G30.x, G31.1 | 1
Chronic pulmonary disease | 416.8, 416.9, 490.x–505.x, 506.4, 508.1, 508.8 | I27.8, I27.9, J40.x–J47.x, J60.x–J67.x, J68.4, J70.1, J70.3 | 1
Rheumatic disease | 446.5, 710.0–710.4, 714.0–714.2, 714.8, 725.x | M05.x, M06.x, M31.5, M32.x–M34.x, M35.1, M35.3, M36.0 | 1
Peptic ulcer disease | 531.x–534.x | K25.x–K28.x | 1
Mild liver disease | 070.22, 070.23, 070.32, 070.33, 070.44, 070.54, 070.6, 070.9, 570.x, 571.x, 573.3, 573.4, 573.8, 573.9, V42.7 | B18.x, K70.0–K70.3, K70.9, K71.3–K71.5, K71.7, K73.x, K74.x, K76.0, K76.2–K76.4, K76.8, K76.9, Z94.4 | 1
Moderate or severe liver disease | 456.0–456.2, 572.2–572.8 | I85.0, I85.9, I86.4, I98.2, K70.4, K71.1, K72.1, K72.9, K76.5, K76.6, K76.7 | 3
Diabetes without chronic complication | Identified as described in 0 | | 1
Diabetes with chronic complication | 250.4–250.7 | E10.2–E10.5, E10.7, E11.2–E11.5, E11.7, E12.2–E12.5, E12.7, E13.2–E13.5, E13.7, E14.2–E14.5, E14.7 | 2
Hemiplegia or paraplegia | 334.1, 342.x, 343.x, 344.0–344.6, 344.9 | G04.1, G11.4, G80.1, G80.2, G81.x, G82.x, G83.0–G83.4, G83.9 | 2
Renal disease | 403.01, 403.11, 403.91, 404.02, 404.03, 404.12, 404.13, 404.92, 404.93, 582.x, 583.0–583.7, 585.x, 586.x, 588.0, V42.0, V45.1, V56.x | I12.0, I13.1, N03.2–N03.7, N05.2–N05.7, N18.x, N19.x, N25.0, Z49.0–Z49.2, Z94.0, Z99.2 | 2
Cancer | 140.x–172.x, 174.x–195.8, 200.x–208.x, 238.6 | C00.x–C26.x, C30.x–C34.x, C37.x–C41.x, C43.x, C45.x–C58.x, C60.x–C76.x, C81.x–C85.x, C88.x, C90.x–C97.x | 2
Metastatic solid tumour | 196.x–199.x | C77.x–C80.x | 6
AIDS/HIV | 042.x–044.x | B20.x–B22.x, B24.x | 6

Note: original table source [160], weights in [157].

As recommended by Quan and colleagues (2005), cancer was considered as any malignancy, including lymphoma and leukaemia, and excluding malignant neoplasm of the skin [160]. In cases where moderate or severe liver disease was present for a patient, mild liver disease did not contribute to the CCI score. Similarly, if a record of diabetes with chronic complications was present, diabetes without chronic complications did not contribute to the CCI score computation. Finally, if a patient had a record of metastatic solid tumour, cancer did not affect the CCI score.
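The scoring rules above can be written down as a short Python sketch. It assumes that each patient's comorbidities have already been identified from the ICD codes in Table 3.2; the condition labels and the function name `cci_score` are hypothetical, while the weights and the three precedence rules are those stated in the text.

```python
# Weights as listed in Table 3.2; the dictionary keys are illustrative labels.
WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "moderate_severe_liver_disease": 3,
    "diabetes_without_complication": 1,
    "diabetes_with_complication": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "cancer": 2,
    "metastatic_solid_tumour": 6,
    "aids_hiv": 6,
}

# (more severe condition, milder condition that stops contributing to the score)
PRECEDENCE = [
    ("moderate_severe_liver_disease", "mild_liver_disease"),
    ("diabetes_with_complication", "diabetes_without_complication"),
    ("metastatic_solid_tumour", "cancer"),
]


def cci_score(conditions):
    """Sum the weights of the recorded conditions, applying the precedence rules."""
    present = set(conditions)
    unknown = present.difference(WEIGHTS)
    if unknown:
        raise ValueError(f"Unknown condition labels: {sorted(unknown)}")
    for severe, mild in PRECEDENCE:
        if severe in present:
            present.discard(mild)
    return sum(WEIGHTS[c] for c in present)


# Hypothetical patient: mild liver disease and cancer are superseded.
example_score = cci_score(
    [
        "mild_liver_disease",
        "moderate_severe_liver_disease",
        "cancer",
        "metastatic_solid_tumour",
        "renal_disease",
    ]
)
```

In this example only moderate or severe liver disease (3), metastatic solid tumour (6) and renal disease (2) contribute.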

3.4 LABORATORY, CLINICAL, AND ANTHROPOMETRIC DATA

Longitudinal observations on laboratory, clinical, and anthropometric data are extensively recorded in the CEMR. These data are usually entered repeatedly throughout the whole period of the electronic record (available follow-up) for an individual patient. The data used during this project included: HbA1c, fasting/random blood glucose, low-density lipoprotein (LDL), high-density lipoprotein, triglycerides, systolic blood pressure (SBP), diastolic blood pressure, heart rate, urine microalbumin/creatinine ratio, serum creatinine, body mass index (BMI), weight, and tobacco use status. Extensive data validation and cleaning techniques were applied prior to data extraction and all measurements were converted to standard or most frequently used units.

3.4.1 Arranging longitudinal measures

For individual patients, the longitudinal laboratory, clinical and anthropometric data were arranged in 6-monthly windows: ±3 months either side of the baseline of a particular study and progressively further on (Figure 3.3). The risk factor measure closest to the middle of the window (or the average of multiple measurements if available within that window) was preserved as the observed measure for this window. For baseline HbA1c data, the closest measurement on or within 3 months prior to baseline was used as the baseline measurement. The six-monthly longitudinal follow-up data for HbA1c followed the same principle described above.
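A minimal Python sketch of the windowing rule is given below. It is an illustration only and rests on two assumptions not fixed by the text: a month is approximated by 365.25/12 days, and averaging is applied only when several measurements are equally close to the window centre. The special rule for baseline HbA1c is not implemented. The function name and example values are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length in days


def arrange_in_windows(baseline, measurements, n_windows):
    """Assign one observed value (or None) to each 6-monthly window.

    measurements: list of (measurement_date, value) pairs.
    Window k is centred 6*k months after baseline and spans +/- 3 months.
    """
    half_width = 3 * DAYS_PER_MONTH
    observed = []
    for k in range(n_windows):
        centre = 6 * k * DAYS_PER_MONTH  # days from baseline
        candidates = []
        for measured_on, value in measurements:
            offset = (measured_on - baseline).days
            if centre - half_width <= offset < centre + half_width:
                candidates.append((abs(offset - centre), value))
        if not candidates:
            observed.append(None)  # missing window, to be imputed later
            continue
        nearest = min(distance for distance, _ in candidates)
        values = [value for distance, value in candidates if distance == nearest]
        observed.append(sum(values) / len(values))
    return observed


# Hypothetical HbA1c series: two values around baseline, none near 6 months,
# one near 12 months.
series = [
    (date(2011, 12, 15), 8.0),
    (date(2012, 2, 20), 7.6),
    (date(2013, 1, 5), 7.0),
]
windows = arrange_in_windows(date(2012, 1, 1), series, n_windows=3)
```

The `None` entry corresponds to a window without any measurement, such as window “6M” in Figure 3.3.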

Figure 3.3. Schematic diagram of arranging longitudinal risk factor data.

Missing data (example: Figure 3.3, window “6M”) were imputed with the Multiple Imputation Monte Carlo Markov Chain approach, after extensive assessments of the missingness patterns and comparison of several imputation techniques, as described in chapter 6.

3.4.2 Tobacco use status

Longitudinal free text inputs are also available in the CEMR. Tobacco status included status on any type of tobacco use: cigars, pipe, cigarettes, chewing tobacco or snuff. The majority of records providing such information followed standard coding practice, and were in the form of “current” / “former” / “never” smoker. The remaining records (>80,000) were classified into these 3 categories by creating classification rules upon manual review of the entered free text. For example: if the description includes the keywords “trying” and “quit”, classify as “current”. Occasional smokers were classified as “current”. In case of discordant same-day statuses, priority was given to “current” status, then to “former” and lastly to “never” status. Records indicating “never” status were disregarded if a previous record of “current” or “former” status existed. For each patient, the last status recorded on or prior to a particular analysis baseline was considered as the tobacco use status. Nonetheless, a large number of patients with T2DM appeared not to have a record for the tobacco use status.
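The status rules above can be sketched in Python as follows. This is an illustrative reading of the rules, not the project code: the function names and the example records are hypothetical, and only the single free-text rule quoted in the text is shown.

```python
from datetime import date

# Lower number = higher priority among discordant same-day statuses.
PRIORITY = {"current": 0, "former": 1, "never": 2}


def classify_free_text(text):
    """Apply the one example rule from the text; other rules are omitted."""
    lowered = text.lower()
    if "trying" in lowered and "quit" in lowered:
        return "current"
    return None


def tobacco_status(records, baseline):
    """Return the tobacco status at baseline, or None if nothing is recorded.

    records: iterable of (record_date, status) with status in PRIORITY.
    """
    # Keep one status per day, preferring current > former > never.
    by_day = {}
    for day, status in records:
        if day > baseline:
            continue
        if day not in by_day or PRIORITY[status] < PRIORITY[by_day[day]]:
            by_day[day] = status

    result = None
    ever_used = False
    for day in sorted(by_day):
        status = by_day[day]
        if status == "never" and ever_used:
            continue  # "never" is disregarded after earlier tobacco use
        if status != "never":
            ever_used = True
        result = status
    return result


# Hypothetical patient history.
history = [
    (date(2010, 1, 1), "never"),
    (date(2011, 5, 1), "former"),
    (date(2011, 5, 1), "current"),  # same-day conflict: "current" wins
    (date(2012, 2, 1), "never"),    # disregarded: earlier "current" exists
    (date(2013, 1, 1), "former"),   # after baseline: ignored
]
status_at_baseline = tobacco_status(history, date(2012, 6, 1))
```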

3.5 ETHICS APPROVAL

This thesis involved the use of existing data, where the subjects could not be identified directly or through identifiers linked to the subjects. Thus, according to the US Department of Health and Human Services Exemption 4 (45 CFR 46.101(b)(4)), this study is exempt from ethics approval from an institutional review board and informed consent.

Chapter 4: Medication Data Extraction

Statement of Contribution of Co-Authors for Thesis by Published Paper

The authors listed below have certified* that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Olga Montvida, Ognjen Arandjelović, Edward Reiner, and Sanjoy K. Paul. Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records. Open Bioinformatics, 2017, 10:1-15. DOI: 10.2174/1875036201709010001.x.

Contributor | Statement of Contribution*
Olga Montvida (QUT Verified Signature, 29.06.2018) | Conceived the idea, was responsible for the primary design of the study and the methodological developments. Conducted the data extraction and statistical analyses. Developed first draft and contributed towards development of the manuscript.
Ognjen Arandjelović | Evaluated the methodological approach and contributed towards development of the manuscript.
Edward Reiner | Evaluated the methodological approach and contributed towards development of the manuscript.
Sanjoy K. Paul | Conceived the idea, was responsible for the primary design of the study and the methodological developments. Contributed to the statistical analyses. Developed first draft and contributed towards development of the manuscript.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship.

QUT Verified Signature

Sanjoy Ketan Paul 29.06.2018 Name Signature Date


The Open Bioinformatics Journal, 2017, 10, 1-15


DOI: 10.2174/1875036201709010001

RESEARCH ARTICLE

Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records

Olga Montvida1,2, Ognjen Arandjelović3, Edward Reiner4 and Sanjoy K. Paul5,*

1 Clinical Trials and Biostatistics Unit, QIMR Berghofer Medical Research Institute, Brisbane, Australia
2 School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Faculty of Health, Queensland University of Technology, Brisbane, Australia
3 School of Computer Science, University of St. Andrews, St. Andrews, United Kingdom
4 Smart Analyst Inc., New York, United States of America
5 Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia

Received: March 27, 2017 Revised: May 06, 2017 Accepted: May 12, 2017

Abstract:

Background: Electronic Medical Records (EMRs) from primary/ambulatory care systems present a new and promising source of information for conducting clinical and translational research.

Objectives: To address the methodological and computational challenges in order to extract reliable medication information from raw data which is often complex, incomplete and erroneous. To assess whether the use of specific chaining fields of medication information may additionally improve the data quality.

Methods: Guided by a range of challenges associated with missing and internally inconsistent data, we introduce two methods for the robust extraction of patient-level medication data. The first method relies on chaining fields to estimate duration of treatment (“chaining”), while the second disregards chaining fields and relies on the chronology of records (“continuous”). The Centricity EMR database was used to estimate treatment duration with both methods for two widely prescribed drugs among type 2 diabetes patients: insulin and glucagon-like peptide-1 receptor agonists.

Results: At the individual patient level, the “chaining” approach could identify the treatment alterations longitudinally and produced more robust estimates of treatment duration for individual drugs, while the “continuous” method was unable to capture those dynamics. At the population level, both methods produced similar estimates of average treatment duration; however, notable differences were observed at the individual patient level.

Conclusion: The proposed algorithms explicitly identify and handle longitudinal erroneous or missing entries and estimate treatment duration with specific drug(s) of interest, which makes them a valuable tool for future EMR based clinical and pharmaco-epidemiological studies. To improve accuracy of real-world based studies, implementing chaining fields of medication information is recommended.

* Address correspondence to this author at the Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia; Tel: +61 3 93428433; Fax: +61 3 93428780



Keywords: Electronic medical records, Treatment duration, Data mining, Type 2 diabetes, Rule-based algorithm, Patient-level data aggregation.

1. INTRODUCTION

The electronic medical records (EMRs) and the administrative data from the primary/ambulatory care systems are increasingly being used in epidemiological [1 - 3], pharmaco-epidemiological [4 - 6], pharmaco-vigilance [7 - 9], clinical outcome [5, 10 - 12], health economic [13, 14] and public health related studies [15 - 18]. Analyses of large primary care based EMRs from various countries, most notably from UK, USA and Sweden, have provided significant insight into the effectiveness of changes in health care practices/policies on overall disease and health management costs [3, 15, 19, 20], in addition to population level evidence on the safety and effectiveness of various therapies and the association of disease-related risk factors with long-term outcomes [5, 6, 18, 21 - 23]. Increasing use of such large real-world patient-level data is illustrated well by the sixfold increase in EMR based published studies since 2000 [10, 24]. In structured EMRs, especially from the primary/ambulatory care systems, comprehensive patient level data are captured on different domains simultaneously and stored in the form of a relational database [25, 26]. Representative examples include the UK Clinical Practice Research Database and Centricity™ EMR (CEMR) database of USA [27, 28]. The extraction, quality control and management of such voluminous longitudinal data under individual study protocols is highly methodologically and computationally involved, and challenging from data mining and analytical viewpoints [22, 29]. Data science generally considers that data preparation tasks consume about 80% of the total project timeline, leaving only 20% for the ultimate analysis itself [30, 31]. Data completeness, systematic biases, reproducibility and quality are some of the notable limitations in such databases [18, 29, 32].
Most EMR databases capture large amounts of detailed information on medications provided to individuals over time, while the specific form in which this information is stored varies from database to database [26]. It is usually possible to obtain the drug class, specific brand name within the corresponding class, prescription dates, dosage, and number of refills [32]. However, a significant number of entries for an individual prescription may be missing or contain errors. The problem with information completeness can also arise when the medication nomenclature is not correctly matched [29]. Clinical and pharmaco-epidemiological studies, which rely on the data from EMRs, are often interested in the effectiveness of specific therapies, therapeutic dynamics, treatments with concomitant medications, and durations thereof in specific disease areas. Such real-world analysis provides an extremely valuable means for the understanding of drug utilization patterns, treatment initiation periods following the diagnosis of a disease, the effectiveness of specific therapies on disease-related risk factors, and possible associations of therapies with long-term outcomes [1, 6]. These studies warrant appropriate extraction of longitudinal information on prescriptions or medications at individual patient level; inappropriate extraction of the data may result in misleading inferences [33 - 35]. Generally, pharmaco-epidemiological studies do not estimate treatment duration, but only account for the fact of one or more prescriptions for a particular drug(s) [36, 37]. Some studies calculated medication duration by subtracting the first prescription date from the last prescription date [38, 39], and only a few studies additionally considered a drug as discontinued if the subsequent prescription was not refilled within the expected time of drug coverage [40, 41].
While some studies have discussed the challenges in the analysis of medication data from EMRs [18, 42], to the best of our knowledge no existing study has analysed the quality, consistency, and completeness of EMR prescription information, nor proposed a practical algorithm able to extract salient medication information from large and complex longitudinal data sets [43]. The aims of this explanatory and methodological study are (1) to discuss and analyse the most pressing challenges encountered by computer based methods in the process of extracting and aggregating longitudinal medication data from EMRs, (2) to describe two algorithms to extract prescription information of individual therapies and to estimate the corresponding duration of treatment, and (3) to discuss how estimates of individual medication duration are affected by the choice of the study design. The algorithms are compared on a cohort of patients with a clinical diagnosis of type 2 diabetes (T2DM) using a real-world EMR database collected across the USA.

2. MATERIALS AND METHODS

2.1. Centricity Electronic Medical Records

The CEMR database contains more than 40 million patients’ clinical/treatment records from 1995. CEMR represents 49 US states and a variety of ambulatory medical practices, including solo practitioners, community clinics, academic medical centres, and large integrated delivery networks. The database has been extensively used for academic research worldwide [3, 37, 44 - 47]. The CEMR database consists of over 30,000 health care providers, of whom approximately 70% are primary care providers. For both insured and uninsured patients, this database contains comprehensive patient-level information on many aspects including demographic information, laboratory results, history of diseases, clinical diagnosis of symptoms/diseases, vital signs, history of medications and detailed information on the ongoing medications. For this study we used longitudinal information from January 1995 to October 2014.

2.2. Medication Data in Centricity EMR database

The medications taken by an individual (medication domain) and the prescriptions for drugs provided to the individuals by the service provider registered within the EMR system (prescription domain) are extensively documented in the database by means of three tables: medication dimension (MD), medication fact (MF) and prescription fact (PF). The MF and PF belong to the medication and prescription domains respectively. The MF may include a broader list of all medications that a patient is taking including over the counter medications, herbal remedies and medications prescribed by a provider that may be out of the EMR network. MD is linked to both MF and PF. Each record in the MD contains information on individual drug, which includes the National Drug Code (NDC) and Generic Product Identifier (GPI), as well as the four ordered attributes derived from the GPI such as generic drug names. The MD also includes the medication doses corresponding to different brands’ products, identified by a unique medication key value assigned to each record. The entries in MF capture individual patient’s medication prescription history and active prescriptions from all practitioners including the service provider registered within EMR system. It contains several special fields to track longitudinal patterns, such as active medication flag, which indicates if a patient was taking the drug at the database extraction moment. Active medication list is identified by records with value “Y” of active flag. The chain identification (ID) values facilitate tracking of treatment alterations (including the addition of new medications) over time, with the related chain sequence values which track medication adjustments within the same chain ID. The initiation (‘start’) and cessation (‘stop’) dates associated with different treatments are also stored in the MF.
However, we found that the corresponding values are missing with alarming frequencies: 67% of the cases for the former and 11% for the latter. Also, some of the start and stop date entries could be erroneous, such as a stop date preceding the start date. An excerpt from the MF for an individual patient is shown in Table 1.

Table 1. Snapshot of MF table – treatment intensification.

GPI category 4 Medication key (M) Patient key (P) Create date Start date Stop date Active Chain ID Chain (C) (B) (S) flag (F) (H) seq (G)
METFORMIN HCL 41467 288859 6-May-09 6-May-09 N 307667619 0
METFORMIN HCL 41467 288859 11-Jun-10 11-Jun-10 N 307667619 1
METFORMIN HCL 41467 288859 25-Apr-11 11-Jun-10 25-Apr-11 N 307667619 2
LIRAGLUTIDE 3347202 288859 25-Apr-11 25-Apr-11 N 812855070 0
LIRAGLUTIDE 3347202 288859 10-May-11 10-May-11 N 812855070 1
LIRAGLUTIDE 3347202 288859 10-May-11 10-May-11 N 820957274 0
LIRAGLUTIDE 3347202 288859 14-Dec-11 10-May-11 14-Dec-11 N 820957274 1
LIRAGLUTIDE 3347202 288859 14-Dec-11 10-May-11 14-Dec-11 N 812855070 2
682327 288859 27-Feb-12 N 1092145628 0
INSULIN ISOPHANE HUMAN 682834 288859 27-Feb-12 N 1092145627 0
INSULIN GLARGINE 682327 288859 26-Sep-12 26-Sep-12 N 1092145628 1
INSULIN ISOPHANE HUMAN 682834 288859 14-Nov-12 N 1092145627 1
INSULIN GLARGINE 682327 288859 14-Nov-12 Y 1092145628 2
(HUMAN) 682825 288859 26-Feb-14 26-Feb-14 Y 1092145627 2

The entries in the PF capture the prescription date and the associated number of refills only for medications that have been prescribed by the responsible provider within the EMR network. The MF dataset contains a broader set of entry sources; moreover, the form of recording potentially comprises more details than the corresponding data in the PF. Nevertheless, it was determined that the PF may contain unique entries that are not stored in the MF. Therefore, the MF was considered as the primary source of medication information and the PF as a complementary one.
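To make the role of the chaining fields and the date problems more concrete, the following Python sketch groups MF-like records by chain ID, orders them by chain sequence, and flags missing or inconsistent start/stop dates. It is a minimal illustration only: the `MFRecord` layout, the function names, and all values are hypothetical and do not reproduce the CEMR schema or the algorithms proposed in this paper.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple, Optional


class MFRecord(NamedTuple):
    """Simplified, hypothetical stand-in for one medication fact record."""
    drug: str
    create_date: date
    start_date: Optional[date]
    stop_date: Optional[date]
    chain_id: int
    chain_seq: int


def group_by_chain(records):
    """Return {chain_id: [records ordered by chain sequence]}."""
    chains = defaultdict(list)
    for record in records:
        chains[record.chain_id].append(record)
    for chain in chains.values():
        chain.sort(key=lambda r: r.chain_seq)
    return dict(chains)


def date_issue(record):
    """Describe a start/stop date problem, or return None if none is found."""
    if record.start_date is None and record.stop_date is None:
        return "start and stop missing"
    if record.start_date is None:
        return "start missing"
    if record.stop_date is None:
        return "stop missing"
    if record.stop_date < record.start_date:
        return "stop precedes start"
    return None


# Invented example records (not taken from Table 1).
records = [
    MFRecord("DRUG A", date(2011, 12, 14), date(2011, 5, 10), date(2011, 12, 14), 1001, 2),
    MFRecord("DRUG A", date(2011, 4, 25), None, None, 1001, 0),
    MFRecord("DRUG A", date(2011, 5, 10), date(2011, 5, 10), None, 1001, 1),
    MFRecord("DRUG B", date(2012, 2, 27), date(2012, 3, 1), date(2012, 2, 1), 1002, 0),
]
chains = group_by_chain(records)
issues_chain_1001 = [date_issue(r) for r in chains[1001]]
```

Ordering records within a chain makes it possible to follow adjustments to one treatment over time, while the date check identifies entries that need correction before any duration is estimated.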

52 4 The Open Bioinformatics Journal , 2017, Volume 10 Montvida et al.

3. METHODS
In this section, we introduce a novel algorithm for mining large-scale longitudinal EMRs with the ultimate goal of estimating the duration of treatment of a particular individual with the drug(s) of interest. The first method we introduce (“chaining”) relies on the chain ID and chain sequence values recorded in the MF. This feature makes it possible to account for treatments that include alternative drug use. To assess the importance and power of longitudinal chain information, we also describe a modification of the “chaining” method (“continuous”) which disregards chain ID and chain sequence values and instead relies only on the chronology of the patient’s records for the particular drug(s). In the current literature, the latter approach is used more frequently.

3.1. Data Pre-processing: Auxiliary Fields
Although erroneous entries generally cannot be identified, various types of global consistency rules may be applied to reduce the error. The chronology of events may be corrected by incorporating two additional fields: the patient’s last available follow-up date and the patient’s date of birth (DOB). The CEMR database stores the last available follow-up date for each patient. As an initial data pre-processing step, erroneous follow-up date entries were identified and corrected using the latest record creation date across all activities within the database for the corresponding patient. As in many anonymized EMRs, the exact DOB was not available within CEMR. A simple procedure was applied to approximate the DOB:

1. Obtain multiple DOB estimates per patient by subtracting the reported ‘valid’ age from the record creation date for all activities within the database. CEMR groups patients older than 80 years under a single age key. The non-missing age data and the non-80+ age keys were considered ‘valid’ age entries.
2. Approximate the DOB as the minimum of all estimates from Step 1.
3. For patients without reported activities, estimate the DOB from the dataset containing demographic information by subtracting the reported ‘valid’ age from the database extraction date.
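The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `"80+"` marker, the function names and the representation of activities as `(record creation date, reported age)` pairs are assumptions, and the demographic fallback is applied whenever no valid estimate exists.

```python
from datetime import date

AGE_80_PLUS = "80+"  # hypothetical marker for the grouped "older than 80" age key


def is_valid_age(age):
    """A 'valid' age is non-missing and is not the grouped 80+ key."""
    return age is not None and age != AGE_80_PLUS


def subtract_years(d, years):
    """Shift a date back by whole years (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def approximate_dob(activities, demographic_age=None, extraction_date=None):
    """activities: iterable of (record_creation_date, reported_age) pairs."""
    # Step 1: one DOB estimate per activity with a valid age.
    estimates = [subtract_years(created, age)
                 for created, age in activities if is_valid_age(age)]
    if estimates:
        return min(estimates)  # Step 2: minimum of all estimates
    # Step 3: fall back to the demographic record and the extraction date.
    if is_valid_age(demographic_age) and extraction_date is not None:
        return subtract_years(extraction_date, demographic_age)
    return None


example = [(date(2010, 5, 1), 60), (date(2012, 3, 15), 62), (date(2013, 1, 1), "80+")]
dob = approximate_dob(example)
```

Taking the minimum gives the earliest birth date compatible with all reported ages, which is the natural choice when ages are recorded in completed years.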

The parameters for the mathematical formulations are defined in Table 2 below.
Table 2. Mathematical Formulation

Scalars
| Symbol | Meaning |
|---|---|
| $n$ | number of records in MF table |
| $k$ | number of records in PF table |
| $sd$ | standard prescription duration for individual drug |
| $mx$ | maximal number of prescription refills for individual drug |
| $u$ | number of unique patient keys in the cohort of interest |

Sets
| Symbol | Meaning |
|---|---|
| $PS = \{ps_1, ps_2, \dots, ps_u\}$ | set of unique patient keys in the cohort of interest |
| $V$ | set of missing values |
| $MS$ | set of medication keys of selected drug(s) |
| $F_Y = \{f_i \mid f_i = \text{"Y"},\ i = 1, \dots, n\}$ | set of active drugs |

MF = {M, P, C, B, S, F, H, G} dataset
| Symbol | Meaning |
|---|---|
| $M = (m_1, m_2, \dots, m_n)^T$ | medication keys for drugs |
| $P = (p_1, p_2, \dots, p_n)^T$ | patient keys |
| $C = (c_1, c_2, \dots, c_n)^T$ | record creation dates |
| $B = (b_1, b_2, \dots, b_n)^T$ | start dates of individual records |
| $S = (s_1, s_2, \dots, s_n)^T$ | stop dates of individual records |
| $F = (f_1, f_2, \dots, f_n)^T$ | active medication flag values |
| $H = (h_1, h_2, \dots, h_n)^T$ | chain identification values |
| $G = (g_1, g_2, \dots, g_n)^T$ | chain sequence values |

PF = {M, P, C, B, R} dataset
| Symbol | Meaning |
|---|---|
| $M = (m_1, m_2, \dots, m_k)^T$ | medication keys for individual prescriptions |
| $P = (p_1, p_2, \dots, p_k)^T$ | patient keys |
| $C = (c_1, c_2, \dots, c_k)^T$ | record creation dates |
| $B = (b_1, b_2, \dots, b_k)^T$ | prescription dates |
| $R = (r_1, r_2, \dots, r_k)^T$ | number of refills for individual prescription |

The scalars sd and mx may be defined on the basis of the standard prescription protocol for individual drugs. The default values of sd = 1 and mx = 24 were used in our analyses. MS may be identified by text-mining the MD dataset. For example, glucagon-like peptide-1 receptor agonists (GLP-1RA) may be identified by searching for “GLP-1 RECEPTOR AGONIST” in the second-order GPI attribute field.
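As a rough illustration of how the set MS might be assembled, the sketch below scans MD-like rows for a phrase in one GPI attribute. The field names (`medication_key`, `gpi_category_2`) and the category strings in the sample rows are hypothetical; only the medication keys and drug names are taken from the tables in this paper.

```python
def select_medication_keys(md_rows, phrase, field="gpi_category_2"):
    """Return the set MS of medication keys whose GPI attribute contains `phrase`."""
    needle = phrase.strip().upper()
    return {
        row["medication_key"]
        for row in md_rows
        if needle in (row.get(field) or "").upper()  # missing attributes never match
    }


# Illustrative MD rows; the second-order category labels are assumptions.
md_rows = [
    {"medication_key": 41467, "gpi_category_4": "METFORMIN HCL",
     "gpi_category_2": "BIGUANIDES"},
    {"medication_key": 3347202, "gpi_category_4": "LIRAGLUTIDE",
     "gpi_category_2": "GLP-1 RECEPTOR AGONISTS"},
    {"medication_key": 1523512, "gpi_category_4": "EXENATIDE",
     "gpi_category_2": "GLP-1 Receptor Agonists"},
    {"medication_key": 682327, "gpi_category_4": "INSULIN GLARGINE",
     "gpi_category_2": None},
]

ms = select_medication_keys(md_rows, "GLP-1 RECEPTOR AGONIST")
```

A case-insensitive substring match is used here because free-text attribute fields are rarely capitalised consistently; a stricter rule may be preferable where category labels are controlled.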

3.2. “Chaining” Method
The algorithm for the first approach to extracting and aggregating data for the estimation of treatment duration is elaborated below.
1. Merge the following to the MF dataset by patient key:
1.1) date of birth $DOB = (db_1, db_2, \dots, db_n)^T$.
1.2) last available follow-up date $L = (l_1, l_2, \dots, l_n)^T$.
The extended MF dataset would be of the form

$$MF^1 = \{M, P, C, B, S, F, H, G, DOB, L\}$$

2. Replace erroneous values of start dates ($b_i \notin V \wedge (b_i < db_i \vee b_i > s_i \vee b_i > l_i)$, $i = 1, \dots, n$) with missing values.
3. Sort by patient key ascending, chain ID ascending within the same patient, and chain sequence descending within the same chain ID.

$$
\begin{aligned}
MF^1:\quad & a)\ p_i \le p_{i+1},\quad i = 1, \dots, n \\
& b)\ h_i \le h_{i+1},\quad \forall i: p_i = p_{i+1} \quad \text{(post a) sorting)} \\
& c)\ g_i \ge g_{i+1},\quad \forall i: h_i = h_{i+1} \wedge p_i = p_{i+1} \quad \text{(post b) sorting)}
\end{aligned}
$$

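4. Set initial values $p_0 = 0$, and approximate individual medication end dates $E = (e_1, e_2, \dots, e_n)^T$ on the basis of the following rules:
4.1) if stop date is not missing, then end date equals stop date.
4.2) else, if active flag is “Y”, then end date equals last follow-up date.
4.3) else, if first unique value of patient key or first unique value of chain ID, and start date is not missing, then end date equals start date plus standard prescription duration.
4.4) else, if first unique value of patient key or first unique value of chain ID, and start date is missing, then end date equals record creation date plus standard prescription duration.
4.5) else, end date equals the create date of the previous record.

$$
\begin{aligned}
e_i ={}& \mathbb{1}_{\{b_i \notin V\}} \cdot \left(\mathbb{1}_{\{p_i \neq p_{i-1}\}} + \mathbb{1}_{\{h_i \neq h_{i-1}\}} - \mathbb{1}_{\{p_i \neq p_{i-1}\}} \cdot \mathbb{1}_{\{h_i \neq h_{i-1}\}}\right) \cdot \left(s_i \cdot \mathbb{1}_{\{s_i \notin V\}} + l_i \cdot \mathbb{1}_{\{f_i \in F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}} + (b_i + sd) \cdot \mathbb{1}_{\{f_i \notin F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}}\right) \\
&+ \mathbb{1}_{\{b_i \in V\}} \cdot \left(\mathbb{1}_{\{p_i \neq p_{i-1}\}} + \mathbb{1}_{\{h_i \neq h_{i-1}\}} - \mathbb{1}_{\{p_i \neq p_{i-1}\}} \cdot \mathbb{1}_{\{h_i \neq h_{i-1}\}}\right) \cdot \left(s_i \cdot \mathbb{1}_{\{s_i \notin V\}} + l_i \cdot \mathbb{1}_{\{f_i \in F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}} + (c_i + sd) \cdot \mathbb{1}_{\{f_i \notin F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}}\right) \\
&+ \mathbb{1}_{\{p_i = p_{i-1}\}} \cdot \mathbb{1}_{\{h_i = h_{i-1}\}} \cdot \left(l_i \cdot \mathbb{1}_{\{f_i \in F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}} + s_i \cdot \mathbb{1}_{\{s_i \notin V\}} + c_{i-1} \cdot \mathbb{1}_{\{f_i \notin F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}}\right),
\end{aligned}
$$

where $\mathbb{1}_{\{\cdot\}}$ is an indicator function:

$$
\mathbb{1}_{\{a = b\}} = \begin{cases} 1, & \text{if } a = b \\ 0, & \text{else} \end{cases}
\qquad
\mathbb{1}_{\{a \in b\}} = \begin{cases} 1, & \text{if } a \in b \\ 0, & \text{else} \end{cases}
$$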

5. Replace values of end dates that fall outside the follow-up interval with the last follow-up date.

$$e_i = e_i \cdot \mathbb{1}_{\{e_i \le l_i\}} + l_i \cdot \mathbb{1}_{\{e_i > l_i\}}$$

6. Delete records if start date is missing and create date is greater than stop date. Reduce the dataset to the set of patients from the cohort of interest and to the set of keys of the selected drug(s).

$$MF^2 = \{MF^1 : p_i \in PS \wedge m_i \in MS \wedge \neg(b_i \in V \wedge c_i > e_i),\ i = 1, \dots, n\}$$

7. Merge the following to the PF set by patient key:
7.1) date of birth $DOB = (db_1, db_2, \dots, db_k)^T$.
7.2) last available follow-up date within the database $L = (l_1, l_2, \dots, l_k)^T$.
The extended PF dataset would take the following form:

$$PF^1 = \{M, P, C, B, R, DOB, L\}$$

8. Replace erroneous prescription dates ($b_i \notin V \wedge (b_i < db_i \vee b_i > l_i)$, $i = 1, \dots, k$) with missing values.
9. If the number of refills is greater than the pre-defined maximal number of possible refills, or negative, or missing, replace it with zero.

$$r_i = r_i \cdot \mathbb{1}_{\{r_i < mx\}} \cdot \mathbb{1}_{\{r_i \ge 0\}} \cdot \mathbb{1}_{\{r_i \notin V\}},\quad i = 1, \dots, k$$

10. Calculate end dates $E = (e_1, e_2, \dots, e_k)^T$ by the following rules:
10.1) if prescription date is not missing, then end date equals the standard duration multiplied by the number of refills plus one, added to the prescription date.
10.2) if prescription date is missing, then end date equals the standard duration multiplied by the number of refills plus one, added to the record creation date.

$$e_i = \left(b_i + (r_i + 1) \cdot sd\right) \cdot \mathbb{1}_{\{b_i \notin V\}} + \left(c_i + (r_i + 1) \cdot sd\right) \cdot \mathbb{1}_{\{b_i \in V\}}$$

11. Update end dates as described in Step 5 ($e_i = e_i \cdot \mathbb{1}_{\{e_i \le l_i\}} + l_i \cdot \mathbb{1}_{\{e_i > l_i\}}$, $i = 1, \dots, k$).
12. Reduce $PF^1$ to the set of patients from the cohort of interest, to the set of patients not in $MF^2$, and to the set of keys of the selected drug(s).

$$PF^2 = \{PF^1 : p_i \in PS \wedge m_i \in MS \wedge (p_i \notin P \subset MF^2),\ i = 1, \dots, k\}$$

13. Append the two datasets, keeping the following fields: patient key, record creation date, start/prescription date and end date. Assume that the new dataset MP contains n' records.

$$MF^3 = \{P, C, B, E\} \subset MF^2$$

$$PF^3 = \{P, C, B, E\} \subset PF^2$$

$$MP = MF^3 \cup PF^3$$

14. Calculate the number (cn) of distinct record creation dates for each patient, and treat missing start dates by the following rules:
14.1) if cn is equal to one, then delete the record.
14.2) if cn is greater than one, replace the missing start date with the record creation date.
15. Sort by patient key ascending, and by start date ascending within the same patient key.

$$
\begin{aligned}
MP:\quad & a)\ p_i \le p_{i+1},\quad i = 1, \dots, n' \\
& b)\ b_i \le b_{i+1},\quad \forall i: p_i = p_{i+1} \quad \text{(post a) sorting)}
\end{aligned}
$$

16. For each unique patient key $ps_j \in PS$, $j = 1, \dots, u$, reduce MP to the set $FN^j$ containing only records with $p_i = ps_j$, $i = 1, \dots, n'$. Assume that the obtained dataset $FN^j$ has n'' rows. Set $e_0 = 0$ and calculate the selected medication duration for the patient, avoiding double counting of overlapping intervals.

$$FN^j:\quad d_j = \sum_{i=1}^{n''} \left( (e_i - b_i) \cdot \mathbb{1}_{\{b_i \ge e_{i-1}\}} + (e_i - e_{i-1}) \cdot \mathbb{1}_{\{b_i < e_{i-1}\}} \cdot \mathbb{1}_{\{e_i \ge e_{i-1}\}} \right)$$

17. Use the medication durations $D = (d_1, d_2, \dots, d_u)^T$ to conduct further research.
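Rules 4.1 to 4.5, the cap in Step 5 and the refill-based end date of Steps 9 to 11 can be sketched as follows. This is an illustrative reading of the rules, not the authors' code: the dictionary field names are hypothetical, sd = 1 is assumed to mean one 30-day month, and Step 9 follows the wording of the text (refills greater than mx are reset to zero). The usage example applies the rules to the EXENATIDE chain of Table 3.

```python
from datetime import date, timedelta

SD = timedelta(days=30)  # assumption: sd = 1 read as one 30-day month


def chaining_end_dates(records, sd=SD):
    """Rules 4.1-4.5 plus the Step 5 cap, for MF-style records given as dicts."""
    ordered = sorted(  # Step 3: patient, chain ID, chain sequence descending
        records, key=lambda r: (r["patient"], r["chain_id"], -r["chain_seq"]))
    result, prev = [], None
    for rec in ordered:
        first_in_chain = (prev is None
                          or prev["patient"] != rec["patient"]
                          or prev["chain_id"] != rec["chain_id"])
        if rec["stop"] is not None:                 # 4.1
            end = rec["stop"]
        elif rec["active"] == "Y":                  # 4.2
            end = rec["last_follow_up"]
        elif first_in_chain:                        # 4.3 / 4.4
            base = rec["start"] if rec["start"] is not None else rec["created"]
            end = base + sd
        else:                                       # 4.5: create date of previous record
            end = prev["created"]
        end = min(end, rec["last_follow_up"])       # Step 5
        result.append({**rec, "end": end})
        prev = rec
    return result


def prescription_end_date(prescribed, created, refills, last_follow_up, sd=SD, mx=24):
    """Steps 9-11 for a single PF-style record."""
    if refills is None or refills < 0 or refills > mx:        # Step 9
        refills = 0
    base = prescribed if prescribed is not None else created  # 10.1 / 10.2
    return min(base + (refills + 1) * sd, last_follow_up)     # Step 11


# EXENATIDE chain 136664552 from Table 3 (last follow-up: 24 January 2008).
FOLLOW_UP = date(2008, 1, 24)
COMMON = dict(patient=15219411, chain_id=136664552, active="N", last_follow_up=FOLLOW_UP)
exenatide = [
    dict(COMMON, chain_seq=0, created=date(2007, 6, 18), start=date(2007, 6, 18), stop=None),
    dict(COMMON, chain_seq=1, created=date(2007, 10, 17), start=date(2007, 10, 15), stop=None),
    dict(COMMON, chain_seq=2, created=date(2008, 1, 7), start=date(2008, 1, 7),
         stop=date(2008, 1, 7)),
]
ends = {r["chain_seq"]: r["end"] for r in chaining_end_dates(exenatide)}
```

Because records are visited in descending chain sequence, the "previous record" in rule 4.5 is the chronologically later entry of the same chain, so each open-ended record is closed by the creation date of its successor.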

3.3. “Continuous” Method
1. Repeat steps 1 and 2 from the “chaining” method, then perform step 6, and treat missing values in $MF^2$ as described in step 14. Assume that the obtained dataset $MF^2$ has $\tilde{n}$ instances.
2. Create the stop date status variable $ST = (st_1, st_2, \dots, st_{\tilde{n}})^T$ on the basis of the following rules:
2.1) if active flag is “Y” and stop date is missing, then stop date status equals 2.
2.2) if stop date is not missing, then stop date status equals 1.
2.3) else 0.

$$st_i = 2 \cdot \mathbb{1}_{\{f_i \in F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}} + 1 \cdot \mathbb{1}_{\{s_i \notin V\}} + 0 \cdot \mathbb{1}_{\{f_i \notin F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}}$$

$$MF^3 = \{M, P, C, B, S, F, H, G, DOB, L, ST\}$$

3. Sort $MF^3$ by patient key ascending, start date descending within the same patient key, and stop date status ascending within the same start dates of the same patient:

$$
\begin{aligned}
MF^3:\quad & a)\ p_i \le p_{i+1},\quad i = 1, \dots, \tilde{n} \\
& b)\ b_i \ge b_{i+1},\quad \forall i: p_i = p_{i+1} \quad \text{(post a) sorting)} \\
& c)\ st_i \le st_{i+1},\quad \forall i: b_i = b_{i+1} \wedge p_i = p_{i+1} \quad \text{(post b) sorting)}
\end{aligned}
$$


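4. Set initial value $p_0 = 0$ and approximate individual medication end dates $E = (e_1, e_2, \dots, e_{\tilde{n}})^T$:
4.1) if stop date is not missing, then end date equals stop date.
4.2) else, if active flag is “Y”, then end date equals last follow-up date.
4.3) else, if first unique patient key, then end date equals start date plus standard duration.
4.4) else, end date equals start date of the previous record.

$$
\begin{aligned}
e_i ={}& \mathbb{1}_{\{p_i \neq p_{i-1}\}} \cdot \left(l_i \cdot \mathbb{1}_{\{f_i \in F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}} + s_i \cdot \mathbb{1}_{\{s_i \notin V\}} + (b_i + sd) \cdot \mathbb{1}_{\{f_i \notin F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}}\right) \\
&+ \mathbb{1}_{\{p_i = p_{i-1}\}} \cdot \left(l_i \cdot \mathbb{1}_{\{f_i \in F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}} + s_i \cdot \mathbb{1}_{\{s_i \notin V\}} + b_{i-1} \cdot \mathbb{1}_{\{f_i \notin F_Y\}} \cdot \mathbb{1}_{\{s_i \in V\}}\right)
\end{aligned}
$$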

5. Perform step 5 from the “chaining” method, and steps 7-11.
6. Reduce $PF^1$ to the set of patients from the cohort of interest, to the set of patients not in $MF^3$, and to the set of keys of the selected drug(s).

$$PF^2 = \{PF^1 : p_i \in PS \wedge m_i \in MS \wedge (p_i \notin P \subset MF^3),\ i = 1, \dots, k\}$$

7. Treat missing values in $PF^2$ as described in step 14 of the “chaining” method.
8. Append the two datasets, keeping the following fields: patient key, record creation date, start/prescription date and end date. Assume that the new dataset MP contains $\tilde{n}$ records.

$$MF^4 = \{P, C, B, E\} \subset MF^3$$

$$PF^3 = \{P, C, B, E\} \subset PF^2$$

$$MP = MF^4 \cup PF^3$$

9. Perform steps 15-17 from the “chaining” method.
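The stop date status and end date rules of the “continuous” method (steps 2 to 4, followed by the cap on the follow-up date) can be sketched in the same style. Field names are hypothetical, sd = 1 is again assumed to be 30 days, and missing start dates are assumed to have been treated already, as step 1 requires. The usage example takes the two EXENATIDE records of Table 4 as the selected drug.

```python
from datetime import date, timedelta

SD = timedelta(days=30)  # assumption: sd = 1 read as one 30-day month


def stop_status(rec):
    """Step 2: 1 if a stop date exists, 2 if active without a stop date, else 0."""
    if rec["stop"] is not None:
        return 1
    return 2 if rec["active"] == "Y" else 0


def continuous_end_dates(records, sd=SD):
    """Steps 3-4 plus the follow-up cap; chain ID and chain sequence are ignored."""
    ordered = sorted(  # Step 3: patient, start date descending, status ascending
        records,
        key=lambda r: (r["patient"], -r["start"].toordinal(), stop_status(r)))
    result, prev = [], None
    for rec in ordered:
        first_for_patient = prev is None or prev["patient"] != rec["patient"]
        if rec["stop"] is not None:            # 4.1
            end = rec["stop"]
        elif rec["active"] == "Y":             # 4.2
            end = rec["last_follow_up"]
        elif first_for_patient:                # 4.3
            end = rec["start"] + sd
        else:                                  # 4.4: start date of previous record
            end = prev["start"]
        end = min(end, rec["last_follow_up"])  # cap at last follow-up
        result.append({**rec, "end": end})
        prev = rec
    return result


# EXENATIDE records from Table 4 (last follow-up: 13 March 2013).
COMMON = dict(patient=64832053, stop=None, active="N", last_follow_up=date(2013, 3, 13))
exenatide = [
    dict(COMMON, start=date(2010, 4, 23)),
    dict(COMMON, start=date(2011, 3, 28)),
]
ends = {r["start"]: r["end"] for r in continuous_end_dates(exenatide)}
```

With chain information ignored, the earlier EXENATIDE record is closed only by the next EXENATIDE start date, which is why the intervening insulin period is absorbed into the estimate.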

4. REMARKS
Identified erroneous entries are declared as missing in Steps 2, 8, and 9 of the “chaining” method. In Step 14, the algorithm counts the number of unique creation dates for the selected drug(s) at patient level. If the obtained number is greater than one, then missing start dates are replaced with record creation dates. In this way, a patient is considered to be taking a particular drug if the medication records were entered in a systematic manner; otherwise the records with missing start dates are disregarded.
As an example, the prescription scenario for anti-diabetes drugs for a patient with type 2 diabetes is presented in Table 1. The treatment was initiated with metformin (METFORMIN HCL) on the 6th of May 2009 and continued until the 25th of April 2011, when a switch to GLP-1RA (LIRAGLUTIDE) was made. With a stop date for GLP-1RA recorded on the 14th of December 2011, the data show a gap in the treatment until the 26th of September 2012, when insulin therapy began. However, a patient with diabetes using GLP-1RA is unlikely to have had a nine-month gap in treatment. Indeed, careful examination of the data leads to the conclusion that insulin treatment started on the 27th of February 2012, as would be estimated by the algorithm.
As mentioned earlier, the MF was considered the primary data source; thus, if at least one record for the selected drug(s) at patient level is present in the MF, both methods disregard entries in the PF. However, if there is no available data in the MF table, the methods append data from the PF.
Assessment of the first marketing date for a particular drug is an example of an additional global consistency audit that is omitted from the description of the methods. For instance, a start date for a GLP-1RA drug must not be prior to April 2005, the date when the first representative (Exenatide) was approved.
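The marketing-date audit mentioned above could be added as one more consistency rule. The sketch below is a hypothetical illustration: the lookup table, the function name and the use of the first day of April 2005 as the cut-off are assumptions, with only the month and year taken from the text.

```python
from datetime import date

# April 2005 is stated in the text; the exact day is an assumption.
FIRST_MARKETED = {"GLP-1RA": date(2005, 4, 1)}


def audit_start_date(start, drug_class, first_marketed=FIRST_MARKETED):
    """Return `start`, or None (missing) if it precedes the first marketing date."""
    limit = first_marketed.get(drug_class)
    if start is not None and limit is not None and start < limit:
        return None  # declared missing, in the same way as other erroneous entries
    return start


checked = [audit_start_date(d, "GLP-1RA")
           for d in (date(2004, 12, 1), date(2007, 6, 18), None)]
```

Declaring the date missing, rather than deleting the record, keeps the entry available to the later rules that fall back on the record creation date.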

5. RESULTS
To evaluate the performance of the described methods, we chose to focus on estimating the duration of treatment with two widely used anti-diabetic drugs, namely GLP-1RA and insulin. In the CEMR database, 1,861,560 patients were identified as having been diagnosed with type 2 diabetes mellitus, as inferred from the assigned ICD-9 codes.

5.1. Case Study 1 As the first case study, we consider a randomly selected patient from the CEMR database, whose relevant treatment details are shown in Table 3. The treatments with EXENATIDE and INSULIN GLARGINE started on the 18th of June 2007. The treatment with EXENATIDE was terminated on the 7th of January 2008, while INSULIN therapy continued until the last recorded follow-up date on the 24th of January 2008 (notice that the treatment is flagged as active, “Y”). In this case, the “chaining” and “continuous” methods produce the same estimates for the durations of the two treatments. Specifically, the estimates corresponding to insulin and GLP-1RA are 7.2 and 6.7 months, respectively.

Table 3. Snapshot of MF table-combining therapies. Patient’s last follow-up date was identified as 24 January 2008.

| GPI category 4 | Medication key (M) | Patient key (P) | Create date (C) | Start date (B) | Stop date (S) | Active flag (F) | Chain ID (H) | Chain seq (G) | End date (“chaining”) | End date (“continuous”) |
|---|---|---|---|---|---|---|---|---|---|---|
| INSULIN GLARGINE | 682327 | 15219411 | 18-Jun-07 | 18-Jun-07 | | N | 136664321 | 0 | 20-Jun-07 | 20-Jun-07 |
| EXENATIDE | 12670645 | 15219411 | 18-Jun-07 | 18-Jun-07 | | N | 136664552 | 0 | 17-Oct-07 | 15-Oct-07 |
| INSULIN GLARGINE | 1096062 | 15219411 | 20-Jun-07 | 20-Jun-07 | | N | 136664321 | 1 | 7-Jan-08 | 7-Jan-08 |
| EXENATIDE | 12670548 | 15219411 | 17-Oct-07 | 15-Oct-07 | | N | 136664552 | 1 | 7-Jan-08 | 7-Jan-08 |
| INSULIN GLARGINE | 1096062 | 15219411 | 7-Jan-08 | 7-Jan-08 | | Y | 136664321 | 2 | 24-Jan-08 | 24-Jan-08 |
| EXENATIDE | 12670548 | 15219411 | 7-Jan-08 | 7-Jan-08 | 7-Jan-08 | N | 136664552 | 2 | 7-Jan-08 | 7-Jan-08 |
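The durations reported for this patient can be reproduced from the start dates and “chaining” end dates in Table 3 using the overlap rule of Step 16. The sketch below implements that formula as written, with each interval compared against the end date of the preceding record. The conversion factor of 30.4375 days per month is an assumption, since the paper does not state how days were converted to months.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumption: average month length, not stated in the paper


def total_days(intervals):
    """Step 16: sum interval lengths, counting overlapping days only once.

    intervals: iterable of (start, end) date pairs for one patient.
    """
    total, prev_end = 0, None
    for start, end in sorted(intervals):       # Step 15: ascending start date
        if prev_end is None or start >= prev_end:
            total += (end - start).days        # no overlap with previous record
        elif end >= prev_end:
            total += (end - prev_end).days     # count only the non-overlapping tail
        # an interval ending before the previous end date contributes nothing
        prev_end = end                         # e_{i-1} in the formula
    return total


# (start date, "chaining" end date) pairs from Table 3.
exenatide = [(date(2007, 6, 18), date(2007, 10, 17)),
             (date(2007, 10, 15), date(2008, 1, 7)),
             (date(2008, 1, 7), date(2008, 1, 7))]
insulin = [(date(2007, 6, 18), date(2007, 6, 20)),
           (date(2007, 6, 20), date(2008, 1, 7)),
           (date(2008, 1, 7), date(2008, 1, 24))]

exenatide_months = round(total_days(exenatide) / DAYS_PER_MONTH, 1)
insulin_months = round(total_days(insulin) / DAYS_PER_MONTH, 1)
```

Under these assumptions the totals are 203 and 220 days, which correspond to the 6.7 and 7.2 months quoted for GLP-1RA and insulin respectively.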

5.2. Case Study 2
As an insightful case study, we consider a patient whose relevant treatment details are shown in Table 4. Since all of the records shown have the same chain ID, it can be concluded that in the period from the 23rd of April 2010 until the 13th of March 2013 the patient was alternating between two therapies, namely GLP-1RA (EXENATIDE) and insulin (INSULIN GLARGINE). This example illustrates the importance of chain ID information, as readily corroborated by comparing the therapy end dates predicted by the “chaining” and “continuous” methods (per-record estimates are shown in the two rightmost columns of Table 4). Because the latter disregards chain ID information, it implicitly assumes that EXENATIDE was taken continuously from the 23rd of April 2010 until the 27th of April 2011, with the last prescription date being the 28th of March 2011. However, treatment with EXENATIDE was terminated on the 29th of December 2010, when a switch to insulin was made. Treatment with insulin continued until the 28th of March 2011, when a switch back to EXENATIDE occurred. This complex pattern of frequent therapy alteration leads to vastly different treatment duration estimates when chain ID information is used (“chaining”) and when it is not (“continuous”). In this particular case, the per-record end dates in Table 4 give a total duration of insulin/EXENATIDE treatment of 5.7/28.9 months under the “chaining” method, compared with 26.5/12.1 months under the “continuous” approach.
Table 4. Snapshot of MF table – switching between therapies. Patient’s last follow-up date was identified as 13 March 2013.

| GPI category 4 | Medication key (M) | Patient key (P) | Create date (C) | Start date (B) | Stop date (S) | Active flag (F) | Chain ID (H) | Chain seq (G) | End date (“chaining”) | End date (“continuous”) |
|---|---|---|---|---|---|---|---|---|---|---|
| EXENATIDE | 1523512 | 64832053 | 23-Apr-10 | 23-Apr-10 | | N | 1002923273 | 0 | 29-Dec-10 | 28-Mar-11 |
| INSULIN GLARGINE | 682327 | 64832053 | 29-Dec-10 | 29-Dec-10 | | N | 1002923273 | 1 | 06-Jan-11 | 06-Jan-11 |
| INSULIN GLARGINE | 682327 | 64832053 | 06-Jan-11 | 06-Jan-11 | | N | 1002923273 | 2 | 28-Mar-11 | 18-Dec-12 |
| EXENATIDE | 1523512 | 64832053 | 28-Mar-11 | 28-Mar-11 | | N | 1002923273 | 3 | 18-Dec-12 | 27-Apr-11 |
| INSULIN GLARGINE | 682327 | 64832053 | 18-Dec-12 | 18-Dec-12 | | N | 1002923273 | 4 | 13-Mar-13 | 13-Mar-13 |
| INSULIN GLARGINE | 682327 | 64832053 | 13-Mar-13 | 13-Mar-13 | | Y | 1002923273 | 5 | 13-Mar-13 | 13-Mar-13 |

5.3. General Analysis
Given our focus on GLP-1RA and insulin, to facilitate further analysis we selected, from the cohort of all T2DM patients, those who at any point in their medical history received treatment with either of the two drugs of interest. Text mining of drug names in the MD table revealed various insulin regimens as well as related devices (e.g. insulin syringe). We found that approximately 30% of the patients in the T2DM cohort received at least one prescription for an insulin drug. Interestingly, a large number of patients (~25,000) were found to have received prescriptions for insulin devices but not for insulin therapy itself. Further exploration of these patients revealed that the average duration of use of these devices in this patient group was 21 months (Table 5), strongly suggesting that there was an accompanying insulin therapy which was not recorded in the stored EMRs. This conclusion is further corroborated by the finding that the mean glycated haemoglobin (HbA1c) level for these patients was 7.8% on the date of the first record associated with the device.
Table 5. Summary statistics on the estimated duration in months of treatment with specific medications in T2DM cohort (n=1,861,560) by “chaining” and “continuous” methods, and the difference in the estimated duration between “chaining” and “continuous” methods.

Columns 2-5: “chaining” method; columns 6-9: “continuous” method; columns 10-13: “chaining” - “continuous”.

| Medication | n (%) | Mean (sd) | (min, max) | Median (IQR) | n (%) | Mean (sd) | (min, max) | Median (IQR) | n (%) | Mean (sd) | (min, max) | Median (IQR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Insulin + device | 588923 (32) | 32.5 (35) | (0, 657.8) | 21.6 (6.5, 46.8) | 591441 (32) | 32.7 (34.9) | (0, 657.8) | 21.8 (6.3, 47.4) | 588923 (32) | -0.2 (4.8) | (-167.8, 183.4) | 0 (0, 0) |
| Insulin only | 563293 (30) | 32.0 (34.9) | (0, 657.8) | 20.8 (6.1, 45.8) | 566014 (30) | 32.2 (34.8) | (0, 657.8) | 21.0 (6, 46.5) | 563293 (30) | -0.3 (5) | (-167.8, 176.9) | 0 (0, 0) |
| no Insulin, but device | 25536 (1) | 21.2 (21.5) | (0, 196.8) | 14.3 (4.8, 30.9) | 25386 (1) | 21.2 (21.9) | (0, 190.7) | 14.1 (4.4, 31.1) | 24910 (1) | -0.2 (5) | (-131.8, 183.4) | 0 (0, 0) |
| GLP1RA | 113416 (6) | 18.3 (19.4) | (0, 110.7) | 11.7 (3.9, 26) | 114316 (6) | 19.2 (21.0) | (0, 111.7) | 11.7 (3.5, 27.4) | 113416 (6) | -1.0 (7.6) | (-103.9, 95.4) | 0 (0, 0) |
| Exenatide | 73326 (4) | 18.8 (20.2) | (0, 110.7) | 11.6 (3.9, 26.5) | 74060 (4) | 18.8 (21.4) | (0, 111.2) | 10.6 (3.1, 26.7) | 73326 (4) | -0.2 (8.4) | (-97.0, 95.4) | 0 (0, 0) |
| Liraglutide | 56406 (3) | 12.5 (11.9) | (0, 56.2) | 8.6 (3, 19) | 56907 (3) | 12.7 (12.4) | (0, 56.2) | 8.3 (2.5, 19.5) | 56406 (3) | -0.3 (4.0) | (-49.5, 47.5) | 0 (0, 0) |
| Albiglutide | 14 (0) | 1.3 (0.5) | (1, 2.4) | 1 (1, 1.9) | 15 (0) | 1.3 (0.5) | (1, 2.4) | 1.0 (1, 1.9) | 14 (0) | 0 (0) | (0, 0) | 0 (0, 0) |
| In patients with treatment duration ≥2 months | | | | | | | | | | | | |
| Insulin + device | 518000 (28) | 36.8 (35.2) | (2, 657.8) | 26.4 (11.1, 51.6) | 518318 (28) | 37.1 (35.0) | (2, 657.8) | 26.8 (11.2, 52.3) | 516808 (28) | -0.3 (4.9) | (-167.8, 176.9) | 0 (0, 0) |
| Insulin only | 492992 (26) | 36.4 (35.2) | (2, 657.8) | 25.8 (10.7, 50.8) | 493494 (27) | 36.7 (35.1) | (2, 657.8) | 26.3 (10.9, 51.6) | 491847 (26) | -0.4 (5.2) | (-167.8, 176.9) | 0 (0, 0) |
| no Insulin, but device | 22085 (1) | 24.3 (21.5) | (2, 196.8) | 17.8 (8, 34.1) | 21628 (1) | 24.7 (21.9) | (2, 190.7) | 18.0 (8, 34.8) | 21342 (1) | -0.5 (4.1) | (-131.8, 65.3) | 0 (0, 0) |
| GLP1RA | 96458 (5) | 21.3 (19.6) | (2, 110.7) | 14.9 (6.8, 29.3) | 94972 (5) | 22.9 (21.3) | (2, 111.7) | 15.7 (6.9, 31.8) | 94372 (5) | -1.5 (7.8) | (-103.9, 95.4) | 0 (0, 0) |
| Exenatide | 62538 (3) | 21.8 (20.4) | (2, 110.7) | 14.7 (6.6, 30.4) | 60228 (3) | 22.9 (21.7) | (2, 111.2) | 15.0 (6.5, 32.1) | 59812 (3) | -0.8 (8.0) | (-97.0, 95.4) | 0 (0, 0) |
| Liraglutide | 45432 (2) | 15.3 (11.6) | (2, 56.2) | 12 (5.8, 22.1) | 44344 (2) | 16 (12.2) | (2, 56.2) | 12.5 (5.9, 23.4) | 43991 (2) | -0.6 (3.9) | (-49.5, 43.9) | 0 (0, 0) |
| Albiglutide | 2 (0) | 2.2 (0.2) | (2.1, 2.4) | 2.2 (2.1, 2.4) | 2 (0) | 2.2 (0.2) | (2.1, 2.4) | 2.2 (2.1, 2.4) | 2 (0) | 0 (0) | (0, 0) | 0 (0, 0) |

The number of patients receiving insulin and GLP-1RA, and the corresponding treatment duration estimates (in months) produced by our algorithms (“chaining” and “continuous”), are summarized in Table 5. Different insulin regimens were treated jointly, as we found that any finer level of detail is poorly recorded in the database. As regards GLP-1RA treatment, only three different GLP-1RA drugs (namely Exenatide, Liraglutide, and Albiglutide) had been used. Because Albiglutide was new to the market (introduced in 2014), only limited data were available for it. The estimated proportion of patients identified as having received specific individual drugs was very similar under the “chaining” approach and the non-chain-ID-based “continuous” approach, as shown in Table 5. The corresponding key statistics, namely the mean, standard deviation (SD), median, and interquartile range (IQR), of the estimated duration of treatment with individual drugs were also similar. The average differences in the estimated duration of treatment with insulin only and with GLP-1RA drugs were 0.3 months and 1 month respectively. There were no differences at the median level. Separate analyses for patients with a minimum of 2 months of treatment with individual therapies gave the same results. However, it is important to note that although the summary statistics of the estimated treatment durations with different therapies were not significantly different, we did find notable differences in the minimum and maximum duration estimates for specific patient subgroups, as is evident from Table 5.

6. DISCUSSION
In this work we addressed a number of challenging data mining issues that arise when extracting patient-level longitudinal information on prescription patterns and medication usage from large relational databases (our data set comprises more than a billion records). There are several key contributions of note. Firstly, we identified the specific challenges that automatic methods must deal with in processing this complex and voluminous data. We corroborated our arguments using an analysis of real-world EMRs and discussed the importance and the implications of being able to handle erroneous and incomplete longitudinal information. Secondly, we introduced two methods for estimating the duration of treatment with specific drug(s) in the presence of the aforementioned challenges. The sequentially ordered case-by-case rules that we developed were presented mathematically. To the best of our knowledge, no robust algorithmic approach has yet been reported for evaluating treatment duration with individual medications in multiple-treatment scenarios [22, 27].
We have described two algorithmic approaches to estimating treatment duration at the individual record level. The first method (“chaining”) relies on specific chaining fields of medication information, while the second approach (“continuous”) does not use chain-related information and employs only chronological record information instead. Our results on the large Centricity EMR database show that the two approaches do not produce significantly different results on average at population level. However, when examined in detail, the “chaining” method could identify treatment alterations longitudinally and was shown to be more robust at individual patient level. Furthermore, treatment duration estimates from the “continuous” approach are more sensitive to the set of selected medications. The difference between the methods is particularly prominent in studies involving multiple drugs, as opposed to studies of single drug therapies or studies focusing on the order of treatment initiation [48, 49].
Our study highlighted the potential risk of underestimating the duration of treatment when EMR data is used directly, due to erroneous or incomplete data arising from omissions in the data entry process, appointments missed by patients, typographical errors, or numerous other causes. Both proposed algorithms handle these challenges robustly whenever possible, estimating values for missing or erroneous entries. Importantly, being rule based, the decisions of our algorithms are readily interpretable by humans and lend themselves to effortless use by medical professionals not necessarily proficient in data mining and related disciplines. Both approaches use the two fact datasets available in the Centricity EMRs; however, the algorithms are easily adjusted if only one dataset is available.

CONCLUSION
This study discusses the challenges in exploring the prescription / medication patterns for individual patients in large primary / ambulatory care electronic databases, and introduces two algorithmic approaches for robust estimation of treatment duration with individual drug(s). We have demonstrated that using the chaining fields of medication information additionally improves the quality of the estimates. Given the importance of extracting medication information appropriately in pharmaco-epidemiological studies based on real-world data, the proposed algorithms have the potential to contribute significantly to the analytical quality of future EMR-based clinical and epidemiological studies.

LIST OF ABBREVIATIONS

EMR = Electronic Medical Records
CEMR = Centricity Electronic Medical Records
T2DM = Type 2 Diabetes
MD = Medication Dimension
MF = Medication Fact
PF = Prescription Fact
GPI = Generic Product Identifier
ID = Identification
DOB = Date of Birth
SD = Standard Deviation
IQR = Interquartile Range
GLP-1RA = Glucagon-Like Peptide-1 Receptor Agonist

ETHICS APPROVAL AND CONSENT TO PARTICIPATE Not applicable.

HUMAN AND ANIMAL RIGHTS
No animals/humans were used for the studies that are the basis of this research.

CONSENT FOR PUBLICATION Not applicable.

CONFLICT OF INTEREST
Sanjay K. Paul (SKP) has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. Olga Montvida (OM) and Ognjen Arandjelovic (OA) have no conflict of interest to declare. Edward Reiner (ER) was an employee of Quintiles and was responsible for the strategic development of the Centricity EMR database.

ACKNOWLEDGEMENTS Olga Montvida (OM) and Sanjay K. Paul (SKP) conceived the idea and were responsible for the primary design of the study and the methodological developments. Ognjen Arandjelovic (OA) and Edward Reiner (ER) evaluated the methodological approach. Olga Montvida (OM) conducted the data extraction and statistical analyses. The first draft of the manuscript was developed by Sanjay K. Paul (SKP) and Olga Montvida (OM), and all authors contributed to the finalization of the manuscript. Sanjay K. Paul (SKP) had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Melbourne EpiCentre gratefully acknowledges the support from the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia. No separate funding was obtained for this study. REFERENCES

[1] Paul SK, Klein K, Thorsted BL, Wolden ML, Khunti K. Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovasc Diabetol 2015; 14: 100. [http://dx.doi.org/10.1186/s12933-015-0260-x] [PMID: 26249018]
[2] Bhatnagar P, Wickramasinghe K, Williams J, Rayner M, Townsend N. The epidemiology of cardiovascular disease in the UK 2014. Heart 2015; 101(15): 1182-9. [http://dx.doi.org/10.1136/heartjnl-2015-307516] [PMID: 26041770]
[3] Crawford AG, Cote C, Couto J, et al. Comparison of GE Centricity Electronic Medical Record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Popul Health Manag 2010; 13(3): 139-50. [http://dx.doi.org/10.1089/pop.2009.0036] [PMID: 20568974]
[4] Wettermark B, Zoëga H, Furu K, et al. The Nordic prescription databases as a resource for pharmacoepidemiological research--a literature review. Pharmacoepidemiol Drug Saf 2013; 22(7): 691-9. [http://dx.doi.org/10.1002/pds.3457] [PMID: 23703712]
[5] Lau EC, Mowat FS, Kelsh MA, et al. Use of electronic medical records (EMR) for oncology outcomes research: assessing the comparability of EMR information to patient registry and health claims data. Clin Epidemiol 2011; 3: 259-72. [PMID: 22135501]
[6] Paul SK, Klein K, Maggs D, Best JH. The association of the treatment with glucagon-like peptide-1 receptor agonist exenatide or insulin with cardiovascular outcomes in patients with type 2 diabetes: A retrospective observational study. Cardiovasc Diabetol 2015; 14: 10. [http://dx.doi.org/10.1186/s12933-015-0178-3] [PMID: 25616979]
[7] Nadkarni PM. Drug safety surveillance using de-identified EMR and claims data: issues and challenges. J Am Med Inform Assoc 2010; 17(6): 671-4. [http://dx.doi.org/10.1136/jamia.2010.008607] [PMID: 20962129]
[8] Liu M, McPeek Hinz ER, Matheny ME, et al. Comparative analysis of pharmacovigilance methods in the detection of adverse drug reactions using electronic medical records. J Am Med Inform Assoc 2013; 20(3): 420-6. [http://dx.doi.org/10.1136/amiajnl-2012-001119] [PMID: 23161894]
[9] Coloma PM, Trifirò G, Patadia V, Sturkenboom M. Postmarketing safety surveillance: where does signal detection using electronic healthcare records fit into the big picture? Drug Saf 2013; 36(3): 183-97. [http://dx.doi.org/10.1007/s40264-013-0018-x] [PMID: 23377696]
[10] Lin J, Jiao T, Biskupiak JE, McAdam-Marx C. Application of electronic medical record data for health outcomes research: a review of recent literature. Expert Rev Pharmacoecon Outcomes Res 2013; 13(2): 191-200. [http://dx.doi.org/10.1586/erp.13.7] [PMID: 23570430]
[11] Belletti D, Zacker C, Mullins CD. Perspectives on electronic medical records adoption: Electronic Medical Records (EMR) in outcomes research. Patient Relat Outcome Meas 2010; 1: 29-37. [http://dx.doi.org/10.2147/PROM.S8896] [PMID: 22915950]
[12] Khunti K, Davies M, Majeed A, Thorsted BL, Wolden ML, Paul SK. and risk of cardiovascular disease and all-cause mortality in insulin-treated people with type 1 and type 2 diabetes: a cohort study. Diabetes Care 2015; 38(2): 316-22. [http://dx.doi.org/10.2337/dc14-0920] [PMID: 25492401]
[13] Canavan C, West J, Card T. Calculating Total Health Service Utilisation and Costs from Routinely Collected Electronic Health Records Using the Example of Patients with Irritable Bowel Syndrome Before and After Their First Gastroenterology Appointment. Pharmacoeconomics 2016; 34(2): 181-94. [PMID: 26497004]
[14] Bessou A, Guelfucci F, Aballea S, Toumi M, Poole C. Comparison of comorbidity measures to predict economic outcomes in a large UK primary care database. Value Health 2015; 18(7): A691. [http://dx.doi.org/10.1016/j.jval.2015.09.2565]
[15] Birkhead GS, Klompas M, Shah NR. Uses of electronic health records for public health surveillance to advance public health. Annu Rev Public Health 2015; 36: 345-59. [http://dx.doi.org/10.1146/annurev-publhealth-031914-122747] [PMID: 25581157]
[16] Paul MM, Greene CM, Newton-Dame R, et al. The state of population health surveillance using electronic health records: a narrative review. Popul Health Manag 2015; 18(3): 209-16. [http://dx.doi.org/10.1089/pop.2014.0093] [PMID: 25608033]
[17] Kukafka R, Ancker JS, Chan C, et al. Redesigning electronic health record systems to support public health. J Biomed Inform 2007; 40(4): 398-409. [http://dx.doi.org/10.1016/j.jbi.2007.07.001] [PMID: 17632039]
[18] Menachemi N, Collum TH. Benefits and drawbacks of electronic health record systems. Risk Manag Healthc Policy 2011; 4: 47-55. [http://dx.doi.org/10.2147/RMHP.S12985] [PMID: 22312227]
[19] Crapo J. Big data in healthcare: separating the hype from the reality. HealthCatalyst 2015; p. 5.
[20] Grabenbauer L, Skinner A, Windle J. Electronic Health Record Adoption - Maybe It’s not about the Money: Physician Super-Users, Electronic Health Records and Patient Care. Appl Clin Inform 2011; 2(4): 460-71. [http://dx.doi.org/10.4338/ACI-2011-05-RA-0033] [PMID: 23616888]
[21] Paul SK, Klein K, Majeed A, Khunti K. Association of smoking and concomitant metformin use with cardiovascular events and mortality in people newly diagnosed with type 2 diabetes. J Diabetes 2016; 8(3): 354-62. [http://dx.doi.org/10.1111/1753-0407.12302] [PMID: 25929583]
[22] Gaitanou P, Garoufallou E, Balatsoukas P. The effectiveness of big data in health care: a systematic review. Commun Comput Inf Sci 2014; 141-53. [http://dx.doi.org/10.1007/978-3-319-13674-5_14]
[23] Svensson MK, Cederholm J, Eliasson B, Zethelius B, Gudbjörnsdottir S. Albuminuria and renal function as predictors of cardiovascular events and mortality in a general population of patients with type 2 diabetes: a nationwide observational study from the Swedish National Diabetes Register. Diab Vasc Dis Res 2013; 10(6): 520-9. [http://dx.doi.org/10.1177/1479164113500798] [PMID: 24002670]
[24] Dean BB, Lam J, Natoli JL, Butler Q, Aguilar D, Nordyke RJ. Review: use of electronic medical records for health outcomes research: a literature review. Med Care Res Rev 2009; 66(6): 611-38. [http://dx.doi.org/10.1177/1077558709332440] [PMID: 19279318]
[25] Wei WQ, Denny JC. Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med 2015; 7(1): 41. [http://dx.doi.org/10.1186/s13073-015-0166-y] [PMID: 25937834]
[26] Denny JC. Chapter 13: Mining electronic health records in the genomics era. PLOS Comput Biol 2012; 8(12): e1002823. [http://dx.doi.org/10.1371/journal.pcbi.1002823] [PMID: 23300414]
[27] Torre C, Martins AP. Overview of Pharmacoepidemiological Databases in the Assessment of Medicines Under real-life Conditions. In: Lunet N, Eds. Epidemiology-current perspective on Research and practical Intech open publishers contributors 2012; pp. 131-54. [http://dx.doi.org/10.5772/35318]
[28] Centricity Electronic Medical Record Brochure. GE Healthcare 2011.
[29] Lin J, Jiao T, Biskupiak JE, McAdam-Marx C. Application of electronic medical record data for health outcomes research: a review of recent literature. Expert Rev Pharmacoecon Outcomes Res 2013; 13(2): 191-200. [http://dx.doi.org/10.1586/erp.13.7] [PMID: 23570430]
[30] Jermyn P, Dixon M, Read BJ. Preparing clean views of data for data mining. ERCIM Work on Database Res 1999; pp. 1-15.
[31] Zhang S, Zhang C, Yang Q. Data preparation for data mining. Appl Artif Intell 2003; 17(5-6): 375-81. [http://dx.doi.org/10.1080/713827180]
[32] Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012; 13(6): 395-405. [http://dx.doi.org/10.1038/nrg3208] [PMID: 22549152]
[33] Benchimol EI, Smeeth L, Guttmann A, et al. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med 2015; 12(10): e1001885. [http://dx.doi.org/10.1371/journal.pmed.1001885] [PMID: 26440803]
[34] PLOS Medicine Editors. From checklists to tools: Lowering the barrier to better research reporting.
PLoS Med 2015; 12(11): e1001910. [http://dx.doi.org/10.1371/journal.pmed.1001910] [PMID: 26600090] [35] Yao L, Zhang Y, Li Y, Sanseau P, Agarwal P. Electronic health records: Implications for drug discovery. Drug Discov Today 2011; 16(13-14): 594-9. [http://dx.doi.org/10.1016/j.drudis.2011.05.009] [PMID: 21624499] [36] Hall GC, McMahon AD, Dain M-P, Home PD. A comparison of duration of first prescribed insulin therapy in uncontrolled type 2 diabetes. Diabetes Res Clin Pract 2011; 94(3): 442-8. [http://dx.doi.org/10.1016/j.diabres.2011.09.003] [PMID: 21963105] [37] Hansen RA, Farley JF, Maciejewski ML, Ye X, Qian C, Powers B. Real-world utilization patterns and outcomes of colesevelam hcl in the ge electronic medical record. BMC Endocr Disord 2013; 13(1): 24. [http://dx.doi.org/10.1186/1472-6823-13-24] [PMID: 23866087] [38] Hippisley-Cox J, Coupland C. 2016. [39] Fardet L, Petersen I, Nazareth I. Prevalence of long-term oral glucocorticoid prescriptions in the UK over the past 20 years. Rheumatology (Oxford) 2011; 50(11): 1982-90. [http://dx.doi.org/10.1093/rheumatology/ker017] [PMID: 21393338] [40] Davis KL, Tangirala M, Meyers JL, Wei W. Real-world comparative outcomes of US type 2 diabetes patients initiating analog basal insulin therapy. Curr Med Res Opin 2013; 29(9): 1083-91. [http://dx.doi.org/10.1185/03007995.2013.811403] [PMID: 23734906] [41] Xie L, Wei W, Pan C, Du J, Baser O. A real-world study of patients with type 2 diabetes initiating basal via disposable pens. Adv Ther 2011; 28(11): 1000-11. [http://dx.doi.org/10.1007/s12325-011-0074-5] [PMID: 22038703] [42] Deléger L, Grouin C, Zweigenbaum P. Extracting medical information from narrative patient records: the case of medication-related information. J Am Med Inform Assoc 2010; 17(5): 555-8. [http://dx.doi.org/10.1136/jamia.2010.003962] [PMID: 20819863] [43] Etminan M. Reporting guidelines for pharmacoepidemiological studies are urgently needed. BMJ 2014; 349: g5511. 
[http://dx.doi.org/10.1136/bmj.g5511] [PMID: 25231185] [44] Kamal KM, Chopra I, Elliott JP, Mattei TJ. Use of electronic medical records for clinical research in the management of type 2 diabetes. Res Social Adm Pharm 2014; 10(6): 877-84. [http://dx.doi.org/10.1016/j.sapharm.2014.01.001] [PMID: 24556384] [45] Herrin J, da Graca B, Nicewander D, et al. The effectiveness of implementing an electronic health record on diabetes care and outcomes. Health Serv Res 2012; 47(4): 1522-40. [http://dx.doi.org/10.1111/j.1475-6773.2011.01370.x] [PMID: 22250953] [46] Holt TA, Stables D, Hippisley-Cox J, O’Hanlon S, Majeed A. Identifying undiagnosed diabetes: cross-sectional survey of 3.6 million patients’

electronic records. Br J Gen Pract 2008; 58(548): 192-6. [http://dx.doi.org/10.3399/bjgp08X277302] [PMID: 18318973] [47] Davis KL, Tangirala M, Meyers JL, Wei W. Real-world comparative outcomes of US type 2 diabetes patients initiating analog basal insulin therapy. Curr Med Res Opin 2013; 29(9): 1083-91. [http://dx.doi.org/10.1185/03007995.2013.811403] [PMID: 23734906] [48] Paul SK, Klein K, Majeed A, Khunti K. Association of smoking and concomitant use of metformin with cardiovascular events and mortality in people newly diagnosed with type 2 diabetes. J Diabetes 2015; 8(3): 354-62. [PMID: 25929583] [49] Paul SK, Klein K, Maggs D, Best JH. The association of the treatment with glucagon-like peptide-1 receptor agonist exenatide or insulin with cardiovascular outcomes in patients with type 2 diabetes: a retrospective observational study. Cardiovasc Diabetol 2015; 14(1): 10. [http://dx.doi.org/10.1186/s12933-015-0178-3] [PMID: 25616979]

© 2017 Montvida et al. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: (https://creativecommons.org/licenses/by/4.0/legalcode). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Chapter 5: Diabetes Mellitus Cohort

Using diagnostic codes to identify a diseased cohort is the standard approach in EMR-based studies [163, 164]. As mentioned in Chapter 3, such codes have been shown to be a reliable tool for identifying diseased cohorts in various databases, including CEMR. Nonetheless, inaccurate coding and incomplete data entry are part of everyday practice, and EMRs reflect this [165]. For DM, identifying cohorts by diagnostic codes raises the following concerns: (1) unknown disease subtype from high-level diagnostic codes, (2) longitudinally overlapping codes of different subtypes for individual patients, (3) absence of diagnostic codes for patients with DM (false negatives), and (4) presence of DM codes for those who are not diseased (false positives). In principle, the accuracy of disease cohorts identified with diagnostic codes may be improved by algorithms that make fuller use of the available patient-level information.

Appendix B provides more details on these challenges and on existing methods that attempt to overcome them. Shivade and colleagues [166] conducted a systematic review of methods for identifying disease cohorts from various clinical databases. While some studies applied restrictive rules to cohorts identified by diagnostic codes, several applied and compared machine learning (ML) techniques. For example, Tapak and colleagues [167] reported the support vector machine as the best classifier, while Mani and colleagues [168] reported decision trees as the best approach for identifying patients with diabetes.

As patients with T2DM are the primary focus of this dissertation, the aim was to achieve the highest possible quality of the T2DM cohort. To overcome the aforementioned challenges, diagnostic codes were first used to identify a cohort of patients with any DM (subsection 5.1). Next, machine learning (ML) approaches that seek to identify all patients in the database who are highly likely to be diabetic were compared (subsection 5.2). The cohorts identified by the selected ML algorithm and by diagnostic codes were then combined, and additional clinically guided rules were applied to obtain the final cohort (subsection 5.3). Basic characteristics of the resulting DM cohort were compared with national reports (subsection 5.4). Finally, basic characteristics were explored in adults identified as having T2DM (subsection 5.5).

5.1 DIAGNOSTIC CODES

Patients with diagnostic codes for any type of diabetes were identified from CEMR. After quality assessments and data corrections, 111,303 and 2,160,098 patients were identified as having type 1 and type 2 diabetes, respectively (Figure 5.1). Patients with records of gestational diabetes and no record of type 1 or type 2 diabetes were treated separately (n=89,562). Patients were categorised as having an “unspecific” diabetes type (n=118,557) when they had only high-level diagnostic codes (e.g. “ICD-9: 250 Diabetes Mellitus”), unspecific codes (e.g. “ICD-9: 362.07 Diabetic macular edema”), or longitudinally overlapping codes for type 1 and type 2. Among the aforementioned 2,479,520 patients, 59% (n=1,468,246) had a record of ADD use lasting longer than 6 months, and only 38% (n=935,652) had a record of 2 elevated glucose measures within one consecutive year.

Figure 5.1. Cohort of patients with T2DM and distribution of identified sub-types.

5.2 SUPERVISED MACHINE LEARNING

5.2.1 Training datasets

A training dataset of 150,000 patients, containing equal numbers of positive and negative representatives, was extracted. Positive representatives were randomly chosen from patients with a diagnostic code for DM, and negative representatives from those without one. All representatives were chosen from patients with at least 1 year of available follow-up and non-missing sex and age. To check for bias arising from the random selection of negative representatives, two additional training datasets were created, each combining the same positive representatives as the main training set with a disjoint set of negative representatives.
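The sampling scheme described above can be sketched as follows. This is a minimal illustration on toy identifiers, not the extraction code used for the thesis: the function name `build_training_sets`, the seed, and the toy ID ranges are hypothetical, and eligibility filtering (at least 1 year of follow-up, non-missing sex and age) is assumed to have been applied upstream.

```python
import random


def build_training_sets(coded_ids, uncoded_ids, n_per_class, n_sets=3, seed=0):
    """Return n_sets labelled training sets that share the same positives
    but use mutually disjoint negatives (hypothetical helper)."""
    rng = random.Random(seed)
    # Positives: one random draw from patients with a DM diagnostic code.
    positives = rng.sample(sorted(coded_ids), n_per_class)
    # Negatives: draw all at once without replacement, then cut the draw
    # into non-overlapping chunks, one chunk per training set.
    negatives = rng.sample(sorted(uncoded_ids), n_per_class * n_sets)
    training_sets = []
    for k in range(n_sets):
        chunk = negatives[k * n_per_class:(k + 1) * n_per_class]
        labelled = [(pid, 1) for pid in positives] + [(pid, 0) for pid in chunk]
        training_sets.append(labelled)
    return training_sets


# Toy example: IDs 0-999 carry a DM code, IDs 1000-4999 do not.
training_sets = build_training_sets(range(0, 1000), range(1000, 5000), n_per_class=100)
```

With `n_per_class=75_000` the same layout would give the 150,000-patient training set described above, plus two comparison sets with different negatives.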

5.2.2 Feature selection

For patients with DM identified by diagnostic codes (n=2,479,520), data on medication prescriptions, diseases, and laboratory measurements were analysed. Of these patients, 66% had a diagnosis of “Essential hypertension” and 62% had a diagnosis of “Disorders of lipoid metabolism”. Antidiabetic, antihypertensive, and antilipidemic drugs were used by 82%, 70%, and 66% of patients, respectively. Non-narcotic and opioid analgesics were used by 53% and 37% of patients, respectively. Beta blockers, ulcer drugs, diuretics, and antidepressants were used by 40%, 40%, 38%, and 34% of patients, respectively. The most frequent observations and laboratory tests for diabetic patients were: weight, blood pressure, pulse, height, body mass index, creatinine (serum), urea nitrogen (blood), calcium (serum), alanine aminotransferase (serum), aspartate aminotransferase (serum), sodium (serum), HbA1c (blood), bilirubin (serum), and cholesterol (serum).

The obtained results were combined with clinical considerations and guidelines [169] to derive 11 potential disease predictors. Scheme-independent (or classifier-independent) attribute subset selection algorithms were then applied to determine the best predictors. Using 10-fold cross-validation, bi-directional, forward, and backward greedy search techniques [170] agreed on the 4 features shown in Table 5.1.
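To illustrate what a classifier-independent greedy search looks like, the sketch below runs a forward search that scores feature subsets with a correlation-based merit (high correlation with the outcome, low correlation among the chosen features). The text above does not name the subset evaluator that was used, so the merit function is an assumption made for illustration, cross-validation is omitted, and the data and feature names (`hyperglycaemia`, `add_duration`, `noise`) are simulated and hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def merit(subset, features, target):
    """Correlation-based merit: k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedy forward search: add the feature that most improves the merit,
    stop as soon as no candidate improves it."""
    selected, best = [], 0.0
    while True:
        scored = [(merit(selected + [f], features, target), f)
                  for f in features if f not in selected]
        if not scored:
            break
        top_merit, top_feature = max(scored)
        if top_merit <= best:
            break
        selected.append(top_feature)
        best = top_merit
    return selected


# Simulated data: two informative binary flags and one pure-noise flag.
rng = random.Random(1)
n = 4000
target = [rng.randint(0, 1) for _ in range(n)]


def noisy_copy(values, p_keep):
    return [v if rng.random() < p_keep else 1 - v for v in values]


features = {
    "hyperglycaemia": noisy_copy(target, 0.9),
    "add_duration": noisy_copy(target, 0.9),
    "noise": [rng.randint(0, 1) for _ in range(n)],
}
selected = forward_select(features, target)
```

Backward and bi-directional searches differ only in how candidate subsets are generated (removing features, or both adding and removing), so agreement among the three is a useful stability check.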

Table 5.1 Features Selected as Best Diabetes Predictors in CEMR

Feature description                                                                                   Feature type
Two measurements of: HbA1c ≥ 6.5% or fasting blood glucose ≥ 126 mg/dL [7.0 mmol/L],
  or random blood glucose ≥ 200 mg/dL [11.1 mmol/L] within 1 year                                     Binary
Anti-diabetic drug duration ≥ 6 months                                                                Binary
Average body mass index                                                                               Continuous
Ischemic heart disease, heart failure or stroke                                                       Binary
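The first feature in Table 5.1 can be derived from longitudinal laboratory records roughly as sketched below. The thresholds come from the table; the record layout, the test names, the 365-day window, and the choice to let the two elevated measurements come from different test types are assumptions made for this illustration.

```python
from datetime import date

# Thresholds from Table 5.1; the dictionary keys are hypothetical test names.
THRESHOLDS = {
    "hba1c_pct": 6.5,
    "fasting_glucose_mgdl": 126.0,
    "random_glucose_mgdl": 200.0,
}


def has_two_elevated_within_year(measurements, window_days=365):
    """measurements: iterable of (date, test_name, value) tuples.
    Returns True if any two elevated measurements lie within window_days."""
    elevated = sorted(
        day for day, test, value in measurements
        if test in THRESHOLDS and value >= THRESHOLDS[test]
    )
    # After sorting, the closest pair is always a pair of neighbours,
    # so checking consecutive dates is sufficient.
    return any((later - earlier).days <= window_days
               for earlier, later in zip(elevated, elevated[1:]))


patient_a = [
    (date(2014, 1, 10), "hba1c_pct", 7.1),
    (date(2014, 11, 2), "fasting_glucose_mgdl", 130.0),
]
patient_b = [
    (date(2012, 1, 10), "hba1c_pct", 7.1),
    (date(2013, 6, 1), "hba1c_pct", 6.1),   # not elevated
    (date(2014, 3, 1), "hba1c_pct", 6.8),
]
flag_a = has_two_elevated_within_year(patient_a)
flag_b = has_two_elevated_within_year(patient_b)
```

Here `patient_a` has two elevated values about ten months apart, while the two elevated values of `patient_b` are more than two years apart.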

5.2.3 Classification algorithm selection

Keeping only the four selected predictors, the performance of six classification algorithms was compared on the training sets. Sensitivity (true positive rate), specificity (true negative rate), and area under the receiver operating characteristic curve (AUC) were calculated as the average over 10 repeats of 10-fold cross-validation. Central processing unit (CPU) time of training and the percentage of correctly classified instances were also recorded. The compared classifiers were: Naïve Bayes [171], Logistic Regression [172], Support Vector Machine [173], Multilayer Perceptron [174], Decision Tree with J48 modification [175], and One Rule [176].
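For reference, the three performance measures named above can be computed from labels, predictions, and scores as in the sketch below. It is a plain illustration of the definitions on a toy example; the averaging over repeated cross-validation folds is not shown, and the function names, toy labels, and the 0.5 decision threshold are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tpr, tnr = sensitivity_specificity(y_true, y_pred)   # 0.75, 0.75
area = auc(y_true, scores)                           # 15/16 = 0.9375
```

The pairwise definition of AUC is quadratic in the number of instances, which is fine for an illustration but not for a 150,000-patient training set, where a rank-based computation would be preferable.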

While the One Rule algorithm performed significantly worse, the performance of the other algorithms was similar (Table 5.2). Among them, the false positive rate was the same, but the Support Vector Machine and Decision Tree algorithms produced fewer false negatives. Because it had a higher AUC and a much shorter CPU time than the Support Vector Machine, the Decision Tree algorithm was chosen as the final machine learning approach. The absence of bias arising from the random selection of negative instances was confirmed by the almost identical performance of all algorithms on the three training sets.

Table 5.2 Performance of Machine Learning Algorithms on the Training Dataset

                      Naïve     Logistic      Multilayer    Support Vector   J48 Decision   One
                      Bayes     Regression    Perceptron    Machine          Tree           Rule
Percent correct       90.04     90.12         90.15         90.17            90.17          86.6
True positive rate    0.96      0.96          0.96          0.97             0.97           0.98
True negative rate    0.84      0.84          0.84          0.84             0.84           0.76
AUC                   0.94      0.94          0.94          0.90             0.91           0.87
CPU time              0.05      2.26          49.23         157.71           0.52           0.09
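As a quick consistency check on Table 5.2: because the training set holds equal numbers of positive and negative representatives, the percentage of correctly classified instances should equal the average of the true positive and true negative rates, up to rounding. The sketch below performs that check on the tabulated values; the helper name `expected_accuracy` and the dictionary layout are hypothetical.

```python
# (percent correct, true positive rate, true negative rate) from Table 5.2
table_5_2 = {
    "Naive Bayes": (90.04, 0.96, 0.84),
    "Logistic Regression": (90.12, 0.96, 0.84),
    "Multilayer Perceptron": (90.15, 0.96, 0.84),
    "Support Vector Machine": (90.17, 0.97, 0.84),
    "J48 Decision Tree": (90.17, 0.97, 0.84),
    "One Rule": (86.6, 0.98, 0.76),
}


def expected_accuracy(tpr, tnr, positive_share=0.5):
    """Accuracy (%) implied by TPR and TNR for a given share of positives."""
    return 100 * (positive_share * tpr + (1 - positive_share) * tnr)


gaps = {
    name: abs(expected_accuracy(tpr, tnr) - percent_correct)
    for name, (percent_correct, tpr, tnr) in table_5_2.items()
}
largest_gap = max(gaps.values())
```

All gaps are below half a percentage point, which is what rounding the two rates to two decimals would allow.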

5.3 FINAL COHORT

The selected J48 algorithm (Figure 5.2) was applied to all patients in the CEMR with at least 1 year of follow-up and non-missing sex and age, and identified 2,023,956 patients who are highly likely to have diabetes. Of them, 78% (n=1,580,867) had a diagnostic code for diabetes.

Figure 5.2. Selected Decision Tree algorithm.

As errors during the data entry process in a real-world setting usually result in a smaller number of false positive patients and a larger number of false negatives identified by diagnostic codes, the cohorts identified by the ML approach and by diagnostic codes were combined (n=2,922,609). Minimizing false negative instances was ensured by careful design in further studies (e.g. inclusion of patients who initiated pharmacological therapy with MET, and added second-line ADD). Patients who were identified by ML but not by diagnostic codes (n=443,089) were categorised as “unspecific” diabetes type. As a final step, the following rules were applied to distinguish diabetes types amongst all the “unspecific” cases:

1. if duration of non-insulin ADD ≥ 2 months, then type 2;
2. otherwise, if age at first available diagnosis date ≤ 18 years and insulin initiated within 1 year, then type 1;
3. otherwise, if age at first available diagnosis date > 18 years and insulin initiated within 3 months, then type 1;
4. otherwise, consider that patient does not have diabetes.
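The rule cascade above can be written as a small function. This is a sketch of the logic as stated in the list, with hypothetical argument names; in particular, measuring the time to insulin initiation from the first available diagnosis date is an assumption, since the rules do not state the reference date.

```python
def assign_diabetes_type(non_insulin_add_months, age_at_first_diagnosis,
                         months_to_insulin):
    """Apply the four rules in order to one 'unspecific' patient.

    months_to_insulin is None when insulin was never initiated; otherwise it
    is the time from first diagnosis to insulin initiation (assumed reference).
    """
    # Rule 1: sustained non-insulin ADD use points to type 2.
    if non_insulin_add_months >= 2:
        return "type 2"
    if months_to_insulin is not None:
        # Rule 2: young at diagnosis and insulin within 1 year.
        if age_at_first_diagnosis <= 18 and months_to_insulin <= 12:
            return "type 1"
        # Rule 3: adult at diagnosis and insulin within 3 months.
        if age_at_first_diagnosis > 18 and months_to_insulin <= 3:
            return "type 1"
    # Rule 4: none of the rules fired.
    return "no diabetes"


examples = [
    assign_diabetes_type(14, 55, None),   # long metformin use
    assign_diabetes_type(0, 12, 6),       # child, early insulin
    assign_diabetes_type(0, 40, 2),       # adult, immediate insulin
    assign_diabetes_type(1, 40, 10),      # adult, late insulin, short ADD use
]
```

Because rule 1 is checked first, a patient with both sustained non-insulin ADD use and early insulin initiation is assigned to type 2.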

Steps 1-3 were unable to identify the type of diabetes for 29,288 patients, who were excluded from the cohort. The final diabetes cohort consisted of 2,893,321 patients, with 178,805 / 2,624,954 / 89,562 patients identified as having type 1 / type 2 / gestational diabetes (Figure 5.1).
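The patient counts reported in sections 5.1 and 5.3 can be reconciled with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are hypothetical.

```python
# Section 5.1: patients identified by diagnostic codes.
coded = {
    "type 1": 111_303,
    "type 2": 2_160_098,
    "gestational": 89_562,
    "unspecific": 118_557,
}
coded_total = sum(coded.values())              # 2,479,520

# Section 5.3: patients flagged by the J48 decision tree.
ml_total = 2_023_956
ml_with_code = 1_580_867
ml_only = ml_total - ml_with_code              # 443,089 flagged by ML only

combined = coded_total + ml_only               # 2,922,609 in the union
excluded = 29_288                              # type could not be assigned
final_total = combined - excluded              # 2,893,321

final_by_type = {"type 1": 178_805, "type 2": 2_624_954, "gestational": 89_562}
share_with_code = round(100 * ml_with_code / ml_total)   # 78 (%)
```

The final counts by type sum to the same total as the combined cohort minus the excluded patients, so the reported flow is internally consistent.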

5.4 REPRESENTATIVENESS OF DIABETES COHORT

Among all patients who were active in the CEMR during 2015 and were older than 18 years, 11.6% were identified as having any type of diabetes. This estimate is very close to that of the US National Diabetes Statistics (NDS) report, which estimated that 12.2% of the adult population had diabetes in 2015 [177]. NDS reported an almost equal gender distribution (Table 5.3), while a higher proportion of women (55%) was captured in the CEMR. Based on age in 2015, CEMR appeared to have younger patients than the NDS report. Body weight of adults with diabetes in the CEMR reflected the NDS report well, with the majority of patients (90%) being overweight or obese (Table 5.3).

Table 5.3 Characteristics of patients with diabetes in the CEMR database and in the National Diabetes Statistics report, 2015

                        CEMR, %    NDS, %
Gender (p=0.4)
  Male                  45         51
  Female                55         49
Age, years (p=0.5)
  18-44                 13         9
  45-65                 39         37
  ≥65                   47         55
BMI, kg/m2 (p=0.9)
  <25                   10         13
  25-30                 25         26
  30-40                 46         44
  ≥40                   19         18

Other estimates in the NDS report are hard to compare directly with CEMR because of methodological differences. Although a large number of patients with diabetes do not have a record of tobacco use in the CEMR, 17% of those with a recorded tobacco status were current smokers. This estimate is very close to the 16% of current smokers reported in the NDS. NDS estimated that 74% of adults with DM had SBP ≥ 140 mmHg, or DBP ≥ 90 mmHg, or were on prescription medication for high blood pressure. The corresponding estimate in the CEMR ranged between 70% and 82%, depending on the definition of medication for high blood pressure and the timeline chosen. According to the NDS report, 58% of adults aged 21-75 with no self-reported CVD but who were eligible for statin therapy were on a lipid-lowering medication (data source: 2011–2014 National Health and Nutrition Examination Survey). Rough estimates from CEMR for this figure were 60-65% for adults aged 21-75 in 2015.

5.5 TYPE 2 DIABETES COHORT

Among the 2,624,954 patients identified as having T2DM, 2,596,630 were at least 18 years old at the time of the diabetes diagnosis. Basic characteristics at the time of diagnosis, along with existing diseases, are presented for these adults in Table 5.4. In this cohort, 53% /47% are females /males, with a mean (SD) follow-up of 3.9 (4.8) years. The majority of patients are White Caucasians, and the mean (SD) age is 59 (13) years. At the time of T2DM diagnosis, 39% /45% of females /males had HbA1c ≥ 8%, and 68% /62% of females /males were obese. Blood pressure and lipid profiles were very similar across females and males (Table 5.4). At the time of the T2DM diagnosis, 23% of males and 15% of females had been diagnosed with CVD. The Charlson Comorbidity Index (CCI) was around 1.5 for both genders; however, more females had a record of a depression diagnosis: 11% against 6% for males.

Table 5.4 Baseline characteristics among adults with T2DM

                                Female                  Male                    All
N                               1,381,075               1,215,555               2,596,630
Follow-up, years α              4.0 (4.8)               3.9 (4.9)               3.9 (4.8)
Follow-up, years β              2.8 (0.9, 5.6)          2.7 (0.9, 5.5)          2.7 (0.9, 5.5)
Follow-up ≥ 6 months γ          1,148,017 (83)          991,743 (82)            2,139,760 (82)
Follow-up ≥ 12 months γ         1,021,250 (74)          882,575 (73)            1,903,825 (73)
Follow-up ≥ 24 months γ         815,233 (59)            703,941 (58)            1,519,174 (59)
Age, years α                    58 (14)                 60 (12)                 59 (13)
Age ≥ 70 years γ                319,075 (23)            293,480 (24)            612,555 (24)
Ethnicity γ
  White, n (%)                  852,115 (62)            799,320 (66)            1,651,435 (64)
  Black, n (%)                  185,040 (13)            110,491 (9)             295,531 (11)
  Asian, n (%)                  24,718 (2)              22,753 (2)              47,471 (2)
Tobacco use status γ
  Current                       82,218 (6)              85,101 (7)              167,319 (6)
  Former                        113,714 (8)             151,353 (12)            265,067 (10)
  Never                         306,603 (22)            197,396 (16)            503,999 (19)
  Unknown                       878,540 (64)            781,705 (64)            1,660,245 (64)
HbA1c, % α                      8.2 (1.8)               8.4 (1.9)               8.3 (1.9)
HbA1c ≥ 7.5% γ                  156,850 (51)            173,302 (57)            330,152 (54)
HbA1c ≥ 8% γ                    121,476 (39)            138,013 (45)            259,489 (42)
Weight, kg α                    90 (24)                 102 (24)                96 (25)
BMI, kg/m2 α                    34.6 (8.5)              32.8 (7.1)              33.8 (7.9)
Obese γ                         707,335 (68)            557,226 (62)            1,264,561 (65)
SBP, mmHg α                     131 (17)                132 (17)                132 (17)
SBP ≥ 140 mmHg γ                299,248 (28)            270,470 (29)            569,718 (29)
DBP, mmHg α                     77 (10)                 78 (10)                 77 (10)
Heart rate, bpm α               79 (12)                 77 (12)                 78 (12)
LDL, mg/dL α                    109 (38)                101 (37)                105 (38)
HDL, mg/dL α                    49 (14)                 41 (12)                 45 (14)
Triglycerides, mg/dL β          139 (101, 188)          139 (99, 191)           139 (100, 190)
Present diseases γ
  CVD                           204,114 (15)            279,284 (23)            483,398 (19)
  Heart Failure                 47,427 (3)              56,471 (5)              103,898 (4)
  Myocardial Infarction         18,019 (1)              32,515 (3)              50,534 (2)
  Stroke                        55,988 (4)              55,631 (5)              111,619 (4)
  Chronic Kidney Disease        45,140 (3)              50,929 (4)              96,069 (4)
  Rheumatoid Arthritis          20,742 (2)              8,113 (1)               28,855 (1)
  Cancer                        48,105 (3)              47,882 (4)              95,987 (4)
  Depression                    156,623 (11)            69,138 (6)              225,761 (9)
Charlson Comorbidity Index α    1.48 (0.95)             1.53 (1.03)             1.51 (0.99)

α mean (SD), β median (IQR), γ n (%)

In adult patients with T2DM, exposure to various medications at any time during follow-up is presented in Table 5.5. The majority of patients (61%) were prescribed metformin, while about a quarter of patients eventually received insulin. Chapter 7 provides a detailed exploration of longitudinal prescription patterns of ADDs in patients with T2DM. Around 80% of patients were using a cardio-protective medication (CPM), while 64% /71% of females /males received lipid-modifying drugs at some time during follow-up.

Table 5.5 Exposure to medications any time during available follow-up among adults with T2DM

N (%)                   Female                  Male                    All
Metformin               842,806 (61)            736,378 (61)            1,579,184 (61)
Sulfonylurea            405,132 (29)            428,792 (35)            833,924 (32)
Thiazolidinedione       151,198 (11)            164,740 (14)            315,938 (12)
Insulin                 351,106 (25)            330,066 (27)            681,172 (26)
GLP-1RA                 90,030 (7)              67,750 (6)              157,780 (6)
DPP-4i                  193,388 (14)            188,045 (15)            381,433 (15)
SGLT-2i                 39,574 (3)              42,320 (3)              81,894 (3)
Lipid modifying         880,364 (64)            860,370 (71)            1,740,734 (67)
Statin                  800,753 (58)            792,066 (65)            1,592,819 (61)
CPM*                    1,094,662 (79)          1,024,434 (84)          2,119,096 (82)
Diuretic                666,838 (48)            519,525 (43)            1,186,363 (46)
Antihypertensive        116,478 (8)             119,395 (10)            235,873 (9)
Antidepressant          575,425 (42)            323,365 (27)            898,790 (35)
Anti-obesity            39,942 (3)              12,503 (1)              52,445 (2)

*CPM: beta blocker, statin, angiotensin-converting-enzyme inhibitor, or angiotensin II receptor blocker

Chapter 6: Imputation of Longitudinal Observation Data

Statement of Contribution of Co-Authors for Thesis by Published Paper

The authors listed below have certified* that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Mayukh Samanta, Olga Montvida, Joanne Tropea, and Sanjoy K. Paul. A comparison of imputation methods for missing risk factor data from large real-world electronic medical records for comparative effectiveness studies.

Contributor         Statement of Contribution*
Olga Montvida       Conducted data extraction and contributed towards manuscript development.
                    QUT Verified Signature, 29.06.2018
Mayukh Samanta      Conceived the idea and was responsible for the primary design of the study. Conducted statistical analyses. Developed first draft and contributed towards development of the manuscript.
Joanne Tropea       Contributed towards development of the manuscript.
Sanjoy K. Paul      Conceived the idea and was responsible for the primary design of the study. Contributed to the statistical analyses. Developed first draft and contributed towards development of the manuscript.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship.

QUT Verified Signature

Sanjoy Ketan Paul 29.06.2018 Name Signature Date

Title: A Comparison of Imputation Methods for Missing Risk Factor Data from Large Real-world Electronic Medical Records for Comparative Effectiveness Studies

Mayukh Samanta, PhD1, Olga Montvida1,2, Joanne Tropea3 and Sanjoy K. Paul, PhD3

1Statistics Unit, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, QLD 4006, Australia

2School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia

3Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia

Correspondence to:

Prof. Sanjoy Ketan Paul

Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, The Royal Melbourne Hospital - City Campus | 7 East, Main Building, Grattan Street, Parkville Victoria 3050, Australia Email: [email protected]

ORCID Researcher ID: F-8199-2010

Word Count Abstract: 244

Word Count Main Text: 2969

Number of Tables + Figures: 3 +2

Supplementary Figures: 1

ABSTRACT

Background: Evaluation of appropriate methodologies for imputation of missing risk factor or outcome data from electronic medical records (EMRs) is crucial but lacking for comparative effectiveness studies. Robust imputation of missing data relies on the understanding of the predictors of missingness in the risk factor data, especially in patients with chronic diseases. These two aspects have not been explored simultaneously to support methodological developments in clinical epidemiological studies with real-world data.

Methods: Using disease-biomarker data (glycated haemoglobin, HbA1c) from a large EMR database in patients with diabetes, exploratory analyses were conducted to ascertain the possible predictors of missingness. Three approaches based on the multiple imputation technique, namely two-fold multiple imputation, multiple imputation by chained equations, and multiple imputation with Monte Carlo Markov Chain, were evaluated in terms of their robustness in imputing missing data. The value of using imputed data for drawing robust inferences on the comparative effectiveness of two anti-diabetes therapies was compared with that of complete-case analyses.

Results: Older patients and patients with higher disease-severity were less likely to have missing HbA1c data longitudinally over 12 months, while gender and pre-existing comorbidities were not associated with the likelihood of missingness. No significant differences in the distributions of follow-up imputed data with the three methods were observed.

Conclusion: While complete case analyses were prone to bias by indication, the use of three multiple imputation techniques for a large proportion of missing primary outcome data under unknown patterns of missingness appeared to be valid, and able to provide consistent and reliable clinical inferences.

Key Words: Missing data; Imputation; Multiple Imputation; Electronic Medical Records; Real-world data

INTRODUCTION

Recent advances in the design and implementation of large electronic medical records (EMRs) from national primary/ambulatory care databases have created new opportunities in clinical and epidemiological studies [1, 2]. These databases have been extensively used to evaluate the risk factor changes in patients with different clinical conditions [3-6]. However, one of the critical problems with EMR data, as with all longitudinal observational data, is the issue of missing data [7-10].

Data entry in EMRs depends on the nature and level of engagement between the individual and the clinical service provider. For example, a patient with diabetes would be advised to get blood tests done every 6 months for the assessment of various risk factors, including glucose level and lipids. However, given the severity of the disease state and the nature of anti-diabetes drug (ADD) titration, the patient might need blood tests done more frequently. In primary care settings, it is hypothesized that younger patients and those with lower risk profiles are less likely to get blood tests done. Missing data may also arise simply because a patient failed to attend the scheduled consultations. These aspects complicate any assertion about the nature of missing data in EMRs, making it difficult to differentiate appropriately between random and non-random missingness patterns. Although the problem of having significant proportions of missing data in longitudinal studies can be minimised through careful design, it is almost unavoidable in most clinical and epidemiological studies [11-13].

The inference drawn from a clinical or epidemiological study may be compromised when individuals have missing data on health indicators, and inadequate handling of the missing data can lead to substantial bias in the inferences drawn [12, 13]. Hence, before investigating and imputing the missing data, understanding the mechanisms behind the missingness is crucial. In practice, incomplete data are typically considered as missing at random (MAR) even if they may not be [12, 14]. In most EMRs, some variables would be expected to partially explain some of the variation in missingness, which supports imputation under the MAR setting [12]. A previous study reported that standard imputation of EMR data that are not missing at random (NMAR), carried out without an explicit NMAR model, might produce biased estimates, although the bias might not be large [15].
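The practical meaning of MAR can be illustrated with a small simulation. In the sketch below, whether HbA1c is recorded depends only on an observed severity flag, so the complete-case mean is biased, while an estimate that conditions on severity recovers the full-sample mean. All numbers (group means, missingness probabilities, sample size, seed) are invented for illustration and are not estimates from the study data.

```python
import random

rng = random.Random(42)

# Each simulated patient: (severe, hba1c, observed)
patients = []
for _ in range(20000):
    severe = rng.random() < 0.5
    hba1c = rng.gauss(9.0 if severe else 7.0, 1.0)
    # Missingness depends only on the observed severity flag (MAR):
    # less severe patients are tested less often.
    p_missing = 0.2 if severe else 0.6
    observed = rng.random() >= p_missing
    patients.append((severe, hba1c, observed))


def mean(values):
    return sum(values) / len(values)


true_mean = mean([h for _, h, _ in patients])
complete_case_mean = mean([h for _, h, seen in patients if seen])

# Condition on severity: average the observed values within each stratum,
# then weight the strata by their share in the full sample.
adjusted_mean = 0.0
for level in (True, False):
    share = mean([1.0 if s == level else 0.0 for s, _, _ in patients])
    stratum_mean = mean([h for s, h, seen in patients if seen and s == level])
    adjusted_mean += share * stratum_mean
```

The complete-case mean is pulled upward because severe patients are over-represented among those with a recorded value; under NMAR, where missingness would depend on the unobserved HbA1c itself, no such adjustment on observed variables would fully remove the bias.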

Multiple imputation of missing data, compared with single imputation, accounts for the statistical uncertainty in the missing values. It can lead to consistent, asymptotically normal and efficient estimates for a dataset with an MAR missingness pattern, which makes it very attractive [10, 16-18]. Several statistical and machine learning methods, including multiple imputation techniques, have been used to deal with the complex problem of missing data [7-9, 19]. There is a strong body of literature on the methodological and application aspects of multiple imputation of missing values [10, 14, 18-21]. Carpenter and Kenward (2013) described the theoretical justifications and computational aspects of various approaches within the multiple imputation platform [22]. However, studies addressing the fundamental aspects of missingness patterns in risk factor data from EMRs, and the practical implications of such missingness for comparative effectiveness studies, are scarce [20, 23-25].
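How multiple imputation carries the uncertainty of the missing values into the final estimate can be seen in the usual pooling step (Rubin's rules): the pooled variance is the average within-imputation variance plus an inflated between-imputation variance. The manuscript text above does not spell out the pooling formula, so the sketch below is a standard illustration rather than the authors' code, and the five estimates and variances are invented.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Invented example: HbA1c change (%) estimated in 5 imputed datasets.
estimates = [-0.9, -1.1, -1.0, -1.2, -0.8]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
```

In this example the pooled standard error (about 0.26) exceeds the per-dataset standard error of 0.20, which is exactly the extra uncertainty that a single imputation would ignore.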

Using a nationally representative EMR database from the primary care system, the aims of this study were to: (1) evaluate the association of different patient-level characteristics with the likelihood of missingness of the risk factor or outcome data, and (2) investigate the performance of three multiple imputation techniques for imputing missing longitudinal clinical risk factor data in the context of evaluating comparative therapeutic effectiveness. Three multiple imputation techniques for missing disease biomarker data (glycated haemoglobin, HbA1c) were compared in patients with type 2 diabetes (T2DM) treated with two different ADDs under a new user design setup. The robustness of clinical inferences drawn with imputed data and with complete cases (CC) was explored when comparing the effectiveness of the therapies at the population level.

METHODS

Data

The Centricity Electronic Medical Record (CEMR) database of the USA represents a variety of ambulatory and primary care medical practices. Over 35,000 physicians and other providers from all US states contribute to the CEMR, with approximately 75% being primary care providers. The database is generally representative of the USA population; diabetes prevalence (7.1% diabetic patients identified by diagnostic codes) is similar to that in the National Diabetes Statistics (6.7% diagnosed diabetes in 2014) [26]. CEMR has been used extensively for academic research worldwide [27-29].

For more than 34 million individuals, longitudinal EMRs were available from 1995 until April 2016. This database contains comprehensive patient-level information on demographic, anthropometric, clinical and laboratory variables including age, sex, ethnicity, and longitudinal measures of HbA1c. Medication data include brand names and doses for individual medications prescribed, along with start/stop dates and specific fields to track treatment alterations. This dataset also contains patient-reported medications, including prescriptions received outside the EMR network and over-the-counter medications. A robust methodology for extraction and assessment of longitudinal patient-level medication data from the CEMR database has been recently described by the authors [30].

Study population

The T2DM study cohort was selected on the basis of the following conditions: (1) valid diagnosis of T2DM, (2) 18 - 80 years old at the date of treatment initiation (baseline), (3) no missing data on age and sex, and (4) valid baseline HbA1c measure. We focused on two ADDs: dipeptidyl peptidase-4 inhibitor (DPP-4) and glucagon-like peptide-1 receptor agonist (GLP-1RA), when added to first-line metformin. The numbers of patients with a minimum of 12/24 months of follow-up post initiation were 38,483/23,859 for DPP-4 and 8,977/5,312 for GLP-1RA. These patients were receiving the respective therapies for a minimum of one year. HbA1c measures at baseline, 6, 12, 18, and 24 months were obtained as the nearest measure within 3 months either side of the time point.
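The rule of taking the nearest measure within 3 months either side of a time point can be sketched as follows. This is a minimal illustration, not the extraction code used in the study: the function name, the 91-day approximation of 3 months, and the example dates and values are all assumptions.

```python
from datetime import date


def nearest_measure(measures, target, window_days=91):
    """Return the value measured closest to `target` within +/- window_days, else None."""
    in_window = [
        (abs((measured_on - target).days), value)
        for measured_on, value in measures
        if abs((measured_on - target).days) <= window_days
    ]
    if not in_window:
        return None  # no measure in the window: HbA1c is missing at this time point
    return min(in_window, key=lambda pair: pair[0])[1]


# Hypothetical patient with baseline on 1 January 2015.
measures = [
    (date(2015, 1, 10), 8.2),
    (date(2015, 6, 20), 7.4),
    (date(2015, 8, 1), 7.1),
]
six_month = nearest_measure(measures, date(2015, 7, 1))
twelve_month = nearest_measure(measures, date(2016, 1, 1))
```

In this example the 6-month value is the measure taken 11 days before the target date, while the 12-month value is missing because no measure falls inside the window.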

Multivariate Imputation by Chained Equations (MICE)

MICE is an increasingly popular method of dealing with missing data in epidemiological and clinical research. This iterative approach imputes multiple variables by using chained equations under the assumption that missing data are MAR [31]. This method creates multiple imputations for missing multivariate data by Gibbs sampling. An advantage of MICE is that it can handle arbitrary missing data patterns as well as variables of different types. For continuous variables, linear regression models or predictive mean matching are used, while logistic regression and polytomous models are needed for dichotomous and categorical variables respectively [32, 33]. MICE is also commonly known as fully conditional specification (FCS) and sequential regression multivariate imputation [34].
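The chained-equations idea can be sketched for two continuous variables as below. This is a deliberately simplified illustration and not the mice implementation: it uses plain linear regression with added residual noise, it does not draw the regression parameters from their posterior (so it is not a proper imputation), and all names and data are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares of ys on xs; returns intercept, slope, residual SD."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return intercept, slope, sd


def chained_imputation(rows, n_iter=10, seed=0):
    """Impute None entries in two-column rows by cycling regression models."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    # Starting values: fill each column with its observed mean.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in (0, 1)]
    data = [[col_means[j] if r[j] is None else r[j] for j in (0, 1)] for r in rows]
    for _ in range(n_iter):
        for target in (0, 1):
            other = 1 - target
            # Fit on rows where the target was observed, then redraw the missing ones.
            obs = [i for i in range(len(data)) if not missing[i][target]]
            a, b, sd = fit_line([data[i][other] for i in obs],
                                [data[i][target] for i in obs])
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = a + b * data[i][other] + rng.gauss(0, sd)
    return data


# Hypothetical rows: [baseline HbA1c, 6-month HbA1c]; None marks a missing value.
rows = [[8.1, 7.2], [7.5, None], [9.0, 7.9], [None, 6.8], [7.0, 6.5], [8.4, 7.4]]
imputed = [chained_imputation(rows, seed=s) for s in range(5)]  # m = 5 datasets
```

Each seed gives one completed dataset; observed values are left untouched and only the missing entries differ between datasets.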

Two-fold Multiple Imputation

There are two phases in the two-fold method of imputation: the filled-in phase followed by the imputation phase. At the filled-in phase, the missing values for all variables are filled in sequentially over the variables taken one at a time, which specifies separate univariate imputation models for each variable with missing data conditional on all other variables [35]. At the imputation stage, the missing values are imputed using a specified method and covariates at each iteration. Two-fold multiple imputation imputes missing values at each time point conditional on observed measures within a small time window using FCS or chained equations [35, 36]. Usually, this method uses information from the time point at which the imputation is conducted and from the immediately adjacent time points. A distinct advantage of this method is that it can handle both time-dependent and time-independent variables, and it allows users to specify the time window. This method also reduces the issues of collinearity and overfitting.
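The time-window idea can be made concrete with a small helper that lists which time points contribute to the imputation model at a given time point. This is only a sketch of the windowing step, with a hypothetical function name and a window width expressed in number of adjacent time points; it is not the two-fold FCS algorithm itself.

```python
def window_time_points(time_points, index, width=1):
    """Time points used when imputing the measure at time_points[index]."""
    lo = max(0, index - width)
    hi = min(len(time_points) - 1, index + width)
    return time_points[lo:hi + 1]


follow_up_months = [0, 6, 12, 18, 24]
at_12_months = window_time_points(follow_up_months, 2)          # adjacent points only
at_baseline = window_time_points(follow_up_months, 0)           # window clipped at the start
wide_window = window_time_points(follow_up_months, 2, width=2)  # wider window
```

A wider window brings in more distant measurements at the cost of a larger imputation model.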

Multiple imputation with Bayesian Markov Chain Monte Carlo (MCMC)

The multiple imputation platform incorporates both parametric and non-parametric approaches. Parametric approaches include improper imputation, approximate proper imputation, and Bayesian (proper) imputation, which usually uses Markov Chain Monte Carlo (MCMC) methods to obtain the posterior distribution [22]. In the context of arbitrary missing data patterns, the MCMC method is often used; it creates multiple imputations by using simulations from a Bayesian predictive distribution for normal data. We used multiple imputation with Bayesian iterative MCMC procedures, which can be used when the pattern of missing data is either monotone or non-monotone [22].
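The alternation between imputing missing values and redrawing model parameters can be sketched for the simplest possible case: a single normally distributed variable with a noninformative prior. This univariate data-augmentation loop is an assumption made for illustration; the procedure used in the study works on multivariate data with covariates, and all names and values below are hypothetical.

```python
import random
from statistics import mean, variance


def data_augmentation(values, n_iter=200, seed=0):
    """Return one completed copy of `values` after n_iter imputation/posterior steps."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(values)
    mu, sigma2 = mean(observed), variance(observed)  # starting values
    completed = list(values)
    for _ in range(n_iter):
        # I-step: draw each missing value from N(mu, sigma2).
        completed = [rng.gauss(mu, sigma2 ** 0.5) if v is None else v for v in values]
        # P-step: draw sigma2, then mu, from the posterior given the completed data
        # (prior proportional to 1 / sigma2).
        chi2 = rng.gammavariate((n - 1) / 2, 2)  # chi-square with n - 1 d.f.
        sigma2 = (n - 1) * variance(completed) / chi2
        mu = rng.gauss(mean(completed), (sigma2 / n) ** 0.5)
    return completed


# Hypothetical 6-month HbA1c (%) values; None marks a missing measure.
hba1c = [7.1, None, 6.8, 7.4, None, 6.9, 7.2]
imputations = [data_augmentation(hba1c, seed=s) for s in range(5)]
```

Running the chain with different seeds (or taking well-separated iterations of one chain) gives the multiple completed datasets.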

Statistical Methods

We imputed the missing HbA1c (measured in %) measurements at 6-, 12-, 18- and 24-month follow-up using these three methods and compared the results with CC analysis. CC analysis considered only the non-missing HbA1c at the same time points. In all three imputation methods, imputed values were adjusted for age, sex, and addition of any third-line ADD within 2 years of follow-up.

Basic statistics were presented by number (percentage), mean (SD), mean (95% CI) or median (first quartile, third quartile) separately for the two treatment groups, as appropriate. Both the unadjusted and adjusted change in HbA1c (%) at 6 and 12 months by the two treatment groups were evaluated, the adjustment factors being age at treatment initiation, sex, diabetes duration at treatment initiation, and time to second-line ADD. Among patients with baseline HbA1c ≥ 7.5%, logistic regression was used to evaluate the odds of reducing HbA1c below 7% (glucose management target in patients with T2DM) at 6 and 12 months of follow-up in the GLP-1RA group compared to the DPP-4 group. Treatment status is usually not randomized in observational data, which implies that the outcome and treatment are not necessarily independent. To address this issue, we applied a treatment effects model adjusting for age, sex, diabetes duration at treatment initiation, and time to second-line ADD, so that treatment and outcome are independent conditional on those covariates [37].
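The target-attainment outcome and an odds ratio between two groups can be sketched as follows. This computes only an unadjusted odds ratio on made-up data; the study itself used covariate-adjusted logistic regression and a treatment effects model. Function names and values are hypothetical.

```python
def reached_target(baseline, follow_up, entry=7.5, target=7.0):
    """True/False for eligible patients (baseline >= entry with a follow-up value), else None."""
    if baseline < entry or follow_up is None:
        return None
    return follow_up < target


def odds_ratio(events_a, n_a, events_b, n_b):
    """Unadjusted odds ratio of group A versus group B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b


# Hypothetical (baseline, 6-month) HbA1c (%) pairs per treatment group.
glp1ra = [(8.0, 6.8), (7.9, 7.2), (9.1, 6.9), (7.6, 7.5)]
dpp4 = [(8.2, 6.9), (7.8, 7.4), (8.8, 7.1), (7.7, 7.3)]

glp_flags = [f for f in (reached_target(b, f6) for b, f6 in glp1ra) if f is not None]
dpp_flags = [f for f in (reached_target(b, f6) for b, f6 in dpp4) if f is not None]
unadjusted_or = odds_ratio(sum(glp_flags), len(glp_flags), sum(dpp_flags), len(dpp_flags))
```

With imputed data the same outcome is derived in each completed dataset, and the resulting estimates are pooled across datasets.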

To evaluate the factors that could be associated with missingness of HbA1c measures at follow-up, the likelihood of missingness of HbA1c at 6 and 12 months of follow-up from baseline for each treatment group was estimated using logistic regression, adjusting for age, sex, pre-existing cardiovascular disease (CVD) or pre-existing chronic kidney disease (CKD), baseline HbA1c ≥ 7.5% and use of other medications. Instead of conventional age groups, we used age quartiles, denoted by Q1 (18-50 years), Q2 (50-58 years), Q3 (58-66 years) and Q4 (66-80 years).

RESULTS

The basic characteristics of the study cohort are presented in Table 1. The mean (SD) age was 58 (12) and 54 (11) years, 49% and 35% were male, and 71% and 78% of the patients were White Caucasian in the DPP-4 and GLP-1RA groups respectively. Median (Q1, Q3) HbA1c at baseline in patients with a minimum of 12 months of DPP-4 and GLP-1RA treatment was 7.5 (6.8, 8.8) and 7.1 (6.5, 8.3) respectively.

There were no missing data on HbA1c at baseline by design. The proportions of missing HbA1c data for patients with a minimum of 12 and 24 months of treatment are presented for every 6 months of follow-up in Table 1. Among patients with a minimum treatment duration of 12 months, proportions of missing HbA1c ranged from 28% to 32%. Similar missing proportions (28-34%) were observed at 24 months follow-up in patients with a minimum of 24 months of treatment.

The possible association of various patient characteristics with the likelihood of missing HbA1c in the study cohort at 6 and 12 months of follow-up is presented in Table 2. Age at treatment initiation had a significant influence on the likelihood of missing HbA1c. Among patients treated with DPP-4, compared to younger patients (age quartile Q1), the odds of a missing HbA1c measure in older patients (Q2 to Q4) were lower by 16% to 30% and by 21% to 34% at 6 and 12 months of follow-up respectively. Similar results (20% to 38% lower odds of missingness) were observed in patients treated with GLP-1RA at 6 months.

Sex and pre-existing CVD or CKD did not have any influence on the likelihood of missingness of HbA1c at 6 or 12 months follow-up. Patients with HbA1c ≥ 7.5% at baseline were 12% (OR 95% CI: 0.83, 0.93) and 14% (OR 95% CI: 0.77, 0.97) less likely to have missing data at 6 months follow-up in the DPP-4 and GLP-1RA groups respectively. However, this association seems to disappear at 12 months follow-up.
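The percentages quoted above follow directly from the odds ratios in Table 2: an odds ratio of 0.88 corresponds to 12% lower odds of missingness. A minimal sketch of the conversion (the function name is hypothetical):

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (negative means lower odds)."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios for baseline HbA1c >= 7.5% at 6 months, as reported in Table 2.
dpp4_change = round(percent_change_in_odds(0.88))
glp1ra_change = round(percent_change_in_odds(0.86))
```

Note that this is a change in odds, which approximates a change in probability only when the outcome is uncommon.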

There was no difference in the distributions of follow-up HbA1c data between the three imputation methods and the complete case analyses (Table 3 and Figure 1). The estimates of unadjusted and adjusted changes in HbA1c during follow-up were also similar with all imputation approaches, and there was no difference between these estimates and those from the CC analyses (Table 3 and Figure 2).

Among patients with HbA1c ≥ 7.5% at baseline, the proportions of patients identified to have reduced HbA1c ≤ 7% at 6 and 12 months follow-up were similar using all imputation approaches (Table 3). While making clinical inference on the likelihood of reducing HbA1c below 7% in the GLP-1RA group, compared to those treated with DPP-4, there was no disagreement among the three imputation approaches, and this inference was also in line with the analysis of complete cases.

Figure 1 shows that there was no difference in the distribution of HbA1c at 6 months and 12 months among the three imputation approaches for patients treated with DPP-4. In patients treated with GLP-1RA, at 6 months of follow-up, although MICE indicated a slightly leptokurtic distribution due to its higher variability (SD = 1.4), there was no difference at 12 months. The density plots (Figure 2) for the change in HbA1c in the DPP-4 group indicated no obvious difference between the three imputation methods. However, in the GLP-1RA group, the density plots obtained using MICE were leptokurtic at both 6 and 12 months of follow-up. Supplementary Figure 1 showed that patients treated with DPP-4 had a higher mean HbA1c level at baseline and maintained it during follow-up compared to patients treated with GLP-1RA. No significant difference in the trajectories of HbA1c over 24 months of follow-up was observed between CC analyses and the three imputation techniques for both DPP-4 and GLP-1RA.
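Whether a distribution is leptokurtic can be checked numerically rather than read off a density plot. The sketch below computes the sample excess kurtosis (fourth central moment over squared variance, minus 3), which is positive for leptokurtic distributions. The function name and the two toy samples are hypothetical.

```python
from statistics import mean


def excess_kurtosis(values):
    """Population-moment excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


flat = [1, 2, 3, 4, 5]                     # evenly spread values
peaked = [3, 3, 3, 3, 3, 3, 3, 3, 0, 6]    # sharp peak with two distant values
flat_kurtosis = excess_kurtosis(flat)
peaked_kurtosis = excess_kurtosis(peaked)
```

Applied to the imputed values from each method, this would give a single number per method and time point that could be compared alongside the density plots.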

DISCUSSION

A novel component of this study is the investigation of the association of the likelihood of missingness of follow-up risk factor measures (HbA1c) with patients’ demographic and clinical characteristics (age, sex, pre-existing comorbidities and disease severity (baseline HbA1c)). The results clearly indicated that missingness in the follow-up risk factor data is less likely in older patients, irrespective of the drug they are taking for glycaemic control. We also observed that patients with higher disease severity (HbA1c above 7.5% at baseline) are more likely to visit their GP or primary care provider at 6 months post treatment intensification or therapy titration. This association disappears over longer follow-up, which is likely to be a result of the effectiveness of the therapies in terms of better risk factor control. This extensive assessment clearly informs on the random and non-random patterns of missingness. However, it is very difficult to identify and distinguish random and non-random missing patterns in EMRs, and model them accordingly to obtain robust inference(s) through imputations.

Another novel component of our study is the comparative assessment of the usability and robustness of using imputed data for making clinical inferences in comparative effectiveness studies at the population level using large real-world EMRs. We observed that the inferences drawn on the risk factor changes in this pharmaco-epidemiological study are similar between CC and imputed data based analyses. More importantly, the clinical contexts of evaluating the effectiveness of the therapies, using continuous measures of risk factors or clinical categorisation of the therapeutic achievements, were well supported with confidence in making robust inferences using different methods of imputation. The EMR database presents a formidable challenge in that the “missing data” have an intermittent pattern of missingness over time (non-monotone) and are NMAR, so approaches such as CC produce biased and statistically inefficient results [38].

As expected in primary care based EMRs, a large number of patients had missing HbA1c in the 6-monthly follow-up data over 24 months. We observed that in estimating changes in HbA1c at 6 and 12 months from baseline, MICE gave marginally lower estimates compared to the other two methods and a slightly leptokurtic density plot. In almost all instances, Two-fold and MCMC performed similarly. Its simplified approach of imputing missing values at a given time by using values at nearby times makes Two-fold imputation a more attractive technique in the context of EMR databases. This automatically reduces the complexity of the imputation models, and the collinearity and overfitting issues [35]. However, measurements further away in time may sometimes provide independent information compared to the adjacent time points. In these circumstances, we have to be careful in using Two-fold imputation with small time window widths. A greater time window width may solve this problem.

The MAR approach assumes that, like the observed values, the missing observations are not random samples that are generated from the same sampling distribution [19]. In our study, the distributions of the imputed data obtained using these three imputation methodologies were similar to those of the CC data, because all three techniques share the underlying theory of multiple imputation and are based upon the MAR assumption. The missing outcome measure data in this context also raises the issue of some kind of indication bias: patients with better glycaemic control are less well represented in the follow-up outcome measure data. In this case, any analysis on the CC is highly likely to bias the result towards those who are doing poorly in terms of glycaemic control, as observed in this study.

Kim (2004) showed that under the regression model, the bias of the multiple imputation variance estimator decreases as the sample size increases [39]. Given the large sample size in our study, the variance estimator for the imputed data was almost unbiased. Robust statistical techniques applied to imputed data are therefore highly likely to produce robust and reliable clinical inferences, compared to those based on CC analyses.

In this context, the use of all three multiple imputation techniques (MICE, Two-fold and MCMC) to impute a relatively large proportion of missing primary outcome data, under unknown patterns of missingness, appears to be valid and able to provide consistent and reliable clinical inferences.

ACKNOWLEDGEMENTS

University of Melbourne gratefully acknowledges the support from the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia. No separate funding was obtained for this study. OM acknowledges the PhD scholarship from Queensland University of Technology, Australia, and her co-supervisors Prof. Ross Young and Prof. Louise Hafner of the same University.

CONFLICT OF INTEREST

SKP has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator-initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. MS, OM and JT have no conflict of interest to declare.

REFERENCES

1. Sagreiya, H. and R.B. Altman, The utility of general purpose versus specialty clinical databases for research: warfarin dose estimation from extracted clinical variables. Journal of Biomedical Informatics, 2010. 43(5): p. 747-751.
2. Shivade, C., et al., A review of approaches to identifying patient phenotype cohorts using electronic health records. Journal of the American Medical Informatics Association, 2014. 21(2): p. 221-230.
3. Montvida, O., et al., Addition of or switch to insulin therapy in people treated with glucagon-like peptide-1 receptor agonists: A real-world study in 66 583 patients. Diabetes, Obesity and Metabolism, 2016: p. n/a-n/a.
4. Paul, S.K., et al., Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovascular Diabetology, 2015. 14(1): p. 100.
5. Badve, S.V., et al., The Association between Body Mass Index and Mortality in Incident Dialysis Patients. PLoS One, 2014. 9(12): p. e114897.
6. Thomas, G., et al., Obesity paradox in people newly diagnosed with type 2 diabetes with and without prior cardiovascular disease. Diabetes Obes Metab, 2013. 16(4): p. 317-25.
7. Jerez, J.M., et al., Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artificial Intelligence in Medicine, 2010. 50(2): p. 105-115.
8. Biering, K., N.H. Hjollund, and M. Frydenberg, Using multiple imputation to deal with missing data and attrition in longitudinal studies with repeated measures of patient-reported outcomes. Clin Epidemiol, 2015. 7: p. 91-106.
9. Thomas, G., K. Klein, and S. Paul, Statistical challenges in analysing large longitudinal patient-level data: The danger of misleading clinical inferences with imputed data. Journal of the Indian Society of Agricultural Statistics, 2014. 68(2): p. 39-54.
10. Sterne, J.A.C., et al., Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 2009. 338: p. 157-160.
11. Little, R.J., et al., The Prevention and Treatment of Missing Data in Clinical Trials. New England Journal of Medicine, 2012. 367(14): p. 1355-1360.
12. Wells, B.J., et al., Strategies for handling missing data in electronic health record derived data. EGEMS (Wash DC), 2013. 1(3): p. 1035.
13. Madden, J.M., et al., Missing clinical and behavioral health data in a large electronic health record (EHR) system. J Am Med Inform Assoc, 2016.
14. Mackinnon, A., The use and reporting of multiple imputation in medical research - a review. J Intern Med, 2010. 268(6): p. 586-93.
15. Lin, J.H. and P.J. Haug, Exploiting missing clinical data in Bayesian network modeling for predicting medical problems. J Biomed Inform, 2008. 41(1): p. 1-14.
16. Marston, L., et al., Issues in multiple imputation of missing data for large general practice clinical databases. Pharmacoepidemiol Drug Saf, 2010. 19(6): p. 618-26.
17. Rubin, D.B. and N. Schenker, Multiple imputation in health-care databases: an overview and some applications. Stat Med, 1991. 10(4): p. 585-98.
18. Spratt, M., et al., Strategies for multiple imputation in longitudinal studies. Am J Epidemiol, 2010. 172(4): p. 478-87.
19. Rubin, D.B., Inference and missing data. Biometrika, 1976. 63(3): p. 581-592.
20. Bounthavong, M., J.H. Watanabe, and K.M. Sullivan, Approach to addressing missing data for electronic medical records and pharmacy claims data research. Pharmacotherapy, 2015. 35(4): p. 380-7.
21. Yuan, Y.C., Multiple imputation for missing data: concepts and new developments (version 9.0). 2010, SAS Institute Inc.: Rockville.
22. Carpenter, J. and M. Kenward, Multiple Imputation and its Application. 2013, Wiley.
23. Wells, B.J., et al., Strategies for Handling Missing Data in Electronic Health Record Derived Data. eGEMs, 2013. 1(3): p. 1035.
24. Madden, J.M., et al., Missing clinical and behavioral health data in a large electronic health record (EHR) system. J Am Med Inform Assoc, 2016. 23(6): p. 1143-1149.
25. Montvida, O., et al., Addition of or switch to insulin therapy in people treated with glucagon-like peptide-1 receptor agonists: A real-world study in 66 583 patients. Diabetes Obes Metab, 2017. 19(1): p. 108-117.
26. Centers for Disease Control and Prevention, National diabetes statistics report: estimates of diabetes and its burden in the United States, 2014. Atlanta, GA: US Department of Health and Human Services, 2014.
27. Crawford, A.G., et al., Comparison of GE Centricity Electronic Medical Record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Popul Health Manag, 2010. 13(3): p. 139-50.
28. Brixner, D., et al., Assessment of cardiometabolic risk factors in a national primary care electronic health record database. Value in Health, 2007. 10(s1): p. S29-S36.
29. Paul, S.K., et al., Weight gain in insulin treated patients by BMI categories at treatment initiation: New evidence from real-world data in patients with type 2 diabetes. Diabetes, Obesity and Metabolism, 2016.
30. Montvida, O., et al., Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records. Open Bioinformatics Journal, 2017. 10: p. 1-15.
31. van Buuren, S. and K. Groothuis-Oudshoorn, mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 2011. 45(3): p. 1-67.
32. White, I.R., R. Daniel, and P. Royston, Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables. Comput Stat Data Anal, 2010. 54(10): p. 2267-2275.
33. van Buuren, S., Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res, 2007. 16(3): p. 219-42.
34. Raghunathan, T.E., et al., A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Model. Survey Methodology, 2001. 27(1): p. 85-95.
35. Welch, C.A., et al., Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data. Statistics in Medicine, 2014. 33(21): p. 3725-3737.
36. Welch, C., J. Bartlett, and I. Petersen, Application of multiple imputation using the two-fold fully conditional specification algorithm in longitudinal clinical data. Stata J, 2014. 14(2): p. 418-431.
37. Cattaneo, M.D., Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics, 2010. 155(2): p. 138-154.
38. Little, R.J.A. and D.B. Rubin, Statistical Analysis with Missing Data. Second ed. 2002: Wiley-Interscience.
39. Kim, J.K., Finite sample properties of multiple imputation estimators. The Annals of Statistics, 2004. 32(2): p. 766-783.

Table 1: Basic statistics and missingness of HbA1c (%) by treatment group with a minimum 12 months treatment duration.

| | DPP-4 (minimum 12 months) | GLP1-RA (minimum 12 months) | DPP-4 (minimum 24 months) | GLP1-RA (minimum 24 months) |
|---|---|---|---|---|
| N | 38483 | 8977 | 23859 | 5312 |
| Age in years𝜙 | 58 (12) | 54 (11) | 58 (12) | 54 (11) |
| Maleη | 18721 (49) | 3136 (35) | 11672 (49) | 1834 (35) |
| Ethnicityη | | | | |
| Asian | 828 (2) | 100 (1) | 541 (2) | 65 (1) |
| Black | 4393 (11) | 711 (8) | 2682 (11) | 412 (8) |
| Indian (American) | 483 (1) | 47 (1) | 343 (1) | 27 (1) |
| Native Hawaiian | 67 (0) | 19 (0) | 49 (0) | 10 (0) |
| Unknown | 5507 (14) | 1192 (13) | 3343 (14) | 667 (13) |
| White | 27205 (71) | 6908 (77) | 16901 (71) | 4131 (78) |
| HbA1c (%)γ | | | | |
| Baseline | 8.1 (6.5, 16.5) | 7.7 (6.5, 16.5) | 8.1 (6.5, 18.8) | 7.7 (6.5, 18.8) |
| 6 months | 7.1 (5, 17.7) | 6.9 (5, 15.9) | 7.1 (5, 17.7) | 6.9 (5, 15.9) |
| 12 months | 7.2 (5, 17.9) | 7.0 (5, 15.6) | 7.2 (5, 17.9) | 7.0 (5, 15.6) |
| 18 months | - | - | 7.2 (5, 17.5) | 7.1 (5, 17.4) |
| 24 months | - | - | 7.3 (5, 17.9) | 7.1 (5, 16.7) |
| HbA1c (%)ω | | | | |
| Baseline | 7.5 (6.8, 8.8) | 7.1 (6.5, 8.3) | 7.5 (6.8, 8.8) | 7.1 (6.5, 8.3) |
| 6 months | 6.8 (6.3, 7.5) | 6.6 (6, 7.4) | 6.8 (6.3, 7.5) | 6.6 (6, 7.4) |
| 12 months | 6.9 (6.3, 7.7) | 6.6 (6, 7.5) | 6.9 (6.3, 7.7) | 6.6 (6, 7.5) |
| 18 months | - | - | 6.9 (6.3, 7.7) | 6.7 (6.1, 7.6) |
| 24 months | - | - | 6.9 (6.3, 7.8) | 6.8 (6.1, 7.6) |
| HbA1c (%)η: Missingness | | | | |
| Baseline | 0 | 0 | 0 | 0 |
| 6 months | 6622 (28) | 1553 (31) | 4110 (28) | 883 (29) |
| 12 months | 6904 (30) | 1643 (32) | 4093 (28) | 945 (31) |
| 18 months | - | - | 4518 (31) | 1032 (33) |
| 24 months | - | - | 4680 (32) | 1050 (34) |

Unless stated otherwise; α: Mean (95% CI); 𝜙: Mean (SD); η: N (%); ω: Median (IQR); γ: Mean (Min, Max)

Table 2: Odds ratio (95% CI) and p-values for likelihood of missingness of HbA1c (%) measure at 6 and 12 months of follow-up in patients treated with DPP-4 and GLP1-RA, adjusted for age quartiles (Q1, Q2, Q3, Q4), sex, pre-existing CVD and CKD, and baseline HbA1c ≥ 7.5%.

| | 6 months: DPP-4 | p-value | 6 months: GLP1-RA | p-value | 12 months: DPP-4 | p-value | 12 months: GLP1-RA | p-value |
|---|---|---|---|---|---|---|---|---|
| Age Quartiles | | | | | | | | |
| Q2 | 0.84 (0.78, 0.90) | <0.001 | 0.80 (0.70, 0.91) | <0.001 | 0.79 (0.74, 0.85) | <0.001 | 0.69 (0.60, 0.78) | <0.001 |
| Q3 | 0.75 (0.70, 0.81) | <0.001 | 0.68 (0.58, 0.80) | <0.001 | 0.74 (0.69, 0.80) | <0.001 | 0.61 (0.52, 0.71) | <0.001 |
| Q4 | 0.70 (0.64, 0.76) | <0.001 | 0.69 (0.55, 0.88) | <0.001 | 0.66 (0.60, 0.72) | <0.001 | 0.62 (0.50, 0.78) | <0.001 |
| Male vs Female | 1.02 (0.97, 1.08) | 0.48 | 1.05 (0.94, 1.19) | 0.39 | 1.03 (0.98, 1.09) | 0.28 | 0.87 (0.78, 0.98) | 0.024 |
| Cardiovascular Disease (CVD) | 1.06 (0.98, 1.14) | 0.16 | 1.03 (0.86, 1.23) | 0.76 | 1.08 (1.00, 1.16) | 0.06 | 1.13 (0.95, 1.35) | 0.17 |
| Chronic Kidney Disease (CKD) | 1.01 (0.88, 1.16) | 0.88 | 1.02 (0.70, 1.50) | 0.30 | 0.92 (0.80, 1.06) | 0.26 | 0.99 (0.68, 1.43) | 0.95 |
| HbA1c ≥ 7.5% at Baseline | 0.88 (0.83, 0.93) | <0.001 | 0.86 (0.77, 0.97) | 0.011 | 1.00 (0.94, 1.05) | 0.81 | 0.94 (0.84, 1.05) | 0.24 |

Table 3: (i) Mean (SD) for HbA1c (%) at 6 and 12 months by treatment group for complete case and on imputed data by three imputation methods; (ii) Change in HbA1c (%) at 6 and 12 months from baseline by treatment group for unadjusted analyses and adjusted for age, gender, baseline HbA1c (%), diabetes duration, and time to second-line ADD as measured on complete cases and on imputed data; (iii) Among patients with HbA1c ≥ 7.5% at the index date, the proportions of patients who reduced HbA1c ≤ 7% at 6 and 12 months during follow-up, by treatment groups, for complete cases and imputed data; (iv) Odds ratio for HbA1c ≤ 7% in patients treated with GLP-1RA compared to DPP-4 group adjusted for baseline HbA1c (%), age, gender, diabetes duration, time to second-line ADD and whether or not a third-line ADD was started within 6 (or 12) months.

| | 6 months: DPP-4 | 6 months: GLP1-RA | 12 months: DPP-4 | 12 months: GLP1-RA |
|---|---|---|---|---|
| HbA1c (%) | | | | |
| CC | 7.1 (1.3) | 6.9 (1.3) | 7.2 (1.4) | 7.0 (1.4) |
| MICE | 7.1 (1.3) | 6.9 (1.4) | 7.2 (1.3) | 7.0 (1.4) |
| Two-fold | 7.1 (1.3) | 6.9 (1.3) | 7.2 (1.3) | 7.0 (1.4) |
| Bayesian MCMC | 7.1 (1.3) | 6.9 (1.3) | 7.2 (1.3) | 7.0 (1.4) |
| Change in HbA1c (%)α, unadjusted | | | | |
| CC | -1.05 (-1.07, -1.02) | -0.89 (-0.93, -0.84) | -0.91 (-0.94, -0.89) | -0.73 (-0.77, -0.68) |
| MICE | -1.00 (-1.02, -0.98) | -0.84 (-0.87, -0.80) | -0.91 (-0.93, -0.89) | -0.71 (-0.75, -0.69) |
| Two-fold | -1.00 (-1.02, -0.98) | -0.84 (-0.87, -0.80) | -0.92 (-0.94, -0.9) | -0.72 (-0.76, -0.69) |
| Bayesian MCMC | -1.00 (-1.02, -0.98) | -0.85 (-0.88, -0.81) | -0.92 (-0.94, -0.9) | -0.72 (-0.76, -0.69) |
| Change in HbA1c (%)α, adjusted | | | | |
| CC | -1.14 (-1.16, -1.11) | -0.98 (-1.02, -0.94) | -1.00 (-1.03, -0.98) | -0.81 (-0.85, -0.76) |
| MICE | -1.09 (-1.11, -1.07) | -0.93 (-0.96, -0.89) | -1.00 (-1.02, -0.98) | -0.79 (-0.83, -0.75) |
| Two-fold | -1.09 (-1.11, -1.07) | -0.92 (-0.95, -0.89) | -1.01 (-1.03, -0.99) | -0.80 (-0.84, -0.77) |
| Bayesian MCMC | -1.09 (-1.11, -1.07) | -0.93 (-0.96, -0.89) | -1.00 (-1.02, -0.98) | -0.80 (-0.84, -0.77) |
| Patients with HbA1c ≥ 7.5% at Baseline and ≤ 7% at follow-up | | | | |
| CC | 5193 (48) | 1043 (49) | 4768 (46) | 956 (48) |
| MICE | 6142 (45) | 1312 (47) | 5673 (43) | 1238 (45) |
| Two-fold | 6221 (45) | 1294 (46) | 5883 (43) | 1226 (45) |
| Bayesian MCMC | 6112 (45) | 1311 (47) | 5774 (43) | 1239 (45) |
| Odds Ratio, adjustedα | | | | |
| CC | Ref | 1.03 (1, 1.06) | Ref | 1.02 (0.99, 1.04) |
| MICE | | 1.03 (1, 1.06) | | 1.02 (1, 1.04) |
| Two-fold | | 1.03 (1, 1.06) | | 1.01 (1, 1.04) |
| Bayesian MCMC | | 1.03 (1, 1.05) | | 1 (0.99, 1.03) |

Unless stated otherwise; α: Mean (95% CI); 𝜙: Mean (SD); η: N (%); ω: Median (IQR); γ: Mean (Min, Max)

Figure 1: Distribution of HbA1c (%) at 6 months and 12 months for DPP-4 and GLP-1RA respectively for complete case, MICE, Two-fold and MCMC imputation.

Figure 2: Distribution of change in HbA1c (%) (∆HbA1c) at 6 months and 12 months for DPP-4 and GLP-1RA respectively for complete case, MICE, Two-fold and MCMC imputation.

Supplementary Figure 1: Trajectory plot for mean (95% CI) HbA1c (%) at baseline and follow-up for two treatment groups for complete case and on imputed data by three imputation methods.

Chapter 7: Trends in Anti-diabetic Drug Prescribing Patterns

Statement of Contribution of Co-Authors for Thesis by Published Paper

The authors listed below have certified* that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Olga Montvida, Jonathan Shaw, John J Atherton, Francis Stringer, Sanjoy K Paul. Long-term Trends in Antidiabetes Drug Usage in the US: Real-world Evidence in Patients Newly Diagnosed With Type 2 Diabetes. Diabetes Care. 2017 Nov 6:dc171414.

| Contributor | Statement of Contribution* |
|---|---|
| Olga Montvida (QUT Verified Signature, 29.06.2018) | Conceived the idea and was responsible for the primary design of the study. Conducted the data extraction and statistical analyses. Developed first draft and contributed towards development of the manuscript. |
| Jonathan Shaw | Contributed significantly in the study design and manuscript development. |
| John J Atherton | Contributed significantly in the study design and manuscript development. |
| Francis Stringer | Contributed in the interpretation of the results and manuscript development. |
| Sanjoy K. Paul | Conceived the idea and was responsible for the primary design of the study. Contributed to the statistical analyses. Developed first draft and contributed towards development of the manuscript. |

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship.

Name: Sanjoy Ketan Paul
Signature: QUT Verified Signature
Date: 29.06.2018

Diabetes Care Volume 41, January 2018

Long-term Trends in Antidiabetes Drug Usage in the U.S.: Real-world Evidence in Patients Newly Diagnosed With Type 2 Diabetes

Olga Montvida,1,2 Jonathan Shaw,3 John J. Atherton,4 Frances Stringer,5 and Sanjoy K. Paul1,6

Diabetes Care 2018;41:69–78 | https://doi.org/10.2337/dc17-1414

EPIDEMIOLOGY/HEALTH SERVICES RESEARCH

OBJECTIVE To explore temporal trends in antidiabetes drug (ADD) prescribing and intensification patterns, along with glycemic levels and comorbidities, and possible benefits of novel ADDs in delaying the need for insulin initiation in patients diagnosed with type 2 diabetes.

RESEARCH DESIGN AND METHODS Patients with type 2 diabetes aged 18–80 years, who initiated any ADD, were selected (n = 1,023,340) from the U.S. Centricity Electronic Medical Records. Those who initiated second-line ADD after first-line metformin were identified (subcohort 1, n = 357,482); the third-line therapy choices were further explored.

RESULTS From 2005 to 2016, first-line use increased for metformin (60–77%) and decreased for sulfonylureas (20–8%). During a mean follow-up of 3.4 years post metformin, 48% initiated a second ADD at a mean HbA1c of 8.4%. In subcohort 1, although sulfonylurea usage as second-line treatment decreased (60–46%), it remained the most popular second ADD choice. Use increased for insulin (7–17%) and dipeptidyl peptidase-4 inhibitors (DPP-4i) (0.4–21%). The rates of intensification with insulin and sulfonylureas did not decline over the last 10 years. The restricted mean time to insulin initiation was marginally longer in second-line DPP-4i (7.1 years) and in the glucagon-like peptide 1 receptor agonist group (6.6 years) compared with sulfonylurea (6.3 years, P < 0.05).

CONCLUSIONS Most patients initiate second-line therapy at elevated HbA1c levels, with highly heterogeneous clinical characteristics across ADD classes. Despite the introduction of newer therapies, sulfonylureas remained the most popular second-line agent, and the rates of intensification with sulfonylureas and insulin remained consistent over time. The incretin-based therapies were associated with a small delay in the need for therapy intensification compared with sulfonylureas.

1Statistics Unit, QIMR Berghofer Medical Research Institute, Brisbane, Australia
2Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia
3Baker Heart and Diabetes Institute, Melbourne, Australia
4Cardiology Department, Royal Brisbane and Women’s Hospital, and University of Queensland School of Medicine, Brisbane, Australia
5Model Answers Pty Ltd, Brisbane, Australia
6Melbourne EpiCentre, University of Melbourne, Melbourne, Australia
Corresponding author: Sanjoy K. Paul, sanjoy.paul@unimelb.edu.au.
Received 14 July 2017 and accepted 25 September 2017.
This article contains Supplementary Data online at http://care.diabetesjournals.org/lookup/suppl/doi:10.2337/dc17-1414/-/DC1.
This article is featured in a podcast available at http://www.diabetesjournals.org/content/diabetes-core-update-podcasts.
© 2017 by the American Diabetes Association. Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.

A broad choice of “old” and “new” antidiabetes drugs (ADDs) is available, which differ not only in their mechanisms of action but also in their glycemic and extraglycemic effects (1). While treatment guidelines for type 2 diabetes are regularly updated based on new evidence, real-world prescription trends may also be driven by other factors, such as medication costs, side effect profile, and provider and patient preferences.


With the development of new classes of antidiabetes therapies since 2005, including incretin-based drugs and sodium–glucose cotransporter 2 inhibitors (SGLT2i), the paradigm of therapy options for patients with highly heterogeneous glycemic and cardiovascular risk factors has changed significantly. However, the way in which this has occurred in real-world practice, especially in the trade-off between older and new classes of ADDs as initial and intensification therapy options, has not been studied thoroughly.

The newer ADDs have been shown to be associated with significantly lower risk of hypoglycemia compared with the sulfonylureas (SU) and insulin (INS) (2). The weight neutrality or benefits of weight reductions have also been well established for new therapies including the incretins (3,4). Given the glycemic and extraglycemic benefits of these agents, one would expect a fall in the use of SU or INS as intensification therapies. However, studies evaluating the possible benefits of using newer ADDs in terms of delaying the need for INS are scarce (5,6). In this context, understanding the changing patterns of therapy initiation and intensifications with second- and third-line therapies, in conjunction with the heterogeneous patients’ characteristics, is a fundamental background requirement.

Cohen et al. (7) explored ADD-prescribing patterns in the U.S. from 1997 to 2000 and reported decreasing use of SU and increasing trends in metformin (MET) and thiazolidinedione (TZD) prescription over time. Utilizing a claims database, Desai et al. (8) reported an increasing proportional share of MET and decreasing prescriptions for TZD between 2006 and 2008. One of the reasons for the reduction in the use of TZD was the safety concerns (9–12). While Berkowitz et al. (13) evaluated treatment initiations with MET, SU, and dipeptidyl peptidase-4 inhibitors (DPP-4i) between 2009 to 2013 in the U.S., utilization patterns of other ADDs and the changing utilization trend over time were not explored (14). Lipska et al. (15) have evaluated the temporal trend in the use of ADDs from 2006 to 2013 using claims data from the U.S. A small number of studies have explored the clinical characteristics of patients according to the type of ADD prescribed, but only over relatively short time periods, and did not evaluate treatment intensification with second- or third-line ADDs over a long period of time (16,17).

To the best of our knowledge, the progressive changes in the proportional distributions across all new and old ADDs, and the patterns and determinants of therapy intensification with second- and third-line ADDs, have not been explored comprehensively in any study. With recognition of the growing disease burden and increasing volumes of dispensed medications (18–21), the primary aim of this study was to provide a comprehensive up-to-date exploration of the treatment pattern changes for type 2 diabetes in the U.S. using the nationally representative Centricity Electronic Medical Records (CEMR) from primary and secondary ambulatory care systems. Specifically, the aims were to 1) explore temporal changes in prescribing patterns from 2005 to 2016 with respect to the drug initiation order, 2) explore therapy intensification with second and third ADDs, 3) explore patient characteristics including risk factors and comorbidities according to ADD therapy prescribed, 4) explore the temporal patterns in the rates of intensification with SU and INS, and 5) evaluate whether use of incretin-based therapies as second-line therapy delays the need for intensification with third-line ADDs and with INS any time during follow-up.

RESEARCH DESIGN AND METHODS

Data Source
CEMR represents a variety of ambulatory and primary care medical practices, including solo practitioners, community clinics, academic medical centers, and large integrated delivery networks in the U.S. More than 34 million individuals’ longitudinal electronic medical records (EMRs) were available from 1995 to April 2016. More than 35,000 physicians and other providers from all U.S. states contribute to the CEMR, of whom ~75% are primary care providers. The database is generally representative of the U.S. population, with a diabetes prevalence of 7.1% (identified by diagnostic codes) that is similar to the national diabetes prevalence of 6.7% (diagnosed diabetes in 2014) (14). The CEMR has been used extensively for academic research worldwide (3,22,23).

This database contains comprehensive patient-level information on demographic, anthropometric, clinical, and laboratory variables including age, sex, ethnicity, and longitudinal measures of BMI, blood pressure, glycated hemoglobin (HbA1c), and lipids. All disease events along with dates are coded with ICD-9, ICD-10, or SNOMED CT codes. Medication data include brand names and doses for individual medications prescribed, along with start/stop dates and specific fields to track treatment alterations. This data set also contains patient-reported medications, including prescriptions received outside the EMR network and over-the-counter medications.

Methods
Eleven antidiabetes therapeutic classes were considered in this study: MET, SU, TZD, α-glucosidase inhibitors (AGI), amylin, dopamine receptor agonists (DOPRA), meglitinides (MEG), DPP-4i, glucagon-like peptide 1 receptor agonists (GLP-1RA), SGLT2i, and INS. For each patient, these ADDs were arranged chronologically according to the initiation dates. Same-day initiations (including combination therapies) were prioritized in the order as listed above, with highest order priority assigned to MET and lowest to INS. Additions or switches were defined by comparing stop dates and start dates of corresponding therapies. Details on the medication data structure, associated data-mining challenges, and description of an algorithm applied to extract and aggregate patient-level medication data from CEMR have recently been published (24).

For convenience, AGI, amylin, DOPRA, and MEG were combined into the “other” category. Saxenda (a version of liraglutide) was excluded from the GLP-1RA list, as it was approved in 2014 for weight lowering and not as an ADD (25). Although Welchol (colesevelam) was approved for the treatment of type 2 diabetes, it was mainly prescribed to reduce cholesterol levels; therefore, we did not include colesevelam in our analyses (18).

Patients with diabetes were identified on the basis of diagnostic codes; those with a diagnosis of type 1 diabetes or only gestational diabetes mellitus were identified and excluded. For identified patients with type 2 diabetes, the following inclusion criteria were applied: 1) age at diagnosis ≥18 and <80 years, 2) diagnosis date strictly after first registered activity in the database, 3) diagnosis date on or after 1 January 2005, and 4) initiation of any ADD.

Demographic variables included sex, age, and ethnicity. HbA1c values on the


date of diagnosis and first-, second-, and third-line ADD therapy initiation were obtained as the closest observations within a 3-month window. Body weight, BMI, systolic/diastolic blood pressure (SBP/DBP), lipids (LDL, HDL, and triglycerides), and heart rate were calculated as the average of available measurements within a 3-month window of the diagnosis or ADD initiation date. Obesity was defined as BMI ≥30 kg/m2.

The presence of comorbidities prior to the first and second drug initiation was explored. Cardiovascular disease (CVD) was defined as ischemic heart disease (including myocardial infarction), peripheral vascular disease, heart failure, or stroke. Cancer was defined as any malignancy except malignant neoplasm of skin. Charlson Comorbidity Index was defined and calculated following the algorithm described by Quan et al. (26).

Statistical Methods
The characteristics of patients were summarized by ADD classes, at first prescription and at second ADD initiation when added to MET. Separate analyses were conducted to explore the pattern of addition or switch to third ADD by major classes of second-line ADDs. Study variables were summarized as number (%), mean (SD), or median (first quartile [Q1], third quartile [Q3]) as appropriate. In patients who had a second-line ADD added after MET, and had at least 1 year of follow-up post–second-line initiation, the “restricted mean survival time” estimation approach was used to compare the mean time to the third ADD/INS initiation among major second-line ADD groups. This method computes survival time as time to third ADD/INS if initiated, and otherwise as time to the end of follow-up (date of patient’s last available record within the database). Standard life table methods were used to estimate rates per 100 person-years (95% CI) of second-line ADD, INS, and SU initiations in patients with a minimum of 1 year follow-up post–MET initiation.

RESULTS
From the 2,624,954 patients identified with type 2 diabetes, 2,590,853 were aged between ≥18 and <80 years at the date of diagnosis with mean/median 3.9/2.7 years of follow-up. Of these patients, 1,305,686 (50%) were newly diagnosed after 1 January 2005, while 1,023,340 patients initiated any ADD (study cohort) (Supplementary Fig. 1) during the available mean/median 3.4/2.8 years of follow-up time. In the study cohort, 46% were male, mean (SD) age was 58 years (13), and 68% were white Caucasians, 12% blacks, and 2% Asians (Table 1).

First ADD

Prescription Pattern Changes Over Time
Figure 1A presents the proportional distribution of the first-line ADD by year of initiation. The proportional share of MET as the first choice increased consistently from 60% in 2005 to 77% in 2016 (first quarter of the year). SU’s share declined from 20 to 8%, while INS’s share ranged from 8 to 10%. Starting at 11% in 2005, TZD’s proportional share dropped progressively to 0.7% in 2016. Other drugs were chosen as first-line in 3% of cases or less.

Patients’ Characteristics
In the study cohort of 1,023,340 patients, the distribution of prescription patterns for individual ADDs at any time from January 2005 to April 2016 and as the first ADD are presented in Table 1. The demographic and clinical characteristics of the patients, along with the prevalence of comorbidities at the time of first ADD initiation, are also presented in Table 1.

In the study cohort, 79% received MET any time during the recorded follow-up, and 72% received MET as the first ADD. The mean time to initiation of MET as the first ADD and the available follow-up time since initiation were 3.7 months and 3.3 years, respectively. The proportions of patients with HbA1c level ≥7.5% (58 mmol/mol) and 8% (64 mmol/mol) at MET initiation were 48% and 37%, respectively. Those who initiated with GLP-1RA, DPP-4i, TZD, and SU had a similar mean HbA1c of 8.0, 7.7, 7.8, and 8.0%, respectively (64, 61, 62, and 64 mmol/mol). INS was initiated at an average HbA1c of 8.9% (74 mmol/mol), with 71% and 59% having HbA1c ≥7.5% (58 mmol/mol) and 8% (64 mmol/mol), respectively.

Patients who initiated treatment with MET were younger (mean age 57 years, with 19% ≥70 years) than those who initiated with SU (mean age 64 years, 43% ≥70 years), with INS (mean 60 years, 29% ≥70 years), with TZD (mean 62 years, 32% ≥70 years), or with DPP-4i (mean 64 years, 39% ≥70 years). Those who had GLP-1RA and SGLT2i as the first ADD were younger, more likely to be white Caucasian, female, and obese, as compared with those who initiated with MET, DPP-4i, INS, TZD, or SU.

Comorbidities
The proportions of patients with CVD, chronic kidney disease (CKD), cancer, or depression at first ADD initiation were 19%, 4%, 4%, and 11%, respectively. Those patients initiating therapy with INS had a significantly higher prevalence of CVD (27%, P < 0.01), CKD (11%, P < 0.01), and higher Charlson Comorbidity Index with mean (SD) of 1.84 (1.31), compared with those initiating with MET, DPP-4i, GLP-1RA, or TZD (Table 1 for comparative estimates).

Discontinuation of First ADD
Among patients with at least 1 year of follow-up (n = 813,826), the proportions of patients discontinuing the first-line ADD within 1 year by individual ADDs are presented in Table 1. While only 8% of patients discontinued MET within a year, 20%, 17%, and 25% of patients discontinued GLP-1RA, DPP-4i, and SGLT2i within a year, respectively.

Second ADD
Among 740,478 patients who initiated therapy with MET, 357,482 (48%, subcohort 1) (Supplementary Fig. 1) initiated a second ADD, with an annual mean rate of 10.7 initiations per 100 person-years (minimum 10.2, maximum 14.0) during a mean 3.3 years of available follow-up, at an average HbA1c level of 8.4% (68 mmol/mol), with 60% and 48% having HbA1c ≥7.5% (58 mmol/mol) and 8.0% (64 mmol/mol), respectively. The proportional share of second-line ADD (post-MET) over time is presented in Fig. 1B. The demographic and clinical characteristics of the patients along with the time to second ADD, and the prevalence of comorbidities at the time of second drug initiation, are presented in Table 2.

Although the proportional share of SU as a second-line therapy gradually decreased from 60 to 46% over time (Fig. 1B), it remained the most popular choice (53%) of therapy intensification post–MET initiation across the whole time period. SU was initiated as second-line ADD at an average HbA1c level of 8.4% (68 mmol/mol), with 62% and 49% having HbA1c ≥7.5% (58 mmol/mol) and 8.0% (64 mmol/mol), respectively (Table 2).

Among patients with a second ADD and a minimum 1 year of follow-up, only


Table 1—Patient characteristics at the time of first ADD initiation, by drug class in the study cohort (N = 1,023,340)

Variable | GLP-1RA | SGLT2i | MET | INS | TZD | DPP-4i | SU | Other§ | All
Any time during follow-up
n (% of N) | 68,522 (7) | 39,549 (4) | 808,518 (79) | 270,432 (26) | 109,754 (11) | 182,457 (18) | 354,367 (35) | 25,358 (2) | 1,023,340 (100)
Time to first prescription, months* | 22.32 (27.23) | 39.64 (33.48) | 5.3 (14.16) | 13.03 (22.54) | 8.93 (17.72) | 19.57 (25.56) | 11.09 (20.25) | 14.97 (23.56) | -
At first ADD
n2 (% of N) | 9,494 (1) | 1,935 (0.2) | 740,478 (72) | 93,078 (9) | 28,004 (3) | 25,005 (2) | 116,435 (11) | 8,911 (1) | -
Time to first prescription, months* | 3.82 (11.92) | 7.94 (18.68) | 3.73 (11.78) | 3.74 (11.06) | 2.33 (8.2) | 4.98 (13.65) | 3.84 (11.51) | 2.45 (9.66) | 3.73 (11.66)
Follow-up from ADD initiation, years* | 3.22 (2.5) | 0.98 (0.62) | 3.31 (2.54) | 2.94 (2.45) | 5.06 (3.02) | 2.79 (2.1) | 3.72 (2.7) | 3.86 (2.71) | 3.36 (2.58)
Follow-up ≥1 year from first ADD initiation, n3 (% of n2) | 7,400 (78) | 889 (46) | 589,246 (80) | 68,385 (73) | 25,105 (90) | 19,278 (77) | 95,854 (82) | 7,669 (86) | 813,826 (80)
Discontinuation within 1 year, n (% of n3) | 1,516 (20) | 225 (25) | 44,485 (8) | 3,359 (5) | 4,795 (19) | 3,243 (17) | 9,765 (10) | 1,367 (18) | 68,755 (8)
Age (years)* | 55 (12) | 56 (11) | 57 (13) | 60 (13) | 62 (11) | 64 (11) | 64 (11) | 58 (16) | 58 (13)
Age ≥70 years, n (% of n2) | 1,217 (13) | 223 (12) | 144,210 (19) | 26,790 (29) | 8,838 (32) | 9,741 (39) | 50,280 (43) | 2,925 (33) | 244,224 (24)
Male, n (% of n2) | 3,205 (34) | 849 (44) | 332,206 (45) | 44,016 (47) | 14,075 (50) | 10,968 (44) | 59,208 (51) | 3,339 (37) | 467,866 (46)
White Caucasian, n (% of n2) | 7,005 (74) | 1,430 (74) | 512,521 (69) | 58,396 (63) | 18,342 (65) | 17,082 (68) | 77,533 (67) | 5,812 (65) | 698,121 (68)
Black, n (% of n2) | 899 (9) | 218 (11) | 83,767 (11) | 14,089 (15) | 2,795 (10) | 2,957 (12) | 13,988 (12) | 944 (11) | 119,657 (12)
HbA1c, %* | 8 (1.6) | 8.1 (1.7) | 8.1 (1.8) | 8.9 (2.1) | 7.8 (1.6) | 7.7 (1.5) | 8 (1.6) | 7.8 (1.5) | 8.2 (1.8)
HbA1c, mmol/mol‡ | 64 | 65 | 65 | 74 | 62 | 61 | 64 | 62 | 66
HbA1c ≥7.5% (58 mmol/mol), n (% of n2) | 831 (47) | 295 (51) | 108,114 (48) | 19,756 (71) | 2,249 (42) | 2,410 (40) | 15,109 (49) | 509 (45) | 149,273 (50)
HbA1c ≥8% (64 mmol/mol), n (% of n2) | 609 (34) | 209 (36) | 82,914 (37) | 16,284 (59) | 1,565 (29) | 1,656 (28) | 10,900 (36) | 348 (31) | 114,485 (39)
Weight, kg* | 107.6 (26.6) | 103.5 (24.8) | 98.2 (24.9) | 95.3 (25.3) | 96.7 (25.2) | 92.8 (24) | 93.6 (23.7) | 87.6 (24.6) | 97.3 (24.9)
BMI, kg/m2* | 38.1 (8.5) | 36.2 (8) | 34.6 (7.9) | 33.6 (8.3) | 34 (7.9) | 33 (7.6) | 33.1 (7.6) | 31.5 (8) | 34.3 (7.9)
Obese, n (% of n2) | 6,460 (85) | 1,281 (78) | 427,651 (70) | 43,030 (64) | 12,954 (66) | 12,106 (62) | 53,029 (62) | 3,476 (51) | 559,987 (68)
SBP, mmHg* | 129 (15) | 129 (14) | 131 (15) | 131 (18) | 131 (16) | 130 (16) | 132 (17) | 126 (17) | 131 (16)
SBP ≥140 mmHg, n (% of n2) | 1,538 (21) | 345 (22) | 153,062 (25) | 20,320 (29) | 5,400 (27) | 4,960 (25) | 26,435 (30) | 1,313 (19) | 213,373 (26)
DBP, mmHg* | 78 (10) | 78 (9) | 78 (10) | 75 (11) | 75 (10) | 75 (10) | 75 (10) | 74 (10) | 77 (10)
Heart rate, bpm* | 79 (11) | 79 (12) | 78 (12) | 78 (12) | 75 (12) | 76 (12) | 76 (12) | 76 (11) | 78 (12)
LDL, mg/dL* | 103 (37) | 106 (39) | 106 (37) | 98 (40) | 100 (38) | 99 (37) | 98 (37) | 98 (37) | 104 (37)
HDL, mg/dL* | 45 (13) | 45 (14) | 44 (13) | 44 (15) | 46 (14) | 45 (14) | 44 (14) | 48 (15) | 44 (13)
Triglycerides, mg/dL† | 140 (102, 187) | 144 (104, 195) | 143 (104, 193) | 133 (93, 184) | 131 (94, 182) | 141 (102, 188) | 142 (103, 192) | 123 (88, 174) | 142 (103, 192)
CVD, n (% of n2) | 1,422 (15) | 331 (17) | 118,342 (16) | 25,051 (27) | 5,446 (19) | 6,182 (25) | 31,293 (27) | 1,815 (20) | 189,882 (19)
CKD, n (% of n2) | 377 (4) | 47 (2) | 12,590 (2) | 10,329 (11) | 1,820 (6) | 2,371 (9) | 10,988 (9) | 643 (7) | 39,165 (4)
Cancer, n (% of n2) | 289 (3) | 64 (3) | 30,195 (4) | 3,636 (4) | 1,374 (5) | 1,340 (5) | 6,551 (6) | 451 (5) | 43,900 (4)
Depression, n (% of n2) | 1,266 (13) | 213 (11) | 88,673 (12) | 7,834 (8) | 2,210 (8) | 2,327 (9) | 8,931 (8) | 845 (9) | 112,299 (11)
Charlson Comorbidity Index* | 1.47 (0.9) | 1.45 (0.91) | 1.44 (0.89) | 1.84 (1.31) | 1.57 (1.06) | 1.76 (1.22) | 1.77 (1.23) | 1.66 (1.16) | 1.53 (1.0)

bpm, beats per minute; n2, number of study cohort patients prescribed each drug class as a first ADD; n3, number of n2 patients with ≥1 year follow-up after first ADD initiation. *Mean (SD); †median (interquartile range); ‡mean; §other: amylin, DOPRA, AGI, or MEG.
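The paired HbA1c units reported in Table 1 (% and mmol/mol) can be related with a short conversion sketch. The article does not state how the mmol/mol values were derived; the function below assumes the widely used NGSP-to-IFCC linear relationship, and the function name is hypothetical.

```python
def hba1c_percent_to_mmol_mol(percent: float) -> int:
    """Convert HbA1c from NGSP/DCCT units (%) to IFCC units (mmol/mol).

    Assumption: the commonly cited linear relationship
    IFCC = 10.929 * (NGSP - 2.15), rounded to the nearest integer.
    The source article does not say which formula it used.
    """
    return round(10.929 * (percent - 2.15))


# Thresholds and group means quoted in the text and tables.
for pct in (7.5, 8.0, 8.4, 8.9, 9.3):
    print(f"{pct}% -> {hba1c_percent_to_mmol_mol(pct)} mmol/mol")
```

Under this assumption the conversion reproduces the paired values quoted in the article, for example 7.5% with 58 mmol/mol and 8.4% with 68 mmol/mol.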

Figure 1—A: Proportional share of the first ADD by year of initiation in the study cohort. B: Proportional share of the second ADD by year of initiation in subcohort 1 and key studies listed. C: In patients with a minimum of 1 year of follow-up post-MET, annual rates (95% CI) of SU and INS initiations per 100 person-years. Subcohort 1: initiated second ADD and had MET as first-line treatment. *Other: amylin, DOPRA, AGI, or MEG. EMPA REG, BI 10773 (Empagliflozin) Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patients; EXAMINE, Examination of Cardiovascular Outcomes with Alogliptin versus Standard of Care; FDA, U.S. Food and Drug Administration; LEADER, Liraglutide Effect and Action in Diabetes: Evaluation of cardiovascular outcome Results; PROactive, Prospective Pioglitazone Clinical Trial in Macrovascular Events; RECORD, Rosiglitazone Evaluated for Cardiovascular Outcomes in Oral Agent Combination Therapy for Type 2 Diabetes; SAVOR-TIMI, Saxagliptin Assessment of Vascular Outcomes Recorded in Patients with Diabetes Mellitus–Thrombolysis in Myocardial Infarction; TECOS, Trial Evaluating Cardiovascular Outcomes with Sitagliptin; UKPDS, UK Prospective Diabetes Study.
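The rates per 100 person-years with 95% CI shown in Figure 1C can be illustrated with a minimal sketch. The article reports using standard life table methods; the version below swaps in a simpler approach (a crude incidence rate with a log-scale normal approximation for the interval), and both the function name and the input numbers are hypothetical.

```python
import math


def rate_per_100_person_years(events: int, person_years: float, z: float = 1.96):
    """Return (rate, lower, upper) per 100 person-years.

    Assumption: events follow a Poisson distribution, so the standard error
    of log(rate) is approximately 1 / sqrt(events). This is an illustrative
    approximation, not the life table method used in the article.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = 100.0 * events / person_years
    half_width = z / math.sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical inputs: 107 initiations observed over 1,000 person-years.
rate, lower, upper = rate_per_100_person_years(107, 1000.0)
print(f"{rate:.1f} per 100 person-years (95% CI {lower:.1f}, {upper:.1f})")
```

With very large cohorts such as the one described here, the interval becomes narrow because its width shrinks with the square root of the event count.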


Table 2—Patient characteristics at the time of the second ADD initiation, by drug class added in subcohort 1|| (N = 357,482)

Variable | GLP-1RA | SGLT2i | INS | TZD | DPP-4i | SU | Other§ | All
n1 (% of N) | 15,448 (4) | 5,971 (2) | 49,939 (14) | 33,021 (9) | 61,508 (17) | 187,819 (53) | 3,776 (1) | 357,482 (100)
Time from first to second ADD (months)* | 11.1 (18.58) | 18.52 (23.73) | 5.74 (14.58) | 4.09 (11.19) | 11.15 (18.95) | 7.38 (16.08) | 7.35 (15.92) | 7.84 (16.5)
Follow-up from second ADD initiation (years)* | 2.97 (2.44) | 0.95 (0.66) | 2.71 (2.26) | 4.77 (2.98) | 2.68 (2.01) | 3.34 (2.56) | 3.59 (2.65) | 3.22 (2.53)
Follow-up ≥1 year from second ADD initiation, n2 (% of n1) | 11,431 (74) | 2,558 (43) | 36,337 (73) | 28,841 (87) | 46,822 (76) | 149,109 (79) | 3,090 (82) | 278,188 (78)
Discontinuation within 1 year, n (% of n2) | 2,407 (21) | 643 (25) | 2,537 (7) | 5,913 (21) | 8,234 (18) | 15,569 (10) | 724 (23) | 36,027 (13)
Age (years)* | 53 (12) | 54 (11) | 57 (13) | 58 (11) | 58 (12) | 60 (12) | 61 (12) | 59 (12)
Age ≥70 years, n (% of n1) | 1,203 (8) | 435 (7) | 9,576 (19) | 6,450 (20) | 11,595 (19) | 47,416 (25) | 1,116 (30) | 77,791 (22)
Male, n (% of n1) | 5,305 (34) | 2,909 (49) | 23,366 (47) | 17,301 (52) | 29,463 (48) | 97,730 (52) | 1,664 (44) | 177,738 (50)
White Caucasian, n (% of n1) | 11,698 (76) | 4,622 (77) | 33,215 (67) | 22,710 (69) | 43,076 (70) | 130,418 (69) | 2,575 (68) | 248,314 (69)
HbA1c, %* | 7.8 (1.6) | 8.1 (1.8) | 9.3 (2.3) | 7.9 (1.7) | 8.2 (1.7) | 8.4 (1.8) | 7.5 (1.6) | 8.4 (1.9)
HbA1c, mmol/mol‡ | 62 | 65 | 78 | 63 | 66 | 68 | 58 | 68
HbA1c ≥7.5% (58 mmol/mol), n (% of n1) | 3,890 (44) | 2,102 (59) | 19,083 (73) | 8,150 (46) | 20,960 (57) | 65,980 (62) | 587 (41) | 118,063 (60)
HbA1c ≥8% (64 mmol/mol), n (% of n1) | 2,913 (33) | 1,581 (44) | 16,871 (64) | 6,106 (34) | 15,768 (43) | 51,841 (49) | 418 (29) | 93,499 (48)
Weight, kg* | 108.1 (25.9) | 105.1 (25.2) | 99.8 (26) | 99.9 (24.3) | 98 (24.2) | 98 (24.6) | 94 (26.1) | 98.9 (24.9)
BMI, kg/m2* | 38.1 (8.2) | 36.2 (7.8) | 35 (8.3) | 34.8 (7.8) | 34.3 (7.6) | 34.3 (7.7) | 33.3 (7.9) | 34.6 (7.8)
Obese, n (% of n1) | 12,429 (86) | 4,387 (79) | 32,472 (71) | 20,920 (71) | 39,574 (69) | 117,232 (68) | 1,961 (61) | 222,627 (70)
SBP, mmHg* | 128 (14) | 129 (14) | 131 (16) | 130 (15) | 130 (14) | 132 (15) | 129 (15) | 131 (15)
SBP ≥140 mmHg, n (% of n1) | 2,565 (18) | 1,098 (20) | 11,642 (25) | 6,993 (24) | 12,446 (22) | 46,191 (27) | 703 (22) | 79,837 (25)
DBP, mmHg* | 78 (9) | 79 (9) | 76 (10) | 77 (9) | 78 (9) | 77 (10) | 75 (9) | 77 (9)
Heart rate, bpm* | 81 (11) | 80 (12) | 80 (12) | 78 (11) | 79 (11) | 79 (12) | 78 (12) | 79 (12)
LDL, mg/dL* | 96 (34) | 79 (9) | 98 (37) | 97 (35) | 98 (35) | 97 (35) | 75 (9) | 97 (35)
HDL, mg/dL* | 44 (12) | 43 (12) | 43 (13) | 45 (13) | 44 (12) | 43 (12) | 47 (15) | 43 (12)
Triglycerides, mg/dL† | 150 (109, 200) | 156 (115, 207) | 143 (103, 196) | 139 (100, 190) | 148 (109, 197) | 149 (109, 199) | 135 (97, 185) | 147 (107, 197)
CVD, n (% of n1) | 2,134 (14) | 1,004 (17) | 11,781 (24) | 5,894 (18) | 12,212 (20) | 41,220 (22) | 876 (23) | 75,121 (21)
CKD, n (% of n1) | 301 (2) | 128 (2) | 1,686 (3) | 836 (3) | 2,090 (3) | 6,806 (4) | 140 (4) | 11,987 (3)
Cancer, n (% of n1) | 606 (4) | 264 (4) | 2,554 (5) | 1,350 (4) | 3,359 (5) | 9,472 (5) | 237 (6) | 17,842 (5)
Depression, n (% of n1) | 2,979 (19) | 1,123 (19) | 7,294 (15) | 3,710 (11) | 9,073 (15) | 23,433 (12) | 465 (12) | 48,077 (13)
Charlson Comorbidity Index* | 1.52 (0.9) | 1.58 (1.00) | 1.71 (1.16) | 1.48 (0.92) | 1.63 (1.08) | 1.63 (1.08) | 1.69 (1.11) | 1.62 (1.07)

bpm, beats per minute; n1, number of subcohort 1 patients prescribed each drug class as a second ADD; n2, number of n1 patients with ≥1 year follow-up after second ADD initiation. *Mean (SD); †median (interquartile range); ‡mean; §other: amylin, DOPRA, AGI, or MEG; ||subcohort 1: initiated second ADD and had MET as first-line treatment.
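The “restricted mean survival time” approach named under Statistical Methods, applied to patients grouped by second ADD as in Table 2, can be sketched as the area under a Kaplan-Meier curve up to a chosen horizon. This is an illustrative implementation only: the function name, the horizon `tau`, and the toy data are assumptions, and the article does not describe its own computational details.

```python
def restricted_mean_survival_time(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.

    times:  follow-up in years (time to third ADD/INS, or to last record)
    events: 1 if the third ADD/INS was initiated, 0 if censored
    tau:    restriction horizon in years (an assumed, user-chosen value)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0      # current Kaplan-Meier estimate
    rmst = 0.0          # accumulated area under the step function
    previous_time = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        rmst += survival * (t - previous_time)
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
        n_at_risk -= n_leaving
        previous_time = t
    # Extend the last step of the curve to the horizon.
    rmst += survival * (tau - previous_time)
    return rmst


# Toy data (hypothetical): three patients, one censored at 4 years.
print(restricted_mean_survival_time([2.0, 4.0, 6.0], [1, 0, 1], tau=5.0))
```

Comparing this quantity between second-line groups at a common horizon gives a difference in mean event-free years, which is how the article frames the delay in intensification.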

10% discontinued SU within 1 year compared with significantly higher discontinuation proportions in other second-line non-INS ADDs.

The proportional share of DPP-4i as a therapy intensification option post–MET initiation sharply increased from 0.4% in 2006 (approved in October 2006) to 20% in 2016 (Fig. 1B). DPP-4i were initiated at an average HbA1c of 8.2% (66 mmol/mol), with 57% and 43% having HbA1c ≥7.5% (58 mmol/mol) and 8.0% (64 mmol/mol), respectively. While 18% discontinued DPP-4i within a year of initiation, the proportions of patients discontinuing second-line GLP-1RA, TZD, or SGLT2i within a year were higher.

The proportional share of patients receiving GLP-1RA as a second ADD increased from 3% in 2006 to 7% in 2016. Initiation of GLP-1RA occurred at relatively lower HbA1c levels of 7.8% (62 mmol/mol) and at the highest BMI levels among second-line ADD groups. Twenty-one percent of patients discontinued GLP-1RA therapy within a year of commencing it as a second ADD. After approval of the first SGLT2i in 2013, the proportional share of those receiving it as a second ADD reached 7% in 2016. One-quarter of patients discontinued SGLT2i therapy within a year of adding it as second-line ADD. The proportional share of patients receiving TZD as a second-line therapy dropped from 30 to 4% (Fig. 1B), with 21% of patients discontinuing therapy within 1 year.

The proportional share of patients receiving INS as a second ADD post–MET initiation has consistently increased from 7% in 2005 to 17% in 2016 (Fig. 1B). The intensification with INS occurred at a 9.3% (78 mmol/mol) average HbA1c level, with 73% and 64% having HbA1c ≥7.5% (58 mmol/mol) and 8.0% (64 mmol/mol), respectively. Only 7% of patients discontinued INS within 1 year of initiation.

Third ADD
Among patients in subcohort 1, 78% had at least 1 year of follow-up from the second ADD initiation (subcohort 2; n = 278,188). Of these patients, 144,106 (52%) initiated a third ADD, with an annual mean rate of 12.6 initiations per 100 person-years (minimum 11.4, maximum 14.9) during a mean follow-up of 4 years post–second-line initiation. Table 3 presents treatment intensification patterns by the major second-line ADDs. Most of the patients (84% [n = 121,559]) added a third drug on top of the second ADD, while 16% (n = 22,547) ceased the second ADD and switched to a third ADD. Addition of the third drug occurred at higher HbA1c levels (8.5% [69 mmol/mol]) compared with switching (8.2% [66 mmol/mol]).

Among patients with SU as the second ADD, 49% (n = 73,776) added and 6% (n = 8,204) switched to a third drug during a mean follow-up of 4.1 years. The most popular third ADD addition was DPP-4i (34% of those who added a third ADD), followed by INS (28%) and TZD (26%). Among those who switched, almost one-half (49%) switched to INS, while 30% and 8% switched to DPP-4i and GLP-1RA, respectively.

SU, DPP-4i, and GLP-1RA were added to INS in 32%, 26%, and 22% of patients (from those who added a third ADD), respectively. Only 3% of patients ceased INS therapy to switch to another ADD during a mean 3.6 years of follow-up. In the second-line DPP-4i group (n = 46,822), 40% added and 11% switched to a third drug during a mean 3.4 years of follow-up. The most popular third ADD addition was SU (40% of those who added a third ADD), followed by INS (29%) and GLP-1RA (9%). Of those who switched from DPP-4i, one-half of the patients moved to SU, followed by INS and GLP-1RA (17%).

Among those who had a GLP-1RA as the second ADD, 52% added INS (of those who added a third ADD) and 18% switched to INS (of those who switched to a third ADD) during a mean 3.9 years of follow-up; 11% added and 34% switched to DPP-4i. In the TZD group, 43% added and 22% switched to a third ADD during 5.4 years of follow-up. Among those who switched, 45% chose SU while 35% moved to DPP-4i.

Temporal Changes in Rates of Intensification With SU and INS
Among patients with first-line MET and a minimum 1 year of follow-up, the annual rates per 100 person-years of INS/SU initiation (irrespective of order of therapy intensification) are presented in Fig. 1C. The rates did not significantly decline from 2005 to 2014.

Do Novel ADDs Help Delay the Need for Therapy Intensification?
The Kaplan-Meier analyses, based on restricted mean years to adding or moving to a third ADD, in major second-line ADD groups are presented in Table 3. The mean time to intensification with a third ADD was marginally longer in incretin groups (DPP-4i 4.1 years [95% CI 4.1, 4.2] and GLP-1RA 4.2 years [4.1, 4.3]) compared with that in patients with SU as the second-line ADD (3.9 years [3.8, 3.9]; P = 0.04). The restricted mean times to intensification with INS any time during follow-up were 6.3, 7.1, and 6.6 years in the SU, DPP-4i, and GLP-1RA groups, respectively (all comparative P < 0.05).

CONCLUSIONS
This longitudinal exploratory study of a large cohort of patients with type 2 diabetes observed between 2005 and 2016 from primary and ambulatory care systems in the U.S. provides 1) a detailed account of glycemic states, clinical characteristics, and comorbidities at first-line and second-line therapy initiation by different drug classes, as well as new insights into 2) the changes in the choice of first- and second-line ADDs over the last 10 years, 3) patterns of therapy intensification with third-line ADDs and with INS, separately for major second-line ADDs, 4) changes in the annual rates of therapy intensification with SU and INS over time, and 5) possible benefits of using newer novel antidiabetes therapies in terms of delaying the need for third-line therapy intensification, including the need for initiating INS.

With 3.4 years of mean follow-up in more than one million patients with a diagnosis of type 2 diabetes from 2005, this study provides robust and detailed information on the changing clinical practices for the management of type 2 diabetes in a real-world setting. We are not aware of any study that simultaneously evaluated the changing prescribing patterns of old and new ADDs as first-line therapy and as intensification options at various levels of glycemia and comorbidities.

The proportional share of MET as the first-line therapy choice has increased from 60 to 77%, while that for SU has decreased from 20 to 8%, over the last decade. However, SU continue to be the most popular second-line therapy intensification option, although with a declining share (from 60 to 46% over the last decade). The discontinuation rate of SU was found to be the lowest among non-INS second-line ADDs. Among those who intensified with a third-line therapy, the ratio of addition to switching to third ADD


Table 3. Intensification of major second-line therapies in subcohort 2‡ (N = 278,188)

Measure | GLP-1RA | INS | TZD | DPP-4i | SU | All
N | 11,431 | 36,337 | 28,841 | 46,822 | 149,109 | 278,188
Follow-up from second ADD initiation, years* | 3.85 (2.24) | 3.56 (2.09) | 5.40 (2.66) | 3.37 (1.81) | 4.09 (2.35) | 4.02 (2.33)
Initiated third ADD, n (% of N) | 5,942 (52) | 10,677 (29) | 18,788 (65) | 23,840 (51) | 81,980 (55) | 144,106
Initiated INS, n (% of N) | 3,285 (29) | - | 8,223 (29) | 9,633 (21) | 45,293 (30) | 67,812
Restricted mean time to a third ADD, years§ | 4.23 (4.14, 4.32) | 6.15 (6.09, 6.21) | 3.53 (3.49, 3.58) | 4.12 (4.07, 4.17) | 3.91 (3.88, 3.93) | 4.18 (4.17, 4.20)
Restricted mean time to INS, years§ | 6.58 (6.49, 6.67) | - | 6.82 (6.78, 6.87) | 7.14 (7.09, 7.18) | 6.26 (6.23, 6.28) | 6.51 (6.49, 6.53)
Added third ADD, n1 (% of N) | 4,522 (40) | 9,675 (27) | 12,481 (43) | 18,881 (40) | 73,776 (49) | 121,559 (44)
  HbA1c, %* | 8.2 (1.7) | 8.9 (2) | 8.1 (1.8) | 8.5 (1.8) | 8.6 (1.7) | 8.5 (1.8)
  HbA1c, mmol/mol† | 66 | 74 | 65 | 69 | 70 | 69
  HbA1c ≥7.5% (58 mmol/mol), n (% of n1) | 1,798 (61) | 4,953 (74) | 4,102 (55) | 8,968 (69) | 32,611 (72) | 53,100 (69)
  HbA1c ≥8% (64 mmol/mol), n (% of n1) | 1,364 (47) | 4,160 (62) | 3,101 (42) | 6,974 (54) | 26,231 (58) | 42,330 (55)
  GLP-1RA as third ADD, n (% of n1) | - | 2,125 (22) | 1,532 (12) | 1,643 (9) | 5,662 (8) | 11,230 (9)
  INS as third ADD, n (% of n1) | 2,356 (52) | - | 3,338 (27) | 5,472 (29) | 20,483 (28) | 32,447 (27)
  TZD as third ADD, n (% of n1) | 241 (5) | 688 (7) | - | 1,425 (8) | 19,010 (26) | 21,472 (18)
  DPP-4i as third ADD, n (% of n1) | 481 (11) | 2,488 (26) | 3,804 (30) | - | 25,216 (34) | 32,696 (27)
  SU as third ADD, n (% of n1) | 865 (19) | 3,090 (32) | 3,048 (24) | 7,556 (40) | - | 14,844 (12)
  SGLT2i as third ADD, n (% of n1) | 521 (12) | 938 (10) | 138 (1) | 2,399 (13) | 2,077 (3) | 6,095 (5)
Switched to third ADD, n2 (% of N) | 1,420 (12) | 1,002 (3) | 6,307 (22) | 4,959 (11) | 8,204 (6) | 22,547 (8)
  HbA1c, %* | 7.9 (1.6) | 8.2 (1.8) | 7.8 (1.7) | 8.1 (1.7) | 8.7 (2) | 8.2 (1.8)
  HbA1c, mmol/mol† | 63 | 66 | 62 | 65 | 72 | 66
  HbA1c ≥7.5% (58 mmol/mol), n (% of n2) | 470 (54) | 377 (60) | 1,846 (49) | 1,940 (59) | 3,380 (68) | 8,231 (59)
  HbA1c ≥8% (64 mmol/mol), n (% of n2) | 335 (38) | 280 (44) | 1,290 (34) | 1,451 (44) | 2,780 (56) | 6,286 (45)
  GLP-1RA as third ADD, n (% of n2) | - | 84 (8) | 383 (6) | 842 (17) | 677 (8) | 2,065 (9)
  INS as third ADD, n (% of n2) | 262 (18) | - | 703 (11) | 849 (17) | 4,012 (49) | 5,931 (26)
  TZD as third ADD, n (% of n2) | 61 (4) | 48 (5) | - | 273 (6) | 510 (6) | 924 (4)
  DPP-4i as third ADD, n (% of n2) | 488 (34) | 269 (27) | 2,199 (35) | - | 2,436 (30) | 5,561 (25)
  SU as third ADD, n (% of n2) | 390 (27) | 521 (52) | 2,829 (45) | 2,456 (50) | - | 6,450 (29)
  SGLT2i as third ADD, n (% of n2) | 205 (14) | 66 (7) | 109 (2) | 464 (9) | 373 (5) | 1,228 (5)

n1, number of subcohort 2 patients for each drug class who added a third ADD; n2, number of subcohort 2 patients for each drug class who switched to a third ADD. *Mean (SD); †mean; §mean (95% CI). ‡Subcohort 2: those with MET as first-line treatment, who initiated second ADD, with a minimum of 1 year of follow-up post–second drug initiation.
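The ratio of addition to switching reported for the SU, DPP-4i and GLP-1RA groups can be recomputed from the n1 and n2 rows of Table 3. The following is a minimal sketch for illustration only: the dictionary and function names are assumptions introduced here and are not part of the study's analysis code, and only the three classes for which the text quotes a ratio are included.

```python
# Counts copied from Table 3 (subcohort 2): patients who added (n1) or
# switched to (n2) a third ADD, by second-line drug class.
# The names below are hypothetical and used for illustration only.
TABLE3_COUNTS = {
    "SU": {"n1": 73776, "n2": 8204},
    "DPP-4i": {"n1": 18881, "n2": 4959},
    "GLP-1RA": {"n1": 4522, "n2": 1420},
}


def add_to_switch_ratio(counts):
    """Return n1 / n2 for each drug class, rounded to one decimal place."""
    return {
        drug: round(c["n1"] / c["n2"], 1)
        for drug, c in counts.items()
    }


ratios = add_to_switch_ratio(TABLE3_COUNTS)
for drug, ratio in ratios.items():
    print(f"{drug}: {ratio}")
```

Under these assumptions the computed values agree with the ratios quoted in the text for the three groups (9.0, 3.8 and 3.2).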

was highest in the SU group (9.0), followed by DPP-4i (3.8) and GLP-1RA (3.2), during >4 years of mean follow-up post–second-line ADD initiation.

We observed that second-line SU users initiate a third ADD marginally sooner compared with incretin users. A study based on EMR data from the U.K. reported the opposite results, with an average of 1.6 and 2.4 years to third ADD initiation in the 3,080 and 15,508 patients treated with MET + DPP-4i and MET + SU, respectively (5). The proportions of patients who added INS were similar between patients who had a DPP-4i and an SU as the second ADD. However, among those who switched to a third ADD, only 17% of patients in the DPP-4i group switched to INS, compared with almost 50% in the SU group. We also observed that the mean time to INS initiation was significantly shorter for second-line SU users (6.3 years) than in the DPP-4i group (7.1 years). This finding is similar to a study (2015) based on 3,864 matched pairs of patients treated with DPP-4i or SU when added to MET, where Inzucchi et al. (6) reported that those treated with DPP-4i were significantly less likely to initiate INS compared with those treated with SU.

We observed an increasing proportional share of INS as a second-line therapy option over the last 10 years, despite the availability of novel therapies that were found to have similar or better glycemic efficacy in clinical trials. Also, the annual rates of intensification with INS remained similar over the last decade. In a similar study, Lipska et al. (15) observed that the overall rate of severe hypoglycemia did not reduce from 2006 to 2013. This may reflect pressure to achieve glycemic targets rapidly and an increasing recognition that for people with very poor glycemic control, INS may be the only drug likely to achieve targets.

Compared with rates for older ADDs, high discontinuation rates of new therapeutic classes, particularly of DPP-4i, are surprising. The higher cost of these newer drugs may be relevant and may also contribute to the fairly low rates of initiation of these drugs overall. More studies, utilizing additional data sources, are needed to specifically test hypotheses for the differences in initiation, adherence, and persistence between drug classes.

The HbA1c level at pharmacological therapy initiation was found to be 8.2% (66 mmol/mol), with 50% having HbA1c ≥7.5% (58 mmol/mol). The HbA1c levels at first-line ADD initiation were similar across all ADDs, except in those who initiated with INS, whose average HbA1c was 8.9% (74 mmol/mol). Although the mean time to second ADD post–MET initiation was only 8 months, it occurred at a high HbA1c level of 8.4% (68 mmol/mol), with 60% and 48% of patients having HbA1c ≥7.5% (58 mmol/mol) and 8% (64 mmol/mol), respectively. Among those with a minimum of 1 year of follow-up post–second ADD, ~52% intensified with a third ADD at an average HbA1c level of 8.5% (69 mmol/mol). These findings reflect the continued glycemic risk burden in patients with type 2 diabetes (27–30). While the persistent therapeutic inertia (28) and the long-term consequences of therapeutic inertia (27,29) for glycemic control in primary care systems have been evaluated, exploration of the glycemic state at therapy initiation and intensification by ADD classes is scarce. Our study provides a detailed account of the glycemic state in people with type 2 diabetes at therapy initiation and intensifications during a reasonable follow-up period in primary and ambulatory care settings.

The main strength of this study is the availability of data from the patients' medication lists that included prescribed medications within the EMR network and also medication information that could be prescribed outside of the EMR, as well as data on glycemic control and comorbidities. The CEMR database tracks longitudinal treatment adjustments and contains comprehensive clinical information, which is usually not available in claims databases.

The limitations of this study include the nonavailability of complete and reliable data on 1) medication adherence and side effects, 2) diet and exercise, 3) socioeconomic status, and 4) insurance type. We did not include dosage changes or brands' distribution in our analyses. The findings of this study should be interpreted with caution: EMR data are in general biased toward unhealthy populations and commercially insured individuals, white Caucasians are overrepresented in the CEMR, and the results are subject to limited follow-up.

Less popular ADDs such as MEG, AGI, DOPRA, and amylin were included in our study for multiple reasons: first, to assess utilization data of such medications, as these drugs are usually omitted, and second, to ensure market shares of other drugs are not overestimated. We observed that only 39,549 patients with type 2 diabetes were using SGLT2i during the available follow-up period, which is not surprising, given that they were first approved in 2013.

To conclude, while we have observed a significant increase in the use of MET as the first-line therapy over the last 10 years, the second- and third-line therapy intensification choices are highly heterogeneous. While increasing popularity of "new" drugs, especially DPP-4i and SGLT2i, was observed as the second and third drug choices, SU remain the most popular therapy intensification choice and have a lower discontinuation rate compared with other non-INS ADDs. The proportional share of INS as a second-line therapy choice has also increased significantly over the last decade. Incretin-based therapies were found to delay the need for therapy intensification only marginally compared with other ADDs. Contrary to the guidelines for proactive glycemic management, pharmacological therapy initiation and the intensifications occurred at very high levels of HbA1c, with 48% of patients having HbA1c ≥8.0% (64 mmol/mol) at second-line therapy initiation.

Acknowledgments. O.M. acknowledges her cosupervisors Ross Young and Louise Hafner of Queensland University of Technology.

Funding. J.J.A. and S.K.P. acknowledge project grant support provided by the Royal Brisbane and Women's Hospital Foundation. O.M. acknowledges a PhD scholarship from Queensland University of Technology. J.S. is supported by a National Health and Medical Research Council Research Fellowship. Melbourne EpiCentre received support from the National Health and Medical Research Council and the Australian Government's National Collaborative Research Infrastructure Strategy initiative through Therapeutic Innovation Australia.

Duality of Interest. J.S. has received speaker honoraria, consultancy fees, and/or travel sponsorship from AstraZeneca, Boehringer Ingelheim, Lilly, Sanofi, Mylan, Novo Nordisk, Merck Sharp & Dohme, and Novartis. J.J.A. has received speaker honoraria, consultancy fees, and/or travel sponsorship from AstraZeneca, Boehringer Ingelheim, Lilly, and Novartis. S.K.P. has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical, and Amylin Pharmaceuticals. S.K.P. has received grants in support of investigator and investigator-initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi, and Pfizer. No other potential conflicts of interest relevant to this article were reported.

Author Contributions. O.M. and S.K.P. were responsible for the primary design of the study. J.S. and J.J.A. contributed significantly in the study

108 78 Antidiabetes Treatment Patterns in the U.S. Diabetes Care Volume 41, January 2018

design. O.M. conducted the data extraction. O.M. and S.K.P. jointly conducted the statistical analyses. F.S. contributed in the interpretation of the results. The first draft of the manuscript was developed by O.M. and S.K.P., and all authors contributed to the finalization of the manuscript. S.K.P. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

References

1. American Diabetes Association. Standards of Medical Care in Diabetes-2017: summary of revisions. Diabetes Care 2017;40(Suppl. 1):S4–S5
2. Paul SK, Maggs D, Klein K, Best JH. Dynamic risk factors associated with non-severe hypoglycemia in patients treated with insulin glargine or exenatide once weekly. J Diabetes 2015;7:60–67
3. Paul SK, Shaw JE, Montvida O, Klein K. Weight gain in insulin-treated patients by body mass index category at treatment initiation: new evidence from real-world data in patients with type 2 diabetes. Diabetes Obes Metab 2016;18:1244–1252
4. Waldrop G, Zhong J, Peters M, et al. Incretin-based therapy in type 2 diabetes: an evidence based systematic review and meta-analysis. J Diabetes Complications. 28 August 2016 [Epub ahead of print]. https://doi.org/10.1016/j.jdiacomp.2016.08.018
5. Mamza J, Mehta R, Donnelly R, Idris I. Important differences in the durability of glycaemic response among second-line treatment options when added to metformin in type 2 diabetes: a retrospective cohort study. Ann Med 2016;48:224–234
6. Inzucchi SE, Tunceli K, Qiu Y, et al. Progression to insulin therapy among patients with type 2 diabetes treated with sitagliptin or sulphonylurea plus metformin dual therapy. Diabetes Obes Metab 2015;17:956–964
7. Cohen FJ, Neslusan CA, Conklin JE, Song X. Recent antihyperglycemic prescribing trends for U.S. privately insured patients with type 2 diabetes. Diabetes Care 2003;26:1847–1851
8. Desai NR, Shrank WH, Fischer MA, et al. Patterns of medication initiation in newly diagnosed diabetes mellitus: quality and cost implications. Am J Med 2012;125:302.e1-7
9. Tanne JH. FDA places "black box" warning on antidiabetes drugs. BMJ 2007;334:1237
10. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med 2007;356:2457–2471
11. Woodcock J, Sharfstein JM, Hamburg M. Regulatory action on rosiglitazone by the U.S. Food and Drug Administration. N Engl J Med 2010;363:1489–1491
12. Home PD, Pocock SJ, Beck-Nielsen H, et al.; RECORD Study Team. Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial. Lancet 2009;373:2125–2135
13. Berkowitz SA, Krumme AA, Avorn J, et al. Initial choice of oral glucose-lowering medication for diabetes mellitus: a patient-centered comparative effectiveness study. JAMA Intern Med 2014;174:1955–1962
14. Segal JB, Maruthur NM. Initial therapy for diabetes mellitus. JAMA Intern Med 2014;174:1962–1963
15. Lipska KJ, Yao X, Herrin J, et al. Trends in drug utilization, glycemic control, and rates of severe hypoglycemia, 2006–2013. Diabetes Care 2017;40:468–475
16. Pantalone KM, Hobbs TM, Wells BJ, et al. Clinical characteristics, complications, comorbidities and treatment patterns among patients with type 2 diabetes mellitus in a large integrated health system. BMJ Open Diabetes Res Care 2015;3:e000093
17. Raebel MA, Xu S, Goodrich GK, et al. Initial antihyperglycemic drug therapy among 241 327 adults with newly identified diabetes from 2005 through 2010: a surveillance, prevention, and management of diabetes mellitus (SUPREME-DM) study. Ann Pharmacother 2013;47:1280–1291
18. Hampp C, Borders-Hemphill V, Moeny DG, Wysowski DK. Use of antidiabetic drugs in the U.S., 2003-2012. Diabetes Care 2014;37:1367–1374
19. Centers for Disease Control and Prevention. National Diabetes Statistics Report: Estimates of Diabetes and Its Burden in the United States, 2014. Atlanta, GA, U.S. Department of Health and Human Services, 2014
20. International Diabetes Federation. IDF Diabetes Atlas. 7th ed. Brussels, Belgium, 2015
21. Higgins V, Piercy J, Roughley A, et al. Trends in medication use in patients with type 2 diabetes mellitus: a long-term view of real-world treatment between 2000 and 2015. Diabetes Metab Syndr Obes 2016;9:371–380
22. Crawford AG, Cote C, Couto J, et al. Comparison of GE Centricity Electronic Medical Record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Popul Health Manag 2010;13:139–150
23. Brixner D, Said Q, Kirkness C, Oberg B, Ben-Joseph R, Oderda G. Assessment of cardiometabolic risk factors in a national primary care electronic health record database. Value Health 2007;10:S29–S36
24. Montvida O, Arandjelović O, Reiner E, Paul SK. Data mining approach to estimate the duration of drug therapy from longitudinal electronic medical records. Open Bioinform J 2017;10:1–15
25. U.S. Food and Drug Administration. FDA approves weight-management drug [article online], 2014. Available from https://wayback.archive-it.org/7993/20170111160832/http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm427913.htm. Accessed 23 December 2014
26. Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 2005;43:1130–1139
27. Paul SK, Klein K, Thorsted BL, Wolden ML, Khunti K. Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovasc Diabetol 2015;14:100
28. Khunti K, Nikolajsen A, Thorsted BL, Andersen M, Davies MJ, Paul SK. Clinical inertia with regard to intensifying therapy in people with type 2 diabetes treated with basal insulin. Diabetes Obes Metab 2016;18:401–409
29. Paul SK, Klein K, Thorsted BL, Wolden ML, Khunti K. Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovasc Diabetol 2015;14:100
30. Montvida O, Klein K, Kumar S, Khunti K, Paul SK. Addition of or switch to insulin therapy in people treated with glucagon-like peptide-1 receptor agonists: a real-world study in 66 583 patients. Diabetes Obes Metab 2017;19:108–117

Chapter 8: Glycaemic Control and Sustainability

Statement of Contribution of Co-Authors for Thesis by Published Paper

The authors listed below have certified* that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student's thesis and its publication on the QUT's ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Olga Montvida, Jonathan Shaw, Lawrence Blonde, Sanjoy K Paul. Long-term sustainability of glycaemic achievements with second-line anti-diabetic therapies in patients with type 2 diabetes: A real-world study. Accepted at Diabetes, Obesity, and Metabolism.

Contributor | Statement of Contribution*
Olga Montvida (QUT Verified Signature, 29.06.2018) | Conceived the idea and was responsible for the primary design of the study. Conducted the data extraction and statistical analyses. Developed first draft and contributed towards development of the manuscript.
Jonathan Shaw | Contributed in the study design and manuscript development.
Lawrence Blonde | Contributed in the manuscript development.
Sanjoy K. Paul | Conceived the idea, and was responsible for the primary design of the study. Contributed to the statistical analyses. Developed first draft and contributed towards development of the manuscript.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship.

Name: Sanjoy Ketan Paul
Signature: QUT Verified Signature
Date: 29.06.2018

Received: 21 January 2018 Revised: 28 February 2018 Accepted: 7 March 2018 DOI: 10.1111/dom.13288

ORIGINAL ARTICLE

Long-term sustainability of glycaemic achievements with second-line antidiabetic therapies in patients with type 2 diabetes: A real-world study

Olga Montvida MSc1,2 | Jonathan E. Shaw MD3 | Lawrence Blonde MD4 | Sanjoy K. Paul PhD1,5

1 Statistics Unit, QIMR Berghofer Medical Research Institute, Brisbane, Australia
2 School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia
3 Baker Heart and Diabetes Institute, Melbourne, Australia
4 Ochsner Diabetes Clinical Research Unit, Department of Endocrinology, Frank Riddick Diabetes Institute, Ochsner Medical Center, New Orleans, Louisiana
5 Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia

Correspondence
Sanjoy K. Paul PhD, The Royal Melbourne Hospital, City Campus, 7 East, Main Building, Grattan Street, Parkville, Victoria 3050, Australia.
Email: [email protected]

Funding information
No separate funding was obtained for this study.

Aims: To inform patients and their carers about both the probability of reducing glycated haemoglobin (HbA1c) to clinically desirable levels and the sustainability of such control over 2 years with major second-line antidiabetic therapies, in individual risk scenarios, with and without third-line intensification.

Materials and Methods: From US Centricity Electronic Medical Records, 163 081 patients with type 2 diabetes aged 18 to 80 years, who had initiated metformin, intensified their treatment with dipeptidyl peptidase-4 (DPP-4) inhibitors, glucagon-like peptide-1 (GLP-1) receptor agonists (GLP-1RAs), sulphonylureas (SUs), insulin or thiazolidinediones (TZDs), and continued second-line treatment for ≥6 months, were selected. Treatment groups were balanced with regard to baseline characteristics, and glycaemic achievements were estimated using logistic regression analysis.

Results: With HbA1c concentrations of 58–63.9 mmol/mol (7.5–7.9%) at second-line treatment initiation, the adjusted probabilities of achieving HbA1c <53 mmol/mol (<7%) at 6 months were 32%, 38%, 39%, 26% and 38% in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively, while with baseline HbA1c concentrations of 64–75 mmol/mol (8–9%), the corresponding probabilities of reducing HbA1c to <58 mmol/mol (<7.5%) were 38%, 44%, 40%, 34% and 42%, respectively. In these baseline HbA1c categories, the adjusted probabilities of sustaining HbA1c achievements over 2 years were higher in the GLP-1RA and TZD groups, compared with the SU and insulin groups (P < .01). With baseline HbA1c concentrations of 75.1–108 mmol/mol (9.1–12%), 38% of patients achieved an HbA1c concentration <58 mmol/mol (<7.5%) at 6 months. The adjusted probability of sustaining this control over 2 years was higher in the incretin and TZD groups (range 62%-75%), while insulin and SUs offered lower chances of sustainable control (range 54%-56%).

Conclusions: Patients treated with second-line incretins and TZDs had a significantly higher probability of achieving and sustaining glycaemic control over 2 years without further intensification, compared with those treated with SUs or insulin.
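The abstract reports each HbA1c threshold in both mmol/mol and %. The paper does not state how the paired values were derived; the sketch below assumes the widely used IFCC-NGSP master equation, which reproduces the thresholds quoted above after rounding. The function names are illustrative and do not come from the study.

```python
def hba1c_percent_to_mmol_mol(percent):
    """Convert HbA1c from NGSP units (%) to IFCC units (mmol/mol).

    Assumes the IFCC-NGSP master equation: IFCC = 10.93 * NGSP - 23.50.
    """
    return 10.93 * percent - 23.50


def hba1c_mmol_mol_to_percent(mmol_mol):
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%).

    Assumes the inverse master equation: NGSP = 0.09148 * IFCC + 2.152.
    """
    return 0.09148 * mmol_mol + 2.152


# Thresholds used in the abstract, expressed in %.
for pct in (7.0, 7.5, 8.0, 9.0, 12.0):
    print(f"{pct}% is about {round(hba1c_percent_to_mmol_mol(pct))} mmol/mol")
```

Rounded to whole numbers, the conversion gives 53, 58, 64, 75 and 108 mmol/mol for 7%, 7.5%, 8%, 9% and 12%, matching the category boundaries in the abstract.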

KEYWORDS antidiabetic drug, glycaemic control, therapeutic choice

1 | INTRODUCTION

Metformin is recommended as a first-line pharmacological treatment for patients with type 2 diabetes; however, most patients eventually require therapy intensification with multiple antidiabetic drugs (ADDs) to achieve glycaemic control.1–3 For second-line treatment intensification, the American Diabetes Association recommends sulphonylureas (SUs), thiazolidinediones (TZDs), dipeptidyl peptidase-4

Diabetes Obes Metab. 2018;1–10. wileyonlinelibrary.com/journal/dom © 2018 John Wiley & Sons Ltd 1

112 2 MONTVIDA ET AL.

(DPP-4) inhibitors, sodium-glucose co-transporter-2 (SGLT2) inhibitors, glucagon-like peptide-1 receptor agonists (GLP-1RAs) or insulin. Other drugs are recommended under specific conditions.1 The temporal patterns of the changes in the second-line ADD choices over the last decade in the United States have recently been explored by Montvida et al.4

While clinicians' and patients' decisions with regard to add-on agents have become more complicated,5,6 few studies have directly compared the glycaemic effectiveness of second-line therapies.7–11 A recent network meta-analysis reported similar glycaemic achievements with all second-line ADDs when added to metformin.9 Analogous results were discussed in two observational studies using electronic medical records (EMRs).7,8 In 7009 patients from Germany, Rathmann et al7 reported an unadjusted mean glycated haemoglobin (HbA1c) reduction of 0.7% to 1.1% after 6 months of treatment with major second-line ADDs, including insulin. A study from Denmark in 4734 patients by Thomsen et al8 reported median reductions of 0.8% to 1.3% at 12 months for non-insulin drugs (from baseline HbA1c of 60–64 mmol/mol [7.6–8.0%]) and 2.4% for insulin-treated patients (from 81 mmol/mol [9.6%]).8

Given the increasing complexity and challenges with regard to multiple risk factor management in patients with type 2 diabetes, and the availability of a number of new and older classes of ADDs, a population-level assessment of the likelihood of short- and long-term glycaemic achievements and their sustainability with the use of different second-line ADDs would be of great value. With evaluation of a reasonably large number of patients from primary and ambulatory care systems, probabilistic estimates of sustainable glycaemic achievements with different second-line ADDs for different risk paradigms would empower clinicians and their patients to make more informed therapeutic choices. To the best of our knowledge, no study has evaluated early glycaemic control and its sustainability among groups of patients with different HbA1c concentrations at the time of post-metformin second-line ADD intensification.

The newer classes of ADDs, including GLP-1RAs and SGLT2 inhibitors, potentially have both extra-glycaemic benefits, such as weight and blood pressure reductions, and possible associations with reduced risk of cardiovascular diseases. These therapies are costlier in comparison to the older ADDs; however, if the newer drugs have longer-lasting benefits with regard to glycaemic control than older ADDs, then over time, a lower rate of third-line therapy intensification across the whole population would be expected, as the use of newer drugs increases. In this context, evaluation of whether second-line therapy intensification with newer drug classes has been helpful in reducing the need for third-line therapy intensification over time would be of great interest, but there is a paucity of population-level studies addressing this question. A modelling study by Zhang et al12 reported a marginally shorter time to insulin treatment in patients treated with incretin-based therapies (GLP-1RAs and DPP-4 inhibitors) compared with those treated with SUs, but we are not aware of any population-level study that has evaluated the possible delay in the need for third-line therapy intensification in patients who choose incretin-based therapies as second-line therapy.

Taking into account the heterogeneous HbA1c levels among patients at second-line treatment initiation, the main aims of the present study were: to inform clinicians and patients on the likelihood of reducing HbA1c to a clinically desirable level over 6, 12 and 24 months of treatment with major second-line ADDs when added to metformin; to estimate the probability of sustaining early glycaemic control over 24 months of therapy continuation with and without the need for third-line ADD addition; and to determine whether the availability of newer ADDs has reduced the need for intensification with third-line therapy at the population-level over time.

2 | MATERIALS AND METHODS

2.1 | Data source

The US Centricity Electronic Medical Record (CEMR) database was used in the present study. The CEMR represents >35 000 solo practitioners, community clinics, academic medical centres and large integrated delivery networks across all of the United States. Patients in the database are generally representative of the US population, with a diabetes prevalence (7.1%, as identified by diagnostic codes) that is similar to National Diabetes Statistics (6.7% with diagnosed diabetes in 2014).13 The CEMR database has been extensively used for academic research worldwide.14–16

Research involved existing data, from which the patients could not be identified either directly or through identifiers linked to the patients. According to the US Department of Health and Human Services Exemption 4 [CFR 46.101(b) (4)], therefore, this study was exempt from ethics approval from an institutional review board and informed consent.

For more than 34 million individuals, longitudinal EMRs were available from 1995 until April 2016, with comprehensive patient-level information on demographics, anthropometric, clinical and laboratory variables. Medication data included brand names and doses for individual medications prescribed, along with start/stop dates and specific fields to track treatment alterations. The dataset also contained patient-reported medications, including prescriptions received outside the EMR network and over-the-counter medications.

2.2 | Study design

To obtain data on the first-, second-, and third-line ADDs for each patient with type 2 diabetes, the following drug classes were arranged chronologically according to the initial prescription dates: metformin, SUs, TZDs, alpha glucosidase inhibitors, amylin, dopamine receptor agonists, meglitinide, DPP-4 inhibitors, GLP-1RAs, SGLT2 inhibitors and insulin. Same-day initiations (including combination therapies) were prioritized in the order as listed above, from highest to lowest. A robust methodology for extraction and assessment of longitudinal patient-level medication data from the CEMR database has recently been described by the authors.17

The study cohort included patients with: (1) age at diagnosis ≥18 and <80 years; (2) a diagnosis date strictly after first registered activity in the CEMR database; (3) a diagnosis date on or after January 1, 2005; (4) initiation of antidiabetic therapy with metformin; (5) initiation of second-line ADD with SU, TZD, DPP-4 inhibitors, GLP-1RAs

or insulin; (6) available HbA1c measure at second-line ADD initiation (baseline); and (7) second-line therapy duration ≥6 months. Additional restrictions on the duration of second-line therapy were applied: ≥12 months (sub-cohort 1) and ≥24 months (sub-cohort 2).

Baseline body weight, body mass index (BMI), systolic/diastolic blood pressure and lipids were calculated as the average of available measurements within the 3 months before and 3 months after initiation of therapy. HbA1c values at baseline, 6, 12, 18 and 24 months were obtained as the nearest measure within 3 months either side of the time point. Under condition of at least two non-missing HbA1c measures over 24 months, the missing data were imputed using a Markov chain Monte Carlo method, adjusting for age, diabetes duration and usage of concomitant ADDs.18 The following baseline HbA1c categories were then created: (1) 53.0–63.9 mmol/mol (7.0–7.9%); (2) 64.0–75.0 mmol/mol (8.0–9.0%); (3) 75.1–108.0 mmol/mol (9.1–12%); (4) >108 mmol/mol (>12%).

The presence of comorbidities prior to baseline was assessed using the relevant disease identification codes. The Charlson comorbidity index (CCI) score was calculated using the algorithm described by Quan et al.19 Cardiovascular disease was defined as ischaemic heart disease, peripheral vascular disease, heart failure or stroke. Cancer was defined as any malignancy except malignant neoplasm of skin.

2.3 | Statistical methods

Baseline characteristics were summarized as number (%), mean (SD) or median (first quartile, third quartile) as appropriate. Patterns of intensification with third-line ADDs were summarized according to second-line ADDs in the study cohort, sub-cohort 1 and sub-cohort 2. Among patients with ≥2 years of follow-up in the study cohort, the proportions (95% confidence interval [CI]) of those who initiated a third ADD within 2 years of baseline were calculated according to year of second-line treatment initiation.

Propensity scores for multiple treatment levels20 were calculated within each HbA1c category to account for heterogeneous baseline characteristics among second-line ADD groups. Inverse probability of these exposure weights21,22 was used to balance second-line treatment groups with regard to age, sex, baseline HbA1c and baseline CCI score. In patients without a history of cardiovascular disease, chronic kidney disease or cancer at baseline, the probabilities (95% CIs) of achieving glycaemic control (HbA1c <53 or 58 mmol/mol (7 or 7.5%)) at 6, 12 and 24 months after second-line treatment initiation were estimated in the study cohort, sub-cohort 1 and sub-cohort 2, respectively. Three outcomes were assessed with multinomial logistic regression: (1) no glycaemic control achievement at corresponding time point; (2) glycaemic control achievement with a third ADD addition within the analysis time window; and (3) glycaemic achievement without a third ADD addition within the analysis time window. Analyses were conducted by balancing the data as described above, with additional covariate adjustments for age, sex, and time from metformin to second-line treatment, separately for the HbA1c categories of 58–63.9 mmol/mol (7.5–7.9%); 64–75 mmol/mol (8–9%); and 75.1–108 mmol/mol (9.1–12%).

In patients with baseline HbA1c concentrations 58–63.9 mmol/mol (7.5–7.9%) who achieved an HbA1c target of 53 mmol/mol (7%) at 6 months without addition of a third ADD, the probabilities of sustaining HbA1c control over 24 months were estimated after the balancing and adjustments described above. Similarly, in patients with baseline HbA1c of 64–75 mmol/mol (8–9%) who achieved HbA1c of <58 mmol/mol (<7.5%) at 6 months without addition of a third ADD, the adjusted probabilities of sustaining HbA1c control over 24 months were estimated. Finally, in patients with baseline HbA1c concentration of 75.1–108 mmol/mol (9.1–12%) who achieved HbA1c <58 mmol/mol (<7.5%) at 6 months with or without third-line treatment intensification, the adjusted probabilities of sustaining HbA1c control (irrespective of third-line ADD status) over 24 months were estimated. The assessment of achieving HbA1c <53 mmol/mol (<7%) in this category was considered clinically unrealistic.

Sensitivity analyses included an intention-to-treat evaluation and separate assessment in patients with comorbidities at baseline.

3 | RESULTS

From 2 624 954 identified patients with type 2 diabetes, 195 720 initiated second-line ADD after metformin and had available HbA1c measurements (Figure S1). Of these, 85%, 79%, 77%, 83% and 83% in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively, continued therapy for at least 6 months. The study cohort included 90 572, 29 308, 6696, 21 827 and 14 678 patients in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively (Table 1). On average, the progression to a second ADD occurred 9 months after metformin initiation. Available follow-up years from baseline were 4.0, 3.2, 3.7, 3.5 and 5.6 years in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively, and 84% of patients continued therapy for at least 1 year. The distributions of age, sex, BMI and comorbidities at baseline were significantly different among the second-line ADDs (Table 1).

The distribution of HbA1c categories at baseline was heterogeneous among the treatment groups (Table 1). With a mean (SD) cohort HbA1c level of 8.4 (1.9)% (68 mmol/mol) at second-line therapy initiation, the proportions of patients with baseline HbA1c <64 mmol/mol (<8%) were 52%, 58%, 67%, 36% and 66% in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively.

3.1 | Treatment intensification with a third drug

Overall, 52% in the cohort had a third ADD prescribed (either in addition to or as a switch from a second ADD) during the available follow-up. On average, the progression to a third ADD occurred at 15 months after second-line treatment initiation (Table 2). Of those who initiated a third drug, 88% added it to dual therapy (ranging from 70% in the insulin group to 94% in the GLP-1RA group), while only 12% ceased the second ADD and switched to a third agent.

By study design, patients who switched to a third agent within 6, 12 or 24 months were not included in the study cohort, sub-cohort 1 or sub-cohort 2, respectively. During 6 months of therapy post baseline, 27%, 21%, 26%, 12% and 29% of patients in the SU, DPP-4


TABLE 1 Characteristics of patients at initiation of second-line antidiabetic drug

| Characteristic | Metformin + SU | Metformin + DPP-4 inhibitor | Metformin + GLP-1RA | Metformin + insulin | Metformin + TZD | All |
|---|---|---|---|---|---|---|
| N | 90 572 | 29 308 | 6696 | 21 827 | 14 678 | 163 081 |
| Mean (SD) age, years | 59 (12) | 57 (12) | 53 (11) | 56 (13) | 57 (11) | 57 (12) |
| Men, n (%) | 46 005 (51) | 14 330 (49) | 2354 (35) | 9858 (45) | 7782 (53) | 80 329 (49) |
| White ethnicity, n (%) | 63 338 (70) | 20 366 (69) | 5100 (76) | 14 267 (65) | 10 256 (70) | 113 327 (69) |
| Black ethnicity, n (%) | 11 703 (13) | 3618 (12) | 616 (9) | 3690 (17) | 1434 (10) | 21 061 (13) |
| Mean (SD) time from metformin to second drug, mo | 8.9 (16.8) | 12.8 (19.3) | 12.1 (18.3) | 5.8 (14.2) | 4.8 (11.7) | 9.0 (16.8) |
| Mean (SD) follow-up from baseline, years | 4.03 (2.49) | 3.22 (1.95) | 3.66 (2.39) | 3.46 (2.24) | 5.57 (2.80) | 3.93 (2.47) |
| Mean (SD) therapy duration from baseline, mo | 38.3 (26.3) | 29.6 (20.1) | 28.4 (21.0) | 37.9 (26.0) | 35.8 (26.1) | 36.0 (25.3) |
| Therapy duration ≥12 mo, n (%) | 77 779 (86) | 23 327 (80) | 5061 (76) | 18 729 (86) | 12 040 (82) | 136 936 (84) |
| Therapy duration ≥24 mo, n (%) | 56 324 (62) | 14 746 (50) | 3090 (46) | 13 472 (62) | 8297 (57) | 95 929 (59) |
| Mean (SD) HbA1c, % | 8.4 (1.8) | 8.2 (1.7) | 7.8 (1.6) | 9.3 (2.3) | 7.9 (1.7) | 8.4 (1.9) |
| Mean HbA1c, mmol/mol | 68 | 66 | 62 | 78 | 63 | 68 |
| HbA1c category, n (%) | | | | | | |
| 53–63.9 mmol/mol (7–7.9%) | 26 493 (29) | 10 112 (35) | 1953 (29) | 4034 (18) | 4139 (28) | 46 731 (29) |
| 64–75 mmol/mol (8–9%) | 18 701 (21) | 5726 (20) | 1027 (15) | 3838 (18) | 2295 (16) | 31 587 (19) |
| 75.1–108 mmol/mol (9.1–12%) | 20 148 (22) | 5373 (18) | 989 (15) | 7432 (34) | 2183 (15) | 36 125 (22) |
| >108 mmol/mol (>12%) | 4695 (5) | 1227 (4) | 166 (2) | 2798 (13) | 504 (3) | 9390 (6) |
| Mean (SD) weight, kg | 98.3 (24.5) | 98.9 (24.2) | 109.5 (25.9) | 99.8 (26.2) | 100.2 (23.9) | 99.3 (24.8) |
| Mean (SD) BMI, kg/m2 | 34.5 (7.7) | 34.5 (7.6) | 38.5 (8.1) | 35.2 (8.4) | 34.8 (7.6) | 34.8 (7.8) |
| BMI category a, n (%) | | | | | | |
| Normal | 5803 (7) | 1841 (6) | 89 (1) | 1712 (8) | 827 (6) | 10 272 (6) |
| Overweight | 20 477 (23) | 6567 (23) | 669 (10) | 4217 (20) | 3130 (22) | 35 060 (22) |
| Grade 1 | 25 568 (29) | 8570 (30) | 1661 (25) | 5697 (27) | 4029 (29) | 45 525 (29) |
| Grade ≥ 2 | 35 853 (41) | 11 788 (41) | 4144 (63) | 9587 (45) | 6012 (43) | 67 384 (43) |
| Mean (SD) SBP, mm Hg | 131 (15) | 129 (13) | 128 (13) | 130 (15) | 130 (14) | 130 (14) |
| SBP ≥ 140 mm Hg, n (%) | 22 164 (25) | 5807 (20) | 1084 (17) | 5022 (23) | 3088 (22) | 37 165 (23) |
| Mean (SD) LDL cholesterol, mg/dL | 98 (34) | 98 (35) | 95 (34) | 99 (37) | 97 (34) | 98 (35) |
| Mean (SD) HDL cholesterol, mg/dL | 43 (12) | 44 (12) | 44 (12) | 43 (13) | 45 (12) | 43 (12) |
| Median (IQR) triglycerides, mg/dL | 150 (109, 199) | 147 (108, 196) | 150 (109, 200) | 144 (103, 197) | 139 (101, 190) | 147 (107, 198) |
| CVD, CKD or cancer, n (%) | 23 281 (26) | 7223 (25) | 1205 (18) | 5870 (27) | 2982 (20) | 40 561 (25) |
| CVD, n (%) | 18 031 (20) | 5406 (18) | 852 (13) | 4675 (21) | 2276 (16) | 31 240 (19) |
| CKD, n (%) | 3750 (4) | 1205 (4) | 151 (2) | 811 (4) | 431 (3) | 6348 (4) |
| Cancer, n (%) | 4469 (5) | 1628 (6) | 285 (4) | 1103 (5) | 552 (4) | 8037 (5) |
| Neuropathy, n (%) | 7153 (8) | 2080 (7) | 519 (8) | 2305 (11) | 879 (6) | 12 936 (8) |
| Retinopathy, n (%) | 1329 (1) | 288 (1) | 76 (1) | 535 (2) | 166 (1) | 2394 (1) |
| Depression, n (%) | 14 925 (16) | 5427 (19) | 1576 (24) | 4200 (19) | 2145 (15) | 28 273 (17) |
| Mean (SD) CCI | 1.7 (1.1) | 1.7 (1.1) | 1.6 (0.9) | 1.8 (1.2) | 1.5 (0.9) | 1.7 (1.1) |

Abbreviations: BMI, body mass index; CCI, Charlson comorbidity index; CVD, cardiovascular disease; CKD, chronic kidney disease; DBP, diastolic blood pressure; DPP-4, dipeptidyl peptidase-4; GLP-1RA, glucagon-like peptide-1 receptor agonist; HbA1c, glycated haemoglobin; IQR, interquartile range; SBP, systolic blood pressure; SU, sulphonylurea; TZD, thiazolidinedione.
a BMI category: normal: <25 kg/m2; overweight: ≥25 and <30 kg/m2; Grade 1: ≥30 and <35 kg/m2; Grade ≥ 2: ≥35 kg/m2.

inhibitor, GLP-1RA, insulin and TZD groups, respectively, added a third-line therapy (Table 2). Insulin was the most popular third-line ADD, followed by DPP-4 inhibitors. Of those who added a third drug, insulin was chosen by 26%, 36%, 69% and 32% of patients in the SU, DPP-4 inhibitor, GLP-1RA and TZD groups, respectively (Table 2). Among those who continued the second-line therapy for 12 months (sub-cohort 1) and for 24 months (sub-cohort 2), 30% and 39% added a third-line therapy, respectively.

3.2 | Temporal pattern of initiating third-line ADDs

Irrespective of the class of second-line ADD, the proportions of patients who initiated a third ADD within 2 years of baseline are shown in Figure 1A (“All”), stratified by calendar year of second-line initiation. Figure 1 also shows those who intensified treatment with a third ADD, excluding TZD as second-line group (“All without TZD”) because a large proportion of patients ceased TZD treatment as a result of cardiovascular safety concerns23–25 and not necessarily


TABLE 2 Third-line anti-diabetic drug usage in the study cohort and two sub-cohortsa

| | Statistic | Metformin + SU | Metformin + DPP-4 inhibitor | Metformin + GLP-1RA | Metformin + insulin | Metformin + TZD | All |
|---|---|---|---|---|---|---|---|
| Study cohort | | | | | | | |
| N | | 90 572 | 29 308 | 6696 | 21 827 | 14 678 | 163 081 |
| Initiated third drug | n (% from N) | 49 255 (54) | 15 248 (52) | 3513 (52) | 7275 (33) | 10 006 (68) | 85 297 (52) |
| Time from second to third drug, mo | Mean (SD) | 14.3 (19.5) | 14.3 (16.0) | 13.2 (17.8) | 17.7 (19.3) | 18.4 (23.1) | 15.0 (19.4) |
| Added third drug within 6 mo | n1 (% from N) | 24 600 (27) | 6053 (21) | 1725 (26) | 2627 (12) | 4260 (29) | 39 265 (24) |
| • Most popular third drug | Name; n (% from n1) | TZD; 8107 (33) | Insulin; 2200 (36) | Insulin; 1189 (69) | SU; 888 (34) | Insulin; 1352 (32) | Insulin; 11 054 (28) |
| • Second most popular third drug | Name; n (% from n1) | DPP-4 inhibitor; 7455 (30) | SU; 2073 (34) | SU; 193 (11) | DPP-4 inhibitor; 703 (27) | DPP-4 inhibitor; 1236 (29) | DPP-4 inhibitor; 9499 (24) |
| Sub-cohort 1 | | | | | | | |
| N2 | | 77 779 | 23 327 | 5061 | 18 729 | 12 040 | 136 936 |
| Added third drug within 6 mo | n (% from N2) | 20 990 (27) | 4581 (20) | 1300 (26) | 2220 (12) | 3450 (29) | 32 541 (24) |
| Added third drug within 6 to 12 mo | n2 (% from N2) | 4265 (5) | 1860 (8) | 293 (6) | 1076 (6) | 682 (6) | 8176 (6) |
| • Most popular third drug | Name; n (% from n2) | DPP-4 inhibitor; 1737 (41) | SU; 975 (52) | SU; 104 (35) | SU; 336 (31) | SU; 340 (50) | DPP-4 inhibitor; 2217 (27) |
| • Second most popular third drug | Name; n (% from n2) | Insulin; 1074 (25) | Insulin; 267 (14) | Insulin; 62 (21) | GLP-1RA; 269 (25) | DPP-4 inhibitor; 160 (23) | SU; 1755 (21) |
| Sub-cohort 2 | | | | | | | |
| N3 | | 56 324 | 14 746 | 3090 | 13 472 | 8297 | 95 929 |
| Added third drug within 6 mo | n (% from N3) | 15 074 (27) | 2549 (17) | 800 (26) | 1521 (11) | 2309 (28) | 22 253 (23) |
| Added third drug within 6 to 12 mo | n (% from N3) | 2867 (5) | 1124 (8) | 168 (5) | 756 (6) | 471 (6) | 5386 (6) |
| Added third drug within 12 to 24 mo | n3 (% from N3) | 5302 (9) | 1833 (12) | 319 (10) | 1070 (8) | 645 (8) | 9169 (10) |
| • Most popular third drug | Name; n (% from n3) | DPP-4 inhibitor; 2356 (44) | SU; 959 (52) | SU; 113 (35) | SU; 301 (28) | SU; 297 (46) | DPP-4 inhibitor; 2876 (31) |
| • Second most popular third drug | Name; n (% from n3) | Insulin; 1225 (23) | SGLT2; 274 (15) | Insulin; 63 (20) | DPP-4 inhibitor; 269 (25) | DPP-4 inhibitor; 201 (31) | SU; 1670 (18) |

Abbreviations: DPP-4, dipeptidyl peptidase-4; GLP-1RA, glucagon-like peptide-1 receptor agonist; SU, sulphonylurea; TZD, thiazolidinedione. a Duration of second-line agent ≥6 months/ ≥12 months/ ≥24 months in the study cohort/ sub-cohort 1/ sub-cohort 2, respectively.
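
The percentage columns in Table 2 use nested denominators: "% from N" is relative to the whole second-line group, whereas "% from n1" is relative only to those who added a third drug within 6 months. The following Python sketch recomputes a few study-cohort percentages from the counts printed in Table 2 to make that arithmetic explicit. The helper name `pct` and the dictionary layout are illustrative assumptions and not part of the original analysis.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * part / whole)

# Counts copied from the study-cohort rows of Table 2.
# N       = patients in the second-line treatment group
# n1      = patients who added a third drug within 6 months
# insulin = patients among n1 whose third drug was insulin
#           (only groups where Table 2 lists insulin as most popular)
groups = {
    "DPP-4 inhibitor": {"N": 29308, "n1": 6053, "insulin": 2200},
    "GLP-1RA": {"N": 6696, "n1": 1725, "insulin": 1189},
    "TZD": {"N": 14678, "n1": 4260, "insulin": 1352},
}

for name, g in groups.items():
    share_added = pct(g["n1"], g["N"])          # "% from N"
    share_insulin = pct(g["insulin"], g["n1"])  # "% from n1"
    print(f"{name}: {share_added}% added a third drug within 6 months; "
          f"{share_insulin}% of those chose insulin")
```

The recomputed values agree with the rounded percentages shown in the table, which is a quick consistency check when reading the two denominators side by side.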

FIGURE 1 Among patients who had at least 2 years of follow-up in the study cohort, the proportion (95% confidence interval) of patients who initiated third antidiabetic drug (ADD) within 2 years of second ADD: A, irrespective of HbA1c level at second-line initiation; B, among those with HbA1c of 64.0–75.0 mmol/mol (8–9%) at second-line initiation; and C, among those with HbA1c of 75.1-108.0 mmol/mol (9.1–12%) at second-line initiation. HbA1c, glycated haemoglobin; INS, insulin; TZD, thiazolidinedione
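
Figure 1 reports proportions with 95% confidence intervals, but the article does not state which interval method was used. As a hedged illustration only, the sketch below computes a Wilson score interval for a binomial proportion; the choice of the Wilson method, the function name `wilson_ci` and the example counts are assumptions, not values or methods taken from the study.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width

# Hypothetical counts for one calendar year (not study data):
# 400 of 1000 patients initiated a third ADD within 2 years.
p, lower, upper = wilson_ci(400, 1000)
print(f"proportion = {p:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With cohort sizes in the thousands, as in this study, the Wilson and simple normal-approximation intervals would be nearly identical; the Wilson form is used here only because it stays within 0 and 1 for small groups.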

because of efficacy issues. The figure also provides the data excluding those who had a TZD or insulin as second-line (“All without TZD & insulin”) to explore the possible change in intensification rate with non-insulin ADDs over time, accounting for decreasing popularity of TZDs. Figure 1B and C focus on those who had baseline HbA1c of 64–75 mmol/mol (8–9%) and 75.1–108 mmol/mol (9.1–12%), respectively. Figure 1 shows that between 2007 and 2014 the proportion of patients initiating a third ADD, within 2 years of adding the second ADD, decreased; however, this decline started to reverse in 2014, especially among those whose HbA1c was 75.1–108 mmol/mol (9.1–12%) at initiation of the second ADD.

3.3 | Glycaemic achievements and sustainability

At 6 months, the mean unadjusted HbA1c reductions were 0.8%, 0.8%, 0.7%, 1.0% and 0.8% in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively. The mean adjusted reductions at 6 months were 0.8%, 1.0%, 1.1%, 0.7% and 1.0% in the respective treatment groups (significant for all groups, P < .01).

3.3.1 | Baseline HbA1c group 58–63.9 mmol/mol (7.5–7.9%)

Among patients with HbA1c concentrations of 58–63.9 mmol/mol (7.5–7.9%) at baseline, 44%, 47%, 57%, 31% and 57% of patients in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively, achieved HbA1c <53 mmol/mol (<7%) at 6 months without third-line treatment addition. The corresponding adjusted probabilities were 32%, 38%, 39%, 26% and 38% in the second-line treatment groups (P < .01 for all groups [Figure 2A]); however, the probabilities of reducing HbA1c below the target 53 mmol/mol (7%) without third-line ADD intensification declined by 5%, 5%, 6%, 2% and 1% at 12 months and by 9%, 8%, 15%, 5% and 7% at 24 months in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively.

Among those who reduced HbA1c to <53 mmol/mol (<7%) without a third ADD at 6 months, 68% and 58% of patients sustained this glycaemic achievement at 12 and 24 months, respectively. The probability of sustaining this glycaemic achievement was higher and similar in the GLP-1RA and TZD groups at 12 months (range of 95% CI of probability: 76%, 79%), compared with other second-line therapy options (Figure 2B). While the probability of sustaining this glycaemic control declined significantly by 24 months, GLP-1RAs, DPP-4 inhibitors and TZDs provided significantly higher chances of sustainability (range of 95% CI of probability: 53%, 58%) compared with patients treated with insulin or SUs (range of 95% CI of probability: 46%, 50%).

3.3.2 | Baseline HbA1c 64–75 mmol/mol (8–9%)

Among patients with baseline HbA1c concentrations of 64–75 mmol/mol (8–9%), 55%, 58%, 66%, 41% and 67% of patients in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups achieved HbA1c <58 mmol/mol (7.5%) at 6 months without third-line ADD addition, and the corresponding adjusted probabilities were 38%, 44%, 40%, 34% and 42%, respectively (Figure 2C). The probabilities of this glycaemic achievement declined significantly by at least 5% across all treatment groups at 12 months, and by at least 8% at 24 months.

Among those who achieved an HbA1c concentration <58 mmol/mol (<7.5%) without a third ADD at 6 months, 76% and 67% sustained this glycaemic achievement at 12 and 24 months, respectively, without requiring third-line treatment intensification. The probability of sustaining this glycaemic achievement was significantly higher in the GLP-1RA and TZD groups at 12 months (range of 95% CI of probability: 76%, 79%), compared with other second-line ADDs (Figure 2D; P < .01). While the probability of sustaining this glycaemic control declined significantly by 24 months of therapy across all groups, patients treated with insulin had the lowest probability of sustaining the glycaemic control.

3.3.3 | Baseline HbA1c 75.1–108 mmol/mol (9.1–12%)

In the patients with 75.1–108 mmol/mol (9.1–12%) baseline HbA1c, 29%, 36% and 45% added a third ADD within 6, 12 and 24 months of baseline, respectively. Irrespective of third ADD status, 37%, 45%, 38%, 21% and 43% of patients in the SU, DPP-4 inhibitor, GLP-1RA, insulin and TZD groups, respectively, achieved HbA1c <58 mmol/mol (<7.5%) at 6 months, with corresponding probabilities of 36%, 45%, 38%, 33% and 43% (Figure 2E). The probability of achieving an HbA1c concentration <58 mmol/mol (<7.5%) at 24 months reduced by 4% for insulin users, did not change in the SU and DPP-4 inhibitor groups, and increased by 8% and 9% in the second-line GLP-1RA and TZD groups (all P < .01). Among those who achieved an HbA1c concentration <58 mmol/mol (<7.5%) at 6 months, 72% and 58%, respectively, sustained this glycaemic achievement at 12 and 24 months, irrespective of third-line treatment intensification status. The probability of sustaining glycaemic control at <58 mmol/mol (<7.5%) over 12 and 24 months of treatment was significantly higher in the incretin and TZD groups, while insulin and SU offered lower chances of sustainable control (Figure 2F).

3.3.4 | Baseline HbA1c >108 mmol/mol (>12%)

In patients with baseline HbA1c >108 mmol/mol (>12%), the probability of reducing HbA1c by at least 2% increased over time: 82% at 2 years of insulin therapy, and ~90% for other second-line choices. The probabilities of reducing HbA1c by at least 1.5% in this baseline HbA1c group were not significantly different among the ADD groups over 2 years (results not shown).

3.4 | Sensitivity analyses

An intention-to-treat approach obtained similar results to those of the main analyses. Patients with cardiovascular disease, chronic kidney disease or cancer at baseline had marginally higher probabilities of glycaemic achievements in all treatment groups, compared with those without comorbidities (results not shown).

4 | DISCUSSION

The novelty of the present pharmaco-epidemiological study, with real-world population-level data, is its evaluation of short- and long-term glycaemic control with post-metformin major second-line ADDs, and the comparison of the sustainability of such glycaemic goals over


FIGURE 2 At 6, 12, and 24 months of second-line initiation, adjusted probability (95% confidence interval) of A, reducing glycated haemoglobin (HbA1c) below 53 mmol/mol (7%) without adding third anti-diabetic drug (ADD), from baseline HbA1c of 58–63.9 mmol/mol (7.5–7.9%); B, sustaining 6-month achievement without adding a third ADD; C, reducing HbA1c below 58 mmol/mol (7.5%) without adding a third ADD, from baseline HbA1c of 64–75 mmol/mol (8–9%); D, sustaining 6-month achievement without adding a third ADD; E, reducing HbA1c below 58 mmol/mol (7.5%) (irrespective of third ADD), from baseline HbA1c of 75.1–108 mmol/mol (9.1–12%); and F, sustaining 6-month achievement (irrespective of third ADD). DPP-4, dipeptidyl peptidase-4; GLP-1RA, glucagon-like peptide-1 receptor agonist; INS, insulin; MET, metformin; SU, sulphonylurea; TZD, thiazolidinedione
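
The adjusted probabilities in Figure 2 rest on inverse probability of treatment weighting with propensity scores for multiple treatment groups. The Python sketch below illustrates only the weighting step under strong simplifying assumptions: the propensity scores are taken as already estimated (the study cites the R TWANG package for that step), and the multinomial logistic regression with covariate adjustment is omitted. All function names and toy numbers are hypothetical and do not reproduce the authors' analysis.

```python
def ipt_weights(treatments, propensities):
    """Weight each patient by 1 / P(received treatment | covariates).

    treatments   : list of treatment labels actually received
    propensities : list of dicts mapping each treatment label to its
                   estimated probability for that patient
    """
    weights = []
    for received, scores in zip(treatments, propensities):
        p = scores[received]
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity score must be in (0, 1]")
        weights.append(1.0 / p)
    return weights

def weighted_proportion(outcomes, weights, treatments, group):
    """Weighted share of patients in `group` with outcome == 1."""
    numerator = 0.0
    denominator = 0.0
    for y, w, t in zip(outcomes, weights, treatments):
        if t == group:
            numerator += w * y
            denominator += w
    if denominator == 0.0:
        raise ValueError(f"no patients in group {group!r}")
    return numerator / denominator

# Toy data (hypothetical): outcome 1 = HbA1c target reached at 6 months.
treatments = ["SU", "SU", "GLP-1RA"]
propensities = [
    {"SU": 0.50, "GLP-1RA": 0.50},
    {"SU": 0.25, "GLP-1RA": 0.75},
    {"SU": 0.20, "GLP-1RA": 0.80},
]
outcomes = [1, 0, 1]

w = ipt_weights(treatments, propensities)
for group in ("SU", "GLP-1RA"):
    share = weighted_proportion(outcomes, w, treatments, group)
    print(f"{group}: weighted proportion at target = {share:.2f}")
```

Patients who were unlikely to receive the treatment they actually got carry more weight, so each weighted group resembles the overall cohort on the balanced covariates.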


24 months of continuous treatment. Among patients with HbA1c concentrations of 58–63.9 mmol/mol (7.5–7.9%) at second-line ADD initiation, the probabilities of achieving an HbA1c concentration of <53 mmol/mol (<7%) without adding a third-line ADD at 6 and 12 months were significantly higher in the incretin and TZD groups, compared with the insulin and SU groups. Treatment with incretins or TZDs also offered a significantly higher probability of sustaining this glycaemic achievement over 24 months of treatment without the need for further therapy intensification. Among those who initiated a second-line ADD at HbA1c levels of 64–75 mmol/mol (8–9%), DPP-4 inhibitors and TZDs offered significantly higher and similar chances of reducing HbA1c to <58 mmol/mol (<7.5%) over 24 months of therapy without adding a third ADD, compared with other second-line groups. GLP-1RAs and TZDs offered the highest chances of sustaining this control over 24 months, while treatment with SUs, insulin and DPP-4 inhibitors provided significantly lower sustainability chances.

In this real-world study, we observed similar performance by DPP-4 inhibitors and GLP-1RAs in terms of the probability of reducing HbA1c to a clinically desirable glycaemic target over 24 months of therapy, when added to metformin. In terms of sustaining the glycaemic achievements over 12 months, GLP-1RAs appear to offer higher chances among patients with HbA1c <75 mmol/mol (<9%) at second-line initiation (~76%-79% probability), compared with DPP-4 inhibitors (~68%-73% probability); however, this difference disappears at 24 months of therapy. While SUs as second-line therapy offer a higher probability of achieving desirable glycaemic control across all HbA1c categories (<108 mmol/mol (<12%)) compared with insulin over 2 years, the probability of sustaining the early glycaemic achievement appears to be similar between these two therapy options. We have seen that, across all HbA1c categories, treatment with second-line TZDs provided better or similar glycaemic achievements and sustainability, compared with other therapy options. This result supports the study by Mamza et al,26 who reported that treatment with post-metformin TZD provides the most durable glycaemic response compared with second-line SU and DPP-4 inhibitor treatment. Recent results of the TOSCA.IT trial, providing cardiovascular safety reassurance with pioglitazone, taken in conjunction with our results may increase the popularity of TZDs as a therapeutic option.27

Compared with sulphonylurea add-on to metformin, Thomsen et al8 reported a higher likelihood of achieving HbA1c below 53 mmol/mol (7%) at 6 months for second-line GLP-1RA users (Relative Risk (95% CI) of 1.10 (1.01, 1.19)), and lower likelihoods for DPP-4 inhibitor (Relative Risk (95% CI) 0.94 (0.89, 0.99)) and insulin users (Relative Risk (95% CI) 0.88 (0.77, 0.99)). Our results are closer to those of the study conducted by Rathmann et al,7 who reported odds ratios (with SU as reference) of achieving HbA1c below 53 mmol/mol (7%) of 1.2, 1.4, 1.7 and 0.7 for second-line DPP-4 inhibitors, GLP-1RAs, TZDs and insulin, respectively.

Our findings are also in line with a study that used data from the National Health and Nutrition Examination Survey, which reported that only half of patients achieve HbA1c below 53 mmol/mol (7%).28 Furthermore, in patients with HbA1c <75 mmol/mol (<9%) at second-line initiation, we observed that only 30% maintained glycaemic control after 2 years of continuous treatment without further intensification with a third ADD.

Comparatively poor performance of insulin as a second-line agent may be surprising, as randomized controlled trial data show that insulin can achieve at least as much HbA1c reduction as other agents. A possible reason for this is that insulin is often chosen when there are multiple comorbidities, and in such patients, the HbA1c target may be higher, and many other potential third-line ADDs may be contra-indicated. In addition, the insulin dose may be inadequately titrated because of adverse effects, such as hypoglycaemia and weight gain, as well as inadequate healthcare professional support for the regular titration of insulin doses. More work needs to be carried out to determine how best to translate the clinical trial efficacy of insulin into clinical practice effectiveness.

We observed that the proportion of patients who intensify treatment with a third ADD has decreased only moderately during the last decade, despite the increasing availability of newer agents. Lipska et al29 reported that overall glycaemic control in the United States did not change between 2006 and 2013.

A strength of the present study is the availability of data from patients' medication lists that include prescribed medications within the EMR network and also medications that could be prescribed outside of the EMR. Furthermore, the CEMR database tracks longitudinal treatment adjustments, and contains comprehensive clinical information, which is usually not available in claims databases. In addition, we used advanced data mining and statistical methods. Given unequal probabilities of receiving particular second-line agents in the real-world scenario, we modelled treatment assignment with multinomial propensity scores, and then assessed the adjusted outcomes of the study.

The limitations of this study include the non-availability of data on: (1) adherence and side-effects; (2) diet and exercise; (3) socio-economic status; and (4) insurance type. Carls and colleagues31 highlighted alarmingly low rates of medication adherence as the main cause of the disconnect between results of real-world studies and clinical trials. Importantly, in the present study, we focused only on those who continued the second-line therapy for a minimum of 6 months. Montvida et al4 recently reported higher discontinuation rates for incretins, compared with older treatment alternatives.

To conclude, incretin-based therapies and TZDs offered a higher probability of long-term glycaemic achievements and of sustaining these, compared with SUs and insulin for metformin-treated patients with type 2 diabetes. While the results of a large randomized controlled trial (GRADE) comparing glycaemic efficacy of major second-line therapies are not expected before 2020, the present study provides much-needed information to patients and clinicians with regard to the probability of sustainable glycaemic control with different therapy options.30

ACKNOWLEDGMENTS

Melbourne EpiCentre gratefully acknowledges the support from the Australian Government Department of Education's National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia. O.M. acknowledges the PhD

scholarship from Queensland University of Technology, Australia, and her co-supervisors Prof. Ross Young and Prof. Louise Hafner of the same University. No separate funding was obtained for this study.

Conflict of interest

S.K.P. has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator-initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. O.M. has no conflict of interest to declare. J.E.S. has received honoraria or grant support from Merck Sharp and Dohme, Novo Nordisk, Eli Lilly, AstraZeneca, Sanofi-Aventis, Mylan Pharmaceuticals and Boehringer Ingelheim.

Author contributions

O.M. and S.K.P. were responsible for the primary design of the study. O.M. conducted the data extraction. O.M. and S.K.P. jointly conducted the statistical analyses. The first draft of the manuscript was developed by O.M. and S.K.P., and all authors contributed to the finalization of the manuscript. S.K.P. had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

ORCID

Jonathan E. Shaw http://orcid.org/0000-0002-6187-2203
Lawrence Blonde http://orcid.org/0000-0003-0492-6698
Sanjoy K. Paul http://orcid.org/0000-0003-0848-7194

REFERENCES

1. American Diabetes Association. Standards of medical care in diabetes—2017: summary of revisions. Diabetes Care. 2017;40:S4-S5.
2. Turner RC, Cull CA, Frighi V, Holman RR, Group UPDS. Glycemic control with diet, sulfonylurea, metformin, or insulin in patients with type 2 diabetes mellitus: progressive requirement for multiple therapies (UKPDS 49). JAMA. 1999;281:2005-2012.
3. Garber AJ, Abrahamson MJ, Barzilay JI, et al. Consensus statement by the American association of clinical endocrinologists and American College of Endocrinology on the comprehensive type 2 diabetes management algorithm–2017 executive summary. Endocr Pract. 2017;23:207-238.
4. Montvida O, Shaw J, Atherton JJ, Stringer F, Paul SK. Long-term trends in antidiabetes drug usage in the US: real-world evidence in patients newly diagnosed with type 2 diabetes. Diabetes Care. 2018;41:69-78.
5. Giugliano D, Maiorino MI, Bellastella G, Esposito K. Comment on Edelman and Polonsky. Type 2 diabetes in the real world: the elusive nature of glycemic control. Diabetes Care. 2017;40:1425-1432. Diabetes Care 2018;41:e17.
6. McCarthy MI. Painting a new picture of personalised medicine for diabetes. Diabetologia. 2017;60:793-799.
7. Rathmann W, Bongaerts B, Kostev K. Change in glycated haemoglobin levels after initiating second-line therapy in type 2 diabetes: a primary care database study. Diabetes Obes Metab. 2016;18:840-843.
8. Thomsen RW, Baggesen LM, Søgaard M, et al. Early glycaemic control in metformin users receiving their first add-on therapy: a population-based study of 4,734 people with type 2 diabetes. Diabetologia. 2015;58:2247-2253.
9. Palmer SC, Mavridis D, Nicolucci A, et al. Comparison of clinical outcomes and adverse events associated with glucose-lowering drugs in patients with type 2 diabetes: a meta-analysis. JAMA. 2016;316:313-324.
10. Maruthur NM, Tseng E, Hutfless S, et al. Diabetes medications as monotherapy or metformin-based combination therapy for type 2 diabetes: a systematic review and meta-analysis. Ann Intern Med. 2016;164:740-751.
11. Bennett WL, Maruthur NM, Singh S, et al. Comparative effectiveness and safety of medications for type 2 diabetes: an update including new drugs and 2-drug combinations. Ann Intern Med. 2011;154:602-613.
12. Zhang Y, McCoy RG, Mason JE, Smith SA, Shah ND, Denton BT. Second-line agents for glycemic control for type 2 diabetes: are newer agents better? Diabetes Care. 2014;37:1338-1345.
13. Centers for Disease Control and Prevention. National Diabetes Statistics Report: estimates of diabetes and its burden in the United States. Atlanta, GA: US Department of Health and Human Services, 2014.
14. Crawford AG, Cote C, Couto J, et al. Comparison of GE centricity electronic medical record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Popul Health Manag. 2010;13:139-150.
15. Brixner D, Said Q, Kirkness C, Oberg B, Ben-Joseph R, Oderda G. Assessment of cardiometabolic risk factors in a national primary care electronic health record database. Value in health. 2007;10(s1).
16. Paul SK, Shaw J, Montvida O, Klein K. Weight gain in insulin treated patients by BMI categories at treatment initiation: new evidence from real-world data in patients with type 2 diabetes. Diabetes Obes Metab. 2016;18(12):1244-1252.
17. Montvida O, Arandjelović O, Reiner E, Paul SK. Data mining approach to estimate the duration of drug therapy from longitudinal electronic medical records. Open Bioinforma J. 2017;10:1-15.
18. Thomas G, Klein K, Paul S. Statistical challenges in analysing large longitudinal patient-level data: the danger of misleading clinical inferences with imputed data. J Indian Soc Agric Stat. 2014;68:39-54.
19. Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43:1130-1139.
20. Ridgeway G, McCaffrey DF, Morral AR, Burgette LF, Griffin BA. Toolkit for Weighting and Analysis of Nonequivalent Groups: A Tutorial for the R TWANG Package. Santa Monica, CA: RAND Corporation, 2014. https://www.rand.org/pubs/tools/TL136z1.html.
21. Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23:2937-2960.
22. McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods. 2004;9:403.
23. Woodcock J, Sharfstein JM, Hamburg M. Regulatory action on rosiglitazone by the US Food and Drug Administration. N Engl J Med. 2010;363:1489-1491.
24. Tanne JH. FDA places "black box" warning on antidiabetes drugs. BMJ. 2007;334:1237.
25. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med. 2007;356:2457-2471.
26. Mamza J, Mehta R, Donnelly R, Idris I. Important differences in the durability of glycaemic response among second-line treatment options when added to metformin in type 2 diabetes: a retrospective cohort study. Ann Med. 2016;48:224-234.
27. Vaccaro O, Masulli M, Nicolucci A, Bonora E, Del Prato S, Maggioni AP, Rivellese AA, Squatrito S, Giorda CB, Sesti G, Mocarelli P. Effects on the incidence of cardiovascular events of the addition of pioglitazone versus sulfonylureas in patients with type 2 diabetes inadequately controlled with metformin (TOSCA.IT): a randomised, multicentre trial. Lancet Diabetes Endocrinol. 2017;5:887-897.
28. Edelman SV, Polonsky WH. Type 2 diabetes in the real world: the elusive nature of glycemic control. Diabetes Care. 2017;40:1425-1432.
29. Lipska KJ, Yao X, Herrin J, et al. Trends in drug utilization, glycemic control, and rates of severe hypoglycemia, 2006–2013. Diabetes Care. 2017;40:468-475.


30. Nathan DM, Buse JB, Kahn SE, et al. Rationale and design of the glycemia reduction approaches in diabetes: a comparative effectiveness study (GRADE). Diabetes Care. 2013;36:2254-2261.
31. Carls GS, Tuttle E, Tan R-D, et al. Understanding the gap between efficacy in randomized controlled trials and effectiveness in real-world use of GLP-1 RA and DPP-4 therapies in patients with type 2 diabetes. Diabetes Care. 2017;40:1469-1478.

How to cite this article: Montvida O, Shaw JE, Blonde L, Paul SK. Long-term sustainability of glycaemic achievements with second-line antidiabetic therapies in patients with type 2 diabetes: A real-world study. Diabetes Obes Metab. 2018;1–10. https://doi.org/10.1111/dom.13288

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

Chapter 9: Cardio-metabolic Risk Factor Burden and Safety

Statement of Contribution of Co-Authors for Thesis by Published Paper

The authors listed below have certified* that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Olga Montvida, Xiaoling Cai, Sanjoy K Paul. Cardio-metabolic risk factor burden and safety in patients with type 2 diabetes receiving intensified anti-diabetic and cardio-protective therapies.

Contributor | Statement of Contribution*
Olga Montvida (QUT Verified Signature, 29.06.2018) | Conceived the idea and was responsible for the primary design of the study. Conducted the data extraction and statistical analyses. Developed first draft and contributed towards development of the manuscript.
Xiaoling Cai | Contributed to the manuscript development.
Sanjoy K. Paul | Conceived the idea, and was responsible for the primary design of the study. Contributed to the statistical analyses. Developed first draft and contributed towards development of the manuscript.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship.

Name: Sanjoy Ketan Paul
Signature: QUT Verified Signature
Date: 29.06.2018

ABSTRACT

Background: Individualized treatment of patients with type 2 diabetes requires detailed evaluation of risk factor dynamics at the population level. The aim was to evaluate persistent glycaemic and cardiovascular (CV) risk factor burden over 2 years post treatment intensification (TI).

Methods: From US Centricity Electronic Medical Records, 276,884 patients with type 2 diabetes who intensified metformin were selected. Systolic blood pressure (SBP) ≥ 130 / 140 mmHg and low-density lipoprotein (LDL) ≥ 70 / 100 mg/dL were defined as uncontrolled for those with / without a history of CV disease (CVD) at TI. Triglycerides (Trig) ≥ 150 mg/dL and HbA1c ≥ 7.5% were defined as uncontrolled. Longitudinal measures over 2 years post TI were used to define continuously uncontrolled patients.

Findings: Over a mean follow-up of 3.7 years, patients had a mean age of 59 years, 70% were obese, and 22% had a history of CVD; 60 / 30 / 50 / 48% had uncontrolled HbA1c / SBP / LDL / Trig at TI; 81% and 69% were receiving therapies for blood pressure and lipid control, respectively. The proportion of patients with consistently uncontrolled HbA1c increased from 31% in 2005 to 41% in 2014. Among those on lipid-modifying drugs, 41% and 37% had consistently high LDL and Trig over 2 years. Among those on blood pressure control therapies, 29% had continuously uncontrolled SBP. Among patients receiving cardio-protective therapies, 62% failed to achieve control in HbA1c + LDL, 62% in HbA1c + Trig, and 55% in HbA1c + SBP over 2 years post TI. Rates of major adverse cardiovascular events per 1000 person-years were lower among those who intensified metformin with GLP-1RA, compared to other therapies.

Interpretation: Among patients on multiple therapies for risk factor control, more than a third had uncontrolled HbA1c, lipid and SBP levels, and 3 out of 5 had uncontrolled HbA1c and at least one CV risk factor over 2 years post TI.

INTRODUCTION

Cardiovascular (CV) disease in patients with type 2 diabetes has been a major focus during the last decade and remains so to date, being the most common cause of death and comorbidity among patients with diabetes 1,2. The efficient management of these patients requires a multifaceted approach to holistically control hyperglycaemia and CV risk factors such as blood pressure, body weight, and lipids 3,4. A recent review by Khunti and colleagues discusses current evidence on the CV benefits of early control of glucose, lipids, and blood pressure 5.

While American and international guidelines consistently stress the importance of cardio-metabolic risk factor control, population-level control has not improved during the last decade in the US 6-8. Using data from the National Health and Nutrition Examination Survey, Carls and colleagues reported that 57% of patients with diabetes achieved HbA1c < 7% during 2003-2006, compared with only 51% during 2011-2014 8. Similarly, in privately insured and Medicare Advantage patients with type 2 diabetes, Lipska and colleagues reported a decline in the proportion of patients with HbA1c < 7% from 56% in 2006 to 54% in 2013 7. Ali and colleagues reported that only 14% of patients with diabetes simultaneously met targets for glucose, blood pressure, cholesterol and non-smoking status during 1999-2010 in the US 6. Another study of 530,747 patients from the Diabetes Collaborative Registry reported that 83% and 81% of patients had hypertension and hyperlipidaemia, respectively 3.

A significant proportion of patients with type 2 diabetes eventually intensify first-line metformin, in addition to using multiple cardio-protective medications; nonetheless, poor cardio-metabolic risk factor control is common in these patients. While previous studies used general populations of individuals with diabetes, to the best of our knowledge no study has holistically explored the patterns of risk factor control post therapy intensification at population level. In this context, assessment of those who continuously fail to control risk factors would help to understand whether increasing evidence of the benefits of early control and the introduction of newer classes of anti-diabetic drugs (ADDs) have helped to improve population health during the last decade.

In patients with type 2 diabetes identified from the electronic medical records (EMRs) of US primary and secondary ambulatory care systems, the aims of this study were to provide an up-to-date exploration of population-level (1) glycaemic and CV risk factor control at the time of metformin intensification; (2) simultaneous control of glycaemic and CV risk factors post metformin intensification; (3) persistent glycaemic and CV risk factor burden in those who are using anti-diabetic and cardio-protective therapies; and (4) rates of major adverse cardiovascular events by second-line ADD choice.

RESEARCH DESIGN AND METHODS

Data Source

The Centricity EMRs (CEMR) were used in this study. The database represents more than 35,000 solo practitioners, community clinics, academic medical centres, and large integrated delivery networks across all US states. Patients in the database are generally representative of the US population: among those who were active in the CEMR during 2015 and were older than 18 years, 11.6% were identified as having any type of diabetes. This estimate is very close to that of the US National Diabetes Statistics (NDS) report, which estimated that 12.2% of the adult population had diabetes in 2015 9. The database has been extensively used for academic research worldwide 10-12.

For more than 34 million individuals, longitudinal EMRs were available from 1995 until April 2016, with comprehensive patient-level information on demographic, anthropometric, clinical and laboratory variables.

Study Design

For each identified patient with type 2 diabetes, the following drug classes were arranged chronologically according to the initial prescription dates: metformin, sulfonylurea (SU), thiazolidinedione (TZD), alpha-glucosidase inhibitor, amylin, dopamine receptor agonist, meglitinide, dipeptidyl peptidase-4 inhibitor (DPP-4i), glucagon-like peptide-1 receptor agonist (GLP-1RA), sodium-glucose cotransporter inhibitor, and insulin (INS). Next, each patient’s first-, second-, and third-line ADDs were identified. A robust methodology for extraction and assessment of longitudinal patient-level medication data from the Centricity EMRs has recently been described by the authors 13.

The main study cohort included patients with: (1) age at diagnosis ≥18 and <80 years, (2) diagnosis date strictly after the first registered activity in the EMR database, (3) diagnosis date on or after January 1, 2005, (4) anti-diabetic therapy initiated with metformin, (5) second-line ADD initiated with SU, TZD, DPP-4i, GLP-1RA or INS, (6) an available HbA1c, systolic blood pressure (SBP), low-density lipoprotein (LDL), or triglycerides measure at second-line ADD initiation (baseline), (7) second-line therapy duration of at least three months, and (8) follow-up from baseline of at least six months. Additional restrictions on follow-up were applied to define two sub-cohorts: at least 12 months (sub-cohort 1) and at least 24 months (sub-cohort 2).
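
As a minimal illustration of how the eight inclusion criteria and the two sub-cohort restrictions could be applied to patient-level records, the following Python sketch may be helpful. It is not the extraction code used in the study: the `Patient` fields, the function names, and the reading of the lower age bound as "at least 18 years" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Second-line drug classes eligible under criterion (5).
SECOND_LINE_CLASSES = {"SU", "TZD", "DPP-4i", "GLP-1RA", "INS"}


@dataclass
class Patient:
    """Hypothetical per-patient summary; field names are illustrative only."""
    age_at_diagnosis: float
    first_emr_activity: date
    diagnosis_date: date
    first_line: str
    second_line: Optional[str]
    has_baseline_measure: bool   # any of HbA1c, SBP, LDL, triglycerides at baseline
    second_line_months: float    # duration of second-line therapy
    follow_up_months: float      # follow-up from baseline


def in_main_cohort(p: Patient) -> bool:
    """Apply criteria (1)-(8); the lower age bound is assumed to be inclusive."""
    return (
        18 <= p.age_at_diagnosis < 80                    # (1)
        and p.diagnosis_date > p.first_emr_activity      # (2) strictly after
        and p.diagnosis_date >= date(2005, 1, 1)         # (3)
        and p.first_line == "metformin"                  # (4)
        and p.second_line in SECOND_LINE_CLASSES         # (5)
        and p.has_baseline_measure                       # (6)
        and p.second_line_months >= 3                    # (7)
        and p.follow_up_months >= 6                      # (8)
    )


def in_sub_cohort(p: Patient, min_follow_up_months: float) -> bool:
    """Sub-cohort 1 uses 12 months, sub-cohort 2 uses 24 months."""
    return in_main_cohort(p) and p.follow_up_months >= min_follow_up_months


example = Patient(
    age_at_diagnosis=58, first_emr_activity=date(2004, 3, 1),
    diagnosis_date=date(2007, 6, 15), first_line="metformin",
    second_line="DPP-4i", has_baseline_measure=True,
    second_line_months=14, follow_up_months=18,
)
print(in_main_cohort(example), in_sub_cohort(example, 12), in_sub_cohort(example, 24))
```

The example patient satisfies the main cohort and sub-cohort 1 restrictions but not the 24-month restriction of sub-cohort 2.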

HbA1c measures at baseline, 6, 12, 18, and 24 months were obtained as the nearest measure within 3 months either side of the time point. Baseline and longitudinal body weight, SBP, and lipids were calculated as the average of available measures within 3 months either side of the time point. Provided that at least two non-missing follow-up measures were available over 24 months, missing data were imputed using a Markov Chain Monte Carlo method adjusting for age, diabetes duration and use of concomitant ADDs 14.
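
The two windowing rules described above (nearest measure for HbA1c, average of available measures for body weight, SBP and lipids) can be sketched as follows. This is an illustrative sketch only: the representation of measurements as (months from baseline, value) pairs, the function names, and the treatment of the 3-month boundary as inclusive are assumptions, and the imputation step is not shown.

```python
from typing import List, Optional, Tuple

# A measurement is (months from baseline, value); this layout is hypothetical.
Measure = Tuple[float, float]


def nearest_within(measures: List[Measure], target: float,
                   window: float = 3.0) -> Optional[float]:
    """Value of the measure closest in time to `target`, within +/- `window` months."""
    candidates = [(abs(t - target), v) for t, v in measures if abs(t - target) <= window]
    if not candidates:
        return None  # no measure in the window; left missing
    return min(candidates, key=lambda c: c[0])[1]


def mean_within(measures: List[Measure], target: float,
                window: float = 3.0) -> Optional[float]:
    """Average of all measures within +/- `window` months of `target`."""
    values = [v for t, v in measures if abs(t - target) <= window]
    return sum(values) / len(values) if values else None


# Illustrative (made-up) HbA1c series for one patient.
hba1c = [(-1.0, 8.4), (5.0, 7.2), (7.5, 7.0), (13.0, 7.6)]
print(nearest_within(hba1c, 6))   # measure taken at month 5 is the closest
print(nearest_within(hba1c, 18))  # nothing within 15-21 months
```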

The presence of comorbidities prior to baseline was assessed using relevant disease identification codes. Cardiovascular disease (CVD) was defined as ischaemic heart disease, peripheral vascular disease, heart failure (HF), or stroke. Three-point major adverse cardiovascular event (MACE) was defined as the presence of HF, myocardial infarction (MI), or stroke.

Lipid modifying agents included all FDA-approved drugs classified under the Anatomical Therapeutic Chemical (ATC) code C10. Drugs against high blood pressure were defined by the ATC codes C02-C04 and C07-C09 (including diuretics and vasodilators).
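
The ATC-based grouping above amounts to a prefix match on the drug code. A minimal sketch, assuming each prescription carries a full ATC code string (the function name and return labels are hypothetical and not part of the study's code), is:

```python
from typing import Optional

# ATC groups named in the text: C10 for lipid modifying agents,
# C02-C04 and C07-C09 for drugs against high blood pressure.
LIPID_PREFIXES = ("C10",)
BP_PREFIXES = ("C02", "C03", "C04", "C07", "C08", "C09")


def cardio_protective_group(atc_code: str) -> Optional[str]:
    """Map an ATC code to one of the two cardio-protective drug groups, if any."""
    code = atc_code.strip().upper()
    if code.startswith(LIPID_PREFIXES):
        return "lipid_modifying"
    if code.startswith(BP_PREFIXES):
        return "blood_pressure_lowering"
    return None


print(cardio_protective_group("C10AA05"))  # atorvastatin
print(cardio_protective_group("C09AA02"))  # enalapril
print(cardio_protective_group("A10BA02"))  # metformin, outside both groups
```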

SBP ≥ 130/ 140 mmHg for those with/ without CVD history at baseline was defined as uncontrolled. Similarly, LDL ≥ 70/ 100 mg/dL for those with/ without CVD history at baseline was defined as uncontrolled. Triglycerides ≥ 150 mg/dL and HbA1c ≥ 7.5% were defined as uncontrolled.
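
The control thresholds defined above can be written as a small rule set. The sketch below is illustrative only; the function name and the dictionary layout of the result are assumptions.

```python
from typing import Dict, Optional


def uncontrolled_flags(cvd_history: bool,
                       hba1c: Optional[float] = None,   # %
                       sbp: Optional[float] = None,     # mmHg
                       ldl: Optional[float] = None,     # mg/dL
                       trig: Optional[float] = None     # mg/dL
                       ) -> Dict[str, bool]:
    """Return an 'uncontrolled' flag for each risk factor that was measured."""
    sbp_cut = 130 if cvd_history else 140   # stricter target with a history of CVD
    ldl_cut = 70 if cvd_history else 100
    flags: Dict[str, bool] = {}
    if hba1c is not None:
        flags["HbA1c"] = hba1c >= 7.5
    if sbp is not None:
        flags["SBP"] = sbp >= sbp_cut
    if ldl is not None:
        flags["LDL"] = ldl >= ldl_cut
    if trig is not None:
        flags["Triglycerides"] = trig >= 150
    return flags


# The same measurements are classified differently depending on CVD history.
print(uncontrolled_flags(True, hba1c=7.2, sbp=135, ldl=85, trig=140))
print(uncontrolled_flags(False, hba1c=7.2, sbp=135, ldl=85, trig=140))
```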

Statistical Methods

Baseline characteristics were summarised as number (%), mean (SD) or median (first quartile, third quartile). Longitudinal failure to control risk factors (individual and pairwise) and risk factor burden were calculated irrespective of baseline control status. Failure to control LDL and triglycerides at 6, 12, and 24 months was calculated in those who were using a lipid modifying drug prior to 6, 12, and 24 months of baseline, respectively. Similarly, failure to control SBP was calculated only in those who were using a blood pressure lowering drug prior to 6, 12, and 24 months of baseline. Sub-cohort 1 and sub-cohort 2 were used for the one- and two-year analyses, respectively. Pairwise failure to simultaneously control HbA1c plus (1) LDL, (2) triglycerides, and (3) SBP was summarised as proportion (95% CI) at 6, 12, and 24 months of baseline. The probability (95% CI) of failure to control both risk factors by second-line ADD group was calculated using the “Treatment Effects” modelling approach 15-17. Second-line treatment groups were balanced on baseline risk factor measurements. The probit model for likelihood of failure was adjusted for sex, duration of diabetes, baseline age and body weight.
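
To make the "proportion (95% CI)" summaries concrete, the sketch below computes a proportion with a normal-approximation (Wald) interval. The text does not state which interval method was used, so the Wald interval is an assumption, and the counts in the example are made up for illustration rather than taken from the study.

```python
import math
from typing import Tuple


def proportion_ci(failures: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = failures / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1] because the Wald interval can overshoot near the boundaries.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Illustrative counts only: 620 of 1,000 patients failing to control both risk factors.
p, low, high = proportion_ci(620, 1000)
print(f"{100 * p:.1f}% ({100 * low:.1f}, {100 * high:.1f})")
```

With cohort sizes in the hundreds of thousands, as here, intervals of this kind become very narrow, which is consistent with the tight intervals reported in Table 2.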

Two-year risk factor burden was defined as uncontrolled measures (at 6 months OR at 12 months) AND (at 18 months OR at 24 months) for patients in sub-cohort 2. Two-year burden for LDL and triglycerides was calculated among those who were using a lipid modifying drug prior to 12 months of baseline. Two-year burden for SBP was calculated among those who were using a blood pressure lowering drug prior to 12 months of baseline. Proportions of continuously uncontrolled patients (two-year burden) were summarised by the year of second-line ADD initiation and by the class of second-line ADD. Standard life-table methods were used to estimate rates per 1000 person-years (95% CI) of MACE by class of second-line ADD.
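
The two-year burden definition above is a simple Boolean rule over the four follow-up time points. A minimal sketch, with hypothetical function names and made-up example flags, is:

```python
from typing import Dict, Iterable


def two_year_burden(u6: bool, u12: bool, u18: bool, u24: bool) -> bool:
    """Uncontrolled at (6 OR 12 months) AND at (18 OR 24 months)."""
    return (u6 or u12) and (u18 or u24)


def burden_proportion(patients: Iterable[Dict[int, bool]]) -> float:
    """Share of patients meeting the two-year burden definition.

    Each patient is a dict mapping month (6, 12, 18, 24) to an 'uncontrolled' flag.
    """
    flags = [two_year_burden(p[6], p[12], p[18], p[24]) for p in patients]
    return sum(flags) / len(flags)


# Illustrative (made-up) flags for four patients in sub-cohort 2.
cohort = [
    {6: True, 12: False, 18: False, 24: True},    # burden: uncontrolled in both years
    {6: True, 12: True, 18: False, 24: False},    # no burden: controlled in year two
    {6: False, 12: False, 18: True, 24: True},    # no burden: controlled in year one
    {6: False, 12: True, 18: True, 24: False},    # burden
]
print(burden_proportion(cohort))
```

Under this definition a patient counts towards the burden only if uncontrolled at least once in each of the two years, so a single uncontrolled measurement is not sufficient.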

RESULTS

Of 2,624,954 identified patients with T2DM, 276,884 met the inclusion criteria (Supplementary Figure 1, Table 1). With a mean follow-up of 3.7 years, 89% of the cohort had at least one year of follow-up. The majority of patients were obese (n=187,936, 70%), and 60,317 (22%) had a history of CVD on or prior to baseline. Those with a history of CVD were older (mean: 64 years) and more likely to be male (61%) than those without a history of CVD (mean: 57 years; 46% male). With a mean (SD) HbA1c of 8.4 (1.9)% at the time of second ADD initiation, 54 / 61% of patients with / without a history of CVD had HbA1c ≥7.5%. With a mean (SD) LDL of 97 (35) mg/dL, 67% of those with a history of CVD had LDL ≥70 mg/dL, while 46% of those without a history of CVD had LDL ≥100 mg/dL. Baseline triglycerides were ≥ 150 mg/dL in 48% of patients. In sub-cohort 1, 90 / 74% of those with / without a history of CVD were using a lipid modifying drug prior to or within 1 year of baseline (data not shown). With a mean (SD) SBP of 131 (15) mmHg, 50% of those with a history of CVD had SBP ≥130 mmHg, while 25% of those without a history of CVD had SBP ≥140 mmHg. In sub-cohort 1, 97 / 84% of those with / without a history of CVD were using a blood pressure lowering drug prior to or within 1 year of baseline (data not shown).

Individual Risk Factor Failure

Irrespective of baseline control, 37, 39, and 42% of patients failed to achieve HbA1c below 7.5% at 6, 12, and 24 months post intensification with a second-line ADD (Table 2). The proportions of those who failed to control HbA1c were lower for those with a history of CVD at baseline (32-38%) than for those without (38-42%, data not shown). Among patients who were using a lipid modifying drug, 43% had uncontrolled LDL over two years post baseline (Table 2), whereas 64 / 36% of those with / without a history of CVD failed to achieve LDL < 70 / 100 mg/dL (data not shown). Among patients who were using a lipid modifying drug, 46% had uncontrolled triglycerides over two years post baseline (Table 2); the proportions were similar among those with / without a history of CVD at baseline. Among patients who were using a blood pressure lowering drug, 30% failed to control SBP during two years post intensification with a second-line ADD, whereas 49 / 24% of those with / without a history of CVD failed to achieve SBP < 130 / 140 mmHg over 2 years.

Among patients with baseline HbA1c ≥7.5 and ≤ 9%, 43 / 46 / 48% failed to achieve HbA1c < 7.5% at 6 / 12 / 24 months, irrespective of additional therapy intensification (Supplementary Figure 2). Among those who were using a lipid modifying drug and had uncontrolled LDL at baseline, the proportions of those who were uncontrolled at 6 / 12 / 24 months were 71 / 65 / 60%. Similarly, among those with uncontrolled triglycerides at baseline, more than 60% continued to have uncontrolled triglycerides. Among those who were using a blood pressure lowering drug and had uncontrolled SBP at baseline, the proportions of those who were uncontrolled at 6 / 12 / 24 months were 60 / 55 / 51% (Supplementary Figure 2).

Pairwise Risk Factor Control

Among patients who were using a lipid modifying drug, in addition to being on an intensified ADD regimen by design, around 62% failed to simultaneously control HbA1c+LDL over two years post second-line ADD initiation (Table 2), whereas around 75 / 58% of those with / without a history of CVD failed to control both risk factors simultaneously (data not shown). The adjusted probability (95% CI) of failing to simultaneously control both risk factors at 24 months was the lowest in those who initiated second-line therapy with GLP-1RA [0.55 (0.53, 0.58)] and TZD [0.55 (0.54, 0.57)], followed by DPP-4i [0.59 (0.58, 0.60)], and significantly higher for SU [0.65 (0.64, 0.65)] and INS [0.69 (0.68, 0.70)] users (Table 3).

Among patients who were using a lipid modifying drug, around 62% failed to simultaneously control HbA1c+Triglycerides over two years post second-line ADD initiation (Table 2). The adjusted probability (95% CI) of failing to simultaneously control both risk factors at 24 months was the lowest in those who initiated second-line with TZD [0.52 (0.50, 0.53)], followed by DPP-4i [0.56 (0.55, 0.57)] and GLP-1RA [0.56 (0.52, 0.60)], and significantly higher for SU [0.63 (0.62, 0.64)] and insulin [0.65 (0.63, 0.66)] users (Table 3).

Among patients who were using a blood pressure lowering drug, 53 / 55 / 57% failed to simultaneously control HbA1c+SBP at 6 / 12 / 24 months post second-line ADD initiation (Table 2). Among those with / without a history of CVD, 64-67% / 50-54% of patients failed to control both risk factors simultaneously (data not shown). The adjusted probability (95% CI) of failing to simultaneously control both risk factors at 24 months was lower in those who initiated second-line therapy with TZD [0.48 (0.47, 0.50)], GLP-1RA [0.49 (0.47, 0.51)], or DPP-4i [0.51 (0.50, 0.52)], compared to those who initiated with SU [0.60 (0.59, 0.60)] or insulin [0.64 (0.63, 0.65)] (Table 3).

Continued Risk Factor Burden

Among those with follow-up for at least 24 months, 35% had continuously uncontrolled HbA1c (≥ 7.5%). The two-year burden increased from 31% for those who intensified first-line therapy in 2005 to 41% for those who intensified therapy in 2014 (Figure 1A). The two-year burden increased from 28 to 36% and from 32 to 42% for those with / without a history of CVD at baseline, respectively (Figure 1B and C). The proportions of those with continuously uncontrolled HbA1c were lower among those who initiated second-line therapy with GLP-1RA (95% CI: 24-26%) and TZD (95% CI: 23-24%), followed by DPP-4i (95% CI: 28-30%), and significantly higher for SU (95% CI: 39-40%) and INS (95% CI: 50-51%) (Figure 1D).

Among those who initiated a lipid modifying drug prior to 12 months of baseline and had at least two years of follow-up, 41% had continuously uncontrolled LDL (Figure 1A). Among those with / without a history of CVD at baseline, 65 / 33% had continuously uncontrolled LDL over 2 years of baseline (Figure 1B and C). Within each baseline CVD status group, the proportions of those with two-year LDL burden were similar across second-line ADD classes (Figure 1E-F).

Among those who initiated a lipid modifying drug prior to 12 months of baseline and had at least two years of follow-up, 37% had continuously uncontrolled triglycerides (≥ 150 mg/dL) (Figure 1A-C). Among those without a history of CVD at baseline, the proportion of those with continuously uncontrolled triglycerides was lower in the TZD (95% CI: 31-33%) and INS (95% CI: 33-36%) groups, compared to other second-line ADDs (95% CIs: 37-41%) (Figure 1E). Among those with a history of CVD at baseline, the lowest proportion of those with continuously uncontrolled triglycerides was in the TZD group (95% CI: 28-32%), compared to other second-line ADDs (95% CIs: 34-43%) (Figure 1F).

Among those who initiated a blood pressure lowering drug prior to 12 months of baseline and had at least two years of follow-up, 27-33% had continuously uncontrolled SBP (Figure 1A). Among those with / without a history of CVD at baseline, 51 / 21% had continuously uncontrolled SBP over 2 years of baseline (Figure 1B and C). Those who initiated second-line ADD with GLP-1RA had the lowest two-year SBP burden (95% CI: 18-20%), followed by DPP-4i (95% CI: 23-24%), TZD (95% CI: 25-26%), and SU and INS (95% CIs: 30-32%) (Figure 1D-F).

Cardiovascular Risk

Individual and composite rates per 1000 person-years of HF, MI, and stroke, along with the number of failures (events), are presented in Table 4. In the primary prevention group (no history of CVD at baseline), 18,438 3-point MACE events occurred during available follow-up. The lowest rate per 1000 person-years was observed in those who initiated second-line therapy with GLP-1RA (95% CI: 13-15), followed by the DPP-4i and TZD groups (95% CIs: 19-21), SU (95% CI: 26-27), and INS (95% CI: 31-34). This pattern was preserved in individual analyses for HF, MI, and stroke as well.

In the secondary prevention group (history of CVD at baseline), 15,323 3-point MACE events occurred during available follow-up. The lowest rate per 1000 person-years was observed in those who initiated second-line therapy with GLP-1RA (95% CI: 53-66), followed by the DPP-4i and TZD groups (95% CIs: 66-80), SU (95% CI: 86-90), and INS (95% CI: 100-108). This pattern was preserved in individual analyses for HF, MI, and stroke as well (Table 4).
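
For readers less familiar with event rates expressed per 1000 person-years, the sketch below shows how a crude rate and an approximate 95% CI can be computed from an event count and accumulated person-time. It is a simplification under stated assumptions: the study used standard life-table methods, whereas this sketch uses a crude rate with a log-scale Poisson approximation for the interval, and the counts are made up for illustration rather than taken from Table 4.

```python
import math
from typing import Tuple


def rate_per_1000_py(events: int, person_years: float,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Crude event rate per 1000 person-years with an approximate 95% CI.

    The interval assumes a Poisson event count and is built on the log scale:
    rate * exp(+/- z / sqrt(events)).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = 1000.0 * events / person_years
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Illustrative numbers only: 150 MACE events over 10,000 person-years.
rate, low, high = rate_per_1000_py(150, 10_000)
print(f"{rate:.1f} per 1000 person-years (95% CI: {low:.1f}-{high:.1f})")
```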

CONCLUSIONS

In this longitudinal exploratory study of a large cohort of patients with type 2 diabetes from primary and ambulatory care systems in the USA, we observed that (1) irrespective of baseline control, more than 40% of patients do not meet the 7.5% HbA1c target two years after metformin intensification; (2) long-term glycaemic burden has increased over the last decade; (3) around a third of patients have consistently uncontrolled lipids and SBP even though they are using cardio-protective drugs; and (4) treatment with GLP-1RA was associated with lower rates of major adverse cardiovascular events.

The results of this study clearly demonstrate persistent glycaemic and CV risk factor burden among patients who are using multiple medications for glucose, lipid, and blood pressure control. Three out of five patients who are already receiving intensified treatment are failing to simultaneously control glucose level and at least one CV risk factor. Furthermore, the proportions of patients who fail to control CV risk factors are not decreasing over time, and glycaemic burden has increased during the last decade.

Population ageing, therapy non-adherence and inadequate treatment intensification when needed (therapeutic inertia) may explain the observed patterns. While statin prescribing is increasing, Salami and colleagues, using US Medical Expenditure Panel Survey data, reported that use of high-intensity statins was only 18-20% in patients with diabetes and no atherosclerotic cardiovascular disease 18. Similarly, Abdallah and colleagues reported that among 1,300 patients with diabetes, 88% were prescribed statins at the time of hospital discharge for acute myocardial infarction, whereas only 22% were prescribed intensive statin therapy 19. Further studies investigating intensification patterns for lipid and blood pressure control, and the long-term consequences of not intensifying therapy when needed, are required in patients with diabetes.

We observed lower probabilities of simultaneous failure to control HbA1c and LDL, triglycerides, or SBP in those who initiated second-line therapy with GLP-1RA, DPP-4i, or TZD, compared to those who were treated with SU or INS. A recent study by Montvida and colleagues showed that incretin-based therapies and TZD provide a higher chance of sustainable glycaemic control compared to SU and INS 20. While two-year glycaemic burden differed significantly by ADD class, those who were treated with TZD had lower triglycerides burden and those on GLP-1RA had lower SBP burden. The composite and individual rates of HF, MI, and stroke were lowest for those who initiated second-line therapy with GLP-1RA, compared to other ADDs, in cohorts with and without a history of CVD.

In general, the CEMR database is representative of the US population in terms of age and ethnic subgroups; however, a higher proportion of patients from north-eastern and mid-western states is represented in the CEMR 21. The distribution of CV risk factors was found to be similar to that in prospective national health surveys 22. The large cohort size, with an average of 3.7 years of follow-up post metformin intensification, supports the reliability of the estimates reported in the present study. Drug use data available from patients’ medication lists along with prescribing information, and the robust data mining methodologies applied, bring additional value to this study 13,23. Nonetheless, the findings should be interpreted with caution: EMR data are in general biased towards unhealthy populations and commercially insured individuals, White Caucasians are over-represented in the CEMR, and the results are subject to limited follow-up.

To conclude, we have observed alarmingly poor population-level glycaemic and CV risk factor control, and the burden has not reduced during the last decade. While treatment guidelines and clinician and population education are constantly improving, the CV burden and associated costs of diabetes management are unlikely to decrease in the near future.

ACKNOWLEDGEMENTS

OM and SKP were responsible for the primary design of the study. JHS contributed significantly in the study design. OM conducted the data extraction. OM and SKP jointly conducted the statistical analyses. The first draft of the manuscript was developed by OM and SKP, and all authors contributed to the finalization of the manuscript. SKP had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Melbourne EpiCentre gratefully acknowledges the support from the National Health and Medical Research Council and the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia. OM acknowledges the PhD scholarship from Queensland University of Technology, Australia, and her co-supervisors Prof. Ross Young and Prof. Louise Hafner of the same University. JHS is supported by a National Health and Medical Research Council Research Fellowship. No separate funding was obtained for this study.

Declaration of interests

SKP has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. OM has no conflict of interest to declare. JHS has received speaker honoraria, consultancy fees and/or travel sponsorship from AstraZeneca, Boehringer Ingelheim, Lilly, Sanofi, Mylan, Novo Nordisk, Merck Sharp and Dohme and Novartis.

138 REFERENCES 1. Fox CS, Coady S, Sorlie PD, et al. Increasing cardiovascular disease burden due to diabetes mellitus: the Framingham Heart Study. Circulation 2007; 115(12): 1544-50. 2. Turnbull FM, Abraira C, Anderson RJ, et al. Intensive glucose control and macrovascular outcomes in type 2 diabetes. Diabetologia 2009; 52(11): 2288-98. 3. Arnold SV, Kosiborod M, Wang J, Fenici P, Gannedahl G, LoCasale RJ. Burden of Cardio-Renal-Metabolic Conditions in Adults with Type 2 Diabetes within the Diabetes Collaborative Registry. Diabetes, Obesity and Metabolism 2018. 4. American Diabetes Association. Standards of Medical Care in Diabetes—2018. Diabetes Care 2018; 41(Supplement 1): S4. 5. Khunti K, Kosiborod M, Ray KK. Legacy benefits of blood glucose, blood pressure and lipid control in people with diabetes and cardiovascular disease: Time to overcome multifactorial therapeutic inertia? Diabetes, Obesity and Metabolism 2018. 6. Ali MK, Bullard KM, Saaddine JB, Cowie CC, Imperatore G, Gregg EW. Achievement of goals in US diabetes care, 1999–2010. New England Journal of Medicine 2013; 368(17): 1613-24. 7. Lipska KJ, Yao X, Herrin J, et al. Trends in drug utilization, glycemic control, and rates of severe hypoglycemia, 2006–2013. Diabetes care 2017; 40(4): 468-75. 8. Carls G, Huynh J, Tuttle E, Yee J, Edelman SV. Achievement of glycated hemoglobin goals in the US remains unchanged through 2014. Diabetes Therapy 2017; 8(4): 863-73. 9. Centers for Disease Control and Prevention. National diabetes statistics report: estimates of diabetes and its burden in the United States, 2018. Atlanta, GA: US Department of Health and Human Services 2018. 10. Crawford AG, Cote C, Couto J, et al. Comparison of GE Centricity Electronic Medical Record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Population health management 2010; 13(3): 139-50. 11. Brixner D, Said Q, Kirkness C, Oberg B, Ben-Joseph R, Oderda G. 
Assessment of cardiometabolic risk factors in a national primary care electronic health record database. Value in health 2007; 10(s1): S29-S36. 12. Paul SK, Shaw J, Montvida O, Klein K. Weight gain in insulin treated patients by BMI categories at treatment initiation: New evidence from real-world data in patients with type 2 diabetes. Diabetes, Obesity and Metabolism 2016. 13. Montvida O, Arandjelović O, Reiner E, Paul SK. Data Mining Approach to Estimate the Duration of Drug Therapy from Longitudinal Electronic Medical Records. Open Bioinformatics Journal 2017; 10: 1-15. 14. Thomas G, Klein K, Paul S. Statistical challenges in analysing large longitudinal patient-level data: the danger of misleading clinical inferences with imputed data. J Indian Soc Agric Stat 2014; 68: 39-54. 15. Rotnitzky A, Robins JM. Inverse probability weighting in survival analysis. Encyclopedia of Biostatistics 2005. 16. Austin PC, Stuart EA. The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes. Statistical methods in medical research 2015: 0962280215584401. 17. Austin PC. Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis. Statistics in Medicine 2016; 35(30): 5642-55.

Page 17 of 25

139 18. Salami JA, Warraich H, Valero-Elizondo J, et al. National trends in statin use and expenditures in the US adult population from 2002 to 2013: insights from the Medical Expenditure Panel Survey. Jama cardiology 2017; 2(1): 56-65. 19. Abdallah MS, Kosiborod M, Tang F, et al. Patterns and predictors of intensive statin therapy among patients with diabetes mellitus after acute myocardial infarction. American Journal of Cardiology 2014; 113(8): 1267-72. 20. Montvida O, Shaw J, Blonde L, Paul SK. Long-term sustainability of glycaemic achievements with second-line anti-diabetic therapies in patients with type 2 diabetes: A real- world study. Diabetes, Obesity and Metabolism 2018. 21. Brixner D, McAdam-Marx C, Ye X, et al. Six-month outcomes on A1C and cardiovascular risk factors in patients with type 2 diabetes treated with exenatide in an ambulatory care setting. Diabetes, Obesity and Metabolism 2009; 11(12): 1122-30. 22. Brixner D, Said Q, Kirkness C, Oberg B, Ben-Joseph R, Oderda G. Assessment of cardiometabolic risk factors in a national primary care electronic health record database. Value in health 2007; 10: S29-S36. 23. Owusu Adjah ES, Montvida O, Agbeve J, Paul SK. Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus. The Open Bioinformatics Journal 2017; 10(1).


Table 1: Cohort characteristics at the time of second-line anti-diabetic drug initiation.

Characteristic                            All               No history of CVD   History of CVD
N                                         276,884           216,567             60,317
Age, years*                               59 (12)           57 (12)             64 (9)
Male†                                     136,918 (49)      99,907 (46)         37,011 (61)
White†                                    194,758 (70)      149,180 (69)        45,578 (76)
Black†                                    32,671 (12)       27,274 (13)         5,397 (9)
Time from metformin initiation, months*   7.5 (15.7)        7.1 (15.2)          9.0 (17.4)
Follow-up, years*                         3.7 (2.4)         3.7 (2.5)           3.6 (2.4)
Follow-up ≥ 12 months†                    247,223 (89)      193,092 (89)        54,131 (90)
Follow-up ≥ 24 months†                    191,883 (69)      149,833 (69)        42,050 (70)
Therapy duration, months*                 33 (25)           33 (25)             33 (24)
HbA1c, %*                                 8.4 (1.9)         8.5 (1.9)           8.1 (1.7)
HbA1c ≥ 7.5%†                             102,624 (60)      84,835 (61)         17,789 (54)
Weight, kg*                               99 (25)           100 (25)            97 (23)
BMI, kg/m2*                               35 (8)            35 (8)              33 (7)
BMI < 25 kg/m2†                           18,819 (7)        13,735 (7)          5,084 (9)
BMI ≥ 25 and < 30 kg/m2†                  60,575 (23)       44,963 (22)         15,612 (27)
BMI ≥ 30 kg/m2†                           187,936 (70)      150,067 (72)        37,869 (65)
SBP, mmHg*                                131 (15)          131 (15)            130 (16)
Uncontrolled SBP†§                        82,837 (30)       53,168 (25)         29,669 (50)
DBP, mmHg*                                77 (9)            78 (9)              75 (9)
LDL, mg/dL*                               97 (35)           100 (35)            87 (34)
Uncontrolled LDL†||                       71,424 (50)       51,077 (46)         20,347 (67)
HDL, mg/dL*                               43 (12)           44 (12)             42 (12)
Triglycerides, mg/dL‡                     147 (107, 197)    148 (107, 198)      146 (107, 195)
Triglycerides ≥ 150 mg/dL†                54,640 (48)       43,240 (49)         11,400 (48)
Chronic kidney disease†                   9,602 (3)         5,793 (3)           3,809 (6)
Cancer†                                   13,750 (5)        9,951 (5)           3,799 (6)
Depression†                               38,444 (14)       29,996 (14)         8,448 (14)
Charlson Comorbidity Index*               1.6 (1.1)         1.4 (0.9)           2.4 (1.4)
Any lipid modifying drug†                 188,272 (68)      137,391 (63)        50,881 (84)
Statin†                                   168,485 (61)      121,287 (56)        47,198 (78)
Blood pressure lowering drug†             224,086 (81)      167,177 (77)        56,909 (94)

*mean (sd); †n (%); ‡median (IQR);

§Uncontrolled SBP: ≥ 130/140 mmHg for those with/without a history of cardiovascular disease; ||Uncontrolled LDL: ≥ 70/100 mg/dL for those with/without a history of cardiovascular disease.


Table 2: Proportions (95% CI) of those who failed to control individual risk factors* and proportions (95% CI) of those who failed to control two risk factors simultaneously at 6, 12, and 24 months post second-line anti-diabetic drug initiation.

Risk factor               6 months        12 months       24 months
Individual failure
HbA1c                     37 (36, 37)     39 (39, 39)     42 (41, 42)
LDL                       43 (43, 43)     43 (43, 43)     42 (41, 42)
Triglycerides             46 (45, 46)     46 (45, 46)     45 (44, 45)
SBP                       31 (30, 31)     31 (30, 31)     30 (30, 30)
Simultaneous failure
HbA1c + LDL               61 (60, 61)     62 (62, 62)     63 (62, 63)
HbA1c + Triglycerides     61 (61, 61)     62 (62, 62)     63 (62, 63)
HbA1c + SBP               53 (53, 54)     55 (55, 55)     57 (57, 57)

*Control: HbA1c < 7.5%; triglycerides < 150 mg/dL; SBP < 130/140 mmHg for those with/without a history of cardiovascular disease; LDL < 70/100 mg/dL for those with/without a history of cardiovascular disease. LDL, triglycerides and SBP proportions are calculated among users of lipid modifying and blood pressure lowering drugs.
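The control targets in the footnote above can be written as a small classifier. This is a minimal sketch, not code from the study: the function names are hypothetical, and reading "simultaneous failure" as "at least one of the two risk factors is uncontrolled" is an assumption, made because the joint proportions in Table 2 exceed the individual ones.

```python
def is_controlled(measure: str, value: float, has_cvd: bool) -> bool:
    """Return True when a measurement meets the control target from the footnote."""
    if measure == "hba1c":            # percent
        return value < 7.5
    if measure == "triglycerides":    # mg/dL
        return value < 150
    if measure == "sbp":              # mmHg; stricter target with a history of CVD
        return value < (130 if has_cvd else 140)
    if measure == "ldl":              # mg/dL; stricter target with a history of CVD
        return value < (70 if has_cvd else 100)
    raise ValueError(f"unknown measure: {measure}")


def failed_joint_control(hba1c: float, other: str, other_value: float,
                         has_cvd: bool) -> bool:
    """Assumed definition: failure means HbA1c and the other factor are not both controlled."""
    both_controlled = (is_controlled("hba1c", hba1c, has_cvd)
                       and is_controlled(other, other_value, has_cvd))
    return not both_controlled
```

For example, an LDL of 85 mg/dL counts as controlled for a patient without a history of CVD but as uncontrolled for a patient with one, so the same pair of measurements can fall on either side of the joint definition.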


Table 3: By groups of second-line anti-diabetic drug, adjusted probability (95% CI) of failure to simultaneously control* both risk factors.

                6 months              12 months             24 months
HbA1c and LDL
MET+SU          0.62 (0.62, 0.63)     0.64 (0.64, 0.65)     0.65 (0.64, 0.65)
MET+TZD         0.57 (0.56, 0.58)     0.55 (0.54, 0.56)     0.55 (0.54, 0.57)
MET+INS         0.66 (0.65, 0.67)     0.67 (0.66, 0.68)     0.69 (0.68, 0.70)
MET+GLP-1RA     0.56 (0.54, 0.58)     0.54 (0.52, 0.56)     0.55 (0.53, 0.58)
MET+DPP4        0.58 (0.57, 0.59)     0.58 (0.58, 0.59)     0.59 (0.58, 0.60)
HbA1c and Triglycerides
MET+SU          0.60 (0.59, 0.61)     0.62 (0.61, 0.62)     0.63 (0.62, 0.64)
MET+TZD         0.53 (0.51, 0.54)     0.52 (0.50, 0.53)     0.52 (0.50, 0.53)
MET+INS         0.62 (0.61, 0.64)     0.65 (0.63, 0.66)     0.65 (0.63, 0.66)
MET+GLP-1RA     0.54 (0.51, 0.57)     0.54 (0.50, 0.57)     0.56 (0.52, 0.60)
MET+DPP4        0.54 (0.53, 0.55)     0.56 (0.54, 0.57)     0.56 (0.55, 0.57)
HbA1c and SBP
MET+SU          0.55 (0.55, 0.56)     0.57 (0.57, 0.58)     0.60 (0.59, 0.60)
MET+TZD         0.48 (0.47, 0.49)     0.47 (0.46, 0.48)     0.48 (0.47, 0.50)
MET+INS         0.59 (0.58, 0.60)     0.61 (0.60, 0.62)     0.64 (0.63, 0.65)
MET+GLP-1RA     0.47 (0.45, 0.49)     0.48 (0.46, 0.49)     0.49 (0.47, 0.51)
MET+DPP4        0.49 (0.48, 0.49)     0.49 (0.48, 0.50)     0.51 (0.50, 0.52)

*Control: HbA1c < 7.5%; triglycerides < 150 mg/dL; SBP < 130/140 mmHg for those with/without a history of cardiovascular disease; LDL < 70/100 mg/dL for those with/without a history of cardiovascular disease. LDL, triglycerides and SBP proportions are calculated among users of lipid modifying and blood pressure lowering drugs. Second-line treatment groups were balanced on baseline risk factor measurements; analyses were adjusted for sex, duration of diabetes, baseline age and body weight.


Figure 1: Proportion of continuously uncontrolled patients over two years post second-line anti-diabetic drug initiation, by the year of drug initiation (A-C) and by drug class (D-F).


*Continuously uncontrolled: measure uncontrolled (at 6 months OR at 12 months) AND (at 18 months OR at 24 months). LDL, triglycerides and SBP proportions are calculated among those who were using lipid modifying and blood pressure lowering drugs prior to or within 12 months of second-line anti-diabetic drug initiation.
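The definition in the footnote above is a simple Boolean rule over four follow-up time points. A minimal sketch follows; the function name and the dictionary layout are hypothetical, and treating a missing measurement as "not uncontrolled" is an assumption made only for this illustration, not a rule stated in the study.

```python
def continuously_uncontrolled(uncontrolled_at: dict) -> bool:
    """Apply (6 OR 12 months) AND (18 OR 24 months) to per-visit flags.

    `uncontrolled_at` maps a follow-up month (6, 12, 18, 24) to True when the
    measurement at that time point was uncontrolled. Missing months default to
    False (an assumption for this sketch).
    """
    first_year = uncontrolled_at.get(6, False) or uncontrolled_at.get(12, False)
    second_year = uncontrolled_at.get(18, False) or uncontrolled_at.get(24, False)
    return first_year and second_year
```

Under this rule a patient who is uncontrolled at 6 and 24 months is counted, while a patient who is uncontrolled only during the first year is not.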


Table 4: By groups of second-line anti-diabetic drug and by CVD status at the time of second-line anti-diabetic drug initiation, number of failures and rates (95% CI) per 1000 person-years of heart failure, myocardial infarction, stroke and their composite.

                Without baseline CVD              With baseline CVD
Group           Failures  Rate                    Failures  Rate
Heart Failure, Myocardial Infarction, or Stroke
MET+SU          10,781    26.1 (25.7, 26.6)       9,157     87.9 (86.1, 89.7)
MET+TZD         2,165     20.2 (19.4, 21.1)       1,416     69.1 (65.6, 72.8)
MET+INS         2,809     32.4 (31.2, 33.6)       2,491     103.5 (99.5, 107.6)
MET+GLP-1RA     480       14.1 (12.9, 15.4)       307       59.4 (53.1, 66.4)
MET+DPP4        2,203     19.3 (18.5, 20.2)       1,952     76.3 (73.0, 79.7)
Heart Failure
MET+SU          4,835     11.3 (11.0, 11.6)       4,666     39.9 (38.8, 41.0)
MET+TZD         916       8.2 (7.7, 8.8)          610       26.0 (24.0, 28.1)
MET+INS         1,357     15.0 (14.2, 15.8)       1,336     49.4 (46.8, 52.1)
MET+GLP-1RA     179       5.1 (4.4, 6.0)          134       24.0 (20.3, 28.4)
MET+DPP4        791       6.8 (6.3, 7.3)          898       32.2 (30.1, 34.4)
Myocardial Infarction
MET+SU          1,429     3.3 (3.1, 3.4)          1,511     12.2 (11.6, 12.8)
MET+TZD         284       2.5 (2.2, 2.8)          238       9.7 (8.6, 11.0)
MET+INS         366       4.0 (3.6, 4.4)          386       13.4 (12.1, 14.8)
MET+GLP-1RA     63        1.8 (1.4, 2.3)          49        8.4 (6.4, 11.2)
MET+DPP4        229       1.9 (1.7, 2.2)          307       10.6 (9.5, 11.9)
Stroke
MET+SU          5,925     13.9 (13.6, 14.3)       4,548     39.4 (38.3, 40.6)
MET+TZD         1,232     11.2 (10.6, 11.8)       815       36.3 (33.8, 38.8)
MET+INS         1,415     15.7 (14.9, 16.6)       1,210     44.8 (42.4, 47.4)
MET+GLP-1RA     276       8.0 (7.1, 9.0)          167       30.2 (26.0, 35.2)
MET+DPP4        1,318     11.4 (10.8, 12.0)       1,011     36.8 (34.6, 39.1)
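For readers unfamiliar with the unit used in Table 4, the sketch below shows how an event rate per 1000 person-years and an approximate 95% CI can be computed from an event count and total follow-up time. It is illustrative only: the table does not report person-years, the function name is hypothetical, and the log-scale normal approximation used here is a common textbook choice that may differ from the method used in the study.

```python
import math


def rate_per_1000_py(events: int, person_years: float, z: float = 1.96):
    """Return (rate, lower, upper) per 1000 person-years.

    The interval uses the approximation SE(log rate) = 1 / sqrt(events),
    which is an assumption of this sketch rather than the study's method.
    """
    rate = events / person_years * 1000
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Hypothetical inputs: 100 events observed over 10,000 person-years.
rate, lower, upper = rate_per_1000_py(100, 10_000)
```

Because the interval is built on the log scale, it is asymmetric around the point estimate and narrows as the number of events grows, which is consistent with the wider intervals seen for the smaller MET+GLP-1RA group.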


Supplementary Figure 1: Flowchart of study cohorts

Patients with non‐missing sex and age (n=34,299,123)

Diabetes Mellitus (n=2,893,321)

Type 2 Diabetes (n=2,624,954)

Age at diagnosis ≥ 18 and <80 (n=2,590,853)

Diabetes Diagnosis after entry to the EMR (n=1,412,938)

Diabetes Diagnosis on or after Jan 1, 2005 (n=1,305,686)

Metformin as first‐line (n=740,478)

Initiated second-line (n=347,735): SU (n=187,819); DPP-4i (n=61,508); GLP-1RA (n=15,448); INS (n=49,939); TZD (n=33,021)

Baseline HbA1c, SBP, Triglycerides, or LDL (n=322,630)

Therapy duration ≥ 3 months and follow-up ≥ 6 months (n=276,884)

STUDY COHORT: history of CVD (n=60,317); no history of CVD (n=216,567)

Follow-up at least 12 months (n=247,223)

STUDY SUB-COHORT 1: history of CVD (n=54,131); no history of CVD (n=193,092)

Follow-up at least 24 months (n=191,883)

STUDY SUB-COHORT 2: history of CVD (n=42,050); no history of CVD (n=149,833)


Supplementary Figure 2: Among uncontrolled* patients at baseline, proportion (95% CI) of those who failed to control† individual risk factors at 6, 12, and 24 months post second-line anti-diabetic drug initiation.

*Uncontrolled: HbA1c ≥ 7.5% and ≤ 9%; triglycerides ≥ 150 mg/dL; SBP ≥ 130/140 mmHg for those with/without a history of cardiovascular disease; LDL ≥ 70/100 mg/dL for those with/without a history of cardiovascular disease. †Control: HbA1c < 7.5%; triglycerides < 150 mg/dL; SBP < 130/140 mmHg for those with/without a history of cardiovascular disease; LDL < 70/100 mg/dL for those with/without a history of cardiovascular disease. LDL, triglycerides and SBP proportions are calculated among users of lipid modifying and blood pressure lowering drugs.


Chapter 10: Discussion and Conclusions

The novelty of this research project lies in a holistic evaluation of large, representative EMRs to conduct much-needed pharmaco-epidemiological comparative effectiveness and outcome studies in patients with T2DM treated with older and newer classes of anti-diabetic therapies, while also addressing the methodological issues that affect our ability to draw robust inferences from EMR-based epidemiological studies. This thesis provides a detailed exploration of the real-world cardio-metabolic effects of treatment with incretin-based therapies in patients with T2DM. Six pharmaco-epidemiological and three methodological studies were conducted over three years, and multiple important findings were reported in high-impact research journals.

Using large EMRs for clinical studies is a comparatively recent and rapidly developing research direction that requires specialists in health informatics to have both outstanding data management skills and a deep understanding of the clinical question being studied. The three methodological studies conducted as part of this thesis have direct implications for the quality of the data used in the dissertation’s clinical analyses, and can potentially improve data quality in future EMR-based clinical and pharmaco-epidemiological research, leading to more reliable inferences from individual studies.

The first methodological study (Chapter 4) focused on the analytical challenges associated with the dynamics of prescriptions with different drugs, and developed two algorithms to estimate the duration of treatment with specific drugs of interest. These approaches were compared and tested on their ability to capture interchanges between therapies during the course of treatment. The proposed algorithm was shown to be a reliable and effective tool to extract and aggregate information on medication data at an individual patient level, which makes it a valuable tool for use in future research.

The second methodological study (Chapter 5 and Appendix B) discussed the data mining challenges associated with extracting disease cohorts, some of which are not straightforward and may go unnoticed or surface only at the late stages of statistical analyses. Diagnostic codes, clinically guided algorithms and machine learning approaches were employed together to select a robust cohort of patients with diabetes for observational studies with reduced selection bias.

Missing data is a pervasive problem in all prospective and retrospective observational studies (including those based on EMRs), posing challenges for the efficient design and analysis of clinical effectiveness studies. Compared to missing data in RCTs, the patterns and mechanisms behind missing risk factor or outcome data in EMRs are very complex and difficult to ascertain. Robust imputation of missing data relies on understanding the predictors of missingness in the risk factor data, especially in patients with chronic diseases. The third methodological study (Chapter 6) compared three approaches based on the Multiple Imputation technique in terms of their robustness in imputing data in patients with diabetes. A novel component of this study is the investigation of how the likelihood of missing follow-up risk factor measures (HbA1c) relates to patients’ demographic and clinical characteristics (age, sex, pre-existing comorbidities and disease severity). While all three imputation techniques were able to provide consistent and reliable clinical inferences under unknown patterns of missingness, this study demonstrated that complete case analyses were prone to bias by indication, which highlights the importance of imputing missing risk factor data.
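As background to the Multiple Imputation technique mentioned above, the sketch below shows the standard way (Rubin's rules) in which estimates from several imputed datasets are combined into one estimate and standard error. It is a generic illustration under stated assumptions: the function name is hypothetical, the inputs are made up, and it does not reproduce the three specific approaches compared in Chapter 6.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)            # average of the point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical example: three imputed datasets.
estimate, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what a complete case or single-imputation analysis omits: it inflates the standard error to reflect uncertainty about the values that were never observed.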

This dissertation provides detailed explorations of population-level disease management patterns. Firstly, it provides detailed assessments of changes in the choices of first-, second-, and third-line ADDs over the last 10 years (Chapter 7). It also explores the glycaemic state, clinical characteristics and comorbidities at the time of first-line and second-line therapy initiation by ADD class. This dissertation clearly demonstrates that the therapeutic inertia problem exists at the population level, with 50-60% of patients having HbA1c above 7.5% at first- and second-line therapy initiation. The long-term consequences for glycaemic and CV risk of not intensifying treatment when needed were shown by Paul and Khunti [90, 178, 179]. In this context, the study that explored the combination of GLP-1RA and insulin (Appendix A) is of high importance. It demonstrated that patients who are not adequately controlled on GLP-1RA would benefit from an early addition of insulin (compared to switching) in terms of long-term cardio-metabolic outcomes. In fact, those who intensified the therapy with insulin later ended up at the same high HbA1c level 2 years after therapy initiation with GLP-1RA. The study reporting that obese patients who initiate insulin do not gain body weight (Appendix C) brings additional reassurance to these patients, as most patients who initiate GLP-1RA are obese, with a mean body weight of 109 kg (Chapters 7-9).

Due to complex study designs, results of RCTs are difficult to compare, and individual patient choices on therapy intensification become very complex. While only one large RCT has been designed to compare the glycaemic outcomes of treatment with major ADDs (GRADE, completion expected in 2020), the study presented in Chapter 8 provides much-needed estimates of the adjusted probabilities of achieving clinically desirable glycaemic control with major second-line ADDs, and of sustaining such glycaemic achievements over 2 years, with and without the need for third-line therapy intensification. A highly valuable contribution of this study is the assessment of glycaemic control sustainability in patients treated with major second-line ADDs. Notably, it would not be ethical to assess the sustainability of achieved control using an RCT. Similarly, the long-term glycaemic and CV risk factor burden in patients with T2DM who are already using intensified anti-diabetic and cardio-protective therapies had not been reported to date (Chapter 9). With more than a third of patients having consistently uncontrolled HbA1c, lipid and SBP levels, 3 out of 5 have uncontrolled HbA1c and at least one uncontrolled CV risk factor over 2 years post intensification. The results reported in Chapter 9 provide an explanation for the non-declining rates of CV events among patients with T2DM, an issue of major concern for health and government authorities globally.

Extensions of this dissertation include both methodological and clinical directions. The data quality and linkages of registry data are improving with time, making it possible to estimate a patient's adherence to prescribed medications in the real-world setting. Such calculations are methodologically challenging and will provide only rough estimates. Nonetheless, such studies are essential for understanding population-level patterns of adherence, given the increased cardio-metabolic burden described in this dissertation and in other studies. Much-needed methodological studies include exploring the variability of study outcomes under different study designs and statistical methodologies. For example, while assessing supplementary hypotheses during the work on this dissertation, it was observed that exposure to insulin is associated with an increased risk of acute pancreatitis in patients with T2DM. However, when events that occurred during the first 6 months after baseline were excluded, the risk was no longer significant (Appendix D).

Exploration of efficient designs for pharmaco-epidemiological effectiveness studies in chronic diseases is an important future direction, along with research towards improving statistical techniques for advanced analyses that account for longitudinal non-linear risk factor interactions in real-world scenarios. While the “STrengthening the Reporting of OBservational studies in Epidemiology” (STROBE) statement provides guidance on the methodological aspects of observational studies, there are no standardised protocols for conducting such studies to date [180].

The pharmaco-epidemiological extension directions include the assessment of cardio-metabolic outcomes of combining incretin-based therapies and SGLT-2i, the latest ADD class. This dissertation was designed neither to explore glycaemic and CV outcomes of treatment with SGLT-2i alone, nor to assess outcomes of combining incretin-based therapies with SGLT-2i, due to very limited data in the initial CEMR extract. As shown in Chapter 7, the popularity of DPP-4i and SGLT-2i is increasing dramatically, so assessments of novel drug combinations are of particular interest. Chapter 7 also reports higher discontinuation rates for novel therapies compared to the older alternatives, which may be attributable to side-effects and also to higher medication costs. More detailed assessments of the underlying reasons for medication cessation and the long-term outcomes of such discontinuations also present an opportunity for future studies. As reported in this dissertation, even though most patients with T2DM are using lipid modifying and blood pressure lowering drugs, many do not meet LDL, triglyceride, or SBP targets. There is a paucity of studies assessing the outcomes of therapeutic inertia associated with not intensifying lipid or blood pressure lowering therapies when needed. Numerical estimates of the complications associated with therapeutic inertia could motivate clinicians and patients towards more proactive CV risk factor management. The direct extension of the studies reported in Chapters 8-9 is the development of an algorithm that would estimate the probabilities of achieving and sustaining cardio-metabolic risk factor control with different glucose, lipid, and blood pressure lowering drug intensification options, given the current risk factor profile of an individual patient. The algorithm could be implemented as an open-source tool or integrated into existing EMR systems. Such a patient-centred approach would help health care professionals to make therapy intensification choices in the most informed manner, to maximise long-term cardio-metabolic benefits while reducing diabetes-related complications in a cost-effective way. The tool would also help to involve and engage a patient in the decision-making process without the need to assess a huge amount of clinical literature. Finally, using this tool, health economists could evaluate the cost-benefits related to therapy choice and risk factor control at national levels.

Challenges and limitations of EMR-based studies were discussed in the introductory chapter (Subsection 1.9.2), whereas particular concerns associated with individual studies were discussed in each chapter separately. The representativeness of the CEMR database and of patients with diabetes is discussed in Subsections 3.1 and 5.4 respectively. In general, it was observed that the reported population-level processes are comparable to those in studies using national survey data. Compared to surveys, EMR data do not suffer from selective nonresponse, response and recall biases [99], but are biased towards unhealthy populations and commercially insured individuals. Comparative analyses conducted during this dissertation were carefully balanced on baseline characteristics between treatment groups and appropriately adjusted for various confounders including demographics, existing comorbidities, and longitudinal clinical and medication data. Nonetheless, the non-availability of data to control for patients’ socio-economic status, diet and exercise complicates direct causal interpretation.

To summarise, this thesis highlights the existing glycaemic and cardiovascular risk factor burden at the population level. Treatment with incretin-based therapies and thiazolidinediones provides higher chances of achieving and sustaining glycaemic and CV risk factor control, compared to sulfonylureas and insulin. A residual benefit on the risk of major adverse cardiovascular events was observed among patients treated with GLP-1RA compared to other major ADD choices. Nonetheless, patient-centred disease management to holistically control glycaemic and cardiovascular risk factors remains a key aspect of improving long-term outcomes in patients with T2DM.

Bibliography

1. American Diabetes Association, Standards of Medical Care in Diabetes-2018. Diabetes Care, 2018. 41(Supplement 1): p. S4.
2. International Diabetes Federation, IDF Diabetes Atlas 6th edition. 2014, Brussels, Belgium: International Diabetes Federation.
3. World Health Organization, Definition, diagnosis and classification of diabetes mellitus and its complications: report of a WHO consultation. Part 1, Diagnosis and classification of diabetes mellitus. 1999.
4. Lloyd-Jones, D., et al., Executive summary: Heart disease and stroke statistics-2010 update: A report from the American Heart Association. Circulation, 2010. 121(7): p. e46-e215.
5. American Diabetes Association, Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care, 2003. 26(1): p. S5-S20.
6. Benjamin, M., Miller-Keane Encyclopedia and Dictionary of Medicine. Nursing and Allied Health. Philadelphia: Saunders, 1997.
7. Zimmet, P.Z., Diabetes and its drivers: the largest epidemic in human history? Clin Diabetes Endocrinol, 2017. 3: p. 1.
8. International Diabetes Federation, IDF Diabetes Atlas, 8 edn. Brussels, Belgium, 2017.
9. Riddle, M.C. and W.H. Herman, The Cost of Diabetes Care - An Elephant in the Room. Diabetes Care, 2018. 41(5): p. 929-932.
10. American Diabetes Association, Economic Costs of Diabetes in the US in 2017. Diabetes Care, 2018. 41(5): p. 917-928.
11. Magliano, D.J., et al., The productivity burden of diabetes at a population level. Diabetes Care, 2018. 41(5): p. 979-984.
12. Bommer, C., et al., Global economic burden of diabetes in adults: projections from 2015 to 2030. Diabetes Care, 2018. 41(5): p. 963-970.
13. Wild, S., et al., Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Diabetes Care, 2004. 27(5): p. 1047-53.
14. Zheng, Y., S.H. Ley, and F.B. Hu, Global aetiology and epidemiology of type 2 diabetes mellitus and its complications. Nat Rev Endocrinol, 2018. 14(2): p. 88-98.
15. Hu, F.B., Globalization of diabetes: the role of diet, lifestyle, and genes. Diabetes Care, 2011. 34(6): p. 1249-1257.
16. Paul, S.K., et al., Comparison of body mass index at diagnosis of diabetes in a multi-ethnic population: A case-control study with matched non-diabetic controls. Diabetes Obes Metab, 2017. 19(7): p. 1014-1023.
17. Xu, Y., et al., Prevalence and control of diabetes in Chinese adults. JAMA, 2013. 310(9): p. 948-59.

18. Rayanagoudar, G., et al., Quantification of the type 2 diabetes risk in women with gestational diabetes: a systematic review and meta-analysis of 95,750 women. Diabetologia, 2016. 59(7): p. 1403-1411.
19. Wendland, E.M., et al., Gestational diabetes and pregnancy outcomes--a systematic review of the World Health Organization (WHO) and the International Association of Diabetes in Pregnancy Study Groups (IADPSG) diagnostic criteria. BMC Pregnancy Childbirth, 2012. 12: p. 23.
20. Bellamy, L., et al., Type 2 diabetes mellitus after gestational diabetes: a systematic review and meta-analysis. Lancet, 2009. 373(9677): p. 1773-9.
21. Clausen, T.D., et al., High prevalence of type 2 diabetes and pre-diabetes in adult offspring of women with gestational diabetes mellitus or type 1 diabetes: the role of intrauterine hyperglycemia. Diabetes Care, 2008. 31(2): p. 340-6.
22. Turnbull, F.M., et al., Intensive glucose control and macrovascular outcomes in type 2 diabetes. Diabetologia, 2009. 52(11): p. 2288-98.
23. Stratton, I.M., et al., Association of glycaemia with macrovascular and microvascular complications of type 2 diabetes (UKPDS 35): prospective observational study. BMJ, 2000. 321(7258): p. 405-12.
24. Zoungas, S., et al., Effects of intensive glucose control on microvascular outcomes in patients with type 2 diabetes: a meta-analysis of individual participant data from randomised controlled trials. Lancet Diabetes Endocrinol, 2017. 5(6): p. 431-437.
25. Mannucci, E., et al., Is Glucose Control Important for Prevention of Cardiovascular Disease in Diabetes? Diabetes Care, 2013. 36(Suppl 2): p. S259-S263.
26. Holman, R.R., et al., 10-year follow-up of intensive glucose control in type 2 diabetes. NEJM, 2008. 359(15): p. 1577-89.
27. Holman, R.R., et al., Long-term follow-up after tight control of blood pressure in type 2 diabetes. NEJM, 2008. 359(15): p. 1565-76.
28. Zimmet, P., Preventing diabetic complications: a primary care perspective. Diabetes Research and Clinical Practice, 2009. 84(2): p. 107-116.
29. Cade, W.T., Diabetes-related microvascular and macrovascular diseases in the physical therapy setting. Physical Therapy, 2008. 88(11): p. 1322-1335.
30. Bourne, R.R., et al., Causes of vision loss worldwide, 1990–2010: a systematic analysis. The Lancet Global Health, 2013. 1(6): p. e339-e349.
31. UK Prospective Diabetes Study (UKPDS) Group, Effect of intensive blood-glucose control with metformin on complications in overweight patients with type 2 diabetes (UKPDS 34). The Lancet, 1998. 352(9131): p. 854-865.
32. Patel, A., et al., Intensive blood glucose control and vascular outcomes in patients with type 2 diabetes. New England Journal of Medicine, 2008. 358(24): p. 2560-2572.
33. Skyler, J.S., et al., Intensive Glycemic Control and the Prevention of Cardiovascular Events: Implications of the ACCORD, ADVANCE, and VA Diabetes Trials. A Position Statement of the American Diabetes Association and a Scientific Statement of the American College of Cardiology Foundation and the American Heart Association. Journal of the American College of Cardiology, 2009. 53(3): p. 298-304.

34. de Boer, I.H., et al., Long-term renal outcomes of patients with type 1 diabetes mellitus and microalbuminuria: an analysis of the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications cohort. Arch Intern Med, 2011. 171(5): p. 412-20.
35. Fox, C.S., et al., Increasing cardiovascular disease burden due to diabetes mellitus: the Framingham Heart Study. Circulation, 2007. 115(12): p. 1544-50.
36. American Heart Association, Diabetes mellitus: a major risk factor for cardiovascular disease. 1999: American Heart Association.
37. Booth, G.L., et al., Relation between age and cardiovascular disease in men and women with diabetes compared with non-diabetic people: a population-based retrospective cohort study. Lancet, 2006. 368(9529): p. 29-36.
38. Booth, G.L., et al., Recent trends in cardiovascular complications among men and women with and without diabetes. Diabetes Care, 2006. 29(1): p. 32-7.
39. Benjamin, E.J., et al., Heart Disease and Stroke Statistics-2017 Update: A Report From the American Heart Association. Circulation, 2017. 135(10): p. e146-e603.
40. Goff, D.C., et al., Prevention of cardiovascular disease in persons with type 2 diabetes mellitus: current knowledge and rationale for the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial. American Journal of Cardiology, 2007. 99(12): p. S4-S20.
41. Shah, A.D., et al., Type 2 diabetes and incidence of cardiovascular diseases: a cohort study in 1·9 million people. The Lancet Diabetes & Endocrinology, 2015. 3(2): p. 105-113.
42. Del Prato, S., Megatrials in type 2 diabetes. From excitement to frustration? Diabetologia, 2009. 52(7): p. 1219-26.
43. Ismail-Beigi, F., et al., Effect of intensive treatment of hyperglycaemia on microvascular outcomes in type 2 diabetes: an analysis of the ACCORD randomised trial. The Lancet, 2010. 376(9739): p. 419-430.
44. Fox, C.S., et al., Update on prevention of cardiovascular disease in adults with type 2 diabetes mellitus in light of recent evidence: A scientific statement from the American Heart Association and the American Diabetes Association. Circulation, 2015. 132(8): p. 691-718.
45. Khunti, K., M. Kosiborod, and K.K. Ray, Legacy benefits of blood glucose, blood pressure and lipid control in people with diabetes and cardiovascular disease: Time to overcome multifactorial therapeutic inertia? Diabetes, Obesity and Metabolism, 2018.
46. Holman, R.R., et al., 10-year follow-up of intensive glucose control in type 2 diabetes. N Engl J Med, 2008. 359(15): p. 1577-89.
47. Diabetes Atlas, International diabetes federation. Press Release, Cape Town, South Africa, 2006. 4.
48. Inzucchi, S.E., et al., Management of hyperglycemia in type 2 diabetes: a patient-centered approach: position statement of the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetes Care, 2012. 35(6): p. 1364-79.
49. Turner, R.C., et al., Glycemic control with diet, sulfonylurea, metformin, or insulin in patients with type 2 diabetes mellitus: progressive requirement for multiple therapies (UKPDS 49). UK Prospective Diabetes Study (UKPDS) Group. JAMA, 1999. 281(21): p. 2005-12.

50. Nathan, D.M., et al., Medical Management of Hyperglycemia in Type 2 Diabetes: A Consensus Algorithm for the Initiation and Adjustment of Therapy. Diabetes Care, 2009. 32(1): p. 193-203.
51. American Diabetes Association, Standards of Medical Care in Diabetes-2011. Diabetes Care, 2011. 34(Supplement 1): p. S11-S61.
52. Adler, A.I., et al., Newer agents for blood glucose control in type 2 diabetes: summary of NICE guidance. BMJ, 2009. 338.
53. Ross, S.A. and J.-M. Ekoé, Incretin agents in type 2 diabetes. Canadian Family Physician, 2010. 56(7): p. 639-648.
54. Hædersdal, S., et al. The Role of Glucagon in the Pathophysiology and Treatment of Type 2 Diabetes. in Mayo Clinic Proceedings. 2018. Elsevier.
55. Drucker, D.J. and A.B. Goldfine, Cardiovascular safety and diabetes drug development. The Lancet, 2011. 377(9770): p. 977-979.
56. Garber, A.J., Novel GLP-1 receptor agonists for diabetes. Expert Opinion on Investigational Drugs, 2012. 21(1): p. 45-57.
57. Holst, J.J., The physiology of glucagon-like peptide 1. Physiological Reviews, 2007. 87(4): p. 1409-1439.
58. Nauck, M.A., et al., Incretin-based therapies: viewpoints on the way to consensus. Diabetes Care, 2009. 32(Suppl 2): p. S223-231.
59. Stonehouse, A.H., T. Darsow, and D.G. Maggs, Incretin-Based Therapies. Journal of Diabetes, 2011: p. In press.
60. Baggio, L.L. and D.J. Drucker, Biology of Incretins: GLP-1 and GIP. Gastroenterology, 2007. 132(6): p. 2131-2157.
61. Charbonnel, B. and B. Cariou, Pharmacological management of type 2 diabetes: the potential of incretin-based therapies. Diabetes, Obesity and Metabolism, 2011. 13(2): p. 99-117.
62. Smilowitz, N.R., R. Donnino, and A. Schwartzbard, Glucagon-Like Peptide-1 Receptor Agonists for Diabetes Mellitus: A Role in Cardiovascular Disease. Circulation, 2014. 129(22): p. 2305-2312.
63. Scheen, A.J., Cardiovascular outcome studies with incretin-based therapies: comparison between DPP-4 inhibitors and GLP-1 receptor agonists. Diabetes Research and Clinical Practice, 2017. 127: p. 224-237.
64. Nauck, M.A., et al., Incretin-based therapies: viewpoints on the way to consensus. Diabetes Care, 2009. 32(suppl 2): p. S223-S231.
65. Neumiller, J.J., Incretin-based therapies. Medical Clinics, 2015. 99(1): p. 107-129.
66. Stonehouse, A.H., T. Darsow, and D.G. Maggs, Incretin-based therapies. Journal of Diabetes, 2012. 4(1): p. 55-67.
67. Gough, S., Handbook of Incretin-based Therapies in Type 2 Diabetes. 2016: Springer.
68. Madsbad, S., Review of head-to-head comparisons of glucagon-like peptide-1 receptor agonists. Diabetes, Obesity and Metabolism, 2016. 18(4): p. 317-332.

156 69. Munir, K.M. and E.M. Lamos, Diabetes type 2 management: what are the differences between DPP-4 inhibitors and how do you choose? 2017, Taylor & Francis. 70. Craddy, P., H.-J. Palin, and K.I. Johnson, Comparative Effectiveness of Dipeptidylpeptidase-4 Inhibitors in Type 2 Diabetes: A Systematic Review and Mixed Treatment Comparison. Diabetes Therapy, 2014. 5(1): p. 1-41. 71. Keshavarz, K., et al., Linagliptin versus sitagliptin in patients with type 2 diabetes mellitus: a network meta-analysis of randomized clinical trials. DARU Journal of Pharmaceutical Sciences, 2017. 25(1): p. 23. 72. Sivertsen, J., et al., The effect of glucagon-like peptide 1 on cardiovascular risk. Nat Rev Cardiol, 2012. 9(4): p. 209-22. 73. Mora, P.F. and E.L. Johnson, Cardiovascular Outcome Trials Of The Incretin-Based Therapies: What Do We Know So Far? Endocr Pract, 2017. 23(1): p. 89-99. 74. Ha, S.J., et al., Preventive Effects of Exenatide on Endothelial Dysfunction Induced by Ischemia-Reperfusion Injury via KATP Channels. Arteriosclerosis, Thrombosis, and Vascular Biology, 2012. 32(2): p. 474-480. 75. Nikolaidis, L.A., et al., Effects of Glucagon-Like Peptide-1 in Patients With Acute Myocardial Infarction and Left Ventricular Dysfunction After Successful Reperfusion. Circulation, 2004. 109(8): p. 962-965. 76. Ban, K., et al., Cardioprotective and Vasodilatory Actions of Glucagon-Like Peptide 1 Receptor Are Mediated Through Both Glucagon-Like Peptide 1 Receptor–Dependent and –Independent Pathways. Circulation, 2008. 117(18): p. 2340-2350. 77. Chilton, R., et al., Cardiovascular Comorbidities of Type 2 Diabetes Mellitus: Defining the Potential of Glucagonlike peptide–1-Based Therapies. The American Journal of Medicine, 2011. 124(1, Supplement): p. S35-S53. 78. Song, X., et al., Anti-atherosclerotic effects of the glucagon-like peptide-1 (GLP-1) based therapies in patients with type 2 Diabetes Mellitus: A meta-analysis. Scientific reports, 2015. 5. 79. 
Noyan-Ashraf, M.H., et al., GLP-1R Agonist Liraglutide Activates Cytoprotective Pathways and Improves Outcomes After Experimental Myocardial Infarction in Mice. Diabetes, 2009. 58(4): p. 975-983. 80. Bunck, M.C., et al., One-year treatment with exenatide vs. Insulin Glargine: Effects on postprandial glycemia, lipid profiles, and oxidative stress. Atherosclerosis, 2010. 212(1): p. 223-229. 81. Schwartz, E.A., et al., Exenatide suppresses postprandial elevations in lipids and lipoproteins in individuals with impaired glucose tolerance and recent onset type 2 diabetes mellitus. Atherosclerosis, 2010. 212(1): p. 217-222. 82. Buse, J.B., et al., Switching to Once-Daily Liraglutide From Twice-Daily Exenatide Further Improves Glycemic Control in Patients With Type 2 Diabetes Using Oral Agents. Diabetes Care, 2010. 33(6): p. 1300-1303. 83. Vilsbøll, T., et al., Effects of glucagon-like peptide-1 receptor agonists on weight loss: systematic review and meta-analyses of randomised controlled trials. BMJ, 2012. 344.

157 84. Sun, F., et al., Effect of glucagon-like peptide-1 receptor agonists on lipid profiles among type 2 diabetes: a systematic review and network meta-analysis. Clin Ther, 2015. 37(1): p. 225- 241.e8. 85. Katout, M., et al., Effect of GLP-1 mimetics on blood pressure and relationship to weight loss and glycemia lowering: results of a systematic meta-analysis and meta-regression. Am J Hypertens, 2014. 27(1): p. 130-9. 86. Food and Drug Administration. Use of Real-World Evidence To Support Regulatory Decision- Making for Medical Devices; Guidance for Industry and Food and Drug Administration Staff. 2017; Available from: https://www.fda.gov/downloads/medicaldevices/deviceregulationandguidance/guidancedocu ments/ucm513027.pdf. 87. Sherman, R.E., et al., Real-world evidence—what is it and what can it tell us. N Engl J Med, 2016. 375(23): p. 2293-2297. 88. Franklin, J.M. and S. Schneeweiss, When and how can real world data analyses substitute for randomized controlled trials? Clinical Pharmacology & Therapeutics, 2017. 102(6): p. 924- 933. 89. Paul, S.K., et al., Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovascular diabetology, 2015. 14(1): p. 100. 90. Khunti, K., et al., Clinical inertia with regard to intensifying therapy in people with type 2 diabetes treated with basal insulin. Diabetes, Obesity and Metabolism, 2016. 91. Giugliano, D., et al., Comment on Edelman and Polonsky. Type 2 Diabetes in the Real World: The Elusive Nature of Glycemic Control. Diabetes Care 2017;40:1425–1432. Diabetes Care, 2018. 41(2): p. e17. 92. Edelman, S.V. and W.H. Polonsky, Type 2 Diabetes in the Real World: The Elusive Nature of Glycemic Control. Diabetes Care, 2017. 40(11): p. 1425-1432. 93. Crapo, J., Big Data in Healthcare: Separating The Hype From The Reality. HealthCatalyst, 2015: p. 5. 94. 
Crawford, A.G., et al., Comparison of GE Centricity Electronic Medical Record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Popul Health Manag, 2010. 13(3): p. 139-50. 95. Grabenbauer, L., A. Skinner, and J. Windle, Electronic Health Record Adoption - Maybe It's not about the Money: Physician Super-Users, Electronic Health Records and Patient Care. Appl Clin Inform, 2011. 2(4): p. 460-71. 96. Birkhead, G.S., M. Klompas, and N.R. Shah, Uses of electronic health records for public health surveillance to advance public health. Annu Rev Public Health, 2015. 36: p. 345-59. 97. Coloma, P.M., et al., Combining electronic healthcare databases in Europe to allow for large‐ scale drug safety monitoring: the EU‐ADR Project. Pharmacoepidemiology and drug safety, 2011. 20(1): p. 1-11. 98. Kosiborod, M., et al., Lower Risk of Heart Failure and Death in Patients Initiated on Sodium- Glucose Cotransporter-2 Inhibitors Versus Other Glucose-Lowering DrugsClinical Perspective: The CVD-REAL Study (Comparative Effectiveness of Cardiovascular Outcomes

158 in New Users of Sodium-Glucose Cotransporter-2 Inhibitors). Circulation, 2017. 136(3): p. 249-259. 99. Verheij, A.R., et al., Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse. J Med Internet Res, 2018. 20(5): p. e185. 100. Regier, E.E., M.V. Venkat, and K.L. Close, More than 7 years of hindsight: revisiting the FDA’s 2008 guidance on cardiovascular outcomes trials for Type 2 diabetes medications. Clinical Diabetes, 2016. 34(4): p. 173-180. 101. Nissen, S.E. and K. Wolski, Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. New England Journal of Medicine, 2007. 356(24): p. 2457- 2471. 102. Scirica, B.M., et al., Saxagliptin and cardiovascular outcomes in patients with type 2 diabetes mellitus. New England Journal of Medicine, 2013. 369(14): p. 1317-1326. 103. McMurray, J.J.V., et al., Heart failure: a cardiovascular outcome in diabetes that can no longer be ignored. The Lancet Diabetes & Endocrinology, 2015. 2(10): p. 843-851. 104. Marso, S.P., et al., Liraglutide and cardiovascular outcomes in type 2 diabetes. New England Journal of Medicine, 2016. 375(4): p. 311-322. 105. Monami, M., et al., Dipeptidyl peptidase‐4 inhibitors and cardiovascular risk: a meta‐analysis of randomized clinical trials. Diabetes, Obesity and Metabolism, 2013. 15(2): p. 112-120. 106. Monami, M., et al., Effects of glucagon‐like peptide‐1 receptor agonists on cardiovascular risk: a meta‐analysis of randomized clinical trials. Diabetes, Obesity and Metabolism, 2014. 16(1): p. 38-47. 107. Wu, S., et al., The cardiovascular effect of incretin-based therapies among type 2 diabetes: a systematic review and network meta-analysis. Expert opinion on drug safety, 2018: p. 1-7. 108. Gamble, J.M., et al., Incretin-based medications for type 2 diabetes: an overview of reviews. Diabetes Obes Metab, 2015. 17(7): p. 649-58. 109. 
Patorno, E., et al., Comparative Cardiovascular Safety of Glucagon-Like Peptide-1 Receptor Agonists versus Other Antidiabetic Drugs in Routine Care: a Cohort Study. Diabetes, Obesity and Metabolism, 2016: p. n/a-n/a. 110. Kannan, S., et al., Risk of overall mortality and cardiovascular events in patients with type 2 diabetes on dual drug therapy including metformin: A large database study from the Cleveland Clinic. J Diabetes, 2016. 8(2): p. 279-85. 111. d’Agostino, R.B., Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med, 1998. 17(19): p. 2265- 2281. 112. Rubin, D.B., Using multivariate matched sampling and regression adjustment to control bias in observational studies. ETS Research Report Series, 1978. 1978(2). 113. Hade, E.M. and B. Lu, Bias associated with using the estimated propensity score as a regression covariate. Statistics in medicine, 2014. 33(1): p. 74-87. 114. Zghebi, S., et al., Comparative risk of major cardiovascular events associated with second‐line antidiabetic treatments: a retrospective cohort study using UK primary care data linked to

159 hospitalization and mortality records. Diabetes, Obesity and Metabolism, 2016. 18(9): p. 916- 924. 115. Gamble, J.-M., et al., Comparative effectiveness of incretin-based therapies and the risk of death and cardiovascular events in 38,233 metformin monotherapy users. Medicine, 2016. 95(26). 116. Paul, S.K., et al., The association of the treatment with glucagon-like peptide-1 receptor agonist exenatide or insulin with cardiovascular outcomes in patients with type 2 diabetes: A retrospective observational study. Cardiovascular Diabetology, 2015. 14(1). 117. Best, J.H., et al., Risk of cardiovascular disease events in patients with type 2 diabetes prescribed the Glucagon-Like Peptide 1 (GLP-1) receptor agonist exenatide twice daily or other glucose-lowering therapies: A retrospective analysis of the lifelink database. Diabetes Care, 2011. 34(1): p. 90-95. 118. Velez, M., et al., Association of antidiabetic medications targeting the glucagon-like peptide 1 pathway and heart failure events in patients with diabetes. J Card Fail, 2015. 21(1): p. 2-8. 119. Mogensen, U.M., et al., Cardiovascular safety of combination therapies with incretin-based drugs and metformin compared with a combination of metformin and sulphonylurea in type 2 diabetes mellitus - a retrospective nationwide study. Diabetes Obes Metab, 2014. 120. Fu, A.Z., et al., Association Between Hospitalization for Heart Failure and Dipeptidyl Peptidase-4 Inhibitors in Patients With Type 2 Diabetes: An Observational Study. Diabetes care, 2016: p. dc150764. 121. Weir, D.L., et al., Sitagliptin use in patients with diabetes and heart failure: a population-based retrospective cohort study. JACC Heart Fail, 2014. 2(6): p. 573-82. 122. Scheller, N.M., et al., All-cause mortality and cardiovascular effects associated with the DPP- IV inhibitor sitagliptin compared with metformin, a retrospective cohort study on the Danish population. Diabetes Obes Metab, 2014. 16(3): p. 231-6. 123. 
Kim, S.C., et al., Dipeptidyl peptidase-4 inhibitors do not increase the risk of cardiovascular events in type 2 diabetes: a cohort study. Acta Diabetol, 2014. 51(6): p. 1015-23. 124. Wang, K.L., et al., Sitagliptin and the risk of hospitalization for heart failure: a population- based study. Int J Cardiol, 2014. 177(1): p. 86-90. 125. Morgan, C.L., et al., Combination therapy with metformin plus sulphonylureas versus metformin plus DPP-4 inhibitors: association with major adverse cardiovascular events and all-cause mortality. Diabetes Obes Metab, 2014. 16(10): p. 977-83. 126. GE Healthcare, Centricity Electronic Medical Record Brochure. 2011. 127. Brixner, D., et al., Six‐month outcomes on A1C and cardiovascular risk factors in patients with type 2 diabetes treated with exenatide in an ambulatory care setting. Diabetes, Obesity and Metabolism, 2009. 11(12): p. 1122-1130. 128. Brixner, D., et al., Assessment of cardiometabolic risk factors in a national primary care electronic health record database. Value in health, 2007. 10: p. S29-S36. 129. Inzucchi, S., et al., Progression to insulin therapy among patients with type 2 diabetes treated with sitagliptin or sulphonylurea plus metformin dual therapy. Diabetes, Obesity and Metabolism, 2015. 17(10): p. 956-964.

160 130. Levin, P., et al., Therapeutically interchangeable? A study of real‐world outcomes associated with switching basal insulin analogues among US patients with type 2 diabetes mellitus using electronic medical records data. Diabetes, Obesity and Metabolism, 2015. 17(3): p. 245-253. 131. Chitnis, A.S., et al., Clinical effectiveness of liraglutide across body mass index in patients with type 2 diabetes in the United States: a retrospective cohort study. Advances in therapy, 2014. 31(9): p. 986-999. 132. Davis, K.L., et al., Real-world comparative outcomes of US type 2 diabetes patients initiating analog basal insulin therapy. Current medical research and opinion, 2013. 29(9): p. 1083-1091. 133. Ashton, V., et al., LDL-C levels in US patients at high cardiovascular risk receiving rosuvastatin monotherapy. Clinical therapeutics, 2014. 36(5): p. 792-799. 134. Chopra, I. and K.M. Kamal, Factors associated with therapeutic goal attainment in patients with concomitant hypertension and dyslipidemia. Hospital Practice, 2014. 42(2): p. 77-88. 135. Saseen, J.J., et al., Maintaining goal blood pressures after switching from olmesartan to other angiotensin receptor blockers. The Journal of Clinical Hypertension, 2013. 15(12): p. 888-892. 136. Crawford, A.G., et al., Prevalence of obesity, type II diabetes mellitus, hyperlipidemia, and hypertension in the United States: findings from the GE Centricity Electronic Medical Record database. Population health management, 2010. 13(3): p. 151-161. 137. Brixner, D., et al., Evaluation of cardiovascular risk factors, events, and costs across four BMI categories. Obesity, 2013. 21(6): p. 1284-1292. 138. DerSarkissian, M., et al., Maintenance of weight loss or stability in subjects with obesity: a retrospective longitudinal analysis of a real-world population. Current Medical Research and Opinion, 2017. 33(6): p. 1105-1110. 139. 
Tandon, N., et al., Psy64 Ge Centricity® Electronic Medical Records Study: Comorbidities And Biologic Experience Among Patients Receiving Golimumab. Value in Health, 2011. 14(3): p. A70-A71. 140. Wang, J., et al., New diagnosis of hypertension among celecoxib and nonselective NSAID users: a population-based cohort study. Annals of Pharmacotherapy, 2007. 41(6): p. 937-943. 141. Rajagopalan, V., et al., SAT0069 Performance of the Framingham Cardiovascular Risk Prediction Model with and without C-Reactive Protein or Erythrocyte Sedimentation Rate in RA: Analysis of US Electronic Medical Records Database. Annals of the Rheumatic Diseases, 2014. 73(Suppl 2): p. 615-615. 142. Paul, S.K., et al. Effectiveness of biologic and non-biologic antirheumatic drugs on anaemia markers in 153,788 patients with rheumatoid arthritis: new evidence from real-world data. in Seminars in arthritis and rheumatism. 2018. Elsevier. 143. Marelli, C., et al., Statins and risk of cancer: a retrospective cohort analysis of 45,857 matched pairs from an electronic medical records database of 11 million adult Americans. Journal of the American College of Cardiology, 2011. 58(5): p. 530-537. 144. Talal, A., et al., Absolute and relative contraindications to pegylated‐interferon or ribavirin in the US general patient population with chronic hepatitis C: results from a US database of over 45 000 HCV‐infected, evaluated patients. Alimentary pharmacology & therapeutics, 2013. 37(4): p. 473-481.

161 145. Unni, S., et al., Hypertension control and antihypertensive therapy in patients with chronic kidney disease. American journal of hypertension, 2014. 28(6): p. 814-822. 146. Patel, A., et al., Care Provision and Prescribing Practices of Physicians Treating Children and Adolescents With ADHD. Psychiatric Services, 2017: p. appi. ps. 201600130. 147. World Health Organization Collaborating Centre for Drug Statistics Methodology. ATC. 2011; Available from: https://www.whocc.no/atc/structure_and_principles/. 148. US Food and Drug Administration. FDA Approved Drug Products. 2017; Available from: https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm. 149. US Food and Drug Administration, FDA approves weight-management drug Saxenda. 2014. 150. Hampp, C., et al., Use of antidiabetic drugs in the US, 2003–2012. Diabetes care, 2014. 37(5): p. 1367-1374. 151. Yurkovich, M., et al., A systematic review identifies valid comorbidity indices derived from administrative health data. Journal of Clinical Epidemiology, 2015. 68(1): p. 3-14. 152. Needham, D.M., et al., A systematic review of the Charlson comorbidity index using Canadian administrative databases: a perspective on risk adjustment in critical care research. Journal of critical care, 2005. 20(1): p. 12-19. 153. Elixhauser, A., et al., Comorbidity measures for use with administrative data. Medical care, 1998. 36(1): p. 8-27. 154. Von Korff, M., E.H. Wagner, and K. Saunders, A chronic disease score from automated pharmacy data. Journal of clinical epidemiology, 1992. 45(2): p. 197-203. 155. Khan, N.F., et al., Adaptation and validation of the Charlson Index for Read/OXMIS coded databases. BMC family practice, 2010. 11(1): p. 1. 156. Bannay, A., et al., The best use of the charlson comorbidity index with electronic health care database to predict mortality. Medical care, 2016. 54(2): p. 188-194. 157. 
Charlson, M.E., et al., A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of chronic diseases, 1987. 40(5): p. 373-383. 158. Quan, H., et al., Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. American journal of epidemiology, 2011. 173(6): p. 676-682. 159. Denti, L., et al., Validity of the modified Charlson comorbidity index as predictor of short-term outcome in older stroke patients. Journal of Stroke and Cerebrovascular Diseases, 2015. 24(2): p. 330-336. 160. Quan, H., et al., Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Medical care, 2005: p. 1130-1139. 161. Deyo, R.A., D.C. Cherkin, and M.A. Ciol, Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. Journal of clinical epidemiology, 1992. 45(6): p. 613- 619. 162. U.S. National Library of Medicine. SNOMED CT to ICD-10-CM Map. 2016; Available from: https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html.

162 163. Kamal, K.M., et al., Use of electronic medical records for clinical research in the management of type 2 diabetes. Res Social Adm Pharm, 2014. 10(6): p. 877-84. 164. Su, C.C., et al., Risk of diabetes in patients with rheumatoid arthritis: a 12-year retrospective cohort study. J Rheumatol, 2013. 40(9): p. 1513-8. 165. Seidu, S., et al., Prevalence and characteristics in coding, classification and diagnosis of diabetes in primary care. Postgraduate medical journal, 2014. 90(1059): p. 13-17. 166. Shivade, C., et al., A review of approaches to identifying patient phenotype cohorts using electronic health records. Journal of the American Medical Informatics Association, 2014. 21(2): p. 221-230. 167. Tapak, L., et al., Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran. Healthcare Informatics Research, 2013. 19(3): p. 177-185. 168. Mani, S., et al., Type 2 diabetes risk forecasting from EMR data using machine learning. AMIA Annu Symp Proc, 2012. 2012: p. 606-15. 169. American Diabetes Association, Standards of Medical Care in Diabetes—2017: Summary of Revisions. Diabetes Care, 2017. 40(Supplement 1): p. S4-S5. 170. Witten, I.H., et al., Data Mining: Practical machine learning tools and techniques. 2016: Morgan Kaufmann. 171. Leung, K.M., Naive bayesian classifier. Polytechnic University Department of Computer Science/Finance and Risk Engineering, 2007. 172. Ng, A.Y. and M.I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. in Advances in neural information processing systems. 2002. 173. Cristianini, N. and J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods. 2000: Cambridge university press. 174. Murtagh, F., Multilayer perceptrons for classification and regression. Neurocomputing, 1991. 2(5): p. 183-197. 175. Bhargava, N., et al., Decision tree analysis on j48 algorithm for data mining. 
Proceedings of International Journal of Advanced Research in Computer Science and Software Engineering, 2013. 3(6). 176. Holte, R.C., Very simple classification rules perform well on most commonly used datasets. Machine learning, 1993. 11(1): p. 63-90. 177. Centers for Disease Control and Prevention, National diabetes statistics report: estimates of diabetes and its burden in the United States, 2018. Atlanta, GA: US Department of Health and Human Services, 2018. 178. Paul, S., J. Shaw, and K. Klein. Therapeutic inertia for glycaemic and blood pressure control in patients with type 2 diabetes mellitus and the cardiovascular consequences. in Diabetologia. 2015. Springer 233 Spring St, New York, NY 10013 USA. 179. Mata‐Cases, M., et al., Therapeutic inertia in patients treated with two or more antidiabetics in primary care: F actors predicting intensification of treatment. Diabetes, Obesity and Metabolism, 2018. 20(1): p. 103-112.

163 180. Von Elm, E., et al., The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. International journal of surgery, 2014. 12(12): p. 1495-1499.


APPENDIX A

Received: 11 June 2016 Revised: 9 September 2016 Accepted: 11 September 2016 DOI 10.1111/dom.12790

ORIGINAL ARTICLE

Addition of or switch to insulin therapy in people treated with glucagon-like peptide-1 receptor agonists: A real-world study in 66 583 patients

Olga Montvida MSc1,2 | Kerenaftali Klein PhD1 | Sudhesh Kumar MD3 | Kamlesh Khunti PhD4 | Sanjoy K. Paul PhD1

1 Clinical Trials and Biostatistics Unit, QIMR Berghofer Medical Research Institute, Brisbane, Australia
2 School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Faculty of Health, Queensland University of Technology, Brisbane, Australia
3 Warwick Medical School, University of Warwick, and University Hospitals Coventry and Warwickshire, Coventry, UK
4 Diabetes Research Centre, Leicester Diabetes Centre, University of Leicester, Leicester, UK

Corresponding Author: Sanjoy K. Paul, PhD, Clinical Trials and Biostatistics Unit, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, QLD 4006 Brisbane, Australia (Sanjoy.[email protected]).

Funding Information: No separate funding was obtained for this study.

Background: Real world outcomes of addition or switch to insulin therapy in type 2 diabetes (T2DM) patients on glucagon-like peptide-1 receptor agonist (GLP-1RA) with inadequately controlled hyperglycaemia, are not known.

Materials and methods: Patients with T2DM (n = 66 583) with a minimum of 6 months of GLP-1RA treatment and without previous insulin treatment were selected. Those who added insulin (n = 39 599) or switched to insulin after GLP-1RA cessation (n = 4706) were identified. Adjusted changes in glycated haemoglobin (HbA1c), weight, systolic blood pressure (SBP) and LDL cholesterol were estimated over 24 months follow-up.

Results: Among those who continued with GLP-1RA treatment without adding or switching to insulin, the highest adjusted mean HbA1c change was achieved within 6 months, with no further glycaemic benefits observed during 24 months of follow-up. Addition of insulin within 6 months of GLP-1RA initiation was associated with 18% higher odds of achieving HbA1c <7% at 24 months, compared with adding insulin later. At 24 months, those who added insulin reduced HbA1c significantly by 0.55%, while no glycaemic benefit was observed in those who switched to insulin. Irrespective of intensification with insulin, weight, SBP and LDL cholesterol were significantly reduced by 3 kg, 3 mm Hg and 0.2 mmol/L, respectively, over 24 months.

Conclusions: Significant delay in intensification of treatment by addition of insulin is observed in patients with T2DM inadequately controlled with GLP-1RA. Earlier addition of insulin is associated with better glycaemic control, while switching to insulin is not clinically beneficial during 2 years of treatment. Non-responding patients on GLP-1RA would benefit from adding insulin therapy, rather than switching to insulin.

KEYWORDS

GLP-1 analogue, insulin therapy, pharmaco-epidemiology, type 2 diabetes

1 | INTRODUCTION

People with diabetes are at increased risk of developing disabling and life-threatening health problems, including microvascular and macrovascular complications.1,2 Good control of hyperglycaemia and the associated risk factors in type 2 diabetes (T2DM) has been shown to reduce the risk of these complications.1,3 Thus anti-hyperglycaemic treatment strategies should ideally also address the management of cardiovascular risk factors, including body weight, blood pressure and lipids.4 Novel antihyperglycaemic glucagon-like peptide-1 receptor agonist (GLP-1RA) therapies, including exenatide (EXE) and liraglutide (LIRA), have the potential to address these challenges.5–9

The combination of GLP-1RA and insulin treatments represents a promising glycaemic management strategy because of the complementary mechanisms of actions of these therapies.10 Both therapies affect body weight, but in opposite directions; while significant weight reductions have been observed in patients treated with GLP-1RA, insulin is known to significantly increase body weight.11,12

Diabetes Obes Metab; 9999: •• wileyonlinelibrary.com/journal/dom © 2016 John Wiley & Sons Ltd 1


A meta-analysis of clinical trials conducted by Eng et al.13 showed records in the form of start/stop dates, along with several specific that the combination of a GLP-1RA with basal insulin resulted in fields to track treatment adjustments and alterations over time. robust glycaemic control without increased risk of hypoglycaemia or From >2.4 million patients with a confirmed diagnosis of T2DM, weight gain. The effectiveness of adding GLP-1RA to basal insulin a cohort of 134 268 patients who received treatment with GLP-1RAs therapy on glucose and weight control in patients with T2DM has between April 2005 and October 2014 was identified. The final study also been evaluated in a number of observational and real-world cohort of 66 583 patients was selected on the basis of the following data-based studies.10,14–20 The intensification of insulin therapy by criteria: (1) diagnosis of diabetes from January 1990; (2) age addition of GLP-1RA, rather than adding mealtime insulin, has been ≥18 years at the diagnosis of diabetes; (3) no insulin therapy before shown to be an attractive therapeutic option,13,21 and is now recom- the initiation of GLP-1RA treatment; (4) no missing data on age, sex, mended in international guidelines: “The available data now suggest HbA1c and body weight at GLP-1RA initiation; and (5) minimum that either a GLP-1 receptor agonist or prandial insulin could be used 6 months of continuous treatment with GLP-1RA from the first in this setting [in patients not achieving target glycated haemoglobin recorded prescription date. (HbA1c)], with the former arguably safer, at least for short-term In the final study cohort, those treated with EXE and those trea- outcomes.”21 ted with LIRA were identified. 
Those who added insulin therapy to A significant number of patients treated with GLP-1RAs also existing GLP-1RA treatment (GLP-1RA + INS), and those who intensify therapy by adding insulin, or switching to insulin therapy, switched to insulin therapy on cessation of GLP-1RA treatment (GLP- primarily because of sub-optimal glycaemic control; however, to the 1RA ! INS), were identified by comparing insulin initiation dates and best of our knowledge, only three observational studies (n = 44 to GLP-1RA cessation dates. Time to addition of/switch to insulin ther- n = 432) have evaluated the effectiveness of adding insulin to GLP- apy was calculated as the time difference between GLP-1RA and 1RA therapy.20 Moreover, these studies did not explore the HbA1c insulin initiation dates. Insulin therapy was identified by any insulin trajectories over time to understand the longitudinal patterns of regimen (basal, biphasic or prandial). The majority of the patients in glycaemic failure. Furthermore, no real-world study, to the best of the EXE group (94%) were treated with a twice-daily EXE regimen. our knowledge, has explored the dynamics of changes in glycaemic Demographic, clinical and laboratory information included age, and cardiovascular risk factors from the time of GLP-1RA initiation sex, ethnicity and longitudinal measures of body weight, body mass through to the transition phase of adding or switching to insulin ther- index, systolic blood pressure (SBP), diastolic blood pressure, HbA1c, apy. The real-world patterns of adding or switching to insulin therapy and lipids. Clinical and laboratory data at GLP-1RA treatment initia- from initial GLP-1RA treatment are also not well understood. Further, tion (index date) were included on the basis of a 3-months window, amongst patients with sub-optimal glycaemic control who intensify on or prior to the index date. 
Follow-up clinical and laboratory mea- GLP-1RA therapy by addition of insulin, it is not known whether sures were arranged longitudinally on the basis of non-overlapping earlier intensification is beneficial for sustainable glucose control. 6-monthly windows which were defined progressively from the time The aims of the present longitudinal cohort study were to evalu- of GLP-1RA treatment initiation. Complete information on antihyper- ate, from the time of initiation of GLP-1RA therapy in patients with glycaemic agents, antihypertensive agents, cardio-protective medica- T2DM: (1) changes in HbA1c, body weight, blood pressure and LDL tions (CPMs), weight-lowering and anti-depressant drugs was cholesterol over 6, 12 and 24 months of follow-up; (2) possible bene- obtained, along with dates of prescriptions. The CPMs included sta- fits of adding or switching to insulin treatment; and (3) the likelihood tins, angiotensin-converting enzyme inhibitors and angiotensin II of clinically meaningful HbA1c reduction in those who intensified receptor blockers. The status of different medication intakes was GLP-1RA treatment with insulin earlier, compared with those who defined by whether it was taken during GLP-1RA treatment, and the added insulin later. treatment durations for such medications were estimated.

2.2 | Statistical methods 2 | MATERIALS AND METHODS While complete data on HbA1c and body weight were available at index date (by design), the proportion of patients with missing longi- 2.1 | Data source tudinal data on body weight, SBP, LDL cholesterol and HbA1c over The Centricity Electronic Medical Record (CEMR) database was used 24 months of follow-up ranged from 9% to 19%. The missing longitu- for this study. The CEMR represents a variety of ambulatory medical dinal follow-up data were imputed using the multiple imputation practices from 49 US states, including solo practitioners, community approach, with adjustments for age at index date and use of oral anti- clinics, academic medical centres and large integrated delivery net- hyperglycaemic drugs during follow-up. All primary analyses were works. The CEMR database consists of >35 000 physicians and other conducted using the imputed data, with additional analyses con- providers, of whom ~75% are primary care providers. The database ducted for sensitivity assessment based on complete cases. has been extensively used for academic research worldwide.5,22–24 The mean (95% confidence interval [CI]) changes in HbA1c, body The CEMR database contains detailed prescription information weight, SBP and LDL cholesterol at 6, 12 and 24 months from index with dates of prescription, including information on medications that date were estimated using multivariate regression models. Risk factor were purchased over the counter or prescribed outside of the EMR changes were adjusted for age at index date, sex, ethnicity and con- network. The main medication information data set stores individual comitant antihyperglycaemic, antihypertensive and weight-lowering

treatments, weighted by the respective baseline measures, as appropriate. Separate analyses for HbA1c and weight changes were conducted for patients who continued to receive only GLP-1RA treatment over 6, 12 and 24 months, and for those who added or switched to insulin during follow-up. Robust estimates of the CIs were obtained.

For patients with HbA1c >7.5% at the time of GLP-1RA initiation, the proportions of patients who achieved HbA1c below 7% at 6, 12 and 24 months of treatment were evaluated for all groups. Proportions of those who achieved weight loss ≥5% from initial body weight at 6, 12 and 24 months after GLP-1RA initiation were also calculated. The characteristics of the patients who added or switched to insulin were presented at the index date and at the time of transition to insulin.

3 | RESULTS

In the cohort of 66 583 patients with minimum 6 months of treatment with GLP-1RA, the mean (standard deviation) age was 56 (11) years, 28 959 (43%) were male, 45 291 (68%) were white, 51 719 (87%) were obese, 3404 (26%)/858 (7%) had micro-/macro-albuminuria, and 17 415 (26%) had a history of hypertension at the time of initiation of GLP-1RA treatment (Table 1). The use of different medications during GLP-1RA therapy, along with their durations of treatment, are shown in Table 1.

3.1 | Glycaemic control

With mean HbA1c of 8.2% at index date, among those who continued GLP-1RA treatment without adding or switching to insulin, the highest adjusted mean HbA1c change was achieved within 6 months (−0.73%; 95% CI −0.73, −0.71), with no further glycaemic benefits at 12 months (−0.65%; 95% CI −0.65, −0.62) or 24 months of follow-up (−0.59%; 95% CI −0.60, −0.58; Table 2; Figure 1A; all P < 0.01). Among patients with HbA1c ≥7.5% at index date, who did not add or switch to insulin, and who continued with GLP-1RA treatment only for 12 and 24 months (n = 14 682 and n = 6825), 26% achieved HbA1c levels below 7% at 12 and 24 months, respectively (Table 2).

Among those who added insulin during follow-up (GLP-1RA + INS, n = 39 599), the mean HbA1c values at index date and at the time of adding insulin were 8.3% and 8.8%, respectively. The median time to intensification with insulin was 3 months. Among these patients, 84% and 71% had HbA1c levels above 7.5% and 8%, respectively, at insulin initiation. Those who added insulin within 6 months of GLP-1RA initiation achieved significantly greater (P < 0.001) adjusted HbA1c reduction at 24 months of follow-up (−0.58%; 95% CI −0.61, −0.57), compared with those who added insulin after 12 months (−0.41%; 95% CI −0.43, −0.40; Figure 1A). Those who added insulin within 6 months of GLP-1RA initiation were 18% (odds ratio 1.18; 95% CI 1.09, 1.28; P < 0.001) more likely to achieve HbA1c below 7% at 24 months of follow-up, compared with those who added insulin treatment later.

The 6-monthly trajectories (mean, 95% CI) of HbA1c levels for those who switched to insulin therapy within 24 months (n = 2483; mean time to insulin 14 months; Table 3) are shown in Figure 1B. In these patients, the mean HbA1c increased to 9.3% at the time of switching to insulin from a mean HbA1c of 8.5% at GLP-1RA initiation, and 80% of them had HbA1c above 8% at insulin initiation (Table 3). Notably, these patients did not achieve better glycaemic control at 24 months compared with their glycaemic status at index date (Figure 1B,C). In contrast, patients who added insulin within 24 months of follow-up (n = 36 113; mean time to insulin 3 months; Table 3) experienced a significant HbA1c reduction of 0.55% (95% CI 0.54, 0.57) at 24 months compared with the index date (Figure 1C). The adjusted means (95% CI) of change in HbA1c and body weight at 6, 12 and 24 months after GLP-1RA initiation, for those who ceased GLP-1RA after 6 months of initiation and switched to insulin between 6 and 12, 12 and 18, and 18 and 24 months, are shown in Table S1.

3.2 | Weight change

With a baseline body weight of 109 kg (Table 1), patients with a minimum 12 months of treatment with GLP-1RA had significantly greater adjusted weight reduction (mean reduction 2.5 kg [CI 2.50, 2.51]), and 24% reduced their body weight by ≥5% (Table 4). Among those who continued GLP-1RA treatment only (without addition or switch to insulin) for 24 months, the average weight reduction from index date was 3.31 kg (95% CI 3.30, 3.32), and a third of patients achieved a weight loss of ≥5%. Patients who added insulin achieved marginally higher weight reduction (adjusted) at 12 and 24 months (2.93 and 3.40 kg, respectively; Table 4), compared with those who did not add or switch to insulin therapy (2.50 and 3.31 kg, respectively).

3.3 | Associations of glucose and weight loss with

Among patients with a minimum of 12 months of GLP-1RA treatment, 78% and 67% had reductions in HbA1c and body weight, respectively, from the index date, while 53% had reductions in both body weight and HbA1c (similar in patients treated with LIRA and EXE; Figure 1D). At 12 months of follow-up, 8% and 7% of patients did not have reductions in both HbA1c and weight in the EXE and LIRA groups, respectively.

3.4 | Cardiovascular risk factors

With a mean 129 mm Hg SBP level at index date, only 24% had SBP ≥140 mm Hg. The adjusted average reduction in SBP was ~3 mm Hg consistently over 6, 12 and 24 months of follow-up, and was similar across EXE and LIRA groups (Tables 1 and 4). Among those who switched to insulin, the mean SBP levels at index date, at the time of moving to insulin, and at 24 months of follow-up remained stable at 130 mm Hg.

In all, 92% of patients in the study cohort were on lipid-lowering therapy. The average reduction in LDL cholesterol was ≥0.18 mmol/L consistently during 6, 12 and 24 months of follow-up (range of CI of reduction: 0.17-0.24 mmol/L; Table 4), starting with a baseline LDL cholesterol level of 2.43 mmol/L. Among patients who did not receive any statin (n = 15 949), mean reductions in LDL cholesterol at 6, 12 and 24 months were 0.15 mmol/L, 0.14 mmol/L and 0.17 mmol/L, respectively (all P < 0.001). Among those who switched


TABLE 1 Basic statistics on study variables at the time of initiation of exenatide or liraglutide for patients who continued glucagon-like peptide-1 receptor agonist treatment for at least 6 months, those who added insulin treatment during the follow-up, and those who switched to insulin treatment during the follow-up

Variable | ALL | EXE | LIRA | GLP-1RA + INS | GLP-1RA → INS
N | 66 583 | 44 523 | 22 060 | 39 599 | 4706
Age at GLP-1RA initiation, y | 56 (11) | 56 (11) | 56 (11) | 56 (11) | 56 (11)
Men, n (%) | 28 959 (43) | 18 917 (42) | 10 042 (46) | 17 531 (44) | 2130 (45)
Ethnicity, n (%)
  White | 45 291 (68) | 29 500 (66) | 15 791 (72) | 27 231 (69) | 3311 (70)
  Black | 5021 (8) | 3118 (7) | 1903 (9) | 2971 (8) | 358 (8)
  Hispanic | 1465 (2) | 943 (2) | 522 (2) | 1023 (3) | 89 (2)
  Asian | 534 (1) | 326 (1) | 208 (1) | 312 (1) | 30 (1)
HbA1c at diagnosis of diabetes, % | 8 (1.4) | 8 (1.4) | 8.1 (1.5) | 8.1 (1.5) | 8.2 (1.5)
HbA1c at GLP-1RA initiation, % | 8.2 (1.3) | 8.1 (1.3) | 8.3 (1.4) | 8.3 (1.4) | 8.4 (1.3)
Median (IQR) HbA1c at GLP-1RA initiation, % | 7.8 (7, 8.8) | 7.8 (7, 8.7) | 7.9 (7.1, 8.9) | 8 (7.1, 9.0) | 8.2 (7.4, 9.0)
HbA1c ≥ 7% at GLP-1RA initiation, n (%) | 60 351 (91) | 40 180 (90) | 20 171 (91) | 36 130 (91) | 4388 (93)
HbA1c ≥ 7.5% at GLP-1RA initiation, n (%) | 41 045 (62) | 26 700 (60) | 14 345 (65) | 25 691 (65) | 3388 (72)
HbA1c ≥ 8% at GLP-1RA initiation, n (%) | 30 599 (46) | 19 777 (44) | 10 822 (49) | 19 859 (50) | 2628 (56)
Weight at GLP-1RA initiation, kg | 109 (25) | 110 (25) | 109 (25) | 110 (25) | 108 (24)
BMI at GLP-1RA initiation, kg/m2 | 38 (8) | 38 (8) | 38 (8) | 38 (8) | 37 (8)
Obese at GLP-1RA initiation, n (%) | 57 927 (87) | 38 753 (87) | 18 971 (86) | 34 847 (88) | 4000 (85)
SBP at GLP-1RA initiation, mm Hg | 129 (16) | 129 (16) | 129 (16) | 129 (16) | 130 (16)
SBP ≥ 140 mm Hg at GLP-1RA initiation, n (%) | 15 728 (24) | 10 628 (24) | 5100 (23) | 9487 (24) | 1184 (25)
DBP at GLP-1RA initiation, mm Hg | 77 (10) | 77 (10) | 77 (10) | 77 (10) | 77 (10)
LDL cholesterol at GLP-1RA initiation, mmol/L | 2.43 (0.72) | 2.44 (0.73) | 2.42 (0.72) | 2.42 (0.74) | 2.47 (0.74)
LDL cholesterol ≥ 3.37 mmol/L at GLP-1RA initiation, n (%) | 5780 (9) | 3951 (9) | 1829 (8) | 3652 (9) | 450 (10)
HDL cholesterol at GLP-1RA initiation, mmol/L | 1.10 (0.31) | 1.11 (0.30) | 1.10 (0.31) | 1.10 (0.30) | 1.09 (0.29)
Median (IQR) triglycerides at GLP-1RA initiation, mmol/L | 1.69 (1.23, 2.28) | 1.69 (1.23, 2.27) | 1.71 (1.24, 2.31) | 1.71 (1.24, 2.29) | 1.75 (1.25, 2.35)
Triglyceride ≥ 1.69 mmol/L at GLP-1RA initiation, n (%) | 15 060 (51) | 9920 (50) | 5140 (51) | 10 107 (51) | 967 (53)
Micro-albuminuria, n (%) | 3404 (26) | 2126 (25) | 1278 (29) | 2481 (26) | 167 (26)
Macro-albuminuria, n (%) | 858 (7) | 532 (6) | 326 (7) | 597 (6) | 49 (8)
Hypertension, n (%) | 17 415 (26) | 12 707 (29) | 4708 (21) | 11 020 (28) | 1340 (28)
Metformin taken during the GLP-1RA treatment, n (%) | 56 035 (84) | 37 645 (85) | 18 390 (83) | 33 837 (85) | 4129 (88)
Median (IQR) metformin duration, months | 52.7 (28.2, 84.3) | 62.6 (34.9, 91.9) | 36.6 (20.8, 61.1) | 55.9 (30.3, 87.5) | 71.8 (43.6, 97.6)
Sulphonylurea taken during the GLP-1RA treatment, n (%) | 38 003 (57) | 26 719 (60) | 11 284 (51) | 23 723 (60) | 3583 (76)
Median (IQR) sulphonylurea duration, months | 32.5 (15, 60.3) | 36.8 (17, 66) | 25.4 (12.2, 45.8) | 33.5 (15.2, 61.9) | 39.8 (20.2, 67.9)
Antihypertensive taken during the GLP-1RA treatment, n (%) | 53 821 (81) | 36 610 (82) | 17 211 (78) | 32 655 (82) | 4032 (86)
Median (IQR) antihypertensive duration, months | 46.5 (23.8, 80.1) | 54.7 (28.4, 86.9) | 33.2 (17.5, 59.8) | 48.7 (25.2, 82.3) | 61.4 (33.9, 91.8)
CPM taken during the GLP-1RA treatment, n (%) | 61 145 (92) | 41 273 (93) | 19 872 (90) | 36 861 (93) | 4462 (95)
Median (IQR) CPM duration, months | 51.1 (26.6, 84.9) | 59.9 (32.7, 91.6) | 35.7 (19.4, 64.5) | 53.8 (28.5, 87.5) | 68.3 (39, 96.3)
Weight lowering drugs taken during the GLP-1RA treatment, n (%) | 4591 (7) | 3297 (7) | 1294 (6) | 2831 (7) | 328 (7)
Median (IQR) weight-lowering duration, months | 12.6 (5.1, 28.2) | 13.1 (5.2, 29.4) | 11.9 (4.9, 25.5) | 12.1 (4.8, 27.5) | 12.6 (5.3, 28.7)
Anti-depressants taken during the GLP-1RA treatment, n (%) | 28 133 (42) | 19 865 (45) | 8268 (37) | 16 950 (43) | 2296 (49)
Median (IQR) anti-depressants duration, months | 35.6 (15.4, 68.6) | 40.6 (17.4, 74.7) | 27.2 (12.6, 51.4) | 36.2 (15.5, 70.1) | 43.4 (19.5, 78.9)

BMI, body mass index; CPM, cardio-protective medication; DBP, diastolic blood pressure; EXE, exenatide; GLP-1RA, glucagon-like peptide-1 receptor agonist; HbA1c, glycated haemoglobin; INS, insulin; IQR, interquartile range; LIRA, liraglutide; SBP, systolic blood pressure. Values are mean (standard deviation) unless stated otherwise.
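
The Methods describe arranging follow-up measurements into non-overlapping 6-monthly windows counted from GLP-1RA initiation. A minimal Python sketch of such an assignment is given below. The function name `assign_window`, the half-open convention (window 0 covers months 0 to <6, window 1 covers months 6 to <12, and so on) and the use of whole calendar months are illustrative assumptions, not the study's actual implementation.

```python
from datetime import date
from typing import Optional


def assign_window(index_date: date, measure_date: date,
                  window_months: int = 6) -> Optional[int]:
    """Return the 0-based follow-up window for a measurement.

    Hypothetical helper: window k covers [k*window_months, (k+1)*window_months)
    completed calendar months after the index date (GLP-1RA initiation).
    Measurements taken before the index date return None.
    """
    if measure_date < index_date:
        return None
    # Completed calendar months between the two dates.
    months = ((measure_date.year - index_date.year) * 12
              + (measure_date.month - index_date.month))
    if measure_date.day < index_date.day:
        months -= 1  # the current month is not yet complete
    return months // window_months


# Example: index date 15 Jan 2010
index = date(2010, 1, 15)
first_window = assign_window(index, date(2010, 7, 14))   # still within months 0 to <6
second_window = assign_window(index, date(2010, 7, 15))  # 6 completed months
```

Under this convention each measurement falls into exactly one window, so per-window summaries at 6, 12 and 24 months do not share observations.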


TABLE 2 Adjusted mean (95% confidence interval) of change in glycated haemoglobin (HbA1c) at 6, 12 and 24 months after glucagon-like peptide-1 receptor agonist (GLP-1RA) initiation, for those who took a GLP-1RA for at least 6, 12 and 24 months, stratified by whether patients continued on GLP-1RA treatment only or added insulin therapy, and, for patients with HbA1c levels above 7.5% at GLP-1RA initiation, number and proportion of those whose HbA1c reduced below 7% at 6, 12 and 24 months after GLP-1RA initiation stratified by whether patients took GLP-1RA only or added insulin therapy

Columns: On GLP-1RA for ≥6 months (All, EXE, LIRA); On GLP-1RA for ≥12 months (All, EXE, LIRA); On GLP-1RA for ≥24 months (All, EXE, LIRA)

Group | ≥6 mo All | ≥6 mo EXE | ≥6 mo LIRA | ≥12 mo All | ≥12 mo EXE | ≥12 mo LIRA | ≥24 mo All | ≥24 mo EXE | ≥24 mo LIRA
N | 66 583 | 44 523 | 22 060 | 50 109 | 35 085 | 15 024 | 28 422 | 22 111 | 6311

Δ HbA1c at 6 months, mean (95% CI)
GLP-1RA only | −0.73 (−0.73, −0.71) | −0.70 (−0.71, −0.70) | −0.80 (−0.80, −0.79) | −0.75 (−0.76, −0.75) | −0.72 (−0.73, −0.72) | −0.83 (−0.81, −0.83) | −0.75 (−0.75, −0.74) | −0.73 (−0.73, −0.72) | −0.85 (−0.86, −0.85)
GLP-1RA + INS | −0.83 (−0.84, −0.83) | −0.75 (−0.77, −0.75) | −0.95 (−0.96, −0.94) | −0.82 (−0.82, −0.81) | −0.75 (−0.76, −0.75) | −0.95 (−0.96, −0.95) | −0.81 (−0.81, −0.80) | −0.77 (−0.77, −0.76) | −0.93 (−0.93, −0.92)

HbA1c < 7% at 6 months for those whose HbA1c was ≥7.5% at GLP-1RA initiation, n (%)
GLP-1RA only | 5410 (25) | 3672 (24) | 1738 (27) | 3965 (27) | 2826 (26) | 1139 (29) | 2018 (30) | 1603 (29) | 415 (34)
GLP-1RA + INS | 4156 (21) | 2236 (19) | 1920 (24) | 3451 (22) | 1987 (20) | 1464 (25) | 2338 (24) | 1594 (22) | 744 (28)

Δ HbA1c at 12 months, mean (95% CI)
GLP-1RA only | - | - | - | −0.65 (−0.65, −0.62) | −0.62 (−0.62, −0.61) | −0.71 (−0.72, −0.71) | −0.67 (−0.68, −0.67) | −0.65 (−0.67, −0.65) | −0.74 (−0.74, −0.72)
GLP-1RA + INS | - | - | - | −0.73 (−0.73, −0.72) | −0.67 (−0.67, −0.66) | −0.85 (−0.86, −0.85) | −0.74 (−0.75, −0.74) | −0.71 (−0.71, −0.70) | −0.84 (−0.84, −0.83)

HbA1c < 7% at 12 months for those whose HbA1c was ≥7.5% at GLP-1RA initiation, n (%)
GLP-1RA only | - | - | - | 3829 (26) | 2770 (26) | 1059 (27) | 1960 (29) | 1577 (28) | 383 (32)
GLP-1RA + INS | - | - | - | 3462 (22) | 2083 (21) | 1379 (24) | 2401 (24) | 1679 (23) | 722 (27)

Δ HbA1c at 24 months, mean (95% CI)
GLP-1RA only | - | - | - | - | - | - | −0.59 (−0.60, −0.58) | −0.58 (−0.58, −0.57) | −0.63 (−0.64, −0.63)
GLP-1RA + INS | - | - | - | - | - | - | −0.65 (−0.66, −0.65) | −0.63 (−0.64, −0.63) | −0.70 (−0.71, −0.70)

HbA1c < 7% at 24 months for those whose HbA1c was ≥7.5% at GLP-1RA initiation, n (%)
GLP-1RA only | - | - | - | - | - | - | 1806 (26) | 1469 (26) | 337 (28)
GLP-1RA + INS | - | - | - | - | - | - | 2247 (23) | 1621 (23) | 626 (23)

CI, confidence interval; EXE, exenatide; GLP-1RA, glucagon-like peptide-1 receptor agonist; HbA1c, glycated haemoglobin; INS, insulin; LIRA, liraglutide.
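
The target-attainment rows of Table 2 are proportions computed within the subgroup whose HbA1c was ≥7.5% at GLP-1RA initiation. A minimal Python sketch of that calculation follows. The record layout (pairs of baseline and follow-up HbA1c), the function name `target_attainment` and the example values are hypothetical and serve only to illustrate the definition.

```python
from typing import Iterable, Tuple


def target_attainment(records: Iterable[Tuple[float, float]],
                      baseline_cutoff: float = 7.5,
                      target: float = 7.0) -> Tuple[int, int, float]:
    """Count patients reaching HbA1c < target among those with
    baseline HbA1c >= baseline_cutoff.

    Each record is (baseline_hba1c, follow_up_hba1c), both in %.
    Returns (n_achieved, n_eligible, percent_achieved).
    """
    eligible = [(b, f) for b, f in records if b >= baseline_cutoff]
    achieved = sum(1 for _, f in eligible if f < target)
    pct = 100.0 * achieved / len(eligible) if eligible else 0.0
    return achieved, len(eligible), pct


# Hypothetical patients: (HbA1c at index date, HbA1c at follow-up)
example = [(8.2, 6.8), (7.9, 7.4), (7.2, 6.5), (9.1, 6.9), (8.0, 8.3)]
n_achieved, n_eligible, pct_achieved = target_attainment(example)
```

The third hypothetical patient reaches the target but is excluded from the denominator because the baseline value is below the cutoff, which is why these percentages describe the ≥7.5% subgroup rather than the whole cohort.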

to insulin, the mean LDL cholesterol levels at index date, at the time of moving to insulin, and at 24 months of follow-up were 2.47, 2.42 and 2.38 mmol/L, respectively.

4 | DISCUSSION

The present longitudinal cohort study of 66 583 patients with T2DM treated with GLP-1RA suggests that: (1) significant HbA1c reductions may be obtained within 6 months from GLP-1RA treatment initiation, with no further glycaemic benefits likely over 24 months of follow-up; (2) earlier intensification with insulin therapy by 6 months (when added to GLP-1RA) is associated with 18% higher odds of lowering HbA1c below 7% within 2 years of treatment; and (3) adding insulin, rather than switching to it, is associated with significantly lower glucose levels in a long-term outcome. We have also observed a clear indication of therapeutic inertia among patients who failed to respond to GLP-1RA therapy.

This study presents real-world evidence of significant reductions in HbA1c, body weight, SBP and LDL cholesterol over 2 years of follow-up in patients treated with GLP-1RAs. Initiation of GLP-1RA treatment at lower HbA1c levels was associated with better glucose control over 2 years of follow-up. The observed HbA1c reductions were consistent with previous findings.15–19 While glycaemic achievements observed within 6 months of GLP-1RA treatment initiation were higher in clinical trials, it is recognized that the effectiveness studies based on real-world data generally provide lower estimates of glycaemic reduction or treatment effect(s) in general.
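
The "18% higher odds" reported for early insulin addition is the odds ratio of 1.18 read as a relative increase in odds, not in probability. As a worked check, the Python sketch below back-calculates the standard error of the log odds ratio from the reported 95% CI (1.09, 1.28) and rebuilds the interval around 1.18. It assumes a Wald-type interval that is symmetric on the log scale; that assumption and the function names are illustrative and are not stated in the source.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error of log(OR) implied by a log-symmetric CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_ci(odds_ratio: float, se_log_or: float, z: float = Z_95):
    """Wald confidence interval for an odds ratio, built on the log scale."""
    centre = math.log(odds_ratio)
    return (math.exp(centre - z * se_log_or),
            math.exp(centre + z * se_log_or))


def percent_change_in_odds(odds_ratio: float) -> float:
    """Relative change in odds, in percent, implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


# Values reported in the Results: OR 1.18 (95% CI 1.09, 1.28)
se = se_from_ci(1.09, 1.28)
lower, upper = wald_ci(1.18, se)
```

Rounded to two decimals, the rebuilt limits agree with the reported interval, so the published estimate and CI are mutually consistent under the log-symmetry assumption.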


FIGURE 1 A, Mean (95% confidence interval [CI]) of longitudinal glycated haemoglobin (HbA1c) measurements by whether patient added insulin within 6 months/6 to 12 months/12 to 18 months/18 to 24 months from the GLP-1RA initiation, or remained on GLP-1RA treatment only. B, Mean (95% CI) of longitudinal HbA1c measurements by whether patient switched to insulin within 6 to 12 months/12 to 18 months/18 to 24 months from the GLP-1RA initiation. C, Mean (95% CI) of longitudinal HbA1c measurements by whether patient added or switched to insulin within 24 months from GLP-1RA initiation. D, Scatterplot of HbA1c change and body weight change at 6 and 12 months for patients treated with exenatide and liraglutide without adding or switching to insulin. The perpendicular dotted lines present the mean time to addition or switching to insulin by categories of timeline.

TABLE 3 Glycated haemoglobin levels at glucagon-like peptide-1 receptor agonist (GLP-1RA) initiation and at insulin initiation, and the time to insulin therapy, for those who added insulin to existing GLP-1RA (GLP-1RA + INS) and those who switched to insulin treatment (GLP-1RA → INS) within 2 years of follow-up

Variable | GLP-1RA + INS: All | GLP-1RA + INS: EXE | GLP-1RA + INS: LIRA | GLP-1RA → INS: All | GLP-1RA → INS: EXE | GLP-1RA → INS: LIRA
N | 36 113 | 22 703 | 13 410 | 2483 | 1856 | 627
HbA1c at GLP-1RA initiation, % | 8.3 (1.4) | 8.3 (1.4) | 8.4 (1.4) | 8.5 (1.4) | 8.5 (1.4) | 8.5 (1.4)
Median (IQR) HbA1c at GLP-1RA initiation, % | 8 (7.1, 9) | 7.9 (7.1, 9) | 8.1 (7.2, 9.1) | 8.3 (7.5, 9.2) | 8.3 (7.5, 9.2) | 8.3 (7.6, 9.3)
HbA1c ≥ 7% at GLP-1RA initiation, n (%) | 33 032 (91) | 20 649 (91) | 12 383 (92) | 2351 (95) | 1761 (95) | 590 (94)
HbA1c ≥ 7.5% at GLP-1RA initiation, n (%) | 23 736 (66) | 14 507 (64) | 9229 (69) | 1892 (76) | 1404 (76) | 488 (78)
HbA1c ≥ 8% at GLP-1RA initiation, n (%) | 18 436 (51) | 11 202 (49) | 7234 (54) | 1489 (60) | 1119 (60) | 370 (59)
HbA1c at insulin initiation, % | 8.8 (1.3) | 8.8 (1.3) | 8.8 (1.3) | 9.3 (1.6) | 9.2 (1.5) | 9.5 (1.7)
Median (IQR) HbA1c at insulin initiation, % | 8.7 (7.8, 9.4) | 8.7 (7.8, 9.4) | 8.8 (7.9, 9.5) | 9.1 (8.2, 10) | 9 (8.1, 10) | 9.1 (8.4, 10.3)
HbA1c ≥ 7% at insulin initiation, n (%) | 36 045 (100) | 22 652 (100) | 13 393 (100) | 2482 (100) | 1855 (100) | 627 (100)
HbA1c ≥ 7.5% at insulin initiation, n (%) | 30 267 (84) | 18 848 (83) | 11 419 (85) | 2209 (89) | 1634 (88) | 575 (92)
HbA1c ≥ 8% at insulin initiation, n (%) | 25 649 (71) | 15 897 (70) | 9752 (73) | 1985 (80) | 1466 (79) | 519 (83)
Median (IQR) Δ HbA1c (insulin − GLP-1RA), % | 0.46 (0.45, 0.47) | 0.49 (0.47, 0.50) | 0.42 (0.4, 0.44) | 0.73 (0.67, 0.79) | 0.67 (0.60, 0.73) | 0.91 (0.80, 1.03)
Mean (range) time to insulin initiation, months | 3 (0, 24) | 3 (0, 24) | 2 (0, 24) | 14 (6, 24) | 14 (6, 24) | 14 (6, 24)
Median (IQR) time to insulin initiation, months | 0 (0, 3) | 0 (0, 4) | 0 (0, 2) | 13 (10, 18) | 14 (10, 19) | 12 (9, 18)

EXE, exenatide; GLP-1RA, glucagon-like peptide-1 receptor agonist; HbA1c, glycated haemoglobin; INS, insulin; IQR, interquartile range; LIRA, liraglutide. Statistics are mean (standard deviation) unless stated otherwise.

The mean HbA1c reduction in patients treated with LIRA was marginally higher than in those treated with EXE, although the proportions of patients who had reductions in HbA1c to below 7% over 12 and 24 months of follow-up (ranging from 26% to 28%) were similar for each of these two therapies.

With regard to initial response to GLP-1RA within 6 months of follow-up, patients who switched to insulin (by design after 6 months of GLP-1RA treatment) experienced a significant rise in the glycaemic level at different points of follow-up, as shown in Figure 1B. For example, those who switched to insulin within 18 to 24 months after the index date, clearly experienced rising HbA1c levels consistently above 8% after 6 months of treatment with GLP-1RA. Furthermore, we observed that although switching to insulin prevented any further rise in HbA1c, no significant glycaemic reductions were achieved by the end of the 24-months follow-up period, compared to the index date (Figure 1B,C); however, those patients who added insulin had significantly better glycaemic control by the end of the follow-up period. After adjusting for the HbA1c levels at index date and at the time of insulin initiation, those who added insulin achieved a significantly higher HbA1c reduction at 24 months, by 0.48% (95% CI 0.47, 0.50%), compared with those who switched to insulin (Figure 1C).

This study showed that addition of insulin and switch to insulin occurred at elevated HbA1c levels of 8.8% and 9.3%, respectively, with a significant proportion of patients having HbA1c above 8%/8.5% (71%/58% in the GLP-1RA + INS group and 80%/68% in the GLP-1RA → INS group). This clearly raises the issue of therapeutic inertia.25,26 Given the high glycaemic burden in this population, the time to intensification of therapy requires further evaluation, in conjunction with the factors that might prevent early intensification with insulin therapy, including fear of weight gain and hypoglycaemia. Notably, the distributions of body weight were similar between those who switched to, or added insulin, and those who remained on GLP-1RA treatment only (Table 1). Moreover, observed adjusted weight reductions in patients who added insulin were consistently and marginally greater than in those treated with GLP-1RA only during the follow-up period (Table 4). Our findings in terms of weight loss are consistent with a recent study reporting no weight gain after initiation of insulin in obese patients with T2DM27 and the systematic review by Balena et al.20

The limitations of this study include non-availability of complete and reliable data on: (1) medication adherence; (2) dose adjustments in insulin-treated patients; (3) diet and exercise; (4) socio-economic status; and (5) potential residual confounders. The selection of patients with a minimum 6 months’ treatment with GLP-1RA, excluding those who initiated insulin therapy earlier, could lead to a potential selection bias; however, the large analysis cohort from the validated CEMR database used in the study should be considered as a representative sample, and as such, provides a reliable picture of

TABLE 4 Adjusted mean (95% confidence interval [CI]) of change in body weight at 6, 12 and 24 months after glucagon-like peptide-1 receptor agonist (GLP-1RA) initiation, for those who took a GLP-1RA for at least 6, 12 and 24 months, stratified by whether patients continued on GLP-1RA treatment only or added insulin therapy, number and proportion of those who lost ≥5% body weight during follow-up after GLP-1RA initiation, and adjusted mean (95% CI) of changes in SBP and LDL cholesterol at 6, 12 and 24 months after GLP-1RA initiation

Columns: On GLP-1RA for ≥6 mo (All, EXE, LIRA); On GLP-1RA for ≥12 mo (All, EXE, LIRA); On GLP-1RA for ≥24 mo (All, EXE, LIRA)

Group | ≥6 mo All | ≥6 mo EXE | ≥6 mo LIRA | ≥12 mo All | ≥12 mo EXE | ≥12 mo LIRA | ≥24 mo All | ≥24 mo EXE | ≥24 mo LIRA

Δ Weight at 6 months, adjusted mean (95% CI)
GLP-1RA only | −1.87 (−1.88, −1.87) | −1.90 (−1.91, −1.90) | −1.80 (−1.81, −1.80) | −1.96 (−1.97, −1.96) | −2.00 (−2.00, −1.99) | −1.86 (−1.87, −1.85) | −2.24 (−2.24, −2.23) | −2.23 (−2.24, −2.23) | −2.25 (−2.26, −2.23)
GLP-1RA + INS | −2.32 (−2.32, −2.31) | −2.20 (−2.20, −2.19) | −2.51 (−2.52, −2.51) | −2.41 (−2.41, −2.40) | −2.26 (−2.26, −2.25) | −2.68 (−2.68, −2.67) | −2.50 (−2.50, −2.49) | −2.38 (−2.39, −2.37) | −2.84 (−2.85, −2.83)

Weight loss ≥5% at 6 months, n (%)
GLP-1RA only | 5539 (15) | 3957 (15) | 1582 (15) | 3978 (15) | 2980 (15) | 998 (15) | 2050 (16) | 1684 (16) | 366 (17)
GLP-1RA + INS | 5241 (18) | 3085 (17) | 2156 (19) | 4413 (18) | 2729 (17) | 1684 (20) | 2985 (19) | 2124 (18) | 861 (21)

Δ Weight at 12 months, adjusted mean (95% CI)
GLP-1RA only | - | - | - | −2.50 (−2.51, −2.50) | −2.54 (−2.55, −2.54) | −2.39 (−2.39, −2.38) | −2.83 (−2.84, −2.82) | −2.82 (−2.83, −2.82) | −2.87 (−2.88, −2.85)
GLP-1RA + INS | - | - | - | −2.93 (−2.93, −2.92) | −2.80 (−2.81, −2.80) | −3.16 (−3.16, −3.15) | −3.08 (−3.08, −3.07) | −3.00 (−3.00, −2.99) | −3.31 (−3.32, −3.30)

Weight loss ≥5% at 12 months, n (%)
GLP-1RA only | - | - | - | 6150 (24) | 4644 (24) | 1506 (23) | 3204 (25) | 2662 (26) | 542 (25)
GLP-1RA + INS | - | - | - | 6404 (26) | 4062 (26) | 2342 (28) | 4314 (27) | 3132 (27) | 1182 (29)

Δ Weight at 24 months, adjusted mean (95% CI)
GLP-1RA only | - | - | - | - | - | - | −3.31 (−3.32, −3.30) | −3.3 (−3.31, −3.29) | −3.36 (−3.38, −3.35)
GLP-1RA + INS | - | - | - | - | - | - | −3.40 (−3.41, −3.40) | −3.32 (−3.33, −3.31) | −3.63 (−3.64, −3.62)

Weight loss ≥5% at 24 months, n (%)
GLP-1RA only | - | - | - | - | - | - | 3884 (31) | 3210 (31) | 674 (31)
GLP-1RA + INS | - | - | - | - | - | - | 5104 (32) | 3747 (32) | 1357 (33)

Blood pressure changes, adjusted mean (95% CI)
Δ SBP at 6 months | −2.82 (−2.82, −2.81) | −2.78 (−2.78, −2.77) | −2.90 (−2.91, −2.89) | −2.95 (−2.96, −2.95) | −2.91 (−2.91, −2.90) | −3.05 (−3.06, −3.04) | −2.91 (−2.91, −2.90) | −2.87 (−2.87, −2.86) | −3.05 (−3.07, −3.04)
Δ SBP at 12 months | - | - | - | −2.79 (−2.79, −2.78) | −2.79 (−2.79, −2.78) | −2.79 (−2.80, −2.78) | −2.85 (−2.86, −2.85) | −2.78 (−2.78, −2.77) | −3.13 (−3.14, −3.11)
Δ SBP at 24 months | - | - | - | - | - | - | −2.64 (−2.64, −2.63) | −2.69 (−2.70, −2.69) | −2.44 (−2.46, −2.42)

LDL changes, adjusted mean (95% CI)
Δ LDL cholesterol at 6 months | −0.19 (−0.20, −0.19) | −0.19 (−0.20, −0.19) | −0.19 (−0.19, −0.18) | −0.19 (−0.19, −0.18) | −0.19 (−0.20, −0.19) | −0.19 (−0.19, −0.18) | −0.19 (−0.21, −0.18) | −0.18 (−0.18, −0.17) | −0.20 (−0.21, −0.19)
Δ LDL cholesterol at 12 months | - | - | - | −0.18 (−0.19, −0.18) | −0.18 (−0.19, −0.18) | −0.19 (−0.19, −0.18) | −0.19 (−0.20, −0.19) | −0.18 (−0.18, −0.17) | −0.20 (−0.21, −0.18)
Δ LDL cholesterol at 24 months | - | - | - | - | - | - | −0.23 (−0.24, −0.23) | −0.23 (−0.24, −0.23) | −0.23 (−0.23, −0.22)

CI, confidence interval; EXE, exenatide; GLP-1RA, glucagon-like peptide-1 receptor agonist; HbA1c, glycated haemoglobin; INS, insulin; IQR, interquartile range; LIRA, liraglutide; SBP, systolic blood pressure.

the state of risk factor management in routine practice. Complete risk factor data were available at index date, and imputations were conducted for only 9% to 19% missing longitudinal data. The results from complete case analyses and imputed data were very similar. Finally, the careful new-user design with a reasonable exposure time of 2 years, and appropriate adjustments for confounders are the primary strengths of the study.

In conclusion, this novel real-world study provides evidence of significant delays in intensification of treatment in patients with T2DM treated with a GLP-1RA. Among HbA1c non-responders, early addition of insulin with GLP-1RA therapy within 6 months resulted in better and sustainable glycaemic control over 2 years. The findings from the present study suggest that, in people requiring treatment intensification on GLP-1RA, the preferred option should be addition of insulin rather than stopping GLP-1RA and switching to insulin therapy.

ACKNOWLEDGMENTS

We gratefully acknowledge the support for the QIMR Berghofer Institute from the Australian Government Department of Education’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia. No separate funding was obtained for this study. O. M. acknowledges the support from her associate supervisors Prof. Ross Young and Prof. Louise Hafner. K. Klein acknowledges support from the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care – East Midlands (NIHR CLAHRC – EM), and the NIHR Leicester Loughborough Diet, Lifestyle and Physical Activity Biomedical Research Unit.

Conflict of interest

S. K. P. has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator-initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. K. Khunti has acted as a consultant and speaker for Novartis, Novo Nordisk, Sanofi-Aventis, Lilly, Merck Sharp & Dohme, Janssen, Astra Zeneca and Boehringer Ingelheim. He has received grants in support of investigator and investigator-initiated trials from Novartis, Novo Nordisk, Sanofi-Aventis, Lilly, Pfizer, Boehringer Ingelheim, Merck Sharp & Dohme, Janssen and Roche, and funds for research, honoraria for speaking at meetings and has served on advisory boards for Lilly, Sanofi-Aventis, Merck Sharp & Dohme and Novo Nordisk, Boehringer Ingelheim, Janssen and Astra Zeneca. S. K. has received research grants and been on advisory boards for Novo Nordisk. O. M. and K. Klein have no conflict of interest to declare.

Author contributions

S. K. P. conceived the idea, and S. K. P. and O. M. were responsible for the primary design of the study. The design concept was discussed and agreed with S. K. and K. Khunti. O. M. and K. Klein conducted the data extraction. S. K. P., K. Klein and O. M. jointly conducted the statistical analyses. S. K. and K. Khunti worked on the analysis plan along with S. K. P. The first draft of the manuscript was developed by S. K. P. and O. M., and all authors contributed to the finalization of the manuscript. S. K. P. had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

REFERENCES

1. Turnbull FM, Abraira C, Anderson RJ, et al. Intensive glucose control and macrovascular outcomes in type 2 diabetes. Diabetologia. 2009;52(11):2288-2298.
2. Shah AD, Langenberg C, Rapsomaniki E, et al. Type 2 diabetes and incidence of cardiovascular diseases: a cohort study in 1.9 million people. Lancet Diabetes Endocrinol. 2015;3(2):105-113.
3. Fox CS, Golden SH, Anderson C, et al. Update on prevention of cardiovascular disease in adults with type 2 diabetes mellitus in light of recent evidence: a scientific statement from the American Heart Association and the American Diabetes Association. Circulation. 2015;132(8):691-718.
4. American Diabetes Association. Standards of medical care in diabetes—2015. Diabetes Care. 2015;38(suppl 1):S49-S57.
5. Paul SK, Klein K, Maggs D, Best JH. The association of the treatment with glucagon-like peptide-1 receptor agonist exenatide or insulin with cardiovascular outcomes in patients with type 2 diabetes: a retrospective observational study. Cardiovasc Diabetol. 2015;14:10, doi:10.1186/s12933-015-0178-3.
6. Smilowitz NR, Donnino R, Schwartzbard A. Glucagon-like peptide-1 receptor agonists for diabetes mellitus: a role in cardiovascular disease. Circulation. 2014;129(22):2305-2312.
7. Drucker DJ, Goldfine AB. Cardiovascular safety and diabetes drug development. Lancet. 2011;377(9770):977-979.
8. Garber AJ. Novel GLP-1 receptor agonists for diabetes. Expert Opin Investig Drugs. 2012;21(1):45-57.
9. Monami M, Dicembrini I, Nardini C, Fiordelli I, Mannucci E. Effects of glucagon-like peptide-1 receptor agonists on cardiovascular risk: a meta-analysis of randomized clinical trials. Diabetes Obes Metab. 2014;16(1):38-47.
10. Vora J. Combining incretin-based therapies with insulin: realizing the potential in type 2 diabetes. Diabetes Care. 2013;36(suppl 2):S226-S232.
11. Shaefer CF, Reid TS, Dailey G, et al. Weight change in patients with type 2 diabetes starting basal insulin therapy: correlates and impact on outcomes. Postgrad Med. 2014;126(6):93-105.
12. Balkau B, Home PD, Vincent M, Marre M, Freemantle N. Factors associated with weight gain in people with type 2 diabetes starting on insulin. Diabetes Care. 2014;37(8):2108-2113.
13. Eng C, Kramer CK, Zinman B, Retnakaran R. Glucagon-like peptide-1 receptor agonist and basal insulin combination treatment for the management of type 2 diabetes: a systematic review and meta-analysis. Lancet. 2014;384(9961):2228-2234.
14. Lee WC, Dekoven M, Bouchard J, Massoudi M, Langer J. Improved real-world glycaemic outcomes with liraglutide versus other incretin-based therapies in type 2 diabetes. Diabetes Obes Metab. 2014;16(9):819-826.
15. Li Q, Chitnis A, Hammer M, Langer J. Real-world clinical and economic outcomes of liraglutide versus sitagliptin in patients with type 2 diabetes mellitus in the United States. Diabetes Ther. 2014;5(2):579-590.
16. Rigato M, Avogaro A, Fadini GP. Effects of dose escalating liraglutide from 1.2 to 1.8 mg in clinical practice: a case-control study. J Endocrinol Invest. 2015;38(12):1357-1363.
17. Gautier JF, Martinez L, Penfornis A, et al. Effectiveness and persistence with liraglutide among patients with type 2 diabetes in routine clinical practice–EVIDENCE: a prospective, 2-year follow-up, observational, post-marketing study. Adv Ther. 2015;32(9):838-853.
18. Ryder B, Thong K. Findings from the Association of British Clinical Diabetologists (ABCD) nationwide exenatide and liraglutide audits. Hot topics in diabetes. 2012;5:49-61.
19. Thong KY, Jose B, Sukumar N, et al. Safety, efficacy and tolerability of exenatide in combination with insulin in the Association of British Clinical Diabetologists nationwide exenatide audit. Diabetes Obes Metab. 2011;13(8):703-710.
20. Balena R, Hensley IE, Miller S, Barnett AH. Combination therapy with GLP-1 receptor agonists and basal insulin: a systematic review of the literature. Diabetes Obes Metab. 2013;15(6):485-502.
21. Inzucchi SE, Bergenstal RM, Buse JB, et al. Management of hyperglycemia in type 2 diabetes, 2015: a patient-centered approach: update to a position statement of the American Diabetes Association and the European Association for the Study of Diabetes. Diabetes Care. 2015;38(1):140-149.
22. Kamal KM, Chopra I, Elliott JP, Mattei TJ. Use of electronic medical records for clinical research in the management of type 2 diabetes. Res Social Adm Pharm. 2014;10(6):877-884.
23. Davis KL, Tangirala M, Meyers JL, Wei W. Real-world comparative outcomes of US type 2 diabetes patients initiating analog basal insulin therapy. Curr Med Res Opin. 2013;29(9):1083-1091.
24. Crawford AG, Cote C, Couto J, et al. Comparison of GE centricity electronic medical record database and National Ambulatory Medical Care Survey findings on the prevalence of major conditions in the United States. Popul Health Manag. 2010;13(3):139-150.
25. Khunti K, Nikolajsen A, Thorsted BL, Andersen M, Davies MJ, Paul SK. Clinical inertia with regard to intensifying therapy in people with type 2 diabetes treated with basal insulin. Diabetes Obes Metab. 2016;18(4):401-409.
26. Paul SK, Klein K, Thorsted BL, Wolden ML, Khunti K. Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovasc Diabetol. 2015;14(1):100.
27. Paul SK, Shaw J, Montvida O, Klein K. Weight gain in insulin treated patients by BMI categories at treatment initiation: new evidence from real-world data in patients with type 2 diabetes. Diabetes Obes Metab. 2016, doi:10.1111/dom.12761. [Epub ahead of print].

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Montvida O, Klein K, Kumar S, Khunti K and Paul SK. Addition of or switch to insulin therapy in people treated with glucagon-like peptide-1 receptor agonists: A real-world study in 66 583 patients, Diabetes Obes Metab, 2016. doi: 10.1111/dom.12790

NEWS & VIEWS

DIABETES

Incretin mimetics and insulin — closing the gap to normoglycaemia

Michael A. Nauck and Juris J. Meier

Treatment of type 2 diabetes mellitus with GLP1 receptor agonists can result in long-term glycaemic control or can fail over time, in which case insulin can be used as an alternative or as an additive treatment. New research shows that the latter is more likely to achieve glycaemic targets than the former.

Refers to Montvida, O. et al. Addition or switch to insulin therapy in people treated with GLP‑1 receptor agonists: a real world study in 66,583 patients. Diabetes Obes. Metab. http://dx.doi.org/10.1111/dom.12790 (2016)

Using data obtained from the Centricity Electronic Medical Record database, which documents patient and treatment data from >35,000 USA‑based physicians or other health-care providers, Montvida et al.1 present data on changes in HbA1c levels, body weight, systolic blood pressure and LDL cholesterol levels in patients with type 2 diabetes mellitus (T2DM) who received treatment with glucagon-like peptide 1 receptor agonists (GLP1‑RAs; specifically, exenatide and liraglutide). Patient data were examined between 6 months and 24 months after the initiation of GLP1‑RA treatment in ~67,000 individuals with T2DM. After a minimum of 6 months of treatment with GLP1‑RAs, 33.5% of patients continued with GLP1‑RA treatment alone for at least 24 months, 7.1% of patients were switched to insulin treatment and simultaneously discontinued GLP1‑RA treatment and 59.9% of patients received insulin (of any preparation) in addition to GLP1‑RA treatment.

In patients initially receiving GLP1‑RA treatment and continuing with this treatment alone for up to 24 months, the greatest reduction in HbA1c levels were observed within 6 months of starting treatment, with negligible changes observed after 6 months. Patients in whom GLP1‑RA treatment was discontinued and replaced with insulin treatment did not show improved glycaemic control by switching treatment. However, if insulin was added to GLP1‑RA treatment, a net improvement in glycaemic control was observed. This net improvement was greater if insulin was added earlier rather than later, relative to the time point at which GLP1‑RA treatment alone failed to provide sufficient glycaemic control1.

GLP1‑RAs have been approved as glucose-lowering treatments for patients with T2DM. Usually, these agonists are added to one or more oral glucose-lowering agents (for example, metformin or sulfonylureas), as alternatives to insulin treatment. However, GLP1‑RAs can also be used with insulin treatment, for which long-acting insulin preparations such as insulin glargine2, insulin detemir3 or insulin degludec4 are most often used. This combination can be achieved by adding GLP1‑RAs to pre-existing insulin treatment2, by adding insulin to pre-existing GLP1‑RA treatment3 or by introducing both treatments at the same time, in the latter case fixed-dose combinations can be used, such as IDegLira (insulin degludec plus liraglutide)4 or LixiLan (insulin glargine plus lixisenatide)5. Prospective, randomized controlled trials have demonstrated achievement of HbA1c levels <7% in 60–85% of patients using these approaches2–5.

The findings of Montvida et al.1 confirm that results obtained from randomized, prospective trials can be replicated in the ‘real world’, as the database used contained results from diagnostic procedures (such as measuring HbA1c levels, body weight and LDL cholesterol levels) and the prescription of medication (including the date of initiation and the duration of continuous use of drugs, such as GLP1‑RAs or insulin treatment). At least qualitatively, the results of Montvida et al.1 confirm the findings reported in clinical trials2–5 and in meta-analyses based on such trials6,7. Although about one-third of patients can achieve HbA1c concentrations in the target range with GLP1‑RAs alone and will maintain treatment with this therapy for a long period of time, such long-term treatment efficacy is not observed in other patients. Furthermore, the necessity for treatment intensification can occur earlier or later during the course of treatment with GLP1‑RAs. In head‑to‑head trials comparing GLP1‑RAs with insulin (mainly basal insulin) treatment, both approaches lead to similar degrees of glycaemic control when used as separate treatment strategies8. Small but statistically significant differences in efficacy favouring GLP1‑RAs over insulin treatments have been noted. This effect seems to persist even in patients with high baseline HbA1c levels, in whom one might intuitively assume a better response to insulin therapy than to GLP1‑RAs9. As the clinical efficacy of GLP1‑RA treatment or insulin regimens is similar, the lack of net changes in glycaemic control when switching from one to the other, as shown by Montvida et al.1, is not surprising.

An unexpected result of the analysis by Montvida et al.1 is that adding insulin treatment to GLP1‑RA treatment does not increase body weight; rather, weight loss was greater in patients adding insulin to their regimen. In studies directly comparing GLP1‑RAs and insulin as single injectable agents and in combination4,5, weight reduction for the combination treatment was attenuated compared with the GLP1‑RA treatment alone, with insulin treatment alone promoting weight gain. Whether body weight was assessed by using calibrated scales or less reliable information provided by the patient is not known, which points to some limitations of such ‘real world’ approaches.

Another interesting and novel aspect of the study by Montvida et al.1 is the change in HbA1c levels before intensification of the therapy was considered necessary. If a patient responded well to GLP1‑RA treatment, the reduction in HbA1c levels was evident after 6 months of treatment, with little further change in HbA1c levels with time. However, treatment intensification (either a switch to insulin treatment or the addition of insulin treatment to GLP1‑RA treatment) was preceded by an elevation in HbA1c levels that was usually only evident during the 6‑month period before the requirement for treatment change. Thus, as with the trajectories of measures of insulin secretory capacity and insulin resistance before the manifestation of T2DM10, short-term dynamics in important determinants of glycaemic control are also evident during the transition from satisfactory glycaemic control with a glucose-lowering medication (in this case GLP1‑RAs) to treatment failure, indicating the need for a rapid response when implementing a treatment intensification. Increased granularity, enabling the examination of these relationships at a higher temporal resolution, is not possible with the present data set, but should be an area of interest for future research.

One might also argue whether clinical inertia (that is, an undue delay in treatment intensification), is at work in these circumstances, or whether the hesitation to escalate treatment rather reflects physicians’ concerns about hypoglycaemia, other treatment-related adverse effects or monetary considerations. In accordance with these concerns, long-term adherence to GLP1‑RAs is still suboptimal, probably owing to treatment-emergent adverse effects.

The study by Montvida et al.1 also shows that even with the combination of GLP1‑RAs and insulin, the two drug classes considered to be most effective in terms of reducing HbA1c levels, only a minority of patients achieve an HbA1c target of <7%, in contrast with the higher efficacy reported in current short-term clinical trials. Addition of insulin to GLP1‑RA treatment is, therefore, suggested to be more effective than switching from a GLP1‑RA to insulin in reducing HbA1c levels, thus offering one effective way to narrow the gap to near-normoglycaemia. However, despite these clear advances in the treatment of T2DM, a large proportion of patients still fail to reach normoglycaemia with current glucose-lowering strategies (FIG. 1). Continued efforts will be needed, to develop novel treatment strategies, reduce treatment-related adverse effects, optimize treatment adherence and refine current combination strategies.

[Figure 1: schematic of treatment escalation with target achievement in clinical trials versus real-world practice. Oral glucose-lowering drugs: 40–60% versus 20–40% of patients. Next step (traditional approach: add basal insulin; novel approaches 1 and 2: add a GLP1 analogue or add basal insulin): 40–70% versus 20–60% of patients. Further steps (add short-acting insulin at one meal, then at all meals; add a GLP1 analogue; add basal insulin; add prandial insulin), marked as current target achievement: 60–85% versus 20–60% of patients. Future unknown strategies: 100% of patients (normoglycaemia); the gap between current achievement and this potential is labelled residual glycaemic burden.]

Figure 1 | Glycaemic target achievement resulting from different treatment approaches. The achievement of glycaemic targets in clinical trials and in real world studies of patients with type 2 diabetes mellitus (T2DM) differs with conventional and novel treatment approaches, and at different stages of treatment escalation. The lack of achievement of normoglycaemia in all patients illustrates the need for further advances in the treatment of T2DM. The gap between the target achievement with current combination treatment approaches and normoglycaemia is called the ‘residual glycaemic burden’.

Michael A. Nauck and Juris J. Meier are at the Division of Diabetology, Department of Medicine, St Josef-Hospital, Ruhr University Bochum, Gudrunstraße 56, D-44791, Bochum, Germany.
[email protected]; [email protected]
doi:10.1038/nrendo.2016.180
Published online 10 Nov 2016

1. Montvida, O., Kleine, K., Kumar, S., Khunti, K. & Paul, S. K. Addition or switch to insulin therapy in people treated with GLP‑1 receptor agonists: a real world study in 66,583 patients. Diabetes Obes. Metab. http://dx.doi.org/10.1111/dom.12790 (2016).
2. Buse, J. B. et al. Use of twice-daily exenatide in basal insulin-treated patients with type 2 diabetes: a randomized, controlled trial. Ann. Intern. Med. 154, 103–112 (2011).
3. DeVries, J. H. et al. Sequential intensification of metformin treatment in type 2 diabetes with liraglutide followed by randomized addition of basal insulin prompted by A1C targets. Diabetes Care 35, 1446–1454 (2012).
4. Gough, S. C. et al. Efficacy and safety of a fixed-ratio combination of insulin degludec and liraglutide (IDegLira) compared with its components given alone: results of a phase 3, open-label, randomised, 26‑week, treat‑to‑target trial in insulin-naive patients with type 2 diabetes. Lancet Diabetes Endocrinol. 2, 885–893 (2014).
5. Rosenstock, J. et al. Benefits of LixiLan, a titratable fixed-ratio combination of insulin glargine plus lixisenatide, versus insulin glargine and lixisenatide monocomponents in type 2 diabetes inadequately controlled with oral agents: the LixiLan‑O randomized trial. Diabetes Care http://dx.doi.org/10.2337/dc16‑0917 (2016).
6. Balena, R., Hensley, I. E., Miller, S. & Barnett, A. H. Combination therapy with GLP‑1 receptor agonists and basal insulin: a systematic review of the literature. Diabetes. Obes. Metab. 15, 485–502 (2013).
7. Eng, C., Kramer, C. K., Zinman, B. & Retnakaran, R. Glucagon-like peptide‑1 receptor agonist and basal insulin combination treatment for the management of type 2 diabetes: a systematic review and meta-analysis. Lancet 384, 2228–2234 (2014).
8. Abdul-Ghani, M. A., Williams, K., Kanat, M., Altuntas, Y. & DeFronzo, R. A. Insulin versus GLP‑1 analogues in poorly controlled type 2 diabetic subjects on oral therapy: a meta-analysis. J. Endocrinol. Invest. 36, 168–173 (2013).
9. Buse, J. B. et al. Is insulin the most effective injectable antihyperglycaemic therapy? Diabetes Obes. Metab. 17, 145–151 (2015).
10. Tabak, A. G. et al. Trajectories of glycaemia, insulin sensitivity, and insulin secretion before diagnosis of type 2 diabetes: an analysis from the Whitehall II study. Lancet 373, 2215–2221 (2009).

Competing interests statement
M.A.N. declares that he has received personal fees, grants, non-financial support or other support from AstraZeneca, Berlin Chemie‑AG, Boehringer Ingelheim, Eli Lilly, GlaxoSmithKline, Hoffmann La Roche, Intarcia Therapeuticals, Janssen Global Services, Medscape LLC, Merck Sharp & Dohme, Novartis, Novo Nordisk, Sanofi-Aventis and Versartis. J.J.M. declares that he has received grants or personal fees from Astra Zeneca, Berlin-Chemie, Boehringer-Ingelheim, Eli Lilly, MSD, NovoNordisk, Sanofi and Servier.

NATURE REVIEWS | ENDOCRINOLOGY www.nature.com/nrendo ©2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

APPENDIX B


The Open Bioinformatics Journal, 2017, 10, 16-27

Content list available at: www.benthamopen.com/TOBIOIJ/

DOI: 10.2174/1875036201710010016

RESEARCH ARTICLE

Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus

Ebenezer S. Owusu Adjah1,2, Olga Montvida1,3, Julius Agbeve1 and Sanjoy K. Paul4,*

1 QIMR Berghofer Medical Research Institute, Brisbane, Australia
2 Faculty of Medicine, The University of Queensland, Brisbane, Australia
3 School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Faculty of Health, Queensland University of Technology, Brisbane, Australia
4 Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia

Received: August 17, 2017 Revised: November 28, 2017 Accepted: November 29, 2017

Abstract:

Background: Identification of diseased patients from primary care based electronic medical records (EMRs) has methodological challenges that may impact epidemiologic inferences.

Objective: To compare deterministic clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population-based EMRs in a nationally representative primary care database.

Methods: Four cohorts of patients with T2DM were defined by a deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM, and the performance of six ML algorithms was compared based on cross-validated true positive rate, true negative rate, and area under the receiver operating characteristic curve.

Results: In the database of 11,018,025 research-suitable individuals, 379,657 (3.4%) were coded as having T2DM. The logistic regression classifier was selected as the best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with a T2DM code were not included in this ML cohort. Of those in the ML cohort without a disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication.

Conclusion: Deterministic cohort selection based on disease coding potentially introduces a significant misclassification problem. ML techniques allow testing for potential disease predictors and, given meaningful data input, are able to identify diseased cohorts in a holistic way.

Keywords: Electronic Medical Records, Primary Care Database, Machine Learning Algorithm, Diabetes, Type 2 Diabetes, Cohort Identification.

* Address correspondence to this author at the Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia; Tel: +61-3-93428433; E-mail: [email protected]

1875-0362/17 2017 Bentham Open


1. INTRODUCTION

Recent advances in the design and implementation of large patient-level electronic medical records (EMRs) from national primary care databases have created opportunities in clinical, epidemiological and public health research [1, 2]. In a typical primary or ambulatory care setting, large volumes of data are generated as patients go through various phases of treatment. Individual patients’ longitudinal data on demographics, lifestyle, disease and treatment history, clinical and laboratory parameters, hospitalization statistics, and clinical events are typically organized and stored in the form of a relational database. Such databases present unique challenges in terms of efficient and effective extraction of data for various investigative interests [3]. One of the challenging aspects in this context is the identification of disease cohorts for retrospective or prospective clinical epidemiological studies [4, 5].

Diagnostic codes, such as the International Classification of Diseases (ICD) codes or Read codes [6], are generally used to identify disease cohorts from EMRs [4]. The reliability of diagnosis coding for various diseases has been extensively examined for many primary care databases, including The Health Improvement Network (THIN) database from the United Kingdom [7-9]. However, there are four specific issues in relation to identifying cohorts by diagnostic codes: (1) differentiating between disease subtypes from high-level codes, (2) overlapping codes of disease subtypes longitudinally at individual patient level, (3) absence of codes for diseased patients (false negatives), and (4) presence of disease-specific codes for patients without the specific disease (false positives).

With regard to diabetes mellitus (DM), identification and appropriate classification of different types of diabetes in primary care databases are particularly challenging [5, 10-13]. These challenges relate mostly to inaccurate coding leading to misclassification, misdiagnosis, and undiagnosed diabetes [12]. Algorithms based on laboratory, clinical, and medication data have thus been proposed as tools for distinguishing between type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM) [10, 14-16]. However, the overall accuracy and reliability of derived disease cohorts based on diagnostic codes can be improved by implementing advanced machine learning (ML) or statistical data mining techniques and clinically guided cohort selection algorithms that robustly capture the comprehensive patient-level information available in the EMRs [4, 5, 12, 13]. Shivade and colleagues (2014) conducted a systematic review of various techniques used for the identification of different disease cohorts from different sources of clinical databases [2]. Some of these proposed algorithms have been criticized for their appropriateness in the context of other studies [17].

While several studies have compared or applied ML techniques to identify T2DM patients, to the best of our knowledge, no study has employed an extensive assessment of diagnostic codes, deterministic clinical selection algorithms, and ML algorithms simultaneously to identify T2DM cohorts from primary care EMRs. The aims of this exploratory methodological study were to (1) explore technical challenges in the extraction of disease cohorts, (2) compare the ability of different clinically guided cohort selection algorithms to identify the disease cohorts, and (3) compare the disease cohorts identified by ML algorithms and clinically guided cohort selection algorithms using a large nationally representative primary care database from the UK.

2. MATERIALS AND METHODS

In this section, we introduce the primary care database, describe the challenges in identifying a cohort of patients with a specific disease (i.e. T2DM), and explain the clinically guided cohort selection algorithms and the data mining and computational processes leading to the comparison of different supervised ML techniques.

2.1. Data Source

Data from The Health Improvement Network (THIN), a patient-level primary care database from the UK, were used in this study. THIN is an ongoing primary care database of medical records of anonymized patients from general practitioners; it covers over 600 UK general practices and has been linked to the hospital episode statistics (HES) and other statistics from the National Bureau of Statistics. Longitudinal patient-level records have been collected since 1990, and the current version of the database holds more than 13 million individual patient records. The patients included in this database are representative of the UK population by age, gender, medical conditions and death rates adjusted for demographics and social deprivation. The accuracy and completeness of the THIN database have been described elsewhere [18, 19]. The THIN database is considered one of the most comprehensive patient-level databases available globally, and has been extensively used by researchers and government bodies for clinical, epidemiological and public health related studies [20]. The database contains extensive demographic, clinical, laboratory, medication and event history data on individuals. The study protocol was approved by the Independent Scientific Review Committee for the THIN database (Protocol Number: 15THIN030) and the Institutional Review Board of QIMR Berghofer Medical Research Institute.

2.2. Challenges in Identifying Disease Cohort

THIN uses the UK’s standard Read code classification system, which is useful for hierarchical classification of patients’ specific circumstances and lifestyles, thereby enhancing scalability and retrieval [6]. However, the Read coding system is complex, as a disease or an encounter with a general practitioner can be coded in several ways, including the use of existing codes or the creation of new user-defined codes [21]. In this way, considerable variation and inconsistency are introduced into the coding system, as observed in the case of DM [11, 14, 22].

2.2.1. Differentiating Between Disease Subtypes

Typically, many diabetes-related codes are available for a single patient, some of which are high-level codes (e.g. C10 - “Diabetes mellitus”) or disease-related codes that are unspecific in the description of the diabetes type (e.g. C106.12 - “Diabetes mellitus with neuropathy”). Common practice has been to exclude any high-level codes [14, 23], which may lead to underestimation of the disease cohort. When it is impossible to identify the disease subtype (type 1 or type 2 diabetes) from the diagnostic codes, data on surrogate markers (like glutamic acid decarboxylase) could be useful, but such information is not available in the THIN database. Nevertheless, combinations of available biomarkers (such as age, weight or HbA1c) and medication prescriptions have been used to distinguish types of diabetes in some studies [10, 14].

2.2.2. Longitudinally Overlapping Disease Subtypes

Patients may have different disease subtypes coded longitudinally as a result of data entry errors or biological progression of the disease. While the former can lead to any combination of subtypes, the latter may result in developing T1DM from T2DM or T2DM from gestational diabetes. To distinguish between contradictory codes, longitudinal exploratory techniques were applied in some studies [5]. The techniques described above that deal with unspecific codes may also be considered. To address the issue of longitudinally contradictory diagnostic codes, the following rules were adopted to distinguish between T1DM and T2DM:

i. Use of Read codes that uniquely distinguish between T1DM and T2DM.
ii. In patients with unspecific codes, or longitudinally overlapping subtypes, the following is used:
   a. If oral antidiabetic drug (ADD) is taken ≥ 2 months, then T2DM.
   b. Otherwise, if age at first available diagnosis date ≤ 18 years and insulin initiated within 1 year, then T1DM.
   c. Otherwise, if age at first available diagnosis date > 18 years and insulin initiated within 3 months, then T1DM.
   d. Else T2DM.
iii. Patients with codes for gestational diabetes and other forms of diabetes were not included in this study.
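The rule cascade above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PatientSummary` fields are hypothetical names rather than THIN columns, "within 1 year" and "within 3 months" are read as inclusive limits of 12 and 3 months, and the exclusions in rule iii are assumed to have been applied upstream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary; field names are illustrative only."""
    has_unique_t1dm_code: bool
    has_unique_t2dm_code: bool
    oral_add_months: float                     # cumulative months on oral ADD
    age_at_first_diagnosis: float              # years
    months_to_insulin: Optional[float] = None  # None if insulin never initiated


def classify_diabetes_type(p: PatientSummary) -> str:
    # Rule i: exactly one subtype-specific code decides directly.
    if p.has_unique_t1dm_code != p.has_unique_t2dm_code:
        return "T1DM" if p.has_unique_t1dm_code else "T2DM"
    # Rule ii: unspecific codes (neither flag) or overlapping subtypes (both).
    if p.oral_add_months >= 2:                                   # ii.a
        return "T2DM"
    insulin = p.months_to_insulin
    if insulin is not None:
        if p.age_at_first_diagnosis <= 18 and insulin <= 12:     # ii.b
            return "T1DM"
        if p.age_at_first_diagnosis > 18 and insulin <= 3:       # ii.c
            return "T1DM"
    return "T2DM"                                                # ii.d


# An adult with overlapping codes, no oral ADD, insulin started after 2 months.
example = PatientSummary(True, True, 0, 40, 2)
print(classify_diabetes_type(example))
```

The order of the checks matters: a sufficiently long oral ADD exposure takes precedence over early insulin initiation, as in rule ii.a.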

2.2.3. Absence of Codes for Patients with Disease and Presence of Codes for Patients without Disease

Data entry errors (such as omissions, typing errors and communication errors) and patients’ temporary loss to follow-up in EMRs usually result in a relatively small number of false positives and a larger number of false negatives among patients identified by diagnostic codes. Earlier studies have addressed this complex issue by employing deterministic or probabilistic algorithms [2, 15, 16]. We further focus on this challenging aspect by comparing deterministic (clinically guided) and probabilistic (ML) cohort identification approaches.

2.3. Clinically Guided Cohort Selection Algorithms

Four separate cohorts were created by applying logical, clinically guided algorithms that select patients from those who have at least one record of a Read code for T2DM (Fig. 1). Specifically, the T2DM cohorts were selected on the basis of available records for:

i. Selection algorithm 1: T2DM Read code (Cohort 1);
ii. Selection algorithm 2: Lifestyle modification intervention + T2DM Read code (Cohort 2);
iii. Selection algorithm 3: At least one prescription for antidiabetic medication + lifestyle modification intervention + T2DM Read code (Cohort 3);
iv. Selection algorithm 4: (At least one prescription for antidiabetic medication or lifestyle modification intervention) + T2DM Read code (Cohort 4).
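The four selection algorithms amount to boolean filters on three per-patient flags. The following sketch assumes those flags have already been derived from the records; the function and argument names are illustrative and do not come from the paper.

```python
def select_cohorts(has_t2dm_code: bool, has_add_rx: bool, has_lifestyle: bool) -> set:
    """Return the cohort numbers (1-4) that a patient qualifies for."""
    cohorts = set()
    if not has_t2dm_code:          # every algorithm requires a T2DM Read code
        return cohorts
    cohorts.add(1)                 # algorithm 1: Read code only
    if has_lifestyle:
        cohorts.add(2)             # algorithm 2: + lifestyle intervention
    if has_add_rx and has_lifestyle:
        cohorts.add(3)             # algorithm 3: + medication AND lifestyle
    if has_add_rx or has_lifestyle:
        cohorts.add(4)             # algorithm 4: + medication OR lifestyle
    return cohorts


# A coded patient with a prescription but no recorded lifestyle intervention.
print(sorted(select_cohorts(True, True, False)))
```

Written this way, the nesting is explicit: Cohort 3 is a subset of Cohort 2, which is a subset of Cohort 4, which is a subset of Cohort 1. This agrees with the ordering of the reported cohort sizes (197,326 < 243,597 < 346,993 < 379,657).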

[Flow chart: of all patients with a valid record (n=11,018,025), 10,487,077 had no record of DM and 530,948 had any type of DM. After excluding type 1 diabetes (n=46,238), gestational diabetes (n=15,814), prediabetes (n=86,800) and other types (n=2,439), selection algorithm 1 gives n=379,657 (mean age 60 years, 55% male); selection algorithm 2 gives n=243,597 (mean age 59 years, 55% male); selection algorithm 3 gives n=197,326 (mean age 58 years, 56% male); selection algorithm 4 gives n=346,993 (mean age 60 years, 55% male).]

Fig. (1). Flow chart for the selection of type 2 diabetes (T2DM) cohorts by clinically guided algorithms.
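The counts reported in Fig. (1) can be checked for internal consistency with simple arithmetic. The snippet below only re-derives numbers stated in the figure; the variable names are illustrative.

```python
all_patients = 11_018_025
no_dm = 10_487_077
excluded = {
    "type 1 diabetes": 46_238,
    "gestational diabetes": 15_814,
    "prediabetes": 86_800,
    "other types": 2_439,
}

any_dm = all_patients - no_dm                 # individuals with any type of DM
cohort_1 = any_dm - sum(excluded.values())    # T2DM by selection algorithm 1
share = round(100 * cohort_1 / all_patients, 1)

print(any_dm, cohort_1, share)  # prints: 530948 379657 3.4
```

Both derived values match the figure (530,948 and 379,657), and the share matches the 3.4% quoted in the abstract.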

2.4. Supervised Machine Learning Techniques

The process of selecting the single most appropriate probabilistic algorithm to identify patients with T2DM is described below.

2.4.1. Feature Selection

The THIN database was mined to detect the most frequent medications, comorbidities, and laboratory and anthropometric measurements among patients with T2DM identified on the basis of Read codes. The resulting 280 variables were combined with current clinical considerations, practices and guidelines for T2DM management [24], and 11 potential disease predictors were obtained through an iterative process (Table 1). The Correlation-based Feature Selection (CFS) algorithm was applied to determine the best of these predictors [25, 26]. This scheme-independent attribute subset selection approach is particularly useful when attributes are correlated with one another and with the class attribute. Bi-directional, forward and backward greedy search methods were applied using 10-fold cross-validation [27], and they all agreed on the same seven features described in Table 1.
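To make the idea behind CFS concrete, the sketch below scores a feature subset with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, and runs a greedy forward search. It is an illustration under assumptions, not the procedure used in the study: absolute Pearson correlation stands in for the correlation measure, only the forward direction is shown, cross-validation is omitted, and all names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cfs_merit(features, target, subset):
    """CFS merit of a subset, using mean absolute correlations."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedy forward search: add the best feature while merit improves."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        merit, f = max((cfs_merit(features, target, selected + [c]), c)
                       for c in sorted(remaining))
        if merit <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected, best


# Toy data: "elevated_glucose" tracks the class exactly, "noise" only weakly.
target = [0, 0, 1, 1, 0, 1]
features = {"elevated_glucose": [0, 0, 1, 1, 0, 1], "noise": [1, 0, 1, 0, 0, 1]}
selected, merit = forward_select(features, target)
print(selected)
```

The merit rewards features that correlate with the class and penalises redundancy among the selected features, which is why a weak, partly redundant feature is not added once a strong one is present.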

2.4.2. Training Dataset

From the 11,018,025 patients in the THIN database, a training dataset of 150,000 instances, containing equal numbers of positive and negative representatives, was extracted. Positive instances were randomly selected from patients with (1) an available T2DM Read code, (2) at least one year of follow-up, and (3) an age of 18-90 years at the time of T2DM diagnosis. Negative instances were randomly selected from those without a Read code for any subtype of DM and with at least one year of follow-up (Fig. (2), training set).

Table 1. Features selected as best T2DM predictors.

| # | Feature Name | Feature Type | Selected for ML |
|---|---|---|---|
| 1 | Two measurements of HbA1c > 6% or fasting blood glucose > 7 mmol/l or random blood glucose > 11.1 mmol/l within 1 year. | Binary | Yes |
| 2 | Any antidiabetic drug prescriptions for at least 6 months. | Binary | Yes |
| 3 | Average BMI. | Continuous | Yes |
| 4 | Hypertension diagnosis or antihypertensive drug use greater or equal to 6 months or beta blockers prescription for 6 months or more. | Binary | Yes |
| 5 | Chronic kidney diagnosis. | Binary | Yes |
| 6 | Retinopathy or neuropathy diagnosis. | Binary | Yes |
| 7 | Average systolic blood pressure. | Continuous | Yes |
| 8 | Lifestyle modification advice. | Binary | No |
| 9 | Average HbA1c. | Continuous | No |
| 10 | Average random glucose | Continuous | No |
| 11 | Heart failure or myocardial infarction or stroke or coronary artery disease | Binary | No |

[Flowchart: all patients with a valid record (n=11,018,025) are divided into T2DM (Cohort 1; n=379,657), no record of DM (n=10,487,077), and excluded groups: type 1 diabetes (n=46,238), gestational diabetes (n=15,814), prediabetes (n=86,800) and other types (n=2,439). Of the T2DM patients, those with follow-up ≥ 1 year and age at diagnosis 18-90 years (n=350,201) form the pool from which 75,000 positive instances are randomly selected. Of the patients with no record of DM, those with follow-up ≥ 1 year (n=9,587,202) form the pool from which 75,000 negative instances are randomly selected. Training set: n=150,000. Prediction set: n=9,937,403.]

Fig. (2). Flowchart of creating dataset for machine learning training, and of dataset for predicting diabetes status.
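The balanced sampling shown in Fig. (2) can be sketched as follows. This is a minimal illustration assuming the two eligible pools are available as lists of patient identifiers; the function name, the seed and the toy identifiers are hypothetical.

```python
import random


def build_training_set(positive_ids, negative_ids, n_per_class, seed=0):
    """Draw equal numbers of positive and negative instances without replacement."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    pos = rng.sample(positive_ids, n_per_class)
    neg = rng.sample(negative_ids, n_per_class)
    training = [(pid, 1) for pid in pos] + [(pid, 0) for pid in neg]
    rng.shuffle(training)                      # avoid class-ordered rows
    return training


# Toy pools standing in for the eligible T2DM and no-DM patients.
positives = list(range(100))
negatives = list(range(100, 300))
training = build_training_set(positives, negatives, n_per_class=20)
print(len(training), sum(label for _, label in training))  # prints: 40 20
```

In the study, `n_per_class` corresponds to 75,000, drawn from pools of 350,201 and 9,587,202 patients; these two pools together give the 9,937,403 patients of the prediction set.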

2.4.3. Classification Algorithm Selection

Keeping the selected subset of 7 robust predictors of T2DM, six classification algorithms were applied to the training set. Ten repeats of 10-fold cross-validation were used to calculate the true positive rate (sensitivity), true negative rate (specificity), and area under the receiver operating characteristic curve (AUC). The percentage of correctly classified instances and the central processing unit (CPU) time required for training the algorithms were also derived. The algorithms compared were: Naïve Bayes [28, 29], Logistic regression [30], Support Vector Machine (SVM) [31, 32], Multilayer Perceptron (MP) [33], Decision Tree with J48 modification [34], and One Rule [35]. The One Rule algorithm performed significantly worse than the others. Apart from differences in CPU time, the performance of the other algorithms was similar. Among them, Naïve Bayes had lower sensitivity, misclassifying approximately 500 additional patients compared to the other approaches. AUC was smaller for SVM and J48, while SVM and MP required significantly more CPU time (Table 2). Interestingly, neither body mass index nor blood pressure contributed significantly to any model. Logistic regression was selected as the most appropriate model for predicting T2DM. The model obtained from the full training dataset was applied to all THIN database patients with no record of a Read code for a diabetes diagnosis other than T2DM and with at least one year of available follow-up (Fig. (2), prediction set).

Table 2. Performance of machine learning algorithms on the training dataset.

| | Naïve Bayes | Logistic Regression | Multilayer Perceptron | Support Vector Machine | J48 Decision Tree | One Rule |
|---|---|---|---|---|---|---|
| Percent correct | 95.6 | 95.9 | 95.9 | 95.9 | 95.9 | 91.7 |
| TPR | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| TNR | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.84 |
| AUC | 0.98 | 0.98 | 0.98 | 0.96 | 0.96 | 0.92 |
| CPU time | 0.09 | 3.36 | 68.03 | 191.9 | 1.78 | 0.21 |

TPR: True Positive Rate; TNR: True Negative Rate; AUC: Area Under receiver operating characteristic Curve; CPU: Central Processing Unit.

3. RESULTS

The distributions of basic characteristics of patients identified by all four clinically guided algorithms and the ML algorithm were similar (Table 3). Clinically guided algorithms 1-4 and the ML algorithm resulted in cohorts of 379,657; 243,597; 197,326; 346,993; and 383,330 patients with T2DM respectively. For patients identified by the ML algorithm who did not have a Read code, the first available date of entry of the significant predictors was used as their date of diagnosis. At the time of diabetes diagnosis, identified patients were on average 60 years old and weighed 86 kg, and 55% were male. The proportions of those who had two elevated glucose level measurements within one year were 75, 86, 90, 79, and 82% in cohorts identified by selection algorithms 1-4 and ML respectively. With a median of 11 years of follow-up post diagnosis, the proportions of those who received at least one prescription for antidiabetic medication were 79, 81, 100, 87, and 75% in cohorts identified by selection algorithms 1-4 and ML respectively.

Among the cohort of T2DM patients identified by the ML algorithm, 319,979 (83% of 383,330) patients had a Read code for T2DM (Table 4). It is worth noting that 59,678 (16% of 379,657) patients with a record of a T2DM Read code were not selected by the ML approach. Almost a fifth (17% of 383,330) of the patients in the ML cohort were without a record of a T2DM Read code. Of them, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication (Table 4).
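As a worked check of how the ML cohort overlaps with the Read-code cohort, the short sketch below recomputes the reported counts and percentages from the cohort sizes. It uses the Table 4 count of 319,979 ML-identified patients with a Read code; the variable names and the rounding helper are illustrative assumptions.

```python
ml_cohort = 383_330          # patients identified by the ML algorithm
read_code_cohort = 379_657   # selection algorithm 1: Read code only
ml_with_read_code = 319_979  # ML-identified patients with a Read code (Table 4)

# Patients found by ML but lacking a Read code, and the reverse.
ml_without_read_code = ml_cohort - ml_with_read_code
read_code_not_in_ml = read_code_cohort - ml_with_read_code


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * part / whole)


print(ml_without_read_code, pct(ml_without_read_code, ml_cohort))
print(read_code_not_in_ml, pct(read_code_not_in_ml, read_code_cohort))
print(pct(ml_with_read_code, ml_cohort))
```

The three derived figures agree with the 63,351 (17%), 59,678 (16%) and 83% quoted in the Results and Table 4.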
To assess the proportion of patients that remained undetected by the algorithms used in this study, a complement cohort-specific analysis was performed (data not shown). Among patients not selected by ML as T2DM, only 884 patients had at least two elevated glucose measurements (HbA1c > 6% or fasting blood glucose > 7 mmol/l or random blood glucose > 11.1 mmol/l) within 1 year, compared to 32,039, 106,671, 137,796, and 42,583 patients not selected by selection algorithms 1-4.

Table 3. Baseline characteristics of T2DM patients identified by selection algorithms and logistic regression classifier (ML).

| | Selection Algorithm 1 | Selection Algorithm 2 | Selection Algorithm 3 | Selection Algorithm 4 | ML |
|---|---|---|---|---|---|
| Patients, n | 379,657 | 243,597 | 197,326 | 346,993 | 383,330 |
| Age at diagnosis (years) α | 60 (15) | 59 (14) | 58 (14) | 60 (15) | 59 (15) |
| Age at diagnosis (years) * | 61 (50,71) | 60 (50,69) | 58 (49,67) | 60 (50,70) | 60 (50,70) |
| ≤40 | 32,644 (9) | 19,761 (8) | 17,969 (9) | 29,701 (9) | 71,752 (19) |
| 41-50 | 62,656 (17) | 43,872 (18) | 39,289 (20) | 59,608 (17) | 58,813 (15) |
| 51-60 | 90,464 (24) | 62,610 (26) | 54,006 (27) | 85,587 (25) | 84,277 (22) |
| 61+ | 193,893 (51) | 117,354 (48) | 86,062 (44) | 172,097 (50) | 168,488 (44) |
| Male # | 208,155 (55) | 134,393 (55) | 110,178 (56) | 191,107 (55) | 200,447 (52) |

184 22 The Open Bioinformatics Journal , 2017, Volume 10 Owusu Adjah et al.

(Table 3) contd.

| | Selection Algorithm 1 | Selection Algorithm 2 | Selection Algorithm 3 | Selection Algorithm 4 | ML |
|---|---|---|---|---|---|
| At least one prescription # | 300,722 (79) | 197,326 (81) | 197,326 (100) | 300,722 (87) | 287,095 (75) |
| Prescription duration ≥ 6 months # | 243,064 (64) | 171,800 (71) | 171,800 (87) | 243,064 (70) | 254,255 (66) |
| RBG (mmol/l) α § | 11.5 (5.1) | 11.4 (5.1) | 12.1 (5.3) | 11.6 (5.2) | 11.3 (5.2) |
| RBG (mmol/l) α ‡ | 9.5 (3.4) | 9.4 (3.3) | 9.9 (3.4) | 9.6 (3.4) | 9.1 (3.5) |
| FBG (mmol/l) α § | 8.4 (2.3) | 8.4 (2.3) | 8.9 (2.4) | 8.5 (2.3) | 8.3 (2.3) |
| FBG (mmol/l) α ‡ | 7.8 (2.1) | 7.7 (2.0) | 8.0 (2.1) | 7.8 (2.1) | 7.5 (2.1) |
| HbA1c (%) α § | 8.4 (2.1) | 8.4 (2.1) | 8.7 (2.2) | 8.5 (2.2) | 8.3 (2.1) |
| HbA1c (%) α ‡ | 7.5 (1.4) | 7.5 (1.3) | 7.7 (1.3) | 7.5 (1.4) | 7.4 (1.3) |
| Composite measure # ‡ | 283,419 (75) | 208,787 (86) | 177,689 (90) | 272,875 (79) | 314,574 (82) |
| Weight (kg) α § | 89.4 (20.8) | 90.3 (21.0) | 91.1 (21.1) | 89.6 (20.9) | 89.3 (21.0) |
| Weight (kg) α ‡ | 85.0 (19.8) | 86.6 (19.9) | 87.6 (20.0) | 85.5 (19.8) | 86.1 (20.6) |
| BMI (kg/m2) α § | 31.6 (6.7) | 32.0 (6.7) | 32.2 (6.7) | 31.7 (6.7) | 31.7 (6.8) |
| BMI (kg/m2) α ‡ | 30.2 (6.1) | 30.7 (6.1) | 31.0 (6.2) | 30.4 (6.1) | 30.7 (6.7) |
| Normal weight # | 22,311 (12) | 15,821 (11) | 12,339 (11) | 21,108 (12) | 24,453 (13) |
| Overweight # | 58,447 (32) | 44,283 (32) | 35,289 (31) | 55,885 (32) | 61,846 (32) |
| Grade 1 obese # | 52,465 (29) | 41,323 (30) | 33,669 (30) | 50,423 (29) | 55,684 (29) |
| Grade 2 obese # | 27,168 (15) | 22,163 (16) | 18,497 (16) | 26,336 (15) | 29,178 (15) |
| Any CVD # | 106,523 (28) | 67,011 (28) | 51,905 (26) | 96,147 (28) | 93,703 (24) |
| CKD # | 10,547 (3) | 8,035 (3) | 4,609 (2) | 9,445 (3) | 12,404 (3) |
| Cancer # | 24,159 (6) | 15,998 (7) | 11,084 (6) | 21,536 (6) | 22,112 (6) |
| Hypertension # | 149,752 (39) | 104,916 (43) | 79,193 (40) | 137,440 (40) | 140,341 (37) |
| Follow-up (years) * | 11 (6,17) | 10 (6,15) | 11 (6,16) | 11 (6,17) | 10 (5,16) |

Legend: Selection algorithm 1: Read code only; Selection algorithm 2: Read code and lifestyle modification advice; Selection algorithm 3: Read code and medication and lifestyle modification advice; Selection algorithm 4: Read code and (medication or lifestyle modification advice); ML: Machine learned cohort; RBG: random blood glucose; FBG: fasting blood glucose; Composite measure: fasting blood glucose > 7 mmol/l or random blood glucose > 11.1 mmol/l or HbA1c > 6; BMI: Body Mass Index (kg/m2); Normal: (18.5-24.99); Overweight: (25-29.99); Grade 1 obese: (30-34.99); Grade 2 obese: (35-39.99); α: mean (SD); #: n (%); *: median (Q1,Q3); CKD: chronic kidney disease; Any CVD: any cardiovascular disease, defined as occurrence of angina, MI, coronary heart disease (CHD), HF, stroke, and peripheral artery disease (PAD) on or before diagnosis of T2DM; §: measured at diagnosis; ‡: an average over all available measurements.
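The composite glycaemic criterion in the legend, combined with the "two elevated measurements within one year" rule used in the Results, can be sketched as below. This is an illustrative reading only: the tuple layout, the function names, and the treatment of "one year" as 365 days are assumptions, not details given in the study.

```python
from datetime import date

# Thresholds from the Table 3 legend: a value is elevated when strictly above them.
THRESHOLDS = {"FBG": 7.0, "RBG": 11.1, "HbA1c": 6.0}  # mmol/l, mmol/l, %


def is_elevated(kind, value):
    """True when a single measurement exceeds the threshold for its type."""
    return value > THRESHOLDS[kind]


def has_two_elevated_within_year(measurements, window_days=365):
    """measurements: iterable of (date, kind, value) tuples for one patient."""
    elevated_dates = sorted(d for d, kind, value in measurements if is_elevated(kind, value))
    # If any two elevated dates fall within the window, some adjacent pair does too.
    return any(
        (later - earlier).days <= window_days
        for earlier, later in zip(elevated_dates, elevated_dates[1:])
    )


# Hypothetical patients, for illustration only.
patient_a = [(date(2010, 1, 10), "HbA1c", 6.8), (date(2010, 9, 1), "FBG", 7.4)]
patient_b = [
    (date(2010, 1, 10), "HbA1c", 6.8),
    (date(2012, 3, 1), "RBG", 12.0),
    (date(2012, 6, 1), "FBG", 6.5),  # below the FBG threshold, so not counted
]
print(has_two_elevated_within_year(patient_a), has_two_elevated_within_year(patient_b))
```

Mixing measurement types within the window follows the "or" in the composite definition; whether the study required the two measurements to be of the same type is not stated.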

Table 4. Baseline characteristics and distribution of glycaemic markers among patients identified by ML.

Machine Learned T2DM Cohort (n=383,330)

| | With Read Code | Without Read Code |
|---|---|---|
| Patients # | 319,979 (83) | 63,351 (17) |
| Age at diagnosis (years) α | 60 (14) | 54 (24) |
| Age at diagnosis (years) * | 60 (50, 70) | 56 (33, 73) |
| ≤ 40 | 25,645 (8) | 46,107 (73) |
| 41-50 | 56,583 (18) | 2,230 (4) |
| 51-60 | 81,262 (25) | 3,015 (5) |
| 61+ | 156,489 (49) | 11,999 (19) |
| Male # | 176,568 (55) | 23,879 (38) |
| At least one prescription # | 273,272 (85) | 13,823 (22) |
| Prescription duration ≥ 6 months # | 241,517 (76) | 12,738 (20) |
| RBG > 11.1 mmol/l # | 101,135 (32) | 1,471 (2) |
| FBG > 7 mmol/l # | 50,446 (16) | 1,695 (3) |
| HbA1c > 6% # | 274,565 (86) | 29,793 (47) |
| Composite measure # | 274,565 (86) | 29,793 (47) |

Legend: RBG: random blood glucose; FBG: fasting blood glucose; Composite measure: fasting blood glucose > 7 mmol/l or random blood glucose > 11.1 mmol/l or HbA1c > 6; *: median (Q1,Q3); #: n (%); α: mean (SD).


4. DISCUSSION

In this study we addressed a number of problems encountered by computer-based methods in the complex task of identifying a disease cohort from large EMR databases. Specifically, we (1) defined and discussed common technical challenges in differentiating diabetes subtypes; (2) combined clinical, medication and morbidity information with database patterns to select a set of best predictors as inputs to ML algorithms that can identify patients with T2DM in the absence of any disease code; and (3) compared T2DM cohorts identified by the clinically guided selection algorithms and the ML algorithm. The results of this study are of particular interest to researchers who work with the THIN database; however, the methods explored here are generalizable to any EMR with a different disease coding system.

Although we saw no difference in the distributions of basic characteristics among cohorts obtained by deterministic and probabilistic approaches, ML algorithms were found to be superior. With the use of selected features, we could confirm that 83% of the patients identified by the ML algorithm had a Read code for T2DM (Table 4). Those without a Read code had a comparably high risk as identified by the significant predictors. While 25% of patients with a Read code for T2DM, and 21% of those with a Read code plus medication or lifestyle advice, did not have at least two elevated measures of blood glucose within one year, only 18% of the ML-identified cohort lacked such measures. Among Read code-defined and ML-defined patients without an elevated composite glucose measure, 69% and 41% respectively did not receive an ADD for at least 6 months. It is important to note that patients without a Read code for diabetes are much less likely to have two elevated blood glucose measures within one year unless they were known to be diabetic or pre-diabetic.

Five of the six ML algorithms demonstrated similar performance in the training-testing data sets. Logistic regression was chosen as the best classifier for the THIN database; however, different feature patterns within other EMRs could potentially lead to better performance of other ML techniques in predicting a T2DM cohort. Tapak and colleagues [36] reported SVM as the better classifier, while Mani and colleagues [37] reported that decision trees outperformed other ML algorithms. In this context it is important to mention that ML algorithms cannot operate without meaningful input data (the “Garbage in, garbage out” principle). Although the use of different datasets makes direct comparisons difficult, a critical part of the ML process is feature engineering or selection. Some recent studies have used large sets of variables associated with diabetes with the aim of enhancing predictive accuracy [38, 39]. However, this may be limited by the inclusion of irrelevant and redundant variables, and by model overfitting in cases where the number of observations is smaller than the number of variables. While earlier studies were primarily based on clinically guided feature selection, we initially adopted a more holistic approach to identify data-driven candidates as potential predictors of T2DM from the whole database. By combining clinical knowledge with data-driven candidate predictors, we ensured selection of the most robust set of 7 predictors. Although the selected features were not surprising, we found that BMI, lifestyle modification advice and hypertension did not contribute to the models, while microvascular complications did.

We compared the performance of six classification algorithms on a set of 150,000 instances, which was confirmed to be large enough by assessing the performance curves of several incremental classifiers. Nevertheless, the training dataset was small compared to the whole database; therefore, to ensure that our results are not prone to selection bias, we performed the same analyses on 2 other randomly selected training datasets and obtained almost identical results. Unlike most ML applications, which focus on training to ensure the best fit for future predictions, in this study we used various techniques to correct the available labelling, with the ultimate goal of improving the quality of the diseased cohort (Type 2 Diabetes).

It would be of great interest to compare ML error, rule-based error, and human error in terms of predicting disease from available data. For this task a “gold standard” dataset would consist of random patients whose true disease state was reconfirmed by approaching both clinician and patient. We were not able to conduct this task, as the THIN database contains de-identified patient-level data, which is true for all large EMR databases that are used for research purposes. The THIN database also does not have data on surrogate markers that could improve the quality of the cohort identification algorithms.

Miscoding between type 1 and type 2 diabetes in primary care databases is not uncommon [40, 41]. It is important to mention that ML techniques may distinguish poorly between disease subtypes without incorporating additional classification rules. We excluded patients with other diabetes Read codes from the dataset to which our ML algorithm was applied. Furthermore, for patients identified as T2DM without Read codes, the ML techniques are not able to provide an exact diagnosis date and therefore require the incorporation of additional techniques.


CONCLUSION

Careful investigation of diagnostic code patterns within databases is essential prior to conducting analyses on a disease cohort. Direct extraction of a disease cohort using diagnostic codes may lead to the inclusion of falsely diagnosed patients and the omission of patients with the true disease state. Rule-based techniques represent a conservative approach that minimizes only false positive cases. ML techniques that minimize both false positive and false negative cases represent a more robust approach. However, ML techniques rely heavily on meaningful input and use diagnostic codes for training purposes. Combining human expertise and machine power represents the best strategy: it allows hypotheses on potential disease predictors to be tested, lowers human intervention, and reduces the burden of selection bias.
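The contrast between limiting only false positives and limiting both error types corresponds to the TNR and TPR reported in Table 2. A minimal sketch of how these two rates follow from confusion counts is given below; the label vectors are toy data invented for illustration and do not reproduce any result from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 (disease) and 0 (no disease)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def tpr_tnr(y_true, y_pred):
    """True positive rate (sensitivity) and true negative rate (specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: a conservative rule avoids false positives but misses true cases.
true_status = [1, 1, 1, 1, 0, 0, 0, 0]
conservative_rule = [1, 1, 0, 0, 0, 0, 0, 0]
balanced_classifier = [1, 1, 1, 0, 0, 0, 0, 1]
print(tpr_tnr(true_status, conservative_rule))
print(tpr_tnr(true_status, balanced_classifier))
```

In the toy data the conservative rule reaches a TNR of 1.0 at the cost of a TPR of 0.5, whereas the second classifier trades a little specificity for higher sensitivity.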

LIST OF ABBREVIATIONS

ADD = Antidiabetic Drug
AUC = Area Under the Curve
BMI = Body Mass Index
CHD = Coronary Heart Disease
CPU = Central Processing Unit
CVD = Cardiovascular Disease
DM = Diabetes Mellitus
EMR = Electronic Medical Record
FBG = Fasting Blood Glucose
GP = General Practitioner
HbA1c = Glycated Haemoglobin
HES = Hospital Episode Statistics
HF = Heart Failure
ICD = International Classification of Diseases
MI = Myocardial Infarction
ML = Machine Learning
MP = Multilayer Perceptron
PAD = Peripheral Artery Disease
RBG = Random Blood Glucose
SD = Standard Deviation
SVM = Support Vector Machine
T1DM = Type 1 Diabetes Mellitus
T2DM = Type 2 Diabetes Mellitus
THIN = The Health Improvement Network
TNR = True Negative Rate
TPR = True Positive Rate
UK = United Kingdom

ETHICS APPROVAL AND CONSENT TO PARTICIPATE The study protocol was approved by the Independent Scientific Review Committee for the THIN database (Protocol Number: 15THIN030) and the Institutional Review Board of QIMR Berghofer Medical Research Institute.

HUMAN AND ANIMAL RIGHTS No animals/humans were used for the studies that are the basis of this research.

CONSENT FOR PUBLICATION Not applicable.


CONFLICT OF INTEREST Sanjoy K. Paul has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. Ebenezer S. Owusu Adjah, Olga Montvida, and Julius Agbeve have no conflict of interest to declare.

ACKNOWLEDGEMENTS Sanjoy K. Paul conceived the idea and was responsible for the primary design of the study. Ebenezer S. Owusu Adjah and Olga Montvida contributed significantly to the study design. Julius Agbeve conducted the primary raw data extraction. Ebenezer S. Owusu Adjah and Olga Montvida jointly conducted the data extraction, data manipulation and statistical analyses, and developed the first draft of the manuscript. Ebenezer S. Owusu Adjah, Olga Montvida, Sanjoy K. Paul, and Julius Agbeve contributed to the finalization of the manuscript. Sanjoy K. Paul had full access to all the data in the study and is the guarantor, taking responsibility for the integrity of the data and the accuracy of the data analysis. Ebenezer S. Owusu Adjah was supported by the QIMR Berghofer International Ph.D. Scholarship and The University of Queensland International Scholarship. Olga Montvida was supported by the Queensland University of Technology International Scholarship. No separate funding was obtained for this study. Melbourne EpiCentre gratefully acknowledges the support from the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia and the research project funding from the National Health and Medical Research Council of Australia (Project Number: GNT1063477). Olga Montvida acknowledges the support from her associate supervisors Prof. Ross Young and Prof. Louise Hafner.

REFERENCES

[1] Sagreiya H, Altman RB. The utility of general purpose versus specialty clinical databases for research: Warfarin dose estimation from extracted clinical variables. J Biomed Inform 2010; 43(5): 747-51. [http://dx.doi.org/10.1016/j.jbi.2010.03.014] [PMID: 20363365]
[2] Shivade C, Raghavan P, Fosler-Lussier E, et al. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc 2014; 21(2): 221-30. [http://dx.doi.org/10.1136/amiajnl-2013-001935] [PMID: 24201027]
[3] Tate AR, Beloff N, Al-Radwan B, et al. Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface. J Am Med Inform Assoc 2014; 21(2): 292-8. [http://dx.doi.org/10.1136/amiajnl-2013-001847] [PMID: 24272162]
[4] Kandula S, Zeng-Treitler Q, Chen L, Salomon WL, Bray BE. A bootstrapping algorithm to improve cohort identification using structured data. J Biomed Inform 2011; 44(Suppl. 1): S63-8. [http://dx.doi.org/10.1016/j.jbi.2011.10.013] [PMID: 22079803]
[5] Sadek AR, Van Vlymen J, Khunti K, De Lusignan S. Automated identification of miscoded and misclassified cases of diabetes from computer records. Diabet Med 2012; 29(3): 410-4. [http://dx.doi.org/10.1111/j.1464-5491.2011.03457.x] [PMID: 21916978]
[6] Read J. The Read clinical classification (Read codes). Br Homeopath J 1991; 80(1): 14-20. [http://dx.doi.org/10.1016/S0007-0785(05)80418-1]
[7] Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: A systematic review. Br J Clin Pharmacol 2010; 69(1): 4-14. [http://dx.doi.org/10.1111/j.1365-2125.2009.03537.x] [PMID: 20078607]
[8] Hammad TA, Margulis AV, Ding Y, Strazzeri MM, Epperly H. Determining the predictive value of Read codes to identify congenital cardiac malformations in the UK Clinical Practice Research Datalink. Pharmacoepidemiol Drug Saf 2013; 22(11): 1233-8. [http://dx.doi.org/10.1002/pds.3511] [PMID: 24002995]
[9] Khan NF, Harrison SE, Rose PW. Validity of diagnostic coding within the General Practice Research Database: A systematic review. Br J Gen Pract 2010; 60(572): e128-36. [http://dx.doi.org/10.3399/bjgp10X483562]
[10] Stone MA, Camosso-Stefinovic J, Wilkinson J, de Lusignan S, Hattersley AT, Khunti K. Incorrect and incomplete coding and classification of diabetes: A systematic review. Diabet Med 2010; 27(5): 491-7. [http://dx.doi.org/10.1111/j.1464-5491.2009.02920.x] [PMID: 20536944]
[11] De Lusignan S, Sadek K, McDonald H, et al. Call for consistent coding in diabetes mellitus using the Royal College of General Practitioners and NHS pragmatic classification of diabetes. Inform Prim Care 2012; 20(2): 103-13. [PMID: 23710775]
[12] Seidu S, Davies MJ, Mostafa S, de Lusignan S, Khunti K. Prevalence and characteristics in coding, classification and diagnosis of diabetes in primary care. Postgrad Med J 2014; 90(1059): 13-7. [http://dx.doi.org/10.1136/postgradmedj-2013-132068] [PMID: 24225940]
[13] De Lusignan S, Liaw S-T, Dedman D, Khunti K, Sadek K, Jones S. An algorithm to improve diagnostic accuracy in diabetes in computerised problem orientated medical records (POMR) compared with an established algorithm developed in episode orientated records (EOMR). J Innov Health Inform 2015; 22(2): 255-64. [http://dx.doi.org/10.14236/jhi.v22i2.79] [PMID: 26245239]
[14] De Lusignan S, Khunti K, Belsey J, et al. A method of identifying and correcting miscoding, misclassification and misdiagnosis in diabetes: A pilot and validation study of routinely collected data. Diabet Med 2010; 27(2): 203-9. [http://dx.doi.org/10.1111/j.1464-5491.2009.02917.x] [PMID: 20546265]
[15] Holt TA, Gunnarsson CL, Cload PA, Ross SD. Identification of undiagnosed diabetes and quality of diabetes care in the United States: Cross-sectional study of 11.5 million primary care electronic records. CMAJ Open 2014; 2(4): E248-55. [http://dx.doi.org/10.9778/cmajo.20130095] [PMID: 25485250]
[16] Holt TA, Stables D, Hippisley-Cox J, O’Hanlon S, Majeed A. Identifying undiagnosed diabetes: cross-sectional survey of 3.6 million patients’ electronic records. Br J Gen Pract 2008; 58(548): 192-6. [http://dx.doi.org/10.3399/bjgp08X277302] [PMID: 18318973]
[17] Magliano DJ, Zimmet P, Shaw J. US trends for diabetes prevalence among adults. JAMA 2016; 315(7): 705. [http://dx.doi.org/10.1001/jama.2015.16455] [PMID: 26881376]
[18] Blak BT, Thompson M, Dattani H, Bourke A. Generalisability of The Health Improvement Network (THIN) database: Demographics, chronic disease prevalence and mortality rates. Inform Prim Care 2011; 19(4): 251-5. [PMID: 22828580]
[19] Denburg MR, Haynes K, Shults J, Lewis JD, Leonard MB. Validation of The Health Improvement Network (THIN) database for epidemiologic studies of chronic kidney disease. Pharmacoepidemiol Drug Saf 2011; 20(11): 1138-49. [http://dx.doi.org/10.1002/pds.2203] [PMID: 22020900]
[20] IMS Health Incorporated. The Health Improvement Network (THIN) database. London: IMS Health Incorporated 2017. Available at: http://www.csdmruk.imshealth.com/index.html
[21] Gray J, Orr D, Majeed A. Use of Read codes in diabetes management in a south London primary care group: Implications for establishing disease registers. BMJ 2003; 326(7399): 1130. [http://dx.doi.org/10.1136/bmj.326.7399.1130] [PMID: 12763987]
[22] Rollason W, Khunti K, De Lusignan S. Variation in the recording of diabetes diagnostic data in primary care computer systems: Implications for the quality of care. Inform Prim Care 2009; 17(2): 113-9. [PMID: 19807953]
[23] Lycett D, Nichols L, Ryan R, et al. The association between smoking cessation and glycaemic control in patients with type 2 diabetes: A THIN database cohort study. Lancet Diabetes Endocrinol 2015; 3(6): 423-30. [http://dx.doi.org/10.1016/S2213-8587(15)00082-0] [PMID: 25935880]
[24] American Diabetes Association. Standards of Medical Care in Diabetes-2015. Diabetes Care 2015; 38(Suppl. 1): S4. [http://dx.doi.org/10.2337/dc15-S003]
[25] Hall MA. Correlation-based feature selection for machine learning. PhD dissertation. Hamilton, NZ: University of Waikato, 1999.
[26] Senliol B, Gulgezen G, Yu L, Cataltepe Z. Fast Correlation Based Filter (FCBF) with a different search strategy. Computer and Information Sciences, 2008. ISCIS'08. 23rd International Symposium. Istanbul, Turkey: IEEE, 2008.
[27] Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Burlington, MA: Morgan Kaufmann 2005.
[28] Friedman N, Geiger D, Goldszmidt M. Bayesian Network Classifiers. Mach Learn 1997; 29(2): 131-63.
[29] John GH, Langley P, Eds. Estimating continuous distributions in Bayesian classifiers. Proceedings of the Eleventh conference on Uncertainty in artificial intelligence. Burlington, MA: Morgan Kaufmann Publishers Inc.; 338-45.
[30] Schmidt M, Roux NL, Bach F. Minimizing finite sums with the stochastic average gradient. Math Program 2017; 162(1-2): 83-112.
[31] Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995; 20(3): 273-97. [http://dx.doi.org/10.1007/BF00994018]
[32] Wu T-F, Lin C-J, Weng RC. Probability estimates for multi-class classification by pairwise coupling. J Mach Learn Res 2004; 5: 975-1005.
[33] Ruck DW, Rogers SK, Kabrisky M. Feature selection using a multilayer perceptron. J Neural Netw Comput 1990; 2(2): 40-8.
[34] Loh W-Y. Improving the precision of classification trees. Ann Appl Stat 2009; 3(4): 1710-37. [http://dx.doi.org/10.1214/09-AOAS260]
[35] Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn 1993; 11(1): 63-90. [http://dx.doi.org/10.1023/A:1022631118932]
[36] Tapak L, Mahjub H, Hamidi O, Poorolajal J. Real-data comparison of data mining methods in prediction of diabetes in Iran. Healthc Inform Res 2013; 19(3): 177-85. [http://dx.doi.org/10.4258/hir.2013.19.3.177] [PMID: 24175116]
[37] Mani S, Chen Y, Elasy T, Clayton W, Denny J. Type 2 diabetes risk forecasting from EMR data using machine learning. AMIA Annu Symp Proc 2012; 606-15.
[38] Zheng T, Xie W, Xu L, et al. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform 2017; 97: 120-7. [http://dx.doi.org/10.1016/j.ijmedinf.2016.09.014] [PMID: 27919371]
[39] Razavian N, Blecker S, Schmidt AM, Smith-McLallen A, Nigam S, Sontag D. Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data 2015; 3(4): 277-87. [http://dx.doi.org/10.1089/big.2015.0020] [PMID: 27441408]
[40] Thomas G, Klein K, Paul S. Statistical challenges in analysing large longitudinal patient-level data: The danger of misleading clinical inferences with imputed data. J Indian Soc Agric Stat 2014; 68(2): 39-54.
[41] Khunti K, Davies M, Majeed A, Thorsted BL, Wolden ML, Paul SK. Hypoglycemia and risk of cardiovascular disease and All-cause mortality in insulin-treated people with type 1 and type 2 diabetes: A cohort study. Diabetes Care 2015; 38(2): 316-22. [PMID: 25492401]

© 2017 Owusu Adjah et al. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: (https://creativecommons.org/licenses/by/4.0/legalcode). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


APPENDIX C

Received: 2 June 2016 Revised: 31 July 2016 Accepted: 1 August 2016 DOI 10.1111/dom.12761

ORIGINAL ARTICLE

Weight gain in insulin-treated patients by body mass index category at treatment initiation: new evidence from real-world data in patients with type 2 diabetes

S. K. Paul PhD1 | J. E. Shaw MSc2 | O. Montvida MSc1,3 | K. Klein PhD1

1 Clinical Trials and Biostatistics Unit, QIMR Berghofer Medical Research Institute, Brisbane, Australia
2 Baker IDI Heart and Diabetes Institute, Melbourne, Australia
3 School of Biomedical Sciences, Faculty of Health, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia

Corresponding Author: Prof. S. K. Paul, Clinical Trials and Biostatistics Unit, QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia ([email protected]).

Funding information
The QIMR Berghofer Medical Research Institute gratefully acknowledges the support from the National Health and Medical Research Council and the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia.

Aims: To evaluate, in patients with type 2 diabetes (T2DM) treated with insulin, the extent of weight gain over 2 years of insulin treatment, and the dynamics of weight gain in relation to glycaemic achievements over time according to adiposity levels at insulin initiation.

Materials and methods: Patients with T2DM (n = 155 917), who commenced insulin therapy and continued it for at least 6 months, were selected from a large database of electronic medical records in the USA. Longitudinal changes in body weight and glycated haemoglobin (HbA1c) according to body mass index (BMI) category were estimated.

Results: Patients had a mean age of 59 years, a mean HbA1c level of 9.5%, and a mean BMI of 35 kg/m2 at insulin initiation. The HbA1c levels at insulin initiation were significantly lower (9.2-9.4%) in the obese patients than in patients with normal body weight (10.0%); however, the proportions of patients with HbA1c >7.5% or >8.0% were similar across the BMI categories. The adjusted weight gain fell progressively with increasing baseline BMI category over 6, 12 and 24 months (p < .01). The adjusted changes in HbA1c were similar across BMI categories. A 1% decrease in HbA1c was associated with progressively less weight gain as pretreatment BMI rose, ranging from a 1.24 kg gain in those with a BMI <25 kg/m2 to a 0.32 kg loss in those with a BMI > 40 kg/m2.

Conclusions: During 24 months of insulin treatment, obese patients gained significantly less body weight than normal-weight and overweight patients, while achieving clinically similar glycaemic benefits. These data provide reassurance with regard to the use of insulin in obese patients.

KEYWORDS

body mass index, glycaemic control, insulin initiation, type 2 diabetes, weight change

1 | INTRODUCTION

Type 2 diabetes (T2DM) is a progressive disease in which β-cell function continues to decline over time, leading to the need for insulin treatment in a significant proportion of patients. Although studies suggest that early initiation of insulin supplementation may alter the progressive course of T2DM, insulin initiation is often delayed, mainly because of patients’ hesitation and physician barriers.1–7 This may significantly increase the risk of developing diabetic complications.7,8 Fear of weight gain is one of the common reasons for delaying insulin therapy, and concerns about potential weight gain are usually greatest for those who are already obese.8–11

While many studies have reported that significant weight gain is associated with insulin therapy, no study, to the best of our knowledge, has explored the weight gain with insulin therapy according to adiposity levels at the time of insulin initiation. If it were the case that insulin-related weight gain declines with increasing pretreatment adiposity, then significant reassurance could be provided to obese patients, and timely insulin therapy could be initiated.

In the present longitudinal study, using data from a comprehensive electronic medical record database, the aims were to evaluate

1244 © 2016 John Wiley & Sons Ltd wileyonlinelibrary.com/journal/dom Diabetes Obes Metab December 2016; 18: 1244–1252

the following, according to pretreatment body mass index (BMI) category: (1) change in body weight during the 2 years after initiation of insulin treatment; (2) glycaemic control in these patients in relation to weight change; and (3) weight gain associated with a 1 percentage point improvement in glycated haemoglobin (HbA1c).

2 | MATERIALS AND METHODS

2.1 | Data source

The General Electric Centricity Electronic Medical Records (CEMR) contain more than 40 million patients' clinical/treatment records from 1995 to January 2015. The CEMR represents 49 US states and includes data from >35 000 healthcare providers, of which ~70% are primary care providers. The CEMR database stores data on medication prescriptions within the electronic medical records network, and also information on medications that may be used over the counter or prescribed outside of the electronic medical records network. This database includes insured and uninsured patients, and has been extensively used for academic research worldwide.12–16

From more than 1.6 million patients with T2DM, a cohort of 432 287 patients, who received at least one prescription of insulin on or after diagnosis of T2DM, was selected. Identification of T2DM was based on the International Classification of Diseases (ICD)-9 codes and at least two prescriptions for any antidiabetes drugs within 6 months of diagnosis of T2DM. Patients with incomplete description of the disease and with any record of type 1 diabetes longitudinally were excluded. Inclusion criteria were no missing data on age, sex or ethnicity at diagnosis of T2DM, age at insulin initiation between 18 and 75 years, first prescription date of insulin on or after 1 January 1995, no missing data on body weight and HbA1c at and within 3 months of the first date of insulin prescription, and no prescription of glucagon-like peptide-1 (GLP-1) receptor agonists during the follow-up period. The sizes of the cohorts that had 6, 12 and 24 months of insulin treatment duration were 155 917 (main cohort), 151 220 (sub-cohort 1) and 144 857 (sub-cohort 2), respectively.

Patients' BMI at insulin initiation was categorized as: BMI < 25 kg/m2 (normal weight); BMI ≥ 25 and <30 kg/m2 (overweight); BMI ≥ 30 and <35 kg/m2 (Grade 1 obesity); BMI ≥ 35 and <40 kg/m2 (Grade 2 obesity); and BMI ≥ 40 kg/m2 (Grade 3 obesity).

Baseline data included age, sex, ethnicity, body weight, BMI and blood pressure at the time of diagnosis of diabetes and at the time of insulin initiation (index date). Longitudinal clinical and laboratory measures were arranged on the basis of 6-monthly windows, progressively from 6 months before the index date, and only the latest measurement within each window was preserved. For example, the latest HbA1c value measured >6 and ≤12 months after the index date was kept as HbA1c at 12 months. Complete information on antidiabetes drugs, antihypertensive and cardioprotective medications over time was obtained, along with dates of prescriptions. For antidiabetes drugs, information was extracted on any medication that was prescribed after diagnosis of diabetes and after the index date. The treatment duration with individual medications was calculated.

2.2 | Statistical methods

The proportions of patients with missing data on body weight and HbA1c between 6 and 24 months of follow-up ranged between 9%–16% and 8%–17%, respectively. The missing data were imputed using a multiple imputation technique, with adjustments for age at insulin initiation, diabetes duration at insulin initiation and usage of oral antidiabetes drugs (OAD) during follow-up. All primary analyses were conducted using the imputed weight and HbA1c data, with additional analyses based on complete cases for sensitivity analyses.

The mean [95% confidence interval (CI)] changes in body weight and HbA1c at 6, 12 and 24 months for the main study cohort and the two sub-cohorts were obtained using weighted multivariate regression models, adjusting for age at index date, sex, diabetes duration at index date, OAD usage, and distribution of weight or HbA1c at index date, separately for BMI categories at index date. The regression models for weight and HbA1c change were weighted by baseline weight and HbA1c, respectively. The mean (95% CI) values for the possible marginal contribution of sulphonylurea usage to weight and HbA1c changes were estimated using the same regression models, as appropriate. Separate sensitivity analyses were conducted using data from patients with a minimum of 2 years' diabetes duration at index date, to verify the consistency of the findings on weight change according to BMI category at index date. Additional sensitivity analyses included evaluations of changes in body weight and HbA1c, with further adjustments for the insulin regimen at index date, and among those who did not undergo any bariatric surgery procedure before insulin initiation or during follow-up (n = 153 788).

To evaluate the possible association of a 1% reduction in HbA1c by insulin and other antidiabetes drugs with weight gain over 6, 12 and 24 months of insulin treatment, multivariate factorial regression models were fitted. For example, to evaluate the possible association of a 1% reduction in HbA1c at 12 months of treatment, the fitted model was:

\[
\begin{aligned}
(\text{weight}_{12\text{m}} - \text{weight}_{\text{index date}}) = \text{function}\{\, & \text{age}_{\text{index date}} + \text{sex} + \text{diabetes duration}_{\text{index date}} \\
& + \text{use of any OAD on index date or during 12 months of follow-up} \\
& + (\text{BMI categories}_{\text{index date}}) \times (\text{HbA1c}_{12\text{m}} - \text{HbA1c}_{\text{index date}}) \,\},
\end{aligned}
\]

weighted by \(\text{weight}_{\text{index date}}\).

The regression-based approach described above was also used to evaluate the possible differences in the patterns of association of HbA1c change with weight change in people with different BMI levels at insulin initiation.

3 | RESULTS

The demographic, clinical, laboratory and medications data at the time of insulin initiation are shown in Table 1, for the whole study


TABLE 1 Baseline characteristics of main cohort and two sub-cohorts

| Characteristic | Main cohort (insulin treatment ≥ 6 months) | Sub-cohort 1 (insulin treatment ≥ 12 months) | Sub-cohort 2 (insulin treatment ≥ 24 months) |
|---|---|---|---|
| N | 155 917 | 151 220 | 144 857 |
| Male, n (%) | 75 038 (48) | 72 797 (48) | 69 788 (48) |
| Ethnicity, n (%) | | | |
| White | 83 441 (54) | 80 840 (54) | 77 431 (54) |
| Black | 21 658 (14) | 21 086 (14) | 20 274 (14) |
| Hispanic | 6180 (4) | 6004 (4) | 5740 (4) |
| Asian | 2789 (2) | 2713 (2) | 2616 (2) |
| Other/Unknown | 41 489 (27) | 40 577 (27) | 38 796 (27) |
| Mean (s.d.) age at insulin initiation, years | 59 (11) | 59 (11) | 59 (11) |
| Diabetes duration | | | |
| Median (Q1, Q3), years | 1.9 (0.2, 2.6) | 1.9 (0.2, 2.6) | 2.0 (0.2, 2.7) |
| <2 years, n (%) | 102 038 (72) | 98 911 (72) | 94 655 (71) |
| 2–5 years, n (%) | 25 754 (18) | 25 020 (18) | 24 027 (18) |
| >5 years, n (%) | 14 848 (10) | 14 431 (10) | 13 856 (11) |
| Mean (s.d.) weight, kg | 98.8 (26.0) | 98.6 (26.0) | 98.6 (25.9) |
| Mean (s.d.) BMI, kg/m2 | 34.8 (8.4) | 34.8 (8.5) | 34.8 (8.4) |
| BMI category1, n (%) | | | |
| <25 kg/m2 | 13 880 (9) | 13 585 (9) | 13 153 (9) |
| ≥25 and <30 kg/m2 | 32 047 (21) | 31 254 (21) | 30 106 (21) |
| ≥30 and <35 kg/m2 | 43 274 (28) | 41 916 (28) | 40 153 (28) |
| ≥35 and <40 kg/m2 | 32 445 (21) | 31 385 (21) | 29 940 (21) |
| ≥40 kg/m2 | 34 271 (22) | 33 080 (22) | 31 505 (22) |
| HbA1c | | | |
| Mean (s.d.) HbA1c, % | 9.5 (2.3) | 9.5 (2.3) | 9.5 (2.2) |
| Median (Q1, Q3) HbA1c, % | 9.2 (7.7, 11.0) | 9.2 (7.7, 10.9) | 9.2 (7.7, 10.9) |
| HbA1c ≥ 7.5%, n (%) | 123 802 (79) | 120 111 (79) | 114 965 (79) |
| HbA1c ≥ 8%, n (%) | 111 368 (71) | 108 052 (72) | 103 379 (71) |
| Antidiabetes medication, n (%) | | | |
| Metformin | 108 377 (70) | 105 084 (70) | 100 531 (69) |
| Sulphonylurea | 74 492 (48) | 72 305 (48) | 69 214 (48) |
| Insulin type at initiation, n (%) | | | |
| Basal | 99 299 (65) | 96 343 (65) | 92 212 (65) |
| Biphasic | 21 136 (14) | 20 548 (14) | 19 843 (14) |
| Prandial | 31 314 (21) | 30 335 (21) | 29 044 (21) |

1 BMI categories at insulin initiation: <25 kg/m2 (normal weight); ≥25 and <30 kg/m2 (overweight); ≥30 and <35 kg/m2 (Grade 1 obesity); ≥35 and <40 kg/m2 (Grade 2 obesity); and ≥40 kg/m2 (Grade 3 obesity).

cohort (n = 155 917) and the two sub-cohorts defined on the basis of minimum insulin treatment duration of 12 months (n = 151 220) and 24 months (n = 144 857). In the whole cohort, patients had a mean [standard deviation (s.d.)] age of 59 (11) years, 48% were male, 54% were white, and the median diabetes duration at index date was ~2 years. The mean HbA1c at insulin initiation (9.5%) and the proportions of patients with HbA1c >7.5% and >8% were similar in all cohorts.

Table 2 shows the baseline characteristics according to BMI category. Female and white patients were more likely to have a higher BMI at index date. The diabetes duration was similar among the BMI groups. The mean HbA1c levels at index date were significantly lower (9.2-9.4%) in the obese patients than in patients with normal body weight (10.0%); however, the proportions of patients with HbA1c >7.5% and >8.0% were similar across the BMI categories. The obese patients were more likely to be on concomitant metformin and/or sulphonylurea therapies than patients with normal weight.

3.1 | Weight change

Weight change over 24 months is shown in Figure 1 and Table 3. The weight gain was significantly and consistently lower in patients with a higher BMI, compared with that in patients with normal body weight. In Grade 1 and Grade 2 obese patients, the adjusted mean weight gain over 6 and 12 months of insulin treatment ranged between 0.1 and 0.9 kg (Table 3), combining the main cohort and sub-cohort 1. In Grade 3 obese patients, the adjusted reductions in body weight were 0.7, 1.1 and 2.2 kg over 6, 12 and 24 months of insulin treatment, respectively. The adjusted mean weight gain in the normal-weight patients ranged between 2 and


TABLE 2 Study variables and concomitant medication usage at insulin initiation according to BMI category in patients with a minimum 24 months of insulin treatment (sub-cohort 2)

| Characteristic | BMI <25 kg/m2 | BMI ≥25 and <30 kg/m2 | BMI ≥30 and <35 kg/m2 | BMI ≥35 and <40 kg/m2 | BMI ≥40 kg/m2 |
|---|---|---|---|---|---|
| N | 13 153 | 30 106 | 40 153 | 29 940 | 31 505 |
| Male, n (%) | 7126 (54) | 16 925 (56) | 20 847 (52) | 13 592 (45) | 11 298 (36) |
| Ethnicity, n (%) | | | | | |
| White | 6200 (47) | 14 959 (50) | 21 491 (54) | 16 727 (56) | 18 054 (57) |
| Black | 1851 (14) | 4244 (14) | 5414 (14) | 3967 (13) | 4798 (15) |
| Hispanic | 584 (4) | 1577 (5) | 1641 (4) | 1051 (4) | 887 (3) |
| Asian | 639 (5) | 891 (3) | 603 (2) | 267 (1) | 216 (1) |
| Other/Unknown | 3879 (30) | 8435 (28) | 11 004 (28) | 7928 (27) | 7550 (24) |
| Mean (s.d.) age at insulin initiation, years | 58 (12) | 60 (11) | 59 (11) | 59 (11) | 57 (12) |
| Diabetes duration, years | | | | | |
| Median (Q1, Q3) | 1.6 (0.1, 1.9) | 1.8 (0.1, 2.4) | 1.9 (0.1, 2.4) | 1.8 (0.1, 2.5) | 1.8 (0.1, 2.5) |
| <2 years, n (%) | 9223 (76) | 19 800 (71) | 26 050 (71) | 19 281 (71) | 20 301 (71) |
| 2–5 years, n (%) | 1927 (16) | 5004 (18) | 6697 (18) | 5076 (19) | 5323 (19) |
| >5 years, n (%) | 1015 (8) | 2952 (11) | 3937 (11) | 2892 (11) | 3060 (11) |
| Mean (s.d.) weight, kg | 64.4 (10.5) | 79.8 (11.1) | 93.8 (13.9) | 106.4 (15.0) | 130.2 (23.1) |
| Mean (s.d.) BMI, kg/m2 | 22.5 (2.2) | 27.7 (1.4) | 32.6 (1.4) | 37.3 (1.4) | 46.8 (6.9) |
| HbA1c | | | | | |
| Mean (s.d.), % | 10.0 (2.8) | 9.7 (2.4) | 9.4 (2.3) | 9.3 (2.2) | 9.2 (2.1) |
| Median (Q1, Q3), % | 9.6 (7.7, 12.0) | 9.4 (7.8, 11.3) | 9.2 (7.7, 10.9) | 9.0 (7.7, 10.7) | 9.0 (7.6, 10.5) |
| HbA1c ≥ 7.5%, n (%) | 10 400 (79) | 24 289 (81) | 32 015 (80) | 23 686 (79) | 24 575 (78) |
| HbA1c ≥ 8%, n (%) | 9462 (72) | 22 045 (73) | 28 766 (72) | 21 188 (71) | 21 918 (70) |
| Antidiabetes medication, n (%) | | | | | |
| Metformin only | 7892 (60) | 20 685 (69) | 27 973 (70) | 21 328 (71) | 22 653 (72) |
| Sulphonylurea only | 5416 (41) | 14 339 (48) | 19 378 (48) | 14 683 (49) | 15 398 (49) |
| Metformin + Sulphonylurea | 4390 (33) | 11 896 (40) | 15 945 (40) | 12 074 (40) | 12 636 (40) |
| Insulin type at initiation, n (%) | | | | | |
| Basal | 8164 (64) | 19 503 (66) | 25 757 (66) | 19 018 (65) | 19 770 (64) |
| Biphasic | 1756 (14) | 4114 (14) | 5449 (14) | 4125 (14) | 4399 (14) |
| Prandial | 2901 (23) | 5755 (20) | 7847 (20) | 6025 (21) | 6516 (21) |
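The BMI strata in Table 2 follow the cut-offs stated in the Methods. As a minimal illustrative sketch (the helper name `bmi_category` is hypothetical and is not taken from the study's own code), the categorization could be expressed as:

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value (kg/m2) at insulin initiation to the study's five strata."""
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    if bmi < 35:
        return "Grade 1 obesity"
    if bmi < 40:
        return "Grade 2 obesity"
    return "Grade 3 obesity"


# Mean BMI values reported per column of Table 2 fall in the expected strata.
table2_mean_bmi = [22.5, 27.7, 32.6, 37.3, 46.8]
labels = [bmi_category(b) for b in table2_mean_bmi]
```

The lower bound of each stratum is inclusive (for example, a BMI of exactly 30 kg/m2 is Grade 1 obesity), matching the "≥ ... and <" wording of the category definitions.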

4 kg over 6-24 months of insulin treatment. In normal-weight and overweight patients, the mean weight gain was significantly higher than in obese patients. As evident from Figure 1C, normal-weight and overweight patients continuously gained weight over 2 years of follow-up, while declining body weight trajectories were observed in obese patients after 12 months of insulin initiation. The proportions of patients with weight gain of ≥5 kg were significantly lower in obese groups (16% and 19%) than in normal-weight patients (28% and 37%) during 12 and 24 months of insulin treatment (Table 3). Sulphonylurea treatment only marginally contributed to the weight gain (0.17-0.27 kg over 6-24 months of insulin treatment).

Approximately 28% of patients (n = 53 879) had a minimum of 2 years' diabetes duration at insulin initiation (Table S1). The patterns of weight change by BMI category in this subset of patients were similar to those observed in all patients across the insulin treatment duration categories. The observed weight changes in different BMI categories were similar after adjustments for insulin regimen, and also for those who did not undergo any bariatric procedure(s).

3.2 | Glycaemic control

The 6-monthly longitudinal measures [mean (95% CI)] of HbA1c over 24 months from index date are shown in Figure 1B. The adjusted changes in HbA1c at 6, 12 and 24 months over the BMI categories are shown in Table 4. Starting with a significantly higher HbA1c of 10% at the index date (compared with overweight and obese patients), the normal-weight patients had a mean reduction in HbA1c of ~1.4% over 6-24 months of insulin treatment. The overweight, Grade 1 and Grade 2 obese patients had similar glycaemic achievements over the follow-up time (1.0-1.3%). Although the HbA1c reductions in Grade 3 obese patients were statistically significantly lower compared with the other groups of overweight and obese patients, these were clinically marginal differences.

In the main study cohort, the proportions of patients on metformin, sulphonylureas or both medications were 70%, 48% and 39%, respectively. Among normal-weight patients with a minimum of 24 months of insulin treatment (13 153/144 857 patients), the proportions of patients receiving metformin, sulphonylurea or both were


FIGURE 1 A, Mean (95% CI) of longitudinal measures of body weight (kg) over 2 years from the time of insulin initiation by BMI category at index date; B, mean (95% CI) of longitudinal measures of HbA1c over 2 years from the time of insulin initiation by BMI category at index date; C, adjusted mean (95% CI) of change in body weight at 6, 12 and 24 months of treatment with insulin by BMI category at index date; D, adjusted mean (95% CI) change in body weight at 6, 12 and 24 months of treatment with insulin, associated with 1% reduction in HbA1c at these time points.
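The longitudinal measures plotted in Figure 1 depend on the 6-monthly windowing described in the Methods, in which only the latest measurement in each window is kept (for example, the latest HbA1c measured >6 and ≤12 months after the index date is the 12-month value). The following is a minimal sketch of that rule under stated assumptions: time is expressed in months from the index date, and the function names and example readings are hypothetical, not the study's code or data.

```python
import math


def window_label(months_from_index: float) -> int:
    """Return the upper bound (in months) of the 6-monthly window containing the time.

    Windows are right-closed: (0, 6] -> 6, (6, 12] -> 12, (12, 18] -> 18, ...
    """
    return 6 * math.ceil(months_from_index / 6)


def latest_per_window(measurements):
    """Keep only the latest (months, value) measurement within each window."""
    latest = {}
    for months, value in measurements:
        label = window_label(months)
        if label not in latest or months > latest[label][0]:
            latest[label] = (months, value)
    return {label: value for label, (_, value) in sorted(latest.items())}


# Hypothetical HbA1c readings: (months from index date, HbA1c %).
readings = [(7.0, 8.9), (11.5, 8.1), (12.0, 8.0), (13.0, 7.9)]
by_window = latest_per_window(readings)
```

In this example the reading taken at exactly 12 months is retained as the 12-month value, and the reading at 13 months falls into the 18-month window.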

60%, 41% and 33%, respectively. The distribution of usages of these drugs alone or in combination were similar for overweight and obese patients, and were significantly higher compared with the usage observed among the normal-weight patients (Table 2). The adjusted marginal HbA1c reductions, associated with metformin treatment, were 0.37%, 0.38% and 0.31% at 6, 12 and 24 months of follow-up. The adjusted changes in HbA1c level associated with sulphonylurea were marginal (Table 4). These estimates were similar across all BMI categories (results are not presented).

3.3 | Association of weight change with improvement in glycaemic control

The adjusted estimates (regression coefficients) and 95% CIs of weight gain associated with 1% reduction in HbA1c by BMI categories over 6, 12 and 24 months of insulin treatment among patients in sub-cohort 2 are shown in Table 3 and Figure 1D. The estimated weight gains were significantly lower among obese patients than among normal- and overweight patients. While a 1% HbA1c reduction was associated with weight gains of 0.92 and 1.24 kg among normal-weight patients at 12 and 24 months of insulin treatment, weight gain was 0.13-0.46 kg among Grade 1 and Grade 2 obese patients during the same follow-up period. Among Grade 3 obese patients, a 1% HbA1c reduction was not associated with any increase in weight.

3.4 | Pattern of association of weight change with glycaemic control by BMI category

Figure 1D shows a difference in the association between HbA1c reduction and weight gain by BMI category. For example, at 24 months of treatment, with the slope of 1.24 (95% CI 1.18, 1.31) kg associated with a 1% reduction in HbA1c for normal-weight patients as reference, the differences in the slopes for obese categories (from slope for normal-weight category) were significantly different (p < .01) from zero, and were also significantly different among three obese categories. The differences in the slopes in the overweight, Grade 1 and Grade 2 categories were 0.43, 0.78 and 1.11 kg, respectively (p < .01). The difference in estimated slopes in the normal-weight and Grade 3 obese patients was 1.56 kg. The pattern of association of longitudinal changes in body weight with HbA1c was similar for different ethnic groups.

As a sensitivity analysis, all analyses (described above) were also conducted in patients with complete data on weight and HbA1c


TABLE 3 Adjusted weight change over 24 months of insulin treatment initiation, by baseline BMI category

| | Main cohort (n = 155 917): mean (95% CI) weight change1, kg | Sub-cohort 1 (n = 151 220): mean (95% CI) weight change1, kg | Sub-cohort 2 (n = 144 857): mean (95% CI) weight change1, kg | Sub-cohort 2 (n = 144 857): mean (95% CI) weight change associated with a 1% reduction in HbA1c2 |
|---|---|---|---|---|
| At 6 months | | | | |
| BMI <25 kg/m2 | 2.0 (2.0, 2.1) | 2.0 (2.0, 2.1) | 2.0 (2.0, 2.1) | 0.66 (0.61, 0.70) |
| BMI ≥25 and <30 kg/m2 | 1.2 (1.1, 1.2) | 1.2 (1.1, 1.2) | 1.1 (1.1, 1.2) | 0.44 (0.41, 0.47) |
| BMI ≥30 and <35 kg/m2 | 0.6 (0.6, 0.7) | 0.6 (0.6, 0.7) | 0.6 (0.6, 0.7) | 0.29 (0.26, 0.31) |
| BMI ≥35 and <40 kg/m2 | 0.1 (0.1, 0.2) | 0.1 (0.1, 0.2) | 0.1 (0.1, 0.2) | 0.11 (0.08, 0.15) |
| BMI ≥40 kg/m2 | −0.7 (−0.7, −0.6) | −0.7 (−0.7, −0.6) | −0.7 (−0.7, −0.6) | −0.06 (−0.02, 0.0) |
| At 12 months | | | | |
| BMI <25 kg/m2 | | 3.0 (3.0, 3.1) | 3.0 (3.0, 3.1) | 0.92 (0.86, 0.97) |
| BMI ≥25 and <30 kg/m2 | | 1.8 (1.7, 1.8) | 1.7 (1.7, 1.8) | 0.61 (0.58, 0.65) |
| BMI ≥30 and <35 kg/m2 | | 0.9 (0.9, 1.0) | 0.9 (0.9, 1.0) | 0.39 (0.36, 0.43) |
| BMI ≥35 and <40 kg/m2 | | 0.2 (0.2, 0.3) | 0.2 (0.2, 0.3) | 0.15 (0.11, 0.19) |
| BMI ≥40 kg/m2 | | −1.1 (−1.2, −1.1) | −1.1 (−1.2, −1.1) | −0.18 (−0.22, −0.14) |
| At 24 months | | | | |
| BMI <25 kg/m2 | | | 3.9 (3.8, 3.9) | 1.24 (1.18, 1.31) |
| BMI ≥25 and <30 kg/m2 | | | 2.0 (1.9, 2.0) | 0.81 (0.76, 0.85) |
| BMI ≥30 and <35 kg/m2 | | | 0.7 (0.7, 0.8) | 0.46 (0.41, 0.50) |
| BMI ≥35 and <40 kg/m2 | | | −0.2 (−0.3, −0.2) | 0.13 (0.08, 0.18) |
| BMI ≥40 kg/m2 | | | −2.2 (−2.2, −2.1) | −0.32 (−0.37, −0.27) |
| Percentage of patients who increased body weight ≥5 kg | | | | |
| BMI <25 kg/m2 | 13 | 28 | 37 | |
| BMI ≥25 and <30 kg/m2 | 12 | 19 | 24 | |
| BMI ≥30 and <35 kg/m2 | 11 | 17 | 19 | |
| BMI ≥35 and <40 kg/m2 | 11 | 16 | 18 | |
| BMI ≥40 kg/m2 | 11 | 17 | 18 | |
| Mean (95% CI) of marginal contribution towards changes in weight, kg | | | | |
| Sulphonylurea treatment | 0.17 (0.11, 0.22) | 0.25 (0.17, 0.32) | 0.27 (0.18, 0.36) | |

1 Adjusted for sex, diabetes duration and metformin and sulphonylurea usage, according to BMI categories at insulin initiation. 2 Adjusted for sex, diabetes duration, concomitant antidiabetic medication usage, and baseline HbA1c, according to BMI category at insulin initiation. BMI categories at insulin initiation: <25 kg/m2 (normal weight); ≥25 and <30 kg/m2 (overweight); ≥30 and <35 kg/m2 (Grade 1 obesity); ≥35 and <40 kg/m2 (Grade 2 obesity); and ≥40 kg/m2 (Grade 3 obesity).
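The slope differences quoted in Section 3.4 can be reproduced directly from the 24-month column of Table 3. The short sketch below does only that arithmetic, taking the normal-weight slope as the reference; the variable names are illustrative and the values are copied from the table.

```python
# Weight change (kg) associated with a 1% reduction in HbA1c at 24 months (Table 3, sub-cohort 2).
slopes_24m = {
    "normal weight": 1.24,
    "overweight": 0.81,
    "Grade 1 obesity": 0.46,
    "Grade 2 obesity": 0.13,
    "Grade 3 obesity": -0.32,
}

reference = slopes_24m["normal weight"]

# Difference of each category's slope from the normal-weight reference slope.
differences = {
    category: round(reference - slope, 2)
    for category, slope in slopes_24m.items()
    if category != "normal weight"
}
```

The resulting differences (0.43, 0.78, 1.11 and 1.56 kg) match the values reported in the text for the overweight, Grade 1, Grade 2 and Grade 3 categories.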

measures at 6, 12 and 24 months of follow-up. No difference in the estimates or inferences was observed between complete case analyses and analyses based on the imputed data.

4 | DISCUSSION

The present exploratory clinical study, based on large-scale longitudinal real-world data, clearly suggests that: (1) weight gain associated with insulin treatment is significantly lower in obese patients with T2DM compared with that observed in patients with normal body weight; (2) the significantly lower weight gain in obese patients is consistent over 6, 12 and 24 months of treatment with insulin, adjusted for various factors including the use of concomitant antidiabetes medications; (3) the glycaemic control over 24 months of treatment with insulin is similar among patients with different BMI levels; and (4) the weight gain associated with a 1% reduction in HbA1c falls progressively as pretreatment BMI increases.

Our finding that obese patients gain significantly less weight with insulin treatment is robust, and is supported by consistent estimates of weight changes according to different BMI categories over 6-24 months of insulin treatment (Figure 1C). Separate analyses to explore possible confounders, through evaluation of various characteristics according to BMI category in patients with a minimum of 2 years' insulin treatment (Table 2), provide a basis to support this robust finding. This is further supported by consistent findings in patients with a minimum of 2 years' diabetes duration before insulin initiation (Table S1). It is likely that some of the normal-weight patients (BMI <25 kg/m2), and perhaps some with a BMI 25-30 kg/m2, had lost weight before commencing insulin, which may account for some of their weight gain after insulin initiation; however, we still see progressive reductions in body weight gain with progressive increases in baseline BMI >30 kg/m2.


TABLE 4 Adjusted HbA1c change over 24 months of insulin treatment initiation, by baseline BMI category

| | Main cohort (n = 155 917) | Sub-cohort 1 (n = 151 220) | Sub-cohort 2 (n = 144 857) |
|---|---|---|---|
| Mean (95% CI) HbA1c change (%)1 | | | |
| At 6 months | | | |
| <25 kg/m2 | −1.4 (−1.4, −1.3) | −1.4 (−1.4, −1.3) | −1.3 (−1.3, −1.2) |
| ≥25 and <30 kg/m2 | −1.3 (−1.3, −1.2) | −1.3 (−1.3, −1.2) | −1.2 (−1.2, −1.1) |
| ≥30 and <35 kg/m2 | −1.1 (−1.2, −1.1) | −1.1 (−1.2, −1.1) | −1.1 (−1.2, −1.1) |
| ≥35 and <40 kg/m2 | −1.1 (−1.2, −1.1) | −1.1 (−1.1, −1.0) | −1.1 (−1.1, −1.0) |
| ≥40 kg/m2 | −1.0 (−1.0, −0.9) | −1.0 (−1.0, −0.9) | −1.0 (−1.0, −0.9) |
| At 12 months | | | |
| <25 kg/m2 | | −1.3 (−1.4, −1.3) | −1.3 (−1.3, −1.2) |
| ≥25 and <30 kg/m2 | | −1.2 (−1.2, −1.1) | −1.2 (−1.2, −1.1) |
| ≥30 and <35 kg/m2 | | −1.1 (−1.1, −1.0) | −1.1 (−1.1, −1.0) |
| ≥35 and <40 kg/m2 | | −1.0 (−1.0, −0.9) | −1.0 (−1.1, −1.0) |
| ≥40 kg/m2 | | −1.0 (−0.1, −0.9) | −1.0 (−1.0, −0.9) |
| At 24 months | | | |
| <25 kg/m2 | | | −1.4 (−1.4, −1.3) |
| ≥25 and <30 kg/m2 | | | −1.2 (−1.2, −1.1) |
| ≥30 and <35 kg/m2 | | | −1.1 (−1.1, −1.0) |
| ≥35 and <40 kg/m2 | | | −1.0 (−1.1, −1.0) |
| ≥40 kg/m2 | | | −1.0 (−1.0, −0.9) |
| Metformin treatment | | | |
| Yes | −1.2 (−1.3, −1.2) | −1.2 (−1.3, −1.2) | −1.2 (−1.2, −1.1) |
| No | −0.9 (−0.9, −0.8) | −0.9 (−0.9, −0.8) | −0.9 (−0.9, −0.8) |
| Mean (95% CI) of marginal contribution towards changes in HbA1c, % | | | |
| Sulphonylurea treatment | 0.03 (0.02, 0.03) | 0.3 (0.1, 0.5) | 0.7 (0.5, 0.9) |

1 Adjusted for sex, diabetes duration and metformin and sulphonylurea usage. Estimates are provided by BMI categories at insulin initiation. BMI categories at insulin initiation: <25 kg/m2 (normal weight); ≥25 and <30 kg/m2 (overweight); ≥30 and <35 kg/m2 (Grade 1 obesity); ≥35 and <40 kg/m2 (Grade 2 obesity); and ≥40 kg/m2 (Grade 3 obesity).
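The adjusted estimates in Tables 3 and 4 come from weighted regression models, with the weight-change models including a BMI category by HbA1c change interaction. The sketch below shows one way such a model could be fitted by weighted least squares. It is not the authors' code: the data are synthetic, the covariate set is reduced to age for brevity, and the "true" slopes, sample size and noise level are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic inputs (NOT study data).
bmi_cat = rng.integers(0, 5, size=n)             # 0 = normal weight ... 4 = Grade 3 obesity
age = rng.normal(59, 11, size=n)
delta_hba1c = rng.normal(-1.1, 1.0, size=n)      # HbA1c_12m - HbA1c_index, percentage points
baseline_weight = np.clip(rng.normal(98, 26, size=n), 40, None)

# Assumed category-specific slopes: kg of weight change per +1 point change in HbA1c.
true_slopes = np.array([-0.9, -0.6, -0.4, -0.15, 0.2])
delta_weight = (1.0 + 0.01 * (age - 59)
                + true_slopes[bmi_cat] * delta_hba1c
                + rng.normal(0, 2.0, size=n))

# Design matrix: intercept, centred age, BMI-category main effects (reference = category 0),
# and one HbA1c-change slope per BMI category (the interaction term).
dummies = np.eye(5)[bmi_cat]
X = np.column_stack([
    np.ones(n),
    age - 59,
    dummies[:, 1:],
    dummies * delta_hba1c[:, None],
])

# Weighted least squares with baseline weight as the regression weight:
# scale rows by sqrt(weight), then solve ordinary least squares.
sqrt_w = np.sqrt(baseline_weight)
beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], delta_weight * sqrt_w, rcond=None)

slopes = beta[-5:]                    # kg per +1 point change in HbA1c, by BMI category
gain_per_point_reduction = -slopes    # kg of weight gain per 1-point HbA1c reduction
```

Because the regressor is the change in HbA1c (negative when HbA1c falls), the sign is flipped in the last line to express the result as weight gain per 1-point reduction, which is the form reported in Table 3.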

The level of HbA1c reached over time was clinically similar in the different BMI groups, despite the fact that obese patients had significantly lower HbA1c levels (9.2-9.4%) at insulin initiation compared with those with normal body weight or who were overweight (9.7-10.0%; Table 2). Thus, patients arrive at approximately the same HbA1c level, independently of the starting value. This might suggest that the lower weight gain seen in obese patients was attributable to the use of less intensive insulin therapy. Although we did not have insulin dose data, we nevertheless showed that, even when corrected for the same HbA1c reduction, weight gain remained smaller in the obese group.

Based on a cohort of 2179 patients with a median diabetes duration of ~9 years, Balkau et al. reported 1.78 kg of average weight gain (unadjusted) over 1 year of treatment with insulin, with 24% of patients experiencing weight gain of ≥5 kg, and a significant inverse association of baseline BMI with weight gain.17 The adjusted mean weight gain in the present study cohort with a minimum of 1 year of treatment with insulin (cohort 2) ranged between 0.2 and 3.49 kg among patients with BMI ≤ 40 kg/m2 (combining sub-cohorts 1 and 2). A marginal weight loss was observed in patients with Grade 3 obesity. The proportion of patients gaining ≥5 kg body weight at 1 year was similar across obese patients (16-17%; Table 3), while 28% of patients with normal body weight gained ≥5 kg. Compared with patients with normal weight, patients in the Grade 1, 2 and 3 obesity categories were 48%, 51% and 50% less likely, respectively, to increase body weight by >5 kg.

We observed that the cost of glycaemic control in terms of weight gain is marginal in Grade 1 and Grade 2 obese patients. Among the Grade 3 obese patients, a 1% reduction in HbA1c was associated with a decrease in weight of ~0.3 kg after 24 months of insulin treatment. Balkau et al.17 reported that high baseline HbA1c level, insulin dose requirements and lower baseline BMI were independently associated with greater weight gain. In a meta-analysis, Pontiroli et al.10 found that intensity of treatment, insulin dose, final HbA1c level, change in HbA1c level and frequency of hypoglycaemia were significantly associated with weight increase as well as type of insulin regimen; however, these studies did not evaluate the possible association of glycaemic control with weight change in patients with different levels of adiposity at the time of insulin initiation. The present study also provides new information on the significant differences in the patterns of possible association between glycaemic control and weight change in insulin-treated patients by BMI category.

Electronic databases present challenges in terms of accuracy and completeness of the required data. The limitations of this

study include non-availability of complete and reliable data on: (1) medication adherence; (2) diet and exercise; (3) socio-economic status; and (4) potential residual confounders. We believe the non-availability of longitudinal insulin doses would not affect the robustness of our findings, as the main interest was to evaluate the observed weight change and glycaemic control at a population level, reflecting the primary/ambulatory care disease risk factor management. Our analysis of weight change in relation to change in HbA1c confirms that the weight gain “cost” of achieving any given improvement in glycaemic control is, in fact, less with increase in pretreatment BMI.

Although we excluded patients treated with GLP-1 receptor agonists, it is not possible to know to what extent the lower weight gain with increasing BMI is attributable to pharmacological differences in the effects of insulin or to other attempts to lose weight in the obese groups. Patients may well intensify lifestyle efforts on commencing insulin. Whether or not this happens more in the more obese is difficult to ascertain because of the non-availability of longitudinal lifestyle intervention data. Thus, these results should not be interpreted as indicating that lifestyle efforts to control weight gain are not necessary for obese patients initiating insulin. Nevertheless, they do indicate that within the facilities available in routine care, weight gain can readily be limited when initiating insulin therapy in obese patients with T2DM.

A large analysis cohort from the validated CEMR database should be considered as a representative sample, and as such, provides a good picture of the state of weight and glycaemic control in routine practice. We had complete data on weight and HbA1c measured within 3 months of insulin initiation, and the 6-monthly follow-up measures of weight and HbA1c were imputed for only 8-16% of missing cases. The results from complete case analyses and imputed data were very similar. Finally, a careful new-user design with a reasonable exposure time of 2 years and appropriate adjustments for various aspects are the primary strengths of the study.

In conclusion, we observed that, over 24 months of treatment with insulin, obese patients gained significantly less weight than normal- and overweight patients, while achieving clinically similar glycaemic benefits. These findings should provide important reassurance that, among obese patients with T2DM in routine clinical practice, meaningful improvements in glycaemic control can be achieved with only small increases in weight.

ACKNOWLEDGMENTS

The QIMR Berghofer Medical Research Institute gratefully acknowledges the support from the National Health and Medical Research Council and the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia.

Conflict of interest

S. K. P. has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. J. S. has received honoraria for consultancies and lectures from: Novartis, Novo Nordisk, Astra Zeneca, Sanofi, Merck Sharp and Dohme, Abbott, Janssen Cilag, and Takeda. O. M. and K. K. have no conflict of interest to declare.

Author contributions

S. K. P. conceived the idea and was responsible for the primary design of the study. J. S. and K. K. significantly contributed to the study design. K. K. conducted the data extraction, and S. K. P., O. M. and K. K. jointly conducted the statistical analyses. The first draft of the manuscript was developed by S. K. P., and all authors contributed to the finalization of the manuscript. S. K. P. had full access to all the data in the study and is the guarantor, taking responsibility for the integrity of the data and the accuracy of the data analysis.

REFERENCES

1. Ishii H, Iwamoto Y, Tajima N. An exploration of barriers to insulin initiation for physicians in Japan: findings from the Diabetes Attitudes, Wishes and Needs (DAWN) JAPAN study. PLoS One. 2012;7:e36361.
2. Weng J, Li Y, Xu W, et al. Effect of intensive insulin therapy on beta-cell function and glycaemic control in patients with newly diagnosed type 2 diabetes: a multicentre randomised parallel-group trial. Lancet. 2008;371:1753–1760.
3. Alvarsson M, Sundkvist G, Lager I, et al. Effects of insulin vs. in recently diagnosed patients with type 2 diabetes: a 4-year follow-up. Diabetes Obes Metab. 2008;10:421–429.
4. Wang Z, York NW, Nichols CG, Remedi MS. Pancreatic beta cell dedifferentiation in diabetes and redifferentiation following insulin therapy. Cell Metab. 2014;19:872–882.
5. Khunti K, Davies M, Majeed A, Thorsted BL, Wolden ML, Paul SK. Hypoglycemia and risk of cardiovascular disease and all-cause mortality in insulin-treated people with type 1 and type 2 diabetes: a cohort study. Diabetes Care. 2015;38:316–322.
6. Russell-Jones D, Khan R. Insulin-associated weight gain in diabetes – causes, effects and coping strategies. Diabetes Obes Metab. 2007;9:799–812.
7. Paul SK, Klein K, Thorsted BL, Wolden ML, Khunti K. Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovasc Diabetol. 2015;14:1–10.
8. Khunti K, Nikolajsen A, Thorsted BL, Andersen M, Davies MJ, Paul SK. Clinical inertia in intensifying therapy among people with type 2 diabetes treated with basal insulin. Diabetes Obes Metab. 2016;18:401–409.
9. Wang H, Ni YF, Li HZ, Yang S, Feng B. Effects of insulin monotherapy on body weight, composition, and fat distribution in newly diagnosed patients with type 2 diabetes mellitus. J Diabetes. 2013;5:146–148.
10. Pontiroli AE, Miele L, Morabito A. Increase of body weight during the first year of intensive insulin treatment in type 2 diabetes: systematic review and meta-analysis. Diabetes Obes Metab. 2011;13:1008–1019.
11. Paul S, Thorsted BL, Wolden M, Klein K, Khunti K. Delay in treatment intensification increases the risks of cardiovascular events in patients with type 2 diabetes. Cardiovasc Diabetol. 2015;14:1.
12. Kamal KM, Chopra I, Elliott JP, Mattei TJ. Use of electronic medical records for clinical research in the management of type 2 diabetes. Res Social Adm Pharm. 2014;10:877–884.
13. Herrin J, da Graca B, Nicewander D, et al. The effectiveness of implementing an electronic health record on diabetes care and outcomes. Health Serv Res. 2012;47:1522–1540.


14. Hansen RA, Farley JF, Maciejewski ML, Ye X, Qian C, Powers B. Real-world utilization patterns and outcomes of colesevelam HCL in the GE electronic medical record. BMC Endocr Disord. 2013;13:24.
15. Levin P, Wei W, Miao R, et al. Therapeutically interchangeable? A study of real-world outcomes associated with switching basal insulin analogues among US patients with type 2 diabetes mellitus using electronic medical records data. Diabetes Obes Metab. 2015;17:245–253.
16. Davis KL, Tangirala M, Meyers JL, Wei W. Real-world comparative outcomes of US type 2 diabetes patients initiating analog basal insulin therapy. Curr Med Res Opin. 2013;29:1083–1091.
17. Balkau B, Home PD, Vincent M, Marre M, Freemantle N. Factors associated with weight gain in people with type 2 diabetes starting on insulin. Diabetes Care. 2014;37:2108–2113.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Paul SK, Shaw J, Montvida O and Klein K. Weight gain in insulin-treated patients by body mass index category at treatment initiation: new evidence from real-world data in patients with type 2 diabetes, Diabetes Obes Metab 2016, 18, 1244–1252. DOI:10.1111/dom.12761


APPENDIX D

DIABETIC Medicine

DOI: 10.1111/dme.13835

Research Article: Treatment

Treatment with incretins does not increase the risk of pancreatic diseases compared to older anti-hyperglycaemic drugs, when added to metformin: real world evidence in people with Type 2 diabetes

O. Montvida1,2, J. B. Green3, J. Atherton4 and S. K. Paul1,5

1Statistics Unit, QIMR Berghofer Medical Research Institute, 2School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia, 3Division of Endocrinology and Duke Clinical Research Institute, Duke University Medical Center, Durham, NC, USA, 4Cardiology Department, Royal Brisbane and Women’s Hospital and University of Queensland School of Medicine, Brisbane and 5Melbourne EpiCentre, University of Melbourne and Melbourne Health, Melbourne, Australia

Accepted 9 October 2018

Abstract

Aims In people with metformin-treated diabetes, to evaluate the risk of acute pancreatitis, pancreatic cancer and other diseases of the pancreas post second-line anti-hyperglycaemic agent initiation.

Methods People with Type 2 diabetes diagnosed after 2004 who received metformin plus a dipeptidyl peptidase-4 inhibitor (DPP-4i, n = 50 095), glucagon-like peptide-1 receptor agonist (GLP-1RA, n = 12 654), sulfonylurea (n = 110 747), thiazolidinedione (n = 17 597) or insulin (n = 34 805) for at least 3 months were identified in the US Centricity Electronic Medical Records. Time to developing acute pancreatitis, other diseases of the pancreas and pancreatic cancer was estimated, balancing and adjusting anti-hyperglycaemic drug groups for appropriate confounders.

Results In the DPP-4i group, the adjusted mean time to acute pancreatitis was 2.63 [95% confidence intervals (CI) 2.38, 2.88] years; time to pancreatic cancer was 2.70 (2.19, 3.21) years; and time to other diseases of the pancreas was 2.73 (2.33, 3.12) years. Compared with DPP-4i, the insulin group developed acute pancreatitis 0.48 years (P < 0.01) earlier and the GLP-1RA group developed pancreatic cancer 3 years later (P < 0.01). However, with the constraint of no event within 6 months of insulin initiation, the earlier onset of acute pancreatitis in the insulin group was no longer significant. No other significant differences were observed between groups.

Conclusions No significant differences in the risk of developing pancreatic diseases in those treated with various anti-hyperglycaemic drug classes were found. Diabet. Med. 00, 1–8 (2018)

Correspondence to: Sanjoy Ketan Paul. E-mail: [email protected]

© 2018 Diabetes UK

What’s new?

• Association of treatment with incretin-based therapies and the risk of pancreatic diseases remains controversial. However, no study explored the comparative safety of different anti-hyperglycaemic drugs in this context.
• This study provides a holistic population-level comparative outcome evaluation of the risk of pancreatic diseases from the time of receiving different second-line anti-hyperglycaemic drugs post metformin.
• Although treatment with incretin-based therapies was not found to be associated with an increased risk of pancreatic diseases, people treated with insulin experienced higher risk of such diseases.

Introduction

Glucagon-like peptide-1 receptor agonists (GLP-1RAs) and dipeptidyl peptidase-4 inhibitors (DPP-4i) represent incretin-based therapeutic drug classes used to treat Type 2 diabetes. These drugs have demonstrated efficacy in reducing blood glucose levels with low risk of hypoglycaemia [1–3]. Treatment with GLP-1RAs is associated with favourable changes in metabolic measurements such as body weight, and some agents in the class have been shown to reduce the risk of cardiovascular events [3–5]. However, some recent clinical observational studies have raised questions as to the possible association of treatment with incretin-based therapies, particularly with DPP-4i, and the risk of acute pancreatitis or pancreatic cancer [6–17].

Although a number of cohort studies and meta-analyses reported no association between incretin-based therapies and risk of acute pancreatitis and pancreatic cancer [9,12,13,15,16,18], other studies have reported an increased risk of acute pancreatitis with such agents [8,11,14,17]. In a meta-analysis based on pooled data from the SAVOR-TIMI 53, EXAMINE and TECOS trials, Tkac and Raz [8] reported a significant increase in the incidence of acute pancreatitis [odds ratio (OR): 1.79; 95% confidence intervals (CI) 1.13, 2.82] in people treated with DPP-4i when compared with placebo. The observed increase in the absolute risk of acute pancreatitis with DPP-4i therapy was 0.13%. In a cohort study based on real-world primary care data from the UK, Knapen and colleagues [17] reported a 1.5-fold increased risk of any pancreatitis in incretin-based therapy users compared with other non-insulin anti-hyperglycaemic drug users. However, another study by Knapen and colleagues [16], based on the same database, reported no association of incretin-based therapies with the risk of pancreatic cancer.

Previously published cohort studies have generally assessed pancreatic risk by comparing rates of pancreatic diseases in users of incretin-based therapies with rates in users of any non-incretin based anti-hyperglycaemic regimen. However, no prior study of adequate size and duration has holistically evaluated the risks of acute pancreatitis and pancreatic cancer with incretin-based therapy compared with use of other specific anti-hyperglycaemic drug classes. Furthermore, previous publications based on real-world data, in which baseline risks differ significantly and are modified over time by contrasting confounders, may not have utilized optimal analytical approaches to assess risk. Using extensive person-level longitudinal data from ambulatory and primary care systems in the USA, the aims of this exploratory outcome study were to evaluate the rates and risks of developing acute pancreatitis, other diseases of the pancreas or pancreatic cancer in people with metformin-treated Type 2 diabetes who initiated second-line anti-hyperglycaemic therapy with a DPP-4i, GLP-1RA, sulfonylurea, thiazolidinedione or insulin.

Materials and methods

Data source

Centricity Electronic Medical Record (CEMR) of the USA represents a variety of ambulatory and primary care medical practices, including solo practitioners, community clinics, academic medical centres and large integrated delivery networks. Over 35 000 physicians and other providers from all US states contribute to the CEMR, wherein ~ 75% are primary care providers. The database is generally representative of the US population: the diabetes prevalence (7.1% of people with diabetes identified by diagnostic codes) is similar to the US National Diabetes Statistics (6.7% diagnosed diabetes in 2014) [19]. CEMR has been used extensively for academic research worldwide [20,21].

For more than 34 million individuals, longitudinal EMRs were available from 1995 to April 2016. This database contains comprehensive person-level information on demographics, anthropometric, clinical and laboratory variables including age, sex, ethnicity, smoking status, and longitudinal measures of body weight, BMI, blood pressure, HbA1c, full lipid profiles, urine albumin and creatinine, and serum creatinine.

Medication data include brand names and doses for individual medications prescribed (RxNorm), along with start/stop dates and specific fields to track treatment alterations. This data set also contains self-reported medications, including prescriptions received outside the EMR network and over-the-counter medications. All disease events along with dates are coded with International Classification of Diseases (ICD)-9, ICD-10 or SNOMED Clinical Terms (CT) codes.

Study design and study data

All individuals with a diagnosis of Type 2 diabetes were included in this study with the conditions of no missing data for age, sex or ethnicity; age ≥ 18 and < 80 years at the diagnosis of Type 2 diabetes; and date of diagnosis of Type 2 diabetes after EMR registration date and after 1 January 2005. All those included also first began anti-hyperglycaemic therapy with metformin, followed by second-line additional treatment with a DPP-4i, GLP-1RA, insulin, thiazolidinedione or sulfonylurea for ≥ 3 months. Users of second-line insulin, thiazolidinedione or sulfonylurea who had ever received a DPP-4i or GLP-1RA were excluded, as were those with other diseases of the pancreas or any type of cancer that occurred prior to initiation of second-line anti-hyperglycaemic drug (index date).

Baseline (index date) data included age, sex, ethnicity, body weight, BMI and blood pressure at the time of second anti-hyperglycaemic drug initiation. Baseline HbA1c was obtained as the closest observation to second drug initiation within a [−3, +3] month window. Body weight, BMI, SBP and lipids were calculated as the average of available measurements within [−3, +3] months of baseline. Obesity was defined as BMI ≥ 30 kg/m².

The presence of comorbidities prior/post index date and the time to such events were also obtained. Acute pancreatitis, other diseases of the pancreas, cancer, cardiovascular disease, chronic kidney disease and hypertension were identified. Cancer was defined as any malignant neoplasm or carcinoma in situ. Cancer of the pancreas was additionally separated. Other diseases of the pancreas included specified


(e.g. pancreatic cyst) and unspecified diseases of the pancreas with appropriate clinical codes. Cardiovascular disease was defined as ischaemic heart disease (includes myocardial infarction), peripheral vascular/artery disease, heart failure or stroke.

Tobacco use status included data on the use of cigars, pipe, cigarettes, chewing tobacco, snuff and smokeless tobacco. Occasional smokers were classified as ‘current’. In case of discordant same-day statuses, priority was given to ‘current’, then to ‘former’ and lastly to ‘never’ status. The last status recorded prior to index date was preserved as tobacco use status. Complete information on anti-hyperglycaemic drugs, along with non-steroidal anti-inflammatory drugs, lipid-modifying drugs, anti-hypertensive and cardioprotective medications was obtained. Cardioprotective medications included beta-blocking agents, angiotensin-converting enzyme inhibitors, angiotensin II antagonists and statins.

Ethical approval

Research involved existing data, in which individuals could not be identified directly or through identifiers linked to them. Thus, according to the US Department of Health and Human Services Exemption 4 (CFR 46.101(b)(4)), this study is exempt from ethics approval from an institutional review board and informed consent.

Statistical methods

All analyses were performed by class of second-line anti-hyperglycaemic drugs. Basic statistics were presented using number (%), mean (SD) or median [interquartile range (IQR)], as appropriate. The event rates per 1000 person-years (95% CI) were estimated for acute pancreatitis, other diseases of the pancreas and pancreatic cancer using the standard life-table method.

In the presence of significant differences in risk factors between comparative treatment groups in observational studies, standard Cox regression survival models after propensity score adjustments are often used. Estimates of hazard ratios are useful for population effects when they are constant, which occurs when the treatment enters linearly, and the distribution of the outcome has a proportional-hazards form [22]. However, decisions on therapeutic introductions and modifications neither are linear nor conform to a proportional-hazards form in the context of risk of an event. Given the observational nature of the study, with high likelihood of inherent differences in the comparator treatment groups, we used a ‘treatment effect’ modelling approach [23–25]. The parametric gamma time-to-event model with inverse probability-weighted regression adjustment for the confounders was used to estimate the adjusted mean (95% CI) time to event for the reference treatment group (DPP-4) and the adjusted time difference (95% CI) to the occurrence of the event in the comparator treatment groups. The robustness of choosing the gamma distribution was tested on the basis of information criteria estimates. Risk analyses were balanced on age and the follow-up time by treatment groups, and were adjusted for age, sex, smoking status, BMI and diabetes duration, following the weighted propensity-score approach. Survival time was computed as time to event (acute pancreatitis, other diseases of the pancreas or pancreatic cancer) if an event occurred, otherwise as time to the end of follow-up (date of a person’s last available record within the database). The robustness of risk modelling with balancing factors was evaluated by estimating the weighted standardised differences in these factors by treatment groups.

Two sensitivity analyses were conducted to evaluate the robustness of the risk analyses in two sub-cohorts: (i) in all people from the study cohort excluding those with acute pancreatitis, other diseases of the pancreas or any type of cancer within 6 months of the index date (sub-cohort 1); and (ii) in all people from the study cohort with non-missing baseline HbA1c (sub-cohort 2). Sub-cohort 2 was additionally balanced on HbA1c and body weight for risk analyses.

Results

From 2 624 954 people identified as having Type 2 diabetes, 225 898 met the inclusion criteria for the study (Fig. 1). At the index date, participants had a mean (SD) age of 59 (12) years, 49% were men, 69% had White European ancestry, and the overall mean follow-up time was 3.2 years. Anti-hyperglycaemic drug groups as defined included 22% (n = 50 095) using DPP-4i; 6% (n = 12 654) using GLP-1RA; 15% (n = 34 805) using insulin; 49% (n = 110 747) using sulfonylurea; and 8% (n = 17 597) using thiazolidinedione (Table 1). Follow-up time of those in most of the treatment groups was similar, except for those in the thiazolidinedione group which had a longer mean follow-up of 4.6 years. The distributions of age, diabetes duration and HbA1c were significantly different between groups, as expected. The proportions of people adding or moving to a third-line anti-hyperglycaemic drug were similar in those receiving incretin-based therapy (47% in both the DPP-4i and GLP-1RA groups), although other groups had significantly lower proportions. The non-incretin groups could not have received incretin-based therapies during follow-up, by design.

Risk of acute pancreatitis

Only 1049 (0.46%) people developed acute pancreatitis during the mean 3.2 years of follow-up (Table 1). The rates per 1000 person-years of acute pancreatitis were similar in the DPP-4i (1.38; 95% CI 1.21, 1.59), GLP-1RA (1.49; 1.16, 1.92) and sulfonylurea (1.45; 1.33, 1.58) groups, whereas those treated with insulin had significantly higher acute pancreatitis rate (2.01; 1.75, 2.31) and those treated with


| Selection step | Total (n) | DPP-4i | GLP-1RA | INS | SU | TZD |
|---|---|---|---|---|---|---|
| Patients with non-missing sex and age | 34 299 123 | | | | | |
| Diabetes mellitus | 2 893 321 | | | | | |
| Type 2 diabetes | 2 624 954 | | | | | |
| Age at diagnosis ≥ 18 and < 80 | 2 590 853 | | | | | |
| Diabetes diagnoses after entry to the EMR | 1 412 938 | | | | | |
| Diabetes diagnosis on or after Jan 1 2005 | 1 305 686 | | | | | |
| Metformin as first line | 740 478 | | | | | |
| Initiated second line | 357 482 | 61 508 | 15 448 | 49 939 | 187 819 | 33 021 |
| No record of acute pancreatitis, or other disease of pancreas, or any type of cancer prior to second-line initiation | 320 754 | 56 327 | 14 498 | 45 936 | 173 137 | 30 856 |
| On treatment for at least 3 months | 289 434 | 50 095 | 12 654 | 40 846 | 157 502 | 28 337 |
| MAIN COHORT: No DPP-4 or GLP-1RA ever taken in TZD, INS, or SU groups | 225 898 | 50 095 | 12 654 | 34 805 | 110 747 | 17 597 |
| SUB-COHORT 1 (from main cohort): No record of acute pancreatitis, or other disease of pancreas, or any type of cancer 6 months post second-line initiation | 221 882 | 49 419 | 12 518 | 34 098 | 108 553 | 17 294 |
| SUB-COHORT 2 (from main cohort): Non-missing HbA1c at the time of second-line initiation | 131 482 | 31 618 | 7 580 | 18 924 | 64 086 | 9 274 |

FIGURE 1 Flow-chart of the study cohort. EMR, electronic medical record; DPP-4i, dipeptidyl peptidase-4 inhibitor; GLP-1RA, glucagon-like peptide-1 receptor agonist; INS, insulin; SU, sulfonylurea; TZD, thiazolidinedione.

thiazolidinedione had significantly lower acute pancreatitis rate (0.89; 0.70, 1.12) compared with DPP-4i group (Table 2). The adjusted mean (95% CI) time to acute pancreatitis in people treated with DPP-4 was 2.63 (2.38, 2.88) years. Those treated with insulin were likely to develop acute pancreatitis 0.48 years (P < 0.01) earlier. The adjusted mean time to acute pancreatitis was similar in the other treatment groups.

Risk of cancer of the pancreas

In the cohort, 357 (0.16%) people developed cancer of the pancreas, and there was no significant difference in the rate of pancreatic cancer per 1000 person-years among the treatment groups. Among those with acute pancreatitis, 17 (2%) developed pancreatic cancer, of whom four and two individuals belonged to DPP-4 and GLP-1RA groups, respectively. The adjusted mean (95% CI) time to outcome in the DPP-4 group was 2.70 (2.19, 3.21) years, with no significant differences in time to event in the insulin, sulfonylurea and thiazolidinedione groups (Table 2). However, people treated with GLP-1RA were likely to develop pancreatic cancer approximately 3 years later (P < 0.01) compared with the DPP-4 group.

Risk of other diseases of the pancreas

Only 0.33% (n = 752) of the cohort experienced other diseases of the pancreas during follow-up. Among those with


Table 1 Basic characteristics at the time of second-line anti-hyperglycaemic therapy initiation (index date)

| Characteristic | Dipeptidyl peptidase-4 inhibitor | Glucagon-like peptide-1 receptor agonist | Insulin | Sulfonylurea | Thiazolidinedione | Total |
|---|---|---|---|---|---|---|
| N | 50 095 | 12 654 | 34 805 | 110 747 | 17 597 | 225 898 |
| Age, years* | 58 (12) | 53 (12) | 57 (13) | 60 (12) | 59 (12) | 59 (12) |
| Men† | 24 034 (48) | 4 346 (34) | 16 302 (47) | 57 876 (52) | 9 174 (52) | 111 732 (49) |
| White European ancestry† | 34 989 (70) | 9 613 (76) | 23 229 (67) | 76 430 (69) | 11 791 (67) | 156 052 (69) |
| Black† | 5 852 (12) | 1 083 (9) | 5 083 (15) | 12 971 (12) | 1 581 (9) | 26 570 (12) |
| Current tobacco use† | 4 872 (10) | 979 (8) | 3 929 (11) | 9 980 (9) | 797 (5) | 20 557 (9) |
| Former tobacco use† | 8 086 (16) | 1 989 (16) | 5 729 (16) | 16 790 (15) | 1 232 (7) | 33 826 (15) |
| Never tobacco use† | 15 265 (30) | 3 839 (30) | 8 828 (25) | 26 797 (24) | 2 482 (14) | 57 211 (25) |
| Tobacco use, unknown status† | 21 872 (44) | 5 847 (46) | 16 319 (47) | 57 180 (52) | 13 086 (74) | 114 304 (51) |
| Diabetes duration prior to index date, months* | 14.35 (21.47) | 13.50 (20.85) | 7.76 (16.61) | 11.13 (19.92) | 6.34 (14.27) | 11.09 (19.64) |
| Treatment duration, months* | 26.22 (20.07) | 24.85 (20.34) | 31.15 (24.77) | 31.59 (24.94) | 32.49 (26.13) | 30.03 (23.91) |
| Follow-up, years‡ | 2.54 (1.3, 4.11) | 2.67 (1.25, 4.67) | 2.27 (1.13, 3.99) | 2.67 (1.31, 4.6) | 4.33 (1.99, 6.79) | 2.66 (1.3, 4.54) |
| Follow-up, years* | 2.91 (1.96) | 3.24 (2.39) | 2.81 (2.13) | 3.24 (2.37) | 4.56 (2.89) | 3.20 (2.34) |
| HbA1c, mmol/mol‡ | 61 (52, 74) | 56 (48, 68) | 74 (56, 93) | 63 (53, 77) | 55 (48, 68) | 62 (52, 78) |
| HbA1c, %‡ | 7.7 (6.9, 8.9) | 7.3 (6.5, 8.4) | 8.9 (7.3, 10.7) | 7.9 (7.0, 9.2) | 7.2 (6.5, 8.4) | 7.8 (6.9, 9.3) |
| SBP, mmHg* | 130 (14) | 128 (13) | 131 (16) | 132 (16) | 130 (15) | 131 (15) |
| LDL, mmol/l* | 2.53 (0.91) | 2.48 (0.88) | 2.53 (0.96) | 2.53 (0.91) | 2.51 (0.88) | 2.53 (0.91) |
| Triglycerides, mmol/l‡ | 1.66 (1.22, 2.21) | 1.68 (1.22, 2.24) | 1.61 (1.15, 2.20) | 1.67 (1.22, 2.24) | 1.55 (1.12, 2.11) | 1.65 (1.21, 2.21) |
| Weight, kg* | 98 (24) | 108 (26) | 100 (26) | 97 (25) | 99 (24) | 99 (25) |
| BMI, kg/m²* | 34.4 (7.6) | 38.2 (8.3) | 35.2 (8.3) | 34.1 (7.7) | 34.6 (7.9) | 34.6 (7.9) |
| Obese† | 32 567 (70) | 10 201 (87) | 22 600 (71) | 67 899 (68) | 10 763 (70) | 144 030 (70) |
| Hypertension† | 28 063 (56) | 6 675 (53) | 17 477 (50) | 60 207 (54) | 8 434 (48) | 120 856 (54) |
| Cardiovascular disease† | 8 796 (18) | 1 531 (12) | 7 745 (22) | 22 995 (21) | 2 958 (17) | 44 025 (19) |
| Chronic kidney disease† | 1 525 (3) | 229 (2) | 1 129 (3) | 3 910 (4) | 447 (3) | 7 240 (3) |
| Received third anti-hyperglycaemic agent† | 23 318 (47) | 5 986 (47) | 4 482 (13) | 34 905 (32) | 7 015 (40) | 75 706 (34) |
| Received cardio-protective medication† | 46 395 (93) | 11 336 (90) | 31 862 (92) | 103 553 (94) | 16 390 (93) | 209 536 (93) |
| Received non-steroidal anti-inflammatory drugs† | 37 265 (74) | 9 269 (73) | 25 121 (72) | 82 963 (75) | 13 207 (75) | 167 825 (74) |
| Received anti-hypertensive† | 3 613 (7) | 824 (7) | 3 821 (11) | 10 518 (10) | 1 627 (9) | 20 403 (9) |
| Sub-cohort 1† | 49 419 (99) | 12 518 (99) | 34 098 (98) | 108 553 (98) | 17 294 (98) | 221 882 (98) |
| Sub-cohort 2† | 31 618 (63) | 7 580 (60) | 18 924 (54) | 64 086 (58) | 9 274 (53) | 131 482 (58) |

Values are given as *mean (SD), †n (%) or ‡median (IQR). Sub-cohort 1, patients from the study cohort excluding those with acute pancreatitis, other disease of pancreas or any type of cancer within 6 months of index date. Sub-cohort 2, patients from the study cohort with non-missing HbA1c measure at index date.
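
The two HbA1c rows in Table 1 report the same medians in IFCC units (mmol/mol) and NGSP units (%). As a minimal sketch, assuming the standard NGSP/IFCC master equation (NGSP % = 0.09148 × IFCC mmol/mol + 2.152), which the article itself does not state, the following Python snippet converts the printed percentages and compares them with the printed mmol/mol medians after rounding. The function and variable names are illustrative only and are not taken from the study.

```python
# Assumption: the NGSP/IFCC master equation; the article does not say how
# the two HbA1c scales in Table 1 were derived from each other.
SLOPE = 0.09148
INTERCEPT = 2.152


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert HbA1c from NGSP units (%) to IFCC units (mmol/mol)."""
    return (percent - INTERCEPT) / SLOPE


def hba1c_mmol_mol_to_percent(mmol_mol: float) -> float:
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%)."""
    return SLOPE * mmol_mol + INTERCEPT


# Median HbA1c (%) by treatment group, copied from Table 1.
table1_percent = {
    "DPP-4i": 7.7,
    "GLP-1RA": 7.3,
    "Insulin": 8.9,
    "Sulfonylurea": 7.9,
    "Thiazolidinedione": 7.2,
    "Total": 7.8,
}

# Median HbA1c (mmol/mol) by treatment group, copied from Table 1.
table1_mmol_mol = {
    "DPP-4i": 61,
    "GLP-1RA": 56,
    "Insulin": 74,
    "Sulfonylurea": 63,
    "Thiazolidinedione": 55,
    "Total": 62,
}

# Convert each printed percentage and round to the nearest whole mmol/mol.
converted = {
    group: round(hba1c_percent_to_mmol_mol(pct))
    for group, pct in table1_percent.items()
}

for group, value in converted.items():
    print(f"{group}: {table1_percent[group]}% -> {value} mmol/mol "
          f"(Table 1: {table1_mmol_mol[group]})")
```

Because the conversion is a monotonic linear transformation, medians and quartiles map directly between the two scales, so agreement after rounding is a quick consistency check on the table.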

Table 2 Event rates (95% CI) per 1000 person-years; adjusted mean time to events (95% CI) in dipeptidyl peptidase-4 (DPP-4) group and adjusted difference in time to events in other treatment groups with DPP-4 inhibitor as a reference

| Treatment group | Measure | Acute pancreatitis (95% CI) | Pancreatic cancer (95% CI) | Other disease of pancreas (95% CI) |
|---|---|---|---|---|
| DPP-4 | Rate per 1000 person-years | 1.38 (1.21, 1.59) | 0.46 (0.36, 0.59) | 0.93 (0.78, 1.10) |
| DPP-4 | Mean time to event (years) | 2.63 (2.38, 2.88) | 2.70 (2.19, 3.21) | 2.73 (2.33, 3.12) |
| Glucagon-like peptide-1 receptor agonist | Rate per 1000 person-years | 1.49 (1.16, 1.92) | 0.17 (0.08, 0.36) | 0.78 (0.55, 1.10) |
| Glucagon-like peptide-1 receptor agonist | Time difference (years) | −0.18 (−0.72, 0.37) | 3.00 (0.84, 5.16)* | 0.52 (−0.60, 1.65) |
| Insulin | Rate per 1000 person-years | 2.01 (1.75, 2.31) | 0.59 (0.46, 0.77) | 1.48 (1.26, 1.75) |
| Insulin | Time difference (years) | −0.48 (−0.90, −0.06)* | −0.70 (−1.56, 0.17) | −0.49 (−1.01, 0.03) |
| Sulfonylurea | Rate per 1000 person-years | 1.45 (1.33, 1.58) | 0.55 (0.47, 0.63) | 1.04 (0.94, 1.15) |
| Sulfonylurea | Time difference (years) | −0.01 (−0.51, 0.50) | −0.57 (−1.26, 0.11) | −0.43 (−1.13, 0.28) |
| Thiazolidinedione | Rate per 1000 person-years | 0.89 (0.70, 1.12) | 0.36 (0.25, 0.52) | 0.85 (0.67, 1.08) |
| Thiazolidinedione | Time difference (years) | −0.25 (−0.56, 0.05) | −0.09 (−0.74, 0.56) | −0.28 (−0.74, 0.18) |

*P < 0.01. † Analyses were balanced on age and the follow-up time by treatment groups, and were adjusted for age, sex, BMI, smoking status and diabetes duration.

acute pancreatitis, 101 (10%) also had other diseases of the pancreas. Except for the insulin group, the rate per 1000 person-years for other diseases of the pancreas was similar across treatment groups (Table 2). The mean (95% CI) time to develop other diseases of the pancreas in DPP-4i group was 2.73 (2.33, 3.12) years, with no significant difference from other treatment groups.

In all survival regression models, balance for the defined confounders was achieved among the treatment groups. The weighted standardized differences in confounding factors across the treatment groups were similar while evaluating the risk of acute pancreatitis, other diseases of the pancreas and pancreatic cancer separately (Table S1). The sensitivity analyses with sub-cohorts 1 and 2 showed similar estimates of time to events. However, after removing those who developed acute pancreatitis or other diseases of the pancreas or any cancer within 6 months of the index date (sub-cohort 1), the adjusted time to acute pancreatitis among people treated with insulin was no longer different from the other groups, in contrast to the primary cohort.

Discussion

A potential relationship between the use of incretin-based anti-hyperglycaemic drugs and adverse pancreatic outcomes has been suggested by various pre-clinical and clinical studies. In particular, meta-analysis of data obtained from cardiovascular safety trials of several DPP-4i medications suggests that exposure to those medications significantly increased the risk of acute pancreatitis compared with placebo, although the absolute increase in risk was small [8]. Interestingly, although people with Type 2 diabetes and obesity are known to be at increased risk for pancreatitis and pancreatic cancer, rates of those complications have been low in recent clinical trials. Low rates of these outcomes of interest, as well as the likely extended duration of exposure and follow-up needed to ascertain a relationship between anti-hyperglycaemic drug exposure and the development of malignancy, pose significant challenges in determining the pancreatic safety of drugs commonly used in diabetes management.

In this retrospective, longitudinal, real-world study of a large cohort of people with metformin-treated Type 2 diabetes, we analysed the effects of second-line anti-hyperglycaemic drugs upon rates and times to clinically important pancreatic complications. Our analyses have shown that the rate of acute pancreatitis was higher in the group treated with insulin, lower in the thiazolidinedione group, and similar in the groups receiving GLP-1RA or sulfonylurea therapy when compared with the group treated with DPP-4i. Time to development of pancreatic cancer was longer in the GLP-1RA group compared with DPP-4i users, but did not differ significantly between the other anti-hyperglycaemic drug groups compared with DPP-4i. Rates of any other disease of the pancreas were not higher among people who received additional therapy with a DPP-4i compared with other classes of anti-hyperglycaemic drugs.

The increased risk of pancreatitis in insulin users is perhaps not surprising, because insulin users often tend to have a greater burden of comorbidities and risks for adverse outcomes that cannot be fully adjusted for in a retrospective analysis. However, time to acute pancreatitis was no longer significantly different between the insulin and DPP-4i groups after removing individuals who developed acute pancreatitis or other diseases of the pancreas or any cancer within 6 months of the index date (sub-cohort 1). The lower rate per 1000 person-years of acute pancreatitis noted in

thiazolidinedione users is perhaps more unexpected. Interestingly, studies in rodent models suggest that rosiglitazone exposure may limit the severity of pancreatitis and shorten recovery from pancreatic inflammation in the setting of induced pancreatic injury [26,27]. However, the thiazolidinedione class is associated with a number of adverse side effects that have significantly limited clinical use of these medications [28]. The findings of this analysis provide reassurance to prescribers and users of DPP-4i that these medications do not significantly increase the risk of adverse pancreatic outcomes compared with other commonly prescribed second-line therapies.

Strengths of this analysis include the large number of people who met the inclusion criteria for analysis, the robust amount of data collected and the reasonable duration of follow-up after addition of the second-line anti-hyperglycaemic drug. Although the overall numbers of people experiencing acute pancreatitis and pancreatic cancer were small, the numbers exceed those available for inclusion in the previously cited meta-analysis [8]. The unique and novel aspects of this study include the careful choice of study cohort without any history of pancreatic diseases or cancer. This approach reduces the likelihood of events attributable to pre-existing conditions such as biliary disease, structural pancreatic disorders or an autoimmune/genetic predisposition to pancreatic diseases. The analyses also include a holistic evaluation of the risks associated with use of different anti-hyperglycaemic drugs rather than a one drug vs. all approach; a detailed evaluation of the treatment patterns, ensuring non-exposure to incretin-based therapies in other comparator treatment groups; and careful choice of statistical methodology to evaluate the risk while robustly addressing the challenging issues of imbalances in important risk factors and confounders across treatment groups. Our finding that people treated with a DPP-4i are not at higher risk of pancreatic diseases is robust, supported by the sensitivity analyses in a large number of people with Type 2 diabetes.

Electronic databases do present challenges in terms of the accuracy and completeness of the required data. As a result, limitations of this study include non-availability of complete and reliable longitudinal data related to medication adherence, tobacco and alcohol consumption, socio-economic status and potential residual confounders. In particular, the inability to quantify alcohol exposure does not permit adjustment or balancing for this known risk factor for pancreatitis. The analyses were not adjusted for conditions such as hypertriglyceridaemia, hypercalcaemia or non-anti-hyperglycaemic drug exposures that have been associated with pancreatitis; however, these are considered responsible for only a small percentage of overall cases of acute pancreatitis [29]. Furthermore, other known risk factors for pancreatic cancer including dietary composition, inactivity or family history/genetic predisposition are not available in routinely collected electronic health records. There may also have been inherent medical, socio-economic or other differences that affected the types of anti-hyperglycaemic drug prescribed. These differences could not be readily determined or adjusted for in the analyses, and the impact of these individual characteristics upon pancreatic outcomes is unknown. Future pharmaco-epidemiologic studies with longer-term follow up and more robust medication exposure and outcomes ascertainment will further complement our understanding of the pancreatic safety of anti-hyperglycaemic therapies.

Funding sources

None.

Competing interests

SKP has acted as a consultant and/or speaker for Novartis, GI Dynamics, Roche, AstraZeneca, Guangzhou Zhongyi Pharmaceutical and Amylin Pharmaceuticals LLC. He has received grants in support of investigator and investigator-initiated clinical studies from Merck, Novo Nordisk, AstraZeneca, Hospira, Amylin Pharmaceuticals, Sanofi-Aventis and Pfizer. JA has received honoraria for consultancies and lectures from Novartis, Novo Nordisk, AstraZeneca, Sanofi, Merck Sharp and Dohme, Abbott, Janssen Cilag and Takeda. OM has no conflict of interest to declare. JG has received research grants from AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Intarcia and Sanofi, and personal fees for consultative work from AstraZeneca, Daiichi, Merck Sharp & Dohme, Novo Nordisk and Boehringer Ingelheim.

Acknowledgements

SKP, OM and JA acknowledge a grant provided by the Royal Brisbane and Women’s Hospital Foundation. Melbourne EpiCentre gratefully acknowledges support from the National Health and Medical Research Council and the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) initiative through Therapeutic Innovation Australia. The authors are grateful to all contributors in the CEMR database. OM gratefully acknowledges the PhD scholarship from Queensland University of Technology, Australia, and her co-supervisors Prof. Ross Young and Prof. Louise Hafner of the same university.

Author contributions

SKP conceived the idea and was responsible for the primary design of the study. OM, JA and JG contributed to the study design. OM conducted the data extraction, and SKP and OM jointly conducted the statistical analyses. The first draft of the manuscript was developed by OM and SKP, and all authors contributed to the finalisation of the manuscript. SKP had full access to all the data in the study and is the guarantor, taking responsibility for the integrity of the data

ª 2018 Diabetes UK 7

208 DIABETICMedicine Pancreatic diseases in people with Type 2 diabetes  O. Montvida et al. and the accuracy of the data analysis. Aggregated data is 16 Knapen L, van Dalem J, Keulemans Y, van Erp N, Bazelier M, De available upon on request. Bruin M et al. Use of incretin agents and risk of pancreatic cancer: a population-based cohort study. Diabetes Obes Metab 2016; 18: 258–265. References 17 Knapen LM, de Jong RG, Driessen JH, Keulemans YC, van Erp NP, De Bruin ML et al. The use of incretin agents and risk of acute 1 Deacon CF, Mannucci E, Ahren B. Glycaemic efficacy of glucagon- and chronic pancreatitis: a population-based cohort study. Dia- like peptide-1 receptor agonists and dipeptidyl peptidase-4 inhibi- betes Obes Metab 2017; 19: 401–411. tors as add-on therapy to metformin in subjects with type 2 diabetes 18 Chen H, Zhou X, Chen T, Liu B, Jin W, Gu H et al. Incretin-based —a review and meta analysis. Diabetes Obes Metab 2012; 14: 762– therapy and risk of pancreatic cancer in patients with Type 2 767. diabetes mellitus: a meta-analysis of randomized controlled trials. 2 Paul SK, Agbeve J, Maggs D, Best JH. Comparison of trajectories of Diabetes Ther 2016; 7: 725–742. self-monitored glucose levels by hypoglycemia status over 52 weeks 19 Centers for Disease Control and Prevention. National Diabetes of treatment with insulin glargine or exenatide once weekly. J Statistics Report: Estimates of Diabetes and its Burden in the Diabetes 2016; 8: 148–157. United States, 2014. Atlanta, GA: US Department of Health and 3 American Diabetes Association. Standards of Medical Care in Human Services, 2014. Diabetes—2018. Diabetes Care 2018; 41(Suppl 1): S4. 20 Crawford AG, Cote C, Couto J, Daskiran M, Gunnarsson C, Haas 4 Paul SK, Klein K, Maggs D, Best JH. The association of the K et al. 
Comparison of GE Centricity Electronic Medical Record treatment with glucagon-like peptide-1 receptor agonist exenatide database and National Ambulatory Medical Care Survey findings or insulin with cardiovascular outcomes in patients with type 2 on the prevalence of major conditions in the United States. Popul diabetes: a retrospective observational study. Cardiovasc Diabetol Health Manag 2010; 13: 139–150. 2015; 14: 10. 21 Paul SK, Shaw J, Montvida O, Klein K. Weight gain in insulin- 5 Wu S, Cipriani A, Yang Z, Yang J, Cai T, Xu Y et al. The treated patients by body mass index category at treatment cardiovascular effect of incretin-based therapies among type 2 initiation: new evidence from real-world data in patients with type diabetes: a systematic review and network meta-analysis. Expert 2 diabetes. Diabetes Obes Metab 2016; 18: 1244–1252. Opin Drug Saf 2018; 17: 243–249. 22 ElHafeez SA, Torino C, D’Arrigo G, Bolignano D, Provenzano F, 6 Azoulay L, Filion KB, Platt RW, Dahl M, Dormuth CR, Clemens Mattace-Raso F et al. An overview on standard statistical methods KK et al. Incretin based drugs and the risk of pancreatic cancer: for assessing exposure–outcome link in survival analysis (Part II): international multicentre cohort study. BMJ 2016; 352: i581. the Kaplan-Meier analysis and the Cox regression method. Aging 7 Azoulay L. Incretin-based drugs and adverse pancreatic events: Clin Exp Res 2012; 24: 203–206. almost a decade later and uncertainty remains. Diabetes Care 2015; 23 Rotnitzky A, Robins JM. Inverse probability weighting in survival 38: 951–953. analysis. Encyclopedia of Biostatistics. Chichester: Wiley, 2005. 8Tkac I, Raz I. Combined analysis of three large interventional trials 24 Austin PC, Stuart EA. The performance of inverse probability of with gliptins indicates increased incidence of acute pancreatitis in treatment weighting and full matching on the propensity score in patients with type 2 diabetes. Diabetes Care 2017; 40: 284–286. 
the presence of model misspecification when estimating the effect of 9 Thomsen RW, Pedersen L, Møller N, Kahlert J, Beck-Nielsen H, treatment on survival outcomes. Stat Methods Med Res 2017; 26: Sørensen HT. Incretin-based therapy and risk of acute pancreatitis: 1654–1670. a nationwide population-based case-control study. Diabetes Care 25 Austin PC. Variance estimation when using inverse probability of 2015; 38: 1089–1098. treatment weighting (IPTW) with survival analysis. Stat Med 2016; 10 Faillie J-L, Azoulay L, Patenaude V, Hillaire-Buys D, Suissa S. 35: 5642–5655. Incretin based drugs and risk of acute pancreatitis in patients with 26 Pini M, Rhodes DH, Castellanos KJ, Cabay RJ, Grady EF, type 2 diabetes: cohort study. BMJ 2014; 348: g2780. Fantuzzi G. Rosiglitazone improves survival and hastens recovery 11 Roshanov PS, Dennis BB. Incretin-based therapies are associated from pancreatic inflammation in obese mice. PLoS One 2012; 7: with acute pancreatitis: meta-analysis of large randomized con- e40944. trolled trials. Diabetes Res Clin Pract 2015; 110: e13–e17. 27 Wan H, Yuan Y, Liu J, Chen G. Pioglitazone, a PPAR-c activator, 12 Wang T, Wang F, Gou Z, Tang H, Li C, Shi L et al. Using real- attenuates the severity of cerulein-induced acute pancreatitis by world data to evaluate the association of incretin-based therapies modulating early growth response-1 transcription factor. Transl with risk of acute pancreatitis: a meta-analysis of 1 324 515 Res 2012; 160: 153–161. patients from observational studies. Diabetes Obes Metab 2015; 28 Woodcock J, Sharfstein JM, Hamburg M. Regulatory action on 17:32–41. rosiglitazone by the US Food and Drug Administration. N Engl J 13 Li L, Shen J, Bala MM, Busse JW, Ebrahim S, Vandvik PO et al. Med 2010; 363: 1489–1491. Incretin treatment and risk of pancreatitis in patients with type 2 29 Forsmark CE, Vege SS, Wilcox CM. Acute pancreatitis. 
N Engl J diabetes mellitus: systematic review and meta-analysis of ran- Med 2016; 375: 1972–1981. domised and non-randomised studies. BMJ 2014; 348: g2366. 14 Chou H-C, Chen W-W, Hsiao F-Y. Acute pancreatitis in patients Supporting Information with type 2 diabetes mellitus treated with dipeptidyl peptidase-4 inhibitors: a population-based nested case-control study. Drug Saf Additional supporting information may be found online in 37 – 2014; : 521 528. the Supporting Information section at the end of the article. 15 Chang C-H, Lin J-W, Chen S-T, Lai M-S, Chuang L-M, Chang Y-C. Dipeptidyl peptidase-4 inhibitor use is not associated with Table S1. Weighted standardized differences in balanced acute pancreatitis in high-risk type 2 diabetic patients: a nationwide groups. cohort study. Medicine 2016; 95: e2603.

