
Opioids in chronic osteoarthritis pain - A systematic review and meta-analysis of efficacy and harms of randomized placebo-controlled studies of at least four weeks duration

R. Schaefert 1, P. Welsch 2, P. Klose 3, C. Sommer 4, F. Petzke 5, W. Häuser 6,7

1 Klinik für Allgemeine Innere Medizin und Psychosomatik, Universitätsklinikum Heidelberg, Heidelberg, Germany 2 Stichting Rugzorg Nederland, Ede, The Netherlands 3 Abteilung für Naturheilkunde und Integrative Medizin, Kliniken Essen-Mitte, Essen, Germany 4 Neurologische Klinik, Universitätsklinikum Würzburg, Germany 5 Schmerz-Tagesklinik und Ambulanz, Universitätsmedizin Göttingen, Göttingen, Germany 6 Klinik und Poliklinik für Psychosomatische Medizin und Psychotherapie, Technische Universität München, Germany 7 Innere Medizin I, Klinikum Saarbrücken gGmbH, Germany

Correspondence address: PD Dr. med. Winfried Häuser, Innere Medizin 1, Klinikum Saarbrücken gGmbH, Winterberg 1, D-66119 Saarbrücken, Germany. Tel: +49 681 9632020, Fax: +49 681 9632022, Email: [email protected]


Background: The efficacy, tolerability and safety of opioid therapy in chronic osteoarthritis (OA) pain are under debate. We updated Cochrane systematic reviews on the efficacy and safety of opioids in chronic OA pain published in 2008.

Methods: We screened Medline, Scopus, and the Cochrane Library (through October 2013), as well as reference sections of original studies and systematic reviews of randomized controlled trials (RCTs) of opioids in chronic non-cancer pain. We included double-blind randomized placebo-controlled studies of ≥ 4 weeks' duration. Absolute risk differences (RD) of categorical data and standardized mean differences (SMD) of continuous variables were calculated using a random effects model.

Results: We included 20 RCTs with 33 treatment arms, with 8545 participants and a median study duration of 12 (4 - 24) weeks. Six studies each tested oxycodone and tramadol, two studies each tested buprenorphine, hydromorphone, morphine and tapentadol, and one study each tested codeine, fentanyl and oxymorphone. Results are reported with [95% confidence intervals]. Opioids were superior to placebo in reducing pain intensity (SMD -0.22 [-0.28, -0.17]; p < 0.00001; 16 studies with 6743 participants). Opioids were not superior to placebo in 50% pain reduction (RD -0.01 [-0.07, 0.06]; p = 0.82; two studies with 2709 participants). Opioids were superior to placebo in reports of much or very much global improvement (RD 0.13 [0.05, 0.21]; p = 0.002; three studies with 2251 participants). Opioids were superior to placebo in improving physical functioning (SMD -0.22 [-0.28, -0.17]; p < 0.00001; 14 studies with 5887 participants). Patients dropped out more frequently with opioids than with placebo (RD 0.17 [0.14, 0.21]; p < 0.00001; 15 studies with 6834 participants; number needed to harm 5 [4 - 6]). There was no significant difference between opioids and placebo in the frequency of serious adverse events (SAE) and of deaths over the respective observation periods.

Conclusions: In short-term studies (4 - 12 weeks), opioids were superior in terms of efficacy and inferior in terms of tolerability to placebo. The effect sizes of average reduction of pain intensity and physical disability were small. Opioids and placebo did not differ in terms of safety. The conclusion on the safety of opioids compared to placebo is limited by the low number of serious adverse events and deaths. Short-term opioid therapy may be considered in selected patients with chronic OA pain. No current evidence-based guideline recommends opioids as a first-line treatment option for chronic OA pain. To provide superior evidence for future treatment guidelines, RCTs must directly compare existing pharmacological and non-pharmacological therapies with each other, as well as administer them in various combinations and sequences.

The English full-text version of this article is available at SpringerLink (under “Supplemental”). This article is published with free access at Springerlink.com

Key words: Osteoarthritis; chronic pain; systematic review; meta-analysis; efficacy; tolerability; safety


Introduction

Osteoarthritis (OA) is the most common joint disease in adults around the world (10,24). In epidemiologic studies, OA is typically defined by radiographic findings and consideration of symptoms (31). About one-third of all adults have radiological signs of osteoarthritis (10). However, clinically significant osteoarthritis of the knee, hand, or hip in terms of chronic pain and/or disability was found in only 8.9% of the adult population (2). The incidence and prevalence of OA are rising, likely related to the aging of the population and increasing obesity (24). Non-pharmacological as well as pharmacological modalities were recommended for OA by a recent guideline of the American College of Rheumatology (ACR) (16). Opioids were strongly recommended by the ACR in patients who were either not willing to undergo or had contraindications for total joint arthroplasty after having failed medical therapy. However, a Cochrane review on opioids in hip or knee OA pain published in 2009 concluded that the small to moderate beneficial effects of non-tramadol opioids were outweighed by large increases in the risk of adverse events. Non-tramadol opioids should therefore not be routinely used, even if osteoarthritic pain is severe (26).

The debate on the use of opioids in OA also depends on the duration of opioid use. From a clinical point of view, short-term (< 4 weeks) opioid therapy might be appropriate in cases of acute (on) chronic OA pain. However, whether long-term opioid therapy in chronic OA pain is clinically useful is under debate. Chronic opioid therapy has been defined as daily or near-daily use of opioids for at least 90 days, but in practice opioids are often used indefinitely (27). A systematic review on opioid therapy in chronic low back pain distinguished between short-term (4 - 12 weeks), intermediate-term (13 - 26 weeks) and long-term (> 26 weeks) trials (4).
To our knowledge, the last systematic review of opioids in chronic OA pain searched the literature until July 2008 and included short-term studies (< 4 weeks) in the analyses of efficacy and harms. Studies of opioids with an additional mode of action (e.g. tramadol, tapentadol) were excluded (24). In the meantime, new randomized controlled trials (RCTs) of opioids in chronic OA pain have been published. Therefore, we updated the literature search of systematic reviews on opioids in OA pain as part of the update of the German 2008 guideline on the long-term administration of opioids in chronic non-cancer pain (CNCP) (LONTS) (7). The objectives of this review were to determine the efficacy, tolerability and safety of opioids compared to placebo in adult patients with chronic OA pain in placebo-controlled RCTs of ≥ 4 weeks' duration.

Methods

The review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement (21) and the recommendations of the Cochrane Collaboration (15).

Criteria for considering studies for this review

Types of studies

We included fully published double-blind RCTs that compared any opioid to placebo (pure or pseudo) for therapeutic purposes in OA pain. We included studies with a parallel design and studies with an enriched enrolment randomized withdrawal (EERW) design. Studies with a cross-over design were included if (a) separate data for the two periods were reported, or (b) data were presented which excluded a statistically significant carry-over effect, or (c) statistical adjustments were carried out in case of a significant carry-over effect. Study duration should be at least 4 weeks (tapering and maintenance phase for parallel and cross-over design; double-blind withdrawal phase for EERW design). Studies should include at least 10 patients per treatment arm. We grouped outcome measures according to the duration of post-randomization follow-up, as proposed by Chaparro et al. (4): short-term (4 - 12 weeks), intermediate-term (13 - 26 weeks) and long-term (> 26 weeks). We had no restriction on the language of the publication.

We excluded studies which conducted a tapering phase after an open-label run-in followed by a double-blind parallel design with responders of the open-label run-in period. We excluded studies with a tapering/maintenance or withdrawal period of less than 4 weeks, studies with an experimental design (i.e. if the primary purpose was to study pain mechanisms and not pain relief) and studies which were only published as abstracts. Furthermore, we excluded studies in which different dosages of one opioid were compared without a placebo control group.
Types of participants

We included men and women of all ages and races or ethnicities diagnosed with clinically or radiologically confirmed peripheral joint OA and associated pain of at least three months' duration. Trials exclusively including patients with inflammatory arthritis, such as rheumatoid arthritis, were not included. We excluded studies with mixed study samples (participants with OA and low back pain) if data of the two groups were not presented separately.

Types of interventions

We included trials that examined the use of any opioid prescribed in an outpatient setting for a period of at least 4 weeks (titration and maintenance). We considered trials with opioids given by oral or transdermal routes. We included studies in which opioids were combined with abuse deterrent formulations (ADFs) (e.g. ). We included studies with tramadol, a centrally acting, synthetic opioid with two complementary mechanisms of action: binding of the parent compound and its M1 metabolite to μ-opioid receptors and weak inhibition of the reuptake of norepinephrine and serotonin. We included studies with tapentadol, which has two mechanisms of action: μ-opioid receptor agonism and norepinephrine reuptake inhibition. We included both drugs in this review because they are classified as opioids by German medicine agencies. We considered trials which compared an opioid with placebo.

We excluded trials that examined opioids given by the intravenous route, including implantable pumps, due to the invasive nature of the therapy and its limited clinical relevance in the outpatient setting. In addition, the effectiveness of opioids used in neuraxial implantable pumps has been discussed elsewhere (25). We excluded studies in which the primary aim of the study was to test the efficacy of opioids as rescue medication. We excluded studies with complete tapering off of opioids after an open-label run-in followed by a randomized placebo-controlled parallel design.
We excluded studies in which drugs other than opioid agonists were used as a fixed combination with opioids (e.g. tramadol with acetaminophen) because it is not possible to disentangle the effects of the opioid from those of the other analgesic. Limited rescue medication with non-opioids was accepted. We excluded studies in which a defined opioid was compared to the same opioid with abuse deterrent formulations (ADFs) (e.g. oxycodone with and without naloxone) or in which two opioids combined were compared with a single opioid. We excluded studies in which opioids were compared to non-pharmacological treatments. We excluded studies with propoxyphene because the drug has been withdrawn from the market (United States Food and Drug Administration news release of 19 November 2010).


Types of outcome measures

The selection of outcomes was based on the recommendations of the ACTINPAIN writing group of the International Association for the Study of Pain (IASP) Special Interest Group (SIG) on Systematic Reviews in Pain Relief and the Cochrane Pain, Palliative and Supportive Care Systematic Review Group editors for reporting meta-analyses of RCTs in chronic pain (22). We included pain intensity as an additional outcome because most studies conducted before 2005 did not report responder analyses (5,26).

Efficacy
1. Pain intensity ratings
2. Proportion of patients reporting 50% pain relief (responders)
3. Global improvement (Patient Global Impression of Change, PGIC): number of patients reporting to be much or very much improved
4. Function: Examples of functional impairment outcomes that could be extracted were as follows: Brief Pain Inventory (BPI); Fibromyalgia Impact Questionnaire Subscale Physical Function (FIQ); Multidimensional Pain Inventory (MPI, physical function); Western Ontario and McMaster Universities Arthritis Index (WOMAC); Neck Disability Index (NDI); Oswestry Disability Index (ODI); Pain Disability Index (PDI), physical disability; Roland Disability Questionnaire (RDQ); Short Form (SF)-36 or SF-12 (physical functioning scale). If both generic and disease-specific instruments were used, disease-specific instruments were preferred (e.g. FIQ over PDI, WOMAC over SF-36 physical functioning scale).
5. Proportion of patients who withdrew due to lack of efficacy

We excluded studies in which the primary outcome measure was not one of the five outcomes of efficacy as defined above.

Tolerability
1. Proportion of patients who withdrew because of adverse events

Safety
1. Proportion of patients who experienced any SAE
2. Proportion of patients who died during the study

Search methods for identification of studies

Electronic searches

The review updated and expanded the literature search for the first version of the German guideline on long-term administration of opioids (LONTS), which searched the literature until October 2008 (27). The updated and expanded search included CENTRAL, Medline and Scopus from October 2008 to October 2013 and all types of chronic non-cancer pain (CNCP). The search was conducted by PK. Our search included all languages. The search strategy for PubMed is detailed in the electronic supplementary table 1.

Searching other resources

We searched the bibliographies of reviewed articles and retrieved relevant articles. We screened the references of recent systematic reviews on long-term treatment with opioids in CNCP (12,17,19,20) and in OA pain (5,26). We contacted the steering committee of the update of the German guideline on long-term administration of opioids (LONTS) to assist in locating fully published studies which we might have missed in our search.

Data collection and analysis

Selection of studies

Two authors (PW, WH) independently screened titles, abstracts, and keywords of trials that we identified by the search strategies to determine if our inclusion criteria were met. We obtained the full text of trials that either appeared to meet our inclusion criteria or for which inclusion was uncertain. We screened these articles for inclusion and resolved any disagreements through discussion.

Data extraction

Three pairs of authors (CS, WH; FP, WH; RS, WH) independently extracted data, using standardized forms, on inclusion and exclusion criteria of the studies, characteristics of participants, intervention group, clinical setting, interventions, country of study, and study funding. If data were not available in a format that was appropriate for data extraction, we did not contact the authors of the trial for further clarification. We resolved any disagreements through discussion.

Dealing with missing data

If both baseline observation carried forward (BOCF) data and last observation carried forward (LOCF) data were reported for the intention-to-treat (ITT) analysis, we preferred BOCF data (23). Where means or standard deviations (SD) were missing, we calculated them from t-values, CIs or standard errors, as far as these were reported in the articles (15). If missing SDs could not be calculated from these values, the study was excluded from analysis.

Measures of treatment effect


The effect measures of choice were absolute risk differences (RD) for dichotomous data and standardized mean differences (SMD) for continuous data (pain intensity, physical functioning), calculated using a random effects model (inverse variance method). For subgroup analyses of dichotomous outcomes we calculated risk ratios (RR). We expressed uncertainty using 95% CIs. The threshold for “appreciable benefit” or “appreciable harm” was set for categorical variables at a relative risk reduction (RRR) or relative risk increase (RRI) ≥ 25% (4). We used Cohen’s categories to evaluate the magnitude of the effect size, calculated by SMD, with Hedges’ g of 0.2 = small, 0.5 = medium and 0.8 = large (6). We labelled g < 0.2 as a “not substantial” effect size. We assumed a minimally important difference if there was a small effect size (9). The numbers needed to treat for an additional beneficial outcome (NNTB) and the numbers needed to treat for an additional harm (NNTH) for dichotomous variables (50% pain reduction [responder], Patient Global Impression of Change [PGIC], drop out due to adverse events, SAEs, death) were calculated with a calculator provided by the Cochrane Musculoskeletal Group (personal communication). Subgroup comparisons were performed by the test of interaction (1).

Unit-of-analysis issues

In the case of multiple opioid arms compared with one placebo group, we adjusted the number of participants in the placebo group according to the number of opioid arms for continuous outcomes.

Data synthesis

We pooled data from RCTs comparing opioids to placebo controls using a random-effects model (inverse variance method). We used the I² statistic to describe the percentage of variability in effect estimates that is due to heterogeneity. I² values above 50% indicate high heterogeneity, values between 25% and 50% moderate heterogeneity, and values below 25% low heterogeneity (15).
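The pooling step described above can be illustrated with a minimal Python sketch. It assumes the DerSimonian-Laird moment estimator for the between-study variance, which is a common choice for inverse-variance random-effects models but is not named in the text. The function names and the input values are hypothetical and are not data from this review.

```python
import math

def pool_random_effects(effects, variances):
    """Pool study effects (e.g. SMDs or RDs) with inverse-variance weights.

    Assumption: tau^2 is estimated with the DerSimonian-Laird moment
    estimator; the review only states "random effects, inverse variance".
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # I^2: share of total variability attributed to heterogeneity, in percent
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }

def heterogeneity_label(i2):
    """Apply the thresholds stated in the text (>50 high, 25-50 moderate, <25 low)."""
    if i2 > 50:
        return "high"
    if i2 >= 25:
        return "moderate"
    return "low"

# Hypothetical study-level SMDs and their variances (illustration only).
result = pool_random_effects([-0.30, -0.15, -0.25], [0.010, 0.020, 0.015])
print(result["pooled"], result["ci95"], heterogeneity_label(result["i2"]))
```

When the studies agree closely, Q falls below its degrees of freedom, the estimated between-study variance is truncated at zero, and the result coincides with the fixed-effect estimate.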
The risk of bias in each trial was assessed independently by three pairs of authors (CS, WH; FP, WH; RS, WH) using eight aspects of bias recommended by the Cochrane Collaboration (see figure 2 and electronic supplementary table 2) (4). We defined a study to have high quality (low risk of bias) if it fulfilled six to eight aspects, moderate quality (moderate risk of bias) if it fulfilled three to five aspects, and low quality (high risk of bias) if it fulfilled zero to two aspects.


We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to assess the overall quality of evidence (13,15), defined as the extent of confidence in the estimates of treatment benefits and harms. Quality ratings were made separately for each of the eight quality indicators. The quality of evidence was downgraded by one level for each of the following factors that were encountered:
· Limitations of study design: > 50% of the participants in low quality studies.
· Inconsistency of results: I² > 50%.
· Indirectness: We assessed whether the question being addressed in this systematic review was different from the available evidence regarding the population in routine clinical care, i.e. whether patients with clinically relevant somatic disease and/or major mental disorders (history of substance abuse or major depression) were excluded in ≥ 50% of participants.
· Imprecision: There was only one trial, or, when there was more than one trial, the total number of patients was < 400 or the 95% CI of the effect size included zero.

We categorized the quality of evidence as follows:
· High: further research is very unlikely to change the confidence in the estimate of effect.
· Moderate: further research is likely to have an important impact on the confidence in the estimate of effect.
· Low: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
· Very low: any estimate of effect is very uncertain.

Assessment of reporting biases

We used the Egger intercept test and the Begg rank correlation test at the significance level p < 0.05 if at least 10 studies were available. The Begg test examines the rank correlation between the standardized intervention effect and its standard error. An asymmetric funnel plot would give rise to such a correlation and may be indicative of publication bias (3).
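As an illustration of the Begg rank correlation test, the following Python sketch standardizes each effect as its deviation from the fixed-effect pooled estimate and correlates it with the study variance using Kendall's tau. This follows the usual Begg and Mazumdar formulation as we understand it. The sketch assumes no tied values and uses a normal approximation for the p-value. The function name and the input data are hypothetical.

```python
import math

def begg_rank_test(effects, variances):
    """Kendall rank correlation between standardized effects and variances.

    Assumptions: no ties among the inputs; two-tailed p-value from the
    normal approximation of Kendall's S statistic.
    """
    n = len(effects)
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    v_pooled = 1.0 / sum(w)                 # variance of the pooled estimate
    # Standardized deviation of each study from the pooled estimate
    std = [(y - pooled) / math.sqrt(v - v_pooled)
           for y, v in zip(effects, variances)]
    # Kendall's S: concordant minus discordant pairs
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (std[i] - std[j]) * (variances[i] - variances[j])
            s += (prod > 0) - (prod < 0)
    tau = s / (n * (n - 1) / 2)
    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return tau, z, p_two_tailed

# Hypothetical data: less precise studies show larger (more negative) effects.
# The review applied the test only when at least 10 studies were available;
# five studies are used here purely to keep the example short.
tau, z, p = begg_rank_test(
    [-0.1, -0.2, -0.3, -0.4, -0.5],
    [0.01, 0.02, 0.03, 0.04, 0.05],
)
print(tau, z, p)
```

A negative tau in this setup means that smaller studies report more favourable (more negative) effects, which is the pattern an asymmetric funnel plot would show.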
In the Egger test, the standard normal deviate is regressed on precision, defined as the inverse of the standard error. The intercept in this regression corresponds to the slope in a weighted regression of the effect size on the standard error (8).

Subgroup analysis


Subgroup analyses were planned a priori to assess variations in effect sizes (heterogeneity) by pooling study results of the relative effects of opioids compared to placebo for the outcomes pain intensity and drop out due to adverse events, for different types of opioids (pure opioids versus opioids with additional modes of action [tramadol, tapentadol]) and for treatment duration (short-term, intermediate-term and long-term studies). At least two studies had to be available for a subgroup analysis.

Sensitivity analyses

We performed sensitivity analyses of all types of opioids pooled together compared to placebo groups pooled together for the outcomes in studies in which we extracted means and/or SDs from figures or calculated SDs from p-values.

Software

Comprehensive Meta-Analysis (Biostat, Englewood, NJ, USA) and RevMan Analysis (RevMan 5.2) software of the Cochrane Collaboration were used for statistical analyses.

Results

Literature search

After removing duplicates, the literature search produced 12601 unique citations. Through screening, 12580 records were excluded. Twenty-one full-text articles were assessed for eligibility. One study was excluded after full-text review. Twenty studies with 33 treatment arms were included in the meta-analysis (see Figure 1).

Study characteristics (see table 1 and electronic supplementary table 3)

Study design: We included 20 RCTs with 33 treatment arms, with 8545 participants and a median study duration of 12 (4 - 24) weeks. One study (5%) had a duration of more than 12 weeks, namely 24 weeks (appendix reference 4). Fifteen (75%) studies had a parallel design, one (5%) study had a cross-over design and four (20%) studies had an EERW design. Fifteen (75%) studies were conducted in North America, four (20%) studies in Europe and one (5%) study in different continents. Nineteen (95%) studies were funded by the manufacturer of one of the tested drugs. One study was supported by public funding.

Participants: Participants were diagnosed with OA of the hip and/or knee. Seventeen (85%) studies excluded patients with current and/or a history of substance abuse and/or current major mental disorders, and 18 (90%) studies excluded patients with clinically relevant medical diseases. The range of the mean ages of participants in the studies was 58 - 64 years. The participants were predominantly Caucasian, and the gender ratio was nearly balanced.

Interventions: Six studies each tested oxycodone and tramadol, two studies each tested buprenorphine, hydromorphone, morphine and tapentadol, and one study each tested codeine, fentanyl and oxymorphone. All oral opioids were administered as extended release (ER) formulations except in one four-arm study, which used immediate release morphine in one treatment arm (appendix reference 6). Sixteen (80%) studies used a flexible dosage of the opioid; the remaining ones used a fixed dosage. Five (25%) studies did not report on rescue medication, three (15%) studies prohibited any analgesic rescue medication and 12 (60%) studies allowed rescue medication (acetaminophen, NSAIDs, short-acting opioids).

Study quality

Risk of bias could not be assessed exactly in all studies due to poor reporting of methods. No study had a high study quality. Fourteen studies (70%) had a moderate study quality, four studies (20%) had a low study quality, and two studies (10%) had a very low study quality (see Figure 2 for the risk of bias graph and Figure 3 for the risk of bias summary). Detailed information regarding the risk of bias assessment of every study is given in the Electronic Supplementary Material table 4.

Synthesis of results

Parallel and cross-over design (Results are reported with 95% CI)

Sixteen studies with 6743 participants were entered into an analysis of mean pain reduction at the end of the study. Opioids were superior to placebo (SMD -0.22 [-0.28, -0.17]; p < 0.00001; I² = 21%) (moderate quality evidence). According to Cohen’s categories the effect size was small (see Electronic Supplementary Material figure 1). One study (appendix reference 2) did not report means and SDs, but stated that tapentadol and oxycodone were not significantly superior to placebo.

Two studies with 2709 participants were entered into an analysis of 50% pain reduction at the end of the study. Opioids were not superior to placebo (RD -0.01 [-0.07, 0.06]; p = 0.82; I² = 75%) (low quality evidence) (see Electronic Supplementary Material figure 2).
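To illustrate how an SMD of this kind is obtained from arm-level summaries and classified with the thresholds given in the Methods, a minimal Python sketch follows. It includes the two standard conversions for recovering a missing SD from a standard error or from a 95% CI of a mean. The function names and example numbers are hypothetical, the variance formula is a common large-sample approximation, and none of this is claimed to reproduce the exact RevMan computation.

```python
import math

def sd_from_se(se, n):
    """Recover the SD of a group from the standard error of its mean."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z=1.96):
    """Recover the SD from a 95% CI of a mean (large-sample assumption)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)

def hedges_g(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_trt + n_ctl - 2
    pooled_sd = math.sqrt(((n_trt - 1) * sd_trt ** 2 +
                           (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (mean_trt - mean_ctl) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    g = correction * d
    # Common large-sample approximation of the variance of g (an assumption)
    var_g = (n_trt + n_ctl) / (n_trt * n_ctl) + g ** 2 / (2 * (n_trt + n_ctl))
    return g, var_g

def effect_size_category(g):
    """Thresholds as stated in the Methods: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(g)
    if size < 0.2:
        return "not substantial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical pain scores (0-10 scale) at end of treatment, illustration only.
g, var_g = hedges_g(5.0, 2.0, 50, 6.0, 2.0, 50)
print(round(g, 3), effect_size_category(g))
print(effect_size_category(-0.22))
```

Negative values favour the opioid arm when lower scores mean less pain, so the category is assigned from the absolute value of g.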


Three studies with 2251 participants were entered into an analysis of Patients’ Global Impression of Change (PGIC) reports to be much or very much improved at the end of the study. Opioids were superior to placebo (RD 0.13 [0.05, 0.21]; p = 0.002; I² = 74%) (moderate quality evidence) (see Electronic Supplementary Material figure 3). 510/1018 (50.0%) of patients in opioid groups and 467/1233 (37.8%) of patients in placebo groups reported to be much or very much improved (NNTB 8 [6 - 12]). According to the predefined criteria there was an appreciable benefit by opioids (RRR 32% [20% - 45%]).

Fourteen studies with 5887 participants were entered into an analysis of improved physical functioning at the end of the study. Opioids were superior to placebo (SMD -0.22 [-0.28, -0.17]; p < 0.00001; I² = 0%) (moderate quality evidence) (see Electronic Supplementary Material figure 4). According to Cohen’s categories, the effect size was small. Two studies did not report means and SDs. One study reported that tramadol was superior to placebo (appendix reference 8). One study reported that tapentadol and oxycodone were not superior to placebo (see appendix reference 2).

Fourteen studies with 6457 participants were entered into an analysis of drop outs due to lack of efficacy. Patients dropped out less frequently in opioid than in placebo groups (RD -0.13 [-0.16, -0.10]; p < 0.0001; I² = 72%) (moderate quality evidence) (see Electronic Supplementary Material figure 5). 386/3873 (10.0%) dropped out in opioid groups and 596/2584 (23.1%) dropped out in placebo groups (NNTB 8 [7 - 9]). According to the predefined criteria there was an appreciable benefit by opioids (RRR 57% [51% - 62%]).

Fifteen studies with 6834 participants were entered into an analysis of drop outs due to adverse events. Patients dropped out more frequently with opioids than with placebo (RD 0.17 [0.14, 0.21]; p < 0.00001; I² = 77%) (moderate quality evidence) (see Electronic Supplementary Material figure 6). 25.6% (1075/4207) of patients dropped out in the opioid groups and 7.0% (184/2627) in placebo groups due to adverse events (NNTH 5 [95% CI 4 - 6]) (see Electronic Supplementary Material figure 5). According to the predefined criteria there was an appreciable harm by opioids (RRI 237% [192% - 291%]).
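The relationship between event counts, risk difference, relative risk change and number needed to treat or harm can be sketched in Python as follows. The sketch uses the crude (unpooled) drop-out counts quoted above. The values reported in the text come from a random-effects model that weights each study, so the crude figures computed here are expected to differ somewhat from the pooled RD and RRI. The function name is hypothetical.

```python
import math

def risk_summary(events_trt, n_trt, events_ctl, n_ctl):
    """Crude risk difference, risk ratio, relative change and NNT.

    This ignores study-level weighting, so it only approximates the
    pooled estimates of a meta-analysis.
    """
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    rd = risk_trt - risk_ctl                      # absolute risk difference
    rr = risk_trt / risk_ctl                      # risk ratio
    return {
        "risk_treatment": risk_trt,
        "risk_control": risk_ctl,
        "rd": rd,
        "rr": rr,
        # Positive values are a relative risk increase, negative a reduction
        "relative_change_pct": (rr - 1.0) * 100.0,
        # NNTB or NNTH, depending on the direction of the outcome
        "nnt": 1.0 / abs(rd) if rd != 0 else math.inf,
    }

# Drop outs due to adverse events, counts as quoted in the text.
summary = risk_summary(1075, 4207, 184, 2627)
print(summary)

# Predefined threshold from the Methods: relative change of at least 25%.
print("appreciable harm:", summary["relative_change_pct"] >= 25)
```

Rounding conventions for NNT values vary between calculators, which is one reason a crude value may not match a published one exactly.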

Eleven studies with 5520 participants were entered into an analysis of SAEs. There was no significant difference between opioids and placebo (RD 0.00 [-0.00, 0.01]; p = 0.37; I² = 2%) (moderate quality evidence) (see Electronic Supplementary Material figure 7).

Seven studies with 4694 participants were entered into an analysis of deaths. 1/2752 patients in opioid groups and 4/1942 in placebo groups died during the study (RD -0.00 [-0.00, 0.00]; p = 0.88; I² = 0%) (moderate quality evidence) (see Electronic Supplementary Material figure 8).

EERW design

Three studies with 823 participants were entered into an analysis of mean pain reduction from baseline to end of treatment. Opioids were superior to placebo (SMD -0.26 [-0.49, -0.03]; p = 0.03; I² = 57%) (low quality evidence). According to Cohen’s categories the effect size was small (see Electronic Supplementary Material figure 9).

One study with 344 participants was entered into a responder analysis of 50% pain reduction at the end of the study. The opioid was not superior to placebo (RD 0.09 [-0.01, 0.20]; p = 0.08) (moderate quality evidence) (see Electronic Supplementary Material figure 10).

One study with 344 participants was entered into an analysis of physical functioning at the end of the study. The opioid was not superior to placebo (SMD -0.13 [-0.34, 0.08]; p = 0.24) (moderate quality evidence) (see Electronic Supplementary Material figure 11). One study reported that tramadol was superior to placebo (appendix reference 8) but did not provide means and SDs.

Four studies with 1178 participants were entered into an analysis of drop outs due to lack of efficacy. Patients dropped out less frequently in opioid than in placebo groups (RD -0.13 [-0.18, -0.09]; p < 0.0001; I² = 7%) (moderate quality evidence) (see Electronic Supplementary Material figure 12). 68/599 (11.4%) dropped out in opioid groups and 140/579 (24.2%) dropped out in placebo groups (NNTB 8 [6 - 12]). According to the predefined criteria there was an appreciable benefit by opioids (RRR 53% [39% - 64%]).

Three studies with 826 participants were entered into an analysis of drop outs due to adverse events. There was no significant difference between opioids and placebo (RD 0.05 [-0.00, 0.11]; p = 0.06; I² = 35%) (moderate quality evidence) (see Electronic Supplementary Material figure 13).

Two studies with 756 participants were entered into an analysis of SAEs. There was no significant difference between opioids and placebo (RD 0.01 [-0.01, 0.03]; p = 0.40; I² = 0%) (moderate quality evidence) (see Electronic Supplementary Material figure 14). One study (appendix reference 10) with 412 participants explicitly stated that there were no deaths in either group.

Subgroup and sensitivity analyses

In parallel and cross-over studies, pure opioids and opioids with an additional mode of action (tapentadol, tramadol) did not significantly differ in mean pain reduction (z = 0.01, p = 0.87). The drop out rate due to adverse events was higher with pure opioids than with opioids with an additional mode of action (z = 3, p = 0.003) (see table 2).
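The test of interaction used for these subgroup comparisons can be sketched in Python as a z-test on the difference between two independent subgroup estimates. The function names and inputs are hypothetical, and the sketch assumes the estimates are on an additive scale (SMD or RD). Ratio measures such as RR would be compared on the log scale.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error recovered from a 95% confidence interval."""
    return (upper - lower) / (2 * z)

def interaction_test(est_a, se_a, est_b, se_b):
    """z-test for the difference between two independent subgroup estimates."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Hypothetical subgroup SMDs with 95% CIs (illustration only).
se_pure = se_from_ci(-0.35, -0.15)
se_dual = se_from_ci(-0.30, -0.10)
z, p = interaction_test(-0.25, se_pure, -0.20, se_dual)
print(z, p)
```

For reference, a z value of 3 corresponds to a two-tailed p of about 0.0027 under this approximation, which agrees with the rounded p = 0.003 reported for the drop out comparison.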

Removing from the analysis the two studies whose means and SDs were extracted from figures did not change the significance or the magnitude of the effects on pain reduction and on dropping out due to adverse events (details available on request).

Publication bias

The Kendall tau of the Begg rank correlation test for the outcome pain intensity reduction in studies with a parallel or cross-over design was significant (tau = -0.47, p two-tailed = 0.0005). The Egger intercept for the outcome pain intensity in studies with a parallel or cross-over design was also significant (intercept = -3.79, p two-tailed = 0.01). Both tests were indicative of publication bias.
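The Egger intercept reported here comes from the regression described in the Methods, in which the standard normal deviate (effect divided by its standard error) is regressed on precision (the inverse of the standard error). A minimal ordinary least squares sketch in Python is given below. The function name and data are hypothetical, and the p-value is left out because it requires a t distribution, which the Python standard library does not provide.

```python
import math

def egger_intercept(effects, standard_errors):
    """Egger regression: standard normal deviate regressed on precision.

    Returns intercept, slope, standard error of the intercept, the t
    statistic and its degrees of freedom (n - 2). Needs at least 3 studies.
    """
    n = len(effects)
    x = [1.0 / se for se in standard_errors]                   # precision
    y = [e / se for e, se in zip(effects, standard_errors)]    # normal deviate
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_var = sum(r ** 2 for r in residuals) / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, slope, se_intercept, t, n - 2

# Hypothetical data built so that the effect grows with the standard error:
# effect = -0.2 - 1.5 * SE, i.e. a "true" effect of -0.2 plus small-study bias.
ses = [0.05, 0.08, 0.10, 0.15, 0.20, 0.25]
effects = [-0.2 - 1.5 * se for se in ses]
intercept, slope, se_int, t, df = egger_intercept(effects, ses)
print(intercept, slope, df)
```

In this constructed example the intercept recovers the small-study coefficient (-1.5) and the slope recovers the underlying effect (-0.2), which mirrors the statement that the intercept corresponds to the slope of a weighted regression of the effect size on the standard error.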

Discussion

Summary of main results

In short-term studies (4 - 12 weeks), opioids were superior to placebo in terms of efficacy and inferior in terms of tolerability. Opioids and placebo did not differ in terms of safety. The effect sizes of opioids on pain and physical function were small.

Comparison with other systematic reviews

The findings of this systematic review on the efficacy and the limited tolerability of opioids compared to placebo are consistent with those of previous Cochrane systematic reviews of opioids in OA pain. Cepeda and co-workers (5) analysed 11 RCTs with a total of 1019 participants who received tramadol or tramadol/paracetamol and 920 participants who received placebo or active control, irrespective of study duration. Participants who received tramadol had less pain (-8.5 units on a 0 to 100 scale; 95% CI -12.0 to -5.0) than patients who received placebo. Of every 8 people who received tramadol or tramadol/paracetamol, 1 (12.5%) stopped taking the medication because of adverse events; the NNTH was 8 (95% CI 7 to 12) for major adverse events. Nüesch and co-workers (26) included ten trials with 2268 participants, irrespective of study duration. Oral codeine was studied in three trials, transdermal fentanyl and oral morphine in one trial each, oral oxycodone in four, and oral oxymorphone in two trials. Overall, opioids were more effective than control interventions in terms of pain relief (SMD -0.36, 95% CI -0.47 to -0.26) and improvement of function (SMD -0.33, 95% CI -0.45 to -0.21). The authors did not find substantial differences in effects according to type of opioid, analgesic potency (strong or weak), daily dose and duration of treatment. Adverse events were more frequent in patients receiving opioids than in those receiving control. The pooled risk ratio was 1.55 (95% CI 1.41 to 1.70) for any adverse event (4 trials), 4.05 (95% CI 3.06 to 5.38) for dropouts due to adverse events (10 trials), and 3.35 (95% CI 0.83 to 13.56) for SAEs (two trials).
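The link between the 12.5% figure and the NNTH of 8 quoted from Cepeda and co-workers is the reciprocal of the absolute risk difference. The following is a minimal sketch, assuming that 12.5% is read as the excess risk over the comparator; the function name is hypothetical.

```python
def number_needed_to_harm(risk_difference):
    """Return NNH = 1 / RD for a positive absolute risk difference."""
    if risk_difference <= 0:
        raise ValueError("risk difference must be positive for an NNH")
    return 1.0 / risk_difference


# 12.5% excess withdrawals, as in the tramadol figure quoted above.
print(number_needed_to_harm(0.125))  # 8.0
```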

Limitations

Only double-blind randomized placebo-controlled studies were included in this meta-analysis, representing a high level in evidence-based medicine. However, the methodological quality of the included studies was predominantly at least moderate. The blinding of outcome assessment was mostly unclear, implying a high detection bias. Complete data reporting was often doubtful, leading to a high attrition bias. There was a high risk of selective reporting, causing a relevant reporting bias. Almost all studies were funded by the manufacturers of the tested drugs, implying a high funding bias. The external validity of the study results for OA patients in routine clinical care is limited, because no subgroup analyses of very aged patients (e.g. >75 years) were presented by any study. The Kendall tau of the Begg rank correlation test of the outcome pain reduction and the Egger intercept of the outcome pain were both indicative of publication bias. Negative study results may not have been published, which can lead to an overestimation of the true intervention effect. On the other hand, we might have underestimated the quality of studies because we did not ask the authors for missing details. In summary, the methodological quality of the studies and their reporting should be improved in future research. The conclusion on the safety of opioids compared to placebo is limited by the low number of serious adverse events and deaths.
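The quality grading stated in the caption of Figure 3 (high = 6 - 8, moderate = 3 - 5, low = 0 - 2) can be sketched as follows. This assumes, as an interpretation not spelled out in the text, that the score is the number of the eight indicators rated "+" (low risk of bias); the function name is hypothetical and the three example rows are copied from Figure 3.

```python
def quality_grade(ratings):
    """Grade a study from its eight risk-of-bias ratings.

    `ratings` is a string of eight symbols separated by spaces:
    '+' (low risk), '?' (unclear risk), '–' or '-' (high risk).
    Assumption: the score is the number of '+' ratings.
    """
    symbols = ratings.replace("–", "-").split()
    if len(symbols) != 8:
        raise ValueError("expected eight ratings")
    score = symbols.count("+")
    if score >= 6:
        return score, "high"
    if score >= 3:
        return score, "moderate"
    return score, "low"


# Rows copied from Figure 3.
figure3_rows = {
    "Afilalo 2010": "+ + ? ? ? + + –",
    "Matsumoto 2005": "+ ? + + ? – + –",
    "Friedmann 2011": "? ? ? ? ? – ? –",
}
for study, ratings in figure3_rows.items():
    print(study, quality_grade(ratings))
```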


Future research directions

The ability of systematic reviews of placebo-controlled studies to guide physicians and patients in the choice of treatment options in chronic OA pain is very limited. Head-to-head comparisons of opioids with other drugs have rarely been conducted in chronic OA pain (11,16). A comparative effectiveness review compared Cox 1- and Cox 2-inhibitors (6) but did not include opioids. A recent systematic review of head-to-head comparisons of opioids and non-opioid analgesics found low quality evidence (five studies) that nonsteroidal agents were superior to tramadol in pain reduction, improvement of physical function and tolerability (30). To provide a better evidence base for future treatment guidelines, additional RCTs must be conducted in which existing drugs are directly compared with each other and administered in various combinations. Additionally, whether non-pharmacological approaches for the management of patients with chronic OA pain (e.g. physical therapy and life-style interventions) should be used before, in combination with, or after pharmacological treatments must be tested in clinical trials. Traditional RCTs may not be the method of choice to answer all these questions; alternative approaches should be developed and evaluated (e.g. systematic comparative effectiveness studies of health care registry data).

Conclusions for clinical practice

Opioids may be considered for the short-term treatment (4 - 24 weeks) of chronic OA pain. However, clinicians should keep in mind that no current evidence-based guideline recommends opioids as first-line treatment options for chronic OA pain (11,16). In addition, recent data from the UK General Practice Research Database indicated that the risk of fracture was increased during initiation of opioid therapy (18).

The EULAR (European League Against Rheumatism) guideline recommended patient information and education, lifestyle changes, exercise, weight loss, assistive technology and adaptations, footwear and work as non-pharmacological treatments (11). The ACR (American College of Rheumatology) strongly recommended non-pharmacological therapies for the management of knee OA such as aerobic, aquatic, and/or resistance exercises as well as weight loss for overweight patients. Non-pharmacological modalities conditionally recommended for knee OA included medial wedge insoles for valgus knee OA, subtalar strapped lateral insoles for varus knee OA, medially directed patellar taping, manual therapy, walking aids, thermal agents, Tai Chi, self-management programs, and psychosocial interventions. Pharmacological modalities conditionally recommended for the initial management of patients with knee OA included acetaminophen, oral and topical NSAIDs (in combination with a proton-pump inhibitor) and intraarticular corticosteroid injection. Intraarticular hyaluronate injections, duloxetine, and opioids were conditionally recommended in patients who had an inadequate response to initial therapy. Opioid analgesics were strongly recommended in patients who were either not willing to undergo or had contraindications for total joint arthroplasty after having failed medical therapy. Recommendations for hip OA were similar to those for the management of knee OA (16).

Long-term open-label studies demonstrated that a minority of patients with chronic OA pain initially treated with opioids will experience a sustained (> 1 year) response with no or tolerable side effects (14,29). Long-term (≥ 26 months) opioid therapy may be offered to sustained responders to short-term opioid therapy and/or to non-responders to physical therapy and/or life-style interventions or to patients who are not suited for total joint arthroplasty due to major medical diseases. However, the potential benefits of long-term opioid therapy must be carefully weighed against its potential risks, such as aberrant drug use, increased mortality, fractures and hypogonadism (15,18).

Acknowledgements

We thank Professor Sorgatz (Essen) for reviewing our extractions of the drop out rates due to lack of efficacy.


Table 1: Overview of the randomized controlled trials in chronic osteoarthritis pain included into the systematic review (grouped by type of opioid in alphabetical order)

Buprenorphine

Breivik 2010 (Denmark, Finland, Norway, Sweden)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 199
Interventions and control group: Stable dose NSAID or Coxib oral plus 7-day buprenorphine flexible 5 or 10 or 20 µg/h transdermal patch; stable dose NSAID or Coxib plus placebo transdermal patch
Duration of trial: 5 - 9 days screening; 24 weeks titration and maintenance; 4 weeks follow-up

Munera 2010 (USA)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 315
Interventions and control group: 7-day buprenorphine flexible 5 or 10 or 20 µg/h transdermal patch; placebo transdermal patch
Duration of trial: 1-week run-in period; 3 weeks titration; 1 week maintenance

Codeine

Peloso 2000 (Canada)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 103
Interventions and control group: Codeine flexible 100 – 400 mg/d oral; placebo oral
Duration of trial: Duration of screening not reported; 4 weeks titration and maintenance


Fentanyl

Langford 2006 (Great Britain)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 399
Interventions and control group: Stable dosage of steroids or NSAIDs oral plus titration to individually optimal dosage of fentanyl 25, 50, 75 or 100 µg/h transdermal patch; stable dosage of steroids or NSAIDs oral plus placebo transdermal patch
Duration of trial: 1 week screening; 6 weeks titration and maintenance; 3 days tapering off

Hydromorphone

Rauck 2012 (USA)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 981
Interventions and control group: Hydromorphone fixed 8 or 16 mg/d oral; placebo
Duration of trial: ≤ 2 weeks wash-out; ≤ 16 days titration; 12 weeks maintenance; ≤ 1 week taper

Vojtassak 2011 (Slovakia)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 278
Interventions and control group: Hydromorphone fixed 32 mg/d oral; placebo
Duration of trial: ≤ 1 week screening; 4 weeks titration; 12 weeks maintenance; 28 weeks open label


Morphine

Caldwell 2002 (USA)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 295
Interventions and control group: Extended release morphine 30 mg/d once daily in the morning oral; extended release morphine 30 mg/d once daily in the evening oral; morphine 2x15 mg/d oral; placebo oral
Duration of trial: Duration of screening and wash-out not reported; 4 weeks maintenance; 26 weeks open label

Katz 2010 (USA)
Study design: Enriched-enrollment randomized withdrawal design
Population type: Osteoarthritis pain
Number of patients randomized: 547
Interventions and control group: Morphine and naltrexone extended release flexible 20 - 160 mg/d oral; placebo
Duration of trial: Screening and wash-out ≤ 14 days; ≤ 45 days open label titration; 12 weeks double-blind withdrawal


Oxycodone

Caldwell 1999 (USA)
Study design: Enriched-enrollment randomized withdrawal
Population type: Osteoarthritis pain
Number of patients randomized: 107
Interventions and control group: Oxycodone flexible 40 to 100 mg/d oral; placebo
Duration of trial: Duration of screening and wash-out not reported; 4 weeks open label titration; 4 weeks double-blind withdrawal

Friedmann 2011 (USA)
Study design: Enriched-enrollment randomized withdrawal
Population type: Osteoarthritis pain
Number of patients randomized: 412
Interventions and control group: Oxycodone flexible 10 - 80 mg/d oral
Duration of trial: Screening and wash-out 4 - 10 days; 2 weeks open label titration; 12 weeks double-blind maintenance; 6 months open label

Markenson 2005 (USA)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 107
Interventions and control group: Oxycodone flexible up to 120 mg/d oral
Duration of trial: Duration of screening and wash-out not reported; 13 weeks titration and maintenance


Oxymorphone

Matsumoto 2005 (USA)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 489
Interventions and control group: Oxymorphone oral fixed 40 mg/d or 80 mg/d; oxycodone oral 40 mg/d; placebo
Duration of trial: 2 - 7 days wash-out; 4 weeks fixed

Tapentadol

Afilalo 2010 (Australia, Canada, New Zealand, USA)
Study design: Parallel
Population type: Osteoarthritis knee pain
Number of patients randomized: 1050
Interventions and control group: Tapentadol flexible 200 - 600 mg/d oral; oxycodone flexible 40 - 100 mg/d oral; placebo
Duration of trial: < 2 weeks screening; 3 - 7 days wash-out; 3 weeks titration; 12 weeks maintenance; 10 - 14 days follow-up

Afilalo 2013 (13 European countries)
Study design: Parallel
Population type: Osteoarthritis knee pain
Number of patients randomized: 987
Interventions and control group: Tapentadol flexible 200 - 600 mg/d oral; oxycodone flexible 40 - 100 mg/d oral; placebo
Duration of trial: Duration of screening not reported; 3 - 7 days wash-out; 3 weeks titration; 12 weeks maintenance; 2 weeks follow-up


Tramadol

Babul 2004 (USA)
Study design: Parallel
Population type: Osteoarthritis knee pain
Number of patients randomized: 246
Interventions and control group: Tramadol flexible 100 - 400 mg/d oral; placebo
Duration of trial: 3 - 7 days wash-out; 1 week titration; 12 weeks maintenance

DeLemos 2011 (USA)
Study design: Parallel
Population type: Osteoarthritis pain
Number of patients randomized: 806
Interventions and control group: Tramadol fixed 100, 200 or 300 mg/d oral; placebo oral
Duration of trial: 2 - 7 days wash-out; 12 weeks maintenance; 1 week follow-up

Fishman 2007 (USA)
Study design: Parallel
Population type: Osteoarthritis knee or hip pain
Number of patients randomized: 552
Interventions and control group: Tramadol fixed 100, 200 or 300 mg/d oral; placebo
Duration of trial: 6 days titration; 12 weeks maintenance

Fleischmann 2000 (USA)
Study design: Parallel
Population type: Osteoarthritis knee or hip pain
Number of patients randomized: 129
Interventions and control group: Tramadol flexible 100 - 400 mg/d oral; placebo
Duration of trial: 10 days screening and wash-out; 12 weeks titration and maintenance

Gana 2006 (USA)
Study design: Parallel
Population type: Osteoarthritis knee or hip pain
Number of patients randomized: 1020
Interventions and control group: Tramadol fixed 100, 200 or 300 or 400 mg/d oral; placebo
Duration of trial: 2 - 7 days wash-out; 12 weeks maintenance; 1 week follow-up

Thorne 2008 (Canada)
Study design: Cross-over
Population type: Osteoarthritis
Number of patients randomized: 100
Interventions and control group: Tramadol flexible 100 - 400 mg/d oral; placebo
Duration of trial: Up to 1 week wash-out; 4 weeks each period; 6 months open label


Table 2: Effect sizes of different classes of opioids on selected outcome variables

Opioids
01 Pain: 11 studies; 3509 patients; effect size SMD -0.23 [95% CI -0.32, -0.14]; heterogeneity I² = 32%; test for overall effect p < 0.0001
02 Drop out due to adverse events: 8 studies; 3582 patients; effect size RR 4.68 [95% CI 3.51, 6.24]; heterogeneity I² = 72%; test for overall effect p < 0.0001

Opioids with additional mode of action *
01 Pain: 6 studies; 3545 patients; effect size SMD -0.22 [95% CI -0.31, -0.14]; heterogeneity I² = 14%; test for overall effect p < 0.0001
02 Drop out due to adverse events: 7 studies; 3618 patients; effect size RR 2.55 [95% CI 2.06, 3.14]; heterogeneity I² = 49%; test for overall effect p < 0.0001

Abbreviations: CI = Confidence interval; RR = Relative risk ; SMD = Standardized mean difference; * tapentadol, tramadol


Figure 1: PRISMA Flow Diagram

Identification: Records identified through database searching (n = 17 591): CENTRAL (n = 3688), Medline (n = 6944), Scopus (n = 6959). Additional records identified through hand searching (n = 52). Records after duplicates removed (n = 12 601).

Screening: Records screened (n = 12 601). Records excluded (n = 12 580).

Eligibility: Full-text articles assessed for eligibility (n = 21). Full-text articles excluded, with reasons (n = 1): study design did not meet inclusion criteria (n = 1).

Included: Studies included in qualitative synthesis (n = 20). Studies included in quantitative synthesis (meta-analysis) (n = 20).
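The counts in the flow diagram can be checked for internal consistency with simple arithmetic. The sketch below uses only the numbers reported in Figure 1; the function and key names are hypothetical.

```python
def prisma_flow(database_hits, after_duplicates, excluded_at_screening,
                full_text_excluded):
    """Derive the totals implied by the reported PRISMA counts."""
    database_total = sum(database_hits.values())
    full_text_assessed = after_duplicates - excluded_at_screening
    included = full_text_assessed - full_text_excluded
    return {
        "database_total": database_total,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }


# Counts as reported in Figure 1.
flow = prisma_flow(
    database_hits={"CENTRAL": 3688, "Medline": 6944, "Scopus": 6959},
    after_duplicates=12601,
    excluded_at_screening=12580,
    full_text_excluded=1,
)
print(flow)
```

The derived values agree with the figure: 17 591 database records, 21 full-text articles assessed, and 20 studies included.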


Figure 2: Risk of bias graph

[Bar chart: for each of the following domains, the percentage of studies (0% to 100%) judged to be at low, unclear, or high risk of bias: random sequence generation (selection bias); allocation concealment (selection bias); blinding of participants and personnel (performance bias); blinding of outcome assessment (detection bias); incomplete outcome data (attrition bias); selective reporting (reporting bias); selection bias; funding bias.]


Figure 3: Risk of bias summary/ Study quality (Study quality was defined according to the eight quality indicators as follows: high = 6 - 8, moderate = 3 – 5, low = 0 – 2)


Columns (left to right): Random sequence generation (selection bias); Allocation concealment (selection bias); Blinding of participants and personnel (performance bias); Blinding of outcome assessment (detection bias); Incomplete outcome data (attrition bias); Selective reporting (reporting bias); Selection bias; Funding bias. Symbols: + = low risk of bias; ? = unclear risk of bias; – = high risk of bias.

Afilalo 2010 + + ? ? ? + + –

Afilalo 2013 ? ? + ? ? – ? –

Babul 2004 + ? + ? ? – + –

Breivik 2010 + + + ? ? ? + –

Caldwell 1999 + + + ? ? – ? –

Caldwell 2002 ? ? + ? ? – + –

DeLemos 2011 + + + ? ? ? + –

Fishman 2007 + + + ? ? – + –

Fleischmann 2000 + ? + ? ? ? + –

Friedmann 2011 ? ? ? ? ? – ? –

Gana 2006 + + + ? ? ? – –

Katz 2010 + ? + ? ? + + –

Langford 2006 + + + ? ? – + –

Markenson 2005 + + ? ? ? ? + –

Matsumoto 2005 + ? + + ? – + –

Munera 2010 ? ? + ? ? + ? –

Peloso 2000 ? ? + ? – – + +

Rauck 2013 ? ? + ? ? ? + –

Thorne 2008 ? ? + ? ? ? + –

Vojtassak 2011 + ? ? ? ? + + –

References

1. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ 2003;326(7382):219.

2. Andrianakos AA, Kontelis LK, Karamitsos DG. Prevalence of symptomatic knee, hand and hip osteoarthritis in Greece. The ESORDIG study. J Rheumatol 2006;33:2507–2513.

3. Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994;50:1088–1101.

4. Chaparro LE, Furlan AD, Deshpande A, Mailis-Gagnon A, Atlas S, Turk DC. Opioids compared to placebo or other treatments for chronic low-back pain. Cochrane Database Syst Rev 2013;8:CD004959.

5. Cepeda MS, Camargo F, Zea C, Valencia L. Tramadol for osteoarthritis. Cochrane Database Syst Rev 2006;(3):CD005522.

6. Chou R, McDonagh MS, Nakamoto E, Griffin J. Analgesics for Osteoarthritis: An Update of the 2006 Comparative Effectiveness Review. Comparative Effectiveness Review No. 38. (Prepared by the Oregon Evidence-based Practice Center under Contract No. HHSA 290 2007 10057 I) AHRQ Publication No. 11(12)-EHC076-EF. Rockville, MD: Agency for Healthcare Research and Quality. October 2011. www.effectivehealthcare.ahrq.gov/reports/final.cfm. Accessed February 1, 2014.

7. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.

8. Egger M, Smith GD, Schneider M. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315:629–634.

9. Fayers PM, Hays RD. Don't middle your MIDs: regression to the mean shrinks estimates of minimally important differences. Qual Life Res 2014;23(1):1-4.

10. Felson DT. Epidemiology of knee and hip osteoarthritis. Epidemiol Rev 1988;10:1–28.

11. Fernandes L, Hagen KB, Bijlsma JW, Andreassen O, Christensen P, Conaghan PG, Doherty M, Geenen R, Hammond A, Kjeken I, Lohmander LS, Lund H, Mallen CD, Nava T, Oliver S, Pavelka K, Pitsillidou I, da Silva JA, de la Torre J, Zanoli G, Vliet Vlieland TP; European League Against Rheumatism (EULAR). EULAR recommendations for the non-pharmacological core management of hip and knee osteoarthritis. Ann Rheum Dis 2013;72(7):1125-35.


12. Furlan AD, Sandoval JA, Mailis-Gagnon A, Tunks E. Opioids for chronic noncancer pain: a meta-analysis of effectiveness and side effects. CMAJ 2006;174(11):1589-94.

13. Guyatt GH, Oxman AD, Kunz R, et al; GRADE Working Group. Going from evidence to recommendations. BMJ 2008;336(7652):1049–1051.

14. Higgins JPT, Green S. Cochrane Handbook for systematic reviews of intervention. Version 5.1.0. http://handbook.cochrane.org/

15. Häuser W, Bernardy K, Maier C. Long-term opioid therapy in chronic non-cancer pain: A systematic review and meta-analysis of efficacy and harms in open-label extension trials with a study duration of at least 26 weeks duration. Schmerz 2014, in press.

16. Hochberg MC, Altman RD, April KT, Benkhalti M, Guyatt G, McGowan J, Towheed T, Welch V, Wells G, Tugwell P; American College of Rheumatology. American College of Rheumatology 2012 recommendations for the use of nonpharmacologic and pharmacologic therapies in osteoarthritis of the hand, hip, and knee. Arthritis Care Res (Hoboken) 2012;64(4):465-74.

17. Kissin I. Long-term opioid treatment of chronic nonmalignant pain: unproven efficacy and neglected safety? J Pain Res 2013;6:513-29.

18. Li L, Setoguchi S, Cabral H, Jick S. Opioid use for noncancer pain and risk of fracture in adults: a nested case-control study using the general practice research database. Am J Epidemiol 2013;178(4):559-69.

19. Manchikanti L, Vallejo R, Manchikanti KN, Benyamin RM, Datta S, Christo PJ. Effectiveness of long-term opioid therapy for chronic non-cancer pain. Pain Physician 2011;14(2):E133-56.

20. Michna E, Cheng WY, Korves C, Birnbaum H, Andrews R, Zhou Z, Joshi AV, Schaaf D, Mardekian J, Sheng M. Systematic literature review and meta-analysis of the efficacy and safety of prescription opioids, including abuse-deterrent formulations, in non-cancer pain management. Pain Med 2014;15(1):79-92.

21. Moher D, Liberati A, Tetzlaff J, Altman G and the PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA Statement. Ann Intern Med 2009;51:1-7.


22. Moore AR, Eccleston C, Derry S, Wiffen P, Bell RF, Straube S, McQuay H; ACTINPAIN Writing Group of the IASP Special Interest Group on Systematic Reviews in Pain Relief; Cochrane Pain, Palliative and Supportive Care Systematic Review Group Editors. "Evidence" in chronic pain - establishing best practice in the reporting of systematic reviews. Pain 2010;150(3):386-9.

23. Moore RA, Derry S, Wiffen PJ. Challenges in design and interpretation of chronic pain trials. Br J Anaesth 2013;111(1):38-45.

24. Neogi T, Zhang Y. Epidemiology of osteoarthritis. Rheum Dis Clin North Am. 2013;39(1):1-19.

25. Noble M, Treadwell JR, Tregear SJ, Coates VH, Wiffen PJ, Akafomo C, Schoelles KM. Long-term opioid management for chronic noncancer pain. Cochrane Database Syst Rev 2010;(1):CD006605.

26. Nüesch E, Rutjes AW, Husni E, Welch V, Jüni P. Oral or transdermal opioids for osteoarthritis of the knee or hip. Cochrane Database Syst Rev 2009;(4):CD003115.

27. Reinecke H, Sorgatz H; German Society for the Study of Pain (DGSS). [S3 guideline LONTS. Long-term administration of opioids for non-tumor pain]. Schmerz 2009;23(5):440-7.

28. Von Korff M, Kolodny A, Deyo RA, Chou R. Long-term opioid therapy reconsidered. Ann Intern Med 2011;155(5):325-8.

29. Watson CP, Watt-Watson J, Chipman M. The long-term safety and efficacy of opioids: a survey of 84 selected patients with intractable chronic noncancer pain. Pain Res Manag 2010;15(4):213-7.

30. Welsch P, Sommer C, Schiltenwolf M, Häuser W. Opioids in chronic non-cancer pain: Are opioids superior to non-opioid analgesics? A systematic review and meta-analysis of efficacy and harms of randomized head-to-head comparisons of opioids versus non-opioid analgesics in studies of at least four weeks duration. Schmerz 2014, in press.

31. Zacher J, Carl HD, Swoboda B, Backhaus M. [Imaging of osteoarthritis of the peripheral joints]. Z Rheumatol 2007;66(3):257-8, 260-4, 266. German.


Appendix references

1. Afilalo M, Etropolski MS, Kuperwasser B, Kelly K, Okamoto A, Van Hove I, Steup A, Lange B, Rauschkolb C, Haeussler J. Efficacy and safety of Tapentadol extended release compared with oxycodone controlled release for the management of moderate to severe chronic pain related to osteoarthritis of the knee: a randomized, double-blind, placebo- and active-controlled phase III study. Clin Drug Investig 2010;30(8):489-505.

2. Afilalo M, Morlion B. Efficacy of tapentadol ER for managing moderate to severe chronic pain. Pain Physician 2013;16(1):27-40.

3. Babul ND, Novek R, Chipman H, Roth SH, Gana T, Albert K. Efficacy and safety of extended-release, once-daily tramadol in chronic pain: A randomized 12-week clinical trial in osteoarthritis of the knee. J Pain Symptom Manage 2004;28(1):59-71.

4. Breivik H, Ljosaa TM, Stengaard-Pedersen K. A 6-months, randomised, placebo-controlled evaluation of efficacy and tolerability of a low-dose 7-day buprenorphine transdermal patch in osteoarthritis patients naive to potent opioids. Scand J Pain 2010;1:122-41.

5. Caldwell JR, Hale ME, Boyd RE, Hague JM, Iwan T, Shi M, Lacouture PG. Treatment of osteoarthritis pain with controlled release oxycodone or fixed combination oxycodone plus acetaminophen added to nonsteroidal antiinflammatory drugs: A double blind, randomized, multicenter, placebo controlled trial. J Rheumatol 1999;26:862-869.

6. Caldwell JR, Rapoport RJ, Davis JC, Offenberg HL, Marker HW, Roth SH, Yuan W, Eliot L, Babul N, Lynch PM. Efficacy and safety of a once-daily morphine formulation in chronic, moderate-to-severe osteoarthritis pain: Results from a randomized, placebo-controlled, double-blind trial and an open-label extension trial. J Pain Symptom Manage 2002;23:278-291.

7. DeLemos BP, Xiang J, Benson C, Gana TJ, Pascual ML, Rosanna R, Fleming B. Tramadol hydrochloride extended-release once-daily in the treatment of osteoarthritis of the knee and/or hip: a double-blind, randomized, dose-ranging trial. Am J Ther 2011;18(3):216-26.

8. Fishman RL, Kistler CJ, Ellerbusch MT, Aparicio RT, Swami SS, Shirley ME, Jain AK, Fortier L, Robertson S, Bouchard S. Efficacy and safety of 12 weeks of osteoarthritic pain therapy with once-daily tramadol (Tramadol Contramid OAD). J Opioid Manag 2007;3(5):273-80.


9. Fleischmann RM, Caldwell JR, Roth SH, Tesser JRP, Olson W, Kamin M. Tramadol for the treatment of joint pain associated with osteoarthritis: a randomized, double-blind, placebo-controlled trial. Curr Ther Res 2001;62(2):113-128.

10. Friedmann N, Klutzaritz V, Webster L. Efficacy and safety of an extended-release oxycodone (Remoxy) formulation in patients with moderate to severe osteoarthritic pain. J Opioid Manag 2011;7(3):193-202.

11. Gana TJ, Pascual ML, Fleming RR, Schein JR, Janagap CC, Xiang J. Extended-release tramadol in the treatment of osteoarthritis: a multicenter, randomized, double-blind, placebo-controlled clinical trial. Curr Med Res Opin 2006;22(7):1391-1401.

12. Katz N, Hale M, Morris D. Morphine sulfate and naltrexone hydrochloride extended release capsules in patients with chronic osteoarthritis pain. Postgrad Med 2010;122:112-18.

13. Langford R, McKenna F, Ratcliffe S, Vojtassak J, Richarz U. Transdermal fentanyl for improvement of pain and functioning in osteoarthritis: A randomized, placebo-controlled trial. Arthritis Rheum 2006;54(6):1829–37.

14. Markenson JA, Croft J, Zhang PG, Richards P. Treatment of persistent pain associated with osteoarthritis with controlled-release oxycodone tablets in a randomized controlled clinical trial. Clin J Pain 2005;21(6):524–35.

15. Matsumoto AK, Babul N, Ahdieh H. Oxymorphone extended-release tablets relieve moderate to severe pain and improve physical function in osteoarthritis: results of a randomized, double-blind, placebo- and active-controlled phase III trial. Pain Med 2005;6(5):357-66.

16. Munera C, Drehobl M, Sessler NE, Landau C. A randomized, placebo-controlled, double-blinded, parallel-group, 5-week study of buprenorphine transdermal system in adults with osteoarthritis. J Opioid Manag 2010;6:193-202.

17. Peloso PM, Bellamy N, Bensen W, Thomson GTD, Harsanyi Z, Babul N, Darke AC. Double blind randomized placebo control trial of controlled release codeine in the treatment of osteoarthritis of the hip or knee. J Rheumatol 2000;27:764-771.

18. Rauck R, Rapoport R, Thipphawong J. Results of a double-blind, placebo-controlled, fixed-dose assessment of once-daily OROS® hydromorphone ER in patients with moderate to severe pain associated with chronic osteoarthritis. Pain Pract 2013;13(1):18-29.

19. Thorne C, Beaulieu AD, Callaghan DJ, O’Mahony WF, Bartlett JM, Knight R, Kraag GR, Akhras R, Piraino PS, Eisenhoffer J, Harsanyi Z, Darke AC. A randomized, double-blind, crossover comparison of the efficacy and safety of oral controlled-release tramadol and placebo in patients with painful osteoarthritis. Pain Res Manag 2008;13:93-102.

20. Vojtaššák J, Vojtaššák J, Jacobs A, Rynn L, Waechter S, Richarz U. A Phase IIIb, Multicentre, Randomised, Parallel-Group, Placebo-Controlled, Double-Blind Study to Investigate the Efficacy and Safety of OROS Hydromorphone in Subjects with Moderate-to-Severe Chronic Pain Induced by Osteoarthritis of the Hip or the Knee. Pain Res Treat 2011;2011:239501.

Excluded studies (with reason) 1.Burch F, Fishman R, Messina N, Corser B, Radulescu F, Sarbu A, Craciun-Nicodin MM, Chiriac R, Beaulieu A, Rodrigues J, Beignot-Devalmont P, Duplan A, Robertson S, Fortier L, Bouchard S. A comparison of the analgesic efficacy of Tramadol Contramid OAD versus placebo in patients with pain due to osteoarthritis. J Pain Symptom Manage 2007;34(3):328-38. (study design did not meet inclusion criteria)


Electronic supplementary tables

Table 1: Search strategy PubMed November 18, 2013

Columns: search number, query, items found

#40 Search (#31 AND #34 AND #38) Filters: Publication date from 2008/10/01 to 2013/10/31. Items found: 6938

#39 Search (#31 AND #34 AND #38). Items found: 22855

#38 Search (#35 OR #36 OR #37). Items found: 165885

#34 Search (#32 OR #33). Items found: 976115

#31 Search (#29 NOT #30). Items found: 2846830

#37 Search [nm] OR [nm] OR [nm] OR [nm] OR [nm] OR dihydrocodein[nm] OR [nm] OR [nm] OR [nm] OR hydroxycodeinone[nm] OR kaolin-pectin[nm] OR [nm] OR levomethadryl[nm] OR levomethadyl[nm] OR methynaloxone[nm] OR nocistatin[nm] OR oxycodein[nm] OR oxymorph[nm] OR paracymethadol[nm] OR [nm] OR protopine[nm] OR [nm] OR sufentanyl[nm] OR tapentadol[nm]. Items found: 2949

#36 Search Analgesics, Opioid[mh] OR [mh] OR morphine derivatives[mh] OR [mh] OR opiate[mh] OR opioid[mh] OR acemethadone[mh] OR [mh] OR [mh] OR alphaprodine[mh] OR anileridine[mh] OR benzomorphan[mh] OR buprenorphine[mh] OR [mh] OR carfentanil[mh] OR codeine[mh] OR [mh] OR [mh] OR dezocine[mh] OR diacetylmorphine[mh] OR diamorphine[mh] OR dihydroetorphine[mh] OR [mh] OR dionine[mh] OR [mh] OR [mh] OR dihydrocodein[mh] OR dihydrohydroxycodeinone[mh] OR [mh] OR dihydromorphinone[mh] OR dipipanone[mh] OR [mh] OR endomorphin[mh] OR [mh] OR eseroline[mh] OR [mh] OR ethylketocyclazocine[mh] OR [mh] OR fenoperidine[mh] OR fentanyl[mh] OR [mh] OR hydrocodon[mh] OR hydromorphon[mh] OR hydroxycodeinone[mh] OR [mh] OR isonipecain[mh] OR isopromedol[mh] OR kaolin-pectin[mh] OR ketobemidone[mh] OR [mh] OR levodroman[mh] OR levomethadryl[mh] OR levomethadyl[mh] OR levorphan[mh] OR meperidine[mh] OR [mh] OR methadol[mh] OR [mh] OR methadyl acetate [mh] OR morphia[mh] OR morphine[mh] OR methynaloxone[mh] OR [mh] OR nocistatin[mh] OR [mh] OR oxycodein[mh] OR oxycodone[mh] OR oxymorph[mh] OR [mh] OR [mh] OR paracymethadol[mh] OR paregoric[mh] OR [mh] OR [mh] OR [mh] OR phenbenzorphan[mh] OR phenethylazocine[mh] OR [mh] OR pirinitramide[mh] OR promedol[mh] OR propoxyphene[mh] OR protopine[mh] OR pyrrolamidol[mh] OR remifentanil[mh] OR [mh] OR sufentanyl[mh] OR talwin[mh] OR tapentadol[mh] OR [mh] OR theocodin[mh] OR [mh] OR tramadol[mh] OR [mh]. Items found: 108754

#35 Search Opioid Analgesics[tiab] OR Narcotics[tiab] OR morphine derivatives[tiab] OR narcotic[tiab] OR opiate[tiab] OR opioid[tiab] OR acemethadone[tiab] OR acetylmethadol[tiab] OR alfentanil[tiab] OR alphaprodine[tiab] OR anileridine[tiab] OR benzomorphan[tiab] OR buprenorphine[tiab] OR butorphanol[tiab] OR carfentanil[tiab] OR codeine[tiab] OR dextromoramide[tiab] OR dextropropoxyphene[tiab] OR dezocine[tiab] OR diacetylmorphine[tiab] OR diamorphine[tiab] OR dihydroetorphine[tiab] OR dimepheptanol[tiab] OR dionine[tiab] OR diphenoxylate[tiab] OR diprenorphine[tiab] OR dihydrocodein[tiab] OR dihydrohydroxycodeinone[tiab] OR dihydromorphine[tiab] OR dihydromorphinone[tiab] OR dipipanone[tiab] OR dynorphin[tiab] OR endomorphin[tiab] OR enkephalin[tiab] OR eseroline[tiab] OR etorphine[tiab] OR ethylketocyclazocine[tiab] OR ethylmorphine[tiab] OR fenoperidine[tiab] OR fentanyl[tiab] OR heroin[tiab] OR hydrocodon[tiab] OR hydromorphon[tiab] OR hydroxycodeinone[tiab] OR isocodeine[tiab] OR isonipecain[tiab] OR isopromedol[tiab] OR kaolin-pectin[tiab] OR ketobemidone[tiab] OR levallorphan[tiab] OR levodroman[tiab] OR levomethadryl[tiab] OR levomethadyl[tiab] OR levorphan[tiab] OR meperidine[tiab] OR meptazinol[tiab] OR methadol[tiab] OR methadone[tiab] OR methadyl acetate [tiab] OR morphia[tiab] OR morphine[tiab] OR methynaloxone[tiab] OR nalbuphine[tiab] OR nocistatin[tiab] OR opium[tiab] OR oxycodein[tiab] OR oxycodone[tiab] OR oxymorph[tiab] OR pantopon[tiab] OR papaveretum[tiab] OR paracymethadol[tiab] OR paregoric[tiab] OR pentazocine[tiab] OR pethidine[tiab] OR phenazocine[tiab] OR phenbenzorphan[tiab] OR phenethylazocine[tiab] OR phenoperidine[tiab] OR pirinitramide[tiab] OR promedol[tiab] OR propoxyphene[tiab] OR protopine[tiab] OR pyrrolamidol[tiab] OR remifentanil[tiab] OR sufentanil[tiab] OR sufentanyl[tiab] OR talwin[tiab] OR tapentadol[tiab] OR thebaine[tiab] OR theocodin[tiab] OR tilidine[tiab] OR tramadol[tiab] OR trimeperidine[tiab]. Items found: 141108

#33 Search chronic pain[mh] OR Chronic Disease[mh] OR Pain[mh] OR Pain Measurement[mh] OR Low Back pain[mh] OR Back Pain[mh] OR backache[mh] OR Osteoarthritis[mh] OR Rheumatoid arthritis[mh] OR Brachial Plexus Neuritis[mh] OR cervicobrachial pain syndrome[mh] OR Irritable bowel syndrome[mh] OR Irritable colon[mh] OR chronic pancreatitis[mh] OR Tension headache[mh] OR Headache[mh] OR Headache Disorders[mh] OR Temporomandibular joint syndrome[mh] OR globus syndrome[mh] OR Diabetic Neuropathies[mh] OR diabetic neuropath*[mh] OR Post herpetic neuralgia[mh] OR Postherpetic neuralgia[mh] OR neuropathic pain[mh] OR neuralgia[mh] OR polyneuropathies[mh] OR polyneuropathy[mh] OR Fibromyalgia[mh] OR Phantom Limb[mh] OR Phantom limb pain[mh] OR Cumulative Trauma Disorders[mh] OR Repetitive strain syndrome[mh] OR Whiplash Injuries[mh] OR Whiplash[mh]. Items found: 704482

#32 Search chronic pain[tiab] OR Chronic Disease[tiab] OR Pain[tiab] OR Pain Measurement[tiab] OR Low Back pain[tiab] OR Back Pain[tiab] OR backache[tiab] OR Osteoarthritis[tiab] OR Rheumatoid arthritis[tiab] OR Brachial Plexus Neuritis[tiab] OR cervicobrachial pain syndrome[tiab] OR Irritable bowel syndrome[tiab] OR Irritable colon[tiab] OR chronic pancreatitis[tiab] OR Tension headache[tiab] OR Headache[tiab] OR Headache Disorders[tiab] OR Temporomandibular joint syndrom [tiab] OR globus syndrome[tiab] OR Diabetic Neuropathies[tiab] OR diabetic neuropath*[tiab] OR Post herpetic neuralgia[tiab] OR Postherpetic neuralgia[tiab] OR neuropathic pain[tiab] OR neuralgia[tiab] OR polyneuropathies[tiab] OR polyneuropathy[tiab] OR Fibromyalgia[tiab] OR Phantom Limb[tiab] OR Phantom limb pain[tiab] OR Cumulative Trauma Disorders[tiab] OR Repetitive strain syndrome[tiab] OR Whiplash Injuries[tiab] OR Whiplash[tiab]. Items found: 572053

#30 Search animals[mh] NOT humans[mh] 3833757

#29 Search randomized controlled trial[pt] OR controlled clinical trial[pt] OR randomized[tiab] OR placebo[tiab] OR drug therapy[sh] OR randomly[tiab] OR trial[tiab] OR groups[tiab] OR Systematic Review[pt] OR Systematic Review[tiab] OR Systematic Review[sh] OR Meta-Analysis[pt] OR Meta-Analysis[tiab] OR Meta-Analysis[sh] OR Cochrane Database Syst Rev[Journal] 3308547

Table 2. Criteria for risk of bias assessment for RCTs

1. Random sequence generation (selection bias)

Selection bias (biased allocation to interventions) due to inadequate generation of a randomized sequence.

There is a low risk of selection bias if the investigators describe a random component in the sequence generation process such as: referring to a random number table, using a computer random number generator, coin tossing, shuffling cards or envelopes, throwing dice, drawing of lots or minimization (minimization may be implemented without a random element, and this is considered to be equivalent to being random).

There is a high risk of selection bias if the investigators describe a non-random component in the sequence generation process such as: sequence generated by odd or even date of birth, date (or day) of admission, hospital or clinic record number; or allocation by judgement of the clinician, preference of the participant, results of a laboratory test or a series of tests, or availability of the intervention.

2. Allocation concealment (selection bias)

Selection bias (biased allocation to interventions) due to inadequate concealment of allocations prior to assignment.

There is a low risk of selection bias if the participants and investigators enrolling participants could not foresee assignment because one of the following, or an equivalent method, was used to conceal allocation: central allocation (including telephone, web-based and pharmacy-controlled randomization); sequentially numbered drug containers of identical appearance; or sequentially numbered, opaque, sealed envelopes.

There is a high risk of bias if participants or investigators enrolling participants could possibly foresee assignments and thus introduce selection bias, such as allocation based on: an open random allocation schedule (for example, a list of random numbers); assignment envelopes used without appropriate safeguards (for example, if envelopes were unsealed, non-opaque or not sequentially numbered); alternation or rotation; date of birth; case record number; or other explicitly unconcealed procedures.

3. Blinding of participants and of personnel/care providers (performance bias)

Performance bias due to knowledge of the allocated interventions by participants and by personnel/care providers during the study.

There is a low risk of performance bias if blinding of participants was ensured and it was unlikely that the blinding could have been broken; or if there was no blinding or incomplete blinding, but the review authors judge that the outcome is not likely to be influenced by lack of blinding.

There is a low risk of performance bias if blinding of personnel was ensured and it was unlikely that the blinding could have been broken; or if there was no blinding or incomplete blinding, but the review authors judge that the outcome is not likely to be influenced by lack of blinding.

4. Blinding of outcome assessor (detection bias)

Detection bias due to knowledge of the allocated interventions by outcome assessors of patient-reported outcomes.

There is a low risk of detection bias if the outcome assessor of patient-reported outcomes is not the clinical investigator but a statistician not involved in the treatment of the patients. There is an unclear risk of bias if no details are reported on who the outcome assessor was. There is a high risk of bias if the outcome assessor was involved in the treatment of the patients.

5. Incomplete outcome data (attrition bias)

Attrition bias due to handling of incomplete outcome data.

There is a low risk of bias if all randomized patients were reported or analysed in the group to which they were allocated by randomization and dropouts were analysed by the baseline observation carried forward (BOCF) method. There is an unclear risk of bias if all randomized patients were reported or analysed in the group to which they were allocated by randomization and dropouts were analysed by the last observation carried forward (LOCF) method. There is a high risk of bias if there was no ITT analysis and only completers were reported.

6. Selective reporting (reporting bias)

Reporting bias due to selective outcome reporting.

There is a low risk of reporting bias if the study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way, or if the study protocol is not available but it is clear that the published reports include all expected outcomes, including those that were pre-specified (convincing text of this nature may be uncommon).

There is a high risk of reporting bias if not all of the study’s pre-specified primary outcomes have been reported; one or more primary outcomes is reported using measurements, analysis methods or subsets of the data (for example, subscales) that were not pre-specified; one or more reported primary outcomes were not pre-specified (unless clear justification for their reporting is provided, such as an unexpected adverse effect); one or more outcomes of interest in the review are reported incompletely so that they cannot be entered in a meta-analysis; or the study report fails to include results for a key outcome that would be expected to have been reported for such a study.

7. Group similarity at baseline (selection bias)

Bias due to dissimilarity at baseline for the most important prognostic indicators.

There is a low risk of bias if groups are similar at baseline for demographic factors, value of main outcome measure(s), and important prognostic factors.

8. Other bias (funding bias)

We assumed a low risk of bias if the study was initiated by an investigator and the study received no funding from a pharmaceutical company. We assumed a high risk of bias if there was a relationship of the authors with the pharmaceutical industry. We extracted the following information about the relationship with the pharmaceutical industry: author affiliation with industry, funding of study by industry, industry providing the study drug, or statistical analysis performed by an industry-affiliated statistician. In case of an affirmative response to any of these questions, we concluded that there was a funding bias.
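The decision rules for attrition bias (item 5) and funding bias (item 8) can be written down as a short sketch. The Python below is illustrative only and is not the authors' procedure: the function and field names are hypothetical, the fallback for imputation methods other than BOCF and LOCF is an assumption, and treating a study with no recorded industry link as low risk simplifies the criterion in item 8 (investigator-initiated and no pharmaceutical funding).

```python
from dataclasses import dataclass


@dataclass
class IndustryLinks:
    """Hypothetical record of the four items extracted for item 8."""
    author_affiliated: bool = False
    industry_funded: bool = False
    drug_provided_by_industry: bool = False
    industry_statistician: bool = False


def funding_bias(links: IndustryLinks) -> str:
    """Item 8: an affirmative answer to any question means funding bias."""
    any_link = (
        links.author_affiliated
        or links.industry_funded
        or links.drug_provided_by_industry
        or links.industry_statistician
    )
    # Assumption: no recorded link is treated as low risk.
    return "high" if any_link else "low"


def attrition_bias(itt_analysis: bool, imputation: str) -> str:
    """Item 5: BOCF -> low, LOCF -> unclear, completers only -> high."""
    if not itt_analysis:
        return "high"
    method = imputation.strip().upper()
    if method == "BOCF":
        return "low"
    if method == "LOCF":
        return "unclear"
    # Assumption: the table does not cover other methods; rate as unclear.
    return "unclear"


if __name__ == "__main__":
    print(funding_bias(IndustryLinks(industry_funded=True)))
    print(attrition_bias(itt_analysis=True, imputation="LOCF"))
```

Because a single industry link is enough for a high-risk rating under item 8, most industry-sponsored trials fall into the same category regardless of how the other items are rated.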

Table 3: Characteristics of included studies

Afilalo 2010

Methods

Disease: Osteoarthritis pain

Study setting: 87 sites in US, 15 sites in Canada, 6 sites in New Zealand, 4 sites in Australia

Study design: Parallel design

Study duration: 3 weeks titration, 12 weeks maintenance

Participants

Inclusion criteria: Age >= 40 years; diagnosis of osteoarthritis of the knee according to ACR criteria, functional capacity I-III, pain at the reference joints requiring the use of non-opioids or opioids at doses equivalent to <= 160 mg oral morphine/d for >= 3 months prior to screening

Exclusion criteria: Presence of any clinically significant or unstable medical or psychiatric disease (e.g. history of substance abuse, chronic hepatitis B or C, HIV infection, uncontrolled hypertension, renal impairment (GFR < 60 ml/min), hepatic impairment (ALT or AST >= 3 times the upper limit of normal))

Placebo: N=337; mean age 58.2 years; 59.3% female; 79.2% white. Severe pain 81.8%

Tapentadol: N=344; mean age 58.4 years; 62.8% female; 75.6% white. Severe pain 81.8%

Oxycodone: N=342; mean age 58.2 years; 59.1% female; 71.6% white. Severe pain 83.0%

Interventions

Study medication: Tapentadol 200-500 mg/d (mean dosage approximately 350 mg/d); oxycodone 40-100 mg/d (mean dosage approximately 70 mg/d); placebo

Rescue medication: Paracetamol (up to 1000 mg/day for 3 consecutive days)

Allowed co-medication: Antidepressants for patients with controlled psychiatric or neurological diseases

Outcomes

Pain: Pain intensity subscale of WOMAC at study visit

Responder: 50% pain reduction NRS 0-10

PGIC: Much or very much improved

Function: Physical function subscale WOMAC *

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Reported

Notes

* The authors report an ITT analysis; however, the number of patients reported indicates a per-protocol analysis

Afilalo 2013

Methods

Disease: Osteoarthritis pain

Study setting: 101 sites in 13 European countries

Study design: Parallel design

Study duration: 3 weeks titration, 12 weeks maintenance

Participants

Inclusion criteria: Age >= 40 years; patients diagnosed with osteoarthritis of the knee based on the American College of Rheumatology (ACR) criteria and functional capacity class of I-III; patients taking analgesic medications for at least 3 months prior to screening and dissatisfied with their current therapy; patients requiring opioid treatment must be taking daily doses of opioid-based analgesic equivalent to <160 mg of oral morphine; baseline score of >=5 on an 11-point numeric rating scale, calculated as the average pain intensity during the last 3 days prior to randomization

Exclusion criteria: History of and/or drug abuse in Investigator's judgment; chronic hepatitis B or C, or HIV, presence of active hepatitis B or C within the past 3 months; life-long history of seizure disorder or epilepsy; uncontrolled hypertension; patients with severely impaired renal function; patients with moderate to severely impaired hepatic function or with laboratory values reflecting inadequate hepatic function; treatment with neuroleptics, monoamine oxidase inhibitors, serotonin norepinephrine reuptake inhibitors (SNRI), tricyclic antidepressants, anticonvulsants, or anti-parkinsonian drugs; treatment with any analgesic therapy other than investigational medication or rescue medication during the trial

Total sample: N=987; mean age 62.1 years; 60.4% female; 99.3% white. Severe pain 81.8%

Interventions

Study medication: Tapentadol 200-500 mg/d (mean daily dosage not reported); oxycodone 40-100 mg/d (mean daily dosage not reported); placebo

Rescue medication: No information given

Allowed co-medication: No information given

Outcomes

Pain: Average pain intensity NRS 0-10; only LSMD vs placebo reported; no significant difference to placebo; data not usable for meta-analysis.

Responder: 50% pain reduction NRS 0-10

PGIC: Much or very much improved

Function: SF-36 physical functioning; only LSMD vs placebo reported; no significant difference to placebo; data not usable for meta-analysis.

Withdrawal due to adverse events: Reported for tapentadol and oxycodone, but not for placebo

Serious adverse events: Not reported

Death: Not explicitly stated

Notes

The study was not published as a full paper; some data were presented in a pooled analysis of tapentadol studies in chronic pain. The authors stated: "Because the difference between the active comparator oxycodone and placebo was not statistically significant for either primary endpoint, this study must be considered a failed trial; hence, the lack of a statistically significant difference between tapentadol ER and placebo in the primary endpoint is not interpretable."

Babul 2004

Methods

Disease: Osteoarthritis pain

Study setting: 16 sites in US

Study design: Parallel design

Study duration: 3-7 days wash-out, 12 weeks titration and maintenance

Participants

Inclusion criteria: At least 18 years of age and have Functional Class I–III, primary OA of the knee meeting ACR diagnostic criteria, defined by knee pain and recent radiographic evidence of osteophytes, plus at least one of the following: age 50 years, morning stiffness 30 minutes in duration, and/or crepitus; have involvement of at least one knee joint that has warranted treatment with acetaminophen, COX-2 inhibitors, NSAIDs, tramadol, or opioid analgesics for at least 75 of 90 days prior to the study; and have a baseline visual analogue scale (VAS) pain intensity score of 40 mm in the index joint.

Exclusion criteria: Uncontrolled concomitant disease or chronic condition(s) that might interfere with the assessment of pain and other symptoms of OA; other prior disease or joint replacement at the index joint; likelihood of requiring a surgical procedure of the index joint(s) during the study; inflammatory arthritis, gout, pseudogout, or Paget’s disease that might interfere with the assessment of response; diagnosis of chronic pain syndrome; ACR or a clinical diagnosis of fibromyalgia; inability to discontinue acetaminophen, COX-2 inhibitors, NSAIDs (other than <=325 mg QD for cardiovascular prophylaxis), corticosteroids, or other analgesics for the duration of the double-blind study; use of oral, intramuscular, intravenous, or soft tissue corticosteroids within 1 month prior to the study; use of intra-articular corticosteroids in the index knee joint within 2 months prior to the study; intra-articular viscosupplementation in the index knee joint during the past 6 months, or intra-articular viscosupplementation in a non-index knee in the past 3 months; weight <= 100 lbs; history of clinically significant intolerance to tramadol or a known hypersensitivity to opioid analgesics; and increased risk in terms of the precautions, warnings, and contraindications noted in the tramadol prescribing information.

Tramadol: N=124; mean age 61.2 years; 66.1% female; 78.2% white. Pain baseline 78.2 (±10)

Placebo: N=122; mean age 61.5 years; 56.6% female; 86.1% white. Pain baseline 75.5 (±16.5)

Interventions

Study medication: Upward titration up to tramadol 400 mg/d (the mean tramadol dose was 276 mg)

Rescue medication: NSAIDs or other analgesics were not permitted during the washout period or double-blind period, except for acetaminophen up to 2000 mg per day for reasons other than for chronic pain, if absolutely necessary, and for no more than 3 consecutive days.

Allowed co-therapies: Glucosamine and chondroitin were permitted provided the patient was on stable doses for a minimum of 2 months prior to randomization and agreed to continue on the same dose for the duration of the study. Patients receiving physical therapy or using assistive devices upon entering the study were encouraged to continue these interventions throughout the study.

Outcomes

Pain: Average daily pain intensity during the past 24 h VAS 0-100

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: The WOMAC (Western Ontario and McMaster Universities) OA Index Physical Function Subscale

Withdrawal due to adverse events: Reported

Serious adverse events: Not explicitly reported

Death: Not explicitly stated

Breivik 2010

Methods

Disease: Osteoarthritis pain

Study setting: 19 sites in Denmark, Finland, Norway, Sweden

Study design: Parallel design

Study duration: 24 weeks maintenance, follow-up after 4 weeks

Participants

Inclusion criteria: Men and women over the age of 40 were included if they had a clinical diagnosis of osteoarthritis of the hip and/or knee; fulfilled the American College of Rheumatology (ACR) criteria for osteoarthritis; had experienced pain from the relevant joint for at least one year prior to enrolment; had radiographic evidence of osteoarthritis of the hip and/or knee, as defined by Grades II to IV of the Kellgren and Lawrence scale; had been taking NSAIDs or coxibs for their osteoarthritis pain for at least one month prior to the screening visit (visit 1), at a stable frequency and dose, and at least half the maximum allowed daily dose which gives an anti-inflammatory effect; continued to experience at least moderate pain when walking on a flat surface in spite of treatment with NSAIDs or coxibs; and were willing to continue their treatment with NSAID or coxib, at a stable frequency and dose, until the end of the double-blind phase. Those who had been intermittently using low-potency opioids (e.g. tramadol, low-dose codeine) had to be willing to discontinue this regimen from the screening visit until the completion or discontinuation visit and to take paracetamol tablets provided by the sponsor as intermittent analgesic rescue. Those who were receiving transcutaneous nerve stimulation (TENS) or biofeedback prior to study entry had to be willing to discontinue this therapy for the duration of the study.

Exclusion criteria: Patients were excluded if they had been treated with strong opioid analgesics (e.g. morphine, oxycodone, methadone, fentanyl patch); if they had been treated regularly with weak opioid analgesics such as tramadol or codeine for longer than three weeks prior to the screening visit; if any intermittent, short-term treatment with weak opioids could not be discontinued for the duration of the study; if they had a history of other chronic condition(s) for which they required frequent analgesic therapy (e.g., headaches, migraine, gout); if they were scheduled for any major surgery that would fall within the screening phase or the double-blind phase of the study; if transcutaneous nerve stimulation (TENS) or biofeedback prior to enrolment could not be discontinued for the duration of the study; if the investigator deemed that the patient had any contraindication to treatment with opioid medication, such as history of alcohol or substance abuse; if the patient had any other clinically significant disease or any reduced organ function; if the patient was using antidepressants, antiepileptic drugs, steroids, hypnotics (that may increase respiratory depression of buprenorphine); if the patient, or any close relatives, had long QT-syndrome, were on anti-arrhythmic medication (Class IA or Class III), or had any unstable or symptomatic cardiac abnormality.

Placebo: N=99; mean age 62.9 years; 64.7% female; 100% white. Pain baseline only reported in figure

Buprenorphine 5, 10 or 20 µg/h: N=100; mean age 62.9 years; 72.0% female; 100% white. Pain baseline only reported in figure

Interventions

Study medication: Titration to individually optimal dosage of buprenorphine 5, 10 or 20 µg/h; placebo

Rescue medication: Acetaminophen up to 4g/d

Allowed co-therapies: Stable dosage of NSAIDs

Outcomes

Pain: Pain intensity NRS 0-10

Responder: No 50% pain reduction rates reported

PGIC: Much and very much improved

Function: WOMAC OA index of function

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated; no deaths reported in section "serious adverse events"

Caldwell 1999

Methods

Disease: Osteoarthritis pain

Study setting: 9 sites in US

Study design: Enriched enrollment randomized withdrawal

Study duration: Open label titration for 30 days, 30 days double-blind withdrawal

Participants

Inclusion criteria: Adult patients with moderate to severe average persistent daily pain > 1 month despite regular use of NSAIDs. The diagnosis of OA was based on 6 radiographic criteria.

Exclusion criteria: Litigation related to pain or injury; intra-articular steroid injections within the last 6 weeks; active cancer; severe organic dysfunction; history of substance abuse

Oxycodone: N=34, mean age 57 years, race not reported. Pain baseline only reported in figure.

Placebo: N=36; mean age 58 years; 69% female; race not reported. Pain baseline only reported in figure.

Interventions

Study medication: Oxycodone was adjusted (20 to 60 mg/d) in open label phase until pain intensity was less than moderate for several days in the absence of intolerable or unmanageable side effects (mean dosage 40 mg/d)

Rescue medication: No information provided.

Allowed co-therapies: NSAIDs were continued at prestudy dose, no other analgesics, stable steroid dose for at least 1 month.

Outcomes

Pain: Mean global pain intensity at the end of double-blind phase compared to end of titration; pain intensity NRS 0-10

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: Not assessed

Withdrawal due to adverse events: Reported

Serious adverse events: Not reported

Death: Not explicitly stated

Caldwell 2002

Methods

Disease: Osteoarthritis pain

Study setting: 16 sites in US

Study design: Parallel design

Study duration: 7 days wash-out, 4 weeks maintenance

Participants

Inclusion criteria: Patients had to be at least 40 years of age and have both a clinical diagnosis and grade II-IV radiographic evidence of OA of the hip and/or knee; have had prior suboptimal analgesic response to treatment with NSAIDs and acetaminophen or had previously received intermittent opioid analgesic therapy; and have a baseline visual analogue scale (VAS) pain intensity score of 40 mm in the index joint.

Exclusion criteria: Patients with serious concomitant disease, chronic condition(s) that might interfere with the assessment of pain and other symptoms of OA, prior disease at the index joint, surgery or the likelihood of requiring a surgical procedure of the index joint(s) during the trial; diseases other than OA not well managed with treatment; weight 100 lbs; oral, intramuscular, intravenous, intra-articular, or soft tissue administration of steroids within 1 month of study drug administration (two months, if at index knee or hip joint); intra-articular viscosupplementation (in the index joint) within six months of trial treatment; opioid therapy for longer than three weeks prior to baseline; any history of substance abuse within two years prior to screening; and history of clinically significant intolerance to opioids or any known hypersensitivity to morphine or other opioid analgesics.

Extended release morphine 30 mg/d once daily in the morning: N=73; mean age 62.6 years; 59% female; 86% white. Pain baseline 62.6 (±9.5)

Extended release morphine 30 mg/d once daily in the evening: N=73; mean age 63.1 years; 58% female; 82% white. Pain baseline 63.1 (±11.1)

Non-extended morphine 2x15 mg/d: N=76; mean age 61.9 years; 63% female; 90% white. Pain baseline 61.9 (±10.4)

Placebo: N=73; mean age 61.9 years; 70% female; 80% white. Pain baseline 61.9 (±10.7)

Interventions

Study medication: Extended release morphine 30 mg/d once daily in the morning (mean dosage not reported); extended release morphine 30 mg/d once daily in the evening (mean dosage not reported); non-extended morphine 2x15 mg/d (mean dosage not reported); placebo twice/d

Rescue medication: Cardiovascular prophylactic doses of aspirin (up to 325 mg/day) and acetaminophen for non-OA symptomatology (up to 2000 mg/day for a maximum of 3 consecutive days) was prohibited in the double-blind trial. Acetaminophen had to be stopped 24 hours prior to efficacy assessments.

Allowed co-therapies: No information provided.

Outcomes

Pain: Overall Arthritis Pain Intensity VAS 0-10

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC Functional Impairment subscale

Withdrawal due to adverse events: Reported

Serious adverse events: Incompletely reported *

Death: Not explicitly stated

Notes

* Details reported not sufficient to perform meta-analysis: "Six patients experienced a serious AE but only one (hospitalized for constipation) was thought to be possibly related to study drug (Avinza QPM); this patient withdrew from the trial due to this event."

DeLemos 2011

Methods

Disease: Osteoarthritis pain

Study setting: 70 sites in US

Study design: Parallel

Study duration: (1) 2-7 days wash-out, (2) 12 weeks maintenance, (3) 1 week follow-up

Participants

Inclusion criteria: Aged 18-74 years with symptomatic (painful) OA of the knee and/or hip, radiographically confirmed, ACR functional class I-III. Had taken acetaminophen, NSAIDs, a COX-2 inhibitor or an opioid on at least 75 of the 90 previous days to treat OA pain; moderate or severe OA pain that warranted treatment with COX-2 inhibitors, NSAIDs, acetaminophen, or opioid analgesics for at least 75 of 90 days preceding the screening visit; baseline joint index pain of >= 40 on a 100 mm pain scale after wash-out. In addition, patients were required to be able to discontinue acetaminophen, NSAIDs, COX-2 inhibitors, opioids, and other analgesics during the study.

Exclusion criteria: Any medical condition other than OA which was not well controlled; inflammatory arthritis, gout, pseudo-gout, or Paget disease; a chronic pain syndrome or fibromyalgia; prior joint replacement surgery at the index joint; history of substance abuse in the previous 6 months; use of antidepressants, anticonvulsants or other analgesics except acetaminophen.

Tramadol 100 mg/d: N=201; mean age 58.459.5 years; 62.458.2% female; 72.381.6% white. Pain baseline 298.4 (± 101.3)

Tramadol 200 mg/d: N=201; mean age 62.0 (± 9.9) years; 62.3% female; 78.4% white. Pain baseline 302.9 (± 96.1)

Tramadol 300 mg/d: N=201; mean age 59.7 years; 61.8% female; 80.9% white. Pain baseline 306.9 (± 107.3)

Placebo: N=200; mean age 58.9 years; 68.5% female; 82.5% white. Pain baseline 300.8 (± 103.5)

Interventions

Study medication: Fixed dosage of 100, 200 or 300 mg tramadol/d; fixed dosage of placebo

Rescue medication: acetaminophen up to 2g/day for no more than 3 consecutive days for reasons other than OA or chronic pain. Use of acetaminophen was prohibited 48 h before each study visit.

Allowed co-therapies: No information provided

Outcomes

Pain: WOMAC Pain Subscale

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC OA physical function score 0-1700

Withdrawal due to adverse events: Reported

Serious adverse events: Reported *

Death: Reported

Notes

* Only treatment-related SAE reported; data not used for analysis

Fishmann 2007

Methods

Disease: Osteoarthritis pain

Study setting: 74 sites in US

Study design: Parallel

Study duration: 6 days titration, 12 weeks maintenance

Participants

Inclusion criteria: 40-75 years; diagnosis of OA according to ACR criteria; WOMAC OA Index pain score >= 150 mm

Exclusion criteria: Arthritic conditions other than OA; history of seizures; evidence of effusion > 15 cc on physical examination; BMI >=38; major illnesses requiring hospitalization within 3 months before screening; unwillingness to discontinue pain medication or other medication taken for OA; previous or current substance abuse or dependence other than ; significant bowel, renal or liver disease

Placebo: N=224; mean age 61 years; 61.6% female; race not reported. Pain baseline 30.1 (±8.9)

Tramadol 100 mg/d: N=103; mean age 63 years; 60.2% female; race not reported. Pain baseline 28.7 (±7.9)

Tramadol 200 mg/d: N=107; mean age 61 years; 59.8% female; race not reported. Pain baseline 28.4 (±8.2)

Tramadol 300 mg/d: N=105; mean age 60 years; 65.7% female; race not reported. Pain baseline 31.4 (±8.7)

Interventions

Study medication: Tramadol fixed 100 mg/d or 200 mg/d or 300 mg/d

Rescue medication: Rescue medication of pain due to OA was not permitted.

Allowed co-therapies: No information provided

Outcomes

Pain: Relative percentage of pain improvement in WOMAC pain score from baseline

Responder: No 50% pain reduction rates reported

PGIC: Not reported

Function: WOMAC OA physical functioning score; no detailed outcomes reported *

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Notes

* The mean improvement in the WOMAC physical function score from baseline to end of the study was 46% for 300 mg/d (p=0.02 vs. placebo), 45% for 200 mg/d (p=0.045 vs. placebo), 48% for 100 mg/d (p=0.03 vs. placebo) and 27% for placebo

Fleischmann 2000

Methods

Disease: Osteoarthritis pain

Study setting: 12 sites in US

Study design: Parallel

Study duration: 10 days screening and wash-out, 1 week titration and 11 weeks maintenance

Participants

Inclusion criteria: Patients aged 35 to 75 years with symptomatic (painful) OA of the knee for >=1 year were eligible for inclusion if they had used NSAIDs for >=3 months before study entry and were otherwise in good health. The diagnosis of OA was confirmed by demonstration of osteophytes on knee radiographs taken within a year before enrollment. Patients were required to have at least moderate pain (pain intensity >=2 on a scale of 0 to 4, with 0 being the least and 4 being the greatest pain intensity) in the target knee when their current analgesic was discontinued.

Exclusion criteria: any other form of arthritis; major trauma, infection, or apparent avascular necrosis of the target knee within 6 months before study entry; or anatomical deformities of the knee that could interfere with assessment. In addition, patients were excluded if they underwent arthroscopic procedures within 6 months or surgical procedures on the target knee within a year before the study, or had knee replacements or were candidates for knee replacement within 1 year before the study. Patients who received intra-articular injections of corticosteroids in the knee within 1 month, hyaluronic acid injections in the knee or systemic corticosteroids within 3 months, or glucosamine within 10 days before the study were excluded. Also excluded were patients who, in the opinion of the investigator, should not have been enrolled in the study based on the precautions, warnings, or contraindications outlined in the tramadol package insert.

Placebo: N=66; mean age 62.5 years; 59.1% female; 86.4% white. Pain baseline 2.85 (± 0.63)

Tramadol: N=63; mean age 62.5 years; 65.1% female; 95.2% white. Pain baseline 2.71 (± 0.63)

Interventions

Study medication: Titration to individually optimal dosage between 200 and 400 mg/d (mean dosage not reported)

Rescue medication: None allowed

Allowed co-therapies: Patients were instructed to maintain a constant level of activity throughout the study. Physiotherapy (i.e., hot/cold packs and massages) initiated before the double-blind phase was continued throughout the study, although it could not be initiated during the double-blind phase. Patients were not to use other adjunctive therapy (e.g., topical therapy, acupuncture) during the study.

Outcomes

Pain: Pain intensity previous 24 hours NRS 0-4

Responder: No 50% pain reduction rates reported

PGIC: Not reported

Function: WOMAC OA function score

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Friedmann 2011

Methods

Disease: Osteoarthritis pain

Study setting: 62 sites in US

Study design: Enriched enrollment randomized withdrawal design

Study duration: 4-10 days wash-out of all medications (if >= 20 mg oxycodone or 200 mg tramadol, an opioid taper was required beforehand), 2 weeks open label, 12 weeks double-blind withdrawal with option of dose escalation in the first 4 weeks

Participants

Inclusion criteria: Men or non-pregnant women aged 40-75 years; moderate to severe pain for >=3 months owing to OA in the hip and/or knee as demonstrated by clinical and radiological evidence according to the ACR criteria; taking one or more of the following medications: NSAIDs, COX-2 inhibitors, opioids (>= 4 days/week for >= 4 weeks); NRS >= 5 after washout period

Exclusion criteria: Daily opioid dosage >=80 mg oxycodone equivalent for >=4 days/week during the week before initial screening; intra-articular injections in the previous month; positive urine screening for , amphetamines, , or methadone at baseline visit

Placebo: N=207; mean age 58.5 years; 68% female; 83% white. Pain baseline 7.6 (±1.36)

Oxycodone: N=205; mean age 58.0 years; 72% female; 82% white. Pain baseline 7.8 (±1.35)

Interventions

Study medication: Titration to individually optimal dosage up to 80 mg/d oxycodone (mean dosage 45 mg/d)

Rescue medication: Acetaminophen up to 3g/d

Allowed co-therapies: Stable dose of antidepressants, , , chondroitin

Outcomes

Pain: Change in average pain intensity NRS 0-10 recorded via touch-phone

Responder: No 50% pain reduction rates reported

PGIC: Not reported

Function: WOMAC OA Index; no detailed outcomes reported *

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Reported

Notes

* The mean change in the WOMAC subscales from prerandomisation to end of the study was only statistically significant for the pain subscale.

Gana 2006

Methods

Disease: Osteoarthritis pain

Study setting: 46 sites in US

Study design: Parallel

Study duration: 2-7 days wash-out, 12 weeks maintenance, 1 week follow-up

Participants

Inclusion criteria: Aged 18-74 years with symptomatic (painful), radiographically confirmed OA of the knee or hip, ACR functional class I-III. Had taken acetaminophen, an NSAID, a COX-2 inhibitor or an opioid on at least 75 of the 90 previous days to treat OA pain; baseline index joint pain of >= 40 on a 100 mm pain scale after wash-out

Exclusion criteria: Any medical condition other than OA that was not well controlled; any other form of arthritis or joint disease at the index joint; a chronic pain syndrome or fibromyalgia; any contraindication for the use of tramadol; history of substance abuse in the previous 6 months; any condition that was likely to influence the absorption, efficacy, or safety of tramadol ER. Subjects were not permitted to take another investigational medication, a corticosteroid, a medication that could interact with tramadol (e.g., carbamazepine), or another medication for pain (e.g., analgesics, antidepressants) during the study; use of antidepressants, anticonvulsants or other analgesics except acetaminophen

Placebo: N=205; mean age 56.4 (±9.8) years; 68.8% female; 81.5% white. Pain baseline 305.9 (± 95.2)

Tramadol ER 100 mg/d: N=202; mean age 58.4 (±10.9) years; 62.4% female; 72.3% white. Pain baseline 308.2 (±99.3)

Tramadol ER 200 mg/d: N=201; mean age 59.1 (±9.9) years; 63.7% female; 76.1% white. Pain baseline 315.2 (±92.4)

Tramadol ER 300 mg/d: N=201; mean age 58.5 (±9.4) years; 59.2% female; 81.6% white. Pain baseline 296.6 (±96.3)

Tramadol ER 400 mg/d: N=202; mean age 58.4 (±9.7) years; 57.9% female; 79.7% white. Pain baseline 298.0 (±93.7)

Interventions

Study medication: Fixed dosage of 100, 200, 300 or 400 mg tramadol/d; placebo

Rescue medication: acetaminophen up to 2g/day for no more than 3 consecutive days for reasons other than OA or chronic pain. Use of acetaminophen was prohibited 48 h before each study visit.

Allowed co-therapies: Subjects were not permitted to take another investigational medication, a corticosteroid, a medication that could interact with tramadol (e.g., carbamazepine), or another medication for pain (e.g., analgesics, antidepressants) during the study, except acetaminophen

Outcomes

Pain: WOMAC Osteoarthritis Pain Index 0-500

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC OA function score 0-1700

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Katz 2010

Methods

Disease: Osteoarthritis pain

Study setting: Number of US sites not reported

Study design: Enriched-enrollment randomized withdrawal

Study duration: 7-14 days wash-out, <= 45 days titration, 12 weeks maintenance

Participants

Inclusion criteria: Male and female subjects aged ≥21 years, OA (as defined by the American College of Rheumatology) of the hip or knee. Moderate-to-severe OA pain was defined as an average 24-hour pain intensity score of ≥5 on a scale of 0–10 at baseline visit following cessation of previous medication. Subjects must have suffered from chronic OA pain in the target joint for more than 3 months, and their pain must not have been adequately controlled with either nonopioid analgesics, tramadol or another opioid at a dose equivalent of <= 40 mg/d morphine for 3 months before beginning the study.

Exclusion criteria: History of drug or alcohol abuse within the last 5 years; positive urine toxicology test for illicit drugs or non prescribed controlled substances at screening; established history or uncontrolled major depressive disorder; any other chronic condition that would interfere with or confound the study results; history of rheumatoid or inflammatory rheumatoid arthritis

Morphine/Naloxone: N=171; mean age 54.2 years; 62% female; 74.9% white. Pain baseline 3.3 (± 1.3)

Placebo: N=73; mean age 54.7 years; 54.9% female; 69.9% white. Pain baseline 3.2 (± 1.1)

Interventions

Study medication: Dose titration of morphine, starting with 20 mg/d, with increments up to 160 mg/d. Responders were defined by a BPI score <= 4 over the last 4 days before the clinic visit and a decline of >= 2 points from baseline

Rescue medication: Acetaminophen <= 3g/d.

Allowed co-therapies: No information provided.

Outcomes

Pain: Average pain intensity NRS 0-10 of Brief Pain Inventory electronic diary

Responder: 50% pain reduction from the titration baseline visit

PGIC: Not assessed

Function: WOMAC Functional Impairment subscale

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Langford 2006

Methods

Disease: Osteoarthritis pain

Study setting: Several European countries

Study design: Parallel design

Study duration: 6 weeks maintenance

Participants

Inclusion criteria: Patients (at least 40 years of age) meeting the American College of Rheumatology diagnostic criteria for hip or knee OA and requiring joint replacement surgery, with radiographic evidence of disease in the affected joint(s); moderate or severe pain that was not adequately controlled with weak opioids, with or without paracetamol

Exclusion criteria: Patients who received any strong opioid in the 4 weeks before the study or had recently started a new therapy (e.g., physiotherapy or acupuncture) for their pain. Those patients deemed unsuitable for treatment with a strong opioid (e.g., because of suspected alcohol or drug abuse, or because they were considered at risk for respiratory depression)

Placebo: N=197; mean age 66 years; 68% female; race not reported. Pain baseline 73.3 (no SD reported)

Fentanyl: N=202; mean age 66 years; 65% female; race not reported. Pain baseline 73.1 (no SD reported)

Interventions

Study medication: Stable dosage of steroids or NSAIDs plus titration to individually optimal dosage of fentanyl 25, 50, 75 or 100 µg/h (no average dosage reported); stable dosage of steroids or NSAIDs plus placebo

Rescue medication: None

Allowed co-therapies: Paracetamol up to 4 g/d allowed

Outcomes

Pain: Average pain intensity score NRS 0-100

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC physical functioning

Withdrawal due to adverse events: Not reported

Serious adverse events: Reported

Death: Reported

Markenson 2005

Methods

Disease: Osteoarthritis pain independent of location (~40% back pain)

Study setting: 9 sites in US

Study design: Parallel design

Study duration: 12 weeks titration and maintenance

Participants

Inclusion criteria: OA, as defined by the American College of Rheumatology guidelines. Patients selected were experiencing moderate to severe pain in the most affected joint or region, as characterized by: 1) complaints of pain for at least 1 month before day 0 (baseline) or after the patient had discontinued their as necessary opioid; and 2) pain during the week before day 0 that was moderate to severe, defined as an average score of 5 or greater (3 or greater if receiving as necessary opioids) on a scale from 0 (no pain) to 10 (pain as bad as you can imagine). Eligible patients: 1) had been taking NSAIDs or APAP at a therapeutic and/or tolerated (but not as necessary) dose for at least 2 weeks before day 0; 2) were not taking NSAIDs because they were NSAID-intolerant or at high risk for toxicity or complications; or 3) were receiving as necessary oral opioid therapy that was equivalent to 60 mg of oxycodone per day (with or without NSAIDs or APAP for analgesia).

Exclusion criteria: allergic to opioids, were scheduled to have surgery during the study period, had unstable coexisting disease or active dysfunction, had active cancer, were pregnant or nursing, had a past or present history of substance abuse, were involved in litigation related to their pain, or had received intra-articular or intramuscular steroid injections involving the joint or site under evaluation within 6 weeks prior to baseline.

Oxycodone: N=56; mean age 62 years; 68% female; 93% white. Pain baseline 6.9 (± 1.5)

Placebo: N=51; mean age 64 years; 78% female; 94% white. Pain baseline 6.3 (± 1.4)

Interventions

Study medication: Upward titration up to 120 mg oxycodone/d (average dosage 44 ± 5 mg)

Rescue medication: No information provided

Allowed co-therapies: Patients were permitted to continue their stable NSAID (or APAP) regimen during the study; the dose could be decreased but could not be increased

Outcomes

Pain: Average daily pain intensity during the past 24 h NRS 0-10 BPI

Responder: Not assessed

PGIC: Not assessed

Function: WOMAC physical functioning score

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Matsumoto 2005

Methods

Disease: Osteoarthritis pain

Study setting: Number of US sites not reported

Study design: Parallel

Study duration: 2-7 days wash-out, 4 weeks maintenance

Participants

Inclusion criteria: Presence of typical knee or hip joint symptoms and signs and radiographic evidence of OA, with a minimum of grade 2 in the index joint using the Kellgren–Lawrence scale. Patients must have taken either acetaminophen, a conventional NSAID, a COX-2 inhibitor, or an opioid analgesic for at least 75 of 90 days before the screening visit and must have had a suboptimal response to these agents. Other inclusion criteria included age >40 years, use of a medically acceptable form of contraception or abstinence in women of childbearing potential, and a negative pregnancy test 7 days before first dose of study medication. Eligible patients entered a 2- to 7-day washout period during which all analgesic medications were discontinued. Patients were randomized when pain in the index joint reached 40

Exclusion criteria: Patients with inflammatory arthritis, gout, Paget’s disease, chronic pain syndrome, or fibromyalgia were excluded. Patients requiring knee or hip arthroplasty within 2 months of screening or anticipating any need for surgical procedures on the index joint during the study were also excluded. Other exclusion criteria included weight <100 pounds, difficulty swallowing capsules or tablets, prior history of substance or alcohol abuse, corticosteroid or investigational drug use within 1 month of first study treatment, and prior history of intolerance to opioids.

Oxymorphone 80 mg/d: N=121; mean age 61.4 years; 64.5% female; 87.6% white. Pain baseline not reported *

Oxymorphone 40 mg/d: N=119; mean age 63.4 years; 55.5% female; 81.5% white. Pain baseline not reported *

Oxycodone 40 mg/d: N=125; mean age 62.7 years; 57.6% female; 89.6% white. Pain baseline not reported *

Placebo: N=124; mean age 61.7 years; 65.3% female; 86.3% white. Pain baseline not reported *

Interventions

Study medication: To improve tolerability, patients randomized to the oxymorphone ER 40 mg treatment group received oxymorphone ER 20 mg every 12 hours during weeks 1 and 2 and oxymorphone ER 40 mg every 12 hours during weeks 3 and 4. Similarly, patients randomized to oxycodone CR 20 mg received oxycodone CR 10 mg every 12 hours during weeks 1 and 2 and oxycodone CR 20 mg every 12 hours during weeks 3 and 4.

Rescue medication: No information provided.

Allowed co-therapies: No information provided.

Outcomes

Pain: WOMAC Pain score VAS 0-500 **

Responder: Not assessed

PGIC: Not assessed

Function: WOMAC Functional Impairment subscale * and SF-36 physical component score ***

Withdrawal due to adverse events: Reported

Serious adverse events: Incompletely reported ****

Death: Not explicitly stated

Notes

* "The mean baseline scores for the VAS were similar across all treatment groups"

** Data extracted from figures

*** Data of WOMAC only reported in figures; data of SF-36 reported by numbers and therefore chosen for meta-analysis

**** "Five patients had serious AEs that were not considered to be related to study medication."

Munera 2010

Methods

Disease: Osteoarthritis pain

Study setting: 22 sites in US

Study design: Parallel design

Study duration: 3 weeks titration, 1 week maintenance

Participants

Inclusion criteria: Age >= 18 years; documented history and/or radiologic evidence of chronic osteoarthritis of the hip or knee; receiving opioid therapy for osteoarthritis-related pain within the past year or having experienced pain that was inadequately controlled with a full standard dose of NSAIDs; average pain intensity >= 7

Exclusion criteria: Receiving opioids at an average daily dose of greater than 90 mg of oral morphine equivalents or receiving more than 12 tablets or capsules per day of short-acting opioid-containing products; scheduled to have surgery (including dental) during the study period that involved the use of pre- and/or postoperative analgesics or anesthetics.

Placebo: N=163; mean age 62 years; 67% female; 87% white. Pain baseline not reported

Buprenorphine 5, 10 or 20 µg/h: N=152; mean age 60 years; 67% female; 87% white. Pain baseline not reported

Interventions

Study medication: Titration to individually optimal dosage of buprenorphine 5, 10 or 20 µg/h (no average dosage reported); placebo

Rescue medication: None

Allowed co-therapies: Aspirin as antithrombotic >=325 mg/d allowed

Outcomes

Pain: Average pain intensity score NRS 0-10

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: Not assessed

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Reported

Peloso 2000

Methods

Disease: Osteoarthritis pain

Study setting: 4 sites in Canada

Study design: Parallel design

Study duration: 4 weeks maintenance

Participants

Inclusion criteria: Age >= 35 years; primary osteoarthritis grade II defined by standard atlas of radiographs of the knee or hip with symptoms (pain, stiffness, disability) requiring the use of acetaminophen, NSAIDs or opioid analgesics for the previous 3 months or longer.

Exclusion criteria: Allergy to analgesics; history of previous opioid abuse; secondary osteoarthritis; grade IV osteoarthritis awaiting surgery

Placebo: N=52; mean age 63.0 years; 65.7% female; race not reported. Pain baseline 53.2 (±24.5)

Codeine: N=51; mean age 60.1 years; 58.1% female; race not reported. Pain baseline 58.2 (±18.9)

Interventions

Study medication: Titration to individually optimal dosage of codeine 100-400 mg/d (mean daily dosage 159 ± 52 mg/d); placebo

Rescue medication: Acetaminophen

Allowed co-therapies: No information provided

Outcomes

Pain: Average pain intensity 24 h score VAS 0-100

Responder: Not assessed

PGIC: Not assessed

Function: WOMAC physical function (VAS 0-1700)

Withdrawal due to adverse events: Reported

Serious adverse events: Not reported

Death: Reported

Notes

Rauck 2013

Methods

Disease: Osteoarthritis pain

Study setting: Number of study sites in US not reported

Study design: Parallel design

Study duration: <= 16 days titration, 12 weeks maintenance, <= 1 week taper

Participants

Inclusion criteria: Male or female patients aged >= 21 years with OA of the hip or knee, reporting a target joint pain score of ≥ 5 on the NRS, who were unable to consistently control or treat their pain with nonopioid medications or who had received an opioid for pain treatment. Eligible patients were considered to be in good general health at the time of screening, based on results from medical history, physical examination, laboratory profile, and 12-lead electrocardiogram. Patients were required to have a primary diagnosis of Functional Class I–III OA of the knee or hip. In the case of OA of the knee, primary OA was characterized by knee pain, radiographic severity Grade II–IV, radiographic evidence (< 12 months) of target joint osteophytes, and at least one of the following symptoms: > 50 years of age, morning stiffness < 30 minutes in duration, or crepitus. In the case of OA of the hip, primary OA was characterized by articular hip pain, radiographic severity Grade II–IV, radiographic evidence (< 12 months) of target joint osteophytes, and target joint space narrowing.

Exclusion criteria: Patients with clinically significant intolerance to hydromorphone or other opioids were excluded from the study. In addition, those who had severe asthma; were pregnant or breastfeeding; were being treated with monoamine oxidase inhibitors; or had a history of drug or analgesic abuse, dependence, or misuse, or alcohol abuse within the last 5 years were not eligible. Patients were also excluded if they had a chronic pain syndrome that could interfere with the study’s assessment of pain or other symptoms of OA (e.g., fibromyalgia), a documented history of uncontrolled inflammatory arthritis (e.g., rheumatoid arthritis), inflammatory arthritis dependent on nonsteroidal anti-inflammatory drugs (NSAIDs), significant clinical abnormalities in laboratory analyses, including hematology and urinalysis, or other conditions that might interfere with dose administration. This study also excluded patients who were unable to wash out other opioids before the start of the study or who participated in a previous controlled-release hydromorphone HCl study

Hydromorphone 8 mg: N=319; mean age 59.7 years; 64.6% female; 88.3% white. Pain baseline 7.2 (±1.5)

Hydromorphone 16 mg: N=330; mean age 59.5 years; 64.2% female; 86.1% white. Pain baseline 7.5 (±1.4)

Placebo: N=332; mean age 60 years; 63% female; 88.3% white. Pain baseline 7.4 (±1.4)

Interventions

Study medication: Hydromorphone 8 mg, hydromorphone 16 mg, placebo

Rescue medication: Acetaminophen (< 2,000 mg daily) was permitted as supplemental analgesia during the titration, maintenance, and taper phases.

Allowed co-therapies: No information provided.

Outcomes

Pain: Mean change from average pain intensity baseline NRS 0-10

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC Functional Impairment subscale

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Notes

This study did not meet the primary endpoint of improvement in pain intensity as analyzed with the primary imputation method, baseline observation carried forward (BOCF).
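To make the imputation methods named in these tables concrete, the following minimal sketch contrasts baseline observation carried forward (BOCF) with last observation carried forward (LOCF) for a single participant who drops out. The pain scores, the function names, and the use of `None` for missed visits are illustrative assumptions; they are not data or methods taken from Rauck 2013 or any other included study.

```python
from typing import List, Optional


def locf_endpoint(scores: List[Optional[float]]) -> float:
    """Return the last observed score (last observation carried forward)."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores")
    return observed[-1]


def bocf_endpoint(scores: List[Optional[float]]) -> float:
    """Return the final score if observed, else the baseline score (BOCF)."""
    if scores[0] is None:
        raise ValueError("baseline score is required")
    return scores[-1] if scores[-1] is not None else scores[0]


# Hypothetical NRS 0-10 scores: baseline, week 4, week 8, week 12.
# The participant improves, then drops out before weeks 8 and 12.
dropout = [7.0, 4.0, None, None]

locf_change = locf_endpoint(dropout) - dropout[0]  # 4.0 - 7.0
bocf_change = bocf_endpoint(dropout) - dropout[0]  # 7.0 - 7.0
print(locf_change, bocf_change)
```

Under these assumptions LOCF credits the dropout with the improvement seen at the last attended visit, whereas BOCF counts the dropout as having no improvement. This is why a trial with many discontinuations can meet its endpoint under LOCF but not under BOCF.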

Thorne 2008

Methods

Disease: Osteoarthritis pain

Study setting: 6 sites in Canada

Study design: Cross over

Study duration: (1) Analgesic wash-out for 2-7 days (up to 1 week); (2) two treatment periods of 4 weeks each. After four weeks, patients crossed over to the alternate treatment for another four weeks.

Participants

Inclusion criteria: Men and nonpregnant, non-nursing women over the age of 18 years, diagnosed with OA and requiring the use of acetaminophen, anti-inflammatory agents or combination opioid and nonopioid analgesics for at least three months were eligible for the present study. OA was defined by the presence of hip and/or knee symptoms (pain, stiffness, disability) and signs (bony crepitus), as well as radiographic evidence of OA in the medial and/or lateral tibiofemoral compartment (with or without patellofemoral OA), or in the hip. Radiographic evidence was defined by the presence of at least one of the following: osteophytes, joint space narrowing, periarticular sclerosis or subchondral cysts, with a minimum grade 2 severity, as illustrated in the Atlas of Standard Radiographs of Arthritis. Patients with more advanced grades were eligible if they were not awaiting surgery. Patients using only acetaminophen at the time of enrollment were required to have pain of at least moderate intensity (a 2 or greater on a 0 to 4 ordinal pain scale) at both visits 1 and 2. Patients treated with any other opioid or nonopioid analgesic were required to have at least moderate pain (a 2 or greater on a 0 to 4 ordinal pain scale) after a two to seven day washout period at visit 2.

Exclusion criteria: Intolerance to any opioid, tramadol or acetaminophen. Patients who required more than eight tablets per day of acetaminophen plus codeine, or its analgesic equivalent, or with a history of drug or alcohol abuse were also excluded. The following medical conditions were exclusionary: any other form of joint disease or previous replacement of the study joint, renal or hepatic impairment (alanine aminotransferase or aspartate aminotransferase more than two times the upper limit of the normal range), shortened gastrointestinal transit time, peptic ulcer disease, inflammatory disease of the gastrointestinal tract, cardiac or respiratory conditions that put the patient at risk for respiratory depression, a history of seizures or a recognized risk for seizure, and any other condition that would adversely affect the patient’s safety or obscure the assessment of efficacy. Patients receiving monoamine oxidase inhibitors, carbamazepine, , selective serotonin reuptake inhibitors or tricyclic antidepressants, , , neuroleptics, warfarin or digoxin were excluded. Patients who received an investigational drug within the last month were also ineligible.

Total sample: N=100; mean age 61 years; 55% female; race not reported. Pain baseline 50.8 (±17.9)

Interventions

Study medication: Tramadol, flexible dosage 100-400 mg/d (average dosage 340 mg/d); placebo

Rescue medication: Acetaminophen up to 2.6 g/d

Allowed co-therapies: No information provided.

Outcomes

Pain: Average pain intensity last 24 hours VAS 0-100

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC physical function subscale

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Notes

"An analysis of carry-over effect found no significance"

Vojtassak 2011

Methods

Disease: Osteoarthritis pain

Study setting: 18 sites in four European countries (Czech Republic, Romania, Slovakia, UK)

Study design: Parallel design

Study duration: 4 weeks titration, 12 weeks maintenance

Participants

Inclusion criteria: Male and female subjects aged ≥40 years, with moderate-to-severe pain induced by OA (as defined by the American College of Rheumatology) of the hip or knee. Moderate-to-severe OA pain was defined as a mean weekly score of ≥5 on a scale of 0–10 for “pain on average” on the BPI scale, which was calculated as a mean of the pain assessments collected at screening visit (week −1), telephone call (week −0.5), and baseline visit (week 0). Subjects must have suffered from chronic OA pain in the target joint for more than 3 months, and their pain must not have been adequately controlled with daily analgesic (NSAIDs or paracetamol) treatment for the month before beginning the study.

Exclusion criteria: Regular treatment with an opioid in the 4 weeks before the screening visit (infrequent use of tramadol, codeine, tilidine, or for no more than 10 days in the 4 weeks before the screening visit was acceptable, but subjects were to stop any use of weak opioids at the screening visit); another type of continuous pain that stood out in comparison with OA pain, such as fibromyalgia, cervical radiculopathy, or chronic low back pain; any of the following in the 6 months before entering the study: major trauma to target joints, infection in target joints, radiologically apparent avascular necrosis in target joints, hyaluronan injections in the target joints; arthrodesis in the year or arthroscopy in the 2 months before entering the study; planned treatment that could have altered the degree of pain within the study period; treatment with buprenorphine, nalbuphine, or pentazocine; corticosteroid injections in the 3 months before the start of the study.

Hydromorphone: N=139; mean age 65 years; 77% female; 100% white. Pain baseline 6.6 (±1.04)

Placebo: N=149; mean age 66 years; 68% female; 100% white. Pain baseline 6.5 (±0.94)

Interventions

Study medication: Hydromorphone. Treatment comprised a 4-week titration phase and a 12-week maintenance phase. In the event of unsatisfactory pain control, subjects had their dose titrated 3-4 days after randomisation until week 4 of the study with intervals of at least 3-4 days between dose increments. Possible doses were 4 mg, 8 mg, 12 mg, 16 mg, 24 mg, and a maximum daily dose of 32 mg. There followed a 12-week maintenance phase on as stable a dose as possible. If a dose of 32 mg did not provide sufficient analgesia, subjects were withdrawn owing to lack of efficacy and had their dose tapered off by reducing their dose in specified increments every 2 days.

Rescue medication: No information provided.

Allowed co-therapies: No information provided.

Outcomes

Pain: Average pain intensity NRS 0-10 of Brief Pain Inventory

Responder: No 50% pain reduction rates reported

PGIC: Not assessed

Function: WOMAC Functional Impairment subscale

Withdrawal due to adverse events: Reported

Serious adverse events: Reported

Death: Not explicitly stated

Table 4: Risk of bias assessment of RCTs with opioids in chronic osteoarthritis pain included in the review

Afilalo 2010

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Low risk. Computer-generated randomization list

Allocation concealment (selection bias): Low risk. Randomization was implemented by interactive voice response system

Blinding of participants and personnel (performance bias): Unclear risk. We had insufficient information to permit judgement.

Blinding of outcome assessment (detection bias): Unclear risk. No details provided. Outcome assessors could be biased by the side effects profile of tapentadol.

Incomplete outcome data (attrition bias): Unclear risk. ITT-analysis according to LOCF

Selective reporting (reporting bias): Low risk. NCT00421928; all outcomes as reported in the protocol were published

Selection bias: Low risk. No significant baseline differences in demographic and clinical variables between the three groups

Funding bias: High risk. Funding by pharmaceutical industry; 8 of 9 authors affiliated with industry

Afilalo 2013

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Unclear risk. "Study medication assigned to patients by chance"

Allocation concealment (selection bias): Unclear risk. "Study medication assigned to patients by chance"

Blinding of participants and personnel (performance bias): Unclear risk. "Neither patient nor investigator knows which patient gets which study medication"

Blinding of outcome assessment (detection bias): Unclear risk. We had insufficient information to permit judgement. Outcome assessors could be biased by the side effects profile of tapentadol.

Incomplete outcome data (attrition bias): Unclear risk. ITT-analysis according to LOCF

Selective reporting (reporting bias): High risk. NCT00486811; drop-out rates due to serious adverse events for placebo group not reported; serious adverse events not reported

Selection bias: Unclear risk. Demographic and clinical data of study samples were not reported separately

Funding bias: High risk. The two authors were clinical investigators of the study. Study sponsored by pharmaceutical company.

Babul 2004

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Low risk. "A list of randomization numbers based on a computer-generated randomization schedule was prepared"

Allocation concealment (selection bias): Unclear risk. We had insufficient information to permit judgement.

Blinding of participants and personnel (performance bias): Low risk. "identical appearing placebo"

Blinding of outcome assessment (detection bias): Unclear risk. We had insufficient information to permit judgement. Outcome assessors could be biased by the side effects of tramadol

Incomplete outcome data (attrition bias): Unclear risk. ITT, method not reported

Selective reporting (reporting bias): High risk. No protocol reported by the authors; SAE not reported

Selection bias: Low risk. No significant baseline differences in demographic and clinical variables between the three groups

Funding bias: High risk. No information on study sponsoring provided. The study was managed by SCIREX Corporation, Horsham, PA, a commercial scientific research service.

Breivik 2010

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Low risk. "Validated computer system with randomised numbers"

Allocation concealment (selection bias): Low risk. "All patients, investigators, and study centre and Sponsor personnel were blinded to the medication codes."

Blinding of participants and personnel (performance bias): Low risk. "They were identical in appearance, packed in a labelled foil pouch, containing coded treatment group identification"

Blinding of outcome assessment (detection bias): Unclear risk. No details provided. Outcome assessors could be biased by the side effects profile of buprenorphine.

Incomplete outcome data (attrition bias): Unclear risk. ITT-analysis, method not reported

Selective reporting (reporting bias): Unclear risk. No protocol reported by the authors

Selection bias: Low risk. No significant baseline differences in demographic and clinical variables between the groups

Funding bias: High risk. One of six authors (senior author) affiliated with pharmaceutical company

Caldwell 1999

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Low risk. "Centralized randomization code provided by the sponsor"

Allocation concealment (selection bias): Low risk. "Blocks of study medication blister packs were assigned to study centers in sequential ascending order"

Blinding of participants and personnel (performance bias): Low risk. "The double dummy technique was used to blind the study medications for differences in appearance and dosing frequency"

Blinding of outcome assessment (detection bias): Unclear risk. No details provided. Outcome assessors could be biased by the side effects profile of oxycodone.

Incomplete outcome data (attrition bias): Unclear risk. ITT-analysis, LOCF

Selective reporting (reporting bias): High risk. No protocol reported by the authors; SAE not reported

Selection bias: Unclear risk. No significant baseline differences in demographic variables between the three groups. Pain baseline not reported.

Funding bias: High risk. Funded by pharmaceutical industry; one author (senior author) affiliated with drug manufacturer

Caldwell 2002

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Unclear risk. We had insufficient information to permit judgement.

Allocation concealment (selection bias): Unclear risk. We had insufficient information to permit judgement.

Blinding of participants and personnel (performance bias): Low risk. "Placebo Avinza and placebo MSC matched the appearance of the respective active treatments"

Blinding of outcome assessment (detection bias): Unclear risk. No details reported. Outcome assessors may have been unblinded by the side effects of opioids

Incomplete outcome data (attrition bias): Unclear risk. ITT-analysis, no details reported

Selective reporting (reporting bias): High risk. No protocol reported by the authors; serious adverse events insufficiently reported

Selection bias: Low risk. No significant baseline differences in demographic and clinical variables between the three groups

Funding bias: High risk. The affiliation of 4 of 9 study authors was the pharmaceutical company which sponsored the study

DeLemos 2011

Bias: Authors' judgement. Support for judgement

Random sequence generation (selection bias): Low risk. "a randomization schedule was generated"

Allocation concealment (selection bias): Low risk. "Interactive voice-response system to assign randomization numbers to subjects"

Blinding of participants and personnel (performance bias): Low risk. "Tablets were similar in appearance and size"

Blinding of outcome assessment (detection bias): Unclear risk. We had insufficient information to permit judgement. The outcome assessor could be unblinded by the side effects of tramadol.

Incomplete outcome data (attrition bias): Unclear risk. ITT-analysis, methods not reported

Selective reporting (reporting bias): Unclear risk. No protocol reported by authors

Selection bias: Low risk. No significant baseline differences in demographic and clinical variables between the groups

Funding bias: High risk. Study funded by pharmaceutical industry; 7 of 8 authors affiliated with drug manufacturer


Fishman 2007

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | "Centralized computer-generated randomization list"
Allocation concealment (selection bias) | Low risk | "Patients and personnel remained blinded to treatment assignments"
Blinding of participants and personnel (performance bias) | Low risk | Double dummy technique
Blinding of outcome assessment (detection bias) | Unclear risk | Outcome assessors could be unblinded by side effects of tramadol
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis, the primary method of imputation used for missing data was LOCF
Selective reporting (reporting bias) | High risk | No protocol reported by authors; data of outcome physical functioning not suited for meta-analysis
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the three groups
Funding bias | High risk | The affiliation of 3 of 10 study authors was the pharmaceutical company which sponsored the study


Fleischmann 2010

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | Study medications were randomly assigned by a computer to a numerical list for each site, and patients were enrolled sequentially using the list.
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "The tramadol 50-mg capsules were identical in appearance to the placebo capsules".
Blinding of outcome assessment (detection bias) | Unclear risk | We had insufficient information to permit judgement. The outcome assessor could be unblinded by the side effects of tramadol.
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis, methods not reported
Selective reporting (reporting bias) | Unclear risk | No protocol reported by authors
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the groups
Funding bias | High risk | Study funded by pharmaceutical industry; 2 of 6 authors affiliated with drug manufacturer


Friedmann 2011

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of outcome assessment (detection bias) | Unclear risk | We had insufficient information to permit judgement.
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis by LOCF
Selective reporting (reporting bias) | High risk | No means and SDs of secondary outcomes reported; outcomes of function could not be used for meta-analysis
Selection bias | Unclear risk | No significant baseline differences in demographic and clinical variables between the groups
Funding bias | High risk | Authors affiliated with commercial companies


Gana 2006

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | "a randomization schedule was generated"
Allocation concealment (selection bias) | Low risk | "Interactive voice-response system to assign randomization numbers to subjects"
Blinding of participants and personnel (performance bias) | Low risk | "Tablets were similar in appearance and size"
Blinding of outcome assessment (detection bias) | Unclear risk | We had insufficient information to permit judgement. The outcome assessor could be unblinded by the side effects of tramadol.
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis by LOCF
Selective reporting (reporting bias) | Unclear risk | No protocol reported by authors
Selection bias | High risk | Significant differences between the groups in pain baseline
Funding bias | High risk | Study funded by pharmaceutical industry; 7 of 8 authors affiliated with drug manufacturer


Katz 2010

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | "The outpatient site contacted the interactive Web Response System..."
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "Both drug and placebo were packaged so as to be blinded to..."
Blinding of outcome assessment (detection bias) | Unclear risk | No information provided. Outcomes assessors could be biased by the side effects profile of morphine.
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis of primary outcomes by BOCF and of secondary outcomes by LOCF
Selective reporting (reporting bias) | Low risk | The trial was registered at clinicaltrials.gov (NCT004200992); the primary and the secondary outcomes were consistent in the protocol compared with the publication
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the three groups
Funding bias | High risk | The study and the writing of the manuscript were sponsored by the manufacturer of the drug.


Langford 2006

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | "Randomization was performed using a computer generated list and stratified by target joint"
Allocation concealment (selection bias) | Low risk | "Participants were assigned consecutive treatment codes, and investigators were unaware of the treatment allocation"
Blinding of participants and personnel (performance bias) | Low risk | "TDF and placebo patches were identical."
Blinding of outcome assessment (detection bias) | Unclear risk | We had insufficient information to permit judgement. Outcomes assessors could be biased by the side effects profile of fentanyl.
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis according to LOCF method
Selective reporting (reporting bias) | High risk | No protocol available in clinicaltrials.gov; drop out rates due to adverse events not reported
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the three groups
Funding bias | High risk | The affiliation of 1 of 5 study authors (senior author) was the pharmaceutical company which sponsored the study


Markenson 2005

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | The computer-generated randomization code and study drug bottles labelled with randomization numbers were supplied by the sponsor.
Allocation concealment (selection bias) | Low risk | Study drug bottles labeled with randomization numbers were supplied by the sponsor
Blinding of participants and personnel (performance bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of outcome assessment (detection bias) | Unclear risk | We had insufficient information to permit judgement.
Incomplete outcome data (attrition bias) | Unclear risk | ITT by LOCF
Selective reporting (reporting bias) | Unclear risk | No protocol reported by the authors
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the three groups
Funding bias | High risk | The affiliation of 2 of 4 study authors (senior author) was the pharmaceutical company which sponsored the study


Matsumoto 2005

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | "The list of randomization numbers was based on a computer-generated randomization schedule."
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "Study enrollees, study personnel, and investigators were blinded to the identity of the treatments. The statisticians who analyzed the data remained blinded to the identity of the treatments until all data were entered into the database and the database was locked."
Blinding of outcome assessment (detection bias) | Low risk | "Study enrollees, study personnel, and investigators were blinded to the identity of the treatments. The statisticians who analyzed the data remained blinded to the identity of the treatments until all data were entered into the database and the database was locked."
Incomplete outcome data (attrition bias) | Unclear risk | ITT by LOCF
Selective reporting (reporting bias) | High risk | The authors did not provide a protocol; SAE incompletely reported *
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the groups
Funding bias | High risk | Study funded by manufacturer of the drug


Munera 2010

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "Patients received identical looking patches"
Blinding of outcome assessment (detection bias) | Unclear risk | No details provided. Outcomes assessors could be biased by the side effects profile of buprenorphine
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis, details not reported
Selective reporting (reporting bias) | Low risk | NCT00455520; the primary and the secondary outcomes were consistent in the protocol compared with the publication
Selection bias | Unclear risk | Pain baseline not reported
Funding bias | High risk | Funding by pharmaceutical industry; 3 of 4 authors affiliated with industry


Peloso 2000

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "Identical appearing placebo"
Blinding of outcome assessment (detection bias) | Unclear risk | No details provided. Outcomes assessors could be biased by the side effects profile of codeine.
Incomplete outcome data (attrition bias) | High risk | Completer analysis
Selective reporting (reporting bias) | High risk | No protocol reported by the authors. SAE not reported
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the groups
Funding bias | Low risk | No funding by pharmaceutical industry reported; no authors affiliated with industry


Rauck 2013

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "Identical appearing placebo"
Blinding of outcome assessment (detection bias) | Unclear risk | No details provided. Outcomes assessors could be biased by the side effects profile of codeine.
Incomplete outcome data (attrition bias) | High risk | Completer analysis
Selective reporting (reporting bias) | High risk | No protocol reported by the authors. SAE not reported
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the groups
Funding bias | Low risk | No funding by pharmaceutical industry reported; no authors affiliated with industry


Thorne 2008

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Allocation concealment (selection bias) | Unclear risk | We had insufficient information to permit judgement.
Blinding of participants and personnel (performance bias) | Low risk | "Matching placebo tablets"
Blinding of outcome assessment (detection bias) | Unclear risk | No information provided. Outcomes assessors could be biased by the side effects profile of tramadol
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis, method not reported
Selective reporting (reporting bias) |  | No study protocol reported by authors
Selection bias | Low risk | Cross over design
Funding bias | High risk | Funding by pharmaceutical industry; 4 of 9 authors affiliated with industry


Vojtassak 2011

Bias | Authors' judgement | Support for judgement
Random sequence generation (selection bias) | Low risk | "computer-generated randomisation schedule prepared by an independent statistician"
Allocation concealment (selection bias) | Unclear risk | "The investigator and the subject were blinded to treatment allocation". No further information provided
Blinding of participants and personnel (performance bias) | Unclear risk | The authors did not report the physical characteristics of the placebos.
Blinding of outcome assessment (detection bias) | Unclear risk | We had insufficient information to permit judgement. Outcomes assessors could be biased by the side effects profile of hydromorphone
Incomplete outcome data (attrition bias) | Unclear risk | ITT-analysis, no further information provided
Selective reporting (reporting bias) | Low risk | No protocol reported by the authors. By search in clinicaltrials.gov: NCT00980798. The primary and the secondary outcomes were consistent in the protocol compared with the publication
Selection bias | Low risk | No significant baseline differences in demographic and clinical variables between the three groups
Funding bias | High risk | Funding by pharmaceutical industry; 3 of 5 authors affiliated with industry


Evidence report – Forest Plots of standardised mean differences and risk differences of opioids compared to placebo on selected outcomes

Parallel or cross over design

Figure 1 (Electronic Supplementary Material): Effect estimates (standardised mean differences) of mean pain intensity reduction at end of treatment


Study or subgroup | Opioids: Mean | SD | Total | Placebo: Mean | SD | Total | Weight | Std. mean difference, IV, random [95% CI]

1.1.1 Buprenorphine
Breivik 2010 | -3.2 | 3.8 | 100 | -2.3 | 3.7 | 99 | 3.5% | -0.24 [-0.52, 0.04]
Munera 2010 | -1.84 | 2.68 | 149 | -1.4 | 2.67 | 162 | 4.9% | -0.16 [-0.39, 0.06]
Subtotal (95% CI) | | | 249 | | | 261 | 8.4% | -0.19 [-0.37, -0.02]
Heterogeneity: Tau² = 0.00; Chi² = 0.17, df = 1 (P = 0.68); I² = 0%
Test for overall effect: Z = 2.18 (P = 0.03)

1.1.2 Codeine
Peloso 2000 | 32.5 | 21.4 | 31 | 47.7 | 24.7 | 35 | 1.3% | -0.65 [-1.14, -0.15]
Subtotal (95% CI) | | | 31 | | | 35 | 1.3% | -0.65 [-1.14, -0.15]
Heterogeneity: Not applicable
Test for overall effect: Z = 2.55 (P = 0.01)

1.1.3 Fentanyl
Langford 2006 | -23.6 | 25.6 | 202 | -17.9 | 26.7 | 197 | 5.8% | -0.22 [-0.41, -0.02]
Subtotal (95% CI) | | | 202 | | | 197 | 5.8% | -0.22 [-0.41, -0.02]
Heterogeneity: Not applicable
Test for overall effect: Z = 2.17 (P = 0.03)

1.1.4 Hydromorphone
Rauck 2013 | -2.5 | 2.9 | 330 | -1.9 | 2.86 | 166 | 6.2% | -0.21 [-0.39, -0.02]
Rauck 2013 | -2 | 2.86 | 319 | -1.9 | 2.86 | 165 | 6.2% | -0.03 [-0.22, 0.15]
Vojtassak 2011 | -2.4 | 2.1 | 138 | -2.6 | 2.3 | 149 | 4.6% | 0.09 [-0.14, 0.32]
Subtotal (95% CI) | | | 787 | | | 480 | 17.0% | -0.06 [-0.23, 0.10]
Heterogeneity: Tau² = 0.01; Chi² = 4.05, df = 2 (P = 0.13); I² = 51%
Test for overall effect: Z = 0.72 (P = 0.47)

1.1.5 Morphine
Caldwell 2002 | -26.7 | 29.9 | 73 | -14.6 | 29.9 | 24 | 1.5% | -0.40 [-0.87, 0.06]
Caldwell 2002 | -23.1 | 29.9 | 76 | -14.6 | 29.9 | 25 | 1.5% | -0.28 [-0.74, 0.17]
Caldwell 2002 | -22.8 | 29.9 | 73 | -14.6 | 29.9 | 24 | 1.5% | -0.27 [-0.73, 0.19]
Subtotal (95% CI) | | | 222 | | | 73 | 4.4% | -0.32 [-0.58, -0.05]
Heterogeneity: Tau² = 0.00; Chi² = 0.19, df = 2 (P = 0.91); I² = 0%
Test for overall effect: Z = 2.34 (P = 0.02)

1.1.6 Oxycodone
Afilalo 2010 | -1.05 | 0.67 | 92 | -0.88 | 0.69 | 158 | 4.0% | -0.25 [-0.51, 0.01]
Markenson 2005 | -1.7 | 2.2 | 56 | -0.6 | 2.9 | 51 | 2.1% | -0.43 [-0.81, -0.04]
Matsumoto 2005 | -90 | 112 | 125 | -60 | 111 | 124 | 4.2% | -0.27 [-0.52, -0.02]
Subtotal (95% CI) | | | 273 | | | 333 | 10.2% | -0.29 [-0.45, -0.13]
Heterogeneity: Tau² = 0.00; Chi² = 0.62, df = 2 (P = 0.73); I² = 0%
Test for overall effect: Z = 3.48 (P = 0.0005)

1.1.7 Oxymorphone
Matsumoto 2005 | -118 | 110 | 121 | -60 | 111 | 62 | 2.9% | -0.52 [-0.83, -0.21]
Matsumoto 2005 | -104 | 109 | 121 | -60 | 111 | 62 | 3.0% | -0.40 [-0.71, -0.09]
Subtotal (95% CI) | | | 242 | | | 124 | 5.9% | -0.46 [-0.68, -0.24]
Heterogeneity: Tau² = 0.00; Chi² = 0.31, df = 1 (P = 0.58); I² = 0%
Test for overall effect: Z = 4.12 (P < 0.0001)

1.1.8 Tapentadol
Afilalo 2010 | -1.16 | 0.67 | 149 | -0.88 | 0.69 | 158 | 4.8% | -0.41 [-0.64, -0.18]
Afilalo 2010 | 0 | 0 | 0 | 0 | 0 | 0 | | Not estimable
Subtotal (95% CI) | | | 149 | | | 158 | 4.8% | -0.41 [-0.64, -0.18]
Heterogeneity: Not applicable
Test for overall effect: Z = 3.56 (P = 0.0004)

1.1.9 Tramadol
Babul 2004 | -29 | 76.2 | 124 | -18.7 | 73.9 | 122 | 4.1% | -0.14 [-0.39, 0.11]
DeLemos 2011 | -90.4 | 125.6 | 199 | -94.9 | 125.9 | 67 | 3.5% | 0.04 [-0.24, 0.31]
DeLemos 2011 | -82.5 | 125.6 | 201 | -94.9 | 125.9 | 67 | 3.5% | 0.10 [-0.18, 0.38]
DeLemos 2011 | -117.8 | 125.9 | 199 | -94.9 | 125.9 | 66 | 3.5% | -0.18 [-0.46, 0.10]
Fishman 2007 | -46 | 39.9 | 105 | -32.3 | 38.2 | 76 | 3.2% | -0.35 [-0.65, -0.05]
Fishman 2007 | -42.8 | 46.4 | 107 | -32.3 | 48.2 | 76 | 3.2% | -0.22 [-0.52, 0.07]
Fishman 2007 | -41.6 | 50.2 | 103 | -32.3 | 48.2 | 75 | 3.1% | -0.19 [-0.49, 0.11]
Fleischmann 2000 | 2.1 | 1.06 | 63 | 2.48 | 1.13 | 66 | 2.4% | -0.34 [-0.69, 0.00]
Gana 2006 | -111.5 | 123.3 | 201 | -74.2 | 121.7 | 51 | 3.0% | -0.30 [-0.61, 0.01]
Gana 2006 | -107.2 | 122.2 | 202 | -74.2 | 121.7 | 51 | 3.0% | -0.27 [-0.58, 0.04]
Gana 2006 | -103.9 | 123.3 | 201 | -74.2 | 121.7 | 51 | 3.0% | -0.24 [-0.55, 0.07]
Gana 2006 | -107.8 | 123.7 | 202 | -74.2 | 121.7 | 52 | 3.0% | -0.27 [-0.58, 0.03]
Thorne 2008 | 38.2 | 22.7 | 100 | 47.7 | 25.7 | 100 | 3.5% | -0.39 [-0.67, -0.11]
Subtotal (95% CI) | | | 2007 | | | 920 | 42.1% | -0.20 [-0.28, -0.12]
Heterogeneity: Tau² = 0.00; Chi² = 11.83, df = 12 (P = 0.46); I² = 0%
Test for overall effect: Z = 4.87 (P < 0.00001)

Total (95% CI) | | | 4162 | | | 2581 | 100.0% | -0.22 [-0.28, -0.17]
Heterogeneity: Tau² = 0.01; Chi² = 35.47, df = 28 (P = 0.16); I² = 21%
Test for overall effect: Z = 7.51 (P < 0.00001)
Test for subgroup differences: Chi² = 15.03, df = 8 (P = 0.06), I² = 46.8%
Plot axis: -1 to 1; values left of 0 favour opioids, values right of 0 favour placebo.

Figure 2 (Electronic Supplementary Material): Effect estimates (risk difference) of 50% pain reduction at end of treatment


Study or subgroup | Opioids: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

1.2.1 Oxycodone
Afilalo 2010 | 59 | 342 | 82 | 337 | 25.8% | -0.07 [-0.13, -0.01]
Afilalo 2013 | 73 | 331 | 91 | 337 | 25.0% | -0.05 [-0.11, 0.02]
Subtotal (95% CI) | | 673 | | 674 | 50.8% | -0.06 [-0.11, -0.02]
Total events | 132 | | 173 | | |
Heterogeneity: Tau² = 0.00; Chi² = 0.22, df = 1 (P = 0.64); I² = 0%
Test for overall effect: Z = 2.68 (P = 0.007)

1.2.2 Tapentadol
Afilalo 2010 | 0 | 0 | 0 | 0 | | Not estimable
Afilalo 2010 | 110 | 344 | 82 | 337 | 24.6% | 0.08 [0.01, 0.14]
Afilalo 2013 | 0 | 0 | 0 | 0 | | Not estimable
Afilalo 2013 | 99 | 344 | 91 | 337 | 24.6% | 0.02 [-0.05, 0.09]
Subtotal (95% CI) | | 688 | | 674 | 49.2% | 0.05 [-0.01, 0.10]
Total events | 209 | | 173 | | |
Heterogeneity: Tau² = 0.00; Chi² = 1.46, df = 1 (P = 0.23); I² = 31%
Test for overall effect: Z = 1.61 (P = 0.11)

Total (95% CI) | | 1361 | | 1348 | 100.0% | -0.01 [-0.07, 0.06]
Total events | 341 | | 346 | | |
Heterogeneity: Tau² = 0.00; Chi² = 12.24, df = 3 (P = 0.007); I² = 75%
Test for overall effect: Z = 0.22 (P = 0.82)
Test for subgroup differences: Chi² = 8.48, df = 1 (P = 0.004), I² = 88.2%
Plot axis: -0.5 to 0.5; labels: Placebo (left), Opioids (right).


Figure 3 (Electronic Supplementary Material): Effect estimates (risk difference) of Patient Global Impression of Change (PGIC): reports to be much or very much improved at end of treatment

Study or subgroup | Opioids: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

1.3.1 Buprenorphine
Breivik 2010 | 36 | 100 | 19 | 99 | 16.8% | 0.17 [0.05, 0.29]
Subtotal (95% CI) | | 100 | | 99 | 16.8% | 0.17 [0.05, 0.29]
Total events | 36 | | 19 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 2.70 (P = 0.007)

1.3.2 Tapentadol
Afilalo 2010 | 0 | 0 | 0 | 0 | | Not estimable
Afilalo 2010 | 151 | 258 | 97 | 273 | 21.2% | 0.23 [0.15, 0.31]
Afilalo 2013 | 0 | 0 | 0 | 0 | | Not estimable
Afilalo 2013 | 139 | 248 | 127 | 294 | 21.0% | 0.13 [0.04, 0.21]
Subtotal (95% CI) | | 506 | | 567 | 42.2% | 0.18 [0.08, 0.28]
Total events | 290 | | 224 | | |
Heterogeneity: Tau² = 0.00; Chi² = 2.85, df = 1 (P = 0.09); I² = 65%
Test for overall effect: Z = 3.54 (P = 0.0004)

1.3.3 Oxycodone
Afilalo 2010 | 94 | 200 | 97 | 273 | 20.4% | 0.11 [0.03, 0.20]
Afilalo 2013 | 90 | 212 | 127 | 294 | 20.6% | -0.01 [-0.09, 0.08]
Subtotal (95% CI) | | 412 | | 567 | 41.0% | 0.05 [-0.07, 0.17]
Total events | 184 | | 224 | | |
Heterogeneity: Tau² = 0.01; Chi² = 3.66, df = 1 (P = 0.06); I² = 73%
Test for overall effect: Z = 0.87 (P = 0.38)

Total (95% CI) | | 1018 | | 1233 | 100.0% | 0.13 [0.05, 0.21]
Total events | 510 | | 467 | | |
Heterogeneity: Tau² = 0.01; Chi² = 15.55, df = 4 (P = 0.004); I² = 74%
Test for overall effect: Z = 3.05 (P = 0.002)
Test for subgroup differences: Chi² = 2.83, df = 2 (P = 0.24), I² = 29.3%
Plot axis: -0.5 to 0.5; labels: Placebo (left), Opioids (right).


Figure 4 (Electronic Supplementary Material): Effect estimates (standardised mean differences) of physical function improvement at end of treatment


Figure 5 (Electronic Supplementary Material): Effect estimates (risk difference) of dropping out due to lack of efficacy during study


Study or subgroup | Opioids: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

1.8.1 Buprenorphine
Breivik 2010 | 7 | 100 | 12 | 99 | 4.7% | -0.05 [-0.13, 0.03]
Subtotal (95% CI) | | 100 | | 99 | 4.7% | -0.05 [-0.13, 0.03]
Total events | 7 | | 12 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.23 (P = 0.22)

1.8.2 Codeine
Peloso 2000 | 1 | 51 | 5 | 52 | 4.5% | -0.08 [-0.17, 0.01]
Subtotal (95% CI) | | 51 | | 52 | 4.5% | -0.08 [-0.17, 0.01]
Total events | 1 | | 5 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.69 (P = 0.09)

1.8.3 Fentanyl
Langford 2006 | 15 | 202 | 64 | 197 | 4.9% | -0.25 [-0.33, -0.18]
Subtotal (95% CI) | | 202 | | 197 | 4.9% | -0.25 [-0.33, -0.18]
Total events | 15 | | 64 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 6.57 (P < 0.00001)

1.8.4 Hydromorphone
Rauck 2013 | 30 | 330 | 42 | 166 | 4.9% | -0.16 [-0.24, -0.09]
Rauck 2013 | 49 | 319 | 41 | 166 | 4.8% | -0.09 [-0.17, -0.02]
Vojtassak 2011 | 5 | 139 | 16 | 149 | 5.4% | -0.07 [-0.13, -0.01]
Subtotal (95% CI) | | 788 | | 481 | 15.2% | -0.11 [-0.16, -0.05]
Total events | 84 | | 99 | | |
Heterogeneity: Tau² = 0.00; Chi² = 3.69, df = 2 (P = 0.16); I² = 46%
Test for overall effect: Z = 3.85 (P = 0.0001)

1.8.5 Morphine
Caldwell 2002 | 9 | 73 | 5 | 24 | 2.3% | -0.09 [-0.26, 0.09]
Caldwell 2002 | 12 | 73 | 4 | 25 | 2.5% | 0.00 [-0.16, 0.17]
Caldwell 2002 | 8 | 76 | 5 | 25 | 2.4% | -0.09 [-0.27, 0.08]
Subtotal (95% CI) | | 222 | | 74 | 7.3% | -0.06 [-0.16, 0.04]
Total events | 29 | | 14 | | |
Heterogeneity: Tau² = 0.00; Chi² = 0.80, df = 2 (P = 0.67); I² = 0%
Test for overall effect: Z = 1.12 (P = 0.26)

1.8.6 Oxycodone
Afilalo 2010 | 7 | 332 | 56 | 337 | 5.8% | -0.15 [-0.19, -0.10]
Markenson 2005 | 9 | 56 | 34 | 51 | 2.6% | -0.51 [-0.67, -0.34]
Matsumoto 2005 | 9 | 121 | 34 | 124 | 4.4% | -0.20 [-0.29, -0.11]
Subtotal (95% CI) | | 509 | | 512 | 12.8% | -0.26 [-0.42, -0.11]
Total events | 25 | | 124 | | |
Heterogeneity: Tau² = 0.02; Chi² = 18.36, df = 2 (P = 0.0001); I² = 89%
Test for overall effect: Z = 3.29 (P = 0.0010)

1.8.7 Oxymorphone
Matsumoto 2005 | 5 | 121 | 17 | 62 | 3.7% | -0.23 [-0.35, -0.12]
Matsumoto 2005 | 13 | 125 | 17 | 62 | 3.5% | -0.17 [-0.29, -0.05]
Subtotal (95% CI) | | 246 | | 124 | 7.1% | -0.20 [-0.29, -0.12]
Total events | 18 | | 34 | | |
Heterogeneity: Tau² = 0.00; Chi² = 0.52, df = 1 (P = 0.47); I² = 0%
Test for overall effect: Z = 4.70 (P < 0.00001)

1.8.8 Tapentadol
Afilalo 2010 | 15 | 334 | 56 | 337 | 5.8% | -0.12 [-0.17, -0.08]
Subtotal (95% CI) | | 334 | | 337 | 5.8% | -0.12 [-0.17, -0.08]
Total events | 15 | | 56 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 5.22 (P < 0.00001)

1.8.9 Tramadol
Babul 2004 | 19 | 124 | 45 | 122 | 3.9% | -0.22 [-0.32, -0.11]
Fishman 2007 | 21 | 106 | 15 | 76 | 3.6% | 0.00 [-0.12, 0.12]
Fishman 2007 | 11 | 111 | 15 | 76 | 4.0% | -0.10 [-0.20, 0.01]
Fishman 2007 | 11 | 108 | 15 | 75 | 3.9% | -0.10 [-0.21, 0.01]
Fleischmann 2000 | 28 | 63 | 49 | 66 | 2.6% | -0.30 [-0.46, -0.14]
Gana 2006 | 31 | 203 | 11 | 51 | 3.5% | -0.06 [-0.19, 0.06]
Gana 2006 | 29 | 203 | 11 | 51 | 3.5% | -0.07 [-0.20, 0.05]
Gana 2006 | 18 | 204 | 12 | 51 | 3.5% | -0.15 [-0.27, -0.02]
Gana 2006 | 23 | 205 | 12 | 52 | 3.5% | -0.12 [-0.24, 0.00]
Thorne 2008 | 1 | 94 | 3 | 88 | 5.8% | -0.02 [-0.07, 0.02]
Subtotal (95% CI) | | 1421 | | 708 | 37.8% | -0.10 [-0.16, -0.05]
Total events | 192 | | 188 | | |
Heterogeneity: Tau² = 0.00; Chi² = 23.26, df = 9 (P = 0.006); I² = 61%
Test for overall effect: Z = 3.80 (P = 0.0001)

Total (95% CI) | | 3873 | | 2584 | 100.0% | -0.13 [-0.16, -0.10]
Total events | 386 | | 596 | | |
Heterogeneity: Tau² = 0.00; Chi² = 85.36, df = 24 (P < 0.00001); I² = 72%
Test for overall effect: Z = 7.49 (P < 0.00001)
Test for subgroup differences: Chi² = 24.37, df = 8 (P = 0.002), I² = 67.2%
Plot axis: -1 to 1; labels: Favours placebo (left), Favours opioid (right).


Figure 6 (Electronic Supplementary Material): Effect estimates (risk difference) of dropping out due to adverse events during study


Figure 7 (Electronic Supplementary Material): Effect estimates (risk difference) of serious adverse events during study

Study or subgroup | Opioids: Events | Total | Placebo: Events | Total | Weight | Risk difference, M-H, random [95% CI]

1.6.1 Buprenorphine
Breivik 2010 | 5 | 100 | 4 | 99 | 1.5% | 0.01 [-0.05, 0.07]
Munera 2010 | 0 | 152 | 2 | 163 | 10.8% | -0.01 [-0.03, 0.01]
Subtotal (95% CI) | | 252 | | 262 | 12.3% | -0.01 [-0.03, 0.01]
Total events | 5 | | 6 | | |
Heterogeneity: Tau² = 0.00; Chi² = 0.84, df = 1 (P = 0.36); I² = 0%
Test for overall effect: Z = 0.98 (P = 0.33)

1.6.2 Fentanyl
Langford 2006 | 6 | 202 | 2 | 197 | 6.4% | 0.02 [-0.01, 0.05]
Subtotal (95% CI) | | 202 | | 197 | 6.4% | 0.02 [-0.01, 0.05]
Total events | 6 | | 2 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.40 (P = 0.16)

1.6.3 Hydromorphone
Rauck 2013 | 13 | 330 | 3 | 166 | 5.6% | 0.02 [-0.01, 0.05]
Rauck 2013 | 8 | 319 | 2 | 166 | 8.3% | 0.01 [-0.01, 0.04]
Vojtassak 2011 | 10 | 139 | 9 | 149 | 1.5% | 0.01 [-0.05, 0.07]
Subtotal (95% CI) | | 788 | | 481 | 15.4% | 0.02 [-0.00, 0.03]
Total events | 31 | | 14 | | |
Heterogeneity: Tau² = 0.00; Chi² = 0.21, df = 2 (P = 0.90); I² = 0%
Test for overall effect: Z = 1.77 (P = 0.08)

1.6.4 Oxycodone
Afilalo 2010 | 10 | 342 | 6 | 337 | 9.1% | 0.01 [-0.01, 0.03]
Markenson 2005 | 3 | 56 | 0 | 51 | 1.1% | 0.05 [-0.01, 0.12]
Subtotal (95% CI) | | 398 | | 388 | 10.2% | 0.02 [-0.01, 0.05]
Total events | 13 | | 6 | | |
Heterogeneity: Tau² = 0.00; Chi² = 1.35, df = 1 (P = 0.24); I² = 26%
Test for overall effect: Z = 1.17 (P = 0.24)

1.6.5 Tapentadol
Afilalo 2010 | 4 | 344 | 6 | 337 | 14.2% | -0.01 [-0.02, 0.01]
Afilalo 2010 | 0 | 0 | 0 | 0 | | Not estimable
Subtotal (95% CI) | | 344 | | 337 | 14.2% | -0.01 [-0.02, 0.01]
Total events | 4 | | 6 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 0.67 (P = 0.50)

1.6.6 Tramadol
Fishman 2007 | 2 | 325 | 2 | 224 | 20.3% | -0.00 [-0.02, 0.01]
Fleischmann 2000 | 0 | 63 | 2 | 66 | 1.9% | -0.03 [-0.08, 0.02]
Gana 2006 | 4 | 201 | 0 | 51 | 4.3% | 0.02 [-0.01, 0.05]
Gana 2006 | 3 | 201 | 1 | 51 | 2.8% | -0.00 [-0.05, 0.04]
Gana 2006 | 3 | 202 | 0 | 51 | 4.7% | 0.01 [-0.02, 0.05]
Gana 2006 | 6 | 202 | 1 | 52 | 2.5% | 0.01 [-0.03, 0.05]
Thorne 2008 | 0 | 94 | 1 | 88 | 5.2% | -0.01 [-0.04, 0.02]
Subtotal (95% CI) | | 1288 | | 583 | 41.6% | -0.00 [-0.01, 0.01]
Total events | 18 | | 7 | | |
Heterogeneity: Tau² = 0.00; Chi² = 4.52, df = 6 (P = 0.61); I² = 0%
Test for overall effect: Z = 0.03 (P = 0.97)

Total (95% CI) | | 3272 | | 2248 | 100.0% | 0.00 [-0.00, 0.01]
Total events | 77 | | 41 | | |
Heterogeneity: Tau² = 0.00; Chi² = 15.28, df = 15 (P = 0.43); I² = 2%
Test for overall effect: Z = 0.90 (P = 0.37)
Test for subgroup differences: Chi² = 7.42, df = 5 (P = 0.19), I² = 32.6%
Plot axis: -0.2 to 0.2; labels: Favours placebo (left), Favours opioid (right).


Figure 8 (Electronic Supplementary Material): Effect estimates (risk difference) of death during study


Forest Plots of standardised mean differences and risk differences of opioids compared to placebo on selected outcomes

EERW design

Figure 9 (Electronic Supplementary Material): Effect estimates (standardised mean differences) of mean pain intensity reduction at end of treatment

Study or subgroup | Opioid: Mean | SD | Total | Placebo: Mean | SD | Total | Weight | Std. mean difference, IV, random [95% CI]

2.2.3 Morphine
Katz 2010 | -0.4 | 1.3 | 170 | -0.2 | 1.3 | 173 | 40.5% | -0.15 [-0.37, 0.06]
Subtotal (95% CI) | | | 170 | | | 173 | 40.5% | -0.15 [-0.37, 0.06]
Heterogeneity: Not applicable
Test for overall effect: Z = 1.42 (P = 0.16)

2.2.4 Oxycodone
Caldwell 1999 | 0.44 | 0.76 | 34 | 1 | 0.78 | 36 | 16.6% | -0.72 [-1.20, -0.23]
Friedmann 2011 | -0.7 | 2.05 | 203 | -0.3 | 2.48 | 207 | 42.9% | -0.18 [-0.37, 0.02]
Subtotal (95% CI) | | | 237 | | | 243 | 59.5% | -0.40 [-0.92, 0.12]
Heterogeneity: Tau² = 0.11; Chi² = 4.17, df = 1 (P = 0.04); I² = 76%
Test for overall effect: Z = 1.49 (P = 0.14)

Total (95% CI) | | | 407 | | | 416 | 100.0% | -0.26 [-0.49, -0.03]
Heterogeneity: Tau² = 0.02; Chi² = 4.63, df = 2 (P = 0.10); I² = 57%
Test for overall effect: Z = 2.17 (P = 0.03)
Test for subgroup differences: Chi² = 0.73, df = 1 (P = 0.39), I² = 0%
Plot axis: -4 to 4; labels: Opioid (left), Placebo (right).
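As an illustration of how a single row of such a forest plot can be derived, the following minimal sketch computes a bias-corrected standardised mean difference (Hedges' g) with an approximate 95% confidence interval from the group means, SDs and sample sizes. The function name `hedges_g` is hypothetical, and the variance formula is a common large-sample approximation; the software used for the review may apply a slightly different correction, so small rounding differences are possible.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, lower, upper): Hedges' g and an approximate 95% CI.

    Assumption: large-sample variance approximation; not necessarily
    the exact formula used by the review software.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample bias correction factor
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + g ** 2 / (2 * (n_t + n_c)))
    return g, g - 1.96 * se, g + 1.96 * se


# Katz 2010 row of Figure 9: opioid -0.4 (SD 1.3, n = 170), placebo -0.2 (SD 1.3, n = 173)
g, lower, upper = hedges_g(-0.4, 1.3, 170, -0.2, 1.3, 173)
print(f"SMD = {g:.2f} [{lower:.2f}, {upper:.2f}]")
```

With the Katz 2010 inputs, the sketch gives values that round to the tabulated estimate of -0.15 [-0.37, 0.06].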


Figure 10 (Electronic Supplementary Material): Effect estimates (risk difference) of 50% pain reduction at end of treatment

Study or subgroup | Opioid: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

2.4.3 Morphine
Katz 2010 | 97 | 171 | 82 | 173 | 100.0% | 0.09 [-0.01, 0.20]
Subtotal (95% CI) | | 171 | | 173 | 100.0% | 0.09 [-0.01, 0.20]
Total events | 97 | | 82 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.74 (P = 0.08)

Total (95% CI) | | 171 | | 173 | 100.0% | 0.09 [-0.01, 0.20]
Total events | 97 | | 82 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.74 (P = 0.08)
Test for subgroup differences: Not applicable
Plot axis: -10 to 10; labels: Favours placebo (left), Favours opioid (right).
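The risk difference reported for a single study can be checked from its event counts. The sketch below is a minimal illustration, assuming the usual normal approximation for the difference of two proportions; the function name `risk_difference` is hypothetical and no continuity correction is applied.

```python
import math


def risk_difference(events_t, n_t, events_c, n_c):
    """Return (rd, lower, upper, z) for two independent proportions.

    Assumption: Wald-type standard error, no continuity correction.
    """
    p_t = events_t / n_t
    p_c = events_c / n_c
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = abs(rd) / se
    return rd, rd - 1.96 * se, rd + 1.96 * se, z


# Katz 2010 row of Figure 10: 97/171 responders with opioid, 82/173 with placebo
rd, lower, upper, z = risk_difference(97, 171, 82, 173)
print(f"RD = {rd:.2f} [{lower:.2f}, {upper:.2f}], Z = {z:.2f}")
```

For Katz 2010 this rounds to a risk difference of 0.09 [-0.01, 0.20] with Z = 1.74, in line with the tabulated row.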


Figure 11 (Electronic Supplementary Material): Effect estimates (standardised mean differences) of physical function improvement at end of treatment


Figure 12 (Electronic Supplementary Material): Effect estimates (risk difference) of dropping out due to lack of efficacy during study

Study or subgroup | Opioid: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

2.5.1 Buprenorphine
Munera 2010 | 43 | 152 | 57 | 163 | 16.2% | -0.07 [-0.17, 0.04]
Subtotal (95% CI) | | 152 | | 163 | 16.2% | -0.07 [-0.17, 0.04]
Total events | 43 | | 57 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.28 (P = 0.20)

2.5.2 Morphine
Katz 2010 | 6 | 171 | 32 | 173 | 37.6% | -0.15 [-0.21, -0.09]
Subtotal (95% CI) | | 171 | | 173 | 37.6% | -0.15 [-0.21, -0.09]
Total events | 6 | | 32 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 4.58 (P < 0.00001)

2.5.3 Oxycodone
Caldwell 1999 | 3 | 34 | 7 | 18 | 3.0% | -0.30 [-0.55, -0.06]
Caldwell 1999 | 4 | 37 | 6 | 18 | 3.1% | -0.23 [-0.46, 0.01]
Friedmann 2011 | 12 | 205 | 38 | 207 | 40.0% | -0.13 [-0.19, -0.06]
Subtotal (95% CI) | | 276 | | 243 | 46.2% | -0.16 [-0.24, -0.07]
Total events | 19 | | 51 | | |
Heterogeneity: Tau² = 0.00; Chi² = 2.37, df = 2 (P = 0.31); I² = 16%
Test for overall effect: Z = 3.56 (P = 0.0004)

2.5.4 Tramadol
Subtotal (95% CI) | | 0 | | 0 | | Not estimable
Total events | 0 | | 0 | | |
Heterogeneity: Not applicable
Test for overall effect: Not applicable

Total (95% CI) | | 599 | | 579 | 100.0% | -0.13 [-0.18, -0.09]
Total events | 68 | | 140 | | |
Heterogeneity: Tau² = 0.00; Chi² = 4.31, df = 4 (P = 0.37); I² = 7%
Test for overall effect: Z = 6.12 (P < 0.00001)
Test for subgroup differences: Chi² = 2.16, df = 2 (P = 0.34), I² = 7.2%
Plot axis: -1 to 1; labels: Favours placebo (left), Favours opioid (right).


Figure 13 (Electronic Supplementary Material): Effect estimates (risk difference) of dropping out due to adverse events during study

Study or subgroup | Opioid: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

2.8.3 Morphine
Katz 2010 | 18 | 171 | 13 | 173 | 45.5% | 0.03 [-0.03, 0.09]
Subtotal (95% CI) | | 171 | | 173 | 45.5% | 0.03 [-0.03, 0.09]
Total events | 18 | | 13 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 0.98 (P = 0.33)

2.8.4 Oxycodone
Caldwell 1999 | 3 | 34 | 3 | 36 | 15.6% | 0.00 [-0.13, 0.14]
Friedmann 2011 | 43 | 205 | 22 | 207 | 38.9% | 0.10 [0.03, 0.17]
Subtotal (95% CI) | | 239 | | 243 | 54.5% | 0.07 [-0.02, 0.16]
Total events | 46 | | 25 | | |
Heterogeneity: Tau² = 0.00; Chi² = 1.69, df = 1 (P = 0.19); I² = 41%
Test for overall effect: Z = 1.52 (P = 0.13)

Total (95% CI) | | 410 | | 416 | 100.0% | 0.05 [-0.00, 0.11]
Total events | 64 | | 38 | | |
Heterogeneity: Tau² = 0.00; Chi² = 3.06, df = 2 (P = 0.22); I² = 35%
Test for overall effect: Z = 1.89 (P = 0.06)
Test for subgroup differences: Chi² = 0.52, df = 1 (P = 0.47), I² = 0%
Plot axis: -0.5 to 0.5; labels: Placebo (left), Opioid (right).


Figure 14 (Electronic Supplementary Material): Effect estimates (risk difference) of serious adverse events during study

Study or subgroup | Opioid: Events | Total | Placebo: Events | Total | Weight | Risk difference, IV, random [95% CI]

2.9.3 Morphine
Katz 2010 | 9 | 171 | 11 | 173 | 20.3% | -0.01 [-0.06, 0.04]
Subtotal (95% CI) | | 171 | | 173 | 20.3% | -0.01 [-0.06, 0.04]
Total events | 9 | | 11 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 0.43 (P = 0.66)

2.9.4 Oxycodone
Friedmann 2011 | 5 | 205 | 2 | 207 | 79.7% | 0.01 [-0.01, 0.04]
Subtotal (95% CI) | | 205 | | 207 | 79.7% | 0.01 [-0.01, 0.04]
Total events | 5 | | 2 | | |
Heterogeneity: Not applicable
Test for overall effect: Z = 1.16 (P = 0.25)

Total (95% CI) | | 376 | | 380 | 100.0% | 0.01 [-0.01, 0.03]
Total events | 14 | | 13 | | |
Heterogeneity: Tau² = 0.00; Chi² = 0.83, df = 1 (P = 0.36); I² = 0%
Test for overall effect: Z = 0.84 (P = 0.40)
Test for subgroup differences: Chi² = 0.83, df = 1 (P = 0.36), I² = 0%
Plot axis: -0.5 to 0.5; labels: Placebo (left), Opioid (right).
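The pooled estimate and heterogeneity statistics in the "Total" rows can be illustrated with a short sketch of inverse-variance random-effects pooling. It assumes the DerSimonian-Laird estimator for Tau², which is a common default but is not confirmed here as the exact method used for the review; the function names `rd_with_variance` and `pool_dersimonian_laird` are hypothetical, and no continuity correction for zero-event arms is applied.

```python
import math


def rd_with_variance(events_t, n_t, events_c, n_c):
    """Risk difference and its variance for one study (no continuity correction)."""
    p_t, p_c = events_t / n_t, events_c / n_c
    return p_t - p_c, p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c


def pool_dersimonian_laird(effects, variances):
    """Inverse-variance random-effects pooling (assumed DerSimonian-Laird Tau²)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q (reported as Chi² in the forest plots)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add Tau² to each study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "lower": pooled - 1.96 * se,
        "upper": pooled + 1.96 * se,
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# The two studies of Figure 14 (serious adverse events, EERW design)
studies = [rd_with_variance(9, 171, 11, 173), rd_with_variance(5, 205, 2, 207)]
result = pool_dersimonian_laird([y for y, _ in studies], [v for _, v in studies])
print(
    f"RD = {result['pooled']:.2f} [{result['lower']:.2f}, {result['upper']:.2f}], "
    f"Q = {result['q']:.2f}, Tau² = {result['tau2']:.2f}, I² = {result['i2']:.0f}%"
)
```

Applied to the two studies of Figure 14, the sketch yields a pooled risk difference that rounds to 0.01 [-0.01, 0.03] with Q of about 0.83 and I² = 0%, consistent with the tabulated totals. Because Q is below its degrees of freedom in this example, Tau² is truncated at zero and the random-effects result coincides with the fixed-effect one.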
