Follicle Stimulating Hormone is an accurate predictor of azoospermia in childhood cancer survivors

Thomas W Kelsey1, Lauren McConville2, Angela B Edgar3, Alex I Ungurianu1, Rod Mitchell4, Richard A Anderson4 and W Hamish B Wallace3

1 School of Computer Science, University of St. Andrews, St. Andrews, United Kingdom
2 School of Medicine, University of Edinburgh, Edinburgh EH16 4TJ, United Kingdom
3 Department of Haematology/Oncology, Royal Hospital for Sick Children, Edinburgh EH9 1LF, United Kingdom
4 MRC Centre for Reproductive Health, University of Edinburgh, EH16 4TJ, United Kingdom

For correspondence:
Department of Haematology/Oncology
Royal Hospital for Sick Children
17 Millerfield Place
Edinburgh EH9 1LW
UK
Email [email protected]

Key words: FSH / azoospermia / childhood cancer / late effects

DISCLOSURE STATEMENT: The authors have no conflicts of interest to disclose.

Abstract
Study question: How accurate is FSH as a predictor of azoospermia in survivors of childhood cancer?
Summary answer: FSH is an accurate predictor, with a diagnostic threshold of 17 IU/L giving 94% probability of avoiding a misdiagnosis of azoospermia.
What is known already: The accuracy of FSH as a predictor of azoospermia in adult survivors of childhood cancer is unclear, with conflicting results in the published literature.
Study design, size, duration: A systematic review and post hoc analysis of combined data (n = 367) were performed on all published studies containing extractable data on both serum FSH concentration and semen concentration in survivors of childhood cancer.
Participants/materials, setting, methods: To identify relevant studies based on the PRISMA statement, PubMed and Medline databases were searched up to September 2016 by two blinded investigators. Articles were included if they contained both serum FSH concentration and semen concentration, used World Health Organisation certified methods for semen analysis, and the study participants were all childhood cancer survivors.
Main results and the role of chance: There was no evidence for either publication bias or heterogeneity for the five studies. For the combined data (n = 367) the optimal FSH threshold was 10.4 IU/L with specificity 81% (95% CI 76% – 86%) and sensitivity 83% (95% CI 76% – 89%). The AUC was 0.89 (95% CI 0.86 – 0.93). A range of threshold FSH values for the diagnosis of azoospermia with their associated sensitivities and specificities were calculated.
Limitations, reasons for caution: Semen sample analysis remains the gold standard for diagnosis of azoospermia; our findings provide an alternative and inferior method for patients who are reluctant to submit semen samples.
Wider implications of the findings: This study provides strong supporting evidence for the use of serum FSH as a surrogate biomarker for azoospermia in adult males who have been treated for childhood cancer.
Study funding/competing interest(s): RTM is supported by a Wellcome Trust Intermediate Clinical Fellowship (Grant No: 098522). TWK is supported by EPSRC grant EP/P015638/1. The funding bodies played no role in the design, methods, data management or analysis or in the decision to publish. The authors have no conflicts of interest to declare.

Introduction
The potential impact of childhood cancer treatment on male fertility is a significant
issue both for families at the time of diagnosis and for the young adult survivor
(Anderson, et al., 2015, Skinner, et al., 2017). Treatment at any age with
chemotherapy agents, particularly high doses of alkylating agents, and pelvic
radiotherapy may damage the testes, resulting in impaired sperm production (Chow, et
al., 2016, Green, et al., 2014, Greenfield, et al., 2007, Greenfield, et al., 2010,
Skinner, et al., 2017, van Beek, et al., 2007). While semen analysis remains the gold
standard, a serum biomarker of sufficient accuracy, for example Follicle Stimulating
Hormone (FSH), would provide a useful indirect assessment of fertility.

The feedback relationship between the seminiferous tubule and the
hypothalamus/pituitary underpins the putative value of FSH and inhibin B in the
quantitative assessment of spermatogenesis (McCullagh, 1932). FSH concentrations
are negatively related to sperm concentration in both normal men and in those with
testicular dysfunction, whereas serum inhibin B is positively related (Anderson, et al.,
1997, Illingworth, et al., 1996, Jensen, et al., 1997). Both can be used to aid
discrimination of obstructive vs non-obstructive azoospermia in infertile men (Toulis,
et al., 2010) without clear benefit of one over the other, likely reflecting their
interdependence and relationship to maturational stages of spermatogenesis (Okuma,
et al., 2006).

The ready availability and acceptability of serum FSH analysis compared to semen
analysis makes it of potential value as a predictor of azoospermia in childhood cancer
survivors (CCS), but the literature contains conflicting reports of the sensitivity and
specificity of plasma concentrations of FSH in this context. Green et al. (Green, et al.,
2013) found that FSH was unsuitable as a predictor of azoospermia in CCS, whilst
Romerius et al. (Romerius, et al., 2011) concluded that FSH was an excellent
predictor. It is possible that sources of heterogeneity such as diagnosis, treatment
regimens or pubertal status may account for this difference. It is also possible that
there is little or no inherent heterogeneity, in which case data can be combined from
multiple studies in order to provide a dataset suitable for improved assessment of the
true level of diagnostic strength.

In this study we identified studies that have reported FSH and sperm concentrations in
CCS, and used them (a) to test the data for homogeneity and (b) to assess the value of
FSH as a diagnostic predictor of azoospermia in CCS.

Patients and methods
Using an established methodology (Iliodromiti, et al., 2013, Iliodromiti, et al., 2016,
Kelsey, et al., 2013), a scoping search was carried out using relevant MeSH headings,
which generated 680 results on PubMed and 973 on Scopus. The abstracts of all
studies identified were screened, and any studies in cancer survivors that had data on
semen analysis and FSH levels were read in full. Studies were selected if they met
the following criteria: (i) they contained both serum FSH concentration and semen
concentration (either as explicit values or reported in a scatterplot); (ii) World Health
Organisation (WHO) certified methods (World Health Organisation, 2010) were used
in the semen analysis; (iii) the study participants were all childhood cancer survivors,
or data were clearly demarcated between childhood cancer survivors and normal
controls, in which case only cancer survivor data were extracted; (iv) all study designs
were included except case reports.

In addition to data identified from a systematic search of the literature, we included
our own data (SI 1) used (but not explicitly reported or given as a scatterplot) in a
CCS semen quality study (Thomson, et al., 2002). This study involved 33 male
survivors of childhood cancer recruited from the oncology database at the Royal
Hospital for Sick Children, Edinburgh, from whom FSH levels were obtained in
addition to semen concentrations determined according to WHO protocols.

While recognising that different FSH assays were used in the studies included, a
detailed comparison has shown ‘fair to strong consistency’ between the relevant
assays (Radicioni, et al., 2013); thus extracted data were used without further
conversion.

Approval was not required from an ethics committee or institutional review board
since our research was limited to use of previously collected, non-identifiable data
that have been published in peer-reviewed journals, which are specifically excluded from
Research Ethics Committee review by the National Research Ethics Service
guidelines of the UK Health Research Authority (HRA, 2013).

The risks of publication bias and potential small study effect were visually assessed
by constructing funnel plots, in which calculated diagnostic accuracy is set against
statistical precision (Sterne, et al., 2011). In addition, we performed a linear
regression of log diagnostic odds ratios on the inverse root of effective sample sizes as a
test for funnel plot asymmetry, where a non-zero slope coefficient is suggestive of
significant asymmetry and small study bias (Deeks, et al., 2005).

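The asymmetry regression described above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the analysis. The function name, the 0.5 continuity correction, the weighting by effective sample size, and the example 2x2 counts are assumptions made for illustration; the counts are invented and are not taken from the included studies.

```python
import math

def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    tables: list of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (slope, standard error of slope, t statistic).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        # Assumed 0.5 continuity correction to guard against zero cells
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        n_pos, n_neg = tp + fn, fp + tn
        ess = 4.0 * n_pos * n_neg / (n_pos + n_neg)  # effective sample size
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))   # log diagnostic odds ratio
        ws.append(ess)                               # weight each study by its ESS
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    dof = len(tables) - 2
    se = math.sqrt(rss / dof / sxx)
    return slope, se, slope / se

# Invented counts for five hypothetical studies
example = [(40, 10, 8, 60), (9, 3, 2, 12), (15, 4, 4, 20), (6, 2, 2, 9), (12, 5, 3, 16)]
slope, se, t_stat = deeks_regression(example)
```

A t statistic close to zero would be read as no evidence of funnel plot asymmetry; converting it to a p-value requires a t distribution with k - 2 degrees of freedom, which is left out of this sketch.
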
Statistical analysis
Initial analysis considered the heterogeneity or otherwise of the included studies. This
was tested using four distinct techniques: visually by forest plots (Sedgwick, 2015),
numerically by calculating the slope of the affine regression equation linking the
study diagnostic odds ratios (DOR) to the study thresholds (Moses, et al., 1993,
Walter, 2002) (where a slope close to zero suggests homogeneity of the studies), and
statistically by (i) calculating the p-value for the chi-squared test of the null hypothesis
that the studies are homogeneous (a high p-value gives no evidence of heterogeneity) and (ii)
calculating Higgins' I² statistic for measuring inconsistency in meta-analyses (Higgins,
et al., 2003) (a small value suggests homogeneity). Two statistical tests were used as
the interpretation of I² can be misleading, since the importance of inconsistency
depends on several factors and the magnitude and direction of effects could lead to a
small I² despite a large chi-squared p-value (Julian P T Higgins, 2011).

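As an illustration of the inconsistency statistic, the sketch below computes Cochran's Q and Higgins' I² from per-study effect estimates and their variances using fixed-effect inverse-variance weights. It is a generic Python sketch under stated assumptions: the function name and the choice of effect measure are illustrative, and it is not the routine from the R packages used for the analysis.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, degrees of freedom, I2 in percent).

    effects:   per-study effect estimates (for example log DOR)
    variances: per-study sampling variances of those estimates
    """
    weights = [1.0 / v for v in variances]            # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 = (Q - df) / Q, truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```

When Q does not exceed its degrees of freedom, I² is truncated to 0%, which is the situation in which the studies show no more variation than expected by chance.
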
After combining the data into a single set of (FSH, azoospermic or not azoospermic)
pairs, a ROC curve was constructed. 95% confidence intervals for the AUC were
calculated using 200 bootstraps of the data set, as were the optimal threshold (i.e. the
level of FSH that maximizes the probability of a randomly-selected (azoospermic, not
azoospermic) pair from the CCS population being correctly diagnosed) and the 95%
confidence intervals for the specificity and sensitivity at each threshold value. All
analyses were performed using the mada and pROC packages for the R statistical
language (R Development Core Team, 2010).

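The ROC analysis described above can be illustrated with a short, self-contained Python sketch. It is not the pROC code used for the analysis. The function names are hypothetical, the Youden index (sensitivity + specificity - 1) is assumed as the optimality criterion, a value at or above the threshold is assumed to be classified as azoospermic, and the FSH values are invented.

```python
import random

def auc(fsh_azoo, fsh_non):
    """Probability that a random azoospermic case has a higher FSH than a
    random non-azoospermic case (ties count one half)."""
    wins = 0.0
    for a in fsh_azoo:
        for b in fsh_non:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(fsh_azoo) * len(fsh_non))

def youden_threshold(fsh_azoo, fsh_non):
    """Observed FSH value maximizing sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(fsh_azoo) | set(fsh_non)):
        sens = sum(a >= t for a in fsh_azoo) / len(fsh_azoo)
        spec = sum(b < t for b in fsh_non) / len(fsh_non)
        if sens + spec - 1 > best_j:
            best_t, best_j = t, sens + spec - 1
    return best_t

def bootstrap_auc_ci(fsh_azoo, fsh_non, n_boot=2000, seed=0):
    """Percentile 95% CI for the AUC, resampling each group separately."""
    rng = random.Random(seed)
    reps = sorted(
        auc(rng.choices(fsh_azoo, k=len(fsh_azoo)),
            rng.choices(fsh_non, k=len(fsh_non)))
        for _ in range(n_boot)
    )
    return reps[int(0.025 * n_boot)], reps[int(0.975 * n_boot) - 1]

# Invented FSH values (IU/L) for illustration only
azoo = [12, 18, 25, 9, 30]
non_azoo = [3, 5, 8, 11, 4, 6]
print(auc(azoo, non_azoo), youden_threshold(azoo, non_azoo))
```

Resampling the two groups separately keeps the number of azoospermic and non-azoospermic cases fixed in every replicate, which corresponds to a stratified bootstrap.
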
Results
The application of inclusion and exclusion criteria to the studies found in the literature
yielded four sources of FSH and semen concentration in CCS (Table 1, SI 2, Fig. 1)
(Green, et al., 2013, Lahteenmaki, et al., 2008, Rendtorff, et al., 2012, van Beek, et
al., 2007). The chi-squared statistical test for funnel plot asymmetry (Fig. 2) did not
reach statistical significance (p = 0.32 for sensitivity; p = 0.17 for specificity),
suggesting that neither studies with small sample size nor studies with results lacking
statistical significance are missing from the literature. As all the included studies used
WHO protocols, we conclude that they are at low risk of bias and have low concern
about applicability, as specified by the QUADAS-2 and STARD frameworks for
reporting diagnostic accuracy (Bossuyt, et al., 2015, Whiting, et al., 2011).

The confidence intervals for the log-adjusted DOR for each study have similar ranges,
suggesting a lack of significant study heterogeneity (Fig. 3). The slope of the
regression equation linking the study log DOR to the study FSH thresholds was close
to zero (slope = -0.01), providing numerical evidence for study homogeneity. The
chi-squared p-values were 0.32 for study sensitivity and 0.17 for study specificity,
supplying no statistically significant evidence for the hypothesis that the studies are
heterogeneous. Higgins' I² statistic was 0%, the lowest possible indication of
study heterogeneity. Taken together, and in conjunction with the lack of publication
bias, we conclude that the studies are homogeneous in terms of dependency on FSH
thresholds to determine diagnostic accuracy, and hence that combining the study data
into a single set results in a representative sample of the CCS population in terms of
FSH levels and sperm concentrations.

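For readers unfamiliar with the quantity plotted in the forest plot, the following Python sketch shows one common way to obtain a log DOR and its 95% confidence interval from a single study's 2x2 table. The function name is hypothetical, and the 0.5 continuity correction and the normal approximation are assumptions of this sketch; they are not taken from the source.

```python
import math

def log_dor_ci(tp, fp, fn, tn, z=1.96):
    """Return (ln DOR, lower 95% limit, upper 95% limit) for one study."""
    # Assumed 0.5 continuity correction applied to every cell
    tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    # Standard error of ln(DOR): root of the summed reciprocal cell counts
    se = math.sqrt(1.0 / tp + 1.0 / fp + 1.0 / fn + 1.0 / tn)
    return log_dor, log_dor - z * se, log_dor + z * se
```

An interval that excludes zero corresponds to a study whose DOR differs significantly from 1, and widely overlapping intervals across studies are what the visual check for heterogeneity looks for.
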
For the combined data (n = 367, SI 1, SI 2) the optimal FSH threshold was 10.4 IU/L
with specificity 81% (95% CI 76% – 86%) and sensitivity 82% (95% CI 76% – 88%).
The AUC was 0.89 (95% CI 0.85 – 0.92), demonstrating that FSH is a strong predictor
of azoospermia for CCS (Fig. 4).

The optimal threshold maximizes the chance of a correct classification for an arbitrary
survivor of childhood cancer. In order to quantify FSH levels that minimize
misdiagnosis of azoospermia, a range of threshold FSH values for the diagnosis of
azoospermia were calculated, together with the median and 95% confidence intervals
for their associated sensitivities and specificities (Table 2). A diagnostic threshold of
17 IU/L for FSH gives 94% probability of avoiding misdiagnosis of azoospermia,
with 95% confidence interval 90 – 97% (Table 2).

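The threshold-specific figures reported above can be illustrated with a minimal Python sketch that evaluates sensitivity and specificity at a fixed FSH cutoff and bootstraps the specificity. The function names are hypothetical, a value at or above the cutoff is assumed to be classified as azoospermic, and any values passed in would be the user's own data; nothing here reproduces Table 2.

```python
import random
import statistics

def sens_spec(fsh_azoo, fsh_non, threshold):
    """Sensitivity and specificity when FSH >= threshold is called azoospermic."""
    sens = sum(x >= threshold for x in fsh_azoo) / len(fsh_azoo)
    spec = sum(x < threshold for x in fsh_non) / len(fsh_non)
    return sens, spec

def bootstrap_specificity(fsh_non, threshold, n_boot=2000, seed=0):
    """Median and percentile 95% CI of specificity over bootstrap replicates
    of the non-azoospermic group."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = rng.choices(fsh_non, k=len(fsh_non))
        reps.append(sum(x < threshold for x in sample) / len(sample))
    reps.sort()
    return (statistics.median(reps),
            reps[int(0.025 * n_boot)],
            reps[int(0.975 * n_boot) - 1])
```

Under this convention the specificity at a cutoff is the proportion of non-azoospermic survivors who would not be labelled azoospermic, which is the sense in which a higher cutoff avoids misdiagnosis at the cost of lower sensitivity.
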
Discussion
We have shown that FSH has strong diagnostic power, with 89% probability that FSH
levels will correctly rank a randomly chosen (azoospermic, not azoospermic) pair of
survivors of childhood cancer (i.e. the area under the ROC curve) (Hanley and McNeil,
1982), with 95% confidence that this probability is between 85% and 92% (Fig. 4). We
have also calculated clinically-useful diagnostic levels for a range of FSH thresholds
(Table 2).

We have assessed heterogeneity of existing studies using visual, numeric modelling
and two distinct statistical tests; none of these suggested any important level of
heterogeneity (Figs. 2 and 3). This result is of clinical and biomedical interest in its
own right, but also allows us safely to combine the data into a single set, which has
greater power for statistical analysis than any single study reported to date. While
different FSH assays were used in the studies included in this analysis, there is good
concordance between them (Radicioni, et al., 2013).

FSH, inhibin B, and more recently anti-Mullerian hormone have been previously
investigated as biomarkers of seminiferous tubule function, often to attempt to predict
the surgical recovery of sperm in azoospermic men (Toulis, et al., 2010). The latter
two are products of the Sertoli cell, with potentially an additional contribution to
serum inhibin B from germ cells (Makanji, et al., 2014). In the post-chemotherapy
testis, the key pathology determining azoospermia or not is the presence or absence of
spermatogonial stem cells at the end of treatment. This differs therefore from the
situation in the more general male infertility population, where disorders of
spermatogenic maturation are relatively common, with likely impact on the germ
cell-Sertoli cell interaction and production of inhibin B, and feedback regulation of FSH.
It is thus possible that serum biomarkers of spermatogenesis may be more accurate in
post-chemotherapy assessment than with the wide range of pathologies seen in the
general infertile population.

Our calculated optimal FSH threshold for classifying a CCS as azoospermic is 10.4
IU/L, where optimal means providing the best tradeoff between sensitivity (i.e.
minimized prediction of non-zero sperm concentration for CCS who are in reality
azoospermic) and specificity (i.e. minimized prediction of azoospermia for CCS who
in reality have non-zero sperm concentration). In clinical practice of long-term follow
up of CCS, however, we suggest that a more conservative threshold is more
appropriate, since a wrong diagnosis of azoospermia is worse than a false negative. It
should also be emphasized that in some azoospermic CCS it is possible to obtain
sperm by micro-TESE (Shin, et al., 2016). The bootstrap sampling used to provide the
optimal threshold (necessarily) allows calculation of confidence intervals for
sensitivities and specificities for all potential thresholds, and from these we observe
that a diagnostic threshold of 17 IU/L for FSH has a 94% probability of avoiding this
misdiagnosis, with 95% confidence interval of 90% – 97% (Table 2). Our specificity
results are in quantitative agreement with a study that reported mean FSH of 22 IU/L
in 21 azoospermic CCS compared to 9 IU/L in 10 controls, with 81% specificity at a
10 IU/L cutoff (Wilhelmsson, et al., 2014) compared to our value of 78%. However,
our calculated sensitivity at this cutoff is higher: 83% compared to 56%
(Wilhelmsson, et al., 2014).

Serum assessment of FSH is therefore a useful test before the patient is ready to
submit a semen sample for analysis, and the present analysis indicates high predictive
accuracy. Attempts to survey CCS with universal semen analysis have demonstrated
the reluctance of these patients to submit semen samples. In contrast, a blood test is
less intrusive and more acceptable to these young CCS (Lahteenmaki, et al., 2008).
The use of hormone measurement in dried blood spots sent by post has recently been
evaluated in the analysis of reproductive function in female cancer survivors (Roberts,
et al., 2016), and this technique has clear potential to be useful in the male case.

This study provides strong supporting evidence for the use of serum FSH as a useful
surrogate biomarker for spermatogenesis in adult males who have been treated for
childhood cancer; however, semen analysis should always be encouraged and remains
the gold standard test of spermatogenesis.

Author contributions
TWK: study design, data collection, analysis, drafting and finalising manuscript
LM: study design, data collection, drafting and finalising manuscript
AU: data analysis, drafting and finalising manuscript
WHBW: study design, data analysis, drafting and finalising manuscript
ABE: data analysis, drafting and finalising manuscript
RTM: data analysis, drafting and finalising manuscript
RAA: data analysis, drafting and finalising manuscript

References
Anderson RA, Mitchell RT, Kelsey TW, Spears N, Telfer EE, Wallace WH. Cancer treatment and gonadal function: experimental and established strategies for fertility preservation in children and young adults. Lancet Diabetes Endocrinol 2015;3: 556-567.
Anderson RA, Wallace EM, Groome NP, Bellis AJ, Wu FC. Physiological relationships between inhibin B, follicle stimulating hormone secretion and spermatogenesis in normal men and response to gonadotrophin suppression by exogenous testosterone. Hum Reprod 1997;12: 746-751.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, Lijmer JG, Moher D, Rennie D, de Vet HC et al. STARD 2015: An Updated List of Essential Items for Reporting Diagnostic Accuracy Studies. Clin Chem 2015;61: 1446-1452.
Chow EJ, Stratton KL, Leisenring WM, Oeffinger KC, Sklar CA, Donaldson SS, Ginsberg JP, Kenney LB, Levine JM, Robison LL et al. Pregnancy after chemotherapy in male and female survivors of childhood cancer treated between 1970 and 1999: a report from the Childhood Cancer Survivor Study cohort. Lancet Oncol 2016;17: 567-576.
Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol 2005;58: 882-893.
Green DM, Liu W, Kutteh WH, Ke RW, Shelton KC, Sklar CA, Chemaitilly W, Pui CH, Klosky JL, Spunt SL et al. Cumulative alkylating agent exposure and semen parameters in adult survivors of childhood cancer: a report from the St Jude Lifetime Cohort Study. Lancet Oncol 2014;15: 1215-1223.
Green DM, Zhu L, Zhang N, Sklar CA, Ke RW, Kutteh WH, Klosky JL, Spunt SL, Metzger ML, Navid F et al. Lack of specificity of plasma concentrations of inhibin B and follicle-stimulating hormone for identification of azoospermic survivors of childhood cancer: a report from the St Jude lifetime cohort study. J Clin Oncol 2013;31: 1324-1328.
Greenfield DM, Walters SJ, Coleman RE, Hancock BW, Eastell R, Davies HA, Snowden JA, Derogatis L, Shalet SM, Ross RJ. Prevalence and consequences of androgen deficiency in young male cancer survivors in a controlled cross-sectional study. J Clin Endocrinol Metab 2007;92: 3476-3482.
Greenfield DM, Walters SJ, Coleman RE, Hancock BW, Snowden JA, Shalet SM, DeRogatis LR, Ross RJ. Quality of life, self-esteem, fatigue, and sexual function in young men after cancer: a controlled cross-sectional study. Cancer 2010;116: 1592-1601.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143: 29-36.
Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003;327: 557-560.
HRA. Does my project require review by a Research Ethics Committee? 2013.
Iliodromiti S, Kelsey TW, Anderson RA, Nelson SM. Can anti-Mullerian hormone predict the diagnosis of polycystic ovary syndrome? A systematic review and meta-analysis of extracted data. J Clin Endocrinol Metab 2013;98: 3332-3340.
Iliodromiti S, Sassarini J, Kelsey TW, Lindsay RS, Sattar N, Nelson SM. Accuracy of circulating adiponectin for predicting gestational diabetes: a systematic review and meta-analysis. Diabetologia 2016;59: 692-699.
Illingworth PJ, Groome NP, Byrd W, Rainey WE, McNeilly AS, Mather JP, Bremner WJ. Inhibin-B: a likely candidate for the physiologically important form of inhibin in men. J Clin Endocrinol Metab 1996;81: 1321-1325.
Jensen TK, Andersson AM, Hjollund NH, Scheike T, Kolstad H, Giwercman A, Henriksen TB, Ernst E, Bonde JP, Olsen J et al. Inhibin B as a serum marker of spermatogenesis: correlation to differences in sperm concentration and follicle-stimulating hormone levels. A study of 349 Danish men. J Clin Endocrinol Metab 1997;82: 4059-4063.
Julian P T Higgins SG. Cochrane Handbook for Systematic Reviews of Interventions, 2011. The Cochrane Collaboration.
Kelsey TW, Dodwell SK, Wilkinson AG, Greve T, Andersen CY, Anderson RA, Wallace WH. Ovarian volume throughout life: a validated normative model. PLoS One 2013;8: e71465.
Lahteenmaki PM, Arola M, Suominen J, Salmi TT, Andersson AM, Toppari J. Male reproductive health after childhood cancer. Acta Paediatr 2008;97: 935-942.
Makanji Y, Zhu J, Mishra R, Holmquist C, Wong WP, Schwartz NB, Mayo KE, Woodruff TK. Inhibin at 90: from discovery to clinical application, a historical review. Endocr Rev 2014;35: 747-794.
McCullagh DR. Dual Endocrine Activity of the Testes. Science 1932;76: 19-20.
Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12: 1293-1316.
Okuma Y, O'Connor AE, Hayashi T, Loveland KL, de Kretser DM, Hedger MP. Regulated production of activin A and inhibin B throughout the cycle of the seminiferous epithelium in the rat. J Endocrinol 2006;190: 331-340.
R Development Core Team. R: A language and environment for statistical computing. 2010. R Foundation for Statistical Computing, Vienna, Austria.
Radicioni A, Lenzi A, Spaziani M, Anzuini A, Ruga G, Papi G, Raimondo M, Foresta C. A multicenter evaluation of immunoassays for follicle-stimulating hormone, luteinizing hormone and testosterone: concordance, imprecision and reference values. J Endocrinol Invest 2013;36: 739-744.
Rendtorff R, Beyer M, Muller A, Dittrich R, Hohmann C, Keil T, Henze G, Borgmann A. Low inhibin B levels alone are not a reliable marker of dysfunctional spermatogenesis in childhood cancer survivors. Andrologia 2012;44 Suppl 1: 219-225.
Roberts SC, Seav SM, McDade TW, Dominick SA, Gorman JR, Whitcomb BW, Su HI. Self-collected dried blood spots as a tool for measuring ovarian reserve in young female cancer survivors. Hum Reprod 2016;31: 1570-1578.
Romerius P, Stahl O, Moell C, Relander T, Cavallin-Stahl E, Wiebe T, Giwercman YL, Giwercman A. High risk of azoospermia in men treated for childhood cancer. Int J Androl 2011;34: 69-76.
Sedgwick P. How to read a forest plot in a meta-analysis. BMJ 2015;351: h4028.
Shin T, Kobayashi T, Shimomura Y, Iwahata T, Suzuki K, Tanaka T, Fukushima M, Kurihara M, Miyata A, Kobori Y et al. Microdissection testicular sperm extraction in Japanese patients with persistent azoospermia after chemotherapy. Int J Clin Oncol 2016;21: 1167-1171.
Skinner R, Mulder RL, Kremer LC, Hudson MM, Constine LS, Bardi E, Boekhout A, Borgmann-Staudt A, Brown MC, Cohn R et al. Recommendations for gonadotoxicity surveillance in male childhood, adolescent, and young adult cancer survivors: a report from the International Late Effects of Childhood Cancer Guideline Harmonization Group in collaboration with the PanCareSurFup Consortium. Lancet Oncol 2017;18: e75-e90.
Sterne JA, Sutton AJ, Ioannidis JP, Terrin N, Jones DR, Lau J, Carpenter J, Rucker G, Harbord RM, Schmid CH et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 2011;343: d4002.
Thomson AB, Campbell AJ, Irvine DC, Anderson RA, Kelnar CJ, Wallace WH. Semen quality and spermatozoal DNA integrity in survivors of childhood cancer: a case-control study. Lancet 2002;360: 361-367.
Toulis KA, Iliadou PK, Venetis CA, Tsametis C, Tarlatzis BC, Papadimas I, Goulis DG. Inhibin B and anti-Mullerian hormone as markers of persistent spermatogenesis in men with non-obstructive azoospermia: a meta-analysis of diagnostic accuracy studies. Hum Reprod Update 2010;16: 713-724.
van Beek RD, Smit M, van den Heuvel-Eibrink MM, de Jong FH, Hakvoort-Cammel FG, van den Bos C, van den Berg H, Weber RF, Pieters R, de Muinck Keizer-Schrama SM. Inhibin B is superior to FSH as a serum marker for spermatogenesis in men treated for Hodgkin's lymphoma with chemotherapy during childhood. Hum Reprod 2007;22: 3215-3222.
Walter SD. Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med 2002;21: 1237-1256.
Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM, QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155: 529-536.
Wilhelmsson M, Vatanen A, Borgstrom B, Gustafsson B, Taskinen M, Saarinen-Pihkala UM, Winiarski J, Jahnukainen K. Adult testicular volume predicts spermatogenetic recovery after allogeneic HSCT in childhood and adolescence. Pediatr Blood Cancer 2014;61: 1094-1100.
World Health Organisation. Examination and processing of human semen. 5th edn, 2010. The World Health Organisation.

Table 1
Characteristics of the included studies.

1st Author    Year   PubMed ID   Number CCS   Age (years, median & range)
Green         2012   23423746    257          30.5, 19.7 – 59.1
Lähteenmäki   2008   18430073    23           20.5, 15.6 – 31.2
Rendtorff     2012   21726269    37           25, 19 – 45
van Beek      2007   17981817    17           27, 17.7 – 42.6
Thomson       2002   12241775    33           21.9, 16.5 – 35.2

Table 2. Sensitivity and specificity of FSH-based azoospermia diagnosis for a
range of threshold values. Median and 95% CI are calculated from 2,000
stratified bootstrap replicates of the combined data (n = 367).

Threshold FSH   Specificity   Specificity       Sensitivity   Sensitivity
(IU/L)          Median        95% CI            Median        95% CI
9               0.743         0.690 – 0.801     0.851         0.787 – 0.901
10              0.783         0.730 – 0.836     0.830         0.766 – 0.894
10.4            0.814         0.761 – 0.863     0.823         0.759 – 0.897
11              0.827         0.774 – 0.872     0.801         0.731 – 0.865
12              0.858         0.810 – 0.903     0.773         0.702 – 0.837
13              0.885         0.845 – 0.925     0.752         0.681 – 0.823
14              0.898         0.858 – 0.938     0.716         0.638 – 0.780
15              0.916         0.881 – 0.951     0.695         0.617 – 0.766
16              0.925         0.889 – 0.956     0.660         0.582 – 0.731
17              0.938         0.903 – 0.969     0.638         0.560 – 0.723
18              0.947         0.916 – 0.974     0.610         0.532 – 0.688
19              0.951         0.920 – 0.978     0.589         0.511 – 0.674
20              0.969         0.943 – 0.991     0.553         0.468 – 0.638

Figure legends

Figure 1.
Flow-chart of systematic search methodology. n = number of studies; N = number of
childhood cancer survivors fulfilling criteria.

Figure 2.
Funnel plots for specificity (upper panel) and sensitivity (lower panel) relating study
size to reported diagnostic accuracy for the five studies listed in Table 1. The
chi-squared statistical test for funnel plot asymmetry did not reach statistical significance
(p = 0.32 for sensitivity; p = 0.17 for specificity), suggesting a lack of publication bias.

Figure 3.
Forest plot of 95% confidence limits for the log-adjusted diagnostic odds ratio for the
five studies listed in Table 1. The vertical dashed line denotes the line of no effect.
Visual inspection shows that each study is statistically significant in its own right, that
the intervals overlap to a great extent, and that therefore the studies are unlikely to be
heterogeneous.

Figure 4.
Receiver-operator characteristic (ROC) curve analysis of FSH as predictor of
azoospermia (combined cohort: n = 367). Area under the curve: 0.89 (95% CI 0.85 –
0.92). The optimal diagnostic threshold is 10.4 IU/L, with specificity 0.814 and
sensitivity 0.823.