Health Technology Assessment 2010; Vol. 14: No. 25

Avoiding and identifying errors in health technology assessment models: qualitative study and methodological review

J Chilcott, P Tappenden, A Rawdin, M Johnson, E Kaltenthaler, S Paisley, D Papaioannou and A Shippam

May 2010
DOI: 10.3310/hta14250

Health Technology Assessment
NIHR HTA programme
www.hta.ac.uk

Avoiding and identifying errors in health technology assessment models: qualitative study and methodological review

J Chilcott,* P Tappenden, A Rawdin, M Johnson, E Kaltenthaler, S Paisley, D Papaioannou and A Shippam

School of Health and Related Research (ScHARR), Regent Court, Sheffield, UK

*Corresponding author

Declared competing interests of authors: none

Published May 2010 DOI: 10.3310/hta14250

This report should be referenced as follows:

Chilcott JB, Tappenden P, Rawdin A, Johnson M, Kaltenthaler E, Paisley S, et al. Avoiding and identifying errors in health technology assessment models: qualitative study and methodological review. Health Technol Assess 2010;14(25).

Health Technology Assessment is indexed and abstracted in Index Medicus/MEDLINE, Excerpta Medica/EMBASE, Science Citation Index Expanded (SciSearch®) and Current Contents®/Clinical Medicine.

NIHR Health Technology Assessment programme

The Health Technology Assessment (HTA) programme, part of the National Institute for Health Research (NIHR), was set up in 1993. It produces high-quality research information on the effectiveness, costs and broader impact of health technologies for those who use, manage and provide care in the NHS. ‘Health technologies’ are broadly defined as all interventions used to promote health, prevent and treat disease, and improve rehabilitation and long-term care. The research findings from the HTA programme directly influence decision-making bodies such as the National Institute for Health and Clinical Excellence (NICE) and the National Screening Committee (NSC). HTA findings also help to improve the quality of clinical practice in the NHS indirectly in that they form a key component of the ‘National Knowledge Service’.

The HTA programme is needs led in that it fills gaps in the evidence needed by the NHS. There are three routes to the start of projects. First is the commissioned route. Suggestions for research are actively sought from people working in the NHS, from the public and consumer groups and from professional bodies such as royal colleges and NHS trusts. These suggestions are carefully prioritised by panels of independent experts (including NHS service users). The HTA programme then commissions the research by competitive tender. Second, the HTA programme provides grants for clinical trials for researchers who identify research questions. These are assessed for importance to patients and the NHS, and scientific rigour. Third, through its Technology Assessment Report (TAR) call-off contract, the HTA programme commissions bespoke reports, principally for NICE, but also for other policy-makers. TARs bring together evidence on the value of specific technologies.

Some HTA research projects, including TARs, may take only months, others need several years. They can cost from as little as £40,000 to over £1 million, and may involve synthesising existing evidence, undertaking a trial, or other research collecting new data to answer a research problem. The final reports from HTA projects are peer reviewed by a number of independent expert referees before publication in the widely read journal series Health Technology Assessment.

Criteria for inclusion in the HTA journal series

Reports are published in the HTA journal series if (1) they have resulted from work for the HTA programme, and (2) they are of a sufficiently high scientific quality as assessed by the referees and editors. Reviews in Health Technology Assessment are termed ‘systematic’ when the account of the search, appraisal and synthesis methods (to minimise biases and random errors) would, in theory, permit the replication of the review by others.

The research reported in this issue of the journal was commissioned by the HTA programme as project number 07/54/01. The contractual start date was in May 2007. The draft report began editorial review in March 2009 and was accepted for publication in November 2009. As the funder, by devising a commissioning brief, the HTA programme specified the research question and study design. The authors have been wholly responsible for all data collection, analysis and interpretation, and for writing up their work. The HTA editors and publisher have tried to ensure the accuracy of the authors’ report and would like to thank the referees for their constructive comments on the draft document. However, they do not accept liability for damages or losses arising from material published in this report.
The views expressed in this publication are those of the authors and not necessarily those of the HTA programme or the Department of Health.

Editor-in-Chief: Professor Tom Walley CBE
Series Editors: Dr Martin Ashton-Key, Dr Aileen Clarke, Professor Chris Hyde, Dr Tom Marshall, Dr John Powell, Dr Rob Riemsma and Professor Ken Stein
Editorial Contact: [email protected]

ISSN 1366-5278

© 2010 Queen’s Printer and Controller of HMSO

This journal is a member of and subscribes to the principles of the Committee on Publication Ethics (COPE) (http://www.publicationethics.org/). This journal may be freely reproduced for the purposes of private research and study and may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NETSCC, Health Technology Assessment, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Published by Prepress Projects Ltd, Perth, Scotland (www.prepress-projects.co.uk), on behalf of NETSCC, HTA. Printed on acid-free paper in the UK by Henry Ling Ltd, The Dorset Press, Dorchester.

Abstract

Avoiding and identifying errors in health technology assessment models: qualitative study and methodological review

J Chilcott,* P Tappenden, A Rawdin, M Johnson, E Kaltenthaler, S Paisley, D Papaioannou and A Shippam

School of Health and Related Research (ScHARR), Regent Court, Sheffield, UK

*Corresponding author

Background: Health policy decisions must be relevant, evidence-based and transparent. Decision-analytic modelling supports this process but its role is reliant on its credibility. Errors in mathematical decision models or simulation exercises are unavoidable but little attention has been paid to processes in model development. Numerous error avoidance/identification strategies could be adopted but it is difficult to evaluate the merits of strategies for improving the credibility of models without first developing an understanding of error types and causes.

Objectives: The study aims to describe the current comprehension of errors in the HTA modelling community and generate a taxonomy of model errors. Four primary objectives are to: (1) describe the current understanding of errors in HTA modelling; (2) understand current processes applied by the technology assessment community for avoiding errors in development, debugging and critically appraising models for errors; (3) use HTA modellers’ perceptions of model errors with the wider non-HTA literature to develop a taxonomy of model errors; and (4) explore potential methods and procedures to reduce the occurrence of errors in models. It also describes the model development process as perceived by practitioners working within the HTA community.

Data sources: A methodological review was undertaken using an iterative search methodology. Exploratory searches informed the scope of interviews; later searches focused on issues arising from the interviews. Searches were undertaken in February 2008 and January 2009. In-depth qualitative interviews were performed with 12 HTA modellers from academic and commercial modelling sectors.

Review methods: All qualitative data were analysed using the Framework approach. Descriptive and explanatory accounts were used to interrogate the data within and across themes and subthemes: organisation, roles and communication; the model development process; definition of error; types of model error; strategies for avoiding errors; strategies for identifying errors; and barriers and facilitators.

Results: There was no common language in the discussion of modelling errors and there was inconsistency in the perceived boundaries of what constitutes an error. Asked about the definition of model error, there was a tendency for interviewees to exclude matters of judgement from being errors and focus on ‘slips’ and ‘lapses’, but discussion of slips and lapses comprised less than 20% of the discussion on types of errors. Interviewees devoted 70% of the discussion to softer elements of the process of defining the decision question and conceptual modelling, mostly the realms of judgement, skills, experience and training. The original focus concerned model errors, but it may be more useful to refer to modelling risks. Several interviewees discussed concepts of validation and verification, with notable consistency in interpretation: verification meaning the process of ensuring that the computer model correctly implemented the intended model, whereas validation means the process of ensuring that a model is fit for purpose. Methodological literature on verification and validation of models makes reference to the Hermeneutic philosophical position, highlighting that the concept of model validation should not be externalised from the decision-makers and the decision-making process. Interviewees demonstrated examples of all major error types identified in the literature: errors in the description of the decision problem, in model structure, in use of evidence, in implementation of the model, in operation of the model, and in presentation and understanding of results. The HTA error classifications were compared against existing classifications of model errors in the literature. A range of techniques and processes are currently used to avoid errors in HTA models: engaging with clinical experts, clients and decision-makers to ensure mutual understanding, producing written documentation of the proposed model, explicit conceptual modelling, stepping through skeleton models with experts, ensuring transparency in reporting, adopting standard housekeeping techniques, and ensuring that those parties involved in the model development process have sufficient and relevant training. Clarity and mutual understanding were identified as key issues. However, their current implementation is not framed within an overall strategy for structuring complex problems.

Limitations: Some of the questioning may have biased interviewees’ responses but as all interviewees were represented in the analysis no rebalancing of the report was deemed necessary. A potential weakness of the literature review was its focus on spreadsheet and program development rather than specifically on model development. It should also be noted that the identified literature concerning programming errors was very narrow despite broad searches being undertaken.

Conclusions: Published definitions of overall model validity comprising conceptual model validation, verification of the computer model, and operational validity of the use of the model in addressing the real-world problem are consistent with the views expressed by the HTA community and are therefore recommended as the basis for further discussions of model credibility. Such discussions should focus on risks, including errors of implementation, errors in matters of judgement and violations. Discussions of modelling risks should reflect the potentially complex network of cognitive breakdowns that lead to errors in models and existing research on the cognitive basis of human error should be included in an examination of modelling errors. There is a need to develop a better understanding of the skills requirements for the development, operation and use of HTA models. Interaction between modeller and client in developing mutual understanding of a model establishes that model’s significance and its warranty. This highlights that model credibility is the central concern of decision-makers using models so it is crucial that the concept of model validation should not be externalised from the decision-makers and the decision-making process. Recommendations for future research would be studies of verification and validation; the model development process; and identification of modifications to the modelling process with the aim of preventing the occurrence of errors and improving the identification of errors in models.


Contents

List of abbreviations  vii

Executive summary  ix

1 Introduction  1
   Background  1
   Aim and objectives  2

2 Methods  3
   Methods overview  3
   In-depth interviews with HTA modellers  3
   Literature review search methods  5
   Validity checking  6
   The organisation of the interview sample  6

3 The model development process  7
   Overview  7
   Understanding the decision problem  7
   Conceptual modelling  9
   Use of information in model development  11
   Implementation model  13
   Model checking  14
   Reporting  15
   Synthesised model development process  15
   Discussion of model development process  17

4 Definition of an error  19
   Overview  19
   Intention, deliberation and planning  19
   Extent to which the model is fit for purpose  19
   Simplification versus error  21
   Impact of error on the model results and subsequent decision-making  21
   Placing the interview findings in the context of literature on model verification and validation  22

5 A taxonomy of errors  25
   Introduction  25
   Error in understanding the problem  26
   Error in the structure and methodology used  30
   Error in the use of evidence  33
   Error in implementation of the model  36
   Errors in operation of the model and model reporting  38
   Review of taxonomy/classification literature  39
   Placing the interview taxonomy of errors in the context of literature on error classifications  44
   Conclusion  47

6 Strategies for avoiding errors  49
   Overview  49
   Mutual understanding  49
   Model complexity  53
   Housekeeping  55
   Guidelines  57
   Skills and training  58
   Explanatory analysis concerning methods for preventing errors  60
   Placing the findings of the interviews in the context of the literature on avoiding errors  62

7 Strategies for identifying errors  63
   Overview  63
   Check face validity with experts  63
   Do the results appear reasonable?  63
   Testing model behaviour  64
   Is the model able to reproduce its inputs?  66
   Can the model replicate other data not used in its construction?  66
   Compare answers with the answers generated by alternative models  67
   Peer review of models  68
   Double checking of input values  69
   Double-programming  69
   Explanatory analysis concerning methods for preventing errors  71
   Placing the findings of the interviews in the context of the literature on identifying model error  71

8 Barriers and facilitators  73
   Barriers  73
   Facilitators  74

9 Discussion and conclusions  75
   Strengths and weaknesses of the study  75
   Current understanding of modelling errors in the HTA community  76
   A taxonomy of HTA modelling errors  77
   Current strategies for avoiding errors  77
   Current strategies for identifying errors in HTA models  78
   Recommendations  78
   Research recommendations  79

Acknowledgements  81

References  83

Appendix 1 Avoiding and identifying errors in health technology assessment models  87

Appendix 2 Search strategies  91

Appendix 3 Charts  95

Health Technology Assessment reports published to date  109

Health Technology Assessment programme  131


List of abbreviations

EuSPRIG   European Spreadsheet Risks Interest Group
HAQ       Health Assessment Questionnaire
HTA       Health Technology Assessment
ICER      incremental cost-effectiveness ratio
NCCHTA    National Coordinating Centre for Health Technology Assessment
NICE      National Institute for Health and Clinical Excellence
PBAC      Australia’s Pharmaceutical Benefit Advisory Committee
PSA       probabilistic sensitivity analysis
QALY      quality-adjusted life-year
RFP       Request For Proposal
SCS       Society for Modelling and Simulation International

All abbreviations that have been used in this report are listed here unless the abbreviation is well-known (e.g. NHS), or it has been used only once, or it is a non-standard abbreviation used only in figures/tables/appendices, in which case the abbreviation is defined in the figure legend or in the notes at the end of the table.




Executive summary

Background

The National Institute for Health and Clinical Excellence (NICE) in England and Wales and similar structures elsewhere are required to make health policy decisions that are relevant, evidence-based and transparent. Decision-analytic modelling is well placed to support this process. The key role that models play is, however, reliant on their credibility. Credibility in models depends on a range of factors including the coherence of the model with the beliefs and attitudes of the decision-makers, the decision-making framework within which the model is used, the validity of the model in being an adequate representation of the problem in hand and the quality of the model. A recent study investigating the quality of models used to support national policy-making in Australia reported that 203 of 247 models reviewed were considered by the investigators to be flawed in some respect.

Errors in mathematical decision models or simulation exercises are a natural and unavoidable part of the software development process. However, little attention has been paid to the processes involved in model development. Good practice guidance either acknowledges the absence of methodological and procedural guidance on model development and testing or makes no reference to the issue. Numerous error avoidance/identification strategies could be adopted, potentially impacting upon the whole range of disciplines involved in the decision support process, including information specialists, health economists, statisticians, systematic reviewers and operational research modellers. However, it is difficult to evaluate the merits of strategies for improving the credibility of models without first developing an understanding of error types and causes. This study seeks to understand the nature of errors within HTA models, to describe current processes for minimising the occurrence of such errors and to develop a first classification of errors to aid discussion of potential strategies for avoiding and identifying errors.

Aim and objectives

The study aims to describe the current comprehension of errors in the HTA modelling community and to generate a taxonomy of model errors to facilitate discussion and research within the HTA modelling community, on strategies for reducing errors and improving the robustness of modelling for HTA decision support.

The study has four primary objectives:

1. to describe the current understanding of errors in HTA modelling, focusing specifically on:
   i. types of errors
   ii. how errors are made
2. to understand current processes applied by the technology assessment community for avoiding errors in the development, debugging models and critically appraising models for errors
3. to use HTA modellers’ perceptions of model errors together with the wider non-HTA literature to develop a taxonomy of model errors
4. to explore potential methods and procedures to reduce the occurrence of errors in models.

In addition, the study describes the model development process as perceived by practitioners working within the HTA community; this emerged as an intermediate objective for considering the occurrence of errors in models.

Methods

The study involved two parallel methodological strands. The first strand involved a methodological review of literature discussing model errors. The second strand comprised in-depth qualitative interviews with 12 HTA modellers, including representatives from academic and commercial modelling sectors. All qualitative data were analysed using the Framework approach. Descriptive and explanatory accounts were used to interrogate the data within and across themes and subthemes.



The themes identified within the analysis are:

• organisation, roles and communication
• the model development process
• definition of error
• types of model error
• strategies for avoiding errors
• strategies for identifying errors
• barriers and facilitators.

Results

Current understanding of modelling errors in the HTA community

There is a general consensus that an important part of the definition of what constitutes an error is its impact on decision-making. This indicates that a pragmatic approach is generally taken across the HTA modelling community. Despite this implied common outlook, there was no common language used in the discussion of modelling errors and inconsistency in the perceived boundaries of what constitutes an error. When asked explicitly about the definition of model error, there was a tendency for interviewees to exclude matters of judgement from being errors and focus on ‘slips’ and ‘lapses’. However, discussion of slips and lapses comprised less than 20% of the discussion on types of errors. When considering how individual elements of the modelling process might contribute to flaws in decision-making, interviewees devoted 70% of the discussion to softer elements of the process of defining the decision question and conceptual modelling, mostly the realms of judgement, skills, experience and training.

Although the original focus of the study concerned model errors, when considering methods of improving modelling practice it may be more useful to refer to modelling risks rather than the more black and white term modelling errors. Several interviewees discussed concepts of validation and verification, with notable consistency in interpretation. Verification was taken to mean the process of ensuring that the computer model correctly implemented the intended model, whereas validation meant the process of ensuring that a model is fit for purpose (hence subsuming verification). The qualitative analysis highlights considerable variation in modelling practice across the HTA modelling community, particularly in terms of the demonstration of explicit conceptual modelling before implementation. Methodological literature suggests that overall validity comprises conceptual model validity, verification of the computer model, and operational validity of the use of the model in addressing the real-world problem. In the absence of explicit conceptual modelling, the concept of overall model validity breaks down.

The methodological literature on verification and validation of models makes reference to the Hermeneutic philosophical position that recognises that objectivity is unachievable and suggests that meaning is created through intersubjective communication. The literature proposes that it is the interaction between the modeller and client in developing mutual understanding of a model that establishes a model’s significance and its warranty. This position highlights that model credibility is the central concern to decision-makers in using models as an input to the decision-making process. This highlights the point that the concept of model validation should not be externalised from the decision-makers and the decision-making process.

A taxonomy of HTA modelling errors

Interviewees collectively demonstrated examples of all major error types identified in the literature on errors in end-user developer spreadsheet systems. Broad error domains include: (1) errors in the description of the decision problem; (2) errors in model structure; (3) errors in the use of evidence; (4) errors in the implementation of the model; (5) errors in the operation of the model; and (6) errors in the presentation and understanding of results. Each error domain contains a breakdown of error types and their potential root causes. The HTA errors classifications were compared against existing classifications of model errors identified within the literature.

Current strategies for avoiding errors

The qualitative analysis suggests that a range of techniques and procedures are currently used to avoid errors in HTA models. Importantly, there is some overlap between methods used to identify errors and methods used to avoid errors in models. Strategies for error avoidance are loosely defined as either processes or techniques; the former relate to issues in the model development process, whereas the latter relate to techniques of implementation. Generally, the ‘techniques’ are explicit and can be interpreted as relating to how something should be done, for example, implementing a specific model layout.

Conversely, the ‘processes’ recognise an unfulfilled requirement and acknowledge that something should be done as part of the model development process, yet in many cases this is not accompanied by a clear strategy for achieving the required goal. Current methods for avoiding errors include: engaging with clinical experts, clients and decision-makers to ensure mutual understanding, producing written documentation of the proposed model, explicit conceptual modelling, e.g. using diagrams and sketches, stepping through skeleton models with experts, ensuring transparency in reporting, adopting standard housekeeping techniques, and ensuring that those parties involved in the model development process have sufficient and relevant training. Clarity and mutual understanding were identified as key issues. Current strategies supporting these aspects of model development are expressed as process requirements, for example, establishment of long-term clinical input and iterative negotiation with the decision-maker or client may be used to avoid errors in the description of the decision problem. Although a number of techniques were suggested by the interviewees, for instance, sketching out clinical pathways, their use appears to be partial, and the extent of their use appears to vary considerably between individual modellers. Despite an acknowledgement of the importance of these methods, their current implementation is not framed within an overall strategy for structuring complex problems.

Current strategies for identifying errors in HTA models

Methods for identifying errors in HTA models include checking face validity, assessing whether model results appear reasonable, black-box testing strategies, testing internal consistency and predictive validity, checking model input values, double-programming and peer review. These strategies largely relate to specific techniques (rather than processes) that may be applied by third-party scrutiny. However, the specific target of the techniques, i.e. the types of error that the technique is intended to identify, is not always clear. The majority of methods may be used to identify symptoms of errors; however, the root cause may be entirely unclear. This represents a considerable challenge in the peer review of models. The same may be true of certain black-box validation techniques; only the tests of the underlying logic of the model guarantee the presence of an error. Those error identification methods which do map directly to specific aspects of the taxonomy are diagnostic in nature; mismatches in model results and expectations are indicative of the presence of model error. Those aspects which map to any or all points in the taxonomy are effectively non-specific model screening methods; the presence of differences between models and prior expectations are not necessarily the result of a model error.
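To make two of these techniques concrete, the sketch below applies them to a deliberately minimal two-state cohort model: checking that a model can reproduce its inputs, and double-programming against an independently derived result. It is an illustrative sketch only; the model, parameter values and tolerances are hypothetical rather than drawn from the study.

    # Illustrative sketch (hypothetical model and values): two of the
    # error-identification techniques named above applied to a minimal
    # two-state (alive/dead) cohort model.

    p_die = 0.05          # assumed annual probability of death
    cycles = 40

    cohort_alive = 1.0    # whole cohort starts alive
    life_years = 0.0
    trace = []
    for _ in range(cycles):
        life_years += cohort_alive        # life-years accrued this cycle
        cohort_alive *= 1.0 - p_die       # apply the annual transition
        trace.append(cohort_alive)

    # Technique 1 - is the model able to reproduce its inputs? The mortality
    # implied by successive points of the trace should equal the input value.
    implied_p_die = 1.0 - trace[1] / trace[0]
    assert abs(implied_p_die - p_die) < 1e-12, "model does not reproduce its input"

    # Technique 2 - double-programming in miniature: an independently derived
    # closed-form result (geometric series) should match the simulated total.
    closed_form = (1.0 - (1.0 - p_die) ** cycles) / p_die
    assert abs(life_years - closed_form) < 1e-9, "independent calculations disagree"

    print(f"undiscounted life-years per person: {life_years:.4f}")

Consistent with the point above, a failed assertion here only signals a symptom; locating the root cause still requires inspection of the model itself.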
Recommendations

• Published definitions of overall model validity comprising conceptual model validation, verification of the computer model, and operational validity of the use of the model in addressing the real-world problem are consistent with the views expressed by the HTA community and are therefore recommended as the basis for further discussions of model credibility.
• Discussions of model credibility should focus on risks, including errors of implementation, errors in matters of judgement and violations – violations being defined as puffery, fraud or breakdowns in operational procedures.
• Discussions of modelling risks should reflect the potentially complex network of cognitive breakdowns that lead to errors in models and subsequent failures in decision support. Existing research concerning the cognitive basis of human error should be brought into the examination of modelling errors.
• There is a need to develop a better understanding of the skills requirements for the development, operation and use of HTA models.
• The qualitative interviews highlighted a number of barriers to model checking. However, it was indicated that increasing time and resources would not necessarily improve model checking activities without a matched increase in their prioritisation.
• The authors take the view, as supported within the methods literature, that it is the interaction between the modeller and client in developing mutual understanding of a model that establishes a model’s significance and its warranty. This position highlights that model credibility is the central concern to decision-makers in using models. It is crucial then that the concept of model validation should not be externalised from the decision-makers and the decision-making process.



Research recommendations

Verification and validation

Most modellers instinctively take a pragmatic approach to developing model credibility. Further research on the theory of model verification and validation is required to provide a solid foundation for (1) the model development process and (2) processes for making evidence-based policy and guidance.

Model development process

Further research is required in the model development process. Two specific areas were identified:

• techniques and processes for structuring complex HTA models, developing mutual understanding and identifying conflicting perceptions between stakeholders in the decision problem
• development of the model design process and mechanisms for reporting and specifying models.

Errors research

There is little evidence to suggest that models are improving in reliability. Further research is required to define, implement and evaluate modifications to the modelling process with the aim of preventing the occurrence of errors and improving the identification of errors in models. Mechanisms for using National Institute for Health Research-funded model developments to facilitate this research could be pursued, for example, by providing research funding for the specification and evaluation of enhanced modelling methods within National Institute for Health Research-funded studies.


Chapter 1 Introduction

Background

The National Institute for Health and Clinical Excellence (NICE) in England and Wales and similar structures elsewhere are required to make health-policy decisions that are relevant, evidence-based and transparent. Decision-analytic modelling is recognised as being well placed to support this process.1 The key role that models play is, however, reliant on their credibility. Credibility in decision models depends on a range of factors including the coherence of the model with the beliefs and attitudes of the decision-makers, the decision-making framework within which the model is used, the validity of the model as an adequate representation of the problem in hand and the quality of the model.

The predominant software platform for Health Technology Assessment (HTA) models is the spreadsheet. In studies of generic spreadsheet system development, there is abundant evidence of the ubiquity of errors and some evidence of their potential to impact critically on decision-making. The European Spreadsheet Risks Interest Group (EuSPRIG) maintains a website recording research on risks in spreadsheet development and on public reports of errors (www.eusprig.org/ accessed 20 February 2009). In 1993, Cragg and King2 undertook an investigation of spreadsheet development practices in 10 organisations and found errors in 25% of the spreadsheets considered. Since 1998, Panko3 has maintained a review of spreadsheet errors and this suggests that spreadsheet error rates are consistent with error rates in other programming platforms, and when last updated in 2008 error rates had not shown any marked improvement.

To date, there has been no formal study of the occurrence of errors in models within the HTA domain. However, high-impact errors have been recorded, perhaps the earliest being the case of the sixth stool guaiac test, where Neuhauser and Lewicki4 estimated an incremental cost of $47 million per case of cancer detected. Brown and Burrows5 subsequently identified an error in the model and generated a comparatively modest corrected estimate. The grossly inflated incorrect figure has, however, subsequently been used (and is still in use) by authors including Culyer6 and Drummond et al.7,8 in their seminal texts to demonstrate the importance of incremental analysis in health economics. The failure in this case was traced back to an error in the interpretation and subsequent modelling of diagnostic test characteristics.
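The arithmetic behind this incremental-versus-average distinction is easy to reproduce. The sketch below uses purely hypothetical numbers (not Neuhauser and Lewicki’s data) to show how the average cost per case detected across a six-test protocol can remain modest while the incremental cost of the sixth test becomes enormous.

    # Hypothetical illustration of the sixth-test arithmetic: average versus
    # incremental cost per case detected. All numbers are invented for
    # clarity and are not taken from Neuhauser and Lewicki's analysis.

    population = 10_000      # assumed screened cohort
    prevalence = 0.002       # assumed disease prevalence
    sensitivity = 0.92       # assumed chance each test detects a remaining case
    cost_per_test = 4.0      # assumed cost ($) of one test per person

    undetected = population * prevalence
    total_cost = total_detected = 0.0

    for test in range(1, 7):
        newly_detected = undetected * sensitivity
        undetected -= newly_detected
        round_cost = population * cost_per_test   # simplification: all retested
        total_cost += round_cost
        total_detected += newly_detected
        print(f"test {test}: average ${total_cost / total_detected:,.0f}/case, "
              f"incremental ${round_cost / newly_detected:,.0f}/case")

By the sixth round almost no undetected cases remain, so the incremental ratio explodes even though the protocol-wide average still looks acceptable; this is the distinction the $47 million figure is used to illustrate.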
In a study in 2000 investigating the quality of models used to support a national policy-making process, Australia’s Pharmaceutical Benefit Advisory Committee (PBAC) reported that around 60% were flawed in some way and 30% demonstrated problems in the modelling.9 Although it is tempting to think that things may get better over time, a recent repeat retrospective analysis reported in 2008 demonstrated that in fact the situation had appeared to deteriorate, with 82% (203 out of 247) of models reviewed being considered by PBAC to be flawed in some respect.10

Errors in software implementing mathematical decision models or simulation exercises are a natural and unavoidable part of the software development process.11 This is recognised outside the HTA domain and accordingly there exists research, for example in the computer sciences field and in the operational research field, in model testing and verification. In contrast, the modelling research agenda within HTA has developed primarily from a health-services research, health economics and statistics perspective and there has, to date, been very little attention paid to the processes involved in model development.12 The extent of this shortcoming is reflected in the fact that guidance on good practice either acknowledges the absence of methodological and procedural guidance on model development and testing processes, or makes no reference to the issue.13–17

A wide range of strategies could be adopted with the intention of avoiding and identifying errors in models. These include improving the skills of practitioners, improving modelling and decision-making processes, improving modelling methods, improving programming practice and improving software platforms.



Within each of these categories there are many further options, so improving the skill base of practitioners might include: the development and dissemination of good practice guides, identification of key skills and redesign of training and education, sponsoring of skills workshops, and determination of minimum training/skill requirements. These interventions again might impact across the whole range of disciplines involved in the decision support process, including information specialists, health economists, statisticians, systematic reviewers and, not to forget, modellers themselves. Outside the HTA domain, Panko identifies a wide range of initiatives for best practice in spreadsheet development, testing and inspection, including such things as guidelines for housekeeping structures in spreadsheets.18 However, Panko also goes on to review the evidence on the effectiveness of these interventions and identifies first a lack of high-quality research on effectiveness and second that what evidence does exist is at best equivocal.18 This, taken together with the evidence that models are not improving over time, indicates that caution should be exercised in recommending quick-fix, intuitively appealing solutions.

In the absence of an understanding of error types and the causes of errors, it is difficult to evaluate the relative merits of such initiatives and to develop an efficient and effective strategy for improving the credibility of models. This study seeks to understand the nature of errors within HTA models, to describe current processes for minimising the occurrence of such errors and to develop a first classification of errors to aid discussion of potential strategies for avoiding and identifying errors.

Aim and objectives

The aim of this study is to describe the current comprehension of errors in the HTA modelling community and to generate a taxonomy of model errors, to facilitate discussion and research within the HTA modelling community on strategies for reducing errors and improving the robustness of modelling for HTA decision support.

The study has four primary objectives:

1. to describe the current understanding of errors in HTA modelling, focusing specifically on
   i. types of errors
   ii. how errors are made
2. to understand current processes applied by the technology assessment community for avoiding errors in development, debugging models and critically appraising models for errors
3. to use HTA modellers’ perceptions of model errors together with the wider non-HTA literature to develop a taxonomy of model errors
4. to explore potential methods and procedures to reduce the occurrence of errors in models.

In addition, the study describes the model development process as perceived by practitioners working within the HTA community; this emerged as an intermediate objective for considering the occurrence of errors in models.


Chapter 2 Methods

Methods overview

The project involved two methodological strands:

1. A methodological review of the literature discussing errors in modelling and principally focusing on the fields of modelling and computer science.
2. In-depth qualitative interviews with the HTA modelling community, including:
   – Academic Technology Assessment Groups involved in supporting the NICE Technology Appraisal Programme (hereafter referred to as Assessment Groups)
   – Outcomes Research and Consultancy Groups involved in making submissions to NICE on behalf of the health-care industry (hereafter referred to as Outcomes Research organisations).

In-depth interviews with HTA modellers

Overview of qualitative methods

Interviews with 12 mathematical modellers working within the field of HTA modelling were undertaken to obtain a description of current model development practices, and to develop an understanding of model errors and strategies for their avoidance and identification. From these descriptions, issues of error identification and prevention were explored.19

Sampling

The interview sample was purposive, comprising one HTA modeller from each Assessment Group contracted by the National Coordinating Centre for Health Technology Assessment (NCCHTA) to support NICE’s Technology Appraisal Programme, as well as HTA modellers working for UK-based Outcomes Research organisations. The intention of the sampling frame was to identify diversity across modelling units. Characteristics of the interview sample are detailed at the end of this chapter.

Qualitative data collection

Face-to-face in-depth interviews19 were used as the method of data collection; this is particularly appropriate given their focus on the individual and the need to elicit a detailed personal understanding of the views, perceptions, preferences and experiences of each interviewee. A topic guide was developed by the research team that enabled the elicitation of demographic information, as well as views and experiences of the modelling process and issues around modelling error (see Appendix 1). Interviews began with a discussion of background details and progressed to a more in-depth exploration of modellers’ experience and knowledge. The topic guide was piloted internally with one of the authors (AR) to ascertain clarity of the questions; the topic guide was subsequently revised. The topic guide was designed to facilitate the flow of the interviews, although the interviews were intentionally flexible and participant-focused. During the interviews, interviewees were asked about their personal views rather than acting as representatives of their organisation.

During each interview the participant was asked to describe their view of the model development process. This description was sketched onto paper by a second researcher and subsequently reviewed with the participant who was asked to confirm the sketch or ‘map’ as a representative and accurate record of their view. This diagrammatic personal description of the model development process was instrumental within the interviews, representing a personalised map against which to locate model errors according to the perceptions of the interviewee, thereby guiding the content and agenda of the remainder of the interview.20 ‘Prompting’ was used to ‘map’ and ‘mine’ interviewee responses while ‘probing’ questions were used to further elaborate responses and to provide richness of depth in the interviewees’ responses.

All interviews were undertaken between September and October 2008 by three members of the research team (PT, JC and AR).



All three interviewers have experience in developing and using health economic models. One of the interviewers (PT) received formal training in in-depth interviewing techniques and analysis of qualitative data, this training being shared with the modelling team working with the advice of a qualitative reviewer (MJ). All but two interviews were paired, involving a lead interviewer and a second interviewer. The role of the second interviewer was to facilitate the elicitation of the model development process by developing charts and to ensure relevant issues were explored fully by the lead interviewer. The remaining two interviews were undertaken on a one-to-one basis. Each interview lasted approximately 1½ hours.

Data analysis and synthesis

Interviews with participants were recorded and transcribed verbatim; all qualitative data were analysed using the Framework approach.21 Framework is an inductive thematic framework approach particularly suited to the analysis of in-depth qualitative interview data, involving a continuous and iterative matrix-based approach to the ordering and synthesising of qualitative data. The first step in the analysis involved familiarisation with the interview data, leading to the identification of an initial set of emergent themes (e.g. definition of model error), subthemes (e.g. fitness for purpose) and concepts. Transcript data were then indexed according to these themes, both by hand and using NVivo® software (QSR International, Southport, UK). Full matrices were produced for each theme, detailing data from each interviewee across each of the subthemes. Data within each subtheme were then categorised, ensuring that the original language of the interviewee was retained, and classified to produce a set of dimensions that was capable of both describing the range of interview data and discriminating between responses. Discussions within the team were carried out during coding and categorisation to obtain consensus before the next stage. Descriptive and explanatory accounts were used to interrogate the data within and across themes and subthemes. The key stages of analysis and synthesis are described in more detail below.

The coding scheme

The coding scheme covers seven main themes together with a range of subthemes. This coding system was initially based on the topic guides used within the interviews. Upon further interrogation of the interview transcript data, a revised coding scheme was developed through discussion between the three interviewers. Table 1 presents a brief description of each theme together with a brief description of its content.

The first theme (Organisation, roles and communication) includes mainly demographic information that is used to describe the sample of participants and compare data between groups. Although this is not central to the qualitative synthesis, it does provide relevant information that helps to interpret and explain variations in stated views between respondents.

The second theme (see Chapter 3) relates to data from the model development process maps and interview transcript data. A meeting was held with all the authors to analyse and synthesise evidence from the process maps. This meeting assisted the analysis by providing an overview of the data and informing decisions regarding the subsequent qualitative process.

TABLE 1 Description of themes for the qualitative analysis

Theme | Description of theme | Chapter
1. Organisation, roles and communication | Background of the interviewee, roles and experience, use of specific software platforms, working arrangements with other researchers | 2
2. The model development process | Key sets of activities and processes in the development of health economic models | 3
3. Definition of error | The interviewee’s perception of what does and what does not constitute a model error | 4
4. Types of model error | Specific types of model error discussed within the interviews | 5
5. Strategies for avoiding errors | Approaches to avoiding errors within models | 6
6. Strategies for identifying errors | Approaches to identifying errors within models | 7
7. Barriers and facilitators | Potential interventions to avoid and identify errors within models and constraints on their use | 8

The analysis of the model development charts unearthed a potential typology of modelling processes followed by practitioners. Generalities could be made to some extent around particular steps taken during the modelling process, and so these were explicitly drawn up on a separate generic map/list during discussions between the authors. In addition, a comprehensive descriptive analysis of the perceived model development process was undertaken using the interview transcript data. Finally, a stylised model development process was developed; this model attempts to both capture and explain key stages in the model development process as well as important iterations between stages, based on the perceived processes of individual interviewees. This was used to explain some of the nuances in the data and variations in processes between respondents.

The third theme (see Chapter 4) was analysed using the Framework approach. Key dimensions of what is perceived to be and what is not perceived to be an error were drawn out from the interview transcript material. Literature concerning the verification and validation of models from outside the HTA field was used to facilitate the interpretation of qualitative evidence on the characteristics of model errors.

The fourth theme (see Chapter 5) involved a descriptive analysis of interview data according to the initial framework analysis coding scheme: error in understanding the problem; error in the structure and methodology used; error in the use of evidence; error in implementation of the model; error in operation of the model; and error in presentation and understanding of results. As with other emergent themes within the interview data, the descriptive analysis was developed through a process of categorisation and classification of comments that made direct or indirect reference to error types.21 The structure of the taxonomy emerged from the descriptive analysis of interview data. The taxonomy identifies three levels:

• the error domain, that is the part of the model development process in which the error occurs
• the error description, illustrating the error in relation to the domain
• the root cause error, which attempts to identify root causes of errors and to draw out common types of error across the modelling process.

Non-HTA literature concerning modelling error classifications and taxonomies was identified from searches and used to facilitate the interpretation of the qualitative data.
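The three-level structure just described can be represented as a simple data structure. The sketch below is a hypothetical encoding; the example entries are invented for illustration and are not items from the study’s taxonomy. Its point is that a shared root-cause level allows common causes to be tallied across domains.

    # Hypothetical encoding of the three-level taxonomy described above:
    # error domain -> error description -> root cause. The example entries
    # are invented; they are not taken from the study's taxonomy.
    from collections import Counter
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TaxonomyEntry:
        domain: str        # part of the development process where the error occurs
        description: str   # the error in relation to that domain
        root_cause: str    # underlying cause, comparable across domains

    entries = [
        TaxonomyEntry("use of evidence", "hazard ratio transcribed incorrectly", "slip"),
        TaxonomyEntry("implementation of the model", "cell range omits the final row", "slip"),
        TaxonomyEntry("understanding the problem", "wrong comparator assumed",
                      "breakdown in communication"),
    ]

    # The shared root-cause level lets common causes be tallied across domains.
    print(Counter(entry.root_cause for entry in entries))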
The fifth and sixth themes (see Chapters 6 and 7) present an analysis of techniques and processes for avoiding and identifying errors in HTA models, as discussed by interviewees. Within both chapters, a descriptive analysis is presented together with an explanatory analysis, which relates current error identification and avoidance methods to the classification of model errors. The seventh theme (see Chapter 8) briefly presents a descriptive analysis of interviewees’ views on barriers and facilitators to error checking activities.

Literature review search methods

At the outset the scope and fields of literature relevant to the study were not known. The purpose of the searches was twofold: first, exploratory searches in advance of the interviews were undertaken to inform the development of the scope and content of the in-depth interviews; second, searches after the interviews were undertaken to expand the discussion on key issues raised by the interview analysis and to place the interview data in the context of the broader literature.

An iterative approach was taken to searching the literature. This is recognised as being the most appropriate approach to searching, in the case of reviews of methodological topics, where the scope of relevant evidence is not known in advance.22 Based on information-seeking behaviour models,23 the exploratory search strategy used techniques including author searching, citation searching, reference checking and browsing.

The scoping search was undertaken to establish the volume and type of literature relating to modelling errors within the computer science and water resources fields, both identified as potentially rich fields. Highly specific and focused searches were carried out in Computer and Information Systems Abstracts and Water Resources Abstracts.

The interviews and scoping search of the literature identified the need for further reviews including: first, a topic review of error avoidance and identification; second, a review of model verification and validation; and third, a review of error classifications and taxonomies. The searches were undertaken in February 2008 and January 2009. The search strategies and sources consulted are detailed in Appendix 2.


Validity checking

The report was subjected to internal peer review by a qualitative researcher, a mixed methodologist, modellers not involved in the interviews and a systematic reviewer. The final draft report was distributed to all interviewees to obtain their feedback on whether the findings are representative of their views, hence providing respondent validation.24

The organisation of the interview sample

This section briefly details the background, expertise, experience and working arrangements of the 12 individuals included in the interview sample.

Organisation

Of the 12 respondents, four work primarily for industry and eight are employed to produce reports for NICE and are university-based. Of those interviewees working within Assessment Groups, two also discussed work undertaken for commercial clients. The respondents reported a variety of work types including economic evaluations, as well as primary and secondary research. Commercial work included health outcomes and pricing and reimbursement work as well as cost-effectiveness modelling. The size of the organisations varied, with a range of 12 to 180 staff employed. Three interviewees discussed working internationally.

Composition of the research teams

The respondents reported a variety of working arrangements; to demonstrate the range of organisational arrangements, these included:

• a unit with 30 health economists with close links to another unit with 60–70 health-service researchers
• a small team of two dedicated modellers with links to a team who undertake systematic reviews
• a unit including operational researchers, health economists, health-service researchers and literature reviewers with direct access to clinical experts
• a large international organisation covering many different disciplines
• a team of 14–15 health economists with no mention of systematic reviewers.

Background of the interviewees

Five of the 12 respondents came from an economics or health economics background, and two came from a mathematical background, one of whom previously worked as an actuary. Five respondents have an operational research or modelling background. All interviewees included in the sampling frame have considerable experience developing health economic models and were purposefully selected on this basis.

Platforms and types of models

All respondents apart from one mentioned Microsoft Excel® as one of the preferred modelling software platforms, although two mentioned potential limitations. TreeAge was frequently mentioned and Simul8 was mentioned by two respondents. Other software programs and applications discussed by interviewees included Crystal Ball, Stata, WinBUGS, Delphi, Visual Pascal and SPSS. Ten of the twelve respondents mentioned the use of Markov models. Also reported were microsimulation/discrete event simulation, decision analysis and decision tree models.
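To make the dominant model type concrete, a minimal Markov cohort model is sketched below. The sketch is purely illustrative and is not drawn from any interview: it is written in Python rather than the platforms respondents actually cited, and the three-state structure, transition probabilities, costs and utilities are all invented for exposition.

    import numpy as np

    # Illustrative three-state Markov cohort model (well/sick/dead).
    # All numbers are invented; respondents built such models in
    # Excel or TreeAge rather than Python.
    P = np.array([[0.90, 0.08, 0.02],   # well -> well, sick, dead
                  [0.00, 0.85, 0.15],   # sick -> well, sick, dead
                  [0.00, 0.00, 1.00]])  # dead is absorbing

    cost = np.array([100.0, 2500.0, 0.0])   # cost per state per annual cycle
    utility = np.array([0.95, 0.60, 0.0])   # QALY weight per state
    discount = 1 / 1.035                    # 3.5% annual discounting

    state = np.array([1.0, 0.0, 0.0])       # whole cohort starts 'well'
    total_cost = total_qalys = 0.0
    for cycle in range(40):                 # 40 annual cycles
        total_cost += (state @ cost) * discount**cycle
        total_qalys += (state @ utility) * discount**cycle
        state = state @ P                   # advance the cohort one cycle

    print(f"Discounted cost per patient:  £{total_cost:,.0f}")
    print(f"Discounted QALYs per patient: {total_qalys:.2f}")

The same cohort trace, transition matrix and discounting logic map directly onto the spreadsheet implementations described by respondents, where each cycle typically occupies one row.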


Chapter 3 The model development process

Overview

This chapter presents a descriptive analysis of individual respondents’ perceptions of the model development process. The purpose of eliciting information concerning current model development processes within the interviews was twofold. First, it allows for the development of an understanding of the similarities and variations in modelling practice between modellers that might impact on the introduction of certain types of model error. Second, the elicitation of each interviewee’s personalised view of the process was used to form the structure of the remaining portion of the interview, with the model development charts therefore forming a map against which specific types of model error could be identified and discussed.

The descriptive analysis was undertaken according to the coding scheme, and examined six key stages of the model development process: (1) understanding the decision problem; (2) use of information; (3) implicit/explicit conceptual modelling; (4) software model implementation; (5) model checking activities; and (6) model reporting. It should be noted that these six stages were not specified a priori, but rather they emerged from the qualitative synthesis process. A seventh theme concerning iterations between stages of model development was also analysed. The descriptive synthesis of the model development process is presented together with an inferred stylised model development process describing both the range and diversity of responses.

Understanding the decision problem

For all 12 interview participants, the first stage in the model development process involved developing an understanding of the decision problem. Broadly speaking, this phase involved a delicate balancing/negotiation process between the modeller’s perceived understanding of the problem, the clinical perspective of the decision problem, and the clients’ and decision-makers’ needs. This phase of model development may draw on evidence, including published research and clinical judgement, and experience within the research team. In some instances the research question was perceived to follow directly from the received decision problem, whereas other respondents expressed a perception that the research question was open to clarification and negotiation with the client/decision-maker. In this sense the research question represents an intellectual leap from the perception of the decision problem. One respondent noted that this aspect of model development may not be a discrete step which is finalised before embarking on research:

The whole process starts at the point where we are thinking about a project but it will continue once a project has actually started, almost until the final day when the model is finalised that process will continue…we will not finalise the structure of the model and the care pathway perhaps until quite close to the end of the whole project because we will be constantly asking for advice about whether or not we have it correct or at least as good as we can get it. [I8]

Similarly, another respondent highlighted that the decision problem may not be fixed even when the model has been implemented, indicating that modelling may have a role in exploring the decision problem space. Hence, for instance, modelling can answer questions such as ‘Is technology A economically attractive compared to technology B?’ or ‘Under what circumstances is A economically attractive compared to B?’. These are fundamentally different decision problems; however, decision-makers may change their focus throughout the course of an analysis, dependent on the outcomes of the analysis.

…commonly at the analysis stage, you have the opportunity or there’s the possibility of changing the question that you’re asking. So, if the results of the model are in any way unexpected or in fact, if they’re in any way insightful, then you might want to refine or amend the outcome or the conclusion you’re going to try and support. So if we start off with an analysis of a clinical trial population that is potentially cost-effective, in the course of the model, you find that the product is particularly appropriate for a subgroup of patients, or there are a group of patients where the cost-effectiveness is not attractive, then we’d refine the question. [I6]

Several interviewees noted the importance of understanding the decision problem and defining the research question appropriately; in particular, one respondent indicated the fundamental importance of understanding the decision problem before embarking on the implementation of any model:

…it all stems from what is the decision problem, to start off with…no matter what you’ve done, if you’ve got the decision problem wrong…basically it doesn’t matter how accurate the model is. You’re addressing effectively the wrong decision. [I7]

Written documentation of the decision problem and research question

Several interviewees mentioned that explicit documentation of the perceived decision problem and the research question forms a central part of the proposal within consultancy contracts. The discussions suggested that such documentation extends beyond the typical protocols produced within the NICE Technology Appraisal Programme, including a statement of the objective, a description of the patient population and subgroups, and a summary of key data sources. Several respondents highlighted that although the decision problem may be set out in a scope document, the appropriate approach to addressing the decision problem may be more complex.

…you’ve read your scope you think you know what it’s about. But then you start reading about it, and you think, ‘Oh! This is more complicated than I thought!’ [I3]

This point was further emphasised by another respondent, who highlighted that there was a need to understand not only the decision-makers’ perceived decision problem, but also what the research question should be (so questioning the appropriateness of that perception) and understanding wider aspects of the decision problem context outside its immediate remit:

…our entire initial stage is to try and understand what the question should be and what the processes leading up to the decision point and the process that goes from that decision point. [I8]

The issue of generating written documentation of the proposed model to foster a shared agreement of the decision problem was raised by only two interviewees, both of whom work for Outcomes Research organisations:

It’s almost like a mini report really. It’s summarising what we see as the key aspects of the disease, it’s re-affirming the research question again – what’s the model setting out to do. It’s pitching the model and methodology selected and why, so if it’s a Markov [model] it’s kind of going on to detail what the key health states are. We try and not make it too detailed because then it just becomes almost self-defeating but you are just trying to, again, with this, we are always trying to get to a point at the end of that particular task where we have got the client signed in and as good a clinical approval I guess, as we can get. [I10]

For several respondents, an important dimension of understanding the decision problem involved understanding the perceived needs of the decision-maker and the client, mediated by the ability of the modelling team to meet these. A common theme which emerged from the interviews with Outcomes Research modellers was an informal iterative process of clarification of the decision-maker’s needs; the potential cost of not doing so is the development of a model which does not adequately address the decision problem. The analysis suggested some similarities in the role of initial documentation between Assessment Groups and Outcomes Research organisations. In particular, modellers working for the pharmaceutical industry highlighted the use of a Request For Proposal (RFP) document and the development of a proposal; these appear to broadly mirror the scope and protocol documents developed by NICE and the Assessment Groups, respectively.


There was some suggestion that RFPs were subject to clarification and negotiation, whereas scope documents were not.

The first thing is to get an idea of exactly what the client wants. On a number of occasions you get a RFP through which is by no means clear what the request is for and it won’t be very helpful to rush off and start to develop any kind of model on that kind of platform…you can often get a situation where you answer the question that they have asked and they then decide that was not the question they had in mind so certain processes of ensuring clarity are useful. [I12]

Methods for understanding the decision problem

Methods used by modellers to understand the perceived decision problem varied between respondents, including ‘immersion’ in the clinical epidemiology and natural history of the disease, understanding the relationship between the decision point and the broader clinical pathway, as well as understanding relevant populations, subgroups, technologies and comparators.

…you start by, by just immersing yourself in whatever you can find that gives you an understanding of…all the basics. [I3]

…it’s the process of becoming knowledgeable about what you are going to be modelling…reading the background literature, knowing what the disease process is, knowing the clinical...pathways typically that patients experience within the situation you are modelling. Going to see clinical experts to ask questions and find out more and gradually hone in on an understanding on the clinical area being studied in a way that enables you to begin to represent it systematically. [I4]

Conceptual modelling

Nine of the twelve interviewees either alluded to, or explicitly discussed, a set of activities involving the conceptualisation and abstraction of the decision problem within a formal framework before implementing the mathematical model in a software platform.

Conceptual modelling methods

The extent to which conceptual modelling is demonstrated in practice appears to be highly variable between participants; variation was evident in terms of the explicitness of model conceptualisation methods and the extent to which the conceptual model was developed and shared before actual model programming. The interview data for three respondents implied that no distinct formal or informal conceptual modelling activities took place within the process, that is, the model was conceptualised and implemented in parallel. One of these respondents highlighted an underlying rationale for such an approach.

…I’d rather go with a wrong starting point and get told it’s wrong than go with nothing. [I2]

Explicit methods for conceptual modelling included developing written documentation of the proposed model structure and assumptions, the use of diagrams or sketches of model designs and/or clinical/disease pathways, memos, representative mock-ups to illustrate specific issues in the proposed implementation model, and written interpretations of evidence.

The extent to which the conceptual model was complete before embarking on programming the software model varied among interviewees; several respondents indicated that they would not begin implementing the software model until some tangible form of conceptual model had been agreed, or in other cases, ‘signed-off’ by the client or experts. Across the nine interviewees who did undertake some degree of explicit conceptual modelling, the main purposes of such activities included fostering agreement with the client and decision-maker, affirming the research question, pitching and justifying the proposed implementation model, sense-making or validity checking, as well as trying ideas, getting feedback, ‘throwing things around’, ‘picking out ideas’ and arranging them in an abstraction process. Broadly speaking, such activities represent a communication tool, that is, a means of generating a shared understanding between the research team, the decision-maker and the client.

…we’ll get clinical advisors in to comment on the model’s structure and framework, comment on our interpretation of the evidence, and...we think we’ve got a shared understanding of that clinical data but are we really seeing it for what it is? [I10]


…for us, whether we’ve written it or not, we’ve got an agreed set of health states, assumptions, and…an agreed approach to populating the parameters that lie within it…which I think is the methods and population of your decision model. [I7]

The extent to which the conceptual model is formally documented varied between the interviewees. One respondent represented a deviant case in the sense that whereas the majority of the model development process concerns understanding the decision problem and (potentially) implicit conceptual modelling, very little time is spent on actual model implementation.

So every aspect of what you then need to, to programme in and populate it is…in people’s brains to various degrees…if you get all that agreed, then the actual implementation in excel should be pretty straightforward. But if that process has taken you 90% of your time, then…you build a model pretty quickly. [I7]

For three interviewees [I5, I2 and I12] there was no formal distinction between conceptual modelling activities and software model implementation, rather they occur in parallel. Among these three interviewees, there was a degree of consistency in that all three respondents discussed the development of early implementation models (sometimes referred to by interviewees as ‘quick models’ or ‘skeleton models’) as a basis for eliciting information from clinical experts, for testing ideas of what will and will not work in the final implementation model, or as a means of generating an expectation of what the final model results are likely to be:

So there is an element of dipping your toe in the water and seeing what looks like it’s going to work. It’s actually being prepared to say no this is not going to work. [I12]

I would generally get a rough feel for the answer within three or four days which is almost analogous to the pen and paper approach which is why I do that. If you know that 20,000 people have this disease which is costing x thousand per year and you know your drug which costs y pounds will prevent half of it and you know what utilities are associated with it you can get a damn good answer. [I2]

Identifying key model factors

Several interviewees discussed the processes they undertake in developing the structure of the model. One interviewee described this activity as a means of:

…identifying the fundamental aspects of the decision problem and organising them into a coherent framework. [I4]

Much of the discussion concerned identifying and specifying relationships between key outcomes, identifying key states to be differentiated, identifying transitions, capturing the importance of patient histories on subsequent prognosis, and addressing the need for more complex descriptions of health states. Several interviewees suggested that this process was, at least to some degree, data-driven, for example, having access to patient-level trial data may influence the structure of a survival model and the inclusion of specific coefficients. Several interviewees discussed the inevitability of simplification; one suggested the notion of developing ‘feasible models’, ‘best approximations’ and ‘best descriptions of evidence’.

…I think it’s a judgement call that modellers are constantly forced to make. What level of simplification, what level of granularity is appropriate for the modelling process? I think what’s very important is to continually refer back to the decision that you are hoping to support with your model. So don’t try and answer questions that aren’t going to be asked… [I4]

…the aim is to produce the feasible model that best approximates what we think we’re going to need…on occasions we will just have to say yes we would love to put this into a model but the trials have not recorded this… [I5]


One respondent had a different viewpoint concerning how to identify key factors for inclusion in the mathematical model, indicating an entirely data-led approach whereby the decision to include certain elements of the decision problem in the implementation model was determined by its influence on the model results:

…what to keep in and what to chuck out…it depends on how quickly you’ve got the data. If you’ve got all of the data then keep everything in, if you’ve already got it, because you can tell with an EVSI [expected value of sample information] or an EVI [expected value of information] what is important or not even if you or you fit a meta-model to it or the ones that are capped fall straight out… [I2]

…you are only including these things because you think they will affect the ICER [incremental cost-effectiveness ratio], if they don’t affect the ICER you shouldn’t be including them. [I2]
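The value-of-information reasoning invoked in these quotations can be made concrete with a small sketch. The sketch is not drawn from any interview: it assumes, hypothetically, that a probabilistic sensitivity analysis has already produced net monetary benefits for two strategies, and computes the overall expected value of perfect information (EVPI), one common screen for ‘what is important or not’.

    import numpy as np

    # Hypothetical probabilistic sensitivity analysis output: net
    # monetary benefit (NMB) for two strategies across 10,000
    # sampled parameter sets (invented numbers, for illustration).
    rng = np.random.default_rng(1)
    n = 10_000
    nmb = np.column_stack([
        rng.normal(20_000, 2_000, n),   # strategy A
        rng.normal(20_500, 3_000, n),   # strategy B
    ])

    # EVPI: expected NMB when choosing the best strategy per sampled
    # parameter set, minus the NMB of the strategy best on average.
    evpi = nmb.max(axis=1).mean() - nmb.mean(axis=0).max()
    print(f"Per-decision EVPI: £{evpi:,.0f}")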

Clearly, decisions concerning what should and should not be included in the model, how such factors should be captured and related to one another, and the appropriate level of complexity and granularity are highly complex. Indeed, two respondents highlighted that this aspect of model development was hard to teach and was learned through experience.

I don’t think…that I can sit here and write out how you build a model. I think it’s something which comes with experience. [I7]

Formal a priori model specification

One respondent [I1] highlighted the absence of an important stage of the model development process common within software development projects: that of model design and analysis, whereby the proposed model is formally specified and its software implementation is planned before the model is programmed. In this sense, the design and analysis stage represents a direct link between the conceptual model and the implementation model. Such activity would usually include producing a formal model specification in advance of any programming, including details of how the model will be programmed, where parameters will be stored and linked, housekeeping issues and an a priori specification of how the model will be checked and tested.

Use of information in model development

For many of the interviewees, the use of evidence to inform the model development process was broader than the generation of systematic reviews of clinical effectiveness. The use of evidence to determine the model structure was also commonly discussed. Further, the use of evidence in informing the model was seldom restricted to a single stage in the development process; for several respondents, evidence was used to understand, shape and interpret all aspects of model development.

Use of previous models

Views concerning the appropriate use of previous models to inform the model under development varied between study participants. One subtle but potentially important difference in the use of previous models concerned the extent to which such models, developed for a different decision problem, decision-maker and purpose, would be used to inform the structure of the current model. For some respondents, previous models were examined as a means of ‘borrowing’ model structures, whereas for others, the model represents a starting point for thinking about an underlying conceptual framework for the model:

…suppose the first thing we will do actually is see if there are any existing models that we can build from and if there are and we think they are any good then we have a structure there. [I5]

…it’s not a question of what people have done in the past and whether they did a good job or not, it’s seeing what we can borrow from those that can be relevant. [I12]

The interview data suggest the notion of being ‘data-led’ in model structuring. One may infer two risks associated with being dependent on previous model structures: one uses an existing model which is not adequately structured to address the decision problem at hand, or one misses out on developing an understanding and agreement of the decision problem through conceptual modelling (see Conceptual modelling). This inference appears to be supported by the views of a further respondent:


If you set two modelling teams the same task I think it’s quite likely they would come up with very different models, which is one of the aspects of familiarisation. Unless of course there were lots of existing models that they could draw on and fall into familiar grooves in terms of the way you model a particular process so that often happens you look at what has already been done in terms of the modelling process and that is part of the familiarisation process and you say oh yeah that makes perfect sense we will do the same again. [I4]

I think there is a danger in that as well you know if you just slavishly adopt a previous structure to a model and everybody does the same there’s no potential for better structures to develop or for mistakes to be appealed so I think there is a danger in that in some ways. [I4]

Use of clinical/methodological expert input

The extent to which clinical experts were involved within the model development process varied considerably between interviewees. Several interviewees indicated that they would involve clinical experts in the research team itself, from the very outset of the project, to help develop an understanding of the decision problem and to formulate the research question; this was particularly the case where in-house clinical expertise was not available.

…we try to get in clinical experts as part of the team and they will usually also be authors on the report and it tends to be very much a joint process of structuring the problem. [I5]

At the other end of the spectrum, one respondent suggested that they would only involve clinical experts after having built a skeleton model.

I would build the model, just build it straight off, I mean it depends on how quick you work but if you build it quickly then you can have all that in place before you talk to clinicians or other peer reviewers… [I2]

Relationship between clinical data and model structure

The majority of the interviewees highlighted the existence of a complex iterative relationship between model structuring processes and data identification and use, whereby the model structure has a considerable influence on the data requirements to populate the model, whereas the availability of evidence may in turn influence the structure of the model (whether implemented or conceptualised). In effect, the ‘worldview’ or Weltanschauung25 of the modeller influences the perception of what evidence is required, and the identification and interpretation of that evidence in turn influences and adapts that Weltanschauung.

I divided it between design[ing] and populat[ing]…the populating being searching for, locating the information to actually parameterise those relationships. And then going back and changing the relationships to ones that you can actually parameterise from the data that’s available, and then changing the data that you look for to fit your revised view of the world. [I6]

Two respondents indicated that previous working arrangements had led to the systematic review and modelling activities being perceived as distinct entities that did not inform one another, so hindering the iterative dynamic highlighted above. However, both of these respondents indicated a change in this working culture, suggesting a joint effort between the modeller and the other members of the research team:

I think things are…kind of trying to be changed. I’m not seeing a systematic review…as something separate from modelling; now they are working together and defining what [it] is…that we are looking for together. [I1]

Several respondents highlighted the importance of this iterative phase of model development and its implications for the feasibility of the resulting mathematical model.

The most difficult ones aren’t just the numbers; they are where there’s a qualitative decision. You know, is there any survival benefit? Yes or no? If there is a survival benefit, it’s a different model?…So it’s structural. [I6]

…it’s no good having an agreed model structure that’s impossible to populate with data so you obviously give some thought as to how you are going to make it work that’s part of the agreement process. You outline what the data requirements are likely to be and step through with the people collecting the data… [I4]

One respondent highlighted the difficulties of this relationship, suggesting a danger for models to become data-led rather than problem-led.

We try as a matter of principle not to let data blur the constraints of models too much but in practice sometimes it has to. [I5]

One respondent highlighted the use of initial searches and literature summaries as a means of developing an early understanding of likely evidence limitations, a process undertaken alongside clinical experts, before model development. Another respondent highlighted what he considered a key event in the model development process concerning iterations between the model structure and the evidence used to inform its parameters:

We have a key point which we call data lockdown because that’s key to the whole process, if you are working to a tight timescale you need to make sure the modellers have enough time to verify their model with the data. [I4]

Differences between the model structure and the review data

A common emergent theme across respondents was the idea that the data produced by systematic reviews may not be sufficient or adequate for the model, meaning that either the data would need transforming into a format in which the model could ‘accommodate’ the evidence (for example premodel analysis such as survival modelling), or the developed model itself would need to be restructured to allow the data to be incorporated into the model.

…but it wouldn’t be just a question of understanding the clinical data, it’s a question of what it’s going to mean for the modelling. [I12]
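A minimal example of the kind of premodel transformation described above, not taken from any interview, is converting a reported survival proportion into the per-cycle transition probability a Markov model can ‘accommodate’, here under the simplifying assumption of a constant hazard:

    import math

    # Illustrative premodel analysis: a trial reports 70% overall
    # survival at 5 years, but the model needs a monthly transition
    # probability to 'dead'. Assuming exponential (constant-hazard)
    # survival, convert via the monthly rate.
    s_5yr = 0.70
    rate = -math.log(s_5yr) / 60          # monthly hazard over 60 months
    p_month = 1 - math.exp(-rate)         # monthly transition probability
    print(f"Monthly death probability: {p_month:.4f}")   # about 0.0059

The constant-hazard assumption is itself exactly the sort of structural choice that, as respondents note, may need to be revisited once the available data are understood.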

A further point raised by one respondent concerned the Assessment Groups’ perception of what constitutes evidence for the model; this particular respondent indicated that his institution is likely to represent a deviant case in this respect.

I wouldn’t say we had a procedure, but an underlying principle…as a department we don’t distinguish between direct and indirect evidence. We just think it’s all evidence. [I7]

[We] are less beholden to statistical significance; other people are more conventional in their sort of…approaches. One…set of researchers may say that…there is no statistical significance of heterogeneity…[that] we can assume that all this evidence can be pooled together may not be considered by…another set of researchers to be an appropriate sort of thing to do… [I7]

Interestingly, almost all discussions concerning the use of evidence in populating models focused entirely on clinical efficacy evidence; very little discussion was held concerning methods for identifying, selecting and using non-clinical evidence to inform other parameters within models, for example costs and health-related quality of life. Across the 12 interviewees, it was unclear who holds responsibility for identifying, interpreting and analysing such evidence, how (and if) it influences the structure of the model, and how such activities differ from the identification and use of clinical efficacy data.

Implementation model

All respondents discussed a set of activities related to the actual implementation of a mathematical model on a software platform. Such activities either involve the transposition of the prespecified ‘soft’ conceptual model framework into a ‘hard’ quantitative model framework involving various degrees of refinement and adaptation, or the parallel development of an implicit conceptual model and explicit implementation model.

Model refinement

Several respondents discussed activities involving model refinement at the stage at which the model is implemented, although the degree to which such activities are required was variable. The interview data suggest that this aspect of model development was a key feature for the three respondents who do not separate conceptual and implementation model processes.


As noted earlier (see Conceptual modelling), although these individuals do not draw a distinction between the conceptual model and the implementation, they do appear to develop ‘skeleton models’ to present to clinical experts, which evolve iteratively and take shape over time.

I will actually literally take them through a patient path and actually run through, do a step-by-step running of the model in front of them. [I5]

However, one respondent highlighted the potential dangers of developing software models without having fully determined the underlying structure or having the agreement of experts:

…whether it’s adding something in or taking something out there is the worry at the back of your mind that it’s going to affect something else in a way that perhaps you don’t observe…there is a danger if you make the decision to include or exclude definitively that you might regret it later on, it might not be a question of states it might be a question of which outcomes to model on as well… [I12]

The interview data suggested that model refinement activities were less iterative and burdensome for those respondents who had the conceptual model agreed or ‘signed-off’ before implementation; this was particularly true for one respondent:

…then build an excel model…it’s an iterative process; you don’t want to keep on rebuilding your model,…I can’t imagine what I’d do in excel which would make me rethink the way I structure my model…most of the big issues I see are all about the sort of thought processes behind defining that decision problem, defining the structure, defining the core set of assumptions…if we can get agreement about that, the implementation of it is really straightforward. [I7]

At the other extreme, however, one respondent mentioned rare occasions in which the research team had developed a model, decided it was not adequate and subsequently started from scratch.

Obviously it doesn’t very often happen that we have built a whole model and have had to tear up a whole model but if necessary we would rather do that than carry on with a model that isn’t what we intended. [I5]

Model checking

Given the extent of discussion around model checking during the interviews, a detailed analysis of current methods for checking models is presented (see Chapters 6 and 7). Model checking activities varied and appear to occur at various stages in the model development process. One interviewee suggested that model checking begins once the model has been built; however, for other respondents, the checking process happens at various distinct stages. One respondent indicated that model checking activities are undertaken throughout the entire model development process. Subject to certain nuances in the precise methods employed, much of the discussion around current model checking activities focused on testing and understanding the underlying behaviour of the model and making sure that it makes sense:

…it’s about finding out what the real dynamics of the model are so what are the key parameters that are driving the model, where do we need to then emphasise our efforts in terms of making sure our data points are precise? What are the sensitivities, in which case what subgroups might we need to think about modelling? Discussing how the model is behaving in certain situations. What the likely outcome is. If the ICER is near to the critical thresholds for the decision-maker then where we need to pay attention if the ICER is way off to the side, if you’ve got an output that’s dominated or if you have got an ICER that’s half a million pounds it seems likely that unless the model is wildly wrong the decision is going to be fairly clear cut. So if you are operating at the decision-making threshold how precise and careful you have to be and what kind of refinement you need therefore to build into the model. [I4]

Other model checking activities included examining the face validity of the model results, in isolation or in comparison to other existing models; internal and clinical peer review; checking the input data used in the model; checking that premodel analyses can replicate the data; and checking the interpretation of clinical data. Many of these approaches involve a comparison of the actual model outputs against some expectation of what the results should be; the interview data were not clear on how such expectations were formed.

Either the intuition or the modelling or the data is wrong and we tend to assume that it is only one of them…I think you tend to assume that once they’re [clinical experts] not surprised by the thing then, that means you have got it right. [I5]

Two respondents indicated a minimalist approach to model checking activities:

So we do just enough…just enough but not as much as you’d want to do. [I10]

…it depends whether you’re error searching for what I term significant errors or non-significant errors, I probably would stop and the non-significant errors that are probably in a larger percentage of models than people would like to know would remain. [I2]
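Some of the behaviour testing described above can be mechanised. The sketch below is illustrative only, with run_model standing in for a real implementation; it shows two extreme-value checks of the kind interviewees allude to: a null treatment effect must produce zero increments, and raising only the new technology’s price must raise incremental cost and nothing else.

    # Hypothetical extreme-value checks; run_model() is a toy stand-in
    # returning (incremental cost, incremental QALYs).
    def run_model(effect: float, cost_new: float):
        baseline_cost, baseline_qalys = 10_000.0, 5.0
        return cost_new - baseline_cost, baseline_qalys * (effect - 1.0)

    def test_null_effect_gives_zero_increments():
        # With no treatment effect and equal costs, increments must be zero.
        d_cost, d_qalys = run_model(effect=1.0, cost_new=10_000.0)
        assert d_cost == 0.0 and d_qalys == 0.0

    def test_price_rise_only_raises_incremental_cost():
        # A dearer new technology must cost more and be no more effective.
        lo = run_model(effect=1.2, cost_new=11_000.0)
        hi = run_model(effect=1.2, cost_new=12_000.0)
        assert hi[0] > lo[0] and hi[1] == lo[1]

    test_null_effect_gives_zero_increments()
    test_price_rise_only_raises_incremental_cost()
    print("extreme-value checks passed")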

Reporting

The majority of respondents referred to model reporting as the final step in the model development process. This stage typically involved writing the report, translating the methods and results of the model and other analysis, and engaging with the decision. Although little of the interview content concerned this phase of the process, two important points were raised within the interviews. First, one respondent highlighted that the reporting stage may act as a trigger for checking models, where the results are unexpected or queried by internal or external parties. The second point concerned the importance of the interpretation of the model by decision-makers and other process stakeholders:

I think the recommendation we would make is we need to pay more attention to understanding how our models are understood and how we present them, a lot of work that can be done in model presentation. [I4]

…reportable convention standards can be very important in ensuring everyone has a clear view of what’s being said. There are ways in which model outputs can be more transparently depicted and the key messages conveyed to users more clearly, often some can get lost in the text or hidden away somewhere either intentionally or unintentionally. [I4]

Synthesised model development process

Figure 1 shows a stylised representation of the model development process generated through a synthesis of all interview transcripts and from the charted descriptions of the model development process elicited from the interviewees. The stylised model development process comprises five broad bundles of activities: understanding the decision problem, conceptual modelling, implementing the model within a software platform, checking the model, and finally engaging with the decision. It should be noted that these activities do not perfectly match the coding scheme used to undertake the Framework analysis. The general remit of these sets of activities is as follows.


Understand the decision problem This phase involves immersion in research evidence, definition of the research question, engaging with clinicians, engaging with decision-makers, understanding what is feasible, understanding what the decision-maker wants/needs and engaging with methodologists.

Conceptual modelling It is commonplace to state that all mathematical models are built upon a conceptual model. At its most banal, it is impossible to implement a model without first conceiving the model. The key issue in the decision-making endeavour, and reflected in the modelling process, is communication: specifically the process of developing a shared understanding of the problem. Conceptual modelling is the process of sharing, testing, questioning and agreeing this formulation of the problem; it is concerned with defining the scope of a model and providing the inputs to the process of systems analysis and design associated with defining a solution to the problem. The scope of a model deals with defining the boundary and depth of a model, to identify the critical factors that need to be incorporated in the model. Models are necessarily simplifications of our understanding of a problem. The conceptual model allows a description of one’s understanding of the system that is broader than the description of the system captured in the model. An implemented model will therefore be a subset of a conceptual model. This hierarchy allows simplifications represented in the model to be argued and justified. This process of simplification is the process of determining relevance.

Implementation model This phase involves the implementation of the model within a software platform.

Model checking This stage includes all activities used to check the model, including engaging with clinical experts to check face validity, testing extreme values, checking logic, checking data sources etc. A detailed description of current model checking activities is presented in Chapters 6 and 7.

Engage with decision This phase concerns the reporting and use of the model with and by the decision-maker.

Activities concerning the use of evidence, peer review and other clinical consultation may arise within any or all of the five model development phases. The arrow leading from ‘understanding the decision problem’ to ‘implementation model’ reflects the views of three interviewees who implied that they either do not build a conceptual model or that conceptualisation and implementation are simultaneous activities.

Several points are of note in the interpretation of the diagram. First, because the diagram was developed through a synthesis of all interview data, because respondents did not adopt a uniformly standard model development process, and because respondents discussed different activities to varying degrees of detail, the lines of demarcation between the main sets of activities are not entirely clear. Second, the diagram is intended to be descriptive rather than prescriptive; it describes current model development processes rather than making a normative judgement about how models should be developed.

[Figure 1: a flow diagram linking five phases – understanding the decision problem, conceptual modelling, model implementation, model checking and engaging with the decision – with ‘use of evidence and clinical input’ spanning all phases, and dashed arrows marking six numbered iteration loops: revising and rebuilding, modifying the model design in light of data availability, generating/confirming agreement of the appropriate structure, iterating between understanding the decision problem and defining the research question, and re-using an existing model for multiple decision problems.]

FIGURE 1 Stylised model development process showing key iterations between sets of activities.


The dashed arrows in Figure 1 indicate significant iterations between key sets of model development activities. It should also be noted that each individual phase is likely to involve iterations within sets of activities, e.g. there may be several versions of an implementation model as it is being developed.

The iterative nature of developing mathematical models is well documented. The interview data suggest six key points of model iteration within the development process. The first substantial iteration (marked as number 1 on Figure 1) concerns developing an understanding of the decision problem; this may involve iterations in terms of striking a balance between what the analyst believes are the decision-makers’ needs, what is feasible within the project resources, and the negotiations therein.

The second notable iteration (marked as number 2 on Figure 1) involves looping between the development of an explicit conceptual model and its implementation in a software platform. Several respondents highlighted the inevitable existence of a circular loop between these two sets of activities, whereby the intended conceptual model structure is redefined in light of the limited evidence available to populate that structure, and whereby the evidence requirements are redefined in light of the intended model structure.

The interview data suggest two further loops relating to iterations from the model checking stage to revising or refining the conceptual and/or implementation model (marked as numbers 3 and 4 on Figure 1). The interview data suggest that any of the large number of checking activities currently undertaken have the capacity to result in backwards iterations to earlier steps.

The fifth key iteration of note (marked as number 5 on Figure 1) concerns the use of an existing model for multiple decision problems. In such instances, there is a loop between the final and first stages of the process. Two respondents highlighted examples of this iteration: one in terms of the use of ‘global models’ for use in different decision-making jurisdictions, and one in terms of the ongoing development of independent models across multiple appraisals. As noted earlier (see Use of information in model development), several respondents discussed the use of existing models as the basis for developing a model structure. In such circumstances, there is a danger that adopting an existing model developed by another party and applying this to a new decision problem could effectively represent a loop back to a conceptual model without a comprehensive understanding of the decision problem.

The final iteration (marked as number 6 on Figure 1) loops from model checking to understanding the decision problem. The three respondents who highlighted that implementation and conceptualisation activities occur in parallel all indicated the possibility of revising and rebuilding the model from scratch. The same suggestion was not indicated by the remaining nine respondents. At the other extreme, one respondent indicated that there was no significant backwards iteration once the implementation model was under way, i.e. the conceptual model was not amended once agreed.

Discussion of model development process

The analysis presented within this chapter highlights considerable variation in modelling practice between respondents. This may be in part explained by the apparent variations in expertise, background and experience between the respondents. The key message drawn from the qualitative analysis is that there is a complete absence of a common understanding of a model development process. Although checklists and good modelling practice guidelines have been developed, these perhaps indicate a general destination of travel without specifying how to get there. This represents an important gap and should be considered a priority for future development.

The stylised model development process presented in Figure 1 represents a summary of all interviewees’ stated approaches to modelling. It should be noted that the diagram does not entirely represent the views of any single individual, but is sufficiently generalisable to discriminate between the views of each interviewee; in this sense, any modeller should be able to describe their own individual approach to model development through reference to this diagram. As noted above, one particularly apparent distinction that emerged from the qualitative synthesis was the presence or absence of explicit conceptual modelling methods within the model development process. Although it is difficult to clearly identify a typology on these grounds from a mere 12 interviewees, the qualitative data are indicative of such a distinction. Related to this point is the issue of the a priori specification of the model design and analysis, which represents the leap from the conceptual model to the implementation model.


Some respondents did discuss the preparation of materials which go some way in detailing proposed conceptual models, platforms, data sources, model layout, checking activities and general analytical designs; however, the extent to which these activities are specified before implementation appears to be limited.

A common feature of modelling critiques that is reflected in the interviews is the statement that different teams can derive different models for the same decision problem. This arises from the largely implicit nature of the existing conceptual modelling processes described by the interviews. It is suggested that focusing on the process of conceptual modelling may provide the key to addressing this critique. Furthermore, the quality of a model depends crucially on the richness of the underlying conceptual model. Critical appraisal checklists for models all currently include bland references to incorporating all important factors in a disease, yet do not address how these ‘important factors’ can be determined. It is the conceptual modelling process that is engaged with identifying and justifying what are considered important factors and what are considered minor or irrelevant factors for a decision model.

The conceptual modelling process is centrally concerned with communication, and this can be supported by many forms including conversation, meeting notes, maps (causal, cognitive, mind etc.) and skeletal or pilot models. However, care needs to be exercised in using methods that focus on the problem formulation rather than formulation of the solution.26

The description of the model development processes adopted by the interviewees within this chapter is directly used to inform the development of the taxonomy of model errors (see Chapter 5) and to provide a context for analysing strategies for identifying and avoiding model errors (see Chapters 6 and 7).


Chapter 4 Definition of an error

Overview

This chapter presents a description of the key dimensions and characteristics of model errors and attempts to draw a boundary around the concept of ‘model error’, i.e. ‘what is it about a model error that makes it an error?’. The descriptive analysis highlighted both overt and subtle variations in respondents’ perceptions concerning what constitutes a model error. In seeking to explain, assign meaning to and draw a boundary around the concept of model error, the respondents identified a variety of factors including:

• non-deliberate and unintended actions of modellers in the design, implementation and reporting of a model
• the extent to which the model is fit for purpose
• the relationship between simplifications, model assumptions and model errors
• the impact of error on the model results and subsequent decision-making.

The above factors are discussed in this chapter; the interview findings are then placed in the context of the literature on model verification and validation.

Key dimensions of model error

Table 2 presents the key characteristics and dimensions of model error as suggested by the 12 interview respondents (note: this has attempted to retain the natural language of respondents but does involve a degree of paraphrasing by the authors).

Table 2 highlights a diverse set of characteristics of model errors as discussed by interviewees. Evidently, there is no clear consensus concerning what constitutes a ‘model error’. Conflicting views were particularly noticeable in factors concerning model design, for example model structures, assumptions and methodologies. The respondents’ explicit or implied construct of a model error appeared to be considerably broader than ‘technical errors’ relating to the implementation of the model.

Intention, deliberation and planning

The interview data strongly suggested a general consensus that model errors were unintended, unplanned or non-deliberate actions; instances whereby the modeller would have done things differently had they been aware of a certain aspect or underlying characteristic of the model; or instances whereby the model produced unexpected or inconsistent results. The unintended actions of the modeller and the non-deliberate nature of the modeller’s actions when developing models appeared to be central to the interviewees’ definitions of model error.

Extent to which the model is fit for purpose

A commonly cited dimension of a model error concerned the extent to which the model could be considered fit for purpose. In particular, respondents highlighted that a model could be unfit for its intended purpose in terms of the underlying conceptual model as well as the software implementation model, so representing distinct errors relating to model design as well as errors relating to the execution of the model. Respondents implied a relationship between the concept of model error and the extent to which a model is fit for purpose, suggesting that the presence of model errors represents a threat to the model’s fitness for purpose, whereas the development and use of a model that is unfit for its intended purpose is a model error in itself.

Several interviewees drew reference to the concepts of verification and validity as a means through which to define the concept of model error. Where interviewees used these terms there was a consensus regarding their definition, with verification referring to the question ‘is the model right?’ and validation referring to ‘is it the right model?’. Although there was general consensus that the term model error included issues concerning the verification of a model, there was less clarity about the relationship between model errors and validity.


TABLE 2 Dimensions/characteristics of model error

Dimensions which are perceived to constitute a model error:
• a model that is not fit for the purpose for which it was intended
• something that causes the model to produce the wrong result
• something that causes the model to lead to an incorrect interpretation of the results/answer
• an aspect of the model that is either conceptually or mathematically invalid
• failure to accord with accepted best practice conventions – not doing something that you should have done
• use of inappropriate assumptions that are arbitrary
• use of inappropriate assumptions that contradict strong evidence
• use of assumptions that the decision-maker does not own, does not feel comfortable with or cannot support
• choices made on the grounds of convenience rather than evidence
• something that causes the model to produce unexpected results
• something you would do differently if you were aware of it
• an aspect of the model that is unambiguously wrong
• an aspect of the model that does not make sense
• an aspect of the model that does not reflect the intention of the model builder for an unplanned reason
• a model that inappropriately approaches the scope or modelling technique
• an unjustified mismatch between what the process says should be happening and what actually happens
• an inappropriate relationship between inputs and outputs
• an implementation model that does not exactly replicate the conceptual model
• something that markedly changes the incremental cost-effectiveness ratio, irrespective of whether it changes the conclusion
• something in the model that leads to the wrong decision
• a mistake at any point in the appraisal process – from decision problem specification to interpretation of results

Dimensions which are perceived not to constitute a model error:
• software bugs
• unwritten assumptions that are never challenged
• choices made on the grounds of evidence
• matters of judgement
• inconsistencies with other reports
• simplifying assumptions that are explained
• use of simpler methodologies/assumptions
• aspects of the model that do not influence the results
• small technical errors with limited impact
• overoptimistic interpretation of data
• mistakes that are immediately corrected
• reporting errors

For instance, problems in the conceptualisation of the decision problem and the structure of the model were generally referred to as ‘inappropriate’ rather than ‘incorrect’, yet these were nonetheless described as model errors by some respondents.

I think it’s probably easier to know that you have made an error of verification where there’s clearly a bit of wrong coding or one cell has been indexed wrongly or the wrong data has been used for example which wasn’t intended. So those are all kinds of error that are all verification type errors. Errors in the model rather than it being the wrong model which is a much bigger area the area of validation and there people can argue and it can be open to debate. [I4]

Further, it should be noted that one interviewee indicated that errors affecting the model’s validity, that is, errors in the description of the decision problem and conceptual model, potentially dwarfed technical errors that affected its verification.

…my belief is that if you spend…the majority of your time, as I say, getting the decision problem right, making sure you understand data and how that data relates to the decision problem and decision model, that…the errors of programming itself, those ones which are much more minor. [I7]
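The ‘wrong cell’ and ‘wrong data’ verification errors described in this quotation are exactly the kind that a mechanical check can catch. The following minimal sketch, with invented parameter names and values, compares every value wired into an implementation against the agreed source list:

    # Hypothetical input-verification pass: compare the parameter values
    # wired into the implementation against the agreed source values, so
    # a mis-keyed cell is caught mechanically rather than by inspection.
    agreed = {"p_response": 0.62, "cost_drug": 4_200.0, "u_stable": 0.78}
    wired  = {"p_response": 0.62, "cost_drug": 4_200.0, "u_stable": 0.87}  # transposed digits

    for name, value in agreed.items():
        if wired[name] != value:
            print(f"verification failure: {name} wired as {wired[name]}, source says {value}")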

Alternative viewpoints included the notions of failure to do 'the best you can do' or failure to do 'what you should do'. It was noted that in some instances this may be the result of a lack of skills. Further complexity was apparent in terms of the selection of modelling method and its relationship to model errors:

I don't think I would class [it] as an error when someone says that…they developed this hugely complex…kind of Markov-type model, and then said they'd much rather have done it as a simulation.
[I11]

Simplification versus error

The qualitative synthesis suggested mixed views between the concepts of model assumptions and model errors. For example, one respondent appeared to hold a strong view that all inappropriate assumptions were model errors, highlighting the importance of the perspective of the stakeholder (or model user) in distinguishing between the two concepts:

…One man's assumption is another man's error…
[I3]

Other respondents highlighted a fine line between the necessary use of assumptions in model simplification and the introduction of errors into the model:

If you have an inaccuracy in the model that has no, or a trivial, effect, then it's not obvious to me where that stops being a simplification that you've made to make the decision problem tractable, and where it becomes an error,…something that's inaccurate and we wish to avoid…that point would be where it starts to cause a risk that the model is giving a wrong answer, or is not addressing the question that you wanted it to.
[I6]

…it's very rare that the model is completely accurate in that it does exactly what it says it's meant to be doing. And it's never the case that the model is completely accurate in that it's a full and complete picture of the world. And I'm very comfortable with that, because the whole point of doing your model is that it's a simplification, and that you're trying to extract a simplified view of the world that teaches you something new that wasn't obvious from the data that you start with.
[I6]

Further to this discussion, one respondent highlighted a distinction between assumptions and errors by examining the underlying reason for the use of the assumption:

So you have to say, 'Well, to what extent are the choices being made being made on the grounds of evidence, and how much were they on the grounds of convenience?' Now if they're made on the grounds of convenience and in addition to that, you know, from my point of view they are mathematically illiterate, then that is an error, to me. They are unsustainable.
[I3]

Contrary to this viewpoint, one respondent suggested that unreasonable assumptions and the overoptimistic interpretation of evidence did not constitute a model error because of their deliberate intent. An alternative suggestion for distinguishing between assumptions and model errors concerned the perspective of the decision-maker: assumptions were errors if the decision-maker does not 'own' them, does not feel comfortable with them, or cannot support them upon a wider appraisal of evidence.

Impact of error on the model results and subsequent decision-making

A number of interviewees suggested that the existence of model errors is inevitable, and that it is only important to ensure that the model is free from significant errors which have an important influence on the model outputs and the resulting policy decision.

…I think it's rare to find any model without an error in it somewhere…I think we're deluding ourselves if we think…that there are no errors in our models...I think [that] even if you identify one, that doesn't mean there aren't others, and if you don't identify one that doesn't mean there are none.
[I5]

An error is something that markedly changes the ICER regardless of whether it changes the conclusion.
[I2]
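The arithmetic behind this view is worth making concrete. The sketch below (Python, using entirely hypothetical figures chosen to echo the £8266 ICER an interviewee cites later in this chapter) contrasts the effect of a trivial technical slip on the incremental cost-effectiveness ratio (ICER = incremental cost divided by incremental QALYs) with the spread produced by ordinary uncertainty in efficacy:

import numpy as np

rng = np.random.default_rng(1)

delta_cost = 4133.0  # incremental cost in GBP (hypothetical)
delta_qaly = 0.5     # incremental QALYs gained (hypothetical)

# base-case ICER: 4133 / 0.5 = £8266 per QALY
print(f"Base-case ICER: £{delta_cost / delta_qaly:,.0f} per QALY")

# a trivial implementation slip adds 50p to the incremental cost,
# moving the ICER to £8267 - technically an error, but negligible
print(f"ICER with slip: £{(delta_cost + 0.5) / delta_qaly:,.0f} per QALY")

# by contrast, sampling the uncertain efficacy spreads the ICER
# over thousands of pounds
qaly_draws = rng.normal(loc=delta_qaly, scale=0.1, size=10_000)
lo, hi = np.percentile(delta_cost / qaly_draws, [2.5, 97.5])
print(f"95% interval from efficacy uncertainty alone: £{lo:,.0f} to £{hi:,.0f}")

On this reading an error matters when it is large relative to the uncertainty already surrounding the result, which is exactly the boundary the interviewees struggle to draw.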


The problem is…can you really cope with that error if someone's making a policy decision based on that outcome?
[I4]

One interviewee made the interesting observation that the impact of an error should be considered relative to the overall uncertainty in the results:

…if it changes [the incremental cost-effectiveness ratio] from £8266 to…£8267, technically it's an error; the question is, those technical errors will be absolutely dwarfed by the uncertainty in the efficacy of the drug, so I wouldn't consider them an error.
[I2]

This begs the question: how wrong does something have to be before it becomes a model error? Consequently, for some interviewees, this led to a very broad boundary around what would be defined as an error:

…it's a mistake at any point in the…appraisal process, from the initial specification of a decision problem through to the assumptions, the parameter inputs and the analysis that goes behind that, through to the final implementation and programming of an excel-based model. And even then, an error in the interpretation of the results therein.
[I7]

Placing the interview findings in the context of literature on model verification and validation

In 1979 the Society for Modelling and Simulation International (SCS) defined the distinction between model verification and validation.27

• Model validation is defined as 'substantiation that a computerised model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model'.
• Model verification is defined as 'substantiation that a computerised model represents a conceptual model within specified limits of accuracy'.

The definitions of model validation and verification provided by the HTA modellers interviewed in this study were best captured by the description that verification refers to the question 'is the model right?' and validation refers to the question 'is it the right model?'. These definitions are consistent with the SCS definitions. Although the interviewees' descriptions might lack the apparent rigour of the SCS definitions, this rigour is perhaps somewhat illusory, because in both cases the SCS definitions simply recast the problem as one of defining what constitutes a satisfactory range of accuracy.

In searching for rigorous approaches to the validation and verification of operational research models, many authors draw on epistemology and the philosophy of science. In 1967 Naylor et al.28 rehearsed the opposition between rationalism and empiricism and proposed a three-stage approach to model verification based explicitly upon merging these principles; it should be noted that at this early stage the terms verification and validation were used interchangeably. Naylor et al. suggest that rationalism is represented in the initial search for a set of postulates regarding the components, variables and functional relationships that constitute a model, and specifically that the search for the face validity of a model through its accordance with expert judgement is a rationalist perspective on validity. However, the authors consider these initial postulates only as 'tentative hypotheses' about the behaviour of the system and suggest that these postulates should be verified empirically; this verification constitutes an empiricist perspective on validity. The third stage of validation suggested by Naylor et al. consists of testing the ability of a model to predict, retrospectively or prospectively, the behaviour of a system, and represents an application of a positive economics perspective on model validity.

The three stages outlined above are closely reflected in the HTA interviewees' description of the model development process outlined in Chapter 3, and indeed in the discussion of simplification versus error earlier in this chapter.

This early literature on model verification and validation focuses on modelling as a knowledge creation process. This perspective has led to validity being defined with reference to a model's ability to predict observable phenomena. In certain methodological work this epistemological perspective has led to the definition of statistical criteria for the goodness of fit of a model as a measure of its validity.29,30 None of the HTA modellers interviewed pursued this perspective of validity.


In 1993 the European Journal of Operational Research31 published a special issue on validation in modelling, with the purpose of raising debate on this topic. The papers presented in this special issue expand upon the epistemological roots of model validation methodology. A key characteristic of operational research modelling is its focus not on creating knowledge but rather on supporting decision-makers – moreover, decision-makers working within a broad social and political context, with a range of criteria and with particular approaches to the decision-making process. This focus on supporting decisions underlies the importance of the pragmatic, instrumentalist or utilitarian philosophies for model validation. Under these pragmatic approaches the usefulness of a model, that is its success in action, is its measure of validity. Déry et al.32 acknowledge Raitt33 and Rorty34 as first explicitly bringing this utilitarian approach to the discussion of model validation. The interviewees' principal focus on fitness for purpose, as opposed to the statistical accuracy of models, strongly reflects this instrumentalist viewpoint on validity; all of the HTA modellers interviewed were consistent in adopting this perspective.

Kleindorfer et al.35 in 1998 reviewed the philosophical underpinnings of model validation and extended the discussion to the implications of Hermeneutics as propounded by Gadamer36 and more recently Bernstein.37 Hermeneutics recognises that objectivity is unachievable and suggests that meaning is created through intersubjective communication. It is characterised by a description of rationality that is historically situated and practical, involving choice, deliberation and judgement. Kleindorfer's reading of Hermeneutics is used to provide a basis for the primacy of interaction between the modeller and client in developing mutual understanding of a model. It is this mutual understanding that establishes a model's significance and its warranty. Kleindorfer et al. propose a framework for model validation based upon the analogy of a judicial system, where the model builder is free to establish the credibility of the model through any reasonable means. They argue that most model practitioners instinctively operate in this middle ground between objectivism and relativism in attempting to achieve model credibility. Whereas the HTA modellers interviewed recognised the importance of ensuring mutual understanding in the definition of the problem and the structure of the model, none of the interviewees explicitly raised the issue of model credibility in considering the definitions of model error, although this may well have been assumed. The interviews demonstrate that HTA modellers tend to act in the manner described by Kleindorfer et al., using all means at their disposal to demonstrate model validity and gain credibility, without necessarily referring to any philosophical underpinning for this mode of action.

Decision analytic modelling is essentially a Bayesian undertaking, both in its theoretical underpinnings and in the central role of the Bayes updating formula in the statistical analysis of decision problems. The Bayes formula provides a statistical mechanism for updating probabilities, or beliefs, in the light of new observations. There is a substantial methodological and applied literature on Bayesian methods. Most applications of the Bayesian approach in the HTA domain apply the updating procedure to refining estimates of model parameters in the light of new data or to facilitating probabilistic sensitivity analysis. However, the same principle can be applied in approaching the validity of a model. In the Bayesian approach, the problem of validity is framed explicitly in terms of model credibility. Hence, the credibility of, or measure of belief in, a theory – or in our case a mathematical model – is updated in the light of new evidence:38

P(m|d) = P(d|m)P(m) / [P(d|m)P(m) + P(d|not m)P(not m)]

where m = model and d = data.

This latter approach is closely related to handling structural uncertainty in models through model averaging methods. A number of issues arise in this approach, principally concerning the nature of the subjective prior probability of a theory or model, and the difficulty of accounting for the open nature of possible models, i.e. the impossibility of constructing a complete set of possible models to consider in the model averaging process, or of assigning an adequate assessment of the probability that none of the described models is correct. Furthermore, it should be noted that there is an almost complete absence of discussion of model verification, or of the operational validity of the model within the decision-making process, in the Bayesian literature.
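As an illustration of this updating mechanism, the following minimal sketch (Python; the prior and the likelihood values are invented purely for illustration) applies the formula to the credibility of a model as successive pieces of evidence arrive:

def credibility_update(prior_m, lik_d_given_m, lik_d_given_not_m):
    """Posterior belief in model m after observing data d:
    P(m|d) = P(d|m)P(m) / [P(d|m)P(m) + P(d|not m)P(not m)]."""
    numerator = lik_d_given_m * prior_m
    return numerator / (numerator + lik_d_given_not_m * (1.0 - prior_m))

belief = 0.6  # assumed prior credibility of the model
evidence = [(0.8, 0.3),   # observation well predicted by the model
            (0.7, 0.4),   # observation moderately well predicted
            (0.2, 0.6)]   # observation the model predicts poorly
for lik_m, lik_not_m in evidence:
    belief = credibility_update(belief, lik_m, lik_not_m)
    print(f"updated credibility of the model: {belief:.3f}")

The troublesome quantity is P(d|not m): 'not m' stands for the open-ended set of unspecified alternative models, and assigning it an honest likelihood is precisely the difficulty with the open nature of possible models noted above.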



Sargent39–41 in a series of closely related conference proceedings describes a refreshingly practical approach to the verification and validity of models without recourse to epistemology or philosophy. Sargent uses a simple model of the model development process, reproduced in Figure 2, to develop different aspects of model validity and verification.

In this scheme the overall validity of a modelling project comprises the validity of the conceptual model, the veracity of the implemented model to the conceptual model, and the operational validity of the results and their interpretation in the decision-making process. It should be noted that using verification to describe the correct implementation of the conceptual model in the computerised model has resonance with the logical empiricist philosophical approach to verification as discussed by Déry et al.32 However, this highlights the fact that this definition of verification relies on there being an explicit and complete description of the conceptual model that acts as a specification of the computerised model. Where the description of the conceptual model is absent or incomplete, this separation between the concepts of verification and validation breaks down. As has been discussed in the previous section, the absence of an adequate description of the conceptual model is a common feature of HTA modelling practice.

The methodological literature on model validation becomes sparse after the 1993 European Journal of Operational Research special issue,31 somewhat ironically given its stated aims of promoting discussion. Citations of the articles contained therein (searches undertaken January 2009) consist mainly of developments of validation approaches in specific application domains, for example Oreskes et al.42 in the earth sciences and indeed this monograph in HTA. There are few generic methodological papers41,43,44 and these focus on pragmatic approaches to validation.

The HTA modellers interviewed demonstrate a pragmatic approach to model validation. Although sound philosophical underpinnings for this stance may well exist, it is unclear, first, whether this is a satisfactory position and, second, the extent to which practitioners are consciously implementing principled pragmatism or an unprincipled laissez-faire approach to validation. In their seminal 1967 paper, Naylor et al.28 described the problem of ascertaining the validity of a model as 'perhaps the most elusive' unresolved problem in computer simulation methods; the level of conflicting opinions and practices identified in the literature and in the interviews undertaken in this study indicates that there is still room for development in this area.
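To make the verification half of this scheme concrete, the sketch below (Python; the three-state cohort specification is hypothetical and not taken from the report) shows the kind of mechanical check that is only possible when an explicit conceptual specification exists to test the implemented model against:

import numpy as np

# conceptual specification: annual transition probabilities (assumed)
SPEC = {
    "well": {"well": 0.90, "sick": 0.08, "dead": 0.02},
    "sick": {"well": 0.10, "sick": 0.70, "dead": 0.20},
    "dead": {"well": 0.00, "sick": 0.00, "dead": 1.00},
}
STATES = list(SPEC)

# the implemented model's matrix - imagine this had been keyed into a
# spreadsheet by hand and read back for checking
P = np.array([[SPEC[row][col] for col in STATES] for row in STATES])

def verify(matrix):
    # every row of a transition matrix must sum to 1 with no negative
    # entries; a failure signals an implementation error
    assert np.allclose(matrix.sum(axis=1), 1.0), "rows must sum to 1"
    assert (matrix >= 0).all(), "negative transition probability"
    # the dead state is absorbing in this specification
    assert matrix[STATES.index("dead"), STATES.index("dead")] == 1.0

verify(P)
print("implementation is consistent with the conceptual specification")

Where no written specification exists there is nothing independent to check the implementation against, which is the breakdown of the verification/validation distinction described above.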

FIGURE 2 Model of the modelling process. Reproduced with permission from the Institute of Electrical and Electronics Engineers, Sargent,41 ©2004 Institute of Electrical and Electronics Engineers, Inc. (The figure links the problem entity (system), the conceptual model and the computerised model through conceptual model validation, analysis and modelling, programming and implementation, model verification, experimentation, operational validity and data validity.)



Chapter 5 A taxonomy of errors

Introduction

This chapter presents a taxonomy of errors in HTA models, with the purpose of developing a common language for the development of methods and processes for identifying errors and avoiding errors in HTA models. The taxonomy provides a basis for relating current strategies for avoiding and identifying model errors to the types of errors they are intended to impact upon (see Explanatory analysis concerning methods for preventing errors in Chapters 6 and 7).

Identifying errors and avoiding errors imply two different aspects to the taxonomy. First, identifying errors implies a focus on error symptoms and error checking processes, whereas avoiding errors implies a focus on the process of model development and implementation. The taxonomy is not designed with the purpose of providing a checklist for the critical appraisal of models. As a result, where terminology such as 'error of judgement' is used, this is not meant to imply that a process is required, or may exist, for identifying a particular instance of judgement as being in error.

A two-stage process was used in the development of the taxonomy. First, an interview-based taxonomy was developed based upon the framework analysis of the in-depth interviews undertaken with a sample of the HTA modelling community. Before discussing individual error types, the interviewees were asked to describe their conception of the decision support process, through model development, implementation and use. This description of each individual's perception of the model development process was then used as a vehicle for the discussion of the error types. This helped to ensure that the discussion of error types, and the balance of error types across the modelling process, was not biased towards different parts of the process by the nature of the interviews and the perceptions of the interviewers. In addition, non-HTA literature regarding spreadsheet and programming error taxonomies was used as a basis for comment on the current understanding and as a means to further develop the taxonomy (see Review of taxonomy/classification literature).

Table 3 below presents the distribution of comments from the interviews across the major themes. It is notable that over 70% of comments related to the problem definition, the structuring process and the use of evidence, whereas only 17% of comments related to actual errors in the implementation of models.

This focus among HTA modellers on errors in the 'softer' side of the model development process is supported by explicit comments throughout the interviews.

TABLE 3 Distribution of comments across major themes

                                                      Comments
                                                      n        %
Error in understanding the problem                    21       9
Error in the structure and methodology used           71      31
Error in the use of evidence                          70      31
Error in implementation of the model                  38      17
Error in operation of the model                        6       3
Error in presentation and understanding of results    20       9
Total                                                226     100


…my concerns about quality of outputs and accuracy is much more around the design of the conceptual model and asking the correct questions than it is about the implementing. Which is not, that they're not both important, but that it's easier to do guidelines around the technical aspects and it's something that as modellers you're more interested in, it's your distributions and your methods and your environments, and it's things that you can write down precisely and measure.
[I6]

But it's much easier, I would think, to ask the wrong question and [that] not get found, and for that not to be picked up until the end, than it is to make a major mistake in coding and for that not to be picked up.
[I6]

These concerns relate directly to the synthesis of respondents' views of what constitutes a model error (see Chapter 4), whereby inappropriate model assumptions and matters of judgement were clearly considered to represent model errors by some interviewees but not by others.

The next five sections on errors describe the interview-based taxonomy of model errors presented in Figure 3, structured according to the six major coded themes presented in Table 3, noting that the discussions of 'error in operation of the model' and 'error in presentation and understanding of results' have been merged, owing to the limited number of comments in these domains and their close association. The taxonomy is structured vertically according to the major coded themes, which represent different aspects of the model development process and constitute error domains; these major domains are each broken down into subdomains. Types of errors described by interviewees in each domain are then discussed; these are presented in column 2 of the taxonomy. For each type of error there has been an attempt to identify potential root causes of errors; these are presented in column 3 and are grouped together again in column 4 to identify major themes in the causes of error.

Error in understanding the problem

Well, I guess it all stems from what is the decision problem, to start off with. So…you know, no matter what you've done, if you've got the decision problem wrong…basically it doesn't matter how accurate the model is. You're addressing effectively the wrong decision.
[I7]

The key error domains in the understanding and description of the decision problem identified by the interviewees were:

• error in the definition of population and subgroups
• error in the definition of interventions and comparators
• error in the definition of outcomes.

The charted interview data were examined to identify components that make up the description of a decision problem. Interviewees explicitly mentioned the definition of comparators, interventions and populations as being important in the description of the decision problem and potentially being subject to error. Five out of seven comments mentioned the comparators and/or interventions, which were sometimes indistinguishable, and three comments identified the population as a key element. Comments included:

Their conceptualisation missed out strategies that were highly likely to be the most cost-effective.
[I2]

…we have to be very clear from the outset that the specification of a decision problem…and I guess that's got a number of different levels, which is, you know…what is the patient population that we're interested in, what are the relevant potential subgroups that lie therein.
[I7]

Errors in the definition of populations, interventions and comparators seemed to be associated with subtle aspects of their definition: for example, when considering populations it is important to appreciate the differences between subgroups, while in the definition of interventions or comparators it is the treatment strategy, that is the definition of treatment sequences, or missed comparators that give rise to errors.

It is also notable that none of the interviewees mentioned the definition of the outcome as potentially problematic when discussing the description of the problem. This probably reflects an assumption that has become axiomatic among HTA modellers that decision-makers require and are interested primarily in the generic cost-effectiveness outcome of the incremental cost per quality-adjusted life-year (QALY) gained.

FIGURE 3 Taxonomy of errors in HTA modelling: parts I–III. (The taxonomy is tabulated in four columns – error domain, description of error, root cause of error and grouping of causes. Part I covers error in the description of the decision problem and error in model structure and methodology used; part II covers error in the use of evidence; part III covers error in implementation of the model, error in operation of the model and error in the presentation and understanding of results.)


However, in discussing the presentation and understanding of results, one interviewee indicated that presentation of disease-specific results was a key element in helping decision-makers to understand the nature of the model results:

So cost per QALY amalgamates a lot of things together and you might not really know what's potentially driving the results quite clearly and you might not necessarily know that there is an implication of some of the things that you have done and you are not pulling them out, so you are failing to really provide all the information to a decision-maker that they should have.
[I8]

It is noteworthy that interviewees focused solely on elements of the PICO (populations, interventions, comparators, outcomes) description of the decision problem. This may indicate that interviewees considered that the PICO definition adequately captures the typical HTA decision problem.

The errors identified by the interviewees applicable across these areas were:

• definitions incoherent with the clinical understanding of the disease
• inadequate description of the boundary and content of the decision problem.

These error types are illustrated in the interview responses presented below:

…but you can see that in some of the NICE reviews…where the question was asked incorrectly or the question was framed early on in a way that was not coherent with the clinicians' understanding of the disease. And so the whole process flowed through and generated junk at the end.
[I6]

…they didn't realise I'd have to build a treatment model alongside it and then they hadn't invited the right people to come to it…[they] dropped the ball if they believed you could do a screening [model] without having a treatment model, but that wasn't our error.
[I2]

It's not an error in the model because we did what they asked us to, but they could have asked us a better question.
[I2]

The prime root cause suggested by the interviewees for errors in the description and understanding of the decision problem was a failure of communication between the stakeholders in the process. The interviewees identified three types of stakeholders:

• the decision-maker (NICE or client)
• the decision-modelling team (either an academic Assessment Group or an Outcomes Research organisation)
• clinical experts.

Error in the structure and methodology used

Three broad domains within model structuring can be distinguished within the interviews:

• development of a conceptual model of the disease, including a description of its epidemiology, natural history and management
• selection of modelling methods
• moving from the conceptual model to an implemented model.

Development of a conceptual model of the disease, including a description of the epidemiology, natural history and management

The key subthemes identified by the interviewees were:

• the boundary between simplification and structural error
• obsolete or outdated model structures
• error in model structure.

A common subtheme identified by nearly half the interviewees concerned difficulties in identifying when a simplifying assumption made within the model structure constitutes an error, as discussed earlier (see Chapter 4).

Any model is going to be built on a whole set of assumptions, some of which should be written up, will be written, and some of them might not be, and it is all those assumptions basically are the ones which in some way may cause us an error in terms of the results we are going to get out, because they are almost always a simplification in terms of how we should have been doing things. There could be very good reasons as to why you have done it, but in some senses they are going to potentially force you down to a particular conclusion when it's not necessarily the conclusion you should have got.
[I8]

So if there [was] something important in terms of how long these people were going to live or what their quality of life was going to be, that would have been captured by a more detailed prognostic model, but is not being captured by the simplification, then it's an inaccuracy that you've introduced. And you try and insure yourself against that by looking at whether there's an established literature, there's a consensus among the clinicians that response status is a good predictor for prognosis. There's not a consensus; there's a good body of published methodology that this simplification is acceptable for, for patients with similar conditions. But unless you do both models, you'll never actually know whether you've simplified something important out of it.
[I6]

These two quotes demonstrate the nub of the problem: it is difficult to determine with any confidence whether assumptions underlying the model structure might have an impact on the model results such that subsequent conclusions and recommendations are affected.

Discussions focused on the role and nature of evidence supporting such structural assumptions. There was consistent agreement among the interviewees that where available evidence contradicts an assumption then this would constitute an error. However, where evidence is absent or equivocal, interviewees were less consistent about whether it would constitute an error.

…if you've developed a structural assumption that the data would contradict, then you'd reconsider that. But if it's simply a case of absent data, then I wouldn't seek to change that, that assumption.
[I9]

…if the assumption is arbitrary, or if the assumption directly contradicts strong evidence, then it is probably an error in the sense that the decision-maker would not own it.
[I3]

This second quote makes the interesting observation that whether an assumption constitutes an error depends on whether a decision-maker would have ownership of the assumption. Transparency represents a necessary condition for ownership of assumptions; as noted by respondent [I8], not all simplifying assumptions are necessarily reported and transparent.

Where there is no evidence or equivocal evidence, the interviewees indicated that the key themes are transparency in (1) the nature of the structural assumptions; (2) the underpinning evidence or judgement; and (3) the potential to impact on the model outcomes and subsequent decision-making. Although interviewees did not explicitly make the analytical leap, the implication could be that a lack of such transparency may constitute an error in process.

Three interviewees [I3, I4, I9] expressed concerns about the use of obsolete and outdated model structures arising from the shifting nature of the evidence base.

Sometimes there might [be an] accumulation of minor differences, sometimes there's a paradigm change.
[I3]

I think there is a danger in that as well, you know, if you just slavishly adopt a previous structure to a model and everybody does the same, there's no potential for better structures to develop or for mistakes to be appealed, so I think there is a danger in that in some ways.
[I4]

The above quotes indicate that an error may lie in failing to sufficiently capture the relevant evidence base, and as such constitutes an error in evidence gathering. Other errors in model structure were identified by interviewees as being caused by the misinterpretation of evidence. A cited example of such misinterpretation is assuming that a non-statistically significant difference implies an equality between two treatment effects that can therefore be analysed using a common model structure.


Someone who treats something…not to be statistically significant, who treats that as being the evidence of no difference at all, seems to me to be a huge error and has consequences for the rest of the model which is a bit difficult to describe. [The] counter position of having allowed the uncertainty around that non-significant treatment effect or whatever it is and see what the consequences are – that is an important one.
[I12]
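A minimal numerical sketch of the point made in this quote (Python; the trial estimates are invented) contrasts fixing a non-significant effect at zero with letting its uncertainty flow through to the results:

import numpy as np

rng = np.random.default_rng(7)

# suppose a trial estimated a log hazard ratio of -0.15 (SE 0.10):
# favourable but not statistically significant at the 5% level
log_hr, se = -0.15, 0.10

# option A (the error): treat non-significance as proof of equality
effect_a = np.zeros(10_000)

# option B: propagate the estimate's uncertainty by sampling it
effect_b = rng.normal(log_hr, se, size=10_000)

for label, draws in [("assume no difference ", effect_a),
                     ("propagate uncertainty", effect_b)]:
    hr = np.exp(draws)
    print(f"{label}: mean HR {hr.mean():.2f}, "
          f"P(HR < 1) = {(hr < 1).mean():.2f}")

Under option A the model is certain the treatments are interchangeable; under option B most draws still favour the treatment, and it is that uncertainty, rather than a spurious equality, which reaches the decision-maker.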

Selection of modelling methods

Three interviewees [I2, I3, I4] explicitly identified the selection of appropriate modelling methods as a potential area for error. Explicit examples included the selection of Markov methods in violation of Markov constraints, inappropriate use of Monte Carlo sampling methods, and inappropriate decisions concerning the use of cohort models or individual patient-level models.

You are making a decision first of all as to what modelling method to use, so there is a danger there that you maybe adopt the wrong methodology. I suppose that is a potential source of validation error as you set off using Markov when the Markov constraints stop you, like simulation or an individual patient sampling model or something like that, so there is a potential for an error there and that may not be picked up for a long time.
[I4]

…as soon as you get interactions it becomes a lot more dangerous to work on a simple cohort model…just assuming the patients are independent.
[I2]

…even when it comes to Markov type modelling, not…Monte Carlo type things, because I find that there are very few people who routinely use these packages who actually know the finer points; who understand the potential pitfalls.
[I3]

The above comments indicate that the selection of appropriate modelling methods is a complex decision area and that the errors are associated with sometimes fine errors of judgement.

Moving from conceptual model to implementation model

There's no such thing as the right way to model. You can choose any way you like. It's just that some are, you know, easier to use than others, more convenient. And so, you know, your conceptual model can be realised in, in an infinite number of ways.
[I3]

Nearly half the interviewees [I1, I2, I8, I9, I12] identified potential errors introduced in moving from the conceptual model to the implemented software model. The process of moving from the conceptual model to the implementation of the model involves a set of activities and modelling decisions including:

• selection of the modelling software platform
• development of a detailed realisation of the conceptual model, possibly including amendments that arise in implementation of the model
• design of the implementation of modelling methods.

As discussed in Chapter 3, there is a wide range of modelling practice among the HTA community, particularly with regard to the explicit identification of conceptual modelling and model implementation as distinct activities.

Two interviewees [I2, I12] indicated that the premature selection of the modelling software and the premature implementation of the model structure (that is, before the conceptual model has been fully developed) can lead to the generation of errors in models. Furthermore, these interviewees indicated that such failures in model design can lead to a vicious circle wherein errors, when identified, necessitate further reprogramming that is in itself highly prone to further errors.

…the addition of states halfway through the model as regularly happens…just makes your recoding of an excel model incredibly, well not incredibly difficult, incredibly tedious, which can mean that by being bored senseless you inadvertently introduce an error.
[I2]

The model grew from 30 megabytes to 180 in trying to fix this (that is a structural error in the model) at a late stage in an excel model; if we had known this at the outset no way would we have created the model in excel.
[I12]

One interviewee suggested that time and resource constraints imposed upon the modelling process and the skills of analysts were factors in decisions regarding implementation and design and therefore might be underlying causes of error.

…in terms of our skills it is subjective because arguably what you might have is a particular modelling approach that's been adopted that is less than optimal given constraints on time and research time to what could have been produced to answer that research question. So you may think a very simple decision analytical model where more appropriately it would have been a more sophisticated model…[that] more correctly follows the care pathways or where we believe there may be changes in costs and outcomes between the interventions we are looking at.
[I8]

Sometimes exploring whether more sophisticated models would actually have made a difference and again that's…making sure you have actually got the time and there's your skill base.
[I8]

As noted in Chapter 3, one interviewee highlighted a missing step in the model development process as implemented by most HTA modellers, that is a failure to consider system analysis and design before model implementation:

…how will we best be able to implement the…program? How will it be, the structure of the programming part? And we miss all this, so we end up building a model that will solve the problem for sure, but I…I'm not sure it is the best…the optimal model.
[I1]

Error in the use of evidence

Discussions with interviewees concerning errors in the use of evidence focused on:

• the use of evidence in informing the development of model structure
• the role of evidence in populating data inputs to the model
• generic evidence processes.

Use of evidence in the development of model structure

The development of model structure was previously defined as incorporating the development of the conceptual model of the disease (including the epidemiology, natural history and management of the disease), the selection of modelling methods and moving from the conceptual to the implemented model. Interviewees did not necessarily distinguish between these processes and, therefore, individual quotes concerning the use of evidence in model structuring may refer to different aspects of this process. A central theme raised by interviewees concerning errors in the development of model structure is the misinterpretation of data, for example the inappropriate generalisation of evidence from one environment to another.

If…you're modelling an intervention, and the intervention is a clinical action which is predicated on the environment in which it has to be delivered and the prevailing ethos and, accepted norms of that clinical community, which is why, you know, an evaluation done in the States or in Brazil cannot be imported into the UK without substantial adaptation.
[I3]

The interviewees suggested that the misinterpretation of data can be associated with simple misunderstanding of data definitions, deliberate misinterpretation or errors of judgement in interpretation.

In terms of what types of errors, well there's errors that can be made in understanding, interpreting…the data, that's quite an important area so it's errors in how the data has been used in the model.
[I4]

One root cause of misunderstanding suggested by two interviewees concerned a failure of communication within the Assessment Group; this was particularly associated with separation of the information gathering and reviewing functions from the modelling functions.

Basically we have made errors before where the modellers misunderstood the data that has been given to us by the person sourcing the data, so that's quite an easy and obvious error.
[I4]

It is the problem of having effectively two models, one that's doing a review of the effectiveness and one that's an economic model, and they are not the same thing, so they are not translating directly one to the other. One person does one thing, one person the other, and they have to talk.
[I8]

Two interviewees explicitly identified deliberate misinterpretation of data as a particular form of culpable error.


One interviewee suggested that this type of error was sometimes associated with direct, and possibly indirect, pressure from decision-makers to produce models in the absence of direct evidence, whereas other interviewees suggested that commercial modelling clients might exert indirect pressure to provide modelled estimates of outcomes.

Yes I think it is this business of being deliberate, but then it tends to be over optimistic interpretation of data is really what we're criticising.
[I5]

They [NICE] made us build the model, and there was no information!
[I1]

One interviewee identified errors of judgement in making 'soft' decisions about model structure as a potential root cause of error.

…the most difficult ones aren't just the numbers; they are where there's a qualitative decision. You know, is there any survival benefit? Yes or no? If there is a survival benefit, it's a different model? You know what I mean? So it's structural.
[I3]

One interviewee identified the premature definition of the model conceptualisation as a potential source of error in model structuring; this was associated with making decisions regarding the model structure that inadequately capture the evidence base, including subjective clinical judgement. The root cause of this type of error is a failure of the process for gathering evidence for use in model structuring.

…the approaches that have been derived from the basis of, of your first conceptualisation as you're setting up to wed yourself that first conceptualisation so far, and to make sure that are there changes in understanding when going from an ad hoc, non-systematic overview to something that is more systematic…That's to say one of the dangers, I guess, could be that we end up, because of the time constraints, developing an idea of the area before you go off and see your clinical experts.
[I9]

Another interviewee suggested that errors arose from the inadequate capture of subjective evidence from clinical experts, from the inappropriate selection of experts covering the breadth of opinion in a given area, and in ensuring effective elicitation of clinical information.

Well there are obvious errors in terms of you don't necessarily speak to the right people. People have particular perspectives on a situation and they may tell you what they think but it may miss out important aspects, and the implication of that is you have a care pathway which is biased for or against a particular intervention because you have missed out a benefit or you've missed out a problem with it.
[I8]

Statistical analysis of data, either in premodel analysis or directly within a model, was identified by four interviewees [I2, I3, I4, I7] as an area of potential errors within models. Three specific example areas were identified, ranging from complex issues of data synthesis or survival analysis to simple manipulation of data in modelling population subgroups; all of these impact upon model structuring.

…then it's actually how do you then use that data and synthesise it in a particular way? And I think there's a whole series of errors just waiting to happen at that stage in terms of assumptions, in terms of statistical sort of approaches…
[I7]

I think survival analysis is an interesting area and it's becoming a more prominent area in many of our models, in the way we model survival, and I think again it's not such a clear cut thing, because fitting a curve, determining the parameters for a Weibull curve, is another source of potential error.
[I4]

I've seen it happen elsewhere where people have taken the average value and then for a subset multiplied it by relative risk without thinking that the average would then change.
[I2]
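The slip described in this last quote is easy to state precisely. The hypothetical worked example below (Python; all numbers invented) shows that applying a relative risk to a subgroup without recalibrating the baseline silently changes the overall average the model was supposed to reproduce:

# population-average event risk, subgroup share and subgroup relative
# risk - all assumed values for illustration
p_avg = 0.10
share_high = 0.30
rr = 2.0

# naive approach: the subgroup gets p_avg * rr, everyone else keeps p_avg
naive_overall = share_high * (p_avg * rr) + (1 - share_high) * p_avg
print(f"implied overall risk: {naive_overall:.3f}")  # 0.130, not 0.100

# consistent approach: solve for the baseline risk p0 such that
# share_high * (rr * p0) + (1 - share_high) * p0 == p_avg
p0 = p_avg / (share_high * rr + (1 - share_high))
print(f"recalibrated baseline risk: {p0:.4f}")       # 0.0769
print(f"subgroup risk:              {rr * p0:.4f}")  # 0.1538
check = share_high * (rr * p0) + (1 - share_high) * p0
print(f"overall average check:      {check:.3f}")    # 0.100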

One of the key themes regarding survival modelling is the selection and justification of models for extrapolating survival.

It's this classic situation of a chronic disease, and what people are doing is extrapolating straight lines. Indefinitely. OK, what have you got in the whole of the rheumatoid arthritis literature is built around, modelling for the HAQ [Health Assessment Questionnaire] scale, which is closed at both ends. Right? So, what will happen to any trend in HAQ? Well, it won't go like a straight line. It will asymptote towards a limit. It may be the maximum of the scale or the minimum of the scale, or somewhere in between, but it will certainly, the rate of change will decrease over time, OK? What does that mean? Well, if you, if you compare a straight line extrapolated against the trend, what you, what you are doing is always overstating your benefit. Systematically. The net result is that you're virtually halving your ICER.
[I3]

…you can fit a polynomial trend line, but you don't know what happens the moment you drop off the end of the data. It could go anywhere, so the argument really was to go back to causality, and go through the metabolic processes that are driving the changes, and then model those and then demonstrate, effectively calibrate, those against clinical evidence. And then you've got…a basis on which to project forward.
[I3]
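A small synthetic illustration (Python; invented data, not drawn from the rheumatoid arthritis literature) of why straight-line extrapolation on a closed scale systematically overstates benefit:

import numpy as np

years = np.arange(0, 5)   # observed follow-up
ceiling = 3.0             # maximum of the (closed) outcome scale, assumed
observed = ceiling * (1 - np.exp(-0.15 * years))  # true, saturating trend

# straight line fitted to the observed window, then projected forward
slope, intercept = np.polyfit(years, observed, 1)
horizon = np.arange(0, 21)
linear = intercept + slope * horizon
saturating = ceiling * (1 - np.exp(-0.15 * horizon))

for t in (5, 10, 20):
    print(f"year {t:2d}: straight line {linear[t]:.2f} "
          f"(scale maximum {ceiling}), saturating trend {saturating[t]:.2f}")

The straight line sails past the maximum of the scale within the projection horizon, while the saturating trend levels off beneath it; any benefit measured against the linear projection is therefore overstated, and increasingly so over time.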

A root cause of these errors was identified by one interviewee as errors of judgement related to the experience, skills and training of the analysts undertaking this element of the modelling work.

I guess it's just the statistical techniques themselves, you know. I guess people don't understand what they're doing sometimes…it's sort of less experienced people trying to do fairly sophisticated analysis…without actually necessarily knowing…you know, exactly the statistical methods they should be using.
[I7]

…it's errors in judgement, I think, as opposed to errors in what people are doing. I just think it's…you know, whether I think you can pool…you know, different types of utility studies together is, is my sort of analytic judgement versus somebody else's. But it could quite clearly result in an error.
[I7]

The role of evidence in populating data inputs to the model

Probabilistic sensitivity analysis (PSA) is recognised as the preferred method for estimating mean values of outcomes and describing uncertainty in outcomes. A central requirement of PSA is the characterisation of all parameters within the model using quantified statistical distributions. However, interviewees recognised the potential for error in characterising input parameter uncertainty through the specification of adequate distributions:

I would suggest it's to do with the transparency of the inputs that are required, and the robustness of the analysis to model parameters which are not within the range that's expected.
[I6]

Two specific issues were raised by interviewees. The first concerned the use of triangular distributions to characterise parameter uncertainty.

The widespread use of, you know, so-called triangular distributions. Good grief, what is a triangular distribution? You know, the fact that people who call themselves statisticians recommend using constructs like that. I don't, it just drives me mad. I mean just, you know, it's illiterate, really.
[I3]

The second noted that the characterisation of highly skewed distributions can lead to errors.

If the distributions are really skewed, that's happened in the past, that's the one where I can recall where I got the answer wrong, because it's got a really skewed distribution of relative risk that may have had a mean or a median of 2.8 and a log normal distribution with an upper of 31, and you don't need many [samples] within the 31 or above 20 times relative risk in order to make something cost-effective.
[I2]

Two interviewees [I6, I12] explicitly referred to problems in interpreting evidence from poorly reported or poorly executed clinical studies.

…when I go look in the literature, I have one study published in 1984 for 17 patients that doesn't even report standard deviations. So I can follow the guideline, and I can create a gamma distribution about that, with a bunch of arbitrary parameters, and that will then give me a nice neat 95% confidence interval in my model output. But the quality of that information is no better, and probably worse, than the information I had to basically make up to put into the distribution.
[I6]
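The skewness problem in the [I2] quote above can be reproduced in a few lines. The sketch below (Python) assumes a lognormal relative-risk distribution with a median of 2.8 and an upper 95% limit of 31, roughly as described, and shows how a small share of extreme draws dominates the probabilistic mean:

import numpy as np

rng = np.random.default_rng(42)

median_rr, upper95 = 2.8, 31.0
mu = np.log(median_rr)                  # lognormal median = exp(mu)
sigma = (np.log(upper95) - mu) / 1.96   # back out sigma from the upper limit

draws = rng.lognormal(mu, sigma, size=10_000)
print(f"median of draws: {np.median(draws):.1f}")  # about 2.8
print(f"mean of draws:   {draws.mean():.1f}")      # pulled up by the tail
tail = draws > 20
print(f"share of draws with RR > 20:       {tail.mean():.1%}")
print(f"their contribution to the mean RR: {draws[tail].sum() / draws.sum():.1%}")

A handful of draws above 20 times the baseline risk accounts for a large slice of the mean, which is how a probabilistic result can be driven by the tail of a single poorly characterised input.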


There is one that's occurred to me that isn't to do with model building: we are relying on what other people have done; we are dependent on their having reported things correctly and what they did in the first place having been appropriate. I guess that might be, if we want to be consistent with what's being done in a clinical trial, that presupposes to some degree that what was done in the clinical trial was right in the first place.
[I12]

The specific role of critical appraisal of clinical effectiveness studies is to identify the quality of execution and reporting of studies. There is an absence of recognised and accepted methods for incorporating such quality assessment into the quantitative interpretation of study results. One interviewee indicated that this issue is not covered adequately by existing guidelines on characterising uncertainty.

Generic evidence processes

Interviewee comments on generic evidence issues focused on searching for evidence and the systematic reviewing of evidence.

You have to be careful to look for the right sort of search terms.
[I12]

…you can look to see what they have done and see whether there is any, the same way you peer review…are there any data sources missed, should they have done other stuff within the analyses, sensitivities, so that all comes in; you would be stupid to blind yourself to it.
[I2]

I suppose in terms of when you're critically appraising something, has a potential effect been missed out...I mean you can look at something missing from the model that says, well, that's just because the data weren't there, people missed it, or it's something that, you know, the model's been structured deliberately in a way to produce a given outcome.
[I9]

Underlying the theme of missing data are issues of intention versus accidental error, the application of adequate processes and methods, the appropriate level of skills in team members and the application of judgement.

In considering generic systematic review activities, two sources of error were identified by the interviewees: first, the potential for mismatch in the interpretation and definition of data between the systematic review and the decision model; and second, the potential for simple transcription errors in the data extraction activities within the systematic review.

You might have a result from the systematic review that, because of the type of data restrictions they are facing, is clinically implausible, so the actual estimate and level of precision they have around the estimate is just not plausible, so you think what could be clinically plausible and you use that to define what are the numbers you put into the model.
[I8]

I think there are two sources of errors: one which is the sheer getting those numbers out, and making sure that no errors actually happen in terms of the transcription…and I think there's a huge source…of error at that stage…again depending on the size of your project etc., you'd hope that it would be kind of extracted by two individual people with some kind of consultation, but…I'm not sure that always happens.
[I7]

Error in implementation of the model

Errors in implementation identified by the interviewees can be classified into the following two domains:

• coding of individual cells
• coding of logic structures within the model.

Coding of individual cells

The interviewees identified that cell contents can be subject to:

• error in cell referencing/variable calling
• error in values
• error in cell text
• error in formulae/operators.

The cell referencing within a spreadsheet is the primary mechanism for representing the logical structure of a decision model:


Error in implementation of the model

Errors in implementation identified by the interviewees can be classified into the following two domains:

• coding of individual cells
• coding of logic structures within the model.

Coding of individual cells

The interviewees identified that cell contents can be subject to:

• error in cell referencing/variable calling
• error in values
• error in cell text
• error in formulae/operators.

The cell referencing within a spreadsheet is the primary mechanism for representing the logical structure of a decision model:

…it should map out your conceptual model it should be an identical replica to the conceptual.
[I2]

Interviewees identified only two basic root causes of errors in cell codes: simple typing errors and copying errors.

…there are so many different types I think, from, you know, basic slip-ups where, it’s pretty obvious that you’ve made a silly mistake, a typing mistake or copying something from one cell to another wrongly, all those sort of very mechanical type problems.
[I3]

However, the most frequently discussed error in model implementation concerned incorrect cell referencing.

There’s errors in the wiring of the model, references that aren’t correct, coding that is wrong within the model.
[I4]

If you’re in excel referencing is quite often a big error, you’ve referenced it to the wrong cell and you don’t notice [because] you reference 2000 cells in your model.
[I2]

There was also some discussion about the relative propensity of different software platforms to logical errors through misreferencing or errors in calling variables:

…there are…the technical errors that are…related to…all these little cells in excel and, er…calling a wrong variable.
[I1]

The issue of intention was also raised as a defining characteristic of an error:

Likewise a programming error where you misreference cells, presumably you would do it differently if you did it again because you misreferenced it in the first place so that is an error.
[I12]

Well the typos is factually incorrect, the second one, the referencing is structurally incorrect but the entire thing could be built perfectly but still be wrong because your entire structure’s round the bend. I could give you a model that says well for cancer the ICER is 2 × A + 3 × B and A and B are just some numbers but mathematically that’s correct, but structurally and conceptually it’s a load of XXXX.
[I2]

The operators and functions within a model were not explicitly identified by interviewees as being subject to implementation error. Where operators and functions were explicitly mentioned, the focus concerned structural errors. However, it would be reasonable to assume that these elements of a cell code or programme code could be subject to similar root-cause errors, i.e. typing error or copying error.

Errors in values within a model and errors in cell text were also identified as key issues.

…it wasn’t the point estimate that had been entered wrong, it was the confidence interval and things like that are difficult to spot…I couldn’t see what was wrong at first it was only looking at that and thinking that can’t be right.
[I12]

…we’ve put in this process somebody separate, a health economic modeller, will get a couple of days to look into the model, check all our links, check, you know, even down to spelling in text in there. Try and break this! Give it a kicking.
[I10]
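Because misreferencing is reported as the dominant implementation error, one candidate safeguard is to address inputs by name rather than by cell position, an option supported in various forms by spreadsheet named ranges and by scripting platforms. A minimal sketch, with invented parameter names, of how a named lookup converts a silent misreference into an immediate failure:

```python
# Minimal sketch: named parameters fail loudly where a mistyped cell
# reference would fail silently. All names below are illustrative.

class Parameters(dict):
    def __getitem__(self, name):
        if name not in self:
            # A wrong cell reference in a spreadsheet returns *some* value;
            # a wrong name here raises immediately, so the slip is caught.
            raise KeyError(f"Unknown model parameter: {name!r}")
        return dict.__getitem__(self, name)

params = Parameters(rr_treatment=0.82, annual_cost_tx=1450.0)

def incremental_cost(p):
    # p["anual_cost_tx"] (a typo) would raise rather than compute quietly.
    return p["annual_cost_tx"] * p["rr_treatment"]

print(incremental_cost(params))
```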

Coding of logic structures within the model

One interviewee gave two examples of a broader class of errors in implementation which concerned the incorrect coding of logic structures. These examples were errors of implementation of standard methods that affect the spreadsheet more broadly than simple random errors in the contents of single cells. The root cause of these types of errors was indicated to lie in the judgement, knowledge or experience of the modeller.

So the accepted convention for discounting is that you discount after the first year in whole numbers. Right? So this 12 months is not discounted, the next is discounted at 1 year and 2 years and so on. Right, well, you know, people will correctly use the correct discounting formula from point zero, so they are discounting every day from then onwards, OK? It’s not technically incorrect, but it is conventionally incorrect and inappropriate if you are then going to make a comparison with another study in the same area that’s discounted in a different way. So then your answers are then not comparable.
[I3]

…when I first learned this sort of thing you always had to think very carefully about randomisation and the order in which you generate random numbers. Because you can inadvertently bias your results if you don’t get it right.
[I3]
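The discounting example can be made concrete. The sketch below uses an illustrative 3.5% annual rate and a flat cost stream, not figures from any model discussed in the interviews, and shows that the two conventions produce different totals, which is why results discounted under different conventions are not directly comparable.

```python
# Illustrative comparison of the two discounting conventions described
# above; the 3.5% rate and flat cost stream are invented for the example.

RATE = 0.035

def discount_whole_years(cost, year):
    """Convention described by the interviewee: the first year is not
    discounted, later years are discounted at whole-year factors."""
    return cost / (1 + RATE) ** year          # year 0 -> factor 1.0

def discount_from_time_zero(cost, time_years):
    """Discounting from point zero at fractional times (e.g. mid-year);
    not wrong in itself, but not comparable with the convention above."""
    return cost / (1 + RATE) ** time_years

costs = [1000.0] * 5                          # one cost per model year
whole = sum(discount_whole_years(c, y) for y, c in enumerate(costs))
exact = sum(discount_from_time_zero(c, y + 0.5) for y, c in enumerate(costs))

print(round(whole, 2), round(exact, 2))       # totals differ: ~4673 vs ~4593
```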

Errors in operation of the model and model reporting

The key domains discussed by the interviewees in this section were:

• operation of the model
• presentation of results
• communication of results and conclusions.

Operation of the model

Two interviewees [I4, I6] identified the potential to make errors in setting up the model to generate outputs; this was particularly relevant when generating large numbers of sensitivity analyses and was primarily related to ensuring that all environmental variables, model switches and data were set to the intended values for the specific model run.

For some of the simpler models we leave the one-way sensitivity analysis to be done manually, which, of course, gives you an opportunity for errors at that stage.
[I6]

You can have errors in the way the model output is resolved and you can have errors in the way the model is used.
[I4]

Two primary causes of these errors are implied: first, simple typing errors and, second, failure to update the values of variables and model switches.
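A candidate safeguard against stale switches and variables is to make each run’s configuration explicit, immutable and logged alongside its result. The sketch below is illustrative; the switch names and the toy model are invented, and the fixed per-run seed also pins down the ordering of random numbers, the source of inadvertent bias noted earlier.

```python
# Minimal sketch: freeze every switch and seed for a model run so that
# sensitivity analyses cannot inherit stale settings. Names illustrative.

from dataclasses import dataclass, asdict
import json, random

@dataclass(frozen=True)
class RunConfig:
    scenario: str
    include_adverse_events: bool
    discount_rate: float
    seed: int  # fixed seed also pins down random-number ordering

def run_model(cfg: RunConfig) -> float:
    rng = random.Random(cfg.seed)          # per-run stream, not global state
    base = 10000.0 * (1 - cfg.discount_rate)
    if cfg.include_adverse_events:
        base += rng.gauss(500.0, 50.0)
    return base

for cfg in [RunConfig("base case", True, 0.035, seed=1),
            RunConfig("no AEs", False, 0.035, seed=1)]:
    result = run_model(cfg)
    # Log the exact configuration next to the result it produced.
    print(json.dumps(asdict(cfg)), "->", round(result, 2))
```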

Presentation of results

Two interviewees [I2, I8] identified the potential for making errors in the presentation of results in the text and tables of reports.

Where I’ve written 4.3 I actually meant 5.3 but the model’s right and it’s just me writing down the number wrong. That’s not a modelling error, it’s a reporting error.
[I2]

...because we will know what the tables are going to look like prior to the results being produced we will effectively have dummy tables, though after they will be completed they will be looked at to see if they make sense, they will be checked by the other member of the project team.
[I8]

The root cause identified for the above examples concerns a simple typing error in transcribing the results from the model to the report.
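The dummy-table practice described here can be taken one step further by generating final report tables directly from model output, removing the manual transcription step in which a 4.3 becomes a 5.3. A minimal, illustrative sketch (the result fields are invented):

```python
# Minimal sketch: write report tables directly from model output rather
# than retyping numbers, so a 4.3 cannot become a 5.3 in transcription.
# The result fields below are illustrative.

results = {"incremental_cost": 4310.0, "incremental_qalys": 0.42}
results["icer"] = results["incremental_cost"] / results["incremental_qalys"]

def render_table(res):
    rows = [("Incremental cost (GBP)", f"{res['incremental_cost']:,.0f}"),
            ("Incremental QALYs", f"{res['incremental_qalys']:.2f}"),
            ("ICER (GBP per QALY)", f"{res['icer']:,.0f}")]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label:<{width}}  {value}" for label, value in rows)

print(render_table(results))  # paste/import into the report verbatim
```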


Communication of results and conclusions

One interviewee discussed the potential for errors in the communication of results to the decision-making audience.

…something that you…may not have considered actually…is error in presentation or understanding; that although the model outputs are reported correctly they may not have been reported in a way that the intended users understand properly so users may misunderstand the presented outputs and I think it’s an area that is not often given enough attention.
[I4]

Hence, the communication of results is not clear-cut, and the root cause of this failure lies in the modelling team’s understanding of the external audience, specifically in terms of understanding the needs, requirements, experience and language of the external audience.

I think it’s very easy when you are working on the inside of a project to assume that the knowledge you possess…the things you take for granted are readily accessible to the people you are communicating to and that is not always the case.
[I4]

Two interviewees discussed failures of communication through loss of model credibility with the decision-making audience. Once credibility of the model is lost, the interviewees experienced grave difficulty in regaining it. In both instances the loss in credibility was associated with the presentation of counterintuitive results.

…we had one example of a model where we were dealing with quite a large number of technologies but the model was actually being driven by adverse events rather than by the prime reason for giving the treatment and it was quite a struggle not so much with the people within our own team but actually the clinical experts who had read the model and then had come along to NICE and it was quite a struggle explaining why we were actually getting plausible results even though we were getting widely different effectiveness results from a number of technologies that were equally effective for the primary purpose but it was because they had different profiles for adverse events and that was what was driving the model but then of course it goes to the technical leads and if there is anything that the technical leads don’t understand that comes back to us so again.
[I5]

I was presenting a model and I got quite a lot of stick, ‘this is meant to be a final version of a model – what is this glaring error?’ And it wasn’t an error in the model, it didn’t arrive at something that was consistent with some external view of what it should be and it was taken to be an error in the model.
[I12]

The credibility of model results therefore depends on them either being intuitively acceptable, or on counterintuitive results having an immediate rational explanation through presentation of an adequate range of information on outcomes and key drivers. The tailoring of the presentation of results to such circumstances of course depends on the modelling team having a perception of what the audience might be intuitively expecting. As noted earlier (see Error in understanding the problem), one interviewee explicitly mentioned the benefit of generating a larger range of disease-specific outcomes than the generic cost per QALY, explicitly to aid decision-makers in understanding the nature of the model results.

With regard to errors in drawing incorrect conclusions from results, one interviewee explicitly referred to the potential for the overinterpretation of results in drawing conclusions.

There is interpretation isn’t there and I mean there is overinterpretation of results.
[I5]

I suppose there [are] things like saying that…looking at the cost-effectiveness acceptability curve and saying well if this one is at 51% then it is the preferred option.
[I5]

Review of taxonomy/classification literature

Quantity and quality of the studies identified

Searches identified eight studies45–52 that presented original work on classifications or taxonomies of errors, and one critical review.53 Of the eight studies all except two focused on errors in spreadsheet systems; one focused on generic programming errors48 and a second focused on errors in the requirements specification process.50 All of the papers were conference proceedings, with the exception of one,48 which was published in a peer-reviewed journal. A brief description of the studies is presented below, with the spreadsheet systems papers discussed first and in chronological order. Table 4 presents the error classifications from each of the spreadsheet studies, aligned to highlight congruent themes.

The interview taxonomy is set in the context of this literature (see Placing the interview taxonomy of errors in the context of literature on error classifications).

Panko RH, Halverson RP. Spreadsheets on trial: a survey of research on spreadsheet risks. 199645

Panko and Halverson45 report a review of literature including experiments in spreadsheet development, field audits of spreadsheets and observational studies of spreadsheet development. Methods for identifying the literature are not reported.

The purpose of the paper is to summarise research findings on the risks inherent in end-user spreadsheet development.



The paper uses this literature as the basis for developing a framework of spreadsheet risks and organises a discussion of the research findings around this framework.

Three dimensions of risk are identified: methodology, life cycle stage and research issues. Research issues are described as the dependent variables of the research surveyed; the principal examples given are structural concerns about spreadsheets as a development platform and errors in spreadsheets. The life cycle stage is described as important because of the changing characteristics and frequencies of errors throughout the different stages of spreadsheet development and use. Five life cycle stages are identified: requirements and design, cell entry, draft stage, debugging stage and the operational stage. The methodology dimension is not discussed in detail in this report.

The Panko–Halverson 1996 classification differentiates first between qualitative errors and quantitative errors. Quantitative errors are flaws that ‘lead to incorrect bottom-line values’. Qualitative errors are described as ‘flaws that do not produce immediate quantitative errors. But they do degrade the quality of the spreadsheet model and may lead to quantitative errors’.

Panko and Halverson classify quantitative errors into mechanical, logic and omission errors. Mechanical errors refer to simple slips including typing and pointing errors. Logic errors include designing an incorrect algorithm or ‘creating the wrong formula to implement the algorithm’. Omission errors are ‘things left out of the model that should be there’.

The authors explore further possible subdivisions of logic errors. One suggested subdivision is into Eureka or Cassandra errors, the first being errors that are readily identified and proven to be errors, as compared with Cassandra errors that are not identified or demonstrated to be errors. An alternative subdivision is into pure logic errors ‘resulting from a lapse in logic’ or domain logic errors resulting from a lack of domain knowledge.

Teo TSH, Tan M. Quantitative and qualitative errors in spreadsheet development. 199746

This conference proceedings paper by Teo and Tan describes a laboratory-based experiment in spreadsheet development using business school students. The literature review gives a brief outline of spreadsheet research using a subset of the literature referenced by the earlier Panko and Halverson45 paper.

The error classification cited in the paper and used in the design of the experiment is the Panko–Halverson45 classification, with qualitative errors further subdivided into ‘jamming errors’ and ‘duplication errors’. These additional error classes are referenced to an earlier monograph by Panko.54 Jamming errors refer to the practice of entering values within a formula, and duplication errors refer to the practice of defining a variable more than once within a spreadsheet.

Rajalingham K, Chadwick DR, Knight B. Classification of spreadsheet errors. 200047

These authors present a classification based upon a ‘thorough review of literature relevant to spreadsheet development’. The methods for undertaking the review are not presented and the source materials used are not referenced.

The authors state that the purpose of the taxonomy is to facilitate analysis and comprehension of errors to devise solutions or methods of detection.

The classification, presented in Table 4, is defined as a binary tree. The first level divides errors into system-generated, that is, generated by bugs in the software, and user-generated errors. Panko and Halverson45 is referenced in dividing user-generated errors into qualitative and quantitative errors.

Qualitative errors are divided into semantic and maintainability errors. Semantic errors are described as being related to distortion or ambiguity in the meaning of data, and maintainability errors are flaws in spreadsheet design or implementation that make it hard to update or error prone in use. It is not clear that this division of qualitative errors is exhaustive or mutually exclusive. Semantic errors are further subdivided into structural and temporal errors.

Quantitative errors are first divided into accidental errors or reasoning errors. Accidental errors are then categorised by perpetrator, of which three different types are identified: ‘developer’, ‘data inputter’ and ‘interpreter’. The constraint of using a binary classification forces the authors to group ‘data inputter’ and ‘interpreter’ together as ‘end user’. For each perpetrator a tertiary classification of errors into ‘omission’, ‘alteration’ and ‘duplication’ is then defined, that is, leaving something out of the model, amending content of the model incorrectly or duplicating either data or structural components of a model. It is not clear where primary slips in model development, such as accidentally mistyping a formula, would fall in this categorisation.

Reasoning errors are categorised into domain knowledge errors and implementation errors. Domain knowledge errors are divided into real world knowledge errors and errors in mathematical representation. These are similar to the pure logic and domain logic errors described by Panko and Halverson.45 Implementation errors arise from incomplete knowledge of the functionality of the platform being used and are divided into errors of syntax and logic.

TABLE 4 Classifications of spreadsheet errors

Panko and Halverson 199645 (spreadsheet errors)
• Quantitative: mechanical errors; logic errors; omission errors
• Qualitative

Teo and Tan 199746 (spreadsheet errors)
• Quantitative: mechanical errors; logic errors (Eureka/Cassandra errors; pure logic/domain errors); omission errors
• Qualitative: jamming errors; duplication errors

Rajalingham et al. 200047 (spreadsheet errors)
• System generated
• User generated, quantitative: accidental, by perpetrator (developer; end user: data inputter, interpreter), each subdivided into omission, alteration and duplication; reasoning: domain knowledge (real world knowledge; mathematical representation) and implementation (syntax; logic)
• User generated, qualitative: semantic (structural; temporal); maintainability

Rajalingham 200549 (spreadsheet errors)
• Quantitative: accidental (structural; data), each subdivided into insertion and update (modification; deletion); reasoning: domain knowledge (real world knowledge; mathematical representation) and implementation (syntax; logic)
• Qualitative: structural (visible; hidden); temporal

Purser 200651 (spreadsheet errors)
• Quantitative: accidental (insertion; update, subdivided into modification and deletion); reasoning: domain knowledge (real world knowledge; mathematical representation) and implementation (syntax; logic)
• Qualitative: structural (visible; hidden); temporal

Panko 200852 (spreadsheet errors)
• Violations
• Blameless errors, quantitative: mistakes (context errors: requirements, spreadsheet/module/section algorithm design; formula errors: wrong algorithm, wrong expression of algorithm); slips (sensory-motor errors: typing, pointing) and lapses (memory errors)
• Blameless errors, qualitative

Rajalingham K. A revised classification of spreadsheet errors. 200549

In 2005 Rajalingham presented a simplified version of the earlier classification,47 incorporating two main modifications. First, the dubious classification of qualitative errors into semantic and maintainability errors is removed and the previous structural errors classification is further subdivided into visible and hidden errors. Second, the observation that developers of spreadsheets tend to be end users means that the error perpetrator classification of accidental errors is superfluous. Instead, accidental errors are classified into structural errors or data errors. Both structure and data are then subdivided into insertion and update errors, and update errors are further divided into modification and deletion errors. The classification of reasoning errors is left unchanged from the earlier paper.47

Purser M, Chadwick D. Does awareness of differing types of spreadsheet error aid end-users in identifying spreadsheet errors. 200651

Purser and Chadwick in this conference proceeding describe an investigation of the effect of experience and error type awareness on the identification of errors in spreadsheets. The investigation uses a web survey including an error identification exercise undertaken with professionals who use spreadsheets as part of their work and students from Business, Computing and Mathematics schools. As a preliminary part of the project a classification of error types is devised.

The focus on error identification leads the authors to make the observation that ‘all errors exhibit characteristics’ and further that classification of error types based upon these characteristics would provide a classification or taxonomy of particular use in error identification. However, the authors state that the characteristics of error types are ‘beyond the scope of this research’.

As an alternative, the authors review previous classifications of errors, including Panko and Halverson,45 and generate a further revision of the classification of Rajalingham49 with the objective of removing the remaining duplication within the bifurcation tree and tailoring it for the purposes of error identification.

The classification of qualitative errors and quantitative reasoning errors remains unchanged. By removing the classification of accidental errors into ‘structural’ and ‘data input’ errors the authors remove the duplication of insertion, update, modification and deletion errors.

The results of the investigation are equivocal and not all the conclusions made by the authors are entirely supported by the results of the investigation.

The authors use a quasi-experimental language to describe the investigation, proposing a series of hypotheses. However, there appears to be no serious attempt at randomisation in the study and no statistical hypothesis testing is undertaken to support the conclusions. The results suggest that awareness of the examined error types makes little difference to overall error detection rates (a slight worsening); however, subgroup analysis suggests that identification of qualitative errors actually gets markedly worse and identification of quantitative errors better. The conclusions, however, focus on the improvement in qualitative error identification, together with the statement that ‘an awareness of spreadsheet error types thus logically should aid the user in identifying quantitative spreadsheet errors’. It is possible that the central weakness of this paper arises from the authors’ failure to develop a classification of errors based upon observable error type characteristics, as they themselves identify.

Panko RR. Revising the Panko–Halverson taxonomy of spreadsheet risks. 200852

This paper, yet another conference proceeding, reiterates the three dimensions of spreadsheet risks from the author’s earlier work45: risk category (or error), life cycle stage and methodology.

The paper presents the original classification45 together with a further discussion of its roots in human error research.55 The modified classification starts with a division of all errors into blameless errors and violations. Key examples of violations are defined as puffery, fraud and a failure to follow organisational policies regarding, for instance, spreadsheet development, testing and archiving.

Blameless errors are as before divided into qualitative and quantitative errors. Quantitative errors, again following human error research, are divided into mistakes, or errors of incorrect intention, and errors of incorrect execution.

Mistakes are divided into context errors and formula errors. Formula errors include both mistakes in the design of an algorithm and mistakes in the expression of an algorithm. Panko uses a corollary with a study examining the writing process56 to demonstrate the scope of context errors. Accordingly, in writing, a hierarchy of abstraction levels is described including overall purpose, document, chapter, paragraph and word. In writing each word the writer has to maintain the integrity of the entire hierarchy. Spreadsheet development has an exact corollary of abstraction levels including the decision problem requirements, spreadsheet, module, section algorithm and formula. Hence, just as in writing, in spreadsheet development the integrity of the hierarchy needs to be maintained. Panko differentiates errors in formulae from errors in context, the latter referring to the rest of this hierarchy. Omission errors are included in these context errors.

Errors of incorrect execution are categorised into slips, that is, sensory-motor errors such as typing errors and pointing errors, and lapses, that is, memory errors.

Ko AJ, Myers BA. A framework and methodology for studying the causes of software errors in programming systems. 200548,57

In this paper Ko and Myers provide a review of previous studies that have classified bugs, errors and problems in a range of programming languages. This review, together with previous studies on the cognitive difficulties in programming and general human error mechanisms, has been used to define a framework for describing the occurrence of errors in programming systems and a methodology for studying them.

The authors differentiate between failures, faults and errors. Runtime failures are defined as situations where a programme’s behaviour does not comply with its design specification. A runtime fault is defined as ‘a machine state that may cause a failure’ and a software error is defined as a ‘fragment of code that may cause a fault during execution’. Following Reason,58 cognitive processes underpinning the occurrence of errors in software development are classified into skill breakdowns, rule breakdowns and knowledge breakdowns.

Skill breakdowns relate to inattention or overattention: inattention is associated with five types of breakdown (strong habit intrusion, interruptions, delayed action, exceptional stimulus and interleaving), and the overattention breakdown types are omission and repetition. To illustrate, interruptions during programming may cause attentional checks to be missed, leading to an action being skipped or a goal being forgotten; unusual stimuli may be overlooked and appropriate actions not taken; or attentional checks in the middle of a routine action may lead to an assumption that an action was not completed, leading to repetition.

Rule breakdowns are classified into wrong rules, ‘use of a rule that is successful in most contexts, but not all’, and bad rules, ‘use of a rule with problematic conditions or actions’. Knowledge breakdowns are categorised into bounded rationality problems (selectivity, biased reviewing and availability bias) and faulty models of the problem space (including simplified causality, illusory correlations, overconfidence and confirmation bias).

Three life cycle stages are defined (specification, implementation and runtime activities) and the programming activities in each stage are considered in relation to the types of cognitive breakdowns. The authors introduce the important concept that although individual breakdowns may give rise to software errors, chains of multiple breakdowns, errors and faults may be involved in giving rise to a runtime failure.

Walia GS, Carver J, Philip T. Requirement error abstraction and classification: an empirical study. 200650

Walia, Carver and Philip present a classroom experiment investigating the occurrence of errors in the requirements specification phase of software development.


As a precursor to undertaking this classroom study, the authors develop a requirement error taxonomy, outlined in Table 5. The use of this classification was evaluated qualitatively by the study participants and all errors identified in the study could be classified.

Placing the interview taxonomy of errors in the context of literature on error classifications

Purpose and design of error classifications

The error classifications reviewed have been developed with different purposes in mind. Panko and Halverson originally described their intention as providing a framework for the discussion of research on spreadsheet risks; similarly, Rajalingham focuses on facilitating an analysis and comprehension of errors to devise solutions.

Strategies for reducing errors and improving the robustness of systems for decision support are wide ranging, including, for instance, improving error detection, improving software systems through design of development environments and debugging tools, training and skills development of practitioners, and development of modelling methods. Clearly each strategy may be associated with different characteristics of errors and may therefore imply a different focus for the classification of errors within a taxonomy.

Explicitness concerning the purpose of a taxonomy is a key factor in its development and design. Hence the failure of the study by Purser and Chadwick51 into error detection can be ascribed at least in part to weaknesses in their classification system. The use of a slightly modified version of the Rajalingham49 system, rather than a classification based upon error characteristics or symptoms as discussed in their introduction, was inadequate for the purpose of improving error detection. Similarly, Teo and Tan46 used the Panko and Halverson45 classification as a basis for a laboratory study of spreadsheet development. So as to avoid bias in the experimental design, the investigation is restricted to those error categories unrelated to domain knowledge. The simple classification of domain knowledge or contextual errors is insufficient to support experimental examination of these error types.

In contrast, Ko and Myers48 develop a classification system focused on enabling examination of the root causes of errors, and this classification leans heavily on research into cognitive mechanisms of human error. The Ko and Myers classification works at a far more detailed level of abstraction than is reflected in the discussion of errors in the HTA modelling interviews. Ko and Myers use the software error framework that they develop as the basis for an observational study of software programmers’ practice in development and debugging. The results of this study are used to inform the development of the software development environment, including the design of debugging aids.

TABLE 5 Requirements errors identified by Walia et al.50

People errors: communication; participation; domain knowledge; understanding specific application; process execution; other human cognition
Process errors: inadequate method of achieving goal/objective; management; elicitation; analysis; traceability
Documentation errors: organisation; no standard usage; specification


The work of Ko and Myers57 indicates the importance and relevance of research on cognitive processes underlying human errors for programming errors. The HTA modelling interviews indicate that there is scope for developing our understanding of human error processes in relation to modelling errors.

The classification of errors generated from interviews with HTA modellers demonstrates that we are in the very early stages of development in this area. The purpose of the HTA interview classification is to set down a starting point for developing our understanding of errors and to facilitate discussion. Taking forward any strategy for reducing errors and improving the robustness of decision support is likely to require further development of this classification of errors. Specifically, the lessons drawn from the Purser and Chadwick study51 indicate that close attention to describing the symptoms and characteristics of errors may be beneficial as a basis for improving error detection techniques.

Criteria for a taxonomy

Two important criteria of a taxonomy are that (1) it should be complete, that is, any item being considered should be able to be located in the taxonomy, and (2) the classes should be mutually exclusive, so an item should fall in only one class within the taxonomy.

The spreadsheet error classification systems identified in the literature, and indeed the HTA interview error classification, struggle with these criteria. In the first Panko and Halverson45 system it is unclear why some omission errors might not fall into either the mechanical or the logical errors categories; similarly, in the second Rajalingham system49 it is unclear that the division of qualitative errors into structural and temporal is exhaustive.

The classification derived from the HTA modeller interviews exhibits similar difficulties to those found in the literature, and a number of reasons underpin the difficulties in this regard. First, the absence of a common description of the modelling process means that the location of errors in the modelling life cycle is problematic. Second, the interviewees do not have a common perspective on differentiating between root causes of errors, model errors and the impact of errors; as a result, differentiating between these characteristics is problematic. For these reasons the interview taxonomy may be better thought of as an error classification rather than a formal taxonomy.

The interview classification of HTA modelling errors

Panko and Halverson45 and Ko and Myers48 recognise the importance of the life cycle dimension of errors. The use of the modelling process description as a mechanism for discussing errors in the HTA modeller interviews, and the subsequent use of the modelling process to structure the interview classification, echoes this life cycle dimension.

The second classification system of Panko52 provides perhaps the best match in terms of purpose and level of abstraction to the interview classification of modelling errors. The interview classification is mapped onto this second Panko classification in Table 6, and this is used here to further discuss the details of the interview classification system.

The concept of a ‘violation error’ proposed by Panko is recognised explicitly by the HTA modelling community in three ways: first, in the deliberate structuring of models to produce given outcomes; second, in the overoptimistic interpretation of evidence; and third, in the overoptimistic interpretation of results and conclusions. All the relevant points of discussion here arose in relation to discussions about critically appraising models.

I suppose in terms of when you’re critically appraising something…the model’s been structured deliberately in a way to produce a given outcome.
[I9]

However, Panko also considers non-compliance with procedures and policies to constitute a violation error. This issue, although not explicitly discussed by interviewees, is potentially implied where errors are associated with inadequacies of process or methods, especially where those processes and methods are subject to guidelines or process and methods documents. The definition of failure to follow guidance for process and methods depends significantly on the soundness and validity of that guidance.


TABLE 6 HTA modelling errors classified according to Panko52

Error type by HTA modelling, with the corresponding Panko52 classes:

• Error in description of the decision problem: context error
• Error in model structure and methodology used: violation; qualitative error; context error (spreadsheet, module, section design); context error (formula design)
• Error in the use of evidence: violation; context error (spreadsheet, module, section design)
• Error in implementation of the model: slips; context error (spreadsheet, module, section design); context error (formula design)
• Error in operation of the model: slips; lapses
• Error in the presentation and understanding of results: violation; context error (spreadsheet, module, section design); slips

The location of ‘violation errors’ at the top of the Panko hierarchy of errors is potentially misleading because each element of the subsequent error classification tree might also be the subject of a violation error. For instance, if there is a point at which a misjudgement becomes a negligent misjudgement then this would constitute a violation. Alternatively, where errors are the result of a breakdown in domain or programming knowledge, this may reflect a violation in the responsibility of the organisation, or of individuals within the organisation, for recruitment of appropriately skilled people and adequate staff development. Hence, rather than being a separate class at the head of the hierarchy, violations may potentially be better considered as a separate dimension of errors that runs throughout the error classification tree and may itself be subject to further grades, in much the same way as sins are classed as venial or mortal.

The concept of qualitative errors in spreadsheet design was recognised by the interviewees in several ways. Interviewees referred to the vicious circle of poorly designed or executed model implementation leading to excessive updating and debugging, leading in turn to further errors. Note that this description in the interviews reflects the chains of errors, faults and failures proposed by Ko and Myers.48 Furthermore, interviewees identified errors associated with the premature selection of software platforms or model structures and described these as leading to cumbersome and unwieldy models, prone to errors in programming and operation; this description precisely reflects the type of qualitative errors described by Panko.52 ‘Jamming’, defined by Panko as the mixing of values within formulae, was explicitly cited by several interviewees as bad practice leading to the potential for errors to arise.
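Jamming is one of the few error types that lends itself to mechanical screening, because a numeric literal inside a formula is visible in the stored formula text. A rough sketch of such a screen is given below, assuming an Excel workbook read with the openpyxl library; the file name is a placeholder and the regular expression is deliberately crude, so it would need refinement in practice.

```python
# Minimal sketch: flag 'jamming', i.e. numeric constants typed inside
# spreadsheet formulae, so values can be moved to referenced cells.
# "model.xlsx" is a placeholder path; openpyxl returns formula text
# when data_only=False (the default).

import re
from openpyxl import load_workbook

NUMBER = re.compile(r"(?<![A-Za-z$.\d])\d+(?:\.\d+)?")  # crude literal match

def find_jammed_constants(path):
    wb = load_workbook(path, data_only=False)
    hits = []
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if isinstance(cell.value, str) and cell.value.startswith("="):
                    for match in NUMBER.findall(cell.value):
                        hits.append((ws.title, cell.coordinate, match))
    return hits

for sheet, ref, literal in find_jammed_constants("model.xlsx"):
    print(f"{sheet}!{ref}: literal {literal} embedded in formula")
```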

It is worth noting that not all error types occur in all domains. Whereas the distribution of error types over the different domains appears at first sight intuitively sensible (for example, errors in model structure are associated with violations, qualitative errors and context errors), it is not clear whether the complete absence of slips and lapses in this domain is correct or represents a shortcoming in the perception of the modelling community.

Understanding the decision problem

Walia et al.50 focus on errors in the requirements specification phase of the programming cycle; this is the equivalent of the understanding the decision problem phase of the modelling cycle and is informative when compared with the classification arising from the HTA modeller interviews. The comments in the HTA modeller interviews focused primarily on failures in generating a common understanding between stakeholders, or failures of communication. In the Panko classification this is captured as a context error, a broad grouping that is unhelpful in considering error avoidance and identification strategies.

The study by Walia et al.50 describes three categories of error types: people errors, process errors and documentation errors. Whereas the central theme of communication is identified within ‘people errors’, Walia et al. also highlight errors in participation. Although not raised as errors by the HTA interviewees themselves, views such as ‘It’s not an error in the model because we did what they asked us to, but they could have asked us a better question’ [I2] might indicate errors of participation on behalf of the interviewee.

In terms of process errors, it should be noted that the interviews included only representatives of the HTA modelling community, whereas the responsibility for defining and managing key elements of the process of defining the decision problem frequently does not lie directly in the hands of the HTA modellers.

…they didn’t realise I’d have to build a disease treatment model alongside it and then they hadn’t invited the right people to come to it…[they] dropped the ball if they believed you could do a screening [model] without having a treatment model, but that wasn’t our error.
[I2]

Addressing these issues may require development of process as well as modelling methods.

Conclusion

The interviewees collectively demonstrated examples of all the major error types identified in the literature on errors in end-user developed spreadsheet systems. Taken together, the interview classification of modelling errors provides a basis for developing our understanding of errors and facilitating discussion of errors.

To take forward any strategy for reducing errors and improving the robustness of decision support, it is likely that further development of this classification of errors will be required. Particular attention should be given in this regard when considering methods for improving the identification of errors.

The literature detailing classifications of spreadsheet errors is based around the work of two key authors, Panko and Rajalingham. Six different versions of classifications have been identified, all of which struggle in varying degrees to present classifications that are complete and mutually exclusive while being rich enough to be useful. Ko and Myers48 identified explicitly the often complex chains or networks of errors and faults that can lead to a failure and related these to research on the cognitive background to human error. The principle of complex chains underpinning errors is described in the interviews, but the relation of modelling errors to human cognitive processes is poorly understood. This layer of complexity is poorly developed in the spreadsheet error literature and not explicitly discussed in the HTA modelling interviews.

The error category termed by Panko ‘context errors’ hides a multitude of different error causes. For example, this category contains errors associated with breakdowns in domain/disease knowledge, modelling skills/knowledge, programming skills/knowledge, mathematical logic and statistical methods knowledge. All of these represent areas of judgement where the identification and definition of error are problematic and may include, at their most extreme, violations of methods and processes. This area requires specific attention when moving towards strategies to reduce errors in HTA modelling.




Chapter 6 Strategies for avoiding errors

Overview

This chapter explores the procedures and techniques for preventing the introduction of errors in health economic models discussed by the interview respondents. As with the other themes that emerged from interrogation of the interview data, a descriptive account of discussions surrounding the avoidance of model errors was developed through the categorisation and classification of the charted transcript data. The interview data were analysed in terms of the methods and procedures for error avoidance as well as the experiences, views and perceived importance of these methods. Five classifications emerged from the qualitative analysis:

1. Mutual understanding, which relates to joint understanding and communication between stakeholders in the processes of model development and use. This includes decision-makers and clients, clinical advisors and other stakeholders, health economic modellers and other members of the research team such as systematic reviewers and information specialists.
2. Model complexity and its perceived relationship to the generation of errors.
3. Housekeeping processes, which relate to features and techniques that can be incorporated into implemented models to alert the modeller to possible errors and ensure the analyst is aware of the current state of the model.
4. Guidelines and interviewees’ perceptions of their necessary content, role and limitations.
5. Skills and training, which explores the skills, abilities and knowledge that modellers working within the HTA process should possess.

The techniques and processes identified within the descriptive analysis were related to the model taxonomy to examine gaps between types of model error and the approaches currently adopted by the HTA community to ensure their avoidance (see Table 7).

Mutual understanding

Clinician input to the model development process

The interview data suggested that respondents agreed that developing an understanding of the clinical situation or disease process being investigated is paramount in ensuring model credibility, highlighting the importance of clinical input within the model development process. Although interviewees’ definitions of what constitutes a model error (see Chapter 4) did not consistently include ‘softer’ issues around model structuring, much of the discussion around avoiding errors focused on these. The primary area in which clinician involvement was indicated concerned the development of mutual understanding of the disease process under consideration:

…make sure that there is always some kind of oversight by someone with the appropriate clinical background to put you right…
[I9]

…what we would have on every project is at least one member of the team with research training but also with a clinical background to help us with the day-to-day understanding…
[I8]

All respondents either implicitly or explicitly suggested that developing an ongoing relationship with clinicians who have a specialism in the disease area under consideration is essential for the development of health economic models. A preference for involvement with multiple clinical advisors throughout the model development process was suggested by several respondents.

…people do have vested interests and sometimes people may only know part of the care pathway, they may be thinking I know the whole lot but they are not involved in the whole of the care pathway they are just saying what they think is happening and that means you need to speak to various people…
[I8]



One interviewee working within an Outcomes Research organisation noted that, while clinical advice may be internally available from within the client’s organisation, the use of independent clinical advisors may also be preferable. The same respondent stated, however, that an independent clinical advisor may not be aware of important aspects of a clinical trial used to inform the model; hence independent and company-employed advisors may both be valuable in developing models.

…you can have clinical involvement of a clinician who is removed from the client and can offer an independent voice, not always possible due to time constraints…and then you are relying on clinical advice from within the company…sometimes that can be better than advice you get outside because sometimes they can be more aware of important aspects of the trial…
[I12]

There was some disagreement among respondents concerning the preferred nature of the relationship with clinical advisors. Some interviewees suggested that the research team should meet with clinical advisors early in the model development process to develop an understanding of the disease process, the health states that will need to be considered and their relationships, and to understand current clinical practice before attempting to implement the model on a computer platform.

…there will be a series of meetings where actually people sit down and try to work out and start to draw what they think the care pathways are going to be…It would be myself or one of the other economists on the project, sitting there hand drawing it on a big sheet of A3…
[I8]

However, an alternative view (see Chapter 3) was that the analyst should attempt to implement an initial model on a computer platform before meeting with advisors.

The majority of statements relating to mutual understanding between modellers and clinicians concerned ensuring that the views of the clinician are adequately captured by the research team. One respondent discussed the importance of ensuring that clinicians understand the modelling process so that they are able to fully contribute to model development:

…part of my job is actually to get the clinicians aware of what is possible to be modelled and sometimes that is a question, it depends on their previous experience of modelling in some cases it is trying to expand the horizons and make them realise that things are feasible that they might have been told in the past might not be on other occasion it might be…to say what can’t be done. The clinicians that I deal with it’s mostly a case of trying to expand their horizons out…
[I5]

One respondent highlighted that clinical advisors serve a role not only in developing an understanding of current practice (what is), but also the impact of new interventions on that practice (what will/may be).

…we are interested in the current practice and then we are starting to think about how that practice may change if we put our intervention in at a particular point…
[I8]

Other respondents indicated that they would seek advice from clinicians regarding the best sources of data with which to populate the model:

We worked closely with one clinician in developing the original concepts and he pointed us in the right direction for data for populating particular links in the model…
[I3]

Respondents indicated that clinical advice is also important in generating parameter estimates in instances whereby no evidence is available.

…we would go back to the clinician or other experts just to see whether the value you’re choosing has any kind of validity…
[I9]

Using clinical input to ensure face validity

Respondents indicated that clinicians may be asked to contribute to the validation of the model in a number of ways. Their input may be sought as to whether they consider that the model adequately represents the disease process and disease management pathways.

…we have a lot of checks with various people who may be commentators or people who we think can question what we are doing; those experts will be the people who have got an interest in the disease area or condition area and have got some understanding of the process…
[I8]

…you’ve got connections between health states or whatever and you’re saying that this is how the people move around in there and they [the clinicians] can say well that never happens and so it’s partly I guess a kind of validation of your understanding…
[I9]

With regard to this point, one respondent emphasised the benefits of discrete event simulation software applications that involve visual interactive modelling, allowing clinical advisors to watch patients moving between various health states on screen.

…it’s so easy to do, when you actually see people move through a model you would instinctively walk someone through a Simul8 model because that actually visualises what they are expecting. In excel it becomes a lot harder…
[I2]

Respondents also stated that they invite clinical advisors to provide feedback on interim and final model results and to examine whether the model results are in agreement with their expectations:

…we will try to explain, well we will either find out where we’ve gone wrong or alternatively explain to them [the clinicians] why it shouldn’t be a surprise because sometimes it shouldn’t be or, you know, sometimes it is their intuition or the data or the modelling is wrong…but I think you tend to assume that once they’re [the clinicians] not surprised by the thing then that means you have got it right…
[I5]

Mutual understanding between the decision-maker and other stakeholders

The respondents suggested that communication between NICE and other stakeholders involved in the HTA process had improved over time.

…but you can see that in some of the NICE reviews, some of the older ones where the question was asked incorrectly, or the question was framed early on in a way that was not coherent with the clinicians’ understanding of the disease. And so the whole process flowed through and generated junk at the end…
[I6]

Furthermore, one respondent raised an issue concerning a mismatch in expectations between the research team and the decision-maker.

So now NICE wants in the protocol to appear what modelling technique and what software you will be using…but, from my point of view it’s too early…
[I1]

The same respondent suggested that the opportunity for negotiation between the decision-maker and the research team was limited. Two further respondents expressed a similar view, suggesting that the specification of the scoping document was rigid.

The evaluation should have included all valid options but when the model is set up what is the difference in the evaluation of the model, the model itself could be fine in terms of what it’s done but it’s not really addressing the question which it should have been addressing…to miss out something that should have been there could potentially lead you to come to an incorrect conclusion…
[I8]

…so I think you can start with what you think the ideal is; I think you can then see how that then maps into what the decision-maker specified as what they think the decision problem is, and if there is some kind of mismatch, and the question is where do you go from there really with someone like NICE, who will be quite sort of explicit about ‘we want you to look at this particular intervention,’ ‘we think these are potentially relevant comparators’. But, then we’re pretty restricted by licence issues etc…
[I7]

One respondent raised a point concerning the Final Appraisal Determination document, highlighting the need for consensus of opinion where evidence was lacking or absent.


…in other places it’s a real unknown and there is no way of satisfactorily filling that gap and you will find often then e.g. a NICE Final Appraisal Determination, a lot of that report will be discussing that weakness in terms of the evidence then it just comes down to one person’s judgment potentially over another’s although you would hope more that there would be consensus around it…
[I12]

The same respondent further suggested the need for mutual understanding in instances whereby the clinical pathway was not well understood or subject to variation.

So in certain disease areas it might be useful to have some consensus around what we should do when we can no longer be explicit…
[I12]

Mutual understanding between the modeller and the client (Outcomes Research organisations)

As noted earlier (see Chapter 3), the development of models within the consultancy field differs slightly from standard processes within the academic groups, particularly in terms of understanding the decision problem. The interview data suggested a somewhat iterative process of negotiation between the research team and the client, primarily at the RFP and proposal stage. One interpretation of this aspect of the process is that it ensures mutual understanding between both parties and reduces the possibility of a mismatch of expectations. One respondent stated that this negotiation often starts when the analyst is bidding for the work and specifying the disease area, treatment, outcomes and model purpose. The respondent also suggested that one approach to generating this shared understanding was through the production of a report summarising the evidence relating to the decision problem to be addressed.

…but so then we go through that process and we normally would put that together in some kind of report or tabular summary, as a draft that will get discussed with clients, that will come back and we’ll kind of take on board comments and finalise that data set. So we try and get to a point where we’ve got an evidence summary that is a shared understanding of the evidence between us and the client and the in effect signed up…
[I10]

As further noted (see Chapter 3), clinical advice may also be sought through the use of clinical advisory boards with the clients to generate consensus and mutual understanding (often using external clinical experts). The same respondent suggested that a key role of clinical input concerned refining specific aspects of the model where no predetermined approach had been agreed, as well as providing iterative feedback on the model during its development. Indeed, rather than using clinical input at a single point in time, it was suggested that the clinician–modeller relationship should involve the modeller managing the expectations of the advisors, and that the entire process should be ‘a shared journey’:

…I think if we start hitting a specific issue that we want to really detail and nail down I think our tendency…and we don’t have any fixed approach…I think what we end up probably doing more often than not…is producing something separate that looks at that particular aspect, either a short memo or a few slides to explain it, or we may mock something up in excel to demonstrate what the key issue is and I think that’s the way we do it…
[I10]

…it’s about managing expectation. And it’s back to you don’t want them reaching here and going. This is not a ‘da-da’ moment at the final results, it’s not, it should be like a shared journey. You should be finding these things out together and managing them in true collaborative style…
[I10]

Mutual understanding within the analytical team

It was noted by the interviewees that research teams typically consist of a number of researchers and analysts with various backgrounds and expertise, e.g. modellers, reviewers, statisticians and information specialists. Researchers may be working individually on their own particular part of the project, or in small groups; several of the respondents indicated the importance of ensuring effective communication between the individual members of the team (and indeed with wider external stakeholders involved in the process). The qualitative data suggested that this was commonly done through meetings and by presenting the model to people outside the research team.
Researchers may be we normally would put that together in some working individually on their own particular part kind of report or tabular summary, as a draft of the project, or in small groups; several of the that will get discussed with clients, that will respondents indicated the importance of ensuring come back and we’ll kind of take on board effective communication between the individual comments and finalise that data set. So we try members of the team (and indeed with wider and get to a point where we’ve got an evidence external stakeholders involved in the process). The summary that is a shared understanding of the qualitative data suggested that this was commonly evidence between us and the client and the in done through meetings and by presenting the effect signed up… model to people outside the research team. One [I10] respondent described how they would present the model to individuals within the research group 52 DOI: 10.3310/hta14250 Health Technology Assessment 2010; Vol. 14: No. 25

…there’s also a stage of exposing the model at that stage to I suppose to people within the health economics group but also outside of that, so I think there is a, well, I think there’s a benefit in terms of, again, just going over the way the model’s been put together…if it’s a Markov model it would be a presentational state transition diagram. It would be back to then explaining the links between states, beginning to clarify where the data were coming from, I think I had a suggestion as to what type of outputs are coming from it…
[I9]

Another respondent stated that owing to the large number of people in the research team, considerable co-ordination is required to ensure that information sourced by one member of the team is coherently communicated to other team members; the failure of this process has the capacity to introduce errors into the model development process.

…what you have got is effectively quite a large research team, it’s not one person doing it so a lot of co-ordination of effort is required. Things can get lost, not reported, lost along the way, or it might just seem that well actually for whatever reason I was probably having problems building the model that was going to reflect this so I thought about an assumption which seemed quite sensible to me but when you have to present it to someone else people say hold on and that means you cannot do x, y and z and that will imply something in terms of results and that may cause a problem later on…
[I8]

One interviewee suggested that many of the errors that are introduced into models are a result of miscommunication, the implication of which is that modellers need to ensure not only that they understand the decision problem and how the model relates to it but that all other members of the research group share this understanding. As noted above, one way of doing so was the exposition of the model at meetings throughout its development.

…I think that very often the source of a lot of errors is miscommunications so finding ways in which you can make sure communication between different people can be made less error prone…
[I4]

…virtually everything we do will be presented either to the public health researchers at a team meeting…
[I5]

Model complexity

One key issue that modellers are regularly faced with when designing or developing a model concerns its complexity. The interview data suggested that modellers do not want to spend a substantial amount of time designing and implementing a highly complex model when a simpler design would have sufficed. Nor, however, do they want to simplify the implementation of the model to such an extent that clinically relevant factors are lost or omitted. It should be noted that the concept of model complexity is in itself subjective (what is considered a complex model by one individual may be simplistic to another). Three issues were discussed by respondents:

• model complexity and the importance of transparency
• arguments in favour of model simplicity
• arguments in favour of model complexity.

Model complexity and the importance of transparency

The interview data suggested that modellers’ preferences concerning the degree of model complexity were related to their perceived need for the model to be transparent. However, several interviewees suggested that the model was required to be sufficiently complex to answer the question at the level of detail required.

…I have a preference for the simplest model structure that will answer the question…what we do has to be transparent and verifiable by an external reviewer…
[I6]

In a related discussion, another respondent raised the issue of transparency with regards to debugging the model at the end of implementation, highlighting a trade-off between having a model that is sufficiently complex to answer the question but sufficiently simple to ensure transparency and understanding.

…transparency can be an interesting one, the worst kind of error is the one that is never labelled an error but has resulted in the wrong decision so transparency is key in allowing errors to be seen…
[I4]

Arguments in favour of model simplicity

The qualitative analysis revealed a degree of agreement between the respondents that models should be as simple as possible provided that they answer the question to the required degree of accuracy and do not omit or neglect clinically relevant features of the decision problem. As indicated earlier (see Chapter 3) there was no clear set of principles guiding the appropriate breadth, depth and level of granularity of models. One respondent highlighted a trade-off between spending time understanding the evidence base and spending time adding complexity into models:

…my gut feel is that normally the bells and whistles don’t add very much…in terms of the accuracy of the final product, your time is better spent really understanding the data that you’ve got than increasing the sophistication of the relationships that you create, or the parameters that you pull out of that data…
[I6]

The same respondent referred to an example drawn from his own experience in which the results of a complex model and a simple model relating to the same decision problem were compared; the interviewee noted very little difference in results so he expressed an opinion that the additional accuracy did not justify the extra commitment of time and resources.

…looking at the findings of quite detailed patient simulation models with a lot of flexibility and description of the individuals and sort of progressive process, and progressive degenerative diseases and comparing that with a much simpler Markov state process where you’re grouping the patients much more rigidly and imposing a structure on what can happen to them and actually looking at similar inputs to the two, we got within I think two or three per cent, near enough, the final output, with the capacity to do the same sensitivity analyses. I used to wonder what you were gaining from the extra run-time and sophistication and it seemed that the extra flexibility in terms of the timing of events or the patient history. I thought two to three per cent was not enough of a difference in your final output to justify the extra complexity…
[I6]

A similar viewpoint was expressed by other interview participants.

…unless there is a clinical imperative for it to be modelled in the more complex manner, I would try and keep it as simple as possible…
[I12]

…I think one simple rule might be to keep it as simple as you can. There is a lovely quote from Einstein which is ‘things should be kept as simple as possible but made no simpler than that’…
[I4]

However, one respondent claimed that models typically become more complex as the model development process continues, although this point was not shared by all (one respondent suggested that sometimes models may become less complex over time).

…a natural tendency for things to become more complex in a model over time because people say well what about this and what about that and you have to start building other states in or other transitions…
[I4]

Importantly, one respondent perceived a direct relationship between model complexity and the chance of creating technical model errors.

…I think it’s a very interesting, fundamental thing which I haven’t mentioned yet which is the extent to which you keep a model simple in order to avoid errors as well because I think there is an understanding that building more functionality and complexity into a model comes at the overhead of a greater probability that you make a mistake in the model…
[I4]

A similar view was expressed by another respondent, who suggested that there was no point in building an individual patient-level model if it is not needed, as it adds complexity, hinders peer review and increases the probability of an error being introduced into the model.

…there is no point using [an] individual [patient level] model if you don’t need to, that’s just a waste of effort and also adds in complexity that clinicians might not like, peer reviewers might not like and also adds in a chance of more errors…
[I2]

Arguments in favour of model complexity

Several respondents noted that there are occasions when, given the level of understanding of the underlying disease or the presence of dynamic interactions between model entities, a greater level of model complexity may be required. One respondent noted that if implemented without technical error, the more complex model would be his preference.

…apart from that everything else as I see it at the minute it’s all in the same box with different levels of how hard it is to do…I would tend to go with the more complicated one provided they’re confident, in its ability to, that it’s error free…
[I2]

However, respondents highlighted a cost associated with developing more complex models:

…the danger with the NICE process when you’re up against hard deadlines is that if a particular set of data doesn’t come you may forget that it hasn’t come…so in that context one does try and flag up potential for leaving in arbitrary data. The simpler the model the more important it is that one particular thing is right…
[I5]

The analysis of responses concerning model complexity and transparency highlights the pragmatic difficulty in defining the appropriate level of complexity of a model. Inevitably this is a subjective matter of judgement that appeared to differ between interviewees.

Housekeeping

Housekeeping concerns features and techniques employed within an implementation model (and its development) to facilitate error checking. Four approaches were discussed by respondents:

• use of automatic checks and flags
• use of model settings which are visible on the screen
• standard layout conventions
• naming of cells and cell ranges.

Automatic checks and flags

One housekeeping approach raised by respondents involved the use of automatic checks and ‘flags’ incorporated into the model to draw the attention of the modeller to potential errors. They can be used to flag up errors within the calculations performed within the model, for example sum-checks to ensure that rows of transition probabilities add up to one. In particular, respondents suggested the use of automatic checks to ensure that no obvious errors have occurred in the data entry (e.g. typographical errors) and programming logic of the model; importantly, the use of such checks may raise the awareness not only of the modeller, but also of any other model user.

…you check that your columns and your rows add up to the same number, you check that your probabilities always come to 1, all those sort of things, they should be on, you know, if it’s a spreadsheet, they should be on the spreadsheet. It should be obvious to anybody if there’s a problem anywhere. And you know if there is a number which is always positive, make sure you have a test to flag this up if anything goes negative, even if it’s 30 years into the future…
[I3]

…there are all sorts of checks that you can build into the model or apply to it, so you know there may be basic checks like making sure your transition probabilities add up to one. So you can build that in as a development model and have a check, or make sure you are accounting for everyone in the population…
[I11]
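Checks of this kind are simple to automate, whether as on-sheet formulae or in code. The following is a minimal sketch in Python, assuming a hypothetical Markov-type model held as a transition matrix; the function names, tolerance and example values are illustrative rather than drawn from any respondent’s model:

    import numpy as np

    def check_transition_matrix(tm, tol=1e-9):
        """Flag rows of transition probabilities that do not sum to one,
        and any negative entries (analogous to an on-sheet sum-check)."""
        problems = []
        for i, s in enumerate(tm.sum(axis=1)):
            if abs(s - 1.0) > tol:
                problems.append(f"row {i}: probabilities sum to {s:.6f}, not 1")
        if (tm < 0).any():
            problems.append("negative transition probability found")
        return problems

    def check_always_positive(trace, name):
        """Flag a quantity that should stay positive over the whole
        horizon, 'even if it's 30 years into the future'."""
        return [f"{name} goes negative at cycle {t}"
                for t, v in enumerate(trace) if v < 0]

    # Illustrative use: the second row deliberately sums to 0.99.
    tm = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.69, 0.1],
                   [0.0, 0.0, 1.0]])
    flags = check_transition_matrix(tm) + check_always_positive([10.0, 4.2, -0.1], "cohort count")
    for msg in flags:
        print("FLAG:", msg)

The point, as in the quotations above, is that the check runs on every recalculation, so a problem introduced late in development is surfaced immediately rather than discovered by inspection.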

Summary of model settings visible on screen

A further housekeeping technique suggested by some interview respondents involved the use of visible switches (potentially incorporating a master switch or monitor) to alert the model developer or user to the current status of the model. For example, this approach could be used to highlight whether a certain scenario was selected, e.g. base-case values, or to toggle between deterministic and probabilistic analyses. One respondent suggested using a monitoring tool (perhaps a cell which displays a certain colour when all switches or flags are set at default values or settings). Although the incorporation of such techniques may involve some programming time, it was suggested that there is a payoff in terms of avoiding the presentation of unintended analysis results.

…to have a summary of what things are set to so that tends to keep reporting errors minimised to some extent…
[I12]

…very often we have a model which has half a dozen switches in it, very easy thing to put a master monitoring cell that goes the right colour when you have any of those switches set. The base case will probably be with them unset, so then when you see the output you know if the big box is red you know that that output represents something with the switch on and it tells you the switch is on…
[I4]
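The same idea can be expressed outside a spreadsheet. The sketch below, with invented switch names and defaults, shows the gist of a ‘master monitoring cell’: a single summary that reports whether any switch differs from its base-case setting:

    # Hypothetical analysis switches; names and defaults are illustrative.
    DEFAULTS = {"probabilistic": False,
                "subgroup_only": False,
                "alternative_discounting": False}

    def settings_summary(switches):
        """Return the switches that differ from the base case, so the
        analyst can see at a glance whether an output is a base-case run."""
        return {k: v for k, v in switches.items() if v != DEFAULTS[k]}

    current = dict(DEFAULTS, subgroup_only=True)
    changed = settings_summary(current)
    print("BASE CASE" if not changed else f"NON-DEFAULT SETTINGS: {changed}")

Printing (or displaying) this summary next to every set of results plays the role of the coloured master cell described by the respondent.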
Use of a standard model layout

At one extreme, one of the respondents stated that their institution had adopted a formalised standard model layout with colour coding of certain common elements of the model (e.g. parameters). However, many respondents stated that they adopted at least some degree of standardisation in terms of model layout; most commonly this involved retaining all model parameters within the same worksheet.

…try and keep all the inputs in one place and have them referenced so that someone can go through and say yes that unit cost was £43, tick that sort of check…
[I11]

A further suggestion concerned the structured programming of formulae within the model by copying cell formulae rather than ‘hard coding’ numbers:

…I think you can design the spreadsheets to minimise that by keeping the inputs in one place by keeping the calculations in a nice ordered way…the calculation phases across several sheets in a spreadsheet it’s more difficult to identify where it’s all going wrong…it’s possible just to copy formulae down rather than having hard-coded things and stuff that can go wrong…
[I11]

Several respondents expressed a belief that such housekeeping procedures facilitate internal peer review by other modellers who have not been directly involved in the design or implementation of the model:

…we use a standard set of colours for the different sorts of cell in our spreadsheet. So we use one colour for the check sum, we use one colour for the data entry points, we use one colour for the derived data entry points, use certain colours for each output, we arrange our sheets in a particular way and we have a separate work sheet for specific elements of the models. For example, each comparator arm will have its own sheet and we always do that…having those standards allows each modeller to much more quickly understand the model and to build it in a way that potentially avoids some of the sources of error…
[I4]

It differs because if you want the model to be transparent within excel…you need it to be in a nice way so that all your variables that are going to change within the model are on one spreadsheet, they’re all in a nice line…
[I2]

I would have all the parameters on the same page, all the input parameters on the same page…
[I2]

Naming of spreadsheet cells

A further housekeeping procedure suggested by the interviewees concerned the use of names for input cells (and cell ranges) within spreadsheets. It is clear that this approach may avoid some programming errors; whereas mistyping a cell reference in excel may go unnoticed during model development, mistyping a cell name will return an error value (a ‘#NAME?’ value). Such approaches may also enhance the transparency of relationships within the model if cell ranges are named clearly.

…it’s a very different game when you’re building a model in excel as opposed to writing a document, the consequences of making a small mistake can be quite high you have to be very careful about how you build things and how you then check them and so. Using named references for example, rather than saying A3 using a named reference you probably would not have made that mistake…
[I4]

If we use range names we try and label them in the actual thing as well so you can see the range names on cells…
[I10]
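The protective effect described here (a mistyped name fails loudly, whereas a mistyped positional reference fails silently) carries over directly to code. A minimal illustration, with hypothetical parameter names:

    # Hypothetical parameter table; the names and values are illustrative.
    params = {"unit_cost_gp_visit": 43.0, "utility_stable": 0.78}

    cost = params["unit_cost_gp_visit"]   # mistyping this key raises KeyError,
                                          # the code-level analogue of '#NAME?'
    row = [43.0, 0.78]
    cost_by_position = row[1]             # a wrong index silently returns the
                                          # wrong number, with no warning at all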

Guidelines

Eleven of twelve respondents discussed the potential role of modelling guidelines. However, the term ‘guidelines’ may have been interpreted differently; at one extreme some may view guidelines as a standardised model development map that prescribes good practice under any eventuality, whereas others may have more relaxed views of what constitutes a guideline, for example, reference cases, critical appraisal checklists and good practice suggestions.

Guidelines define good practice

The qualitative synthesis suggested some agreement between the respondents that guidelines which define either good practice or a ‘minimum standard’ should be attained. However, it was suggested that such modelling guidelines should not be restrictive or inhibit the exploration of potentially innovative techniques and solutions beyond their remit.

…[there is] no scientific reason of why [we] use excel [it’s] just a practical thing…other software that can be much more efficient and effective…So if you are going to see guidelines like that I don’t want the guidelines but if you see it as a quality standard then it’s another thing…
[I1]

…guidelines that are too prescriptive might inhibit the scope to explore things in the way they should be explored by using the appropriate methods and might inhibit innovation as well because things are always done in the same way so I think you need guidelines that are flexible but I think there are a lot of very sensible things that can be done in any situation that contribute greatly to the prevention of errors in the models just adopting simple standards for writing models…
[I4]

Another respondent expressed a similar viewpoint but raised concerns about the potential to use guidelines as a means of reducing accountability for model errors.

…they would be ‘best practice in modelling’ type guidelines, I can’t see any harm…you have got to be careful not to take those sorts of guidelines too prescriptively, you have got to be responsible for your own work and own errors, you can’t disclaim it…
[I12]

One respondent further suggested a potential benefit of the development of guidelines in terms of defining a point of reference for models.

…in that sense they [guidelines] are useful because they allow anybody the same point of reference…
[I8]

…I think the principle of having an overriding set of standards or guidelines in a wider sense, is kind of necessary; and that’s what’s led to the reference case I guess and various things isn’t it?…
[I10]

Not all respondents were in favour of guidelines; one respondent suggested that guidelines were unnecessary in any form and that they are not a sufficient replacement for scrutiny.

…I’m not a great fan of guidelines and standard operating procedures…I’m a great believer in scrutiny and a great believer in…fitness for purpose…
[I5]

Similarly, a different respondent suggested that the value of guidelines was at best limited, serving a useful role only in terms of teaching purposes. In particular, several respondents expressed negative views concerning the value of ‘ticking boxes’ in checklists.

…guidelines for, you know, for people to operate and to use for checking and ticking boxes and things. I mean, I don’t know that it would do too much harm, but it wouldn’t do much good either, I don’t think…to be perfectly honest. I mean, the only useful place for using that sort of thing, I think, is in teaching…
[I3]

…We actually took the Drummond checklist for economic evaluations and applied it to the industry model and we got all the way down saying yes, yes, yes, yes until we got to I think it was number 34, do the results follow from the data at which point we said no in very large capitals…
[I5]

This concern was summarised by one of the respondents:

…It’s all so easy for you to get a lot of ticks on guidelines and checklists when actually you’ve done something fundamental that is so special to the particular problem…but the guideline, and unless you are getting to questions like does the model reflect the clinical reality, that is the whole question from the start. It’s not really a question on the checklist it is the whole objective of the checklist itself…
[I5]

Remit of guidelines

As alluded to above, the interview data did suggest some consensus surrounding the value of modelling guidelines, indicating that although technical aspects could be easily covered, issues surrounding understanding the decision problem and conceptual modelling were more difficult to capture.

…so I would perceive a risk that if we go down the guidelines path, we end up with something which is a technical guide for software developers that doesn’t address the basic, the earlier question in the process, which is: ‘is this model actually designed appropriately to answer the question for which it was intended?’ So I’d be much more interested in guidelines or in ways of working that helped address that fit-for-purpose question, but I appreciate that’s a much harder thing to do…
[I6]

…again it’s easy to write a guideline that has a wish list of all our favourite distributions that we would like to use for the different kinds of parameters…
[I6]

Skills and training

The final subtheme relating to the prevention of errors within models concerned the skills and training of the modeller and the broader research team. Interview discussions covered a range of skills; the qualitative data were interrogated with respect to four classifications:

• skills related to the design and development of models
• skills related to locating errors in models
• skills in statistical analysis techniques
• background and other abilities of the modeller.

Skills related to the design and development of models

One of the points that the qualitative analysis exposed was that analysts are concerned more about the ability to design and develop the correct model than the ability to actually implement the model correctly on the chosen computer platform:

…in the design stage a bit more…not to do with how to use the software, like…whether I learnt to use…excel or whatever, but rather more on how to develop programs…
[I1]

One respondent stated that there were some modelling approaches that he would not attempt to implement because he was not confident in his ability to complete the investigation within agreed timescales.

…there will be some things you yourself will do particular modelling approaches that I will probably not want to do simply because I am not confident in my ability to complete them to my satisfaction within a time scale…
[I8]

Skills related to locating errors in models

Some discussion was held concerning modellers’ skills in identifying model errors. One respondent emphasised that the ability to scrutinise models is a key skill.

…well it is mostly appropriate training and I think again it is scrutiny…it all comes back to it’s scrutiny…
[I5]

A separate respondent defined the skills to locate a model error as separate from skills concerning developing the correct model or implementing the model on a computer platform.

…it’s nothing to do with how good you are at excel and I don’t think it’s anything to do with how intellectual you are at conceptualising the problem, I just think it’s. I think it’s separate, I think it’s, I don’t know, it’s just the ability to be prepared to see a problem as something that you break down into bits and you track it down in a logical fashion and you try and narrow down to where the problem is. There’s the problem, cut it in half and the problem’s in this half, right I’m going to cut it in thirds, the problem’s in this third. And within 5 minutes you’ve nailed it down to the five lines of code where it has to be otherwise this thing wouldn’t be happening…
[I10]

Skills in statistical analysis techniques

A minority of interviewees discussed training in the use of statistics techniques and software as a means of avoiding errors. One of the respondents raised an important point regarding the use of statistics: that user-friendly software may result in errors, not as a fault of the software itself, but rather through the misunderstanding of the user implementing the software (for example, winbugs software comes with a user ‘health warning’; www.mrc-bsu.cam.ac.uk/bugs).

…it’s a matter of knowing what the tool does as you learn to use it and really trying to teach limitations alongside teaching use of tools…sometimes user friendly software isn’t necessarily a good thing because there is a distinction between software that is easy to use well and software that is easy to use badly…
[I5]

…there’s two levels of training. One which is training in sort of statistical approaches and…methods sort of research, and then there’s sort of actually…you know, the actual using software to then do that…
[I7]

Background and other abilities of the modeller

The final classification concerned the impact of the background and abilities of the modeller in avoiding errors. Limited discussions were had on this issue. One respondent had a particularly strong view about the characteristics of ‘good’ modellers.

…if you are intelligent you should spot a hell of a load of them [errors] straight off…your best bet for stopping errors is to employ an intelligent person…
[I2]

…you will have people who are naturally better at estimating an answer from just looking at it on paper, than those that aren’t, but I think those are the people who make the best modellers and then the best modellers will get it more right because they know what they are doing…
[I2]

The same respondent discussed an ‘instinctive’ characteristic of modellers and the necessary presence of ‘gut feeling’ with respect to the identification and avoidance of errors.

…you instinctively know what model to build, you instinctively know whether the answer looks wrong and if they do look wrong that’s when your alarm bell goes off. Obviously, if I miss some what I’m calling non-significant errors, they might be in there and I’ve missed them because my gut feeling hasn’t said they were wrong, anybody who never has a gut feeling telling them it’s wrong will never spot any of those…
[I2]

Furthermore, a different respondent suggested that the ideal background for health economic modellers may not be health economics, but rather some associated modelling discipline.

…maybe the people that do that [design and develop models] best are the people who’ve not come to this from a pure health economist context but have come to it from a more professional modelling context and I think there’s something in that kind of mathematical modelling mindset that we’ve already got before we’ve got into health economics and knew what a QALY was that says that’s how you deal with that kind of problem. You either come at it from another route, break it down into bits or you stick these numbers through, and I think to an extent you only get that through having that background of doing enough of these that you find your way with it…
[I10]

Explanatory analysis concerning methods for preventing errors

Table 7 presents an explanatory analysis which attempts to link the current methods for preventing model errors to the taxonomy of model error presented earlier (see Chapter 5). The leftmost column details ‘methods’ currently used to avoid error raised by the interview respondents. These are loosely defined as either processes or techniques; the former relate to issues in the model development process, whereas the latter relate to techniques of implementation. These methods have been mapped against the types of model error they are likely (or intended) to avoid, based on the interview data and through detailed discussion among the authors. As far as possible these approaches for avoiding errors have been mapped to the root cause of the model error; however, in several instances the primary source or cause of error was unclear. The mapping presented in Table 7 may not exhaustively capture every type of error that these methods address. In addition, given the variation in the elicited model development processes of individual respondents (see Chapter 3), the use of the full range of methods for avoiding errors is not necessarily typical.

TABLE 7 Relationship between methods for avoiding errors and error types

Interventions discussed by interviewees | Intervention type | Relevant error domains | Primary error targets

Clinician input to the model development process and use of clinical input to ensure face validity
Establish ongoing long-term involvement with clinicians who know about disease | Process | Description of the decision problem/structure and methodology used/use of evidence | 10, 40, 150
Use internal and external clinical input to elicit information on efficacy and effectiveness | Process | Use of evidence | 150, 225
Ask clinicians to provide feedback on how results meet their expectations | Process | Presentation and understanding of results/potentially all error domains | 290 (may relate to any point on taxonomy)
Discuss what is possible with clinicians | Process | Model structure and methodology used | 40, 50
Discuss assumptions with clinicians throughout process | Process | Model structure and methodology used | 20, 40, 50
Discuss likely impact of intervention with clinicians | Process | Model structure and methodology used | 40
Discuss data sources with clinician (directed by clinician) | Process | Use of evidence | 100, 180
Ask experts who know about the disease to comment on the model | Process | Model structure and methodology used | 20, 40, 50
Ask experts who know about the disease to comment on the clinical pathways | Process | Model structure and methodology used | 40
Step through model pathways with clinicians | Process | Model structure and methodology used | 40
Build initial model before meeting clinicians | Process/technique | Model structure and methodology used | 40, 50, 60
Draw out pathways on paper with clinicians | Technique | Model structure and methodology used | 40
Developing written documentation of the proposed model structure | Technique | Model structure and methodology used | 20, 40, 60
Use of diagrams or sketches of model designs and/or clinical/disease pathways | Technique | Model structure and methodology used | 40
Memos | Technique | Model structure and methodology used | 20, 40, 60
Representative mock-ups to illustrate specific issues in the proposed implementation model | Technique | Model structure and methodology used | 40, 50, 60
Written interpretations of evidence | Technique | Model structure and methodology used | 20, 30, 40, 50

Mutual understanding between the researcher and decision-maker/client
Iterative negotiation and communication between the modeller and the client (managing expectations) | Process | Description of the decision problem | 10
Advisory committees to generate consensus of best approach | Process | Model structure and methodology used | 20, 30, 40, 50

Mutual understanding within the team
Meetings with research team through project | Process | Model structure and methodology used/use of evidence | 70, 150, 225
Presentations to internal team and wider team throughout project | Process | Model structure and methodology used/use of evidence | 70, 150, 225
Exposition of model during research team meetings | Process | Model structure and methodology used/use of evidence | 70, 150, 225

Handling model complexity
Ensuring transparency of model | Technique | Model structure and methodology used | 30

Housekeeping techniques
Automatic checks | Technique | Implementation of the model | 230, 240, 250
Automatic flags (including master flag) | Technique | Operation of the model | 270
Summary of model settings visible on screen | Technique | Operation of the model | 270
Use of a standard model layout | Technique | Implementation of the model/operation of the model | 250, 270
Naming of spreadsheet cells and ranges | Technique | Implementation of the model | 230, 240, 250

Establishment and dissemination of good practice
Development of guidelines | Process | Model structure and methodology used/use of evidence/presentation and understanding of results | 30, 50, 110, 170, 190, 310

Skills and training
Develop skills in understanding software | Process | Model structure and methodology used/implementation of the model | 60, 70, 250
Develop skills in developing programs | Process | Model structure and methodology used | 70, 80, 90
Develop skills in scrutinising models | Process | All error domains | Any point on taxonomy
Develop skills for analysing problems | Process | All error domains (primary communication and judgement errors) | Any point on taxonomy
Developing skills besides health economics | Process | All error domains | Any point on taxonomy

A number of emergent issues arose from the interpretation of the mapping presented in Table 7. A simple crosstabulation of the number of avoidance strategies for each error domain gives an indication of the focus of the modelling community on different areas of model validity. Clearly, the range of methods and approaches to avoiding model errors is considerably broader than the housekeeping techniques focused on avoiding errors in the implementation of the model; only one-fifth of the methods target the implementation and operation of the model. The remaining four-fifths target the conceptual validity of the model. Very little emphasis is placed on presenting and communicating results and conclusions. An emergent issue that is evident in the synthesis concerns the nature, and balance, of processes and techniques for avoiding model errors. In particular, the techniques detailed are explicit, and can be interpreted as relating to how something should be done, for example, implementing a specific model layout. On the other hand, the processes for avoiding errors recognise that there is a need to be addressed and acknowledge that something should be done as part of model development; however, in many cases this is not accompanied by a clear strategy for achieving the required goal. This issue particularly affects those aspects relating to the definition of the decision problem and conceptual modelling. A number of requirements are identified to achieve clarity and mutual understanding; these are expressed as process requirements (e.g. establishment of long-term clinical input) to avoid errors in the description of the decision problem. Although a number of techniques are suggested by the interviewees, for example, sketching out clinical pathways, these are not framed within an overall strategy for structuring complex problems.

Placing the findings of the interviews in the context of the literature on avoiding errors

Fostering mutual understanding between the modelling team and all stakeholders to the decision-making process is targeted at achieving model credibility. The issue of defining the appropriate level of complexity is well discussed within the literature; however, no objective methods are suggested for dealing with this issue. Literature on problem-structuring methods suggests that the focus should be on developing adequate models that are ‘owned’ by the decision-maker. This, in turn, highlights the importance of transparency of reporting, implementation and interpretation of models.

Chapter 7 Strategies for identifying errors

Overview

This chapter presents a qualitative analysis of discussions concerning procedures and techniques that are employed to identify whether, where and why errors have been introduced into models. As part of the Framework analysis, a number of classifications emerged from the qualitative data.

• Check face validity with clinicians.
• Do the results appear reasonable?
• Test model behaviour.
• Is the model able to reproduce its inputs?
• Can the model replicate other data not used in its construction?
• Compare answers with the answers generated by alternative models.
• Peer review of models.
• Double checking of input values.
• Double-programming.

Check face validity with experts

A commonly cited approach used in the identification of model errors concerned checking the face validity of interim and final model results with people who know about the disease and treatment(s) under consideration. This has relevance both in the avoidance and the identification of errors; in this case the primary difference in its role concerns whether the error is found before or after the decision has been made, or whether a model is being scrutinised by clinical experts acting on behalf of the decision-maker (e.g. reviewing models submitted by manufacturers within the Single Technology Appraisal process). Key discussions surrounding this issue have therefore been presented earlier (see Chapter 6).

Do the results appear reasonable?

A common issue concerning the identification of model errors involved whether the results appeared to be ‘reasonable’. As noted earlier (see Chapter 4), the ‘reasonableness’ of model results, and whether they match preconceived expectations, were raised as dimensions of what constitutes a model error.

…there’s a level of validation which can happen at [the] end of any model, because, you know, you want to make sure it’s actually giving you numbers which aren’t ridiculous…
[I7]

…I suppose, that there’s a kind of issue of judgement in terms of whether it looks, whether the model itself looks reasonable…
[I9]

The ability of a modeller to develop an expectation of the answer has been discussed previously (see Chapter 6, Skills and training). As highlighted through these previous discussions, the source of these expectations differed between respondents. For example, one respondent suggested that an expectation of the answer can be formed by the modeller through an examination of the key features of the disease and the evidence used to inform the model.

…you’ll look at the key things which…are going on, but you want this right at the beginning of the process. We call it the sniff test which is whether the result smells right…
[I6]

In practice, this approach echoes the idea of ‘skeleton models’; ‘back-of-the-envelope’ models, which adopt a high level of abstraction to generate an expectation of the model result. One respondent highlighted how this might work in practice; however, as already noted (see Chapter 4), although a model result may match one’s expectations, this does not provide a guarantee that the model is error-free.

…my back of the envelope said this was going to be about forty grand per QALY and there were some subgroups that might be interesting. If you come to the end, or you come to a draft model of the results, and you have something radically different from that, then there should be a reason. If it comes out and it’s four grand per QALY then it just smells wrong, and something has changed between the decision problem you thought you were looking at in the beginning and the data that’s actually been implemented…
[I6]

…do the results, if it [the model] says the ICER is £50,000…does that seem about right using the back of the envelope sort of techniques…you’re never going to be able to confirm that exactly without the full model full calculations but you should be able to at least justify [it] or not within a fairly…certain range…
[I11]

The same respondent described the action which he would take if there were a significant difference between the expectation and the answer from the model:

That’s the sort of detective phase I think, identifying which bit is different, so your fag packet might have got differences in costs divided by differences in QALYs…so firstly which of those two or both is different from what the model is generating, and if it’s say the difference in costs, is there a cost of intervention or the cost of the comparator, which one’s wrong?
[I11]

Similarly, another respondent stated that he estimates what the model output should be using pen and paper at the start of the project; if this approximated result is not within a reasonable tolerance of the results from the final model, he then attempts to establish which is wrong and why.

…I regularly work out the answer before I do the project and see if I am right or not, and if I don’t end up with the answer I think I have, work out why I was mistaken in the beginning, why my original calculation was wrong or why the new calculation is wrong…
[I2]

That’s why it’s interesting, when it doesn’t match your back of the fag packet answer, you’ve got to be damn sure you are right…
[I2]

A different respondent stated that when a model is operating in an unexpected fashion then one of the approaches is to build up the same relationships and logic and try to track down which part of the model is responsible for the error.

…if issues are identified in the model during construction or in quality control and it’s one of those really annoying, why the hell is this doing this? Then one of the approaches is, and again this is experience and not written down. It’s out the model and I’m going to go down to back of the fag packet, start again and just build up the same thing, in essence, not replicate it, but just build up the same kind of relationship and logic to see whether that’s at all possible or try and track down which part of that process must be going wrong to hit those kind of numbers…
[I10]

The same respondent suggested that such approaches should be a standard part of the model development process.

All the good modellers do and the bad ones don’t…
[I10]

Testing model behaviour

Testing model behaviour relates to the process of changing the input values in a way which will result in a change in the results, and examining whether the change was expected (note there is a direct link to the methods for developing expectations described previously). One respondent suggested a series of logical tests involving the efficacy of interventions as a means of identifying potential errors within a model.

…if I increase the efficacy of an intervention, the effectiveness ought to go up and the cost-effectiveness ought to go down. If I switch the inputs for the two arms of the model, I should get the opposite result out of it at the end. If I put the same inputs into both arms, I should get a null…
[I6]

…as you start to think through the model you start to think of an expectation of what the results are going to be and given a certain change how you think the results should change and the more specific you get in terms of the structure of a model and the data inputs the more specific you can be about how you expect the change to be so if it isn’t in the direction or the magnitude that you are expecting, you are expecting that you have probably made a mistake somewhere…
[I8]
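Logical tests of this kind lend themselves to automation as assertions that are re-run whenever the model changes. The sketch below uses a deliberately trivial two-arm toy model (all names and values are invented for illustration) to express the three checks quoted above: swapped arms give the opposite increments, identical arms give a null, and higher efficacy lowers the ICER:

    def arm_outcomes(efficacy, unit_cost=1000.0):
        """Toy single-arm model: higher efficacy yields more QALYs.
        Entirely illustrative, standing in for a real implementation."""
        return unit_cost, 5.0 + 3.0 * efficacy   # (cost, qalys)

    def incremental(eff_int, eff_cmp, cost_int=1200.0, cost_cmp=1000.0):
        c_i, q_i = arm_outcomes(eff_int, cost_int)
        c_c, q_c = arm_outcomes(eff_cmp, cost_cmp)
        return c_i - c_c, q_i - q_c

    # If I switch the inputs for the two arms, I should get the opposite result.
    d_cost, d_qaly = incremental(0.7, 0.5)
    d_cost_sw, d_qaly_sw = incremental(0.5, 0.7, cost_int=1000.0, cost_cmp=1200.0)
    assert (d_cost_sw, d_qaly_sw) == (-d_cost, -d_qaly)

    # If I put the same inputs into both arms, I should get a null.
    assert incremental(0.6, 0.6, cost_int=1000.0, cost_cmp=1000.0) == (0.0, 0.0)

    # If I increase the efficacy of the intervention, effectiveness rises
    # and the ICER (cost difference / QALY difference) should fall.
    icer_low = incremental(0.6, 0.5)[0] / incremental(0.6, 0.5)[1]
    icer_high = incremental(0.8, 0.5)[0] / incremental(0.8, 0.5)[1]
    assert icer_high < icer_low

Re-running such assertions after every structural change is the coded equivalent of the respondent’s ‘check the ICER, did it go in the right direction’ discipline described later in this section.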

Another respondent stated that it is possible to prospectively define a set of input parameters and develop an expectation of their impact on the model output.

…well…one thing you can do is you’d define a set of input[s], and you know what will be the set of output[s]. And with that, you know exactly…whether the results are right or wrong…
[I1]

…you see testing as a black box where you define input and you know what output you are expecting. So you change the inputs, and you see how the outputs change…
[I1]

The same respondent highlighted that techniques exist that can facilitate this process of design and analysis; however, these are not currently used within HTA modelling.

Imagine you have five inputs that can have different values, each of them, and you know what the outputs should be for the values. What would be the minimum number of combinations that you can do with your five inputs in order to get to be sure that the outputs are correct…so there are techniques that tell you how to define your testing inputs, then, in order to get the outputs…
[I1]

Another suggestion concerned the use of model functionality tests, such as altering assumptions concerning the initial distribution of patients in the model.

…model functionality tests…what happens if you set the starting population to be all in one state rather than another…
[I11]

Another approach was to test extreme input values in terms of stability and robustness of model outputs.

…taking extreme settings as inputs as ways of checking that the model is resilient to use its input…
[I11]

Whereas many of the respondents discussed the use of model testing to identify errors, it was suggested that the use of such techniques did not guarantee that errors would be identified (a similar point was made concerning developing expectations; see Do the results appear reasonable?). One respondent suggested that the model scenario may match previously determined expectations as a result of luck rather than as a consequence of internal consistency within the model.

…if you can test this by a kind of univariate type sensitive analysis approach where you’re saying I think this is going to have a huge impact or at least if I increase this value, the result, you know, that value should increase rather than decrease…you may have the right direction, seeing the right magnitude of change, but actually purely by luck…It’s just pure luck that there happens to be the case and there’s some error in there.
[I9]

Other examples of model tests included setting all utility parameters to zero, adjusting transition probabilities and assessing the impact, and setting the parameters in each arm of the model to the same value and checking that the results were identical.

Simple things like if you set utility values in the model to zero is the output of the utility zero in each arm? If you set the data points in each arm equal do you get an ICER of zero or not…
[I4]

…when you increase one of the transitions which you expect to reduce the utility then the utility goes downward. So you play around with it and make sure it’s pretty much behaving like you want…
[I4]

Although the majority of interviewees discussed the techniques they adopted, one respondent adopted a more normative stance, suggesting that testing model behaviour should be a standard part of the model development process.

…every time you change something, you should as a matter of course just check that the ICER goes in the right direction…A new parameter, a parameter changes or a new variable or a new state or any of these things get added you rerun the model, check the ICER, did it go in the right direction…
[I2]
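The black-box approach described above, fixing a grid of test inputs with known expected outputs, can be sketched as follows. The parameter names, values and stand-in functions are hypothetical; the combinatorial techniques the respondent alludes to (such as ‘all-pairs’ test designs) would reduce the number of runs below the exhaustive enumeration shown here:

    from itertools import product

    # Hypothetical input grid: each parameter takes a small set of test
    # values, deliberately including extremes such as zero.
    grid = {"efficacy": [0.0, 0.5, 1.0],
            "utility_stable": [0.0, 0.8],
            "unit_cost": [0.0, 500.0, 50000.0]}

    def expected_qalys(efficacy, utility_stable, unit_cost):
        # Stand-in for an independently worked-out expectation.
        return 10.0 * efficacy * utility_stable

    def model_qalys(efficacy, utility_stable, unit_cost):
        # Stand-in for the implemented model under test.
        return 10.0 * efficacy * utility_stable

    # Exhaustive enumeration: 3 x 2 x 3 = 18 runs. A pairwise design
    # covering every two-way combination would need far fewer.
    for values in product(*grid.values()):
        kwargs = dict(zip(grid, values))
        assert abs(model_qalys(**kwargs) - expected_qalys(**kwargs)) < 1e-9, kwargs

Because the grid includes extreme settings (utilities of zero, costs of zero), the same loop also exercises the ‘resilience to extreme input values’ that respondents described.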

…what we have at the moment is a model prior to final or full analysis being run that we are happy with and at that point it is only really when we are starting the final stages of the model when you are producing results, some of the results, especially those exploring extreme values, that your problems become fully apparent and then the question becomes do we need to refine the model or is our model giving reasonably good predictions or any biases that we have within our model reasonable over the plausible range of those values and only becoming uncertain over implausible values. So we are actually maybe trying to specify at what points our model is going to fall over…
[I8]

…you’d hope that that sort of iterative process, the sort of extreme values would kind of dig up the major ones…
[I7]

Is the model able to reproduce its inputs?

Testing internal validity, that is, checking that the model can generate the inputs used to populate it, was also discussed by participants. This corresponds to Eddy’s second order of model validation.59 Several respondents suggested that they would typically use this approach as part of the model checking process; examples included testing incidence data and survival analysis used to inform model parameters.

…I suppose it does depend on you know if you have taken for example, national statistics, cancer incidence data to go into your model then you very much expect the incidence coming out of it…
[I11]

…we might do some kind of level of validation in terms of…model survival curves [do they] look anything like the data itself? And so there would be some kind of, where it’s possible, internal sort of validation on data inputs, and I think that would be done on a sort of statistical basis…
[I7]

However, some participants’ views of the utility of this method were negative, indicating that it may be a necessary but not sufficient condition of any model.

…internal validity means that you get out what you put in, big deal, fine...you have to do that, but it doesn’t get you anywhere. It doesn’t tell you that your model is right. It only means that what you put in – if you put rubbish in it, then you’ll get rubbish out of it. So all it, all it means is that, you know, you’ve got the mathematics…right…
[I3]

Another respondent highlighted that there may be instances whereby one would not expect the model outputs to match the input data used to inform it; however, this respondent did express a preference for demonstrating internal consistency where possible.

…even if you have good reasons your numbers won’t come out exactly the same and some of those might be overriding reasons, because of the way you are trying to use the trial for a particular model in a particular setting. If it’s possible to build a model in such a way that you can show consistency with a clinical trial then I think you should do that. That is a useful check…
[I12]

Interestingly, one respondent suggested that testing the internal consistency of the model did not necessarily take place within the model itself, but may be undertaken as part of a premodel analysis. The perceived benefit of this approach was that it may ensure that the data have been interpreted correctly before their incorporation into the model.

…I would just make sure that my survival models and other data that I have pulled from the trial is making sense, quite separately from any economic decision model. Can I reproduce these things in a way that looks right? Just to make sure I have done the analysis correctly and understood the results…
[I12]
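A check that the model reproduces its inputs can be automated in the same spirit. The sketch below compares a model-generated survival curve against the curve used to parameterise it; the 2% tolerance and the data are illustrative assumptions, not values taken from the interviews:

    import numpy as np

    def check_reproduces_survival(model_survival, input_survival, rel_tol=0.02):
        """Compare the survival curve generated by the model against the
        curve used to populate it (Eddy's second order of validation).
        The relative tolerance is an arbitrary, illustrative choice."""
        model_survival = np.asarray(model_survival)
        input_survival = np.asarray(input_survival)
        rel_err = np.abs(model_survival - input_survival) / input_survival
        return rel_err.max() <= rel_tol

    # Illustrative data: the model's curve deviates by at most ~1.4%.
    fitted = [1.00, 0.90, 0.81, 0.73]
    from_model = [1.00, 0.91, 0.81, 0.72]
    print(check_reproduces_survival(from_model, fitted))  # True

As the quotations above caution, passing this check is necessary rather than sufficient: a model that reproduces its inputs may still be wrong in structure or scope.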

Can the model replicate other data not used in its construction?

An additional aspect of model validation discussed by interview participants concerned the ability of the model to replicate data that have not been used in its construction; this corresponds to Eddy’s third order of model validation.59 One respondent stated that this level of model validation is rarely possible given issues of data scarcity and the general principle of using all relevant evidence to inform the model.

What does that mean – external validity? Well, if it means accurately predicting something from a different source, well, that’s all right, except that most of the modelling that we do, that I’ve ever done, is obliged to use every source that’s available! Because of data scarcity. So basically there you know, you cannot do external validation…
[I3]

The same respondent highlighted other problems concerning the relationship between the data used to populate the model and its relationship to evidence collected in the future.

…when you’re on a chronic model, by the time you get 10 years into the future, the whole circumstances have changed, anyway. The epidemiology’s changed, the interventions have changed, the population’s changed, you know, everything’s gone…if somebody predicted it and put it in a sealed envelope, it’s bound to be wrong. Does that mean that the model was wrong? No, because it was talking on the basis of what was known at the time and the circumstances at the time…no, predictive validation is impossible anyway. So the whole notion of model validation is either trivial or impossible…
[I3]

One respondent highlighted that even if such analyses were possible, time constraints may represent a barrier for this level of validation.

…we talked about [whether] our model had predicted the trial results but by that point you are on another project so you’ve got too many things going round your head and we’ve never actually done it, and then it’s probably too late to say you know that model three years ago, it’s wrong…
[I2]

Compare answers with the answers generated by alternative models

Interviewees also discussed comparing the de novo model results against results from previously published studies. However, this is likely to be problematic as a method of validation because models which attempt to address similar decision problems may have employed different structures, assumptions and parameters. As noted by one respondent, it may be necessary to accept that some difference in results is to be expected; however, the underlying cause of the difference may not be clear.

…it’s a judgement, but you’re not expecting to get absolutely the same result, but, you know, the convergent validity, let’s call it that...the differences between these models, do they come down to fundamental differences in structure, differences in conceptualisation, differences in scope, or is it the parameter inputs? And what would obviously be reassuring in that situation is change the parameter inputs to suit those of the existing models produced, you know, similar results…
[I9]

…there are two levels I guess of validity, one of which is does…your model…look kind of reasonable in terms of what other people have done…
[I7]

One respondent suggested that this aspect of model checking should run alongside model development; however, there may be a danger of becoming reliant on the structure and assumptions of previous models (see Chapter 3, Discussion of model development process). Indeed, the presence of differences between models does not guarantee that either model is ‘wrong’; rather, such differences may be explained by the modellers’ preference for a given set of assumptions and methods.

…if somebody else has got a model of a different answer to it, we wouldn’t automatically say it’s an error. Because, you know, we’ve got our own preferred set of assumptions and ways of dealing with it…
[I7]

Nonetheless, it was suggested by one respondent that certain aspects of the model should always follow accepted conventions and that failure to do so limits the comparability of model results.

Peer review of models

The majority of respondents discussed internal peer review as a common element of the model development process, representing a useful means of identifying errors. Perhaps the principal reason for peer review was that 'your own mistakes are the hardest to see' [I4]. For example, one respondent stated that completed models are passed to a senior member of staff who is charged with the task of 'breaking' the model.

We have a process here where we have a sort of a peer review, so we have enough experienced staff that we could give the draft model to one of the other project leaders and ask them to spend a few days kicking it about to see if there are errors or mistakes in it…
[I6]

Often, but not always, this was suggested to involve a modeller who has not been involved in the design or implementation of the model; however, some respondents suggested that meetings were held and model results presented throughout the development process (for example, in the supervision of junior staff).

…you might need someone external to the modelling to find those kind of errors…
[I1]

A key issue in the internal model peer review was that the reviewer must be capable of understanding the complexities of the model.

…you would like to think that the people checking it are reputable and able so they will understand the complexities that they have put into the model whereas others might just dismiss them…
[I12]

Similarly, one respondent suggested that external scrutiny of the model may also be seen as an ideal; however, this may be constrained by time and concerns regarding intellectual property and a lack of willingness to share ideas. As a consequence, it was suggested that internal peer review is likely to be a viable second-best alternative:

…if there were time, you would want to send it out to some independent expert who would go through it all with a fine-tooth comb and tell you everything that's wrong with it. In practice that doesn't happen, it doesn't happen because there isn't the time to do it, and secondly, it doesn't happen, because if you're doing anything that's genuinely original, you know, you don't want to run the risk of anybody else picking up all your ideas. So, you know, there are problems on that front. The best that you can do is to try and expose it to somebody probably internal, who hasn't been involved in the development, and let them go through it and say, 'Well, can you find any obvious flaws here?' And that's probably the best that, in reality, we can usually do…
[I3]

It was suggested that the benefits of peer review did not lie solely in identifying errors, but also in consolidating the modeller's own understanding of the decision problem and the justification for the modelling approach.

…there's also a stage of exposing the model at that stage to I suppose to people within the health economics group but also outside of that, so I think there is a, well, I think there's a benefit in terms of, again, just going over the way the model's been put together…
[I9]

Several respondents highlighted the importance of model scrutiny through peer review because of the model's political importance, as well as a feeling that if the peer review failed to identify substantial errors the modeller could be more confident that the model was error-free.

We have had people check my…model because that's a huge, politically important piece of work…
[I2]

…these things are subject to severe scrutiny and if they find something tiny well they've looked through a lot…it was a pure transcription error, I put the wrong number in and it was picked up by the technical lead…the thought was if that had been checked with that much scrutiny and that was all they could find then there is actually a good chance that the rest of it was correct…
[I5]

However, despite the potential benefits of model peer review, some respondents raised concerns about the sensitivity of peer review (before publication) in detecting model errors:

…I remain baffled as to how general reviewers assess the quality of models without looking at them…but how a journal can perform a meaningful review on the basis of a three and a half thousand word manuscript describing a highly technical model…
[I6]

We try to publish in journals with the highest citation rating. Those are almost inevitably clinical journals. Clinical journals use clinicians and generally speaking the reviews for clinical journals are the older clinicians who've had very little direct exposure to economic evaluation, modelling or anything. They are pure clinicians. They don't know what they are reviewing…
[I3]

Double checking of input values

One commonly cited process for identifying model errors involved double-checking model input parameters in terms of their incorporation within the model and cross-checking them against the original sources from which the parameter value(s) were derived, e.g. clinical trial publications. In relation to definitions of model error, these processes primarily concern ensuring that data have been transcribed and linked within the model as the modeller intended.

…we ask them to look at the accuracy of the input data, so that's been incorporated into the model appropriately…
[I6]

…there is a lot of checking to make sure the data is right not just so you talked about what the meaning of the data points are but also make sure that the actual values are what they should be so we step through the model and check, go through each data point and make sure it's right…
[I4]

One respondent suggested that the effectiveness of such checking activities may be constrained by time availability and the tedium of the task.

…I mean, checking and double-checking, really. The only way really is having the time and the discipline to go through, crosscheck everything you do several times…
[I3]

One respondent stipulated that such checks must involve checking confidence intervals as well as mean input values.

It wasn't the point estimate that had been entered wrong it was the confidence interval and things like that are difficult to spot…
[I12]
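A minimal sketch of this kind of input audit is given below. It is illustrative only; the parameter names, values and the deliberately transposed confidence limit are all hypothetical. It shows how a mechanical cross-check can cover the confidence intervals that respondents found hardest to spot, not just the point estimates.

```python
# Illustrative sketch - hypothetical parameters, not from the report.
model_inputs = {  # values as wired into the model
    "prob_response":  {"mean": 0.42, "ci": (0.35, 0.49)},
    "utility_stable": {"mean": 0.78, "ci": (0.71, 0.58)},  # CI mistyped
}
source_values = {  # values as reported in the trial publications
    "prob_response":  {"mean": 0.42, "ci": (0.35, 0.49)},
    "utility_stable": {"mean": 0.78, "ci": (0.71, 0.85)},
}

for name, source in source_values.items():
    entered = model_inputs[name]
    for field in ("mean", "ci"):  # check the interval, not just the mean
        if entered[field] != source[field]:
            print(f"MISMATCH {name}.{field}: "
                  f"model={entered[field]} source={source[field]}")
```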

69

© 2010 Queen’s Printer and Controller of HMSO. All rights reserved. Strategies for identifying errors

Double-programming

The issue of 'double-programming' was discussed implicitly or explicitly as a potential approach to identifying model errors. For the purposes of this analysis, double-programming relates to the development of the same mathematical model in two different platforms; a distinction should therefore be drawn between double-programming and the development of 'skeleton' models discussed earlier (see Chapter 3). Although respondents generally agreed that double-programming was a valuable activity, there was variation surrounding the extent to which double-programming was undertaken as a standard component of the model development process.

Perceived value of double-programming

The potential value of double-programming primarily concerned developing confidence in the results of the original model; that the results should be comparable.

…I think that is about rebuilding models. It's about you know, rebuilding it in a different software package and making sure that you get similar answers…
[I7]

Clearly, the focus here is on technical and logical errors in the model implementation, rather than problems in the structure of the original model; as such, the value of double-programming in identifying errors may depend on how the second model-build is implemented. However, one respondent noted that if the same analyst rebuilds the model in two different software platforms, they are less likely to identify any errors because they may not re-examine the structure of the model.

…building it in two platforms but that's effectively your double-build but the same person building it in two platforms but I would guess that is more prone to just using the same data and not re-examining your logic to see whether your logic was correct you'd just do exactly the same, so I think that's less likely to solve errors…
[I2]

However, one respondent highlighted that even if the second model-build was undertaken by an individual who was not involved in the development of the original model, this may highlight problems in interpretation of the decision problem.

…[It] depends on the circumstances but one of the things that might happen is there is an independent rebuild either by someone you know or outside…it's just as a check principally whether you find any errors or in interpretation as the idea is that a reasonable person would come to a similar conclusion…
[I12]

For one respondent, double-programming was considered a standard part of model development, whereby the model is developed using two platforms in parallel:

…certainly at the stage of you know, developing it on the alternative platform…those are done with a model that you feel reasonably confident with…
[I9]

…I would normally do a quick development in TreeAge. If we're thinking of sending it off to NICE, then we'd do it in Excel and, and we'd make sure that those two versions are kept in tandem…
[I9]

Aspiration or realism?

Several respondents suggested that double-programming was an aspirational 'gold standard' rather than an attainable goal.

…one of the things we don't typically do is replicate our models using another piece of software and often that would be useful because that would be another way of checking things, so we don't do double-programming…
[I8]

One respondent highlighted that their gold standard would involve multiple modellers programming the model independently; however, one may foresee problems in terms of comparability of the resulting models (see quote from respondent I4, Chapter 3, Use of information in model development).

…well the real ideal is to have two people building the model independently…well that is short of the ideal. You can have as many people as you like you know…At least three people building the model independently would be good but…
[I5]

Constraints on double-programming

Two respondents discussed potential constraints surrounding double-programming. In particular, the ability to undertake such model checking activities may be limited by cost and time. However, one respondent suggested a relationship between double-programming and model complexity; that double-programming may be a viable option for simpler models.

…I know in a lot of those guidelines they talk about, in this verification stage they talk about parallel development of models in other software. We've never done it and I can't see us ever, commercially being able to do that because you are, by definition, you're doing it twice and you're doing it twice for a reason, to identify whether doing it in a different software or a slightly different approach makes a big difference…
[I10]

…if it is a big complex model it needn't be an aspiration you could do it but that can take a lot of time…it's an aspiration in some cases because time or complexity doesn't allow in other cases it's achievable if it's a reasonably simple model…
[I12]
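The comparison step at the heart of double-programming can be shown with a toy example. The sketch below is hypothetical and deliberately simple; the report describes rebuilding models on genuinely different platforms (e.g. TreeAge and Excel), whereas here both builds are written in Python purely to illustrate the principle of two independent implementations of the same model being checked against each other.

```python
# Illustrative sketch - hypothetical model, not from the report.
import numpy as np

# Transition matrix for states Well, Sick, Dead (each row sums to 1).
P = np.array([[0.85, 0.10, 0.05],
              [0.00, 0.70, 0.30],
              [0.00, 0.00, 1.00]])
START = np.array([1.0, 0.0, 0.0])
CYCLES = 20

def build_a():
    """Build A: explicit cycle-by-cycle cohort loop."""
    state, trace = START, [START]
    for _ in range(CYCLES):
        state = state @ P
        trace.append(state)
    return np.array(trace)

def build_b():
    """Build B: the same model expressed via matrix powers."""
    return np.array([START @ np.linalg.matrix_power(P, t)
                     for t in range(CYCLES + 1)])

# The double-programming check: the two builds must agree.
if not np.allclose(build_a(), build_b(), atol=1e-12):
    raise AssertionError("Builds disagree - investigate before release")
```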


Explanatory analysis concerning methods for preventing errors

Table 8 presents an explanatory synthesis which attempts to link the current procedures and techniques for identifying model errors to the taxonomy of model error presented in Chapter 6. The layout and interpretation of the table, and the methods of synthesis used in its construction, are essentially the same as those for Table 7. Given the overlap between the identification and the prevention of model errors, the processes and techniques relating to clinical involvement in model development and model scrutiny are not repeated here. One key point is immediately noticeable from the interpretation of Table 8: many of the methods used to identify errors in the model involve specific techniques (rather than processes) that may be applied once the model is complete, in progress or under scrutiny by a third party. However, the specific target of the techniques, i.e. the types of error that the technique is intended to identify, is not always clear. Indeed, the majority of methods do not appear to map directly to specific error types in the taxonomy; rather, they may be used to identify symptoms of errors, the true source of which may relate to any point in the taxonomy of model errors. For example, the identification of a mismatch between actual model results and prior expectations derived from a previous model may be suggestive of the presence of an error within the model, yet its root cause may be entirely unclear (there may even be legitimate reasons for such differences). This represents a considerable challenge in the peer review of models. The same may be true of certain black-box validation techniques; only the methods which test the underlying logic of the model can guarantee the presence of an error. In this sense, the methods outlined in Table 8 that can be mapped directly to specific aspects of the taxonomy (e.g. switching model inputs between arms) are diagnostic in nature; mismatches in model results and expectations are indicative of the presence of model error. Conversely, those techniques and procedures which map to any or all points in the taxonomy are effectively non-specific model screening methods.

Placing the findings of the interviews in the context of the literature on identifying model error

The literature revealed a number of validation techniques which have been developed in the field of computer science. Many of the techniques described in the literature were discussed by the interviewees. These included: comparison of the model with previous modelling of the same situation; seeking face validity by asking clinicians who have specialised in the disease process or medical condition being investigated to comment on the model; retrospective and prospective predictive validation; degenerate testing, in which the value of a single parameter is changed and the behaviour of the model is examined to determine whether it is in line with the expected behaviour; and extreme condition testing, in which the model parameters are assigned extreme values and the behaviour of the model is investigated. Another process that is described in both the literature and the interviews is that of assigning the model fixed values that facilitate manual calculation of the results or interim results, analogous to the pen and paper approach.

A number of processes were listed that were either not discussed or discussed in different terms. These include animation, operational graphics and trace testing. In animation, for example, the number of patients in each health state might be displayed graphically across time to determine whether the behaviour of the model is in line with the expected behaviour. An example of trace testing might involve tracking the proportion of patients in each health state numerically across time; the exhibited behaviour can then be compared to determine agreement with the expected behaviour.

The literature also describes Turing testing, in which the ability of an expert to distinguish between observed phenomena and output from a simulation model is used as a test of the model's validity. This process was not described by the interviewees and may have some potential for applications in HTA modelling.
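The trace-testing idea described above lends itself to a very small sketch. The example below is hypothetical (the model and tolerances are invented for illustration): it prints the numerical trace of state occupancy over time and asserts two behaviours that a well-formed cohort model should always exhibit, namely conservation of the cohort and a non-shrinking absorbing state.

```python
# Illustrative sketch - hypothetical model, not from the report.
import numpy as np

P = np.array([[0.85, 0.10, 0.05],   # Well -> Well/Sick/Dead
              [0.00, 0.70, 0.30],   # Sick -> Well/Sick/Dead
              [0.00, 0.00, 1.00]])  # Dead is absorbing
state = np.array([1.0, 0.0, 0.0])

for cycle in range(1, 21):
    previous_dead = state[2]
    state = state @ P
    # Expected behaviour 1: the cohort is conserved across states.
    assert abs(state.sum() - 1.0) < 1e-9, f"cycle {cycle}: cohort not conserved"
    # Expected behaviour 2: the absorbing Dead state can never shrink.
    assert state[2] >= previous_dead, f"cycle {cycle}: Dead state decreased"
    print(cycle, np.round(state, 4))  # the trace to compare with expectations
```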



TABLE 8 Relationship between methods for identifying errors and error types

Interventions discussed by interviewees | Intervention type | Relevant error domains | Primary error targets
(the final two columns set out the relationship with the error taxonomy)

Checking face validity of the model
Clinical review of model structure, evidence and results | See Table 7 | See Table 7 | See Table 7

Evaluating whether model results appear reasonable
Compare interim or final model results against predetermined expectation (from previous models, from skeleton/back of the envelope model, from earlier version of model) | Technique | Any/all error domains | –

Testing model behaviour (black-box validation)
Increase efficacy of intervention – effectiveness up, ICER down | Technique | Any/all error domains | Any point on taxonomy
Switch inputs for two model arms – results should be opposite | Technique | Implementation of the model | 230, 240, 250
Functionality tests, e.g. set initial distribution all to single state | Technique | Any/all error domains | Any point on taxonomy
Test how model behaves at extreme values | Technique | Any/all error domains | Any point on taxonomy
Set utility values to zero – no QALYs | Technique | Implementation of the model | 230, 240, 250
Adjust transition probabilities | Technique | Any/all error domains | Any point on taxonomy
Set same model parameters for each arm | Technique | Implementation of the model | 230, 240, 250

Testing the internal consistency of the model
Testing the internal validity of the model – can it reproduce data used as inputs, e.g. incidence and survival data? | Technique | Structure and methodology used/use of evidence/implementation of the model | 70, 80, 90, 200, 230, 240, 250

Testing the predictive validity of the model
Testing the ability of the model to predict data not used in its construction | Technique | Any/all error domains | –

Peer review of models
Internal peer review by modeller responsible for building model (e.g. supervisor of junior staff) | Process | Any/all error domains | –
Internal peer review by modeller not external to the process | Process | Any/all error domains | –
External peer review | Process | Any/all error domains | –

Checking model input values
Check model input values from source material | Technique | Use of evidence/implementation of model/presentation and understanding of results | 130, 230, 240, 280

Double-programming
Double-programming by same modeller in two platforms | Technique | Structure and methodology used/implementation of the model | 70, 80, 90, 230, 240, 250
Double/triple-programming by independent modelling groups | Technique | Any/all error domains | Any point on taxonomy
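Several of the black-box behaviour tests listed in Table 8 can be expressed as a handful of assertions around whatever function runs the model. The sketch below is illustrative only: run_model and its parameters are hypothetical stand-ins for a real two-arm cost-effectiveness model, not an interface described in the report.

```python
# Illustrative sketch - hypothetical model and names, not from the report.
def run_model(params_a, params_b):
    # Placeholder cohort model; a real model would sit behind this interface.
    def arm(p):
        cost = p["drug_cost"] * p["cycles"]
        qalys = p["utility"] * p["effect"] * p["cycles"]
        return cost, qalys
    return arm(params_a), arm(params_b)

base_a = {"drug_cost": 100.0, "cycles": 10, "utility": 0.8, "effect": 1.0}
base_b = {"drug_cost": 500.0, "cycles": 10, "utility": 0.8, "effect": 1.2}

# Test 1: set the same model parameters for each arm -> identical results.
(c_a, q_a), (c_b, q_b) = run_model(base_a, dict(base_a))
assert (c_a, q_a) == (c_b, q_b), "same inputs gave different outputs"

# Test 2: switch inputs for the two model arms -> results should be opposite.
res_ab = run_model(base_a, base_b)
res_ba = run_model(base_b, base_a)
assert res_ab == (res_ba[1], res_ba[0]), "arm switch is not symmetric"

# Test 3: set utility values to zero -> no QALYs should be generated.
(_, q_a), (_, q_b) = run_model(dict(base_a, utility=0.0),
                               dict(base_b, utility=0.0))
assert q_a == 0.0 and q_b == 0.0, "QALYs generated with zero utility"
```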


Chapter 8 Barriers and facilitators

The final theme concerned barriers and facilitators to model checking. Interviewees were asked about their views concerning barriers and facilitators to model checking and whether model checking is currently given sufficient priority. Respondents engaged in broader, more detailed discussion regarding barriers to model checking than facilitators.

Barriers

Barriers to model checking activities discussed by interviewees are summarised in Box 1. The interviewees commonly focused on resource constraints (time constraints, people, money) and organisational/process constraints. Two further barriers included the perceived importance and priority of model checking activities and modellers' confidence in current model checking procedures.

BOX 1 Summary of barriers to model checking
• Lack of time for checking models, timing of the process and time management
• Financial resources
• Availability of clinical input
• Tedium of model checking process
• Lack of confidence in usefulness of current model checking processes
• Lack of awareness of the need to check models
• Lack of awareness of the need for good housekeeping
• Lack of accountability for model errors
• Low priority accorded to model checking activities

Views concerning the nature of current barriers to model checking varied between respondents. Two respondents highlighted a trade-off between time available for developing the model and time available for checking the model:

…if you spend all of your time doing quality control you don't have as much to actually build a model.
[I6]

Time was highlighted as a barrier to some of the more aspirational model checking activities such as double-programming, particularly for more complex models. Two respondents also suggested a relationship between time constraints and model validity issues in terms of the use of an appropriate methodology and plausible assumptions.

I've gone back and gone well why did I make that assumption? Actually a better assumption was this, but I've gone with an assumption because it was easier, quicker and they need the report tomorrow.
[I2]

A second respondent also suggested that while one may wish to explore whether more sophisticated models would have made a difference, this is constrained by time. Human resource constraints were also raised by several respondents, including the availability of modellers as well as of other stakeholders within the model development process, e.g. clinical experts providing input on clinical practice.

You can have clinician involvement in terms of the clinician who is removed from the client and can offer an independent voice, not always possible due to time constraints, it really needs to be done quite quickly to organise the necessary meetings, preparatory literature and so on…
[I12]

Alongside physical human resources, the dimensions of human constraints also included the availability of sufficient levels of expertise in scrutinising HTA models.



Financial constraints were noted explicitly by some respondents, and implicitly by others. However, even with sufficient financial resources, it was suggested that more extensive model checking activities may not identify and avoid all model errors, potentially because of the tedious nature of model checking activities and a lack of confidence therein.

…we don't get paid enough, the cheaper option is to get someone else to go through it but after they've been through a thousand lines of coding and not spotted an error, do they give up and not check the next thousand.
[I4]

…it is a matter of painstakingly going through and seeing 'yes, this number and this model and where does this correspond to in the report', but that is not something you know – it is not something that is not guaranteed to work just because it is a tedious process…you are not going to have time to go through and find every single number.
[I5]

The above statements serve to highlight some of the gaps between the breadth of error types presented in the taxonomy and the effectiveness of current model checking activities in identifying and avoiding these (see Tables 7 and 8). Other barriers included developing models 'too quickly' and organisational barriers such as working within groups in which a shared understanding is not maintained. One respondent raised a number of less tangible barriers to model checking in which the onus is placed on the modeller rather than the constraints of the decision-making process. Barriers included a lack of proper time management, a lack of appreciation of the need to check models to identify errors, failure to adopt housekeeping approaches and a lack of accountability for model errors.

To explore the priority allocated to model checking activities, interviewees were asked 'if you had to increase checking which other activities would you drop?' and/or 'where in the development process would additional monies/time be allocated?'. Some of the comments indicated that a low priority was attached to these activities, indicating that simple resource constraints were not necessarily the key barrier.

I don't think we'd drop anything. I'd spend less time on validation…well, most of these projects are all underfunded anyway…so, we'd probably again apportion that money into those other activities, and again…put less on validation.
[I7]

…if we are under severe time pressure we try to cut back on what the outputs delivered will be, so if we were hoping to do a value of information analysis we will chop that, if we were hoping to do sophisticated sensitivity analysis we would chop that, just drop blocks of work rather than compromising the quality of key work which is the fundamental outputs in the model.
[I4]

Facilitators

Discussion of potential facilitators to model checking was typically brief, with few respondents suggesting how any of the above barriers could be ameliorated. Interviews suggested that setting up internal processes to ensure that things are not done in a hurry, that projects are adequately staffed and that work is divided up appropriately might facilitate improved error checking activities. In contrast to the barriers to model checking, one respondent from an Outcomes Research organisation suggested that in his experience clients would provide a budget for quality control; this view was not echoed by any respondents working within an academic setting. One interviewee suggested the use of a time delay within the model development process to allow improved internal review:

If we had a 3-month delay in every project where you handed it in…and then you came back in 3 months to read your report and re-examine your assumptions, your logic and what you had written in the report, your report would be a hell of a lot better, which is why the peer review.
[I2]


Chapter 9 Discussion and conclusions

Strengths and weaknesses of the study

The structure for considering the validity of qualitative research described by Mays and Pope24 is used here to examine the strengths and weaknesses of this study. Mays and Pope identify the following key issues of validity: triangulation of methods, respondent validation, transparency of methods, reflexivity and fair dealing.

This study uses two primary sources of evidence: interviews with HTA modellers and reviews of the literature. This twin approach has the strength of providing a triangulation of different research methods, helping to ensure complete coverage of issues and allowing the convergent validity of the results and conclusions to be examined.

Respondent validation was undertaken with the HTA modellers interviewed to ensure that the extracts from the transcripts used within the analysis were taken within context and did not misrepresent the views of the interviewees. This validation was undertaken by sending the interviewees a near-final draft of the report, including the complete interview analyses and principal conclusions. The communication invited both general comment on the content of the report and specific comment on the above validity issues. All interviewees were happy that their views had not been misrepresented.

To ensure clarity in the reporting of methods, the report has been subjected to internal peer review by a qualitative researcher, a mixed methodologist, modellers not involved in the interviews and a systematic reviewer.

Attention was paid in the research design to ensure as far as possible that the research methods and particular perspectives of the researchers did not bias the results. Three individuals were involved in the interviews to avoid bias associated with having a single interviewer. Ten of the twelve interviews were undertaken in pairs, with one lead interviewer and one interviewer responsible for ensuring complete coverage of the questioning and capturing of loose ends raised during the interview. To ensure congruence between the lead interviewers, the interview pilots and the first interview were undertaken by the two lead interviewers together. Some of the interviewees were known personally by the interviewers; this may have affected the nature of some of the responses. These effects may have been positive in aiding open communication during the interviews, and negative in leading to interviewers assuming the nature of some of the interview responses.

Regarding the definition of errors, the preliminary questioning about the full breadth of the modelling process may have biased interviewees towards a definition that encompassed all stages of the process. An alternative method of questioning would have been to commence discussion on technical errors of implementation and let interviewees explore the definition outward until they defined their own boundary. However, this approach may equally have led to an artificially constrained definition of errors.

In the analysis, the importance ascribed to individual accounts is only partially related to their frequency. The importance ascribed to comments is at least potentially subject to bias arising from the prior beliefs and prejudices of the analysts; this was guarded against by attention to deviant cases and by undertaking elements of the analyses in teams.

The interview sample was designed to ensure fair representation of organisational perspectives in the analysis. After the first draft of the descriptive interview analysis, the authors checked the distribution of comments used in the report across the different interviewees. Although there was some variation in the number of comments ascribed to different individuals, all interviewees were represented in the analysis. No rebalancing of the report was therefore deemed necessary.

With regard to the review of literature on error classifications, one potential weakness was that the literature focused on spreadsheet and program development rather than specifically on model development. It is assumed that this evidence is transferable to the modelling domain. It should also be noted that the identified literature concerning programming errors was very narrow despite broad searches being undertaken.



The interviews focused on the process of modelling for HTA decision-making; however, much of this discussion focuses on generic modelling issues and one strength of this work is its potential validity outside its immediate domain. Perhaps the main impact of the focus on HTA modelling is in the area of understanding the decision problem. The HTA modellers interviewed moved directly to a discussion of the PICO definition of a problem scope. When moving to different decision-making areas, this process of defining the problem would have to be more broadly based, with an initial focus on defining the key characteristics of the decision problem. Thereafter, much of the modelling discussion appears to be generic.

Current understanding of modelling errors in the HTA community

There is a general consensus that an important part of the definition of what constitutes an error is its impact on decision-making. This indicates that a pragmatic philosophical approach is generally taken across the HTA modelling community. Due to the nature of the interview process it cannot be assumed that the absence of an explicit discussion of such a theoretical underpinning means that individuals do not use such a framework in informing their modelling activities.

Although the interview discussions all implied this common pragmatic outlook, there was no common language used in the discussion of modelling errors and, furthermore, there was a somewhat inconsistent approach to the boundaries of what constitutes an error. For instance, when asked explicitly about the definition of model error, there was a tendency for participants to exclude matters of judgement from being errors and to focus on what have been termed slips and lapses. However, discussion of slips and lapses comprised less than 20% of the discussion on types of errors. When considering how individual elements of the modelling process might contribute to flaws in decision-making, the interviewees devoted 70% of the discussion to softer elements of the process of defining the decision question and conceptual modelling, mostly the realms of judgement, skills, experience and training.

The original focus of this research, and for consistency the terminology throughout this report, has been in terms of errors in models. However, in the light of the previous discussion of error boundaries, and when considering methods of improving modelling practice and processes and examining skills and training requirements, it may be more useful to refer to modelling risks rather than the more black-and-white term modelling errors.

During the interviews, a number of respondents discussed the concepts of validation and verification. Where validation and verification were discussed, there was a consistency among the participants in their interpretation. Verification was taken to mean the process of ensuring that the computer model correctly implements the intended model, that is, that the implementation of the model is free of errors. Validation meant the process of ensuring that a model is fit for purpose, with some interviewees explicitly noting that this concept might subsume verification. There was some discussion that related to the concept of credibility; however, this was not developed in any formal sense. As a consequence, issues of validation and verification are central to the discussion of errors. Sargent41 provides a definition of overall validity, encapsulated within Figure 2, as comprising conceptual model validity, verification of the computer model, and operational validity of the use of the model in addressing the real-world problem. The definitions of verification and validation provided by the interviewees are compatible with these definitions. These definitions are therefore recommended as the basis for further discussion of these topics.

The qualitative analysis highlights considerable variation in modelling practice across the HTA modelling community. This may be in part explained by the apparent variations in expertise, background and experience between the respondents. One particular area of variation concerned the explicit demonstration of conceptual modelling.

Although there was consensus surrounding the definition of verification, to make this a measurable and useful concept there needs to be an explicit and complete description (or design specification) of the intended or conceptual model. A common theme in the modelling process discussion was the absence of a fully transparent description of the intended model (indeed for some interviewees this barely exists at all). So even the distinction between verification and validation breaks down in any practically useful sense.


The methodological literature on verification and validation of models makes reference to the hermeneutic philosophical position, which recognises that objectivity is unachievable and suggests that meaning is created through intersubjective communication. This literature supports the proposition that it is the interaction between the modeller and client in developing mutual understanding of a model that establishes a model's significance and its warranty.35 This position highlights that model credibility is the central concern to decision-makers in using models as an input to the decision-making process. It also highlights the point that the concept of model validation should not be externalised from the decision-makers and the decision-making process.

A taxonomy of HTA modelling errors

The interviewees collectively demonstrated examples of all the major error types identified in the literature on errors in end-user developed spreadsheet systems. Taken together, the interview classification of modelling errors provides a basis for developing our understanding of errors and facilitating discussion of errors. The literature detailing classifications of spreadsheet errors is based around the work of two key authors, Panko and Rajalingham. Six different versions of classifications have been identified, each of which struggles to present classifications that are complete and mutually exclusive while being sufficiently rich to be useful; the HTA error classifications similarly struggle with these issues.

Ko explicitly identifies complex chains or networks of errors and faults that can lead to a failure. This layer of complexity is not explicitly discussed either in the spreadsheet error literature or in the HTA modelling interviews. Further development of model errors and risks should reflect this complexity in the relationship between errors in models and failures in decision-making. Existing research on the cognitive basis of human error should be brought into the examination of modelling errors.

To take forward any strategy for reducing errors and improving the robustness of decision support, it is likely that further development of this classification of errors will be required, specifically with regard to considering methods for improving the retrospective identification of errors. This is because the taxonomy does not seek to classify errors by symptoms.

The error category termed by Panko as 'context errors' hides a multitude of different error causes. For example, this category contains errors associated with breakdowns in domain/disease knowledge, modelling skills/knowledge, programming skills/knowledge, mathematical logic and statistical methods knowledge. All of these represent areas of judgement where the identification and definition of error are problematic and may include, at their most extreme, violations of methods and processes. This area requires specific attention when moving towards strategies to avoid errors in HTA modelling.

Current strategies for avoiding errors

The qualitative analysis suggests that a range of techniques and procedures are currently used to avoid errors in HTA models. Importantly, there is some degree of overlap between methods used to identify errors and methods used to avoid errors in models; the distinction between avoidance and identification methods is determined by the point at which the strategy is applied. These strategies for error avoidance are loosely defined as either processes or techniques; the former relate to issues in the model development process, whereas the latter relate to techniques of implementation. Generally, the techniques are explicit and can be interpreted as relating to how something should be done, for example, implementing a specific model layout. Conversely, the processes recognise an unfulfilled requirement and acknowledge that something should be done as part of model development, yet in many cases this is not accompanied by a clear strategy for achieving the required goal. Current methods for avoiding errors include engaging with clinical experts, engaging with the client or decision-maker to ensure mutual understanding, developing written documentation of the proposed model, explicit conceptual modelling, e.g. using diagrams and sketches, stepping through skeleton models with experts, ensuring transparency in reporting, adopting standard housekeeping techniques, and ensuring that those parties involved in the model development process have sufficient training in relevant disciplines (e.g. health economics, statistics, systematic reviewing), as well as ensuring appropriate skills in model programming, model scrutiny and general problem structuring.



Evidently, the range of strategies and approaches for avoiding model errors is considerably broader than housekeeping techniques alone; only one-fifth of the strategies target the implementation and operation of the model. The remaining four-fifths target the conceptual validity of the model. Very little emphasis is placed on how model results are conveyed to users.

Clarity and mutual understanding, specifically in the definition of the decision problem and conceptual modelling, are identified as key issues. Currently adopted strategies supporting these aspects of model development are expressed as process requirements; e.g. the establishment of long-term clinical input and iterative negotiation with the decision-maker or client may be used to avoid errors in the description of the decision problem. Although a number of techniques were suggested by the interviewees, for example, sketching out clinical pathways, their use appears to be partial, and the extent of their use appears to vary considerably between individual modellers. Hence, while there is an acknowledgement of the importance of these methods, their current implementation is not framed within an overall strategy for structuring complex problems.

Current strategies for identifying errors in HTA models

Current methods for identifying errors in HTA models include checking face validity, assessing whether model results appear reasonable, black-box testing strategies, testing internal consistency and predictive validity, checking model input values, double-programming and peer review. These strategies largely relate to specific techniques (rather than processes) that may be applied through third-party scrutiny. However, the specific target of the techniques, i.e. the types of error that the technique is intended to identify, is not always clear. Indeed, the majority of methods do not appear to map directly to specific error types in the taxonomy, but rather they may be used to identify symptoms of errors, the true source of which may relate to any point in the taxonomy of model errors. For example, the identification of a mismatch between actual model results and prior expectations derived from a previous model may be suggestive of the presence of an error, yet its root cause may be entirely unclear. This represents a considerable challenge in the peer review of models. The same may be true of certain black-box validation techniques; only the tests of the underlying logic of the model guarantee the presence of an error. In this sense, the methods outlined in Table 8 which do map directly to specific aspects of the taxonomy are diagnostic in nature; mismatches in model results and expectations are indicative of the presence of model error. Those aspects which map to any or all points in the taxonomy are effectively non-specific model screening methods; the presence of differences between models and prior expectations is not necessarily the result of a model error.

Recommendations

• Published definitions of overall model validity, comprising conceptual model validation, verification of the computer model, and operational validity of the use of the model in addressing the real-world problem, are consistent with the views expressed by the HTA community and are therefore recommended as the basis for further discussions of model credibility.
• Discussions of model credibility should focus on risks, including errors of implementation, errors in matters of judgement and violations – violations being defined as puffery, fraud or breakdowns in operational procedures.
• Discussions of modelling risks should reflect the potentially complex network of cognitive breakdowns that lead to errors in models and subsequent failures in decision support. Existing research concerning the cognitive basis of human error should be brought into the examination of modelling errors.
• There is a need to develop a better understanding of the skills requirements for the development, operation and use of HTA models.
• The qualitative interviews highlighted a number of barriers to model checking. However, it was indicated that increasing time and resources would not necessarily improve model checking activities without a matched increase in their prioritisation.
• The authors take the view, as supported within the methods literature, that it is the interaction between the modeller and client in developing mutual understanding of a model that establishes a model's significance and its warranty. This position highlights that model credibility is the central concern to decision-makers in using models. It is crucial, then, that the concept of model validation should not be externalised from the decision-makers and the decision-making process.

Research recommendations

Verification and validation

There has been remarkably little development in the theoretical underpinning of model verification and validation since a European Journal of Operational Research special issue31 focused on the topic in 1993. It was then noted that most modellers instinctively took a pragmatic approach to developing model credibility, operating in the middle ground between objectivism and relativism. This description accurately portrays the current position in HTA modelling. Further research on the theory of model verification and validation is required to provide a solid foundation for (1) the model development process and (2) the processes for making evidence-based policy and guidance. This research would inform how modelling is best used in the decision-making process and how credibility, the warranty for action, is best developed.

Model development process

Further research is required into the model development process. Two specific areas were identified:

• techniques and processes for structuring complex HTA models, developing mutual understanding and identifying conflicting perceptions between stakeholders in the decision problem
• development of the model design process and mechanisms for reporting and specifying models.

Errors research

Despite a large literature on modelling best practice, there is little evidence to suggest that models are improving in reliability. Further research is required to define, implement and, importantly, evaluate modifications to the modelling process with the aim of preventing the occurrence of errors and improving the identification of errors in models. Mechanisms for using National Institute for Health Research-funded model developments to facilitate this research could be pursued, for example, by providing research funding for the specification and evaluation of enhanced modelling methods within National Institute for Health Research-funded studies.




Acknowledgements

Professor Alan Brennan and Dr Alicia O'Cathain commented on the draft report. We would like to thank Mrs Clare Watson, Ms Elizabeth Marsden, Ms Andrea Shippam and Miss Sarah McEvoy for transcribing the interview recordings. Ms Andrea Shippam organised the retrieval of papers and helped in preparing and formatting the report.

The authors also wish to thank Mr Dominic Muston, Mr Steve Beard, Dr Martin Pitt, Professor Adrian Bagust, Dr Matt Stevenson, Dr Alejandra Duenas, Dr Jeremy Jones, Mr Andrew Davies, Mr Pelham Barton and Mr Adam Lloyd for partaking in the interviews. Two interviewees remained anonymous.

Contributions of authors

Jim Chilcott led the project. Jim Chilcott, Paul Tappenden and Suzy Paisley were responsible for the design of the study. Jim Chilcott, Paul Tappenden and Andrew Rawdin undertook all in-depth interviews. All authors contributed towards the qualitative synthesis methods and implementation. Diana Papaioannou carried out the literature searches.




References

1. Buxton M, Drummond M, Van Hout B, Prince R, Sheldon T, Szucs T, et al. Modelling in economic evaluation: an unavoidable fact of life. Health Econ 1997;6:217–27.
2. Cragg PB, King M. Spreadsheet modelling abuse: an opportunity for OR. J Op Res Soc 1993;44:743–52.
3. Panko RR. What we know about spreadsheet errors. J End User Comp Special Issue on Scaling Up End User Development 2008;10:15–21.
4. Neuhauser D, Lewicki A. What do we gain from the sixth stool guaiac? N Engl J Med 1975;293:226–8.
5. Brown K, Burrows C. The sixth stool guaiac test. J Health Econ 1990;9:429–55.
6. Culyer A. Economics. Oxford: Blackwell Scientific Publications; 1985.
7. Drummond M, Stoddart G, Torrance G. Methods for the Economic Evaluation of Health Programmes. Oxford: Oxford University Press; 1987.
8. Drummond M, Sculpher M, Torrance G, O'Brien B, Stoddart G. Methods for the Economic Evaluation of Health Care Programmes. Oxford: Oxford University Press; 2005.
9. Hill S, Mitchell A, Henry D. Problems with the interpretation of pharmacoeconomic analyses: a review of submissions to the Australian Pharmaceuticals Benefit Scheme. J Am Med Assoc 2000;283:2116–21.
10. Li J, Harris A, Chin G. Assessing the quality of evidence for decision-making (QED): a new instrument. Abstract 104. Health Priorities 2008. Conference of the International Society for Priorities in Health Care, Gateshead, UK, 28–31 October 2008.
11. Holcombe M, Ipate F. Correct Systems – Building a Business Process Solution. Applied Computing Series. Berlin: Springer-Verlag; 1998.
12. Cooper N, Coyle D, Abrams K, Mugford M, Sutton A. Use of evidence in decision models: an appraisal of health technology assessments in the UK since 1997. J Health Service Res Policy 2005;10:245–50.
13. Gold M, Siegel J, Russell L, Weinstein M, editors. Cost-effectiveness in Health and Medicine. Oxford: Oxford University Press; 1996.
14. Canadian Coordinating Office for Health Technology Assessment. Guidelines for Economic Evaluation of Pharmaceuticals. 2nd edn. Ottawa, Canada: CCOHTA; 1997.
15. Garrison LJ. The ISPOR Good Practice Modeling Principles – a sensible approach: be transparent, be reasonable. Value in Health 2003;6(1):6–8.
16. McCabe C, Dixon S. Testing the validity of cost-effectiveness models. Pharmacoeconomics 2000;5:501–13.
17. Sculpher M, Fenwick E, Claxton C. Assessing quality in decision analytic cost-effectiveness models: a suggested framework and example of application. Pharmacoeconomics 2000;17:461–77.
18. Panko RR. A rant on the lousy use of science in best practice recommendations for spreadsheet development, testing and inspection. 2007. URL: http://panko.cba.hawaii.edu
19. Britten N. Qualitative interviews. In Pope C, Mays N, editors. Qualitative Research in Health Care. 3rd edn. Oxford: BMJ Blackwell Publishing; 2006. pp. 12–20.
20. Mason J. Qualitative Researching. London, UK: Sage Publications Ltd; 1998.
21. Ritchie J, Lewis J. Qualitative Research Practice. A Guide for Social Science Students and Researchers. London, UK: Sage Publications Ltd; 2003.
22. Black N, Brazier J, Fitzpatrick R, Reeves B. Health Services Research Methods: A Guide to Best Practices. London, UK: BMJ Books; 1998. pp. 187–98.
23. Fisher K, Erdelez S, McKechnie L. Theories of Information Behaviour. ASIST Monograph Series. Medford, NJ: American Society for Information Science and Technology; 2006.
24. Mays N, Pope C. Quality in qualitative health research. In Pope C, Mays N, editors. Qualitative Research in Health Care. 3rd edn. Oxford: BMJ Blackwell Publishing; 2006. pp. 82–101.



25. Rosenhead J, Mingers J. Rational Analysis for a Problematic World Revisited. Problem Structuring Methods for Complexity, Uncertainty and Conflict. 2nd edn. Chichester: John Wiley & Sons Ltd; 2001.
26. Pidd M. Computer Simulation in Management Science. Chichester: John Wiley and Sons; 2004.
27. Schlesinger S, Crosbie RE, Gagne RE, Innis GS, Lalwani CS, Loch J, et al. Terminology for model credibility. Simulation 1979;32:103–4.
28. Naylor TH, Finger JM, McKenney JL, Schrank WE, Holt CC. Verification of computer simulation models. Management Sci 1967;14:B92–B106.
29. Luis SJ, McLaughlin D. A stochastic approach to model validation. Adv Water Resour 1992;15:15–32.
30. Kleijnen J. Verification and validation of simulation models. Eur J Op Res 1995;82:145–62.
31. Special issue on model validation. Eur J Op Res 1993;66(2).
32. Déry R, Landry M, Banville C. Revisiting the issue of model validation in OR: an epistemological view. Eur J Op Res 1993;66:168–83.
33. Raitt R. OR and science. J Op Res Soc 1974;20:835–6.
34. Rorty R. Philosophy and the Mirror of Nature. Princeton, NJ: Princeton University Press; 1979.
35. Kleindorfer GB, O'Neill L, Ganeshan R. Validation in simulation: various positions in the philosophy of science. Management Sci 1998;44:1087–99.
36. Gadamer H. Truth and Method. 4th edn. New York, NY: Seabury Press; 1975.
37. Bernstein R. Beyond Objectivism and Relativism: Science, Hermeneutics and Praxis. Philadelphia, PA: University of Pennsylvania Press; 1983.
38. Howson C, Urbach P. Scientific Reasoning: The Bayesian Approach. 2nd edn. Chicago, IL: Open Court Publishing Company; 1996.
39. Sargent RG. An exposition on verification and validation of simulation models. In Gantz DT, Blais GC, Solomon SL, editors. Proceedings of the 17th Conference on Winter Simulation. New Jersey, NJ: IEEE Press; 1985. pp. 15–22.
40. Sargent RG. Validation and verification of simulation models. In Farrington PA, Nembhard HB, Sturrock DT, Evans GW, editors. Proceedings of the 31st Conference on Winter Simulation: Simulation – A Bridge to the Future, vol. 1. New Jersey, NJ: IEEE Press; 1999. pp. 39–48.
41. Sargent RG. Validation and verification of simulation models. In Ingall RG, Rossetti MD, Smith TS, Peters BA, editors. Proceedings of the 36th Conference on Winter Simulation. New Jersey, NJ: J Wiley & Sons, IEEE Press; 2004. pp. 17–28.
42. Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation and confirmation of numerical models in earth sciences. Science 1994;263(5147):641–6.
43. Kleijnen J. Verification and validation of simulation models. Eur J Op Res 1995;82:145–62.
44. Chwif L, Shimada LM. A prescriptive technique for V&V of simulation models when no real-life data are available. In Perrone LF, Lawson BG, Liu J, Wieland FP, editors. Proceedings of the 38th Winter Simulation Conference. New Jersey, NJ: J Wiley & Sons, IEEE Press; 2006. pp. 911–18.
45. Panko RR, Halverson RP Jr. Spreadsheets on trial: a survey of research on spreadsheet risks. In Proceedings of the 29th Hawaii International Conference on System Sciences. New Jersey, NJ: IEEE Press; 1996;2:326.
46. Teo TSH, Tan M. Quantitative and qualitative errors in spreadsheet development. In Proceedings of the 30th Hawaii International Conference on System Sciences. New Jersey, NJ: IEEE Press; 1997;3:149.
47. Rajalingham K, Chadwick DR, Knight B. Classification of spreadsheet errors. Conference of the European Spreadsheet Risks Interest Group (EuSpRIG) 2000.
48. Ko AJ, Myers BA. Development and evaluation of a model of programming errors. In Proceedings of the IEEE Symposium on Human-Centric Computing Languages and Environments, Auckland, New Zealand, 28–31 October 2003. pp. 7–14.
49. Rajalingham K. A revised classification of spreadsheet errors. Conference of the European Spreadsheet Risks Interest Group (EuSpRIG) 2005.
50. Walia GS, Carver J, Philip T. Requirement error abstraction and classification: an empirical study. In Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering; 21–22 September 2006.
51. Purser M, Chadwick D. Does an awareness of differing types of spreadsheet errors aid end-users in identifying spreadsheets errors? Conference of the European Spreadsheet Risks Interest Group (EuSpRIG) 2006; pp. 185–204.
52. Panko RR. Revising the Panko–Halverson taxonomy of spreadsheet risks. 42nd Hawaii International Conference on System Sciences, 2009.

53. Powell SG, Baker KR, Lawson B. A critical review of the literature on spreadsheet errors. Decision Support Systems 2008;46:128–38.
54. Panko RR. End User Computing: Management, Applications and Technology. New York, NY: John Wiley; 1988.
55. Allwood C. Error detection processes in statistical problem solving. Cogn Sci 1984;8:413–37.
56. Flower L, Hayes J. The dynamics of composing: making plans and juggling constraints. In Gregg LW, Steinberg ER, editors. Cognitive Processes in Writing. Hillsdale, NJ: Erlbaum; 1980. pp. 31–50.
57. Ko A, Myers B. A framework and methodology for studying the causes of errors in programming systems. J Visual Lang Comp 2005;16:41–84.
58. Reason J. Human Error. Cambridge: Cambridge University Press; 1990.
59. Eddy D. Technology assessment: the role of mathematical modeling. In Assessing Medical Technologies. Washington, DC: National Academy Press; 1985. pp. 144–75.




Appendix 1 Avoiding and identifying errors in health technology assessment models

Interview topic guide

Research aims to explore:

• Definitions of model error
• Experience of model errors
• Reasons for/causes of model errors
• Perceived importance of model errors
• Interviewees' approaches to model checking
• Perceived facilitators and barriers to model checking
• Potential ways of reducing errors in the future – what can others learn from their experience

1. Introduction
AIM: TO ENSURE THAT THE INTERVIEWEE IS AWARE OF THE PURPOSE OF THE RESEARCH AND THE NATURE OF THE INTERVIEW.

• Introduce self and School of Health and Related Research (ScHARR).
• Explain:
–– nature and purpose of research
–– who the research is for
–– how the interview results will be used
–– interested in views and experiences – no right or wrong answers.
• Stress confidentiality and anonymity.
• Introduce tape recorder.
• Any other issues?

2. Background
AIM: TO OBTAIN PRELIMINARY INFORMATION ABOUT THE INTERVIEWEE BUT ALSO TO GET THEM COMFORTABLE WITH TALKING. QUESTIONS SHOULD NOT BE INTRUSIVE.

• Context and composition of research organisation.
• Their role – how do they fit in?
• What types of modelling activities have they been involved in?
–– experiences
–– case studies
–– methods
–– software.

3. The model development process
AIM: TO ELICIT THE MODEL DEVELOPMENT PROCESS FROM THE PERSPECTIVE OF THE INTERVIEWEE. THE SECOND INTERVIEWER SHOULD MAP THIS VISUALLY USING THE TERMS USED BY THE INTERVIEWEE.

• Elicitation of conceptual map:
–– Where does it start/stop?
–– Types of activities at each point?
–– Iterations?
• Recap of elicited map by second interviewer.

4. Definition and experience of errors
AIM: TO ELICIT WHAT THE INTERVIEWEE UNDERSTANDS BY THE TERM ERROR – TO DRAW A BOUNDARY AROUND THE TERM. WHAT ARE THE KEY FEATURES? WITH EXAMPLES. LOOK OUT FOR USE OF 'VALIDATION' AND 'VERIFICATION'.

• How would they define what a model error is?
–– How would they explain what an error is to someone else?
–– What are the features or characteristics of a model error? Intended or not
–– What is it that makes an error an error?

5. Types of model errors
AIM: TO IDENTIFY DIFFERENT TYPES OF ERRORS AND WHERE AND WHY THEY MAY OCCUR WITHIN THE MODEL DEVELOPMENT PROCESS.

• Thinking back through the model development process you described earlier, what types of model errors might arise?
–– incorrect calculation (logic)
–– incorrect programming
–– poorly scoped problems
–– failure to meet the scope
–– failure to meet the reference case
–– inappropriate assumptions
–– inappropriate methodology
–– problems in evidence review
–– errors in reporting/poor reporting.
• For each type, ask:
–– What do they mean by that type of error?
–– Why did they arise? Were they avoidable?
–– When and how were they discovered?
–– What was done about them? Why?
–– What impact did they have?
–– Did they influence your future practice?
–– What are the most important types of error; reasons.

6. Current approaches to avoiding errors
AIM: TO EXPLORE INTERVIEWEE’S CURRENT APPROACHES TO AVOIDING ERRORS.

[Interviewer to define ‘checking’ to avoid framing assumptions – we did not want to use the words ‘validation’ or ‘verification’: ‘Any activity that you embark on to ensure that the model is free from errors.’]

• You’ve talked about x errors and how they were discovered – I’d like to explore that a little more systematically. What types of activities do you embark on to ensure that the model is free from errors?
–– conceptual model validation, e.g. with experts
–– computerised model validation
–– good modelling practice, e.g. a common approach to tabulation, including all parameters on the same sheet and not using values in formulae (a sketch follows at the end of this section)
–– data validation
–– external validation against other non-input data
–– reference to other models.


• At what point do you embark on these checking activities?
–– Throughout the model development process?
–– At specific stages? Which ones? By whom?
–– At the end of model development?
–– When do you start/stop?
• Why do you undertake this checking?
–– confidence in results
–– does it matter – impact on decision
–– requirement of commissioner
–– avoid embarrassment later.
• If you find and correct an error in your model, does that make you more or less confident that the model is now error-free?
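A minimal sketch of the ‘good modelling practice’ and ‘data validation’ prompts above, assuming a hypothetical cost model: the parameter names, values and rules are invented, but the pattern – a single parameter table, no literal values inside formulae, and an explicit validation pass over the inputs – mirrors the practices listed:

    # All parameters in one table: the script analogue of holding every
    # parameter on a single spreadsheet sheet. Values are hypothetical.
    PARAMETERS = {
        "cost_per_cycle_treatment": 450.0,
        "cost_per_cycle_comparator": 120.0,
        "prob_response_treatment": 0.35,
        "prob_response_comparator": 0.20,
        "n_cycles": 10,
    }

    def validate(params: dict) -> None:
        """Data validation: probabilities in [0, 1]; costs and counts non-negative."""
        for name, value in params.items():
            if name.startswith("prob_") and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is not a valid probability")
            if name.startswith(("cost_", "n_")) and value < 0:
                raise ValueError(f"{name}={value} must be non-negative")

    def total_cost(arm: str, params: dict) -> float:
        # No literals in the 'formula': every value comes from the parameter table.
        return params[f"cost_per_cycle_{arm}"] * params["n_cycles"]

    validate(PARAMETERS)
    print(total_cost("treatment", PARAMETERS))  # 4500.0

Keeping every figure in one audited table means any value can be checked, varied or traced in one place rather than hunted down inside formulae.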

7. Future initiatives in model checking
• You’ve talked about what you currently do; what might you do in terms of checking models? Are there any other initiatives that you’ve not covered?
–– guidelines
–– double-programming (see the sketch below).
• What do you think are the pros and cons of x?
–– all models are different
–– use of different software packages.
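Double-programming, listed above, means implementing the same quantity twice and independently and then comparing the results. A hedged sketch, assuming an invented calculation (expected discounted cost over a fixed number of cycles):

    def expected_cost_loop(cost_per_cycle: float, rate: float, n_cycles: int) -> float:
        """Implementation A: explicit cycle-by-cycle discounting."""
        total = 0.0
        for t in range(n_cycles):
            total += cost_per_cycle / (1.0 + rate) ** t
        return total

    def expected_cost_closed_form(cost_per_cycle: float, rate: float, n_cycles: int) -> float:
        """Implementation B: the same quantity via the geometric-series closed form."""
        v = 1.0 / (1.0 + rate)
        return cost_per_cycle * (1.0 - v ** n_cycles) / (1.0 - v)

    a = expected_cost_loop(1000.0, 0.035, 10)
    b = expected_cost_closed_form(1000.0, 0.035, 10)
    assert abs(a - b) < 1e-9, f"double-programming check failed: {a} vs {b}"
    print(f"Implementations agree: {a:.2f}")

Agreement does not prove both implementations are right, but disagreement reliably shows that at least one is wrong – the trade-off the ‘pros and cons’ prompt invites interviewees to weigh.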

8. Barriers and facilitators to model checking
AIM: TO EXPLORE INTERVIEWEE’S PERCEPTION OF BARRIERS AND FACILITATORS TO MODEL CHECKING.

• What factors affect the amount of testing and validation you are able to perform?
–– time constraints
–– knowledge of formal testing and validation procedures
–– idiosyncrasies of model/decision problem
–– lack of framework/guidelines on checking
–– lack of training/skills
–– what’s possible
–– employing the right people.

9. Interview close
• Are there any other issues regarding errors in HTA models that you would like to raise but have not had the opportunity to?
• Offer report prior to submission to ensure no quotes are taken out of context.
• Anticipated timelines.
• Thanks.


Appendix 2 Search strategies

Scoping search

Database – Search query – Results
Computer and Information Systems Abstracts – {[DEa = (‘modelling’)] or [TIb = (model or models)]} and (DE = ‘errors’) and [TI = (error*c)] – 115
Water Resources Abstracts – (TI = error*) and [(DE = ‘errors’) and (DE = ‘model studies’)] – 49
Water Resources Abstracts – (Aud = Bevan) – 160

a DE, database index term. b TI, title field. c *, truncation. d Au, author field.

Search strategies to locate error avoidance and identification – February 2008

Water Resources Abstracts via CSA

1. (DEa = ‘model studies’) and (DE = ‘model testing’) and (DE = ‘errors’) – 96 results, 19 selected
2. ABb = (model structure error) – 5 results, 5 selected
3. AB = (model structural error) – 7 results, 5 selected (plus one included in step 2)
4. TIc = (GLUEd methodology) – 8 results, 5 selected (2 previously included)
5. AB = (error*e in modelling) – 3 results, 1 selected
6. AB = (error* in modelling) – 2 results, 0 selected
7. TI = (model error*) – 9 results, 3 selected (2 previously marked)
8. TI = error* and AB = (artificial neural network*) – 3 results, 1 selected (1 previously marked)
9. TI = error* and AB = (artificial neural network*) – 2 results, 1 selected
10. TI = (error analysis) – 56 results, 13 selected
11. TI = (model testing) – 25 results, 7 selected
12. TI = (model verification) – 26 results, 8 selected
13. refsgaard.auf – 46 results, 7 selected

a DE, database index term. b AB, abstract field. c TI, title field. d GLUE, Generalized Likelihood Uncertainty Estimation. e *, truncation. f au, author field.


ACM Digital Library

1. TIa = model testing – 72 results, 13 selected
2. TI = model verification – 90 results, 5 selected
3. TI = model error – 49 results, 0 selected

a TI, title field.

Computer and Information Systems Abstracts via CSA

1. DEa = model studies – 32 results, 2 selected
2. TIb = model error*c – 26 results, 15 selected
3. TI = error* and TI = (modelling or modeling) – 173 results, 0 selected (scanned first 50)
4. TI = (error analysis) and TI = model* – 31 results, 20 selected
5. TI = (model testing) – 13 results, 8 selected
6. TI = (model verification) – 27 results, 10 selected

a DE, database index term. b TI, title field. c *, truncation.

IEEE/IET Electronic Library (IEL)

1. Model testing – 58 results, 0 selected
2. {[(model error*a)tib] [(assess or verif* or assurance or valid*)abc]} – 17 results, 14 selected
3. {[(model error*)ti] [(identif* or analys*)ab]} – 41 results, 19 selected
4. (model verification).ti – 46 results, 7 selected

a *, truncation. b ti, title field. c ab, abstract field.

ANTE: abstracts in new technologies and engineering via CSA

1. Model verification.ti.a – 10 results, 2 selected
2. Model validation.ti. – 29 results, 1 selected
3. Model*b error*.ti. – 5 results, 5 selected
4. model* error*.ab.c – 109 results, 0 selected

a ti, title field. b *, truncation. c ab, abstract field.


Google Scholar

1. Spreadsheet error*a – 87 results, 28 selected

a *, truncation.

Web of Knowledge

1. Refsgaard.au.a – 21 results, 5 selected
2. (Model verification and error*b).ti.c – 9 results, 4 selected
3. model error correction.ti. – 129 results, 17 selected
4. (model validation and error*).ti. – 21 results, 6 selected
5. (model error* and uncertainty).ti. – 23 results, 7 selected
6. GLUE methodology.ti. – 9 results, 2 selected
7. (model testing and error*).ti. – 43 results, 2 selected

a au, author field. b *, truncation. c ti, title field.

Search strategies to locate error classification systems – January 2009

Web of Knowledge
Topic = (program*a or spreadsheet*) AND Topic = (error*) AND Topic = (taxonom*): 62 results

Abstracts in New Technologies and Engineering (ANTE)
(program* or spreadsheet*) and error* and taxonom*: 10 results

Computer and Information Systems Abstracts
(program* or spreadsheet*) and error* and taxonom*: 68 results

IEEE/IET Electronic Library
program* or spreadsheet* in tib and (error*) in all fields and [(taxonom* or classif*) in all fields]: 40 results

Google Scholar
Spreadsheet error taxonom* (JC sifted results and imported relevant citations directly): 4630 results

Note: a *, truncation; b ti, title field.

Citation searches – February 2009, undertaken on Google Scholar and Web of Knowledge

Janvrin D, Morrison J. Using a structured design approach to reduce risks in end user spreadsheet development. Inform Manage 2000;37(1):1–12.

Panko R. What we know about spreadsheet errors. J End User Comp 1998;10(2):15–21.

Powell SG, Baker KR, Lawson B. A critical review of the literature on spreadsheet errors. Decision Support Systems 2008;46(1):128–38.

Purser M, Chadwick D. Does an awareness of differing types of spreadsheet errors aid end-users in identifying spreadsheet errors? Proceedings of the European Spreadsheet Risks Interest Group (EuSpRIG) 2006; pp. 185–204.

Rajalingham K, Chadwick DR, Knight B. Classification of spreadsheet errors. Proceedings of the European Spreadsheet Risks Interest Group (EuSpRIG) 2001.

Web of Knowledge: 0 results

Google Scholar: JC sifted results and imported relevant citations directly


Appendix 3 Charts

[Charts I1–I12 are presented here as full-page figures; they are not reproducible in this text version.]

DOI: 10.3310/hta14250 Health Technology Assessment 2010; Vol. 14: No. 25

Health Technology Assessment reports published to date Volume 1, 1997 No. 11 No. 6 Newborn screening for inborn errors of Effectiveness of hip prostheses in No. 1 metabolism: a systematic review. primary total hip replacement: a critical Home parenteral nutrition: a systematic By Seymour CA, Thomason MJ, review of evidence and an economic review. Chalmers RA, Addison GM, Bain MD, model. By Richards DM, Deeks JJ, Cockburn F, et al. Sheldon TA, Shaffer JL. By Faulkner A, Kennedy LG, Baxter K, Donovan J, Wilkinson M, No. 12 Bevan G. No. 2 Routine preoperative testing: a Diagnosis, management and screening systematic review of the evidence. of early localised prostate cancer. No. 7 By Munro J, Booth A, Nicholl J. A review by Selley S, Donovan J, Antimicrobial prophylaxis in colorectal Faulkner A, Coast J, Gillatt D. No. 13 surgery: a systematic review of randomised controlled trials. No. 3 Systematic review of the effectiveness of The diagnosis, management, treatment laxatives in the elderly. By Song F, Glenny AM. and costs of prostate cancer in England By Petticrew M, Watt I, Sheldon T. and Wales. No. 8 No. 14 A review by Chamberlain J, Melia J, Bone marrow and peripheral Moss S, Brown J. When and how to assess fast-changing blood stem cell transplantation for technologies: a comparative study of malignancy. No. 4 medical applications of four generic Screening for fragile X syndrome. technologies. A review by Johnson PWM, Simnett SJ, A review by Murray J, Cuckle H, Taylor A review by Mowatt G, Bower DJ, Sweetenham JW, Morgan GJ, Stewart G, Hewison J. Brebner JA, Cairns JA, Grant AM, LA. McKee L. No. 5 No. 9 A review of near patient testing in Screening for speech and language primary care. Volume 2, 1998 delay: a systematic review of the By Hobbs FDR, Delaney BC, literature. Fitzmaurice DA, Wilson S, Hyde CJ, Thorpe GH, et al. No. 1 By Law J, Boyle J, Harris F, Antenatal screening for Down’s Harkness A, Nye C. No. 6 syndrome. Systematic review of outpatient services A review by Wald NJ, Kennard A, No. 10 for chronic pain control. Hackshaw A, McGuire A. Resource allocation for chronic By McQuay HJ, stable angina: a systematic review of Moore RA, Eccleston C, Morley S, de C No. 2 effectiveness, costs and cost-effectiveness Williams AC. Screening for ovarian cancer: a of alternative interventions. systematic review. No. 7 By Bell R, Petticrew M, Luengo S, By Sculpher MJ, Petticrew M, Neonatal screening for inborn errors of Sheldon TA. Kelland JL, Elliott RA, Holdright DR, metabolism: cost, yield and outcome. Buxton MJ. A review by Pollitt RJ, Green A, McCabe No. 3 CJ, Booth A, Cooper NJ, Leonard JV, Consensus development methods, No. 11 et al. and their use in clinical guideline Detection, adherence and control of development. No. 8 hypertension for the prevention of A review by Murphy MK, Black NA, stroke: a systematic review. Preschool vision screening. Lamping DL, McKee CM, Sanderson By Ebrahim S. A review by Snowdon SK, CFB, Askham J, et al. Stewart-Brown SL. No. 4 No. 12 No. 9 Implications of socio-cultural contexts A cost–utility analysis of interferon beta Postoperative analgesia and vomiting, for the ethics of clinical trials. for multiple sclerosis. with special reference to day-case A review by Ashcroft RE, Chadwick By Parkin D, McNamee P, Jacoby A, surgery: a systematic review. DW, Clark SRL, Edwards RHT, Frith L, Miller P, Thomas S, Bates D. By McQuay HJ, Moore RA. Hutton JL. No. 5 No. 13 No. 
10 Effectiveness and efficiency of methods A critical review of the role of neonatal of dialysis therapy for end-stage renal Choosing between randomised and hearing screening in the detection of disease: systematic reviews. nonrandomised studies: a systematic congenital hearing impairment. By MacLeod A, Grant A, review. By Davis A, Bamford J, Wilson I, Donaldson C, Khan I, Campbell M, By Britton A, McKee M, Black N, Ramkalawan T, Forshaw M, Wright S. Daly C, et al. McPherson K, Sanderson C, Bain C. 109

© 2010 Queen’s Printer and Controller of HMSO. All rights reserved. Health Technology Assessment reports published to date

No. 14 No. 4 No. 15 A randomised controlled trial of different Evaluating patient-based outcome Near patient testing in diabetes clinics: approaches to universal antenatal HIV measures for use in clinical trials. appraising the costs and outcomes. A review by Fitzpatrick R, Davey C, testing: uptake and acceptability. Annex: Buxton MJ, Jones DR. Antenatal HIV testing – assessment of a By Grieve R, Beech R, Vincent J, routine voluntary approach. Mazurkiewicz J. No. 15 By Simpson WM, Johnstone FD, Boyd FM, Goldberg DJ, Ethical issues in the design and conduct Hart GJ, Gormley SM, et al. No. 16 of randomised controlled trials. Positron emission tomography: A review by Edwards SJL, Lilford RJ, No. 5 establishing priorities for health Braunholtz DA, Jackson JC, Hewison J, Methods for evaluating area-wide and technology assessment. Thornton J. organisation-based interventions in health and health care: a systematic A review by Robert G, Milne R. No. 16 review. Qualitative research methods in health By Ukoumunne OC, Gulliford MC, No. 17 (Pt 1) technology assessment: a review of the Chinn S, Sterne JAC, Burney PGJ. literature. The debridement of chronic wounds: a By Murphy E, Dingwall R, No. 6 systematic review. Greatbatch D, Parker S, Watson P. Assessing the costs of healthcare By Bradley M, Cullum N, Sheldon T. technologies in clinical trials. No. 17 A review by Johnston K, Buxton MJ, The costs and benefits of paramedic skills Jones DR, Fitzpatrick R. No. 17 (Pt 2) in pre-hospital trauma care. Systematic reviews of wound care By Nicholl J, Hughes S, Dixon S, No. 7 management: (2) Dressings and topical Turner J, Yates D. Cooperatives and their primary care agents used in the healing of chronic emergency centres: organisation and wounds. No. 18 impact. By Hallam L, Henthorne K. By Bradley M, Cullum N, Nelson EA, Systematic review of endoscopic Petticrew M, Sheldon T, Torgerson D. ultrasound in gastro-oesophageal cancer. No. 8 By Harris KM, Kelly S, Berry E, Screening for cystic fibrosis. Hutton J, Roderick P, Cullingworth J, No. 18 et al. A review by Murray J, Cuckle H, Taylor G, Littlewood J, Hewison J. A systematic literature review of spiral and electron beam computed No. 19 No. 9 tomography: with particular reference to Systematic reviews of trials and other clinical applications in hepatic lesions, A review of the use of health status studies. pulmonary embolus and coronary artery measures in economic evaluation. disease. By Sutton AJ, Abrams KR, Jones DR, By Brazier J, Deverill M, Green C, Sheldon TA, Song F. Harper R, Booth A. By Berry E, Kelly S, Hutton J, Harris KM, Roderick P, Boyce JC, et al. No. 20 No. 10 Primary total hip replacement surgery: Methods for the analysis of quality-of-life No. 19 a systematic review of outcomes and and survival data in health technology modelling of cost-effectiveness associated assessment. What role for statins? A review and with different prostheses. A review by Billingham LJ, Abrams KR, economic model. A review by Fitzpatrick R, Shortall E, Jones DR. Sculpher M, Murray D, Morris R, Lodge By Ebrahim S, Davey Smith G, McCabe C, Payne N, Pickin M, M, et al. No. 11 Sheldon TA, et al. Antenatal and neonatal haemoglobinopathy screening in the UK: Volume 3, 1999 review and economic analysis. No. 20 By Zeuner D, Ades AE, Karnon J, Factors that limit the quality, number and Brown J, Dezateux C, Anionwu EN. No. 1 progress of randomised controlled trials. Informed decision making: an annotated No. 12 bibliography and systematic review. 
A review by Prescott RJ, Counsell CE, Assessing the quality of reports of Gillespie WJ, Grant AM, Russell IT, By Bekker H, Thornton JG, randomised trials: implications for the Kiauka S, et al. Airey CM, Connelly JB, Hewison J, conduct of meta-analyses. Robinson MB, et al. A review by Moher D, Cook DJ, Jadad No. 21 AR, Tugwell P, Moher M, Jones A, et al. No. 2 Antimicrobial prophylaxis in total hip Handling uncertainty when performing No. 13 replacement: a systematic review. economic evaluation of healthcare ‘Early warning systems’ for identifying By Glenny AM, Song F. interventions. new healthcare technologies. A review by Briggs AH, Gray AM. By Robert G, Stevens A, Gabbay J. No. 22 No. 3 No. 14 Health promoting schools and health The role of expectancies in the placebo A systematic review of the role of human promotion in schools: two systematic effect and their use in the delivery of papillomavirus testing within a cervical reviews. health care: a systematic review. screening programme. By Crow R, Gage H, Hampson S, By Cuzick J, Sasieni P, Davies P, By Lister-Sharp D, Chapman S, 110 Hart J, Kimber A, Thomas H. Adams J, Normand C, Frater A, et al. Stewart-Brown S, Sowden A. DOI: 10.3310/hta14250 Health Technology Assessment 2010; Vol. 14: No. 25

No. 23 No. 10 No. 21 Economic evaluation of a primary care- Publication and related biases. Systematic reviews of wound care based education programme for patients A review by Song F, Eastwood AJ, Gilbody management: (3) antimicrobial agents with osteoarthritis of the knee. S, Duley L, Sutton AJ. for chronic wounds; (4) diabetic foot A review by Lord J, Victor C, Littlejohns ulceration. P, Ross FM, Axford JS. No. 11 By O’Meara S, Cullum N, Majid M, Cost and outcome implications of the Sheldon T. organisation of vascular services. By Michaels J, Brazier J, No. 22 Volume 4, 2000 Palfreyman S, Shackley P, Slack R. Using routine data to complement and enhance the results of randomised No. 1 No. 12 controlled trials. The estimation of marginal time Monitoring blood glucose control in By Lewsey JD, Leyland AH, preference in a UK-wide sample diabetes mellitus: a systematic review. Murray GD, Boddy FA. (TEMPUS) project. By Coster S, Gulliford MC, Seed PT, A review by Cairns JA, van der Pol MM. Powrie JK, Swaminathan R. No. 23 Coronary artery stents in the treatment No. 13 No. 2 of ischaemic heart disease: a rapid and The effectiveness of domiciliary systematic review. Geriatric rehabilitation following health visiting: a systematic review of By Meads C, Cummins C, Jolly K, fractures in older people: a systematic international studies and a selective Stevens A, Burls A, Hyde C. review. review of the British literature. By Cameron I, Crotty M, Currie C, By Elkan R, Kendrick D, Hewitt M, Finnegan T, Gillespie L, Gillespie W, et al. Robinson JJA, Tolley K, Blair M, et al. No. 24 Outcome measures for adult critical care: No. 3 No. 14 a systematic review. Screening for sickle cell disease and The determinants of screening uptake By Hayes JA, Black NA, Jenkinson C, thalassaemia: a systematic review with and interventions for increasing uptake: Young JD, Rowan KM, Daly K, et al. supplementary research. a systematic review. By Davies SC, Cronin E, Gill M, By Jepson R, Clegg A, Forbes C, No. 25 Greengross P, Hickman M, Normand C. Lewis R, Sowden A, Kleijnen J. A systematic review to evaluate the effectiveness of interventions to promote No. 4 No. 15 the initiation of breastfeeding. The effectiveness and cost-effectiveness By Fairbank L, O’Meara S, Community provision of hearing aids of prophylactic removal of wisdom teeth. Renfrew MJ, Woolridge M, Sowden AJ, and related audiology services. A rapid review by Song F, O’Meara S, Lister-Sharp D. A review by Reeves DJ, Alborz A, Hickson Wilson P, Golder S, Kleijnen J. FS, Bamford JM. No. 26 No. 16 Implantable cardioverter defibrillators: No. 5 Ultrasound screening in pregnancy: arrhythmias. A rapid and systematic False-negative results in screening a systematic review of the clinical review. programmes: systematic review of impact effectiveness, cost-effectiveness and By Parkes J, Bryant J, Milne R. and implications. women’s views. By Petticrew MP, Sowden AJ, Lister- By Bricker L, Garcia J, Henderson J, No. 27 Sharp D, Wright K. Mugford M, Neilson J, Roberts T, et al. Treatments for fatigue in multiple sclerosis: a rapid and systematic review. No. 6 No. 17 By Brañas P, Jordan R, Fry-Smith A, Costs and benefits of community A rapid and systematic review of the Burls A, Hyde C. postnatal support workers: a randomised effectiveness and cost-effectiveness of the taxanes used in the treatment of controlled trial. No. 28 advanced breast and ovarian cancer. 
Early asthma prophylaxis, natural history, By Morrell CJ, Spiby H, Stewart P, By Lister-Sharp D, McDonagh MS, skeletal development and economy Walters S, Morgan A. Khan KS, Kleijnen J. (EASE): a pilot randomised controlled trial. No. 7 No. 18 Implantable contraceptives (subdermal Liquid-based cytology in cervical By Baxter-Jones ADG, Helms PJ, implants and hormonally impregnated screening: a rapid and systematic review. Russell G, Grant A, Ross S, Cairns JA, intrauterine systems) versus other By Payne N, Chilcott J, McGoogan E. et al. forms of reversible contraceptives: two systematic reviews to assess relative No. 19 No. 29 effectiveness, acceptability, tolerability Randomised controlled trial of non- Screening for hypercholesterolaemia and cost-effectiveness. directive counselling, cognitive– versus case finding for familial By French RS, Cowan FM, behaviour therapy and usual general hypercholesterolaemia: a systematic Mansour DJA, Morris S, Procter T, practitioner care in the management of review and cost-effectiveness analysis. Hughes D, et al. depression as well as mixed anxiety and By Marks D, depression in primary care. Wonderling D, Thorogood M, No. 8 By King M, Sibbald B, Ward E, Lambert H, Humphries SE, Neil HAW. An introduction to statistical methods for Bower P, Lloyd M, Gabbay M, et al. No. 30 health technology assessment. No. 20 A rapid and systematic review of A review by White SJ, Ashby D, Brown PJ. Routine referral for radiography of the clinical effectiveness and cost- patients presenting with low back pain: effectiveness of glycoprotein IIb/IIIa No. 9 is patients’ outcome influenced by GPs’ antagonists in the medical management Disease-modifying drugs for multiple referral for plain radiography? of unstable angina. sclerosis: a rapid and systematic review. By Kerry S, Hilton S, Patel S, By McDonagh MS, Bachmann LM, By Clegg A, Bryant J, Milne R. Dundas D, Rink E, Lord J. Golder S, Kleijnen J, ter Riet G. 111

© 2010 Queen’s Printer and Controller of HMSO. All rights reserved. Health Technology Assessment reports published to date

No. 31 Volume 5, 2001 No. 11 A randomised controlled trial Effectiveness of autologous chondrocyte of prehospital intravenous fluid No. 1 transplantation for hyaline cartilage replacement therapy in serious trauma. Clinical and cost-effectiveness of defects in knees: a rapid and systematic review. By Turner J, Nicholl J, Webber L, donepezil, rivastigmine and galantamine By Jobanputra P, Parry D, Fry- Cox H, Dixon S, Yates D. for Alzheimer’s disease: a rapid and systematic review. Smith A, Burls A. By Clegg A, Bryant J, Nicholson T, No. 32 McIntyre L, De Broe S, Gerard K, et al. No. 12 Intrathecal pumps for giving opioids in Statistical assessment of the learning chronic pain: a systematic review. No. 2 curves of health technologies. By Williams JE, Louw G, Towlerton G. The clinical effectiveness and cost- By Ramsay CR, Grant AM, effectiveness of riluzole for motor Wallace SA, Garthwaite PH, Monk AF, Russell IT. No. 33 neurone disease: a rapid and systematic review. Combination therapy (interferon alfa By Stewart A, Sandercock J, Bryan S, No. 13 and ribavirin) in the treatment of chronic Hyde C, Barton PM, Fry-Smith A, et al. The effectiveness and cost-effectiveness hepatitis C: a rapid and systematic of temozolomide for the treatment of review. No. 3 recurrent malignant glioma: a rapid and systematic review. By Shepherd J, Waugh N, Hewitson P. Equity and the economic evaluation of By Dinnes J, Cave C, Huang S, healthcare. Major K, Milne R. No. 34 By Sassi F, Archard L, Le Grand J. A systematic review of comparisons of No. 14 effect sizes derived from randomised and No. 4 A rapid and systematic review of non-randomised studies. Quality-of-life measures in chronic the clinical effectiveness and cost- By MacLehose RR, Reeves BC, diseases of childhood. effectiveness of debriding agents in Harvey IM, Sheldon TA, Russell IT, By Eiser C, Morse R. treating surgical wounds healing by Black AMS. secondary intention. No. 5 By Lewis R, Whiting P, ter Riet G, No. 35 Eliciting public preferences for O’Meara S, Glanville J. healthcare: a systematic review of Intravascular ultrasound-guided techniques. No. 15 interventions in coronary artery disease: By Ryan M, Scott DA, Reeves C, Home treatment for mental health a systematic literature review, with Bate A, van Teijlingen ER, Russell EM, problems: a systematic review. decision-analytic modelling, of outcomes et al. By Burns T, Knapp M, Catty J, and cost-effectiveness. Healey A, Henderson J, Watt H, et al. By Berry E, Kelly S, Hutton J, No. 6 Lindsay HSJ, Blaxill JM, Evans JA, et al. General health status measures for No. 16 people with cognitive impairment: How to develop cost-conscious No. 36 learning disability and acquired brain guidelines. A randomised controlled trial to evaluate injury. By Eccles M, Mason J. the effectiveness and cost-effectiveness By Riemsma RP, Forbes CA, of counselling patients with chronic Glanville JM, Eastwood AJ, Kleijnen J. No. 17 depression. The role of specialist nurses in multiple sclerosis: a rapid and systematic review. By Simpson S, Corney R, Fitzgerald P, No. 7 By De Broe S, Christopher F, Beecham J. An assessment of screening strategies for fragile X syndrome in the UK. Waugh N. No. 37 By Pembrey ME, Barnicoat AJ, Carmichael B, Bobrow M, Turner G. No. 18 Systematic review of treatments for atopic A rapid and systematic review of the clinical effectiveness and cost- eczema. No. 8 By Hoare C, Li Wan Po A, Williams H. effectiveness of orlistat in the Issues in methodological research: management of obesity. 
perspectives from researchers and By O’Meara S, Riemsma R, Shirran L, commissioners. No. 38 Mather L, ter Riet G. Bayesian methods in health technology By Lilford RJ, Richardson A, assessment: a review. Stevens A, Fitzpatrick R, Edwards S, No. 19 Rock F, et al. By Spiegelhalter DJ, Myles JP, The clinical effectiveness and cost- Jones DR, Abrams KR. effectiveness of pioglitazone for type 2 No. 9 diabetes mellitus: a rapid and systematic Systematic reviews of wound care review. No. 39 management: (5) beds; (6) compression; By Chilcott J, Wight J, Lloyd Jones M, The management of dyspepsia: a (7) laser therapy, therapeutic ultrasound, Tappenden P. systematic review. electrotherapy and electromagnetic therapy. By Delaney B, Moayyedi P, Deeks J, No. 20 Innes M, Soo S, Barton P, et al. By Cullum N, Nelson EA, Extended scope of nursing practice: Flemming K, Sheldon T. a multicentre randomised controlled No. 40 trial of appropriately trained nurses No. 10 and preregistration house officers in A systematic review of treatments for Effects of educational and psychosocial preoperative assessment in elective severe psoriasis. interventions for adolescents with general surgery. By Griffiths CEM, diabetes mellitus: a systematic review. By Kinley H, Czoski- Clark CM, Chalmers RJG, Li Wan Po A, By Hampson SE, Skinner TC, Hart J, Murray C, George S, McCabe C, 112 Williams HC. Storey L, Gage H, Foxcroft D, et al. Primrose J, Reilly C, et al. DOI: 10.3310/hta14250 Health Technology Assessment 2010; Vol. 14: No. 25

No. 21 No. 31 No. 5 Systematic reviews of the effectiveness of Design and use of questionnaires: The clinical effectiveness and cost- day care for people with severe mental a review of best practice applicable effectiveness of inhaler devices used disorders: (1) Acute day hospital versus to surveys of health service staff and in the routine management of chronic admission; (2) Vocational rehabilitation; patients. asthma in older children: a systematic (3) Day hospital versus outpatient care. By McColl E, Jacoby A, Thomas L, review and economic evaluation. By Marshall M, Crowther R, Almaraz- Soutter J, Bamford C, Steen N, et al. By Peters J, Stevenson M, Beverley C, Serrano A, Creed F, Sledge W, Kluiter H, Lim J, Smith S. et al. No. 32 A rapid and systematic review of No. 6 No. 22 the clinical effectiveness and cost- The clinical effectiveness and cost- The measurement and monitoring of effectiveness of paclitaxel, docetaxel, effectiveness of sibutramine in the surgical adverse events. gemcitabine and vinorelbine in non- management of obesity: a technology By Bruce J, Russell EM, Mollison J, small-cell lung cancer. assessment. Krukowski ZH. By Clegg A, Scott DA, Sidhu M, By O’Meara S, Riemsma R, Shirran L, Hewitson P, Waugh N. Mather L, ter Riet G. No. 23 Action research: a systematic review and No. 33 No. 7 guidance for assessment. Subgroup analyses in randomised The cost-effectiveness of magnetic By Waterman H, Tillen D, Dickson R, controlled trials: quantifying the risks of resonance angiography for carotid artery de Koning K. false-positives and false-negatives. stenosis and peripheral vascular disease: By Brookes ST, Whitley E, Peters TJ, a systematic review. No. 24 Mulheran PA, Egger M, Davey Smith G. By Berry E, Kelly S, Westwood ME, Davies LM, Gough MJ, Bamford JM, A rapid and systematic review of et al. the clinical effectiveness and cost- No. 34 effectiveness of gemcitabine for the Depot antipsychotic medication in the treatment of pancreatic cancer. treatment of patients with schizophrenia: No. 8 Promoting physical activity in South By Ward S, Morris E, Bansback N, (1) Meta-review; (2) Patient and nurse Asian Muslim women through ‘exercise Calvert N, Crellin A, Forman D, et al. attitudes. By David AS, Adams C. on prescription’. By Carroll B, Ali N, Azam N. No. 25 A rapid and systematic review of the No. 35 No. 9 evidence for the clinical effectiveness A systematic review of controlled trials of Zanamivir for the treatment of influenza and cost-effectiveness of irinotecan, the effectiveness and cost-effectiveness in adults: a systematic review and oxaliplatin and raltitrexed for the of brief psychological treatments for economic evaluation. treatment of advanced colorectal cancer. depression. By Burls A, Clark W, Stewart T, By Lloyd Jones M, Hummel S, By Churchill R, Hunot V, Corney R, Preston C, Bryan S, Jefferson T, et al. Bansback N, Orr B, Seymour M. Knapp M, McGuire H, Tylee A, et al. No. 10 No. 26 No. 36 Cost analysis of child health surveillance. A review of the natural history and Comparison of the effectiveness of epidemiology of multiple sclerosis: inhaler devices in asthma and chronic By Sanderson D, Wright D, Acton C, Duree D. implications for resource allocation and obstructive airways disease: a systematic health economic models. review of the literature. By Richards RG, Sampson FC, By Brocklebank D, Ram F, Wright J, Beard SM, Tappenden P. Barry P, Cates C, Davies L, et al. Volume 6, 2002 No. 11 No. 27 No. 
1 Screening for gestational diabetes: The cost-effectiveness of magnetic A study of the methods used to select a systematic review and economic resonance imaging for investigation of review criteria for clinical audit. evaluation. the knee joint. By Hearnshaw H, Harker R, By Scott DA, Loveman E, McIntyre L, By Bryan S, Weatherburn G, Cheater F, Baker R, Grimshaw G. Waugh N. Bungay H, Hatrick C, Salas C, Parry D, et al. No. 2 No. 12 Fludarabine as second-line therapy for The clinical effectiveness and cost- No. 28 B cell chronic lymphocytic leukaemia: a effectiveness of surgery for people with A rapid and systematic review of technology assessment. morbid obesity: a systematic review and the clinical effectiveness and cost- By Hyde C, Wake B, Bryan S, economic evaluation. effectiveness of topotecan for ovarian Barton P, Fry-Smith A, Davenport C, et al. By Clegg AJ, Colquitt J, Sidhu MK, cancer. Royle P, Loveman E, Walker A. By Forbes C, Shirran L, Bagnall A-M, No. 3 Duffy S, ter Riet G. Rituximab as third-line treatment for No. 13 refractory or recurrent Stage III or IV The clinical effectiveness of trastuzumab No. 29 follicular non-Hodgkin’s lymphoma: for breast cancer: a systematic review. Superseded by a report published in a a systematic review and economic By Lewis R, Bagnall A-M, Forbes C, later volume. evaluation. Shirran E, Duffy S, Kleijnen J, et al. By Wake B, Hyde C, Bryan S, No. 30 Barton P, Song F, Fry-Smith A, et al. No. 14 The role of radiography in primary The clinical effectiveness and cost- care patients with low back pain of at No. 4 effectiveness of vinorelbine for breast least 6 weeks duration: a randomised A systematic review of discharge cancer: a systematic review and economic (unblinded) controlled trial. arrangements for older people. evaluation. By Kendrick D, Fielding K, Bentley E, By Parker SG, Peet SM, McPherson A, By Lewis R, Bagnall A-M, King S, Miller P, Kerslake R, Pringle M. Cannaby AM, Baker R, Wilson A, et al. Woolacott N, Forbes C, Shirran L, et al. 113

© 2010 Queen’s Printer and Controller of HMSO. All rights reserved. Health Technology Assessment reports published to date

No. 15 No. 24 No. 33 A systematic review of the effectiveness A systematic review of the effectiveness The effectiveness and cost-effectiveness and cost-effectiveness of metal-on- of interventions based on a stages-of- of imatinib in chronic myeloid metal hip resurfacing arthroplasty for change approach to promote individual leukaemia: a systematic review. treatment of hip disease. behaviour change. By Garside R, Round A, Dalziel K, By Vale L, Wyness L, McCormack K, By Riemsma RP, Pattenden J, Stein K, Royle R. McKenzie L, Brazzelli M, Stearns SC. Bridle C, Sowden AJ, Mather L, Watt IS, et al. No. 34 No. 16 A comparative study of hypertonic saline, The clinical effectiveness and cost- No. 25 daily and alternate-day rhDNase in effectiveness of bupropion and nicotine A systematic review update of the clinical children with cystic fibrosis. replacement therapy for smoking effectiveness and cost-effectiveness of By Suri R, Wallis C, Bush A, cessation: a systematic review and glycoprotein IIb/IIIa antagonists. Thompson S, Normand C, Flather M, economic evaluation. By Robinson M, et al. By Woolacott NF, Jones L, Forbes CA, Ginnelly L, Sculpher M, Jones L, Mather LC, Sowden AJ, Song FJ, et al. Riemsma R, Palmer S, et al. No. 35 A systematic review of the costs and No. 17 No. 26 effectiveness of different models of A systematic review of effectiveness A systematic review of the effectiveness, paediatric home care. and economic evaluation of new drug cost-effectiveness and barriers to By Parker G, Bhakta P, Lovett CA, treatments for juvenile idiopathic implementation of thrombolytic and Paisley S, Olsen R, Turner D, et al. arthritis: etanercept. neuroprotective therapy for acute By Cummins C, Connock M, Fry- ischaemic stroke in the NHS. Smith A, Burls A. By Sandercock P, Berge E, Dennis M, Volume 7, 2003 No. 18 Forbes J, Hand P, Kwan J, et al. Clinical effectiveness and cost- No. 1 effectiveness of growth hormone in No. 27 How important are comprehensive children: a systematic review and A randomised controlled crossover trial literature searches and the assessment economic evaluation. of nurse practitioner versus doctor-led of trial quality in systematic reviews? By Bryant J, Cave C, Mihaylova B, outpatient care in a bronchiectasis clinic. Empirical study. Chase D, McIntyre L, Gerard K, et al. By Caine N, Sharples LD, By Egger M, Jüni P, Bartlett C, Hollingworth W, French J, Keogan M, Holenstein F, Sterne J. No. 19 Exley A, et al. Clinical effectiveness and cost- No. 2 effectiveness of growth hormone in No. 28 Systematic review of the effectiveness adults in relation to impact on quality of Clinical effectiveness and cost – and cost-effectiveness, and economic life: a systematic review and economic consequences of selective serotonin evaluation, of home versus hospital or evaluation. reuptake inhibitors in the treatment of satellite unit haemodialysis for people By Bryant J, Loveman E, Chase D, sex offenders. with end-stage renal failure. Mihaylova B, Cave C, Gerard K, et al. By Adi Y, Ashcroft D, Browne K, By Mowatt G, Vale L, Perez J, Beech A, Fry-Smith A, Hyde C. Wyness L, Fraser C, MacLeod A, et al. No. 20 Clinical medication review by a No. 29 No. 3 Systematic review and economic pharmacist of patients on repeat Treatment of established osteoporosis: evaluation of the effectiveness of prescriptions in general practice: a a systematic review and cost–utility infliximab for the treatment of Crohn’s randomised controlled trial. analysis. disease. 
By Zermansky AG, Petty DR, By Kanis JA, Brazier JE, Stevenson M, By Clark W, Raftery J, Barton P, Raynor DK, Lowe CJ, Freementle N, Calvert NW, Lloyd Jones M. Vail A. Song F, Fry-Smith A, Burls A. No. 30 No. 21 No. 4 Which anaesthetic agents are cost- The effectiveness of infliximab and A review of the clinical effectiveness effective in day surgery? Literature etanercept for the treatment of and cost-effectiveness of routine anti-D review, national survey of practice and rheumatoid arthritis: a systematic review prophylaxis for pregnant women who are randomised controlled trial. and economic evaluation. rhesus negative. By Elliott RA Payne K, Moore JK, By Jobanputra P, Barton P, Bryan S, By Chilcott J, Lloyd Jones M, Wight J, Davies LM, Harper NJN, St Leger AS, Burls A. Forman K, Wray J, Beverley C, et al. et al. No. 22 No. 5 A systematic review and economic No. 31 Systematic review and evaluation of the evaluation of computerised cognitive Screening for hepatitis C among use of tumour markers in paediatric behaviour therapy for depression and injecting drug users and in genitourinary oncology: Ewing’s sarcoma and anxiety. medicine clinics: systematic reviews neuroblastoma. By Kaltenthaler E, of effectiveness, modelling study and By Riley RD, Burchill SA, Abrams KR, Shackley P, Stevens K, Beverley C, national survey of current practice. Heney D, Lambert PC, Jones DR, et al. Parry G, Chilcott J. By Stein K, Dalziel K, Walker A, McIntyre L, Jenkins B, Horne J, et al. No. 6 No. 23 The cost-effectiveness of screening for A systematic review and economic No. 32 Helicobacter pylori to reduce mortality and evaluation of pegylated liposomal The measurement of satisfaction with morbidity from gastric cancer and peptic doxorubicin hydrochloride for ovarian healthcare: implications for practice from ulcer disease: a discrete-event simulation cancer. a systematic review of the literature. model. By Forbes C, Wilby J, Richardson G, By Crow R, Gage H, Hampson S, By Roderick P, Davies R, Raftery J, 114 Sculpher M, Mather L, Riemsma R. Hart J, Kimber A, Storey L, et al. Crabbe D, Pearce R, Bhandari P, et al. DOI: 10.3310/hta14250 Health Technology Assessment 2010; Vol. 14: No. 25

No. 7 No. 16 No. 26 The clinical effectiveness and cost- Screening for fragile X syndrome: a Can randomised trials rely on existing effectiveness of routine dental checks: literature review and modelling. electronic data? A feasibility study to a systematic review and economic By Song FJ, Barton P, Sleightholme V, explore the value of routine data in evaluation. Yao GL, Fry-Smith A. health technology assessment. By Davenport C, By Williams JG, Cheung WY, Elley K, Salas C, Taylor-Weetman CL, No. 17 Cohen DR, Hutchings HA, Longo MF, Fry-Smith A, Bryan S, et al. Russell IT. Systematic review of endoscopic sinus surgery for nasal polyps. No. 8 No. 27 By Dalziel K, Stein K, Round A, A multicentre randomised controlled Evaluating non-randomised intervention Garside R, Royle P. trial assessing the costs and benefits studies. of using structured information and By Deeks JJ, Dinnes J, D’Amico R, No. 18 analysis of women’s preferences in the Sowden AJ, Sakarovitch C, Song F, et al. management of menorrhagia. Towards efficient guidelines: how to By Kennedy ADM, Sculpher MJ, monitor guideline use in primary care. No. 28 Coulter A, Dwyer N, Rees M, Horsley S, By Hutchinson A, McIntosh A, Cox S, et al. A randomised controlled trial to assess Gilbert C. the impact of a package comprising a patient-orientated, evidence-based self- No. 9 No. 19 help guidebook and patient-centred Clinical effectiveness and cost–utility Effectiveness and cost-effectiveness of consultations on disease management of photodynamic therapy for wet acute hospital-based spinal cord injuries and satisfaction in inflammatory bowel age-related macular degeneration: services: systematic review. disease. a systematic review and economic By Kennedy A, Nelson E, Reeves D, evaluation. By Bagnall A-M, Jones L, Richardson G, Duffy S, Riemsma R. Richardson G, Roberts C, Robinson A, By Meads C, Salas C, Roberts T, et al. Moore D, Fry-Smith A, Hyde C. No. 20 No. 29 No. 10 Prioritisation of health technology The effectiveness of diagnostic tests for Evaluation of molecular tests for prenatal assessment. The PATHS model: methods and case studies. the assessment of shoulder pain due to diagnosis of chromosome abnormalities. soft tissue disorders: a systematic review. By Townsend J, Buxton M, Harper G. By Grimshaw GM, Szczepura A, By Dinnes J, Loveman E, McIntyre L, Hultén M, MacDonald F, Nevin NC, Waugh N. Sutton F, et al. No. 21 Systematic review of the clinical No. 30 No. 11 effectiveness and cost-effectiveness of The value of digital imaging in diabetic First and second trimester antenatal tension-free vaginal tape for treatment of retinopathy. screening for Down’s syndrome: urinary stress incontinence. By Sharp PF, Olson J, Strachan F, the results of the Serum, Urine and By Cody J, Wyness L, Wallace S, Hipwell J, Ludbrook A, O’Donnell M, Ultrasound Screening Study (SURUSS). Glazener C, Kilonzo M, Stearns S, et al. et al. By Wald NJ, Rodeck C, Hackshaw AK, Walters J, Chitty L, Mackinson AM. No. 22 No. 31 The clinical and cost-effectiveness of Lowering blood pressure to prevent No. 12 patient education models for diabetes: The effectiveness and cost-effectiveness myocardial infarction and stroke: a new a systematic review and economic preventive strategy. of ultrasound locating devices for central evaluation. venous access: a systematic review and By Law M, Wald N, Morris J. By Loveman E, Cave C, Green C, economic evaluation. Royle P, Dunn N, Waugh N. By Calvert N, Hind D, No. 32 McWilliams RG, Thomas SM, Beverley C, No. 
23 Clinical and cost-effectiveness of Davidson A. capecitabine and tegafur with uracil for The role of modelling in prioritising and the treatment of metastatic colorectal No. 13 planning clinical trials. cancer: systematic review and economic A systematic review of atypical By Chilcott J, Brennan A, Booth A, evaluation. antipsychotics in schizophrenia. Karnon J, Tappenden P. By Ward S, Kaltenthaler E, Cowan J, By Bagnall A-M, Jones L, Lewis R, Brewer N. Ginnelly L, Glanville J, Torgerson D, No. 24 et al. Cost–benefit evaluation of routine No. 33 influenza immunisation in people Clinical and cost-effectiveness of new and No. 14 65–74 years of age. emerging technologies for early localised Prostate Testing for Cancer and By Allsup S, Gosney M, Haycox A, prostate cancer: a systematic review. Treatment (ProtecT) feasibility study. Regan M. By Hummel S, Paisley S, Morgan A, By Donovan J, Hamdy F, Neal D, Currie E, Brewer N. Peters T, Oliver S, Brindle L, et al. No. 25 The clinical and cost-effectiveness of No. 34 No. 15 pulsatile machine perfusion versus cold Literature searching for clinical and Early thrombolysis for the treatment of storage of kidneys for transplantation cost-effectiveness studies used in health acute myocardial infarction: a systematic retrieved from heart-beating and non- technology assessment reports carried review and economic evaluation. heart-beating donors. out for the National Institute for Clinical By Boland A, Dundar Y, Bagust A, By Wight J, Chilcott J, Holmes M, Excellence appraisal system. Haycox A, Hill R, Mujica Mota R, et al. Brewer N. By Royle P, Waugh N. 115

© 2010 Queen’s Printer and Controller of HMSO. All rights reserved. Health Technology Assessment reports published to date

No. 35 No. 2 No. 11 Systematic review and economic decision Systematic review and modelling of the The use of modelling to evaluate modelling for the prevention and investigation of acute and chronic chest new drugs for patients with a chronic treatment of influenza A and B. pain presenting in primary care. condition: the case of antibodies against By Turner D, Wailoo A, Nicholson K, By Mant J, McManus RJ, Oakes RAL, tumour necrosis factor in rheumatoid Cooper N, Sutton A, Abrams K. Delaney BC, Barton PM, Deeks JJ, et al. arthritis. By Barton P, Jobanputra P, Wilson J, Bryan S, Burls A. No. 36 No. 3 A randomised controlled trial to evaluate The effectiveness and cost-effectiveness the clinical and cost-effectiveness of No. 12 of microwave and thermal balloon Clinical effectiveness and cost- Hickman line insertions in adult cancer endometrial ablation for heavy menstrual patients by nurses. effectiveness of neonatal screening bleeding: a systematic review and for inborn errors of metabolism using By Boland A, Haycox A, Bagust A, economic modelling. tandem mass spectrometry: a systematic Fitzsimmons L. By Garside R, Stein K, Wyatt K, review. Round A, Price A. By Pandor A, Eastham J, Beverley C, No. 37 Chilcott J, Paisley S. Redesigning postnatal care: a No. 4 randomised controlled trial of protocol- A systematic review of the role of No. 13 based midwifery-led care focused bisphosphonates in metastatic disease. Clinical effectiveness and cost- on individual women’s physical and effectiveness of pioglitazone and psychological health needs. By Ross JR, Saunders Y, Edmonds PM, Patel S, Wonderling D, rosiglitazone in the treatment of type By MacArthur C, Winter HR, Normand C, et al. 2 diabetes: a systematic review and Bick DE, Lilford RJ, Lancashire RJ, economic evaluation. Knowles H, et al. No. 5 By Czoski-Murray C, Warren E, Chilcott J, Beverley C, Psyllaki MA, No. 38 Systematic review of the clinical Cowan J. Estimating implied rates of discount in effectiveness and cost-effectiveness of capecitabine (Xeloda®) for locally healthcare decision-making. No. 14 advanced and/or metastatic breast cancer. By West RR, Routine examination of the newborn: McNabb R, Thompson AGH, By Jones L, Hawkins N, Westwood M, the EMREN study. Evaluation of an Sheldon TA, Grimley Evans J. Wright K, Richardson G, Riemsma R. extension of the midwife role including a randomised controlled trial of No. 39 No. 6 appropriately trained midwives and Systematic review of isolation policies in Effectiveness and efficiency of guideline paediatric senior house officers. the hospital management of methicillin- dissemination and implementation By Townsend J, Wolke D, Hayes J, resistant Staphylococcus aureus: a review of strategies. Davé S, Rogers C, Bloomfield L, et al. the literature with epidemiological and By Grimshaw JM, Thomas RE, economic modelling. MacLennan G, Fraser C, Ramsay CR, No. 15 By Cooper BS, Stone SP, Kibbler CC, Vale L, et al. Involving consumers in research and Cookson BD, Roberts JA, Medley GF, development agenda setting for the et al. No. 7 NHS: developing an evidence-based approach. Clinical effectiveness and costs of the By Oliver S, Clarke-Jones L, Rees R, No. 40 Sugarbaker procedure for the treatment Milne R, Buchanan P, Gabbay J, et al. Treatments for spasticity and pain in of pseudomyxoma peritonei. multiple sclerosis: a systematic review. By Bryant J, Clegg AJ, Sidhu MK, No. 16 By Beard S, Hunn A, Wight J. Brodin H, Royle P, Davidson P. 
A multi-centre randomised controlled trial of minimally invasive direct coronary No. 41 No. 8 bypass grafting versus percutaneous The inclusion of reports of randomised Psychological treatment for insomnia transluminal coronary angioplasty with trials published in languages other than in the regulation of long-term hypnotic stenting for proximal stenosis of the left English in systematic reviews. drug use. anterior descending coronary artery. By Moher D, Pham B, Lawson ML, By Morgan K, Dixon S, Mathers N, By Reeves BC, Angelini GD, Klassen TP. Thompson J, Tomeny M. Bryan AJ, Taylor FC, Cripps T, Spyt TJ, et al. No. 42 No. 9 No. 17 The impact of screening on future Improving the evaluation of therapeutic health-promoting behaviours and health interventions in multiple sclerosis: Does early magnetic resonance imaging beliefs: a systematic review. development of a patient-based measure influence management or improve By Bankhead CR, Brett J, Bukach C, of outcome. outcome in patients referred to secondary care with low back pain? A Webster P, Stewart-Brown S, Munafo M, By Hobart JC, Riazi A, Lamping DL, et al. pragmatic randomised controlled trial. Fitzpatrick R, Thompson AJ. By Gilbert FJ, Grant AM, Gillan MGC, Vale L, No. 10 Scott NW, Campbell MK, et al. Volume 8, 2004 A systematic review and economic evaluation of magnetic resonance No. 18 No. 1 cholangiopancreatography compared The clinical and cost-effectiveness of What is the best imaging strategy for with diagnostic endoscopic retrograde anakinra for the treatment of rheumatoid acute stroke? cholangiopancreatography. arthritis in adults: a systematic review By Wardlaw JM, Keir SL, Seymour J, By Kaltenthaler E, Bravo Vergel Y, and economic analysis. Lewis S, Sandercock PAG, Dennis MS, Chilcott J, Thomas S, Blakeborough T, By Clark W, Jobanputra P, Barton P, 116 et al. Walters SJ, et al. Burls A. DOI: 10.3310/hta14250 Health Technology Assessment 2010; Vol. 14: No. 25

No. 19 No. 28 No. 37 A rapid and systematic review and Effectiveness and cost-effectiveness Rituximab (MabThera®) for aggressive economic evaluation of the clinical of imatinib for first-line treatment of non-Hodgkin’s lymphoma: systematic and cost-effectiveness of newer drugs chronic myeloid leukaemia in chronic review and economic evaluation. for treatment of mania associated with phase: a systematic review and economic By Knight C, Hind D, Brewer N, bipolar affective disorder. analysis. Abbott V. By Bridle C, Palmer S, Bagnall A-M, By Dalziel K, Round A, Stein K, Darba J, Duffy S, Sculpher M, et al. Garside R, Price A. No. 38 Clinical effectiveness and cost- No. 20 No. 29 effectiveness of clopidogrel and VenUS I: a randomised controlled trial of Liquid-based cytology in cervical modified-release dipyridamole in the two types of bandage for treating venous screening: an updated rapid and secondary prevention of occlusive leg ulcers. systematic review and economic analysis. vascular events: a systematic review and By Iglesias C, Nelson EA, Cullum NA, economic evaluation. By Karnon J, Peters J, Platt J, Torgerson DJ, on behalf of the VenUS Chilcott J, McGoogan E, Brewer N. By Jones L, Griffin S, Palmer S, Team. Main C, Orton V, Sculpher M, et al. No. 21 No. 30 No. 39 Systematic review of the long-term Systematic review of the effectiveness Pegylated interferon α-2a and -2b effects and economic consequences of and cost-effectiveness, and economic in combination with ribavirin in the treatments for obesity and implications evaluation, of myocardial perfusion treatment of chronic hepatitis C: for health improvement. scintigraphy for the diagnosis and a systematic review and economic By Avenell A, Broom J, Brown TJ, management of angina and myocardial evaluation. infarction. Poobalan A, Aucott L, Stearns SC, et al. By Shepherd J, Brodin H, Cave C, By Mowatt G, Vale L, Brazzelli M, Waugh N, Price A, Gabbay J. No. 22 Hernandez R, Murray A, Scott N, et al. Autoantibody testing in children with No. 40 No. 31 newly diagnosed type 1 diabetes mellitus. Clopidogrel used in combination with A pilot study on the use of decision By Dretzke J, Cummins C, aspirin compared with aspirin alone theory and value of information analysis Sandercock J, Fry-Smith A, Barrett T, in the treatment of non-ST-segment- as part of the NHS Health Technology Burls A. elevation acute coronary syndromes: Assessment programme. a systematic review and economic By Claxton K, Ginnelly L, No. 23 evaluation. Sculpher M, Philips Z, Palmer S. By Main C, Palmer S, Griffin S, Clinical effectiveness and cost- Jones L, Orton V, Sculpher M, et al. effectiveness of prehospital intravenous No. 32 fluids in trauma patients. The Social Support and Family Health No. 41 By Dretzke J, Sandercock J, Bayliss S, Study: a randomised controlled trial and Provision, uptake and cost of cardiac Burls A. economic evaluation of two alternative rehabilitation programmes: improving forms of postnatal support for mothers services to under-represented groups. No. 24 living in disadvantaged inner-city areas. By Beswick AD, Rees K, Griebsch I, By Wiggins M, Oakley A, Roberts I, Newer hypnotic drugs for the short-term Taylor FC, Burke M, West RR, et al. management of insomnia: a systematic Turner H, Rajan L, Austerberry H, et al. review and economic evaluation. No. 33 No. 42 By Dündar Y, Boland A, Strobl J, Involving South Asian patients in clinical Dodd S, Haycox A, Bagust A, et al. 
Psychosocial aspects of genetic screening of pregnant women and newborns: a trials. systematic review. By Hussain-Gambles M, Leese B, No. 25 By Green JM, Hewison J, Bekker HL, Atkin K, Brown J, Mason S, Tovey P. Development and validation of methods Bryant, Cuckle HS. for assessing the quality of diagnostic No. 43 accuracy studies. No. 34 Clinical and cost-effectiveness of By Whiting P, Rutjes AWS, Dinnes J, Evaluation of abnormal uterine continuous subcutaneous insulin infusion Reitsma JB, Bossuyt PMM, Kleijnen J. bleeding: comparison of three outpatient for diabetes. procedures within cohorts defined by age By Colquitt JL, Green C, Sidhu MK, No. 26 and menopausal status. Hartwell D, Waugh N. EVALUATE hysterectomy trial: a By Critchley HOD, Warner P, Lee AJ, multicentre randomised trial comparing Brechin S, Guise J, Graham B. No. 44 abdominal, vaginal and laparoscopic Identification and assessment of ongoing methods of hysterectomy. No. 35 trials in health technology assessment By Garry R, Fountain J, Brown J, Coronary artery stents: a rapid systematic reviews. Manca A, Mason S, Sculpher M, et al. review and economic evaluation. By Song FJ, Fry-Smith A, By Hill R, Bagust A, Bakhai A, Davenport C, Bayliss S, Adi Y, Wilson JS, No. 27 Dickson R, Dündar Y, Haycox A, et al. et al. Methods for expected value of information analysis in complex health No. 36 No. 45 economic models: developments on the Review of guidelines for good practice Systematic review and economic health economics of interferon-β and in decision-analytic modelling in health evaluation of a long-acting insulin glatiramer acetate for multiple sclerosis. technology assessment. analogue, insulin glargine By Tappenden P, Chilcott JB, By Philips Z, Ginnelly L, Sculpher M, By Warren E, Weatherley-Jones E, Eggington S, Oakley J, McCabe C. Claxton K, Golder S, Riemsma R, et al. Chilcott J, Beverley C. 117

© 2010 Queen’s Printer and Controller of HMSO. All rights reserved. Health Technology Assessment reports published to date

No. 49 Clinical and cost-effectiveness of epoprostenol, iloprost, bosentan, sitaxentan and sildenafil for pulmonary arterial hypertension within their licensed indications: a systematic review and economic evaluation. By Chen Y-F, Jowett S, Barton P, Malottki K, Hyde C, Gibbs JSR, et al.

No. 50 Systematic review of the clinical effectiveness and cost-effectiveness of photodynamic diagnosis and urine biomarkers (FISH, ImmunoCyt, NMP22) and cytology for the detection and follow-up of bladder cancer. By Mowatt G, Zhu S, Kilonzo M, Boachie C, Fraser C, Griffiths TRL, et al.

No. 51 ARTISTIC: a randomised trial of human papillomavirus (HPV) testing in primary cervical screening. By Kitchener HC, Almonte M, Gilham C, Dowie R, Stoykova B, Sargent A, et al.

No. 52 The clinical effectiveness of glucosamine and chondroitin supplements in slowing or arresting progression of osteoarthritis of the knee: a systematic review and economic evaluation. By Black C, Clar C, Henderson R, MacEachern C, McNamee P, Quayyum Z, et al.

No. 53 Randomised preference trial of medical versus surgical termination of pregnancy less than 14 weeks’ gestation (TOPS). By Robson SC, Kelly T, Howel D, Deverill M, Hewison J, Lie MLS, et al.

No. 54 Randomised controlled trial of the use of three dressing preparations in the management of chronic ulceration of the foot in diabetes. By Jeffcoate WJ, Price PE, Phillips CJ, Game FL, Mudge E, Davies S, et al.

No. 55 VenUS II: a randomised controlled trial of larval therapy in the management of leg ulcers. By Dumville JC, Worthy G, Soares MO, Bland JM, Cullum N, Dowson C, et al.

No. 56 A prospective randomised controlled trial and economic modelling of antimicrobial silver dressings versus non-adherent control dressings for venous leg ulcers: the VULCAN trial. By Michaels JA, Campbell WB, King BM, MacIntyre J, Palfreyman SJ, Shackley P, et al.

No. 57 Communication of carrier status information following universal newborn screening for sickle cell disorders and cystic fibrosis: qualitative study of experience and practice. By Kai J, Ulph F, Cullinan T, Qureshi N.

No. 58 Antiviral drugs for the treatment of influenza: a systematic review and economic evaluation. By Burch J, Paulden M, Conti S, Stock C, Corbett M, Welton NJ, et al.

No. 59 Development of a toolkit and glossary to aid in the adaptation of health technology assessment (HTA) reports for use in different contexts. By Chase D, Rosten C, Turner S, Hicks N, Milne R.

No. 60 Colour vision testing for diabetic retinopathy: a systematic review of diagnostic accuracy and economic evaluation. By Rodgers M, Hodges R, Hawkins J, Hollingworth W, Duffy S, McKibbin M, et al.

No. 61 Systematic review of the effectiveness and cost-effectiveness of weight management schemes for the under fives: a short report. By Bond M, Wyatt K, Lloyd J, Welch K, Taylor R.

No. 62 Are adverse effects incorporated in economic models? An initial review of current practice. By Craig D, McDaid C, Fonseca T, Stock C, Duffy S, Woolacott N.

Volume 14, 2010

No. 1 Multicentre randomised controlled trial examining the cost-effectiveness of contrast-enhanced high field magnetic resonance imaging in women with primary breast cancer scheduled for wide local excision (COMICE). By Turnbull LW, Brown SR, Olivier C, Harvey I, Brown J, Drew P, et al.

No. 2 Bevacizumab, sorafenib tosylate, sunitinib and temsirolimus for renal cell carcinoma: a systematic review and economic evaluation. By Thompson Coon J, Hoyle M, Green C, Liu Z, Welch K, Moxham T, et al.

No. 3 The clinical effectiveness and cost-effectiveness of testing for cytochrome P450 polymorphisms in patients with schizophrenia treated with antipsychotics: a systematic review and economic evaluation. By Fleeman N, McLeod C, Bagust A, Beale S, Boland A, Dundar Y, et al.

No. 4 Cessation of attention deficit hyperactivity disorder drugs in the young (CADDY) – a pharmacoepidemiological and qualitative study. By Wong ICK, Asherson P, Bilbow A, Clifford S, Coghill D, DeSoysa R, et al.

No. 5 Effectiveness and cost-effectiveness of arthroscopic lavage in the treatment of osteoarthritis of the knee: a mixed methods study of the feasibility of conducting a surgical placebo-controlled trial (the KORAL study). By Campbell MK, Skea ZC, Sutherland AG, Cuthbertson BH, Entwistle VA, McDonald AM, et al.

No. 6 A randomised 2 × 2 trial of community versus hospital pulmonary rehabilitation for chronic obstructive pulmonary disease followed by telephone or conventional follow-up. By Waterhouse JC, Walters SJ, Oluboyede Y, Lawson RA.

No. 7 The effectiveness and cost-effectiveness of behavioural interventions for the prevention of sexually transmitted infections in young people aged 13–19: a systematic review and economic evaluation. By Shepherd J, Kavanagh J, Picot J, Cooper K, Harden A, Barnett-Page E, et al.

No. 8 Dissemination and publication of research findings: an updated review of related biases. By Song F, Parekh S, Hooper L, Loke YK, Ryder J, Sutton AJ, et al.

No. 9 The effectiveness and cost-effectiveness of biomarkers for the prioritisation of patients awaiting coronary revascularisation: a systematic review and decision model. By Hemingway H, Henriksson M, Chen R, Damant J, Fitzpatrick N, Abrams K, et al.

No. 10 Comparison of case note review methods for evaluating quality and safety in health care. By Hutchinson A, Coster JE, Cooper KL, McIntosh A, Walters SJ, Bath PA, et al.

No. 11 Clinical effectiveness and cost-effectiveness of continuous subcutaneous insulin infusion for diabetes: systematic review and economic evaluation. By Cummins E, Royle P, Snaith A, Greene A, Robertson L, McIntyre L, et al.

No. 12 Self-monitoring of blood glucose in type 2 diabetes: systematic review. By Clar C, Barnard K, Cummins E, Royle P, Waugh N.

No. 13 North of England and Scotland Study of Tonsillectomy and Adeno-tonsillectomy in Children (NESSTAC): a pragmatic randomised controlled trial with a parallel non-randomised preference study. By Lock C, Wilson J, Steen N, Eccles M, Mason H, Carrie S, et al.

No. 14 Multicentre randomised controlled trial of the clinical and cost-effectiveness of a bypass-surgery-first versus a balloon-angioplasty-first revascularisation strategy for severe limb ischaemia due to infrainguinal disease. The Bypass versus Angioplasty in Severe Ischaemia of the Leg (BASIL) trial. By Bradbury AW, Adam DJ, Bell J, Forbes JF, Fowkes FGR, Gillespie I, et al.

No. 15 A randomised controlled multicentre trial of treatments for adolescent anorexia nervosa including assessment of cost-effectiveness and patient acceptability – the TOuCAN trial. By Gowers SG, Clark AF, Roberts C, Byford S, Barrett B, Griffiths A, et al.

No. 16 Randomised controlled trials for policy interventions: a review of reviews and meta-regression. By Oliver S, Bagnall AM, Thomas J, Shepherd J, Sowden A, White I, et al.

No. 17 Paracetamol and selective and non-selective non-steroidal anti-inflammatory drugs (NSAIDs) for the reduction of morphine-related side effects after major surgery: a systematic review. By McDaid C, Maund E, Rice S, Wright K, Jenkins B, Woolacott N.

No. 18 A systematic review of outcome measures used in forensic mental health research with consensus panel opinion. By Fitzpatrick R, Chambers J, Burns T, Doll H, Fazel S, Jenkinson C, et al.

No. 19 The clinical effectiveness and cost-effectiveness of topotecan for small cell lung cancer: a systematic review and economic evaluation. By Loveman E, Jones J, Hartwell D, Bird A, Harris P, Welch K, et al.

No. 20 Antenatal screening for haemoglobinopathies in primary care: a cohort study and cluster randomised trial to inform a simulation model. The Screening for Haemoglobinopathies in First Trimester (SHIFT) trial. By Dormandy E, Walker S, Bryan S, Gulliford MC, Roberts T, Ades T, Calnan M, et al.

No. 21 Early referral strategies for management of people with markers of renal disease: a systematic review of the evidence of clinical effectiveness, cost-effectiveness and economic analysis. By Black C, Sharma P, Scotland G, McCullough K, McGurn D, Robertson L, et al.

No. 22 A randomised controlled trial of cognitive behaviour therapy and motivational interviewing for people with Type 1 diabetes mellitus with persistent sub-optimal glycaemic control: A Diabetes and Psychological Therapies (ADaPT) study. By Ismail K, Maissi E, Thomas S, Chalder T, Schmidt U, Bartlett J, et al.

No. 23 A randomised controlled equivalence trial to determine the effectiveness and cost–utility of manual chest physiotherapy techniques in the management of exacerbations of chronic obstructive pulmonary disease (MATREX). By Cross J, Elender F, Barton G, Clark A, Shepstone L, Blyth A, et al.

No. 24 A systematic review and economic evaluation of the clinical effectiveness and cost-effectiveness of aldosterone antagonists for postmyocardial infarction heart failure. By McKenna C, Burch J, Suekarran S, Bakhai A, Witte K, et al.




Health Technology Assessment programme

Director: Professor Tom Walley, Director, NIHR HTA programme, Professor of Clinical Pharmacology, University of Liverpool

Deputy Director: Professor Jon Nicholl, Director, Medical Care Research Unit, University of Sheffield

Prioritisation Strategy Group

Members

Chair: Professor Tom Walley, Director, NIHR HTA programme, Professor of Clinical Pharmacology, University of Liverpool
Deputy Chair: Professor Jon Nicholl, Director, Medical Care Research Unit, University of Sheffield
Dr Bob Coates, Consultant Advisor, NETSCC, HTA
Dr Andrew Cook, Consultant Advisor, NETSCC, HTA
Dr Peter Davidson, Director of Science Support, NETSCC, HTA
Professor Robin E Ferner, Consultant Physician and Director, West Midlands Centre for Adverse Drug Reactions, City Hospital NHS Trust, Birmingham
Professor Paul Glasziou, Professor of Evidence-Based Medicine, University of Oxford
Dr Nick Hicks, Director of NHS Support, NETSCC, HTA
Dr Edmund Jessop, Medical Adviser, National Specialist, National Commissioning Group (NCG), Department of Health, London
Ms Lynn Kerridge, Chief Executive Officer, NETSCC and NETSCC, HTA
Dr Ruairidh Milne, Director of Strategy and Development, NETSCC
Ms Kay Pattison, Section Head, NHS R&D Programme, Department of Health
Ms Pamela Young, Specialist Programme Manager, NETSCC, HTA

HTA Commissioning Board

Members

Programme Director: Professor Tom Walley, Director, NIHR HTA programme, Professor of Clinical Pharmacology, University of Liverpool
Chair: Professor Jon Nicholl, Director, Medical Care Research Unit, University of Sheffield
Deputy Chair: Dr Andrew Farmer, Senior Lecturer in General Practice, Department of Primary Health Care, University of Oxford
Professor Ann Ashburn, Professor of Rehabilitation and Head of Research, Southampton General Hospital
Professor Deborah Ashby, Professor of Medical Statistics, Queen Mary, University of London
Professor John Cairns, Professor of Health Economics, London School of Hygiene and Tropical Medicine
Professor Peter Croft, Director of Primary Care Sciences Research Centre, Keele University
Professor Nicky Cullum, Director of Centre for Evidence-Based Nursing, University of York
Professor Jenny Donovan, Professor of Social Medicine, University of Bristol
Professor Steve Halligan, Professor of Gastrointestinal Radiology, University College Hospital, London
Professor Freddie Hamdy, Professor of Urology, University of Sheffield
Professor Allan House, Professor of Liaison Psychiatry, University of Leeds
Dr Martin J Landray, Reader in Epidemiology, Honorary Consultant Physician, Clinical Trial Service Unit, University of Oxford
Professor Stuart Logan, Director of Health & Social Care Research, The Peninsula Medical School, Universities of Exeter and Plymouth
Dr Rafael Perera, Lecturer in Medical Statistics, Department of Primary Health Care, University of Oxford
Professor Ian Roberts, Professor of Epidemiology & Public Health, London School of Hygiene and Tropical Medicine
Professor Mark Sculpher, Professor of Health Economics, University of York
Professor Helen Smith, Professor of Primary Care, University of Brighton
Professor Kate Thomas, Professor of Complementary & Alternative Medicine Research, University of Leeds
Professor David John Torgerson, Director of York Trials Unit, University of York
Professor Hywel Williams, Professor of Dermato-Epidemiology, University of Nottingham

Observers

Ms Kay Pattison, Section Head, NHS R&D Programme, Department of Health
Dr Morven Roberts, Clinical Trials Manager, Medical Research Council


Diagnostic Technologies and Screening Panel

Members

Chair: Professor Paul Glasziou, Professor of Evidence-Based Medicine, University of Oxford
Deputy Chair: Dr David Elliman, Consultant Paediatrician and Honorary Senior Lecturer, Great Ormond Street Hospital, London
Professor Judith E Adams, Consultant Radiologist, Manchester Royal Infirmary, Central Manchester & Manchester Children’s University Hospitals NHS Trust, and Professor of Diagnostic Radiology, Imaging Science and Biomedical Engineering, Cancer & Imaging Sciences, University of Manchester
Mr A S Arunkalaivanan, Honorary Senior Lecturer, University of Birmingham and Consultant Urogynaecologist and Obstetrician, City Hospital
Dr Dianne Baralle, Consultant & Senior Lecturer in Clinical Genetics, Human Genetics Division & Wessex Clinical Genetics Service, Southampton, University of Southampton
Dr Stephanie Dancer, Consultant Microbiologist, Hairmyres Hospital, East Kilbride
Dr Ron Gray, Consultant, National Perinatal Epidemiology Unit, Institute of Health Sciences, University of Oxford
Professor Paul D Griffiths, Professor of Radiology, Academic Unit of Radiology, University of Sheffield
Mr Martin Hooper, Service User Representative
Professor Anthony Robert Kendrick, Professor of Primary Medical Care, University of Southampton
Dr Susanne M Ludgate, Director, Medical Devices Agency, London
Dr Anne Mackie, Director of Programmes, UK National Screening Committee
Dr David Mathew, Service User Representative
Dr Michael Millar, Lead Consultant in Microbiology, Department of Pathology & Microbiology, Barts and The London NHS Trust, Royal London Hospital
Mr Stephen Pilling, Director, Centre for Outcomes, Research & Effectiveness, University College London
Mrs Una Rennard, Service User Representative
Dr W Stuart A Smellie, Consultant, Bishop Auckland General Hospital
Ms Jane Smith, Consultant Ultrasound Practitioner, Ultrasound Department, Leeds Teaching Hospital NHS Trust, Leeds
Professor Lindsay Wilson Turnbull, Scientific Director of the Centre for Magnetic Resonance Investigations and YCR Professor of Radiology, Hull Royal Infirmary
Dr Alan J Williams, Consultant in General Medicine, Department of Thoracic Medicine, The Royal Bournemouth Hospital

Observers

Dr Tim Elliott, Team Leader, Cancer Screening, Department of Health
Dr Catherine Moody, Programme Manager, Neuroscience and Mental Health Board
Dr Ursula Wells, Principal Research Officer, Department of Health

Disease Prevention Panel

Members

Chair: Dr Edmund Jessop, Medical Adviser, National Specialist Commissioning Advisory Group (NSCAG), Department of Health
Deputy Chair: Professor Margaret Thorogood, Professor of Epidemiology, University of Warwick Medical School, Coventry
Dr Robert Cook, Clinical Programmes Director, Bazian Ltd, London
Dr Elizabeth Fellow-Smith, Medical Director, West London Mental Health Trust, Middlesex
Dr Colin Greaves, Senior Research Fellow, Peninsula Medical School (Primary Care)
Dr John Jackson, General Practitioner, Parkway Medical Centre, Newcastle upon Tyne
Dr Russell Jago, Senior Lecturer in Exercise, Nutrition and Health, Centre for Sport, Exercise and Health, University of Bristol
Dr Chris McCall, General Practitioner, The Hadleigh Practice, Corfe Mullen, Dorset
Miss Nicky Mullany, Service User Representative
Dr Julie Mytton, Locum Consultant in Public Health Medicine, Bristol Primary Care Trust
Professor Irwin Nazareth, Professor of Primary Care and Director, Department of Primary Care and Population Sciences, University College London
Professor Ian Roberts, Professor of Epidemiology and Public Health, London School of Hygiene & Tropical Medicine
Professor Carol Tannahill, Glasgow Centre for Population Health
Mrs Jean Thurston, Service User Representative
Professor David Weller, Head, School of Clinical Science and Community Health, University of Edinburgh

Observers

Ms Christine McGuire, Research & Development, Department of Health
Ms Kay Pattison, NHS R&D Programme/DH, Leeds
Dr Caroline Stone, Programme Manager, Medical Research Council


Current and past membership details of all HTA programme ‘committees’ are available from the HTA website (www.hta.ac.uk).

External Devices and Physical Therapies Panel Members

Chair: Dr John Pounsford, Consultant Physician, North Bristol NHS Trust, Bristol
Deputy Chair: Professor E Andrea Nelson, Reader in Wound Healing and Director of Research, University of Leeds, Leeds
Professor Bipin Bhakta, Charterhouse Professor in Rehabilitation Medicine, University of Leeds, Leeds
Mrs Penny Calder, Service User Representative
Professor Paul Carding, Professor of Voice Pathology, Newcastle Hospital NHS Trust, Newcastle
Dr Dawn Carnes, Senior Research Fellow, Barts and the London School of Medicine and Dentistry, London
Dr Emma Clark, Clinician Scientist Fellow & Cons. Rheumatologist, University of Bristol, Bristol
Mrs Anthea De Barton-Watson, Service User Representative
Professor Christopher Griffiths, Professor of Primary Care, Barts and the London School of Medicine and Dentistry, London
Dr Shaheen Hamdy, Clinical Senior Lecturer and Consultant Physician, University of Manchester, Manchester
Dr Peter Martin, Consultant Neurologist, Addenbrooke’s Hospital, Cambridge
Dr Lorraine Pinnigton, Associate Professor in Rehabilitation, University of Nottingham, Nottingham
Dr Kate Radford, Division of Rehabilitation and Ageing, School of Community Health Sciences, University of Nottingham, Nottingham
Mr Jim Reece, Service User Representative
Professor Maria Stokes, Professor of Neuromusculoskeletal Rehabilitation, University of Southampton, Southampton
Dr Pippa Tyrrell, Stroke Medicine, Senior Lecturer/Consultant Stroke Physician, Salford Royal Foundation Hospitals’ Trust, Salford
Dr Sarah Tyson, Senior Research Fellow & Associate Head of School, University of Salford, Salford
Dr Nefyn Williams, Clinical Senior Lecturer, Cardiff University, Cardiff

Observers

Dr Phillip Leech, Principal Medical Officer for Primary Care, Department of Health, London
Ms Kay Pattison, Section Head R&D, DH, Leeds
Dr Morven Roberts, Clinical Trials Manager, MRC, London
Dr Ursula Wells, PRP, DH, London

Interventional Procedures Panel

Members

Chair: Professor Jonathan Michaels, Consultant Surgeon & Honorary Clinical Lecturer, University of Sheffield
Mrs Isabel Boyer, Service User Representative, London
Mr David P Britt, Service User Representative, Cheshire
Mr Sankaran ChandraSekharan, Consultant Surgeon, Colchester Hospital University NHS Foundation Trust
Professor Nicholas Clarke, Consultant Orthopaedic Surgeon, Southampton University Hospitals NHS Trust
Mr Seamus Eckford, Consultant in Obstetrics & Gynaecology, North Devon District Hospital
Dr Matthew Hatton, Consultant in Clinical Oncology, Sheffield Teaching Hospital Foundation Trust
Dr John Holden, General Practitioner, Garswood Surgery, Wigan
Dr Nadim Malik, Consultant Cardiologist/Honorary Lecturer, University of Manchester
Mr Hisham Mehanna, Consultant & Honorary Associate Professor, University Hospitals Coventry & Warwickshire NHS Trust
Dr Jane Montgomery, Consultant in Anaesthetics and Critical Care, South Devon Healthcare NHS Foundation Trust
Dr Simon Padley, Consultant Radiologist, Chelsea & Westminster Hospital
Dr Ashish Paul, Medical Director, Bedfordshire PCT
Dr Sarah Purdy, Consultant Senior Lecturer, University of Bristol
Professor David Taggart, Consultant Cardiothoracic Surgeon, John Radcliffe Hospital
Mr Michael Thomas, Consultant Colorectal Surgeon, Bristol Royal Infirmary
Professor Yit Chiun Yang, Consultant Ophthalmologist, Royal Wolverhampton Hospitals NHS Trust



Pharmaceuticals Panel

Members

Chair: Professor Imti Choonara, Professor in Child Health, University of Nottingham
Deputy Chair: Dr Yoon K Loke, Senior Lecturer in Clinical Pharmacology, University of East Anglia
Mrs Nicola Carey, Senior Research Fellow, School of Health and Social Care, The University of Reading
Mr John Chapman, Service User Representative
Dr Peter Elton, Director of Public Health, Bury Primary Care Trust
Professor Robin Ferner, Consultant Physician and Director, West Midlands Centre for Adverse Drug Reactions, City Hospital NHS Trust, Birmingham
Dr Ben Goldacre, Research Fellow, Division of Psychological Medicine and Psychiatry, King’s College London
Dr Bill Gutteridge, Medical Adviser, London Strategic Health Authority
Dr Dyfrig Hughes, Reader in Pharmacoeconomics and Deputy Director, Centre for Economics and Policy in Health, IMSCaR, Bangor University
Professor Femi Oyebode, Consultant Psychiatrist and Head of Department, University of Birmingham
Dr Andrew Prentice, Senior Lecturer and Consultant Obstetrician and Gynaecologist, The Rosie Hospital, University of Cambridge
Dr Martin Shelly, General Practitioner, Leeds, and Associate Director, NHS Clinical Governance Support Team, Leicester
Dr Gillian Shepherd, Director, Health and Clinical Excellence, Merck Serono Ltd
Mrs Katrina Simister, Assistant Director New Medicines, National Prescribing Centre, Liverpool
Mr David Symes, Service User Representative
Dr Lesley Wise, Unit Manager, Pharmacoepidemiology Research Unit, VRMM, Medicines & Healthcare Products Regulatory Agency

Observers

Ms Kay Pattison, Section Head, NHS R&D Programme, Department of Health
Mr Simon Reeve, Head of Clinical and Cost-Effectiveness, Medicines, Pharmacy and Industry Group, Department of Health
Dr Heike Weber, Programme Manager, Medical Research Council
Dr Ursula Wells, Principal Research Officer, Department of Health

Psychological and Community Therapies Panel

Members

Chair: Professor Scott Weich, Professor of Psychiatry, University of Warwick
Professor Jane Barlow, Professor of Public Health in the Early Years, Health Sciences Research Institute, Warwick Medical School
Dr Sabyasachi Bhaumik, Consultant Psychiatrist, Leicestershire Partnership NHS Trust
Mrs Val Carlill, Service User Representative, Gloucestershire
Dr Steve Cunningham, Consultant Respiratory Paediatrician, Lothian Health Board
Dr Anne Hesketh, Senior Clinical Lecturer in Speech and Language Therapy, University of Manchester
Dr Yann Lefeuvre, GP Partner, Burrage Road Surgery, London
Dr Jeremy J Murphy, Consultant Physician & Cardiologist, County Durham & Darlington Foundation Trust
Mr John Needham, Service User, Buckinghamshire
Ms Mary Nettle, Mental Health User Consultant, Gloucestershire
Professor John Potter, Professor of Ageing and Stroke Medicine, University of East Anglia
Dr Greta Rait, Senior Clinical Lecturer and General Practitioner, University College London
Dr Paul Ramchandani, Senior Research Fellow/Cons. Child Psychiatrist, University of Oxford
Dr Howard Ring, Consultant & University Lecturer in Psychiatry, University of Cambridge
Dr Karen Roberts, Nurse/Consultant, Dunston Hill Hospital, Tyne and Wear
Dr Karim Saad, Consultant in Old Age Psychiatry, Coventry & Warwickshire Partnership Trust
Dr Alastair Sutcliffe, Senior Lecturer, University College London
Dr Simon Wright, GP Partner, Walkden Medical Centre, Manchester

Observers

Ms Kay Pattison, Section Head, R&D, DH, Leeds
Dr Morven Roberts, Clinical Trials Manager, MRC, London
Professor Tom Walley, HTA Programme Director, Liverpool
Dr Ursula Wells, Policy Research Programme, DH, London



Expert Advisory Network

Members

Professor Douglas Altman, Professor of Statistics in Medicine, Centre for Statistics in Medicine, University of Oxford
Professor John Bond, Professor of Social Gerontology & Health Services Research, University of Newcastle upon Tyne
Professor Andrew Bradbury, Professor of Vascular Surgery, Solihull Hospital, Birmingham
Mr Shaun Brogan, Chief Executive, Ridgeway Primary Care Group, Aylesbury
Mrs Stella Burnside OBE, Chief Executive, Regulation and Improvement Authority, Belfast
Ms Tracy Bury, Project Manager, World Confederation for Physical Therapy, London
Professor Iain T Cameron, Professor of Obstetrics and Gynaecology and Head of the School of Medicine, University of Southampton
Dr Christine Clark, Medical Writer and Consultant Pharmacist, Rossendale
Professor Collette Clifford, Professor of Nursing and Head of Research, The Medical School, University of Birmingham
Professor Barry Cookson, Director, Laboratory of Hospital Infection, Public Health Laboratory Service, London
Dr Carl Counsell, Clinical Senior Lecturer in Neurology, University of Aberdeen
Professor Howard Cuckle, Professor of Reproductive Epidemiology, Department of Paediatrics, Obstetrics & Gynaecology, University of Leeds
Dr Katherine Darton, Information Unit, MIND – The Mental Health Charity, London
Professor Carol Dezateux, Professor of Paediatric Epidemiology, Institute of Child Health, London
Mr John Dunning, Consultant Cardiothoracic Surgeon, Papworth Hospital NHS Trust, Cambridge
Mr Jonothan Earnshaw, Consultant Vascular Surgeon, Gloucestershire Royal Hospital, Gloucester
Professor Martin Eccles, Professor of Clinical Effectiveness, Centre for Health Services Research, University of Newcastle upon Tyne
Professor Pam Enderby, Dean of Faculty of Medicine, Institute of General Practice and Primary Care, University of Sheffield
Professor Gene Feder, Professor of Primary Care Research & Development, Centre for Health Sciences, Barts and The London School of Medicine and Dentistry
Mr Leonard R Fenwick, Chief Executive, Freeman Hospital, Newcastle upon Tyne
Mrs Gillian Fletcher, Antenatal Teacher and Tutor and President, National Childbirth Trust, Henfield
Professor Jayne Franklyn, Professor of Medicine, University of Birmingham
Mr Tam Fry, Honorary Chairman, Child Growth Foundation, London
Professor Fiona Gilbert, Consultant Radiologist and NCRN Member, University of Aberdeen
Professor Paul Gregg, Professor of Orthopaedic Surgical Science, South Tees Hospital NHS Trust
Bec Hanley, Co-director, TwoCan Associates, West Sussex
Dr Maryann L Hardy, Senior Lecturer, University of Bradford
Mrs Sharon Hart, Healthcare Management Consultant, Reading
Professor Robert E Hawkins, CRC Professor and Director of Medical Oncology, Christie CRC Research Centre, Christie Hospital NHS Trust, Manchester
Professor Richard Hobbs, Head of Department of Primary Care & General Practice, University of Birmingham
Professor Alan Horwich, Dean and Section Chairman, The Institute of Cancer Research, London
Professor Allen Hutchinson, Director of Public Health and Deputy Dean of ScHARR, University of Sheffield
Professor Peter Jones, Professor of Psychiatry, University of Cambridge, Cambridge
Professor Stan Kaye, Cancer Research UK Professor of Medical Oncology, Royal Marsden Hospital and Institute of Cancer Research, Surrey
Dr Duncan Keeley, General Practitioner (Dr Burch & Ptnrs), The Health Centre, Thame
Dr Donna Lamping, Research Degrees Programme Director and Reader in Psychology, Health Services Research Unit, London School of Hygiene and Tropical Medicine, London
Mr George Levvy, Chief Executive, Motor Neurone Disease Association, Northampton
Professor James Lindesay, Professor of Psychiatry for the Elderly, University of Leicester
Professor Julian Little, Professor of Human Genome Epidemiology, University of Ottawa
Professor Alistaire McGuire, Professor of Health Economics, London School of Economics
Professor Rajan Madhok, Medical Director and Director of Public Health, Directorate of Clinical Strategy & Public Health, North & East Yorkshire & Northern Lincolnshire Health Authority, York
Professor Alexander Markham, Director, Molecular Medicine Unit, St James’s University Hospital, Leeds
Dr Peter Moore, Freelance Science Writer, Ashtead
Dr Andrew Mortimore, Public Health Director, Southampton City Primary Care Trust
Dr Sue Moss, Associate Director, Cancer Screening Evaluation Unit, Institute of Cancer Research, Sutton
Professor Miranda Mugford, Professor of Health Economics and Group Co-ordinator, University of East Anglia
Professor Jim Neilson, Head of School of Reproductive & Developmental Medicine and Professor of Obstetrics and Gynaecology, University of Liverpool
Mrs Julietta Patnick, National Co-ordinator, NHS Cancer Screening Programmes, Sheffield
Professor Robert Peveler, Professor of Liaison Psychiatry, Royal South Hants Hospital, Southampton
Professor Chris Price, Director of Clinical Research, Bayer Diagnostics Europe, Stoke Poges
Professor William Rosenberg, Professor of Hepatology and Consultant Physician, University of Southampton
Professor Peter Sandercock, Professor of Medical Neurology, Department of Clinical Neurosciences, University of Edinburgh
Dr Susan Schonfield, Consultant in Public Health, Hillingdon Primary Care Trust, Middlesex
Dr Eamonn Sheridan, Consultant in Clinical Genetics, St James’s University Hospital, Leeds
Dr Margaret Somerville, Director of Public Health Learning, Peninsula Medical School, University of Plymouth
Professor Sarah Stewart-Brown, Professor of Public Health, Division of Health in the Community, University of Warwick, Coventry
Professor Ala Szczepura, Professor of Health Service Research, Centre for Health Services Studies, University of Warwick, Coventry
Mrs Joan Webster, Consumer Member, Southern Derbyshire Community Health Council
Professor Martin Whittle, Clinical Co-director, National Co-ordinating Centre for Women’s and Children’s Health, Lymington



Feedback The HTA programme and the authors would like to know your views about this report. The Correspondence Page on the HTA website (www.hta.ac.uk) is a convenient way to publish your comments. If you prefer, you can send your comments to the address below, telling us whether you would like us to transfer them to the website. We look forward to hearing from you.

NETSCC, Health Technology Assessment
Alpha House
University of Southampton Science Park
Southampton SO16 7NS, UK
Email: [email protected]
www.hta.ac.uk

ISSN 1366-5278