A national cardiotocography education programme: Development, validation and impact on interpretation skills and birth hypoxia

PhD thesis Line Thellesen

This PhD thesis was submitted to the Graduate School of The Faculty of Health and Medical Sciences, University of Copenhagen on November 1, 2016. Public defence: March 10, 2017, Rigshospitalet.

Author

Line Thellesen MD Department of Obstetrics, The Juliane Marie Centre for children, women and reproduction Rigshospitalet, University of Copenhagen, Denmark Email: [email protected]

Academic supervisors

Thomas Bergholt MD PhD MSc Department of Obstetrics, The Juliane Marie Centre for children, women and reproduction Rigshospitalet, University of Copenhagen, Denmark

Morten Hedegaard MD PhD Head of Department Department of Obstetrics, The Juliane Marie Centre for children, women and reproduction Rigshospitalet, University of Copenhagen, Denmark

Jette Led Sørensen MD PhD MMEd The Juliane Marie Centre for children, women and reproduction, Rigshospitalet, University of Copenhagen, Denmark

Assessment committee

Associate Professor Christina Rørbye Lundin MD PhD (Chairman) Department of Obstetrics and Gynaecology, Hvidovre Hospital, Denmark

Associate professor Olof Stephansson MD PhD Department of Women’s and Children’s Health, Karolinska Institutet, Stockholm, Sweden

Professor Timothy Draycott MBBS MRCOG MD Department of Women's Health, Southmead Hospital, Bristol, UK

Contents

PREFACE

ABBREVIATIONS

ENGLISH SUMMARY

DANSK RESUMÉ

INTRODUCTION

BACKGROUND

Cardiotocography
• Definition
• Historical aspect
• Effect of cardiotocography
• Obstetric compensation claims

The CTG education programme

Development of an education programme
• Curriculum development
• Assessment and validity

Evaluation of an education programme
• Kirkpatrick’s evaluation model
• Birth hypoxia outcomes

AIMS

RESEARCH QUESTIONS

METHODS AND RESULTS

Study I – Curriculum development

Study II – Test development

Study III – CTG education: effect on knowledge and interpretation skills

Study IV – CTG education: effect on birth hypoxia

Ethics, Study I-IV

DISCUSSION

Development of an education programme

Evaluation of an education programme

Fetal monitoring with cardiotocography

CONCLUSIONS

PERSPECTIVES

ACKNOWLEDGEMENTS

FUNDING

REFERENCES

MANUSCRIPTS / PUBLISHED PAPERS

Paper I

Paper II

Paper III

Paper IV

Preface

This PhD thesis was conducted during my employment at the Department of Obstetrics, The Juliane Marie Centre, Rigshospitalet, University of Copenhagen from 2012 to 2016.

The thesis is based on the following four papers:

1 Thellesen L, Hedegaard M, Bergholt T, Colov NP, Hoegh S, Sorensen JL. Curriculum development for a national cardiotocography education program: a Delphi survey to obtain consensus on learning objectives. Acta Obstet Gynecol Scand 2015; 94: 869–877.

2 Thellesen L, Bergholt T, Hedegaard M, Colov NP, Christensen KB, Andersen KS, Sorensen JL. Development of a written assessment for a national interprofessional cardiotocography education program. Under revision

3 Thellesen L, Sorensen JL, Hedegaard M, Rosthoej S, Colov NP, Andersen KS, Bergholt T. Cardiotocography interpretation skills and the association with size of maternity unit, years of obstetric work experience and healthcare professional background: a national cross-sectional study. Under revision

4 Thellesen L, Bergholt T, Sorensen JL, Rosthoej S, Hvidman L, Eskenazi B, Hedegaard M. The impact of a national cardiotocography education programme on birth hypoxia; a historically controlled intervention study. Submitted

Abbreviations

CTG: Cardiotocography
FHR: Fetal heart rate
MCQ: Multiple-choice question
FIGO: International Federation of Gynecology and Obstetrics
DIF: Differential item functioning
OR: Odds ratio
CI: Confidence interval

English summary

Studies indicate that errors in the management of fetal monitoring are the main cause of hypoxic brain injuries among newborns in cases of substandard care. To increase quality and safety of care in Danish maternity units and to reduce the incidence of birth hypoxia, a national obstetric quality project, Safe Deliveries, was introduced in 2012. As part of the project a national standardised inter-professional cardiotocography (CTG) education programme was implemented in all maternity units. Based on four studies this PhD thesis describes and discusses the development, validation and evaluation of the CTG education programme.

In the first study we developed a prioritised list of 40 CTG learning objectives based on a national Delphi survey that included experienced midwives and obstetricians from all Danish maternity units. We found that the learning objectives reflected a wide variety of knowledge, skills and attitudes concerning CTG monitoring and that interpretation skills and clinical decision-making were rated higher than knowledge of fetal physiology.

The learning objectives were used in the second study, which was a validation study describing the development of a CTG multiple-choice question (MCQ) test. We found that the MCQ test measured what was intended, namely CTG knowledge, interpretation skills and clinical decision-making, and had a good capacity to discriminate. As a result, we concluded that implementing the test in the CTG education programme would be meaningful. However, additional items, including ones with a higher difficulty, need to be implemented for the test to serve as a high-stakes examination with the potential to have a serious impact on examinees.

We evaluated the CTG education programme based on Kirkpatrick’s four-level evaluation model. Our third study showed that course evaluations were positive and detected self-perceived learning, in addition to an increase in CTG knowledge and interpretation skills assessed by pre- and post-testing using the CTG MCQ test. In a cross-sectional study we found that doctors’ and midwives’ mean CTG test score was positively associated with working in large maternity units and with having less than 15 years of obstetric work experience.

The fourth study was a national historically controlled intervention study from 2009 to 2015 measuring the effect of the CTG education programme on birth hypoxia. The study included 331 282 intended vaginal deliveries of liveborn singletons in cephalic presentation and with gestational age ≥37 weeks. We compared the pre-implementation period (2009-2012) with the post-implementation period (2014-2015) and found no change in the risk of umbilical cord pH <7.00, five-minute Apgar score <7 or neonatal therapeutic hypothermia. The risk of emergency caesarean section did not change, whereas the risk of assisted vaginal delivery decreased significantly by 14%.

Dansk resumé (Danish summary)

Several studies conclude that errors in the management of fetal monitoring during labour are the main cause of hypoxic brain injury in newborns in patient compensation cases. To increase quality and patient safety in Danish maternity units and to reduce the number of children born with oxygen deficiency, a national obstetric quality project, Safe Deliveries (Sikre Fødsler), was introduced in 2012. As part of the project, a national standardised inter-professional cardiotocography (CTG) education programme was implemented, mandatory for all doctors and midwives in the country's maternity units. This PhD thesis, consisting of four studies, describes and discusses the development, validation and evaluation of the national CTG education programme.

In the first study we developed a prioritised list of 40 CTG learning objectives based on a Delphi questionnaire survey in which experienced obstetricians and midwives from all of the country's maternity units were invited to participate. We found that the learning objectives reflected knowledge, skills and attitudes concerning CTG monitoring, and that interpretation skills and clinical decision-making were generally rated as more relevant competences than knowledge of fetal physiology.

The learning objectives were used in our second study, a validation study concerning the development of a CTG multiple-choice question (MCQ) test. We found that the test measured what was intended, namely CTG knowledge, interpretation skills and clinical decision-making, and that the test had acceptable discriminatory ability. We concluded that the test could meaningfully be used as part of the CTG education programme, but that it was not comprehensive or difficult enough to constitute a certifying examination in which the consequences of the test may be of great importance to the individual test-taker.

We evaluated the CTG education programme based on Kirkpatrick’s evaluation model. In our third study we found that the participating doctors and midwives considered the CTG teaching rewarding and that the teaching led to increased CTG interpretation skills, measured by self-evaluation and by pre- and post-testing using the CTG MCQ test. In a national cross-sectional study we found that doctors' and midwives' CTG test scores were positively associated with working in large maternity units and with having less than 15 years of clinical obstetric work experience.

The fourth study was a national historically controlled intervention study from 2009 to 2015 in which we evaluated the effect of the CTG education programme on oxygen deficiency at birth. In total, 331 282 intended vaginal deliveries of liveborn singletons in cephalic presentation and with gestational age ≥37 weeks were included. We compared the pre-implementation period (2009-2012) with the post-implementation period (2014-2015) and found no change in the risk of umbilical cord pH <7.00, five-minute Apgar score <7 or neonatal cooling treatment. The risk of emergency caesarean section was unchanged, whereas the risk of instrumental vaginal delivery decreased significantly by 14%.

Introduction

Birth hypoxia is a dreaded obstetric complication associated with neonatal mortality and neurological impairment.1,2 A Danish study from 2008 analysing approved obstetric compensation claims concerning perinatal hypoxic brain injuries concluded that the majority of cases were related to insufficient management of cardiotocography (CTG) during labour.3 To decrease the incidence of birth hypoxia, a national obstetric quality project, Safe Deliveries, was initiated in Denmark in 2012.4 As part of the project a standardised mandatory inter-professional CTG education programme was developed and implemented nationally. This PhD thesis describes and discusses the development and validation of the CTG education programme and its effect on the CTG interpretation skills of doctors and midwives and on the incidence of birth hypoxia.

Background

Cardiotocography

Definition

CTG is a fetal monitoring method widely used in developed countries.5,6 It measures the fetal heart rate (FHR) and the uterine contractions, either externally using an ultrasound transducer and a tocodynamometer or internally using a fetal electrode and an intrauterine pressure sensor.7 The aim of the surveillance method is the timely identification of fetuses with inadequate oxygenation, allowing appropriate intervention before irreversible injury occurs, while avoiding unnecessary interventions when fetal oxygenation is sufficient.8 A CTG recording is interpreted by evaluating baseline (mean level of the FHR), variability (fluctuations in the baseline FHR), accelerations (transient increase in FHR), decelerations (transient slowing of FHR) and contractions (Fig. 1).9

Fig. 1. Normal CTG recording without decelerations, with baseline, variability, accelerations and contractions labelled. (Source: Safe Deliveries’ CTG course)

The classification of CTG and the subsequent clinical actions based on a CTG interpretation have been discussed since the development of the technique. In 1987 and in 2015 the International Federation of Gynecology and Obstetrics (FIGO) published international consensus guidelines on CTG classification, but various countries and organisations have independent classification systems.10-14 The Danish CTG classification is based on the FIGO and Neoventa CTG classification systems (Fig. 2).10,15

CTG classification* (intrapartum). NB! Max. 5 contractions per 10 minutes.

Normal CTG
• Baseline: 110-150 bpm
• Variability: 5-25 bpm
• Accelerations: >15 bpm and >15 sec
• Decelerations (>15 bpm and >15 sec): uniform, early; variable uncomplicated normal (<60 sec and loss of <60 beats)
• Action: CTG on usual indication

Intermediary CTG (one intermediary factor)
• Baseline: 100-110 bpm; 150-170 bpm; short bradycardia: 80-100 bpm in 3-10 min or <80 bpm in 2-3 min
• Variability: >25 bpm; <5 bpm and no accelerations in 40-60 min
• Decelerations: variable uncomplicated intermediary (<60 sec and loss of >60 beats)
• Action: continue CTG, consider second opinion

2 or more intermediary factors = abnormal CTG

Abnormal CTG (one abnormal factor)
• Baseline: 150-170 bpm and decreased variability with no acceleration; >170 bpm; persistent bradycardia: 80-100 bpm >10 min or <80 bpm >3 min
• Variability: <5 bpm >60 min; sinusoidal pattern
• Decelerations: variable complicated (>60 sec); uniform, late
• Action: second opinion required, most often indication for scalp blood sample or delivery

Preterminal CTG
• Total lack of variability with or without decelerations or bradycardia
• Action: most often indication for immediate delivery

*Antepartum CTG should generally not display decelerations

Fig. 2. The Danish CTG classification system
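The combination rule in Fig. 2 (one intermediary factor gives an intermediary CTG, two or more intermediary factors or one abnormal factor give an abnormal CTG) can be sketched in a few lines of Python. This is only an illustration of the rule's logic and not a clinical tool. The names `Grade` and `overall_class` are hypothetical, the grading of each individual feature (baseline, variability, decelerations) is assumed to have been done already by the reader of the trace, and letting a preterminal pattern override all other grades is an assumption rather than something Fig. 2 states.

```python
from enum import IntEnum
from typing import Sequence


class Grade(IntEnum):
    """Hypothetical per-feature grades, ordered by severity."""
    NORMAL = 0
    INTERMEDIARY = 1
    ABNORMAL = 2
    PRETERMINAL = 3


def overall_class(feature_grades: Sequence[Grade]) -> Grade:
    """Combine already-graded CTG features into one overall class."""
    if not feature_grades:
        raise ValueError("at least one graded feature is required")
    # Assumption: a preterminal pattern overrides every other grade.
    if Grade.PRETERMINAL in feature_grades:
        return Grade.PRETERMINAL
    # One abnormal factor is enough for an abnormal CTG.
    if Grade.ABNORMAL in feature_grades:
        return Grade.ABNORMAL
    intermediary = sum(1 for g in feature_grades if g == Grade.INTERMEDIARY)
    # Two or more intermediary factors = abnormal CTG.
    if intermediary >= 2:
        return Grade.ABNORMAL
    if intermediary == 1:
        return Grade.INTERMEDIARY
    return Grade.NORMAL


# Example: intermediary baseline and intermediary variability, normal decelerations
print(overall_class([Grade.INTERMEDIARY, Grade.INTERMEDIARY, Grade.NORMAL]).name)
```

Keeping the per-feature grading separate from the combination step mirrors the layout of Fig. 2, where each column is assessed on its own before the overall class is assigned.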

Historical aspect

Auscultation of FHR was first described in the medical literature in 1818 and involved using either a stethoscope or simply an ear on the maternal abdomen.16 During the 19th century FHR auscultation was used to confirm pregnancy, assess fetal viability and diagnose multigravidas.17 At the same time theories arose on the association between changing FHR and fetal hypoxia, in addition to the suggestion that a changing FHR was an indication for assisted vaginal delivery.

In 1906 the first fetal electrocardiogram (Fig. 3) was recorded and from the mid-twentieth century the electronic measuring of the FHR developed significantly, making it possible to measure the FHR continuously and in combination with uterine contractions.18,19 The potential of the technique was met with excitement, and it was initially deemed to be an absolutely trustworthy and precise method for identifying fetal hypoxia before or during labour.19,20

Fig. 3. First recording of a fetal electrocardiogram, 1906.18

At that time cerebral palsy was believed to be caused mainly by intrapartum hypoxia, thus the possibility of diagnosis and treatment of fetal hypoxia before or during labour was expected to greatly reduce the incidence of cerebral palsy.17

Animal and fetal studies performed mainly from the 1950s to the 1970s constitute the prerequisites for the current CTG interpretation and classification.16,17 FHR is believed to be modulated by the fetal autonomic nervous system, with a constant sympathetic and parasympathetic influence on the fetal heart.21,22 Changes in fetal blood pressure and oxygenation affect the autonomic nervous system and the fetal myocardium and consequently modify the FHR, which therefore provides information on the wellbeing of the fetus.17 Specific FHR patterns are found to be associated with hypoxic and non-hypoxic fetuses respectively.23,24 Fetal baroreceptors, chemoreceptors, vagal and adrenergic responses constitute the underlying physiological aetiology for the changes in FHR.17 It is beyond the scope of this thesis to describe these mechanisms in detail. Other factors such as administration of maternal medication, maternal circulation, placental and fetal anomalies, fetal anaemia and fetal arousal state can also affect FHR due to an effect on the fetal autonomic nervous system.

CTG was rapidly implemented into clinical practice in the developed world in the late 1960s.7 In 1977 CTG was applied in 54% of deliveries in the US, and by 2004 its use had increased to more than 80%.5,25

Effect of cardiotocography

Initial non-randomised studies concluded that CTG had a positive impact, based on a temporally associated decrease in intrapartum and neonatal deaths.26 The studies were characterised by heterogeneity in designs, patient populations and CTG equipment. The subsequent randomised controlled trials, which compared CTG with intermittent auscultation, called the positive impact of CTG on fetal wellbeing into question.19 Later meta-analyses found no decrease in perinatal mortality, low Apgar score, low umbilical cord pH or cerebral palsy. They concluded that the only neonatal outcome affected by CTG monitoring is the incidence of neonatal seizures, which decreases, while the incidence of caesarean sections and assisted vaginal deliveries increases.27 No studies exist comparing CTG to no fetal monitoring.

The diagnostic features of CTG were also criticised early on. Too many operative deliveries were performed on non-hypoxic infants and it was suggested that CTG be used as a warning or screening test and not as a precise diagnostic tool for fetal hypoxia.19,28 A key aspect of evaluating a diagnostic or screening test is identifying how sensitive (ability to detect sick individuals) and specific (ability to detect healthy individuals) the test is, as well as knowing how well the test results predict presence of disease (positive predictive value) or absence of disease (negative predictive value) (Fig. 4).29

Cardiotocography     Birth hypoxia present     Birth hypoxia absent      Predictive value
Positive             A (true positive)         B (false positive)        Positive predictive value: A / (A+B)
Negative             C (false negative)        D (true negative)         Negative predictive value: D / (C+D)
                     Sensitivity: A / (A+C)    Specificity: D / (B+D)

Fig. 4. Diagram of test sensitivity, specificity and positive and negative predictive values
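As a minimal sketch of the four measures in Fig. 4, the following Python snippet computes them from the cell counts A to D. The class name `TwoByTwo` is hypothetical and the counts are invented purely for illustration; they are not data from this thesis. They are chosen so that the outcome is rare and false positives are common, a pattern in which sensitivity and negative predictive value stay high while the positive predictive value is low.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts laid out as in Fig. 4 (CTG result versus birth hypoxia)."""
    a: int  # true positives: positive CTG, hypoxia present
    b: int  # false positives: positive CTG, hypoxia absent
    c: int  # false negatives: negative CTG, hypoxia present
    d: int  # true negatives: negative CTG, hypoxia absent

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        return self.d / (self.c + self.d)


# Hypothetical counts for illustration only (not results from the thesis)
example = TwoByTwo(a=9, b=291, c=1, d=699)
print(f"sensitivity = {example.sensitivity():.3f}")  # 0.900
print(f"specificity = {example.specificity():.3f}")  # 0.706
print(f"PPV         = {example.ppv():.3f}")          # 0.030
print(f"NPV         = {example.npv():.3f}")          # 0.999
```

With these invented counts, 9 of 10 hypoxic infants are detected, yet only 9 of the 300 positive tests correspond to a hypoxic infant, because the false positives (B) far outnumber the true positives (A).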

It is now commonly recognised that CTG is a sensitive indicator of fetal hypoxia but is limited by a low specificity and low positive predictive value.11 This is due to a high rate of false positive CTG patterns, which means that a pathological CTG pattern in many cases will precede the birth of a non-hypoxic infant. The rarity of false negative CTG patterns entails a high negative predictive value. Due to the low specificity of CTG, adjunctive methods such as scalp blood sampling and ST waveform analysis of the fetal electrocardiogram (STAN) are used to increase the specificity and avoid unnecessary operative deliveries.

Other aspects that challenge the role of CTG are the considerable intra- and interobserver variation in interpretation, the low reproducibility in clinical management and the possible influence that knowledge of neonatal outcomes has on CTG interpretation, making it less reliable.30-33 All these issues explain the ongoing discussion concerning CTG, which despite differing opinions and evidence has been part of daily obstetric practice for the last 50 years. Its advocates argue that the randomised controlled studies are underpowered and cannot be compared to current practice, and that the effect of CTG is limited by the human factor.11,34,35 Its opponents believe the technique was prematurely integrated into routine clinical practice, some suggesting that the method be abandoned in favour of beginning anew with alternative fetal surveillance systems.36,37

Obstetric compensation claims

Studies from several different countries analysing obstetric compensation claims and substandard care concerning birth hypoxia and hypoxic brain injuries all find an association with errors in the management of CTG monitoring.3,38-41 The errors comprise omission of CTG when indicated, lack of use of adjunctive fetal monitoring methods, CTG misinterpretation, delayed response despite abnormal CTG, failure to measure contractions and prolonged decision-to-delivery time. Most of the studies find inadequate fetal monitoring to be the most important factor leading to compensation.3,39,40

The studies all emphasise the importance of CTG education and recommend regular CTG training. In addition, the risk-reduction recommendations of the Joint Commission on Accreditation of Healthcare Organizations point out fetal monitoring education of healthcare professionals as a means of preventing infant death and injury during delivery.42 Similarly, the Danish Health Authority recommends regular CTG education for doctors and midwives.43

A 2011 systematic review evaluating the effect of CTG education found that it led to an increase in CTG knowledge, interpretation skills and improved quality of care.44 A lack of validated assessment methods was identified and only two out of the 20 studies examined the clinical effect of CTG education. The results from these studies were contradictory.

The CTG education programme

Safe Deliveries is a national obstetric quality project introduced in 2012 to increase safety and quality of care in all Danish maternity units with the specific aim of reducing the incidence of birth hypoxia.4

A standardised inter-professional national CTG education programme was a part of the project, which was funded by Danish Regions and initiated by the Danish Society of Obstetrics and Gynaecology, the Danish Association of Midwives, the Danish Paediatric Society, the Danish Society for Patient Safety and the Patient Compensation Association.

In addition to CTG education and to ensure safe work routines, Safe Deliveries comprised three checklists (admission/time out, infusion, vacuum delivery) to be implemented in all maternity units, patient safety learning seminars and monthly telephone conferences for designated inter-professional teams from each maternity unit. This thesis does not cover this part of the Safe Deliveries project.

The CTG education programme consisted of an e-learning programme and a one-day course with a written assessment. The content of the education programme addressed fetal physiology, CTG interpretation and classification and clinical decision-making. Completion of the e-learning programme was a prerequisite for attending the one-day course. Available online, the e-learning programme took approximately four hours to complete and could be taken at or outside the hospital as of June 2013. The programme contained theoretical information and involved taking a test on 20 interactive cases and passing a test covering selected elements from the entire curriculum. The interactive cases were based on authentic CTG traces and required making suggestions on CTG interpretation, classification and clinical actions. Feedback was provided on both correct and incorrect answers.

The courses requiring attendance began in September 2013 and were conducted at five different locations in the various regions of Denmark. Safe Deliveries provided funding for the facilities, catering and the instructors’ expenses, while the individual maternity units covered the cost of giving staff the day off to attend the course. Twenty experienced obstetricians and midwives recruited from all of Denmark made up the teaching team, with one of each profession represented at every course. Each course had approximately 40 participants, a deliberate mixture of midwives and doctors. The courses included lectures, plenary discussions, small group teaching and a written assessment. The teaching material was standardised and all instructors attended a train-the-trainer course before the courses commenced.

Fig. 5 illustrates the timeline for the CTG education programme and the number of courses and participants.

Timeline: e-learning available from June 2013; courses held in 2013 (Sept-Dec), 2014 (Mar-Apr) and 2015 (Apr); 53 courses (n=1801), 5 courses (n=106) and 4 courses (n=187).

Fig. 5. CTG education programme timeline and number of courses and participants (n)

Development of an education programme

Curriculum development

A curriculum is defined as a planned educational experience.45 Kern describes a six-step approach to curriculum development for medical education and, as early as the 1980s, Harden specified ten questions to ask when planning a course.45,46 Both approaches emphasise a needs assessment for the educational intervention, defining the learning objectives and deciding on the teaching methods, as well as the planning of the implementation and evaluation of the educational intervention. Following Kern’s six steps, we describe below the choices and activities of our own work and of the Safe Deliveries project:

Step 1. Problem identification and general needs assessment, where the needs are reflected in “the difference between the ideal approach and the current approach”.45 The problem identified in the current case was avoidable hypoxic brain injury in newborns. The 2008 Danish study on hypoxic injury compensation claims indicated a need for improvement in the CTG management and interpretation skills of midwives and doctors.3

Step 2. Targeted needs assessment that addresses the needs of the specific target group and its learning environment. Fetal monitoring with CTG is available in all maternity units in Denmark and prior to implementation of the national education programme, local and regional CTG teaching took place. An overview of the extent, frequency, form or content of this teaching is not available. Safe Deliveries identified the targeted need to be a mandatory, national standardised CTG education programme for midwives and doctors responsible for labouring women.

Step 3. Defining the goals and objectives. We developed CTG learning objectives based on a national questionnaire survey (Study I). To promote joint ownership of the education programme and to identify possible differences across regions, we invited midwives and obstetricians from every maternity unit in Denmark.

Step 4. Decision on educational strategies. Safe Deliveries predefined the CTG education programme to include an e-learning programme and a one-day course with a written assessment.

Step 5. Implementation planning. Responsibility was delegated for the various parts of the CTG education programme, the e-learning programme was made available for all maternity units and the CTG course instructors were introduced to the teaching tasks. Course materials were distributed, course locations booked and a booking system for the courses was established.

Step 6. Evaluation and feedback that reflect both individual assessment and programme evaluation. An individual assessment (Study II) was integrated into the CTG education programme and this PhD thesis evaluates the programme (Study III and IV).

Although divided into six separate steps, curriculum development, according to Kern, is an ever-evolving dynamic, interactive and ongoing process.

Assessment and validity

Assessment is defined as “any systematic method of obtaining information from tests and other sources, used to draw inferences about characteristics of people, objects, or programs”.47 In the current case the assessment was intended to measure CTG knowledge, interpretation skills and clinical decision-making. Integrating assessment in an education programme increases retention of the acquired learning and motivates trainees to learn.48 As a result, from a learning perspective, integrating an assessment into a curriculum is a reasonable endeavour. In healthcare professional education four major assessment methods are described: written tests, observation of clinical performance, performance tests and other assessments such as oral examinations or portfolios.49 We chose a written test as the assessment method and multiple-choice questions (MCQ) as the format. MCQ testing is suitable for large groups and the scoring is time saving. The format is known for its high reliability and an MCQ test can test more than factual knowledge, if well designed.50,51

Validity is considered the single most important aspect in terms of assessment and all assessments require evidence of validity.49 Validity refers to the degree to which evidence and theory support the interpretations of test scores.47 Thus, validity can be regarded as an argument for the intended interpretations.

Validity is not a fixed quantity but always a matter of degree, and it is not a property of the instrument but of the interpretations based on the instrument’s score.52 The greater the impact of the test results, the higher the requirements must be for the evidence to support the validity.49 Reliability is a necessary component of validity that refers to the reproducibility of the scores of the assessment.52 The concept of validity has changed over time, with contemporary theory representing a unitary concept based on the theories of Messick.53 Construct validity is the overall term and the Standards for Educational and Psychological Testing suggests the following five sources of evidence from which information should be systematically collected and documented to support or refute the proposed interpretations: content, response process, internal structure, relations to other variables and consequences.47 The current CTG MCQ test was developed based on this framework.

Evaluation of an education programme

Kirkpatrick’s evaluation model

The main focus in educational programmes often concerns the development and implementation of the intervention, with less attention given to programme evaluation.54 Developing an educational intervention, however, involves the planning of an evaluation of the intervention, as described in the sixth step of Kern’s curriculum development approach.45 Different evaluation frameworks exist, some objectives-oriented and others process- and participant-oriented, each of which has its own strengths and limitations.54,55 We based the evaluation of the CTG education programme on the theoretical framework of Kirkpatrick’s four-level evaluation model (Fig. 6), which follows the objectives-oriented approach.56 The model measures the effect of the educational intervention at four ascending levels. The two lower levels measure the participants’ evaluation of the programme (Reaction) and their acquisition of knowledge or skills due to the education (Learning). In levels three and four, the transfer of knowledge and skills (Behaviour), as well as the effect on organisational level or patient outcome (Results), is evaluated.

• Level 4, Results: effect on workplace, e.g. patient outcome
• Level 3, Behaviour: transfer of learning to workplace
• Level 2, Learning: knowledge or skills acquired
• Level 1, Reaction: participant satisfaction

Fig. 6. Kirkpatrick’s evaluation model

Course evaluation and self-perceived learning (Reaction) were measured in a participant questionnaire. The acquired CTG knowledge and skills (Learning) were measured by pre- and post-testing. The impact of the CTG education programme on birth hypoxia (Results) was measured in a national historically controlled intervention study. We did not measure a possible effect on level 3 (Behaviour).

Birth hypoxia outcomes

We measured birth hypoxia by umbilical cord pH <7.00, five-minute Apgar score <7 and neonatal therapeutic hypothermia. We considered these measures to be proxies for potential risk of long-term neurological impairments such as cerebral palsy, which is not usually diagnosed until well after the neonatal period.57

Umbilical cord blood analysis is currently the only way of objectively measuring fetal hypoxia.8 Low pH values are associated with neonatal mortality, encephalopathy and cerebral palsy and are considered an important outcome measure.2 The threshold of a low pH value is disputable and different values have been suggested.1,2 We chose pH <7.00 based on the international consensus criteria to identify a severe acute hypoxic event as a potential cause for cerebral palsy.58

The Apgar score reflects the pulmonary, cardiovascular and neurological functions of the newborn. A low five-minute Apgar score is associated with long-term adverse neurological outcome and neonatal death.59,60 The outcome is limited by being a subjective measure, and a low Apgar score can also be caused by non-hypoxic events, such as prematurity, infection, meconium aspiration and maternally administered medication.8

In Denmark the inclusion criteria for neonatal therapeutic hypothermia are ten-minute Apgar score <5 or umbilical cord pH <7.00 or standard base excess ≤-16 and gestational age ≥36 weeks, encephalopathy, abnormal electroencephalogram and initiation of treatment within 5.5 hours after birth.61 Thus, therapeutic hypothermia is a good marker for severe birth hypoxia. The outcome is limited, however, by the relatively recent implementation of the treatment in Danish paediatric care.

Aims

The overall aim of this PhD thesis was to develop, validate and evaluate a national standardised CTG education programme for Danish doctors and midwives. We hypothesised that the implementation of the CTG education programme in Denmark would lead to a subsequent decrease in incidence of birth hypoxia.

Study I – Curriculum development The aim of this study was to define learning objectives for the CTG education programme based on a national Delphi survey.

Study II – Test development The aim of this study was to develop a CTG MCQ test and examine whether the test measured CTG knowledge, interpretation skills and clinical decision-making.

Study III – CTG education: effect on knowledge and skills The aim of this study was to examine the effect of the national CTG education programme on CTG knowledge and interpretation skills and to examine a possible association with size of maternity unit, years of obstetric work experience and healthcare professional background.

Study IV – CTG education: effect on birth hypoxia The aim of this study was to examine the effect of the national CTG education programme on the incidence of umbilical cord pH <7.00, five-minute Apgar score <7 and neonatal therapeutic hypothermia.

Research questions

Study I Which CTG competences are midwives and doctors with responsibility for labouring women required to have?

Study II To what extent does a newly developed CTG MCQ test measure CTG knowledge, interpretation skills and clinical decision-making?

Study III Does a one-day CTG course increase the CTG knowledge and interpretation skills of midwives and doctors? How are CTG interpretation skills associated with size of maternity unit, years of obstetric work experience and professional background?

Study IV Does a CTG education programme decrease the incidence of birth hypoxia? Are there unwanted effects associated with the implementation of a CTG education programme?

Methods and results

Study I – Curriculum development

Method (Fig. 7) We conducted a three-round Delphi survey to obtain a national consensus on CTG learning objectives. A midwife and an obstetrician from each maternity unit in Denmark, each with CTG teaching experience and more than five years of clinical work, were invited to participate. Based on national and international guidelines we decided on the following six topics: fetal physiology, CTG equipment, indication, interpretation, management and communication.12-14,62,63 We subsequently sent an e-mail questionnaire to participants asking them to list one to five learning objectives within each of those topics. Their responses were analysed and condensed using a directed approach to content analysis and the phrasing of the objectives was modified with reference to Bloom’s taxonomy.64,65 In subsequent rounds, the participants commented on and rated the objectives based on a five-point relevance scale.

Delphi panel members are given as number of participants (number of maternity units).

Panel appointment: Delphi participants appointed by head of department or chief midwife from each Danish maternity unit. Panel members: 42 (21)

Preparation: Stating six topics important when using CTG, founded on national and international guidelines. E-mail to panel members with the first Delphi round questionnaire.

Delphi round 1: Listing 1-5 learning objectives within each predefined topic. Panel members: 31 (20)
Analyses: Condensing data based on a directed approach to content analysis. Developing objectives based on Bloom’s taxonomy. E-mail to panel members with the objectives.

Delphi round 2: Rating the relevance of each objective on a five-point scale. Suggesting new objectives and commenting on existing objectives. Panel members: 29 (19)
Analyses: Assessing ratings of objectives by means and distribution of ratings in percent for each objective. Revision of objectives. E-mail to panel members with the revised objectives including the ratings.

Delphi round 3: Re-rating the relevance of each objective. Panel members: 26 (18)
Analyses: Developing a prioritised list of objectives based on mean values.

Fig. 7. Study I; Method, participants and analyses

The rating was measured by mean values and the variance depicted as the distribution of ratings as a percentage. Consensus was predefined as objectives with a mean rating value of ≥3. The outcome was a prioritised list of CTG learning objectives.
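The rating summary described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study: the function name `summarise_ratings` and the scale labels as dictionary keys are assumptions, and the example ratings are reconstructed from one row of Table 1 (objective ranked 37) on the assumption that its counts belong to the scale points 2 to 5, which is the assignment that reproduces the reported mean.

```python
from collections import Counter

# Five-point relevance scale used in the Delphi rounds.
SCALE = {
    1: "Not relevant",
    2: "Less relevant",
    3: "Relevant",
    4: "Very relevant",
    5: "Extremely relevant",
}


def summarise_ratings(ratings, consensus_threshold=3.0):
    """Return mean rating, distribution (count, percent) and consensus flag."""
    if not ratings:
        raise ValueError("at least one rating is required")
    n = len(ratings)
    counts = Counter(ratings)
    mean = sum(ratings) / n
    distribution = {
        label: (counts.get(score, 0), round(100 * counts.get(score, 0) / n, 1))
        for score, label in SCALE.items()
    }
    return {
        "mean": round(mean, 2),
        "distribution": distribution,
        # Consensus is checked on the unrounded mean (predefined as >= 3).
        "consensus": mean >= consensus_threshold,
    }


# Assumed reconstruction of one Table 1 row with 26 raters:
# 5 "less relevant", 9 "relevant", 9 "very relevant", 3 "extremely relevant".
ratings = [2] * 5 + [3] * 9 + [4] * 9 + [5] * 3
summary = summarise_ratings(ratings)
print(summary["mean"], summary["consensus"])
```

With these assumed inputs the mean rounds to 3.38, so the objective meets the predefined consensus threshold, whereas an objective with a mean below three would be flagged for exclusion.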

Results Fig. 7 lists the number of participants. The response rates in the first, second and third Delphi rounds were 74%, 69% and 62%, respectively. The initial 536 responses received in the first Delphi round were condensed to 41 learning objectives. Comments from the participants in the second Delphi round entailed minor revisions of the objectives. Relevance rating of each objective in the third Delphi round resulted in a prioritised list of 40 learning objectives. The highest-ranked objectives emphasised CTG interpretation and clinical decision-making, while the lowest-ranked objectives centred on fetal physiology. Table 1 presents the five objectives that received a rating of five (extremely relevant) by all participants and the seven objectives that received a mean rating of less than four (very relevant).

Ranking | Learning objective | Mean rating | Distribution of ratings, n (percent*): Not relevant | Less relevant | Relevant | Very relevant | Extremely relevant

Rated extremely relevant by all participants in the third Delphi round
1 | Define and evaluate baseline, variability, accelerations and decelerations | 5.00 | - | - | - | - | 26 (100)
2 | Classify a CTG as normal, intermediate, pathological or preterminal | 5.00 | - | - | - | - | 26 (100)
3 | Identify CTG patterns that require immediate delivery | 5.00 | - | - | - | - | 26 (100)
4 | Evaluate the pattern of contractions, comprising frequency, duration and interval and describe the maximum number of contractions accepted per 10 minutes | 5.00 | - | - | - | - | 26 (100)
5 | Identify CTG patterns in which supplying fetal surveillance is needed | 5.00 | - | - | - | - | 26 (100)

Rated below very relevant in the third Delphi round
35 | Describe the function of and explain how the function can be affected by labor | 3.77 | - | 1 (3.8) | 8 (30.8) | 13 (50.0) | 4 (15.4)
36 | Explain the ability of the CTG to identify whether or not a fetus is affected by hypoxia (sensitivity and specificity) | 3.77 | - | - | 12 (46.2) | 8 (30.8) | 6 (23.1)
37 | Describe the differences between aerobic and anaerobic metabolism | 3.38 | - | 5 (19.2) | 9 (34.6) | 9 (34.6) | 3 (11.5)
38 | Obtain informed consent to CTG monitoring | 3.35 | - | 2 (7.7) | 14 (53.8) | 9 (34.6) | 1 (3.8)
39 | Explain the influence of the autonomic nervous systems on the fetal heart rate | 3.15 | - | 5 (19.2) | 12 (46.2) | 9 (34.6) | -
40 | Describe the circulation of the placenta and the fetus | 3.15 | - | 4 (15.4) | 14 (53.8) | 8 (30.8) | -
41 | Describe how fetal hemoglobin differs from adult hemoglobin (excluded due to mean rating below three) | 2.19 | 4 (15.4) | 13 (50.0) | 9 (34.6) | - | -

Table 1. CTG learning objectives with the five highest and the seven lowest ranked objectives

Study II – Test development

Method (Fig. 8) We used a five-step unitary validity model to develop a CTG MCQ test.47,53 Participants were midwives, specialists and trainees within obstetrics and gynaecology representing all five regions of Denmark, and medical and midwifery students. The test development process was preceded by a national consensus on learning objectives and involved decision on blueprint, item writing based on the one-best-answer format, pilot testing, sensitivity analyses, standard setting using the contrasting groups method, reliability estimation with Cronbach’s alpha and evaluation of psychometric properties using item response theory models in the form of Rasch analyses and differential item functioning (DIF).
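To illustrate the standard-setting step mentioned above, the following sketch shows one simple way of operationalising a contrasting groups approach: the cut score is chosen so that the fewest participants are misclassified when a non-competent and a competent group are compared. This is an assumption-laden illustration, not the procedure used in Study II; the function name `contrasting_groups_cut` and all scores are hypothetical.

```python
def contrasting_groups_cut(non_competent, competent):
    """Return (cut score, misclassifications) minimising total errors.

    A participant passes when the score is >= the cut score. Errors are
    non-competent participants who pass plus competent participants who fail.
    If several cut scores tie, the lowest one is returned.
    """
    non_competent = list(non_competent)
    competent = list(competent)
    all_scores = non_competent + competent
    best_cut, best_errors = None, None
    for cut in range(min(all_scores), max(all_scores) + 2):
        false_pass = sum(1 for s in non_competent if s >= cut)
        false_fail = sum(1 for s in competent if s < cut)
        errors = false_pass + false_fail
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical test scores (number of correct answers out of 30).
students = [15, 17, 18, 20, 21, 25]
specialists = [22, 24, 26, 27, 28, 29]
cut, errors = contrasting_groups_cut(students, specialists)
print(cut, errors)
```

For these hypothetical scores the sketch returns a cut score of 22 with one misclassified participant. Other variants of the method locate the intersection of the two fitted score distributions instead of counting errors directly.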

1. Content
Developing learning objectives based on a national Delphi survey (16)
Deciding on a test blueprint with five domains
Developing 50 one-best-answer multiple-choice questions

2. Response process
Proofreading 1: relevance, language, spelling, academic content (two midwives and three obstetricians)
Proofreading 2: format, construction (one obstetrician with test development experience [19])
Pilot testing
Selecting 30 items to constitute the test
Proofreaders (n=6)

Pilot test participants (n=118)
32 specialists in obstetrics and gynecology (20 obstetricians, 12 gynecologists)
25 residents in obstetrics and gynecology (13 first-year, 12 second-to-fifth-year)
38 midwives
8 medical students
15 midwifery students
Recruited from six maternity units, representing all five regions in Denmark and both small and large-sized units.

3. Relations to other variables
Sensitivity analysis: comparing test responses from groups with expected differentiated level of CTG knowledge and clinical competences; obstetricians, first-year residents, medical students, and midwives and midwifery students.

4. Consequences
Establishing a passing score using the contrasting groups method: detecting the discriminating point between a competent and a non-competent group; obstetricians, and medical and midwifery students.
Implementing the test at the national CTG education programme

5. Internal structure
Evaluating the psychometric properties of the test: loglinear Rasch model, differential item functioning, Cronbach’s alpha

CTG course participants (n=1679)
269 specialists in obstetrics and gynecology
150 residents in obstetrics and gynecology
1260 midwives

Fig. 8. Study II. Method, participants and analyses

Results We initially developed 50 items for the CTG test. Based on the proofreading, the pilot testing and the time allotted to the test in the education programme, 30 items were selected for the final test. Basic test properties were examined on the responses from the pilot test participants (n=118). Cronbach’s alpha equalled 0.79, which is regarded as acceptable,49 and the sensitivity analyses indicated acceptable discriminating abilities (Fig. 9). The passing score was set at 25 correct answers. Psychometric properties were examined on the responses from the CTG course participants (n=1679). Loglinear Rasch analysis revealed a good fit for all items. DIF was disclosed in relation to profession and regions, which means the test cannot be used to measure differences between midwives and doctors or across regions. Many items displayed a ceiling effect and Cronbach’s alpha in this population was 0.63.
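For readers unfamiliar with the reliability coefficient reported above, the following sketch computes Cronbach’s alpha from a matrix of item scores using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). It is a minimal illustration with hypothetical data (five participants, four dichotomous items); the function name `cronbach_alpha` is an assumption and the study’s own responses are not reproduced here.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participant rows of item scores.

    Each row holds one participant's scores (here 1 = correct, 0 = incorrect)
    on the same k items.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across participants.
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the participants' total scores.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses, not study data.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

When most participants answer most items correctly, as with a ceiling effect, the variance of the total scores shrinks, which tends to lower alpha.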

Mean number of correct answers (scale 0 to 30), number of participants in parentheses:
Doctors: medical students (8) 16.3; first-year trainees (13) 23.9; obstetricians (20) 27.0
Midwives: midwifery students (15) 18.5; midwives (38) 26.0

Fig. 9. Test discriminating abilities. Responses to a 30-item CTG MCQ test among doctors and midwives, mean number of correct answers with 95% confidence interval

Study III – CTG education: effect on knowledge and interpretation skills

Method We used questionnaires, pre- and post-testing and a cross-sectional study design to examine participant evaluation and the learning effect of the CTG course (Kirkpatrick’s levels 1 and 2), as well as examining whether CTG knowledge and interpretation skills were associated with size of maternity unit, years of obstetric work experience and healthcare professional background. Midwives and specialists and trainees in obstetrics and gynaecology who attended the one-day CTG course were included in the study. The CTG course was preceded by the e-learning programme, which was also part of the national CTG education programme. At the beginning of the course the participants answered 10 of the 30 items on the MCQ test that they took at the end of the course (Fig. 10). Descriptive analyses were used to assess the course evaluation and the learning effect. The association between the mean test score from the 30-item test and work conditions was analysed using multivariable robust regression.

Sequence: CTG e-learning, then 10-item CTG test, then CTG one-day course, then 30-item CTG test

Fig. 10. Design of the CTG education programme

Results Fifty-three CTG courses were conducted in 2013 and 1671 (97%) of the eligible doctors and midwives participated in the study. Fig. 11 depicts the allocation of correct answers on the 30-item MCQ test. Ninety-five percent of the participants passed the test.

[Histogram: number of participants (y-axis, 0 to 600) by number of correct answers (x-axis, 15 to 30)]

Fig. 11. Allocation of correct answers in the 30-item CTG MCQ test among midwives and doctors

The CTG course was seen as rewarding by 89% of the participants and 75% agreed that they felt more confident with CTG interpretation and classification after the course (response rate 95%). When comparing the 10 CTG items presented before and after the CTG course, we found that among participants whose scores could increase (received 0 to 9 correct answers on the initial test) 84% increased their score, 11% maintained their score and 5% decreased their score.
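The pre- and post-test comparison above restricts the denominator to participants whose score could increase. A minimal sketch of that calculation is given below; the function name `score_changes` and the score pairs are hypothetical and serve only to show how participants with a full initial score are left out before the percentages are computed.

```python
def score_changes(pairs, max_score=10):
    """Percent of participants who increased, maintained or decreased.

    pairs holds (pre, post) scores on the same items. Participants who
    already had the maximum score before the course are excluded, because
    their score could not increase.
    """
    eligible = [(pre, post) for pre, post in pairs if pre < max_score]
    if not eligible:
        raise ValueError("no participant had room to improve")
    n = len(eligible)
    increased = sum(1 for pre, post in eligible if post > pre)
    maintained = sum(1 for pre, post in eligible if post == pre)
    decreased = n - increased - maintained
    return {
        "eligible": n,
        "increased": round(100 * increased / n, 1),
        "maintained": round(100 * maintained / n, 1),
        "decreased": round(100 * decreased / n, 1),
    }


# Hypothetical (pre, post) scores out of 10, not study data.
pairs = [(7, 9), (8, 8), (9, 8), (10, 10), (5, 9)]
changes = score_changes(pairs)
print(changes)
```

In this hypothetical example one participant is excluded for scoring 10 at baseline, and the remaining four split into 50% increased, 25% maintained and 25% decreased.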

In the multivariable robust regression we found that doctors and midwives’ mean test score on the 30-item CTG MCQ test was positively associated with working in large maternity units and having less than 15 years of obstetric work experience. No differences were detected concerning healthcare professional background.

Study IV – CTG education: effect on birth hypoxia

Method In a historically controlled intervention study from 2009 to 2015 we examined the impact of the national mandatory inter-professional CTG education programme on the risk of birth hypoxia. Intended vaginal deliveries in all Danish hospitals with liveborn singletons in cephalic presentation and gestational age ≥37 weeks were included.

Data were retrieved from the Danish Medical Birth Register and the Danish National Patient Register. The study was divided into three periods: pre-implementation (2009-2012), implementation (2013) and post-implementation (2014-2015).

Using logistic regression we estimated odds ratios for cord pH <7.00, five-minute Apgar score <7 and therapeutic hypothermia with the pre-implementation period as the reference. Data were adjusted for potential maternal, neonatal and delivery-associated confounders. Missing data were accounted for by multiple imputation. To assess possible unintended effects of the education we compared the risk of emergency caesarean section and assisted vaginal delivery in the three study periods.

All deliveries 2009-2015: 402 645
Excluded, multiple deliveries: 8451 (2.1%)
Singleton deliveries: 394 194
Excluded, stillbirths: 1261 (0.3%)
Liveborn singleton deliveries: 392 933
Excluded, home deliveries: 1775 (0.4%)
Liveborn singleton deliveries, hospital: 391 158
Excluded, GA <37 or ≥44 weeks: 19 433 (4.8%)
Liveborn singleton deliveries, hospital, GA ≥37 weeks: 371 725
Excluded, non-cephalic: 13 804 (3.4%)
Liveborn singleton deliveries, hospital, GA ≥37 weeks, cephalic: 357 921
Excluded, planned caesarean deliveries: 26 639 (6.6%)
Liveborn singleton intended vaginal deliveries, hospital, GA ≥37 weeks, cephalic: 331 282

Fig. 12. Selection of study population

Results The study population consisted of 331 282 deliveries (Fig. 12). Fig. 13 shows the observed yearly incidences of the three primary outcomes. Table 2 presents odds ratios and 95% confidence intervals in the implementation and post-implementation period for the three primary and the two secondary outcomes.

[Line chart, based on complete cases: yearly incidence in percent (0.0 to 0.7) of pH <7.00, Apgar <7 and hypothermia, 2009 to 2015]

Fig. 13. Observed yearly incidences of pH <7.00, five-minute Apgar score <7 and therapeutic hypothermia.
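The crude odds ratios on observed data in Table 2 can be checked directly from the case counts. The sketch below is an illustration rather than the study’s analysis code: it computes an odds ratio from a 2×2 table with a Wald 95% confidence interval, which for a single binary exposure should agree with an unadjusted logistic regression. The function name `crude_odds_ratio` is an assumption; the counts are those reported in Table 2 for umbilical cord pH <7.00 in 2014-2015 versus 2009-2012.

```python
import math


def crude_odds_ratio(cases_period, n_period, cases_ref, n_ref, z=1.96):
    """Crude odds ratio versus a reference period with a Wald interval."""
    a, b = cases_period, n_period - cases_period  # cases, non-cases in period
    c, d = cases_ref, n_ref - cases_ref           # cases, non-cases in reference
    odds_ratio = (a / b) / (c / d)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts from Table 2: pH <7.00 among deliveries with a registered pH value.
or_, lower, upper = crude_odds_ratio(435, 88479, 700, 159508)
print(f"{or_:.2f} ({lower:.2f}-{upper:.2f})")  # prints 1.12 (0.99-1.26)
```

The result matches the crude odds ratio on observed data reported for the post-implementation period. The adjusted estimates in Table 2 additionally depend on the confounders and the multiple imputation, which this sketch does not attempt to reproduce.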

We did not detect a risk reduction in birth hypoxia after the implementation of the national CTG education programme. We detected a 14% decrease in the risk of assisted vaginal delivery and a transient increase in risk of emergency caesarean sections that ceased in the post-implementation period.

[Line chart: yearly incidence in percent (0 to 12) of emergency caesarean section and assisted vaginal delivery, 2009 to 2015]

Outcome and period | No. of observed cases / no. of deliveries | Missing data, n (%) | Crude OR, observed data (95% CI) | Crude OR, imputed data (95% CI) | Adjusted OR**, imputed data (95% CI) | p-value, adjusted

Primary outcomes

Umbilical cord pH <7.00
2009-2012 | 700 / 159 508* | 35 634 (18.3) | - | - | - | -
2013 | 185 / 42 308* | 2 211 (5.0) | 1.00 (0.85-1.17) | 1.00 (0.85-1.17) | 0.99 (0.84-1.16) | 0.90
2014-2015 | 435 / 88 479* | 3 142 (3.4) | 1.12 (0.99-1.26) | 1.12 (0.99-1.27) | 1.12 (1.00-1.26) | 0.05

Five-minute Apgar score <7
2009-2012 | 1 146 / 194 520* | 622 (0.3) | - | - | - | -
2013 | 254 / 44 373* | 146 (0.3) | 0.97 (0.85-1.11) | 0.97 (0.85-1.11) | 0.97 (0.84-1.11) | 0.62
2014-2015 | 529 / 91 254* | 367 (0.4) | 0.98 (0.89-1.09) | 0.98 (0.89-1.09) | 0.99 (0.90-1.10) | 0.92

Hypothermia treatment
2009-2012 | 111 / 195 142 | - | - | - | - | -
2013 | 30 / 44 519 | - | 1.19 (0.79-1.77) | Equal to analyses on observed data | 1.21 (0.80-1.81) | 0.36
2014-2015 | 67 / 91 621 | - | 1.29 (0.95-1.74) | Equal to analyses on observed data | 1.34 (0.99-1.82) | 0.06

Secondary outcomes

Emergency caesarean section
2009-2012 | 20 638 / 195 142 | - | - | - | - | -
2013 | 5 035 / 44 519 | - | 1.08 (1.04-1.11) | Equal to analyses on observed data | 1.05 (1.01-1.08) | 0.008
2014-2015 | 9 683 / 91 621 | - | 1.00 (0.97-1.03) | Equal to analyses on observed data | 0.98 (0.96-1.01) | 0.14

Assisted vaginal delivery
2009-2012 | 16 173 / 195 142 | - | - | - | - | -
2013 | 3 506 / 44 519 | - | 0.95 (0.91-0.98) | Equal to analyses on observed data | 0.91 (0.87-0.95) | <0.0001
2014-2015 | 6 879 / 91 621 | - | 0.90 (0.87-0.93) | Equal to analyses on observed data | 0.86 (0.84-0.89) | <0.0001

*The denominator equals the number of deliveries with a registered pH value or Apgar score respectively
**Analysis of primary outcomes adjusted for: Maternal age, BMI, smoking, parity, , hypertensive disorder, child sex, congenital malformations, gestational age, birth weight, induction, umbilical cord prolapse, , , shoulder dystocia
Analysis of secondary outcomes adjusted for: Maternal age, BMI, smoking, parity, diabetes, hypertensive disorder, child sex, congenital malformations, gestational age, birth weight, induction, umbilical cord prolapse

Table 2. Crude and adjusted odds ratios (OR) and 95% confidence interval (CI) in the implementation and post-implementation period for the three primary and the two secondary outcomes.

Ethics, Study I-IV

Informed consent Informed consent to participate was obtained in Study I, II and III. For Study I participants had to activate a link in the email for the Delphi questionnaire, while Study II and III required written consent prior to taking the CTG test.

Anonymity Data processing was conducted anonymously in all four studies. Participant information was not accessible when the data for Study I were processed. For Study II and III all participants were assigned a unique unidentifiable number, which was used during data processing. For Study IV the data were delivered with encrypted identification numbers.

Ethics approval The Regional Ethics Committee of the Capital Region of Denmark evaluated all of the studies and, according to Danish regulations, ethics approval was not required. The file numbers for the studies are: Study I (H-1-2012-FSP), Study II and III (H-1-2013-FSP-48) and Study IV (H-1-2013-FSP-9).

Data protection In Denmark, research projects managing personal data have to be authorised by the Danish Data Protection Agency if the data contain information on ethnicity, political, religious or philosophical orientation or health or sexual circumstances.66 Thus, Study IV required and received authorisation from the Data Protection Agency (file no.: 30-1341).

Funding This PhD project was funded by TrygFonden, Aase and Ejnar Danielsens Foundation, Oestifterne, Toemmerhandler Johannes Fog’s Foundation and the Department of Obstetrics and The Juliane Marie Centre at Rigshospitalet, University of Copenhagen, Denmark. None of the funders played a role in the study designs, data collections, data analyses or writing of the manuscripts.

Conflicts of interest Morten Hedegaard was a member of the advisory board of Safe Deliveries, which was a non-profit organisation. We have no other conflicts of interest to declare.

Discussion

The overall objective of this PhD thesis was to develop, validate and evaluate a national CTG education programme. In the following we will discuss and answer the research questions and subsequently discuss the three subjects presented in the background section: development of an education programme, evaluation of an education programme and fetal monitoring with CTG.

Study I and II In the first two studies we developed a prioritised list of CTG learning objectives based on national consensus and also developed a 30-item CTG MCQ test.

We found that the CTG competences required for midwives and doctors responsible for women in labour reflected a wide variety of knowledge, skills and attitudes concerning CTG monitoring and that interpretation skills and clinical decision-making were rated higher than knowledge of fetal physiology. The allocation of learning objectives reflects the fetal monitoring examination content of the National Certification Corporation;67 however, several sources highlight the importance of interpreting CTG in the context of fetal physiology and the overall clinical picture.14,68,69 Thus, a thorough understanding of fetal physiology seems an important foundation for CTG interpretation. The Delphi survey was a useful method for collating expert opinion into a consensus and the list of objectives was a helpful tool when deciding on content for the CTG course and MCQ test. In future, it would be relevant to include more stakeholders, such as junior doctors and midwives, gynaecologists and patients, to ensure a diverse and complete Delphi panel. Other Delphi studies have validated the panel decisions and integrated more stakeholders.70

For the CTG MCQ test we found that the unitary approach to validity by Messick provided a thorough and systematic approach to determining whether or not the test measured CTG knowledge, interpretation skills and clinical decision-making.47,56 The concept that the whole developmental process is part of the validation process and that evidence is collected from several different sources makes this a strong method, and both the test’s strengths and limitations are clearly depicted. We found that the test measured what it was intended to measure and considered it appropriate to integrate in the CTG course. However, more items, including ones of higher difficulty, need to be added for the test to serve as a high-stakes examination or certification.

For example, the fetal monitoring assessments of the National Certification Corporation and of the Royal Australian and New Zealand College of Obstetricians and Gynaecologists’ intrapartum Fetal Surveillance Programme contain 100 and 50 items, respectively.67,71

Study III and IV In the final two studies we evaluated the effect of the CTG education programme on Kirkpatrick levels 1, 2 and 4 and examined whether doctors and midwives’ CTG MCQ test scores were associated with work conditions.

We detected a positive effect on levels 1 and 2 and found that the mean number of correct answers in the CTG MCQ test was positively associated with working in a large maternity unit and having less than 15 years of obstetric work experience. We drew attention to a possible challenge in maintaining CTG skills in small maternity units and to a possible underrepresentation of CTG education among healthcare professionals with many years of work experience. A positive effect on the two lower Kirkpatrick levels has been detected previously.44 To our knowledge the differences we found concerning work conditions have not been examined in other CTG studies. The differences were small and the clinical implications are debatable; nevertheless, the study reflects conclusions drawn in other studies indicating poorer performance with increased years of work experience, as well as poorer neonatal outcomes and a higher incidence of approved obstetric claims in small maternity units compared to larger units.72-74 Whether our findings are applicable to the clinical setting we cannot answer, which is a limitation of this study.

Lastly, we evaluated the clinical effect of the national CTG education programme and did not detect a decrease in risk of umbilical cord pH <7.00, five-minute Apgar score <7 or use of neonatal therapeutic hypothermia. Other studies have explored the effect of CTG education on neonatal outcomes, most of them in the context of a patient safety programme that also involved other interventions, and most of them did not have any positive findings concerning umbilical cord pH or Apgar score.75-78 The Australian Fetal Surveillance Education Program saw a decrease in the incidence of hypoxic-ischaemic encephalopathy; however, changes in the coding might have affected the results.79 A 2006 British study evaluating the impact of CTG education and simulation training of six different obstetric emergencies found a reduction in five-minute Apgar score <7 and hypoxic-ischaemic encephalopathy.80

Thus, the positive effect of CTG education on neonatal outcomes is not conclusive.

An increase in caesarean sections has been found in other studies evaluating the effect of CTG education.75,80 In our study we found a transient increase in risk of caesarean section and a 14% reduction in risk of assisted deliveries; hence, we did not experience unintended consequences of CTG education with regard to procedure frequency. We consider avoidance of tachysystole, maternal haemodynamic optimisation and avoidance of maternal fever and infection as intrapartum interventions that could potentially decrease birth hypoxia, but are aware that an operative delivery could as well. Thus, we cannot rule out that an increase in operative deliveries could have led to a decrease in incidence of birth hypoxia. Ideally, we wanted to know whether operative deliveries were performed when fetal oxygenation was inadequate and avoided when fetal oxygenation was sufficient, in accordance with the aim of fetal surveillance. This information could unfortunately not be extracted from the national registers.

Development of an education programme

The initial steps of Kern’s curriculum development involve defining the problem and assessing the needs. In the current case, the needs analysis was limited to healthcare professionals’ CTG knowledge and skills, and we hypothesised that standardised CTG education would eliminate the gap between ideal and current behaviour. However, the insufficient CTG management described in the compensation claims might not merely be due to a lack of knowledge.

In its Sentinel Event Alert on infant death and injury during delivery, the Joint Commission described communication issues as the most frequently identified root cause.42 In addition to CTG education, they recommended team training, clinical drills and debriefings but also a review of organisational policies. In our case, identifying other aspects of the problem through a more extensive needs analysis might have led to a broader and perhaps more effective intervention. In addition to lack of CTG knowledge and skills, non-compliance with written guidelines, failure in obtaining senior medical help, delayed decision-to-delivery time and non-optimal mode of delivery are mentioned as reasons for substandard care in compensation claims concerning hypoxic brain injuries.3,39,40 These issues of substandard care are not addressed by CTG education alone. As Kern states: “one needs to keep in mind that an educational intervention by itself cannot solve all aspects of a complex problem”.45

In Kern’s third and fourth steps, goals and objectives are defined and educational strategies chosen. We found that the prioritised list of learning objectives developed in Study I was a useful tool when making decisions about the course and test content. Combining various teaching methods is known to enhance learning; thus, employing the combination of computer-based individual learning, classroom teaching, plenary discussions and small group case-based interactive learning was well founded.81 To achieve full compliance with the learning objectives, however, simulation, team and communication training should have been integrated into the curriculum. Logistics, facilities, economics, time and staff capacities are all components that must be considered when choosing teaching methods. Organising 62 CTG courses with more than 2000 midwives and doctors in attendance was a comprehensive task. Running courses of 40 participants each, with high-quality teaching by two teachers, is presumably a low-cost option for a national teaching intervention compared to what simulation and team-training courses would involve.

As recommended, the CTG education programme was deliberately conducted in an inter-professional context.17,82 This avoided the silo approach and instead built a uniform language for doctors and midwives on a national level, which has hopefully reduced the number of communication errors. The mandatory and national approach has presumably increased the possible transfer of knowledge and skills by addressing the issue of colleagues not speaking the same CTG language. This phenomenon is described after advanced life support training, where attendees report that the transfer of acquired skills was challenged in emergency situations in which not all team members had attended the training course and hence did not share the same approach to advanced life support.83

Concerning the fifth and sixth steps of Kern’s model, we do find it impressive that the CTG education programme was implemented with only a short delay and was completed by the majority of midwives and doctors at the Danish maternity units. Due to time constraints the education programme was not piloted, which was a limitation. A pilot would presumably have provided useful information on practicalities, barriers and resources, as well as knowledge of what worked and what did not. These experiences were instead gathered during the initial courses and minor revisions were made. Another limitation was the lack of feedback given to CTG course participants, who received an e-mail after the course with their test results but did not have the opportunity to discuss their results with the course instructors. Feedback offers an opportunity to correct mistakes, provides guidance and encourages learning, and should therefore be given more weight in the curriculum in future.81

Test development is a complex and time-consuming process but the importance of examining whether a test actually measures what it is intended to measure cannot be over-emphasised. An awareness of which competences a specific assessment is capable of measuring is essential. In Miller’s framework for assessing clinical competences, the written assessment operates on the two lower levels of competence assessment, “knows” and “knows how”.84 If the aim is to obtain information on how healthcare professionals perform in the clinical setting, “shows how” and “does”, other types of assessments, such as observational and performance tests, need to be integrated in the curriculum. Assessing competences is increasingly being named as an essential part of CTG education, with some recommending that completing a CTG education programme and passing a CTG test should be a prerequisite for working in a maternity unit.14,68,85,86

We concur with medical education literature recommendations that decisions of considerable consequence for individuals, such as restrictions on clinical work, should not be made based on one assessment method only.49 During the CTG courses we found that undergoing the MCQ test was a significant and, for some of the midwives and doctors, a stressful aspect of the education programme. Some of their anxiety was apparently due to the lack of clarity concerning the purpose of the test. Anxiety might affect performance and, prospectively, it is important to be very clear about the purpose of the test to ensure transparency and to decrease stress and anxiety.87

Evaluation of an education programme

We measured a positive effect on the two lower levels of Kirkpatrick’s model and found a reduction in assisted vaginal delivery on level 4. It is a limitation of this thesis that possible behaviour changes in the clinical setting (level 3), indicating transfer of CTG knowledge and skills to the workplace, were not examined. Chart reviews and perinatal audits focusing on the quality of intrapartum care before and after the CTG education programme could have illuminated this aspect, as could an updated analysis of obstetric compensation claims.

Other studies have described behaviour changes after obstetric safety interventions including CTG training, such as a decrease in obstetric compensation claims, increase in quality of intrapartum care and improvement in teamwork and safety climate.77,88-90

As we aimed to measure the learning effect of the CTG education programme and the impact on neonatal outcomes, Kirkpatrick’s evaluation model seemed appropriate. However, the model is criticised for presenting an oversimplified view of training effectiveness. Both the assumption of causality between the four levels and the assumption that each level is more important than the one below are being questioned.91 Additionally, focusing on specific outcomes promotes tunnel vision, and unexpected effects of an education intervention might not be identified.54 It is suggested “to move beyond asking whether a programme worked, to establishing how it worked, why it worked and what else happened”.92 In retrospect it would have been relevant to integrate both process- and participant-oriented evaluations to explore this.

Concerning birth hypoxia as an outcome, medical education literature cautions against focusing solely on patient outcomes, in that the skills acquired during a teaching intervention may be diluted in a complex clinical setting in which many other factors affect the patient outcome.93 Labour is a complex event and many aspects besides correct CTG monitoring may affect the wellbeing of the newborn. As stated by patient safety literature: “In a complex system, because of the deep and extended webs of interactions and interconnections, the action of any agent controls little but influences almost everything”.94 This dilution of effect could be one of the explanations for the current intervention’s lack of impact on birth hypoxia. Other comprehensive obstetric educational interventions have likewise not been able to decrease adverse neonatal outcomes.95,96 Educationally sensitive patient outcomes, such as performance and teamwork skills in the working environment, have been suggested as more appropriate measures that would be expected to lead to improved patient outcomes.97

An additional explanation for the lack of effect on birth hypoxia could be the rarity of the outcomes. In our study population the overall risks of cord pH <7.00 and five-minute Apgar score <7 were 0.45% and 0.58%, respectively. Other studies with comparable populations find risks of 2.2% and 0.86%, respectively.1,80 Perhaps the rarity makes a further decrease of birth hypoxia in Denmark difficult to achieve.
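The consequence of this rarity can be illustrated with a rough sample size calculation. The sketch below is not part of the analyses in this thesis; it uses the standard normal-approximation formula for comparing two proportions and takes the baseline risks reported above. The 25% relative reduction, the 5% significance level and the 80% power are hypothetical assumptions chosen only for illustration, and the function name is likewise hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_before, p_after, alpha=0.05, power=0.80):
    """Approximate number of births needed in each comparison group to detect
    a change in risk from p_before to p_after (two-sided test of two
    proportions, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_before + p_after) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_before * (1 - p_before) + p_after * (1 - p_after))
    ) ** 2
    return ceil(numerator / (p_before - p_after) ** 2)


# Baseline risks are those reported in the text; the relative reduction
# is a hypothetical assumption, not an estimate from the thesis.
HYPOTHETICAL_RELATIVE_REDUCTION = 0.25
for label, risk in [("cord pH <7.00", 0.0045),
                    ("five-minute Apgar score <7", 0.0058)]:
    target = risk * (1 - HYPOTHETICAL_RELATIVE_REDUCTION)
    print(label, n_per_group(risk, target))
```

Under these assumptions the required numbers run to tens of thousands of births per group, and they grow rapidly as the baseline risk or the assumed reduction becomes smaller.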

A major strength of this thesis is the large sample size made possible by the national context of the education programme, which also enabled the use of rare and clinically relevant outcomes. The joining of medical education research and epidemiological methodology is also considered a strength. Our design, however, makes it difficult to establish a causal link between the intervention and the outcomes as it does not ensure comparable groups and does not preclude the potential impact of other changes in obstetric care during the study period. A research design that includes a contemporary control group or a cluster randomisation would be required to strengthen the results addressing the research questions in Study IV.

We did not examine the retention of CTG knowledge and interpretation skills. Other studies on CTG education describe good retention of knowledge after six and seven months and suggest annual CTG education to maintain competences.98-100

Fetal monitoring with cardiotocography

CTG is an integral part of daily obstetric care in Denmark and in developed countries in general without indication of a decrease in use. Midwives and doctors therefore need to have the knowledge and skills to manage CTG properly, which is why continuing education is a necessity. Our results, which indicate an increase in CTG knowledge and improved interpretation skills among the vast majority of healthcare professionals after taking the CTG course, support this statement. As stated by FIGO already in the 1980s: “..it cannot be emphasized enough that understanding and interpretation of a FHR record is not an easy matter and that formal training in the underlying physiology and the practice of FHR monitoring is indispensable for all those supposed to make decisions on FHR records”.10

CTG is criticised for having been implemented into clinical practice without being properly examined, and it is argued that rigorous evidence is needed before implementing a new technique.101 This stance is difficult to disagree with, but a better alternative to CTG has yet to be developed. According to the Cochrane meta-analysis, CTG is as effective as or better than intermittent auscultation (lower seizure incidence) but seems to increase the incidence of operative deliveries.27

A variety of ways have been suggested to improve the technique, including: a simple classification system, clear guidelines connecting CTG classification and clinical management, CTG education, computer-assisted interpretation techniques, interpretation of CTG recordings based on knowledge of fetal physiology and the overall clinical picture, and a focus on communication and input from senior professionals.34,69,102 Others emphasise the shortcomings of visual interpretation and do not believe that education can advance the technology any further.103 They suggest development and implementation of intelligent computer systems that integrate clinical data when analysing FHR and that recommend what actions to take. Computer systems designed to assist CTG interpretation emerged in the 1980s and currently five systems are in use.104 The publications from two large multicentre randomised controlled trials, which examine the impact of computerised versus non-computerised CTG interpretation on birth hypoxia, are awaited.105,106

CTG was developed and implemented with the hope of decreasing brain injuries such as cerebral palsy; however, its predictive value for cerebral palsy is uncertain and the incidence of cerebral palsy has been stable during the last 30 years.107,108 Cerebral palsy is now viewed as a heterogeneous condition with multiple and possibly additive causes, and it has been suggested that it would be better named cerebral palsies.109

Some studies indicate intrapartum hypoxia as the cause in only 8% of cerebral palsy cases among term infants, while others estimate it as the cause in 28% of cases.110,111 International consensus criteria for a causal relationship between acute intrapartum events and cerebral palsy were developed in 1999 to minimise the risk of cerebral palsy cases being mislabelled as due to intrapartum hypoxia.58,109 The cases described in the compensation claims concerning hypoxic brain injuries do not all meet the above-mentioned consensus criteria.3,39,40 This means that an overestimation of the significance of CTG errors cannot be excluded, which in turn might explain the lack of effect of CTG education on neonatal outcomes.

Conclusions

Errors in CTG management, such as CTG misinterpretation, omission of use of CTG when indicated and delay in response to a pathological recording, are known and well described as causes of substandard care in obstetrics, and have been suggested as being associated with hypoxic brain injuries among newborns.3,39-41 Healthcare authorities, international guidelines and individual clinicians recommend continuing education in fetal monitoring.11,14,38,42,43,68,69 This PhD thesis described and discussed the development, validation and impact of a national inter-professional CTG education programme. We developed a prioritised list of CTG learning objectives and a CTG MCQ test measuring CTG knowledge, interpretation skills and decision-making. We evaluated the CTG education programme and found that it increased CTG knowledge and interpretation skills but did not reduce birth hypoxia. Additionally, we found that CTG test scores of doctors and midwives were positively associated with working in large maternity units and having less than 15 years of work experience. Based on our studies we view a well-planned curriculum with a thorough problem identification and needs analysis as the foundation for a teaching intervention and cannot over-emphasise the importance of a thorough validation process when integrating an assessment in the curriculum. We consider the current CTG education programme beneficial in terms of increasing self-perceived and measured CTG knowledge and interpretation skills and in establishing a uniform CTG language on a national level with joint ownership among midwives and doctors of CTG management. We recommend awareness of the possible challenges involved in maintaining CTG skills in small maternity units and among experienced healthcare workers.
Our study did not detect a decrease in birth hypoxia, which may be due to numerous reasons, such as a limited problem identification and needs assessment entailing an insufficient intervention, a dilution of effect when the improved knowledge and skills are integrated in a complex clinical setting, the rarity of the outcome and a possible over-estimation of the association between CTG management errors and birth hypoxia. As long as CTG is a part of daily obstetric practice, continuing development and maintenance of CTG skills remain crucial.

Perspectives

The studies in this thesis have contributed to knowledge concerning the impact of CTG education. The thesis illuminates the complexity in developing and evaluating a large-scale educational intervention. Time and resources, as well as inputs from all stakeholders, all healthcare groups involved and from professionals with expertise within the relevant subject areas and medical education are crucial in the planning process. In addition, both development and evaluation should be given proper attention and time.

Exclusively outcome-oriented evaluation of an educational intervention limits the information gained and provides insufficient knowledge of the process, and hence limits the use of the evaluation in future educational projects. Various evaluation approaches should be considered, as well as the joining of different methodologies. What constitutes a successful educational intervention seems an essential question to ask during the planning process. Safety and teamwork climate, as well as healthcare professionals’ job satisfaction and motivation for learning and changing behaviour, might also be important measures.

Based on previous studies and this thesis, we doubt that knowledge-based CTG education as a single intervention will decrease birth hypoxia. The combination of CTG education and obstetric emergency simulation training seems beneficial, and integrating team and communication training might be considered, in addition to involving other healthcare professionals such as paediatricians and anaesthesiologists.

Acknowledgements

This PhD would not have been possible without the collaboration of many inspiring people. I am deeply grateful to all of you for being part of the journey.

I would like to express my heartfelt thanks to my three supervisors, Morten Hedegaard, Thomas Bergholt and Jette Led Sørensen, for their enduring encouragement and guidance. Morten, for trusting me with this project and for sharing your profound expertise in research and obstetrics. Working with you has been such a privilege. Your enthusiasm and positive attitude are deeply inspiring and I am grateful for the continuing support and the trust you placed in me and my work during the process. Thank you for opening doors and for making me a part of the world of obstetrics, not to mention sharing enlightening historical quizzes during long trips to meetings in Denmark. Thomas, my research and intellectual mentor, a coffee break at Hillerød Hospital with you and Helle Ejdrup Bredkjær ended up with an exciting PhD project - thank you! Working with you has been a tremendous pleasure, and your stringent methodological approach to research and your insistence on accuracy instead of speed have been a significant lesson. Your direct questions about research and life in general have led to insightful reflection. Also thank you so much for great times around the piano. Jette, for introducing me to the exciting world of medical education. Your professional dedication and fearlessness in pointing out suboptimal conditions are admirable and inspiring. I greatly appreciate the close and rewarding collaboration we have shared. Thank you for being right next door and for always thoroughly examining and reflecting on my work, as well as for being great company at conferences around Europe.

To my amazing officemates and highly valued friends, Charlotte Holm and Flemming Bjerrum. Working with you has been an incredible pleasure. Thank you for the constructive discussions, for the support during challenging times and the joint celebration in good times. You made it so much fun to go to work. I would also like to thank my PhD-course-companion and friend Christina Norrbom for a great collaboration and for excellent coffee breaks.

Thank you to my inspiring fellow PhD students at the Department of Obstetrics and Gynaecology at Rigshospitalet, who I look forward to continuing to work with in the clinical setting. I am also very grateful to the skilled and helpful staff at the Juliane Marie Centre, Rigshospitalet.

Obstetrician Nina Palmgren and midwife Kristine Sylvan Andersen, what a joy and inspiration to work with you. Thank you for expanding my knowledge and understanding of fetal surveillance. You have been there from start to finish, always bringing the clinical picture, the women and the children into the research. I am also deeply indebted to secretary Susanne Mårtensson, my saviour and premium problem solver.

An especially warm thank you to the midwives, doctors, and medical and midwifery students who participated in the studies in this PhD. Also I am grateful for the open-minded and rewarding collaboration with Sikre fødsler and with The Danish Regions, for the insight gained into the processes, facilitation and challenges involved in implementing a large national intervention.

Susanne Rosthøj, thank you for invaluable assistance and expertise concerning statistical analyses, as well as for your patience, time and amazingly calming influence. To data management wizard Steen Christian Rasmussen, I owe a debt of gratitude for helping me tame a comprehensive data set. And thank you to obstetric coding expert and midwife Lene Friis Eskildsen for sharing your knowledge.

I would also like to thank my co-authors midwife Stinne Høgh, statistician Karl Bang Christensen and obstetrician Lone Hvidman for rewarding collaborations on studies and manuscripts. My deepest gratitude to Professor Brenda Eskenazi and the Center for Environmental Research and Children’s Health, School of Public Health (CERCH), University of California, Berkeley for a highly inspiring research stay.

And last, I would like to express my endless gratitude and love for my friends and family, for your continuous support, hugs and talks about everything but research. Finally, my deepest thanks go to Chano, Sille and Vilma, for your love and patience and for putting it all into perspective.

Funding

The studies in this thesis were funded by:

TrygFonden
Østifterne
Aase and Ejnar Danielsens Foundation
Tømmerhandler Johannes Fogs Foundation
Department of Obstetrics, Rigshospitalet
The Juliane Marie Centre, Rigshospitalet

to whom I am deeply grateful.

References

1. Yeh P, Emary K, Impey L. The relationship between umbilical cord arterial pH and serious adverse neonatal outcome: analysis of 51,519 consecutive validated samples. BJOG: An International Journal of Obstetrics & Gynaecology. 2012 Jun;119(7):824–31.

2. Malin GL, Morris RK, Khan KS. Strength of association between umbilical cord pH and perinatal and long term outcomes: systematic review and meta-analysis. BMJ. 2010 May 13;340:c1471.

3. Hove LD, Bock J, Christoffersen JK, Hedegaard M. Analysis of 127 peripartum hypoxic brain injuries from closed claims registered by the Danish Patient Insurance Association. Acta Obstet Gynecol Scand. 2008;87(1):72–5.

4. Sikre Fødsler [Safe Deliveries]. http://www.dsog.dk/wp/dsog/projekt-sikre-fodsler.

5. Ananth CV, Chauhan SP, Chen H-Y, D'Alton ME, Vintzileos AM. Electronic fetal monitoring in the United States: temporal trends and adverse perinatal outcomes. Obstet Gynecol. 2013 May;121(5):927–33.

6. Holzmann M, Nordström L. Follow-up national survey (Sweden) of routines for intrapartum fetal surveillance. Acta Obstet Gynecol Scand. 2010 May;89(5):712–4.

7. Ayres-de-Campos D, Nogueira-Reis Z. Technical characteristics of current cardiotocographic monitors. Best Pract Res Clin Obstet Gynaecol. 2016 Jan;30:22–32.

8. Ayres-de-Campos D, Arulkumaran S. FIGO consensus guidelines on intrapartum fetal monitoring: Physiology of fetal oxygenation and the main goals of intrapartum fetal monitoring. International Journal of Gynecology & Obstetrics. 2015 Oct;131(1):5–8.

9. Ayres-de-Campos D, Bernardes J, FIGO Subcommittee. Twenty-five years after the FIGO guidelines for the use of fetal monitoring: time for a simplified approach? Int J Gynaecol Obstet. 2010 Jul;110(1):1–6.

10. Guidelines for the use of fetal monitoring. Int J Gynecol Obstet 1987; 25:159-167.

11. Ayres-de-Campos D, Spong CY, Chandraharan E, FIGO Intrapartum Fetal Monitoring Expert Consensus Panel. FIGO consensus guidelines on intrapartum fetal monitoring: Cardiotocography. Int J Gynaecol Obstet. 2015 Oct;131(1):13–24.

12. Intrapartum care for healthy women and babies. Monitoring during labour. National Institute for Health and Care Excellence Guideline. 2014. http://www.nice.org.uk/guidance/cg190/resources/intrapartum-care-for-healthy-women-and-babies-35109866447557.

13. American College of Obstetricians and Gynecologists. ACOG Practice Bulletin No. 106: Intrapartum fetal heart rate monitoring: nomenclature, interpretation, and general management principles. Obstetrics and gynecology. 2009. pp.192–202.

14. Intrapartum fetal surveillance. The Royal Australian and New Zealand College of Obstetricians and Gynaecologists Guideline. 2014. http://www.ranzcog.edu.au/intrapartum-fetal-surveillance-clinical-guidelines.html.

15. Neoventa Classification of CTG. http://www.neoventa.com/ctg-pocket-guide-app/

16. Goodlin RC. History of fetal monitoring. Am J Obstet Gynecol. 1979 Feb 1;133(3):323–52.

17. Freeman RK, Garite TJ, Nageotte MP, Miller LA. Fetal Heart Rate Monitoring. Lippincott Williams & Wilkins; 2012. Ch.1-3 & 14.

18. Symonds ME, Sahota D, Chang A. Fetal Electrocardiography. 2001. Imperial college press. p.1.

19. Jenkins HM. Thirty years of electronic intrapartum fetal heart rate monitoring: discussion paper. J R Soc Med. 1989 Apr;82(4):210–4.

20. Resnik R. Electronic fetal monitoring: the debate goes on...and on...and on. Obstet Gynecol. 2013 May;121(5):917–8.

21. Schifferli PY, Caldeyro-Barcia R. Effects of atropine and beta-adrenergic drugs on the fetal heart rate of the human fetus. In Fetal pharmacology by Boréus LO. Raven Press, New York, 1973. pp.259-279.

22. Renou P, Warwick N, Wood C. Autonomic control of fetal heart rate. Am J Obstet Gynecol. 1969 Nov;105(6):949-53.

23. Robinson B, Nelson L. A Review of the Proceedings from the 2008 NICHD Workshop on Standardized Nomenclature for Cardiotocography: Update on Definitions, Interpretative Systems With Management Strategies, and Research Priorities in Relation to Intrapartum Electronic Fetal Monitoring. Rev Obstet Gynecol. 2008;1(4):186–92.

24. Parer JT, King T, Flanders S, Fox M, Kilpatrick SJ. Fetal acidemia and electronic fetal heart rate patterns: Is there evidence of an association? J Matern Fetal Neonatal Med. 2006 Jan;19(5):289–94.

25. Williams RL, Hawes WE. Cesarean section, fetal monitoring, and perinatal mortality in California. Am J Public Health. 1979 Sep;69(9):864–70.

26. Parer JT. Fetal heart-rate monitoring. Lancet. 1979 Sep 22;2(8143):632–3.

27. Alfirevic Z, Devane D, Gyte GML. Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour. Alfirevic Z, editor. Cochrane Database Syst Rev. Chichester, UK: John Wiley & Sons, Ltd; 2013;5:CD006066.

28. Saldana LR, Schulman H, Yang WH. Electronic fetal monitoring during labor. Obstet Gynecol. 1976 Jun;47(6):706–10.

29. Altman DG. Practical Statistics for Medical Research. CRC Press; 1990. Section 14.4.

30. Blix E, Sviggum O, Koss KS, Oian P. Inter-observer variation in assessment of 845 labour admission tests: comparison between midwives and obstetricians in the clinical setting and two experts. BJOG: An International Journal of Obstetrics & Gynaecology. 2003 Jan;110(1):1–5.

31. Rhöse S, Heinis AMF, Vandenbussche F, van Drongelen J, van Dillen J. Inter- and intra-observer agreement of non-reassuring cardiotocography analysis and subsequent clinical management. Acta Obstet Gynecol Scand. 2014 June;93(6):596–602.

32. Ayres-de-Campos D, Arteiro D, Costa-Santos C, Bernardes J. Knowledge of adverse neonatal outcome alters clinicians' interpretation of the intrapartum cardiotocograph. BJOG: An International Journal of Obstetrics & Gynaecology. 2011 Jul;118(8):978–84.

33. Reif P, Schott S, Boyon C, Richter J, Kavšek G, Timoh KN, et al. Does knowledge of fetal outcome influence the interpretation of intrapartum cardiotocography and subsequent clinical management? A multicentre European study. BJOG: An International Journal of Obstetrics & Gynaecology. 2016 Feb 16.

34. Santo S, Ayres-de-Campos D. Human factors affecting the interpretation of fetal heart rate tracings: an update. Curr Opin Obstet Gynecol. 2012 Mar;24(2):84–8.

35. Mongelli M, Chung TK, Chang AM. Obstetric intervention and benefit in conditions of very low prevalence. Br J Obstet Gynaecol. 1997 Jul;104(7):771-4.

36. Banta HD, Thacker SB. Policies toward medical technology: the case of electronic fetal monitoring. Am J Public Health. 1979 Sep;69(9):931–5.

37. Sartwelle TP, Johnston JC. Cerebral palsy litigation: change course or abandon ship. J Child Neurol. 2015 Jun;30(7):828–41.

38. Ten Years of Maternity Claims. An Analysis of NHS Litigation Authority Data. Published by NHS Litigation Authority. Oct. 2012. http://www.nhsla.com/safety/Documents/Ten%20Years%20of%20Maternity%20Claims%20-%20An%20Analysis%20of%20the%20NHS%20LA%20Data%20-%20October%202012.pd

39. Andreasen S, Backe B, Oian P. Claims for compensation after alleged birth asphyxia: a nationwide study covering 15 years. Acta Obstet Gynecol Scand. 2014 Feb 1;93(2):152–8.

40. Berglund S, Grunewald C, Pettersson H, Cnattingius S. Severe asphyxia due to delivery- related malpractice in Sweden 1990-2005. BJOG: An International Journal of Obstetrics & Gynaecology. 2008 Feb;115(3):316–23.

41. Evers ACC, Brouwers HAA, Nikkels PGJ, Boon J, van Egmond-Linden A, Groenendaal F, et al. Substandard care in delivery‐related asphyxia among term infants: prospective cohort study. Acta Obstet Gynecol Scand. 2013 Jan 1;92(1):85–93.

42. Preventing infant death and injury during delivery. Sentinel Event Alert. 2004 Jul 21;(30):1–3.

43. Svangreomsorgen [Recommendations for care during pregnancy] 2015, Danish Health Authority. pp.160. https://sundhedsstyrelsen.dk/da/udgivelser/2015/~/media/C18BD8F183104A8384F80B73B155826D.ashx

44. Pehrson C, Sorensen JL, Amer-Wåhlin I. Evaluation and impact of cardiotocography training programmes: a systematic review. BJOG: An International Journal of Obstetrics & Gynaecology. 2011 Jul;118(8):926–35.

45. Thomas PA, Kern DE, Hughes MT, Chen BY. Curriculum Development for Medical Education. A six step approach. Third edition. JHU Press; 2015

46. Harden RM. Ten questions to ask when planning a course or curriculum. Med Educ. 1986 Jul;20(4):356–65.

47. Standards for educational and psychological testing. Amer Educational Research 1999. pp.11-17.

48. Larsen DP, Butler AC, Roediger HL. Test-enhanced learning in medical education. Med Educ. 2008 Oct;42(10):959–66.

49. Downing SM, Yudkowsky R. Assessment in Health Professions Education. New York: Routledge; 2009. Ch.1-3.

50. Schuwirth LWT, van der Vleuten CPM. ABC of learning and teaching in medicine: Written assessment. BMJ. 2003 Mar 22;326(7390):643–5.

51. Schuwirth LWT, van der Vleuten CPM. Different written assessment methods: what can be said about their strengths and weaknesses? Med Educ. 2004 Sep;38(9):974–9.

52. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med. 2006 Feb;119(2):166.e7–16.

53. Messick S. Validity of Psychological Assessment. Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist. 1995;50:741-749.

54. Cook DA. Twelve tips for evaluating educational programs. Med Teach. 2010;32(4):296–301.

55. Eseryel D. Approaches to evaluation of training: Theory & practice. Educational Technology & Society 5(2) 2002.

56. Kirkpatrick DL, Kirkpatrick JD. Evaluating Training Programs. The four levels. Third edition. Berrett-Koehler Publishers; 2006.

57. Herskind A, Greisen G, Nielsen JB. Early identification and intervention in cerebral palsy. Dev Med Child Neurol. 2015 Jan;57(1):29–36.

58. MacLennan A. A template for defining a causal relation between acute intrapartum events and cerebral palsy: international consensus statement. BMJ: British Medical Journal. 1999 Oct 16;319(7216):1054.

59. Nelson KB, Ellenberg JH. Apgar scores as predictors of chronic neurologic disability. Pediatrics. 1981 Jul;68(1):36-44.

60. Casey BM, McIntire DD, Leveno KJ. The continuing value of the Apgar score for the assessment of newborn infants. N Engl J Med. 2001 Feb 15;344(7):467-71.

61. Lando A, Jonsbo F, Hansen BM, Greisen G. [Induced hypothermia in infants born with hypoxic-ischaemic encephalopathy]. Ugeskr Laeg. 2010 May 10;172(19):1433–7.

62. The Society of Obstetricians and Gynaecologists of Canada. Fetal Health Surveillance: Antepartum and intrapartum consensus guideline 2008.

63. The Danish Society of Obstetrics and Gynaecology Guidelines. Asphyxia, Doorstep CTG, Scalp pH/ Scalp lactate.

64. Anderson LW, Krathwohl DR, Airasian PW, Cruikshank KA, Mayer RE, Pintrich PR, et al. A taxonomy for learning, teaching, and assessing. Boston: Addison Wesley Longman, 2001. Section I-III.

65. Hsieh H-F, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. SAGE Publications; 2005 Nov;15(9):1277–88.

66. Datatilsynet [The Data Protection Agency]. https://www.datatilsynet.dk/fileadmin/user_upload/dokumenter/Persondatalovspjece/Persondatalovspjece.htm.

67. Electronic Fetal Monitoring. 2014 Candidate Guide. The National Certification Corporation. http://www.nccwebsite.org/resources/docs/2014-efm-candidate_guide.p

68. Ugwumadu A, Steer P, Parer B, Carbone B, Vayssiere C, Maso G, et al. Time to optimise and enforce training in interpretation of intrapartum cardiotocograph. BJOG: An International Journal of Obstetrics & Gynaecology. 2016 May;123(6):866–9.

69. Pinas A, Chandraharan E. Continuous cardiotocography during labour: Analysis, classification and management. Best Pract Res Clin Obstet Gynaecol. 2016 Jan;30:33–47.

70. Burden C, Fox R, Lenguerrand E, Hinshaw K, Draycott TJ, James M. Curriculum development for basic gynaecological laparoscopy with comparison of expert trainee opinions; prospective cross-sectional observational study. Eur J Obstet Gynecol Reprod Biol. 2014 Sep;180:1–7.

71. The Royal Australian and New Zealand College of Obstetricians and Gynaecologists. Fetal Surveillance Education Program (FSEP). http://www.fsep.edu.au

72. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: the relationship between clinical experience and quality of health care. Ann Intern Med. 2005 Feb 15;142(4):260–73.

73. Milland M, Mikkelsen KL, Christoffersen JK, Hedegaard M. Severe and fatal obstetric injury claims in relation to labor unit volume. Acta Obstet Gynecol Scand. 2015 May;94(5):534–41.

74. Moster D, Lie RT, Markestad T. Neonatal mortality rates in communities with small maternity units compared with those having larger maternity units. BJOG: An International Journal of Obstetrics & Gynaecology. 2001 Sep;108(9):904–9.

75. Pettker CM, Thung SF, Norwitz ER, Buhimschi CS, Raab CA, Copel JA, et al. Impact of a comprehensive patient safety strategy on obstetric adverse events. Am J Obstet Gynecol. 2009 May;200(5):492.e1–8.

76. Goffman D, Brodman M, Friedman AJ, Minkoff H, Merkatz IR. Improved obstetric safety through programmatic collaboration. J Healthc Risk Manag. 2014;33(3):14–22.

77. Young P, Hamilton R, Hodgett S, Moss M, Rigby C, Jones P, et al. Reducing risk by improving standards of intrapartum fetal care. J R Soc Med. 2001 May;94(5):226–31.

78. Millde-Luthander C, Källen K, Nyström ME, Högberg U, Håkansson S, Härenstam KP, et al. Results from the National Perinatal Patient Safety Program in Sweden: the challenge of evaluation. Acta Obstet Gynecol Scand. 2016 May;95(5):596–603.

79. Byford S, Weaver E, Anstey C. Has the incidence of hypoxic ischaemic encephalopathy in Queensland been reduced with improved education in fetal surveillance monitoring? Aust N Z J Obstet Gynaecol. 2014 Aug;54(4):348–53.

80. Draycott T, Sibanda T, Owen L, Akande V, Winter C, Reading S, et al. Does training in obstetric emergencies improve neonatal outcome? BJOG: An International Journal of Obstetrics & Gynaecology. 2006 Feb;113(2):177–82.

81. Dent J, Harden RM. A Practical Guide for Medical Teachers. Fourth edition 2013. Churchill Livingstone. Section 1 & 6.

82. Collins DE. Multidisciplinary teamwork approach in labor and delivery and electronic fetal monitoring education: a medical-legal perspective. J Perinat Neonatal Nurs. 2008 Apr;22(2):125–32.

83. Rasmussen MB, Tolsgaard, Dieckmann P, Barry Issenberg S, Ostergaard D, Søreide E, Rosenberg J, Ringsted CV. Factors relating to the perceived management of emergency situations: a survey of former advanced life support course participants' clinical experience. Resuscitation. 2014 Dec;85(12):1726–31.

84. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990 Sep;65(9 Suppl):S63–7.

85. Berkowitz RL, D'Alton ME, Goldberg JD, O'Keeffe DF, Spitz J, Depp R, et al. The case for an electronic fetal heart rate monitoring credentialing examination. Am J Obstet Gynecol. 2014 Mar;210(3):204–7.

86. Bhide A, Chandraharan E, Acharya G. Fetal monitoring in labor: Implications of evidence generated by new systematic review. Acta Obstet Gynecol Scand. 2016 Jan;95(1):5–8.

87. Chapell MS, Blanding ZB, Silverstein ME, Takahashi M, Newman B, Gubi A, et al. Test Anxiety and Academic Performance in Undergraduate and Graduate Students. Journal of Educational Psychology. American Psychological Association; 2005 May 1;97(2):268–74.

88. Grunebaum A, Chervenak F, Skupski D. Effect of a comprehensive obstetric patient safety program on compensation payments and sentinel events. Am J Obstet Gynecol. 2011 Feb;204(2):97–105.

89. Iverson RE, Heffner LJ. Patient safety series: obstetric safety improvement and its reflection in reserved claims. Am J Obstet Gynecol. 2011 Nov;205(5):398–401.

90. Pettker CM, Thung SF, Raab CA, Donohue KP, Copel JA, Lockwood CJ, et al. A comprehensive obstetrics patient safety program improves safety climate and culture. Am J Obstet Gynecol. 2011 Mar;204(3):216.e1-6.

91. Bates R. A critical analysis of evaluation practice: the Kirkpatrick model and the principle of beneficence. Evaluation and Program Planning. 2004 Aug;27(3):341–7.

92. Haji F, Morin M-P, Parker K. Rethinking programme evaluation in health professions education: beyond 'did it work?'. Med Educ. 2013 Apr;47(4):342–51.

93. Cook DA, West CP. Perspective: Reconsidering the focus on “outcomes research” in medical education: a cautionary note. Acad Med. 2013 Feb;88(2):162–7.

94. Dekker S. Patient safety: A human Factors approach. CRC Press; 2011.

95. Fransen AF, van de Ven J, Schuit E, van Tetering A, Mol BW, Oei SG. Simulation-based team training for multi-professional obstetric care teams to improve patient outcome: a multicentre, cluster randomised controlled trial. BJOG. 2016 Oct 10. doi: 10.1111/1471-0528.14369. [Epub ahead of print]

96. Nielsen PE, Goldman MB, Mann S, Shapiro DE, Marcus RG, Pratt SD, et al. Effects of teamwork training on adverse outcomes and process of care in labor and delivery: a randomized controlled trial. Obstet Gynecol 2007;109:48–55.

97. Kalet AL, Gillespie CC, Schwartz MD, Holmboe ES, Ark TK, Jay M, et al. New measures to establish the evidence base for medical education: identifying educationally sensitive patient outcomes. Acad Med. 2010 May;85(5):844–51.

98. Trépanier MJ, Niday P, Davies B, Sprague A, Nimrod C, Dulberg C, et al. Evaluation of a fetal monitoring education program. J Obstet Gynecol Neonatal Nurs. 1996 Feb;25(2):137–44.

99. Beckley S, Stenhouse E, Greene K. The development and evaluation of a computer‐assisted teaching programme for intrapartum fetal monitoring. BJOG: An International Journal of Obstetrics & Gynaecology. 2000 Sep 1;107(9):1138–44.

100. Guild SD. A comprehensive fetal monitoring program for nursing practice and education. J Obstet Gynecol Neonatal Nurs. 1994 Jan;23(1):34–41.

101. Bloom SL, Belfort M, Saade G, Eunice Kennedy Shriver National Institute of Child Health and Human Development Maternal-Fetal Medicine Units Network. What we have learned about intrapartum fetal monitoring trials in the MFMU Network. Semin Perinatol. 2016 Apr 29.

102. Chandraharan E, Arulkumaran S. Prevention of birth asphyxia: responding appropriately to cardiotocograph (CTG) traces. Best Pract Res Clin Obstet Gynaecol. 2007 Aug;21(4):609–24.

103. Devoe LD. Future perspectives in intrapartum fetal surveillance. Best Pract Res Clin Obstet Gynaecol. 2016 Jan;30:98–106.

104. Nunes I, Ayres-de-Campos D. Computer analysis of foetal monitoring signals. Best Pract Res Clin Obstet Gynaecol. 2016 Jan;30:68–78.

105. Brocklehurst P, INFANT Collaborative Group. A study of an intelligent system to support decision making in the management of labour using the cardiotocograph - the INFANT study protocol. BMC Pregnancy Childbirth. 2016;16(1):10.

106. Ayres-de-Campos D, Ugwumadu A, Banfield P, Lynch P, Amin P, et al. A randomised clinical trial of intrapartum fetal monitoring with computer analysis and alerts versus previously available monitoring. BMC Pregnancy Childbirth. 2010 Oct 28;10:71. doi: 10.1186/1471-2393-10-71.

107. Nelson KB, Dambrosia JM, Ting TY, Grether JK. Uncertain value of electronic fetal monitoring in predicting cerebral palsy. N Engl J Med. 1996 Mar 7;334(10):613–8.

108. Clark SL, Hankins GDV. Temporal and demographic trends in cerebral palsy--fact and fiction. Am J Obstet Gynecol. 2003 Mar;188(3):628–33.

109. MacLennan AH, Thompson SC, Gecz J. Cerebral palsy: causes, pathways, and the role of genetic variants. Am J Obstet Gynecol. 2015 Dec;213(6):779–88.

110. Blair E, Stanley FJ. Intrapartum asphyxia: A rare cause of cerebral palsy. The Journal of Pediatrics. Mosby; 1988 Apr 1;112(4):515–9.

111. Hagberg B, Hagberg G, Beckung E, Uvebrant P. Changing panorama of cerebral palsy in Sweden. VIII. Prevalence and origin in the birth year period 1991-94. Acta Paediatr. 2001 Mar;90(3):271–7.


Manuscripts / Published papers


Paper I

ACTA Obstetricia et Gynecologica

AOGS MAIN RESEARCH ARTICLE

Curriculum development for a national cardiotocography education program: a Delphi survey to obtain consensus on learning objectives

LINE THELLESEN1, MORTEN HEDEGAARD1, THOMAS BERGHOLT2, NINA P. COLOV1, STINNE HOEGH1 & JETTE L. SORENSEN1

1Department of Obstetrics, Juliane Marie Center for Children, Women, and Reproduction, Rigshospitalet University Hospital/University of Copenhagen, Copenhagen, and 2Department of Gynecology and Obstetrics, Nordsjaellands Hospital/University of Copenhagen, Hillerød, Denmark

Abstract

Objective. To define learning objectives for a national cardiotocography (CTG) education program based on expert consensus. Design. A three-round Delphi survey. Population and setting. One midwife and one obstetrician from each maternity unit in Denmark were appointed based on CTG teaching experience and clinical obstetric experience. Methods. Following national and international guidelines, the research group determined six topics as important when using CTG: fetal physiology, equipment, indication, interpretation, clinical management, and communication/responsibility. In the first Delphi round, participants listed one to five learning objectives within the predefined topics. Responses were analyzed by a directed approach to content analysis. Phrasing was modified in accordance with Bloom’s taxonomy. In the second and third Delphi rounds, participants rated each objective on a five-point relevance scale. Consensus was predefined as objectives with a mean rating value of ≥3. Main outcome measures. A prioritized list of CTG learning objectives. Results. A total of 42 midwives and obstetricians from 21 maternity units were invited to participate, of whom 26 completed all three Delphi rounds, representing 18 maternity units. The final prioritized list included 40 objectives. The highest ranked objectives emphasized CTG interpretation and clinical management. The lowest ranked objectives emphasized fetal physiology. Mean ratings of relevance ranged from 3.15 to 5.00. Conclusions. National consensus on CTG learning objectives was achieved using the Delphi methodology. This was an initial step in developing a valid CTG education program. A prioritized list of objectives will clarify which topics to emphasize in a CTG education program.

Key words

Cardiotocography, fetal monitoring, obstetrics, Delphi technique, curriculum, consensus, medical education

Correspondence

Line Thellesen, Department of Obstetrics, Juliane Marie Center for Children, Women, and Reproduction, Rigshospitalet University Hospital/University of Copenhagen, Blegdamsvej 9, Copenhagen 2200, Denmark. E-mail: [email protected]

Conflict of interest

The authors have stated explicitly that there are no conflicts of interest in connection with this article.

Please cite this article as: Thellesen L, Hedegaard M, Bergholt T, Colov NP, Hoegh S, Sorensen JL. Curriculum development for a national cardiotocography education program: a Delphi survey to obtain consensus on learning objectives. Acta Obstet Gynecol Scand 2015; 94: 869–877.

Received: 10 December 2014
Accepted: 12 April 2015

DOI: 10.1111/aogs.12662 Abbreviation: CTG, cardiotocography.

Key Message

Learning objectives are essential when planning and developing an educational intervention. A prioritized list of CTG learning objectives based on national consensus will clarify which topics to emphasize in a CTG education program.

© 2015 Nordic Federation of Societies of Obstetrics and Gynecology, Acta Obstetricia et Gynecologica Scandinavica 94 (2015) 869–877

Introduction

In severe cases, fetal hypoxia can lead to brain injury and death. Cardiotocography (CTG) is a widely used surveillance method that aims to identify fetal hypoxia to decide whether additional fetal assessment or accelerated delivery is required. However, studies show that misinterpretation of, and delayed clinical actions to, an abnormal CTG are significant etiological factors for hypoxic brain injuries during labor (1–3). In addition to the considerable effect on families and obstetric staff, these cases also result in large financial costs. A 10-year report on maternity claims from the English National Health Service Litigation Authority was published in 2012 (4). The total amount of compensation awarded was £3.1 billion. The three most frequent categories of claims were those relating to management of labor (including CTG interpretation), cerebral palsy, and cesarean section. The two former accounted for 70% of the total value of all the claims. It was concluded from these studies that CTG training and education are essential.

To reduce the incidence of hypoxic injuries during labor, a comprehensive national obstetric intervention (Safe deliveries) was initiated in Denmark in September 2012 (5). The Danish regions, the Danish Society of Obstetrics and Gynecology, the Danish Association of Midwives, the Danish Society of Pediatrics, the Danish Society for Patient Safety and the Patient Compensation Association all support the initiative. As part of the intervention, all midwives and physicians working at a maternity unit in Denmark must complete a CTG education program. The education program consists of a CTG e-learning program, a 1-day CTG course, and a CTG multiple-choice question test. There are several publications concerning the impact of CTG education but a lack of validated assessment methods has been indicated (6). Learning objectives based on national consensus will increase the validity of the education program, including both the course and the assessment.

There are detailed CTG curricula, descriptions of CTG examination contents, and reports on content validation from CTG education programs in Australia and the USA (7,8) but we were unable to identify published articles concerning the development of CTG learning objectives. Objectives are essential when planning and developing an education program. They constitute the foundation of the content, the teaching strategies, and the assessment (9). We aimed to develop learning objectives for the national CTG education program based on a systematic consensus methodology. To induce joint ownership of the national teaching intervention and detect possible differences between the individual maternity units, we involved experienced midwives and obstetricians from all maternity units in Denmark in the developing process.

Material and methods

The study originated in the Department of Obstetrics, at the Juliane Marie Center for Children, Women and Reproduction, Rigshospitalet University Hospital, University of Copenhagen. The study took place from December 2012 to April 2013. All 24 Danish maternity units were invited to participate. The maternity units were distributed over five regions and the number of deliveries varied from 238 to 6659 per year (10).

The research group consisted of one obstetric resident conducting research on education in CTG (L.T.), one obstetrician with a master’s degree and research experience in medical education (J.L.S.), two obstetricians with several years of clinical and research experience (M.H., T.B.), one obstetrician with extensive experience in CTG teaching (N.P.C.), and one midwife experienced in perinatal audit (S.H.).

Delphi methodology

We used a Delphi survey to obtain national consensus on learning objectives for a national CTG education program. The Delphi methodology attempts to obtain expert opinion in a systematic manner (11). The group of experts is generally referred to as the Delphi panel or the expert panel, and the experts’ opinions are usually collected by self-administered questionnaires. The survey is characterized by a systematic group communication process that involves a number of rounds, feedback of responses to participants between rounds, opportunities for participants to modify their responses, and anonymity of responses (12). The aim is to combine expert opinion into a group consensus (13). The method benefits from being anonymous and so avoids disproportional dominance from influential persons (14). Geographic limitations are reduced, which enables responses from a large group of experts, and the expenses are low if the survey is electronically conducted. This methodology has been used to develop evaluation tools (15,16), diagnostic criteria (17), research questions (18), curricula (19), and learning objectives (20) in health care research.

Selection of participants to the Delphi panel

The Delphi panel in the present study consisted of experienced midwives and obstetricians from all Danish maternity units. Experience was defined as midwives and obstetricians with CTG teaching experience and more than 5 years of clinical obstetric experience. The obstetric management (chief physician or chief midwife) from each maternity unit was contacted and asked to appoint a midwife and an obstetrician, following the given inclusion criteria.

Questionnaire content and Delphi consensus

Prior to the first Delphi round, the research group determined six important topics for using CTG, based on national and international guidelines on electronic fetal monitoring (21–25): fetal physiology, equipment, indication, interpretation, clinical management, and communication/responsibility. For each of the topics we asked questions to clarify the task of developing learning objectives: ‘What knowledge about fetal physiology do you find essential to have as a midwife or an obstetrician responsible for CTG monitoring?’ and ‘Which CTG interpretation skills do you find essential to have as a midwife or obstetrician responsible for CTG monitoring?’ The participants were informed that objectives could address knowledge, attitudes, and skills. In subsequent Delphi rounds the participants were asked to rate the relevance of the objectives on a five-point scale (1 = not relevant, 2 = less relevant, 3 = relevant, 4 = very relevant and 5 = extremely relevant). The study was predefined to go through three rounds. Consensus was predefined as objectives with a mean rating value of ≥3.

Questionnaire design and administration

All participants were contacted by e-mail. The mail contained study information and a link to the Delphi questionnaire in Google Drive. The questionnaire was constructed in six steps, according to the predefined topics. In the second and third Delphi rounds, in order to avoid missing data, restrictions in the questionnaire made it impossible to continue to the next step without completing the previous one. For each round, the participants had 4 weeks to respond. Within that period three e-mail reminders were sent at 1-week intervals. The flowchart in Figure 1 describes the three Delphi rounds. Supplementary details are outlined in Supporting Information Appendix S1.

Study description and participatory conditions were sent by e-mail. The first author (L.T.) was not blinded to participants and their individual responses. Otherwise, the study was anonymously conducted. Participation agreement was given by activating a link in the mail to the Delphi questionnaire. The study was evaluated by the Regional Ethical Committee of the Capital Region of Denmark, and ethical approval was not required according to Danish regulations (Protocol number: H-1-2012-FSP).
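The predefined consensus rule (a mean rating of ≥3 on the five-point relevance scale) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names `mean_rating` and `reaches_consensus` and the example counts are hypothetical.

```python
from typing import Sequence

CONSENSUS_THRESHOLD = 3.0  # consensus was predefined as a mean rating of >= 3


def mean_rating(counts: Sequence[int]) -> float:
    """Mean rating on the five-point relevance scale.

    counts[i] is the number of panel members who chose rating i + 1
    (1 = not relevant ... 5 = extremely relevant).
    """
    if len(counts) != 5:
        raise ValueError("expected counts for the five rating categories")
    total = sum(counts)
    if total == 0:
        raise ValueError("no ratings given")
    return sum((i + 1) * n for i, n in enumerate(counts)) / total


def reaches_consensus(counts: Sequence[int]) -> bool:
    """True if the objective meets the predefined consensus threshold."""
    return mean_rating(counts) >= CONSENSUS_THRESHOLD


# Hypothetical ratings from a 26-member panel (illustration only)
example = [0, 2, 5, 9, 10]
print(round(mean_rating(example), 2), reaches_consensus(example))
```

Representing each objective by its five category counts keeps the mean and the distribution of ratings derivable from the same data.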

[Figure 1: flowchart with three columns: Delphi rounds; method and analyses; Delphi panel members (number of maternity units).]

Before round 1. The management of each Danish maternity unit appointed the Delphi panel members. The research group stated six topics important when using CTG, founded on national and international guidelines, and e-mailed panel members the first Delphi round questionnaire. Panel members: 42 (21); 21 midwives, 21 obstetricians.

Round 1. Panel members listed one to five learning objectives within each predefined topic. The research group condensed data based on a directed approach to content analysis, developed objectives based on Bloom’s taxonomy, and e-mailed panel members the objectives. Panel members: 31 (20); 15 midwives, 16 obstetricians.

Round 2. Panel members rated the relevance of each objective on a five-point scale, suggested new objectives and commented on existing ones. The research group revised the objectives, assessed the ratings of the objectives by means and distribution of ratings in percent for each objective, and e-mailed panel members the revised objectives, including the ratings. Panel members: 29 (19); 14 midwives, 15 obstetricians.

Round 3. Panel members re-rated the relevance of each objective. Comments were not encouraged. The research group developed a prioritized list of objectives based on mean values. Panel members: 26 (18); 13 midwives, 13 obstetricians.

Figure 1. Description of the three Delphi rounds, number of respondents, and analyses.
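The respondent counts in Figure 1 translate directly into per-round response rates. The sketch below is illustrative only; the names `INVITED`, `respondents`, and `response_rate` are hypothetical, and the counts are those reported in the figure and text (42 invited; 31, 29, and 26 included respondents).

```python
INVITED = 42  # panel members appointed by the maternity units

# Respondents included per Delphi round, as reported in Figure 1
respondents = {1: 31, 2: 29, 3: 26}


def response_rate(n_respondents: int, n_invited: int = INVITED) -> float:
    """Response rate in percent."""
    return 100 * n_respondents / n_invited


for delphi_round, n in respondents.items():
    print(f"Round {delphi_round}: {n}/{INVITED} = {response_rate(n):.0f}%")
```

The third-round value corresponds to the overall response rate of 62% (26 of 42) reported in the Results.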


Data analyses

A directed approach to content analysis was applied in processing the data (26). The structure and phrasing of the objectives were based on Bloom’s taxonomy to secure meaningful and consistent use of verbs (27). Supplementary details are outlined in Appendix S1. The rating of the objectives was assessed by means, and the variances of responses were presented as the distribution of ratings in percent for each objective. Data were compiled and assessed in EXCEL 2010 (Microsoft Corp., Redmond, WA, USA).

Results

Figure 1 depicts an overview of the study. Joint management occurred in six of the 24 maternity units, which meant that in three cases, two units had common obstetric management. A total of 21 units were therefore included. The inclusion criteria of CTG teaching experience could not be sufficiently honored at three of the maternity units (two midwives and two obstetricians) due to small size of maternity unit and/or lack of coordinated local CTG courses. All participants had more than 5 years of clinical experience.

Of the 42 invited participants 33 responded in the first round, representing 20 maternity units. Two participants did not write any text in the questionnaire and were therefore excluded, so the total number of first round participants equaled 31. A total of 29 participants responded in the second round, and 26 participants in the third round, representing 19 and 18 maternity units, respectively. The overall participant response rate was 62% (26 of 42). The distribution of midwives and obstetricians who responded was equal in the three rounds, and all five regions of Denmark were represented. Six maternity units function as highly specialized units in Denmark and manage approximately half of all Danish births. All six were represented in the first two Delphi rounds and five were represented in the third round.

A total of 536 responses were collected after the first Delphi round. The responses varied from well-constructed learning objectives, to cues, long descriptive sentences, questions, and reflective remarks. The responses were condensed to 41 learning objectives during the content analysis.

No new objectives were suggested. Of the 29 respondents in the second round, only four obstetricians and two midwives from the Delphi panel commented on the objectives. The comments emphasized the cognitive level of knowledge, and objectives concerning ST analysis of fetal electrocardiography and fetal scalp-blood sampling. The research group evaluated the comments and made minor revisions. A list of comments and modifications is available from the authors on request.

Ratings of relevance over two Delphi rounds resulted in a prioritized list of objectives presented in Table 1. There were no missing values within respondents. One objective was excluded due to a mean rating value <3, so 40 objectives were included in the final list.

Discussion

This national Delphi survey, where we aimed to develop CTG learning objectives for a national CTG education program, resulted in a prioritized list of 40 objectives. The highest-ranked objectives centered on CTG interpretation and clinical management, and the lowest-ranked objectives emphasized fetal physiology. This allocation of topics is compatible with the examination content of the electronic fetal monitoring examinations of the National Certification Corporation (8), recommended by the American Congress of Obstetricians and Gynecologists.

Five objectives were rated as being extremely relevant by all participants. These addressed skills in CTG classification and evaluation of contractions, and decision-making concerning supplying surveillance and immediate delivery. Seven items were rated below 4 (very relevant). They addressed knowledge about placental and fetal circulation, the autonomic nervous systems, metabolism, and CTG equipment. Surprisingly, obtaining informed consent was among the lowest rated. We do not have data to explain this priority. The objectives rated between 4 and 5 represented all six predefined topics, but detected the same pattern as seen in the highest and lowest rated groups, i.e. skills were generally rated higher than knowledge.

Learning objectives are essential in teaching and assessment processes, and are required when planning and developing an education program. A prioritized list of CTG learning objectives is most likely useful to CTG instructors and educational program developers, as it provides information on core topics to be emphasized in the teaching material and assessment tools. Learning objectives based on national consensus will increase content validity of the planned CTG course and CTG assessment, and should increase joint ownership of the education program. This will expectedly enhance the feasibility and implementation of the CTG education program on a national level.

The Delphi methodology is a widely used and accepted consensus method. One of the criticisms of the method is the lack of definitions associated with the process (13). We chose an open-ended first Delphi round inspired by Stefanidis et al. (18). This is the classical Delphi method and should allow more freedom in the responses from

Table 1. Prioritized list of 41 CTG learning objectives.

Learning objectives are ranked according to third round ratings. For each objective, the third round (Round 3) and second round (Round 2) results are given as mean rating [rank], followed by the distribution of ratings as n (percent*). The rating columns are, in order: not relevant, less relevant, relevant, very relevant, extremely relevant; only non-empty columns are listed.

Define and evaluate baseline, variability, accelerations and decelerations
  Round 3: 5.00 [1]; 26 (100)
  Round 2: 5.00 [1]; 29 (100)
Classify a CTG as normal, intermediate, pathological or preterminal
  Round 3: 5.00 [2]; 26 (100)
  Round 2: 5.00 [2]; 29 (100)
Identify CTG patterns that require immediate delivery
  Round 3: 5.00 [3]; 26 (100)
  Round 2: 4.93 [3]; 1 (3.4), 28 (96.6)
Evaluate the pattern of contractions, comprising frequency, duration and interval and describe the maximum number of contractions accepted pr. 10 minutes
  Round 3: 5.00 [4]; 26 (100)
  Round 2: 4.79 [8]; 6 (20.7), 23 (79.3)
Identify CTG patterns in which supplying fetal surveillance is needed
  Round 3: 5.00 [5]; 26 (100)
  Round 2: 4.79 [7]; 6 (20.7), 23 (79.3)
Suggest and initiate relevant action on an intermediate, pathological, and preterminal CTG based on joint consideration of medical history, clinical information and CTG interpretation
  Round 3: 4.96 [6]; 1 (3.8), 25 (96.2)
  Round 2: 4.90 [4]; 3 (10.3), 26 (89.7)
Identify the staff that must be involved at an intermediate, pathological, and preterminal CTG, respectively, and consider how fast they must be involved
  Round 3: 4.96 [7]; 1 (3.8), 25 (96.2)
  Round 2: 4.69 [13]; 1 (3.4), 7 (24.1), 21 (72.4)
Determine when a CTG monitoring is indicated
  Round 3: 4.96 [8]; 1 (3.8), 25 (96.2)
  Round 2: 4.72 [12]; 1 (3.4), 6 (20.7), 22 (75.9)
Demonstrate how to apply a fetal scalp electrode
  Round 3: 4.96 [9]; 1 (3.8), 25 (96.2)
  Round 2: 4.83 [6]; 5 (17.2), 24 (82.8)
Identify an insufficient CTG monitoring and state possible ways to optimize the signal
  Round 3: 4.96 [10]; 1 (3.8), 25 (96.2)
  Round 2: 4.66 [15]; 1 (3.4), 8 (27.6), 20 (69.0)
Identify fetuses who are in increased risk of developing asphyxia and discuss the terms high- and low-risk and high- and low-risk labors
  Round 3: 4.96 [11]; 1 (3.8), 25 (96.2)
  Round 2: 4.76 [9]; 1 (3.4), 5 (17.2), 23 (79.3)
Explain in which situations a CTG must be interpreted/re-interpreted
  Round 3: 4.92 [12]; 2 (7.7), 24 (92.3)
  Round 2: 4.72 [10]; 8 (27.6), 21 (72.4)
Recognize professional boundaries and consult with a colleague when in doubt
  Round 3: 4.92 [13]; 2 (7.7), 24 (92.3)
  Round 2: 4.86 [5]; 4 (13.8), 25 (86.2)
Perform a fetal scalp-blood sampling (only obstetricians)
  Round 3: 4.92 [14]; 2 (7.7), 24 (92.3)
  Round 2: 4.41 [22]; 5 (17.2), 7 (24.1), 17 (58.6)
Operate the CTG equipment and correctly apply transducers
  Round 3: 4.92 [15]; 2 (7.7), 24 (92.3)
  Round 2: 4.45 [21]; 1 (3.4), 4 (13.8), 5 (17.2), 19 (65.5)
Identify specific CTG patterns, such as a sinusoidal and sleeping pattern
  Round 3: 4.88 [16]; 3 (11.5), 23 (88.5)
  Round 2: 4.72 [11]; 1 (3.4), 6 (20.7), 22 (75.9)
Use the correct terminology when communicating about CTG
  Round 3: 4.88 [17]; 3 (11.5), 23 (88.5)
  Round 2: 4.66 [14]; 10 (34.5), 19 (65.5)
Explain advantages and disadvantages for external and internal CTG, and explain contraindications for the internal monitoring
  Round 3: 4.85 [18]; 4 (15.4), 22 (84.6)
  Round 2: 4.59 [16]; 12 (41.4), 17 (58.6)
Interpret a fetal scalp-blood sample
  Round 3: 4.85 [19]; 1 (3.8), 2 (7.7), 23 (88.5)
  Round 2: 4.48 [20]; 2 (6.9), 11 (37.9), 16 (55.2)
Identify possible sources of error when monitoring the fetus with CTG
  Round 3: 4.85 [20]; 1 (3.8), 2 (7.7), 23 (88.5)
  Round 2: 4.52 [18]; 3 (10.3), 8 (27.6), 18 (62.1)
Discuss possible causes of fetal hypoxia
  Round 3: 4.77 [21]; 6 (23.1), 20 (76.9)
  Round 2: 4.48 [19]; 1 (3.4), 13 (44.8), 15 (51.7)
Evaluate in which situations a CTG should be continuous or intermittent, respectively
  Round 3: 4.77 [22]; 1 (3.8), 4 (15.4), 21 (80.8)
  Round 2: 4.31 [26]; 1 (3.4), 3 (10.3), 10 (34.5), 15 (51.7)
Explain what a CTG monitors and describe the components of the CTG equipment
  Round 3: 4.77 [23]; 1 (3.8), 4 (15.4), 21 (80.8)
  Round 2: 4.28 [27]; 6 (20.7), 9 (31.0), 14 (48.3)
Evaluate the tonus of
  Round 3: 4.58 [24]; 11 (42.3), 15 (57.7)
  Round 2: 4.21 [30]; 1 (3.4), 3 (10.3), 13 (44.8), 12 (41.4)
Explain and utilize local CTG guidelines
  Round 3: 4.58 [25]; 1 (3.8), 9 (34.6), 16 (61.5)
  Round 2: 4.55 [17]; 1 (3.4), 11 (37.9), 17 (58.6)
Describe differences between antepartum and intrapartum CTG
  Round 3: 4.58 [26]; 2 (7.7), 7 (26.9), 17 (65.4)
  Round 2: 4.28 [28]; 1 (3.4), 3 (10.3), 12 (41.4), 13 (44.8)
Explain the distribution of responsibility during CTG monitoring, comprising the responsibility for CTG interpretation and clinical decision-making
  Round 3: 4.50 [27]; 2 (7.7), 9 (34.6), 15 (57.7)
  Round 2: 4.41 [23]; 1 (3.4), 2 (6.9), 10 (34.5), 16 (55.2)
Interpret a cord-blood sample and differentiate between metabolic and respiratory acidosis
  Round 3: 4.42 [28]; 15 (57.7), 11 (42.3)
  Round 2: 4.38 [24]; 1 (3.4), 16 (55.2), 12 (41.4)
Document according to the record-keeping requirement and discuss which data is relevant to document during CTG monitoring
  Round 3: 4.42 [29]; 1 (3.8), 13 (50.0), 12 (46.2)
  Round 2: 4.28 [29]; 1 (3.4), 2 (6.9), 13 (44.8), 13 (44.8)
Discuss possible physiological causes to different CTG changes, comprising how medicine, labor, and maternal conditions can affect the fetus
  Round 3: 4.38 [30]; 16 (61.5), 10 (38.5)
  Round 2: 4.31 [25]; 2 (6.9), 16 (55.2), 11 (37.9)
Describe the difference between CTG and STAN (ST segment analysis)
  Round 3: 4.35 [31]; 2 (7.7), 1 (3.8), 1 (3.8), 4 (15.4), 18 (69.2)
  Round 2: 4.03 [34]; 2 (6.9), 1 (3.4), 5 (17.2), 7 (24.1), 14 (48.3)
Describe the differences concerning CTG interpretation in preterm and term fetuses
  Round 3: 4.31 [32]; 1 (3.8), 16 (61.5), 9 (34.6)
  Round 2: 4.17 [31]; 3 (10.3), 18 (62.1), 8 (27.6)
Explain the fetal defenses against and consequences of lack of oxygen, involving the terms hypoxia, hypoxemia and asphyxia
  Round 3: 4.31 [33]; 6 (23.1), 6 (23.1), 14 (53.8)
  Round 2: 4.10 [32]; 1 (3.4), 7 (24.1), 9 (31.0), 12 (41.4)
Describe the fall in pH values during the normal labor and at total umbilical cord clamping
  Round 3: 4.08 [34]; 2 (7.7), 20 (76.9), 4 (15.4)
  Round 2: 4.03 [33]; 1 (3.4), 3 (10.3), 19 (65.5), 6 (20.7)
Describe the function of placenta and explain how the function can be affected by labor
  Round 3: 3.77 [35]; 1 (3.8), 8 (30.8), 13 (50.0), 4 (15.4)
  Round 2: 3.69 [35]; 2 (6.9), 11 (37.9), 10 (34.5), 6 (20.7)
Explain the ability of the CTG to identify whether or not a fetus is affected by hypoxia (sensibility and specificity)
  Round 3: 3.77 [36]; 12 (46.2), 8 (30.8), 6 (23.1)
  Round 2: 3.48 [36]; 1 (3.4), 1 (3.4), 13 (44.8), 11 (37.9), 3 (10.3)
Describe the differences between aerobic and anaerobic metabolism
  Round 3: 3.38 [37]; 5 (19.2), 9 (34.6), 9 (34.6), 3 (11.5)
  Round 2: 3.41 [37]; 1 (3.4), 4 (13.8), 11 (37.9), 8 (27.6), 5 (17.2)
Obtain informed consent to CTG monitoring
  Round 3: 3.35 [38]; 2 (7.7), 14 (53.8), 9 (34.6), 1 (3.8)
  Round 2: 3.38 [38]; 5 (17.2), 12 (41.4), 8 (27.6), 4 (13.8)
Explain the influence of the autonomic nervous systems on the fetal heart rate
  Round 3: 3.15 [39]; 5 (19.2), 12 (46.2), 9 (34.6)
  Round 2: 3.14 [39]; 1 (3.4), 6 (20.7), 12 (41.4), 8 (27.6), 2 (6.9)
Describe the circulation of the placenta and the fetus
  Round 3: 3.15 [40]; 4 (15.4), 14 (53.8), 8 (30.8)
  Round 2: 3.14 [40]; 1 (3.4), 7 (24.1), 11 (37.9), 7 (24.1), 3 (10.3)
Describe how fetal hemoglobin differs from adult hemoglobin (Excluded due to mean rating below three)
  Round 3: 2.19 [41]; 4 (15.4), 13 (50.0), 9 (34.6)
  Round 2: 2.21 [41]; 9 (31.0), 8 (27.6), 9 (31.0), 3 (10.3)

*As the percentage is rounded up or down it does not equal 100 in all cases.
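The rounding effect named in the footnote, and the link between a row's distribution and its mean, can be checked with a short sketch. It uses the second-round counts 1, 4, 5, 19 for "Operate the CTG equipment and correctly apply transducers". Placing these counts in the categories "less relevant" to "extremely relevant" is an assumption, since the table as reproduced here does not show which category each count belongs to; this placement was chosen because it reproduces the reported mean of 4.45. The function names are hypothetical.

```python
def percent_distribution(counts):
    """Share of ratings per category, in percent rounded to one decimal."""
    total = sum(counts)
    return [round(100 * n / total, 1) for n in counts]


def mean_rating(counts):
    """Mean rating when counts[i] panel members chose rating i + 1."""
    return sum((i + 1) * n for i, n in enumerate(counts)) / sum(counts)


# Second-round counts 1, 4, 5, 19 from Table 1; assigning them to the
# rating categories 2 to 5 is an assumption that matches the reported mean.
counts = [0, 1, 4, 5, 19]
percents = percent_distribution(counts)

print(percents)                       # per-category percentages
print(round(sum(percents), 1))        # rounded shares need not add up to 100
print(round(mean_rating(counts), 2))  # mean rating for this distribution
```

With these counts the rounded percentages sum to 99.9 rather than 100, which is the situation the footnote describes.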

the panel (13). The Delphi methodology has been criticized for giving too much power to the research team and forcing consensus. By having an open-ended first round, we endeavored to minimize this effect. However, predefining six topics important when using CTG might have limited, instead of inspired, the participants’ suggestions on objectives. The National Certification Corporation’s fetal monitoring examination also contains questions on legal aspects (8). Objectives concerning legal aspects are represented in some of the objectives in this study, but might have been elaborated on had there been a topic about it.

To reduce selection bias, the panel members were appointed by obstetric management from each maternity unit. The optimal size of a Delphi expert panel is not defined in the literature; 10–50 participants have been suggested as standard, but Delphi panel sizes in health care research vary from a few to several thousand participants (12). The high dropout rate during the first round could be due to the time-consuming open-ended approach, and the fact that participants were appointed and did not necessarily participate due to self-motivation. In the second and third rounds, few panelists dropped out. This may be due to less time-consuming rounds and the frequently repeated reminders, which have a powerful influence on the response rate (28). A minimum response rate of 70% is suggested to maintain rigor, but no clear definition exists (12). Our response rate of 62% equals previous research findings (16,18,19) but still represents a limitation to the study. The dropouts did not distort the distribution of participants’ profession, geographic distribution or the distribution of highly specialized maternity units, and we perceive that the study represents opinions from both midwives and obstetricians at a national level, despite dropouts. The first author was not blinded, due to an iterative e-mail correspondence with the Delphi panel members. The study did not concern sensitive personal data and we do not believe that the un-blinded method affected the results in a considerable way.

The Delphi panel constituted experienced midwives and obstetricians, which we believe contributed to the construction of clinically relevant objectives. The CTG education program is intended for both midwives and physicians and we found it essential for both the Delphi panel and the research group to reflect this inter-professional approach. However, the choice of panel members is always disputable. This study might have benefitted from including obstetric residents, less experienced midwives, maternity unit managers, patients, and professionals within patient safety. In Burden et al.’s Delphi study concerning a laparoscopic curriculum in gynecology (29), the expert panel comprised laparoscopic surgeons, senior residents and medical educationalists. Consensus was validated by first and second year residents and the study showed a high agreement between the experts and the residents on categories to be included in the curriculum, but disagreements concerning teaching mode, simulator use and the requirement for a final assessment. Prospectively, it seems relevant, but resource-intensive, to include all stakeholders when developing an education program.

The definition of consensus in the Delphi methodology can be expressed in different ways. Some definitions include elements above a certain mean score (11), Cronbach’s alpha above 0.80 or 0.90 (15,17,19) or a minimum percentage of the experts rating an element by certain values on a scale (16). We predefined consensus as objectives with a mean rating of ≥3 on a five-point scale, based on Fink et al.’s consensus method guideline (11). This implied that nearly all objectives were included, and one must consider whether the predefined consensus was set too low. We believe that important information is contained in all of the 40 objectives. By including the relevant, very relevant and extremely relevant objectives in this study, individual curriculum developers have the possibility to choose all objectives or only objectives at certain levels of relevance. Table 1 shows that the highest ranked objectives were generally rated higher in the third than the second round, whereas the lowest ranked objectives in the second round were rated lower or alike in the third round. The variances of responses showed a reduced tendency in the third round, implying greater consensus among the panelists.

The process of content analysis is challenging. There is a risk of losing important information, and in our study the content analysis proceeded over several steps and was time-consuming. We aimed to incorporate all information in the objectives, allowing the participants to consider the relevance. This resulted in very specific objectives and a high number of objectives.

The Delphi methodology is not one specific method; there are many different views on which methodology is the proper Delphi (12). Prospectively, the current design might be optimized if the research group (constituting educationalists and all stakeholders) developed the learning objectives and the Delphi panel members rated, commented on, and suggested additional objectives. This would be expected to lead to a higher response rate, fewer Delphi rounds, and diminished content analysis processes, without considerably different objectives.

The CTG learning objectives developed in this study cover the skills and the knowledge required for midwives and obstetricians working with CTG. It is meant to constitute a flexible template in which all or some of the objectives can be used as the foundation for an education program. Some of the objectives require simulation-based learning and team training, others indicate small group teaching, and some are suitable for class-room teaching or self-tuition. Economics, time, logistics, staff capacities, and assessment tools are important for choosing which learning objectives to implement.

Conclusions

National consensus on CTG learning objectives was achieved using the Delphi methodology; this was an initial step in developing a valid CTG education program. A prioritized list of objectives will clarify which topics to emphasize in a CTG education program. This study provides a template to be used in developing learning objectives using a transparent methodology relevant for other areas of postgraduate education.

Acknowledgments

We thank all the midwives and obstetricians who participated in the study and thank the obstetric management from the Danish maternity units for appointing the participants. For advice on content analysis we thank Ann-Helen Henriksen, educationalist at the Center for Clinical Education (CEKU). For technical assistance concerning the electronic survey we thank Flemming Bjerrum MD, PhD-fellow at the Juliane Marie Center, Rigshospitalet.

Funding

The study was funded by Trygfonden, Tømmerhandler Johannes Fog’s Fond, Aase og Ejnar Danielsens Fond, and Department of Obstetrics, the Juliane Marie Center, Rigshospitalet. None of the funders had a role in the study design, data collection, data analyses or writing of the manuscript.

References

1. Berglund S, Grunewald C, Pettersson H, Cnattingius S. Severe asphyxia due to delivery-related malpractice in Sweden 1990–2005. BJOG. 2008;115:316–23.
2. Hove LD, Bock J, Christoffersen JK, Hedegaard M. Analysis of 127 peripartum hypoxic brain injuries from closed claims registered by the Danish Patient Insurance Association. Acta Obstet Gynecol Scand. 2008;87:72–5.
3. Evers ACC, Brouwers HAA, Nikkels PGJ, Boon J, van Egmond-Linden A, Groenendaal F, et al. Substandard care in delivery-related asphyxia among term infants: prospective cohort study. Acta Obstet Gynecol Scand. 2013;92:85–93.
4. National Health Service Litigation Authority. Ten years of maternity claims. An analysis of NHS Litigation Authority data. London: NHS Litigation Authority, 2012.
5. Sikre fødser [Safe deliveries] Available online at: http://www.regioner.dk/sundhed/kvalitet/patientsikkerhed/sikre+fødsler (accessed December 10, 2014).
6. Pehrson C, Sorensen JL, Amer-Wahlin I. Evaluation and impact of cardiotocography training programmes: a systematic review. BJOG. 2011;118:926–35.
7. The Royal Australian and New Zealand College of Obstetricians and Gynaecologists. Fetal Surveillance Education Program (FSEP). Available online at: http://www.fsep.edu.au (accessed December 10, 2014).
8. The National Certification Corporation (NCC). Electronic fetal monitoring. Available online at: http://www.nccwebsite.org/certification/Exam-detail.aspx?eid=18 (accessed December 10, 2014).
9. Kern DE, Thomas PA, Howard DM, Bass EB. Curriculum Development for Medical Education. A six step approach. London: The Johns Hopkins University Press, 1998: 1–37.
10. Fødselsstatistikken [Danish Birth Statistics]. Copenhagen: Statens Serum Institut, 2012.
11. Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health. 1984;74:979–83.
12. Mullen PM. Delphi: myths and reality. J Health Organ Manag. 2003;17:37–52.
13. Keeney S, Hasson F, McKenna HP. A critical review of the Delphi technique as a research methodology for nursing. Int J Nurs Stud. 2001;38:195–200.
14. Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995;311:376–80.
15. Palter VN, MacRae HM, Grantcharov TP. Development of an objective evaluation tool to assess technical skill in laparoscopic colorectal surgery: a Delphi methodology. Am J Surg. 2011;201:251–9.
16. Tolsgaard MG, Todsen T, Sorensen JL, Ringsted C, Lorentzen T, Ottesen B, et al. International multispecialty consensus on how to evaluate ultrasound competence: a Delphi consensus survey. PLoS One. 2013;8:e57687.
17. Graham B, Regehr G, Wright JG. Delphi as a method to establish consensus for diagnostic criteria. J Clin Epidemiol. 2003;56:1150–6.
18. Stefanidis D, Arora S, Parrack DM, Hamad GG, Capella J, Grantcharov T, et al. Research priorities in surgical simulation for the 21st century. Am J Surg. 2012;203:49–53.
19. Palter VN, Graafland M, Schijven MP, Grantcharov TP. Designing a proficiency-based, content validated virtual reality curriculum for laparoscopic colorectal surgery: a Delphi approach. Surgery. 2012;151:391–7.
20. Bachmann C, Abramovitch H, Barbu CG, Cavaco AM, Elorza RD, Haak R, et al. A European consensus on learning objectives for a core communication curriculum in health care professions. Patient Educ Couns. 2013;93:18–26.
21. American College of Obstetricians and Gynecologists. ACOG Practice Bulletin No. 106: intrapartum fetal heart rate monitoring: nomenclature, interpretation, and general management principles. Obstet Gynecol. 2009;114:192–202.
22. The National Institute for Health and Clinical Excellence. Intrapartum care. Care of healthy women and their babies during childbirth. Guideline. London: NICE, 2007.
23. The Royal Australian and New Zealand College of Obstetricians and Gynaecologists. Intrapartum surveillance, clinical guideline. East Melbourne: RANZCOG, 2006.
24. The Society of Obstetricians and Gynaecologists of Canada. Fetal health surveillance: antepartum and intrapartum consensus guideline. Ottawa: SOGC, 2008.
25. The Danish Society of Obstetrics and Gynaecology Guidelines. Asphyxia (2010), Doorstep CTG (2001), Scalp pH/Scalp lactate (2010). Copenhagen: DSOG.
26. Hsieh H-F, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15:1277–88.
27. Anderson LW, Krathwohl DR, Airasian PW, Cruikshank KA, Mayer RE, Pintrich PR, et al. A taxonomy for learning, teaching, and assessing. Boston: Addison Wesley Longman, 2001. Section I-III.
28. Burns KEA, Duffett M, Kho ME, Meade MO, Adhikari NKJ, Sinuff T, et al. A guide for the design and conduct of self-administered surveys of clinicians. CMAJ. 2008;179:245–52.
29. Burden C, Fox R, Lenguerrand E, Hinshaw K, Draycott TJ, James M. Curriculum development for basic gynaecological laparoscopy with comparison of expert trainee opinions; prospective cross-sectional observational study. Eur J Obstet Gynecol Reprod Biol. 2014;180:1–7.

Supporting information

Additional Supporting Information may be found in the online version of this article:

Appendix S1. Supplementary details on Delphi rounds, content analysis and development of objectives.

© 2015 Nordic Federation of Societies of Obstetrics and Gynecology, Acta Obstetricia et Gynecologica Scandinavica 94 (2015) 869–877


Paper II


Development of a written assessment for a national interprofessional cardiotocography education program

Line Thellesen MD¹, Thomas Bergholt MD PhD MSci¹, Morten Hedegaard MD PhD¹, Nina Palmgren Colov MD¹, Karl Bang Christensen, Statistician², Kristine Sylvan Andersen, Midwife¹, Jette Led Sorensen MD MMEd¹

¹ Department of Obstetrics, The Juliane Marie Centre for Children, Women and Reproduction, Rigshospitalet, University of Copenhagen. Blegdamsvej 9, DK-2100 Copenhagen.
² Section of Biostatistics, Department of Public Health, University of Copenhagen. Oester Farimagsgade 5, B 2nd floor, DK-1014 Copenhagen K.

Corresponding author: Line Thellesen MD, Department of Obstetrics, Rigshospitalet, University of Copenhagen. Blegdamsvej 9, DK-2100 Copenhagen.

Abstract

Background
To reduce the incidence of hypoxic brain injuries among newborns, a national cardiotocography (CTG) education program was implemented in Denmark. A multiple-choice question test was integrated as part of the program. The aim of this article was to describe and discuss the test development process, and to introduce a feasible method for written test development in general.

Method
The test development was based on the unitary approach to validity. The process involved national consensus on learning objectives, standardized item writing, pilot testing, sensitivity analyses, standard setting and evaluation of psychometric properties using Item Response Theory models. Test responses and feedback from midwives, specialists and residents in obstetrics and gynecology, and medical and midwifery students were used in the process (proofreaders n=6, pilot test participants n=118, CTG course participants n=1679).

Results
The final test included 30 items and the standard score was established at 25 correct answers. All items fitted a loglinear Rasch model and the test was able to discriminate levels of competence. Seven items revealed differential item functioning in relation to profession and geographical regions, which means the test is not suitable for measuring differences between midwives and physicians or differences across regions. In the setting of pilot testing Cronbach’s alpha equaled 0.79, whereas Cronbach’s alpha equaled 0.63 in the setting of the CTG education program. This indicates a need for more items, and for items with a higher degree of difficulty, in the test, and illuminates the importance of context when discussing validity.

Conclusion
Test development is a complex and time-consuming process. The unitary approach to validity was a useful and applicable tool for development of a CTG written assessment. The process and findings supported our proposed interpretation of the assessment as measuring CTG knowledge and interpretive skills. However, for the test to function as a high-stake assessment, a higher reliability is required.

Keywords Cardiotocography, Fetal Monitoring, Written assessment, Multiple-choice question, Validity, Interprofessional, Continuing professional development

Background

Cardiotocography (CTG) is a widely used fetal surveillance method. Errors in the management of CTG are a recognized cause of adverse obstetric outcomes [1, 2]. Omission of use when indicated, misinterpretation, and delay in action are some of the described errors that can lead to severe fetal neurological damage or death. Regular education and training in fetal surveillance for all staff responsible for laboring women is recommended [3]. In 2012, a comprehensive national obstetric intervention (Safe Deliveries) was initiated in Denmark with the aim of increasing the quality of patient care and reducing hypoxia among newborns [4]. The Danish Regions, the Danish Society of Obstetrics and Gynecology, the Danish Association of Midwives, the Danish Pediatric Society, the Danish Society for Patient Safety and the Patient Compensation Association all supported the initiative. As part of the intervention all midwives and physicians working at a maternity unit in Denmark had to complete a CTG education program, consisting of an e-learning program, a one-day course, and a final written assessment. CTG training leads to improved interpretive skills, better management of intrapartum CTG, and higher-quality care, but a lack of validated assessment methods has been indicated [5]. Comprehensive fetal surveillance education and credentialing programs exist in the United States, in Australia and New Zealand [6, 7], and an intervention similar to Safe Deliveries was implemented in Sweden in 2007 [8]. To ensure coherence with national guidelines and context, a separate Danish CTG education and assessment program was developed.

Validity is known to be the single most important factor when discussing assessment, and all assessments require evidence of validity [9]. Validity refers to the evidence presented to support or refute the proposed interpretations of the assessment [10]. Thus, validity can be seen as an argument for the interpretations. Validity is not a fixed quantity but always a matter of degree; nor is it a property of the instrument (in this case the written assessment) but of the interpretations made upon the instrument’s score [9]. Reliability is a necessary component of validity that refers to the reproducibility and consistency of the scores of the assessment [10]. We chose the multiple-choice question (MCQ) format for the assessment in the CTG educational program. In addition to validity and reliability, educational impact, cost effectiveness and acceptability need to be taken into account in the process of test development [11]. MCQ testing is time- and cost-effective and suitable for large groups.

The aim of this article was to describe and discuss the process of developing a CTG MCQ test to be used in a national CTG education program, and to introduce a feasible and acknowledged method for written test development in general. In the process we collected evidence to support or refute the proposed interpretation that the assessment measured knowledge, interpretive skills, and clinical decision-making concerning fetal surveillance with CTG.

Methods

Setting and context
Data collection took place from December 2012 to December 2013. The Danish maternity units (n=24) were distributed among five regions and numbers of annual deliveries ranged from 238 to 6659 [12]. In this study, the term physicians refers to specialists and residents in obstetrics and gynecology. In Denmark, specialists work mainly within obstetrics (obstetricians), gynecology (gynecologists) or, in smaller units, within both fields. Residency extends over five years and consists of first-year residency followed by second-to-fifth-year residency. The included participants are presented in Figure 1.

Five sources of validity evidence
In the present study, we perceive validity as a unitary concept, with construct validity as the overall term [13]. Construct validity refers to what the test is proposed to measure. Evidence to support validity was collected from five sources based on The Standards for Educational and Psychological Testing [14]: content, response process, relations to other variables, consequences, and internal structure, which will be described in detail in the following. The study design is illustrated in Figure 1.

Content (do the items represent the construct?)
Learning objectives: Learning objectives are essential when developing an educational intervention, as they define what learners should know and master after the intervention [15]. We developed objectives based on national consensus amongst midwives and obstetricians in a national Delphi study [16]. The content of an assessment should always represent the most important subjects; therefore, objectives with the highest relevance rating constituted the content of the test.
Blueprint: Also based on the rated objectives, we decided on a five-domain test blueprint: fetal physiology (24 percent), indication (3 percent), equipment (3 percent), classification (33 percent) and management (37 percent). A blueprint is a framework that describes the subcategories (domains) in the test and specifies the proportion of items in each subcategory [9].
MCQ: The MCQs were constructed in a one-best-answer format [17-19]. The items consisted of a stem (predominantly a clinical case scenario) and a lead-in question, followed by a series of three or four options. The literature suggests that three options are adequate, but a fourth can be applied when plausible [20]. We emphasized developing items that required problem solving and clinical reflection and not just recall of knowledge. An obstetrician with profound experience in CTG teaching and clinical use of CTG (NCP) constructed the first draft of items in collaboration with two members of the research group (LT and KSA). An item example is illustrated in Figure 2.

Response process (are the thought processes of the test-takers related to the intended construct?)
Proofread: The items were initially evaluated in two rounds of proofreading, in which three of the proofreaders (MH, TB, JLS) were members of the research group (Figure 1). In the first proofreading, item relevance, language, spelling, and academic content were critically reviewed, and in the second proofreading, item format and construction.
Pilot test: The items were subsequently evaluated in a pilot test, in which the participants represented the intended test-takers: midwives, and specialists and residents in obstetrics and gynecology from all five regions of Denmark (Figure 1). Medical and midwifery students were additionally included in the pilot testing to examine the test’s discrimination abilities. The pilot participants were asked to answer and comment on the test, and time for test completion was measured. The pilot testing was conducted during visits to the relevant maternity units and midwifery school. A member of the research team was present during the testing, which allowed both written and verbal feedback, ensured individual test responses, and secured test confidentiality. During the response process the research group iteratively revised items and excluded non-functioning items. At the end of the response process the research group decided which items to implement in the test.

Relations to other variables (are test responses correlated with scores from a similar instrument?)
No other CTG test was available to relate to the current test. Therefore, we related the test to level of clinical competences and compared test responses from physicians (obstetricians, first-year residents, and medical students) and midwives (midwives and midwifery students) with expected differentiated level of CTG knowledge and clinical competences. Test responses from pilot participants were used in this sensitivity analysis.

Consequences (how is the passing score determined? What are the consequences for the test-takers? Are patient outcomes improved?)
We established a criterion-based standard score for the CTG test using the Contrasting Groups method. This method defines the passing score as the best discriminating point between a competent group and a non-competent group [21]. We defined obstetricians as competent and medical and midwifery students as non-competent. Test responses from pilot participants were used. The consequences of a participant’s test result were decided locally between the participant and the clinical director in each maternity unit. Repeated participation in the CTG course and test was possible. A possible improvement in patient outcome will be evaluated in a separate study.
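The idea of a best discriminating point between two groups can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the scores are invented, the function name `contrasting_groups_cut` and the `fp_weight` parameter are hypothetical, and the cut score is simply the value that minimizes the (optionally weighted) number of misclassified test-takers.

```python
def contrasting_groups_cut(competent, non_competent, max_score, fp_weight=1):
    """Return (cut, errors) for the cut score with the fewest misclassifications.

    A test-taker passes when score >= cut. fp_weight > 1 penalizes
    non-competent test-takers who pass (false positives) more heavily.
    """
    best_cut, best_errors = None, None
    for cut in range(max_score + 1):
        false_neg = sum(1 for s in competent if s < cut)       # competent but failing
        false_pos = sum(1 for s in non_competent if s >= cut)  # non-competent but passing
        errors = false_neg + fp_weight * false_pos
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Invented scores on a 30-item test (hypothetical, not study data).
competent = [22, 25, 26, 27, 27, 28, 29]
non_competent = [15, 17, 18, 20, 21, 22, 24]

cut, errors = contrasting_groups_cut(competent, non_competent, max_score=30)
print(cut, errors)
```

Raising `fp_weight` moves the cut score upward when the groups overlap, which mirrors the general trade-off between failing competent test-takers and passing non-competent ones.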

Internal structure (are the psychometric properties acceptable?)
We examined the test’s psychometric properties using the test responses from the participants at the national CTG courses (Figure 1). The analyses are described in the Statistics section and in Additional file 1.

Statistics
Test sensitivity was measured using a Mann-Whitney test. P-values < 0.05 were considered statistically significant. The loglinear Rasch model was used to examine the fit of each item. This Item Response Model integrates both the ability of the test-taker and the difficulty of the item when measuring the probability of a correct answer [22]. Examination of model fit can provide information about how justified it is to measure the construct with the chosen items [23]. Differential item functioning (DIF) was evaluated concerning profession, geographical regions, seniority, and size of maternity unit. DIF arises when an item performs differently in various subgroups [24]. The analyses were adjusted for multiple testing using the Benjamini and Hochberg procedure [25]. P-values < 0.05 were required for statistical significance.
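The adjustment for multiple testing can be illustrated with a short Python sketch of the Benjamini and Hochberg step-up procedure. The p-values are invented and the function name `benjamini_hochberg` is hypothetical; the study's own analyses were run in SAS and DIGRAM, so this only shows the general logic of the procedure.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Invented p-values for five hypothetical DIF tests.
p = [0.20, 0.001, 0.041, 0.008, 0.039]
flags = benjamini_hochberg(p)
print(flags)
```

Note that 0.039 and 0.041 fall below the unadjusted 0.05 threshold but are not rejected after adjustment in this example.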

Cronbach’s alpha was calculated as an estimate for reliability both in the context of pilot testing and in the context of the CTG education program. A Cronbach’s alpha value above 0.7 is regarded as acceptable, whereas a value above 0.9 is required for high-stake and certification assessments, in which the results can have serious impact on an examinee [9, 24]. Data were entered using double-entry technique. Statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and the DIGRAM software package (Department of Biostatistics, University of Copenhagen, Denmark). Supplementary details on the psychometric properties and the statistical aspects of validation are outlined in Additional file 1.
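For readers unfamiliar with the reliability estimate, Cronbach's alpha can be computed from a matrix of item scores as k/(k-1) multiplied by one minus the ratio of summed item variances to the variance of the total scores. The sketch below uses a tiny invented response matrix and a hypothetical function name `cronbach_alpha`; it is meant only to make the formula concrete.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per test-taker)."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented 0/1 responses: four test-takers, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

When nearly everyone answers an item correctly, that item contributes little variance to the total score, which is one reason why ceiling effects tend to depress alpha.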

Results

We initially developed 50 items for the national CTG test. Three items were excluded during proofreading and six items during pilot testing. Items were excluded due to similarity, extensive stem text, imprecise response options, different construct than intended, and lack of evidence in relation to item content. We selected 30 items to constitute the test based on the blueprint, the comments and responses from the pilot test participants, and the time allotted for completion of the test at the national CTG course. Several items concerning management proved not to function optimally, which meant the initial blueprint could not be completely adhered to. The final blueprint was distributed as follows: fetal physiology (27 percent), indication (7 percent), equipment (3 percent), classification (33 percent), and management (30 percent). The proportion of correct answers in the 30-item test among the pilot test participants is presented in Table 1. Cronbach’s alpha equaled 0.79. The sensitivity analysis detected a significant difference in mean test scores between obstetricians and first-year residents, between first-year residents and medical students, and between midwives and midwifery students (Table 2), indicating acceptable test discriminating abilities. We decided on a standard score of 25 correct answers, which was found to be the best discriminating point (Figure 3). The intersection of the two distributions equaled 23, but was adjusted to minimize false-positive errors. The standard score was evaluated on the initial 697 test responses at the CTG courses. A failure rate of 4.6 percent was detected, which was found to be acceptable by the research group and the Safe Deliveries steering committee.
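A blueprint expressed in percentages can be translated into whole item counts for a test of a given length. The sketch below uses a largest-remainder allocation, which is an assumption made for illustration; the text reports only the percentages, so the per-domain counts produced here are derived values and not figures stated in the study. The function name `allocate_items` is hypothetical.

```python
def allocate_items(blueprint_pct, n_items):
    """Turn blueprint percentages into whole item counts that sum to n_items.

    Each domain first gets the integer part of its share; leftover items go
    to the domains with the largest fractional remainders.
    """
    raw = {domain: pct * n_items / 100 for domain, pct in blueprint_pct.items()}
    counts = {domain: int(share) for domain, share in raw.items()}
    leftover = n_items - sum(counts.values())
    by_remainder = sorted(raw, key=lambda d: raw[d] - counts[d], reverse=True)
    for domain in by_remainder[:leftover]:
        counts[domain] += 1
    return counts


# Final blueprint percentages as reported for the 30-item test.
final_blueprint = {
    "fetal physiology": 27,
    "indication": 7,
    "equipment": 3,
    "classification": 33,
    "management": 30,
}
counts = allocate_items(final_blueprint, 30)
print(counts)
```

Under this assumption the allocation is 8, 2, 1, 10 and 9 items, and those counts round back to the reported percentages when divided by 30.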

A total of 1801 midwives and physicians participated in the one-day CTG courses. Pilot test participants (n=71) and participants without written consent (n=51) were excluded; thus the included number of participants equaled 1679. Table 1 presents the 30 items, along with the proportion of correct answers, the fit of the items to the loglinear Rasch model, and the results of DIF analyses. The loglinear Rasch analysis showed a good fit for all items. Evidence of DIF was disclosed in four items related to profession and four items related to regions. No evidence of DIF was disclosed concerning size of maternity unit and seniority. The effect of including and excluding items with DIF is presented and discussed in the appendix, figure and table in Additional files 1, 2 and 3. Many items displayed a ceiling effect, which means that a high proportion of the participants answered the item correctly. No floor effect was displayed. Cronbach’s alpha equaled 0.63.
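The ceiling effect can be made concrete with the basic dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between the test-taker's ability and the item's difficulty. The sketch below is a simplification: the study fitted a loglinear Rasch model in DIGRAM, and the ability and difficulty values used here are invented. The function name `rasch_p_correct` is hypothetical.

```python
import math


def rasch_p_correct(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are on the same logit scale; the probability is the
    logistic function of (ability - difficulty).
    """
    return 1 / (1 + math.exp(-(ability - difficulty)))


# Invented values: one test-taker of ability 1.0 facing three items.
for difficulty in (-2.0, 1.0, 3.0):
    p = rasch_p_correct(1.0, difficulty)
    print(f"difficulty {difficulty:+.1f}: P(correct) = {p:.2f}")
```

An item whose difficulty lies far below the abilities of most test-takers is answered correctly by nearly all of them and therefore separates them poorly, which is what a ceiling effect describes.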

Discussion

In this validation study, where we aimed to develop a national CTG MCQ test, we found that the process and findings supported our proposed interpretation of the assessment as measuring CTG knowledge, interpretive skills, and clinical decision-making. The learning objectives’ development and item writing, the proofreading and pilot testing, and the sensitivity and Rasch analyses all underpin this. However, in its current form the test does not meet the criteria for a high-stake examination. More items and items with a higher degree of difficulty need to be integrated to increase the reliability estimate.

The thorough process of learning objectives’ development prior to this study was a robust foundation for the test development process. It generated relevant and coverable test content and a thorough discussion and clear delineation of the construct of the assessment. The choice of assessment method and format is always disputable; each has its advantages and disadvantages. Nevertheless, there is general agreement that the content of the test is more important than the response format, and MCQs can, if constructed well, test more than simple facts [11]. A written assessment can, however, only be used to measure certain competences. From the perspective of Miller’s pyramid of competence, the written assessment operates on the two lower levels of competence measurement: knows and knows how [26]. If the aim is to obtain information about how midwives and physicians perform in a clinical context (shows how and does in Miller’s pyramid), other assessment methods need to be integrated in the education program.

Valuable information was collected in the response process. An item that aimed to measure knowledge about cord blood pH values turned out to be offensive, as the item addressed the neonatal prognosis associated with a low pH value. The item therefore turned out to be a measure of ethical considerations rather than knowledge. Another test item that aimed to measure clinical decision-making turned out to be a test of reading because the stem text was too comprehensive. Both items were clearly non-functioning items that required extensive revision or exclusion.

The pilot testing was performed on a large sample representing the intended test-takers, which we perceive as a strength of the study. Optimally, we should have performed the pilot testing on participants who had completed the CTG course. This was not possible due to the simultaneous development of the test and the CTG course. It implied that sensitivity analyses and standard setting were performed on responses with a lower proportion of correct answers than in the intended context (Table 1). One must be aware that the percentage of correct answers may increase considerably when the test is incorporated in the education program.

When a floor or ceiling effect is present, the test or the affected items will have poor discrimination ability, as differences are harder to distinguish [24]. The ceiling effect might also have affected the reliability estimate, which was lower than expected in the final test. The fetal monitoring assessments in the United States and Australia contain 100 and 50 items, respectively [7, 23]. Lengthening the CTG test would be expected to result in a higher reliability estimate [9]. Cronbach’s alpha was substantially higher in the pilot test than in the final test, which we attribute both to the inclusion of students among the pilot participants and to the above-mentioned lack of course participation. This illustrates the importance of context when discussing validity and the importance of choice of pilot test participants. As the literature encourages, we strove to set a standard score that was reasonable, defensible and fair [21]. There is no ‘true’ standard score, and all standard-setting methods require judgment and decisions [21]. We find it a strength that the passing score was validated, though we are aware that this implied a frustrating wait for the course participants.
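The expected gain from lengthening a test is often approximated with the Spearman-Brown prophecy formula, which is not used in the study itself and is introduced here only as an illustration. It assumes that added items behave like the existing ones, an assumption that would not hold if the new items were deliberately more difficult. The function names below are hypothetical; the inputs are the reported alpha of 0.63 for the 30-item test and the thresholds of 0.7 and 0.9 mentioned in the Statistics section.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """Factor by which the test must be lengthened to reach the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


current_items = 30
current_alpha = 0.63

for target in (0.70, 0.90):
    factor = length_factor_needed(current_alpha, target)
    items = math.ceil(current_items * factor)
    print(f"target {target:.2f}: about {items} items")
```

Under these assumptions roughly 42 comparable items would be needed to reach 0.70 and about 159 to reach 0.90, which suggests that lengthening alone is an impractical route to high-stake reliability and is consistent with the call for items of higher difficulty as well.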

The large population of CTG course participants and the thorough evaluation of psychometric properties were additional strengths of this study. The fit of the loglinear Rasch model convincingly indicates that the test measures the intended construct. DIF was detected in relation to profession and regions, and the test is therefore not suitable for measuring differences between midwives and physicians or differences across regions. It is not surprising that differences are detected between two professions whose members have different education, competences and responsibilities. As prescribed in patient safety literature [27], it was important for Safe Deliveries to function in an interprofessional setting, thereby avoiding the ‘silo approach’ and instead striving for a uniform ‘CTG language’ on a national level. However, as this validation process reveals, it is challenging to develop a uniform test for both professions. An allocation of test items to different levels of competence might be a solution [23]. In The Standards, internal structure is suggested to be the third validation step, and it was a limitation of our study that the psychometric properties of the test were not examined more thoroughly during the pilot phase. A large number of test responses is required for Rasch analyses, and we therefore chose to evaluate psychometric properties on the actual test-takers.

As demonstrated, the process of test development is complex and time-consuming. Professionals with extensive knowledge of the test content, educationalists, statisticians, time, an implementation plan, economic resources and stakeholders’ cooperation are some of the crucial ingredients in the process. The question of whether or not to integrate a test in a teaching intervention is debatable. Testing is known to enhance learning [28], it outlines the important topics within a field, and it can be a motivating factor for learning. Based on this, we believe the current test is an important part of the CTG education program. Certification exams in fetal monitoring have been implemented in obstetric units in the USA [29], and a positive effect on clinical outcomes has been suggested [30]. Future studies in Denmark will examine the educational and clinical impact of this national CTG education program. The medical education literature recommends that decisions with considerable consequences for individual participants, such as a restriction on clinical work at a maternity unit, should not be based on just one assessment method [9]. Therefore, observational and performance assessments could beneficially be implemented if the test is to function as a high-stake examination in the future. One of the considerable overall challenges in developing a CTG test is the well-known limitations of the surveillance method; nonetheless, electronic fetal monitoring is widely integrated in the care and management of labor, which makes development and maintenance of competences crucial.

Conclusion Test development is complex and time-consuming, and the importance of context cannot be overemphasized. The five-step unitary validation approach allowed for development of a CTG MCQ test. Our process and findings support the proposed inferences of the test, but a higher reliability is needed for the national CTG test to function as a high-stake assessment. This study provides a feasible template relevant for MCQ test development in general. Applying the unitary approach to validity will expectedly lead to improved assessments in medical education.

List of Abbreviations CTG: Cardiotocography DIF: Differential item functioning MCQ: Multiple-choice question

Ethics approval and consent to participate Written consent for participation was obtained from all participants. Data processing was conducted anonymously and no patients were involved. The Regional Committee of the Capital Region of Denmark evaluated the study and ethical approval was not required according to Danish regulations (protocol number: H-1-2013-FSP-48).

Consent for publication Written consent for publication was obtained from all participants.

Availability of data and materials Data supporting the conclusions of this article are presented in tables, figures, and additional files. The full dataset cannot be shared due to the possibility of compromising anonymity.

Competing interests Morten Hedegaard was a member of the advisory board of Safe Deliveries, which is a non-profit organisation. There are no other conflicts of interest to declare.

Funding The study was funded by Trygfonden, Aase og Ejnar Danielsens Fond, Østifterne, Tømmerhandler Johannes Fogs Fond, and Department of Obstetrics, The Juliane Marie Centre, Rigshospitalet, Copenhagen, Denmark. All funds are non-profit and none of the funders had a role in the study design, data collection, data analyses, or manuscript writing.

Authors’ contributions LT and JLS contributed to conception. All authors contributed to design, data collection, data interpretation, critical manuscript reading and final approval of the manuscript. KBC performed the DIF and Rasch analyses, supervised the remaining analyses, and authored the additional statistical files.

Acknowledgements We would like to warmly thank all the midwives, physicians, and medical and midwifery students who participated in the development of the CTG test. We thank the management from the six maternity units that participated in the pilot testing for finding time in their busy work schedules. We wish to thank Obstetrician Marianne Johansen and midwives Stinne Hoegh and Mette Kiel Smed for thorough proofreading. We thank Mark Beaves, manager of The Royal Australian and New Zealand College of Obstetricians and Gynaecologists’ Fetal Surveillance Education Program (FSEP), for his encouragement and sharing of knowledge.

References

1. Hove LD, Bock J, Christoffersen JK, Hedegaard M: Analysis of 127 peripartum hypoxic brain injuries from closed claims registered by the Danish Patient Insurance Association. Acta Obstet Gynecol Scand 2008, 87:72–75.

2. Berglund S, Grunewald C, Pettersson H, Cnattingius S: Severe asphyxia due to delivery-related malpractice in Sweden 1990-2005. BJOG: An International Journal of Obstetrics & Gynaecology 2008, 115:316–323.

3. Sentinel event alert, Issue 30 (2004): Preventing infant death and injury during delivery. Joint Commission on Accreditation of Healthcare Organizations. http://www.jointcommission.org/assets/1/18/sea_30.pdf

4. Sikre fødsler [Safe Deliveries] http://www.regioner.dk/sundhed/kvalitet/patientsikkerhed/sikre+fødsler

5. Pehrson C, Sorensen JL, Amer-Wåhlin I: Evaluation and impact of cardiotocography training programmes: a systematic review. BJOG: An International Journal of Obstetrics & Gynaecology 2011, 118:926–935.

6. The Royal Australian and New Zealand College of Obstetricians and Gynaecologists. Fetal Surveillance Education Program (FSEP). http://www.fsep.edu.au

7. Electronic Fetal Monitoring. 2014 Candidate Guide. The National Certification Corporation. http://www.nccwebsite.org/resources/docs/2014-efm-candidate_guide.pdf

8. Projekt säker förlossningsvård [Safe Delivery Project] http://lof.se/wp-content/uploads/2015/05/slutrapport_saeker_foerlossning.pdf

9. Downing SM, Yudkowsky R. Assessment in Health Professions Education. New York: Routledge, 2009. Ch.1-3.

10. Cook DA, Beckman TJ: Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med 2006, 119:166.e7–16.

11. Schuwirth LWT, van der Vleuten CPM: ABC of learning and teaching in medicine: Written assessment. BMJ 2003, 326:643–645.

12. Fødselsstatistikken [Danish Birth Statistics] 2012, Statens Serum Institut.

13. Messick S. Validity of Psychological Assessment. Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist 1995;50:741-749.

14. Standards for educational and psychological testing. Amer Educational Research Assn, 1999. pp.11-17.

15. Kern DE, Thomas PA, Howard DM, Bass EB. Curriculum Development for Medical Education. A six step approach. London: The Johns Hopkins University Press, 1998. pp. 1-37.

16. Thellesen L, Hedegaard M, Bergholt T, Colov NP, Hoegh S, Sorensen JL: Curriculum development for a national cardiotocography education program: A Delphi survey to obtain consensus on learning objectives. Acta Obstet Gynecol Scand 2015, 94:n/a–n/a.

17. Case SM, Swanson DB. Constructing written test questions for the basic and clinical sciences. National Board of Medical Examiners. 1998. http://www.nbme.org/pdf/itemwriting_2003/2003iwgwhole.pdf

18. Haladyna TM, Downing SM. A Taxonomy of Multiple-Choice Item-Writing Rules. Applied Measurement in Education. 1989;2:37-50.

19. Sorensen JL, Thellesen L, Strandbygaard J, Svendsen KD, Christensen KB, Johansen M, Langhoff-Roos P, Ekelund K, Ottesen B, van der Vleuten C: Development of knowledge tests for multi-disciplinary emergency training: a review and an example. Acta Anaesthesiol Scand 2015, 59:123–133.

20. Zoanetti N, Beaves M, Griffin P, Wallace EM: Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program. BMC Med Educ 2013, 13:35.

21. Downing SM, Tekian A, Yudkowsky R: Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med 2006, 18:50–57.

22. Christensen KB, Kreiner S, Mesbah M. Rasch Models in Health. London: Wiley-ISTE, 2012.

23. Zoanetti N, Griffin P, Beaves M, Wallace EM: Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment. BMC Med Educ 2009, 9:20.

24. Fayers P, Machin D. Quality of Life. John Wiley & Sons, 2013. pp. 71, 123, 176.

25. Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 1995, 57:289–300.

26. Miller GE: The assessment of clinical skills/competence/performance. Acad Med 1990, 65:S63–7.

27. Collins DE: Multidisciplinary teamwork approach in labor and delivery and electronic fetal monitoring education: a medical-legal perspective. J Perinat Neonatal Nurs 2008, 22:125–132.

28. Larsen DP, Butler AC, Roediger HL: Test-enhanced learning in medical education. Med Educ 2008, 42:959–966.

29. Berkowitz RL, D'Alton ME, Goldberg JD, O'Keeffe DF, Spitz J, Depp R, Nageotte MP: The case for an electronic fetal heart rate monitoring credentialing examination. Am J Obstet Gynecol 2014, 210:204–207.

30. Pettker CM, Thung SF, Norwitz ER, Buhimschi CS, Raab CA, Copel JA, Kuczynski E, Lockwood CJ, Funai EF: Impact of a comprehensive patient safety strategy on obstetric adverse events. Am J Obstet Gynecol 2009, 200:492.e1–8.

Figure 1. Study design. Flowchart of the five sources of validity evidence and the participants involved.

1. Content
Developing learning objectives based on a national Delphi survey [16]
Deciding on a test blueprint with five domains
Developing 50 one-best-answer multiple-choice questions

2. Response process
Proofreading 1: relevance, language, spelling, academic content
Proofreading 2: format, construction
Pilot testing
Selecting 30 items to constitute the test

3. Relations to other variables
Sensitivity analysis: comparing test responses from groups with expected differentiated level of CTG knowledge and clinical competences; obstetricians, first-year residents, medical students, and midwives and midwifery students

4. Consequences
Establishing a passing score using the Contrasting Groups method: detecting the discriminating point between a competent and a non-competent group; obstetricians, and medical and midwifery students

Implementing the test at the national CTG education program

5. Internal structure
Evaluating the psychometric properties of the test: loglinear Rasch model, differential item functioning, Cronbach’s alpha

Participants
Proofreaders (n=6). Proofreading 1: two midwives and three obstetricians. Proofreading 2: one obstetrician with test development experience [19].
Pilot test participants (n=118): 32 specialists in obstetrics and gynecology (20 obstetricians, 12 gynecologists); 25 residents in obstetrics and gynecology (13 first-year, 12 second-to-fifth-year); 38 midwives; 8 medical students; 15 midwifery students. Recruited from six maternity units, representing all five regions in Denmark and both small and large-sized units.
CTG course participants (n=1679): 269 specialists in obstetrics and gynecology; 150 residents in obstetrics and gynecology; 1260 midwives.

Figure 2. Example of a multiple-choice question in a one-best-answer format

(Stem) Doorstep CTG from a healthy secundipara woman with an uncomplicated pregnancy. The first child was delivered by cesarean section due to breech presentation. The woman is admitted to hospital at gestational age 40+4 due to rupture of the membranes and starting contractions. The fluid is clear, the fetus is in cephalic presentation and is estimated to weigh 3400 g. Blood pressure is 110/60, and the cervix is fully effaced and 3 cm dilated. The contractions are intensifying.

(Lead-in question) How should the woman be monitored during labor?

(Options)
A: Continuous CTG because of decelerations on the doorstep CTG
B: Intermittent CTG because the decelerations on the CTG are a normal phenomenon after rupture of the membranes
C: Continuous CTG because it is a high-risk delivery
D: Intermittent CTG in the first stage of labor (dilation) and continuous CTG in the second stage of labor (pushing)

Table 1. Psychometric properties. Proportion of correct answers, loglinear Rasch model fit, and differential item functioning (DIF) in the 30-item CTG test

Proportion of correct answers is given in percent for the pilot test participants (n=118) and the CTG course participants (n=1679). Observed, Expected and Rasch P-value refer to the loglinear Rasch model.

Item     Blueprint domain  Pilot (%)  Course (%)  Observed  Expected  Rasch P-value  DIF P-value
Item 1   Indication        81.4       97.7        0.350     0.346     -              *
Item 2   Classification    78.8       91.8        0.737     0.685     -              -
Item 3   Classification    82.2       92.9        0.795     0.751     -              -
Item 4   Classification    80.5       97.0        0.524     0.530     -              -
Item 5   Equipment         94.1       99.3        0.134     0.348     -              -
Item 6   Management        94.1       99.5        0.537     0.348     -              -
Item 7   Indication        74.6       93.9        0.466     0.372     -              *
Item 8   Classification    73.3       89.7        0.296     0.341     -              *
Item 9   Classification    57.6       70.0        0.153     0.242     -              -
Item 10  Management        86.4       92.1        0.278     0.342     -              -
Item 11  Physiology        72.9       95.6        0.371     0.345     -              -
Item 12  Physiology        80.5       96.7        0.633     0.414     -              -
Item 13  Classification    72.9       96.4        0.583     0.610     -              -
Item 14  Management        83.1       97.3        0.636     0.704     -              -
Item 15  Management        85.6       97.1        0.440     0.346     -              -
Item 16  Physiology        76.3       96.3        0.331     0.345     -              -
Item 17  Physiology        93.2       97.3        0.160     0.346     -              -
Item 18  Physiology        72.0       85.0        0.327     0.338     -              +
Item 19  Physiology        80.2       96.8        0.442     0.416     -              +
Item 20  Classification    77.1       95.7        0.724     0.646     -              -
Item 21  Classification    82.2       94.9        0.572     0.596     -              -
Item 22  Physiology        91.5       98.5        0.615     0.517     -              -
Item 23  Management        87.3       98.5        0.608     0.546     -              -
Item 24  Management        88.1       98.5        0.552     0.347     -              -
Item 25  Classification    71.2       93.5        0.481     0.451     -              +
Item 26  Physiology        60.2       98.5        0.445     0.347     -              -
Item 27  Management        93.2       96.9        0.479     0.346     -              -
Item 28  Management        66.1       79.0        0.159     0.218     -              *+
Item 29  Classification    66.9       91.5        0.543     0.466     -              -
Item 30  Management        74.6       98.9        0.723     0.500     -              -

- Non-significant P-values
* P-values that indicate DIF concerning profession
+ P-values that indicate DIF concerning regions

Table 2. Sensitivity analysis. Mean test scores in the 30-item CTG test for groups with expected differentiated level of CTG knowledge and interpretive skills (pilot test participants)

Group                 n   Mean test score (SD)
Midwives              38  26.0 (3.0)
Midwifery students    15  18.5 (3.2)
Obstetricians         20  27.0 (2.4)
First-year residents  13  23.9 (3.0)
Medical students      8   16.3 (4.2)

Comparison                                Difference (95% CI)  P-value*
Midwives vs midwifery students            7.4 (5.6-9.3)        p<0.0001
Obstetricians vs first-year residents     3.0 (1.1-5.0)        p=0.005
First-year residents vs medical students  7.7 (4.4-11.0)       p=0.0008

* Mann-Whitney test

Figure 3. Standard setting in the 30-item CTG test using the Contrasting Groups method (pilot test participants). Horizontal axis: number of correct answers. Vertical axis: proportion of participants in percent. Curves: competent group (obstetricians, n=20) and non-competent group (medical and midwifery students, n=23).

Additional file 1. Supplementary details on psychometric properties and the statistical aspects of validation.

The degree of validity of the intended inferences of the test results can be studied by looking at the fit of the data to a psychometric model. We used the Rasch model [1,2] and an extension of that model, the loglinear Rasch model [3,4]. We evaluated the fit of the individual items using an item fit statistic [5] that evaluates the observed correlation between an item and the sum of the remaining items. The Rasch model imposes measurement requirements on the data and can be seen as a mathematical formulation of ideal measurement requirements [4]. Some of these requirements are technical, while others are essential. An example of the former is the requirement of local independence, which means that the underlying latent variable (in the current test: CTG knowledge, interpretive skills and clinical decision-making) explains all the correlation between any pair of items. An example of an essential requirement is that the difficulty of an item does not depend on external variables such as the profession or the seniority of the respondent. Local independence is the underlying assumption of latent variable models. The observed items are conditionally independent of each other, given an individual score on the latent variable(s). This means that the latent variable explains why the observed items are related to other items [6]. This requirement, called local independence, is unrealistic for the current test because some items share a common stem and others share a common topic. The loglinear Rasch model is an extension in which local dependence can be added. In the first analysis, the Rasch model rejected 10 of the 30 items (results not shown) and a loglinear Rasch model was used instead. We added local dependence for four item pairs (items 2 and 3, items 13 and 14, items 20 and 21, and items 22 and 23) as they shared a common stem. In this extended model only five out of the 30 items were rejected. 
We also found evidence of local dependence for three item pairs (items 2 and 25, items 3 and 4, and items 3 and 29) that covered CTG classification and for items 12 and 19 that covered fetal physiology. In this model three items were rejected. Adding local dependence for a single additional item pair (items 9 and 28) yielded a model where no strong evidence of item misfit was disclosed. In this model, only three items (items 9, 12, and 17) were significant at the 5% level. Adjusting for multiple testing, using the Benjamini and Hochberg procedure [7] to control the false discovery rate, indicated that these were type I errors.
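The multiple-testing adjustment described above can be sketched as follows. This is a minimal illustration of the Benjamini and Hochberg step-up procedure [7], not the code used in the study; the function name is hypothetical, and the P-values in the example are invented, since the individual item-fit P-values are not reported here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at false discovery rate q."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest P-values
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Invented example: 30 item-fit tests, three of them nominally significant at the 5% level.
p_values = [0.02, 0.03, 0.04] + [0.5] * 27
flags = benjamini_hochberg(p_values)  # no item remains significant after adjustment
```

With 30 tests, the smallest P-value must be at most 0.05/30 to be retained on its own, which is why a few P-values just below 0.05 are plausibly type I errors.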

Differential item functioning (DIF) occurs when respondents from different groups, such as people from different professions with the same ability, have a different probability of responding correctly to an item in a test [8]. An item does not display DIF merely because people from different groups have a different probability of giving a correct response; it displays DIF only if people from different groups with the same underlying true ability have a different probability of giving a correct response. When testing for DIF we found that items 1, 7, 8, and 28 functioned differently for physicians and midwives and that items 18, 19, 25, and 28 functioned differently across regions. No evidence of DIF was disclosed concerning seniority or size of maternity unit. The psychometric properties of the test are summarized in Table 1. To study the magnitude of DIF we computed, for each item revealing DIF, the proportion of physicians and midwives, respectively, who gave a correct answer (Additional file 2). For items 1, 7, 8, and 28, midwives consistently had a higher probability of giving a correct answer than physicians with the same score on the remaining items. Thus, including items that function differently will lead to different comparisons of physicians and midwives. Additional file 3 illustrates this, showing group comparisons based on three different sub-tests: (i) the total 30-item test, (ii) a reduced 26-item test in which the DIF items favoring midwives are removed, and (iii) a reduced four-item test with the items favoring midwives. The former two show no significant difference, whereas the latter shows significantly different group means.
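The descriptive comparison behind Additional file 2 can be sketched as below: for one item, the proportion of correct answers is computed per group among respondents with the same score on the remaining items. This is only an illustration of that tabulation, not the formal DIF test, which was carried out within the loglinear Rasch framework. The function name and the toy responses are hypothetical.

```python
from collections import defaultdict


def proportion_correct_by_rest_score(responses, groups, item):
    """For one item, return {rest_score: {group: proportion correct}}.

    responses: list of 0/1 item-score lists, one per respondent
    groups:    list of group labels, one per respondent
    item:      index of the item under study
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, total]
    for row, group in zip(responses, groups):
        rest = sum(row) - row[item]  # score on the remaining items
        cell = counts[rest][group]
        cell[0] += row[item]
        cell[1] += 1
    return {rest: {g: c / n for g, (c, n) in by_group.items()}
            for rest, by_group in counts.items()}


# Toy data: four respondents, three items, item 0 under study.
responses = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [1, 1, 0]]
groups = ["midwife", "midwife", "physician", "physician"]
table = proportion_correct_by_rest_score(responses, groups, item=0)
```

In the toy data all four respondents have a rest score of 1, yet the groups differ in their proportion correct on item 0, which is the pattern that would suggest DIF in a sufficiently large sample.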

References

1. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Danish National Institute for Educational Research, Copenhagen, 1960.
2. Fischer GH, Molenaar IW. Rasch models: Foundations, recent developments, and applications. Springer-Verlag, New York, 1995.
3. Kelderman H. Loglinear Rasch model tests. Psychometrika. 1984;49(2):223–45.
4. Kreiner S, Christensen KB. Validity and objectivity in health-related summated scales: analysis by graphical loglinear Rasch models. In Von Davier M, Carstensen CH. Multivariate and mixture distribution Rasch Models: Extensions and Applications. Springer-Verlag, New York, 2007.
5. Kreiner S. A Note on Item-Restscore Association in Rasch Models. Applied Psychological Measurement. 2011;35(7):557–61.
6. Lazarsfeld PF, Henry NW, Anderson TW. Latent structure analysis. Houghton Mifflin, Boston, 1968.
7. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995;57:289–300.
8. Holland PW, Wainer H. Differential Item Functioning. Hillsdale, NJ: Lawrence Erlbaum, 1993.

Additional file 2. The magnitude of differential item functioning (DIF) with respect to profession. Proportion of correct answers for items 1, 7, 8 and 28 for physicians and midwives with an equal number of correct answers in the remaining items.

Additional file 3. The impact of differential item functioning (DIF). Proportion of correct answers among physicians and midwives in hypothetical sub-tests formed by including or excluding items with DIF

                 Proportion of correct answers             Group comparison
Number of items  Physician, mean (SD)  Midwife, mean (SD)  Difference  P-value*
30 items         0.94 (0.06)           0.94 (0.07)         0.00        0.63
26 items¹        0.95 (0.05)           0.95 (0.07)         0.01        0.19
4 items²         0.87 (0.17)           0.91 (0.15)         -0.04       0.00

¹ Items 1, 7, 8 and 28 excluded
² Items 1, 7, 8 and 28
* Mann-Whitney test

Paper III

Cardiotocography interpretation skills and the association with size of maternity unit, years of obstetric work experience and healthcare professional background: a national cross-sectional study

Running title: CTG interpretation skills and work conditions

Line Thellesen MD1 Jette Led Sorensen MD PhD MMEd1 Morten Hedegaard MD PhD1 Susanne Rosthoej, Statistician PhD2 Nina Palmgren Colov MD1 Kristine Sylvan Andersen, Midwife1 Thomas Bergholt MD PhD MSc1

1Department of Obstetrics, Juliane Marie Centre for Children, Women and Reproduction, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark 2Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark

Corresponding author Line Thellesen MD, Department of Obstetrics, Rigshospitalet, University of Copenhagen. Blegdamsvej 9, DK-2100 Copenhagen. [email protected]

Abstract

Introduction The aim of this study was to examine whether cardiotocography (CTG) knowledge, interpretation skills and decision-making measured by a written assessment were associated with size of maternity unit, years of obstetric work experience and healthcare professional background.

Material and methods We conducted a national cross-sectional study in the setting of a CTG teaching intervention involving all 24 maternity units in Denmark. The participants were midwives (n=1260) and specialists (n=269) and residents (n=142) in obstetrics and gynecology who attended a one-day CTG course and answered a 30-item multiple-choice question test. The associations between mean test score and work conditions were analysed using multivariable robust regression, in which the three variables were mutually adjusted.

Results In the adjusted analyses, participants from units with >3000 deliveries/year scored higher on the 30-item test than participants from units with <1000 deliveries/year (3000-3999 deliveries/year: mean difference 0.8, p<0.0001; >4000 deliveries/year: mean difference 0.5, p=0.006). Participants with less than 15 years of work experience scored higher than participants with more than 15 years of experience (15-20 years of experience: mean difference -0.6, p=0.007; >20 years of experience: mean difference -0.9, p<0.0001). No differences were detected concerning professional background.

Conclusions CTG knowledge, interpretation skills and decision-making measured by a written assessment were positively associated with working in large maternity units and having less than 15 years of obstetric work experience. Our findings indicate a possible challenge in maintaining CTG skills in small units and among staff with many years of work experience.

Keywords Cardiotocography, electronic fetal monitoring, assessment, multiple-choice questions, inter-professional education, continuing professional development

Abbreviations CTG: Cardiotocography MCQ: Multiple-choice question STAN: ST segment analysis

Key message A national CTG education program increased physicians’ and midwives’ CTG skills measured by self-evaluation and by pre- and post-testing using a written test. Test score was positively associated with working in large maternity units and having <15 years of work experience.

Conflicts of interest Morten Hedegaard was a member of the advisory board of the Safe Deliveries project, which is a non-profit organisation. We have no other conflicts of interest to declare.

Introduction To reduce the incidence of hypoxic brain injuries among newborns a national obstetric quality project (Safe Deliveries) was initiated in Denmark in 2012.1 As part of the project all physicians and midwives responsible for laboring women in Danish maternity units were obliged to attend a standardized cardiotocography (CTG) education program consisting of e-learning and a one-day course with a written assessment. Fetal surveillance with CTG is widely used, and its benefits and limitations are continually discussed.2-5 Studies indicate that there is a causal relationship between errors in fetal monitoring management and hypoxic brain injuries among newborns.6-8 Continuing education in fetal surveillance is recommended.9-11 A review concludes that CTG training leads to increased knowledge and interpretation skills and improved quality of care.12 Heterogeneity of design, population and teaching methods characterises the included studies, and there is a lack of validated assessment tools. Nationally developed learning objectives preceded the development of the current CTG course, and the written assessment underwent a thorough validation process.13,14

A national initiative means that comprehensive data are available, enabling detection of possible differences and patterns in CTG interpretation skills among specific subgroups, thus shedding light on specific challenges in CTG interpretation and allowing exploration of how to optimise CTG education prospectively. With this study we aimed to examine whether CTG knowledge, interpretation skills and decision-making measured by a written assessment at a national CTG attendance course were associated with size of maternity unit, years of obstetric work experience and healthcare professional background. An additional aim was to evaluate the participants’ self-perceived learning and the measured learning effect of attending the course.

Materials and Methods

Design, population and setting This study was a national cross-sectional study examining the responses to a CTG test by healthcare professionals. The population comprised physicians and midwives from all 24 maternity units in Denmark who had participated in a CTG attendance course as part of a mandatory national CTG education program. The Danish maternity units were located across five regions and the number of annual deliveries ranged from 238 to 6659.15 Data collection took place from September 2013 to December 2013. The term ‘physicians’ in this study refers to specialists and residents in obstetrics and gynecology. In Denmark specialists work mainly within obstetrics (obstetricians), gynecology (gynecologists) or both (general specialists in obstetrics and gynecology). All groups participate in obstetric night shifts. Specialty training extends over five years and is divided into a first-year and a second-to-fifth-year residency period. CTG is available as a method for fetal surveillance at all Danish maternity units. The Danish CTG classification system (Fig. S1) is based on the International Federation of Gynecology and Obstetrics’ guideline and Neoventa’s classification of CTG.16,17 As ST segment analysis (STAN) is not used at a national level the education program only comprised CTG education.

Form, content and development of the CTG course The one-day attendance course included lectures, plenary discussions, small group teaching and completion of a written assessment that comprised a 30-item CTG multiple-choice question (MCQ) test. The course and the written assessment were designed based on learning objectives developed in a national Delphi study.13 The content addressed fetal physiology, CTG interpretation and classification, and clinical decision-making. The teaching emphasized an understanding of the physiological background for CTG changes and not just pattern recognition, and also CTG interpretation in the context of the overall clinical picture.

The MCQ test was constructed in a one-best answer format in accordance with acknowledged test development theory.18 Nearly all items in the test comprised a CTG recording and the physiology items emphasized applied knowledge concerning the physiological explanation for certain CTG patterns. Items concerning clinical decision-making were constructed with a clinical case and a CTG recording as the item stem. Examples of test items are provided in supporting information Fig.S2.

A prerequisite for participation in the CTG attendance course was completion of a CTG e-learning program. The e-learning program was available both at and outside the hospital for all maternity unit staff three months prior to the course. Combining different teaching methods is known to enhance learning, thus the blend of e-learning, lectures and small group discussions seems well founded.19

To evaluate the participants’ perception of the course and their self-perceived learning all participants were asked to fill out an anonymous evaluation questionnaire, which was constructed with questions/statements and a five-point evaluation scale for responses. To evaluate the learning effect of the CTG attendance course, participants answered 10 out of 30 items on the CTG MCQ test at the beginning of the course. The participants were not told that the same 10 items would be repeated at the end of the course, nor were the correct answers provided when the course took place. At the end of the course the participants individually answered all 30 test items. A minimum of 25 correct answers was required to pass the test and all participants had a copy of the Danish CTG classification guideline (Fig. S1) while taking the test. Fig. 1 illustrates the CTG education program design.

The process of test development involved several validation steps to ensure that the test actually measured CTG knowledge, interpretation skills and decision-making.14 We performed standardized item writing using the one-best answer format and conducted a pilot test that involved 118 physicians, midwives and midwifery and medical students. After performing sensitivity analyses to confirm that the items discriminated acceptably, we decided on a standard score using the contrasting groups method. Item response theory models were used to evaluate the psychometric properties of the test. All items in the CTG test fit a loglinear Rasch model. Differential item functioning was detected for healthcare profession (physicians/midwives) but not for years of obstetric work experience or size of maternity unit.

The attendance courses took place over a four-month period (September to December 2013) and used standardized teaching material. A group of 20 experienced midwives and obstetricians, one of each on every course, conducted the teaching; all had participated in a one-day train-the-trainer course prior to the teaching intervention. Course participants were a deliberate mix of approximately 40 midwives and physicians. As prescribed by patient safety literature, an essential aspect of the Safe Deliveries project was that it took place in an inter-professional setting, where both midwives and physicians were taught together, and that a uniform ‘CTG language’ was established nationally.20 Information on workplace, years of clinical obstetric work experience and healthcare professional background was obtained during the course. The test scoring was carried out at Rigshospitalet by LT, NPC, KSA and a student assistant. Each test item had a possible score of 0 or 1; thus a maximum score of 30 was possible. A wrong, missing or double answer was categorized as incorrect.

Statistical analysis

Size of maternity unit was divided into the following five categories based on the annual number of deliveries: <1000; 1000-1999; 2000-2999; 3000-3999 and ≥4000. If participants listed two maternity units as their workplace, they were allocated to the largest unit (n=27). Years of obstetric work experience was divided into the following six categories: <1; 1 to 5; >5 to 10; >10 to 15; >15 to 20 and >20. Healthcare professional background was divided into the following six categories: Gynecologists, obstetricians, general specialists, second-to-fifth-year residents, first-year residents and midwives.

To examine a possible association between number of correct answers in the 30-item CTG test and the three explanatory variables, univariable and multivariable robust regression analyses using M-estimators with Huber weights were applied to downweight the potential influence of outliers.21 In the multivariable analysis the three explanatory variables were mutually adjusted. As a posthoc analysis we compared the test fail rates across the categories using a multivariable logistic regression model. Due to the low number of participants who failed the test (n=80), fewer categories were defined for each variable. We adjusted for multiple testing using the Benjamini-Hochberg method.22
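The Benjamini-Hochberg adjustment named above can be illustrated with a minimal Python sketch. The function name `bh_adjust` and the example p-values are hypothetical and are not taken from the study, which was analysed in SAS; the sketch only shows how a set of raw p-values from one family of comparisons is converted to adjusted p-values.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four category comparisons (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in bh_adjust(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

An adjusted p-value below 0.05 then corresponds to a comparison that remains significant while the false discovery rate is controlled at 5%.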

The evaluation questionnaires were assessed by measuring the percentage distribution of responses within the five-point scale. An additional descriptive analysis was used to measure the learning effect of the course and examined the percentage of participants who increased their number of correct answers in the 10-item CTG test repeated before and after the CTG course. Statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). P values <0.05 were required for statistical significance.

Ethical considerations

Written consent was obtained from all participants. Data processing was conducted anonymously. The Regional Ethics Committee of the Capital Region of Denmark evaluated the study and ethical approval was not required according to Danish regulations (protocol number: H-1-2013-FSP-48).

Results

A total of 53 CTG attendance courses were conducted and 1801 physicians and midwives from 24 maternity units participated. Physicians and midwives who participated in the CTG pilot test (n=71) were excluded, as their point of departure for taking the CTG test would have differed from that of the other participants. Eight non-specialised physicians were excluded as they did not follow a residency program but were employed in unclassified occupations or research programs. Thus, 1722 participants were eligible for inclusion, 1671 (97%) of whom gave written consent to participate.

Fig. 2 illustrates the distribution of correct answers on the 30-item CTG test. Ninety-five percent of the participants passed the test. As shown in the figure, the test displayed a ceiling effect and potentially influential outliers.
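The participant flow and pass rate reported here can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the reported counts; the variable names are illustrative, and the number of failed tests (n=80) is taken from the statistical analysis section and Table 2.

```python
# Counts reported in the text.
course_participants = 1801
pilot_test_participants = 71     # excluded: had taken the CTG pilot test
non_specialised_physicians = 8   # excluded: not in a residency program
consented = 1671
failed = 80                      # participants who did not pass the test

eligible = course_participants - pilot_test_participants - non_specialised_physicians
consent_rate = 100 * consented / eligible
pass_rate = 100 * (consented - failed) / consented

print(eligible)             # 1722
print(round(consent_rate))  # 97
print(round(pass_rate))     # 95
```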

Table 1 depicts the results of the robust regression analysis. The multivariable analyses showed that participants from maternity units with ≥3000 deliveries/year had significantly more correct answers than participants from maternity units with <1000 deliveries/year (mean difference 0.8, 95% CI 0.4-1.1, p<0.0001 for maternity units with 3000-3999 deliveries; mean difference 0.5, 95% CI 0.2-0.8, p=0.006 for maternity units with ≥4000 deliveries). Participants with more than 15 years of obstetric work experience had significantly fewer correct answers than participants with less than one year of experience (mean difference -0.6, 95% CI -1.0 to -0.2, p=0.007 for >15-20 years of experience; mean difference -0.9, 95% CI -1.2 to -0.5, p<0.0001 for >20 years of experience). No differences were detected between participants with less than one year of obstetric work experience and those with 1-5, >5-10 or >10-15 years of work experience. Gynecologists had significantly fewer correct answers than obstetricians, midwives and second-to-fifth-year residents in the univariable analysis. These differences were no longer significant in the multivariable analysis (Table 1). Sixty participants (3.6%) did not provide information on the length of their clinical obstetric work experience.

In the posthoc analysis we found a significantly increased probability of not passing the test when working in maternity units with <1000 deliveries/year and when having more than 15 years of work experience (Table 2). No differences were detected concerning professional background, hence, the posthoc analysis supported the findings from the robust regression.
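To make the pass/fail comparison concrete, the crude odds ratios in Table 2 can be recomputed directly from the reported counts. This is a minimal sketch, not the SAS analysis used in the study: the helper name `crude_odds_ratio` is hypothetical, and only the unadjusted ratios are reproduced, since the adjusted ratios require the full multivariable logistic regression model.

```python
def crude_odds_ratio(failed, total, ref_failed, ref_total):
    """Odds of failing in a group divided by the odds in the reference group."""
    odds = failed / (total - failed)
    ref_odds = ref_failed / (ref_total - ref_failed)
    return odds / ref_odds


# Counts (did not pass / total) as reported in Table 2.
# Reference for unit size: <1000 deliveries/year (14/95).
print(round(crude_odds_ratio(38, 694, 14, 95), 2))   # 1000-2999: 0.34
print(round(crude_odds_ratio(28, 882, 14, 95), 2))   # >=3000: 0.19
# Reference for work experience: 0-5 years (6/610).
print(round(crude_odds_ratio(15, 570, 6, 610), 2))   # >5-15 years: 2.72
print(round(crude_odds_ratio(47, 431, 6, 610), 2))   # >15 years: 12.32
```

An odds ratio below 1 means lower odds of failing than in the reference group, and a ratio above 1 means higher odds.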


Concerning the evaluation questionnaire 95% of the participants responded; 89% found the course rewarding, 75% agreed or strongly agreed that they had become more confident with CTG interpretation and classification and 71% agreed or strongly agreed that they had become more confident with what actions they should take based on a specific CTG (Table S1). Fig. 3 shows the responses to the 10 MCQ items presented before and after the CTG attendance course for the entire population. We found that among the participants who could increase their number of correct answers (0 to 9 correct answers in the initial test), 84% improved their score, 11% maintained their score and 5% decreased their score. Among the participants who had 10 correct answers on the initial test, 93% maintained their score. After examining the three explanatory variables separately, we found that a high proportion (73 to 92%) of the participants in each category improved their score, indicating that all groups benefitted from the course (Table S2). Thirty participants (1.8%) did not take the initial test because they arrived late for the course. The proportion of missing data was low and no missing data techniques were applied.

Discussion

In this national cross-sectional study we found that the mean score on a CTG test of knowledge, interpretation skills and decision-making was positively associated with working in large maternity units and having less than 15 years of obstetric work experience. In addition we found that the standardized national CTG attendance course was positively evaluated and improved the CTG skills of both highly and less experienced physicians and midwives.

National data from a large and diverse population were included, which increased the study’s generalisability. A very high proportion of the eligible population was included and both physicians and midwives participated in the study, reflecting the inter-professional teams who manage deliveries. A further strength was the thorough process of course and test development, which ensured qualified teaching and assessment material. As shown in Fig. 2, however, the CTG test displayed a ceiling effect, which limits the test’s discriminatory ability and may in turn have diminished the detected differences among groups. Due to time constraints at the CTG course and a desire not to reveal the complete CTG test, a 10-item test was used to measure the learning effect of the course. Applying the full test would most likely have resulted in a more precise estimate.

A significant limitation of our study is the lack of information on how participants perform in the clinical setting. We perceive a theoretical foundation as necessary for clinical work and hypothesize that there is an association between answering a CTG test in a classroom and CTG monitoring of a laboring woman in a maternity unit. Within assessment literature it is suggested that a written assessment can predict the results of a performance-based test, thereby indicating an association with clinical skills.23 However, our study does not address whether our findings are transferable to the clinical setting.

The differences we found among groups were small and interpretations should be made cautiously. In the multivariable analysis the significant differences in mean test score ranged from 0.5 to 0.9. Whether this is educationally relevant is disputable; however, the posthoc analysis using test pass/fail as an outcome supported our findings.

Other studies have described associations between years of work experience and quality of healthcare and between size of maternity unit and neonatal outcomes. A systematic review found that 45 out of 62 studies reported poorer performance with increased experience.24 Only two studies described a positive association between increased experience and the defined outcomes. The outcomes comprised knowledge, appropriate use of diagnostic and screening tests, adherence to standards of appropriate , and health outcomes.24 The review considered the most plausible explanation to be a lack of regular updating of competences acquired during the years of training. The authors also speculated that senior physicians are less likely to adopt new treatments and less receptive to new standards of care. The same may apply to CTG education, which occurs mostly during early physician specialty and midwifery training and may not be updated regularly. The CTG classification system has changed over time, so physicians and midwives with high seniority need to both unlearn old habits and routines and at the same time acquire new ones. Another explanation could be that professionals with less seniority have recently finished university and midwifery school and are therefore familiar with the process of gaining new knowledge and being tested. Thus, we may have tested the ability to take a test rather than the specific knowledge and skills. Finally, the fetal monitoring method ST segment analysis (STAN) was introduced in Denmark in 2000.25 The method is accompanied by standardized education and a CTG classification system, which may also contribute to explaining our results.

We cannot exclude that stress and test anxiety may have biased our results.26 Neither can we exclude the influence of motivation, which is known to enhance learning.27 The course was mandatory, which is why we must assume that the participants’ motivation for attending the course varied.

Concerning the association with size of maternity unit, other studies found that maternity units with a lower delivery volume have higher incidence rates of approved obstetric injury claims and higher neonatal mortality rates compared to units with higher delivery volumes.28,29 Lack of clinical knowledge and skills, underlying system errors and non-compliance with written guidelines could to some extent explain these findings.28 Maintaining acquired CTG knowledge and skills might be challenging in small units, as exposure to CTG monitoring is lower than in larger units. In Denmark women with high-risk pregnancies and deliveries are referred to large maternity units with highly specialized obstetric and neonatal services. More intensive fetal monitoring is required in these cases, which likely increases the CTG skills of the physicians and midwives on these units. More frequent local training sessions and courses may also be easier to conduct in larger units.

The positive evaluation and effect of CTG education has been described previously.12 An interesting finding in our study was that all groups had a high proportion of participants who increased their knowledge and interpretation skills; thus, it seems that CTG teaching is beneficial for all physicians and midwives, independent of years of work experience and healthcare professional background. We are aware of one study that evaluated improved CTG knowledge in relation to profession and work experience, and it also found that all groups benefitted from CTG education.30 Initiatives are already underway that might affect the tendencies we found. The development in Denmark is towards merging smaller maternity units into larger units. Collaborations between small and large units are taking place and could beneficially be expanded. This implies both clinical stays at other maternity units and training sessions across units. Our study and the previously mentioned review indicate that continuing professional development is necessary to maintain competences.24 Training and education should not be limited to novices but must also be offered to professionals with many years of clinical experience.

Conclusion

CTG knowledge, interpretation skills and decision-making measured by a written assessment were positively associated with working in large maternity units and having less than 15 years of obstetric work experience. Our findings indicate a possible challenge in maintaining CTG competences in small maternity units and a potential underrepresentation of CTG education among experienced obstetricians and midwives. Our study also indicated that CTG teaching is beneficial for both junior and senior physicians and midwives.

Acknowledgements

We warmly thank all the midwives and physicians who participated in this study. We would like to thank all the teaching midwives and obstetricians who ensured a high academic level on the CTG courses and also Safe Deliveries for a rewarding collaboration. We are also grateful to the team of engaged midwives, physicians and technicians from Aarhus University Hospital who developed the CTG e-learning programme.

Funding

The study was funded by Trygfonden, Aase and Ejnar Danielsen Foundation, Oestifterne, Toemmerhandler Johannes Fog’s Foundation and Department of Obstetrics, Juliane Marie Centre for Children, Women and Reproduction, Rigshospitalet, Copenhagen, Denmark. All of these foundations are non-profit and none of the funders had a role in the study design, data collection, data analyses or manuscript writing.

References

1. Sikre Fødsler [Safe Deliveries] http://www.regioner.dk/sundhed/kvalitet/patientsikkerhed/sikre+fødsler

2. Ananth CV, Chauhan SP, Chen H-Y, D'Alton ME, Vintzileos AM. Electronic fetal monitoring in the United States: temporal trends and adverse perinatal outcomes. Obstet Gynecol. 2013 May;121(5):927–33.

3. Blix E, Sviggum O, Koss KS, Oian P. Inter-observer variation in assessment of 845 labour admission tests: comparison between midwives and obstetricians in the clinical setting and two experts. BJOG: An International Journal of Obstetrics & Gynaecology. 2003 Jan;110(1):1–5.

4. Graham EM, Adami RR, McKenney SL, Jennings JM, Burd I, Witter FR. Diagnostic accuracy of fetal heart rate monitoring in the identification of neonatal encephalopathy. Obstet Gynecol. 2014 Sep;124(3):507–13.

5. Alfirevic Z, Devane D, Gyte GML. Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour. Alfirevic Z, editor. Cochrane Database Syst Rev. 2013;5:CD006066.

6. Hove LD, Bock J, Christoffersen JK, Hedegaard M. Analysis of 127 peripartum hypoxic brain injuries from closed claims registered by the Danish Patient Insurance Association. Acta Obstet Gynecol Scand. 2008;87(1):72–5.

7. Berglund S, Grunewald C, Pettersson H, Cnattingius S. Severe asphyxia due to delivery-related malpractice in Sweden 1990-2005. BJOG: An International Journal of Obstetrics & Gynaecology. 2008 Feb;115(3):316–23.

8. Andreasen S, Backe B, Oian P. Claims for compensation after alleged birth asphyxia: a nationwide study covering 15 years. Acta Obstet Gynecol Scand. 2014 Feb 1;93(2):152–8.

9. Preventing infant death and injury during delivery. Sentinel Event Alert. 2004 Jul 21;(30):1–3.

10. Anbefalinger for Svangreomsorgen 2013 [Recommendations for care during pregnancy]. http://sundhedsstyrelsen.dk/publ/Publ2013/10okt/Svangreomsorg2013.pd

11. Ten Years of Maternity Claims. An Analysis of NHS Litigation Authority Data. Published by NHS Litigation Authority. Oct. 2012. http://www.nhsla.com/safety/Documents/Ten%20Years%20of%20Maternity%20Claims%20-%20An%20Analysis%20of%20the%20NHS%20LA%20Data%20-%20October%202012.pdf

12. Pehrson C, Sorensen JL, Amer-Wåhlin I. Evaluation and impact of cardiotocography training programmes: a systematic review. BJOG: An International Journal of Obstetrics & Gynaecology. 2011 Jul;118(8):926–35.

13. Thellesen L, Hedegaard M, Bergholt T, Colov NP, Hoegh S, Sorensen JL. Curriculum development for a national cardiotocography education program: A Delphi survey to obtain consensus on learning objectives. Acta Obstet Gynecol Scand. 2015 Aug;94(8):869-77.

14. Thellesen L, Bergholt T, Hedegaard M, Colov NP, Christensen KB, Andersen KS, et al. Development of a written assessment for a national interprofessional cardiotocography education program. Under review in BMC Med Educ.

15. Fødselsstatistikken [Danish Birth Statistics] 2012, Statens Serum Institut. http://www.ssi.dk/~/media/Indhold/DK%20%20dansk/Sundhedsdata%20og%20it/NSF/Registre/Fodselsregisteret/fødselsstatistikken2012vers%204.ashx

16. Guidelines for the use of fetal monitoring. Int J Gynecol Obstet. 1987;25:159-167.

17. Neoventa Classification of CTG. http://www.neoventa.com/ctg-pocket-guide-app/

18. Case SM, Swanson DB. Constructing written test questions for the basic and clinical sciences. National Board of Medical Examiners. 1998. http://www.nbme.org/pdf/itemwriting_2003/2003iwgwhole.pdf

19. Dent J, Harden RM. A Practical Guide for Medical Teachers. Fourth edition 2013. Churchill Livingstone. Section 1.

20. Collins DE. Multidisciplinary teamwork approach in labor and delivery and electronic fetal monitoring education: a medical-legal perspective. J Perinat Neonatal Nurs. 2008 Apr;22(2):125–32.

21. Huber PJ. Robust Regression: Asymptotics, Conjectures and Monte Carlo. The Annals of Statistics. 1973;1(5):799-821.

22. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B. 1995;57:289–300.

23. Kramer AWM, Jansen JJM, Zuithoff P, Düsman H, Tan LHC, Grol RPTM, et al. Predictive validity of a written knowledge test of skills for an OSCE in postgraduate training for general practice. Med Educ. 2002 Sep;36(9):812–9.

24. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: the relationship between clinical experience and quality of health care. Ann Intern Med. 2005 Feb 15;142(4):260–73.

25. Colov NSP. [Need for extensive education when implementing new foetal monitoring technology]. Ugeskr Laeg. 2007 Sep 24;169(39):3294–7.

26. Chapell MS, Blanding ZB, Silverstein ME, Takahashi M, Newman B, Gubi A, et al. Test Anxiety and Academic Performance in Undergraduate and Graduate Students. Journal of Educational Psychology. American Psychological Association; 2005 May 1;97(2):268–74.

27. Artino AR, La Rochelle JS, Durning SJ. Second-year medical students' motivational beliefs, emotions, and achievement. Med Educ. 2010 Dec;44(12):1203–12.

28. Milland M, Mikkelsen KL, Christoffersen JK, Hedegaard M. Severe and fatal obstetric injury claims in relation to labor unit volume. Acta Obstet Gynecol Scand. 2015 May;94(5):534–41.

29. Moster D, Lie RT, Markestad T. Neonatal mortality rates in communities with small maternity units compared with those having larger maternity units. BJOG: An International Journal of Obstetrics & Gynaecology. 2001 Sep;108(9):904–9.

30. Beckley S, Stenhouse E, Greene K. The development and evaluation of a computer-assisted teaching programme for intrapartum fetal monitoring. BJOG: An International Journal of Obstetrics & Gynaecology. 2000 Sep 1;107(9):1138–44.

Fig. 1 The CTG education program design

Flow of the programme: CTG e-learning (implemented nationally from June 2013; available at and outside hospital), then the 10-item CTG test, then the one-day CTG course (conducted from September to December 2013; consisting of lectures, small group teaching and plenary discussions), then the 30-item CTG test.

Fig. 2 Distribution of correct answers in a 30-item CTG test among physicians and midwives. The dotted line represents the cut off between pass and fail.

Bar chart. X-axis: number of correct answers (15 to 30). Y-axis: number of participants (0 to 600).

Table 1 Responses by physicians and midwives to a 30-item CTG test and the association with size of maternity unit, years of obstetric work experience and healthcare professional background.

                                              n      Univariable analysis (a)             Multivariable analysis (a)
                                                     Difference (95% CI)    p-value (b)   Difference (95% CI)    p-value (b)

Size of maternity unit in number of deliveries per year (overall p-value: univariable <0.0001; multivariable <0.0001)
<1000 (mean no. of correct answers 27.9)      95     -                                    -
1000-1999                                     319    0.6 (0.3-1.0)          0.001         0.6 (0.2-0.9)          0.002
2000-2999                                     375    0.3 (0.0-0.7)          0.05          0.3 (-0.1-0.6)         0.13
3000-3999                                     322    0.9 (0.5-1.2)          <0.0001       0.8 (0.4-1.1)          <0.0001
>4000                                         560    0.6 (0.3-1.0)          0.001         0.5 (0.2-0.8)          0.006

Years of obstetric work experience (c) (overall p-value: univariable <0.0001; multivariable <0.0001)
<1 (mean no. of correct answers 28.6)         124    -                                    -
1-5                                           486    0.2 (-0.1-0.5)         0.35          0.1 (-0.2-0.5)         0.67
>5-10                                         307    0.1 (-0.2-0.5)         0.55          0.1 (-0.2-0.5)         0.67
>10-15                                        263    0.0 (-0.3-0.3)         1.00          0.0 (-0.4-0.3)         0.91
>15-20                                        132    -0.6 (-0.9-(-0.2))     0.008         -0.6 (-1.0-(-0.2))     0.007
>20                                           299    -0.9 (-1.2-(-0.6))     <0.0001       -0.9 (-1.2-(-0.5))     <0.0001

Healthcare professional background (overall p-value: univariable …; multivariable …)
Gynecologists (mean no. of correct answers 28.1)  139  -                                  -
Obstetricians                                 96     0.6 (0.2-1.0)          0.005         0.5 (0.1-0.9)          0.07
General specialists                           34     0.1 (-0.4-0.7)         0.63          0.0 (-0.6-0.6)         0.92
2.-5.-year residents                          96     0.8 (0.4-1.2)          0.001         0.3 (-0.1-0.8)         0.23
1.-year residents                             46     0.4 (-0.1-0.9)         0.16          0.0 (-0.5-0.6)         0.92
Midwives                                      1260   0.4 (0.1-0.7)          0.005         0.2 (0.0-0.5)          0.23

a Estimated means from robust regression. In the multivariable analysis the three variables are mutually adjusted.
b Adjusted for multiple testing using the Benjamini-Hochberg method for each explanatory variable.
c Missing information on 60 participants.

Table 2 Crude and adjusted odds ratios (OR) and 95% confidence intervals (95% CI) for not passing a 30-item CTG multiple-choice question test, categorized in size of maternity unit, years of obstetric work experience and healthcare professional background.

                                         Did not pass /   Percent who    Crude OR               Adjusted OR             p-value
                                         total no.        did not pass   (95% CI)               (95% CI)                adjusted

Size of maternity unit in number of deliveries per year
<1000 (reference)                        14/95            14.7           -                      -                       -
1000-2999                                38/694           5.5            0.34 (0.17 - 0.65)     0.34 (0.17 - 0.70)      0.003
≥3000                                    28/882           3.2            0.19 (0.10 - 0.38)     0.23 (0.11 - 0.47)      <0.0001

Years of obstetric work experience*
0-5 (reference)                          6/610            1.0            -                      -                       -
>5-15                                    15/570           2.6            2.72 (1.05 - 7.06)     2.62 (0.96 - 7.10)      0.06
>15                                      47/431           10.9           12.32 (5.22 - 29.09)   11.26 (4.50 - 28.17)    <0.0001

Healthcare professional background
Gynecologists and general specialists    13/173           7.5            -                      -                       -
(reference)
Obstetricians                            2/96             2.1            0.26 (0.06 - 1.19)     0.24 (0.03 - 1.97)      0.55
Residents                                1/142            0.7            0.09 (0.01 - 0.68)     0.77 (0.08 - 7.10)      0.82
Midwives                                 64/1260          5.1            0.66 (0.36 - 1.22)     1.18 (0.56 - 2.52)      0.82

*Missing data, n=60 (12 cases)

Fig. 3 Learning effect of a national CTG course. Number of correct answers in a 10-item CTG test among physicians and midwives before and after a one-day CTG course (n=1641). The data above the line reflect the participants who improved their score; data below the line show participants who decreased their score, while data on the line show participants who achieved the same score before and after the CTG course.

Fig. S1 The Danish CTG classification guideline.

CTG classification*, intrapartum. NB! Max. 5 contractions per 10 minutes. Accelerations are defined as >15 bpm and >15 sec; decelerations as >15 bpm and >15 sec.

Normal CTG. Action: CTG on usual indication.
• Baseline: 110-150 bpm
• Variability: 5-25 bpm; accelerations
• Decelerations: uniform, early; variable uncomplicated normal (<60 sec and loss of <60 beats)

Intermediary CTG (one intermediary factor). Action: continue CTG, consider second opinion.
• Baseline: 100-110 bpm; 150-170 bpm; short bradycardia: 80-100 bpm in 3-10 min or <80 bpm in 2-3 min
• Variability: >25 bpm; <5 bpm and no accelerations in 40-60 min
• Decelerations: variable uncomplicated intermediary (<60 sec and loss of >60 beats)

2 or more intermediary factors = abnormal CTG

Abnormal CTG (one abnormal factor). Action: second opinion required, most often indication for scalp blood sample or delivery.
• Baseline: 150-170 bpm and decreased variability with no acceleration; >170 bpm; persistent bradycardia: 80-100 bpm >10 min or <80 bpm >3 min
• Variability: <5 bpm >60 min; sinusoidal pattern
• Decelerations: variable complicated (>60 sec); uniform, late

Preterminal CTG. Action: most often indication for immediate delivery.
• Total lack of variability with or without decelerations or bradycardia

*Antepartum CTG should generally not display decelerations

Fig. S2 CTG test items on fetal physiology, interpretation skills and clinical decision-making.

1. Fetal physiology

What is the most likely explanation for the illustrated CTG changes?

a. Compression of the umbilical cord due to oxytocin-induced tachysystole
b. Placental insufficiency
c. Compression of the vena cava
d. Fetal-maternal hemorrhage

------

2. Interpretation skills

How would you classify this CTG recording?

a. Normal
b. Intermediary
c. Pathological
d. Preterminal

------

3. Clinical decision-making

The CTG recording is from a secundipara woman, gestational age 38+0, with a normal first pregnancy and delivery.

The woman is admitted for induction due to intrahepatic cholestasis of pregnancy. The fetus is in cephalic presentation and is estimated to weigh 3200 g. Blood pressure 150/80, normal urine test. This morning the woman was given misoprostol.
4 pm: Amniotomy with clear fluid. Cervix fully effaced and 2-3 cm dilated. CTG normal.
8 pm: Epidural and syntocinon infusion due to unchanged cervical dilation of 3 cm.
10 pm: Cervix dilated to 4 cm, clear fluid, syntocinon infusion 120 ml/h.

What is the most appropriate action to this CTG recording?

a) Continue unchanged as the CTG is normal and the fetus therefore seems to cope well with the contractions
b) Decrease the syntocinon infusion due to frequent contractions
c) Take a scalp blood sample on the indication tachysystole

Table S1 CTG course evaluation. Participant evaluation concerning how rewarding the course was and self-perceived learning.

All values are N (%) a.

Statement: My overall evaluation of the CTG course
Not at all rewarding   Not rewarding   Neutral       Rewarding      Very rewarding
4 (0.2)                32 (1.9)        151 (8.9)     1005 (59.0)    512 (30.0)
Responses: 1704 (94.4%) b

Statement: I have become more confident with CTG interpretation and classification
Strongly disagree      Disagree        Neutral       Agree          Strongly agree
6 (0.4)                54 (3.2)        375 (21.9)    1019 (59.5)    260 (15.2)
Responses: 1714 (95.0%) b

Statement: I have become more confident with what actions to perform based on a specific CTG
Strongly disagree      Disagree        Neutral       Agree          Strongly agree
3 (0.2)                57 (3.3)        438 (25.6)    991 (57.9)     224 (13.1)
Responses: 1713 (94.9%) b

a As the percentage is rounded up or down it does not equal 100 in all cases.
b Total population = 1805 (first-time participants = 1801; second-time participants = 4, who are excluded in the other analyses but could not be excluded in the evaluation assessment due to anonymity).

Table S2 Improvements in responses to a 10-item CTG test among physicians and midwives before and after a one-day CTG course; categorized in size of maternity unit, years of obstetric work experience and healthcare professional background.

                          Participants who scored 0-9 in pretest                      Participants who scored 0-10 in pretest
                          Total n   Improved      Maintained    Decreased             Total n   Scored 10 in pretest
                                    n (%)         n (%)         n (%)                           n (%)

Size of maternity unit in number of deliveries/year
<1000                     59        43 (72.9)     10 (16.9)     6 (10.2)              94        35 (37.2)
1000-1999                 154       132 (85.7)    10 (6.5)      12 (7.8)              311       157 (50.5)
2000-2999                 185       143 (77.3)    37 (20.0)     5 (2.7)               373       188 (50.4)
3000-3999                 147       133 (90.5)    10 (6.8)      4 (2.7)               315       168 (53.3)
>4000                     245       211 (86.1)    23 (9.4)      11 (4.5)              548       303 (55.3)

Years of obstetric work experience
<1                        64        54 (84.4)     5 (7.8)       5 (7.8)               122       58 (47.5)
1-5                       184       165 (89.7)    16 (8.7)      3 (1.6)               477       293 (61.4)
>5-10                     141       118 (83.7)    17 (12.1)     6 (4.3)               306       165 (53.9)
>10-15                    121       104 (86.0)    13 (10.7)     4 (3.3)               257       136 (52.9)
>15-20                    66        56 (84.8)     6 (9.1)       4 (6.1)               129       63 (48.8)
>20                       174       129 (74.1)    32 (18.4)     13 (7.5)              292       118 (40.4)

Healthcare professional background
Gynecologists             81        72 (88.9)     5 (6.2)       4 (4.9)               133       52 (39.1)
Obstetricians             39        36 (92.3)     1 (2.6)       2 (5.1)               92        53 (57.6)
General specialists       20        18 (90.0)     1 (5.0)       1 (5.0)               34        14 (41.2)
2.-5.-year residents      39        32 (82.1)     6 (15.4)      1 (2.6)               91        52 (57.1)
1.-year residents         32        25 (78.1)     4 (12.5)      3 (9.4)               46        14 (30.4)
Midwives                  579       479 (82.7)    73 (12.6)     27 (4.7)              1245      666 (53.5)

Missing pretest, n=30. Missing information on years of obstetric work experience, n=60. The improved, maintained and decreased scores are those obtained in the posttest. As the percentage is rounded up or down it does not equal 100 in all cases.


Paper IV


The impact of a national cardiotocography education programme on birth hypoxia: a historically controlled intervention study

Running title: The impact of CTG education on birth hypoxia

Line Thellesen MD¹
Thomas Bergholt MD PhD MSc¹
Jette Led Sorensen MD PhD MMEd¹
Susanne Rosthoej, Statistician PhD²
Lone Hvidman MD PhD³
Brenda Eskenazi PhD Professor⁴
Morten Hedegaard MD PhD¹

¹Department of Obstetrics, The Juliane Marie Centre for Children, Women, and Reproduction, Rigshospitalet, University of Copenhagen, Denmark
²Section of Biostatistics, Department of Public Health, University of Copenhagen, Denmark
³Department of Gynaecology and Obstetrics, Aarhus University Hospital, Denmark
⁴Center for Environmental Research and Children’s Health, School of Public Health (CERCH), University of California, Berkeley

Corresponding author Line Thellesen MD, Department of Obstetrics, Rigshospitalet, University of Copenhagen. Blegdamsvej 9, DK-2100 Copenhagen, [email protected]

Abstract

Objective To examine whether the implementation of a national inter-professional cardiotocography (CTG) education programme was associated with a decrease in risk of birth hypoxia.
Design Historically controlled intervention study from 2009 to 2015.
Setting All Danish maternity units.
Population Intended vaginal deliveries with liveborn singletons in cephalic presentation and gestational age ≥37 weeks.
Methods Data were retrieved from the Medical Birth Register and the National Patient Register. The study period was divided into three periods: pre-implementation (2009-2012), implementation (2013) and post-implementation (2014-2015). Using logistic regression we estimated odds ratios for outcomes associated with birth hypoxia using the pre-implementation period as reference. Analyses were adjusted for potential maternal, neonatal and delivery-associated confounders. Missing data were accounted for by multiple imputation.
Main outcome measures Umbilical cord pH <7.00, five-minute Apgar score <7, neonatal therapeutic hypothermia.
Results A total of 331 282 deliveries were included. Overall risks of pH <7.00, Apgar score <7 and therapeutic hypothermia were 0.45%, 0.58% and 0.06%. Adjusted odds ratios in the post-implementation period were 1.12 (95% CI 1.00-1.26), 0.99 (95% CI 0.90-1.10) and 1.34 (95% CI 0.99-1.82) respectively. Risk of emergency caesarean section was unaltered, whereas the risk of assisted vaginal delivery decreased by 14% (OR 0.86 (95% CI 0.84-0.89)).
Conclusions Birth hypoxia is a rare event in Denmark and the implementation of a national CTG education programme did not decrease the risk. Effect dilution in a complex clinical setting, rare outcomes, insufficient intervention and a possible overestimation of the impact of CTG misinterpretation might explain the lack of effect on birth hypoxia.

Keywords Birth hypoxia, Cardiotocography, Electronic fetal monitoring, Medical education, Continuing professional development, Inter-professional education.

Tweetable abstract A national inter-professional CTG education programme did not decrease the risk of birth hypoxia

Abbreviations CTG: Cardiotocography MCQ: Multiple-choice question

Introduction

Birth hypoxia increases the risk of neonatal mortality and adverse neurological outcomes.1,2 Several studies indicate an association between errors in the management of cardiotocography (CTG) and hypoxic brain injuries among newborns.3-5 The errors comprise omission of use of CTG when indicated, misinterpretation of CTG recordings, and insufficient or delayed clinical management.

Since the implementation of CTG as a fetal surveillance method in the developed world in the 1960s, the desired aim of reducing the incidence of cerebral palsy has not been achieved.6 Adjunctive methods, such as ST waveform analysis of fetal electrocardiogram (STAN) and computer-based CTG interpretation, have been introduced without conclusive improvements in neonatal outcomes.7,8 Yet CTG remains widely used in developed countries.9,10 Improvements in CTG education and training have been proposed to overcome human errors in CTG management, with the hope that improvements in neonatal outcomes would ensue.11,12 Although several studies conclude that CTG education leads to improved CTG knowledge and interpretation skills, few studies have evaluated the concomitant effect on neonatal outcomes.13

To increase quality of care for mothers and infants and reduce the incidence of birth hypoxia, a national obstetric project, Safe Deliveries, was introduced in Denmark in 2012. The project was initiated by the Danish Regions, the Danish Society of Obstetrics and Gynaecology, the Danish Association of Midwives, the Danish Paediatric Society, the Danish Society for Patient Safety and the Patient Compensation Association. As part of the project a mandatory inter-professional CTG education programme was implemented nationally. The Danish Medical Birth Register is a comprehensive database on all pregnancies and deliveries since 1973.14 It provides a unique opportunity to evaluate, in a national setting, the effect of this CTG education programme on birth hypoxia, a relatively rare outcome. With this study we aimed to describe the implementation of a national mandatory inter-professional CTG education programme in Denmark and to report its impact on birth hypoxia as measured by umbilical cord pH <7.00, five-minute Apgar score <7 and neonatal therapeutic hypothermia using data from the Danish Medical Birth Register. We hypothesised that the CTG education programme would result in a decrease in risk of birth hypoxia. As an additional aim, we assessed whether the educational intervention was associated with an unintended increase in risk of operative deliveries.

Method

Design, population and setting

We conducted a historically controlled study of births between January 1, 2009 and December 31, 2015. We included in the analysis all intended vaginal deliveries in Denmark resulting in a liveborn singleton in cephalic presentation with a gestational age ≥37 weeks. Deliveries by elective caesarean section and homebirths were excluded, as they were not exposed to the possible change of behaviour due to the intervention (i.e. better management of intrapartum CTG).
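The inclusion and exclusion criteria above can be read as a single predicate over a delivery record. The following is a minimal sketch only: the `Delivery` class and its field names are hypothetical, since the register variable names are not given here, and gestational age is assumed to be recorded in completed weeks.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    # Hypothetical field names; the actual register variables are not stated in the text.
    liveborn: bool
    singleton: bool
    cephalic: bool
    gestational_age_weeks: int  # assumed to be completed weeks
    elective_caesarean: bool
    homebirth: bool


def is_eligible(d: Delivery) -> bool:
    """Return True if the delivery meets the stated inclusion criteria."""
    # Elective caesarean sections and homebirths were excluded from the study.
    intended_vaginal_in_hospital = not d.elective_caesarean and not d.homebirth
    return (
        d.liveborn
        and d.singleton
        and d.cephalic
        and d.gestational_age_weeks >= 37
        and intended_vaginal_in_hospital
    )
```

Applying `is_eligible` to each record would, under these assumptions, reproduce the selection of the study population from all registered deliveries.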

All Danish maternity units are public departments located in public hospitals, which are managed and administered by the Danish Regions. No Danish obstetricians work as private practitioners. Women with low-risk pregnancies and deliveries are generally managed by midwives and obstetricians/gynaecologists are only involved if complications arise. During the study period, the number of maternity units decreased from 30 units in 2009 to 21 units in 2015. From August 2014 to September 2015, the total number of deliveries in Denmark equalled 51 966 and ranged from 203 to 6 322 at each maternity unit, with 11 units having more than 2000 deliveries per year.15

National CTG education programme

Prior to the introduction of the national CTG education programme in June 2013, fetal monitoring with CTG was available at all Danish maternity units with local and regional CTG teaching. However, the extent, frequency, form and content of this teaching were not uniform. In contrast, the present national CTG education programme was standardised and consisted of an e-learning programme and a one-day course with a 30-item CTG multiple-choice question (MCQ) test. All doctors and midwives responsible for labouring women at a Danish maternity unit were obliged to participate in the programme. The content of the education programme addressed fetal physiology, CTG interpretation and classification and clinical decision-making. The CTG interpretation and classification were based on the national Danish CTG classification system, founded on the International Federation of Gynecology and Obstetrics and Neoventa classification systems.16,17

Completion of the e-learning programme was a prerequisite for attending the one-day course. Experienced midwives, obstetricians and technicians from different hospitals in Denmark collaboratively developed the e-learning programme. Available online, the programme took approximately four hours to complete and could be taken as of June 2013 at or outside of the hospital. Each maternity unit developed its own policy concerning when and where to complete the programme. The programme consisted of theoretical information, 20 interactive cases that had to be passed, and a test comprising elements from the whole curriculum. The interactive cases were based upon authentic CTG traces and required suggestions on CTG interpretation, classification and clinical actions. Feedback was provided on both correct and incorrect answers. Participants who did not pass were allowed to take a re-test.

The one-day courses began in September 2013. The courses and the CTG MCQ test were developed at Rigshospitalet, and are described in previous publications.18,19 Briefly, the course included lectures, plenary discussion and small group case-based teaching. The teaching material was standardised and emphasised CTG interpretation in the context of the overall clinical picture and knowledge of fetal physiology. Twenty experienced midwives and obstetricians recruited from all of Denmark made up the teaching team, with one of each represented at every course. All instructors attended a one-day train-the-trainer course prior to commencing the courses. The course participants were a deliberate mixture of midwives and doctors, approximately 40 participants at every course. The course financing was part of the overall funding of Safe Deliveries, whereas the day away from clinical work to attend the course was financed by the individual maternity units. From 2013 to 2015 a total of 62 courses were conducted: 53 in 2013 (1 801 participants), 5 in 2014 (106 participants) and 4 in 2015 (187 participants). The course participants comprised 1 552 (74%) midwives and 542 (26%) doctors.

In addition, Safe Deliveries comprised implementation of checklists (admission, vacuum delivery, augmentation) at all Danish maternity units.

Data source

Data were retrieved from the Medical Birth Register and the Danish National Patient Register in March 2016. Data are prospectively collected in both registers based on the unique personal identification number provided to all Danish citizens at birth. Data sets were delivered from the registers with encrypted personal identification numbers, thus all data processing was conducted anonymously.

Outcomes

Umbilical cord pH <7.00, five-minute Apgar score <7 and neonatal therapeutic hypothermia were selected as markers for birth hypoxia. We considered these measures to be proxies for potential risk of long-term neurological impairments such as cerebral palsy, which is not usually diagnosed until well after the neonatal period.20

Umbilical cord pH <7.00: Umbilical cord blood analysis is currently the only way of objectively measuring hypoxia at birth.21 Low pH values are associated with neonatal mortality, encephalopathy and cerebral palsy and are considered an important outcome measure.1 A pH <7.00 was selected as the cut-off based on the international consensus criteria to determine a severe acute hypoxic event as a potential cause of cerebral palsy.22 In Denmark, umbilical blood gas analysis is recommended at all deliveries, and from August 2014 to September 2015, 92% of all deliveries had at least one cord pH value measured.15 In the current study arterial pH was used when available. When this information was lacking, either venous or unspecified pH was used. To avoid invalid information, pH values below 6.5 and over 7.9 were considered as missing data.
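The pH selection and plausibility rules described above might be sketched as follows. This is an illustration under stated assumptions, not the data management actually applied: the function names are hypothetical, venous is assumed to take precedence over unspecified pH (the text does not state an order between the two), and the plausibility range is assumed to be checked on the selected value.

```python
from typing import Optional

PH_MIN, PH_MAX = 6.5, 7.9  # values outside this range are treated as missing
PH_CUTOFF = 7.00           # outcome threshold for umbilical cord pH


def select_cord_ph(arterial: Optional[float] = None,
                   venous: Optional[float] = None,
                   unspecified: Optional[float] = None) -> Optional[float]:
    """Pick one cord pH per delivery, preferring the arterial value."""
    # Assumption: venous is tried before unspecified when arterial is absent.
    for value in (arterial, venous, unspecified):
        if value is not None:
            # Assumption: the range check applies to the selected value.
            return value if PH_MIN <= value <= PH_MAX else None
    return None


def ph_below_cutoff(ph: Optional[float]) -> Optional[bool]:
    """Return the binary outcome, or None when the pH is missing."""
    return None if ph is None else ph < PH_CUTOFF
```

A delivery with no usable pH value yields `None`, which corresponds to an outcome that would later be handled as missing data.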

Five-minute Apgar score <7: Apgar score reflects the pulmonary, cardiovascular and neurological functions of the newborn. A low five-minute Apgar score is associated with long-term adverse neurological outcome and neonatal death.23 Apgar score is limited in being a subjective measure, and a low score can also be due to non-hypoxic causes, including prematurity, fetal anaemia or infection, meconium aspiration and maternally administered medication.21

Neonatal therapeutic hypothermia: In Denmark, the criteria for treatment with therapeutic hypothermia are ten-minute Apgar score <5 or umbilical cord pH <7.00 or standard base excess ≤-16 and gestational age ≥36 weeks, encephalopathy, abnormal EEG and start of treatment before 5½ hours of age.24 Therapeutic hypothermia is thus a good marker for severe hypoxia.

Secondary outcomes: Previous studies have found an increase in the incidence of caesarean sections after the implementation of a CTG education programme.25,26 To assess possible unwanted effects of the current education programme, we defined emergency caesarean section and assisted vaginal delivery as secondary outcomes.

Potential confounders

Potential maternal, neonatal and delivery-associated confounders were decided a priori based on literature and biological plausibility. They comprised maternal age, parity, body mass index (BMI), smoking, diabetes, and hypertensive disorders; child sex, congenital malformation, gestational age, and birth weight; and induction, placental abruption, umbilical cord prolapse, uterine rupture and shoulder dystocia.

Avoidance of uterine tachysystole was a subject in the CTG education programme and therefore we did not adjust for augmentation, as this factor was considered a possible mediator. Operative delivery was considered a potential mediator as well and was likewise not included in the model. ICD-10 codes for the outcomes and potential confounders are presented in supporting information Table S1 together with information on categorisation, missing data and a description of the data management applied to minimise misclassifications.

Statistics

We divided the study into three periods: pre-implementation (January 1, 2009 to December 31, 2012), implementation (January 1 to December 31, 2013) and post-implementation (January 1, 2014 to December 31, 2015). In the primary analyses we compared the risk of umbilical cord pH <7.00, five-minute Apgar score <7 and therapeutic hypothermia in the three study periods with the pre-implementation period being the reference. In the secondary analyses, we compared the risk of emergency caesarean section and assisted vaginal delivery in the three study periods.
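The division into study periods amounts to a simple classification by date of birth. A minimal sketch, assuming the exposure variable is derived from the calendar date of delivery (the function name and period labels are illustrative):

```python
from datetime import date

STUDY_START = date(2009, 1, 1)
STUDY_END = date(2015, 12, 31)


def study_period(birth_date: date) -> str:
    """Classify a delivery into one of the three study periods."""
    if not (STUDY_START <= birth_date <= STUDY_END):
        raise ValueError("birth date outside the study period")
    if birth_date.year <= 2012:
        return "pre-implementation"   # reference period, 2009-2012
    if birth_date.year == 2013:
        return "implementation"       # 2013
    return "post-implementation"      # 2014-2015
```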

Using logistic regression, odds ratios were estimated using two analysis models, unadjusted and adjusted for maternal, neonatal and delivery-associated characteristics. The adjustment variables and the categorisation used in the analyses are given in Table S1.
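For the unadjusted comparison of one period against the reference, a logistic regression with a single binary covariate gives the same odds ratio as the cross-product of the 2x2 table, with the usual Wald standard error on the log scale. The sketch below illustrates that calculation only; it is not the analysis code used in the study, the function name is hypothetical, and the adjusted models require a regression fit that is not shown here.

```python
import math


def odds_ratio_wald(events_period: int, n_period: int,
                    events_ref: int, n_ref: int,
                    z: float = 1.959964):
    """Crude odds ratio with a Wald 95% confidence interval.

    events_* are outcome counts and n_* are total deliveries in the
    comparison period and the reference period respectively.
    """
    a, b = events_period, n_period - events_period
    c, d = events_ref, n_ref - events_ref
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))
```

The function returns the point estimate followed by the lower and upper confidence limits; an interval that contains 1 corresponds to a non-significant difference at the 5% level.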

Missing values occurred for umbilical cord pH and to a lesser degree for Apgar score and the adjustment variables (parity, BMI, smoking and birth weight). This entailed a possible systematic error in case the information was not missing at random. To correct the possible bias, missing data techniques were applied for the analyses of primary and secondary outcomes in the unadjusted and adjusted analyses. Multiple imputation was used to create and analyse 50 multiply imputed data sets. The incomplete variables were imputed under fully conditional specification.27 Variables predictive of the primary outcomes and/or the missingness mechanism were included. These comprised the five outcomes of this study, the study periods, the variables used in the adjusted model as well as variables not included in the analysis models (auxiliary variables: admission to neonatal unit, neonatal death within 28 days and augmentation). The imputation models used the same categorisation of the variables as the analysis models and no interactions were included. Umbilical cord pH <7.00 and five-minute Apgar score <7 were imputed using logistic regression, smoking using a proportional odds model, and parity, BMI and birth weight using a multinomial logit model. The analysis models were applied to observed and imputed data sets for each of the five outcomes, and odds ratios were estimated with 95% confidence intervals and p-values based on Rubin's rules.27
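Pooling by Rubin's rules combines the estimates from the imputed data sets into one estimate whose variance reflects both within-imputation and between-imputation uncertainty. The following sketch shows the pooling step for a log odds ratio. It is a simplified illustration rather than the procedure run in the study: function names are hypothetical, and a normal quantile is used for the interval, whereas a full implementation would use a t reference distribution with Rubin's degrees of freedom.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors by Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def pooled_odds_ratio(log_ors, std_errors, z: float = 1.959964):
    """Pooled odds ratio with an approximate 95% confidence interval."""
    # Simplification: normal quantile instead of the t reference distribution.
    q_bar, total = pool_rubin(log_ors, [se ** 2 for se in std_errors])
    se = math.sqrt(total)
    return math.exp(q_bar), math.exp(q_bar - z * se), math.exp(q_bar + z * se)
```

In this setting `log_ors` and `std_errors` would each hold 50 values, one per imputed data set, obtained from the same analysis model.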

Calculations were performed using R version 3.2.5 and SAS version 9.4. Multiple imputation was implemented using the Multivariate Imputation by Chained Equations (mice) package.28 A p-value <0.05 was considered statistically significant.

Results

There were 402 645 deliveries in Denmark from 2009 to 2015. After excluding stillbirths, preterm deliveries, multiple births, homebirths, planned caesarean deliveries and deliveries with fetuses in non-cephalic presentation, the final study population included 331 282 deliveries (Fig. 1).

Table 1 summarises the characteristics of the women, infants, and deliveries in the three study periods based on observed data. The proportion of women with diabetes, deliveries among nulliparous women and induced deliveries increased during the study period whereas the proportion of women who smoked during pregnancy, augmented deliveries and deliveries with uterine rupture or shoulder dystocia decreased.

As shown in Fig. 2, the observed yearly incidences of umbilical cord pH <7.00, five-minute Apgar score <7 and therapeutic hypothermia ranged from 0.39 to 0.50%, 0.51 to 0.64% and 0.04 to 0.08%, respectively, with no clear trend over time.

Crude and adjusted odds ratio estimates based on observed and imputed data are presented in Table 2. The estimates indicated no decrease in any of the three outcomes in the implementation or post-implementation period compared to the pre-implementation period. A borderline-significant increase in risk of pH <7.00 and therapeutic hypothermia was found in the post-implementation period.

The yearly incidences of emergency caesarean section and assisted vaginal delivery are illustrated in Fig. 2, and crude and adjusted odds ratio estimates are presented in Table 2. We found a transient increase in risk of emergency caesarean section in the implementation period, which ceased in the post-implementation period, and a risk reduction in assisted vaginal deliveries in both the implementation and post-implementation periods.

The proportion of missing umbilical cord pH values was considerably higher in the pre-implementation period (18.3%) than in the implementation (5.0%) and post-implementation periods (3.4%), as shown in Table 2. The umbilical cord blood sampling routines changed during the study period, with an increase in the proportion of arterial pH values from 49% in the pre-implementation period to 90% in the post-implementation period. Missingness for Apgar score (0.3-0.4%) was stable during the study period (Table 2). The proportion of missing values for parity, BMI, smoking and birth weight equalled 0.8%, 2.3%, 1.2% and 0.2% respectively, and all but birth weight decreased over time (Table S1 and Table 1). The crude odds ratios on observed and imputed data did not differ (Table 2). When adjusted analysis was performed on observed data, the odds ratio for pH <7.00 in the post-implementation period increased significantly (results not shown), in contrast to the analyses on imputed data. This was due to a loss of cases (81 cases out of 1320), most apparent in the pre-implementation period, because of missingness among the adjustment variables.

Discussion

Main findings

In this national historically controlled intervention study including 331 282 deliveries with liveborn term and postterm singletons in cephalic presentation, we found that the implementation of a comprehensive mandatory CTG education programme for Danish midwives and doctors did not decrease the risk of birth hypoxia as assessed by cord pH <7.00, five-minute Apgar score <7 and neonatal therapeutic hypothermia. In secondary analyses we found a transient increase in risk of emergency caesarean section and a 14% risk reduction in assisted vaginal deliveries.

Strengths and limitations

A major strength of this study is that our results are based on a large number of births representing all national births, which increases the external validity. The use of national register data within medical education research enabled measurements of rare and clinically relevant outcomes. Furthermore, our ability to examine possible unwanted effects of the intervention is an additional strength. There are also limitations to our study. We used a historical control group given the national context of the intervention. We expect that confounder adjustments have increased group comparability, but cannot exclude the possibility of unmeasured confounding variables. In addition, the study design does not preclude the potential impact of other changes in obstetric care during the study period. The quality of the data stored in the national registers is dependent on the quality of healthcare professionals’ registration, which is not flawless.14 We sought to minimise misclassifications by managing data as described in Table S1. Another limitation is missing data, which were most apparent in the pre-implementation period. The implementation of multiple imputation methods in the statistical analyses should have reduced the possible bias introduced by missingness.

Interpretation

We consider Apgar score our most reliable primary outcome as it had few missing data and we do not suspect changes in healthcare professionals’ Apgar scoring or registration during the study period. Umbilical cord pH, though the best measure of birth hypoxia, was less reliable due to missing data and differences in sampling routines over time. As venous, and presumably unspecified, pH values are higher than arterial values, the changes in sampling routine might explain the non-significant increase in pH <7.00 we found in the post-implementation period. We are encouraged by the increasing data completeness, which provides more valid data for future studies. We interpret the non-significant increase in use of therapeutic hypothermia with caution. The treatment was gradually integrated in Danish paediatric care from 2006,24 and the diagnosis was implemented in the Danish National Patient Register in 2009; the increase could thus reflect the gradual implementation of a new treatment into clinical practice.

We consider our secondary outcomes reliable, as there were no changes in coding during the study period and the most common obstetric surgical interventions and procedures are correctly registered in the Danish Medical Birth Register.14 The aim of intrapartum fetal monitoring is twofold: to identify fetuses with inadequate oxygenation in labour so that appropriate intervention can prevent irreversible injury, and to avoid unnecessary interventions when fetal oxygenation is sufficient.21 The decrease in risk of assisted vaginal deliveries might indicate an increased awareness of this latter intention. However, factors other than CTG education may have caused this decrease, as a decrease was observed in the pre-implementation period as well. The slight increase in risk of emergency caesarean sections may be caused by the focus on substandard care; however, the increase was transient and we therefore consider the risk unchanged.

Studies from other countries have examined the impact of CTG education on neonatal outcomes, most of them in the context of other interventions and most of them finding no effect on low pH or Apgar score.25,29-31 The implementation of the Royal Australian and New Zealand College of Obstetricians and Gynaecologists’ Fetal Surveillance Education Program was associated with a decrease in incidence of hypoxic ischaemic encephalopathy. However, changes in the diagnostic coding could have affected the results.32 A British study from 2006 found a reduction in five-minute Apgar score <7 and hypoxic ischaemic encephalopathy after CTG education and simulation training of six different obstetric emergencies.26 Thus, the evidence for a positive impact of CTG education on birth hypoxia markers remains inconclusive.

The rarity of the outcomes could be one explanation for the current study not finding a decrease in birth hypoxia. In our study population the overall risk of cord pH <7.00 and five-minute Apgar score <7 was 0.45% and 0.58%. In other countries, studies with comparable populations find risks of 2.2% and 0.86% respectively.2,26 Perhaps the rarity makes a further decrease of birth hypoxia in Denmark difficult to achieve. Another reason might be the complexity of management of labour and delivery and the fact that CTG knowledge and interpretation skills are just one aspect of care. Based on the studies analysing obstetric compensation claims,3-5 we hypothesised that errors in CTG management were mainly a problem of lack of knowledge that could be addressed with CTG education. Conducting a more extensive needs analysis and identifying other potential aspects of these CTG management errors might have entailed a broader and perhaps more effective intervention. Team training and clinical drills in high-risk obstetric events could be considered for future integration, in accordance with the recommendations from The Joint Commission of Accreditation of Healthcare Organizations and the Danish National Board of Health to prevent injury during delivery.33,34

Furthermore, the impact of CTG management errors on hypoxic brain injuries could be overestimated. Intrapartum hypoxia might be the cause of cerebral palsy in only 10% of cases, and consensus criteria for a causal relationship between intrapartum events and cerebral palsy have been developed to minimise the risk of cerebral palsy being mislabelled as due to intrapartum hypoxia.22 These criteria are not universally applied in all studies analysing obstetric compensation claims, which weakens the association between CTG management errors and hypoxic brain injuries.

Medical education literature cautions against focusing solely on patient outcomes when evaluating the effect of an education intervention.35 The competences acquired during an education intervention may be diluted in a complex clinical setting in which communication, organisational structure and interdisciplinary collaborations may also affect the wellbeing of the newborn. In addition, it can be challenging to establish a causal link between intervention and outcome, and it is therefore recommended to evaluate an education intervention on different levels, e.g. by the use of Kirkpatrick’s four-level evaluation model.35-37 In a separate study we found the CTG education programme to have a positive effect on the two lower Kirkpatrick levels, reaction and learning.38 In future, it seems relevant to integrate other evaluation approaches with more focus on process and context to answer how and why an education intervention did or did not have an effect instead of merely answering whether it worked.39,40

Conclusions

Healthcare professionals are considered the weakest link of the CTG technology.41 In our study we did not find that increasing healthcare professionals’ CTG skills affected birth hypoxia. Dilution of effect, rare outcomes and a possible overestimation of the impact of errors in CTG management might explain the lack of effect on birth hypoxia. CTG is widely used, making continuing development and maintenance of CTG competences necessary. The inter-professional context, the variety of educational strategies, the use of assessment and the mandatory approach are elements we consider valuable in the current CTG education programme.

Acknowledgements

We warmly thank Senior Advisor MPH Steen Rasmussen for his proficiency managing the national register data. We are grateful for the expertise on obstetric coding provided by Midwife Lene Friis Eskildsen and on paediatric coding provided by Neonatologist PhD Simon Trautner. We would like to acknowledge Safe Deliveries, Obstetrician Nina Palmgren Colov and Midwife Kristine Sylvan Andersen for their extensive collaboration. We are also grateful for the research stay at the Center for Environmental Research and Children’s Health, School of Public Health, University of California, Berkeley.

Disclosure of interests

Morten Hedegaard was a member of the advisory board of Safe Deliveries, which was a non-profit organisation. We have no other conflicts of interest to declare.

Contribution to authorship

LT, TB, MH and JLS contributed to conception and design. All authors contributed to data interpretation, critical manuscript reading and final approval of the manuscript.

Details of ethics approval

Data were processed anonymously. Study approval was obtained from the Danish Data Protection Agency (J.no: 30-1341). The Regional Committee of the Capital Region of Denmark evaluated the study and ethical approval was not required according to Danish regulations (protocol number: H-1-2013-FSP-9).

Funding

The study was funded by Trygfonden, Aase and Ejnar Danielsens Foundation, Oestifterne, Toemmerhandler Johannes Fog’s Foundation and Department of Obstetrics and The Juliane Marie Centre, Rigshospitalet, University of Copenhagen, Denmark. None of the funders had a role in the study design, data collection, data analyses, or manuscript writing.

References

1. Malin GL, Morris RK, Khan KS. Strength of association between umbilical cord pH and perinatal and long term outcomes: systematic review and meta-analysis. BMJ. 2010 May 13;340:c1471.

2. Yeh P, Emary K, Impey L. The relationship between umbilical cord arterial pH and serious adverse neonatal outcome: analysis of 51,519 consecutive validated samples. BJOG: An International Journal of Obstetrics & Gynaecology. 2012 Jun;119(7):824–31.

3. Hove LD, Bock J, Christoffersen JK, Hedegaard M. Analysis of 127 peripartum hypoxic brain injuries from closed claims registered by the Danish Patient Insurance Association. Acta Obstet Gynecol Scand. 2008;87(1):72–5.

4. Berglund S, Grunewald C, Pettersson H, Cnattingius S. Severe asphyxia due to delivery-related malpractice in Sweden 1990-2005. BJOG: An International Journal of Obstetrics & Gynaecology. 2008 Feb;115(3):316–23.

5. Andreasen S, Backe B, Oian P. Claims for compensation after alleged birth asphyxia: a nationwide study covering 15 years. Acta Obstet Gynecol Scand. 2014 Feb 1;93(2):152–8.

6. Clark SL, Hankins GDV. Temporal and demographic trends in cerebral palsy--fact and fiction. Am J Obstet Gynecol. 2003 Mar;188(3):628–33.

7. Saccone G, Schuit E, Amer-Wåhlin I, Xodo S, Berghella V. Electrocardiogram ST Analysis During Labor: A Systematic Review and Meta-analysis of Randomized Controlled Trials. Obstet Gynecol. 2016 Jan;127(1):127–35.

8. Nunes I, Ayres-de-Campos D, Ugwumadu A, Amin P, Banfield P, Antony N, et al. FM-ALERT: a randomised clinical trial of intrapartum fetal monitoring with computer analysis and alerts versus previously available monitoring. 2015. http://www.omniview.eu/Cache/binImagens/2015_UK_7730patient_RCT-647.pdf

9. Ananth CV, Chauhan SP, Chen H-Y, D'Alton ME, Vintzileos AM. Electronic fetal monitoring in the United States: temporal trends and adverse perinatal outcomes. Obstet Gynecol. 2013 May;121(5):927–33.

10. Holzmann M, Nordström L. Follow-up national survey (Sweden) of routines for intrapartum fetal surveillance. Acta Obstet Gynecol Scand. 2010 May;89(5):712–4.

11. Ten Years of Maternity Claims. An Analysis of NHS Litigation Authority Data. Published by NHS litigation Authority. Oct. 2012.

12. Ugwumadu A, Steer P, Parer B, Carbone B, Vayssiere C, Maso G, et al. Time to optimise and enforce training in interpretation of intrapartum cardiotocograph. BJOG: An International Journal of Obstetrics & Gynaecology. 2016 May;123(6):866–9.

13. Pehrson C, Sorensen JL, Amer-Wåhlin I. Evaluation and impact of cardiotocography training programmes: a systematic review. BJOG: An International Journal of Obstetrics & Gynaecology. 2011 Jul;118(8):926–35.

14. Langhoff-Roos J, Krebs L, Klungsøyr K, Bjarnadottir RI, Källen K, Tapper A-M, et al. The Nordic medical birth registers--a potential goldmine for clinical research. Acta Obstet Gynecol Scand. 2014 Feb;93(2):132–7.

15. Dansk Kvalitetsdatabase for Fødsler, Årsrapport 2015 [Danish Quality Database for Births, Annual report 2015]. https://www.sundhed.dk/content/cms/66/4666_dkf-årsrapport-2014-2015.pdf

16. Guidelines for the use of fetal monitoring. Int J Gynecol Obstet 1987; 25:159-167.

17. Neoventa Classification of CTG. http://www.neoventa.com/ctg-pocket-guide-app/

18. Thellesen L, Hedegaard M, Bergholt T, Colov NP, Hoegh S, Sorensen JL. Curriculum development for a national cardiotocography education program: A Delphi survey to obtain consensus on learning objectives. Acta Obstet Gynecol Scand. 2015 Aug;94(8):869-77.

19. Thellesen L, Bergholt T, Hedegaard M, Colov NP, Christensen KB, Andersen KS, et al. Development of a written assessment for a national interprofessional cardiotocography education program. Submitted.

20. Herskind A, Greisen G, Nielsen JB. Early identification and intervention in cerebral palsy. Dev Med Child Neurol. 2015 Jan;57(1):29–36.

21. Ayres-de-Campos D, Arulkumaran S. FIGO consensus guidelines on intrapartum fetal monitoring: Physiology of fetal oxygenation and the main goals of intrapartum fetal monitoring. International Journal of Gynecology & Obstetrics. 2015 Oct;131(1):5–8.

22. MacLennan A. A template for defining a causal relation between acute intrapartum events and cerebral palsy: international consensus statement. BMJ. 1999 Oct 16;319(7216):1054.

23. Nelson KB, Ellenberg JH. Apgar scores as predictors of chronic neurologic disability. Pediatrics 1981;68(1):36-44.

24. Lando A, Jonsbo F, Hansen BM, Greisen G. [Induced hypothermia in infants born with hypoxic-ischaemic encephalopathy]. Ugeskr Laeg. 2010 May 10;172(19):1433–7.

25. Pettker CM, Thung SF, Norwitz ER, Buhimschi CS, Raab CA, Copel JA, et al. Impact of a comprehensive patient safety strategy on obstetric adverse events. Am J Obstet Gynecol. 2009 May;200(5):492.e1–8.

26. Draycott T, Sibanda T, Owen L, Akande V, Winter C, Reading S, et al. Does training in obstetric emergencies improve neonatal outcome? BJOG: An International Journal of Obstetrics & Gynaecology. 2006 Feb;113(2):177–82.

27. Buuren SV. Flexible Imputation of Missing Data. CRC Press; 2012.

28. Buuren SV, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software. 2011 Dec 12;45(3):1–67. https://www.jstatsoft.org/article/view/v045i03

29. Goffman D, Brodman M, Friedman AJ, Minkoff H, Merkatz IR. Improved obstetric safety through programmatic collaboration. J Healthc Risk Manag. 2014;33(3):14–22.

30. Young P, Hamilton R, Hodgett S, Moss M, Rigby C, Jones P, et al. Reducing risk by improving standards of intrapartum fetal care. J R Soc Med. 2001 May;94(5):226–31.

31. Millde-Luthander C, Källen K, Nyström ME, Högberg U, Håkansson S, Härenstam KP, et al. Results from the National Perinatal Patient Safety Program in Sweden: the challenge of evaluation. Acta Obstet Gynecol Scand. 2016 May;95(5):596–603.

32. Byford S, Weaver E, Anstey C. Has the incidence of hypoxic ischaemic encephalopathy in Queensland been reduced with improved education in fetal surveillance monitoring? Aust N Z J Obstet Gynaecol. 2014 Aug;54(4):348–53.

33. Preventing infant death and injury during delivery. Sentinel Event Alert. 2004 Jul 21;(30):1–3.

34. Svangreomsorgen [Recommendations for care during pregnancy] 2015, Danish Health Authority. pp.160.

35. Cook DA, West CP. Perspective: Reconsidering the focus on “outcomes research” in medical education: a cautionary note. Acad Med. 2013 Feb;88(2):162–7.

36. Cook DA. Twelve tips for evaluating educational programs. Med Teach. 2010;32(4):296–301.

37. Kirkpatrick DL, Kirkpatrick JD. Evaluating Training Programs. The four levels. Third edition. Berrett-Koehler Publishers; 2006.

38. Thellesen L, Sorensen JL, Hedegaard M, S R, Colov NP, Andersen KS, et al. Cardiotocography interpretation skills and the association with size of maternity unit, years of obstetric work experience and healthcare professional background. Submitted.

39. Haji F, Morin M-P, Parker K. Rethinking programme evaluation in health professions education: beyond 'did it work?'. Med Educ. 2013 Apr;47(4):342–51.

40. Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, et al. Process evaluation of complex interventions: Medical Research Council guidance. BMJ 2015;350:h1258 doi:10.1136/bmj.h1258

41. Santo S, Ayres-de-Campos D. Human factors affecting the interpretation of fetal heart rate tracings: an update. Curr Opin Obstet Gynecol. 2012 Mar;24(2):84–8.

Fig. 1 Selection of study population

All deliveries 2009-2015: 402 645
    Excluded: multiple deliveries, 8 451 (2.1%)
Singleton deliveries: 394 194
    Excluded: stillbirths, 1 261 (0.3%)
Liveborn singleton deliveries: 392 933
    Excluded: home deliveries, 1 775 (0.4%)
Liveborn singleton deliveries in a hospital: 391 158
    Excluded: gestational age < 37 or ≥ 44 weeks, 19 433 (4.8%) (19 243 < 37 weeks, 58 ≥ 44 weeks, 132 missing)
Liveborn singleton deliveries in a hospital, gestational age ≥ 37 weeks: 371 725
    Excluded: non-cephalic presentation, 13 804 (3.4%) (12 209 non-cephalic, 368 unspecified, 1 227 missing)
Liveborn singleton deliveries in a hospital, gestational age ≥ 37 weeks, cephalic presentation: 357 921
    Excluded: planned caesarean deliveries, 26 639 (6.6%)
Liveborn singleton intended vaginal deliveries, in a hospital, gestational age ≥ 37 weeks, cephalic presentation: 331 282
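The selection in Fig. 1 is a chain of subtractions, so it can be checked mechanically. The sketch below is only an illustration: the counts are transcribed from Fig. 1 and Table 1, while the variable names are hypothetical and not part of the study's data management.

```python
# Counts transcribed from Fig. 1; variable names are illustrative only.
all_deliveries = 402_645
exclusions = [
    ("Multiple deliveries", 8_451),
    ("Stillbirths", 1_261),
    ("Home deliveries", 1_775),
    ("Gestational age < 37 or >= 44 weeks", 19_433),
    ("Non-cephalic presentation", 13_804),
    ("Planned caesarean deliveries", 26_639),
]

remaining = all_deliveries
shares = {}
for reason, n in exclusions:
    remaining -= n
    # Share of ALL deliveries (assumed denominator, see the note below).
    shares[reason] = round(100 * n / all_deliveries, 1)
    print(f"{reason}: -{n} ({shares[reason]}%), {remaining} remaining")

study_population = remaining

# Sizes of the three study periods as given in Table 1.
period_total = sum([195_142, 44_519, 91_621])
print(study_population, period_total)
```

Under these assumptions the remaining count equals the final box of Fig. 1 and the sum of the three study periods in Table 1. The percentages in Fig. 1 appear to be expressed relative to all 402 645 deliveries rather than to the preceding box, which is the denominator used here.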

Table 1 Maternal, neonatal and delivery-associated characteristics in the three study periods

| Characteristic | Pre-implementation 2009-2012, n | % | Implementation 2013, n | % | Post-implementation 2014-15, n | % | p-value |
|---|---|---|---|---|---|---|---|
| Intended vaginal deliveries with liveborn singletons in cephalic presentation and gestational age ≥ 37 weeks | 195 142 | | 44 519 | | 91 621 | | |
| Maternal | | | | | | | |
| Age, years | | | | | | | <0.0001 |
| < 25 | 25 803 | 13.2 | 5 933 | 13.3 | 11 744 | 12.8 | |
| 25 - 29 | 61 304 | 31.4 | 14 272 | 32.1 | 30 653 | 33.5 | |
| 30 - 34 | 69 955 | 35.9 | 15 574 | 35.0 | 31 090 | 33.9 | |
| 35 - 39 | 32 315 | 16.6 | 7 301 | 16.4 | 15 079 | 16.5 | |
| ≥ 40 | 5 765 | 3.0 | 1 439 | 3.2 | 3 055 | 3.3 | |
| Parity | | | | | | | <0.0001 |
| 0 | 89 732 | 46.5 | 21 497 | 48.3 | 44 296 | 48.4 | |
| 1 - 3 | 100 475 | 52.1 | 22 422 | 50.4 | 46 020 | 50.3 | |
| ≥ 4 | 2 588 | 1.3 | 556 | 1.3 | 1 175 | 1.3 | |
| Missing | 2 347 | (1.2) | 44 | (0.1) | 130 | (0.1) | |
| BMI | | | | | | | 0.02 |
| < 18.50 | 8 071 | 4.3 | 1 966 | 4.5 | 4 127 | 4.5 | |
| 18.50 - 24.99 | 118 418 | 62.7 | 27 583 | 62.7 | 57 191 | 63.0 | |
| 25.00 - 29.99 | 39 378 | 20.9 | 9 101 | 20.7 | 18 580 | 20.5 | |
| 30.00 - 34.99 | 15 198 | 8.1 | 3 561 | 8.1 | 7 238 | 8.0 | |
| ≥ 35.00 | 7 841 | 4.2 | 1 818 | 4.1 | 3 689 | 4.1 | |
| Missing | 6 236 | (3.2) | 490 | (1.1) | 796 | (0.9) | |
| Smoking | | | | | | | <0.0001 |
| Non-smoking | 168 198 | 87.5 | 39 185 | 88.9 | 80 866 | 88.9 | |
| Stopped during pregnancy | 5 602 | 2.9 | 1 438 | 3.3 | 3 254 | 3.6 | |
| 1-10 cigarettes / day | 13 157 | 6.9 | 2 482 | 5.6 | 5 163 | 5.7 | |
| > 10 cigarettes / day | 5 231 | 2.7 | 958 | 2.2 | 1 721 | 1.9 | |
| Missing | 2 954 | (1.5) | 456 | (1.0) | 617 | (0.7) | |
| Diabetes | 5 883 | 3.0 | 1 520 | 3.4 | 3 446 | 3.8 | <0.0001 |
| Hypertensive disorders | 9 377 | 4.8 | 2 315 | 5.2 | 4 335 | 4.7 | 0.0005 |
| Neonatal | | | | | | | |
| Child sex | | | | | | | 0.62 |
| Female | 95 148 | 48.8 | 21 722 | 48.8 | 44 505 | 48.6 | |
| Male | 99 994 | 51.2 | 22 797 | 51.2 | 47 116 | 51.4 | |
| Congenital malformation | 2 740 | 1.4 | 627 | 1.4 | 1 276 | 1.4 | 0.96 |
| Gestational age, weeks | | | | | | | 0.002 |
| 37+0 - 38+6 | 29 677 | 15.2 | 6 492 | 14.6 | 13 823 | 15.1 | |
| 39+0 - 40+6 | 107 782 | 55.2 | 24 510 | 55.1 | 50 559 | 55.2 | |
| 41+0 - 43+6 | 57 683 | 29.6 | 13 517 | 30.4 | 27 239 | 29.7 | |
| Birth weight, g | | | | | | | <0.001 |
| < 2500 | 2 169 | 1.1 | 476 | 1.1 | 1 044 | 1.1 | |
| 2500 - 3999 | 156 649 | 80.4 | 35 915 | 81.1 | 74 132 | 81.2 | |
| ≥ 4000 | 36 100 | 18.5 | 7 882 | 17.8 | 16 093 | 17.6 | |
| Missing | 224 | (0.1) | 246 | (0.6) | 352 | (0.4) | |
| Delivery | | | | | | | |
| Induction | 47 991 | 24.6 | 12 203 | 27.4 | 23 685 | 25.9 | <0.0001 |
| Augmentation (oxytocin, prostaglandin) | 57 382 | 29.4 | 10 962 | 24.6 | 23 179 | 25.3 | <0.0001 |
| Placental abruption | 537 | 0.3 | 126 | 0.3 | 236 | 0.3 | 0.62 |
| Umbilical cord prolapse | 142 | 0.1 | 41 | 0.1 | 62 | 0.1 | 0.29 |
| Uterine rupture | 347 | 0.2 | 64 | 0.1 | 119 | 0.1 | 0.007 |
| Shoulder dystocia | 2 343 | 1.2 | 487 | 1.1 | 933 | 1.0 | <0.0001 |

Percentages in parentheses give the proportion of missing values.

Fig. 2 Observed yearly incidences of birth hypoxia and operative deliveries from 2009 to 2015. The brackets indicate the three study periods

[Figure not reproduced. Two panels, x-axis: year (2009-2015), y-axis: percent. Upper panel (0 to 0.7 percent): umbilical cord pH < 7.00, Apgar score < 7, therapeutic hypothermia. Lower panel (0 to 12 percent): emergency caesarean section, assisted vaginal delivery.]

Table 2 Crude and adjusted odds ratios (OR) and 95% confidence intervals (95% CI) for birth hypoxia and operative deliveries during the implementation (2013) and post-implementation (2014-2015) periods of a national interprofessional CTG education programme. The pre-implementation period (2009-2012) serves as reference.

| Outcome and period | No. of observed cases / no. of deliveries | Missing data, n (%) | Crude OR, observed data (95% CI) | Crude OR, imputed data (95% CI) | Adjusted OR**, imputed data (95% CI) | p-value, adjusted |
|---|---|---|---|---|---|---|
| Primary outcomes | | | | | | |
| Umbilical cord pH < 7.00 | | | | | | |
| 2009-2012 | 700 / 159 508* | 35 634 (18.3) | - | - | - | - |
| 2013 | 185 / 42 308* | 2 211 (5.0) | 1.00 (0.85-1.17) | 1.00 (0.85-1.17) | 0.99 (0.84-1.16) | 0.90 |
| 2014-2015 | 435 / 88 479* | 3 142 (3.4) | 1.12 (0.99-1.26) | 1.12 (0.99-1.27) | 1.12 (1.00-1.26) | 0.05 |
| Five-minute Apgar score < 7 | | | | | | |
| 2009-2012 | 1 146 / 194 520* | 622 (0.3) | - | - | - | - |
| 2013 | 254 / 44 373* | 146 (0.3) | 0.97 (0.85-1.11) | 0.97 (0.85-1.11) | 0.97 (0.84-1.11) | 0.62 |
| 2014-2015 | 529 / 91 254* | 367 (0.4) | 0.98 (0.89-1.09) | 0.98 (0.89-1.09) | 0.99 (0.90-1.10) | 0.92 |
| Hypothermia treatment | | | | | | |
| 2009-2012 | 111 / 195 142 | - | - | - | - | - |
| 2013 | 30 / 44 519 | - | 1.19 (0.79-1.77) | Equal to analyses on observed data | 1.21 (0.80-1.81) | 0.36 |
| 2014-2015 | 67 / 91 621 | - | 1.29 (0.95-1.74) | Equal to analyses on observed data | 1.34 (0.99-1.82) | 0.06 |
| Secondary outcomes | | | | | | |
| Emergency caesarean section | | | | | | |
| 2009-2012 | 20 638 / 195 142 | - | - | - | - | - |
| 2013 | 5 035 / 44 519 | - | 1.08 (1.04-1.11) | Equal to analyses on observed data | 1.05 (1.01-1.08) | 0.008 |
| 2014-2015 | 9 683 / 91 621 | - | 1.00 (0.97-1.03) | Equal to analyses on observed data | 0.98 (0.96-1.01) | 0.14 |
| Assisted vaginal delivery | | | | | | |
| 2009-2012 | 16 173 / 195 142 | - | - | - | - | - |
| 2013 | 3 506 / 44 519 | - | 0.95 (0.91-0.98) | Equal to analyses on observed data | 0.91 (0.87-0.95) | <0.0001 |
| 2014-2015 | 6 879 / 91 621 | - | 0.90 (0.87-0.93) | Equal to analyses on observed data | 0.86 (0.84-0.89) | <0.0001 |

*The denominator equals the number of deliveries with a registered pH value or Apgar score, respectively.
**Analysis of primary outcomes adjusted for: maternal age, BMI, smoking, parity, diabetes, hypertensive disorder, child sex, congenital malformations, gestational age, birth weight, induction, umbilical cord prolapse, placental abruption, uterine rupture, shoulder dystocia.
Analysis of secondary outcomes adjusted for: maternal age, BMI, smoking, parity, diabetes, hypertensive disorder, child sex, congenital malformations, gestational age, birth weight, induction, umbilical cord prolapse.
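The crude odds ratios on observed data in Table 2 can be recomputed from the case counts and denominators given in the table. The sketch below assumes a Wald interval on the log odds ratio; the table does not state which interval method was used, and the function name `crude_odds_ratio` is hypothetical. The imputed and adjusted estimates depend on multiple imputation and regression modelling and are not reproduced here.

```python
import math


def crude_odds_ratio(cases, total, ref_cases, ref_total, z=1.96):
    """Crude OR versus the reference period, with a Wald 95% CI (assumed method)."""
    a, b = cases, total - cases              # cases / non-cases, study period
    c, d = ref_cases, ref_total - ref_cases  # cases / non-cases, reference period
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Umbilical cord pH < 7.00, observed data (counts taken from Table 2).
reference = (700, 159_508)  # 2009-2012
periods = {"2013": (185, 42_308), "2014-2015": (435, 88_479)}

for period, (cases, total) in periods.items():
    odds_ratio, lower, upper = crude_odds_ratio(cases, total, *reference)
    print(f"{period}: OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these inputs the sketch gives 1.00 (0.85-1.17) for 2013 and 1.12 (0.99-1.26) for 2014-2015, in line with the crude estimates on observed data for umbilical cord pH < 7.00.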

Supporting information

Table S1 ICD10-codes for outcome variables and potential confounder variables and description of data management, categorisation and missing data

| Variable | ICD10-codes | Data management (accepted values) | Categorisation | Missing data, n (%) |
|---|---|---|---|---|
| Outcomes | | | | |
| Umbilical cord pH | | 6.5 ≤ pH ≤ 7.9 | pH < 7.00; pH ≥ 7.00 | 40 987 (12.4) |
| Five-minute Apgar score | | | Apgar score < 7; Apgar score ≥ 7 | 1 135 (0.3) |
| Therapeutic hypothermia | BMFL38B | | Yes / No | - |
| Emergency caesarean section | KMCA10A, KMCA10D, KMCA10E | | Yes / No | - |
| Assisted vaginal delivery | KMAE00, KMEA03, KMAE96, KMAF00, KMAF10, KMAF96 | Unsuccessful assisted vaginal deliveries (KMAE20 and KMAE20) were excluded, as the infants with these codes expectedly underwent emergency caesarean section | Yes / No | - |
| Potential confounders | | | | |
| Maternal: | | | | |
| Age | | | < 25 years; 25-29 years; 30-34 years; 35-39 years; ≥ 40 years | - |
| Parity | | | 0; 1-3; ≥ 4 | 2 521 (0.8) |
| BMI | | 130 cm ≤ height ≤ 200 cm; 30 kg ≤ weight ≤ 300 kg | < 18.5; 18.5-24.99; 25-29.99; 30-34.99; ≥ 35 | 7 522 (2.3) |
| Smoking | | Women who smoked an unregistered number of cigarettes were allocated to smoking 1-10 cigarettes / day, as the number of infants with birth weight < 2500 g and gestational age < 40 weeks was associated with this group | No smoking; Stopped during pregnancy; Smoking 1-10 cigarettes / day; Smoking > 10 cigarettes / day | 4 027 (1.2) |
| Diabetes (Type 1, Type 2, Gestational) | DO24* | Diabetes due to malnutrition (DO242) and newly diagnosed manifest diabetes (DO245) were not included (n=74) | Yes / No | - |
| Hypertensive disorder (, preeclampsia, , HELLP) | DO10*, DO11*, DO13*, DO14*, DO15* | | Yes / No | - |
| Neonatal: | | | | |
| Child sex | | | Female / Male | - |
| Congenital malformation | Cardiovascular: DQ20-DQ28*; Nervous: DQ00-DQ07*; Respiratory: DQ30-DQ34* | | Yes / No | - |
| Gestational age | | 37 weeks ≤ gestational age < 44 weeks | 37+0 - 38+6; 39+0 - 40+6; 41+0 - 43+6 | - |
| Birth weight | | Birth weight ≥ 300 g (maximum registered weight: 6661 g) | 300-2499 g; 2500-3999 g; ≥ 4000 g | 822 (0.2) |
| Delivery-associated: | | | | |
| Induction (prostaglandin, oxytocin, amniotomy, balloon catheter) | BKHD2*, KMAC00, KMAC96A | | Yes / No | - |
| Placental abruption | DO45* | | Yes / No | - |
| Umbilical cord prolapse | DO690 | | Yes / No | - |
| Uterine rupture | DO711*, DO710*, KMCC00 | Impending uterine rupture (DO758A, and DO711 or DO710 combined with DZ038) was not included | Yes / No | - |
| Shoulder dystocia | DO660, KMAH15 | | Yes / No | - |

*ICD10 group
