Relational Agents in Mental Health

A thesis submitted to the University of Manchester for the degree of

Doctor of Clinical Psychology

in the Faculty of Biology, Medicine and Health

2019

Hannah Gaffney

School of Health Sciences

Table of Contents

Word Counts ...... 7

List of Tables ...... 8

List of Figures ...... 8

Thesis abstract ...... 9

Relational Agents in Mental Health ...... 9

Declaration ...... 10

Copyright Statement ...... 10

Acknowledgements ...... 11

Paper 1: Systematic Review ...... 12

Abstract ...... 13

Introduction ...... 14

Rationale ...... 14

Objectives...... 15

Method ...... 16

Literature Search ...... 16

Eligibility criteria ...... 17

Screening ...... 17

Data extraction ...... 18

Risk of bias assessment ...... 18

Results ...... 19

Study selection ...... 19

Figure 1. Flow diagram of included studies. Search updates were conducted until January 2019, and 2 new papers were identified and included...... 20

Risk of bias ...... 21

Table 1. Descriptive statistics of methodological quality ...... 22


Study characteristics ...... 24

Table 2. Characteristics of included studies...... 26

Participants ...... 39

Relational agent interventions...... 39

Feasibility and engagement ...... 41

Psychological outcomes ...... 42

User experience outcomes ...... 43

Discussion...... 44

Principal results ...... 44

Limitations of included studies ...... 46

Strengths and limitations ...... 46

Future directions ...... 47

Conclusions ...... 48

Conflicts of interest ...... 49

Abbreviations ...... 49

References ...... 51

Paper 2: Empirical paper ...... 55

Acknowledgements ...... 56

Conflicts of interest ...... 56

Abstract ...... 56

Significant outcomes ...... 57

Limitations...... 58

Introduction ...... 59

Aims of the Study ...... 61

Material and Methods ...... 62

Design ...... 62


Participants ...... 62

Ethics ...... 63

Procedures ...... 63

Intervention (MYLO) ...... 64

Figure 1. MYLO conversation screen example...... 65

Measures ...... 66

Intervention process measure ...... 66

Intervention engagement ...... 67

Symptom measures ...... 67

Qualitative interview ...... 68

Piloting ...... 68

Statistical analysis ...... 69

Primary quantitative analysis...... 69

Figure 2. Illustration of the two-level hierarchical structure of data, split by questions identified as helpful and unhelpful ...... 70

Secondary quantitative analyses ...... 71

Qualitative analysis ...... 71

Results ...... 72

Participant Characteristics ...... 72

Figure 3. Recruitment flow diagram ...... 73

Primary results – intervention process ...... 74

Identification of helpful and unhelpful questions ...... 74

Table 1. Frequency table of questions identified as helpful or unhelpful ...... 75

Quantitative intervention process results ...... 76

Table 2. Descriptive statistics for scores on intervention process measure items by helpful and unhelpful questions ...... 77


Figure 4. Boxplot of scores on all variables by unhelpful and helpful questions ...... 77

Helpful questions ...... 78

Table 3. Effect of process variables on helpfulness for questions identified as helpful ...... 78

Table 4. Effect of intervention process measures on perceived helpfulness for questions classed as helpful ...... 79

Unhelpful questions ...... 80

Table 5. Effect of process variables on helpfulness for questions identified as unhelpful ...... 80

Table 6. Effect of intervention process measures on perceived helpfulness for questions classed as unhelpful ...... 81

Qualitative intervention process results ...... 82

Figure 5. Thematic map of participants’ reasons for choosing a question as being particularly helpful ...... 85

Figure 6. Thematic map of participants’ reasons for choosing a question as being particularly unhelpful ...... 87

Secondary results – Engagement, design and function, and clinical outcomes ...... 88

Engagement with MYLO ...... 88

Design and function ...... 88

Table 7. Participant feedback on MYLO intervention functions and design ...... 89

Clinical outcomes ...... 90

Table 8. Scores on clinical outcomes at baseline and follow-up and results of paired samples t-tests ...... 90

Discussion...... 91

Main findings ...... 91

Strengths and limitations ...... 93

Clinical implications ...... 95


References ...... 97

Paper 3: Critical appraisal ...... 104

Overview ...... 105

Paper 1 – Relational agents in the treatment of mental health problems: A mixed methods systematic review ...... 105

Rationale for review topic ...... 105

Literature search ...... 106

Quality appraisal & data extraction ...... 108

Synthesis ...... 109

Evaluation of findings ...... 109

Paper 2 – Agents of change: A multi-method study to optimise the helpfulness of an artificial relational agent for treating mental health problems...... 110

Rationale for the research topic and study design ...... 110

Patient and Public Involvement (PPI) ...... 111

Information governance and IT set-up ...... 112

Ethical considerations ...... 113

Recruitment ...... 114

Sample characteristics ...... 117

Intervention process measures ...... 118

Analytical approach ...... 120

Evaluation of findings ...... 121

Contribution to theory, research and practice ...... 122

Dissemination ...... 124

Concluding remarks ...... 124

References ...... 125

Appendices ...... 128


Appendix A. Journal of Medical Internet Research - guidelines for authors ...... 128

Appendix B. Systematic review search strategy ...... 134

Appendix C. Author guidelines for Acta Psychiatrica Scandinavica ...... 135

Appendix D. NHS Research Ethics Committee favourable opinion ...... 145

Appendix E. Participant information sheet (PIS) ...... 149

Appendix F. Intervention process measure ...... 155

Appendix G. Interview Schedule – Helpful and Unhelpful Questions ...... 157

Appendix H. Interview Schedule – Accessibility and Interface ...... 158

Appendix I. Correspondence with JMIR regarding open-access fee waiver ...... 159

Word Counts

Word counts (excluding abstracts, tables and references):

Thesis abstract (Thesis abstract): 368

Paper 1 (Relational agents in the treatment of mental health problems: A mixed methods systematic review): 4,962

Paper 2 (Agents of change: A multi-method study to optimise the helpfulness of an artificial relational agent for treating mental health problems): 7,929

Paper 3 (Critical evaluation and reflection): 5,311

Total: 18,570


List of Tables

Paper one

Table 1. List of criteria used to assess methodological quality and average score across studies ...... 22

Table 2. Characteristics of included studies ...... 26

Paper two

Table 1. Frequency table of questions identified as helpful or unhelpful ...... 75

Table 2. Descriptive statistics for scores on intervention process measure items by helpful and unhelpful questions ...... 77

Table 3. Effect of process variables on helpfulness for questions identified as helpful ...... 78

Table 4. Effect of intervention process measures on perceived helpfulness for questions classed as helpful ...... 79

Table 5. Effect of process variables on helpfulness for questions identified as unhelpful ...... 80

Table 6. Effect of intervention process measures on perceived helpfulness for questions classed as unhelpful ...... 81

Table 7. Participant feedback on MYLO intervention functions and design ...... 89

Table 8. Scores on clinical outcomes at baseline and follow-up and results of paired samples t-tests ...... 90

List of Figures

Paper one

Figure 1. Flow diagram of included studies. Search updates were conducted until January 2019, and 2 new papers were identified and included ...... 20

Paper two

Figure 1. MYLO conversation screen example ...... 65

Figure 2. Illustration of the two-level hierarchical structure of data, split by questions identified as helpful and unhelpful ...... 70

Figure 3. Recruitment flow diagram ...... 73

Figure 4. Boxplot of scores on all variables by unhelpful and helpful questions ...... 77

Figure 5. Thematic map of participants’ reasons for choosing a question as being particularly helpful ...... 85

Figure 6. Thematic map of participants’ reasons for choosing a question as being particularly unhelpful ...... 87


Thesis abstract

Relational Agents in Mental Health

Demand for psychological treatment still far outstrips supply. Digital options for intervention are evolving rapidly. Software programs called relational agents use artificial intelligence to emulate conversation through text or speech. Relational agent interventions have been developed to treat a range of mental health problems; however, there is little consensus on whether this type of intervention is acceptable to clients or efficacious. Paper one systematically reviewed the literature on relational agent interventions in the treatment of mental health problems. The thirteen included studies were diverse in design and aimed to treat a broad range of mental health problems using an eclectic variety of therapeutic orientations and formats. Their potential to provide acceptable and efficacious mental health support without human therapist input appears promising. However, more robust study designs, comparison with existing or alternative intervention formats and clarity over their mechanisms of action are required to demonstrate and maximise efficacy and efficiency. It remains mostly unclear how psychological interventions achieve their effects and what it is about therapy that clients find helpful. A relational agent intervention called ‘Manage Your Life Online’ (MYLO) emulates a transdiagnostic therapy called the Method of Levels (MOL). MOL arises from a theoretical approach called Perceptual Control Theory (PCT). The primary aim of Paper two was to explore participant experiences of the process of intervention with MYLO from a theory-driven, PCT perspective. A multi-method approach was used to elucidate what people find helpful or hindering during intervention with MYLO. Fifteen participants with a variety of mental health related problems appeared to find MYLO acceptable. Consistent with core processes of psychological change according to PCT, questions which enabled free expression, increased awareness and novel insights were associated with helpfulness. In contrast, questions that elicited intense emotions, or that were confusing, inappropriate or repetitive, were unhelpful and associated with disengagement or loss of faith in the MYLO intervention. Findings provide insight into how to optimise the acceptability and efficiency of MYLO and support the transdiagnostic processes outlined in PCT. Critical appraisal of the processes and rationale for Papers one and two is conducted in Paper three, alongside broader reflections on the contributions of both papers to the evidence base, implications for clinical practice and future research directions.


Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=24420), in any relevant Thesis restriction declarations deposited in the University Library, and in The University Library’s regulations (see http://www.library.manchester.ac.uk/about/regulations/).


Acknowledgements

I am extremely grateful for the enthusiastic support and guidance of my academic supervisors, Dr Warren Mansell and Dr Sara Tai. They have been invaluable in providing encouraging and thoughtful supervision and I have learned a great deal from working with them. I am especially grateful for the time and patience of my main supervisor, Dr Warren Mansell, who met regularly with me and always provided prompt feedback and supportive direction. I would also like to thank Dr Lesley-Anne Carter for her supportive direction on the statistical elements of my thesis, Tracey Hepburn who supported the transcription of my qualitative interviews, and my colleague Dr Rebecca Wilson who assisted with quality appraisal for my systematic review paper.

This research would not have been possible without the people who participated in the study and those who enabled and facilitated recruitment. I am particularly grateful for the support from the team at Self Help Services, who assisted with study set-up and supported piloting. I am also thankful to Danny Whittaker for advertising the study to the wider community. I would like to thank the University of Manchester IT services and, in particular, Chris Grave for his patience and support in setting up, testing and hosting MYLO.

I am thankful to my wonderful partner, Roy Zilberman and my family and friends for their unwavering support, love and encouragement throughout this process. I am particularly grateful to my mum, Susan Taylor and my dad, David Gaffney for their firm belief in me. I am also thankful to my granny, Anne Mason for being so supportive and always sending thoughtful cards with encouraging words at just the right time. I would like to extend a special thank you to my sister, Sarah. Her vibrancy and resilience through the toughest of years have helped me to maintain my perspective throughout this process and she has been a constant source of encouragement, support and most importantly laughter.

Finally, I would like to dedicate this thesis to my late and beloved auntie, Diane Shadlock. Her love, kindness and enthusiasm for having a ‘right old time’ right up until the end is a continued source of motivation and inspiration.


Paper 1: Systematic Review

Relational agents in the treatment of mental health problems: A mixed methods systematic review

The following paper has been accepted for publication in:

‘Journal of Medical Internet Research Mental Health’

The guidelines for authors can be found in Appendix A.

Word count: 4,967 (excluding abstract, tables and references)


Abstract

Background: The use of relational agent interventions in mental health is growing at pace. Existing reviews have focussed exclusively on a subset of embodied relational agent interventions, despite other modalities aiming to achieve the common goal of improved mental health.

Objective: This study aimed to review the use of relational agent interventions in the treatment of mental health problems.

Methods: A systematic search was performed using relevant databases (MEDLINE, EMBASE, PsycINFO, Web of Science and the Cochrane Library). Studies were included if they reported on an autonomous relational agent that simulated conversation and reported on a mental health outcome.

Results: A total of 13 studies were included in the review, four of which were full-scale randomised controlled trials (RCTs); the remainder were feasibility studies, pilot RCTs and quasi-experimental studies. Interventions were diverse in design and targeted a range of mental health problems using a wide variety of therapeutic orientations. All included studies reported reductions in psychological distress post intervention. Five controlled studies demonstrated significant reductions in psychological distress compared to inactive control groups. Three controlled studies comparing interventions with active control groups failed to demonstrate superior effects.

Conclusions: The efficacy and acceptability of relational agent interventions for mental health problems are promising. However, more robust experimental design is required to demonstrate efficacy and efficiency. A focus on streamlining interventions, demonstrating equivalence to other treatment modalities and elucidating mechanisms of action has the potential to increase acceptance by users and clinicians and maximise reach.

Registration: The protocol for this systematic review was registered at PROSPERO. Registration number: CRD42018106652

Key words: relational agent; conversational agent; chatbot; mental health; psychological therapy; intervention


Introduction

Rationale

Relational agents are software programs that use artificial intelligence to simulate a conversation with a user through written text or voice. Recent everyday examples include digital assistants such as Siri (Apple), Cortana (Microsoft), Google Now, and Alexa (Amazon) [1]. The first relational agent of this kind was ELIZA [2], which was programmed to mimic conversation with a Rogerian psychotherapist using typed text. In the 50 years since ELIZA, interest in relational agents and artificial intelligence has waxed and waned, and this is reflected in publication rates over time [3]. However, significant advances in technology over the past two decades have facilitated the design of relational agents that can undertake ever more complex tasks [4]. This has resulted in an explosion of publications in this area, particularly since 2009 [3].

Evidence has begun to accumulate around the potential benefits of relational agents in diverse fields [5], within health and medical care [6] and specifically in mental health [7–11].

Increased access to information through the internet and smartphones has highlighted the potential for relational agents to provide autonomous, interactive and, crucially, accessible mental health support. Existing digital therapies have suffered from low adherence and concerns about their efficiency without continued human support [12,13]. These formats tend to focus on psychoeducation and a modular style of fixed content and duration, which is inflexible for users. Relational agents hold particular promise compared to other digital mental health interventions as they can provide greater interactivity by emulating therapeutic conversation and offer choice and control over session content and intensity. They also have potential for greater scalability compared to other therapy modalities such as human therapists, ‘Wizard of Oz’ programs (where a therapist responds via a computer) or digital interventions that require ongoing support from a clinician to produce favourable outcomes.

The application of relational agents in mental health is varied and includes diagnostic tools, symptom monitoring and treatment or intervention [14]. Existing systematic and scoping reviews of relational agent interventions in the mental health field have focussed on a subset of relational agents with a visual character (embodied) [8–10] or are now outdated [7]. As far as we are aware, this is the first comprehensive systematic review of relational agents in the treatment of mental health problems.

Objectives

We conducted a systematic review and synthesis to evaluate the efficacy and acceptability of relational agents in the treatment of mental health problems. Specifically, we identified and evaluated studies that utilised a relational agent intervention and reported on a mental health outcome. Relational agents are diverse in design [1] and include, for example, chatbots (e.g. casual conversation delivered verbally or through text), embodied conversational agents (ECA; a computer-generated visual character which simulates face-to-face conversation and nonverbal behaviour), relational agents with a physical presence (e.g. robots) and relational agents within virtual reality (VR). For this systematic review, we included studies of an autonomous relational agent with text or verbal input (either response options or free text) that operated as an independent, stand-alone system.


Studies which used ‘Wizard of Oz’ methods, where a person or therapist responds through the computer, or programs that required ongoing support from a therapist or similar, were excluded. We followed the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols guidelines [15,16]. The protocol was registered prospectively at PROSPERO (Registration Number: CRD42018106652).

Method

Literature Search

A systematic search of the literature was performed in September 2018 and updated in January 2019 using MEDLINE (1946-August week 5 2018), EMBASE (1974-September 2018), PsycINFO (1806-September 2018), Web of Science (1900-September 2018) and the Cochrane Library (All-September 2018). The search was not restricted by publication year or language. Three categories of search terms were included: (1) relational agent, (2) mental health and (3) intervention. The Boolean operator AND was used to bring together separate categories and OR was used to combine terms within categories. Keywords were collated from the existing literature, academics in the field of relational agents and The Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V; American Psychiatric Association, 2013). The search strategy included keyword truncations and mappings to subject headings (Medical Subject Headings [MeSH]) that were adapted appropriately for each database. See Appendix B for details of the complete search strategy. The reference lists of all included studies were hand searched to identify all relevant references. Grey literature identified through the database searches was also included for screening.
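To illustrate the structure described above, a hypothetical fragment of a combined free-text search string might take the following form; the terms shown here are examples only, and the full strategy applied in each database (including truncations and MeSH mappings) is reproduced in Appendix B:

("relational agent*" OR "conversational agent*" OR chatbot*)
AND ("mental health" OR depress* OR anxiet* OR "psychological distress")
AND (intervention* OR therap* OR treatment*)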


Eligibility criteria

Studies were included if they reported on a relational agent intervention for mental health; the agent was independent and autonomous; the agent simulated conversation; the interaction relied on a turn-taking process with the user; and a mental health outcome was reported. Review papers were included if all of their included studies met the inclusion criteria for this review. Studies were excluded if the output from the relational agent was solely predetermined (e.g. psychoeducation) and not generated in response to user input; used asynchronous communication (e.g. email); relied on a human user to generate responses (e.g. ‘Wizard of Oz’ methods); required support from a person, such as a therapist, to operate; were limited to adherence to medication or physical health behaviours (e.g. smoking cessation); focussed solely on the technical aspects or programming of the agent; or lacked sufficient detail to determine eligibility (e.g. short conference abstracts). Studies not written in English were translated as required. The review included a diverse range of study designs such as randomised controlled trials (RCTs), quasi-experimental designs, feasibility studies and mixed-methods studies.

Screening

Studies identified through the database searches were exported to reference management software (Mendeley) and duplicates were deleted. Study selection was conducted by the first author (HG). Screening procedures were piloted prior to beginning the screening process. Titles and abstracts were screened first and articles not meeting the eligibility criteria were removed. The first author (HG) then screened full texts and selected the articles for inclusion. Any lack of clarity over the eligibility of studies was resolved through discussion with a second author (WM). A random sample of 26 studies (10%) identified for full-text screening was also independently screened by a second reviewer. Cohen’s kappa was used to measure inter-rater agreement. Finally, reference lists of all included papers were screened for additional studies and the inclusion and exclusion criteria applied. See Figure 1 for a detailed breakdown of the flow of included studies.

Data extraction

Data from the included studies were extracted into a pre-specified form which included: author, year of publication, study design, mental health domain, relational agent name and description (including embodiment, access, theoretical approach, input and output style), number and characteristics of participants (including age, gender, presence and type of diagnosis or psychological problem), intervention description (including length and structure of intervention), control group description (if applicable), mental health outcome measures, user experience measures, attrition and primary findings (primary mental health outcome and user experiences). Due to the heterogeneity of the included studies, a meta-analysis was not undertaken. Instead, extracted data were narratively synthesised in line with guidance on the conduct of narrative synthesis in systematic reviews [18].

Risk of bias assessment

Risk of bias assessment of each study was conducted to ascertain the validity and reliability of the methods and findings and to inform the narrative synthesis of the studies. The validated 16-item Quality Assessment Tool for Studies with Diverse Designs (QATSDD; [19]) was deemed appropriate to assess study quality in this review as it includes items applicable to quantitative (14 items), qualitative (14 items) and mixed-methods (16 items) designs. Each item is rated from 0 (not at all) to 3 (complete). Specifically, the tool assesses the clarity of the theoretical framework, study aims and study settings, the representativeness of the sample, the rationale for the data collection procedure, the appropriateness and reliability of data analysis and the study’s strengths and limitations. For each included study, scores for each item were summed and a percentage of the total possible score was calculated. If a study did not provide adequate detail to rate an item, the item was scored 0. The quality of each included study was assessed by the first author (HG).
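As a minimal sketch of this scoring procedure (assuming the denominator counts only the items applicable to a given study design, consistent with the 14- and 16-item variants described above), the quality percentage can be written as:

\[
\text{Quality}(\%) = \frac{\sum_{i=1}^{k} s_i}{3k} \times 100, \qquad s_i \in \{0, 1, 2, 3\},
\]

where \(k\) is the number of applicable items (14 for solely quantitative or qualitative studies, 16 for mixed-methods studies). For example, summing the 14 applicable item scores reported for Freeman et al. [21] in Table 1 gives 37 of a possible 42, approximately 88%, matching the reported quality score.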

Results

Study selection

The search identified 30,853 articles (see Figure 1) using the pre-defined search strategy outlined above. Duplicates were removed (8,131) and articles that did not meet the inclusion and exclusion criteria based on title and abstract (22,388) were excluded. Handsearching through references resulted in an additional 5 studies eligible for inclusion. The search was updated in January 2019 and a further 2 eligible studies were identified and included. Lack of clarity over the eligibility of some articles (n=13) was resolved through discussion with the second author (WM). Due to the large number of articles identified in the initial search and limited researcher resource, inter-rater reliability was not assessed at the title and abstract stage. However, inter-rater reliability was assessed at the full-text eligibility stage. A random sample of 26 studies (10% of the 264 studies identified for full-text screening) was independently screened by a second reviewer. The percentage agreement between the first author (HG) and the independent rater was 96% (25/26 in agreement). Cohen’s kappa was 0.65, indicating substantial inter-rater agreement. Any differences in ratings were discussed and an agreement reached. A total of 13 articles were included in the review, evaluating 11 different relational agents.
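For context, Cohen’s kappa corrects observed agreement for the agreement expected by chance; the standard definition is:

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\]

where \(p_o\) is the observed proportion of agreement (here \(25/26 \approx 0.96\)) and \(p_e\) is the proportion of agreement expected by chance. Back-calculating from the reported \(\kappa = 0.65\) implies \(p_e \approx 0.89\) (an inference, not a value reported in the review), which explains why kappa is substantially lower than the raw 96% agreement when most screened articles fall into the same (excluded) category.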


Figure 1. Flow diagram of included studies. Search updates were conducted until January 2019, and 2 new papers were identified and included.

[Figure 1 summary. Identification: records identified through database searching, n = 30,853 (EMBASE n = 8,105; MEDLINE n = 5,291; PsycINFO n = 7,604; Web of Science n = 640; Cochrane Library n = 9,213). Screening: records after duplicates removed, n = 22,722; records screened, n = 22,722; records excluded, n = 22,458. Eligibility: full-text articles assessed for eligibility, n = 264 (second-reviewer agreement κ = 0.65); full-text articles excluded with reasons, n = 258 (not a relational agent, n = 88; not independent/autonomous, n = 46; no mental health outcome measure, n = 62; review paper includes studies not meeting inclusion criteria, n = 42; insufficiently detailed conference abstract, n = 1; not a psychological intervention, n = 11; technical specification study, n = 7; unable to obtain full text, n = 1). Included: additional articles identified through reference lists, n = 5; search update, n = 2; studies included in synthesis, n = 13.]


Risk of bias

The methodological quality of the included studies was varied (see Table 1). Using the QATSDD [19] assessment tool, methodological quality ranged from the lowest score of 35% [20] to the highest score of 88% [21]. The average quality score was 59%. All of the included studies with percentage scores above 70% were RCTs [21–24]. Studies were not excluded based on methodological quality as this would have meant a substantial loss of data. However, the results of the quality appraisal facilitated the appropriate weighting of our analysis and conclusions towards the higher quality RCT studies.

All included studies received the maximum score of 3 for the criterion “Statement of aims/objectives in main body of report”. All included studies scored a 2 or 3 for “Fit between research question and method of analysis” and “Fit between stated research question and method of data collection”. Most studies provided adequate “Descriptions of procedure for data collection” and “Detailed recruitment data”. Most studies (n=10) provided discussions of the key strengths and limitations of the study (scoring 2), and three studies gave thorough, complete discussions of strengths and limitations, obtaining the maximum score of 3. The lowest average scores were found for “Representative sample of target group of a reasonable size”, “Good justification for analytic method selected”, “Assessment of reliability of analytic process (Qualitative only)” and “Evidence of user involvement in design”. See Table 1 for scores on each criterion across studies.


Table 1. Descriptive statistics of methodological quality

Scores for each criterion are listed in the following study order: Freeman et al., 2018 [21]; Bird et al., 2018 [22]; Fulmer et al., 2018 [24]; Fitzpatrick et al., 2017 [23]; Ly et al., 2017 [31]; Gaffney et al., 2014 [25]; Inkster et al., 2018 [27]; Gardiner et al., 2017 [30]; Pinto et al., 2016 [28]; Burton et al., 2016 [26]; Suganuma et al., 2018 [32]; Ring et al., 2015 [20]; Pinto et al., 2013 [29]; followed by the overall item mean (SD)a.

1. Explicit theoretical framework: 2, 3, 2, 2, 2, 3, 1, 1, 0, 1, 1, 1, 0; mean 1.5 (0.97)

2. Statement of aims/objectives in main body of report: 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3; mean 3.0 (0)

3. Clear description of research setting: 3, 2, 3, 2, 2, 3, 2, 2, 2, 2, 1, 2, 1; mean 2.1 (0.64)

4. Evidence of sample size considered in terms of analysis: 3, 3, 3, 3, 3, 0, 0, 3, 0, 1, 0, 0, 1; mean 1.5 (1.5)

5. Representative sample of target group of a reasonable size: 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 0, 1, 1; mean 1.3 (0.63)

6. Description of procedure for data collection: 3, 2, 2, 3, 2, 3, 3, 2, 2, 2, 2, 1, 1; mean 2.2 (0.69)

7. Rationale for choice of data collection tool(s): 3, 2, 3, 3, 3, 2, 1, 1, 2, 1, 2, 0, 1; mean 1.8 (0.99)

8. Detailed recruitment data: 3, 3, 3, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1; mean 2.2 (0.69)

9. Statistical assessment of reliability and validity of measurement tool(s) (Quantitative only): 3, 2, 2, 2, 2, 1, 1, 0, 3, 0, 1, 0, 2; mean 1.5 (1.05)

10. Fit between stated research question and method of data collection (Quantitative only): 3, 2, 3, 3, 3, 2, 3, 3, 3, 2, 1, 2, 2; mean 2.5 (0.66)

11. Fit between stated research question and format and content of data collection tool e.g. interview schedule (Qualitative only): N/A, N/A, 2, 2, 2, N/A, 2, 2, 2, 1, N/A, 2, N/A; mean 1.9 (0.35)

12. Fit between research question and method of analysis (Quantitative only): 3, 3, 2, 3, 2, 3, 3, 2, 3, 2, 3, 2, 2; mean 2.5 (0.52)

13. Good justification for analytic method selected: 3, 2, 1, 2, 2, 1, 1, 2, 0, 2, 0, 0, 0; mean 1.2 (1.01)

14. Assessment of reliability of analytic process (Qualitative only): N/A, N/A, 0, 0, 0, N/A, 0, 0, 0, 0, N/A, 0, N/A; mean 0 (0)

15. Evidence of user involvement in design: 0, 0, 3, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0; mean 0.4 (0.87)

16. Strengths and limitations critically discussed: 3, 3, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2; mean 2.2 (0.44)

aScores can range from 0 (not at all) to 3 (complete)


Study characteristics

The characteristics of the included studies are summarised in Table 2. The 13 studies identified were conducted between 2013 and 2018 in four countries. Five studies were conducted in the United Kingdom [21,22,25–27], six studies in the United States [20,23,24,28–30], one study in Sweden [31] and one study in Japan [32]. Across the studies, there was considerable heterogeneity in study design, intervention design and outcome measures used. The majority of included studies focussed on interventions for common mental health problems including depression [26–29] and/or anxiety [23,24], specific phobia (heights) [21], loneliness [20] and psychological distress [22,25]. Three studies focussed on improving mental well-being [30–32]. A large proportion of studies (n=7) were preliminary and included feasibility studies [28], pilot RCTs [25,26,29–31] or non-randomised trials [32]. Two studies used quasi-experimental designs [20,27] and four studies were full-scale RCTs [21–24].

Most studies (n=8) used mixed methods [20,23,24,26–28,30,31] and the majority (n=9) reported on both mental health outcomes and user experiences [20,21,23,24,26–28,30,31]. Over half of the included studies (n=7) used specifically designed control groups, including screen/online psychoeducation [23,24,28,29], paper and CD/MP3 based psychoeducation [30] or an active control condition utilising another relational agent, ELIZA [2] [22,25]. Two studies used treatment as usual (TAU), which consisted of treatment for depression with a clinician [26] or corresponded to no treatment [21]. One study used a waitlist control group [31] and one study used a non-randomised control group of participants who had expressed interest in taking part in the study but could not complete the intervention at that time [32]. Finally, two quasi-experimental studies did not use a control group [20,27]. However, Ring et al [20] compared groups that used two different versions (proactive and passive) of the relational agent intervention.


Table 2. Characteristics of included studies.

Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Freeman Acrophobia Now I Can Do  Single blind RCT3 [2-week intervention  Significantly reduced  No attrition 88% 1 et al., Heights; VR ; speech with 4-week FU4] + panel of participants fear of heights (HIQ)  96% attended 1+ VR sessions 2018 input & output; provided verbal feedback on the post treatment; effect  Mean sessions attended 4.66 (SD8 UK [21] embodied intervention size d=2.0, P<.0001 1.27)  100 adults with a fear of heights (>=30 on  Sustained at FU  Mean session duration 26.8 2 5 CBT ; virtual coach HIQ ) self-selected from community  69% fell below entry minutes (SD 2.7) delivers CBT for fear  Intervention group – n=49; 6x 30 min criterion at FU (<30 on  Mean total intervention time of heights including sessions 2-3 times per week for 2 weeks; HIQ) compared to none 124.43 (34.23) 6 behavioral Median age 45 (IQR 30-53); 41% female; of the control group  90% completed VR sessions; 4 experiments, belief 96% white; mean duration of fear of  Adjusting for people did not complete (3 found it ratings & heights 32.0 years (13.8); 86% diagnosis of imbalances in gender at too difficult & 1 could not attend psychoeducation acrophobia baseline between further appointments) 7  Control group – n=51; TAU (equivalent to groups did not alter  Levels of discomfort (SSQ9) in VR no treatment); Median age 46 (IQR 38-53); findings very low. 63% female; 88% white; Mean duration of  Panel comments reported fear of heights 28.4 years (15.0); 94% satisfaction with intervention diagnosis of acrophobia


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Bird et el., Psychologic MYLO; Online; free  RCT [1 session intervention with 2-week  No significant  No attrition between pre and post. 76% 2018 al distress text input; text FU] differences between FU optional, 60% attrition UK [22] output  171 staff & students self-reporting a intervention & control  Mean intervention session 10 MOL ; agent asks problem causing psychological distress; condition on self- duration 13 minutes questions aimed at Mean age 22.8 (SD 7.19); 81.6% female reported distress,  Mean control session duration 5 helping participant to  Intervention group – n=85; one online effect size d=- minutes shift awareness to session of participant determined length; 0.14(P=.27) or DASS-21,  Intervention rated as significantly higher levels in order Mean problem related distress 6.42 (SD effect size d=0.18 more helpful than control at post to resolve internal 1.92); Mean DASS-2111 total 34.63 (SD (P=.16) intervention and FU P=.001 conflict and reduce 19.22).  Significantly reduced distress  Control group – n=86; one online session self-reported distress with conversational agent ELIZA of and DASS-21 scores participant determined length; Mean over time in both problem related distress 6.34 (SD 1.86); groups P<.001 Mean DASS-21 total 30.26 (SD 19.69)  Intervention rated as significantly more helpful than control P=.001  Intervention resulted in significantly higher problem resolution post intervention compared to control P<.001.


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Fulmer et Depression Tess; Online; free  RCT [2 week or 4-week intervention] +  Significantly reduced  1% (n=1) attrition (control group) 75% al., 2018 & anxiety text or fixed user satisfaction survey depression symptoms  Intervention groups exchanged USA [24] response option  Mixed methods (PHQ-9) in intervention 14,238 messages in total input including  75 university students from 15 USA group 1 compared to  Mean messages exchanged 192 emojis; text output universities; Mean age 22.9; 70% female; control effect size  Group 1 exchanged a Mean of 283 Eclectic; Guided 41% white d=0.68, P=.03 messages (SD 147.6) activities based on  Intervention group – n=50; Unlimited  Significantly reduced  Group 2 exchanged a Mean of 286 self-reported mood. access to Tess online via an instant anxiety symptoms (104.6). Uses CBT, messenger app with daily check-ins for 2 (GAD-7) in intervention  86% of participants were satisfied mindfulness-based weeks (Group 1) or biweekly check-ins for group (G1: P=.045 G2: with intervention compared to 60% therapy, emotionally 4 weeks (Group 2) P=.02) compared to of control focused therapy, o Group 1 (n=24): mean age 24.1 (SD control  The best things about intervention 12 13 14 ACT , MI , self- 5.4); 71% female; Mean PHQ-9 score  Significantly reduced were; accessibility, empathy, 15 compassion therapy 6.67 (SD 4.6); Mean GAD-7 score PANAS scores in learning and interpersonal 6.71 (SD 4.0); Mean positive affect intervention group 1  The worst things about 16 (PANAS ) 19.88 (SD 1.4); Mean compared to control intervention were; limitations in approaches. Tess negative affect (PANAS) 13.08 (SD 1.3) P=.03. natural conversation, being unable learns over time o Group 2 (n=26): Mean age 22.19 (SD to understand certain responses, which intervention 2.8); 73% female; Mean PHQ-9 score getting confused by answers styles participants 7.04 (SD 4.9); Mean GAD-7 score 7.5 prefer and (SD 4.9); Mean positive affect (PANAS) decreases/increases 21.31 (SD 1.3); Mean negative affect content accordingly. (PANAS) 14.38 (SD 1.3)  Control group – n=24; Information control. Online link to National Institute of Mental Health eBook on depression; Mean age 22.5 (SD 4.0); 67% female; Mean PHQ-9 score 8.17 (SD 4.2); Mean GAD-7 score 9.46 (SD 3.9); Mean positive affect (PANAS) 22.13 (SD 1.4); Mean negative affect (PANAS) 15.75 (SD 1.3)


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Fitzpatrick Depression Woebot; App; free  RCT [2-week intervention] + free-text  Significantly reduced  17% (n=12) attrition; 31% (n=11) 71% et al., & anxiety text or fixed feedback questionnaire depression symptoms control; 9% (n=3) intervention 2017 response options  Mixed methods (PHQ-9), effect size  Mean frequency of interaction 12.1 USA [23] input including  70 university students with self-reported d=0.44 (intention-to- times (SD 2.23). emojis; text output symptoms of anxiety and/or depression; treat), P=.04 compared  Significantly higher satisfaction CBT; ‘onboarding’ Mean age 22.2 (SD 2.33); 67% female; 79% to control with intervention overall (P=<.001) (socialization); white; 46% moderately or severely  Study completers (both and with content (P=.021) guided exercises and depressed; 74% severely anxious groups) experienced a compared to control. psychoeducation;  Intervention group – n=34; brief, daily CBT significant reduction in  Participants liked the daily check- general questions informed intervention; Mean age 22.58 anxiety symptoms ins (n=9); intervention’s about context and (SD 2.38); 79% female; 82% Caucasian; (GAD-7), effect size ‘personality’ (n=7) & information mood e.g. ‘How are Mean PHQ-9 score 14.30 (SD 6.65); Mean d=0.37, P=.004 provided (n=12) you feeling’; links to GAD-7 score 18.05 (SD 5.89); Mean  No change observed in  Participants reported intervention CBT videos; a ‘word positive affect (PANAS) 25.54 (SD 9.58); affect (PANAS). had difficulty understanding some game’ relating to Mean negative affect (PANAS) 24.87 (SD responses (n=10); some technical cognitive distortions; 8.13) problems (n=8); problems with psychoeducation;  Control group – n=36; Information control. content & repetitiveness (n=2) goal setting; regular Online eBook entitled ‘Depression in check-in; daily/bi- college students’; Mean age 21.83 (SD daily usage prompts; 2.24); 55% female; 75% Caucasian; Mean weekly mood charts PHQ-9 score 13.25 (SD 5.17); Mean GAD-7 score 19.02 (SD 4.27); Mean positive affect (PANAS) 26.19 (SD 8.37); Mean negative affect (PANAS) 28.74 (SD 8.92); 46% moderately or severely depressed; 74% severely anxious


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Ly et al., Wellbeing Shim; App; free text  Pilot RCT [2-week intervention] + Semi-  No significant  No attrition 65% 2017 or fixed response structured interview (20-30 minutes) difference between  1 person in intervention did not Sweden option input; text focused on positive and negative aspects groups at post complete 14 daily reflections or [31] output of intervention intervention on the FS , was inactive for 7 or more days in a Eclectic; tailored  Mixed methods effect size d=0.01 row questions and  28 adults; self-selected community sample P=.20, PSS-10 effect  78.6% participants active 50% or psychoeducation; (university, online & social media); not size d=-0.96, P=.28, or more days) guided exercises and receiving psychological therapy or SWLS effect size  Mean frequency of app opening activities using medication; mean age 26.2 (SD 7.2); 53.6% d=0.17, P=.28 17.71 (SD 15.7) positive psychology female; 64% students (intention-to-treat).  Mean active days 8.21 (SD 3) approaches  Intervention group – n=14; daily  For intervention  Qualitive feedback (n=9), themes: (expressing gratitude, intervention; Mean age 21.1 (SD 8.8); 50% completers (n=13, o Negative: repetitive 17 practicing kindness, female; Mean FS score 44.43 (SD 5.9); active at least 25% of content; shallow 18 replaying positive Mean PSS-10 Score 15.36 (SD 5.2); Mean the days & not inactive relationship; lack of 19 experiences, SWLS score 25.5 (SD 5.2) for >=7 days), a notifications engaging in enjoyed  Control group – n=14; wait-list control significant difference o Positive: learning; available; activities) & third group; Mean age 25.4 (5.3); 57% female; between groups post accessible; perception of wave CBT strategies Mean FS score 46.14 (SD 4.7); Mean PSS- intervention on the FS app as real person; able to (present moment 10 Score 16.86 (SD 5.0); Mean SWLS score effect size d=0.14, form relationship awareness, valued 25.86 (SD 3.9) P=.032 and PSS-10 directions; d=1.06, P=.048. No committed actions; significant difference in empathic responses); SWLS, effect size daily check-ins; d=0.37, P=0.10. weekly summaries.


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Gaffney et Psychologic MYLO; Online; free  Pilot RCT [2-week intervention with 2-  No significant  12.5% attrition (4 excluded from 62% al., 2014 al distress text input; text week FU] + therapy process analysis differences between analysis due to server malfunction; UK [25] output  48 university students self-reporting intervention & control 1 excluded due to incomplete MOL; agent asks problem related psychological distress condition on self- measures; 1 lost to follow up) questions aimed at (website & posters); Mean age 21.4 (SD reported distress,  Mean usage intervention 19.23 (SD helping participant to 3.1); 79% female effect size d=-0.60 0.002) shift awareness to  Intervention – n=26; one session (up to 20 (P=.13) or DASS-21,  Significantly higher ratings of higher levels in order minutes); 68% female; Mean distress 6.77 effect size d=0.17 helpfulness (self-reported) post to resolve internal (SD 1.85); Mean DASS-21 score 36.73 (SD (P=.36) intervention for intervention group conflict and reduce 24.95)  Significantly reduced P<0.05. distress  Control – n=22; one session (up to 20 distress (self-reported  Therapy process analysis: greater minutes) with conversational agent ELIZA; & DASS-21) in both higher-level awareness of problem 90% female; Mean distress 7.10 (SD 1.41); groups at post significantly predicted greater Mean DASS-21 score 30.80 (SD 23.08) intervention P<.01 and problem resolution P=.01 sustained (DASS-21) P=.05 or significantly improved (self- reported distress) at FU P<.01  Problem resolution (self-reported) significantly higher for intervention group post intervention P<0.05


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Inkster et Depression Wysa; App; free text  Quasi-experimental [2-week intervention]  Significantly reduced  83% of high usage users used app 56% al., 2018 or fixed response + in-app feedback depression (PHQ-9) for for more than 4 days UK [27] options input; text  Mixed methods both high and low  60% completed at least one output  129 individuals with symptoms of usage groups (authors wellness tool; Eclectic; inbuilt depression (PHQ-2 score>=6); global acknowledge may be  In-app feedback: 92 users provided questionnaires e.g. sample whom downloaded app voluntarily due to regression to 282 feedback responses PHQ-9 to match from AppStore; diverse time zones 48.1% mean)  68% rated app experience symptoms to USA; 26.4% Europe; 18.6% Asia  High usage group favorable support; questions,  Intervention – n=129; stratified; experienced  Found app & tools helpful guided exercises and o High usage, n=108: at least one use significantly greater  Conversation helped to feel better psychoeducation between pre and post; Mean PHQ-9 improvement in 20  32% rated app less favorable utilizing CBT, DBT , score 18.92 (SD NR22) depression (PHQ-9) 21  Tools not helpful; did not use the MI, PBS , behavioral o Low usage, n=21: no usage compared to low users tools; app not understanding or reinforcement, between pre and post measures; CL=0.63, P=.03 roughly repeating; app self-focused; mindfulness, guided Mean PHQ-9 score 19.86 (SD NR) equivalent to d=0.47 conversations ‘bothered’ the user micro actions and  Users who reported it was ‘hard to tools to build cope’ rated app significantly more emotional resilience favorably than users who reported ‘not hard or slightly hard to cope’  There were 1.6% (128/8075) instances of ‘objection’ from the 129 users


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Gardiner Wellbeing Gabby; Online; fixed  Pilot RCT (Feasibility) [30-day intervention]  Number of stress  7% attrition [n=4; intervention 54% et al., (stress response options  Mixed methods management group 9.7% (n=3); control group 2017 manageme input; speech output;  61 women self-referred from outpatient techniques used 3.3 % (n=1)] USA [30] nt) embodied clinics and BMC24 online newsletter; Mean increased in both  Feasible 23 MBSR ; guided age 35 (SD 8.4); 51% white groups post  Intervention used Median 52 exercises and  Intervention – n= 31; daily (no time limit); intervention (Mean 1 minutes (IQR 101.4) psychoeducation mid-intervention reminder T/C or email; to 4 intervention  Women favored using intervention (MBSR) e.g. being 48% white; Mean age 33 (SD 8.1); Mean group, Mean 2 to 3 for compared to control present in the PHQ-9 score 7 (SD 4.7); Mean SF-12 MCS25 control). No significant  70% of women used intervention moment; responding score 61 (SD 11.6); Mean PSS score 17 (SD difference between information to manage stress and not reacting to 3.7); Mean frequency stress management groups despite a trend compared to 66% of controls. stress; awareness of techniques used in past week 1 (SD 2) favoring intervention  Intervention feedback: breath meditations;  Control – n=30; Information control, same group.  Benefits: fast, reliable, credible body scan; mindful content as intervention delivered via  No significant  Challenges: sound and quality of eating; mindful yoga; worksheets and CD/MP3 meditations; differences between voice, time commitment & 26 progressive muscle mid-intervention reminder T/C or email; groups post accessibility relation; guided 53% white; Mean age 37 (SD 8.4); Mean intervention on imagery. PHQ-9 score 7 (SD 4.6); Mean SF-12 MCS depression (PHQ-9; score 59 (SD 9.8); Mean PSS score 18 (SD P=.82), usual activities 3.5); Mean frequency stress management (SF-12 MCS; P=.46) or techniques used in past week 2 (SD 2.6) stress (PSS; P=.07)  A significant reduction in alcohol use for stress management in intervention condition P=.03


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Pinto et Depression eSMART-MH;  Feasibility & acceptability analysis of RCT  No harm, distress or  46% attrition over 12 weeks 50% al., 2016 Computer; ([29] also included) [3-session intervention adverse events (reported difficulties traveling to USA [28] embodied; fixed over 8 weeks with post measures at 12-  Depression (HADS) the university as travel costs not response options weeks] reduced between pre covered) input; output method  Mixed methods (Mean 8.08, SD 4.74)  48% of intervention participants 27 28 NS SBAR3 ;  60 Young adults (28 completed all and post (Mean 6.50, completed all 4 sessions interaction with measures) self-reporting depressive SD 4.23) for the  Participants generally liked the virtual healthcare symptoms for at least 2 weeks; Mean age intervention group intervention (ratings of 4/5 for staff with virtual 22 (SD 2.5); 67% female; 67% African P=.140. most items) coach who provides American; 58% self-reported a formal  No between group  Participants found intervention and tailored feedback diagnosis of depression or anxiety in past analyses reported avatars acceptable, Mean and psychoeducation or present immersion score 68.46 (SD 21.78) to facilitate effective  Intervention – n=12; 3 sessions (15-20 comments e.g. ‘It felt real, like I communication minutes) spaced 4 weeks apart; Mean was there’ about depressive HADS29 depression score baseline 8.08 (SD  Intervention providers (avatar symptoms 4.74) coach & Healthcare practitioner)  Control – n=16; attention control, screen- acceptable based education on healthy living (each  Content acceptable module 15-20 minutes); Mean HADS  Positive aspects of intervention: depression score baseline 8.50 (SD 3.83) o Interactivity o Increased preparedness for real life interactions  Suggested changes to intervention: o Greater freedom to tailor content and response options o Counselling option at the end o More frequent, longer sessions o Online access


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Burton et Depression Help4Mood;  Pilot RCT [4-week intervention] + semi-  Small improvement in  21% [n=7; intervention group 14% 52% al., 2016 Computer; structured interview of experience with BDI-2 scores in both (n=2); control group 36% (n=5)] UK [26] embodied; speech & intervention groups, intervention (-  Low uptake (aimed to recruit 52 fixed text response  Mixed methods 5.7) and control (-4.2) but closed after 28) options input; speech  28 adults with a diagnosis of MDD30 &  Regular users of  Median number of times used 10.5 & text output scoring >=10 on BDI-231 & currently intervention (at least  Median total duration used 134 CBT; CBT informed receiving fortnightly treatment with a twice a week) obtained minutes 32 intervention clinician (TAU ); Mean BDI-2 score 20.7 greater benefit, median  Almost all would recommend designed to support (SD 7.7); 64% female reduction of 8 points intervention to others patients receiving  Intervention – n=14; TAU + daily use of on BDI-2 compared to 3  Liked ability to customize gender & treatment for intervention at home; mid-intervention points for casual users appearance of avatar; tailor session depression with a T/C; Mean age 35.3 (SD 12.1); 71% female; (3-7 days per week). A length clinician; utilizes Mean BDI-2 score 19.6 (SD 8.1) reduction in BDI-2  Able to establish relationship symptom self-report  Control – n=14; TAU (appointments with a score of more than 5  Disliked repetition and ‘coldness’ tools; daily mood; clinician); Mean age 42 (SD 10.4); 57% points reflects a of agent weekly mood (PHQ- female; Mean BDI-2 score 21.8 (SD 6.8) clinically important 9); sleep; positive difference and negative thoughts; ; relaxation. Supplemented by accelerometer measurement of physical activity and acoustic analysis of speech


Author, Mental Relational agent Study type, methods & participant Primary mental health Engagement and primary user QATSDD year, health name & description characteristics outcome(s) experience outcome(s) score country domain Intervention (%) approach & description Suganuma Wellbeing SABORI; Online;  Non-randomized pilot trial [1-month  Significantly improved  Overall, 74.1% (1978/2668) did not 45% et al., embodied; free text intervention] positive mental health compete FU measures 2018 input; text output  2668 eligible self-selected adults (WHO-5-J) effect size  42% (236/559) of intervention Japan [32] CBT; Guided (employees, students, ‘housewives’) d=0.09, P=.02 in participants did not complete 15+ behavioral responded to online advert intervention group days of intervention and were interventions and  454 included (completed post intervention compared to control excluded from analysis psychoeducation; & if in intervention group used post intervention  User experiences not assessed questions aimed at intervention for 15+ days); 70% female  Significantly reduced self-monitoring  Intervention – n=191; Access intervention negative mental health mood; feedback and at least every other day (i.e. >15 times in (K10) in intervention behavioral total); Mean age 38.04 (SD 10.75); 69% group compared to suggestions based on female; WHO-5-J33 Mean score 15.03 (SD control at post input 5.26); K1034 mean 23.58 (SD 9.56); BADS- intervention, , effect AC35 mean 16.09 (SD 8.36); BADS-AR36 size d=-0.24, P=.005 mean 18.51 (SD 8.79)  Significantly increased  Control – n=263; No intervention behavioral activation (expressed interested in intervention but (BADS-AC), effect size could not partake at that time); Mean age d=0.16, P=.01 for the 38.05 (SD 13.45); 71% female; WHO-5-J intervention group Mean score 15.64 (SD 5.53); K10 mean compared to control at 23.76 (SD 9.97); BADS-AC mean 15.67 (SD post intervention 8.27); BADS-AR mean 17.71 (SD 9.36)  No significant differences observed on avoidance/rumination (BADS-AR) between groups post intervention, effect size d=-0.05

36

Author, year, country: Ring et al., 2015, USA [20]
Mental health domain: Loneliness
Relational agent name, description & intervention approach: Tanya; computer; embodied; fixed response options input; speech output; eclectic. Assesses affective state ("How are you") and provides empathic feedback; talks about local sports; conducts a brief social chat; motivational dialogue encourages physical activity to combat symptoms of depression. Two versions created: passive (no sensor), which relies on the person to activate it, and proactive (sensor), which detects when the person walks past and attempts to initiate conversation.
Study type, methods & participant characteristics: Quasi-experimental [1-week intervention, pre- and post-measures] + semi-structured interview; mixed methods. 14 (12 completed) self-selected (online advert on job recruiting website) older adults living alone with no significant depressive symptoms (scoring <3 on PHQ-2); mean age 65 (range 56-75); 79% female. Intervention (n=12, stratified): proactive, n=7; passive, n=5.
Primary mental health outcome(s): Trend for proactive group to have greater reduction in loneliness (UCLA) compared to passive group, P=.13. Reduction in loneliness score was correlated with average time spent interacting with the agent, r=0.7, effect size d=0.48, P<.05. Participants reported feeling less lonely (P<.01), happier (P<.01) and more comfortable (P<.01) when talking to the proactive agent compared to the passive agent.
Engagement and primary user experience outcome(s): 14% attrition [n=2; 1 due to technical problems and 1 due to mental ill health]. Both proactive and passive: mean 15.9 (SD 8.1) interactions per week lasting an average of 140 (SD 2.3) seconds each. Post-test satisfaction mean 4.4 (SD 2.3) on a scale of 1 (very unsatisfied) to 7 (very satisfied). Post-test ease of use mean 1.9 (SD 1.5) on a scale of 1 (very easy) to 7 (very difficult). Thematic analysis of interviews revealed participants liked content that induced positive affect through humor, comforting statements and exercise encouragement, and disliked irrelevant topics, repetition and limited topics. 67% described the agent's 'personality'. Most (6/7) participants would recommend the proactive agent to a friend compared to only 2/5 in the passive condition.
QATSDD score: 35%

37

Author, year, country: Pinto et al., 2013, USA [29]
Mental health domain: Depression
Relational agent name, description & intervention approach: eSMART-MH; computer; embodied; fixed response options input; output not specified; SBAR3. Interaction with virtual healthcare staff and a virtual coach who provides tailored feedback and psychoeducation to facilitate effective communication about depressive symptoms.
Study type, methods & participant characteristics: Pilot RCT [3-session intervention over 8 weeks with post-intervention measures at 12 weeks]. 28 self-selected young adults with self-reported depression symptoms or a diagnosis of MDD; mean age 22 (SD 2.2); 82% non-white; 64% female; 71% not taking psychotropic medication or receiving psychotherapy; 69% scored >=8 on the HADS. Intervention: n=NR; 3 sessions (duration NR).
Primary mental health outcome(s): Significantly reduced depression symptoms (HADS), P=.01, in intervention group compared to control group post intervention.
Engagement and primary user experience outcome(s): Attrition NR. User experiences not assessed.
QATSDD score: 40%
Table footnotes: 1VR: virtual reality; 2CBT: cognitive behavioural therapy; 3RCT: randomised controlled trial; 4FU: follow up; 5HIQ: heights interpretation questionnaire; 6IQR: interquartile range; 7TAU: treatment as usual; 8SD: standard deviation; 9SSQ: simulator sickness questionnaire; 10MOL: method of levels; 11DASS-21: depression, anxiety and stress scale-21; 12ACT: acceptance and commitment therapy; 13MI: motivational interviewing; 14PHQ-9: patient health questionnaire-9; 15GAD-7: generalised anxiety disorder-7; 16PANAS: positive and negative affect schedule; 17FS: flourishing scale; 18PSS-10: perceived stress scale; 19SWLS: satisfaction with life scale; 20DBT: dialectical behaviour therapy; 21PBS: positive behavioural support; 22NR: not reported; 23MBSR: mindfulness based stress reduction; 24BMC: BioMed Central; 25SF12 MCS: short form survey mental health composite score; 26T/C: telephone call; 27NS: not specified; 28SBAR3: structured communication enhancement strategy; 29HADS: hospital anxiety and depression scale; 30MDD: major depressive disorder; 31BDI-2: Beck depression inventory-2; 32TAU: treatment as usual; 33WHO-5-J: WHO-Five wellbeing index Japanese; 34K10: Kessler psychological distress scale; 35BADS-AC: behavioural activation for depression scale, activation; 36BADS-AR: behavioural activation for depression scale, avoidance/rumination; 37PHQ-2: patient health questionnaire-2; 38UCLA: UCLA loneliness scale.

38

Participants

Only one study, a pilot RCT, recruited participants from clinician caseloads or registers [26].

The remaining studies recruited self-selected participants from the community through outpatient clinics [30], universities [22–25,28,29,31], online adverts [20,31,32], radio advert

[21] and by downloading the intervention app through the app store [27].

The included studies reported results from a total of 1,200 participants. Study sample sizes ranged from 14 [20] to 454 [32]. Study participants ranged between 16 and 75 years old, and the pooled sample was 70% female across the studies that reported this data (12 out of 13). One study with 129 participants [27] did not collect data on age or gender and one study recruited only women [30]. Participants varied widely in severity of psychological distress, from minimal psychological symptoms [20] to formal clinical diagnoses such as Major

Depressive Disorder [26] and acrophobia [21]. Five of the 13 included studies recruited participants who self-reported symptoms of psychological distress to varying degrees.

Relational agent interventions

Six of the relational agents were embodied (7 studies) [20,21,26,28–30,32]. Relational agents used different technologies with three relational agents accessed on an app

[23,27,31], four online (5 studies) [22,24,25,30,32], three using an offline computer program

(4 studies) [20,26,28,29] and one virtual reality (VR) program utilising a VR headset [21].

The majority (8 out of 11 agents, evaluated in 9 studies) of the relational agents included took natural language input either written [22–25,27,31,32] or spoken [21,26]. The

remaining three agents took responses from participants using fixed onscreen response options (4 studies) [20,28–30]. Output mainly consisted of questions or written text (6 out of 11 agents, evaluated in 7 studies) [22–25,27,31,32]. Four agents used spoken output

[20,21,26,30]. Two studies (one relational agent) [28,29] did not specify whether the relational agent output was written or spoken.

The relational agents provided interventions aimed at reducing symptoms [20–27], increasing well-being [30–32] or improving self-management [28,29]. Across the set of relational agents, a range of therapeutic orientations were used including Cognitive

Behavioural Therapy (CBT) [21,23,26,32], Method of Levels (MOL) [22,25], Mindfulness

Based Stress Reduction (MBSR) [30], Structured Communication Enhancement Strategy

(SBAR3) [28,29], and eclectic interventions drawing on a wide variety of approaches

[20,24,27,31]. Over half of the relational agents (7 out of 11) focussed on providing psychoeducation and self-management strategies [23,24,26,27,30–32], one agent

(evaluated in 2 studies) utilised principles of Method of Levels (MOL) therapy in a question and answer format [22,25], one agent offered social companionship [20] and one agent

(evaluated in 2 studies) facilitated practice of effective communication with human healthcare professionals around psychological symptoms [28,29].

Relational agent interventions varied widely in frequency and duration (see Table 2). They ranged from short interventions of one session (participant-determined length [22]; up to 20 minutes [25]), three sessions (unspecified duration [29]; 15-20 minutes each [28]) and six sessions (30 minutes each [21]) through to daily usage over two weeks [23,24,31], four weeks [24,26] or a month [30]. One study only used data from participants who had engaged with the intervention at least every other day (>15 times) over a month [32]. Finally, one study installed one of two versions ('passive', activated at will, and 'proactive', activated by a motion sensor) of the same relational agent into participants' homes for one week [20]. One study enabled participants to continue treatment as usual (TAU) for depression with a clinician alongside the relational agent intervention [26]. The majority of studies (n=9) set no upper limits on usage during the defined study period [20,22–24,26,27,30–32].

Feasibility and engagement

One study reported low uptake as they aimed to recruit 52 participants but closed the study at 28 participants [26]. Attrition rates between pre- and post-intervention measures were reported in 11 studies and varied widely from no attrition [21,22,31] to 74.1% (1978 out of 2668 participants) [32]. Reasons reported for drop-out included difficulties attending the university to take part due to financial difficulties [28], technical problems [20,25] and mental illness [20]. The study with the highest attrition rate (74.1%) [32] did not report any reasons; however, the majority of the drop-out was from the control condition (88%; 1846/2109) compared to 24% (132/559) in the intervention condition.

Studies reported differing metrics for engagement; however, engagement with the relational agent interventions was highly variable, from a short period of interaction in one session (e.g. a mean of 13 minutes [22]) to a median interaction total of 134 minutes [26] or exchanging a mean of 192 messages during intervention [24]. See Table 2 above for full details. In Suganuma et al [32] 236 out of 427 (45%) of intervention participants did not complete 15 or more days of intervention and were excluded from the analysis.

Three people (6%) in the Freeman et al [21] study found the intervention sessions too difficult and did not complete the intervention. However, 44 out of 49 (90%) participants completed the intervention, with a mean total intervention time of 124 minutes. One study [29] did not report any measures of engagement.

Psychological outcomes

Primary outcome measures were all validated but varied (see Table 2 for details); therefore, the term 'psychological distress' will be used to facilitate summary. Of the 13 studies included, five controlled studies reported significant post treatment improvements in psychological distress in the intervention group compared to a no treatment or information control group. Significant improvements were observed on measures of depression

[23,24,29], psychological distress [32], anxiety [24], fear of heights [21] and positive affect

[24,32]. Effects ranged from small (d=-0.24; [32]) to very large (d=2.0; [21]). Additionally, two pilot trials with active control groups found significantly higher ratings of problem resolution in the intervention group compared to control group [22,25].

Four controlled studies reported no significant post treatment differences on measures of psychological distress between intervention and control groups [22,25,30,31] with both intervention and control conditions demonstrating reduced distress [22,25,31] or increased uptake of stress management techniques [30]. Despite significant reductions in depression observed in the intervention group compared to the control group in intention-to-treat analysis in the Fitzpatrick et al [23] study, no significant post treatment differences in anxiety were observed between groups.

42

Finally, the two uncontrolled studies included in the review [20,27] and two studies that did not test for between-group effects [26,28] reported reductions in depression [26–28] and loneliness [20] post intervention. Generally, greater engagement with the conversational agent was associated with greater reductions in psychological distress [20,24,26,27,31]. Only three studies included a follow-up period [21,22,25].

User experience outcomes

Across the studies that reported user experience outcomes (n=11), participants generally reported being satisfied with the relational agent interventions offered

[20,21,23,24,27,28,30]. Three studies reported that participants found the relational agent interventions available and accessible [24,30,31]. Participants reported that they found the agent empathic [24], that they liked the interactivity [28], the agent’s personality [20,23], the agent’s ability to form a relationship [26,31] and ability to learn from input [24].

Participants reported that they liked the ability to customise the gender and appearance of

ECAs [26] and the option to tailor the session length to their own needs [26]. Participants in the Fitzpatrick et al [23] study reported that they liked the daily check-ins and information provided. Two studies reported that participants indicated that they would recommend the relational agent intervention to other people [20,24; the proactive version].

The predominant challenges to intervention with a relational agent included repetitive content [20,23,24,26,27,31], limitations in the agent's ability to understand or respond appropriately [20,23,24,27], a shallow or superficial relationship [26,31], the sound and quality of the agent's voice [30] and specific intervention tools or content [23,27]. Some participants in the Pinto et al [28] study reported that they would like more frequent, longer intervention sessions and greater freedom to tailor content and responses to their needs.

Discussion

Principal results

The use of relational agents for treating mental health problems appears to be limited but is growing quickly, with five of the included studies published in 2018 alone [21,22,24,27,32].

Furthermore, despite the heterogeneity in evaluation methods there is an increasing emphasis on fully powered RCTs testing efficacy. Included interventions were generally brief, allowed participants to control the intensity of intervention and drew from a wide variety of psychological approaches. All included studies reported reduced psychological distress post intervention with a conversational agent. Five controlled studies demonstrated significant reductions in psychological distress compared to an information or no treatment control group with small to large effects. This provides some support for the utility of conversational agents in treating mild to moderate psychological distress in adults

[21,23,24,29,32]. However, their broader utility in promoting positive wellbeing in non-clinical populations appears uncertain [30,31]. Controlled studies with active control conditions (e.g. another conversational agent or human psychological therapy) failed to demonstrate superior effects [22,25,26]. However, it is important to highlight that these studies assessed relative, rather than absolute, treatment efficacy and thus we cannot conclude an absolute lack of treatment efficacy.

44

Studies managed to recruit participants through several different methods. Remarkably, the only study that reported difficulties recruiting participants relied on clinicians to refer patients to the study [26]. Studies that used more flexible recruitment routes such as online adverts [32] and app stores [27] recruited greater numbers of participants. It is possible that clinician apprehension about digital treatment for mental health problems affected recruitment rates. This is supported by research indicating that clinicians are perhaps more reluctant to recommend digital interventions without clinician input or support [33,34]. Our findings illustrate that relational agents are generally an acceptable format of intervention for participants. Interestingly, participants valued aspects of agents usually seen as unique to therapy with a human such as empathic responses, ‘personality’, the ability to build a relationship and an interactive, conversational approach. This is consistent with research demonstrating that people relate to relational agents as if they were human despite knowing that they are computer programs [35]. Participants also valued the ability of the agent to learn from their input, perhaps emulating the learning of a human therapist over time. Participants found intervention with relational agents difficult or frustrating when the agent did not understand, became confused or was repetitive. This perhaps mirrors expectations around core relationship factors such as feeling understood. Control was also important for participants especially regarding tailoring session length and content to their own needs and engaging with interventions in their own words (e.g. free-text rather than fixed response options). The accessibility of the interventions was a key strength for many participants and where accessibility was limited, participants highlighted this and suggested ways to improve accessibility (e.g. through online access [28]).

45

Limitations of included studies

The studies described have several limitations. Samples were mainly small and self-selected, which reduces the ability to draw firm conclusions about the reliability and validity of the findings. Furthermore, due to short or absent follow-up, conclusions about the sustainability of treatment gains cannot be made. Psychological comorbidity was not assessed in any of the studies despite comorbidities being prevalent in individuals with common mental health problems [36]. Safety was only explicitly evaluated and reported in one study [28]. Engagement with interventions was highly variable and the reasons for this remain unexplored. Furthermore, the impact of the design or features of the relational agents (e.g. embodiment and speech- or text-based interaction) on engagement or outcomes was not explicitly assessed or compared; therefore, conclusions cannot be drawn as to the most effective or acceptable modality. No studies evaluated therapeutic equivalence or superiority to other treatment modalities such as face-to-face therapy. Finally, a large proportion of agents were eclectic interventions comprising a variety of strategies and psychoeducation drawing on a range of therapeutic orientations [20,24,27,31]. Therefore, it is difficult to ascertain what the 'active' ingredients of the interventions are.

Strengths and limitations

Due to the lack of standardised terminology in this area, we conducted a comprehensive search that prioritised sensitivity over specificity. We also reviewed reference lists for additional papers not identified through the database searches. Published abstracts commonly presented in technology conferences were also included as they typically provide enough detail for decisions to be made about inclusion. The review was also registered on

46

PROSPERO prior to commencing. We also included a broad range of formats for relational agents including VR, embodied and/or text and speech input. Cohen's kappa showed substantial agreement in full-text screening and there was a high percentage of agreement overall. This was achieved despite inconsistencies in the reporting of interventions, which made the process of eligibility assessment more complicated and reflect the heterogeneity and complexity of the field. Due to the heterogeneity of included studies, no meta-analysis could be undertaken. Furthermore, some potentially relevant relational agents developed for the treatment of mental health problems were excluded from this review due to not reporting a mental health outcome measure (e.g. ELIZA [37] and others [38–41]).

Future directions

Continued growth in the use of relational agents in mental health treatment is expected.

Considering the findings, several priority areas for further research are apparent. Firstly, addressing technical deficits such as repetition and confusion which were reported in half of the included studies [20,23,24,26,27,31] may help to overcome barriers to engagement.

Increased interdisciplinary working between computer science and mental health may facilitate this and help to drive innovations forward. Given that only one included study explicitly reported on safety [28], demonstrating safety will also be key to developing patient and public trust [40]. Furthermore, given the range of differing modalities of relational agents and the lack of direct comparisons between them found in this review, it will be important to compare modalities (e.g. embodied vs non-embodied, or speech vs text) or offer increased choice to individuals. This would enable further insight into what works and for whom. Our review found that a large proportion of relational agents use an eclectic mix of psychological interventions with often limited theoretical basis (e.g. [20]). Only one included study reported on the process of psychological change [25] with the relational agent,

MYLO. Identifying and demonstrating the key mechanisms of action of relational agent interventions has the potential to increase treatment efficiency, reduce unnecessary burden on users and increase transparency. Given the diversity of mental health problems (e.g. depression, anxiety, phobias) appearing amenable to treatment with relational agent interventions, consideration of transdiagnostic approaches to intervention would further increase applicability and reach (e.g. to people with co-morbidities or difficulties that don’t easily fit into prespecified diagnostic categories). Finally, in line with guidance on research priorities for digital interventions [42] it will be important to demonstrate efficacy and/or superiority compared to alternative relational agent interventions and other treatment modalities such as face-to-face therapy to develop patient and clinician confidence in this type of intervention.

Conclusions

This systematic review provides an assessment of relational agent interventions used for the treatment of mental health problems. Based on the current evidence, the efficacy and acceptability of conversational agent interventions appear promising compared to no treatment or information control. However, studies failed to demonstrate superiority when compared to other active, conversational interventions, and their broader utility in promoting wellbeing in non-clinical populations is uncertain. Therefore, whether relational agent interventions are an adequate substitute for other therapy modalities remains unclear.

Future studies should strive to demonstrate efficacy, equivalence (or superiority) and cost-effectiveness through RCTs with comparisons to other forms of treatment. Studies that can demonstrate exactly how interventions achieve psychological change and for whom will be important in streamlining bloated interventions to increase acceptability. Finally, transdiagnostic approaches to treatment may provide further opportunity to maximise the reach and simplicity of relational agent interventions.

Conflicts of interest

None declared.

Abbreviations

ECA: Embodied conversational agent
VR: Virtual reality
DSM-V: Diagnostic and statistical manual of mental disorders (5th edition)
RCT: Randomised controlled trial
QATSDD: Quality assessment tool for studies with diverse designs
CBT: Cognitive behavioural therapy
FU: Follow up
HIQ: Heights interpretation questionnaire
IQR: Interquartile range
TAU: Treatment as usual
SD: Standard deviation
SSQ: Simulator sickness questionnaire
MOL: Method of levels
DASS-21: Depression, anxiety and stress scale-21
ACT: Acceptance and commitment therapy
MI: Motivational interviewing
PHQ-9: Patient health questionnaire-9
GAD-7: Generalised anxiety disorder-7
PANAS: Positive and negative affect schedule
FS: Flourishing scale
PSS-10: Perceived stress scale
SWLS: Satisfaction with life scale
DBT: Dialectical behaviour therapy
PBS: Positive behavioural support
MBSR: Mindfulness based stress reduction
SF12 MCS: Short form survey mental health composite score
SBAR3: Structured communication enhancement strategy
HADS: Hospital anxiety and depression scale

49

MDD: Major depressive disorder
BDI-2: Beck depression inventory-2
WHO-5-J: WHO-Five wellbeing index Japanese
K10: Kessler psychological distress scale
BADS-AC: Behavioural activation for depression scale, activation
BADS-AR: Behavioural activation for depression scale, avoidance/rumination
PHQ-2: Patient health questionnaire-2
UCLA: UCLA loneliness scale

50

References

1. McTear M, Callejas Z, Griol D. The conversational interface: Talking to smart devices. Springer Publishing Company; 2016. [doi: 10.1007/978-3-319-32967-3] ISBN: 9783319329673
2. Weizenbaum J. ELIZA---a computer program for the study of natural language communication between man and machine. Commun ACM 1966;9(1):36–45.
3. Fast E, Horvitz E. Long-Term Trends in the Public Perception of Artificial Intelligence. Assoc Adv Artif Intell 2016.
4. Gentsch P. AI in Marketing, Sales and Service. How Marketers without a Data Science Degree can use AI, Big Data and Bots. Switzerland: Palgrave Macmillan; 2019. [doi: 10.1007/978-3-319-89957-2]
5. Campbell RH, Grimshaw MN, Green GM. Relational Agents: A Critical Review. Open Virtual Real J 2009;1:1–7. [doi: 10.2174/1875323X00901010001]
6. Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, Surian D, Gallego B, Magrabi F, Lau AYS, Coiera E. Conversational agents in healthcare: a systematic review. J Am Med Informatics Assoc 2018;25:1248–1258. [doi: 10.1093/jamia/ocy072]
7. Bickmore T, Gruber A. Relational agents in clinical psychiatry. Harv Rev Psychiatry 2010;18(2):119–130. [doi: 10.3109/10673221003707538]
8. Hoermann S, McCabe KL, Milne DN, Calvo RA. Application of synchronous text-based dialogue systems in mental health interventions: Systematic review. J Med Internet Res 2017 Aug 7;19(8):267. PMID:28784594
9. Scholten MR, Kelders SM, Van Gemert-Pijnen JE. Self-guided Web-based interventions: Scoping review on user needs and the potential of embodied conversational agents to address them. J Med Internet Res 2017. p. e383. PMID:29146567
10. Provoost S, Lau HM, Ruwaard J, Riper H. Embodied conversational agents in clinical psychology: A scoping review. J Med Internet Res 2017;19(5):1–23. PMID:28487267
11. Luxton DD. An Introduction to Artificial Intelligence in Behavioral and Mental Health Care. In: Luxton D, editor. Artificial Intelligence in Behavioral and Mental Health Care 2016. p. 1–26. [doi: 10.1016/B978-0-12-420248-1.00001-5]
12. Duarte A, Walker S, Littlewood E, Brabyn S, Hewitt C, Gilbody S, Palmer S. Cost-effectiveness of computerized cognitive-behavioural therapy for the treatment of depression in primary care: findings from the Randomised Evaluation of the Effectiveness and Acceptability of Computerised Therapy (REEACT) trial. Psychol Med 2017;47(10):1825–1835. [doi: 10.1017/S0033291717000289]

51

13. Gilbody S, Brabyn S, Lovell K, Kessler D, Devlin T, Smith L, Araya R, Barkham M, Bower P, Cooper C, Knowles S, Littlewood E, Richards DA, Tallon D, White D, Worthy G. Telephone-supported computerised cognitive-behavioural therapy: REEACT-2 large-scale pragmatic randomised controlled trial. Br J Psychiatry 2017;210(5):362–367. [doi: 10.1192/bjp.bp.116.192435]
14. Lovejoy CA, Buch V, Maruthappu M. Technology and mental health: The role of artificial intelligence. Eur Psychiatry 2019;55:1–3. PMID:30384105
15. Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA, Altman DG, Booth A, Chan AW, Chang S, Clifford T, Dickersin K, Egger M, Gøtzsche PC, Grimshaw JM, Groves T, Helfand M, Higgins J, Lasserson T, Lau J, Lohr K, McGowan J, Mulrow C, Norton M, Page M, Sampson M, Schünemann H, Simera I, Summerskill W, Tetzlaff J, Trikalinos TA, Tovey D, Turner L, Whitlock E. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: Elaboration and explanation. Br Med J 2015. PMID:25555855
16. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Ann Intern Med 2009;151(4):264–269. PMID:21603045
17. American Psychiatric Association. Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing, Inc.; 2013.
18. Popay J, Roberts H, Sowden A, Petticrew M, Arai L, Rodgers M, Britten N, Roen K, Duffy S. Guidance on the Conduct of Narrative Synthesis in Systematic Reviews: A Product from the ESRC Methods Programme. ESRC Methods Programme; 2006.
19. Sirriyeh R, Lawton R, Gardner P, Armitage G. Reviewing studies with diverse designs: The development and evaluation of a new tool. J Eval Clin Pract 2012;18(4):746–752. PMID:21410846
20. Ring L, Shi L, Totzke K, Bickmore T. Social support agents for older adults: longitudinal affective computing in the home. J Multimodal User Interfaces 2015 Mar;9(1, SI):79–88. [doi: 10.1007/s12193-014-0157-0]
21. Freeman D, Haselton P, Freeman J, Spanlang B, Kishore S, Albery E, Denne M, Brown P, Slater M, Nickless A. Automated psychological therapy using immersive virtual reality for treatment of fear of heights: a single-blind, parallel-group, randomised controlled trial. The Lancet Psychiatry 2018;5(8):625–632. PMID:30007519
22. Bird T, Mansell W, Wright J, Gaffney H, Tai S. Manage Your Life Online: A Web-Based Randomized Controlled Trial Evaluating the Effectiveness of a Problem-Solving Intervention in a Student Sample. Behav Cogn Psychother 2018;1–13. PMID:29366432
23. Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment Heal 2017;4(2):e19. PMID:28588005

52

24. Fulmer R, Joerin A, Gentile B, Lakerink L, Rauws M. Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety: A Randomized Controlled Trial. 2018;5. PMID:30545815
25. Gaffney H, Mansell W, Edwards R, Wright J. Manage Your Life Online (MYLO): A pilot trial of a conversational computer-based intervention for problem solving in a student sample. Behav Cogn Psychother 2014;42(6):731–746. [doi: 10.1017/S135246581300060X]
26. Burton C, Szentagotai Tatar A, McKinstry B, Matheson C, Matu S, Moldovan R, Macnab M, Farrow E, David D, Pagliari C, Serrano Blanco A, Wolters M. Pilot randomised controlled trial of Help4Mood, an embodied virtual agent-based system to support treatment of depression. J Telemed Telecare 2016;22(6):348–355. PMID:26453910
27. Inkster B, Sarda S, Subramanian V. An Empathy-Driven, Conversational Artificial Intelligence Agent (Wysa) for Digital Mental Well-Being: Real-World Data Evaluation Mixed-Methods Study. JMIR mHealth uHealth 2018;6(11):e12106. PMID:30470676
28. Pinto MD, Greenblatt AM, Hickman RL, Rice HM, Thomas TL, Clochesy JM. Assessing the Critical Parameters of eSMART-MH: A Promising Avatar-Based Digital Therapeutic Intervention to Reduce Depressive Symptoms. Perspect Psychiatr Care 2016;52(3):157–168. PMID:25800698
29. Pinto MD, Hickman RL, Clochesy J, Buchner M. Avatar-based depression self-management technology: Promising approach to improve depressive symptoms among young adults. Appl Nurs Res 2013;26(1):45–48. PMID:23265918
30. Gardiner PM, McCue KD, Negash LM, Cheng T, White LF, Yinusa-Nyahkoon L, Jack BW, Bickmore TW. Engaging women with an embodied conversational agent to deliver mindfulness and lifestyle recommendations: A feasibility randomized control trial. Patient Educ Couns 2017;100(9):1720–1729. [doi: 10.1016/j.pec.2017.04.015]
31. Ly KH, Ly AM, Andersson G. A fully automated conversational agent for promoting mental well-being: A pilot RCT using mixed methods. Internet Interv 2017;10:39–46. [doi: 10.1016/j.invent.2017.10.002]
32. Suganuma S, Sakamoto D, Shimoyama H. An Embodied Conversational Agent for Unguided Internet-Based Cognitive Behavior Therapy in Preventative Mental Health: Feasibility and Acceptability Pilot Trial. JMIR Ment Heal 2018;5(3):e10454. [doi: 10.2196/10454]
33. Topooco N, Riper H, Araya R, Berking M, Brunn M, Chevreul K, Cieslak R, Ebert DD, Etchmendy E, Herrero R, Kleiboer A, Krieger T, Garcia-Palacios A, Cerga-Pashoja A, Smoktunowicz E, Urech A, Vis C, Andersson G. Attitudes towards digital treatment for depression: A European stakeholder survey. Internet Interv 2017;8:1–9. [doi: 10.1016/j.invent.2017.01.001]
34. Dunne N. Evaluation of psychology clinicians' attitudes towards computerized cognitive behavior therapy, for use in their future clinical practice, with regard to treating those suffering from anxiety and depression. Dissertations & Theses - Antioch University Santa Barbara; 2018.
35. Waytz A, Cacioppo J, Epley N. Who Sees Human? The Stability and Importance of Individual Differences in Anthropomorphism. Perspect Psychol Sci 2010;5(3):219–232. [doi: 10.1177/1745691610369336]
36. Kessler RC, Wai TC, Demler O, Walters EE. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry 2005. PMID:15939839
37. Weizenbaum J. ELIZA---a computer program for the study of natural language communication between man and machine. Commun ACM 1966 Jan;9(1):36–45. [doi: 10.1145/365153.365168]
38. Shinozaki T, Yamamoto Y, Tsuruta S, Knauf R. Validation of Context Respectful Counseling Agent. IEEE Int Conf Syst Man, Cybern 2015. p. 993–998. [doi: 10.1109/SMC.2015.180]
39. Hudlicka E. Virtual training and coaching of health behavior: Example from mindfulness meditation training. Patient Educ Couns 2013 Aug;92(2):160–166. [doi: 10.1016/j.pec.2013.05.007]
40. Tielman ML, Neerincx MA, van Meggelen M, Franken I, Brinkman W-P. How should a virtual agent present psychoeducation? Influence of verbal and textual presentation on adherence. Technol Health Care 2017;25(6):1081–1096. PMID:28800346
41. Mitchell S, Krizman K, Bhaloo J, Cawley M, Welch M, Bickmore T, Ring L, Alvarez C. Treating Comorbid Depression During Care Transitions Using Relational Agents. Boston; 2016.
42. Hollis C, Sampson S, Simons L, Davies EB, Churchill R, Betton V, Butler D, Chapman K, Easton K, Gronlund TA, Kabir T, Rawsthorne M, Rye E, Tomlin A. Identifying research priorities for digital technology in mental health care: results of the James Lind Alliance Priority Setting Partnership. The Lancet Psychiatry 2018. PMID:30170964

54

Paper 2: Empirical paper

Agents of Change: A Multi-Method Study to Optimise the

Helpfulness of an Artificial Relational Agent for Treating Mental

Health Problems

Running title: Helpfulness of a relational agent for mental health treatment

The following paper has been prepared for submission to:

‘Acta Psychiatrica Scandinavica’

(For the purposes of the thesis, tables and figures have been left in the main body of the

text and not limited to five in total)

The guidelines for authors can be found in Appendix C.

Word Count: 7,929

(Excluding abstract, tables and references)

55

Acknowledgements

We would like to acknowledge the contributions of: Danny Whittaker, a mental health writer and podcast host, who advertised the study; the participants who agreed to take part; the Community Liaison Group at the University of Manchester, who provided feedback on study procedures and materials; and Dr Lesley-Anne Carter, who provided supervision on the statistical analyses.

Conflicts of interest

None.

Abstract

Objective: To understand theory-driven therapeutic processes associated with the helpfulness of an online relational agent intervention, 'Manage Your Life Online' (MYLO).

Methods: Fifteen participants experiencing a mental health related problem used MYLO for two weeks. At follow-up, participants each identified two helpful and two unhelpful questions posed by MYLO within a single intervention session. Questions were rated for helpfulness and various therapeutic process factors including therapeutic alliance. A mixed effects model was fitted to examine associations between helpfulness and therapy process factors. Qualitative interviews analysed using thematic and content analysis enabled further insights into the process of intervention with MYLO, its acceptability and design.

Results: MYLO appeared acceptable to participants with a range of presenting problems.

Questions enabling free expression, increased awareness, and new insights were key to helpful intervention. Findings were consistent with core processes of therapeutic change, according to Perceptual Control Theory (PCT), a unifying theory of psychological distress.

56

Questions that elicited intense emotions, or that were repetitive, confusing or inappropriate, were identified as unhelpful and were associated with disengagement or loss of faith in MYLO.

Conclusion: Findings provide insight into likely core therapy processes experienced as helpful or hindering and outline further ways to optimise the acceptability of MYLO.

Key words: Mental health; Artificial Intelligence; Psychotherapy; Psychotherapeutic

Processes; Therapy, Computer-Assisted

Significant outcomes

1 MYLO is an acceptable intervention that has the potential to provide support flexibly

either alongside existing treatments or independently. The efficacy of MYLO remains

to be examined.

2 Facilitating the transdiagnostic therapeutic processes of talking freely, developing new

perspectives, and mobility of awareness appear key to helpful intervention with MYLO

across presenting problems. Our findings support the ambition towards greater

consideration of how psychological therapies work to achieve their effects to

maximise acceptability and efficiency.

3 MYLO has the potential to provide an accessible, far reaching form of psychological

support that could go some way to addressing current supply and demand problems

in psychological intervention.

57

Limitations

1 Due to difficulties recruiting participants through services, the sample size was smaller

than planned which affected the reliability of statistical analyses.

2 Intervention process factors were all highly correlated and may not be measuring

distinct or independent concepts. We have therefore been cautious in drawing

conclusions from this analysis.

3 Identification of two helpful and two unhelpful questions may not be representative of

an entire intervention session or the whole intervention.

58

Introduction

One in six people in England report a common mental health problem such as anxiety or depression [1]. Mental health problems comprise the single main source of disability and health-related economic burden globally [2]. However, despite high prevalence, access to treatment remains problematic and demand outstrips supply [3]. Key government policies encourage greater adoption of digital mental health interventions to increase access at reduced cost [4–6]. However, existing digital interventions recommended by the National

Institute for Health and Care Excellence (NICE) for common mental health problems have experienced high levels of attrition, and although active telephone support seems to facilitate effectiveness, this has increased concerns about their efficiency and cost-effectiveness [7,8].

Digital interventions that offer greater interactivity, increased choice and control over content, and applicability to a range of psychological problems, have the potential to increase acceptability and efficiency [9]. Continued and rapid advances in digital technology are now able to facilitate this vision [10]. One way of achieving greater collaboration and flexibility in computerised interventions is through relational agents. Relational agents are software programs that use artificial intelligence to simulate a conversation through text or voice [11] and the efficacy and acceptability of this method of intervention appears promising [12–16].

Demonstrating the efficacy and acceptability of digital interventions is a key focus of research and remains a vital priority [17]. However, evaluations have often neglected to

demonstrate core mechanisms of action, and as such, it remains largely unclear how interventions achieve their effects [18]. In the context of a broader paradigm shift in psychological intervention research to focus more closely on process [19], the mechanisms of action are of particular importance in digital interventions, as software is updated and changed rapidly. In particular, relational agent interventions rely on dynamic change (e.g. artificial intelligence) to deliver flexible, acceptable interventions to users [20,21].

Identifying mechanisms of action, including the therapeutic alliance as an agent of change, particularly in relational agents, is a top ten research priority in digital interventions in the

UK [21]. Investigating theoretically driven therapeutic processes requires a detailed analysis of what happens in therapy, how it is experienced by clients, and why they find it helpful or hindering [22–25]. Greater transparency regarding precisely how and why digital interventions achieve psychological change is likely to increase user trust, streamline interventions to their key components, and consequently increase reach [18].

‘Manage Your Life Online’ (MYLO) is an online relational agent developed at the University of Manchester, which uses artificial intelligence to simulate a conversation with a user through typed text. MYLO aims to emulate a transdiagnostic form of cognitive therapy called the Method of Levels (MOL) [26]. Transdiagnostic interventions are particularly suitable for the estimated 50% of service users who experience comorbidities [27], or problems that do not fit into pre-defined diagnostic categories [28]. Transdiagnostic interventions delivered in traditional and computerised formats have been shown to provide equivalent effects to disorder specific interventions of their type [29,30] and offer greater interactivity, flexibility and importantly, scalability. Moreover, MOL is differentiated from other cognitive therapies as it originates from a unifying theoretical approach known

as Perceptual Control Theory (PCT) [31]. PCT posits that psychological distress is due to sustained conflict in internal control systems that are essential to all living things. Therefore, the aim of MOL therapy is to increase a client's sense of control through enabling them to talk freely whilst supporting a sustained focus on a problem and associated emotion.

Sustained exploration of a problem increases awareness of internal conflict, facilitates new perspectives of the problem, and enables reorganisation at the origin of the conflict [26].

Reorganisation is the process through which problem resolution occurs. MYLO has been evaluated in two feasibility trials using student samples, with promising results [32,33].

Aims of the Study

Primarily, we aimed to examine the process of intervention with relational agent MYLO with a sample reporting a mental health related problem that was troubling them. Greater understanding of the core processes associated with the helpfulness of MYLO will help to optimise its acceptability and helpfulness for users. We utilised quantitative and qualitative methods to closely examine the process of intervention both within and between participants. We postulated that four key mechanisms of psychological change identified in

MOL (perceived control, the ability to talk freely, to maintain a focus on emotion and gain new perspectives) would be positively associated with ratings of the helpfulness of MYLO’s questions (primary hypothesis). Therapeutic alliance factors were also included as a comparison as these have been shown to be moderately associated with clinical outcomes

[34].

Secondly, we aimed to explore levels of engagement and participant views on the design and function of MYLO to inform the further development of MYLO and optimise its helpfulness. Finally, we aimed to explore psychological change between baseline and follow-up on measures of psychological distress (total scores on the GAD-7 & PHQ-9 and

PSYCHLOPS change scores) and capacity for psychological reorganisation (the key mechanism of change within MOL) to inform our understanding of the sample and facilitate comparisons between existing and future studies.

Material and Methods

Design

We conducted a multi-method, case series design, repeated over several cases. This design was theory led and facilitated detailed examination of client perceptions of helpfulness in a single intervention session with MYLO. Participants were granted online access to MYLO for a two-week period. Questionnaires were completed at baseline and follow-up and qualitative interviews conducted at follow-up.

Participants

We recruited people who self-reported a problem that was troubling them. Initially, we attempted to recruit exclusively through clinicians at a primary care mental health service.

However, this proved challenging and therefore we widened recruitment to various routes including electronic advertisement on a local peer support group, through the University of

Manchester research volunteering website and the University of Manchester counselling service (see Paper 3 for details). Inclusion criteria were: aged 16 and over, able to converse, read and write in English, interested in using an online intervention and had access to a device connected to the internet. We excluded people who had current suicidal intent or persistent self-injury, were currently psychotic, had substance dependence, a known

neurological or organic basis for presentation (e.g. dementia), a moderate to severe learning disability that would affect their ability to engage with the computer program, or a visual difficulty that would impair participation.

Ethics

The study was approved by an NHS Health Research Authority Ethics Committee (REC reference: 18/NW/0367). See Appendix D for approval letters. We abided by the American

Psychological Association (APA) Ethical Principles and Code of Conduct. Participants provided written informed consent. No treatment was withheld due to taking part in the study. Participants were provided with a list of contacts for help in a crisis and the researcher assessed risk on a weekly basis during participation. All data collected were pseudo-anonymised.

Procedures

Potential participants contacted the researcher by email or telephone and were provided with the PIS via email (see Appendix E). Eligibility was assessed verbally over the telephone using the inclusion and exclusion criteria by the researcher. If eligible, an initial appointment in a private room at the study centre (The University of Manchester) was arranged by the researcher to gain informed consent, provide online access to MYLO (using a unique username and password) and complete baseline measures. Participants were advised to use MYLO at least once over two-weeks but no upper limits on usage were applied. MYLO conversation files were screened for risk information at least weekly. Participants received a phone call mid-intervention (one week) to check for any technical problems and assess risk.

After 2 weeks, participants attended the study centre in person to complete follow-up measures. At this session, participants were asked to read their longest MYLO conversation

transcript (up to the first 30 minutes) and identify two questions MYLO asked that they found helpful and two that they found more unhelpful. An intervention process measure

(see measures section for detail) was completed for each of the four questions. Participants were then interviewed about the content and process of intervention with MYLO.

Participants received £5 per completed assessment (£10 in total for both assessments).

Intervention (MYLO)

MYLO is an automated relational agent designed to deliver an MOL informed intervention through the format of text-based conversation without the support of a human clinician or therapist. MYLO is accessed online and can be used on any device through a web browser

(see Figure 1). Participants were provided with a randomly generated, unique username and password to log in to MYLO. Users input free-text about their problem and MYLO analyses text input for key terms, phrases and themes. MYLO responds with questions aimed at encouraging higher-level awareness of a problem. Users can also provide real-time feedback about the helpfulness of questions to MYLO whilst submitting their answers. The researcher gave a short demonstration of the program at the baseline assessment and printed access instructions as a reminder. Participants were granted online access to MYLO over a two-week period with no upper limit on usage and no suggested duration or frequency of use.

Participants were provided with a printed list of contacts to use in a crisis.

64

Figure 1. MYLO conversation screen example

65

Measures

Intervention process measure

The primary outcome measure was a modified therapy process questionnaire developed from a previous MOL therapy process study [35] which utilises a similar format to a standardised therapeutic alliance measure (Session Rating Scale; SRS) [36]. Specifically, client ratings of helpfulness, alongside four process items measuring key mechanisms of psychological change according to MOL, and three standardised therapeutic alliance items adapted from the Session Rating Scale (SRS), were used to retrospectively capture the therapeutic process from a client perspective in a single intervention session (see Appendix

F). The feasibility, validity, and acceptability of this method was demonstrated in a previous study in which participants rated a video-recorded excerpt of their therapy session at 2-minute intervals [35].

The items measured 1) the perceived helpfulness of the question on a scale of 0, not helpful at all to 10, extremely helpful (primary outcome variable) 2) the degree to which the question enabled a sense of control over what was happening in conversation on a scale of

0, no control at all to 10, complete control 3) the ability to talk freely about the problem from 0, not able at all to 10, entirely able 4) the ability to experience emotion connected to the problem from 0, not able at all to 10, entirely able 5) to see the problem in a new way from 0, not able at all to 10, entirely able 6) to feel understood and respected from 0, not understood or respected at all to 10, understood and respected completely 7) to talk about what they wanted from 0, not able at all to 10, entirely able and 8) the extent to which they felt the question was a good fit from 0, not a good fit at all to 10, an extremely good fit. All

items were scored on an 11-point Likert scale. Four intervention process measures were completed by each participant for each of the questions they identified as either helpful (2 questions) or unhelpful (2 questions) from their longest conversation with MYLO.

We also used a secondary process measure called the Reorganisation of Conflict scale (ROC)

[37]. The ROC has three subscales: 'inflexible or urgent problem solving'; 'goal conflict awareness'; and 'goal conflict reorganisation'; several studies have evaluated its psychometric properties [38,39]. We used the 11-item 'goal conflict reorganisation' sub-scale of this questionnaire, which measures capacity for goal conflict reorganisation, a key mechanism of change in MOL, as only this sub-scale has been shown to have good internal consistency;

Cronbach’s alpha 0.83 [33].

Intervention engagement

The frequency (total number of logins to MYLO with a conversation of any duration) and duration (total length of time in minutes of conversation, rounded to the nearest minute) of

MYLO conversations were extracted using the automatic date and timestamps of conversations recorded in the MYLO program and were used as a proxy measure of engagement.

Symptom measures

Secondary outcome measures included the Patient Health Questionnaire (PHQ-9) [40], the

Generalized Anxiety Disorder Questionnaire (GAD-7) [41] and the Psychological Outcome

Profiles (PSYCHLOPS) [42]. The PHQ-9 is a 9-item measure of depressive symptoms with scores from 0 to 27 with a threshold score of 10 indicating clinical intervention. The measure has good internal consistency; Cronbach's alpha 0.89 [40]. The GAD-7 is a seven-item measure of anxiety with scores from 0 to 21 with a threshold of 8 indicating clinical intervention. The measure has good internal consistency; Cronbach's alpha 0.92 [41].

Finally, the PSYCHLOPS is a four-question person-centred outcome measure with scores of 0 to 20. This measure assesses wellbeing, functioning and distress. It has good internal consistency with Cronbach’s alpha 0.79 (pre-therapy) and 0.87 (post-therapy) [43]. The change score between baseline and follow-up was used to measure intra-personal change as defined by the participant.

Qualitative interview

An audio taped interview was conducted to capture participants’ subjective experiences of why they chose each of the four questions as particularly helpful or unhelpful (see Appendix

G). Participants’ general views of intervention with MYLO, including feedback on the interface, usability, and design, were also gathered (see Appendix H for the interview schedule). Suggestions from participants regarding enhancements or modification of MYLO were extracted to inform clear recommendations for the development of the MYLO programme. Qualitative interviews were transcribed verbatim.

Piloting

Two pilots were carried out with staff members at a primary care mental health service prior to commencement of the study. Feedback on the usability of MYLO, appropriateness of the measures and interview schedules was gathered. Subsequently, specific technical amendments were made to MYLO (e.g. removing a header bar that made viewing MYLO difficult on a mobile phone screen; removing quotation marks from MYLO questions; increasing clarity of specific questions and increasing the variety of default questions) and

68 instructions were developed to help participants gain the most from intervention with

MYLO. Service user feedback from a community liaison group (CLG) was also gathered regarding the acceptability of the study design and measures. Recommendations made by the CLG were implemented, including demonstrating how to access MYLO on a computer at the initial appointment, providing printed instructions as a reminder, and clarifying that

MYLO is a computer program and may not respond as a human would (See Paper 3 for further details).

Statistical analysis

Analyses were performed in Stata version 15.1 [44] with an alpha level for significance of

5%. All variables were assessed for normality via histogram or boxplot inspection and skewness and kurtosis statistics. Descriptive statistics were used to describe the data. Power analysis (conducted in G*Power [45]) indicated that a total of 25 participants would be required to estimate a regression coefficient of 0.5 (a large effect) between the seven process variables and helpfulness score with 80% power at a significance level of 0.05.
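As a minimal Stata sketch of the distributional and descriptive checks described above (the variable name 'helpfulness' is hypothetical; the exact commands used in the study are not reported):

* Sketch only: assumed variable name 'helpfulness'
histogram helpfulness, bin(11)        // inspect the shape of the distribution
graph box helpfulness                 // boxplot inspection
summarize helpfulness, detail         // descriptives, including skewness and kurtosis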

Primary quantitative analysis

We conducted separate analyses for questions classed as helpful and questions classed as unhelpful. Data had a two-level hierarchical structure (question process measure, level-1 and participant, level-2) which violates the assumption of independent observations (see

Figure 2). Therefore, a two-level mixed effects model (STATA command MIXED) was fitted to investigate what process variables were associated with the perceived helpfulness of MYLO questions [46]. The participant variable was entered as a random factor to account for the clustered nature of the data.

69

Figure 2. Illustration of the two-level hierarchical structure of data, split by questions identified as helpful and unhelpful

The structure is shown separately for questions identified as helpful and questions identified as unhelpful:
Level 2 – Participant
Level 1 – Question one and question two (helpful or unhelpful)
Outcome variable – Rating of helpfulness
Predictor variables – Four ratings of MOL therapy processes; three ratings of therapeutic alliance processes

Initially, separate two-level univariate mixed effects models were fitted with helpfulness rating as the outcome variable and each process item score as a predictor variable.

Following this, a multivariate mixed effects model was fitted with helpfulness rating as the outcome variable and all the process scores as predictors. We did not require techniques for missing data, as the primary analyses were conducted only on participants who provided follow-up data. To assess normality of the distribution of outcome variables, post-estimation residuals were plotted using histograms. Because normality assumptions were violated, the analysis was conducted again with bootstrapping (1000 iterations) to correct

70 standard errors (SE) and provide a more accurate estimate of the confidence interval (CI) in line with guidance [47]. This paper only reports results from the bootstrapped analyses.
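To make these steps concrete, the sketch below shows how such models might be specified in Stata. The variable names (helpfulness, control, talkfreely, emotion, perspectives, relationship, topic, goodfit, participant) are hypothetical stand-ins for the study dataset, and the bootstrap options shown are one plausible way of resampling at the participant level rather than the exact commands used in the thesis.

```stata
* Univariate two-level model: one process score as predictor,
* with a random intercept for participant to account for clustering
mixed helpfulness control || participant:

* Multivariate model: all seven process scores entered together
mixed helpfulness control talkfreely emotion perspectives ///
    relationship topic goodfit || participant:

* Re-estimate with bootstrapped standard errors (1000 replications),
* resampling whole participants (clusters) rather than individual question ratings
bootstrap, reps(1000) seed(12345) cluster(participant) idcluster(bs_id): ///
    mixed helpfulness control talkfreely emotion perspectives ///
        relationship topic goodfit || bs_id:
```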

Secondary quantitative analyses

Descriptive statistics were used to describe engagement (frequency and duration of conversations) with MYLO. Changes in psychological distress and the process of psychological organisation were explored using paired samples t-tests on scores on the ROC,

PHQ-9, GAD-7 and PSYCHLOPS between baseline and follow-up. The study was not powered to detect significant effects; however, we report standardised effect sizes (Cohen's d) to enable comparison with other work and future studies.
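As an illustration of this step (the variable names are hypothetical, and the choice of standardiser for Cohen's d is an assumption rather than something specified in the thesis), a paired comparison for one outcome might look like:

```stata
* Paired-samples t-test comparing follow-up with baseline PHQ-9 scores (df = n - 1)
ttest phq9_followup == phq9_baseline

* Cohen's d, standardising the mean change by the average of the baseline
* and follow-up standard deviations (one common convention)
quietly summarize phq9_baseline
scalar m0 = r(mean)
scalar s0 = r(sd)
quietly summarize phq9_followup
scalar m1 = r(mean)
scalar s1 = r(sd)
display "Cohen's d = " abs(m1 - m0) / ((s0 + s1) / 2)
```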

Qualitative analysis

Interview recordings were transcribed verbatim. The intervention process interview was analysed using thematic analysis [48]. Thematic analysis is a flexible approach to analysis that provides a rich account and interpretation of the data [48]. Initially, interview transcripts were read and re-read by the researcher to become familiar with the data. The research questions and subsequent interview guide provided the initial structure for the analysis (helpful and unhelpful questions), and thus a theoretical, research question driven approach to thematic analysis was used. The first author conducted the coding and themes reflecting participants’ accounts were identified for helpful questions and unhelpful questions. Where themes were related, main themes were developed encompassing subthemes. Quotes were then selected that represented each theme and themes were checked back with the transcripts to ensure they were representative of the data. This process is in line with guidance on conducting a thematic analysis [48].


Inductive content analysis [49] was employed to analyse general feedback from participants on their experiences of using MYLO and the design and function of MYLO. Content analysis is a replicable approach to describing and quantifying data which enables new insights and can facilitate the practical application of findings [49,50]. This approach enabled clear recommendations for the development of MYLO. Following familiarisation with the data, concepts were developed from the data and the frequencies of these between participant interviews were counted.

Results

Participant Characteristics

In a one-month period (November to December 2018) 28 people were assessed for eligibility. A total of 17 (60.7%) people were eligible and consented to the study and 15 completed follow-up measures (88%; see Figure 3 for recruitment flow diagram). All included participants had self-referred to the study. Due to challenges experienced during recruitment and time constraints (see Paper 3) we did not recruit the intended number of participants (n=25).

On average, participants were aged 33.4 years (standard deviation, SD = 14.5), ranging between 22 and 67 years. There were more female (n=11, 64.7%) than male participants

(n=6, 35.3%). Around half of the sample (52.9%, n=9) scored above the clinical cut-off on the

PHQ-9 at baseline and 58.8% (n=10) scored above the clinical cut-off on the GAD-7 at baseline. Participants self-reported a wide range of presenting problems such as anxiety, social anxiety, depression, loneliness, bereavement, low self-esteem, concerns about work, worry, sleep problems, relationship problems, financial concerns and career problems.


Figure 3. Recruitment flow diagram

Self-referred (n=38); clinician referred (n=4)
  Did not respond: self-referred (n=12); clinician referred (n=2)
Assessed for eligibility (n=28)
  Excluded (n=11): did not meet inclusion criteria (n=2); declined to participate (n=5); unable to attend the university (n=2); compensation not adequate (n=2)
Included (n=17): completed baseline measures; received 2-week MYLO intervention
  Lost to follow-up (n=2): could not attend follow-up due to competing commitment and reported finding MYLO frustrating (n=1); did not attend follow-up, no response to further contact (n=1)
Analysed (n=15)


Primary results – intervention process

Identification of helpful and unhelpful questions

Each of the 15 participants who attended follow-up identified two questions that they found helpful and two questions that they found unhelpful from their longest MYLO conversation transcript (see Table 1). Generally, the questions that participants appeared to find more helpful were those that picked out key words, e.g. “You used a phrase then - vicious circle - what led you to put it that way exactly?”, focussed on feelings, e.g. “How does feeling confused or uncertain affect you?”, and weighed up or analysed meaning, e.g. “What makes that important for you?”. Participants more often identified questions that asked for elaboration with no direction or interpretation as unhelpful, e.g. “Please tell me a bit more by writing a few more sentences if you can”. Questions using the words ‘mind’, e.g. “What's coming into your mind at the moment?”, and ‘picture’, e.g. “If you have a picture in your mind, what is going on in it?”, also appeared less helpful to participants. Interestingly, one question, “What's running through the back of your mind while you're typing?”, was identified more frequently than others (n=8, 13%) and was classified on an equal number of occasions as helpful and as unhelpful. This is consistent with research demonstrating that clients’ preferences influence whether they experience a therapist’s approach as helpful or hindering

[51].


Table 1. Frequency table of questions identified as helpful or unhelpful

Questions identified only as helpful (Helpful / Unhelpful / Total)
Could you tell me what you think is the most important thing from what you are saying here? (1 / 0 / 1)
How does feeling confused or uncertain affect you? (1 / 0 / 1)
How does feeling uncertain affect the way you think? (1 / 0 / 1)
It sounds like you are saying negative things about yourself. What makes you do that? (1 / 0 / 1)
Please tell me what's on your mind (standard opening question) (1 / 0 / 1)
Tell me something about why you said desperate? (1 / 0 / 1)
This sounds like a sign of progress. What do you make of that right now? (1 / 0 / 1)
What do you think about feeling anxious? (1 / 0 / 1)
What is your experience of these memories at the moment? (1 / 0 / 1)
What makes that important for you? (1 / 0 / 1)
What makes you think about the idea of some kind of break? (1 / 0 / 1)
What makes you use the word tired? (1 / 0 / 1)
What options have you considered so far? (1 / 0 / 1)
When you consider the way you think about this, how sure are you that this is how you really feel? (1 / 0 / 1)
You said priorities. Can you tell me more about the alternatives you are weighing up? (1 / 0 / 1)
You used a phrase then - vicious circle - what led you to put it that way exactly? (1 / 0 / 1)

Questions identified only as unhelpful (Helpful / Unhelpful / Total)
How does picturing the future like that make you feel now? (0 / 1 / 1)
If you have a picture in your mind, what is going on in it? (0 / 1 / 1)
Please tell me a bit more by writing a few more sentences if you can (0 / 2 / 2)
Please tell me more about how you get this feeling? (0 / 2 / 2)
Please try to give me a bit more detail by writing a few more sentences about it (0 / 2 / 2)
Tell me something about why you said irrational? (0 / 1 / 1)
What is going through your mind right now? (0 / 2 / 2)
What is it about this sense you get that could be important? (0 / 1 / 1)
What makes it helpful? (0 / 1 / 1)
What makes you want to talk about that right now? How is it important for you? (0 / 1 / 1)
What's coming into your mind at the moment? (0 / 3 / 3)
What's going through your mind right now? (0 / 1 / 1)

Questions identified as both helpful and unhelpful (Helpful / Unhelpful / Total)
How is it to picture the future like that just now? (1 / 2 / 3)
If you read back what you have just typed, what springs to mind about what you have written? (2 / 1 / 3)
What do you notice about what you have just typed? (2 / 2 / 4)
What makes this important to you? (1 / 1 / 2)
What makes you use the word worried? (2 / 1 / 3)
What would benefit you from being able to do this? (2 / 1 / 3)
What's running through the back of your mind while you're typing? (4 / 4 / 8)


Quantitative intervention process results

For questions identified as helpful, helpfulness scores were positively skewed. For questions identified as unhelpful, helpfulness scores were normally distributed. However, one participant rated questions they identified as unhelpful very highly for helpfulness which appeared contradictory. This participant’s qualitative interview data revealed mixed feelings about one of the unhelpful questions, “I thought the question was worded very well because it said if you can, so, it’s kind of giving you a get out clause, but then again, erm, for me, it, it wasn’t kind of, as liberating as the, the ones at the beginning,” (Participant 33).

Considering this, we used the median and interquartile range to describe the process-level data, as this provides a more accurate estimate of the average for data with extreme values

[52,53]. We also conducted a sensitivity analysis with this participant's scores removed.

Results of the sensitivity analysis were consistent with the primary results; therefore, we report results with this participant's scores included.

Descriptive statistics and distributions of scores on all variables from the intervention process measure, split by helpful and unhelpful questions, can be observed in Table 2 and Figure 4. Each participant completed 32 items (eight items for each of the four MYLO questions they identified), providing a total of 60 question-level observations (30 for helpful questions and 30 for unhelpful questions) for the process analysis (see Figure 2 for an illustration of the data structure).


Table 2. Descriptive statistics for scores on intervention process measure items by helpful and unhelpful questions

Process variable      Helpful questions (n=30)          Unhelpful questions (n=30)
                      Median (IQR†)   Min   Max         Median (IQR)   Min   Max
Helpfulness           8.50 (2)        4     10          4.00 (2)       0     10
Control               8.00 (5)        5     10          4.00 (3)       1     10
Talk Freely           8.00 (7)        3     10          5.00 (5)       0     10
Emotion               9.00 (2)        4     10          5.00 (3)       1     10
Perspectives          8.00 (3)        2     10          3.50 (3)       0     10
Relationship (SRS‡)   7.00 (8)        2     10          4.00 (3)       0     10
Topic (SRS)           8.00 (5)        5     10          4.00 (4)       0     10
Good fit (SRS)        9.00 (5)        5     10          3.50 (3)       0     10

†Interquartile range; ‡Session Rating Scale

Figure 4. Boxplot of scores on all variables by unhelpful and helpful questions

[Boxplots of scores (0–10) on Helpfulness, Control, Talk Freely, Emotion, Perspectives, Relationship (SRS), Topic (SRS) and Good Fit (SRS), split by questions identified as unhelpful and helpful.]


Helpful questions

Seven separate two-level mixed effects regression models were fitted, each examining the association between one intervention process measure and helpfulness scores for questions classed as helpful (see Table 3). A greater sense of control, a sense of being understood and respected (relationship), and a rating of the question as a good fit with the individual were positively associated with ratings of helpfulness for questions classed as helpful.

Table 3. Effect of process variables on helpfulness for questions identified as helpful

Predictor            β coefficient   SE†    z      P        95% Confidence interval
Control              0.51            0.22   2.28   0.022*   0.07 to 0.95
Talk Freely          0.30            0.34   0.90   0.370    -0.36 to 0.96
Emotion              0.34            0.32   1.07   0.284    -0.28 to 0.96
Perspectives         0.20            0.17   1.17   0.241    -0.14 to 0.54
Relationship (SRS)   0.35            0.15   2.37   0.018*   0.06 to 0.64
Topic (SRS)          0.38            0.26   1.44   0.150    -0.14 to 0.89
Good fit (SRS)       0.82            0.13   6.24   0.000**  0.56 to 1.07

*Significant at p<0.05; **Significant at p<0.001; †Bootstrapped Standard Error


To assess the contributions of all intervention process measures to the helpfulness of MYLO's questions for questions classified as helpful, a multivariate mixed effects regression analysis was conducted (see Table 4). Questions that were rated as being a good fit remained significantly associated with helpfulness: a one-unit increase in good fit rating was associated with a 0.86-unit increase in helpfulness rating. All other process factors were non-significant, and beta coefficients were reduced compared with the univariate analyses, indicating that the process factors were not independent of one another.

Table 4. Effect of intervention process measures on perceived helpfulness for questions classed as helpful

Predictor            β coefficient   SE†    z       P        CI (95%)
Control              0.25            0.24   1.04    0.298    -0.22 to 0.72
Talk Freely          -0.16           0.30   -0.52   0.601    -0.75 to 0.43
Emotion              -0.07           0.23   -0.31   0.756    -0.54 to 0.39
Perspectives         0.08            0.14   0.57    0.566    -0.20 to 0.36
Relationship (SRS)   -0.12           0.22   -0.50   0.620    -0.53 to 0.32
Topic (SRS)          0.01            0.30   0.03    0.978    -0.57 to 0.59
Good fit (SRS)       0.86            0.37   2.33    0.020*   0.14 to 1.59

*Significant at p<0.05; †Bootstrapped Standard Error


Unhelpful questions

Seven separate two-level mixed effects regression models were fitted, each examining the association between one intervention process measure and helpfulness scores for questions classed as unhelpful (see Table 5). All of the process factors were positively associated with ratings of helpfulness for questions classed as unhelpful.

Table 5. Effect of process variables on helpfulness for questions identified as unhelpful

Predictor            β coefficient   SE†    z      P        CI (95%)
Control              0.54            0.27   1.97   0.049*   0.00 to 1.07
Talk Freely          0.48            0.15   3.23   0.001*   0.19 to 0.78
Emotion              0.55            0.14   3.99   0.000**  0.28 to 0.82
Perspectives         0.50            0.16   3.17   0.002*   0.19 to 0.81
Relationship (SRS)   0.58            0.14   4.20   0.000**  0.31 to 0.85
Topic (SRS)          0.49            0.18   2.73   0.006*   0.14 to 0.84
Good fit (SRS)       0.70            0.14   5.06   0.000**  0.43 to 0.97

*Significant at p<0.05; **Significant at p<0.001; †Bootstrapped Standard Error


As above, to assess the contributions of all intervention process measures to the helpfulness of MYLO's questions for questions classified as unhelpful, a multivariate mixed effects regression analysis was conducted (see Table 6). No significant associations between process factors and helpfulness were found for questions rated as unhelpful. Beta coefficients were all reduced compared with the univariate analyses, indicating that the process factors may not be independent of one another.

Table 6. Effect of intervention process measures on perceived helpfulness for questions classed as unhelpful

Predictor            β coefficient   SE†    z       P       CI (95%)
Control              0.10            0.25   0.39    0.695   -0.39 to 0.58
Talk Freely          0.02            0.26   0.08    0.938   -0.48 to 0.52
Emotion              0.28            0.24   1.17    0.243   -0.19 to 0.76
Perspectives         0.04            0.27   0.13    0.894   -0.50 to 0.57
Relationship (SRS)   0.21            0.41   0.50    0.614   -0.60 to 1.02
Topic (SRS)          -0.02           0.29   -0.07   0.94    -0.59 to 0.55
Good fit (SRS)       0.18            0.63   0.28    0.78    -1.06 to 1.41

†Bootstrapped Standard Error


Qualitative intervention process results

Participants (n=15) provided general feedback on their experiences of using MYLO to explore their problems. Generally, MYLO appeared acceptable to participants and enabled them to gain greater awareness of their problem and develop new perspectives:

…I think that it’s, more about, erm, you know, uncovering the things that you don’t normally think about, within your own, er, mind, within your own kind of consciousness, and then helping you find your actual path to, resolving some of your issues, that you know, you might be having, and I think that it’s quite useful, especially today when, erm, you know the mental health services are so over crowded…I think, think this is, appropriate for I think people like me, who just need to think about things from a different perspective. (Participant 20)

I think by, looking at it deeper, er, it’s almost like, when you tell somebody about a problem and their response is, why? and then you come back and tell them why and their response then is, why? it’s a good exercise, why? why? why? and, you, you eventually have to get to a source, it, it’s a bit like that really (Participant 1)

it did it’s job in a way, ‘cos it made me elaborate on things and actually get to the, the root problem. (Participant 42)

Some participants appeared to build a relationship with MYLO. For example, two participants attributed feelings to MYLO, e.g. “…because it’s giving those dynamic, replies, so I kind of feel like, yeah, ok, even though it’s a robot maybe it would actually be feeling sad, and sorry, kind of thing” (Participant 12) and “it seemed, interested in me, kind of, and without, being kind of sympathetic, overly sympathetic or, erm, it, it wasn’t kind of, er, the expectations, you know and the more I felt like I could have gone with it, the more I would have got out of it, so it was kind as if I, I felt in control.” (Participant 33).


However, some participants (n=4) reported that they were aware of MYLO’s limitations e.g.

“I was aware all the time that there wasn’t a person, er, that, that became quite, apparent, the more I used it, that it was, er, an automated response almost.” (Participant 1). And three participants indicated that they had difficulty engaging with the app fully due to high distress and/or low motivation e.g. “because I was struggling a bit, I think I used it just quite sparingly, not, not a lot to be honest, as much as I would have wanted to…” (Participant 11).

Overall, most participants reported that they would recommend the intervention to a friend

(n=12). Three participants indicated that they thought it was most suitable for people with low-moderate levels of common mental health problems such as anxiety and depression e.g. “I think it could be quite a good tool for some people, er, that are having, problems with anxiety or depression, in the early stages” (Participant 1). Three participants thought it could be used alongside face-to-face therapy e.g. “…really good, erm, tool for, sort of complementing therapy, rather than, providing therapy” (Participant 12) and one participant highlighted the high demand for psychological services and thought it would be a useful intervention whilst on a waiting list for face-to-face therapy e.g. “if they needed to talk to some, something to get, to get things out, then, yeah, ‘cos, therapy’s not easy to get, you’re on a waiting list unless you’re suicidal basically” (Participant 2).


All participants who attended follow-up (n=15) were interviewed about why they had chosen questions as helpful or unhelpful. Figure 5 illustrates a thematic map of participants’ responses to the question ‘What made you choose that question as particularly helpful?’.

Four major themes of talking freely, new perspectives, relationship and awareness were identified from the qualitative data. Two subthemes of compassion and humanity were identified within the main relationship theme. Specifically, questions that enabled participants to express themselves freely and in any direction of their choosing (talking freely; n=8); questions that enabled participants to begin to see their problem in new ways and gather new perspectives (new perspectives; n=13); questions that encouraged greater reflection and awareness of the details and emotions attached to a problem (awareness; n=12); questions that demonstrated understanding and compassion in relation to feelings

(relationship subtheme; compassion; n=10) and questions that felt more human and natural

(relationship subtheme; humanity; n=2) were associated with helpfulness.


Figure 5. Thematic map of participants’ reasons for choosing a question as being particularly helpful

[Figure 5 presents the thematic map for helpful questions: Talking freely (n=8), New perspectives (n=13), Relationship (n=12; subthemes Compassion, n=10, and Humanity, n=2) and Awareness (n=12), each theme illustrated with participant quotes.]

Figure 6 illustrates a thematic map of participants’ responses to the question ‘What made you choose that question as particularly unhelpful?’. Four major themes of relationship

(subthemes: loss of faith and not understanding), question content (subthemes: confusing and inappropriate), repetition, and emotion (subthemes: too intense and disengagement) were identified from the interviews regarding questions classed as unhelpful. Specifically, questions that revealed that MYLO had not really understood the participant (relationship; subtheme: not understanding; n=2) and that appeared to result in a loss of faith in MYLO more generally (relationship; subtheme: loss of faith; n=7); question content or wording that was confusing (question content; subtheme: confusing; n=7) or inappropriate (question content; subtheme: inappropriate; n=5); questions that were repetitive or required participants to repeat things they had already stated (repetition; n=10); and questions that were too emotionally intense or required thinking about something that was too emotionally difficult at that time (emotion; subtheme: too intense; n=5), which appeared related to disengagement with MYLO (emotion; subtheme: disengagement; n=3), were associated with unhelpfulness.


Figure 6. Thematic map of participants’ reasons for choosing a question as being particularly unhelpful

[Figure 6 presents the thematic map for unhelpful questions: Relationship (n=9; subthemes Loss of faith, n=7, and Not understanding, n=2), Question content (n=12; subthemes Confusing, n=7, and Inappropriate, n=5), Repetition (n=10) and Emotion (n=8; subthemes Too intense, n=5, and Disengagement, n=3), each theme illustrated with participant quotes.]

Secondary results – Engagement, design and function, and clinical outcomes

Engagement with MYLO

The frequency of conversations with MYLO across the two-week intervention period varied between participants from a minimum of 1 to a maximum of 6. The mean number of conversations per participant was 2.53 (SD 1.38). The time each participant spent using

MYLO ranged from a minimum of 9 minutes to a maximum of 129 minutes. On average, participants spent 35.7 minutes (SD 32.45) using MYLO in the 2-week period. Generally, participants found MYLO's questions acceptable, with 93% (238/255) of questions rated 'OK' by participants, 8.6% (22/255) rated as particularly helpful and 7.8% (20/255) rated as problematic.

Design and function

No participants reported any technical problems with MYLO during the intervention period.

Participants reported mixed feelings about the MYLO intervention design, functions and accessibility (see Table 7). Participants reported that MYLO was a quick, easily accessible and readily available intervention that was simple to use and non-judgemental. However, participants expressed that they would like MYLO to be available as an app they could download and log in to independently. Participants suggested several improvements to the content, such as a greater range of questions with increased individualisation (e.g. key word hits), the use of notifications to remind users to use the app, and the ability to view historical conversations.

Participants also suggested several improvements to the design including modernising the look, using a speech bubble format with automatic scrolling and including the option to create an avatar to represent themselves. Two participants indicated that they would like to see more colour across the app generally.


Table 7. Participant feedback on MYLO intervention functions and design

Access
  Positive feedback: Easy to access (n=12); Available (n=5); Quick responses (n=5); Private (n=1); No judgment/reduced stigma (n=1)
  Suggestions for improvement: Available as an app (n=7); Available through search engines (n=1); Register for app independently, i.e. generate own username and password (n=1)

Functionality
  Positive feedback: Information about question button (n=2); Question feedback buttons (n=3)
  Suggestions for improvement: Increase range of questions (n=3); Reduce question repetition (n=7); Increase individualization (n=2); Provide summary/feedback of conversation at the end (n=1); View historical conversations (n=3); Mood rating graph (n=1); Crisis contact information on main window (n=1); App notifications/reminders (n=2)

Design/interface
  Positive feedback: Simplicity (n=11); Ease of use (n=2)
  Suggestions for improvement: Automatic scrolling (n=3); Speech bubble format (n=4); Modernize (n=4); More use of color (n=2); Buttons instead of hyperlinks (n=1); Avatar/picture for MYLO and user in conversation screen (n=1)


Clinical outcomes

Differences between baseline and follow-up scores were normally distributed on all clinical outcome measures and there were no outliers. A small, non-significant increase was observed on ROC scores between baseline (72.48, SD 11.63) and follow-up (76.96, SD

12.84), t(14)=2.02, P=.063, d=0.37, indicating some improvement in capacity for reorganisation (the process through which conflict is resolved, according to PCT [31]). Paired samples t-tests indicated small but non-significant reductions in anxiety (d=0.15), depression (d=0.29) and distress (d=0.22) between baseline and follow-up (see Table 8), although, due to the study design, these changes cannot be attributed to the MYLO intervention. At follow-up, a third of the sample (29.4%, 5/15) scored above the clinical cut-off on the PHQ-9 and 41.2% (7/15) scored above the clinical cut-off on the GAD-7.

Table 8. Scores on clinical outcomes at baseline and follow-up and results of paired samples t-tests

             Baseline (n=15)   Follow-up (n=15)   Mean difference             t       df   P-value   Effect size
             Mean (SD†)        Mean (SD)          (95% Confidence interval)                          (Cohen's d)
PHQ-9        10.13 (5.37)      8.53 (5.58)        -1.6 (-3.27 to 0.07)        -2.05   14   0.059     0.29
GAD-7        9.00 (4.88)       8.27 (4.96)        -0.73 (-2.31 to 0.84)       -1.00   14   0.334     0.15
PSYCHLOPS    12.07 (3.17)      11.20 (4.60)       -0.87 (-2.53 to 0.79)       -1.12   14   0.282     0.22

†Standard Deviation


Discussion

Main findings

As far as we are aware, this is the first study to investigate theory-driven therapeutic processes, from a client perspective, for a relational agent intervention. Generally, participants found MYLO to be an accessible and acceptable intervention format that was simple to use and has the potential to provide flexible support, either as a complement to existing treatments or as a standalone intervention. Despite significant challenges in recruiting through clinicians, interest in the study increased once it was advertised within the wider community.

However, we do not have data on the proportion of people who pursued participation after seeing the advert; therefore, we cannot draw any firm conclusions about absolute levels of interest. Drop-out rates were low (2/17, 12%), especially compared with drop-out rates of up to 80% observed in previous internet-based therapy studies [54,55]. Participants reported a wide range of presenting problems, and over half of participants scored above clinical thresholds for anxiety and/or depression at baseline. No participants reported a worsening of symptoms at follow-up, and small but non-significant reductions in psychological distress were found, although this was not a key aim of this study and we are cautious about drawing any conclusions from this (see limitations).

All of the therapy process factors (control over what was happening in the conversation, the ability to talk freely, the ability to experience emotion, to see the problem in a new way, to feel understood and respected, to talk about the topic they wanted, and the extent to which the question was a good fit) were consistently rated highly for questions identified as helpful (median scores ranged from 7 to 9 out of 10) and consistently lower for questions identified as unhelpful (median scores ranged from 3.5 to 5 out of 10). However, contrary to our hypothesis, only one of the process factors, 'good fit' from the SRS, was significantly associated with the helpfulness of MYLO questions in multivariate analyses (see limitations section and Paper 3 for further discussion). However, the results of our qualitative analysis provide partial support for our hypothesis. Questions that were compassionate and human-like, and that enabled participants to talk freely, increase awareness of their problem, and gain new perspectives, were identified as helpful. This is consistent with key mechanisms of change identified in MOL and PCT [26,31]. Notably, MYLO's therapeutic approach (curious questioning) is exclusively concerned with enabling the client to develop their own understanding of the problem to gain new insights and solutions, which is vastly different from other relational agent interventions that tend to focus on psychoeducation, advice giving or teaching/learning new skills [56,57]. Our findings are also consistent with research indicating that relational agents that prevent the ability to talk freely (e.g. through limited, fixed-choice response options) and offer unsolicited advice are experienced by clients as more unhelpful [58].

Repetitive, confusing or inappropriate questions, which highlighted MYLO’s lack of understanding, were associated with a loss of faith in the MYLO intervention and were identified as unhelpful. These themes are consistent with studies of other relational agent interventions [56,57,59–61] and are perhaps not unique to intervention with MYLO.

Furthermore, questions that elicited overwhelming or intense emotions were identified as unhelpful and appeared to be associated with disengagement from MYLO. This finding is supported by research suggesting there is an optimal level (a moderate amount) of emotional arousal in therapy which is associated with better outcomes [62].


Finally, several improvements to the design and functions of MYLO were highlighted including making MYLO available as an app, modernising the look and feel and adding functions such as notifications and conversation history.

Strengths and limitations

We conducted a comprehensive, multi-method analysis that examined the process of intervention with MYLO in detail, from a client centred perspective, to gain insights into what is helpful and hindering about the current MYLO intervention. We were inclusive in our entry criteria and did not exclude people based on mental health diagnosis, severity or concurrent or previous psychological treatments. No participants reported any technical problems or problems understanding how to use MYLO. This is in contrast to the high rates of clarification and/or technical guidance (an average of six emails over an 8-week intervention) required from therapists for a recent computerised CBT intervention [63].

Significant challenges to recruitment through clinicians resulted in a small, exclusively self-selected sample recruited via study adverts across the University of Manchester and a local peer support group. Furthermore, some participants were unable to take part due to having to travel to the University for assessments. This limits the generalisability of the findings. The challenges of recruiting through a primary care mental health service perhaps reflect the rare uptake of digital interventions in Improving Access to Psychological Therapies (IAPT) services, despite a key aim of IAPT being to provide treatment to as many clients as possible [64]. Moreover, the small sample resulted in limited statistical power, which also limits the ability to draw firm conclusions about core intervention processes, especially in relation to the quantitative results. Related to this, we were unable to conduct a more robust simulation-based power calculation that would account for the hierarchical data structure, as we did not have prior estimates of important parameters (e.g. from previous studies) [65]. We did not aim to demonstrate efficacy, as a much larger sample, a longer period with the intervention, and a control condition would be required to investigate this.

Therefore, we cannot draw any conclusions about the effectiveness of the intervention from this study. Additionally, we did not collect data on specific mental health diagnoses, psychotropic medication, or current or previous psychological treatment; therefore, we cannot make inferences about for whom, and for what difficulties, MYLO is most suitable, or about any potential interactions with other treatments.

The intervention process measure was developed and tested previously in a study of face-to-face MOL therapy [35] and therefore may not be applicable in the same way to a digital intervention. There is little agreement in the literature as to how to measure the digital therapy process, and no specific measures have yet been developed [66,67]. Process factors were all highly correlated (multicollinearity), suggesting that these concepts may not be independent or distinct from one another. Multicollinearity limits the conclusions that can be drawn from the quantitative process analysis, as the parameter estimates in the multivariate model can be biased and imprecise [68]. However, the findings have important implications for future studies examining processes of therapy quantitatively; for example, the importance of ensuring the accurate measurement of distinct concepts to mitigate multicollinearity problems. Asking participants to identify two helpful and two unhelpful questions from their longest conversation may have biased our results. Firstly, the identified questions only represented the extremes in helpfulness and unhelpfulness and, secondly, the ratings for these four questions may not be representative of the complete set of questions posed by MYLO and the intervention as a whole. Furthermore, participants may have had difficulties recalling experiences after a two-week interval. Related to this, the

finding that 93% of questions posed by MYLO during the intervention were rated as 'OK', with 8.6% of questions rated as particularly helpful, may indicate that MYLO's questions generally require improvement to increase their helpfulness. However, as we only assessed the extremes in helpfulness in our analysis, further research will be required to ascertain how 'OK' questions can be improved.

Clinical implications

Importantly, the intervention appeared acceptable to participants with a wide variety of presenting problems of varying severity, which has the potential to significantly extend the applicability and reach of the intervention compared with disorder-specific interventions.

Despite their different presenting problems, participants identified similar processes as either helpful or hindering, providing support for the transdiagnostic model of psychological disorders [29] and, more specifically, for the importance of the transdiagnostic processes of talking freely, gaining higher-level awareness, and developing new perspectives as outlined in PCT

[31]. This supports research indicating that a vital ingredient in helpful therapy is the ability to freely explore what is on your mind [35,69,70]. Moreover, due to the artificially intelligent nature of relational agents, each participant experienced the intervention differently depending upon which questions were posed by MYLO. Despite this, participants’ views on why questions were particularly helpful or unhelpful appeared to converge and provided insight into what clients found fundamentally important for a helpful intervention and recommendations on how to improve MYLO going forward. Our findings support the call to reconsider constraints on how therapy is delivered and importantly to consider core mechanisms of action over highly specified and manualised treatment protocols [71,72].


Finally, all participants were eventually recruited from the community. This suggests that there is a proportion of people who are not accessing services but are actively seeking psychological support. This is supported by research indicating a significant mental health treatment gap in the UK [1,73]. Digital interventions such as MYLO may be one way to meet this unmet need and, crucially, to vastly improve accessibility by avoiding the need for multiple steps including diagnosis, referral from a GP and acceptance into a mental health service.


References

1. McManus S, Bebbington P, Jenkins R, Brugha T (eds). Mental Health and Wellbeing in England: Adult Psychiatric Morbidity Survey 2014. Leeds: NHS Digital; 2016. 2. Kessler RC, Aguilar-Gaxiola S, Alonso J, Chatterji S, Lee S, Ormel J, Üstün TB, Wang PS. The global burden of mental disorders: An update from the WHO World Mental Health (WMH) surveys. Epidemiol Psichiatr Soc [Internet] Cambridge University Press; 2009 Mar 11 [cited 2017 Aug 23];18(1):23–33. [doi: 10.1017/S1121189X00001421] 3. Rethink Mental Illness. Right treatment, right time. 2018. 4. Bennion MR, Hardy G, Moore RK, Millings A. E-therapies in England for stress, anxiety or depression: what is being used in the NHS? A survey of mental health services. BMJ Open [Internet] 2017;7(1):e014844. PMID:28115336 5. NHS. The NHS long term plan [Internet]. 2019. PMID:30617185 6. NHS England. Five Year Forward View for Mental Health : One Year on. 2017;(February):31pp. Available from: https://www.england.nhs.uk/wp- content/uploads/2017/03/fyfv-mh-one-year-on.pdf 7. Gilbody S, Brabyn S, Lovell K, Kessler D, Devlin T, Smith L, Araya R, Barkham M, Bower P, Cooper C, Knowles S, Littlewood E, Richards DA, Tallon D, White D, Worthy G. Telephone-supported computerised cognitive-behavioural therapy: REEACT-2 large- scale pragmatic randomised controlled trial. Br J Psychiatry 2017;210(5):362–367. [doi: 10.1192/bjp.bp.116.192435] 8. Duarte A, Walker S, Littlewood E, Brabyn S, Hewitt C, Gilbody S, Palmer S. Cost- effectiveness of computerized cognitive-behavioural therapy for the treatment of depression in primary care: findings from the Randomised Evaluation of the Effectiveness and Acceptability of Computerised Therapy (REEACT) trial. Psychol Med 2017;47(10):1825–1835. [doi: 10.1017/S0033291717000289] 9. Knowles SE, Toms G, Sanders C, Bee P, Lovell K, Rennick-Egglestone S, Coyle D, Kennedy CM, Littlewood E, Kessler D, Gilbody S, Bower P. Qualitative meta-synthesis of user experience of computerised therapy for depression and anxiety. PLoS One 2014;9(1). [doi: 10.1371/journal.pone.0084323] 10. Marzano L, Bardill A, Fields B, Herd K, Veale D, Grey N, Moran P. The application of mHealth to mental health: Opportunities and challenges. The Lancet Psychiatry Elsevier Ltd; 2015;2(10):942–948. PMID:26462228 11. Bohannon J. The synthetic therapist. Science. 349(6245):250–251. [doi: 10.1126/science.349.6245.250] 12. Bickmore T, Gruber A. Relational agents in clinical psychiatry. Harv Rev Psychiatry United States; 2010;18(2):119–130. [doi: https://dx.doi.org/10.3109/10673221003707538] 13. Hoermann S, McCabe KL, Milne DN, Calvo RA. Application of synchronous text-based


dialogue systems in mental health interventions: Systematic review. J Med Internet Res 2017 Aug 7;19(8):267. PMID:28784594 14. Scholten MR, Kelders SM, Van Gemert-Pijnen JE. Self-guided Web-based interventions: Scoping review on user needs and the potential of embodied conversational agents to address them. J Med Internet Res. 2017. p. e383. PMID:29146567 15. Provoost S, Lau HM, Ruwaard J, Riper H. Embodied conversational agents in clinical psychology: A scoping review. J Med Internet Res 2017;19(5):1–23. PMID:28487267 16. Luxton DD. An Introduction to Artificial Intelligence in Behavioral and Mental Health Care. In: Luxton D, editor. Artifical Intell Behav Ment Heal care 2016. p. 1–26. [doi: 10.1016/B978-0-12-420248-1.00001-5] 17. National Institute for Health and Care Excellence. Evidence Standards Framework for Digital Health Technologies. 2018;(December):1–29. 18. Torous J, Roberts LW. Needed innovation in digital health and smartphone applications for mental health transparency and trust. JAMA Psychiatry 2017;74(5):437–438. PMID:28384700 19. Hofmann SG, Hayes SC. The Future of Intervention Science: Process-Based Therapy. Clin Psychol Sci 2019;7(1):37–50. [doi: 10.1177/2167702618772296] 20. Torous J, Levin ME, Ahern DK, Oser ML. Cognitive Behavioral Mobile Applications: Clinical Studies, Marketplace Overview, and Research Agenda. Cogn Behav Pract 2017;24(2):215–225. [doi: 10.1016/j.cbpra.2016.05.007] 21. Hollis C, Sampson S, Simons L, Davies EB, Churchill R, Betton V, Butler D, Chapman K, Easton K, Gronlund TA, Kabir T, Rawsthorne M, Rye E, Tomlin A. Identifying research priorities for digital technology in mental health care: results of the James Lind Alliance Priority Setting Partnership. The Lancet Psychiatry 2018; PMID:30170964 22. Timulak L, McElvaney R. Qualitative meta-analysis of insight events in psychotherapy. Couns Psychol Q Taylor & Francis; 2013;26(2):131–150. [doi: http://dx.doi.org/10.1080/09515070.2013.792997] 23. Timulak L. Significant events in psychotherapy: An update of research findings. Psychol Psychother Theory, Res Pract 2010;83(4):421–447. [doi: 10.1348/147608310X499404] 24. Swift JK, Tompkins KA, Parkin SR. Understanding the client’s perspective of helpful and hindering events in psychotherapy sessions: A micro-process approach. J Clin Psychol 2017;73(11):1543–1555. PMID:29044600 25. Kazdin AE. Mediators and Mechanisms of Change in Psychotherapy Research. Annu Rev Clin Psychol 2007; PMID:17716046 26. Carey T. The Method of Levels: How to Do Psychotherapy Without Getting in the Way. Living Control Systems Publishing; 2006. ISBN:0974015547 27. Lamers F, Van Oppen P, Comijs HC, Smit JH, Spinhoven P, Van Balkom AJLM, Nolen


WA, Zitman FG, Beekman ATF, Penninx BWJH. Comorbidity patterns of anxiety and depressive disorders in a large cohort study: The Netherlands Study of Depression and Anxiety (NESDA). J Clin Psychiatry 2011;72(3):342–348. PMID:21294994 28. Carey TA, Mullan RJ. Evaluating the method of levels. Couns Psychol Q 2008;21(October 2014):247–256. [doi: 10.1080/09515070802396012] 29. Barlow DH, Farchione TJ, Bullis JR, Gallagher MW, Murray-Latin H, Sauer-Zavala S, Bentley KH, Thompson-Hollands J, Conklin LR, Boswell JF, Ametaj A, Carl JR, Boettcher HT, Cassiello-Robbins C. The unified protocol for transdiagnostic treatment of Emotional Disorders compared with diagnosis-specific protocols for anxiety disorders: A randomized clinical trial. JAMA Psychiatry [Internet] US Dept of Health Education & Welfare, Rockville, MD; 2017 Aug 2 [cited 2017 Aug 8];74(9):875–884. [doi: 10.1001/jamapsychiatry.2017.2164] 30. Newby JM, Mewton L, Andrews G. Transdiagnostic versus disorder-specific internet- delivered cognitive behaviour therapy for anxiety and depression in primary care. J Anxiety Disord [Internet] Elsevier Ltd; 2017;46:25–34. [doi: 10.1016/j.janxdis.2016.06.002] 31. Powers W. Behavior: The control of perception. Chicago: Aldine publishing co.; 1973. 32. Gaffney H, Mansell W, Edwards R, Wright J. Manage Your Life Online (MYLO): A pilot trial of a conversational computer-based intervention for problem solving in a student sample. Behav Cogn Psychother 2014;42(6):731–746. [doi: http://dx.doi.org/10.1017/S135246581300060X] 33. Bird T, Mansell W, Wright J, Gaffney H, Tai S. Manage Your Life Online: A Web-Based Randomized Controlled Trial Evaluating the Effectiveness of a Problem-Solving Intervention in a Student Sample. Behav Cogn Psychother 2018;1–13. PMID:29366432 34. Horvath AO, Del Re AC, Flückiger C, Symonds D. Alliance in Individual Psychotherapy. Psychotherapy 2011;48(1):9–16. PMID:21401269 35. Cocklin A, Tai S, Mansell W. Client Perceptions of Helpfulness in Therapy: A Novel Video-Rating Methodology for Examining Process Variables at Breif Intervals During a Single Session. Behav Cogn Psychother 2017;45:647–660. 36. Duncan BL, Miller SD, Sparks JA, Claud DA, Beach P, Reynolds LR, Johnson LD. The Session Rating Scale : Preliminary Psychometric Properties of a “Working” Alliance Measure. J Br Ther 2003;3(1):3–12. 37. Higginson S, Mansell W. What is the mechanism of psychological change? A qualitative analysis of six individuals who experienced personal change and recovery. Psychol Psychother Theory, Res Pract [Internet] 2008;81(3):309–328. PMID:18588749 38. Higginson S. Reorganisation of conflict. Masters Degree Dissertation. University of Manchester, Manchester, UK; 2007. 39. Morris L. Examination of the effectiveness and acceptability of a transdiagnostic group for clients with common mental health problems. Doctoral Thesis. University of


Manchester, Manchester, UK; 2016. 40. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med 2001;16(9):606–613. PMID:11556941 41. Spitzer RL, Kroenke K, Williams JBW, Lo B. A Brief Measure for Assessing Generalized Anxiety Disorder. JAMA Intern Med 2006;166(10):1092–1097. [doi: 10.1001/archinte.166.10.1092.ABSTRACT] 42. Shepherd M, Matthews V, Ashworth M, Christey J, Wright K, Godfrey E, Robinson S, Parmentier H. A client-generated psychometric instrument: The development of ‘PSYCHLOPS.’ Couns Psychother Res 2006; [doi: 10.1080/14733140412331383913] 43. Ashworth M, Robinson S, Godfrey E, Shepherd M, Evans C, Parmentier H, Tylee A. Measuring mental health outcomes in primary care: The psychometric properties of a new patient-generated outcome measure, “PSYCHLOPS” ('psychological outcome profiles’). Prim Care Ment Heal 2005; PMID:2006340700 44. StataCorp. Stata Statistical Software: Release 15. College Station, TX: StataCorp LLC; 2017. 45. Faul F, Erdfelder E, Lang E-G, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 2007;2:175–191. [doi: 10.1088/1755-1315/148/1/012022] 46. Seltman HJ. Chapter 15 Mixed Models. Exp Des Anal 2015. p. 357–378. 47. Mooney CZ, Duval RD. Bootstrapping : a nonparametric approach to statistical inference. Sage Publications; 1993. ISBN:9780803953819 48. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol 2006; PMID:223135521 49. Elo S, Kyngäs H. The qualitative content analysis process. J Adv Nurs 2008;62(1):107– 115. PMID:18352969 50. Vaismoradi M, Turunen H, Bondas T. Content analysis and thematic analysis: Implications for conducting a qualitative descriptive study. Nurs Heal Sci 2013;15(3):398–405. PMID:23480423 51. Swift JK, Callahan JL, Cooper M, Parkin SR. The impact of accommodating client preference in psychotherapy: A meta-analysis. J Clin Psychol 2018;74(11):1924–1937. [doi: 10.1002/jclp.22680] 52. Leys C, Ley C, Klein O, Bernard P, Licata L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J Exp Soc Psychol [Internet] Elsevier Inc.; 2013;49(4):764–766. [doi: 10.1016/j.jesp.2013.03.013] 53. Altman D. Practical statistics for medical research. London: Chapman and Hall; 1991. ISBN:0412276305 54. Melville KM, Casey LM, Kavanagh DJ. Dropout from internet-based treatment for psychological disorders. Br J Clin Psychol 2010;49(4):455–471. PMID:19799804


55. Gilbody S, Littlewood E, Hewitt C, Brierley G, Tharmanathan P, Araya R, Barkham M, Bower P, Cooper C, Gask L, Kessler D, Lester H, Lovell K, Parry G, Richards DA, Andersen P, Brabyn S, Knowles S, Shepherd C, Tallon D, White D. Computerised cognitive behaviour therapy (cCBT) as treatment for depression in primary care (REEACT trial): Large scale pragmatic randomised controlled trial. BMJ 2015;351:1–13. PMID:26559241 56. Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment Heal 2017;4(2):e19. PMID:28588005 57. Fulmer R, Joerin A, Gentile B, Lakerink L, Rauws M. Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety; A Randomized Controlled Trial. 2018;5. PMID:30545815 58. Pinto MD, Greenblatt AM, Hickman RL, Rice HM, Thomas TL, Clochesy JM. Assessing the Critical Parameters of eSMART-MH: A Promising Avatar-Based Digital Therapeutic Intervention to Reduce Depressive Symptoms. Perspect Psychiatr Care 2016;52(3):157–168. PMID:25800698 59. Ring L, Shi L, Totzke K, Bickmore T. Social support agents for older adults: longitudinal affective computing in the home. J multimodal user interfaces 2015 Mar;9(1, SI):79– 88. [doi: 10.1007/s12193-014-0157-0] 60. Ly KH, Ly AM, Andersson G. A fully automated conversational agent for promoting mental well-being: A pilot RCT using mixed methods. Internet Interv 2017;10:39–46. [doi: http://dx.doi.org/10.1016/j.invent.2017.10.002] 61. Inkster B, Sarda S, Subramanian V. An Empathy-Driven, Conversational Artificial Intelligence Agent (Wysa) for Digital Mental Well-Being: Real-World Data Evaluation Mixed-Methods Study. JMIR mHealth uHealth 2018;6(11):e12106. PMID:30470676 62. Carryer JR, Greenberg LS. Optimal Levels of Emotional Arousal in Experiential Therapy of Depression. J Consult Clin Psychol 2010;78(2):190–199. PMID:20350030 63. Hadjistavropoulos HD, Titov N, Dear BF, Pugh NE, Soucy JN. What are Clients Asking Their Therapist During Therapist-Assisted Internet-Delivered Cognitive Behaviour Therapy? A Content Analysis of Client Questions. Behav Cogn Psychother 2019;1–14. [doi: 10.1017/s1352465818000668] 64. Scott MJ. Improving Access to Psychological Therapies (IAPT) - The Need for Radical Reform. J Health Psychol 2018;23(9):1136–1147. [doi: 10.1177/1359105318755264] 65. Snijders TAB, Bosker RJ. Standard Errors and Sample Sizes for Two-Level Research. J Educ Stat 2008;18(3):237–259. [doi: 10.3102/10769986018003237] 66. Wisniewski H, Henson P, Keshavan M, Hollis C, Torous J. Digital mental health apps and the therapeutic alliance: initial review. BJPsych Open 2019;5(1):1–5. [doi: 10.1192/bjo.2018.86] 67. Schnur JB, Montgomery GH, Miller SJ, Sucala M, Brackman EH, Constantino MJ. The


Therapeutic Relationship in E-Therapy for Mental Health: A Systematic Review. J Med Internet Res 2012;14(4):e110. [doi: 10.2196/jmir.2084] 68. Shieh YY, Fouladi RT. The effect of multicollinearity on multilevel modeling parameter estimates and standard errors. Educ Psychol Meas 2003;63(6):951–985. [doi: 10.1177/0013164403258402] 69. Carey TA, Kelly RE, Mansell W, Tai SJ. What’s therapeutic about the therapeutic relationship? A hypothesis for practice informed by Perceptual Control Theory. Cogn Behav Ther 2012;5(2–3):47–59. [doi: http://dx.doi.org/10.1017/S1754470X12000037] 70. Griffiths R, Mansell W, Carey TA, Edge D, Emsley R, Tai SJ. Method of levels therapy for first-episode psychosis: rationale, design and baseline data for the feasibility randomised controlled Next Level study. BJPsych Open 2018;4(5):339–345. [doi: 10.1192/bjo.2018.44] 71. Chorpita BF. Commentary: Metaknowledge is power: envisioning models to address unmet mental health needs: reflections on Kazdin (2019). J Child Psychol Psychiatry [Internet] 2019;60(4):473–476. [doi: 10.1111/jcpp.13034] 72. Kazdin AE. Annual Research Review: Expanding mental health services through novel models of intervention delivery. J Child Psychol Psychiatry 2019;4:455–472. [doi: 10.1111/jcpp.12937] 73. Campion J, Knapp M. The economic case for improved coverage of public mental health interventions. The Lancet Psychiatry 2018;5(2):103–105. [doi: 10.1016/S2215- 0366(17)30433-]


Paper 3: Critical appraisal

Word Count: 5,311 (excluding references)


Overview

This paper outlines the researcher’s reflections and critical appraisals of the process of conducting and writing up the research for the thesis. Specifically, this paper will discuss the rationale for key methodological decisions, strengths and limitations of choices, challenges that were experienced in conducting the research and the learning that has occurred throughout the process. Finally, the paper will outline the researcher’s reflections on the overall contributions of the two papers, their impact for clinical practice and directions for future research.

Paper 1 – Relational agents in the treatment of mental health problems: A mixed methods systematic review

Rationale for review topic

Despite improvements in access to psychological therapies in the UK (e.g. through the

Improving Access to Psychological Therapies (IAPT) programme), demand for psychological interventions still far outstrips supply. The researcher worked previously on a pilot trial of the MYLO intervention in 2011 with a student sample (published in 2014; [1]) and had an existing understanding of, and interest in, relational agent interventions and their potential to provide accessible and timely intervention. Interest in the development of computer-based interventions, specifically relational agent interventions for mental health and wellbeing, has increased significantly over the past 10 years alongside significant advances in technology.

Initial searches conducted to scope out a potential systematic review topic in this area highlighted several relational agent interventions for mental health treatment. Previous systematic reviews identified were narrow in scope (e.g. only looked at a subset of relational

105 agent interventions) or were outdated and there appeared to be little to no consensus on whether relational agent interventions were acceptable to clients or efficacious in the treatment of mental health problems. Thus, a systematic review in this area appeared timely and important. We aimed to conduct a comprehensive review of the currently available relational agent interventions in the treatment of mental health problems. This would inform our understanding of the current state of the art, identify any gaps in knowledge and inform recommendations for future research studies.

Literature search

In developing the search strategy, it became apparent there was also limited consensus on the terminology used to describe or index research on relational agent interventions. This perhaps reflects the diversity of the field and cross-cutting nature of the search across computer science and mental health. Notably, papers identified as central to the research question were not found with initial searches and the expertise of a university librarian was drawn upon to facilitate the development of a comprehensive search strategy. The librarian also recommended the inclusion of a more general database (Web of Science) to ensure comprehensiveness alongside more specific subject databases (e.g. PsycINFO). The final search strategy was broad (three search categories: relational agent; mental health; intervention), reduced the likelihood of missing relevant papers and reflected the comprehensive nature of the research question. However, this meant that the search identified a vast number of papers (22,722) and the initial screening of titles and abstracts therefore took much longer to complete than anticipated.


We aimed to be as inclusive as possible (no restrictions on relational agent modality, study design, outcome measures or mental health domain); however, it was important that the relational agents included were independent, autonomous and emulated natural conversation, as we believe it is this type of intervention that has the potential to vastly increase reach over therapist-supported interventions. Due to the broad inclusion criteria and the sometimes limited descriptive information about the interventions in papers, several full-texts were discussed with supervisors to agree eligibility. This shared decision-making process is a strength of the review as it is likely to have improved the reliability of decisions made about included papers. Due to time constraints and the large number of papers identified in the initial search, it was agreed that an inter-rater reliability assessment would not be undertaken at the title and abstract stage. However, a 10% sample of studies identified for full-text screening was screened by an independent reviewer, and substantial inter-rater agreement was indicated, suggesting that the criteria were being systematically applied.

Due to the fast-paced nature of this field, there was a constant concern throughout the review process that the review would become outdated or be duplicated by another researcher. Prospective registration on PROSPERO was completed and Google Scholar alerts were set up to mitigate this risk as much as possible. Furthermore, the original search conducted in September 2018 was updated at the end of January 2019, and two new papers were identified and included, which demonstrated the pace of research in this area. A further eligible paper was identified in March 2019 [2]; however, the systematic review had already been written up and was ready for submission to a journal, therefore upon discussion with supervisors it was agreed this paper would not be incorporated.


Quality appraisal & data extraction

Due to the varied methodologies used in the included papers, the 16-item validated Quality Assessment Tool for Studies with Diverse Designs (QATSDD; [3]) was used to assess risk of bias. There were limited options for simultaneous quality appraisal of qualitative, quantitative and mixed methods papers, and the tool chosen is comprehensive and provides both an item-level and overall assessment of the strengths and limitations of the included studies across designs. There are some limitations to this tool (see [4] for a critique) that are important to consider in the context of our review. Specifically, the QATSDD penalises studies for small sample sizes as the item 'representative sample of target group or a reasonable size' merges two important factors (representative sample and sample size) together without mention of statistical power or consideration of the differences between quantitative and qualitative samples.

This meant that papers with small sample sizes, even if representative of the target group, received lower ratings of quality overall. Quality appraisal was not used to exclude studies with methodological limitations as this would have meant a significant loss of data.

However, the quality appraisal results were used to critically reflect on the strengths and weaknesses of the included studies and the strength of the body of research as a whole.

The researcher was aware that the inclusion of the researcher’s own paper [1] in the systematic review had the potential to bias quality appraisal and synthesis. It was important to take the paper at ‘face value’ and not incorporate knowledge or appraisals not explicit in the text. Ideally, the quality appraisal would have been independently conducted by a second reviewer to ensure reliability but due to limited resources this was not undertaken.

We tested our pre-specified data extraction form prior to extracting data from the included studies. However, due to the diversity of study designs, interventions and outcomes used, data extraction became a much more iterative process, reflecting the complexities across studies. This meant revisiting papers numerous times to ensure all data were extracted consistently across studies.

Synthesis

Extracted data were narratively synthesised in line with guidance [5]. Narrative synthesis was chosen as it provides a way of systematically combining the findings of studies with diverse outcomes and methodologies into a coherent 'story' or narrative. Narrative synthesis has been criticised as being prone to bias compared to quantitative methods of synthesis such as meta-analysis; however, by following clear guidance on conducting the synthesis and being transparent in the reporting of our methods we endeavoured to mitigate this risk as far as possible. The process of synthesis was overwhelming at times due to the heterogeneity of included studies and the iterative process of identifying patterns and relationships between studies. As a result, the researcher gained skills in synthesising diverse literature.

Evaluation of findings

All included studies were published in the previous six years, with almost 40% published in 2018, which is indicative of real growth and continued interest in relational agents in the treatment of mental health problems. The findings of the systematic review indicated promising potential for relational agent interventions to treat mild to moderate psychological distress and indicated that people are willing and able to engage in this type of intervention. However, further research is required to assess, and crucially, compare which formats are most helpful or acceptable. The researcher was struck by the eclecticism of the intervention content, which appeared to take a piecemeal approach, drawing from a variety of therapeutic approaches with little to no explicitly defined theoretical underpinning. This potentially obscures the specificity of the core processes by which the intervention works to reduce psychological distress and may unnecessarily complicate interventions. The researcher recognises that some relational agent interventions developed for the treatment of mental health problems were excluded from the systematic review as they did not report on a mental health outcome. For example, the Sophie CBT system for depression [6], a chatbot to increase emotional resilience (Kokobot; [7]), a Context Respectful Counselling Agent (CRCA) for IT workers [8–12], a virtual mindfulness coach 'Chris' [13], a therapy system for post-traumatic stress disorder (3MR_2; [14]) and an exam stress counsellor [15]. This may have biased the findings towards symptom-based changes. Overall, the findings of Paper 1 provided a comprehensive overview of the available relational agents in mental health and, importantly, set the scene for the researcher's study of a relational agent intervention, MYLO, in Paper 2.

Paper 2 – Agents of change: A multi-method study to optimise the helpfulness of an artificial relational agent for treating mental health problems

Rationale for the research topic and study design

The primary aim of Paper 2 was to examine theory-driven hypotheses regarding the process of intervention with the relational agent intervention MYLO. Insight into what people find helpful and hindering about intervention with MYLO would help to optimise the utility and acceptability of the intervention. The therapeutic process was identified as a research priority for relational agent interventions in Paper 1. This also supports the wider drive to evidence mechanisms over manualisation and therapeutic processes over procedures [16–19] in the context of a broader paradigm shift in clinical psychology research and practice [18].

The study design took inspiration from a previous MOL therapy process study supervised by the researcher's supervisors [20]. This study demonstrated that a repeated measures design, with a fine-grained quantitative analysis of what was helpful and hindering in therapy, was acceptable to clients, feasible and provided valid insights into the process of face-to-face MOL therapy. The researcher adapted this quantitative method used in face-to-face therapy for digital intervention with MYLO. The researcher also incorporated qualitative interview data as it was thought this would support the quantitative analysis and provide a rich narrative on why therapeutic questions were identified as helpful or hindering and how they facilitated or blocked psychological change from a client perspective. Previous studies of relational agent interventions demonstrated that two weeks was enough time for participants to become familiar with the intervention and therefore provide feedback on various aspects of the intervention. Therefore, participants in our study were offered a two-week window in which they could use MYLO at least once and, beyond that, were given control over the frequency and duration of sessions.

Patient and Public Involvement (PPI)

Although the methods employed in the empirical study were informed by a previous study [20], consultation with experts by experience from the University of Manchester Community Liaison Group (CLG) was crucial for the development of the measures and interview schedules, and was especially valuable in considering issues around client safety when using an unsupported digital intervention. The CLG consists of experts by experience, who were very positive about the MYLO intervention and its potential to increase access to psychological intervention, which the researcher found encouraging. However, it was emphasised that the researcher should ensure participants understood that MYLO was a computer simulation (as opposed to a person), as they noted that the conversation style was very 'human-like' and had the potential to confuse or distress service users. As a result, the researcher ensured that participants understood the program was a computerised system through direct demonstration of the program, and participants were reminded that MYLO may not respond as a human would. The researcher also ensured participants were provided with a list of contacts to be used in a crisis or for further support.

Information governance and IT set-up

A significant challenge experienced in the set-up and conduct of this research was the information governance process required to collect, store and transfer the personal data of participants safely through MYLO, and the wider IT set-up of MYLO at the University of Manchester. The researcher consulted with the information governance team at the University of Manchester throughout the planning, set-up and conduct of the study. However, there were several occasions where it began to seem unlikely that the university would be able to host MYLO online in the time-frame required for the study, and this would have had a major impact on the study design and procedures. This was made especially challenging by the GDPR regulations that came into effect in May 2018, which required further approvals just as the study was being set up. The researcher's perseverance with the IT and information governance teams meant that this was eventually approved and MYLO was hosted on a virtual machine based at the University. The researcher learned a great deal about information governance and, in particular, online data security and compliance with UK regulations throughout this process.

Ethical considerations

Ethical approval for the study was granted through a local NHS Research Ethics Committee. Although the researcher had previous experience of applications to NHS Research Ethics Committees, the process was lengthy, and significant delays were experienced in the processing of the application through the University.

Several key ethical issues were considered during the planning and set-up and were proactively managed throughout the study. Client safety and risk management were particularly important given that MYLO was to be offered as a stand-alone intervention without therapist support. MYLO did not have any risk identification methods embedded within it or any risk management interventions. This was discussed at the NHS Research Ethics Committee panel with the researcher and main supervisor. It was decided that the researcher would actively manage risk with participants throughout their participation through the face-to-face data collection meetings, regular screening of their conversation transcript record on MYLO (at least weekly), a weekly telephone call and provision of a printed list of contacts to be used for further support. These procedures were in line with e-therapy risk management protocols at the primary care mental health service we aimed to recruit through. Furthermore, it was agreed that participants would be able to freely access any other psychological support or treatments during the study and would therefore not have to choose between MYLO and another treatment offered, e.g. by the primary care mental health service. It was advantageous that the researcher was completing a clinical psychology training course, as the identification and management of distress and risk could be carried out appropriately under the supervision of supervisors.

The secure collection, storage and transportation of participant data was also an important consideration. Due to the unique IT processes involved (MYLO conversations stored on a virtual machine on a remote server and downloadable by the researcher remotely), the appropriate management of this required lengthy discussions and sustained liaison with the University of Manchester information governance and IT teams. The researcher learned how to use new software such as MobaXterm to access the remote server where MYLO conversations were stored and learned how to edit XML files. Unique usernames and passwords were created for participants to log in to MYLO securely. Conversation files were pseudonymised automatically by the MYLO username. Only the researcher had access to the anonymisation key to match data to participants.
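To make this data-handling arrangement concrete, the sketch below illustrates, in Python, the kind of pseudonymisation workflow described above: transcripts are identified only by MYLO username, and the key linking usernames to study IDs is held in a separate file accessible only to the researcher. All file names, paths and the XML structure are hypothetical placeholders; this is not the code used by MYLO or by the study.

```python
# Illustrative sketch only: pseudonymised handling of MYLO conversation
# transcripts. File names, paths and XML tags are hypothetical.
import csv
from pathlib import Path
import xml.etree.ElementTree as ET

TRANSCRIPT_DIR = Path("mylo_transcripts")  # XML files downloaded from the virtual machine
KEY_FILE = Path("anonymisation_key.csv")   # held separately, accessible to the researcher only


def load_key(path: Path) -> dict:
    """Read the existing username -> study ID key, if any."""
    if not path.exists():
        return {}
    with path.open(newline="") as f:
        return {row["username"]: row["study_id"] for row in csv.DictReader(f)}


def save_key(path: Path, key: dict) -> None:
    """Write the key back to its separate, access-restricted location."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["username", "study_id"])
        writer.writeheader()
        writer.writerows({"username": u, "study_id": s} for u, s in key.items())


def count_turns(transcript: Path) -> int:
    """Work with transcript content without touching identities: count <turn> elements."""
    return len(ET.parse(transcript).getroot().findall(".//turn"))


key = load_key(KEY_FILE)
for transcript in sorted(TRANSCRIPT_DIR.glob("*.xml")):
    username = transcript.stem                          # files are named by MYLO username
    study_id = key.setdefault(username, f"P{len(key) + 1:02d}")
    print(study_id, count_turns(transcript))            # analyses refer to study IDs only
save_key(KEY_FILE, key)
```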

Recruitment

Initially, we aimed to recruit exclusively through a charitable organisation commissioned by the NHS to provide a primary care mental health service. The service offered e-therapies as part of its psychology provision and was interested in being involved with the study from the outset. The service received approximately 10,000 referrals per year, and 3,000 of these people opted to access an e-therapy. The researcher and their main supervisor met with the service managers to discuss the project and liaised with them throughout the planning and ethical approval stages of the research. It was agreed that clinicians conducting initial assessments would invite eligible people to take part, with the contact details of interested individuals passed to the researcher. We aimed to recruit 25 people to the study (based on an a priori power analysis). Given the high number of referrals, the researcher and the team manager were hopeful that this would be a relatively quick and straightforward recruitment process. The researcher provided the team with Participant Information Sheets (PIS), eligibility checklists and the study flyers and posters. Staff members were briefed on how to invite individuals and the recruitment period began.

Between 1st October 2018 and 14th December 2018, only four interested individuals were identified from the primary care mental health service (none were eventually recruited; see Paper 2 for a flow diagram). Throughout this time, the researcher continually liaised with the service manager and staff members to discuss any barriers to recruitment. The barriers identified were: many individuals being stepped up (and therefore not eligible for inclusion); clinicians feeling that they were burdening clients after a complex assessment; and clinicians not having the time to ask at the end of an assessment due to time constraints and competing duties. The researcher attempted to reduce the burden on clinicians as far as possible by enabling them to ask clients at the end of treatment, or simply by providing a study flyer to all clients at step two, so the client could contact the researcher directly if they were interested. This did not appear to improve recruitment rates. The researcher and supervisors were surprised by the difficulties experienced recruiting. It appeared from the staff feedback that eligible clients were not being invited to take part for various reasons, including time constraints, competing priorities and clinicians aiming to 'protect' clients from perceived burden. However, denying clients the option of an alternative or additional treatment, despite good intentions, appeared to the researcher to be in direct violation of client-centred care and represented a more paternalistic style of service. This is supported by research suggesting that assessments in IAPT services can be highly protocolised and can therefore be experienced by clients as rigid and service-led, rather than client-led [21,22].

Furthermore, it is perhaps the case that services and/or clinicians were more willing to offer traditional psychological interventions such as individual therapies, despite other formats (e.g. groups, guided self-help and e-therapies) being recommended for mild to moderate common mental health problems [23]. This may be indicative of low confidence in the value of e-therapies. However, in a climate of overstretched clinicians and services and high levels of unmet need for mental health difficulties, diversification of the treatment offer becomes ever more important [16]. Empowering people to make informed choices about different treatment options is an important endeavour if we are to maximise engagement and outcomes. Further research into the attitudes of clinicians and services towards other modalities of psychological intervention, and in particular autonomous options, might prove insightful.

Due to the researcher's time constraints, at the end of October 2018 it was agreed that we would broaden recruitment to try to achieve the target of 25 participants by January 2019. Ethical approvals were gained through the university and NHS to advertise the study more widely through an electronic advertisement to a local peer support group, the University of Manchester research participants website, and posters at the University counselling service and student areas. Interest was high (38 initial contacts made in a 4-week period); around half of participants were from a local peer support group and half were university staff or students. This is also supported by the findings of Paper 1, which found that studies that used more flexible recruitment methods (e.g. app store and online adverts) recruited greater numbers of participants [24,25] compared to studies that recruited through clinicians [26]. The difficulties experienced recruiting through services and the relative ease of recruiting through the community indicate that there is a proportion of people in the community who may or may not be accessing services but are seeking some form of psychological support.

Sample characteristics

The researcher recruited 17 participants to the study in the time available for recruitment. The sample was predominantly female (65%), had a wide age range (22 to 67 years old) and was entirely self-selected. Participants reported varying levels of psychological distress related to a problem they were currently experiencing. The sampling methods may have introduced some bias: around half of participants were members of a local peer support group which has an online presence, and participants and staff recruited through the University are likely to be highly educated and computer literate, which is unlikely to be representative of the wider population. These selection biases should be considered in future studies.

Around half of participants scored above clinical thresholds on measures of anxiety (GAD-7) and depression (PHQ-9), and there was wide variation in severity across the sample. Two interested individuals were excluded from taking part in the study due to high levels of risk and psychotic symptoms. Upon reflection, it appears that there is at least a subset of people keen to engage with novel digital therapeutic modalities regardless of the type of presenting problem or severity of difficulty. This raises important questions around how we can best maximise access, client choice and control whilst minimising risks and harms in unsupported digital interventions.

It was decided that we would not collect data on diagnosis or medication, or any information on psychological treatment received concurrently or previously, as these data did not form part of our primary hypotheses. However, as a result, we did not gather any insight into whether MYLO interacted with other treatments, whether perceptions of helpfulness were related to prior experiences of therapies, or who MYLO would be most suited for and under what circumstances. This would be particularly important for future studies that assess and compare the efficacy of MYLO with other treatments. Furthermore, the study was not designed to evaluate changes in symptoms, as a much larger sample, a control group and greater time with the intervention would be required to assess this.

Intervention process measures

Gathering client perspectives on what is helpful and hindering in therapy is fundamental to the identification of processes that facilitate or lessen psychological change and supports effective optimisation of interventions [27]. However, there was little consensus in the literature on how best to measure the therapy process in digital interventions, and no specific measures have been developed [28,29]. The intervention process measure used in this study was therefore modified from a previous process study of face-to-face Method of Levels (MOL) therapy [20] and measured key processes of psychological change identified in a specific unified theory of human functioning called Perceptual Control Theory (PCT). PCT provides a comprehensive understanding of how psychological distress arises (goal conflict), is maintained (lack of awareness) and is resolved (reorganisation). The model is dynamic and situated in the present moment. It was thought that it would be too burdensome for participants to rate every question MYLO posed in a single session of therapy, and although the researcher aimed to demonstrate theory-driven processes (thus perhaps more questions would be ideal), a key aim was to utilise the findings to optimise the clinical usefulness of the MYLO program for future studies. Therefore, it was agreed that participants would identify two questions that they experienced as helpful and two questions experienced as unhelpful. To increase the depth and richness of the process analysis, participants were also interviewed about why they had chosen each question. It was thought that the multi-method approach would enable detailed analysis of the core processes that facilitate psychological change and the core processes that block or prevent change, and would provide clear direction for optimisation.

The process measures and procedures were only piloted with staff members at a primary care mental health service and not with clients. Therefore, we did not have insight into whether the study procedures were overly burdensome for participants. However, no participants reported any adverse reactions to the study procedures or fatigue related to repeated assessment of processes. Participants appeared to understand how to complete the process measures and did not indicate any confusion or lack of understanding of the concepts. As the qualitative interviews were semi-structured, this relied on the researcher asking follow-up questions to facilitate discussion thus perhaps introducing a researcher bias to the findings.


Analytical approach

The analysis approach used in this study was multi-method and comprehensive. However, several limitations are apparent and are discussed here and in Paper 2. Based on an a priori power analysis, 25 participants were required to detect a large association between the process variables and helpfulness score with α = .05 and the recommended minimum level of power of 0.8 [30]. Unfortunately, due to difficulties encountered during recruitment this sample size was not achieved, which has limited the conclusions that can be drawn from the quantitative analyses.
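For illustration, the required sample size for a large effect can be approximated in a few lines of code. The sketch below uses the Fisher z approximation for a simple bivariate correlation and Cohen's conventional 'large' value of r = .5; it is not the exact calculation reported in the thesis (which also had to contend with the nested structure of the data), and the numbers it produces (roughly 24 to 30, depending on whether a one- or two-tailed test is assumed) simply bracket the target of 25 quoted above.

```python
# Minimal sketch: approximate N needed to detect a correlation of size r
# with given alpha and power, via the Fisher z transformation. This is an
# illustration only, not the power calculation used in the study.
from math import atanh, ceil
from scipy.stats import norm

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80,
                      two_tailed: bool = True) -> int:
    z_alpha = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.5))                     # two-tailed: about 30 participants
print(n_for_correlation(0.5, two_tailed=False))   # one-tailed: about 24 participants
```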

Mixed effects models are used when data have a hierarchical structure [31]. In this study, data were nested within people, meaning that data points from the same person were more likely to be similar to each other; therefore, the assumption of independent observations was not met. However, in conducting the mixed effects model analyses, it became apparent that the process variables were highly correlated with one another, meaning that the beta coefficients may be biased and imprecise. Multicollinearity may be due to the process items not measuring distinct concepts (overlap), or it might be that they are all measuring an overarching or latent factor. The researcher was supervised by a statistician for the quantitative analysis, which proved extremely helpful when interpreting results, and the researcher was careful not to draw any concrete conclusions from this analysis considering the limitations. In future quantitative analyses of therapy process, it would be imperative to measure concepts that are distinct from one another and to reduce the number of concepts to those of most interest theoretically.
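To make the analytic issue concrete, the sketch below shows how a two-level model of this kind (ratings nested within participants, with a random intercept per participant) can be fitted in Python, and how variance inflation factors can be used to flag the multicollinearity described above. The variable and file names (helpfulness, process1 to process3, participant_id, process_ratings.csv) are hypothetical placeholders, not those used in the study.

```python
# Minimal sketch: two-level mixed effects model (question ratings nested
# within participants) plus a variance inflation factor (VIF) check for
# multicollinearity among the process predictors.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

data = pd.read_csv("process_ratings.csv")  # hypothetical file: one row per rated question

# A random intercept for each participant accounts for the non-independence
# of ratings contributed by the same person.
model = smf.mixedlm(
    "helpfulness ~ process1 + process2 + process3",
    data=data,
    groups=data["participant_id"],
).fit()
print(model.summary())

# VIFs well above roughly 5-10 suggest the predictors overlap substantially,
# making individual coefficients unstable and hard to interpret.
predictors = add_constant(data[["process1", "process2", "process3"]])
vifs = pd.Series(
    [variance_inflation_factor(predictors.values, i) for i in range(predictors.shape[1])],
    index=predictors.columns,
)
print(vifs.drop("const"))
```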


In the planning stages of this study, the researcher was initially uncertain of the added value of the qualitative analysis. However, the qualitative analysis provided fruitful insights into why questions were perceived as helpful or unhelpful by participants, and the researcher found this aspect of the research particularly engaging. Despite there being value in a hypothesis-driven approach to analysis, the researcher was aware that their existing knowledge of PCT and MOL, and the theory-driven approach, had the potential to bias the findings of the thematic analysis. The risk of bias could have been reduced by the independent verification of qualitative themes by another researcher. However, due to time constraints this was not possible.

Evaluation of findings

The study found that compassionate and human-like questions which helped participants to talk freely about their problem, increase insight into their problem and develop new perspectives on their problem were perceived as helpful. Helpfulness was also associated with questions perceived to be a 'good fit' for the participant. In contrast, questions which were repetitive, confusing or inappropriate and which emphasised MYLO's limited understanding were perceived as unhelpful and resulted in a loss of faith in the intervention. Furthermore, questions that elicited emotions that were perceived as too intense were also associated with unhelpfulness and appeared to result in disengagement from MYLO. The findings appeared consistent across presenting problems. As stated in the discussion section of Paper 2, the findings provide partial support for the mechanisms of psychological change identified in PCT and facilitated in MOL therapy [32–34]: specifically, being able to talk freely without filtering what comes to mind, and seeing the problem in a new way. Experiencing emotion connected to a problem appeared to be helpful to a certain extent but became unhelpful if this emotion was perceived as too intense. Interestingly, the overarching process of control identified in PCT as key to psychological change was not identified as related to helpfulness. However, it might be hypothesised that being 'in control' is what enables a person to talk freely, experience emotions that are manageable, develop awareness and gain new insights. Moreover, unhelpful processes such as incidences where MYLO lacked understanding or asked questions that 'delved too deep' for comfort could be hypothesised as an experience of a loss of control. Future research could investigate this further, perhaps by using methods such as mediation analysis.

Contribution to theory, research and practice

Taken together, the research conducted to form this thesis is timely and representative of a sea change in considering how we can effectively and acceptably diversify treatment options in the context of great need and limited resource. The researcher has written up the thesis alongside a placement in public health, which has served to intensify the researcher's interest in and commitment to improving mental health provision on a wider scale. One way to achieve this, as considered in Paper 2, is by looking more closely at the processes of effective therapy with less focus on specific techniques or procedures. The findings provide some support for core processes outlined in PCT, a robust and unified framework of human behaviour. The findings also provide clear recommendations for the optimisation of the MYLO program to maximise its efficiency and acceptability. Future studies utilising MYLO will seek to demonstrate efficacy and equivalence to other forms of treatment.

Personally, the researcher believes that digital mental health interventions could have a valuable place within the mental health intervention offer, as the findings appear to demonstrate acceptability to clients. As a population, we are using more and more technology to support and enhance our lives. The nature of exactly what, how and for whom remains an exciting endeavour and priority for research; however, the commitment to diversifying the way we access healthcare through a 'digital first' approach has already arrived and forms a key focus of the NHS Long Term Plan [35]. As a scientist-practitioner, the researcher looks forward to proceeding with this vision in their clinical and research work.

In addition to diversifying access through digital interventions, the researcher believes that a continued and greater focus on demonstrating mechanisms in psychological treatments will be an important undertaking. In the context of the current debate on the replication crisis in psychology research [36], new ways of demonstrating valid and meaningful outcomes are required. The researcher's clinical observations of significant overlap between diagnostic categories and high rates of 'comorbidity' suggest that the evidence base focussing on disorder-specific and manualised treatments may not always be fit for purpose. An increased focus on how people solve problems and reduce distress (both within and beyond the confines of 'treatment') might provide helpful insights and could move the science of psychology forward into a new era with greater parsimony between research and clinical practice. Approaches such as PCT, which provides an understanding of human behaviour as dynamic and purposeful, may offer one way in which to achieve this.


Dissemination

The research will be disseminated through various routes. Paper 1 is under review at the open-access journal JMIR (Journal of Medical Internet Research) Mental Health. JMIR have agreed to waive the Article Processing Fee if it is accepted for publication, as neither the researcher nor the supervisory team has access to any research funds (see Appendix I). Paper 2 will be submitted to a peer-reviewed journal as soon as practicable. Paper 2 will be presented at a Post-Graduate Research Annual Conference at the University of Manchester on Friday 7th June 2019. The researcher also intends to submit Paper 2 to a national digital mental healthcare conference such as the MindTech National Symposium.

Concluding remarks

In writing this critical evaluation, the researcher has considered and evaluated several methodological and ideological issues pertinent to the research findings and highlighted several directions for future research. Despite the identified limitations, the research conducted in this doctoral thesis has resulted in a valuable and timely contribution to the digital therapies literature.


References

1. Gaffney H, Mansell W, Edwards R, Wright J. Manage Your Life Online (MYLO): A pilot trial of a conversational computer-based intervention for problem solving in a student sample. Behav Cogn Psychother 2014;42(6):731–746. [doi: 10.1017/S135246581300060X]
2. Kamita T, Ito T, Matsumoto A, Munakata T, Inoue T. A Chatbot System for Mental Healthcare Based on SAT Counseling Method. Mob Inf Syst 2019;2019:1–11. [doi: 10.1155/2019/9517321]
3. Sirriyeh R, Lawton R, Gardner P, Armitage G. Reviewing studies with diverse designs: The development and evaluation of a new tool. J Eval Clin Pract 2012;18(4):746–752. PMID:21410846
4. Fenton L, Lauckner H, Gilbert R. The QATSDD critical appraisal tool: Comments and critiques. J Eval Clin Pract 2015;21(6):1125–1128. [doi: 10.1111/jep.12487]
5. Popay J, Roberts H, Sowden A, Petticrew M, Aria L, Rodgers Ma, Britten N, Roen K, Duffy S. Guidance on the Conduct of Narrative Synthesis in Systematic Reviews: A Product from the ESRC Methods Programme. ESRC Methods Program. 2006.
6. Mitchell S, Krizman K, Bhaloo J, Cawley M, Welch M, Bickmore T, Ring L, Alvarez C. Treating Comorbid Depression During Care Transitions Using Relational Agents. Boston; 2016.
7. Morris RR, Kouddous K, Kshirsagar R, Schueller SM. Towards an Artificially Empathic Conversational Agent for Mental Health Applications: System Design and User Perceptions. J Med Internet Res 2018;20(6):e10148. PMID:29945856
8. Shinozaki T, Yamamoto Y, Tsuruta S, Damiani E. An Emotional Word Focused Counseling Agent and Its Evaluation. 2014 IEEE Int Conf Syst MAN Cybern 2014. p. 2025–2031.
9. Yamamoto Y, Shinozaki T, Ikegami Y, Tsuruta S. Context respectful counseling agent virtualized on the web. World Wide Web 2016;19:111–134. [doi: 10.1007/s11280-015-0326-4]
10. Shinozaki T, Yamamoto Y, Takada K, Tsuruta S. Context-based Reflection Support Counseling Agent. In: Yetongnon K, Chbeir R, Dipanda A, Gallo L, editors. Eighth Int Conf Signal Image Technol Internet Based Syst 2012. p. 619–628. PMID:17583280
11. Shinozaki T, Yamamoto Y, Tsuruta S, Knauf R. Validation of Context Respectful Counseling Agent. IEEE Int Conf Syst Man, Cybern 2015. p. 993–998. [doi: 10.1109/SMC.2015.180]
12. Shinozaki T, Yamamoto Y, Tsuruta S. Context-based counselor agent for software development ecosystem. Computing 2015 Jan;97:3–28. [doi: 10.1007/s00607-013-0352-y]
13. Hudlicka E. Virtual training and coaching of health behavior: Example from mindfulness meditation training. Patient Educ Couns 2013 Aug;92(2):160–166. [doi: 10.1016/j.pec.2013.05.007]
14. Tielman ML, Neerincx MA, Bidarra R, Kybartas B, Brinkman W-P. A Therapy System for Post-Traumatic Stress Disorder Using a Virtual Agent and Virtual Storytelling to Reconstruct Traumatic Memories. J Med Syst 2017;41(8):125. [doi: 10.1007/s10916-017-0771-y]
15. Kavakli M, Li M, Rudra T. Towards the development of a virtual counselor to tackle students' exam stress. J Integr Des Process Sci 2012;16(1):5–26. [doi: 10.3233/jid-2012-0004]
16. Kazdin AE. Annual Research Review: Expanding mental health services through novel models of intervention delivery. J Child Psychol Psychiatry 2019;4:455–472. [doi: 10.1111/jcpp.12937]
17. Davison GC. A Return to Functional Analysis, the Search for Mechanisms of Change, and the Nomothetic-Idiographic Issue in Psychosocial Interventions. Clin Psychol Sci 2019;7(1):51–53. [doi: 10.1177/2167702618794924]
18. Hofmann SG, Hayes SC. The Future of Intervention Science: Process-Based Therapy. Clin Psychol Sci 2019;7(1):37–50. [doi: 10.1177/2167702618772296]
19. Kazdin AE. Mediators and Mechanisms of Change in Psychotherapy Research. Annu Rev Clin Psychol 2007; PMID:17716046
20. Cocklin AA, Mansell W, Emsley R, McEvoy P, Preston C, Comiskey J, Tai S. Client Perceptions of Helpfulness in Therapy: a Novel Video-Rating Methodology for Examining Process Variables at Brief Intervals During a Single Session. Behav Cogn Psychother 2017;45(06):647–660. [doi: 10.1017/s1352465817000273]
21. Marshall D, Quinn C, Child S, Shenton D, Pooler J, Forber S, Byng R. What IAPT services can learn from those who do not attend. J Ment Heal 2016;25(5):410–415. [doi: 10.3109/09638237.2015.1101057]
22. Winter D. Improving Access or Denying Choice? Ment Heal Learn Disabil Res Pract 2007;4:73–82.
23. National Collaborating Centre for Mental Health. Common mental health disorders: The NICE guideline on identification and pathways to care. The British Psychological Society and The Royal College of Psychiatrists; 2011.
24. Inkster B, Sarda S, Subramanian V. An Empathy-Driven, Conversational Artificial Intelligence Agent (Wysa) for Digital Mental Well-Being: Real-World Data Evaluation Mixed-Methods Study. JMIR mHealth uHealth 2018;6(11):e12106. PMID:30470676
25. Suganuma S, Sakamoto D, Shimoyama H. An Embodied Conversational Agent for Unguided Internet-Based Cognitive Behavior Therapy in Preventative Mental Health: Feasibility and Acceptability Pilot Trial. JMIR Ment Heal 2018;5(3):e10454. [doi: 10.2196/10454]
26. Burton C, Szentagotai Tatar A, McKinstry B, Matheson C, Matu S, Moldovan R, Macnab M, Farrow E, David D, Pagliari C, Serrano Blanco A, Wolters M. Pilot randomised controlled trial of Help4Mood, an embodied virtual agent-based system to support treatment of depression. J Telemed Telecare 2016;22(6):348–355. PMID:26453910
27. Castonguay LG, Boswell JF, Zack SE, Baker S, Boutselis MA, Chiswick NR, Damer DD, Hemmelstein NA, Jackson JS, Morford M, Ragusea SA, Roper JG, Spayd C, Weiszer T, Borkovec TD, Grosse Holtforth M. Helpful and hindering events in psychotherapy: A practice research network study. Psychotherapy 2010;47(3):327–344. [doi: 10.1037/a0021164]
28. Wisniewski H, Henson P, Keshavan M, Hollis C, Torous J. Digital mental health apps and the therapeutic alliance: initial review. BJPsych Open 2019;5(1):1–5. [doi: 10.1192/bjo.2018.86]
29. Schnur JB, Montgomery GH, Miller SJ, Sucala M, Brackman EH, Constantino MJ. The Therapeutic Relationship in E-Therapy for Mental Health: A Systematic Review. J Med Internet Res 2012;14(4):e110. [doi: 10.2196/jmir.2084]
30. Cohen J. Statistical power analysis for the behavioural sciences. 2nd ed. United States of America: Lawrence Erlbaum Associates; 1988. ISBN:0805802835
31. Seltman HJ. Chapter 15: Mixed Models. Exp Des Anal 2015. p. 357–378.
32. Powers W. Behavior: The control of perception. Chicago: Aldine Publishing Co.; 1973.
33. Carey TA, Kelly RE, Mansell W, Tai SJ. What's therapeutic about the therapeutic relationship? A hypothesis for practice informed by Perceptual Control Theory. Cogn Behav Ther 2012;5(2–3):47–59. [doi: 10.1017/S1754470X12000037]
34. Carey T. The Method of Levels: How to Do Psychotherapy Without Getting in the Way. Living Control Systems Publishing; 2006. ISBN:0974015547
35. NHS. The NHS long term plan [Internet]. 2019. PMID:30617185
36. Mansell W, Huddy V. The assessment and modeling of perceptual control: A transformation in research methodology to address the replication crisis. Rev Gen Psychol 2018;22(3):305–320. [doi: 10.1037/gpr0000147]


Appendices

Appendix A. Journal of Medical Internet Research - guidelines for authors

Instructions for Authors of JMIR

Here are some quick links to how your manuscript should look at the time of submission. Components are detailed in the expected format of your manuscript. Please also refer to Instructions for Authors of JMIR for more information related to submissions, and Guide to JMIR Online Interface (PPT to come) for help with using our online system.

Original Paper Enter information for authors (including designations, affiliations, correspondence, contributions) in the online metadata form. Do not use periods after initials, and include degree designations and affiliations for all authors. Trial registration numbers are also filled in on the metadata forms online. Title of Your Manuscript Should Describe the Intervention: Study Design Abstract Background: Objective: Methods: Results: Be sure to include relevant statistics here, such as sample sizes, response rates, P values or Confidence Intervals. Be specific (by stating the value) rather than general (eg, “there were differences between the groups”). Conclusions: Trial Registration: In accordance with ICMJE recommendations, RCTs must have been registered in a WHO accredited trial registry. Please mention the ClinicalTrials.gov registration identifier, the International Standard Randomized Controlled Trial Number (ISRCTN), or a comparable trial identifier at the end of the abstract ("Trial Registration: ClinicalTrials.gov NCT123456"), as well as when you first mention the trial in the manuscript. When mentioning related trials (e.g. in the Introduction or Methods section) the trial registration number should also be added in brackets. ICMJE member journals require, as a condition of consideration for publication, registration in a public trials registry at or before the onset of patient enrollment. This policy applies to any trial which started enrollment after July 1, 2005. JMIR authors must add an explanation to the methods section of their manuscript if a RCT meeting these criteria has not been registered. The JMIR editor reserves the right to reject any paper without trial registration without any further consideration or peer-review.


Keywords: Provide 3 to 10 author-selected keywords or short phrases separated with semicolons (;) that will assist indexers in cross-indexing the article and that may be published with the abstract.

Introduction This section can include background information such as theories, prior work, and hypotheses.

If this section is quite lengthy, use of subheadings (use Word Heading 3) are encouraged to break up the material logically, e.g. Background, Prior Work, Goal of This Study etc. Subheadings should be consistent; therefore a subheading for the first part of the Methods section, for example, is also necessary (see below).

Generally, a typical paper contains between 3000 and 6000 words, but there are no rigorous restrictions. Papers should be written in accordance with the American Medical Association Manual of Style: A Guide for Authors and Editors. 9th ed. Baltimore, Md: Williams & Wilkins; 1998.

Please do not include URLs within the manuscript. A reference should be created for the URL and included in the reference list. Please use WebCite to capture the website as soon as possible, as they often expire after the intervention and become inaccessible.

Methods Recruitment Notice that the first subheading immediately follows the last heading. Subheadings under subheadings are also possible (see Statistical Analysis). Statistical Analysis Power Notice that the next Heading Style (Heading Style 4 in this case) is used. Click on the different headings to see their Heading Style in the “Home” ribbon under “Styles”. Always have at least 2 of the same subheading level in a section.

Data Exclusion Try to avoid having only one sentence after a subheading. For example, describe the key findings of a Table that you refer to in that sentence. Results User Statistics


These are only examples of possible headings. Please feel free to use different headings to best describe your results.

Evaluation Outcomes Please make reference to your Textboxes (Textbox 1), Tables (Table 1), Figures (Figure 1), and Multimedia Appendices (Multimedia Appendix 1) in parenthesis. Please see the examples below for how they should be formatted. Please note the punctuation used in all components, including the caption/title, footnotes etc.

Figures and Multimedia Appendices are uploaded online, while Textboxes and Tables are not uploaded and remain in the body of the manuscript, appearing in the order they are mentioned after the first mention of each Table.

Textbox 1. The caption/title is placed here in a sentence format (capitalization of every word is unnecessary). 1. The formatting is actually a 1x1 Table, not an actual “textbox”. 2. Textboxes have no footnotes. 3. Bullet points or numbered lists are allowed in textboxes.

Table 1. The table caption/title is placed here in a sentence format (capitalization of every word is unnecessary).a-e

Main heading 1 | Main heading 1 | Main heading 1 | Main heading 2 | Main heading 2 | Main heading 2
Subheading
(leave blank) | data | data | data
(leave blank) | data | data | data
Subheading
(leave blank) | data | data | data
(leave blank) | data | data | data
Subheading
(leave blank) | data | data | data
(leave blank) | data | data | data

a Not all elements are necessary for every table, simply omit the irrelevant sections for your table and keep the formatting of the rest. For further details, please refer to the main Instructions for Authors of JMIR document.
b Footnotes are labeled in superscript lower case a-z. Other symbols are not used.
c Asterisks (*) can only be used if exact P values cannot be provided for a specific reason, and are listed after the superscript a-z footnotes.
d Please be conscious of the overall width of the table. Tables will be automatically fitted/resized to the width of a US Letter Small page in portrait configuration during typesetting. Overcrowded Tables or Tables that are too crowded WILL look squished, and should be avoided if possible.
e Longer headings can be abridged within the Table, with a full explanation in a footnote.

Figure 1. Captions/titles are inserted online. Try to use Times New Roman for text within the Figure to match the font of the final typeset manuscript when possible. These should be .jpeg or .png files. Please prepare Figures with good resolution – Figures that are predominantly graphics/pictures should have dpi close to 300, while those that are text- dominant can have lower resolution (usually dpi 200). Try to use combinations of color and symbols/line styles to define and refer to different categories. This will help with readability if Figures are printed/viewed in black and white. Discussion Principal Results Limitations Comparison with Prior Work Conclusions

Acknowledgements Please include all authors’ contributions, funding information, financial disclosure, role of sponsors, and other acknowledgements here. This description should include the involvement, if any, in review and approval of the manuscript for publication and the role of sponsors. Omit if not applicable.

Conflicts of Interest Disclose any personal financial interests related to the subject matters discussed in the manuscript here. For example, authors who are owners or employees of Internet companies that market the services described in the manuscript will be disclosed here. If none, indicate with “none declared”. Abbreviations JMIR: Journal of Medical Internet Research RCT: randomized controlled trial


Multimedia Appendix 1 Multimedia appendices are supplementary files, such as a PowerPoint presentation of a conference talk about the study, additional screenshots of a website, mpeg/Quicktime video/audio files, Excel/Access/SAS/SPSS files containing original data (very long tables), and questionnaires. See https://jmir.zendesk.com/hc/en-us/articles/115003396688 for further information. Do not include copyrighted material unless you obtained written permission from the copyright holder, which should be uploaded together with your Publication Agreement form as supplementary file.

The Multimedia Appendices must be uploaded online, accompanied by a caption. CONSORT-EHEALTH checklists are always uploaded as Multimedia Appendices. Although this is primarily intended for randomized trials, the section of the checklist describing how an intervention should be reported is also relevant for manuscripts with other evaluation designs. Before submission, authors of RCTs must fill in the electronic CONSORT-EHEALTH questionnaire at http://tinyurl.com/consort-ehealth-v1-6 with quotes from their manuscript (if you wish to comment on the importance of the items from the checklist for reporting, please also rate each item on a scale between 1-5). BEFORE you press submit, please generate a pdf of the form with your responses and upload this file as supplementary file entitled CONSORT-EHEALTH V1.6. References 1. Number references using 1., 2., 3. etc (no square brackets) corresponding to the square bracketed references (eg, [1], [2,3], [4-7]) in the body of the manuscript. 2. DO NOT use italics, periods after authors’ initials, and periods after journal abbreviations. 3. DO use a semicolon (;) after a journal title before the year, put volume number in parenthesis, and use a colon (:) before the page numbers. 4. Titles should be in sentence case (do NOT capitalize the first letter of every word). 5. Do not use the footnotes tool to generate the reference list. 6. Cite only published or accepted (“in print”) works. Submitted papers (not accepted) documents not widely available (personal emails, letters), or oral communications (unless they are published abstracts) should NOT be cited as references. Cite these in the main body of text as “personal communication by NAME, DATE” after obtaining permission from the communicator to quote his communication. 7. Remove OLE elements from reference management softwares such as Endnote and Reference Manager. Select the entire document (Ctrl+A or Command A), remove field codes (Ctrl+Shift+F9 or Command+6). This is important for correct parsing of your reference list using RefCheck during copyediting. This is an automatic process, but please check for

132 completeness and accuracy of parsed fields for each reference when prompted during copyediting steps after acceptance of your manuscript. 8. Journal Articles (examples following): append the PubMed Identifier (PMID, eg, "PMID:1234567", where 1234567 is the PubMed identifier) or DOI (digital object identifier, eg, doi:10.1136/bmj.331.7529.1391) after each reference. Alternatively (as per our old instructions) you could append a [Medline] link after each reference, linking to the PubMed abstract of the article you are citing. You may check whether a DOI is correct using the DOI resolver at http://dx.doi.org/. 9. International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals. JAMA 1997;277:927-934. PMID:9062335 10. International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals. JAMA 1997;277:927-934. [Medline] 11. Websites and Web articles (URLs) (example following) should be cited as "webcited®" references in the reference section at the end of the manuscript - do not include links to websites in the text. To webcite® a web reference means to take a snapshot of the cited document and to cite the archived copy (WebCite link) in addition to the original URL. JMIR now requires that authors use the WebCite ® technology (www.webcitation.org) to archive cited web references first before they cite them. Do not cite uncached "live" webpages and websites in the article or reference section, unless archiving with WebCite has failed. Provide both the original URL and the WebCite link. Note that journal articles in electronic formats are journal articles, not a web reference. 12. Fox S, Fallows D. 2003. Internet Health Resources. http://www.pewinternet.org/pdfs/PIP_Health_Report_July_2003.pdf. Archived at: http://www.webcitation.org/5I2STSU61 13. For books, please add the ISBN, if known (no blanks). (http://isbndb.com/; examples below) 14. Iverson CL, Flanagin A, Fontanarosa PB, et al. American Medical Association Manual of Style: A Guide for Authors and Editors. 9th edition. Baltimore, Md: Williams & Wilkins; 1998. ISBN:0195176332 15. Phillips SJ, Whisnant JP. Hypertension and stroke. In: Laragh JH, Brenner BM, editors. Hypertension: pathophysiology, diagnosis, and management. 2nd ed. New York: Raven Press; 1995. p. 465-78. 16. Conference Proceedings (example below). If conference proceedings are available through Medline, please use the Medline citation. 17. Kimura J, Shibasaki H, editors. Recent advances in clinical neurophysiology. Proceedings of the 10th International Congress of EMG and Clinical Neurophysiology; 1995 Oct 15-19; Kyoto, Japan. Amsterdam: Elsevier; 1996.


Appendix B. Systematic review search strategy

MEDLINE, PsycINFO, EMBASE

The search combined three concepts with AND (relational agent AND mental health AND intervention); the terms within each concept were combined with OR.

Concept 1 (relational agent): Relational agent*; Chatbot*; Chat bot; Social bot; Virtual avatar; Virtual assistant; Virtual coach; Virtual conversation*; Interactive agent*; Automat* conversation*; Chat agent; conversation* agent; Conversation system*; Dialog* system; Conversational; Social robot; Artificial intelligence (MeSH); Robotics (MeSH); Virtual reality (MeSH); Computer simulation (MeSH); Empath*

Concept 2 (mental health): Mental health (MeSH); Mental health recovery (MeSH); Mental disorders (MeSH); Anxiety disorders (MeSH); Anxiety (MeSH); Depression (MeSH); Stress, psychological (MeSH); Stress disorders, traumatic (MeSH); Stress disorders, post-traumatic (MeSH); Phobic disorders (MeSH); Phobia, social (MeSH); Fear (MeSH); Obsessive-compulsive disorder (MeSH); Depressive disorder (MeSH); Personality disorders (MeSH); Bipolar disorder (MeSH); Psychiatry (MeSH); Psychological trauma (MeSH); Stress*; Problem*

Concept 3 (intervention): Intervention*; Therap*; Treatment*; Therapy, computer-assisted (MeSH); Psychotherapy (MeSH); Counseling (MeSH); Adaptation, psychological (MeSH); Problem solving (MeSH); Behavior therapy (MeSH); Cognitive therapy (MeSH)


Appendix C. Author guidelines for Acta Psychiatrica Scandinavica

Sections

1. Submission
2. Aims and Scope
3. Manuscript Categories and Requirements
4. Preparing a Submission
5. Editorial Policies and Ethical Considerations
6. Author Licensing
7. Publication Process After Acceptance
8. Post Publication
9. Editorial Office Contact Details

1. SUBMISSION

By submitting a manuscript to or reviewing for this publication, your name, email address, and affiliation, and other contact details the publication might require, will be used for the regular operations of the publication, including, when necessary, sharing with the publisher (Wiley) and partners for production and publication. The publication and the publisher recognize the importance of protecting the personal information collected from users in the operation of these services, and have practices in place to ensure that steps are taken to maintain the security, integrity, and privacy of the personal data collected and processed. You can learn more at https://authorservices-wiley-com.manchester.idm.oclc.org/statements/data-protection- policy.html. Authors should kindly note that submission implies that the content has not been published or submitted for publication elsewhere except as a brief abstract in the proceedings of a scientific meeting or symposium.

Once the submission materials have been prepared in accordance with the Author Guidelines, manuscripts should be submitted online at https://mc.manuscriptcentral.com/actapsych

The submission system will prompt authors to use an ORCID iD (a unique author identifier) to help distinguish their work from that of other researchers. Click here to find out more.

Click here for more details on how to use ScholarOne.

For help with submissions, please contact: [email protected]

2. AIMS AND SCOPE

Acta Psychiatrica Scandinavica acts as an international forum for the dissemination of information advancing the science and practice of psychiatry. In particular we focus on communicating frontline research to clinical psychiatrists and psychiatric researchers. Acta Psychiatrica Scandinavica has traditionally been and remains a journal predominantly on clinical psychiatry, but translational psychiatry is a topic of growing importance to our readers. Therefore, the journal welcomes submission of manuscripts based on both clinical- and more translational (e.g. preclinical and epidemiological) research. When preparing manuscripts based on translational studies for submission to Acta Psychiatrica Scandinavica, the authors should place emphasis on the clinical significance of the research question and the findings. Manuscripts based solely on preclinical research (e.g. animal models) are normally not considered for publication in the Journal.


3. MANUSCRIPT CATEGORIES AND REQUIREMENTS

(For general formatting guidelines please see point 4).

i. Original Articles
Acta Psychiatrica Scandinavica welcomes submission of manuscripts based on original research, especially those that bring about new knowledge of the aetiology and/or treatment of mental disorders.
Special requirements for Original Articles:
Significant Outcomes: Provide up to three Significant Outcomes encapsulating the 'take-home messages' of the manuscript. The Significant Outcomes are to be presented succinctly (ideally only 1 sentence and max 2 sentences each), in tabulated form and should derive from the conclusions of the manuscript, without merely restating the conclusion, raising new issues, posing further questions or being dogmatic.
Limitations: Provide up to three noteworthy Limitations. The Limitations should inform the reader about potential weaknesses, for instance in relation to study design, sample size and internal/external validity. The Limitations are to be presented succinctly (ideally only 1 sentence and max 2 sentences each) in tabulated form. In the manuscript, the Significant Outcomes and the Limitations must be placed immediately below the Abstract/Keywords.

ii. Systematic Reviews / Meta-analyses
Acta Psychiatrica Scandinavica welcomes submission of systematic reviews and meta-analyses. Such submissions must follow both the general guidelines for manuscripts outlined above as well as the guidelines provided in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement: http://www.prisma-statement.org/PRISMAStatement/PRISMAStatement.aspx
Special requirements for Systematic Reviews / Meta-analyses:
Summations: Provide up to three significant Summations encapsulating the 'take-home messages' of the manuscript. The Summations should be presented succinctly (ideally only 1 sentence and max 2 sentences each), in tabulated form and should derive from the conclusions of the manuscript, without merely restating the conclusion, raising new issues, posing further questions or being dogmatic.
Limitations: Provide up to three noteworthy Limitations. The Limitations must reflect any caveats or limitations related to the review process or the meta-analysis. The Limitations are to be presented succinctly (ideally only 1 sentence and max 2 sentences each) in tabulated form. In the manuscript, the Summations and Limitations must be placed immediately below the Abstract/Keywords.

iii. From Research to Clinical Practice
Acta Psychiatrica Scandinavica welcomes submissions of manuscripts describing how and when research results can be translated to clinical psychiatric practice. Typically, the authors of such manuscripts will have contributed substantially to the development of the area of clinical research that the manuscript is addressing. For a further description of the intended content of this type of submission, please see "4. PREPARING A SUBMISSION" below.

Special requirements for From Research to Clinical Practice:
Clinical Recommendations: Provide up to three Clinical Recommendations. The Clinical Recommendations should be presented succinctly (ideally only 1 sentence and max 2 sentences each), in tabulated form and should derive from the conclusions of the manuscript, without merely restating the conclusion, raising new issues, posing further questions or being dogmatic.
Limitations: Provide up to three noteworthy Limitations. The Limitations must reflect any caveats related to the potential implementation of new clinical practices. The Limitations are to be presented succinctly (ideally only 1 sentence and max 2 sentences each) in tabulated form. In the manuscript, the Clinical Recommendations and Limitations must be placed immediately below the Abstract/Keywords.

iv. Research Letters
Acta Psychiatrica Scandinavica welcomes submissions of Research Letters, which represent an opportunity to publish (preliminary) research findings that are of interest to the field. Research Letters are "unstructured", i.e. without the subheadings used in the full-length manuscripts. The length of the letters should be approximately 750-1000 words and a maximum of 5 references can be included. The authors may include a small table or figure in the submission. Abstracts are not used for Research Letters.

v. Letters to the Editor
Acta Psychiatrica Scandinavica welcomes submissions of Letters to the Editor that either I) comment on recent publications in the Journal or II) voice original ideas, opinions, optimism or concerns with regard to the field of psychiatry. The length of the letters should be approximately 750-1000 words and a maximum of 5 references can be included. The authors may include a small table or figure in the submission. Abstracts are not used for Letters to the Editor.

4. PREPARING A SUBMISSION

Parts of the Manuscript
The manuscript should be submitted in separate files: main text file; figures; supplementary material (for online publication).

Main Text File (Original Articles, Systematic Reviews and From Research to Clinical Practice articles must follow this format)
The main text file should be presented in the following order:
i. A short informative title without abbreviations or acronyms
ii. A short running title of 50 characters or less
iii. The full names of the authors
iv. The authors' institutional affiliations (where the work was conducted)
v. Acknowledgments
vi. Abstract and keywords
vii. Significant Outcomes and Limitations (for Original Articles); Summations and Limitations (for Review Articles/Meta-Analyses); Clinical Recommendations and Limitations (for From Research to Clinical Practice articles)
viii. Main text
ix. References
x. Tables (each table complete with title and footnotes)
xi. Figure legends
xii. Appendices (if relevant)

Authorship
Please refer to the journal's authorship policy in the Editorial Policies and Ethical Considerations section for details on author listing eligibility.

Acknowledgments
Contributions from anyone who does not meet the criteria for authorship should be listed, with permission from the contributor, in an Acknowledgments section. Financial and material support should also be acknowledged.

Conflict of Interest Statement
Authors will be asked to provide a conflict of interest statement during the submission process. For details on what to include in this section, see the section 'Conflict of Interest' in the Editorial Policies and Ethical Considerations section below. Submitting authors should ensure they liaise with all co-authors to confirm agreement with the final statement.

Abstract
The Abstract should be divided into the following sections: 'Objective', 'Methods', 'Results', and 'Conclusion' (the main part of the Abstract is devoted to Results). The abstract should not exceed 200 words.

Keywords
Please provide 3-5 keywords. Keywords should be taken from those recommended by the US National Library of Medicine's Medical Subject Headings (MeSH) browser list at www.nlm.nih.gov/mesh.

Main Text (Original Articles and Systematic Reviews)
Introduction: One to two pages concluded by the subtitle Aims of the Study (3 to 5 lines without literature references and abbreviations).
Material and Methods: The authors may refer to design and methods described in previously published articles, but must include a succinct yet comprehensive description of these aspects in the new submission as well.
Results: Clear and short, avoiding double documentation in tables/figures.
Discussion: Acta Psychiatrica Scandinavica articles do not have a conclusion section. If the authors find it necessary, they may include a concluding remark of maximum 5 lines as the final part of the Discussion.

Main Text (From Research to Clinical Practice articles)
Introduction: Approximately one page that describes the clinical challenge that the manuscript focuses on (e.g. treatment-resistant depression) and why it is important to address this challenge.
State-of-the-art: A thorough description (but not a systematic review) of the most recent research development in the field (for instance studies on ketamine for the treatment of treatment-resistant depression).
From research to clinical practice: This section should describe how the research findings described in the "state-of-the-art" section can be translated to current clinical practice.
Limitations: This final section should address whether the current level of evidence is sufficient to allow for a change of clinical practice and, if not, what studies need to be conducted to allow for this change to take place.

References
All references should be numbered consecutively in order of appearance and should be as complete as possible. In-text citations should cite references in consecutive order using Arabic superscript numerals. For more information about AMA reference style, please consult the AMA Manual of Style. Sample references follow:

Journal article
1. King VM, Armstrong DM, Apps R, Trott JR. Numerical aspects of pontine, lateral reticular, and inferior olivary projections to two paravermal cortical zones of the cat cerebellum. J Comp Neurol 1998;390:537-551.

Book
2. Voet D, Voet JG. Biochemistry. New York: John Wiley & Sons; 1990. 1223 p.

Internet document
3. American Cancer Society. Cancer Facts & Figures 2003. http://www.cancer.org/downloads/STT/CAFF2003PWSecured.pdf. Accessed March 3, 2003.

Tables
Tables should be self-contained and complement, not duplicate, information contained in the text. They should be supplied as editable files, not pasted as images. Legends should be concise but comprehensive – the table, legend, and footnotes must be understandable without reference to the text. All abbreviations must be defined in footnotes. Footnote symbols: †, ‡, §, ¶, should be used (in that order) and *, **, *** should be reserved for P-values. Statistical measures such as SD or SEM should be identified in the headings.

Figure Legends
Legends should be concise but comprehensive – the figure and its legend must be understandable without reference to the text. Include definitions of any symbols used and define/explain all abbreviations and units of measurement.

Figures
The total number of figures/tables should not exceed 5. Figures are given priority over tables. Although authors are encouraged to send the highest-quality figures possible, for peer-review purposes a wide variety of formats, sizes, and resolutions are accepted. Click here for the basic figure requirements for figures submitted with manuscripts for initial peer review, as well as the more detailed post-acceptance figure requirements. Figures submitted in colour may be reproduced in colour online free of charge. Please note, however, that it is preferable that line figures (e.g. graphs and charts) are supplied in black and white so that they are legible if printed by a reader in black and white. If an author would prefer to have figures printed in colour in hard copies of the journal, a fee will be charged by the Publisher.

Additional Files
Appendices
Appendices will be published after the references. For submission they should be supplied as separate files but referred to in the text.

Supporting Information
Supporting information is information that is not essential to the article, but provides greater depth and background. It is hosted online and appears without editing or typesetting. It may include tables, figures, videos, datasets, etc. Click here for Wiley's FAQs on supporting information. Note: if data, scripts, or other resources used to generate the analyses presented in the paper are available via a publicly available data repository, authors should include a reference to the location of the material within their paper.

General Style Points
The following points provide general advice on formatting and style.
• Abbreviations: In general, terms should not be abbreviated unless they are used repeatedly and the abbreviation is helpful to the reader. Initially, use the word in full, followed by the abbreviation in parentheses. Thereafter use the abbreviation only. For abbreviations and symbols use Units, Symbols and Abbreviations for Authors and Editors in Medicine Related Sciences, Sixth Edition. Edited by D.N. Baron and M. McKenzie Clarke. ISBN: 9781853156243, Paperback, April, 2008. Abbreviations are not allowed in titles, headings and "Aims of the Study".
• Units of measurement: Measurements should be given in SI or SI-derived units. Visit the Bureau International des Poids et Mesures (BIPM) website for more information about SI units.
• Trade Names: Chemical substances should be referred to by the generic name only. Trade names should not be used. Drugs should be referred to by their generic names. If proprietary drugs have been used in the study, refer to these by their generic name, mentioning the proprietary name and the name and location of the manufacturer in parentheses.

Wiley Author Resources
Manuscript Preparation Tips: Wiley has a range of resources for authors preparing manuscripts for submission available here. In particular, authors may benefit from referring to Wiley's best practice tips on Writing for Search Engine Optimization.

Editing, Translation, and Formatting Support: Wiley Editing Services can greatly improve the chances of a manuscript being accepted. Offering expert help in English language editing, translation, manuscript formatting, and figure preparation, Wiley Editing Services ensures that the manuscript is ready for submission.

5. EDITORIAL POLICIES AND ETHICAL CONSIDERATIONS

Peer Review and Acceptance
The acceptance criteria for all papers are the quality and originality of the research and its significance to journal readership. Manuscripts are single-blind peer reviewed. Papers will only be sent to review if the Editor-in-Chief determines that the paper meets the appropriate quality and relevance requirements. Wiley's policy on the confidentiality of the review process is available here.

Human Studies and Subjects
For manuscripts reporting medical studies that involve human participants, a statement identifying the ethics committee that approved the study and confirmation that the study conforms to recognized standards is required, for example: Declaration of Helsinki; US Federal Policy for the Protection of Human Subjects; or European Medicines Agency Guidelines for Good Clinical Practice. It should also state clearly in the text that all persons gave their informed consent prior to their inclusion in the study. Patient anonymity should be preserved. Photographs need to be cropped sufficiently to prevent human subjects being recognized (or an eye bar should be used). Images and information from individual participants will only be published where the authors have obtained the individual's free prior informed consent. Authors do not need to provide a copy of the consent form to the publisher; however, in signing the author license to publish, authors are required to confirm that consent has been obtained. Wiley has a standard patient consent form available for use.

Animal Studies
A statement indicating that the protocol and procedures employed were ethically reviewed and approved, as well as the name of the body giving approval, must be included in the Methods section of the manuscript. Authors should also state whether experiments were performed in accordance with relevant institutional and national guidelines for the care and use of laboratory animals:
• US authors should cite compliance with the US National Research Council's Guide for the Care and Use of Laboratory Animals, the US Public Health Service's Policy on Humane Care and Use of Laboratory Animals, and Guide for the Care and Use of Laboratory Animals.


• UK authors should conform to UK legislation under the Animals (Scientific Procedures) Act 1986 Amendment Regulations (SI 2012/3039).
• European authors outside the UK should conform to Directive 2010/63/EU.

Clinical Trial Registration
The journal requires that clinical trials are prospectively registered in a publicly accessible database and clinical trial registration numbers should be included in all papers that report their results. Authors are asked to include the name of the trial register and the clinical trial registration number at the end of the abstract. If the trial is not registered, or was registered retrospectively, the reasons for this should be explained.

Research Reporting Guidelines
Accurate and complete reporting enables readers to fully appraise research, replicate it, and use it. Authors are expected to adhere to recognised research reporting standards. The EQUATOR Network collects more than 370 reporting guidelines for many study types, including for:
• Randomised trials: CONSORT
• Observational studies: STROBE
• Systematic reviews and meta-analyses: PRISMA
• Case reports: CARE
• Qualitative research: SRQR
• Diagnostic/prognostic studies: STARD
• Quality improvement studies: SQUIRE
• Economic evaluations: CHEERS
• Animal studies: ARRIVE
• Study protocols: SPIRIT
• Clinical practice guidelines: AGREE
We also encourage authors to refer to and follow guidelines from:
• Future of Research Communications and e-Scholarship (FORCE11)
• National Research Council's Institute for Laboratory Animal Research guidelines
• The Gold Standard Publication Checklist from Hooijmans and colleagues
• Minimum Information Guidelines from Diverse Bioscience Communities (MIBBI) website
• FAIRsharing website

Genetic Nomenclature
Sequence variants should be described in the text and tables using both DNA and protein designations whenever appropriate. Sequence variant nomenclature must follow the current HGVS guidelines; see varnomen.hgvs.org, where examples of acceptable nomenclature are provided.

Sequence Data
Nucleotide sequence data can be submitted in electronic form to any of the three major collaborative databases: DDBJ, EMBL, or GenBank. It is only necessary to submit to one database as data are exchanged between DDBJ, EMBL, and GenBank on a daily basis. The suggested wording for referring to accession-number information is: 'These sequence data have been submitted to the DDBJ/EMBL/GenBank databases under accession number U12345'. Addresses are as follows:
• DNA Data Bank of Japan (DDBJ): www.ddbj.nig.ac.jp
• EMBL Nucleotide Archive: ebi.ac.uk/ena
• GenBank: www.ncbi.nlm.nih.gov/genbank
Protein sequence data should be submitted to either of the following repositories:
• Protein Information Resource (PIR): pir.georgetown.edu
• SWISS-PROT: expasy.ch/sprot/sprot-top


Structural Data
For papers describing structural data, atomic coordinates and the associated experimental data should be deposited in the appropriate databank (see below). Please note that the data in databanks must be released, at the latest, upon publication of the article. We trust in the cooperation of our authors to ensure that atomic coordinates and experimental data are released on time.
• Organic and organometallic compounds: Crystallographic data should not be sent as Supporting Information, but should be deposited with the Cambridge Crystallographic Data Centre (CCDC) at ccdc.cam.ac.uk/services/structure%5Fdeposit.
• Inorganic compounds: Fachinformationszentrum Karlsruhe (FIZ; fiz-karlsruhe.de).
• Proteins and nucleic acids: Protein Data Bank (rcsb.org/pdb).
• NMR spectroscopy data: BioMagResBank (bmrb.wisc.edu).

Conflict of Interest
The journal requires that all authors disclose any potential sources of conflict of interest. Any interest or relationship, financial or otherwise, that might be perceived as influencing an author's objectivity is considered a potential source of conflict of interest. These must be disclosed when directly relevant or directly related to the work that the authors describe in their manuscript. Potential sources of conflict of interest include, but are not limited to: patent or stock ownership, membership of a company board of directors, membership of an advisory board or committee for a company, and consultancy for or receipt of speaker's fees from a company. The existence of a conflict of interest does not preclude publication. If the authors have no conflict of interest to declare, they must also state this at submission. It is the responsibility of the corresponding author to review this policy with all authors and collectively to disclose with the submission ALL pertinent commercial and other relationships.

Funding
Authors should list all funding sources in the Acknowledgments section. Authors are responsible for the accuracy of their funder designation. If in doubt, please check the Open Funder Registry for the correct nomenclature: https://www.crossref.org/services/funder-registry/

Authorship
The journal follows the ICMJE definition of authorship, which indicates that authorship be based on the following 4 criteria:
• Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND
• Drafting the work or revising it critically for important intellectual content; AND
• Final approval of the version to be published; AND
• Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
In addition to being accountable for the parts of the work he or she has done, an author should be able to identify which co-authors are responsible for specific other parts of the work. In addition, authors should have confidence in the integrity of the contributions of their co-authors. All those designated as authors should meet all four criteria for authorship, and all who meet the four criteria should be identified as authors. Those who do not meet all four criteria should be acknowledged. These authorship criteria are intended to reserve the status of authorship for those who deserve credit and can take responsibility for the work. The criteria are not intended for use as a means to disqualify colleagues from authorship who otherwise meet authorship criteria by denying them the opportunity to meet criterion #s 2 or 3. Therefore, all individuals who meet the first criterion should have the opportunity to participate in the review, drafting, and final approval of the manuscript. Contributions from anyone who does not meet the criteria for authorship should be listed, with permission from the contributor, in an Acknowledgments section (for example, to recognize contributions from people who provided technical help, collation of data, writing assistance, acquisition of funding, or a department chairperson who provided general support). Prior to submitting the article all authors should agree on the order in which their names will be listed in the manuscript.
Additional Authorship Options. Joint first or senior authorship: In the case of joint first authorship, a footnote should be added to the author listing, e.g. 'X and Y should be considered joint first author' or 'X and Y should be considered joint senior author.'

Data Sharing and Data Accessibility
The journal encourages authors to share the data supporting the results in the paper by archiving it in an appropriate public repository. Authors should include a data accessibility statement, including a link to the repository they have used, in order that this statement can be published alongside their paper.

Data Protection
By submitting a manuscript to or reviewing for this publication, your name, email address, and affiliation, and other contact details the publication might require, will be used for the regular operations of the publication, including, when necessary, sharing with the publisher (Wiley) and partners for production and publication. The publication and the publisher recognize the importance of protecting the personal information collected from users in the operation of these services, and have practices in place to ensure that steps are taken to maintain the security, integrity, and privacy of the personal data collected and processed. You can learn more at https://authorservices-wiley-com.manchester.idm.oclc.org/statements/data-protection-policy.html.

Human subject information in databases
The journal refers to the World Medical Association Declaration of Taipei on Ethical Considerations Regarding Health Databases and Biobanks.

ORCID
As part of the journal's commitment to supporting authors at every step of the publishing process, the journal requires the submitting author (only) to provide an ORCID iD when submitting a manuscript. This takes around 2 minutes to complete. Find more information here.

Publication Ethics
This journal is a member of the Committee on Publication Ethics (COPE). Note this journal uses iThenticate's CrossCheck software to detect instances of overlapping and similar text in submitted manuscripts. Read Wiley's Top 10 Publishing Ethics Tips for Authors here. Wiley's Publication Ethics Guidelines can be found here.

Referrals to the Open Access Journals Brain and Behavior and Clinical Case Reports
Authors of high quality papers that Acta cannot offer to publish, perhaps due to scope or space, may be offered to have their manuscript and peer review reports forwarded for consideration by the editor of either of Wiley's Open Access journals Brain and Behavior or Clinical Case Reports. The review process by the Open Access journals will continue at the point left by Acta Psychiatrica Scandinavica. Articles that are eventually accepted by Brain and Behavior or Clinical Case Reports, perhaps after revision, will typically be published within 15 days of acceptance. The Editor of the Open Access journal will accept submissions that report well-conducted research which reaches the standard acceptable for publication. Brain and Behavior and Clinical Case Reports are Wiley Open Access journals and article publication fees apply.

6. AUTHOR LICENSING


If your paper is accepted, the author identified as the formal corresponding author will receive an email prompting them to log in to Author Services, where via the Wiley Author Licensing Service (WALS) they will be required to complete a copyright license agreement on behalf of all authors of the paper. Authors may choose to publish under the terms of the journal's standard copyright agreement, or OnlineOpen under the terms of a Creative Commons License. General information regarding licensing and copyright is available here. To review the Creative Commons License options offered under OnlineOpen, please click here. (Note that certain funders mandate that a particular type of CC license has to be used; to check this please click here.)

Self-Archiving definitions and policies: Note that the journal's standard copyright agreement allows for self-archiving of different versions of the article under specific conditions. Please click here for more detailed information about self-archiving definitions and policies.

Open Access fees: If you choose to publish using OnlineOpen you will be charged a fee. A list of Article Publication Charges for Wiley journals is available here.

Funder Open Access: Please click here for more information on Wiley's compliance with specific Funder Open Access Policies.

7. PUBLICATION PROCESS AFTER ACCEPTANCE

Accepted Article Received in Production
When an accepted article is received by Wiley's production team, the corresponding author will receive an email asking them to login or register with Wiley Author Services. The author will be asked to sign a publication license at this point.

Proofs
Once the paper is typeset, the author will receive an email notification with the URL to download a PDF typeset page proof, as well as associated forms and full instructions on how to correct and return the file. Please note that the author is responsible for all statements made in their work, including changes made during the editorial process – authors should check proofs carefully. Note that proofs should be returned within 48 hours from receipt of first proof.

Publication Charges
Colour figures: Colour figures may be published online free of charge; however, the journal charges for publishing figures in colour in print. If the author supplies colour figures at Early View publication, they will be invited to complete a colour charge agreement in RightsLink for Author Services. The author will have the option of paying immediately with a credit or debit card, or they can request an invoice. If the author chooses not to purchase colour printing, the figures will be converted to black and white for the print issue of the journal. Please contact the Production Editor if you have any queries regarding this.


Appendix D. NHS Research Ethics Committee favourable opinion


Appendix E. Participant information sheet (PIS)

MYLO_PIS_Version 3_05.11.2018_generic IRAS ID: 238389

PATIENT INFORMATION SHEET

STUDY: Service-user experiences of text-based conversations with a computer about their difficulties

You are being invited to take part in a research study. Before you decide, it is important for you to understand why the research is being done and what it will involve. Please take time to read the following information carefully. Talk to others about the study if you wish.

Please ask us about anything that is not clear enough or if you would like more information.

1.1 What is the purpose of the study?

This study is being carried out in part fulfilment of a Doctorate in Clinical Psychology (ClinPsyD) degree. A new online intervention called Manage Your Life Online (MYLO) has been developed. The aim of this study is to find out what is helpful and unhelpful about the MYLO programme and why. This will help us to understand how to improve MYLO and increase our understanding of how this type of intervention might work for different people.

1.2 What is MYLO?

MYLO is a computer programme that simulates a conversation using artificial intelligence. It is accessed online using a unique username and password. Users have text conversations with MYLO by typing on their computer keyboard. MYLO responds to what you type with questions that encourage new ways of thinking about a problem. MYLO encourages users to express themselves freely and experience feelings related to a difficulty. If you agree to take part, you will be given access to MYLO for a 2-week period. As MYLO is accessed online, you can use it when, where and as often as you like during this time. Participating in the study will not affect your current or future healthcare and you are able to access any other support or services alongside taking part.

1.3 Why have I been invited?


You have been invited to take part in this study because you have indicated that you are experiencing some psychological distress.

1.4 Do I have to take part?

No – it is your decision entirely and taking part is voluntary. If you do decide to take part, you will be given this information sheet to keep and will be asked to sign a consent form which acknowledges you have fully understood what you are taking part in. If you do consent, you are still free to withdraw from the study at any time. You do not need to give a reason. Withdrawing from the study will not affect your current or future healthcare.

1.5 What will happen if I take part?

Initial meeting (~ 45 minutes): If you agree to take part, a researcher will arrange a convenient time and place to meet with you to obtain your written consent to take part. You will also be asked to complete some questionnaires at this meeting. We will check that you have access to a suitable device (e.g. computer, iPad, tablet, smartphone or local library) that is connected to the internet. You will then be given a unique username and password to enter the MYLO programme. The researcher will demonstrate how to access MYLO on your device and also provide you with printed access instructions that you can refer to if you need.

You will then have two weeks to use the MYLO programme as much as you want to.

Telephone call (~10 minutes): A researcher will telephone you a week after the initial meeting to answer any questions you have about MYLO. The researcher will also ask you about your mood and may ask you about whether you have had any thoughts to harm yourself or someone else. Finally, the researcher will arrange a suitable time and place to meet with you to conduct a follow up meeting.

Follow up meeting (~60 minutes): At this follow up meeting you will be asked to complete some questionnaires, which will take approximately 20 minutes to complete. The researcher will then ask you some general questions about your experiences of using the MYLO programme. This interview will be audio-recorded and will last approximately 15 minutes. The researcher will then provide you with a printed copy of your first conversation with the MYLO programme. You will be asked to identify two questions which you found helpful and two questions which you found unhelpful, and why. This conversation will be audio recorded and will take approximately 15 minutes. Audio recordings may be stopped or paused on request at any point during the interview. For each of the four questions identified, you will be asked to complete a short questionnaire.

Following this meeting, you will no longer be able to access the MYLO programme as this completes your participation in the research study.

1.6 What about confidentiality?

We are collecting and storing this personal information in accordance with the General Data Protection Regulation (GDPR) and Data Protection Act 2018 which legislate to protect your personal information. The legal basis upon which we are using your personal information is “public interest task” and “for research purposes” if sensitive information is collected. For more information about the way we process your personal information and comply with data protection law please see our Privacy Notice for Research Participants.

You will be given a code number so that any information we collect from you will not have your personal information on it. The only thing that will have your name on is paper copies of the consent form. Consent forms will be stored separately to other data to protect your identity. Any information will be stored in a locked filing cabinet at the School of Psychological Sciences, University of Manchester; or on secure computer storage drives accessible only to members of the research team.

What you type when using MYLO will be recorded and stored within MYLO automatically. This information will then be transferred to a secure computer storage drive that only the research team can access. The information is transferred as soon as possible after you finish a text conversation with MYLO. You are not required to provide any personal identifying information during conversations with MYLO. Your conversation data will not have any of your personal information on (e.g. your name). We will use the code number we give you instead. However, we would be able to identify you from your code number should you type information which may give us reason to think you might harm yourself or others.

Audio recordings will be made using an audio recorder which will be locked away securely when not being used. Audio files will be transferred as soon as possible to a secure computer storage drive accessible only to members of the research team. The audio recording will be written out in full (transcribed) by the research team. Anonymous quotes from the interviews or conversations with MYLO may be used when writing up the results of the study.

All information we obtain will be securely stored for 5 years after publication of results in accordance with University policy. After this time, the information will be securely destroyed.

Individuals from the University of Manchester, NHS Trust or regulatory authorities may need to look at the data collected for this study to make sure the project is being carried out as planned. This may involve looking at identifiable data but all individuals involved in auditing and monitoring the study will have a strict duty of confidentiality to you as a research participant.

Your information will only be shared if you tell us about something that gives us reason to think you might harm yourself or anyone else. This includes information you have typed into MYLO. We will always try to tell you before we share any information but information will only be shared with the relevant services e.g. A&E or GP in order to keep you or someone else safe.

1.7 What are the possible benefits of taking part?

The programme is designed to help find new ways of thinking about a problem that might be beneficial. Pilot studies of MYLO with student participants have suggested that some people find MYLO to be helpful.

You will be able to decide when, where, how often, and what problem to talk about with MYLO. This may help to address difficulties in accessing services (e.g. stigma or mobility problems).

The information gathered from this study will help to inform our understanding of how this type of intervention works and what makes it most helpful to service users. These findings will inform improvements to the MYLO programme to improve its acceptability to future service users. Also, if you agree, we might also use the data collected from this study in future research studies or analysis.

1.8 What are the possible risks of taking part?


It is possible that talking about your difficulties with MYLO may cause some distress. However, you can control the topic of conversation, how often and for how long you use MYLO. We will provide a list of contacts to be used if you are experiencing a crisis.

1.9 What will happen if I don’t want to carry on with the study?

You are free to withdraw from the study at any point until the completion of the follow up meeting with the researcher. If you do want to stop being in the study we would like to keep the data you have provided to use for our analysis (including identifiable data). However, you can tell us that you would like your information to be destroyed and we will destroy it. If you withdraw from the study, it will not affect your usual healthcare in any way.

2.0 Will I be compensated for my participation in the study?

You will receive £10 (£5 at each meeting attended) for taking part. This is to cover any travel costs incurred and as a thank you for your time.

2.1 How will the results of the study be reported?

The results will be published in reports and scientific journals, but it will not be possible to identify you from these reports.

2.3 Who has reviewed the study?

All research in the NHS is looked at by an independent group of people, called a Research Ethics Committee to protect your safety, rights, wellbeing and dignity. This study was reviewed and given a favourable ethical opinion for conduct in the NHS [REC reference:18/NW/0367]. The study was also approved by the University of Manchester Research Ethics Committee.

2.4 Who can I contact for further information?


If you are interested in taking part then you can contact the research team to discuss or you can agree for your name and contact details to be passed to the research team and they will contact you. If you decide not to take part in the study, the research team will delete your details from their records.

Hannah Gaffney, Principal Researcher: Telephone: Email: [email protected] Address: Division of Psychology & Mental Health, Zochonis Building, The University of Manchester, Brunswick Street, M13 9PL

2.5 What if there is a problem?

Minor complaints
If you have a minor complaint then you need to contact the researcher(s) in the first instance. You can contact the Principal Researcher (Hannah Gaffney) using the details above or the research supervisors (Dr Warren Mansell, [email protected]; 0161 275 8589 or Dr Sara Tai, [email protected]; 0161 306 0402).

Formal complaints
If you wish to make a formal complaint or if you are not satisfied with the response you have gained from the researchers in the first instance then please contact the Research Governance and Integrity Manager, Research Office, Christie Building, University of Manchester, Oxford Road, Manchester, M13 9PL, by emailing: [email protected] or by telephoning 0161 275 2674 or 275 2046.

You can also contact your local Patient Advice and Liaison Service (PALS) on 0161 918 4047 to obtain independent advice about taking part in research and to raise a complaint.

Thank you for taking the time to read this. Please keep this information sheet for future reference.


Appendix F. Intervention process measure

Client Measure of Helpfulness

Instructions: You will be asked to identify two questions from your conversation that you found helpful and two questions that you found unhelpful. For each question below, think about the interaction you had with MYLO and try to recall how you experienced what was happening for you at that time. Circle the number which best describes your experience.

Question:

How helpful was this question?

0 1 2 3 4 5 6 7 8 9 10 (0 = Not helpful at all, 10 = Extremely helpful)

To what degree did this question enable you to:

Have a sense of control over what is happening in conversation

0 1 2 3 4 5 6 7 8 9 10 (0 = No control at all, 10 = Complete control)

Talk freely about your problem (without filtering what comes to mind)

0 1 2 3 4 5 6 7 8 9 10 (0 = Not able at all, 10 = Entirely able)

Feel able to experience emotion connected to the problem

0 1 2 3 4 5 6 7 8 9 10 (0 = Not able at all, 10 = Entirely able)

See your problem in a new way

0 1 2 3 4 5 6 7 8 9 10 (0 = Not able at all, 10 = Entirely able)


To what degree did this question enable you:

To feel understood and respected

0 1 2 3 4 5 6 7 8 9 10 (0 = Not understood or respected at all, 10 = Understood and respected completely)

To talk about what you wanted

0 1 2 3 4 5 6 7 8 9 10 (0 = Not able at all, 10 = Entirely able)

To what extent did you feel this question was a good fit for you

0 1 2 3 4 5 6 7 8 9 10 (0 = Not a good fit at all, 10 = An extremely good fit)


Appendix G. Interview Schedule – Helpful and Unhelpful Questions

Interview Schedule - Helpful & unhelpful_V1_13.04.2018 IRAS ID: 238389

TITLE OF STUDY: Understanding what is helpful and unhelpful in intervention with online relational agent MYLO: An intervention process study.

Interview Schedule – Helpful & unhelpful questions

1) What made you choose that question as particularly helpful/unhelpful?
Prompts
o Was there something about it which made it more helpful/unhelpful?
o What makes you realise it was helpful/unhelpful for you?

2) What was happening in that moment?
Prompts
o What did you notice?
o Was anything else happening?
o Did you notice any changes in how you were feeling or thinking about the problem?


Appendix H. Interview Schedule – Accessibility and Interface

Interview Schedule- Accessibility & interface feedback_V1_13.04.2018 IRAS ID: 238389

TITLE OF STUDY: Understanding what is helpful and unhelpful in intervention with online relational agent MYLO: An intervention process study.

Interview Schedule – Accessibility & interface feedback

1) How easy was it for you to access MYLO?
o Were there things that made it easier/harder?
o What suggestions do you have for how to improve access?

2) What did you think of the design of MYLO?
o Could you tell us about anything you would change?
o Are there things you thought worked well about the design?

3) What was your experience of putting your difficulties in writing?
o In what ways were you able to explore your problems using MYLO?
o Can you tell me about the things you might have found difficult about the text conversation?
o Is there anything you particularly liked about text conversations with MYLO?

4) Do you have any suggestions on how MYLO may be improved?

5) What are your thoughts on whether you would recommend MYLO to others?
o Why would you / would you not?


Appendix I. Correspondence with JMIR regarding open-access fee waiver
