ACCEPTED VERSION-Proof Copy

The Chalder Fatigue Questionnaire is a valid and reliable measure of perceived fatigue severity in multiple sclerosis

Joseph Chilcot1, Sam Norton1, Maedhbh Etain Kelly2, Rona Moss-Morris1

1 Health Psychology Section, Psychology Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, UK

2 Department of Psychosis Studies, PO63, Institute of Psychiatry, Psychology and Neuroscience, King's College London, UK

Correspondence: Rona Moss-Morris ([email protected])

Health Psychology Section, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 5th floor Bermondsey Wing, Guy’s Hospital Campus, London Bridge, London, SE1 9RT

Key words: Chalder fatigue questionnaire; Chalder fatigue scale; fatigue; multiple sclerosis; MS; psychometrics; measurement; confirmatory factor analysis

Conflict of interest: None declared

Abstract

Background: Fatigue is one of the most distressing symptoms of Multiple Sclerosis (MS). Measuring MS fatigue poses a number of challenges. Many measures confound definitions of severity and impact of fatigue and/or lack psychometric validation in MS.

Objective: To evaluate the psychometric properties of an 11-item fatigue severity measure, the Chalder Fatigue Questionnaire (CFQ), in MS, including validity of the factor structure, internal reliability, discriminant validity, and sensitivity to change.

Methods: Data were pooled from four previous studies investigating MS-fatigue using the CFQ (n=444). Data analysis included confirmatory factor analysis to determine the factor structure and model fit, correlations to assess discriminant validity, and effect sizes to determine sensitivity to change.

Results: A bi-factor model with one general fatigue factor, incorporating two smaller group factors (mental and physical fatigue), had good model fit and appeared the most appropriate factor structure underlying the CFQ. The CFQ had high internal consistency, showed small to moderate correlations with impact of fatigue and mood, and was sensitive to change across low- and high-intensity behavioural interventions.

Conclusions: The CFQ measuring a composite of physical and mental fatigue severity (i.e. a total score) is a psychometrically sound measure of fatigue severity in MS.

Introduction

Fatigue is reported as one of the most common and disabling symptoms of multiple sclerosis (MS)1, 2. MS fatigue has been defined as “a subjective lack of physical and/or mental energy that is perceived by the individual or caregiver to interfere with usual and desired activities” 3, and remains a complex and debilitating phenomenon. Fatigue can be distinguished from fatigability, whereby fatigue is conceptualised as the subjective sensation and fatigability the objective changes in mental or physical performance 4. Subjective reports of MS fatigue have a significant impact upon quality of life, and are associated with negative psychosocial factors including unemployment 5, 6. MS fatigue can be distinguished from fatigue occurring in healthy persons by its rapid onset, heat sensitivity, and tendency to interfere with day-to-day activities7, 8.

Fatigue is clearly an important symptom, but the assessment of fatigue in MS poses a challenge due to its subjective and multi-faceted nature. Although attempts at objective measures of fatigability have been made, self-report questionnaires are the most common, and possibly the most effective, way of evaluating fatigue in both research and clinical settings9, 10. There are numerous fatigue self-report scales. The most commonly used in MS are the Fatigue Severity Scale (FSS)8 and Fatigue Impact Scale (FIS)11. The 11 item FSS was initially developed and validated on people with MS. However, whether the FSS measures severity specifically is questionable. The items represent a conglomerate of effects of fatigue on daily life (e.g. fatigue interferes with my daily functioning), triggers of fatigue (exercise brings on fatigue) and miscellaneous items (I am easily fatigued). Although the FSS shows good psychometric properties in MS 12, it measures multiple facets of fatigue rather than severity of the symptom experience specifically.

The FIS and Modified FIS11, 13 provide a clearer operational definition. Items measure impact of fatigue on physical, cognitive, and psychosocial functioning in MS. The measure is psychometrically sound 14. However, a recent evaluation of the face validity of the FIS by MS health professionals concluded that the items were non-specific to fatigue impact 14. In addition, impact is not the same as severity of fatigue. Guidelines in the pain literature suggest outcomes of clinical interventions for pain should include both measures of the impact of pain on daily life and the severity or intensity of pain15. Although pain interference and impact are correlated they are sufficiently distinct that interventions can show change in one but not the other16. This may also be true for fatigue. Including measures of perceived severity and impact of fatigue in MS will not only help us understand fatigue better but elucidate intervention effects on fatigue.

This is consistent with a recent review on measuring fatigue in neurological illness which emphasised the need for measures which clearly define specified components of fatigue 4. As both the FSS and FIS incorporate measurement of impact of fatigue, a validated measure specific to fatigue severity in MS is warranted. The Chalder Fatigue Questionnaire (CFQ) was originally developed for use amongst patients with Chronic Fatigue Syndrome17. It consists of 11 items loading onto two dimensions of fatigue severity, mental fatigue and physical fatigue, which map onto the operational definition of MS fatigue presented in the opening paragraph. The instrument has been found to have good clinical validity and internal consistency within this population18, 19.

Given its efficiency and ease of use, the CFQ is a popular assessment of fatigue within a range of illnesses. However, there has been no formal assessment of the validity and consistency of this scale in the specific context of MS-related fatigue. Furthermore, the multidimensionality of fatigue scales has been widely debated, with data suggesting that most measures, including the CFQ, are in fact unidimensional20.

The overarching objective of this study was to consider whether the CFQ is a suitable tool to be used in the evaluation of fatigue severity in people with MS. Our aims were to evaluate the psychometric properties of the CFQ with respect to its factor structure, internal reliability, sensitivity to change following intervention and discriminant validity. To assess discriminant validity we explored relationships between the CFQ and measures of both fatigue impact and depression. Previous work suggests there is a relationship between fatigue and depression but that depression can improve independently of fatigue and vice versa 4, 21. We would therefore only expect small to moderate correlations between fatigue and depression. Similarly, as we argued that severity and impact should be considered independently, we expected only moderate relationships between the CFQ and measures of impact of fatigue.

Methods

Participants and design

Participants were drawn from four recent studies (n=444), which either investigated correlates of MS fatigue22, 23 or trialled CBT-based treatments for MS fatigue21, 24. The data included in the main analysis were either cross-sectional or, for the two randomised controlled trials (RCTs), collected at baseline. Demographic and illness characteristics for each study are shown in Table 1 and the ethical approvals for each cohort are described in the relevant publications.

INSERT TABLE 1 ABOUT HERE

Instruments

The Chalder Fatigue Questionnaire (CFQ)17, also referred to as the Chalder Fatigue Scale, is an 11-item questionnaire measuring the severity of physical and mental fatigue on two separate subscales. Seven items represent physical fatigue (items 1-7) and four represent mental fatigue (items 8-11). The studies from which the pooled data were collected used a slightly updated version of the CFQ25, 26, which has been used widely, including in the PACE trial27. In this version the item “Do you have problems thinking clearly?” is replaced with “Do you find it more difficult to find the correct word?”. Cella and Chalder25 state that this slight amendment improves the scale's reliability, although either item could be used without impacting on the measure's interpretation26. Within the context of MS the questions are asked with the following stem ‘We would like to know more about any problems you have had with fatigue in the last month. Please answer ALL the questions simply by ticking the answer, which you think most, applies to you. We would like to know how you feel at the moment, or recently, compared to when you were last well’. Each item is scored 0-3: less than usual (0), no more than usual (1), more than usual (2) and much more than usual (3). The item ratings are added together to calculate the total score (range=0-33). High scores represent high levels of fatigue.
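The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `score_cfq` and the returned dictionary keys are assumptions introduced here, not part of the published instrument or of the analysis code used in this study.

```python
from typing import Dict, Sequence

N_ITEMS = 11
PHYSICAL_ITEMS = range(1, 8)   # items 1-7 (physical fatigue)
MENTAL_ITEMS = range(8, 12)    # items 8-11 (mental fatigue)


def score_cfq(responses: Sequence[int]) -> Dict[str, int]:
    """Return subscale and total sum scores for one respondent.

    `responses` holds the 11 item ratings in questionnaire order, each coded
    0 (less than usual) to 3 (much more than usual). The name and return
    format are hypothetical, chosen for this sketch.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    physical = sum(responses[i - 1] for i in PHYSICAL_ITEMS)
    mental = sum(responses[i - 1] for i in MENTAL_ITEMS)
    return {"physical": physical, "mental": mental, "total": physical + mental}
```

A respondent answering "much more than usual" to every item would obtain the maximum total of 33.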

Discriminant validity was assessed through:

1. Work and Social Adjustment Scale (WSAS) is a valid and reliable 5-item self-report measure of impairment in relation to an identified disorder or symptom 28. In the context of these studies, items measured the impact of fatigue on home management, work, social leisure activities, private leisure activities and the ability to form and maintain close relationships.

2. The Modified Fatigue Impact Scale (MFIS) is a shortened version of the Fatigue Impact Scale validated in people with MS 11.

3. Hospital Anxiety and Depression Scale (HADS) 29 is a commonly used self-report measure of mood in patients with medical illnesses. Seven items relate to anxiety and seven items relate to depression.

Statistical Methods

The factor structure of the CFQ was examined using CFA in Mplus 7.1. Competing models were estimated using Weighted Least-Squares with Mean and Variance adjustment (WLSMV) estimation, testing one-factor, two-factor and bi-factor models of fatigue. In the bi-factor models, all 11 items were loaded onto a general fatigue factor. In addition, items were also loaded onto a number of group factors, with correlations between each of these latent factors fixed to zero. Assessment of goodness-of-fit was based on standard structural equation modeling criteria: root mean squared error of approximation (RMSEA) <.08, comparative fit index (CFI) >.95, and Tucker-Lewis index (TLI) >.95 30. Reliability of the total and subscale scores was assessed using the omega index, along with an indicator of the saturation of a multidimensional scale by a general factor, omega-hierarchical, for the bi-factor models31, 32. Discriminant validity between the fatigue factors and other patient reported outcomes (depression, anxiety and disability) was evaluated using Pearson’s correlation.

Sensitivity to change was assessed using the data from the two RCTs21, 24. Treatment effects, in terms of post-treatment standardised mean differences (Cohen's d), on the CFQ were estimated for CBT versus treatment as usual24 and for CBT versus relaxation21. Following the intention-to-treat principle, missing post-treatment scores were imputed by carrying forward the baseline score. In addition to the treatment effects, the proportion of individuals showing a reliable improvement in fatigue, following the method proposed by Jacobson and Truax33, was calculated. In order to assess whether the measure remains relatively stable over a 10-week period without treatment, we calculated Pearson’s correlations between baseline and follow-up CFQ in the no treatment control group.
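The effect size calculation described above can be illustrated with a short Python sketch. It assumes the pooled-standard-deviation form of Cohen's d and a sign convention in which positive values favour the treated group; the function names and the example scores are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def carry_baseline_forward(baseline: Sequence[float],
                           post: Sequence[Optional[float]]) -> List[float]:
    """Replace each missing (None) post-treatment score with the baseline score."""
    return [b if p is None else p for b, p in zip(baseline, post)]


def cohens_d(control: Sequence[float], treated: Sequence[float]) -> float:
    """Standardised mean difference using the pooled standard deviation.

    Positive values mean lower (better) fatigue scores in the treated group;
    this sign convention is an assumption of the sketch.
    """
    n_c, n_t = len(control), len(treated)
    pooled_var = ((n_c - 1) * stdev(control) ** 2
                  + (n_t - 1) * stdev(treated) ** 2) / (n_c + n_t - 2)
    return (mean(control) - mean(treated)) / math.sqrt(pooled_var)


# Hypothetical CFQ totals, for illustration only (not study data).
cbt_baseline = [22, 20, 25, 19]
cbt_post = [12, None, 15, 10]      # one participant lost to follow-up
control_post = [21, 20, 24, 18]

cbt_post_itt = carry_baseline_forward(cbt_baseline, cbt_post)
effect_size = cohens_d(control_post, cbt_post_itt)
```

Carrying the baseline score forward is conservative for a treatment expected to lower scores, since participants with missing data are treated as unchanged.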

Results

Confirmatory factor analysis of the Chalder Fatigue Questionnaire

A series of CFA models were examined. The details of the models are presented in the technical appendix together with a table of the summary statistics for the fit of each model. The first three models illustrated that items 6 (less strength in muscles) and 7 (feeling weak) of the original CFQ negatively impacted the fit indices. Since these items appear to measure weakness rather than fatigue, they were dropped in the final two models. The model with the best fit and most satisfactory face validity was a 9-item bi-factor model with two group factors (see technical appendix table 1; model 4b). Model estimates from this analysis are shown in Table 2. The general factor explained 81.4% of the common variance between items. The physical (5 items) and mental (4 items) group factors explained only a small amount of common variance: 12.4% and 6.2% respectively. Omega hierarchical was .89, indicating that the total score across all items included in the scale predominantly reflects a general fatigue factor. Considering the physical and mental subscales separately, the reliability coefficients were both .96. However, controlling for the part of the reliability attributable to the general factor, the coefficients drop to .20 and .10, respectively. Together this indicates that, even though the scale is multidimensional, the total score for the scale is a reliable indicator for general fatigue. Total scores for the physical and mental subscales are saturated by the general factor and thus reflect general fatigue rather than separate constructs of physical and mental fatigue.
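The explained common variance and omega-hierarchical figures can be approximately recovered from the standardised loadings reported in Table 2. The Python sketch below is an illustration under stated assumptions, not the analysis code used in the study: it takes the rounded loadings from Table 2, treats each item's unique variance as one minus its communality, and uses variable names introduced here. Results agree with the reported values up to rounding of the loadings.

```python
# Standardised loadings from Table 2 (items 1-5, then items 8-11).
general = [.77, .79, .80, .84, .83, .93, .84, .84, .88]
physical = [.58, .51, .39, .19, .42]   # group loadings, items 1-5
mental = [.14, .43, .50, .17]          # group loadings, items 8-11


def sum_sq(loadings):
    """Sum of squared loadings (variance explained by one factor)."""
    return sum(x * x for x in loadings)


# Explained common variance (ECV): share of common variance per factor.
common = sum_sq(general) + sum_sq(physical) + sum_sq(mental)
ecv_general = sum_sq(general) / common
ecv_physical = sum_sq(physical) / common
ecv_mental = sum_sq(mental) / common

# Omega-hierarchical for the total score: variance in the sum score due to
# the general factor, divided by the total variance of the sum score.
unique = len(general) - common          # assumes unique variance = 1 - communality
total_var = sum(general) ** 2 + sum(physical) ** 2 + sum(mental) ** 2 + unique
omega_h = sum(general) ** 2 / total_var


def omega_hs(general_part, group):
    """Omega-hierarchical for a subscale: reliable variance due to the group
    factor alone, after removing the general factor's contribution."""
    unique_s = len(group) - sum_sq(general_part) - sum_sq(group)
    total_s = sum(general_part) ** 2 + sum(group) ** 2 + unique_s
    return sum(group) ** 2 / total_s


omega_hs_physical = omega_hs(general[:5], physical)
omega_hs_mental = omega_hs(general[5:], mental)
```

With these inputs the general factor accounts for about 81% of common variance and omega-hierarchical is about .89, while the subscale coefficients fall to roughly .2 (physical) and .1 (mental) once the general factor is removed.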

Discriminant validity: correlations between fatigue, depression, anxiety and disability.

The original total CFQ sum score (11 items) and the shortened sum score (9 items) were correlated with the HADS (depression and anxiety sum scores) and WSAS (see Table 3) to determine discriminant validity. As hypothesised, depression, anxiety and impact of fatigue (WSAS) all had significant but small to moderate positive associations with the total CFQ fatigue factor (bi-factor model), original (11-item) and shortened (9-item) sum scores. The correlations were very similar in size for the 11- and 9-item versions. In a small subset of the total population, the 9-item summed CFQ showed a small correlation with the Modified FIS, supporting the argument that severity and impact may be distinguishable (r=.22, p=.19 [n=39; data from Moss-Morris et al., 2012]).

Sensitivity to change

Sensitivity to change was comparable for the 11- and 9-item CFQ versions in terms of the treatment effects of CBT versus treatment as usual24 and versus relaxation21. Compared to treatment as usual, the post-intervention between-group effect size for CBT using the 11- and 9-item summed fatigue scores was d=1.19 and 1.15, respectively. Compared to relaxation, the post-intervention between-group effect size for CBT was d=0.76 and 0.81, respectively. For both the 11- and 9-item versions a reliable change was estimated to be a 3-point difference. There was no difference between versions in the number of people in the intervention group that exhibited a reliable improvement in fatigue between the baseline and post-treatment assessments. In Moss-Morris et al.24 16 of 23 (69.6%) and in van Kessel et al.21 34 of 35 (97.1%) patients in the CBT group exhibited a reliable improvement. For the treatment as usual group, the correlation between CFQ at baseline and follow-up (10 weeks later) was r=.58, p=.02, suggesting that without treatment scores remain moderately stable.
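The reliable change criterion can be sketched as follows, using the commonly cited form of the Jacobson and Truax method, in which the threshold is 1.96 times the standard error of the difference between two scores. The function names are hypothetical, and the inputs (baseline standard deviation and reliability coefficient) are placeholders: the values used to derive the 3-point threshold in this study are not restated here.

```python
import math


def reliable_change_threshold(sd_baseline: float, reliability: float,
                              z: float = 1.96) -> float:
    """Smallest change unlikely (p < .05) to be due to measurement error.

    SE of measurement = SD * sqrt(1 - reliability);
    SE of a difference between two scores = sqrt(2) * SE of measurement.
    """
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return z * se_difference


def is_reliable_improvement(baseline: float, post: float,
                            threshold: float) -> bool:
    """True if the score dropped (improved) by at least the threshold."""
    return (baseline - post) >= threshold


# Applying the 3-point threshold reported above to hypothetical scores.
improved = is_reliable_improvement(baseline=20, post=16, threshold=3.0)
```

The proportion of reliable improvers in a group is then the mean of `is_reliable_improvement` over its participants.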

Discussion

The primary purpose of this study was to evaluate whether the Chalder Fatigue Questionnaire is a valid and reliable measure of fatigue severity in MS patients. In terms of factor structure, early development of the CFQ with patients with chronic fatigue syndrome and healthy controls revealed two factors, measuring physical and mental symptoms of fatigue17. However, our findings failed to support a two-factor model underlying the CFQ, as evidenced by poor model fit and two highly correlated factors. Given this, we tested bi-factor models, which allow the separation of variance into components related to a general factor, group factors and unique variance. This modeling approach is increasingly used to test whether multidimensional measures can be considered sufficiently unidimensional to allow for the use of a total score, measuring one general construct 34, 35.

A bi-factor model, containing a general fatigue factor and two smaller group factors (physical and mental), most appropriately fitted the data. Items 6 and 7 correlated highly and appeared to measure something specific to weakness, rather than physical fatigue per se. Following examination of models with these items correlated, loaded onto a third group factor (weakness) or removed, model fit appeared most satisfactory when these items were removed. The two group factors explained relatively little variance, whereas the general factor, with all 9 items loaded upon it, explained approximately 80% of the common variance. Therefore, whilst the CFQ includes two dimensions of fatigue, it remains sufficiently unidimensional for the total score (i.e. the sum score) to be used as a reliable measure of general fatigue severity. The group factors remain only fragile indicators of separate constructs, namely, mental and physical fatigue. The saturation of total subscale scores by the general factor means the subscales would be unreliable indicators of the unique constructs; thus we recommend using the total score as a general fatigue measure in future studies. These findings support those of others20, 36, and cast significant doubt over the practical distinction between physical and mental constructs of fatigue in MS patients. Further support for this assertion comes from the poor criterion validity of the physical and mental subfactors in relation to other patient reported outcomes (depression, anxiety and disability). That is, all of the association with these measures is due to common variance accounted for by the general component of fatigue. The unidimensional nature of the CFQ is also supported in the general population37; thus we encourage future research to use the measure as a total score measuring fatigue severity, rather than as subscales of mental and physical fatigue.

A secondary aim was to assess whether this measure of fatigue severity could be discriminated from measures of impact of fatigue (WSAS and MFIS) and measures of mood. Whilst fatigue severity was correlated with negative mood and the impact of fatigue on the ability to carry out day-to-day tasks, the overlap between these constructs was small to moderate in size (accounting for between 5-16% of the shared variance). These data suggest that it is worth including separate measures of fatigue severity and impact, although it should be noted that, with respect to the MFIS, the available sample size was small. The data also suggest that fatigue severity can be discriminated to some extent from negative mood since the correlations between fatigue and distress were moderate in size. Correlations were very similar in size for both the 11- and 9-item versions of the CFQ. The CFQ showed excellent sensitivity to change and large effect sizes in relation to CBT designed specifically to reduce fatigue in MS, both when the therapy was delivered by a therapist and through a website with some minimal support. Sensitivity to change was comparable for both the 11-item total score and the reduced 9-item version. This suggests removing the two items relating to weakness did not impact on the properties of the total score. That is, internal reliability, and thus precision, was not affected. This, along with the other analyses, suggests that, in practice, the use of either the 11- or 9-item version to assess fatigue is supported in the MS population. The two items removed in the bi-factor model related to weakness and it is conceivable that responses are confounded by disease symptoms in MS. These items may be stronger indicators of fatigue in other populations. However, as there is no evidence that the original 11-item version biases the validity of the instrument, we recommend the continued use of the 11-item version as it allows comparisons with non-MS samples. There appears to be little utility in using the 9-item version over the 11-item version.
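The shared variance figures quoted above follow from squaring the correlations in Table 3. A minimal Python sketch, with the helper name `shared_variance_pct` and the dictionary layout introduced here for illustration:

```python
# Correlations with the 11-item CFQ sum score, as reported in Table 3.
correlations = {
    "HADS-depression": 0.40,
    "HADS-anxiety": 0.37,
    "WSAS": 0.36,
    "MFIS": 0.24,
}


def shared_variance_pct(r: float) -> float:
    """Percentage of variance shared by two measures correlated at r."""
    return round(r ** 2 * 100, 1)


shared = {name: shared_variance_pct(r) for name, r in correlations.items()}
# The 9-item MFIS correlation (r=.22) gives the lower bound of the range.
lowest = shared_variance_pct(0.22)
highest = max(shared.values())
```

Squaring r=.22 gives just under 5% and squaring r=.40 gives 16%, which matches the quoted range.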

Whilst our study has a number of strengths, including the sample size and representative nature of the MS patient sample, a few limitations are worth noting when interpreting these data. First, our results are specific to the MS population and thus may not generalise to other populations. Second, only English speakers completed the measure, therefore these data may not generalise to other languages or cultures. Specifically, the measurement models of fatigue tested here may not be robust in other cultures, due to possible differences in the representation and expression of fatigue symptoms. Furthermore, the available follow-up data from the two pilot RCTs reported here21, 24 had insufficient sample sizes to determine model invariance over time using multiple group confirmatory factor analysis. The test-retest reliability yielded a moderate coefficient (0.58). This is likely because the retest data were taken from a control arm of a fatigue intervention study. The retest period was therefore 10 weeks, which is not a typical time frame employed when evaluating retest reliability. Other studies show that the CFQ has good retest reliability38; however, within individuals with MS this needs further evaluation. Finally, it is possible that some of the CFQ items overlap with muscle weakness and perception of cognitive dysfunction such as problems with memory.

In conclusion, the CFQ appears to be a valid and internally reliable measure of fatigue severity in people with MS which is sensitive to change. We discourage the separation of physical and mental fatigue by means of two factor scores; rather, we suggest that a total sum score provides an appropriate and internally reliable measure of general MS-fatigue symptoms. Although the CFQ was associated with measures of impact of fatigue, these correlations were small to moderate in size, suggesting that when measuring fatigue in MS, including measures of both fatigue severity and impact is warranted. Future studies should also explore the relationships between the CFQ (measuring severity of fatigue) and measures of performance fatigability, including central factors relating to cognitive networks and peripheral factors such as loss of muscle force.

References:

1. Bergamaschi R, Romani A, Versino M, Poli R and Cosi V. Clinical aspects of fatigue in multiple sclerosis. Functional neurology. 1997; 12: 247-51.
2. Krupp LB. Fatigue in multiple sclerosis: definition, pathophysiology and treatment. CNS drugs. 2003; 17: 225-34.
3. Multiple Sclerosis Council for Clinical Practice Guidelines. Fatigue and multiple sclerosis: evidence-based management strategies for fatigue in multiple sclerosis. Washington, DC: Paralyzed Veterans of America, 1998.
4. Kluger BM, Krupp LB and Enoka RM. Fatigue and fatigability in neurologic illnesses: proposal for a unified taxonomy. Neurology. 2013; 80: 409-16.
5. Smith MM and Arnett PA. Factors related to employment status changes in individuals with multiple sclerosis. Multiple sclerosis. 2005; 11: 602-9.
6. Julian LJ, Vella L, Vollmer T, Hadjimichael O and Mohr DC. Employment in multiple sclerosis. Exiting and re-entering the work force. Journal of neurology. 2008; 255: 1354-60.
7. Krupp LB, Alvarez LA, LaRocca NG and Scheinberg LC. Fatigue in multiple sclerosis. Archives of neurology. 1988; 45: 435-7.
8. Krupp LB, LaRocca NG, Muir-Nash J and Steinberg AD. The fatigue severity scale. Application to patients with multiple sclerosis and systemic lupus erythematosus. Archives of neurology. 1989; 46: 1121-3.
9. Schwid SR, Covington M, Segal BM and Goodman AD. Fatigue in multiple sclerosis: current understanding and future directions. Journal of rehabilitation research and development. 2002; 39: 211-24.
10. Zwarts MJ, Bleijenberg G and van Engelen BG. Clinical neurophysiology of fatigue. Clinical neurophysiology : official journal of the International Federation of Clinical Neurophysiology. 2008; 119: 2-10.
11. Fisk JD, Ritvo PG, Ross L, Haase DA, Marrie TJ and Schlech WF. Measuring the functional impact of fatigue: initial validation of the fatigue impact scale. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America. 1994; 18 Suppl 1: S79-83.
12. Taylor RR, Jason LA and Torres A. Fatigue rating scales: an empirical comparison. Psychological medicine. 2000; 30: 849-56.
13. Mills RJ, Young CA, Pallant JF and Tennant A. Rasch analysis of the Modified Fatigue Impact Scale (MFIS) in multiple sclerosis. Journal of neurology, neurosurgery, and psychiatry. 2010; 81: 1049-51.
14. Hobart J, Cano S, Baron R, et al. Achieving valid patient-reported outcomes measurement: a lesson from fatigue in multiple sclerosis. Multiple sclerosis. 2013; 19: 1773-83.
15. Turk DC, Dworkin RH, McDermott MP, et al. Analyzing multiple endpoints in clinical trials of pain treatments: IMMPACT recommendations. Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials. Pain. 2008; 139: 485-93.
16. Dworkin RH, Turk DC, Peirce-Sandner S, et al. Research design considerations for confirmatory chronic pain clinical trials: IMMPACT recommendations. Pain. 2010; 149: 177-93.
17. Chalder T, Berelowitz G, Pawlikowska T, et al. Development of a fatigue scale. Journal of psychosomatic research. 1993; 37: 147-53.
18. Morriss RK, Wearden AJ and Mullis R. Exploring the validity of the Chalder Fatigue scale in chronic fatigue syndrome. Journal of psychosomatic research. 1998; 45: 411-7.
19. Loge JH, Ekeberg O and Kaasa S. Fatigue in the general Norwegian population: normative data and associations. Journal of psychosomatic research. 1998; 45: 53-65.
20. Michielsen HJ, De Vries J, Van Heck GL, Van de Vijver FJR and Sijtsma K. Examination of the Dimensionality of Fatigue. European Journal of Psychological Assessment. 2004; 20: 39-48.
21. van Kessel K, Moss-Morris R, Willoughby E, Chalder T, Johnson MH and Robinson E. A randomized controlled trial of cognitive behavior therapy for multiple sclerosis fatigue. Psychosomatic medicine. 2008; 70: 205-13.
22. Skerrett TN and Moss-Morris R. Fatigue and social impairment in multiple sclerosis: the role of patients' cognitive and behavioral responses to their symptoms. Journal of psychosomatic research. 2006; 61: 587-93.
23. Witt EL. A stress and coping model of fatigue in MS. MSc Dissertation. University of Auckland, 2005.
24. Moss-Morris R, McCrone P, Yardley L, van Kessel K, Wills G and Dennison L. A pilot randomised controlled trial of an Internet-based cognitive behavioural therapy self-management programme (MS Invigor8) for multiple sclerosis fatigue. Behaviour research and therapy. 2012; 50: 415-21.
25. Cella M and Chalder T. Measuring fatigue in clinical and community settings. Journal of psychosomatic research. 2010; 69: 17-22.
26. Chalder T, Sharpe M and White PD. PACE trial clarification. Lancet. 2012; 379: 616.
27. White PD, Goldsmith KA, Johnson AL, et al. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. Lancet. 2011; 377: 823-36.
28. Mundt JC, Marks IM, Shear MK and Greist JH. The Work and Social Adjustment Scale: a simple measure of impairment in functioning. The British journal of psychiatry : the journal of mental science. 2002; 180: 461-4.
29. Zigmond AS and Snaith RP. The hospital anxiety and depression scale. Acta psychiatrica Scandinavica. 1983; 67: 361-70.
30. Hu L and Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling. 1999; 6: 1-55.
31. Reise SP. The Rediscovery of Bifactor Measurement Models. Multivariate behavioral research. 2012; 47: 667-96.
32. Zinbarg RE, Revelle W, Yovel I and Li W. Cronbach’s alpha, Revelle’s beta, and McDonald’s omega_h: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika. 2005; 70: 123-33.
33. Jacobson NS, Roberts LJ, Berns SB and McGlinchey JB. Methods for defining and determining the clinical significance of treatment effects: description, application, and alternatives. Journal of consulting and clinical psychology. 1999; 67: 300-7.
34. Chilcot J, Norton S, Wellsted D, Almond M, Davenport A and Farrington K. A confirmatory factor analysis of the Beck Depression Inventory-II in end-stage renal disease patients. Journal of psychosomatic research. 2011; 71: 148-53.
35. Norton S, Cosco T, Doyle F, Done J and Sacker A. The Hospital Anxiety and Depression Scale: a meta confirmatory factor analysis. Journal of psychosomatic research. 2013; 74: 74-81.
36. Ahsberg E. Dimensions of fatigue in different working populations. Scandinavian journal of psychology. 2000; 41: 231-41.
37. De Vries J, Michielsen HJ and Van Heck GL. Assessment of fatigue among working people: a comparison of six questionnaires. Occupational and environmental medicine. 2003; 60 Suppl 1: i10-5.
38. Jelsness-Jørgensen LP. The Fatigue Questionnaire has a good test-retest profile in IBD. Alimentary Pharmacology & Therapeutics. 2012; 35: 621-2.

Table 1: Baseline demographic and illness characteristics of MS participants across 4 studies related to the treatment of fatigue.

                               Skerrett et al.,   Van Kessel et al.,   Witt, 2005       Moss-Morris et al.,
                               2006               2008                 (unpublished)    2012
n                              149                72                   183              40
Design                         Cross-sectional    RCT                  Cross-sectional  RCT
Age (mean, SD)                 44.5 (11.8)        45 (9.6)             50.2 (12.9)      40.9 (15.1)
Years with MS (mean, SD)       9.5 (7.4)          6 (5.4)              10.6 (9.5)       8.3 (7.0)
Gender (female, n %)           127 (85.2%)        54 (75%)             145 (79.2%)      32 (80%)
MS type
  Relapsing remitting          149 (100%)         41 (57%)             101 (55.2%)      22 (55%)
  Secondary progressive        -                  22 (30.6%)           37 (20.2%)       9 (22.5%)
  Primary progressive          -                  9 (12.5%)            44 (24%)         2 (5%)
Marital status
  Single/divorced/separated    42 (28.2%)         10 (13.9%)           44 (24%)         16 (40%)
  Living with partner/married  102 (68.5%)        62 (86.1%)           126 (68.9%)      22 (55%)
Employment related to MS
  Working less                 43 (28.9%)         26 (36.1%)           45 (24.6%)       3 (7.5%)
  Unemployed                   52 (35%)           22 (30.6%)           63 (34.4%)       11 (27.5%)

RCT: randomised control trial

Table 2: 9-item bi-factor model for the Chalder Fatigue Questionnaire (model 4b, see technical appendix)

Original                                        Factor                           Residual
item no   Item description                      General   Physical   Mental     variance
1         Tiredness                             .77**     .58**      -          .07
2         Need to rest more                     .79**     .51**      -          .11
3         Sleepy/drowsy                         .80**     .39**      -          .22
4         Problems starting things              .84**     .19**      -          .26
5         Lack energy                           .83**     .42**      -          .13
8         Difficulty concentrating              .93**     -          .14        .11
9         Slips of the tongue when speaking     .84**     -          .43**      .11
10        Difficulty finding correct word       .84**     -          .50**      .05
11        Memory                                .88**     -          .17**      .19

Standardised estimates shown; **p<.01

Table 3: Correlates of the Chalder Fatigue Questionnaire (CFQ) general factor scores

                           Total CFQ sum score     Shortened CFQ sum score
n=444                      Original 11-items       9-items
HADS-depression            .40**                   .40**
HADS-anxiety               .37**                   .38**
WSAS (impact of fatigue)   .36**                   .34**
MFIS a                     .24                     .22

HADS: Hospital Anxiety and Depression Scale

WSAS: Work and Social Adjustment Scale

MFIS: The Modified Fatigue Impact Scale

a n=39

**p<.01 *p<.05

Technical appendix

Appendix table 1: Summary of model fit

Model  Description                                 No of free    Chi-square (df)       CFI    TLI    RMSEA
                                                   parameters
1      1-factor                                    44            692.5 (44) p<.01      .98    .97    .18
2a     2-factor                                    45            395.4 (43) p<.01      .99    .99    .14
2b     2-factor with residual correlation          46            208.0 (42) p<.01      .99    .99    .10
3      3-factor                                    47            222.1 (41) p<.01      .99    .99    .10
4a     Bi-factor with 2 group factors              55            145.3 (33) p<.01      .99    .99    .09
4b     Modified bi-factor with 2 group factors a   45            33.4 (18) p=.01       .99    .99    .04
5      Bi-factor with 3 group factors              54            89.9 (34) p<.01       .99    .99    .06

a items 6 and 7 removed (weakness); RMSEA: root mean squared error of approximation; CFI: comparative fit index; TLI: Tucker-Lewis index

Model 1: A one factor model with all 11-items loaded onto a single fatigue factor had poor model fit as indicated by a RMSEA>.08.

Model 2: A two-factor model, specifying correlated physical (items 1-7) and mental fatigue factors (items 8-11) also had poor fit.

Model 2b: Inspection of Mplus derived modification indices suggested improved model fit for model 2 if a residual correlation was added between items 6 (less strength in muscles) and 7 (feeling weak). Since these items appear to measure “weakness” it was deemed reasonable to add this residual correlation to the model. This two-factor model (model 2b), including the residual correlation, had improved fit, although this was still marginal as evidenced by a RMSEA of .10. The correlation between the physical and mental fatigue factors was high, r=.80 (p<.01), suggesting a considerable amount of shared variance.

Model 3: Given the high correlation between mental and physical fatigue, a 3-factor model was also examined, which included a weakness factor (model 3) and was shown to have unsatisfactory fit given the RMSEA.

Model 4: Given the high correlation between the physical and mental fatigue factors, a bi-factor model was tested. In the bi-factor model, all 11 items were loaded onto a general fatigue factor. In addition, items 1-7 were loaded onto a physical fatigue factor and items 8-11 onto a mental fatigue factor. Correlations between each of these three factors were fixed to be zero. The bi-factor model yielded a lower RMSEA (.09) and a substantial reduction in the chi-square statistic, although this was still significant (p<.01). Since items 6 and 7 appear to tap into the concept of weakness, two alternative bi-factor models were tested, one in which these two items were dropped (model 4b) and another with them loaded onto a third group factor (“weakness”; model 5). The 9-item bi-factor model (model 4b) demonstrated the best fit in terms of the lowest chi-square and RMSEA values (RMSEA=.04) and CFI and TLI values of .99.
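As a rough check on the fit statistics, the RMSEA can be approximated from the chi-square, its degrees of freedom and the sample size using the standard point-estimate formula. The Python sketch below is illustrative only: the function name is introduced here, and because Mplus reports an adjusted chi-square under WLSMV and may use a slightly different sample size term, the result should be read as an approximation that can differ from the reported values in the second decimal place.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square and df taken from appendix table 1; n=444 is the pooled sample.
rmsea_model_1 = rmsea(692.5, 44, 444)    # one-factor model
rmsea_model_4b = rmsea(33.4, 18, 444)    # 9-item bi-factor model
```

For the one-factor model this gives approximately .18 and for model 4b approximately .04, in line with the tabulated values and with the <.08 criterion being met only by the better fitting bi-factor models.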
