Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis

Manea L, Gilbody S, McMillan D

CRD summary This review concluded that the Patient Health Questionnaire (PHQ-9) had acceptable diagnostic properties for detecting major depressive disorder for cut-off scores between 8 and 11. These conclusions reflect the evidence presented, but limitations in the evidence base and methodological or reporting weaknesses in the review should be taken into account when interpreting the conclusions.

Authors' objectives To determine the best cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9).

Searching The authors searched EMBASE, MEDLINE and PsycINFO from 1999 (when PHQ-9 was issued) to August 2010. Search terms were "PHQ-9" and "patient health questionnaire". The authors searched reference lists of included studies and performed a reverse citation search in Web of Science. No language restrictions were applied. Study authors were contacted to obtain unpublished data where necessary. Unpublished studies and conference abstracts were also reviewed.

Study selection Studies that reported the accuracy of PHQ-9 for diagnosing major depressive disorder (defined according to standard classification systems) were eligible for the review. Studies had to provide sufficient data to allow calculation of contingency tables. The reference standard was diagnosis using a standardised diagnostic schedule (examples listed in the paper).

Included studies recruited participants from primary care, specialised secondary care, community or mixed settings. Mean age of participants ranged from 24.8 to 71.4 years and the prevalence of depression from 2.5 to 37.5%. Seven different language versions of PHQ-9 were used.

Study selection was performed by one reviewer.

Assessment of study quality Study quality was assessed based on criteria including sample size (250 or above was considered adequate); study design (single-gate preferred to two-gate); training and blinding of assessors; whether withdrawals or drop-outs were accounted for; and time between the index and reference tests. Validation of translated versions of PHQ-9 was also considered.

The authors did not state how many reviewers performed the quality assessment.

Data extraction Diagnostic accuracy data were extracted to construct contingency tables for the calculation of sensitivity, specificity and positive and negative predictive values.

The authors did not state how many reviewers performed the data extraction.
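The accuracy measures named under Data extraction all follow from a single 2×2 contingency table of PHQ-9 result against reference standard diagnosis. The sketch below shows one way to derive them; the class and function names are hypothetical and the counts are invented for illustration, not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table: PHQ-9 at one cut-off versus the reference standard."""
    tp: int  # PHQ-9 positive, reference standard positive
    fp: int  # PHQ-9 positive, reference standard negative
    fn: int  # PHQ-9 negative, reference standard positive
    tn: int  # PHQ-9 negative, reference standard negative


def accuracy_measures(t: ContingencyTable) -> dict[str, float]:
    """Return sensitivity, specificity and predictive values as proportions.

    No guard against empty rows or columns: a zero denominator raises
    ZeroDivisionError, which is acceptable for a sketch.
    """
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# Hypothetical counts for illustration only (not data from the review).
example = ContingencyTable(tp=40, fp=30, fn=10, tn=220)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on the columns of the table, whereas the predictive values also depend on how common depression is in the sample.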

Methods of synthesis Pooled estimates of sensitivity and specificity and their associated 95% confidence intervals (CIs) were derived by bivariate meta-analysis. Summary receiver operating characteristic (ROC) curves were constructed using the bivariate model to produce a 95% confidence ellipse within the ROC space. Between-study heterogeneity was assessed using I² for the pooled diagnostic odds ratio; 25% was considered low, 50% moderate and 75% high heterogeneity. Effects of pre-specified sources of heterogeneity were analysed by meta-regression. Publication and small-study bias were assessed using funnel plots.
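As a rough illustration of the heterogeneity statistic mentioned above, the sketch below computes I² from Cochran's Q using the standard relation I² = max(0, (Q − df)/Q) × 100. The record does not say how Q was obtained in the review, so the inverse-variance calculation here is an assumption, as are the function names and the rule of labelling a value by the highest benchmark (25%, 50%, 75%) it reaches.

```python
def cochran_q(effects: list[float], variances: list[float]) -> float:
    """Cochran's Q for study effects (e.g. log diagnostic odds ratios).

    Assumes inverse-variance weights around a fixed-effect pooled estimate.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q: float, n_studies: int) -> float:
    """I² as a percentage: the share of variation in excess of chance."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2: float) -> str:
    """Label I² by the highest benchmark reached (banding is an assumption)."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "below the low benchmark"


# Hypothetical Q for five studies, for illustration only.
print(i_squared(q=20.0, n_studies=5), heterogeneity_label(i_squared(20.0, 5)))
```

With these benchmarks, any value of 75% or more would be labelled high.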

Results of the review Eighteen studies with 7,180 participants (927 with major depressive disorder) were included. PHQ-9 cut-off scores ranged from 7 to 15. Four studies did not apply the reference standard to all participants and were at high risk of partial verification bias. Heterogeneity was high (I²=82.4%). Pooled specificity ranged from 0.73 (95% CI 0.63 to 0.82) at a cut-off of 7 to 0.96 (95% CI 0.94 to 0.97) at a cut-off of 15. Pooled sensitivity values varied between cut-off scores with no consistent pattern. For the widely recommended cut-off score of 10 (evaluated in 16 studies), pooled sensitivity was 0.85 (95% CI 0.75 to 0.91) and specificity 0.89 (95% CI 0.83 to 0.92). There were no substantial differences in pooled sensitivity and specificity for cut-off scores between 8 and 11. A cut-off score of 11 had the best trade-off between sensitivity and specificity.

In the meta-regression, only blinded application of the reference standard was a significant predictor of diagnostic performance. Funnel plots were not provided but the authors stated that they could not rule out publication bias.
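To show how the pooled accuracy at a cut-off of 10 would translate into predictive values, the sketch below applies Bayes' rule at the lowest and highest depression prevalences reported across the included studies (2.5% and 37.5%). Pairing the pooled estimates with these prevalences is an illustrative assumption rather than an analysis reported in the review, and the pooled values are summary figures from a heterogeneous set of studies.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence              # true positives per person tested
    fn = (1.0 - sensitivity) * prevalence      # false negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    tn = specificity * (1.0 - prevalence)      # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Pooled estimates for cut-off 10, as given in the results above.
SENSITIVITY_AT_10 = 0.85
SPECIFICITY_AT_10 = 0.89

# Extremes of the prevalence range reported for the included studies.
for prevalence in (0.025, 0.375):
    ppv, npv = predictive_values(SENSITIVITY_AT_10, SPECIFICITY_AT_10, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions the positive predictive value is roughly 0.17 at 2.5% prevalence and roughly 0.82 at 37.5%, while the negative predictive value stays above 0.90 in both cases. The same pooled accuracy can therefore mean very different things for a positive result depending on the setting.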

Authors' conclusions PHQ-9 had acceptable diagnostic properties for detecting major depressive disorder for cut-off scores between 8 and 11.

CRD commentary The review question and inclusion criteria were clear. The search used a very small number of terms but covered a range of relevant sources and included attempts to locate unpublished studies. Publication bias was assessed and the authors stated that bias could not be ruled out; full results were not reported. Study selection was performed by one reviewer, which the authors acknowledged could have introduced bias. Numbers of reviewers involved in quality assessment and data extraction were not reported.

Appropriate methods were used for quality assessment and data synthesis, including investigation of heterogeneity. The authors noted several limitations of the review, including variable quality of the included studies and unexplained heterogeneity between studies. The authors' conclusions reflect the evidence presented but these limitations and the methodological/reporting issues mentioned above should be taken into account when interpreting the conclusions.

Implications of the review for practice and research Practice: The authors stated that the choice of cut-off score should take into account the population, setting and the effect of screening on outcomes.

Research: The authors stated that future studies should report results for all cut-off scores and should report whether administration of the reference standard was blinded.

Funding None

Bibliographic details Manea L, Gilbody S, McMillan D. Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis. CMAJ: Canadian Medical Association Journal 2012; 184(3): E191-E196

PubMedID 22184363

Original Paper URL http://www.cmaj.ca/content/184/3/E191.abstract

Indexing Status Subject indexing assigned by NLM

MeSH Depressive Disorder, Major /diagnosis; Humans; Odds Ratio; Psychiatric Status Rating Scales /standards; Psychometrics; ROC Curve; Sensitivity and Specificity

AccessionNumber 12012037127

Database entry date 22/08/2012

Record Status This is a critical abstract of a systematic review that meets the criteria for inclusion on DARE. Each critical abstract contains a brief summary of the review methods, results and conclusions followed by a detailed critical assessment on the reliability of the review and the conclusions drawn.

Database of Abstracts of Reviews of Effects (DARE) Produced by the Centre for Reviews and Dissemination Copyright © 2012 University of York