Two psychometric tests, the PHQ-9 for depression and the GAD-7 for generalised anxiety disorder, are the twin pillars used by the NHS Talking Therapies service [formerly Improving Access to Psychological Therapies (IAPT)] to direct therapy and evaluate outcomes. The pillars form a gate through which the client is expected to pass at every treatment session, leaving less time for listening to and treating clients.
NHS Talking Therapies clinicians are not trained to make diagnoses, so the tests are the sole arbiter of the service's effectiveness. I made a Freedom of Information request to NHS England asking for details of the clinicians' experience and the cost of the service; bizarrely, they said that they did not have this information. Drawing on data from the rest of mental health services, it seems likely that most practitioners have been in post for less than 3 years, over 80% are female and most are aged 40 or under. It stretches credibility to believe that these practitioners are sufficiently competent, or sufficiently diverse, for the public they serve.
Unfortunately, other agencies, such as the charity Anxiety UK, have felt compelled to adopt IAPT's metrics. Viewed through the lens of the recent meta-analysis by Negeri et al. (2021) of the accuracy of the PHQ-9 in screening for depression, the result is chaos. The chaos is compounded when viewed through a second lens: the accuracy of the GAD-7 in different settings.
The Misuse of the PHQ-9
Negeri et al. (2021) provide a tool to indicate the likely consequences of using the PHQ-9 on its own. The first step is to enter the likely prevalence of depression in the target population (in primary care they suggest it is likely to be 5 to 10%, and in specialty care settings or among those with chronic health conditions 10 to 20%). Entering a prevalence of 10% for depression among those presenting to IAPT, and using the standard cut-off score of 10 or more, 22% of clients (22 out of 100) would screen positive. Of these 22, 9 (39%) would meet diagnostic criteria for major depression (true positives) and 13 (61%) would not (false positives). Thus more than one in two of those screening positive would be given inappropriate treatment.
Alternatively, entering a prevalence of 15% (perhaps more accurate if the population includes those with long-term conditions), 26% (26 out of 100) would screen positive. Of these 26, 13 (50%) would meet diagnostic criteria for major depression (true positives) but 13 (50%) would not (false positives). Thus one in two of those screening positive would be treated for depression when they did not need to be.
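The arithmetic behind these figures is the positive predictive value: of those who screen positive, what share actually have the disorder. The minimal sketch below reproduces it for 100 clients. The sensitivity and specificity of 0.85 at the cut-off of 10 are assumed values, not figures stated above; they were chosen because they reproduce the quoted counts (8.5 true and 13.5 false positives at 10% prevalence, 12.75 of each at 15%, which the tool reports as whole people). Check them against the Negeri et al. (2021) tool before relying on them.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    true_positives: float
    false_positives: float
    screen_positive: float
    ppv: float  # positive predictive value: share of screen-positives who have the disorder


def screen(prevalence: float, sensitivity: float, specificity: float, n: int = 100) -> ScreeningOutcome:
    """Expected outcomes when a screening test is applied to n people."""
    with_disorder = n * prevalence
    without_disorder = n - with_disorder
    true_pos = sensitivity * with_disorder            # correctly flagged
    false_pos = (1 - specificity) * without_disorder  # flagged but without the disorder
    positives = true_pos + false_pos
    return ScreeningOutcome(true_pos, false_pos, positives, true_pos / positives)


# ASSUMED accuracy of the PHQ-9 at a cut-off of 10 (not stated in the text);
# these values reproduce the 10% and 15% prevalence figures quoted above.
SENSITIVITY, SPECIFICITY = 0.85, 0.85

for prevalence in (0.10, 0.15):
    r = screen(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.0%}: {r.screen_positive:.1f} screen positive, "
          f"{r.true_positives:.1f} true, {r.false_positives:.1f} false, PPV {r.ppv:.0%}")
```

On these assumptions the positive predictive value is about 39% at 10% prevalence and 50% at 15%, so the lower the true prevalence, the larger the share of screen-positives who do not have depression.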
Using the PHQ-9, IAPT's clinicians are, as often as not, treating the wrong disorder. How, then, can their results (a claimed 50% recovery rate) be compared with those of randomised controlled trials for depression, in which all the clients were known, on the basis of a ‘gold standard’ interview, to be suffering from depression?
The Use of the GAD-7 by Agencies in Addition to the PHQ-9 Adds to the Misdirection and Makes Their Claims of Effectiveness Even Less Credible
Rutter and Brown (2017) concluded that the GAD-7 is ‘a dimensional indicator of GAD severity rather than a screening tool for the presence or absence of the disorder in outpatients with anxiety and mood disorders’ and that ‘the GAD-7 did not provide sufficient specific information to indicate the presence of a GAD diagnosis’. At a cut-off of 10 the sensitivity was 79.5% and the specificity 44.7%; at a cut-off of 8 the sensitivity was 86.5% but the specificity was 34.8%. By contrast, in the validation study of the GAD-7 by Spitzer et al. (2006) the optimal cut-off was a score of 10 or more: 89% of patients with GAD had GAD-7 scores of 10 or greater (sensitivity), whereas 82% of patients without GAD had scores below 10 (specificity).
The psychometric properties of the GAD-7 have also been examined in a psychiatric sample with heterogeneous diagnoses. Beard and Björgvinsson (2014) found poor specificity and a high false positive rate for specific anxiety disorders, and the cut-off of ≥10 proposed by Spitzer et al. (2006) was only partly supported, with a sensitivity of 74% and a specificity of 54%. Kroenke et al. (2007) found that the GAD-7 performed well as a screener for GAD, post-traumatic stress disorder (PTSD), social anxiety disorder (SAD) and panic disorder (PD) in primary care patients, and proposed a cut-off score of 8, at which the positive likelihood ratio was above 3. Since Kroenke et al. (2007) share their authors with Spitzer et al. (2006), it appears that it is only the authors of the GAD-7 who claim its value.
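One way to put these figures side by side is the positive likelihood ratio, LR+ = sensitivity / (1 − specificity): how many times more likely a score at or above the cut-off is in someone with GAD than in someone without. The sketch below uses only the sensitivity and specificity values quoted above; the formulas are the standard ones, but the comparison is an illustration rather than an analysis reported by the studies themselves, and the dictionary labels are hypothetical.

```python
# (sensitivity, specificity) pairs for the GAD-7, as quoted in the text above
STUDIES = {
    "Rutter & Brown, cut-off 10": (0.795, 0.447),
    "Rutter & Brown, cut-off 8": (0.865, 0.348),
    "Spitzer et al., cut-off 10": (0.89, 0.82),
    "Beard & Björgvinsson, cut-off 10": (0.74, 0.54),
}


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # how much a positive score raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative score lowers the odds
    return lr_pos, lr_neg


for label, (sens, spec) in STUDIES.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

On these figures LR+ is about 4.9 in the Spitzer et al. validation but only roughly 1.3 to 1.6 in the Rutter and Brown and the Beard and Björgvinsson samples, so in those settings a positive GAD-7 screen shifts the odds of GAD very little.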
Getting Real
The most plausible explanation for IAPT's claimed recovery rate is that the service has engaged in self-promotion. Realistically, only the tip of the iceberg of IAPT clients recover (Scott, 2018).
But it is not only IAPT that is making false claims; so too are other service providers. There is a pressing need for an independent audit, using ‘gold standard’ assessments, of the trajectory of clients' lives after treatment.
Beard, C., and Björgvinsson, T. (2014). Beyond generalized anxiety disorder: psychometric properties of the GAD-7 in a heterogeneous psychiatric sample. J. Anxiety Disord. 28, 547–552. doi: 10.1016/j.janxdis.2014.06.002
Kroenke, K., Spitzer, R. L., Williams, J. B. W., Monahan, P. O., and Löwe, B. (2007). Anxiety disorders in primary care: prevalence, impairment, comorbidity, and detection. Ann. Intern. Med. 146, 317–325. doi: 10.7326/0003-4819-146-5-200703060-00004
Rutter, L. A., and Brown, T. A. (2017). Psychometric properties of the generalized anxiety disorder scale-7 (GAD-7) in outpatients with anxiety and mood disorders. J. Psychopathol. Behav. Assess. 39, 140–146. doi: 10.1007/s10862-016-9571-9
Spitzer, R. L., Kroenke, K., Williams, J. B. W., and Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch. Intern. Med. 166, 1092–1097. doi: 10.1001/archinte.166.10.1092
Dr Mike Scott