New NICE Menu for Depression

The proposed Guidance, published last month, excludes consideration of assessment. Its recommendations are therefore built on sand. Depression can occur in a variety of contexts and alongside other disorders, but NICE’s response is that this doesn’t matter so long as there is a high score on a depression psychometric test. The clinician, not the client, holds the menu and takes the client through the options in a set order. For ‘less severe’ depression, group CBT is to be canvassed with clients first, with group behavioural activation next in line, despite the fact that the latter group modality has not been assessed with blind independent assessors.


NICE advocates different pathways for ‘less’ and ‘more severe’ depression, with a cut-off of 16 on the PHQ-9. De facto the authors rubber-stamp the widely held practice, reflected in the Improving Access to Psychological Therapies (IAPT) Service, of routing high scorers on a depression psychometric test (e.g. a PHQ-9 score of 10 or greater) to treatment for depression. But patients with a wide range of disorders, including panic disorder, PTSD, obsessive compulsive disorder and adjustment disorder, have elevated depression scores. Nevertheless, NICE signals a diversion along a depression pathway, with one fork for the ‘less severe’ and another for the ‘more severe’. Clinicians and clients are likely to be equally bemused by the ‘road signs’, and the upshot is likely to be misguided treatment.

NICE have invited the public to comment on their intended guidance on the treatment of depression. Each comment has to specify the particular paragraph it is about, so the process is somewhat tedious, and you may well decide to write your Christmas cards instead.
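To make the score-only routing criticised in this section concrete, here is a minimal sketch in Python. It assumes that NICE’s cut-off of 16 means “16 or above is ‘more severe’” and combines it with the IAPT practice of treating a PHQ-9 score of 10 or more as depression; the `Client` record, the `route` function and the example clients are hypothetical illustrations, not anything published by NICE or IAPT.

```python
# A minimal sketch of score-only routing, assuming the thresholds quoted in
# the text: the IAPT practice of treating a PHQ-9 score of 10 or more as
# depression, and NICE's cut-off of 16 (ASSUMED here to mean 16 or above is
# 'more severe'). The Client record and route() are hypothetical.
from dataclasses import dataclass

PHQ9_MIN, PHQ9_MAX = 0, 27  # the PHQ-9 has 9 items, each scored 0-3
IAPT_DEPRESSION_THRESHOLD = 10  # "PHQ-9 score 10 or greater"
NICE_MORE_SEVERE_CUTOFF = 16  # assumed to mean 16 or above


@dataclass
class Client:
    label: str
    primary_problem: str  # recorded, but never consulted by route() below
    phq9_total: int


def route(client: Client) -> str:
    """Return a pathway from the PHQ-9 total alone, ignoring diagnosis."""
    score = client.phq9_total
    if not PHQ9_MIN <= score <= PHQ9_MAX:
        raise ValueError(f"PHQ-9 total must be 0-27, got {score}")
    if score < IAPT_DEPRESSION_THRESHOLD:
        return "not routed to depression treatment"
    if score >= NICE_MORE_SEVERE_CUTOFF:
        return "'more severe' depression pathway"
    return "'less severe' depression pathway"


if __name__ == "__main__":
    # Hypothetical clients: the primary problem plays no part in the routing.
    clients = [
        Client("A", "panic disorder", 17),
        Client("B", "PTSD", 12),
        Client("C", "depression", 17),
    ]
    for c in clients:
        print(f"{c.label} ({c.primary_problem}, PHQ-9 {c.phq9_total}): {route(c)}")
```

Running it places client A (panic disorder) on exactly the same ‘more severe’ depression pathway as client C (depression), because nothing in the rule looks at what the client is actually suffering from.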


Generalising from Low Quality Studies

In assessing the outcome studies NICE do not take seriously the concept of a minimally important difference (MID), i.e. the minimum change a patient would need to see before saying that treatment has made a real-world difference. There is no evidence that patients would regard a change of score on a psychometric test as conferring a real-world difference. But they would recognise being back to their old self or best functioning, and possibly no longer suffering from the disorder, so loss of diagnostic status would be a reasonable proxy for an MID. However, only a minority of studies furnish this data using blind assessors. Inferences can therefore only properly be drawn from this sub-population of studies, which excludes the low intensity studies. As an exemplar, see the comparison of group CBT and group behavioural activation at the end of this document.




Under the proposed Guidance, clients’ preferences are paramount. If the client is judged as having ‘less severe’ depression and volunteers no treatment preference, they are to be taken through a menu of options in a set order: first group cognitive behavioural therapy, second group behavioural activation, third individual CBT, and so on down to the 11th option, short-term psychodynamic therapy. For ‘more severe’ depression, top of the league is individual CBT plus antidepressants, in 2nd place individual CBT, in 3rd place individual behavioural activation, and in 10th and last place group exercise. The ‘more severe’ route is more labour intensive, and there is likely to be congestion, as approximately half of those entering IAPT have mean scores of 15 or more on the PHQ-9 [Saunders et al (2020)]. Unwittingly, the Guidance spells the end of low intensity interventions, because none of the top of the league options are low intensity! Yet 70% of clients entering the IAPT service are given a low intensity intervention first. However, there is nothing to prevent a Service Provider declaring that ‘unfortunately none of the top of the league options are currently available’, so that recourse has to be made to options in danger of relegation.

Psychometric Test Results Can Only Be Considered in Context


The NICE guidance assumes that psychometric test results speak for themselves, but they are only meaningful when interpreted in context. To my knowledge there is no study of the reliability of the PHQ-9 in UK routine mental health services against a ‘gold standard’ diagnostic interview. Rather, data on the PHQ-9 has been extrapolated from US studies of psychiatric outpatients, a population with a high prevalence of depression, and these studies did not use a ‘gold standard’ diagnostic interview [the PRIME-MD was used instead, with insufficient distinction between this interview and the questions on the PHQ-9]. It is the author’s experience that in the UK the PHQ-9 gives a large number of false positives compared to a reliable diagnostic interview such as the SCID.


The Need to Contextualise Outcome Studies

NICE has a ‘blind spot’ about context. In its analysis of outcome studies it lumps together ‘depression studies’ that relied wholly on self-report measures with those that included the results of a diagnostic interview as an outcome measure. Outcome is assessed in terms of statistical differences either between different modes of service delivery, e.g. stepped v non-stepped care, or between different treatments, e.g. CBT v waiting list. There was no attempt to discern what proportion of clients in each arm of a study would have regarded themselves as back to their normal selves or best functioning post-treatment [or, in lieu of this, had lost their diagnostic status], nor the duration of those gains. Rather than being asked to state preferences among treatments they largely know nothing about, patients would be very interested in the likelihood of treatment making a real-world difference to their lives, i.e. a difference that they would care about.

The Need to Consider Effectiveness Studies Not Just Efficacy Studies

NICE’s failure to look at context is highlighted by the top league place it gives to group CBT for less severe depression. There is no mention that in our study [Scott and Stradling (1990)] of individual and group CBT for depression in Toxteth, Liverpool, the invitation to group CBT went down like a ‘lead balloon’ and we had to change the protocol to include up to 3 individual sessions in the ‘group’ arm. Entry was determined by independent diagnostic interview, but mean entry Beck Depression Inventory scores were around 28, so the population was likely ‘more severe’ in NICE terms. NICE also fails to critically appraise the Group Behavioural Activation studies, despite having previously called for BA studies to include observer-rated assessments. It might also have added the need for credible attention control comparisons. NICE is content with statistical sweeps of large data sets rather than trying to discern what is happening at the coal face.

Ignoring the Pandemic

NICE puts group interventions top of the league for less severe depression, but ignores the context of the pandemic. Realistically, how possible will it be to get 2 therapists together with 8 clients for 90 minutes a week for 8 weeks, all face to face, with masks? The logistics and effectiveness of conducting groups online are a venture into the unknown. NICE appears to operate without contextualising its findings.


Failing to Pay Attention to the Detail of Group Interventions

In 2019 Kellett et al published a paper in Behavior Therapy [50 (2019) 864–885] whose abstract advocates Group Behavioral Activation as a front-line treatment for depression. The abstract also claims a moderate to large effect on depressive symptoms. NICE appears not to have read further than the abstract, but closer inspection reveals that the conclusions are deeply flawed.

In passing, the abstract mentions that the standardized mean difference (SMD) between group BA and waiting list was 0.72. This figure would cause few people to question the findings, but the results are actually of doubtful clinical relevance, because it means there is less than one standard deviation of difference in outcome between the treated group and the waiting list. Suppose a group of depressed patients had a mean Beck Depression Inventory score of 28 at the start of treatment [assuming a spread of results, i.e. a standard deviation, of 7, taken from the Scott and Stradling (1990) study, Behavioural Psychotherapy, 18, 1-19]. A mean score of 23 at the end of treatment would then produce an SMD of 0.71, about the same as in the University of Sheffield analysis. The average person experiencing this change of score is unlikely to feel that they are back to their normal self, and is likely to view it as part of the normal cycling of mood, influenced by positive events, e.g. the company and support of fellow sufferers for a time in a group. In none of the Group BA studies was there an independent assessor determining whether clients were still depressed, or the permanence of any change. Unsurprisingly, the authors found that Group BA was no better than any other active treatment (i.e. when controlling for attention and expectation), and make an implicit plea for the Dodo verdict, ‘all therapies are equal and must have prizes’.
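The arithmetic in the paragraph above can be checked with a few lines of Python. This is a sketch that mirrors the author’s illustration only: it treats the SMD as a difference in means divided by an assumed common standard deviation of 7 BDI points, whereas published meta-analyses typically use pooled standard deviations and small-sample corrections. The function names are hypothetical.

```python
# A minimal sketch of the arithmetic above. The SMD is taken simply as the
# difference in means divided by an ASSUMED common standard deviation of
# 7 BDI points (the spread quoted from Scott and Stradling, 1990). Real
# meta-analyses use pooled SDs and small-sample corrections, so this only
# illustrates scale; it is not a re-analysis of any study.


def standardized_mean_difference(mean_before: float, mean_after: float, sd: float) -> float:
    """Difference between two means expressed in standard-deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_before - mean_after) / sd


def points_for_smd(smd: float, sd: float) -> float:
    """Raw score change corresponding to a given SMD on the same scale."""
    return smd * sd


if __name__ == "__main__":
    ASSUMED_SD = 7.0  # BDI standard deviation assumed in the text
    smd = standardized_mean_difference(28.0, 23.0, ASSUMED_SD)
    print(f"BDI 28 -> 23 with SD 7: SMD = {smd:.2f}")  # 5 / 7
    # Working backwards: the reported SMD of 0.72 is roughly 5 BDI points.
    print(f"SMD 0.72 with SD 7: {points_for_smd(0.72, ASSUMED_SD):.2f} BDI points")
```

This prints an SMD of 0.71 for a fall from 28 to 23, and shows that, on the same assumed spread, the reported 0.72 corresponds to a change of only about 5 points on the BDI.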

In the body of the BA paper the authors acknowledge that the Group BA studies, save one, are of low quality, and that analyses were on treatment completers rather than on the more rigorous intention-to-treat basis. But there is no indication anywhere of what proportion of people recover from depression with any permanence.

In 1990 Steve Stradling and I published [Behavioural Psychotherapy, 18, 1-19] a study of depressed clients comparing group CBT, individual CBT and a waiting list condition. For group CBT the initial mean BDI was 29.0 and the end of treatment score 6.2, whilst for individual treatment the comparable scores were 28.21 and 11.53. However, those on the waiting list also improved, from 25.89 initially to 20.26 at the end of the waiting period. Thus it is far from clear that the results from the University of Sheffield analysis on Group BA are actually better than those of putting people on a waiting list.
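The comparison in the paragraph above can be made explicit with a short Python sketch. Using the same assumed standard deviation of 7 BDI points as in the earlier illustration, it expresses each arm’s pre-post change from the 1990 study in standard-deviation units. These are within-arm changes rather than between-group SMDs, so setting the waiting-list figure beside the reported Group BA SMD of 0.72 is indicative only; the names in the code are hypothetical.

```python
# A minimal sketch expressing the pre-post BDI changes reported for
# Scott and Stradling (1990) in rough SD units, using the same ASSUMED
# standard deviation of 7 as the Group BA illustration. These are
# within-arm changes, not between-group SMDs, so the comparison with the
# Group BA figure is indicative only.

ASSUMED_SD = 7.0
REPORTED_GROUP_BA_SMD = 0.72  # Group BA v waiting list, from the Kellett et al abstract

# Arm name -> (initial mean BDI, mean BDI at end of treatment or end of wait)
ARMS = {
    "group CBT": (29.0, 6.2),
    "individual CBT": (28.21, 11.53),
    "waiting list": (25.89, 20.26),
}


def change_in_sd_units(before: float, after: float, sd: float = ASSUMED_SD) -> float:
    """Pre-post improvement divided by the assumed standard deviation."""
    return (before - after) / sd


if __name__ == "__main__":
    for arm, (before, after) in ARMS.items():
        change = before - after
        print(f"{arm}: {change:.2f} BDI points, {change_in_sd_units(before, after):.2f} SDs")
    wl = change_in_sd_units(*ARMS["waiting list"])
    print(f"waiting-list change ({wl:.2f}) v reported Group BA SMD ({REPORTED_GROUP_BA_SMD})")
```

On these assumptions the waiting list alone improved by about 5.6 points, roughly 0.80 standard deviations, which is of the same order as the 0.72 reported for Group BA against a waiting list, while group and individual CBT improved by roughly 3.3 and 2.4 standard deviations respectively.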

Dr Mike Scott