The Mistreatment of IAPT Clients – The Smoking Gun

What right has the Improving Access to Psychological Therapies (IAPT) Service to routinely label each client with a diagnostic code (from the ICD, the World Health Organisation’s International Classification of Diseases) when the Organisation’s Manual states that it does not do diagnosis? Fearful of litigation, it states that its diagnoses should not be used for medico-legal purposes, but as the code is the determinant of treatment, IAPT should be in the dock!

The Improving Access to Psychological Therapies (IAPT) service screens clients for treatment using the PHQ-9, but a study published in the British Medical Journal by Brooke Levis et al last year indicates that half of those deemed to be depression cases have been incorrectly diagnosed.

In high quality randomised controlled trials (rcts) of the treatment for depression, all clients admitted have been diagnosed as having depression according to a ‘gold standard’ diagnostic interview such as the SCID. The recovery rate in the rcts is 50%. But IAPT claims that it approaches the recovery rate of rcts. This is preposterous! Consider 100 IAPT cases which score at or above the PHQ-9 cut off of 10. One half of them, i.e. 50, will not actually have depression and therefore cannot recover from the disorder. Of the other half, 50 cases, if the IAPT clinicians were as good as in the rcts, 25 would recover. Thus the maximum possible recovery rate for depression in IAPT is 25%, and this is assuming its clinicians are as good as the highly trained clinicians in rcts. More plausibly the recovery rate for depression in IAPT is the 14.9% I found, using the SCID, in my independent study of IAPT, DOI: 10.1177/1359105318755264.
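The arithmetic in the paragraph above can be set out as a short sketch. The function name is purely illustrative, and the two inputs (half of screen-positive clients truly depressed, a 50% recovery rate in rcts) are the figures assumed in the text, not new data.

```python
def recovery_ceiling(true_case_fraction, trial_recovery_rate):
    """Upper bound on the proportion of screen-positive clients who can
    recover from depression, assuming only true cases can recover and that
    therapists match the recovery rate seen in rcts."""
    return true_case_fraction * trial_recovery_rate


screen_positive = 100                  # clients scoring 10 or more on the PHQ-9
true_cases = screen_positive * 0.5     # half actually have depression
recovered = true_cases * 0.5           # rct-level recovery among true cases

ceiling = recovery_ceiling(0.5, 0.5)
print(f"True cases: {true_cases:.0f}, recovered: {recovered:.0f}")
print(f"Ceiling on recovery rate: {ceiling:.0%}")  # 25%
```

Any shortfall in therapist skill relative to the rcts only lowers this figure further.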

In primary care 22% of patients score over 10 on the PHQ-9, so what are the treatment implications for the likely 3 out of 4 IAPT clients who score below 10? For these the PHQ-9 offers no direction.

But IAPT has its own answer, IAPT Manual, p 24 (2019): a) come up with a problem descriptor, then choose an ICD-10 code that ‘matches’ the descriptor, and then b) choose a NICE treatment that matches the ICD-10 code. Consider an IAPT client who reports that they are feeling emotionally numb at work, detached from others and fatigued after little exercise. The therapist could plump for depression, burnout, chronic fatigue syndrome or the effects of COVID-19, with no guidance as to the appropriate label!

Using the IAPT system, Delgadillo et al (2020) classified over 40% of clients as having ‘Affective Disorder’ and over 20% as having a ‘mixed disorder’. But there are no randomised controlled trials for ‘affective disorder’ or ‘mixed’, so that for 60% of IAPT’s clients there cannot be an appeal to an evidence based treatment (i.e. one based on a randomised controlled trial). Considering again a sample of 100 IAPT clients who score less than 10 on the PHQ-9, 60 of them will have been labelled with a disorder for which there can be no evidence based treatment. This leaves 40 clients who in principle could be treated with an evidence based treatment. Again assuming that for this population of 40, which allegedly covers GAD (10-12%), panic disorder (4-6%), social anxiety disorder (4-6%), specific phobia (0.5-1.0%), OCD (4-5%), PTSD (6-8%) and other (2-3%), there was an overall recovery rate of 50%, only 20% of the allegedly ‘non-depressed’ clients would recover. This 20% would have to be regarded as an upper limit because it assumes the IAPT therapist would be as skilled as the highly trained therapists involved in the rcts for anxiety disorders. A more realistic estimate of recovery for the IAPT ‘anxious clients’ would be the 14.2% found in my study of IAPT clients, DOI: 10.1177/1359105318755264.
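As a rough check on the figures in the paragraph above, the listed percentage ranges can be summed and halved. This is only a sketch: the ranges are copied from the text, the 50% recovery rate is the assumption made there, and the variable names are illustrative.

```python
# Percentage ranges (low, high) as listed in the paragraph above.
ranges = {
    "GAD": (10, 12),
    "panic disorder": (4, 6),
    "social anxiety disorder": (4, 6),
    "specific phobia": (0.5, 1.0),
    "OCD": (4, 5),
    "PTSD": (6, 8),
    "other": (2, 3),
}

ASSUMED_TRIAL_RECOVERY = 0.5  # the rct-level recovery rate assumed in the text

low_total = sum(low for low, _ in ranges.values())
high_total = sum(high for _, high in ranges.values())

# Ceiling on recovery, as a percentage of all 100 clients in the sample.
low_ceiling = low_total * ASSUMED_TRIAL_RECOVERY
high_ceiling = high_total * ASSUMED_TRIAL_RECOVERY

print(f"Clients with a trial-backed label: {low_total}% to {high_total}%")
print(f"Ceiling on recovery: {low_ceiling}% to {high_ceiling}%")
```

On these ranges the round figure of 40 clients sits near the top of what the listed categories add up to, which is consistent with treating 20% as an upper limit.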

The other metric employed by IAPT is the GAD-7, a measure of the severity of generalised anxiety disorder (GAD), but as according to IAPT it has only been relevant to one in 10 of its service users, any effect of the treatment of this disorder will only affect the above picture minimally. Assuming a 50% recovery the effect will be even less, and less still when one compares the training of therapists in GAD rcts with the training of the routine IAPT therapist.

IAPT’s sole reliance on psychometric tests and fudge has backfired badly, but it is the client who suffers most, with therapists suffering from the recoil.

Number Theatre and Routine Mental Health

The National Institute for Health Research has just published a review of studies of the psychological treatment of Medically Unexplained Symptoms (MUS) [Leaviss J, Davis S, Ren S, Hamilton J, Scope A, Booth A, et al. Behavioural modification interventions for medically unexplained symptoms in primary care: systematic reviews and economic evaluation. Health Technol Assess 2020;24(46)], but in all studies the primary outcome measure was an improvement of symptoms on some psychometric test. No categorical measure was used, such as no longer suffering from a ‘disorder’ such as fibromyalgia, irritable bowel syndrome or chronic fatigue syndrome post treatment. Likewise the Improving Access to Psychological Therapies (IAPT) service markets its success on a change in score on two psychometric tests, the PHQ-9 and GAD-7. Further, whether or not an IAPT clinician is to be subjected to a formal review of competence is based on a change of score on these measures. No categorical measure is used, such as the proportion of cases of depression, panic disorder, generalised anxiety disorder etc that have lost their diagnostic status. Sir David Spiegelhalter, the statistician, has coined the term ‘number theatre’ to describe the way in which the UK Government has promulgated statistics in relation to the Pandemic, but this drama has been playing for years in the mental health arena. I am reminded of a line from a song somewhere, ‘I am more than a number in a little red book’. Although intended for a very different context, it seems particularly apt for IAPT.

Damned Lies

Number theatre in the mental health field has it seems been driven by the desire of psychologists to colonise. It is a reaction against the categorical labels employed by psychiatry. But the truth of the matter is both are needed simultaneously. To take a medical example, if I have a heart problem I need to know what the problem is but also my blood pressure today.

IAPT will topple because it pivots on psychometric tests. Inspection of its main pillar, the PHQ-9, exposes a crumbling structure:

  1. A client’s judgement of their functioning does not match changes on the PHQ-9. Thus an IAPT therapist might report to their supervisor an ‘improvement’ in the client’s score on the PHQ-9 and at the same time report that the client said they are ‘the same old’. The overall judgement of the client is likely to be dismissed in favour of the alleged ‘moving towards recovery’ or ‘recovery’ on the PHQ-9.
  2. In the initial validation study of the PHQ-9 by Kroenke and Spitzer, it was not validated against a ‘gold standard’ that was sufficiently different from it to make it an acceptable diagnostic aid according to the AMSTAR.
  3. The findings of the progenitors of the PHQ-9, Kroenke and Spitzer, were not replicated by independent researchers using a ‘gold standard’ diagnostic interview such as the SCID.
  4. The diagnostic accuracy of an instrument depends very much on the prevalence of the disorder in the population in which it was first evaluated. In the case of the PHQ-9 this was psychiatric outpatients in the United States. There is no reliable evidence (as assessed by a standardised diagnostic interview) on the prevalence of disorders amongst those attending IAPT (which include both self referrers and GP referrals). Thus the clinical utility of the PHQ-9 in this context is unknown.
  5. The PHQ-9 is purportedly a measure of the severity of depression, but there is poor concordance between it and alternative measures of severity such as the HAD, i.e. a person would be in a different category of severity depending on which measure is used.

  6. The use of a psychometric test with a summary score assumes that each of the items (9 in the case of the PHQ-9) contributes equally to the total score. But this is implausible: an item about suicidal ideation (item 9 on the PHQ-9) is likely to be more significant than an item about fatigue.
  7. Two patients could have the same score on the PHQ-9, with one patient endorsing all items at intermediate scores whilst the second endorses several items at the highest score. The same score, but arguably a quite different meaning.
  8. The PHQ-9 assumes that it is the frequency of a symptom that is the determinant of severity rather than the intensity.
  9. Unless the mechanism by which a PHQ-9 score is changed is known, it cannot be determined that an evidence based treatment was in fact used. Thus those getting a supposed ‘result’ may be more at fault than those acknowledging non-response; the latter may simply be more honest.
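The points above about equal weighting and identical totals can be illustrated with a minimal sketch. It assumes the standard PHQ-9 scoring of nine items each rated 0 to 3 and the cut off of 10 discussed earlier; the two response profiles and the function name are hypothetical, made up purely for illustration.

```python
CASENESS_CUTOFF = 10  # a total of 10 or more counts as a 'case'


def phq9_total(items):
    """Sum nine item scores, each 0 to 3. Every item carries equal weight."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine item scores, each between 0 and 3")
    return sum(items)


# Hypothetical profiles: the last entry in each list is item 9.
patient_a = [1, 1, 1, 2, 2, 2, 1, 1, 1]  # every item at an intermediate score
patient_b = [3, 3, 3, 3, 0, 0, 0, 0, 0]  # four items at the highest score

for name, items in (("A", patient_a), ("B", patient_b)):
    total = phq9_total(items)
    print(
        f"Patient {name}: total={total}, "
        f"case={total >= CASENESS_CUTOFF}, "
        f"items at maximum={items.count(3)}"
    )
```

Both profiles produce the same total and the same ‘case’ label, although the pattern of responses behind them is quite different.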

These considerations on the PHQ-9 may not be prohibitive of its use, if employed in the context of a standardised diagnostic interview that has established the person has depression. But such an interview would likely also yield the presence of one or more coexisting disorders. The trajectory of these additional disorders would have to be tracked by other psychometric tests that are pertinent to the disorder. The idea that the PHQ-9 can stand alone as judge and jury on a client’s mental health is absurd.

However, politicians, public health bodies and clinical commissioning groups like to be told that there is a simple solution to a problem and that they can make a difference by implementing the chosen solution. Enter stage right IAPT, proclaiming ‘give the PHQ-9, reduce it below 10, job done, and woe betide any clinician who does not manage this routinely’. As an encore IAPT uses numbers, e.g. throughput of clients and waiting lists, to placate politicians and funders. Exhaustion, numbing and detachment [burnout] are an inevitable consequence of these working conditions. No amount of self-reflection, as advocated by a Psychological Wellbeing Practitioner in the current issue of CBT Today, is going to make a real world difference. It is a shame that CBT Today has become IAPT’s comic.

IAPT’s Flagship The PHQ-9 Hits An Iceberg

The PHQ-9, IAPT’s sole determinant of depression, identifies two and a half times as many people as depressed compared to the ‘gold standard’ diagnostic interview, the SCID; see Levis et al (2020) in this month’s issue of the Journal of Clinical Epidemiology. This makes for garbage assessment (GA); the same measure is used by the service to measure outcome, hence garbage outcome (GO). In the computer world the acronym GIGO is used to denote that if you put garbage into a computer you get garbage out. For mental health clinicians there should be a new addition to the lexicon, GAGO. But who is answerable for flagship IAPT (Improving Access to Psychological Therapies) racing towards the iceberg? Who is going to pick up the survivors?

Levis et al (2020) comment that the PHQ-9 results are rather like a positive mammogram test for breast cancer: the result would give a grossly inflated view of the prevalence of breast cancer. They argue that the PHQ-9 results only have any validity in the context of a standardised diagnostic interview. IAPT has never used such an interview; its claim for a 50% recovery rate is outrageous.
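The implication of the ‘two and a half times’ figure can be sketched as follows. If the screen flags 2.5 times as many people as truly have the disorder, then even in the best case, where every true case is flagged, true cases cannot make up more than 1/2.5 of those flagged. The function name is illustrative and the only input is the ratio quoted above.

```python
def max_true_case_share(over_identification_ratio):
    """Best-case share of screen positives who are true cases, given how many
    times more people the screen flags than the reference standard does.
    Assumes every true case is flagged; missed cases would lower the share."""
    if over_identification_ratio <= 0:
        raise ValueError("ratio must be positive")
    return min(1.0, 1.0 / over_identification_ratio)


share = max_true_case_share(2.5)
print(f"At most {share:.0%} of those flagged can be true cases")  # 40%
```

This is a ceiling, not an estimate: it says nothing about how many true cases the screen misses.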

Interestingly the Journal paper found that there was no cut off score on the PHQ-9 that meaningfully differentiated those in need of treatment from those who were not. Yet IAPT uses a cut off score of 10 or more to denote a ‘case’, with an implicit treatment requirement. Those scoring 10 or more could be suffering from almost anything: adjustment disorder, a specific phobia, binge eating disorder etc, or be simply cheesed off with their debility following, say, the development of sepsis after an operation and ongoing impairment.

Ironically the fault with the PHQ-9 may lie in its origins. It was validated not against an acknowledged reference standard, such as the SCID, but against the PRIME-MD, Kroenke et al (2001) J GEN INTERN MED 16:606-613, which asks in interview form exactly the same questions as on the PHQ-9. This contravenes one of the STARD requirements [Cohen et al (2016) doi:10.1136/bmjopen-2016-012799] for judging the diagnostic accuracy of a test, in that the reference standard must contain much more detailed information (e.g. about levels of functional impairment) than that contained in the index test.

The development of the PHQ-9 and the PRIME-MD were both funded by Pfizer Pharmaceuticals. The over identification of cases of depression is clearly in the interests of a pharmaceutical company. Clinicians also welcome with open arms anything that appears to reduce the assessment burden. Their employers, such as IAPT, can rejoice that this surrogate for reliable diagnosis shows a reducing score with time [regression to the mean: Gilbody et al. (2015) looked at how GP patients with a PHQ-9 score of greater than 10 fare with usual treatment over a four-month period; their mean PHQ-9 score reduced from 16 to 9], which they can publicly misattribute to the benefits of therapy, and can convince the more naive of their clinicians that they are making a real world difference. IAPT continues in its bubble, shared with the supposed UK lead organisation for cognitive behaviour therapy, the British Association for Behavioural and Cognitive Psychotherapies (BABCP).
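Regression to the mean, mentioned in the paragraph above, can be shown with a small simulation. This is a sketch under invented assumptions, not a reanalysis of Gilbody et al.: the distribution of underlying scores, the amount of measurement noise and the sample size are all hypothetical, and nobody in the simulation receives any treatment or changes at all.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

CUTOFF = 10     # screen-positive threshold used for selection
MAX_SCORE = 27  # nine items, each scored 0 to 3


def observed_score(underlying, noise_sd=3.0):
    """One noisy measurement of an unchanging underlying score."""
    score = round(random.gauss(underlying, noise_sd))
    return max(0, min(MAX_SCORE, score))


# Hypothetical population: underlying severity never changes between visits.
underlying_scores = [random.gauss(8, 4) for _ in range(10_000)]
baseline = [observed_score(u) for u in underlying_scores]
follow_up = [observed_score(u) for u in underlying_scores]

# Keep only those who screened positive at baseline.
selected = [(b, f) for b, f in zip(baseline, follow_up) if b >= CUTOFF]
mean_baseline = statistics.mean([b for b, _ in selected])
mean_follow_up = statistics.mean([f for _, f in selected])

print(f"Selected at baseline: {len(selected)}")
print(f"Mean score at baseline:  {mean_baseline:.1f}")
print(f"Mean score at follow-up: {mean_follow_up:.1f}")
```

Because people are selected on a high, noisy baseline score, their mean score at follow-up is lower even though nothing has been done to them.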

IAPT And The Abuse of Psychometric Tests

A person who consents to a psychometric test has a right to a full explanation of its purposes. I have not met an IAPT client who has been given such an explanation. IAPT employees see it as a requirement of the organisation for ‘audit’, but this is not an explanation. Informed consent means that it has to be explained what would be the consequences of not taking the test; this never happens. It is important that only tests relevant to the purposes of evaluation are given (not to do so probably breaches data protection legislation). But in administering, say, the PHQ-9 the IAPT worker does not know whether this is pertinent to whatever the client is suffering from, e.g. OCD or PTSD (as there is no reliable standardised diagnostic interview). Further, the client isn’t informed of the purpose to which the test result will be put, e.g. it will be used by IAPT in such a way that any positive change on it greater than 6 will be publicised as indicating the difference the Organisation makes. It is not explained that the PHQ-9 was developed with funding from Pfizer, the drug company who would clearly benefit from the overidentification of depression. Further, the PHQ-9 was extracted from the PRIME-MD interview; taken out of this context its meaning is questionable.

Psychologists wield power in IAPT. They know, or at least should know, about the appropriate use of psychometric tests, e.g. if they are administered weekly the person can remember their last response, thus biasing scoring. If on their watch they are allowing others to misuse them then this may be a matter for the HCPC and for some also the University body that employs them. Psychologists and Universities cannot be complicit in a Government Quango marketing itself.

IAPT at Sea On Risk Assessment

A study just published by Na et al (2018) in the Journal of Affective Disorders* suggests that item 9 of the PHQ-9 is an insufficient assessment tool for suicide risk and suicide ideation, creating large numbers of false positives. Yet within IAPT, GPs may be informed either that there are no risk issues, on the basis of a ‘not at all’ response to item 9, ‘thoughts that you would be better off dead or of hurting yourself’, or that there are risk issues, on the basis that the client has been bothered by these thoughts for at least several days in the last 2 weeks. The message is usually communicated to the GP following a telephone assessment conducted by the most junior member of staff, a Psychological Wellbeing Practitioner. The GP then feels obliged to call the patient in for an assessment which turns out to be invariably pointless, not good for the patient or for the GP who may be seeing 40 patients that day!

A 2012 paper on IAPT by Vail et al ** stated ‘that IAPT clinicians did not have set procedures or questions for assessing mental health risk, and were flexible in the approaches they adopted. They often relied upon their own clinical judgement and experience about how to approach the topic of mental health risk’. This chimes with what I found in an analysis of 90 cases going through IAPT, Scott (2018): in only three cases was there mention of risk in the documentation. Inspection of item 9 on the PHQ-9 shows that it confounds passive suicidal ideation with active planning, making it unclear what the frequency response refers to and creating many false positives.

More direct questioning based on the C-SSRS * is probably more appropriate:

Have you started to work out or worked out details of how to kill yourself? Do you intend to carry out this plan?

Have you made a suicide attempt, that is, purposely tried to harm yourself with at least some intention to end your life?

Have you taken any steps to prepare to kill yourself or actually started to do something to end your life or were stopped before you actually did anything?

A ‘no’ response to all 3 questions would indicate no suicide risk.

* Na, P.J. et al (2018) The PHQ-9 item 9 based screening for suicide risk: a validation study of the Patient Health Questionnaire (PHQ) – 9 item 9 with the Columbia Suicide Severity Rating Scale (C-SSRS). Journal of Affective Disorders, 232, 34-40.

** Vail, L. et al (2012) Investigating mental health risk assessment in primary care and the potential role of a structured decision support tool, GRIST. Mental Health in Family Medicine, 9, 57-67.


Dr Mike Scott