IAPT’s Flagship, the PHQ-9, Hits an Iceberg

The PHQ-9, IAPT’s sole determinant of depression, identifies two and a half times as many people as depressed as the ‘gold standard’ diagnostic interview, the SCID; see Levis et al (2020) in this month’s issue of the Journal of Clinical Epidemiology. This makes for garbage assessment (GA), and because the service uses the same measure to gauge outcome, it also makes for garbage outcome (GO). In the computer world the mnemonic GIGO denotes that if you put garbage into a computer you get garbage out. For mental health clinicians there should be a new addition to the lexicon: GAGO. But who is answerable for flagship IAPT (Improving Access to Psychological Therapies) racing towards the iceberg? Who is going to pick up the survivors?

Levis et al (2020) comment that PHQ-9 results are rather like positive mammogram results for breast cancer: taken on their own, they would give a grossly inflated view of prevalence. They argue that PHQ-9 results only have any validity in the context of a standardised diagnostic interview. IAPT has never used such an interview, so its claim of a 50% recovery rate is outrageous.
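The mammogram analogy can be made concrete with a little arithmetic. The sketch below is illustrative only: the prevalence, sensitivity and specificity figures are hypothetical round numbers chosen for the example, not values reported by Levis et al (2020), and the function names are my own. It shows how counting everyone who screens positive as a 'case' inflates apparent prevalence, and how few of those positives would be confirmed at interview.

```python
def screen_positive_rate(prevalence, sensitivity, specificity):
    """Proportion of the population scoring above the screening cut-off."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives + false_positives


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Proportion of screen positives who truly have the condition."""
    true_positives = prevalence * sensitivity
    return true_positives / screen_positive_rate(prevalence, sensitivity, specificity)


# Hypothetical figures, assumed purely for illustration.
prevalence = 0.10   # true rate of depression by diagnostic interview
sensitivity = 0.85  # share of true cases the questionnaire flags
specificity = 0.85  # share of non-cases the questionnaire clears

apparent = screen_positive_rate(prevalence, sensitivity, specificity)
ppv = positive_predictive_value(prevalence, sensitivity, specificity)
inflation = apparent / prevalence

print(f"True prevalence:       {prevalence:.0%}")
print(f"Screen-positive rate:  {apparent:.0%}")
print(f"Inflation factor:      {inflation:.1f}x")
print(f"Positives truly cases: {ppv:.0%}")
```

Under these assumed figures, more than twice as many people screen positive as are truly depressed, and most of the positives are not cases. The exact ratio depends entirely on the numbers fed in, which is why a screening score needs a diagnostic interview alongside it.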

Interestingly, the Journal paper found that there was no cut-off score on the PHQ-9 that meaningfully differentiated those in need of treatment from those who were not. Yet IAPT uses the cut-off score of greater than 10 to denote a ‘case’, with an implicit treatment requirement. Those scoring greater than 10 could be suffering from almost anything: adjustment disorder, a specific phobia, binge eating disorder etc., or they could simply be cheesed off with their debility following, say, the development of sepsis after an operation and ongoing impairment.

Ironically, the fault with the PHQ-9 may lie in its origins. It was validated not against an acknowledged reference standard, such as the SCID, but against the PRIME-MD [Kroenke et al (2001) J GEN INTERN MED 16:606-613], which asks in interview form exactly the same questions as the PHQ-9. This contravenes one of the STARD requirements [Cohen et al (2016) doi:10.1136/bmjopen-2016-012799] for judging the diagnostic accuracy of a test, in that the reference standard must contain much more detailed information (e.g. about levels of functional impairment) than the index test.

The development of both the PHQ-9 and the PRIME-MD was funded by Pfizer Pharmaceuticals. The over-identification of cases of depression is clearly in the interests of a pharmaceutical company. Clinicians also welcome with open arms anything that appears to reduce the assessment burden. Their employers, such as IAPT, can rejoice that this surrogate for reliable diagnosis shows a reducing score with time. This is regression to the mean: Gilbody et al. (2015) looked at how GP patients with a PHQ-9 score of greater than 10 fare with usual treatment over a four-month period, and their mean PHQ-9 score reduced from 16 to 9. Employers can publicly misattribute such a reduction to the benefits of therapy, and can convince the more naive of their clinicians that they are making a real-world difference. IAPT continues in its bubble, shared with the supposed UK lead organisation for cognitive behaviour therapy, the British Association for Behavioural and Cognitive Psychotherapies (BABCP).
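Regression to the mean is easy to demonstrate with a toy simulation. The sketch below does not reproduce the Gilbody et al. (2015) data; every parameter in it (the spread of underlying scores, the size of the day-to-day fluctuation, the sample size) is a hypothetical assumption, and it ignores the fact that real PHQ-9 scores are bounded between 0 and 27. Each simulated person has a stable underlying score plus random fluctuation at each assessment, and nobody receives any treatment.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

CUT_OFF = 10        # select people scoring greater than 10 at baseline
N_PEOPLE = 20_000   # hypothetical sample size

baseline_selected = []
follow_up_selected = []

for _ in range(N_PEOPLE):
    # Stable underlying severity (assumed mean 8, SD 4).
    underlying = rng.gauss(8, 4)
    # Each assessment adds independent fluctuation (assumed SD 4).
    baseline = underlying + rng.gauss(0, 4)
    follow_up = underlying + rng.gauss(0, 4)  # no treatment effect at all
    if baseline > CUT_OFF:
        baseline_selected.append(baseline)
        follow_up_selected.append(follow_up)

baseline_mean = mean(baseline_selected)
follow_up_mean = mean(follow_up_selected)

print(f"Selected at baseline: {len(baseline_selected)} people")
print(f"Mean baseline score:  {baseline_mean:.1f}")
print(f"Mean follow-up score: {follow_up_mean:.1f}")
```

Because people are selected on a high first score, part of that score is transient fluctuation that does not recur, so the group's mean falls at follow-up even though nothing has been done for them. Any service that selects on a cut-off and reports before-and-after change without a comparison group will see this drop.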

Dr Mike Scott



