Just published at madinamerica.com/2022/08/phobic-real-world-outcomes/ by Michael Scott
When I look at mental health research, I notice a startling avoidance of real-world outcome measures. It seems almost phobic. Yet this type of outcome should be considered the most important. After all, who cares whether some arbitrary measure goes up or down slightly after a week or two? What we care about should be whether people have improved quality of life over the long term. Can they get back to doing the things they used to do? Do they participate in the world, socially, at work? Do they enjoy their hobbies?
So why do researchers avoid asking these questions?
One big reason is that researchers are incentivized to find a positive effect. The motto of academia is “publish or perish,” and everyone knows that null effects are rarely published. But your job may depend on your ability to publish your next study. Even worse, plenty of researchers are funded by the pharmaceutical and device industries—corporations that obviously are hoping you find a nice effect for their drugs and devices.
Even with the best of intentions, though, the people who are testing therapies are often the people who invented the therapy and their disciples—who obviously have at least an unconscious bias, hoping that their personal theory works!
So, consciously or unconsciously, researchers tend to accept a lower threshold for proof of effectiveness. It’s difficult to actually improve people’s real lives significantly, and it’s a lot easier to use a ton of arbitrary metrics and find at least one “statistically significant” effect over a short time. The upshot is, to paraphrase the Dodo in Alice in Wonderland, “all medications and psychological therapies are winners and all must have prizes.”
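The statistical mechanism behind "at least one significant effect" is the multiple-comparisons problem, and it can be sketched in a few lines. The simulation below is a hypothetical illustration, not drawn from any study cited here: it assumes a treatment with no true effect at all, ten arbitrary outcome metrics per trial, and thirty participants per arm, and shows how often a null trial still turns up at least one metric with p < 0.05.

```python
import math
import random

def p_value_two_sample(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def trial_finds_something(n_metrics=10, n_per_arm=30):
    """One simulated null trial: both arms are drawn from the SAME
    distribution, so any 'significant' metric is a false positive."""
    for _ in range(n_metrics):
        treated = [random.gauss(0, 1) for _ in range(n_per_arm)]
        control = [random.gauss(0, 1) for _ in range(n_per_arm)]
        if p_value_two_sample(treated, control) < 0.05:
            return True
    return False

random.seed(1)
trials = 1000
hits = sum(trial_finds_something() for _ in range(trials))
# With 10 independent metrics, roughly 1 - 0.95**10 ~ 40% of null
# trials are expected to report at least one "significant" result.
print(f"{hits / trials:.0%} of null trials found a 'significant' effect")
```

Run enough trials with enough unconstrained metrics and a publishable p-value appears by chance alone, which is exactly why a single pre-registered, patient-meaningful primary outcome matters.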
And it seems that the media, politicians, and midlevel healthcare bureaucrats similarly have no interest in examining the validity of outcome measures. Instead, they pass on oversimplified understandings and glib slogans as if they encapsulate the nuances of what is actually quite controversial research. Most have the best of intentions to be a “mental health advocate,” and they’re told by establishment figures that any criticism of the existing system would be “stigmatizing” and “stop people from getting treatment”—treatment that we only assume works, again, based on arbitrary statistical outcomes over the short term, not real-world improvement in the long term.
In the worst-case scenario, researchers and activists who note the misleading research and conclusions dripping with “spin” in an attempt to improve the system are called “antipsychiatry” and marginalized within their own communities.
One searches in vain for studies that ask, after treatment, “Are you back to your old self?” and, importantly, “for how long?” These are the outcomes that patients really care about. Without such questions it is impossible to chart the trajectory of a person’s functioning. Such questions are at the heart of really listening to the patient. Without that, any therapeutic edifice crumbles. But it is not rocket science, just basic respect!
At best, and rarely, studies will report on the proportion of people who lose their diagnostic status—“recovered”—as assessed by an independent clinician. But even these don’t indicate the duration of recovery. Does someone lose their diagnostic status at two weeks, only to worsen again by a month?
Symptom Reduction vs Added Value
Finding the right psychological treatment for the right disorder is the window through which CBT researchers have gazed for decades. Likewise, psychiatrists have gazed through a similar window, which van Os and Guloksuz call “finding the right medication for the right brain disease.” Whether therapists or psychiatrists, researchers and clinicians have looked predominantly at symptom reduction, rather than whether treatment has provided added value to the client’s life. And all of this is usually rated by the clinician—rarely do we ask clients what they think about the treatment.
There has, however, been some limited success in the application of CBT to depression and some anxiety disorders, at least in randomised controlled trials. But even here researchers conclude that “CBT is probably effective in the treatment of MDD, GAD, PAD and SAD; that the effects are large when the control condition is waiting list, but small to moderate when it is care-as-usual or pill placebo; and that, because of the small number of high-quality trials, these effects are still uncertain and should be considered with caution.”
Similarly, other researchers found that CBT had a large effect for treating OCD, and a moderate effect for treating PTSD. But beyond these DSM diagnoses, there is a dearth of credible supportive evidence.
Evolution or Dissolution?
It is the 50th Anniversary of the British Association for Behavioural and Cognitive Psychotherapy, the self-proclaimed lead organisation for CBT in the UK. The recent annual conference included a keynote speech called “On the Evolution of Cognitive Behaviour Therapy: A Four-Decade Retrospective and a Look to the Future.”
But evidence that it has evolved is sparse to non-existent. In 2008, Ost examined the methodology of what were then termed third-wave CBT therapies and concluded that the methodology employed made them significantly less reliable than the early pre-millennium CBT studies. He opined that the third-wave therapies would not qualify as evidence-based, despite yielding evidence of significant effect sizes. The evidence for the small, incremental changes in complexity and greater effectiveness of CBT is simply not there. Rather than evolution, we have evidence of the operation of the second law of thermodynamics, in that therapeutic energies are being made available in less useful ways—dissolution.
Dissolution Under the Microscope
The PICOTS framework is a mnemonic used by the FDA to define evidence-based medicine. The “O” refers to outcomes and the FDA argues that these must be “outcomes that matter to patients and which predict long-term successful results.” Essentially, no cooking the books with small but statistically significant differences in outcome between an intervention and its comparator (the “C” of the mnemonic), ideally an active placebo.
The “P” stands for population, with a prerequisite to specify clearly who received the intervention, so that other researchers can replicate the findings with the same group of people. The “I” stands for intervention and requires a clear elaboration of what the treatment involved. For psychological therapies, this means the publication of a manual. The “T” refers to timeframe: how long have the treatment effects lasted. Finally, “S” refers to the treatment setting (e.g., primary care).
Over the past 40 years, psychological therapy (mainly CBT) studies have increasingly paid lip service to PICOTS. They have progressively looked less like the original pioneering efficacy studies. There has been a drift to reliance on self-report measures to define a population (P), as opposed to defining a population with a “gold standard” diagnostic interview—largely on the grounds of cost and expediency. Outcomes (“O”) have been progressively less likely to be assessed by independent blind raters.
For example, since the millennium there has been the development and evaluation of low-intensity CBT (typically defined as 6 hours or less of therapist contact). In none of these has there been an independent blind rater; outcome has always been assessed by self-report, and rarely has a diagnostic interview served as the gateway into the study. Yet, in the UK, these low-intensity treatments are the first-line treatments for depression and the anxiety disorders.
Not only has the National Institute of Health and Care Excellence (NICE) endorsed the usage of low-intensity CBT, but they have recently advised that in the first instance therapists should market eight sessions of group CBT for depression.
The lack of any credible evidence on real-world impact and duration of gains troubles them not. It appears to answer the managerial dream of throughput: therapies are accessed and patients axed.
CBT and Antidepressants in Practice
There is nothing in the arrangement of routine psychological therapy services that guarantees that a) the “right” disorder will be identified and b) the “right” treatment will be forthcoming. Routine services, such as IAPT in the UK, do not make diagnoses. In a just-reported paper by Clark et al (2022), IAPT clinicians were asked to refer patients to a social anxiety disorder study, but only half the patients referred were found to have the disorder in the study diagnostic assessment.
Thus, left to their own devices, the routine clinicians would have been providing inappropriate treatment to 1 in 2 patients. There can be no certainty that the treatment provided in routine practice is a bona fide treatment, as fidelity checks have never been made. Fidelity checks are disorder specific, with matching treatment targets and interventions: in depression, for example, tackling the loss of the pleasure response (anhedonia) with activity scheduling.
There is a potency gap between the interventions used in randomised controlled trials and their translation into routine practice. A paper published in the Journal of Psychiatric Research last year showed a 25% response rate for those who had antidepressants and manual-driven psychotherapy (mostly CBT), no better than for antidepressants alone. This compares with a 31% response rate in those given a placebo in other studies.
Proper translation of the benefits of treatments identified in randomised controlled trials cannot be done on the cheap. It requires rigorous, reliable assessments and a commitment to fidelity. But the latter has to be accompanied by the flexibility of adaptation to the individual. Respect for and reverence of patients’ perspectives are paramount. Without funding bodies going beyond operational matters of numbers/waiting times and focussing on real-world outcomes, the promise of randomised controlled trials will not be realised. There is a pressing need to return to basics by measuring treatment effects in the real world.
In practice, there is also unfettered discretion when it comes to a clinician’s choice of which client problems to tackle, in what order and with what evidence-based protocol.
It is, however, possible for individual therapists to deliver quality therapy. I have outlined the specifics of this in Personalising Trauma Treatment: Reframing and Reimagining. I have termed this “restorative CBT”—returning the person to their old self. In this work, the uniqueness of the individual is recognised (e.g., “what does the trauma mean to you today?”), yet at the same time commonalities are recognised, such as the state of “terrified surprise” (a combination of exaggerated startle response and hypervigilance) experienced by those most debilitated by trauma.
Unfettered Discretion on Outcome Measures
In their important book Noise, published last year, Kahneman et al highlight the poor levels of agreement on matters as diverse as judicial sentencing and psychiatric diagnosis. Such disparities are clearly unfair. But there is also heterogeneity of outcome measures. This makes it possible for authors to claim positive benefits in the absence of any real-world demonstration of effectiveness. Researchers have had a field day with unfettered discretion on outcome measures, facilitating the quest for positive findings and heightening the likelihood of publication.
Clients have a right to expect that primary outcome measures should be meaningful to them. The danger is that because of a power imbalance, clients defer to the conclusions of the professionals on outcome and, in Kahneman et al’s terms, a “respect-expert” heuristic (rule of thumb) comes into play. As a consequence, the client is likely to be continually short-changed.