New study casts doubt on reliability of mental health diagnosis interviews


Diagnostic interviews – the most common way to diagnose substance use and mental disorders including depression, anxiety, bipolar and personality disorders – vary in reliability from condition to condition, according to a new study in Jama Network Open.

Laura Duncan, a psychiatry professor at McMaster University in Ontario, Canada, and one of the study’s authors, said diagnostic interviews are “often treated as a ‘gold standard’ for assessing mental disorders in both clinical settings and research”, but pointed out that these interviews fall short of providing a “definitive benchmark that demonstrates excellent validity and reliability”.

Even though evidence on the reliability of these interviews has long been mixed, “they continue to be widely viewed as the best available approach, possibly due to the lack of better alternatives,” Duncan said. The review brings together evidence from studies of the “test-retest reliability” of diagnostic interviews from February 2024 to September 2025.

The study’s authors used Cohen’s kappa coefficient to estimate how reliable diagnostic interviews were for different mental health conditions. This let them see how often patients would receive the same diagnosis when given the same diagnostic interview twice, while accounting for the fact that some agreement is expected by chance alone.
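To illustrate how kappa discounts chance agreement, here is a minimal Python sketch of the standard Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. It is not the study's code, and the ten "patients" and their yes/no diagnoses below are invented for illustration only.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa for two categorical ratings of the same people.

    first[i] and second[i] are the diagnoses person i received in the
    first and second interview (hypothetical labels such as "yes"/"no").
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Observed agreement: share of people given the same diagnosis both times.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: how often the two interviews would match if each
    # handed out diagnoses at its own overall rates, independently.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical test-retest data for ten people (not from the study).
first = ["yes"] * 4 + ["no"] * 6
second = ["yes", "yes", "yes", "no"] + ["no"] * 5 + ["yes"]

print(round(cohens_kappa(first, second), 2))  # prints 0.58
```

In this made-up example the two interviews agree for 8 of 10 people (80%), but because about 52% agreement would be expected by chance given how often each interview says "yes", kappa comes out at roughly 0.58 rather than 0.8.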

The average reliability was generally better for substance use disorders, and highest overall for opioid use disorder. Duncan said this was because substance use disorder criteria are largely based on behavior. For instance, it’s often easier to estimate how many drinks you had in a week than the number of days you felt sad or anxious.

Dr Michael First, a psychiatrist and professor at Columbia University who authored the Structured Clinical Interview for DSM-5 (SCID), was frustrated with elements of the study. While he agreed that diagnostic interviews vary in reliability and too often fail to diagnose people correctly, he wanted to see more information about which specific instruments were most reliable.

“It’d be nice to be able to look at this and say: ‘Oh, based upon this paper, I should pick this one because of this.’ That would be doing the field a real service,” he said. “But there’s simply not enough information here.”

Duncan said that the information in the study was based on the limited amount of relevant research available during the study period.

The review included papers on diagnostic tools that screen for multiple mental health conditions, such as the SCID, which First authored, and the Mini International Neuropsychiatric Interview (Mini), as well as tools intended for specific disorders, like the Clinician-Administered PTSD Scale (Caps).

First also took issue with how the study lumped “fully structured” and “semi-structured” interviews together. Fully structured interviews are more likely to yield the same result when administered more than once, “because you stick to the script and cannot deviate from it at all”, First noted.

“If the person says something contradictory, you’re not allowed to even point out that it’s contradictory,” First said. This type of interview is often used for epidemiological research on large populations, and is therefore designed to be administered by people with little training.

Semi-structured interviews, on the other hand, are designed for trained clinicians to diagnose patients. With this type of interview, clinicians have the freedom to “ad-lib their questions as needed”, First said. This means that if a patient’s answer is vague or contradictory, the clinician can ask follow-up questions to clarify. That allows for more accurate diagnosis, but the results may also vary more from session to session.

While Duncan noted that it would be useful to address all of First’s concerns, she said the data she would need to do so simply does not exist yet. In the papers her study included, Duncan said they “attempted to extract information on interview format, but this was often unclear or not reported”. That the information needed to compare interview designs directly is so often missing is itself another sign that psychiatric diagnosis needs more rigor.

Even though he helps design them, First readily admits that structured interviews are less than ideal tools. For decades, he said, psychiatrists have hoped that more objective laboratory tests for mental conditions would one day become available.

“We’ve been saying that for 50 years,” First said.

Duncan pointed to an alternative future approach where clinicians “move away from strict diagnostic categories, where a condition is either present or absent, and think about symptoms on a spectrum or continuum”.
