In 2018, the UK’s National Institute for Health and Care Excellence (NICE) produced a valuation study that ultimately recommended that researchers replace the EQ-5D-3L quality of life measure with a modified instrument called the EQ-5D-5L. A lengthy debate ensued. Over the past three years, academics have spilled considerable ink discussing the relative merits of the two systems.

You will be forgiven for not being familiar with either series of hyphenated letters and numbers. Put simply, they are two standardized questionnaires designed to measure health-related quality of life among patient populations. They both spring from the same home: the EuroQol Group, which arrived on the scene in the late 1980s when a group of European researchers set out to develop an instrument that would generate a “single index value for health status.”

Both versions of the EQ-5D measure patient quality of life across the same five dimensions or attributes: mobility, self-care, usual activities, pain and discomfort, and anxiety and depression. The 3L version scores each attribute on three response levels; the 5L version scores each on five.
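To make the difference concrete, here is a minimal Python sketch that enumerates every possible response profile for each questionnaire. The dimension labels come from the paragraph above; the function name and the 1-to-N level coding are assumptions made for illustration, not the EuroQol Group's own notation.

```python
from itertools import product
from typing import List, Tuple

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain and discomfort",
    "anxiety and depression",
)

def health_states(levels: int) -> List[Tuple[int, ...]]:
    """Every combination of response levels (coded 1..levels) across the five dimensions."""
    return list(product(range(1, levels + 1), repeat=len(DIMENSIONS)))

states_3l = health_states(3)
states_5l = health_states(5)

print(len(states_3l))  # 243  (3 to the power of 5)
print(len(states_5l))  # 3125 (5 to the power of 5)
```

Moving from three levels to five multiplies the number of describable health states, but it does not change how any of those states is turned into a single score.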

If you are bored or still confused about the differences between questionnaires, not to worry. Because, for the purposes of measuring quality of life from a patient perspective, both are essentially useless.

Drs. Paul Langley and Stephen McKenna pointed this out in Value in Health, the official journal for the International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Responding to a series of articles debating these two ineffective tools, they write:

In the ongoing and lengthy debate of the merits of the EQ-5D-3L and EQ-5D-5L multiattribute instruments, the protagonists appear to have overlooked a significant point: the EQ-5D value assessment framework, in common with other multiattribute utility instruments, fails to meet the required axioms of fundamental measurement. The implication of this failure, which has been recognized for over 30 years, is that utility scores are nothing more than ordinal or raw score measures. They cannot support claims for response to therapy.  

In other words, for more than 30 years, many researchers have known that the EQ-5D and other multiattribute instruments fail to meet the basic standards of scientific measurement. And, for more than 30 years, others – the majority, in fact – have soldiered on, using unfit instruments to produce indefensible results.

That is three decades of wasted effort and useless analysis. All told, more than 19,000 papers have been published in prestigious journals, all of which should have known better.

In the U.S., the most infamous misappropriations of the various EQ-5D instruments have come from the Institute for Clinical and Economic Review (ICER). In an attempt to ascribe an economic value based on health-related quality-of-life outcomes, ICER uses these instruments as the foundation for its quality-adjusted life year (QALY) analyses.

Basically, they use the EQ-5D measures to track the progress of a hypothetical patient population through various disease stages, taking into account expected – often assumed – changes brought about by a new drug or treatment. From there, ICER claims to be able to determine how many years a typical patient will live and the expected quality of life in those years.

That is the QALY in a nutshell.

ICER’s analyses arrive at a lifetime cost-per-QALY for new treatments. And, based on those conclusions, it makes recommendations on pricing and patient access to therapy.
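The arithmetic behind a cost-per-QALY figure is simple enough to sketch. The Python below is a minimal illustration of the textbook calculation (utility score times years in each state, then incremental cost divided by incremental QALYs). It is not ICER's actual model: the function names are invented for this example, and every utility, duration, and cost is a hypothetical number.

```python
from typing import Sequence, Tuple

# Each entry is (utility score, years spent in that health state).
# All values below are hypothetical, chosen only to illustrate the arithmetic.
Path = Sequence[Tuple[float, float]]

def qalys(path: Path) -> float:
    """Sum of utility * years over every state in a patient pathway."""
    return sum(utility * years for utility, years in path)

def cost_per_qaly(cost_new: float, cost_old: float,
                  path_new: Path, path_old: Path) -> float:
    """Incremental cost divided by incremental QALYs."""
    gained = qalys(path_new) - qalys(path_old)
    if gained == 0:
        raise ValueError("no QALY difference between the two pathways")
    return (cost_new - cost_old) / gained

standard_care = [(0.5, 4.0)]               # 4 years at a score of 0.5
new_treatment = [(0.75, 4.0), (0.5, 2.0)]  # 4 years at 0.75, then 2 years at 0.5

print(qalys(standard_care))  # 2.0
print(qalys(new_treatment))  # 4.0
print(cost_per_qaly(150_000, 50_000, new_treatment, standard_care))  # 50000.0
```

Every step multiplies, adds, or divides the utility scores, so the result is only as meaningful as those operations are for the scores being fed in.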

Among the many problems with this approach is that, as Langley and McKenna point out, the EQ-5D instruments produce only raw or ordinal scores. In other words, the instruments cannot help predict how patients might progress from one disease state to another along an assumed timeline. That means they offer zero measurable insight into how much better or worse one position on the line is than any other. It is mathematical gibberish.

To put it even more simply: EQ-5D instruments – and basically all other multiattribute instruments – can tell you that patients will progress from point A to point B. But they cannot help calculate a measurable distance between those points. This means all QALY claims made by ICER are best ignored.

If ICER and other organizations used an instrument that provided utility scores with ratio properties, they could potentially multiply those scores by time spent in a disease state. But the scores produced by the EQ-5D frameworks cannot be added, subtracted, multiplied, or divided, which makes them useless in making any complicated pricing or value calculations.
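If the scores are only ordinal, as Langley and McKenna argue, then any relabeling that keeps the states in the same rank order is as legitimate as the original numbers. The sketch below is a hypothetical illustration of what that implies: squaring each score preserves the ordering of the states, yet it reverses which of two made-up treatment pathways comes out ahead once scores are multiplied by time. The pathways, scores, and function names are assumptions for the example, not EQ-5D values.

```python
# Hypothetical pathways: a list of (score, years in state) pairs.
treatment_a = [(0.9, 1.0), (0.1, 1.0)]
treatment_b = [(0.55, 1.0), (0.55, 1.0)]

def total(path, rescale=lambda s: s):
    """Score-times-years total, optionally after rescaling each score."""
    return sum(rescale(score) * years for score, years in path)

def square(score):
    """Order-preserving on [0, 1]: a higher score stays higher."""
    return score ** 2

# The relabeling keeps every state in the same rank order...
scores = [0.1, 0.55, 0.9]
assert sorted(scores, key=square) == sorted(scores)

# ...yet the treatment that looks better changes.
print(total(treatment_a) < total(treatment_b))                  # True
print(total(treatment_a, square) < total(treatment_b, square))  # False
```

With ratio-scale scores this reversal could not be engineered, because rescaling would not be permitted in the first place; with ordinal scores, nothing in the data says which of the two answers is the right one.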

Just as a demonstration, the EQ-5D-3L generates utility scores ranging from -0.59 to 1.0. Negative utility scores mean negative QALYs. Does that mean the patients will be in a state worse than death? One would assume not. But if a score of zero in the QALY context means something other than death, the score itself has no meaningful value.

This is what happens when you attempt to add together different factors and attributes to produce a single score. For a utility score to be useful, it must reflect values as a single latent construct. That is the basic standard that applies in all hard sciences.

We should keep in mind that this is not just an esoteric academic exercise. ICER and other organizations use these data formulations to make recommendations on drug value and pricing. And healthcare payers – including private insurers and government health programs – use those recommendations to make coverage and payment decisions that have a real-world impact on sick patients, caregivers, and their families.

Researchers have already wasted 30 years trying to use the EQ-5D framework in ways for which it is objectively unfit. Going forward, experts and patient advocates should demand that potentially life-altering conclusions about the proper price or value of new treatments be based on hard data and accurate measures, not on imaginary claims based on bad science.