The recent Health Affairs blog post “Who is sowing seeds of confusion about the QALY?” fails to acknowledge the most fundamental problem with the Quality Adjusted Life Year (QALY): the QALY is a mathematically impossible construct. Debates over the merits and scope of QALY claims; their use by organizations such as ICER to support value assessment; the discriminatory nature of the QALY; and their selective use by manufacturers are all beside the point. The QALY is inherently flawed, and we cannot continue to make access and coverage decisions using an inappropriate and fundamentally faulty measurement.
What is a QALY and why does it fall short?
A QALY is a measure of the value of health technology outcomes. It is constructed by applying generic “community preference” weights to ordinal questionnaire responses to generate a utility score, and then multiplying time spent in a disease state by that score. Because the weights are assigned to ordinal responses, the resulting scale remains ordinal. However, for this multiplication to be consistent with the axioms of fundamental measurement, the utility score must have ratio properties, which, as an ordinal scale, it does not. In other words, it is not enough to know that A is greater than B; we must also know the “distance” between A and B. We need to understand what has been measured and reported.
To illustrate, let’s take a look at the EQ-5D-3L utility score, which is commonly used to build QALYs. This score is constructed from five attributes or symptoms: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each attribute has three levels: no problems, some problems, and major problems. Here’s the issue: we have no idea what the “distance” is between responses, and that distance is critical to making valid claims and drawing reasonable conclusions. We might prefer A over B, but we can do nothing other than rank A above B (no problems vs. some problems). As a result, we have no idea just how much better A is than B. Is A the equivalent of 100 Bs? Or is A the equivalent of 1.5 Bs?
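The consequence of unknown “distances” can be made concrete with a small sketch. The utility weights below are hypothetical illustrations, not actual EQ-5D-3L tariff values; both assignments preserve the identical ordering of levels (no problems > some problems > major problems), yet they reverse which of two hypothetical treatments accumulates more QALYs.

```python
# Hypothetical sketch: an ordinal response tells us only the order of
# levels, not the distance between them. Two utility assignments that
# preserve the same ordering can flip a cost-per-QALY comparison.

def qaly(years_in_state, utility):
    """QALY = time spent in a health state multiplied by its utility weight."""
    return years_in_state * utility

# Both scenarios rank the levels identically; only the "distances" differ.
scenario_a = {"no problems": 1.0, "some problems": 0.9, "major problems": 0.1}
scenario_b = {"no problems": 1.0, "some problems": 0.4, "major problems": 0.1}

# Hypothetical Treatment X: 10 years with "some problems".
# Hypothetical Treatment Y: 6 years with "no problems".
for label, weights in (("A", scenario_a), ("B", scenario_b)):
    x = qaly(10, weights["some problems"])
    y = qaly(6, weights["no problems"])
    print(f"Scenario {label}: X = {x:.1f} QALYs, Y = {y:.1f} QALYs, "
          f"preferred = {'X' if x > y else 'Y'}")
```

Under scenario A the longer life with some problems wins; under scenario B the shorter life with no problems wins. Nothing in the ordinal responses themselves tells us which scenario is the right one.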
And why are we applying generic “community preferences” and expecting the results to reflect the needs and preferences of actual patients living with a specific disease?
Health technology assessment frameworks that use the QALY put hypothesis testing aside in favor of creating evidence to support value assessment and claims for cost-effectiveness. In its current form, the QALY creates only “approximate information.” It does not provide – and is not designed to provide – specific, real-world data that can be tested and replicated. Unless we understand the implications of meeting the standards of fundamental measurement when creating patient-reported outcomes instruments, we will continue, inadvertently, to create measures of response to therapy that are technically invalid.
Part of the incentive for creating this “approximate information” was that, at product launch, real-world evidence to support competing cost-effectiveness claims was limited. Rather than proposing research programs to capture data and report results to formulary committees within meaningful timeframes, the value assessment frameworks used by ICER and others create imaginary world evidence.
Advocates for this creation of imaginary world evidence argue that, with a judicious choice of model framework, it could be considered “realistic” and provide an estimate of the lifetime experience and treatment effects for a hypothetical group of patients exposed to new therapies. The problem here is that “realistic” is not the same as “real.” We cannot simply create and then aggregate QALYs over a hypothetical lifetime disease state and expect to reach meaningful conclusions that apply to the real world and real patients.
The first step is to acknowledge that cost-per-QALY value assessments and their “approximate information” are inappropriate for use in making decisions about the price, access, and coverage for treatments for real patients in the real world.
The second step is to understand that we have the tools to create patient outcomes instruments, including quality-of-life measures, that meet the required axioms of fundamental measurement. Instruments developed through the application of conjoint measurement theory in Rasch modelling are readily available for some 30 disease states.
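By way of contrast, a minimal sketch of the dichotomous Rasch model shows what an interval-level measure looks like. The person parameter, item difficulty, and values below are illustrative assumptions, not drawn from any published instrument; the point is only that, on the logit scale the model produces, equal differences mean the same thing everywhere on the scale, which is precisely the property an ordinal utility score lacks.

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability that a person with
    severity theta endorses an item of difficulty b (both in logits).
    theta and b here are illustrative, not estimated values."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

# Interval property: a one-logit increase in theta raises the log-odds
# of endorsement by exactly one, at the low end or the high end alike.
low_end = logit(rasch_probability(-2.0, 0.0)) - logit(rasch_probability(-3.0, 0.0))
high_end = logit(rasch_probability(3.0, 0.0)) - logit(rasch_probability(2.0, 0.0))
print(low_end, high_end)  # both 1.0, up to floating-point rounding
```

This is the sense in which Rasch-derived measures satisfy the axioms of fundamental measurement: differences on the scale are comparable, so arithmetic on the scores is defensible.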
The third step is to commit to the development of instruments to assess patient-reported outcomes across a disease area that meet the standards of fundamental measurement. Only then can we have confidence that we understand the full “value” of treatments to real patients, clinicians, communities, and society at large.
Paul C. Langley, Ph.D., Adjunct Professor, College of Pharmacy, University of Minnesota