The False Promise of Physician Quality Stats
Robin Young • Wed, July 5th, 2017
Jason Shafrin, writing in his healthcare blog Healthcare Economist on June 20, 2017, called attention to one of the unexpected consequences of physician quality statistics: the scores may not be doing a good job of measuring care quality.
Physician quality scores are usually based on single composite statistical measures. Medicare’s Value-based Payment Modifier, Shafrin points out, boils several individual quality metrics down to a single quality score.
Statisticians are quick to spot the flaw in this methodology.
Writing in the journal Health Services Research, authors Martsolf, Carle, and Scanlon (2017) found that a single global measure of quality did not accurately predict quality care.
The researchers used health insurance claims data (October 2007 – 2010) for 134 physician practices in Seattle, Washington. They applied confirmatory and exploratory factor analysis to develop theory-driven and empirically driven, internally valid composite measures based on 19 quality indicators.
Martsolf et al. found that their results did not support a single global measure using the entire set of quality indicators. They did, however, identify an acceptable multidimensional model (RMSEA = 0.059; CFI = 0.934; TLI = 0.910). The four dimensions in that model were diabetes, depression, preventive care, and generic drug prescribing.
Martsolf et al. Conclusions
Finally, the authors concluded that while commonly used process indicators can be used to create a small set of useful composite measures, the lack of an internally valid single unidimensional global measure has important implications for policy approaches meant to improve quality by rewarding “high-quality physicians.”
Real World Ramifications
As Shafrin noted in his blog, the global composite physician quality measures that are increasingly used to pay physicians are not without risk.
What are those risks?
Shafrin explains with this simple example: “Physician A could be excellent at diagnosing a condition but poor at treatment and Physician B could be excellent at treatment but poor at diagnosis. If this information were known to patients, and all patients went to Physician A for diagnosis and Physician B for treatment, they would both be excellent at treating the patients they do even though a composite score could rank both physicians as average. This example captures cases where quality is multidimensional. Quality metrics also must be reliable as well and accurately capture underlying physician quality when measured across a reasonable sample size of patients.”
In short, as Shafrin writes, “When indicators measuring unrelated constructs are included in a single score, the high score on some indicators could ‘hide’ low scores on other indicators or vice versa. In this case, the composite measure does not provide a clear quality signal. Inclusion of invalid composite measures could actually hurt quality reporting by leading to physician practice misclassification.”
And, finally, Shafrin concludes: “When multiple indicators measuring distinct aspects of quality are inappropriately combined into a single measure, the resulting composite measure is not useful or even completely uninterpretable.”
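The masking effect described in the quotes above can be illustrated with a small sketch. The numbers below are invented purely for illustration and do not come from Shafrin or from Martsolf et al.; the two dimensions ("diagnosis" and "treatment"), the 0-100 scale, the third physician, and the unweighted average used as the composite are all assumptions, not how Medicare or the study computes anything.

```python
from statistics import mean

# Hypothetical per-dimension quality scores on a 0-100 scale.
# These values are made up for illustration; they are not real data.
scores = {
    "Physician A": {"diagnosis": 95, "treatment": 45},
    "Physician B": {"diagnosis": 45, "treatment": 95},
    "Physician C": {"diagnosis": 70, "treatment": 70},
}


def composite(dimension_scores):
    """Collapse the dimensions into one number with an unweighted mean.

    This is only one simple way to build a composite, assumed here
    for the sake of the example.
    """
    return mean(dimension_scores.values())


composites = {name: composite(dims) for name, dims in scores.items()}

for name, dims in scores.items():
    print(f"{name}: composite={composites[name]}, by dimension={dims}")
```

Under these assumptions all three physicians receive the same composite score, even though A and B have sharply different strengths and only C is actually average on both dimensions. Reporting the dimensions separately keeps that difference visible.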