ACP American College of Physicians - Internal Medicine - Doctors for Adults

Effective Clinical Practice

A Primer on Before/After Studies: Evaluating a Report of a "Successful Intervention"

September/October 1999

It can be difficult to rigorously evaluate a clinical management or quality improvement intervention. Because these interventions generally occur at a system level (e.g., throughout the clinic, the hospital or the health plan), it may not be practical to obtain suitable concurrent controls (e.g., clinics, hospitals or plans not exposed to the intervention). As illustrated below, a common approach is to measure outcomes before the intervention is implemented and compare them with outcomes measured afterwards - an approach often referred to as a before/after study (or pre-post study).

Schematic Representation

While academics can easily criticize the lack of a concurrent control group, managers still need to make decisions based on the data available to them. This primer is intended to provide guidance on how to think critically about a report of a "successful intervention" obtained from a before/after study.

As with any report of "success", readers should start by considering whether the observed changes were trivial or whether critical outcomes were ignored. If the reader concludes that there were important changes involving the relevant outcomes, then he or she must go on to challenge the fundamental inference: that the "success" is a consequence of the intervention. The validity of this inference is threatened by an affirmative response to any of the following questions:

Would any subjects in the "before group" be ineligible for the "after group"?

A typical before/after study compares the outcomes of hospitalized patients before and after some system intervention. The two periods therefore usually involve different patients (e.g., patients admitted with pneumonia in June are compared with patients admitted with pneumonia in July). If only certain patients are eligible for the intervention, however, an inference about its "success" can be seriously flawed. Consider a study of the effect of an outpatient low molecular weight heparin program (which has specified selection criteria) on the average length of stay of patients with deep venous thrombosis (DVT). Comparing the length of stay of all DVT patients (before) with that of only the DVT patients eligible for the outpatient program (after) would dramatically overestimate the effect of the intervention. The best estimate of the intervention's effect comes from comparing all DVT patients (before) with all DVT patients (after) - including both those eligible and those ineligible for the program. The comparability of patients in the before and after groups is particularly relevant in assessments of the effect of guidelines (which generally apply to select patient subgroups).
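The size of this overestimate can be illustrated with a small calculation. The sketch below uses invented numbers (100 DVT patients in each period, 40 of whom would meet the program's selection criteria); the group sizes, the lengths of stay and the helper name mean_los are all assumptions made for illustration, not data from any study.

```python
# Hypothetical illustration: every number below is invented for this sketch.

def mean_los(groups):
    """Weighted mean length of stay for a list of (n_patients, mean_days) groups."""
    total_patients = sum(n for n, _ in groups)
    total_days = sum(n * days for n, days in groups)
    return total_days / total_patients

# Before the program: all DVT patients are treated in hospital.
before_all = [(40, 5.0),   # patients who would have met the program's criteria
              (60, 7.0)]   # patients who would not (assumed to be sicker)

# After the program: eligible patients are mostly treated as outpatients.
after_eligible = [(40, 1.0)]
after_all = [(40, 1.0), (60, 7.0)]

# Flawed comparison: all patients (before) vs. eligible patients only (after).
flawed_effect = mean_los(before_all) - mean_los(after_eligible)

# Preferred comparison: all patients (before) vs. all patients (after).
valid_effect = mean_los(before_all) - mean_los(after_all)

print(f"Apparent reduction, eligible-only 'after' group: {flawed_effect:.1f} days")
print(f"Reduction, all DVT patients in both periods:     {valid_effect:.1f} days")
```

Under these assumed numbers, the eligible-only comparison reports a reduction more than three times as large as the comparison that keeps all DVT patients in both groups, even though the program itself is identical in the two calculations.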

Is there evidence for a prevailing "temporal trend"?

Many outcomes change across time - regardless of whether there has been an intervention or not. Consider a before/after study testing an intervention to reduce a hospital's length of stay. The average length of stay is 5 days before the intervention but is 4.7 days after it. It is tempting to believe the intervention caused the change. On the other hand, there is a prevailing temporal trend: length of stay has been falling everywhere across time (at least until recently). The same problem would arise in a before/after study testing an intervention to increase the use of aspirin in post-myocardial infarction patients. It would be difficult to untangle whether the observed change is the result of the intervention or of dramatic television advertising. Because many forces are likely to be acting on outcomes people care about, it is important to question whether an intervention is truly responsible for "success" - particularly if outcomes are improving everywhere.
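One rough way to see how much of a change might be explained by a prevailing trend is to project the pre-intervention trend forward and compare the projection with what was actually observed. The sketch below assumes a hypothetical series of four quarterly averages ending at the 5 days mentioned above, fits a straight line to them, and assumes the trend would have continued unchanged; the quarterly values and the function name fit_line are illustrative assumptions, and a linear projection is only one of several possible choices.

```python
# Hypothetical illustration: the quarterly pre-intervention values are invented.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired observations."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Average length of stay (days) in the four quarters before the intervention.
quarters = [0, 1, 2, 3]
pre_los = [5.6, 5.4, 5.2, 5.0]

# Average length of stay observed in the quarter after the intervention.
observed_after = 4.7

slope, intercept = fit_line(quarters, pre_los)
projected_after = intercept + slope * 4   # what the trend alone would predict

naive_change = observed_after - pre_los[-1]             # simple before/after difference
trend_adjusted_change = observed_after - projected_after

print(f"Before/after difference:           {naive_change:+.1f} days")
print(f"Difference from projected trend:   {trend_adjusted_change:+.1f} days")
```

With these assumed values, two thirds of the apparent 0.3-day improvement would have been expected from the trend alone, which leaves much less to attribute to the intervention.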

Were study subjects selected because they were "outliers"?

Understandably, some before/after studies target "problem areas" and select subjects who are "outliers" - subjects who have extreme values on some measure. These studies may follow the same subjects over time and so face another threat to validity: regression to the mean. Examples could include a study of case management in patients who have had high utilization in the past or a study of an intensive communication tutorial in physicians who have been judged by their patients to have poor communication skills. Even if there is no intervention, subjects selected because of extreme values will, on average, be found to have less extreme values with repeated measurement. A group of patients with extremely high utilization one year will tend not to be so high the next (some may have had a major heart attack, stroke or other catastrophic event that does not recur in the next year); a group of physicians with extremely poor communication skills will tend to improve (some may have had a personal crisis that resolves in the ensuing year). Note that in neither case are the subjects expected to return all the way to the mean, just to become less extreme. Regression to the mean sets the stage for ascribing changes to a "case management program" or a "communication tutorial" when they, in fact, represent the natural course of events.
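Regression to the mean can be demonstrated with a simple simulation in which no intervention takes place at all. The sketch below assumes each subject has a stable underlying level of utilization plus independent year-to-year variation, then selects the top 10% of subjects in year 1 and looks at the same subjects in year 2. The distributions, sample size, random seed and function name are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n_subjects=10_000, top_fraction=0.10, seed=0):
    """Select year-1 outliers and re-measure them in year 2 with no intervention."""
    rng = random.Random(seed)

    # Assumed model: stable underlying level plus independent yearly variation.
    true_level = [rng.gauss(100, 15) for _ in range(n_subjects)]
    year1 = [t + rng.gauss(0, 15) for t in true_level]
    year2 = [t + rng.gauss(0, 15) for t in true_level]

    # Pick the subjects with the highest year-1 values (the "outliers").
    n_selected = int(n_subjects * top_fraction)
    ranked = sorted(range(n_subjects), key=lambda i: year1[i], reverse=True)
    selected = ranked[:n_selected]

    return {
        "overall_year1": mean(year1),
        "selected_year1": mean([year1[i] for i in selected]),
        "selected_year2": mean([year2[i] for i in selected]),
    }

result = simulate_regression_to_mean()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

In a run of this sketch, the selected group's year-2 average falls below its year-1 average yet stays above the overall average: the outliers become less extreme without returning fully to the mean, and any program introduced between the two years would appear to have "worked".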

Although it is always possible that a change observed in a before/after study is a consequence of the intervention, affirmative responses to any of the above questions make the inference more tenuous. Conversely, the inference is strengthened when investigators have paid careful attention to the comparability of the subjects being compared. Inferences are further strengthened when the observed change is substantial, relatively unique and occurs quickly following the intervention - in other words, when it is difficult to ascribe to temporal trends. The confusing effect of regression to the mean can be avoided if subjects are not selected because they are outliers. Nonetheless, inferences from a before/after study should be seen as being based on circumstantial evidence. If the accuracy of the inference is important, readers and researchers alike must ask whether there is a reasonable opportunity to test the intervention using concurrent controls.