Interpreting Clinical Trials
Wednesday, December 15, 2004
Dermatologists are constantly exposed to clinical trials data at
meetings, in the news, and in the medical literature. At first
glance, this information, usually presented in easy-to-read bar
graphs and charts, may seem straightforward. What may not be so
apparent are the background issues that have influenced the study
design, data that have not been presented, and nuances in design or
interpretation that may be crucially important but easily
overlooked. Understanding clinical trial design and how outcomes are
measured is therefore critical to interpreting the results of
studies and ultimately to applying these findings in practice.
While the randomized double-blind study is the gold standard in
clinical trials, there are other valid and meaningful approaches.
This article will review some of the features that are important to
consider when evaluating results from a clinical trial.
First, good trial design requires a clear and concise statement
of the central research question, or hypothesis.
Analyses of additional endpoints should be specified ahead of
time to avoid erroneous conclusions caused by statistical "fishing
expeditions" or data mining. If enough ad hoc tests are run, some
will appear statistically significant by chance alone, and
unfocused analysis increases this possibility.
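As a rough illustration, assuming independent tests at the conventional 0.05 threshold (real endpoints are rarely fully independent), the chance of at least one spurious "significant" finding grows quickly with the number of tests run:

```python
# Chance of at least one false-positive result among k independent
# tests, each run at alpha = 0.05: 1 - 0.95**k.
for k in (1, 5, 20):
    print(f"{k:2d} tests: {1 - 0.95**k:.0%} chance of a spurious finding")
```

With 20 unplanned comparisons, the odds are roughly two in three that something crosses the significance threshold by chance alone.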
The two most common trial designs are the
randomized trial and the
observational trial. Randomization attempts to
remove potential bias in the allocation of subjects to different
testing groups. When done correctly, it should produce intervention
and control groups that are, on average, evenly balanced in terms
of both predictable prognostic factors (such as age and gender) and
other, unknown characteristics. Randomization, however, is by
definition random, and thus the groups are unlikely to be exactly
the same, although they will tend to be more similar if larger
groups are involved.
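A minimal simulation (the 50%-prevalent prognostic factor and group sizes below are hypothetical) illustrates how the expected imbalance between randomized arms shrinks as group size grows:

```python
import random

def mean_imbalance(n_per_arm, n_trials=2000, seed=0):
    """Average absolute difference in the prevalence of a binary
    prognostic factor (50% prevalence) between two randomized arms."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        arm_a = sum(rng.random() < 0.5 for _ in range(n_per_arm))
        arm_b = sum(rng.random() < 0.5 for _ in range(n_per_arm))
        diffs.append(abs(arm_a - arm_b) / n_per_arm)
    return sum(diffs) / n_trials

for n in (10, 100, 1000):
    print(f"{n:4d} per arm: mean imbalance {mean_imbalance(n):.3f}")
```

The imbalance falls roughly in proportion to the square root of the group size, which is why large trials tend to yield well-matched arms.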
Observational trials follow one or more
groups over time. There are obvious limitations to
observational trials, as the groups being followed may not be the
same and as there may be bias in evaluating the outcome when such
trials are not blinded. Nonetheless, observational
studies can be effective in generating some kinds of information,
and they may be the only practical way to look at some diseases.
One large analysis has suggested that results from observational
studies are often similar to those of randomized studies. Not
surprisingly, however, observational studies usually overestimate
the magnitude of any beneficial effect.1
There are many advantages to placebo-controlled
trials. They demonstrate absolute efficacy and safety, and
allow for distinction between adverse events due to the drug and
those due to the underlying disease or background noise.
They also can detect treatment effects with a smaller sample size
than any other type of concurrently controlled study, while
minimizing the effect of subject and investigator expectations. The
major disadvantage is that, if there are other known effective
treatments available, patients may have to forgo them for some time
in order to be able to participate. (Regulatory agencies in Europe
recently mandated that studies compare a new drug with the
standard-of-care treatment available rather than with a placebo, in
contrast to most trials in the United States.)
Blinding is important as well; in an
unblinded, or open-label, trial,
both the subject and the investigator know which intervention the
subject has been assigned. In a single-blind study, only one party,
typically the investigator, knows which intervention the subject
is receiving. In a double-blind study, neither the subject nor the
investigator knows the treatment or group assignment. Given the
magnitude of the effect of vehicle alone in topical studies, the
placebo effect overall, and the tendency for investigators to
optimistically grade changes in a positive way, blinding can be the
key feature that differentiates a scientific assessment from a
subjective impression.
The open-label extension is a study design in
which the investigator is aware which intervention is being given
to which participant after the blinded portion of the study has
been completed. Some studies with an open-label design are
randomized, but most do not include a comparison group. Open-label
study designs may vary; some will take all comers, while others may
limit enrollment to responders or nonresponders. These choices
significantly change the patient pool being evaluated and may have
an important impact on outcome.
An intent-to-treat (ITT) analysis is a
comparison between two or more groups assigned to receive different
treatments that includes all enrolled subjects, regardless of
whether they may have dropped out of the study due to lack of
compliance, intolerance, or a concomitant, unrelated illness.
Studies that do not use this approach may yield results that appear
more promising than they really are. For example, if many people
leave a study because their disease worsens, and if they are not
included in the analysis, it will make the percentage of successes
appear greater. Under a strict ITT analysis, even patients who drop
out for unrelated reasons are counted as treatment failures.
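A toy calculation with invented numbers shows how the two approaches diverge:

```python
# Hypothetical trial: 100 subjects enrolled, 40 respond, and 30 drop
# out (many because their disease worsened) without responding.
enrolled, responders, dropouts = 100, 40, 30

completer_only = responders / (enrolled - dropouts)  # dropouts excluded
intent_to_treat = responders / enrolled              # dropouts count as failures

print(f"Completer-only response rate: {completer_only:.0%}")   # 57%
print(f"Intent-to-treat response rate: {intent_to_treat:.0%}") # 40%
```

The 17-point gap comes entirely from how dropouts are handled, not from any difference in the drug's effect.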
Clinical trials should have sufficient statistical
power to detect the differences between groups,
and this feature is an important part of the determination of
sample size. The calculation of sample size is
based on the nature of the condition, the desired precision of the
answer, the degree of improvement expected, the availability of
alternative treatments, the knowledge of the intervention being
studied, and the availability of participants. If a study is not
"powered" appropriately, important findings may be missed. The type of
comparison being made should also be included in this calculation.
For example, concluding that two drugs have "equivalent efficacy"
may not be valid if the study was too small to show a small but
clinically meaningful difference between them.
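As a sketch of how such a calculation might look, the standard normal-approximation formula for comparing two response rates can be coded directly; the response rates, alpha, and power below are all hypothetical:

```python
from math import sqrt
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects per arm to detect a difference between two
    response rates with a two-sided test (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)  # critical value for the test
    z_b = norm.ppf(power)          # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

# Detecting 70% vs. 60% response requires roughly 350-400 subjects per arm.
print(round(n_per_arm(0.60, 0.70)))
```

A trial of 50 patients per arm that "finds no difference" between two such drugs has simply not been sized to answer the question.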
A P value of less than 0.05 is the
conventional cutoff for rejecting the null hypothesis, that is, for
concluding that a result is unlikely to have happened by chance. In
other words, this value signals that a difference as large as the
one observed would have occurred by chance alone less than 1 time in
20 if the treatments were truly equivalent.
20. Clinical significance, on the other hand, is a
matter of judgment, and clearly some results that are statistically
significant may be clinically insignificant.
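To make the arithmetic concrete, here is a small sketch with invented counts, computing a P value for a two-arm comparison using Fisher's exact test:

```python
from scipy.stats import fisher_exact

# Hypothetical outcome: 30/50 subjects cleared on drug vs. 18/50 on vehicle.
table = [[30, 20],   # drug:    cleared / not cleared
         [18, 32]]   # vehicle: cleared / not cleared

odds_ratio, p_value = fisher_exact(table)
print(f"P = {p_value:.3f}")  # below 0.05, so unlikely under chance alone
```

Whether the absolute difference in clearance matters to patients is the separate question of clinical significance.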
More than just primary efficacy data are collected during clinical trials.
Adverse drug reactions (ADRs) are recorded, but so are all
adverse events (AEs). Adverse events include all
noxious or untoward occurrences, regardless of whether a causal
relationship with the intervention is suspected, so that signals of
unexpected problems can be detected. Although a study may enroll
enough patients to show a difference in efficacy, it may not enroll
enough to show differences in adverse events, especially rare ones,
so adverse event reports must be interpreted with caution.
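Simple arithmetic shows why: the chance that a trial observes even one case of a rare event depends heavily on enrollment (the 1-in-1,000 rate below is hypothetical):

```python
# Probability of seeing a rare adverse event (true rate p) at least
# once among n treated subjects: 1 - (1 - p)**n.
p = 1 / 1000  # a hypothetical 1-in-1,000 event
for n in (100, 500, 3000):
    print(f"n={n:5d}: {1 - (1 - p)**n:.0%} chance of observing it at least once")
```

A 300-patient efficacy trial will usually miss such an event entirely, which is one reason rare toxicities often surface only after marketing.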
Comparing results for individual drugs or drug combinations assessed
in different trials has several limitations: entry criteria, methods
of analysis, and degrees of adherence may all differ, and therefore
such comparisons are difficult to interpret.
Meta-analysis faces some of these same problems,
since it combines several studies, but it applies a quantitative
method to the pooled results of more than a single study in order to
improve statistical power, which is especially useful when results
from different studies are inconsistent.
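A minimal sketch of the fixed-effect (inverse-variance) pooling that underlies many meta-analyses, using invented study results:

```python
from math import sqrt, exp

def pool(effects, variances):
    """Fixed-effect inverse-variance pooling of per-study effects
    (here, log odds ratios), with a 95% confidence interval."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    se = sqrt(1 / total)
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Three hypothetical studies: log odds ratios and their variances.
log_ors   = [0.40, 0.15, 0.55]
variances = [0.10, 0.05, 0.20]
est, (lo, hi) = pool(log_ors, variances)
print(f"Pooled OR {exp(est):.2f} (95% CI {exp(lo):.2f}-{exp(hi):.2f})")
```

Pooling narrows the confidence interval relative to any single study, which is the source of the added power; it does nothing, of course, to fix biases shared by the underlying trials.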
Of course, the analysis of study findings does not end with the
data. Continued critique of whether the authors' conclusions are
consistent with the study findings, whether conclusions stay within
the parameters of the study design, and whether clinical decisions
can be made based on the authors' conclusions remains the final
level of analysis. However, an understanding of how the researchers
sought to answer their question can illuminate whether their
conclusions are indeed worthwhile.
1. Ioannidis JP, Haidich A, Pappa M, et al. Comparison of evidence
of treatment effects in randomized and nonrandomized studies. JAMA. 2001;286:821-830.