Research Methods in Psychiatry
The 2 Es of Research: Efficacy and Effectiveness Trials
David L Streiner, PhD1
Studies that investigate the usefulness of various therapies
fall along a continuum that ranges from those looking at whether
an intervention can work under ideal circumstances (efficacy
trials) to those that focus on whether a treatment works when
applied in the real world (effectiveness trials). Whether
a study is closer to one end of the spectrum or the other
affects almost every aspect of the trial. These aspects include
which patients are eligible for enrolment, the degree of control
over the way the intervention is delivered, which patients
are or are not included in the analyses, how missing data
are handled, and even which statistical tests may be used.
The 2 types of trials may yield different results, but both
provide useful information. This paper explores these issues,
shows the decisions researchers must take at each phase of
a trial, and discusses how clinicians should interpret the
results.
(Can J Psychiatry 2002;47:552-556)
Key Words: efficacy, effectiveness, study
design, subject selection
Résumé :
Les deux « E » de la recherche : essais d'efficacité
et d'effet utile
It is fairly well accepted now that the best evidence for demonstrating
that an intervention works comes from the results of a randomized
controlled trial (RCT), in which eligible patients are randomly
assigned to either a new therapy or to a comparison group. However,
there are RCTs, and then there are RCTs. In other words, not all
RCTs are the same. In this article, we will discuss the differences
between RCTs designed to demonstrate the effectiveness of treatment
and those that look at the efficacy of an intervention. Thus, it's
first necessary to discuss the difference between efficacy and effectiveness.
Efficacy is concerned with the question, "Can a treatment work under
ideal circumstances?" Conversely, effectiveness addresses the question,
"Does it work in the real world?" Studies that focus on efficacy do
everything possible to maximize the chances of showing an effect.
The rationale is that if the treatment cannot be shown to work under
the best conditions, there isn't a ghost of a chance that it
will be effective in actual practice. On the other hand, effectiveness
studies emphasize the applicability of the treatment and therefore
try harder to duplicate the situations that clinicians will encounter
in their practices. The 2 study types are referred to in terms that
describe their differing aims and designs. They are sometimes distinguished
as explanatory and pragmatic trials (1,2); at other times they are
called explanatory and management trials (3). "Pragmatic"
and "management" capture the flavour of the question,
"Do things work in the real world?" However, "explanatory"
is a bit misleading, because the emphasis in the study is more on
"can" than on "why." So, I'll continue
to use the terms efficacy and effectiveness in this paper.
No matter what they're called, though, the difference between
the 2 types has implications for who is selected to be in the study,
how the intervention is delivered, how dropouts and people who receive
the "wrong" treatment are handled, and how the results
are analyzed. In actuality, efficacy and effectiveness studies are
the extremes of a continuum, and most studies fall somewhere in
between. However, it is important for the reader of trials to be
aware of these implications, because they affect (or should affect)
how the results are interpreted: do you change your clinical practice
today, in light of the findings, or do you wait until more convincing
evidence is in?
To illustrate the difference, I will focus on the treatment of
a disorder that Geoff Norman and I discovered several years ago:
photonumerophobia, or the fear that one's fear of numbers will
come to light (4,5). Unfortunately, this malady has not yet been
recognized by DSM or ICD. Nevertheless, after teaching statistics
to medical students, nurses, and grad students for more than 3 decades,
it is obvious (at least to us) that this is a widely prevalent and
disabling condition, but one that is now amenable to treatment.
The therapy we propose is teaching statistics using the articles
in this Research Methods in Psychiatry (RMP) series.
To test whether it works, we'll have a comparison group of
people treated with another set of readings (the "Other" condition).
The outcome will be the number of people who are not phobic at the
end of the semester.
Subject Selection
Most parametric statistical tests, such as the t-test and the analysis
of variance (ANOVA), compare how large the difference between the
groups is in relation to the variability within groups. That is,
they assume that differences among people within the same group
are "error" (a better term would perhaps be "unexplained
variability") and that the between-group variability must be
larger than the within-group variance to show that something is
going on. Therefore, when designing a study, we maximize the chances
that we'll find a statistically significant result if we 1)
make the difference between the groups as large as possible and
2) make the variability within the groups as small as possible.
We have relatively little control over the first factor because
it's a function primarily of how well the intervention works
(although we'll soon see how we can exert that small degree
of control). However, we can affect the within-group variability
by making the groups as homogeneous as possible in terms of age,
sex (to the degree allowed by the granting agencies), other treatments
received within a certain time frame, and most important, by ensuring
the strictness of the diagnostic criteria and the absence of comorbid
disorders. This is why the "subjects" section of efficacy
studies begins with a long list of inclusion and exclusion criteria:
the criteria exist not only to ensure that the people in the trial
actually have the disorder of interest but also to make the groups
homogeneous and thus reduce the within-group variability.
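
The arithmetic behind this argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the outcome is treated as a continuous "fear score" (the trial described above actually counts non-phobic people), both groups are the same size and share one within-group standard deviation, and every number and name (`pooled_t`, the scores, the group size) is hypothetical.

```python
import math

def pooled_t(mean_a, mean_b, sd_within, n_per_group):
    """Two-sample t statistic for equal-sized groups sharing one within-group SD."""
    # Standard error of the difference between the two group means.
    se_diff = sd_within * math.sqrt(2.0 / n_per_group)
    # Between-group difference expressed relative to within-group variability.
    return (mean_a - mean_b) / se_diff

# Hypothetical fear scores: identical 5-point difference, different spread.
t_heterogeneous = pooled_t(55.0, 50.0, sd_within=20.0, n_per_group=30)
t_homogeneous = pooled_t(55.0, 50.0, sd_within=10.0, n_per_group=30)

print(f"heterogeneous groups: t = {t_heterogeneous:.2f}")
print(f"homogeneous groups:   t = {t_homogeneous:.2f}")
```

With the group difference held constant, halving the within-group standard deviation doubles the test statistic, which is what tight inclusion and exclusion criteria are meant to achieve.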
Some efficacy studies go even further than making the groups homogeneous:
they try to exclude those who may be "placebo responders"
and to enrol only those patients who will be most compliant and
most responsive to the intervention. For example, some studies use
a single-blind run-in phase (6), during which all eligible patients
are placed on a placebo. Those who show an improvement are eliminated
from the study because they would inflate the change seen in the
comparison group and thus reduce the between-group difference. Another
tactic is to use an enriched sample (7) of patients who have previously
been shown to respond to the intervention. A third approach, used
in the Veterans Administration hypertension study, had patients
take a riboflavin-labelled placebo (8). This allowed the investigators
to determine which subjects would comply with taking medications
and to reject the others.
The drawback of this approach is that the subjects in the study
become less and less like the patients encountered in actual practice.
The practising clinician does not have the researcher's luxury
of saying, "I can't treat your schizophrenia, because
DSM-IV says you must have had your symptoms for 6 months, and yours
have persisted for only 5 months," or "You also suffer
from an anxiety disorder, so out you go." It is also difficult
to test whether the patient will be compliant; in any case, the
therapist must try to treat the patient even if there are concerns
in this regard.
So, if we were designing an efficacy trial of RMP vs Other, we
would apply very tight criteria for photonumerophobia and exclude
all people who do not meet all of them. We would also reject from
the study people who might have other psychological or medical disorders
that could lessen the magnitude of the treatment effect or who received
some other form of therapy for the problem that would make it difficult
to determine what the "active ingredient" was. Conversely,
an effectiveness trial would include all people who present with
this complaint: all would be accepted for therapy, irrespective
of age, comorbidities, or other concurrent therapies. The sample
size might have to be increased to compensate for these confounding
conditions, but the results would be more generalizable to clinical
practice.
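
To give a rough sense of how much the sample might have to grow, the following Python sketch uses the common normal-approximation formula for comparing two means. It is an illustration only: it assumes a continuous outcome, a two-sided alpha of 0.05, 80% power, and a treatment difference that stays the same in both designs (in practice it may shrink as well). The function name `n_per_group` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd_within, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sd_within / delta) ** 2
    return math.ceil(n)

# Hypothetical: the broader effectiveness sample has 1.5 times the within-group SD.
n_efficacy = n_per_group(delta=5.0, sd_within=10.0)
n_effectiveness = n_per_group(delta=5.0, sd_within=15.0)

print(n_efficacy, n_effectiveness)
```

Because the required sample size rises with the square of the within-group standard deviation, a 1.5-fold increase in variability calls for roughly 2.25 times as many subjects per group.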
The Intervention
I mentioned earlier that the main determinant of the difference
between the groups is the effectiveness of the intervention itself
and that we have little control over this.
In fact, while we cannot enhance the true effect of the treatment,
there are many ways to make it perform less well. Needless to say,
efficacy studies try very hard to avoid these pitfalls, using various
techniques. From the provider's perspective, these techniques
include having therapists attend training sessions so that they
can learn to perform the therapy systematically (9), using treatment
manuals that detail what should and should not be done during the
sessions (10), tape-recording the sessions so that they can later
be checked for adherence to the treatment protocol (11), having
fixed dosing regimens or algorithms in drug trials (12), or having
a fixed number of therapy sessions (13). Some surgical trials even
go so far as to drop surgeons or centres that have high perioperative
mortality or infection rates (14). On the receiving side of the
intervention, efficacy studies often have research nurses or assistants
call the patients to remind them to take their medication, to reschedule
missed appointments, and, sometimes, just to check on the patients
between visits.
The advantages of these strategies are obvious. The therapy is
delivered either by highly skilled and well-trained people or by
advanced students who receive continual supervision and feedback
from more senior clinicians. The intervention itself often follows
the recommendations of best-practice guidelines, which are (or at
least should be) based on the results of earlier clinical trials.
When medications are used, a frequent requirement is ongoing monitoring
of blood levels to ensure that the medications are within therapeutic
levels. Follow-up and reminder calls maximize adherence to the therapy,
and these contacts may themselves have some therapeutic effect.
Wouldn't life (or at least work) be wonderful if we all had
these resources! The sad reality is that the extra staff and lab
work are rarely available or affordable outside large, externally
funded RCTs. Further, despite what we read in letters of recommendation,
not everybody is in the top 5% of the profession: believe it or
not, one-half of the therapists in this world are below average
(15). Even excellent therapists, though, rarely have the luxury
of attending week- or month-long training courses after they finish
residency or a fellowship. More often, they learn of new techniques
through lectures, readings, or, at best, a 1-day preconference workshop
that does not have any provision for continuing supervision. The
consequence is that therapy in real life is rarely delivered as
effectively or uniformly as it is in controlled efficacy trials.
Donoghue and Hylan, for example, summarize the results of many surveys
showing that in primary and secondary care settings, tricyclic antidepressants
are frequently prescribed in dosages lower (often 50% lower) than
those found to be efficacious in RCTs (16).
Effectiveness trials are closer to the end of the continuum that
reflects therapy as it is actually given. For example, they impose
fewer restrictions on how the treatment is delivered and monitor
patient compliance less. Even so, it is rare to see a random selection
of clinicians in effectiveness studies: they tend to be drawn from
people in academia, often in tertiary care teaching hospitals. It
is also becoming increasingly common for studies of both efficacy
and effectiveness to use manualized therapy or drug algorithms;
and this is likely more usual in studies than in routine clinical
practice. This means that although there is probably still a difference
between the way therapy is delivered in effectiveness trials and
in real life, these studies tend to be more realistic than are efficacy
studies.