Who Gets Counted
Let's assume we have gone through all the steps of finding
100 patients with phobia, allocating them to the 2 groups at random,
and carrying out the intervention. At the end of the study, when
we start looking at the data, we find that 10 subjects in the Other
group have actually stumbled across the RMP series on their own
and have read all the articles. In the RMP group, 2 subjects committed
suicide before the classes began; 7 dropped out before writing the
final exam; and 3 withdrew before the study began, claiming that
their phobia was cured. Do we count these people and, if so, to which
group do we assign the results? That is, are the results of the
subjects in the Other group who read the RMP articles attributed
to RMP or to Other? The answer is, as one would expect from a statistician-psychologist,
"It all depends."
The first thing it depends on is how many people we've lost.
Ideally, we've taken measures to keep this number as small
as possible, in which case it really won't matter much how
we count their results, because it won't appreciably change
the results. However, if despite our best efforts we've lost
more than roughly 10% to 15% of the subjects, then we have to consider
whether we are conducting an efficacy trial or an effectiveness
one. If we are asking the question, "Can the intervention work?"
(that is, if we are testing its efficacy), then we are in a bit
of a bind. We can argue on the one hand that it doesn't make
sense to blame the RMP intervention for the deaths of the 2 subjects
if they were never exposed to RMP. Nor does it make sense to credit
RMP with curing the 3 who got better before starting the trial.
The 10 in the Other group who actually read the RMP articles are
a bit more troublesome, in that they were likely exposed to both
conditions; however, we can again argue that the best course of
action would be to drop them from the analyses, along with the 7
subjects who dropped out. On the other hand (and there's always
another hand), the more subjects we drop from the analyses, the
greater the possibility that we're biasing the results by deviating
from random assignment. There's no easy solution to this conundrum.
For therapies in which it is difficult to disentangle the beneficial
effects from the side effects (for example, drug therapies), "can"
and "does" may boil down to the same thing, so we
should count people who dropped out. With other types of interventions,
such as the talking therapies, it may be possible to alter those
aspects that lead to dropping out without affecting the therapy
itself (for example, by extending clinic hours or even bringing
the therapy to the patient). In these situations, there may be a
big difference between "can" and "does," so it makes more sense not
to count people who have dropped out.
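As a rough check on where the running example stands against the 10% to 15% rule of thumb, the counts given above can be tallied as follows. This is only a sketch: treating the 10 Other-group subjects who read the RMP articles as "problem cases" alongside the true losses is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Counts taken from the worked example in the text.
TOTAL_RANDOMIZED = 100

rmp_never_started = 2 + 3   # 2 deaths and 3 withdrawals before any exposure to RMP
rmp_dropped_out = 7         # left before writing the final exam
other_contaminated = 10     # Other-group subjects who read the RMP articles

# Upper end of the "roughly 10% to 15%" rule of thumb quoted in the text.
RULE_OF_THUMB_UPPER = 0.15


def proportion(count: int, total: int = TOTAL_RANDOMIZED) -> float:
    """Return count as a fraction of all randomized subjects."""
    return count / total


lost_from_rmp = rmp_never_started + rmp_dropped_out
# Assumption for illustration: contaminated subjects are tallied with the losses.
problem_cases = lost_from_rmp + other_contaminated

print(f"Lost from RMP group: {proportion(lost_from_rmp):.0%}")
print(f"All problem cases:   {proportion(problem_cases):.0%}")
print("Above rule of thumb:", proportion(problem_cases) > RULE_OF_THUMB_UPPER)
```

On these counts, the RMP losses alone (12 of 100) sit inside the 10% to 15% band, while adding the contaminated subjects (22 of 100) pushes the total past it, which is why the efficacy-versus-effectiveness question cannot be sidestepped in this example.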
The picture is entirely different for an effectiveness trial, in
that the bind disappears: we have to count everybody. In real
life, if patients become desperate and commit suicide before the
treatment has had a chance to work, then that is the fault of the
treatment and how it is delivered. One cannot ignore the fact that
receiving therapy often involves being on a waiting list for a period
of time or that the drug may not start to work for 2 or 3 weeks.
Similarly, patients may discontinue a treatment because of adverse
side effects, which can be anything from blurry vision caused by
reading 22 articles, to the inconvenience of coming to class for
an entire semester, to time lost from work. Finally, patients assigned
to one treatment mode may deliberately or accidentally receive the
other intervention. These are the realities of life and some of
the reasons why the results of effectiveness trials are always equal
to or worse than, but never better than, those of efficacy studies. This
last observation has been documented by Weisz, Donenberg, Han, and
Weiss in a meta-analysis of child psychotherapy trials (17). In well-controlled
studies that were closer to the efficacy end of the continuum, the
children in the experimental conditions scored 0.75 SD above those
in the control conditions. Translated into English, this means that
77% of the kids who received the interventions did better than the
average kid who was a control subject. However, in studies that
were carried out in regular clinic settings (that is, closer to
the effectiveness end of the continuum), this difference virtually
disappeared.
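The translation from 0.75 SD to 77% can be reproduced with the standard normal distribution. The sketch below assumes that scores in both conditions are normally distributed with the same standard deviation; that assumption is ours, made for illustration, and the function name is hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Effect size reported for the well-controlled (efficacy-like) studies.
effect_size_sd = 0.75

# Share of treated children scoring above the control-group mean,
# assuming normal scores with equal SDs in both groups.
share_above_control_mean = normal_cdf(effect_size_sd)

print(f"{share_above_control_mean:.0%}")  # 77%
```

By the same logic, an effect size near zero corresponds to about 50%, meaning the average treated child does no better than the average control child.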
Analysis
The differences in the study objectives also affect the approach
to the statistical analyses. Effectiveness trials must count all
the patients in the group to which they were originally assigned.
This is referred to as an intention-to-treat analysis.
As we have discussed, dropouts can jeopardize the results of effectiveness
studies because we cannot assume that those who discontinued are
a random subset of all the subjects (18). Rather, they are more
probably those who benefited the most or the least. Various statistical
techniques have been developed, such as Last Observation
Carried Forward (LOCF), multiple imputation, or growth curve analysis, that
allow subjects who miss appointments or who drop out entirely to
be included in the analyses (18).
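Of these techniques, LOCF is the simplest to illustrate: a missing visit is filled with the subject's most recent observed score. The following is a minimal sketch, not a recommended analysis; the function name and the visit scores are hypothetical.

```python
from typing import List, Optional


def locf(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing score (None) with the last observed score.

    Visits that are missing before any score has been observed stay None,
    because there is nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)
    return filled


# Hypothetical subject who dropped out after the second of four visits.
visits = [30.0, 26.0, None, None]
print(locf(visits))  # [30.0, 26.0, 26.0, 26.0]
```

Carrying a score forward amounts to assuming that the subject would not have changed after the last observed visit.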
These procedures are not required to the same degree in efficacy
studies, because we are interested only in those patients who received
the full course of treatment. As a consequence, we are not interested
in those who discontinued early, for whatever reason, or those who
were contaminated by receiving some or all of the wrong treatment.
Here, imputation is used to fill in the blanks when some demographic
data are missing, or if the patient skipped some appointments in
the middle of the test. We would not impute data if a subject dropped
out entirely.
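The two approaches differ mainly in which subjects enter the analysis. The sketch below contrasts the two analysis sets on a small, entirely hypothetical roster; the `Subject` fields and the label "completers only" are illustrative choices, not terms taken from a particular trial.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subject:
    assigned: str    # arm allocated at randomization: "RMP" or "Other"
    received: str    # what the subject actually got: "RMP", "Other", "both", "none"
    completed: bool  # whether the subject stayed until the final assessment


def intention_to_treat(subjects: List[Subject], arm: str) -> List[Subject]:
    """Effectiveness analysis: everyone counts in the arm they were assigned to."""
    return [s for s in subjects if s.assigned == arm]


def completers_only(subjects: List[Subject], arm: str) -> List[Subject]:
    """Efficacy analysis: only subjects who got the full, uncontaminated treatment."""
    return [
        s for s in subjects
        if s.assigned == arm and s.received == arm and s.completed
    ]


# Hypothetical roster echoing the kinds of cases in the running example.
roster = [
    Subject("RMP", "RMP", True),      # completed as planned
    Subject("RMP", "RMP", False),     # dropped out before the final exam
    Subject("RMP", "none", False),    # withdrew before the classes began
    Subject("Other", "Other", True),  # completed as planned
    Subject("Other", "both", True),   # read the RMP articles on their own
]

for arm in ("RMP", "Other"):
    print(arm, len(intention_to_treat(roster, arm)), len(completers_only(roster, arm)))
```

In the intention-to-treat set, the dropout, the early withdrawal, and the contaminated subject all remain in their assigned arms; in the completers-only set they are excluded.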
Conclusions
Cook and Campbell differentiate between the internal validity of
a study and its external validity (19). The former refers to the
design aspects of the investigation: how well it was carried
out, the degree to which various biases were avoided, and whether
it had minimal dropouts. Internal validity affects the degree to
which we can conclude that the outcome resulted from the intervention
and not from other factors, such as the groups differing on key
variables or differential dropouts from the various conditions.
Of equal, if not greater, concern for clinicians is the study's
external validity, which affects our ability to generalize the results
of the trial to the conditions that obtain in real life. Very often,
there is a trade-off between these 2 types of validity: tightening
up admission criteria increases internal validity at the expense
of external validity, as does increasing the control over what happens
during the session. "To minimize potential sources of error,
we have to sacrifice verisimilitude. Conversely, the more we try
to mirror the reality of the therapeutic encounter, the greater
the chances are that factors outside of our control (and perhaps
of our knowledge) may be responsible for the results" (20,
p 117).
So, which should come first, effectiveness studies or efficacy
studies? The answer is very definite: for the clinician, the most
useful information comes from effectiveness studies. From this perspective,
it would be wise to start with an effectiveness trial because it,
and it alone, tells whether the intervention will work in real life.
However, there's a risk associated with this. It is quite possible
that the intervention can work, but there may have been problems
in the treatment delivery, in patient selection criteria, in
therapist training, in nonadherence due to side effects or the requirements
of the study itself, or in other factors that led to finding no
difference between the groups. This Type II error (concluding
that there is no significant effect when in fact there is one) may
prematurely cut off further research in the area. Had there been
significant findings from a previous efficacy study, though, researchers
would be more inclined to start investigating the reasons for the
effectiveness study's failure, focusing on the way the therapy
was delivered, rather than dismissing the treatment as ineffective.
No single study can answer all questions, and investigators must
decide where they want to be on the efficacy-effectiveness spectrum.
Consequently, to know more about the usefulness of an intervention,
we require a series of studies, spanning the continuum from one
end to the other.
References
1. Schwartz D, Lellouch J. Explanatory and pragmatic
attitudes in therapeutic trials. J Chronic Dis 1967;20:637-48.
2. Hotopf M, Churchill R, Lewis G. Pragmatic randomised controlled
trials in psychiatry. Br J Psychiatry 1999;175:217-23.
3. Sackett DL, Gent M. Controversy in counting and attributing events
in clinical trials. NEJM 1979;301:1410-2.
4. Norman GR, Streiner DL. Biostatistics: the bare essentials. 2nd
ed. Toronto: BC Decker; 2000.
5. Streiner DL. Risky business: making sense of estimates of risk.
Can J Psychiatry 1998;43:411-5.
6. Quitkin FM, McGrath PJ, Stewart JW, Ocepek-Welikson K, Taylor
BP, and others. Placebo run-in period in studies of depressive disorder.
Br J Psychiatry 1998;173:242-8.
7. Calabrese JR, Rapport DJ, Shelton MD, Kimmel SE. Evolving methodologies
in bipolar maintenance research. Br J Psychiatry 2001;178:S157-S163.
8. Veterans Administration Cooperative Study Group on Antihypertensive
Agents. Effects of treatment on morbidity in hypertension. II. Results
in patients with diastolic blood pressure averaging 90 through 114
mm Hg. JAMA 1970;213:1143-52.
9. Elkin I, Parloff MB, Hadley SW, Autry JH. NIMH treatment of depression
collaborative research program: background and research plan. Arch
Gen Psychiatry 1985;42:305-16.
10. Luborsky L, DeRubeis RJ. The use of psychotherapy treatment
manuals: a small revolution in psychotherapy research style. Clin
Psychol Rev 1984;4:5-14.
11. Sensky T, Turkington D, Kingdon D, Scott JL, Scott J, Siddle
R, O'Carroll M, Barnes TR. A randomized controlled trial of
cognitive behavioral therapy for persistent symptoms in schizophrenia
resistant to medication. Arch Gen Psychiatry 2000;57:165-72.
12. Philipp M, Kohnen R, Hiller KO. Hypericum extract versus imipramine
or placebo in patients with moderate depression: randomized multicentre
study of treatment for eight weeks. BMJ 1999;319:1534-9.
13. Scott J, Teasdale JD, Paykel ES, Johnson AL, Abbott R, Hayhur
Moore R, and others. Effects of cognitive therapy on psychological
symptoms and social functioning in residual depression. Br J Psychiatry
2000;177:440-6.
14. Gasecki AP, Eliasziw M, Ferguson GG, Hachinski V, Barnett HJ.
Long-term prognosis and effect of endarterectomy in patients with
symptomatic severe carotid stenosis and contralateral carotid stenosis
or occlusion: results from NASCET. North American Symptomatic Carotid
Endarterectomy Trial (NASCET) Group. J Neurosurg 1995;83:778-82.
15. Streiner DL. Do you see what I mean? Indices of central tendency.
Can J Psychiatry 2000;45:833-6.
16. Donoghue J, Hylan TR. Antidepressant use in clinical practice:
efficacy v. effectiveness. Br J Psychiatry 2001;179 (Suppl 42):S9-S17.
17. Weisz JR, Donenberg GR, Han SS, Weiss B. Bridging the gap between
laboratory and clinic in child and adolescent psychotherapy. J Consult
Clin Psychol 1995;63:688-701.
18. Streiner DL. The case of the missing data: methods of dealing
with drop-outs and other vagaries of research. Can J Psychiatry
2002;47:68-75.
19. Cook TD, Campbell DT. Quasi-experimentation: design and analysis
issues for field settings. Boston: Houghton Mifflin; 1979.
20. Streiner DL. Evaluating what we do. In: Cullari
S, editor. Foundations of clinical psychology. Boston: Allyn and
Bacon; 1998.
--------------------------------------------------------------------------------
This is the 22nd article in the series on Research Methods in Psychiatry.
For previous articles, please see Can J Psychiatry 1990;35:616-20,
1991;36:357-62, 1993;38:9-13, 1993;38:140-8, 1994;39:135-40,
1994;39:191-6, 1995;40:60-6, 1995;40:439-44, 1996;41:137-43,
1996;41:491-7, 1996;41:498-502, 1997;42:388-94,
1998;43:173-9, 1998;43:411-5, 1998;43:737-41, 1998;43:837-42,
1999;44:175-9, 2000;45:833-6, 2001;46:72-6, 2002;47:68-75,
2002;47:262-6.
Manuscript received November 2001 and accepted April 2002.
1 Director, Kunin-Lunenfeld Applied Research Unit, Baycrest Centre
for Geriatric Care; Professor, Department of Psychiatry, University
of Toronto, Toronto, Ontario.
Address for correspondence: Dr DL Streiner, Director, Kunin-Lunenfeld
Applied Research Unit, Baycrest Centre for Geriatric Care, 3560
Bathurst Street, Toronto, ON M6A 2E1
E-mail: dstreiner@klaru-baycrest.on.ca