Those of you who are old enough may remember Neil Sedaka singing
"Breaking Up Is Hard to Do." If only that were true when
it comes to the variables we use in research! Many times (I would
say far too many), a researcher uses a continuous measure, such
as a depression inventory, as an outcome variable and then dichotomizes
it: above or below some cut-point, for example, or the number
of people who did and did not show a 50% reduction in their scores
from baseline to follow-up (1). Less often, but again far too frequently,
researchers may assign patients to different groups by dichotomizing
or trichotomizing scores from a continuous scale.
Over the years, several arguments have tried to justify this practice.
Perhaps the most common one runs something like this: "Clinicians
have to make dichotomous decisions to treat or not to treat, so
it makes sense to have a binary outcome." Another rationale
that is offered is, "Physicians find it easier to understand
the results when they're expressed as proportions or odds ratios.
They have difficulty grasping the meaning of beta weights and other
indices that emerge when we use continuous variables." In this
article, I'll try to show that you pay a very stiff penalty
in terms of power or sample size when continuous variables are broken
up, with the consequent risk of a Type II error (that is, failing
to detect real differences). But before we begin, let me assume
the role of a marriage counsellor and see whether the arguments
in favour of splitting up are really viable.
The rationale for dichotomizing outcomes because clinical decisions
are binary fails on 3 grounds. The primary one is that it confuses
measurement with decision making. The purpose of most research is
to discover relations, whether between or among variables
or between treatment interventions and outcomes. The more accurate
the findings, the better the decisions that we can make; that is,
the findings come first and the decision making follows. As we will
see, findings come more readily and more accurately when we retain
the scaling of continuous variables. The second reason is that all
the research using the old dichotomy becomes useless if the cut-point
changes. For example, the definition of hypertension used to be
160/95 (2). If we defined the outcome of intervention trials dichotomously, with
above 160/95 being hypertensive and below being normotensive, then
those findings would become useless after the definition changed
to 140/90 (3). If we expressed the outcome as a continuum, however,
the values of beta coefficients and similar indices showing the
effects of various risk and protective factors would not change
at all; if we wanted to use statistics such as odds ratios (ORs)
or the percentage of patients who improved, it would be a trivial
matter to recalculate the results. We have a similar situation in
psychiatry. The diagnosis of antisocial personality disorder (ASP),
for example, is a binary one: the person either does or does not
satisfy the diagnostic criteria (that is, a certain number of symptoms
are present). However, Livesley and others maintain that ASP and
many other disorders should actually be seen as a continuum: the
more symptoms that are checked off, the more of the trait the person
has (4). If the number of symptoms necessary to meet the criteria
were to change, as occurred when DSM-IV replaced DSM-III-R, then
much previous research using a dichotomous diagnosis would have
to be discarded. If the diagnosis were expressed as the number of
symptoms present, though, it would be relatively easy to reinterpret
the findings using the new criteria.
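The "trivial matter" of recalculating can be sketched in a few lines of Python. This is only an illustration: the blood pressure readings are made up, the function name is hypothetical, and the rule that a patient counts as hypertensive when either the systolic or the diastolic value reaches its cut-point is a simplifying assumption, not a clinical definition. The point is that, if the continuous values are kept, a new cut-point means rerunning one line rather than discarding the study.

```python
def proportion_hypertensive(readings, systolic_cut, diastolic_cut):
    """Share of patients at or above either cut-point (assumed rule)."""
    flagged = sum(
        1 for systolic, diastolic in readings
        if systolic >= systolic_cut or diastolic >= diastolic_cut
    )
    return flagged / len(readings)


# Hypothetical (systolic, diastolic) readings, for illustration only.
readings = [
    (128, 82), (142, 88), (150, 92), (165, 98),
    (138, 91), (172, 101), (135, 85), (158, 94),
]

old_rate = proportion_hypertensive(readings, 160, 95)  # older definition
new_rate = proportion_hypertensive(readings, 140, 90)  # newer definition

if __name__ == "__main__":
    print(f"160/95 cut-point: {old_rate:.2f}")
    print(f"140/90 cut-point: {new_rate:.2f}")
```

Had only the dichotomized outcome under the old definition been recorded, the second figure could not be recovered at all.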
Finally, whether to hospitalize a patient with suicidal ideation
or to discharge a patient with symptoms of schizophrenia may be
binary decisions, but many treatments (perhaps most) fall
along a continuum involving the dosage or strength of a medication
and the number and frequency of therapy sessions.
As for the argument that physicians are more comfortable with statistics
based on categorical measures, we are likely dealing with both a
base canard that they, like old dogs, cannot learn new tricks and
a vicious circle. As long as the belief persists, studies will be
designed, analyzed, and reported using proportions and ORs, meaning
that physicians will not have the opportunity to become more comfortable
with other approaches.
First, I'll give some examples of how dichotomizing can lead
us astray, and then I'll use these examples to discuss why
this is the case.
Example 1
Let's look at the data in Table 1, which shows scores on a scale
for 2 groups, each with 10 subjects. Let's assume that, if we were
to dichotomize the
scale, we would use a criterion for caseness of 15/16:
people with scores from 1 to 15 would be considered normal, and
those with scores of 16 and over would be defined as cases. The
mean for Group 1 is 11.70, and the mean for Group 2 is 16.80. There
is slightly more than a 5-point difference between the groups, and
the average of the first group is well below the cut-off of 15/16,
while the average of the second group is above the cut-point. If
we used a t-test to compare the groups, we'd find that t(18)
= 2.16, P = 0.045. That is, there is a statistically significant
difference between the means. Now, let's dichotomize the results
and count the number of people above and below the cut-point in
each group. What we'd find is shown in Table 2. Because 2 of the
cells have frequencies below 5, we'd use a Fisher's exact test,
rather than a chi-square test, and we'd find that the P level is
0.057. In other words, the difference
is not statistically significant.
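To see how this plays out beyond a single data set, here is a minimal simulation sketch in Python. It is not the analysis behind Tables 1 and 2: the group means (13 and 18), the SD (5), and the cut-point (15.5) are hypothetical values chosen only to resemble the example, and all function names are illustrative. It draws many pairs of 10-subject groups, runs a pooled-variance t-test on the raw scores and a Fisher's exact test on the dichotomized scores, and counts how often each reaches P < 0.05. The critical value of 2.101 is the two-tailed 0.05 cut-off for 18 degrees of freedom, so it is valid only for 10 subjects per group.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Student's t statistic for two independent groups (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(y) - statistics.mean(x)) / se


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= observed * (1 + 1e-9))


def simulate_power(n=10, mean1=13.0, mean2=18.0, sd=5.0, cut=15.5,
                   trials=2000, t_crit=2.101, alpha=0.05, seed=1):
    """Return (t-test rejection rate, Fisher rejection rate).

    All default values are assumptions made for illustration.
    """
    rng = random.Random(seed)
    hits_t = hits_fisher = 0
    for _ in range(trials):
        g1 = [rng.gauss(mean1, sd) for _ in range(n)]
        g2 = [rng.gauss(mean2, sd) for _ in range(n)]
        if abs(pooled_t(g1, g2)) > t_crit:
            hits_t += 1
        cases1 = sum(score > cut for score in g1)
        cases2 = sum(score > cut for score in g2)
        if fisher_exact_p(cases1, n - cases1, cases2, n - cases2) < alpha:
            hits_fisher += 1
    return hits_t / trials, hits_fisher / trials


if __name__ == "__main__":
    power_t, power_fisher = simulate_power()
    print(f"t-test on raw scores:      {power_t:.2f}")
    print(f"Fisher on dichotomized:    {power_fisher:.2f}")
```

Under these assumed values, the t-test should reject the null hypothesis noticeably more often than the Fisher's exact test does on the very same samples, which is the loss of power at issue.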
Example 2
In the second example, we have 40 subjects, measured on 4 variables,
A through D. If we were to correlate these variables, we'd
find the results shown in the upper triangle of Table 3. Of the
6 correlations, 5 are significant at the P < 0.01 level. Now,
we'll do a median split on each of these
variables, so that roughly one-half of the subjects fall above,
and one-half below, the cut-point. If we reran the correlations,
we would find the results in the lower triangle of the same table.
In every case, the correlations are lower (sometimes substantially
so), and only 2 of the 6 correlations are significant at the
P < 0.01 level.
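The same attenuation can be reproduced with simulated data. The sketch below is an illustration under stated assumptions, not a reanalysis of Table 3: it generates two standard normal variables with a hypothetical population correlation of 0.6, then computes Pearson's r before and after a median split of both variables. All names and parameter values are illustrative.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Code each value as 1 if above the sample median, else 0."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]


def simulate_pair(n=40, rho=0.6, seed=7):
    """Draw n pairs from a bivariate normal with correlation rho (assumed)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_pair()
    r_continuous = pearson_r(x, y)
    r_split = pearson_r(median_split(x), median_split(y))
    print(f"r, continuous scores: {r_continuous:.2f}")
    print(f"r, after median split: {r_split:.2f}")
```

For bivariate normal data split at the median on both variables, the expected correlation is (2/π) arcsin(ρ), which is about 0.41 when ρ = 0.6. With only 40 subjects, any single sample will wander around that value, so it is worth rerunning with different seeds or a larger n.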
Taking this example a bit further, we can run a regression
equation, with A as the dependent variable (DV) and B through D
as the predictors. Keeping the variables as continua, we'd
find the multiple R is 0.767 and R² = 0.588, which would lead to
thoughts of publication and promotion for most people. If we dichotomized
the variables, however, we'd find that the multiple R is 0.460,
with an associated R² of 0.211, which might jeopardize that promotion
by at least a year. (Purists might say that we should really use
a logistic regression with a dichotomous DV. If we did, we'd
find the Cox and Snell pseudo-R² to be an even more disappointing
0.20.)