Marie St-Georges, MPs4
Objective: To report psychometric data from preliminary studies of the
Adolescent Dominic (AD), a pictorial screen for the most frequent Axis
I youth mental disorders.
Method: We created 113 picture items based on DSM-III-R diagnostic criteria
and assessed them for comprehension (sample 1, n = 114; sample 2, n = 40)
and reliability (sample 3, n = 128) in a group of adolescents aged 12 to
16 years living in the community. We used the kappa statistic to estimate
test–retest reliability of symptoms, criteria, and diagnoses, and intraclass
correlation coefficients (ICCs) for symptom and criterion scores. We assessed
internal consistency of symptom scores with the alpha coefficient.
Results: For symptoms, 54.4% of kappas were higher than 0.60, while only
2% were poor. ICCs for symptom scores yielded higher values (0.81 to 0.89)
than for criterion scores (0.51 to 0.86). Internal consistency of symptom
scores ranged from 0.52 to 0.83. Kappas for diagnoses ranged from 0.52
to 0.76.
Conclusion: Symptom reliability compared favourably with data from other
assessment interviews of youth mental disorders. Following these positive
results, a computerized DSM-IV version of the AD has focused on the assessment
of symptoms and is currently being tested for reliability and criterion
validity.
(Can J Psychiatry 2004;49:828–837)
Clinical Implications
- The Adolescent Dominic (AD) is a DSM-based standardized screen with demonstrated reliability among adolescents as young as age 12 years living in the community.
- The instrument can serve as a preliminary step in clinical interviewing and a complement to usual clinical practice.
- The instrument could encourage the expression of adolescents’ own concerns and thereby help clinicians identify priorities for intervention.
Limitations
- The low prevalence of mental disorders in community samples is the main limitation of studies of this type.
- The AD does not assess all DSM mental disorders.
- Cut-off scores with clinical samples have yet to be established, as has standardization with a large sample to provide normative data.
- Results may not be generalizable to adolescents with physical or cognitive impairments or learning disabilities.
- Psychometric properties are population-specific and should be interpreted considering the characteristics of the population from which the samples were drawn.
Key Words: adolescents, pictorial assessment, mental disorders, reliability, community samples
Résumé : Élaboration et fiabilité d’un instrument pictural de dépistage
des troubles mentaux chez les jeunes adolescents
It is widely agreed that youngsters should be assessed directly regarding their mental health, because other informants’ reports cannot replace self-reports (1). Adults tend to pay attention to externalizing behaviour problems, whereas children and adolescents are better at identifying their internalizing disorders or behaviours that they may manifest without their parents’ knowledge (1–14). To this end, highly sophisticated, comprehensive, DSM-based structured diagnostic interviews for youth have been developed over the past 2 decades. However, their very level of sophistication makes them difficult to use in real-world conditions, and it is unlikely that service providers in primary care will ever use them extensively. Since standardized assessment is the foundation of science-based intervention, this limited applicability of instruments has become a major concern (15).
Another impediment to the adoption of these instruments by front-line service providers is that psychometric studies have mostly been conducted in clinical settings (16). Reliability and validity coefficients are highly specific to a given population. Consequently, a measure that is reliable when used in a heterogeneous sample (for example, a clinical sample) may be much less reliable in a more homogeneous one (for example, a community sample) (17). Psychometric studies in community samples have shown that reliability of youth responses to highly structured interviews is difficult to achieve (Table 1). For instance, the Diagnostic Interview for Children and Adolescents-Revised (DICA-R) and various versions of the Diagnostic Interview Schedule for Children-2 (DISC-2) provide substantial reliability for conduct disorder but less reliability for other behaviour disorders and only moderate reliability for depressive disorders; they are problematic for anxiety disorders (16,18–21). In addition, because all adolescents in these studies were grouped together for data analyses and results were not reported according to age level or any age groupings that would have revealed age differences in reliability, reported reliability coefficients may be overestimated for younger participants (that is, those aged 9 to 14 years) and underestimated for older ones (that is, those aged 15 to 18 years).
Table 1 Test–retest reliability studies of DICA-R and DISC-2 diagnoses: community child or adolescent informants

| Diagnoses | Boyle and others (18), 12 to 16 years (n = 137), DICA-R: κ | Jensen and others (16), 9 to 17 years (n = 278), DISC-2.1: κ | Ribera and others (19), 9 to 17 years (n = 124), DISC-2.1: κ | Schwab-Stone and others (20), 9 to 18 years (n = 247), DISC-2.3: κ (SE) | Breton and others (21), 12 to 14 years (n = 145), DISC-2.25: κ (SE) |
|---|---|---|---|---|---|
| Disruptive disorder | 0.38 | not available | 0.52 | not available | 0.37 (0.17) |
| Attention-deficit hyperactivity disorder | 0.24 | 0.43 | 0.02 | 0.10 (0.06) | Fewer than 5 positive cases at test |
| Oppositional defiant disorder | 0.28 | 0.23 | 0.39 | 0.18 (0.05) | Fewer than 5 positive cases at test |
| Conduct disorder | 0.92 | 0.60 | 0.71 | 0.64 (0.06) | 0.49 (0.22) |
| Depressive disorders | 0.38 | 0.29 | 0.22 | 0.35 (0.06) | 0.55 (0.16) |
| Major depressive disorder | 0.45 | not available | 0.26 | 0.37 (0.06) | not available |
| Dysthymia | 0.40 | 0.29 | 0.00 | 0.43 (0.06) | not available |
| Anxiety disorders | not available | 0.30 | 0.38 | 0.39 (0.06) | 0.49 (0.11) |
| Separation anxiety disorder | 0.00 | not available | 0.65 | 0.27 (0.06) | 0.59 (0.19) |
| Overanxious disorder and generalized anxiety disorder | 0.54/– | not available | 0.03/0.09 | 0.28 (0.05)/– | 0.53 (0.19) |
| Simple phobia | not available | not available | 0.46 | 0.33 (0.06) | 0.55 (0.12) |

κ (SE) = kappa value (standard error)
DICA-R = Diagnostic Interview for Children and Adolescents-Revised
DISC-2 = Diagnostic Interview Schedule for Children-2
– = no data collected
Age differences in the reliability of child interviews have not been thoroughly explored (18,22–23). A reliability study of the DISC symptom scores found that results yielded by highly structured interviews with clinically referred children under age 10 years should be interpreted cautiously (22). If a criterion of 0.70 is used for test–retest reliability, coefficients are moderate for children aged 10 to 13 years, especially in regard to depression (0.53) and anxiety disorders (0.54). Test–retest intraclass correlation coefficients (ICCs) averaged 0.60 for children aged 10 to 13 years and 0.71 for adolescents aged 14 to 18 years. The age issue is particularly relevant, since adequate reliability is a minimal standard for an assessment method and should usually be tested prior to evaluating validity (23).
Several factors may cause unreliability (17). Among these is information variance, and researchers have repeatedly tried to improve the information-gathering phase of diagnosis. This variance reflects phenomena such as poorly phrased questions, errors in recording responses, and respondents’ misunderstandings, lapses of concentration, and intentional resistance. For instance, it was found that very few children aged 9 to 11 years understood DISC questions involving the time at which symptoms occurred (24). Apart from the issue of time concepts, only 16% of children aged 9 years understood questions assessing depressive diagnoses; this increased to only 31% among those aged 11 years.
Theories of cognitive development may explain some of the unreliable data obtained when youngsters are given structured interviews (25–26). The development of higher-level thinking skills in adolescence depends highly on cultural, social, and individual factors. Cognitive skills typical of the “concrete operational” stage may extend beyond the ages of 10 to 12 years (27). For many young adolescents, misunderstanding of abstract concepts could be lessened by the use of more concrete representations.
To assess symptoms of mental disorders in adolescents aged 12 to 16 years, and keeping in mind the limitations of existing instruments, we developed a picture-based screen that would provide concrete representations (28) of abstract DSM constructs. Information-processing theories suggest that combining visual and auditory stimuli allows for better conceptual understanding (26,29–37), so we integrated these sensory modalities. Such integration has already been successful with school-age children (38–41). This paper describes the developmental phase of the Adolescent Dominic (AD).
Methods
The developmental phase of the AD involved 3 stages. Stage I consisted of the creation of pictures corresponding to DSM-III-R diagnostic criteria (42). Stage II verified whether participants adequately understood the content conveyed by the pictures; if they did not, we edited and redrafted the pictures. In Stage III, we evaluated test–retest reliability of the pictorial interview. We obtained institutional review board–approved parental authorization and assent forms for every participant.
Stage I: Creation of the Pictures
Various characters (for example, Dominic and his or her parents, teacher, and peers) were created and shown to a small group of French-speaking adolescents drawn from the community (n = 17). Their comments helped us to select or modify these characters. We drafted 180 pictures based on competency situations and on DSM-III-R diagnostic criteria for attention-deficit hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), conduct disorder (CD), major depressive disorder (MDD), overanxious disorder (OAD), separation anxiety disorder (SAD), simple phobia (SPH), and substance use (Figures 1 to 4). We excluded 3 SAD criteria (A3, A4, and A9) that apply more to younger children. The main character, family members, and peers were sex-specific, resulting in a boy and a girl version. Seven situations were slightly adapted for sex. We retained a subset of 113 pictures that corresponded closely to DSM-III-R symptom descriptions (see the Discussion section below for the use of DSM-III-R rather than DSM-IV).
Figure 1 Do you find it difficult to wait your turn, like Dominic? (boys' version)
Figure 2 Are you very scared, like Dominic? (girls' version)
Figure 3 Do you feel good at school, like Dominic? (boys' version)
Figure 4 Do you worry a lot about not having friends, like Dominic? (girls' version)
Stage II: Comprehension Checks
We tested these 113 pictures for comprehension in a sample of adolescents aged 12 to 16 years (n = 114), balanced for age and sex. We drew this sample from 13 French public high schools located in various socioeconomic regions of the Montreal urban area. Six schools were located in lower-middle-class areas, 4 in underprivileged areas, and 3 in middle-class areas. All schools followed the regular academic curriculum; we did not solicit youngsters enrolled in special classes (for example, recent immigrants, those with learning disabilities, and those with physical or cognitive impairment).
We randomly divided pictures illustrating sex-specific versions into 4 booklets, resulting in 8 different booklets. We presented each sex-specific booklet to a subsample of 14 to 17 adolescents. All subsamples were balanced for age. At this stage, the interviewer asked every participant the question, “Could you tell me what is going on in this picture?” without supplying any verbal cue, not even a verbal query that would boost picture comprehension. The interviewer transcribed all respondents’ answers verbatim.
For every respondent, 2 judges (a child psychiatrist and a child psychologist) independently assessed the transcribed responses and decided whether the 113 pictures were “understood,” “more or less understood,” “not understood,” or “missing.” If one judge failed to assess a given picture and respondent, the second judge’s rating applied. A picture was scored 1.0 when both judges considered it “understood” by a given respondent, 0.5 when only one judge considered it understood, and 0 for all other situations. We calculated a comprehension rate per picture (CRP) by averaging all scores (for example, 1.0, 0.5, and 0) attributed to a picture. A CRP of 0.60 was selected as the criterion for qualifying a picture as understood (CRP ≥ 0.60). Pictures with a CRP < 0.60 were edited or redrafted. We performed a second comprehension check on the edited and redrafted pictures, following the same procedures with another sample (n = 40). We recruited at least 4 boys and 4 girls for each age, except for boys aged 15 years (n = 3). Again, pictures not understood (CRP < 0.60) were edited or redrafted.
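The CRP scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' software: the function names, the string labels, and the sample ratings are hypothetical, and the sketch assumes that when one judge's rating is missing, the other judge's rating counts for both.

```python
from statistics import mean
from typing import List, Optional, Tuple

UNDERSTOOD = "understood"  # label assumed for illustration


def respondent_score(judge_a: Optional[str], judge_b: Optional[str]) -> float:
    """Score one picture for one respondent from the two judges' ratings.

    A rating of None means the judge did not assess this picture; in that
    case the other judge's rating is used for both (an assumption).
    """
    if judge_a is None:
        judge_a = judge_b
    if judge_b is None:
        judge_b = judge_a
    n_understood = int(judge_a == UNDERSTOOD) + int(judge_b == UNDERSTOOD)
    # 1.0 if both judges say "understood", 0.5 if only one does, else 0.
    return {2: 1.0, 1: 0.5, 0: 0.0}[n_understood]


def comprehension_rate(ratings: List[Tuple[Optional[str], Optional[str]]]) -> float:
    """Comprehension rate per picture (CRP): mean score over respondents."""
    return mean([respondent_score(a, b) for a, b in ratings])


def is_understood(crp: float, threshold: float = 0.60) -> bool:
    """A picture counts as understood when CRP >= 0.60."""
    return crp >= threshold


# Hypothetical ratings for one picture from four respondents.
example = [
    ("understood", "understood"),
    ("understood", "not understood"),
    ("not understood", "more or less understood"),
    ("understood", None),
]
crp = comprehension_rate(example)
print(crp, is_understood(crp))
```

In this made-up example the four respondent scores are 1.0, 0.5, 0, and 1.0, so the picture would pass the 0.60 criterion.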
Stage III: Reliability Study (Test–Retest Stability and Internal Consistency)
Participants. Another sample of adolescents aged 12 to 16 years (n = 128), balanced for age but not for sex (70 girls and 58 boys), was drawn according to the same recruitment procedures and selection criteria. Two lay interviewers conducted the test and retest interviews in a counterbalanced design. Retest took place within an interval of 7 to 13 days (mean interval 9 days). The average duration of each test was approximately half an hour.
Description of the Instrument and Procedures. One hundred and two pictures withstood the comprehension checks and were examined for test–retest reliability. These 102 pictures were organized as follows: 94 pictures assessed 101 symptoms of DSM-III-R criteria (ADHD, 16 pictures; ODD, 13 pictures; CD, 15 pictures; MDD, 20 pictures; OAD, 14 pictures; SAD, 8 pictures; SPH, 10 pictures; and substance use, 5 pictures). The remaining 8 pictures described competency situations and normal behaviour.
In several instances, 2, 3, or 4 pictures illustrated symptoms pertaining to a single diagnostic criterion. For example, 3 pictures represented depressed or irritable mood in MDD. Conversely, 7 pictures assessed symptoms pertaining to more than 1 mental disorder. For example, the picture for ODD “loss of temper” was also used for MDD “irritable mood.” Symptoms and competency situations were randomly mixed to avoid a halo effect.
Adolescents were not asked to elaborate on the pictures, as in projective tests, but to say whether they acted, thought, or felt like the main character (“Dominic”). More specifically, a simple verbal question referring to the symptom (the symptom query) accompanied every picture. We limited symptom queries to a single concept and used easily understood words. Sentence length rarely exceeded 12 words. Symptom queries were administered in the same structured format to all participants. The interviewer read the question to the adolescent while she or he was looking at the picture. A positive response to the symptom query (for example, “Do you have nightmares, like Dominic?”) triggered a subquestion assessing symptom frequency during the past 6 months, anchored to an event that had occurred 6 months prior to the interview. For example, the subquestion might be “Since [the event which occurred 6 months ago], have you had frequent nightmares, like Dominic?” Responses to symptom queries and subquestions were coded 0 (no) or 1 (yes). Assessment of severity was restricted to this contingent question on symptom frequency for ADHD, ODD, CD, OAD, and SAD.
For MDD (20 pictures), a positive answer to the subquestion assessing symptom frequency generated additional questions pertaining to DSM-III-R diagnostic criteria. For 6 pictures assessing criteria A1 (depressed mood) and A2 (loss of interest or pleasure), symptom duration (that is, for 2 weeks or more) and daily occurrence were checked. For 12 pictures assessing criteria A4 to A9, duration, daily occurrence, and cooccurrence with depressed mood or loss of interest or pleasure were checked. With the remaining 2 pictures designed for criterion A3 (weight gain or loss), cooccurrence with depressed mood or loss of interest or pleasure was checked. Only a positive response to the preceding question generated further subquestions.
For SPH (10 pictures), a positive response to the symptom query triggered a question about symptom occurrence during the past 6 months. A further positive response then generated a group of 5 subquestions assessing persistence of fear, invariability and immediacy of the anxiety response, stimulus avoidance, interference with usual social activities, and recognition that fear was excessive. Consequently, assessment of SPH yielded 50 criteria for analysis. Finally, we did not assess all DSM-III-R criteria for substance use (5 pictures): for alcohol and tobacco, we assessed lifetime prevalence and current use; for drug consumption, we assessed lifetime prevalence only.
Statistical Analyses
For MDD and SPH, negative responses to symptom queries or to subquestions on symptom occurrence resulted in missing data for other severity subquestions. To keep the number of observations constant, such missing data were recoded as implied negative responses (43), that is, absence of such symptoms. Symptoms (101 dichotomous variables) were defined by 0 or 1 responses to symptom queries. Symptom scores (8 continuous variables) were obtained by summing the 0 or 1 responses for every symptom query according to diagnostic groupings (ADHD, ODD, CD, MDD, OAD, SAD, SPH, and substance use). Criteria (108 dichotomous variables) were computed from 0 or 1 responses to subquestions. Because we used more than one picture to assess a few symptoms, we combined responses, using the “or” rule. We obtained criterion scores (7 continuous variables) by summing the 0 or 1 coding for each criterion according to diagnostic groupings. We computed approximations of diagnoses (7 dichotomous variables) according to DSM-III-R cut-off points and algorithms.
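The scoring steps in this paragraph (implied negatives, summed symptom scores, the “or” rule for criteria) can be illustrated with a minimal Python sketch. All names and the sample data are hypothetical, and the DSM-III-R diagnostic cut-offs are deliberately left out because the text does not list them.

```python
from typing import Dict, List, Optional


def recode_implied_negative(answer: Optional[int]) -> int:
    """Recode a skipped subquestion (None) as an implied 'no' (0)."""
    return 0 if answer is None else answer


def symptom_score(query_responses: List[int]) -> int:
    """Sum the 0/1 responses to the symptom queries of one diagnostic grouping."""
    return sum(query_responses)


def criterion_met(subquestion_responses: List[Optional[int]]) -> int:
    """'Or' rule: a criterion is coded 1 if any picture assessing it is positive."""
    return int(any(recode_implied_negative(r) == 1 for r in subquestion_responses))


def criterion_score(criteria: Dict[str, List[Optional[int]]]) -> int:
    """Sum the 0/1 criterion codes within one diagnostic grouping."""
    return sum(criterion_met(responses) for responses in criteria.values())


# Hypothetical respondent: three criteria, some assessed by several pictures.
example_criteria = {
    "A1": [1, 0, None],  # one of three pictures positive -> criterion met
    "A2": [None],        # subquestion skipped -> implied negative
    "A3": [0, 1],        # second picture positive -> criterion met
}
print(symptom_score([1, 0, 1, 1]), criterion_score(example_criteria))
```

Treating skipped subquestions as 0 keeps every respondent in the denominator, which is the stated reason for the recoding.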
We used the kappa statistic (44,45) to assess temporal stability of dichotomous
variables. However, obtaining acceptable reliability in community samples
is challenging because of the relatively low prevalence of disorders (17).
Because accuracy of kappa is sensitive to very low or very high prevalence,
we required at least 5 positive and 5 negative responses at test for its
calculation. Also, no calculation was performed in the absence of positive
or negative cases at retest. We used Fleiss’s criteria (45)
(κ < 0.40, poor
reliability; 0.40 ≤ κ < 0.60, fair reliability; 0.60 ≤ κ < 0.75, good reliability;
κ ≥ 0.75, excellent reliability) to designate the strength of association.
We used the ICC for the reliability of symptom and criterion scores over time
(46). Preliminary analyses indicated no significant sex differences in
reliability, so we based reliability analyses on the total sample. We evaluated
internal consistency of symptom scores by using Cronbach alpha coefficients
(47) on responses to the first assessment.
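As a worked illustration of the kappa rules above, the following Python sketch computes Cohen's kappa from a 2 × 2 test–retest table, applies the minimum-case requirements, and labels the result with Fleiss's bands. The function and parameter names are hypothetical, and this is a sketch of the standard formula rather than the authors' own code. Using the ODD counts reported in Table 3 (12, 5, 4, 107) gives a value that rounds to the published 0.69.

```python
from typing import Optional


def cohen_kappa(pos_pos: int, pos_neg: int, neg_pos: int, neg_neg: int,
                min_cases: int = 5) -> Optional[float]:
    """Cohen's kappa for a 2x2 test-retest table, or None if not computable.

    Arguments are counts of test/retest outcomes (+/+, +/-, -/+, -/-).
    Returns None when there are fewer than `min_cases` positive or negative
    responses at test, or no positive or no negative cases at retest.
    """
    n = pos_pos + pos_neg + neg_pos + neg_neg
    test_pos, test_neg = pos_pos + pos_neg, neg_pos + neg_neg
    retest_pos, retest_neg = pos_pos + neg_pos, pos_neg + neg_neg
    if test_pos < min_cases or test_neg < min_cases:
        return None
    if retest_pos == 0 or retest_neg == 0:
        return None
    p_observed = (pos_pos + neg_neg) / n
    p_expected = (test_pos * retest_pos + test_neg * retest_neg) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def fleiss_label(kappa: float) -> str:
    """Classify kappa using the bands quoted in the text."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa < 0.75:
        return "good"
    return "excellent"


odd = cohen_kappa(12, 5, 4, 107)   # ODD counts from Table 3
print(round(odd, 2), fleiss_label(odd))
print(cohen_kappa(1, 0, 2, 124))   # CD: fewer than 5 positives at test
print(cohen_kappa(0, 6, 0, 122))   # MDD: no positive cases at retest
```

The last two calls return `None`, matching the table footnotes that explain why no kappa is reported for CD and MDD diagnoses.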
Results
Comprehension Checks
The first comprehension check revealed high mean CRP scores for the set of 113 pictures for both girls (mean CRP 0.84) and boys (mean CRP 0.86). Thirteen girls’ pictures (11.3%) and 11 boys’ pictures (9.7%) were not understood according to the selected criterion. Following these results, we edited or redrafted 33 pictures in the girls’ version and 26 pictures in the boys’ version. Most changes involved making the main character (“Dominic”) more conspicuous, rendering emotional expressions more obvious, and changing the sex of peers in the pictures. Irrelevant visual elements were minimized. We added 7 new pictures in the girls’ version and 5 in the boys’ version. These 40 girls’ pictures and 31 boys’ pictures were submitted to the second comprehension check, which revealed high mean CRP scores for both girls’ and boys’ pictures (mean CRPs 0.85 and 0.88, respectively). Mean CRPs were high for newly added pictures (7 girls’ pictures, 0.88; 5 boys’ pictures, 0.93). Only 5 girls’ pictures and 1 boys’ picture were not understood. Following these results, we eliminated 7 pictures, edited 1, and added 2.
Reliability of Symptoms and Symptom Scores
Ninety-four pictures assessing 101 symptoms of DSM-III-R criteria and 8
pictures describing normal behaviour were checked for reliability. Because
there were fewer than 5 positive responses at test, we did not calculate
kappa values for 13 out of 101 symptoms (12.9%). Included were a few pictures
without score variance (2 for SPH and 3 for CD). Of the remaining 88 symptom
queries, only 2 (2%) were poor (κ < 0.40); 31 yielded kappa values between
0.40 and 0.59; 29 yielded kappa values between 0.60 and 0.69; and 26 yielded
kappa values equal to or greater than 0.70. Table 2 reports the distribution
of kappa values according to diagnosis. According to Fleiss’s criteria
(45), most kappa values (55/101, or 54.4%) were good to excellent (κ ≥
0.60).
Table 2 Distribution of kappa values calculated on the Adolescent Dominic symptom queries (n = 128)

| Diagnoses | Number of symptom queries (%) | < 5 positive cases at test, n (%) | κ < 0.40, n (%) | 0.40 ≤ κ < 0.60, n (%) | 0.60 ≤ κ < 0.70, n (%) | κ ≥ 0.70, n (%) |
|---|---|---|---|---|---|---|
| Attention-deficit hyperactivity disorder | 16 (100) | 0 (0) | 0 (0) | 6 (37.5) | 5 (31.3) | 5 (31.3) |
| Oppositional defiant disorder | 13 (100) | 0 (0) | 2 (15.3) | 4 (30.7) | 3 (23.1) | 4 (30.7) |
| Conduct disorder | 15 (100) | 7 (46.6) | 0 (0) | 2 (13.3) | 3 (20) | 3 (20) |
| Major depressive disorder | 20 (100) | 2 (10) | 0 (0) | 6 (30) | 8 (40) | 4 (20) |
| Separation anxiety disorder | 8 (100) | 0 (0) | 0 (0) | 6 (75) | 1 (12.5) | 1 (12.5) |
| Overanxious disorder | 14 (100) | 0 (0) | 0 (0) | 5 (35.7) | 8 (57.1) | 1 (7.1) |
| Simple phobia | 10 (100) | 3 (30) | 0 (0) | 1 (10) | 1 (10) | 5 (50) |
| Substance use | 5 (100) | 1 (20) | 0 (0) | 1 (20) | 0 (0) | 3 (60) |
| Total | 101 (100) | 13 (12.9) | 2 (2) | 31 (30.7) | 29 (28.7) | 26 (25.7) |

κ = kappa values; n = number of symptom queries
Fleiss’s criteria: κ < 0.40, poor reliability; 0.40 ≤ κ < 0.60, fair reliability; 0.60 ≤ κ < 0.75, good reliability; κ ≥ 0.75, excellent reliability.
As for reliability of symptom scores (Table 3), ICCs ranged from 0.81 (OAD)
to 0.89 (substance use) and were all significant at the P < 0.05 level.
Cronbach alpha coefficients ranged from 0.52 (substance use) to 0.83 (ODD).
A low number of items and low prevalence negatively affected the alpha
coefficients for substance use, CD, and SPH. Also, substance use was less
clearly a one-dimensional scale.
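For readers who want to see how an alpha coefficient of this kind is obtained, here is a minimal Python sketch of the standard Cronbach formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total score). The function name and the toy data are hypothetical and do not come from the study.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of 0/1 answers.

    Each inner list holds one respondent's answers to the k items of a scale.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total score has no variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: items that always agree give alpha = 1; unrelated items give 0.
identical_items = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
unrelated_items = [[1, 0], [1, 1], [0, 0], [0, 1]]
print(cronbach_alpha(identical_items), cronbach_alpha(unrelated_items))
```

The formula shows why short scales with rarely endorsed items, such as substance use here, tend to produce lower alphas: few items and little item covariance leave the total-score variance close to the sum of the item variances.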
Table 3 Test–retest reliability of the Adolescent Dominic (n = 128)

| Disorder | Cases +/+ | Cases +/– | Cases –/+ | Cases –/– | n | Symptom scores: α | Symptom scores: ICC | Symptom scores: 95%CI | Criterion scores: ICC | Criterion scores: 95%CI | Diagnoses: κ (SE) | Diagnoses: 95%CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ADHD^a | 7 | 4 | 0 | 101 | 112 | 0.82^a | 0.86^a | 0.81 to 0.90 | 0.78^b | 0.68 to 0.86 | 0.76^a (0.12) | 0.53 to 0.98 |
| ODD | 12 | 5 | 4 | 107 | 128 | 0.83 | 0.87 | 0.82 to 0.91 | 0.86 | 0.81 to 0.90 | 0.69 (0.10) | 0.50 to 0.88 |
| CD | 1 | 0 | 2 | 124 | 127 | 0.62 | 0.84 | 0.78 to 0.88 | 0.67 | 0.56 to 0.75 | ^c | |
| MDD | 0 | 6 | 0 | 122 | 128 | 0.81 | 0.87 | 0.81 to 0.90 | 0.51 | 0.37 to 0.63 | ^d | |
| SAD | 8 | 5 | 7 | 108 | 128 | 0.76 | 0.82 | 0.75 to 0.87 | 0.73 | 0.64 to 0.80 | 0.52 (0.12) | 0.28 to 0.76 |
| OAD | 34 | 21 | 2 | 71 | 128 | 0.78 | 0.81 | 0.74 to 0.86 | 0.80 | 0.73 to 0.86 | 0.62 (0.07) | 0.48 to 0.75 |
| SPH | 6 | 7 | 2 | 113 | 128 | 0.63 | 0.85 | 0.79 to 0.89 | 0.80 | 0.72 to 0.85 | 0.54 (0.14) | 0.27 to 0.80 |
| Substance use | | | | | | 0.52 | 0.89 | 0.85 to 0.92 | | | | |

+/+ = positive on test/positive on retest; +/– = positive on test/negative on retest; –/+ = negative on test/positive on retest; –/– = negative on test/negative on retest
ADHD = attention-deficit hyperactivity disorder; CD = conduct disorder; MDD = major depressive disorder; SAD = separation anxiety disorder; SPH = simple phobia; OAD = overanxious disorder; ODD = oppositional defiant disorder
α = alpha coefficients computed on symptom scores at test; ICC = intraclass correlation coefficient; κ (SE) = kappa value (standard error)
^a n = 112, owing to a mistake by an interviewer; ^b n = 77, owing to a mistake by an interviewer; ^c Fewer than 5 positive cases at test; ^d No positive cases at retest
A comparison of symptom score ICCs by age revealed no significant differences
(subjects aged 12 to 14 years, mean ICC 0.84, range 0.72 to 0.96; subjects
aged 15 to 16 years, mean ICC 0.85, range 0.76 to 0.93). With regard to
the 7 pictures adapted for sex, kappas were comparable for both sexes.
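The text does not state which form of the ICC was computed, so the following Python sketch is only an assumption for illustration: it implements the two-way random-effects, absolute-agreement, single-measure coefficient, often written ICC(2,1), for scores measured at test and retest. The function name and the toy scores are hypothetical.

```python
from typing import List


def icc_2_1(scores: List[List[float]]) -> float:
    """ICC(2,1) for an n-subjects by k-occasions matrix of scores.

    Assumed form: two-way random effects, absolute agreement, single measure.
    Each inner list holds one subject's scores (for example [test, retest]).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy symptom scores: identical at retest, then shifted by one point at retest.
stable = [[1, 1], [2, 2], [3, 3]]
shifted = [[1, 2], [2, 3], [3, 4]]
print(icc_2_1(stable), icc_2_1(shifted))
```

In the shifted example every subject keeps the same rank but scores one point higher at retest, and the absolute-agreement coefficient drops to about 0.67. A consistency-type ICC would ignore such a systematic shift, which is one reason the choice of form matters when comparing published values.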
Reliability of Criteria, Criterion Scores, and Diagnosis Approximations
Owing to low prevalence, kappas could not be calculated for 12 of 58 criteria
(20.7%). However, of the remaining 46 criteria, 19 yielded kappa values
between 0.40 and 0.59; 17 yielded kappa values between 0.60 and 0.69; and
3 yielded kappa values equal to or greater than 0.70. According to Fleiss’s
criteria, 20 kappa values (34.5%) were good to excellent (κ ≥ 0.60), while
only 7 (12%) were poor (κ < 0.40).
For simple phobia, 26 kappas out of 50 (52%) could not be calculated. Of
the remaining 24, 8 kappa values were between 0.40 and 0.59; 6 were between
0.60 and 0.69; and 7 were equal to or greater than 0.70. Only 3 were less
than 0.40. Thus, according to Fleiss’s criteria, 13 kappas (26%) were good
to excellent (κ ≥ 0.60), while only 3 (6%) were poor (κ < 0.40).
As for criterion scores (Table 3), ICCs ranged from 0.51 for MDD to 0.86 for ODD. Kappa values for diagnosis approximations (Table 3) ranged from 0.52 for SAD to 0.76 for ADHD; they could not be calculated for CD or MDD.
Finally, Table 4 compares reliability results between the 20 MDD symptom queries, its 73 severity subquestions, and its 9 diagnostic criteria. While only 10% of symptom queries (2/20) elicited fewer than 5 positive responses at test, nearly one-third of the severity subquestions (22/73) and diagnostic criteria (3/9) did not yield enough positive responses to allow an estimation of kappa. Stability was at least moderate (κ ≥ 0.50) for 90% of symptom queries, but it was at least moderate for only 37% of severity subquestions and 22% of diagnostic criteria.
Table 4 Distribution of kappa (κ) values calculated on the Adolescent Dominic major depressive disorder (MDD) symptom queries, severity subquestions, and diagnostic criteria (n = 128)

| MDD | Number (%) | < 5 positive cases at test (%) | κ < 0.40 (%) | 0.40 ≤ κ < 0.50 (%) | 0.50 ≤ κ < 0.60 (%) | 0.60 ≤ κ < 0.70 (%) | 0.70 ≤ κ < 0.80 (%) | 0.80 ≤ κ (%) |
|---|---|---|---|---|---|---|---|---|
| Symptom queries | 20 (100) | 2 (10) | 0 (0) | 0 (0) | 6 (30) | 8 (40) | 2 (10) | 2 (10) |
| Severity subquestions | 73 (100) | 22 (30.1) | 7 (9.6) | 17 (23.3) | 17 (23.3) | 5 (6.8) | 4 (5.5) | 1 (1.4) |
| Diagnostic criteria | 9 (100) | 3 (33.3) | 3 (33.3) | 1 (11) | 0 (0) | 2 (22) | 0 (0) | 0 (0) |

Fleiss’s criteria: κ < 0.40, poor reliability; 0.40 ≤ κ < 0.60, fair reliability; 0.60 ≤ κ < 0.75, good reliability; κ ≥ 0.75, excellent reliability
Discussion
To our knowledge, this is the first reliability report on a DSM-based pictorial interview for the assessment of mental disorders in adolescents in the community. The high reliability observed, particularly with symptom score ICCs of 0.81 to 0.89, suggests that adolescents may benefit from a pictorial approach combined with direct and simple symptom queries. Analyses according to age show that ICCs obtained from younger adolescents (aged 12 to 14 years) are as high as those obtained from older adolescents (aged 15 to 16 years). Pictures displaying concrete examples of abstract concepts seem to support the production of stable responses to symptom queries, even from the youngest respondents in the sample.
Comparison With Other DSM-Based Instruments
Regarding disorders for which we obtained sufficient cases, kappa values for AD diagnoses compare favourably with results from the DICA-R (18), and various versions of the DISC-2 (16,19–21) (Table 1), except for SAD. However, reliability may appear misleadingly low when applied to categorical diagnoses (49). A change in a single response can bring the case to above or below the diagnostic threshold. The use of ICCs to assess reliability of symptom and criterion scores incorporates the whole distribution of responses and therefore typically yields higher reliability estimates than do the kappa statistics for individual items.
The reliability of AD symptom scores (that is, ICCs of 0.81 to 0.89) compares favourably with DISC-2 symptom scores administered to youngsters in the community. In a study using the DISC-2.3 with informants aged 9 to 18 years, symptom-score ICCs ranged from 0.11 (panic disorder) to 0.83 (CD), with average ICCs of 0.40 for anxiety disorders, 0.52 for depressive disorders, and 0.70 for disruptive behaviour disorders (49,50). A study of the DISC-2.25 with informants aged 12 to 14 years yielded ICCs of 0.71 to 0.84 (21).
The AD shares several important features with the Dominic-R Questionnaire for school-aged children (38). Both are pictorial, DSM-based structured interviews that assess a comparable set of mental disorders. Reliability of AD symptom scores compares favourably with the Dominic Questionnaire (ICCs of 0.59 to 0.74) (38), the Dominic-R (0.71 to 0.81) (39), and the African-American version of the Dominic-R (ICCs > 0.75) (41).
In contrast with the Dominic version for younger children (38), the AD comprises a language-based component for assessing symptom severity. Results show that, while reliability at the symptom level is good, it is much less so for diagnostic criteria and diagnoses. Extensive verbal queries following initial positive responses to the pictures mean that assessment of the diagnostic criteria may require skills and interactions similar to those required in purely verbal interviews. Symptom frequency, duration, and cooccurrence—all variables necessary for diagnostic assessment—are less amenable to graphic representation. Elaborate, sentence-based content investigating time-related and other severity criteria may thus reduce reliability to levels achieved by highly structured instruments that rely on verbal questioning; ultimately, it may gain little for diagnostic assessment. At this stage, however, we could not easily disentangle the relative impact of low prevalence and verbal questioning on reliability loss.
The low prevalence of mental disorders in community adolescent samples is a limitation in this type of study—a problem that has confronted many researchers (20,23,49). Nonetheless, we decided not to use a clinically enriched sample for the developmental phase of the AD, because such sampling strategies increase sample heterogeneity and result in higher kappas than would otherwise occur (16). Finally, because an instrument’s reliability is usually the upper limit of its validity and because no gold standard exists for assessing criterion validity, comparing the AD with other measures of mental disorders for validation purposes would be questionable at this early stage.
To capitalize on the satisfactory reliability results obtained with the AD, especially for internalized disorders, it would be methodologically sound to focus on what this pictorial instrument does best: it allows youngsters to reliably assess symptoms themselves. The AD would then become a DSM-based, standardized, user-friendly screen for front-line service providers. These characteristics could be considered a fair trade-off for the instrument’s limited capacity to yield reliable diagnoses. For the AD, clinical cut-off points similar to the symptom-loading approach adopted by such rating scales as the Youth Self-Report (51) will have to be determined in criterion and discriminant validity studies. However, unlike many psychopathology rating scales (52), its basis in DSM diagnostic criteria assures construct validity.
The DSM-IV and the AD
During these preliminary studies of the AD (from 1993 to 1995), the fourth edition of the DSM was in preparation (53). A comparison of the DSM-III-R and the DSM-IV at the symptom level shows that the DSM-IV introduced changes mainly for ADHD, but less so for disorders such as ODD, CD, MDD, generalized anxiety disorder (formerly OAD) and SAD. The necessary updating of the AD according to DSM-IV criteria has been undertaken, and fortunately, almost all pictures have been reused. More important, the instrument has been redesigned for the screening of symptoms. A computerized DSM-IV French version of the AD is currently undergoing field tests with clinical and community samples for reliability and criterion validity, and an English version will be validated as the next step.
Funding and Support
This study was supported by the Fonds de la Recherche en Santé du Québec through a grant (930685-104) awarded to Dr Valla and Dr Bergeron.
Acknowledgement
Patrick Bolland provided helpful translation assistance.
References
1. Achenbach TM, McConaughy SH, Howell CT. Child/adolescent behavioral and emotional problems: implications of cross-informant correlations for situational specificity. Psychol Bull 1987;101:213–32.
2. Graham P, Rutter M. The reliability and validity of the psychiatric assessment of the child. II. Interview with the parent. Br J Psychiatry 1968;114:581–92.
3. Herjanic B, Herjanic M, Brown F, Wheatt F. Are children reliable reporters? J Abnorm Child Psychol 1975;3:41–8.
4. Cytryn L, McKnew DH, Bunney WE. Diagnosis of depression in children: a reassessment. Am J Psychiatry 1980;137:22–5.
5. Herjanic B, Reich W. Development of a structured psychiatric interview for children: agreement between child and parent on individual symptoms. J Abnorm Child Psychol 1982;10:307–24.
6. Orvaschel H, Puig-Antich J, Chambers W, Tabrizi MA, Johnson R. Retrospective assessment of prepubertal major depression with the Kiddie-Sads-E. J Am Acad Child Adolesc Psychiatry 1982;21:392–7.
7. Moretti MM, Fine S, Haley G, Marriage K. Childhood and adolescent depression: child-report versus parent-report information. J Am Acad Child Adolesc Psychiatry 1985;24:298–302.
8. Edelbrock C, Costello AJ, Dulcan MK, Calabro-Conover N, Kalas R. Parent-child agreement on child psychiatric symptoms assessed via structured interview. J Child Psychol Psychiatry 1986;27:181–90.
9. Weissman M, Merikangas KR, Gammon GD. Assessing psychiatric disorders in children. Discrepancies between mothers’ and children’s reports. Arch Gen Psychiatry 1987;44:747–53.
10. Costello EJ, Costello AJ, Edelbrock C, Burns BJ, Dulcan MK, Brent D, and others. Psychiatric disorders in pediatric primary care. Prevalence and risk factors. Arch Gen Psychiatry 1988;45:1107–16.
11. Bird HR, Gould MS, Staghezza B. Aggregating data from multiple informants in child psychiatry epidemiological research. J Am Acad Child Adolesc Psychiatry 1992;31:78–85.
12. Kazdin AE. Informant variability in the assessment of childhood depression. In: Reynolds WM, Johnston HF, editors. Handbook of depression in children and adolescents. New York: Plenum Press; 1994. p 249–71.
13. Seiffge-Krenke I, Kollmar F. Discrepancies between mothers’ and fathers’ perceptions of sons’ and daughters’ problem behaviour: a longitudinal analysis of parent-adolescent agreement on internalising and externalising problem behaviour. J Child Psychol Psychiatry 1998;39:687–97.
14. Youngstrom E, Loeber R, Stouthamer-Loeber M. Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. J Consult Clin Psychol 2000;68:1038–50.
15. Hoagwood K, Olin S. The NIMH blueprint for change report: research priorities in child and adolescent mental health. J Am Acad Child Adolesc Psychiatry 2002;41:760–7.
16. Jensen P, Roper M, Fisher P, Piacentini J, Canino G, Richters J, and others. Test–retest reliability of the Diagnostic Interview Schedule for Children (DISC 2.1). Arch Gen Psychiatry 1995;52:61–71.
17. Shrout PE, Spitzer RL, Fleiss JL. Quantification of agreement in psychiatric diagnosis revisited. Arch Gen Psychiatry 1987;44:172–7.
18. Boyle MH, Offord DR, Racine Y, Sanford M, Szatmari P, Fleming JE, Price-Munn N. Evaluation of the Diagnostic Interview for Children and Adolescents for use in general population samples. J Abnorm Child Psychol 1993;21:663–81.
19. Ribera JC, Canino G, Rubio-Stipec M, Bravo M, Bauermeister JJ, Alegria M, and others. The Diagnostic Interview Schedule for Children (DISC-2.1) in Spanish: reliability in a Hispanic population. J Child Psychol Psychiatry 1996;37:195–204.
20. Schwab-Stone ME, Shaffer D, Dulcan MK, Jensen PS, Fisher P, Bird HR, and others. Criterion validity of the NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3). J Am Acad Child Adolesc Psychiatry 1996;35:878–88.
21. Breton JJ, Bergeron L, Valla JP, Berthiaume C, St-Georges M. Diagnostic Interview Schedule for Children (DISC-2.25) in Quebec: reliability findings in light of the MECA study. J Am Acad Child Adolesc Psychiatry 1998;37:1167–74.
22. Edelbrock C, Costello AJ, Dulcan MK, Kalas R, Calabro-Conover NC. Age differences in the reliability of the psychiatric interview of the child. Child Dev 1985;56:265–75.
23. Schwab-Stone M, Fallon T, Briggs M, Crowther B. Reliability of diagnostic reporting for children aged 6–11 years: a test–retest study of the Diagnostic Interview Schedule for Children-Revised. Am J Psychiatry 1994;151:1048–54.
24. Breton JJ, Bergeron L, Valla JP, Lépine S, Houde L, Gaudet N. Do children aged 9 through 11 years understand the DISC Version 2.25 questions? J Am Acad Child Adolesc Psychiatry 1995;34:946–54.
25. Yates T. Theories of Cognitive Development. In: Lewis M, editor. Child and adolescent psychiatry. Baltimore (MD): Williams and Wilkins; 1990. p 109–29.
26. Anthony BJ. Cognitive Development in Adolescence. In: Flaherty LT, Sarles RM, editors. Handbook of child and adolescent psychiatry. Vol 3, Adolescence: development and syndromes. New York: Wiley & Sons; 1997. p 65–78.
27. Butcher HJ. Human Intelligence. Its nature and assessment. New York: Harper & Row; 1972.
28. Harter S, Pike R. The Pictorial Scale of Perceived Competence and Social Acceptance for Young Children. Child Dev 1984;55:1969–82.
29. Gaddès WH. Can educational psychology be neurologized? Can J Behav Sci 1969;1:38–49.
30. Luria AR. The working brain. New York: Allen Lane, Penguin; 1973.
31. Shiffrin RM, Schneider W. Controlled and automatic human information processing. II. Perceptual learning, automatic attending and a general theory. Psychol Rev 1977;84:127–90.
32. Frostig M, Maslow P. Neuropsychological contributions to education. J Learn Disabil 1979;12:40–54.
33. Bereiter C. Aspects of an educational learning theory. Review of Educational Research 1990;60:603–24.
34. Goldberg E. Contemporary neuropsychology and the legacy of Luria. Mahwah (NJ): Lawrence Erlbaum Associates; 1990.
35. Iran-Nejad A. Active and dynamic self-regulation of learning processes. Review of Educational Research 1990;60:573–602.
36. Caramazza A. Is cognitive neuropsychology possible? J Cogn Neurosci 1992;4:80–95.
37. Kosslyn SM, Intriligator JM. Is cognitive neuropsychology plausible? The perils of sitting on a one-legged stool. J Cogn Neurosci 1992;4:96–106.
38. Valla JP, Bergeron L, Bérubé H, Gaudet N, St-Georges M. A structured pictorial questionnaire to assess DSM-III-R-based diagnoses in children (6–11 years): development, validity and reliability. J Abnorm Child Psychol 1994;22:403–23.
39. Valla JP, Bergeron L, Bidaut-Russell M, St-Georges M, Gaudet N. Reliability of the Dominic-R: a young child mental health questionnaire combining visual and auditory stimuli. J Child Psychol Psychiatry 1997;38:717–24.
40. Valla JP, Bergeron L, Smolla N. The Dominic-R: a pictorial interview for 6- to 11-year-old children. J Am Acad Child Adolesc Psychiatry 2000;39:85–93.
41. Bidaut-Russell M, Valla JP, Thomas JM, Bergeron L, Lawson E. Reliability of the Terry: a mental health cartoon-like screener for African-American children. Child Psychiatry Hum Dev 1998;28:249–63.
42. American Psychiatric Association. Diagnostic and statistical manual of mental disorders: DSM-III-R, revised. 3rd ed. Washington (DC): American Psychiatric Association; 1987.
43. Lucas CP, Fisher P, Piacentini J, Zhang H, Jensen PS, Shaffer D, and others. Features of interview questions associated with attenuation of symptom reports. J Abnorm Child Psychol 1999;27:429–37.
44. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 1960;20:37–46.
45. Fleiss JL. Statistical methods for rates and proportions. New York: John Wiley & Sons; 1981.
46. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
47. Cronbach L. Essentials of psychological testing. 3rd ed. New York: Harper; 1951.
48. Anastasi A. Psychological testing. 6th ed. New York: Macmillan Publishing Company; 1988.
49. Shaffer D, Fisher P, Lucas CP, Dulcan MK, Schwab-Stone ME. NIMH Diagnostic Interview Schedule for Children Version IV (NIMH DISC-IV): description, differences from previous versions, and reliability of some common diagnoses. J Am Acad Child Adolesc Psychiatry 2000;39:28–38.
50. Shaffer D, Fisher P, Dulcan MK, Davies M, Piacentini J, Schwab-Stone ME, and others. The NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3): description, acceptability, prevalence rates, and performance in the MECA study. J Am Acad Child Adolesc Psychiatry 1996;35:865–77.
51. Achenbach TM, Edelbrock C. Manual for the Youth Self-Report and Profile. Burlington (VT): University of Vermont Department of Psychiatry; 1991.
52. Myers K, Winters NC. Ten-year review of rating scales. II. Scales for internalizing disorders. J Am Acad Child Adolesc Psychiatry 2002;41:634–59.
53. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. Washington (DC): American Psychiatric Association; 1994.
Author(s)
Manuscript received May 2003, revised, and accepted January 2004
1Researcher, Research Unit, Rivière-des-Prairies Hospital; Associate Professor, Department of Psychology, Université du Québec à Montréal, Montreal, Quebec.
2Researcher, Research Unit, Rivière-des-Prairies Hospital; Clinical Professor, Department of Psychiatry, Université de Montréal, Montreal, Quebec.
3Researcher, Research Unit, Rivière-des-Prairies Hospital; Researcher, Department of Psychiatry, Université de Montréal, Montreal, Quebec.
4Staff members, Research Unit, Rivière-des-Prairies Hospital, Montreal, Quebec.
Address for correspondence: Dr N Smolla, Research Unit, Rivière-des-Prairies Hospital, 7070 Perras Boulevard, Montreal, QC, H1E 1A4
e-mail: nicole.smolla.hrdp@SSSS.gouv.qc.ca