Structured diagnostic interviews embedded in health surveys are ubiquitous in psychiatric research, and it is difficult to overstate their impact. Our fundamental understanding of the prevalence and burden of psychiatric disorders relies on data from structured diagnostic interviews administered by laypersons to survey respondents. The most influential diagnostic population surveys have been conducted in the US (1,2). With the release of data from the Canadian Community Health Survey (CCHS) Cycle 1.2, researchers can access information on several psychiatric disorders from a nationally representative sample of Canadians residing in households (see www.statcan.ca/english/freepub/82-617-XIE/). To identify persons with one or more psychiatric disorders, Statistics Canada used a modified version of the Composite International Diagnostic Interview (CIDI), a structured diagnostic interview designed to be administered by trained laypersons and to generate valid DSM-IV diagnoses (3). This particular version was developed as part of the World Mental Health Survey Initiative to provide a validated instrument for use in multiple languages and cultures (4).

We conducted a critical review of the CIDI, focusing on the Depression module. Our primary purpose was to evaluate the evidence supporting the CIDI’s validity, that is, the extent to which the depression diagnoses generated by the CIDI reflect true cases of depression. We also comment on the CIDI’s development and on the evidence pertaining to its reliability and validity. We conclude by considering the implications of our review for policy and epidemiologic research.

History and Development

The CIDI is an amalgamation of 2 preexisting instruments: the Diagnostic Interview Schedule (DIS) of the National Institute of Mental Health and the Present State Examination (PSE). Both predecessors of the CIDI were developed by groups of mental health experts. The DIS generates a diagnosis for depression, based on DSM-III criteria.
An advisory committee of US diagnostic instrument experts, including the chair of the task force that created the DSM-III, selected the items in the DIS directly from DSM-III criteria (5). This mode of question–item generation reflects the purpose of the CIDI, which is to generate DSM-IV diagnoses in a format for administration by a trained lay interviewer.

The PSE was designed as a guide to structuring a clinical interview, with the goal of assessing the “present mental state” of adult patients (6). Like the DIS, the PSE was created by a group of mental health experts, and the instrument items were chosen by a small group of physicians on the basis of their clinical practice and teaching.

The task force responsible for developing the CIDI simply used the items of the DIS as a template, added any items of the PSE with no DIS overlap, and created items for the CIDI. In the first draft, an expert committee in psychiatric nomenclature wrote questions that could be administered by lay interviewers and understood by an adult resident in the community (3). After numerous revisions, the CIDI was pretested in a sample of primary care and psychiatric outpatients. The developers chose a clinical sample rather than a community sample (for which the CIDI was intended) because they wanted to have a larger number of respondents with disorders to test the instrument’s properties. The prevalence of psychiatric diagnoses in primary care patients and psychiatric outpatients is higher than the prevalence in a community sample, thereby giving a higher “signal-to-noise” ratio and providing a better opportunity to determine the instrument’s capacity to distinguish cases from noncases. However, the instrument’s performance in this setting is not an accurate reflection of its performance in a community setting (see Validity section below).

After its initial development, the CIDI was chosen as the psychiatric diagnostic instrument to be used in the US National Comorbidity Study.
To prepare the CIDI for use in this study, cognitive testing and debriefing techniques were used to ascertain problematic questions. Four types were identified and addressed: questions that were too complex, questions with vague definitions, questions that could be interpreted in more than one way (for example, seeing or hearing things others do not), and a question with contextual misunderstanding (that is, the misunderstanding had more to do with the question’s placement within the interview than with the clarity of the specific question) (7). Question comprehension is one area where cognitive survey techniques have been helpful. Cognitive testing also addresses respondents’ understanding of the questionnaire’s context and format and their motivation to perform the task of answering questions (8). Addressing motivation and context is particularly important for persons suffering from mental illness because they likely have greater cognitive and (or) motivational difficulties than do persons without mental illness (9).

Reliability

The CIDI has been the subject of several reliability studies. The purpose of the CIDI Depression module is to generate a cross-sectional diagnosis of depression that coincides with the diagnosis that a clinician following the criteria of the DSM-IV would make. Because the CIDI was designed to generate diagnoses based on a single administration, most reliability studies used either test–retest or interrater reliability. The time between test and retest in the reliability studies varied from less than a week to 4 weeks (4).

In the initial development phase, the CIDI performed reasonably well. The first field test involved 575 subjects in different sites, and the interrater reliability for any depressive disorder had a kappa statistic of 0.95. Interestingly, during the same study period, the test–retest reliability for 60 of the 575 respondents yielded a kappa of only 0.67 for depressive disorders (10).
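Because kappa figures prominently in these reliability results, a brief sketch of how Cohen's kappa corrects raw agreement for chance may be helpful. The 2 × 2 counts below are hypothetical, chosen only for illustration, and the function name `cohen_kappa` is ours; nothing here is taken from the CIDI field trials.

```python
def cohen_kappa(both_pos: int, only_a: int, only_b: int, both_neg: int) -> float:
    """Cohen's kappa for two raters (A and B) making a yes/no diagnosis."""
    n = both_pos + only_a + only_b + both_neg
    # Observed proportion of respondents on whom the raters agree
    observed = (both_pos + both_neg) / n
    # Proportion rated positive by each rater (the marginals)
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    # Agreement expected by chance if the raters were independent
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical sample in which about half of respondents are cases
balanced = cohen_kappa(both_pos=40, only_a=10, only_b=5, both_neg=45)

# Hypothetical low-prevalence sample with similar raw agreement
rare = cohen_kappa(both_pos=2, only_a=6, only_b=7, both_neg=85)

print(f"balanced sample kappa: {balanced:.2f}")
print(f"low-prevalence kappa:  {rare:.2f}")
```

In this illustration the 2 hypothetical samples show similar raw agreement (85% and 87%), yet kappa is about 0.70 in the first and about 0.16 in the second, where cases are rare. This is one reason percentage agreement and kappa can diverge.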
Subsequent reliability studies had smaller sample sizes and used slightly different versions of the CIDI, reflecting the instrument’s repeated modifications. In general, the studies all demonstrated a high interrater reliability, as would be expected with a highly structured interview (4,11).

Several problems complicate the interpretation of the CIDI reliability studies. One major issue is that the CIDI has been extensively modified since its inception. Reliability measures reported from one version are not necessarily generalizable to another. Similarly, CIDI reliability studies sometimes report on diagnostic modules other than the one for depression and may not report specifically on the Depression module. Reliability estimates from one module cannot be generalized to another.

In summary, the reliability studies indicate that the CIDI performs reliably, as measured by interrater reliability. However, the use of different versions of the CIDI and the occasional exclusion of the Depression module suggest that the reliability of the CIDI Depression module remains unconfirmed.

Validity

The task proposed for the CIDI is daunting: to generate psychiatric diagnoses that are similar to those a specialist clinician would make under ideal circumstances with unlimited assessment time (the gold standard), using a format that can be administered by a lay interviewer and understood by laypersons in the community. Using clinician assessment as a validating gold standard is problematic because clinician diagnoses are highly unreliable. Reasons for this unreliability include idiosyncrasies in the formulation of initial hypotheses, ways of asking about symptoms or illness experience, and methods of weighting the information obtained (11).
To circumvent the problem of poor clinician assessment reliability, validity studies involving the CIDI Depression module have compared the CIDI diagnostic results with those from clinicians using DSM-III-R criteria checklists (12) and from a semistructured diagnostic interview (13). Semistructured assessment tools differ from completely structured assessment tools (such as the CIDI) in that they allow the interviewer some latitude in interpreting responses. The CIDI differs from an expert clinical assessment in several ways. Whereas the CIDI systematically screens for every diagnosis included in the assessment tool, a clinician typically focuses on a particular diagnosis and does not systematically screen. Clinicians usually consider comorbid conditions only insofar as they have an impact on the primary diagnosis. Another difference is that most clinicians operate within a finite period of time. In contrast, the CIDI is not time-limited: the greater the number of diagnoses, the longer the survey will take to complete. The CIDI is more likely to generate multiple diagnoses because of its structure and its unbounded duration. With these differences in mind, we found 3 validity studies of the CIDI Depression module (12–14). One of these studies used a clinician-scored DSM-III-R symptom checklist as the gold standard (12). The DSM-III-R checklist is a semistructured diagnostic instrument used by clinicians as a DSM-III-R assessment guide. Compared with this gold standard checklist, the CIDI had a “diagnostic concordance” sensitivity of 85%, a specificity of 98%, and a kappa of 0.84 (12). DSM-III-R checklists completed by clinicians are a good gold standard to test the CIDI—they allow for clinical judgment but provide a framework within which the clinician exercises such judgment. Nonetheless, the checklist study was problematic for 3 reasons. First, the validation sample was small (n = 20). 
Second, the sample was drawn from a primary care and general psychiatric outpatient setting that differs substantially from the community setting in which the CIDI is typically used. Finally, the clinicians who were present during the CIDI interview sometimes became the interviewers who administered the CIDI. It is quite likely that the observation or, worse, the administration of the CIDI biased the checklist scoring. Thus the high sensitivity and specificity may partially reflect the bias of the interviewers (12). The other validation studies used the SCAN, a semistructured diagnostic interview administered by clinicians, as a comparison (13,14). It is difficult to draw conclusions from these studies because the SCAN itself has not been properly validated. However, for depression, the agreement between the 2 instruments was moderate, with a kappa of 0.39 for current depression diagnoses in one study (13) and with sensitivity and specificity of 0.5 and 0.9, respectively, in the other study (14). Although not a CIDI validation study per se, the study by Tiemens and colleagues provides insight into how the CIDI performs (15). In this study, family physician diagnostic performance was compared with the CIDI Depression module, with the CIDI being the gold standard. The total percentage agreement about depression status was 86.6%, but the kappa agreement statistic was quite modest (k = 0.29). The paper reported a high proportion of false negatives (because the CIDI was the gold standard, a false negative meant that the CIDI measured depression, whereas the clinician did not) and false positives (the CIDI did not measure depression, whereas the family physician did). Many of the false-negative cases missed by family physicians were patients with depression for whom family physicians either underestimated the severity of their depression or made a different diagnosis. That is, these false-negative cases were “near misses” rather than completely missed cases. 
The false-negative patients were younger, were more likely to be employed, were feeling healthier, and used less health care, compared with true positives (patients defined as “depressed” by both the CIDI and the family physicians) (15). The false-negative patients did not appear to be ill to the physicians, despite meeting CIDI criteria for depression. The tendency for family physicians to miss CIDI depression cases on the basis of decreased severity and relatively preserved functioning has implications for the conclusions one can draw from CIDI depression diagnoses regarding health service use. It is a matter of debate whether relatively highly functioning patients with mild depression should be measured with equal weight as more impaired depression patients in analyses intended to measure service need. In our opinion, the most critical issue regarding the CIDI’s performance involves the samples used to test its validity. The CIDI has been validated only with clinical samples, never in the general community setting for which the instrument was intended (7,11). Clinical populations were used because testing validity in a general community setting, where psychiatric disorder prevalence is much lower, would require much larger sample sizes and incur far greater costs. However, it is well known that the prevalence of a disorder can have a significant impact on the estimated positive and negative predictive value of a diagnostic test (16). The PPV is the probability that a person with a positive score on a diagnostic test is a true case. To illustrate the impact of prevalence on PPV, we compare the PPV of a test with 80% sensitivity and specificity in 2 different samples (Table 1). One sample has a 5% prevalence of the disease, and the other sample has a 10% prevalence. We chose the 5% prevalence to approximate the true prevalence of depression in a community sample, whereas we chose 10% as a conservative estimate of the prevalence of depression in clinical samples. 
As shown in Table 1, the impact of prevalence on the estimated ability of a test to detect true positives is profound. We conclude that the CIDI does not perform nearly as well in community samples as in validation studies, given that the validation samples used to date had substantially higher prevalence of mental disorders than could reasonably be expected in the community. As Andrews and Peters have highlighted (11), there are other examples of psychiatric diagnostic instruments validated in a clinical sample that have lower PPVs when administered in a community setting (17).
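The dependence of PPV on prevalence follows directly from Bayes' theorem and can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `ppv` is ours, and the inputs are the hypothetical 80% sensitivity and specificity and the 5% and 10% prevalences used in the comparison above, not estimates of the CIDI's actual performance.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(true case | positive test result)."""
    # Proportion of the whole sample that are true positives
    true_pos = sensitivity * prevalence
    # Proportion of the whole sample that are false positives
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test with 80% sensitivity and 80% specificity
community = ppv(sensitivity=0.80, specificity=0.80, prevalence=0.05)
clinical = ppv(sensitivity=0.80, specificity=0.80, prevalence=0.10)

print(f"PPV at 5% prevalence:  {community:.1%}")
print(f"PPV at 10% prevalence: {clinical:.1%}")
```

Under these assumed inputs, roughly 17% of respondents who test positive in the 5% prevalence sample would be true cases, compared with roughly 31% in the 10% prevalence sample, even though the test itself is unchanged.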
Discussion

The CIDI is a diagnostic instrument designed for administration by trained lay interviewers to generate current and lifetime DSM-IV diagnoses. Although the CIDI has been revised extensively since its inception, it is important to understand its comparative strengths and weaknesses. The CIDI is a questionnaire based on a structured interview; it has very good interrater reliability, even though the Depression module in its current form has not been rigorously assessed. The refinements made through the use of cognitive survey techniques have likely resulted in a questionnaire that respondents can comprehend and complete. The ability to capture comorbid diagnoses that a clinician would not necessarily capture probably explains some of the inflated prevalence rates for which the CIDI has been criticized (11). Capturing comorbid disorders is potentially valuable in health services research and policy planning, since it allows for a more complete analysis of the burden of symptoms and disorders in the community.

These strengths must be tempered by an appreciation of the CIDI’s weaknesses. A major flaw in its development was the failure to validate it in a community sample, the setting for which it was always intended. As described above, validation results from studies using clinical samples, in which the depression prevalence rate is at least twice the rate in a community sample, produce high rates inflated by false positives. Unlike the increased detection of comorbid disorders, however, the inflated prevalence due to the instrument’s poor PPV has no obvious advantages and remains problematic for health service researchers and policy-makers alike. Until a well-designed validation study is conducted in a community setting, researchers and policy-makers will not know to what degree the CIDI generates false-positive cases of depression.
Further, it is quite possible that the CIDI’s performance in a community-based validation study will be disappointing, compared with its performance in the validation studies to date.

With all this in mind, how can results based on CIDI diagnoses be interpreted? The most critical issue arises from case rates inflated by false positives. For example, studies based on CCHS 1.2 data may conclude that there is a persistent unmet need for depression treatment within the broad Canadian community. Low treatment rates for depression have been a consistent finding in other health services research (18–20). However, without proper validation of the CIDI, the truth about this unmet need will remain unknown. Respondents falsely classified as suffering from clinical depression may not require the same extent and intensity of mental health services. Given the widespread application of the CIDI internationally, addressing the outstanding concerns about validity with proper validation studies should become an international priority.

Funding and Support

Dr Kurdyak is supported by a fellowship award from the CIHR, Rx&D, AstraZeneca, and the Canadian Psychiatric Research Foundation. He is also a fellow in Research in Addictions and Mental Health Policy and Services, a CIHR strategic training program.

References

1. Regier DA, Narrow WE, Rae DS, Manderscheid RW, Locke BZ, Goodwin FK. The de facto US mental and addictive disorders service system. Epidemiologic catchment area prospective 1-year prevalence rates of disorders and services. Arch Gen Psychiatry 1993;50:85–94.
2. Kessler RC, McGonagle KA, Zhao S, Nelson CB, Hughes M, Eshleman S, and others. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States. Results from the National Comorbidity Survey. Arch Gen Psychiatry 1994;51:8–19.
3. Robins LN, Wing J, Wittchen HU, Helzer JE, Babor TF, Burke J, and others. The Composite International Diagnostic Interview. An epidemiologic instrument suitable for use in conjunction with different diagnostic systems and in different cultures. Arch Gen Psychiatry 1988;45:1069–77.
4. Wittchen HU. Reliability and validity studies of the WHO-Composite International Diagnostic Interview (CIDI): a critical review. J Psychiatr Res 1994;28:57–84.
5. Robins LN, Helzer JE, Croughan J, Ratcliff KS. National Institute of Mental Health Diagnostic Interview Schedule. Its history, characteristics, and validity. Arch Gen Psychiatry 1981;38:381–9.
6. Wing JK, Cooper JE, Sartorius N. Measurement and classification of psychiatric symptoms. London (UK): Cambridge University Press; 1974.
7. Kessler RC, Ustun TB. The World Mental Health (WMH) Survey Initiative version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). Int J Methods Psychiatr Res 2004;13:93–121.
8. Biemer P, Groves RM, Lyberg LE, Mathiowetz NA, Sudman S. Measurement errors in surveys. New York (NY): John Wiley & Sons, Inc; 1991.
9. Wittchen H-U, Ustun TB, Kessler RC. Diagnosing mental disorders in the community. A difference that matters? Psychol Med 1999;1021–7.
10. Wittchen HU, Robins LN, Cottler LB, Sartorius N, Burke JD, Regier D. Cross-cultural feasibility, reliability and sources of variance of the Composite International Diagnostic Interview (CIDI). The multicentre WHO/ADAMHA field trials. Br J Psychiatry 1991;159:645–53, 658.
11. Andrews G, Peters L. The psychometric properties of the Composite International Diagnostic Interview. Soc Psychiatry Psychiatr Epidemiol 1998;33:80–8.
12. Janca A, Robins LN, Bucholz KK, Early TS, Shayka JJ. Comparison of Composite International Diagnostic Interview and clinical DSM-III-R criteria checklist diagnoses. Acta Psychiatr Scand 1992;85:440–3.
13. Andrews G, Peters L, Guzman A-M, Bird H. A comparison of two structured diagnostic interviews: CIDI and SCAN. Aust N Z J Psychiatry 1995;29:124–32.
14. Brugha TS, Jenkins R, Taub N, Meltzer H, Bebbington PE. A general population comparison of the Composite International Diagnostic Interview (CIDI) and the Schedules for Clinical Assessment in Neuropsychiatry (SCAN). Psychol Med 2001;31:1001–13.
15. Tiemens BG, VonKorff M, Lin EH. Diagnosis of depression by primary care physicians versus a structured diagnostic interview. Understanding discordance. Gen Hosp Psychiatry 1999;21:87–96.
16. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Toronto (ON): Little, Brown, and Company; 1991.
17. Jensen P, Roper M, Fisher P, Piacentini J, Canino G, Richters J, and others. Test–retest reliability of the Diagnostic Interview for Children (Disc 2.1). Parent, child, and combined algorithms. Arch Gen Psychiatry 1995;52:61–71.
18. Katz SJ, Kessler RC, Lin E, Wells KB. Medication management of depression in the United States and Ontario. J Gen Int Med 1998;13:77–85.
19. Wells KB, Schoenbaum M, Unutzer J, Lagomasino IT, Rubenstein LV. Quality of care for primary care patients with depression in managed care. Arch Fam Med 1999;8:529–36.
20. Lin E, Parikh SV. Sociodemographic, clinical, and attitudinal characteristics of the untreated depressed in Ontario. J Affect Disord 1999;53:153–62.

Author(s)

Manuscript received May 2005, revised and accepted June 2005.

1. Research Fellow, Health Systems Research and Consulting Unit, Centre for Addiction and Mental Health, Toronto, Ontario; Staff psychiatrist, Department of Psychiatry, University of Toronto, Toronto, Ontario.
2. Scientist, Health Systems Research and Consulting Unit, Centre for Addiction and Mental Health, Toronto, Ontario; Assistant Professor, Department of Psychiatry, University of Toronto, Toronto, Ontario.

Address for correspondence: Dr PA Kurdyak, Centre for Addiction and Mental Health, 33 Russell Street, T311, Toronto, ON M5S 2S1
e-mail: paul_kurdyak@camh.net