The Clock Drawing Test: A review of its accuracy in screening for dementia

The Clock Drawing Test (CDT) is a simple neuropsychometric instrument that can be easily applied to assess several cognitive functions. Over the past 20 years, the CDT has attracted considerable interest for its role in the early screening of cognitive impairment, especially in dementia. Although the CDT is considered an accurate test for dementia screening, recent studies, including comparisons with structured batteries such as the CAMCOG, have shown mixed results. Objectives: (1) To investigate the importance of the CDT compared with other commonly used tests in the diagnosis of dementia in the elderly; (2) to evaluate the reliability of and correlation among available CDT scoring scales from recent studies. Methods: A systematic literature search was conducted in September 2008 for studies comparing CDT scoring systems with one another and comparing the CDT with neuropsychiatric batteries. Results: Twelve studies were selected for analysis. Seven of these studies compared CDT scoring scales, while five compared the CDT with the CAMCOG and the MMSE. Eight studies found good correlation and reliability between the scales and the other tests. Conclusion: Despite the mixed results of these studies, the CDT appears to be a good screening test for dementia.

The Clock Drawing Test (CDT) is a simple neuropsychometric instrument that can be easily applied to assess several neuropsychiatric functions. 1 The CDT was introduced in the early 20th century as an indicator of constructional apraxia. 2 From 1953 to mid-1986, the CDT was mainly used to screen for visuoconstructional disorders associated with lesions in the parietal region of the brain. 3 Constructional apraxia may occur in many neurological diseases, for example in patients with stroke sequelae, and is often present in early dementia. [4][5][6] Over the past 20 years, the CDT has attracted considerable interest for its role in the early screening of cognitive impairment, especially in Alzheimer's disease. [7][8][9][10][11][12][13][14][15][16][17][18][19] In 1986, Shulman et al. published the first study associating the CDT with the screening of elderly patients with cognitive disorders, particularly the screening and follow-up of acute dementia and delirium. 13 Since then, various studies have sought to establish criteria for administering and interpreting the CDT and to evaluate its current role as a screening instrument for patients with cognitive impairment. 1 Its contribution has also been investigated in the assessment and follow-up of delirium, focal cerebral lesions, Huntington's disease, schizophrenia, unilateral neglect and multiple sclerosis, among other conditions. 1 In broad terms, the test evaluates several cognitive skills, as does the Mini-Mental State Examination (MMSE). 20 Drawing a clock requires many cortical and subcortical, anterior and posterior, and right- and left-hemisphere functions to operate simultaneously, particularly those of the frontal, temporal and parietal regions. 3 This makes the CDT an interesting instrument for the identification and follow-up of patients with possible dementia. 3 The test assesses many cognitive skills that may be impaired in early Alzheimer's disease, such as short-term memory, comprehension of verbal instructions, spatial orientation, abstract thinking, planning, concentration, and executive and visuospatial skills. 3 Our aims in this study were: (1) to investigate the importance of the CDT compared with other commonly used tests in the diagnosis of dementia in the elderly; and (2) to evaluate the reliability of and correlation among available CDT scoring scales based on the results of recent studies.

Methods
A systematic search of the literature was conducted in September 2008 for articles comparing CDT scoring systems with one another and comparing the CDT with other neuropsychiatric instruments in dementia. The PubMed and PsycINFO (1806-2008) databases were searched to identify studies reporting on the clock drawing test and dementia. The keywords used were: "clock drawing", "clock test", "screening", "accuracy", "scales", "cognitive impairment", "mild cognitive impairment", "dementia", "Alzheimer's disease", "old age" and "elderly". Results were limited to English-language articles reporting research in humans. The references of key articles and books were also examined for citations missed by the search strategy. Each article retrieved was examined by the authors for comparisons among CDT scoring scales and between CDT scales and other neuropsychometric batteries. Articles published after Shulman's systematic review of the CDT (2000) 1 were of special interest to this study.

Results
The initial search, conducted in September 2008, identified 115 potentially relevant papers on the CDT and dementia. A review of the abstracts excluded 79 papers because they did not compare CDT scoring scales with one another or the CDT with other instruments. Eleven articles were selected from the remaining 36 papers, and one further study (Bourke et al., 1995) 21 was identified from the reference lists of the selected papers. Of these twelve studies, seven involved comparisons of CDT scoring scales (Table 1) and five compared the CDT with the Cambridge Cognitive Examination (CAMCOG) and the MMSE (Table 2).

CDT scoring scales
There are more than fifteen well-validated scales for interpreting the CDT. They provide qualitative 12,13,22 or quantitative 10,11,23,24 methods of varying complexity. These scales are based on (1) strict, well-structured protocols, (2) the most frequent findings after test application, and (3) expert opinion. 1 There is no consensus in the literature about which scale is most adequate for test interpretation. Shulman 1 found mean sensitivity and specificity of 85% across all scales after pooling studies that used comparable methods. These findings remain contested in the literature because they have been difficult to replicate (Table 1). [25][26][27][28][29] Comparisons among scales have also been questioned, since the studies differ considerably in patient recruitment, clinical procedures, and whether the CDT was compared with instruments of higher diagnostic accuracy. 30 Despite these limitations, several studies have indicated that the scales by Shulman et al., 13 Mendez et al. 24 and Sunderland et al. 10 show greater diagnostic accuracy and similar results when compared with neuropsychiatric examinations, even in populations with diverse cultural backgrounds and educational levels. 1,18,19,30,31

A Brazilian study 15 used the scale by Shulman et al. 13 (scores from 0 to 5; cut-off: 3 points) to evaluate the intra- and inter-rater reliability of the CDT scored by two independent raters in a random sample of 202 elderly subjects with very low formal education. Intra- and inter-rater reliability were excellent when clocks were classified as 'normal' (scores 4 or 5) or 'abnormal' (scores 0 to 3) (kappa=0.99 and 0.94, respectively) and in the good to excellent range when scored from 0 to 5 (kappa=0.88 and 0.74, respectively). 15 Storey et al. compared the scales of Shulman et al., 22 Mendez et al., 24 Sunderland et al., 10 Wolf-Klein et al. 12 and Watson et al. 23 in elderly individuals with a clinical diagnosis of dementia according to the DSM-IV. 32 Inter-rater reliability was high for all five scales (0.81-0.93), although accuracy was lower than in the original studies. 27 The methods by Shulman et al. 22 and Mendez et al. 24 showed the best diagnostic accuracy. 27 Richardson and Glass, in a study of 63 patients with Alzheimer's disease, vascular dementia or mixed dementia, analyzed five CDT scales (Shulman et al., 22 Mendez et al., 24 Sunderland et al., 10 Wolf-Klein et al., 12 and a practical scale developed by one of the authors) and found robust correlations between the MMSE and the CDT for all scales. 31 Another study also showed significant correlations between five similar methods of CDT analysis and both the MMSE and the Short Performance Test (SKT). 28 Scanlan et al. compared six CDT scales (Shulman et al., 22 Mendez et al., 24 Sunderland et al., 10 Wolf-Klein et al., 12 Manos and Wu, 33 and Lam et al. 35) with simple normal/abnormal classifications of the clocks made by naive and experienced raters. 19 Surprisingly, three of the scales (Sunderland, Wolf-Klein, Lam) performed worse than the classifications made by experienced raters. 19 Finally, a study involving Alzheimer's patients and controls used the scales by Wolf-Klein et al., 12 Rouleau et al. 9 and Watson et al. 23 and found good inter-rater reliability, although all three scales lacked sensitivity in milder dementia. 36

Studies involving patients in the early stages of dementia are still rare in the literature. 1 A study of patients with mild cognitive impairment and questionable dementia showed a weak relationship between the CDT and the MMSE, SKT and verbal fluency tests, but strong correlations among four scales (Manos and Wu, 33 Wolf-Klein et al., 12 Shulman et al. 22 and Watson et al. 23). 34 Lee et al. studied 30 patients in the early stages of Alzheimer's disease and 30 normal controls. 37 The clocks were scored with the scales by Sunderland et al. 10 and Mendez et al., 24 and the patients with Alzheimer's disease were classified by disease stage as CDR 0.5 (very mild), 1.0 (mild) or 2.0 (moderate). On both scales, CDT sensitivity was lowest for patients with CDR 0.5: for the Sunderland scale, mean sensitivity was 67% (33% for CDR 0.5, 77% for CDR 1.0 and 100% for CDR 2.0), and for the Mendez scale it was 73% (44%, 82% and 100%, respectively). All clocks were also compared with the CAMCOG battery to evaluate their performance. 37 The only significant correlation was between the CAMCOG praxis score and the Sunderland scale. 37 In a longitudinal study, patients ranging from early to advanced Alzheimer's disease were evaluated. 38 A total of 75 participants were selected: 15 normal controls, 25 with very mild dementia (CDR 0.5), 21 with mild dementia (CDR 1.0) and 14 with moderate or severe dementia (CDR 2.0 or 3.0). Each clock was rated blindly by two raters using six standardized scales, and the same scales were used at follow-up. All scales had low sensitivity for identifying individuals in the early stages of dementia, yielding a substantial number of false negatives. 38 A recent study analyzed the most common CDT errors in a population of 536 elderly individuals and developed an interpretation scale based on the most frequent errors. 39 With this scale, six errors were needed for good discrimination between normal elderly individuals and those with dementia, and the error scale may perform better than the three scales most frequently used in the literature. 39
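
Several of the studies above report inter-rater agreement as a kappa coefficient, and the Brazilian study contrasts agreement on the full 0 to 5 score with agreement after dichotomizing clocks into 'normal' (4 or 5) and 'abnormal' (0 to 3). The Python sketch below computes unweighted Cohen's kappa for two raters and shows how collapsing the scale can raise agreement, because near-misses that fall within the same category stop counting as disagreements. The ratings are hypothetical, the function names are illustrative, and the cited studies may have used weighted kappa or other reliability statistics.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same clocks."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of clocks given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical category throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def dichotomize(score: int, abnormal_max: int = 3) -> str:
    """Collapse a 0-5 clock score: 0-3 -> 'abnormal', 4-5 -> 'normal'."""
    return "abnormal" if score <= abnormal_max else "normal"


# Hypothetical scores from two raters for the same ten clocks (not study data).
rater_1 = [5, 4, 3, 2, 5, 4, 1, 3, 5, 0]
rater_2 = [5, 4, 3, 3, 4, 4, 1, 2, 5, 0]

kappa_full = cohens_kappa(rater_1, rater_2)
kappa_binary = cohens_kappa([dichotomize(s) for s in rater_1],
                            [dichotomize(s) for s in rater_2])
print(f"0-5 scale:       kappa = {kappa_full:.2f}")
print(f"normal/abnormal: kappa = {kappa_binary:.2f}")
```

In this toy example all three disagreements (2 vs 3, 5 vs 4, 3 vs 2) fall within the same normal/abnormal category, so agreement becomes perfect after dichotomizing; real ratings would rarely be so clean, but the direction of the effect mirrors the higher kappas reported for the dichotomized scores.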

Comparison between the Clock Drawing Test and other instruments or batteries for cognitive screening
The CDT is a screening instrument with sensitivity and specificity approaching those of the MMSE (87 and 86%, respectively). 40 The correlation between the CDT and the MMSE ranges from moderate (0.30) to high (0.77), with a mean of 0.61. 1 The highest correlations were found for the scales by Shulman et al. 22 and Mendez et al. 24 and for the CLOX scale. 41 Correlations between the MMSE and other cognitive screening tests range from 0.60 to 0.90, higher than those found for the CDT. 40 Brodaty and Moore showed that the clock test can outperform the MMSE in a memory clinic setting. 30 There is also a potential advantage in applying both tests together. 42 The MMSE includes only a limited assessment of visuospatial and executive functions, which in some patients may be affected more prominently than language and memory in the early stages of dementia. 5 Juby compared the MMSE with three methods of CDT interpretation (the scales by Sunderland et al., 10 Wolf-Klein et al. 12 and Watson et al. 23) in 150 elderly outpatients at a general clinic. 43 All CDT scores were significantly associated with the MMSE, with moderate to high correlations (p=0.01) ranging from -0.50 to 0.67; the sign of each coefficient depends on whether higher scores on a given scale indicate better or worse performance. 43 A study including normal controls and patients with dementia or depression compared the two tests. 29 When the CDT and the MMSE were used together and an abnormal result on either test was counted as positive, 39 of 41 cases of dementia were correctly identified, a sensitivity of 95%. However, 26% of participants without dementia or depression and 30% of those with depression scored below normal on at least one of the tests, giving specificities of 74 and 70%, respectively. Used alone, the CDT had 76% sensitivity and 81% specificity, 29 lower than the values found in previous studies in which patients were recruited from specialized neurology, memory and psychiatry clinics. 11,12,24,30 In a large study conducted in England, 13,557 elderly individuals completed the CAMCOG CDT (scored from 0 to 4) and the MMSE. 44 With a cut-off of two points, the CDT showed 76.5% sensitivity and 87.1% specificity for moderate to severe cognitive impairment (corresponding to an MMSE score of 17) when administered by a nurse, and 40% sensitivity and 91% specificity when administered by post. 44 No relationship was found for cases of mild dementia.
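
The combined-test result above follows a "believe the positive" rule: a participant screens positive if either test is abnormal. The Python sketch below, using a small hypothetical cohort rather than the study's data, shows why such a rule tends to raise sensitivity at the cost of specificity, and checks the reported figure of 39 correct identifications out of 41 dementia cases. The function name and the toy records are illustrative assumptions.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from (screen_positive, has_dementia) pairs."""
    tp = fn = tn = fp = 0
    for positive, diseased in results:
        if diseased and positive:
            tp += 1  # dementia case correctly flagged
        elif diseased:
            fn += 1  # dementia case missed
        elif positive:
            fp += 1  # participant without dementia wrongly flagged
        else:
            tn += 1  # participant without dementia correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: (CDT abnormal, MMSE abnormal, has dementia).
cohort = [
    (True, True, True),
    (True, False, True),
    (False, True, True),
    (False, False, True),
    (False, False, False),
    (True, False, False),
    (False, True, False),
    (False, False, False),
]

cdt_alone = sensitivity_specificity((cdt, dem) for cdt, _, dem in cohort)
either_test = sensitivity_specificity((cdt or mmse, dem) for cdt, mmse, dem in cohort)
print("CDT alone:   sens=%.2f spec=%.2f" % cdt_alone)
print("Either test: sens=%.2f spec=%.2f" % either_test)

# Reported combined sensitivity: 39 of 41 dementia cases identified.
print(f"39/41 = {39 / 41:.1%}")
```

In this toy cohort the either-test rule catches one more dementia case than the CDT alone but also flags one more participant without dementia, the same trade-off seen in the reported fall in specificity to 74 and 70%.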
Solomon et al. combined the CDT with tests of episodic memory, orientation and verbal fluency in a battery lasting seven minutes and found 100% sensitivity and specificity in differentiating likely Alzheimer's patients from healthy controls. 45 Scanlan and Borson combined the CDT with a three-item recall task to form the Mini-Cog test and achieved high sensitivity (97%) and specificity (95%) in cognitive screening for dementia. 46 Schramm et al. combined the MMSE or the SKT with five different CDT scoring methods. 28 Combining tests improved the sensitivity of each clock method, reaching up to 92% for the SKT combined with the CDT scored according to Shulman et al. 28 There are few studies comparing the CDT with neuropsychometric batteries of higher diagnostic accuracy in dementia (Table 2). 1 Bourke et al. compared the CDT with the CAMCOG in 77 patients who met the NINCDS-ADRDA criteria for probable Alzheimer's disease, 21 interpreting the clocks with the scales by Shulman et al. 22 and Mendez et al. 24 Both the Shulman (r=0.70) and the Mendez (r=0.67) scales correlated robustly with the CAMCOG. 21 The study by Heinik et al. (2002) compared the scales by Shulman et al. 22 and Freedman et al. 3 The scale by Shulman et al. 22 performed equally well in mild and moderate cases, whereas that by Freedman et al. 3 performed worse in patients at the moderate stage. 47 The same authors later selected 56 patients with Alzheimer's disease, 36 with vascular dementia and 26 controls with bipolar disorder, diagnosed according to the DSM-IV at a geriatric outpatient clinic. 48 The CAMCOG was administered to all patients and compared with the MMSE and with the CDT scored according to Freedman et al. 3 The CDT showed high correlations with the MMSE (0.73) and with the CAMCOG (0.80) (p<0.001). 48 The correlation between the MMSE and the CAMCOG was also high (0.93, p<0.001). 48 Later, Heinik et al. analyzed three CDT interpretation scales in the same group of patients, 16 adding two further scales to that of Freedman et al.: the scale by Shulman et al. 22 and the CAMCOG clock scale. The authors found significant correlations between all three clock scales (CAMCOG, Shulman, Freedman) and both the CAMCOG total score and the MMSE. 16 These results were better than those of the earlier study by Bourke et al. 21 Also in 2004, Van der Burg et al. conducted a study involving 473 normal controls and patients with dementia selected from the community. 49 The CAMDEX was administered to all participants and served as the gold standard. Two clock scales were applied: that of Shulman et al. 22 and the CAMCOG scale. Inter-rater reliability was weak for Shulman's scale (0.47) and high for the CAMCOG CDT (0.75). 49 Inter-rater agreement on diagnosis was much better for both scales (0.88 and 0.91, respectively). Sensitivity and specificity were similar for the two scales: 97 and 32%, respectively, for the CAMCOG CDT, and 96 and 42% for Shulman's scale. 49

In conclusion, studies that tested the accuracy of the CDT in dementia screening have shown that the CDT can be scored reliably with a variety of scales and that it accurately discriminates cognitively unimpaired patients from patients showing early cognitive decline. The various interpretation scales tend to generate congruent results, and CDT scores are frequently highly correlated with other screening tests such as the MMSE and the SKT. Correlations between the CDT and more comprehensive cognitive batteries such as the CAMCOG also tend to be high. Therefore, present evidence suggests that the CDT may be used as a single screening test when time is limited, or applied as part of larger assessment protocols.
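
The Mini-Cog discussed above combines the CDT with three-word recall. The Python sketch below encodes the decision rule as it is commonly described for the original Mini-Cog: recalling no words is a positive screen, recalling all three is negative, and for one or two words the clock decides. It assumes the clock has already been classified as normal or abnormal by some scoring method; the function name is hypothetical, and other published versions of the Mini-Cog use a point score instead, which this sketch does not implement.

```python
def mini_cog_positive(words_recalled: int, clock_normal: bool) -> bool:
    """Return True for a positive screen (possible dementia).

    Decision rule as commonly described for the original Mini-Cog:
      0 words recalled   -> positive
      3 words recalled   -> negative
      1-2 words recalled -> positive only if the clock is abnormal
    """
    if not 0 <= words_recalled <= 3:
        raise ValueError("words_recalled must be between 0 and 3")
    if words_recalled == 0:
        return True
    if words_recalled == 3:
        return False
    # Intermediate recall: the clock drawing decides the result.
    return not clock_normal


for words, clock in [(0, True), (1, True), (2, False), (3, False)]:
    result = "positive" if mini_cog_positive(words, clock) else "negative"
    print(words, "words, clock", "normal" if clock else "abnormal", "->", result)
```

Note that the clock only influences the result when recall is intermediate; with zero or three words recalled, the drawing is not consulted.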