OBJECTIVE: To compare medical students' global itemized ratings (GIR) and real-case structured clinical assessment (RC-SCA) scores generated by faculty members from three different specialties: Gynecology (O&G), Internal Medicine (IM), and Pediatrics (Ped). METHOD: A total of 106 fourth-year (Y4) students were each assessed by one faculty member from each specialty. Each assessor completed a GIR, consisting of 6 technical domains (mean score GIRt) and 7 humanistic domains (mean score GIRh) rated on a 0-10 scale, and the resulting RC-SCA, based on direct observation of the students attending patients. Statistical analyses used Cronbach's alpha coefficient, Friedman and Wilcoxon paired tests, Pearson and Spearman correlation coefficients, and Euclidean distances. The significance level was set at 5%. RESULTS: Internal consistency was high in all three GIR (Cronbach's alpha > 0.92). Ratings were negatively skewed. Ped scores were significantly lower than O&G and IM scores (median differences between 0.50 and 0.67), and correlations between the specialties' scores were low (-0.02 < R < 0.48). The domains with the greatest impact on the reliability of the GIR were clinical judgment (O&G and Ped), problem-solving (IM), and self-reflective skills (Ped). O&G and Ped scores showed the lowest agreement; GIRt Ped scores showed the greatest disagreement with all the other scores. CONCLUSION: The specialties have different views on how to evaluate students' skills, despite using similar instruments, which may reflect their "culture". The challenge remains to minimize these differences through faculty development activities.
Educational measurement; Specialties, medical; Clinical clerkship
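
The internal consistency figure reported in the abstract is a Cronbach coefficient computed over the rating domains. As a minimal sketch of how such a coefficient is obtained, the function below implements the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the example ratings are hypothetical and are not the study's data or the authors' code.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: list of rows, one row per student, one score per domain.
    Uses sample variances (n - 1 denominator) throughout.
    """
    n_students = len(ratings)
    k = len(ratings[0])  # number of domains (items)
    if n_students < 2 or k < 2:
        raise ValueError("need at least 2 students and 2 domains")

    # Variance of each domain across students.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of each student's total score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-10 ratings for 5 students on 3 domains (illustration only).
example_ratings = [
    [8, 9, 8],
    [7, 7, 6],
    [9, 9, 10],
    [6, 7, 6],
    [8, 8, 9],
]
alpha = cronbach_alpha(example_ratings)
```

Values close to 1 indicate that the domains rank students consistently. The sketch does not cover the item-level analysis mentioned in RESULTS (the effect of each domain on reliability), which would require recomputing the coefficient with each domain left out in turn.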