Interobserver reliability in the classification of pairs of records formed by probabilistic linkage of SISMAMA databases

REV BRAS EPIDEMIOL 2019; 22: E190045 ABSTRACT: Introduction: The study evaluated interobserver reliability in the classification of pairs of records formed during probabilistic linkage, as one of the validation steps of the methodology to be used in research on inequalities in access to breast and cervical cancer control activities in Brazil (DAAC-SIS). Methodology: The RecLink software was used to link the databases of the Breast Cancer Control Information System (SISMAMA) of the state of Minas Gerais, taking as reference 301 screening mammograms with a probably benign result (BI-RADS category 3) registered in October 2010 and, as comparison, 158,517 mammograms registered in 2011. Subsequently, 215 pairs of records that did not obtain the maximum score assigned by RecLink were independently classified as true or false pairs by ten evaluators from four centers participating in the research. Results: The Kappa coefficient ranged from 0.87 to 1.00. Six evaluators obtained perfect agreement with one or more evaluators from other centers. The global Kappa was 0.96 (95% confidence interval, 95%CI 0.94-0.99). Discussion: Interobserver assessment was fundamental to ensure the quality of the linkage process, and it should be routine practice in studies of this nature. The dissemination of these results contributes to transparency in the conduct and reporting of the ongoing study. Conclusion: Interobserver reliability was excellent, indicating satisfactory homogeneity of the team in the classification of record pairs.


INTRODUCTION
Several Health Information Systems (SIS, acronym in Portuguese) have been developed in Brazil in recent decades to record mortality, morbidity and health care data. However, records belonging to the same individual cannot easily be identified across these databases, because recording the National Health Card (CNS) number, a unique identification number given to each Brazilian individual, is not yet mandatory in all SIS.
Computer algorithms based on probabilistic linkage methods have been developed to help identify records belonging to the same individual across different SIS. These methods use statistical models to match pairs of records and score them according to their likelihood of being true pairs. In Brazil, RecLink is the most widely used program 1. It generates a score that summarizes the degree of global agreement, based on the agreement and disagreement of a set of matched identifier fields 2. Manual classification of matched pairs that did not obtain the maximum score is, however, still necessary and may vary among evaluators because it involves subjective judgement.
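As a hedged illustration only, and assuming that the score follows the standard Fellegi-Sunter formulation (the text above does not state the formula), the score of a candidate pair can be written as a sum of field weights. Here $m_i$ and $u_i$ are assumed symbols for the probabilities that field $i$ agrees among true pairs and among false pairs, respectively:

```latex
S = \sum_{i=1}^{k} w_i,
\qquad
w_i =
\begin{cases}
\log_2 \dfrac{m_i}{u_i} & \text{if field } i \text{ agrees},\\[2ex]
\log_2 \dfrac{1 - m_i}{1 - u_i} & \text{if field } i \text{ disagrees}.
\end{cases}
```

Under this assumption, the maximum score is reached when every compared field agrees, which is consistent with manual review being reserved for pairs below the maximum.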
Reliability studies of probabilistic linkage are scarce and generally restrict the evaluation to the agreement between the fields of the analyzed databases 3,4. Assessment of interobserver reliability is, however, key to monitoring the degree of consistency in the classification of pairs of records among the evaluators contributing to a study. The present study aimed to evaluate interobserver reliability in the classification of pairs of records formed during probabilistic linkage of data from the Breast Cancer Control Information System (SISMAMA, acronym in Portuguese). This assessment is part of a larger ongoing research project to investigate inequalities in access to breast and cervical cancer control activities in Brazil (DAAC-SIS), and constitutes one of the validation steps of the methodology to be used.

METHODOLOGY
An interobserver reliability study was performed on the classification of pairs formed by the RecLink software (version 3.1.6.3160) in the probabilistic linkage of the SISMAMA databases (mammography module) from the State of Minas Gerais, Brazil. The databases analyzed comprised only records with a valid CNS number; however, this variable was not made available to the evaluators. The reference database included 301 records of women who underwent mammography in October 2010 with a probably benign result (BI-RADS category 3), for whom repeat mammography within six months is recommended 5. This reference database was linked to a database consisting of 158,517 mammograms registered in 2011, after exclusion of two duplicate records.
For pair formation, the soundex code of the woman's first name was used. For score formation, the woman's "full name", "date of birth" and "mother's full name" fields were used, with the suggested parameters 2. Only pairs with scores > 0.5 were considered. Pairs with the maximum score (17.2) were excluded, and the others were independently analyzed by 10 evaluators: four from Minas Gerais, two from Bahia, two from Rio de Janeiro, and two from São Paulo.
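The pair-formation step can be sketched as follows. This is a minimal illustration, not RecLink's implementation: it assumes the standard American Soundex rules (RecLink may use a variant adapted to Portuguese names), and the record layout with a `name` key is hypothetical.

```python
from collections import defaultdict

# Standard American Soundex letter groups (an assumption; the variant used
# by RecLink is not described in the text).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


def first_name_key(full_name: str) -> str:
    """Blocking key: Soundex code of the first token of the full name."""
    parts = full_name.split()
    return soundex(parts[0]) if parts else ""


def block_pairs(reference, comparison):
    """Yield (reference, comparison) records that share a blocking key."""
    index = defaultdict(list)
    for rec in comparison:
        index[first_name_key(rec["name"])].append(rec)
    for ref in reference:
        for comp in index.get(first_name_key(ref["name"]), []):
            yield ref, comp
```

Only records that fall in the same block are compared and scored, which keeps the number of candidate pairs manageable when a small reference file is linked to a much larger one.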
For each pair of evaluators, Cohen's Kappa coefficient was calculated with its 95% confidence interval (95%CI), and the results were classified as proposed by Byrt 6. In addition, exact 7 and non-exact 8 global Kappa coefficients were calculated, with their 95%CI estimated using a bootstrap technique that generated 1,000 random samples from the original sample (the Kappa of each pair of evaluators). These samples were used to generate the sampling distribution of the estimate (global Kappa). The lower and upper limits of the global Kappa correspond, respectively, to the 2.5 and 97.5 percentiles of this sampling distribution. The analyses were performed using the statistical software R 9. Subsequently, the disagreements among the evaluators were reviewed by the entire team.
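The pairwise Kappa and the percentile bootstrap described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' R code: the summary statistic used here is the mean of the pairwise Kappas (Light's Kappa), swapped in as a simple stand-in for the exact and non-exact global coefficients, and all function names are hypothetical.

```python
import random
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's Kappa for two raters' labels over the same items."""
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_exp == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (p_obs - p_exp) / (1 - p_exp)


def pairwise_kappas(ratings):
    """ratings: one label list per rater -> Kappa for every pair of raters."""
    return [cohen_kappa(a, b) for a, b in combinations(ratings, 2)]


def percentile(sorted_vals, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(kappas, n_boot=1000, seed=0):
    """Percentile 95% CI for the mean of the pairwise Kappas.

    Resamples the pairwise Kappas with replacement n_boot times and takes
    the 2.5 and 97.5 percentiles of the resampled means.
    """
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(kappas, k=len(kappas))) / len(kappas)
        for _ in range(n_boot)
    )
    return percentile(means, 2.5), percentile(means, 97.5)
```

With 10 evaluators, `pairwise_kappas` returns 45 values, and `bootstrap_ci` resamples those 45 values rather than the underlying record pairs, mirroring the description that the original sample was the Kappa of each pair of evaluators.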

RESULTS
RecLink formed 281 pairs of records, 66 (23.5%) of which had the maximum score (17.2). The other 215 pairs, with scores ranging from 0.54 to 17.0, were independently classified by the 10 evaluators. Only nine pairs (4.2%) received discordant classifications. A subsequent review by the entire team revealed that only one pair of records had been wrongly classified as true by one evaluator.
The Kappa coefficient for each of the 45 pairs of evaluators ranged from 0.87 to 1.00 (Figure 1), with 80% (36/45) of the pairs showing excellent agreement (> 0.92). The remaining nine pairs all involved evaluator 3 and showed very good agreement (0.87 to 0.90). Agreement was perfect (Kappa = 1.00) for 14 pairs, which involved six of the 10 evaluators (60%) and included at least one evaluator from each of the four centers participating in the study. The exact and non-exact global Kappa coefficients were both 0.96 (p < 0.001; 95%CI 0.94-0.99).
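As a quick check of the counts reported above, the number of evaluator pairs and the split between agreement categories follow directly from having 10 evaluators:

```latex
\binom{10}{2} = \frac{10 \times 9}{2} = 45,
\qquad
\frac{36}{45} = 0.80,
\qquad
45 - 36 = 9 = 10 - 1 .
```

The last equality matches the statement that the pairs outside the excellent range all involve evaluator 3, since a single evaluator takes part in exactly nine of the 45 pairs.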

DISCUSSION
This study involved the classification of 215 pairs of records by 10 independent evaluators, who were compared pairwise, giving 45 pairs of evaluators. Each center participating in the study had at least one evaluator who obtained perfect agreement with one or more evaluators from other centers.
Interim evaluations should nevertheless be implemented throughout the DAAC-SIS study to monitor linkage quality. Such care is fundamental to minimize the loss of true pairs and the inclusion of false pairs, either of which could introduce bias in the analyses to be performed 10.
The dissemination of these results highlights the efforts that should be made to ensure quality control when conducting record linkage studies across different SIS of the Brazilian Unified Health System (SUS, acronym in Portuguese). In addition, it helps to raise awareness of the need to routinely incorporate such assessments into similar studies.

CONCLUSION
The study revealed excellent interobserver reliability and demonstrated the team's consistency in the classification of record pairs. Assessment of interobserver reliability is a key tool for establishing the quality of the record linkage process, and it should be regarded as routine practice in studies of this nature.