Inclusion of a deterministic post-processing stage to increase the performance of probabilistic record linkage

Rafael Brustulin Poliana Guerino Marson About the authors

The aim of this study was to demonstrate the application of a deterministic post-processing stage, based on measures of similarity, to increase the performance of probabilistic record linkage with and without manual revision. The databases used in the study were the Brazilian Information System for Notificable Diseases and the Brazilian Mortality Information System, from 2007 to 2015, in Palmas, Tocantins State, Brazil. The probabilistic software was OpenRecLink, and a deterministic post-processing stage was applied to the data obtained from three different probabilistic linkage strategies. The three strategies were compared to each other, and the deterministic post-processing stage was added. The sensibility of the probabilistic strategies without manual revision varied from 69.1% and 77.8%, while the same strategies plus the deterministic post-processing stage varied from 92.9% to 96.3%. Sensitivity of the two probabilistic strategies with manual revision was similar to that obtained by the deterministic post-processing stage, but the number of matches that were referred to manual revision by the two probabilistic strategies varied between 1,177 and 1,132 records, compared to 149 and 145 after the deterministic post-processing stage. Our findings suggest that the deterministic post-processing stage is a promising option, both to increase the sensitivity and to reduce the number of matches that need to be reviewed manually, or even to eliminate the need for manual revision altogether.

Keywords:
Database; Software; Automatic Data Processing; Information Systems


Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz Rua Leopoldo Bulhões, 1480 , 21041-210 Rio de Janeiro RJ Brazil, Tel.:+55 21 2598-2511, Fax: +55 21 2598-2737 / +55 21 2598-2514 - Rio de Janeiro - RJ - Brazil
E-mail: cadernos@ensp.fiocruz.br