
Evaluation of different blocking strategies in probabilistic record linkage

Blocking, that is, the creation of logical record blocks within the files to be linked, is one of the steps required to probabilistically link large databases. This paper compares different blocking strategies and evaluates the effectiveness of a standardization algorithm that we developed, which assigns the same spelling to similar-sounding first syllables of names. We linked a mortality database containing 59,065 death reports to a hospital death report database with 531 records that had corresponding entries in the larger database. The blocking strategies were compared with regard to processing cost (the number of record pairs generated) and the proportion of true matches lost. The multi-step blocking strategy was the most effective: it identified all the true matches while producing a smaller total number of pairs than two of the single-step strategies. Among the single-step strategies, the best result was achieved with a key that combines the Soundex code of the first name with sex. The algorithm that standardizes the spelling of similar-sounding first syllables of names had no notable effect, either on cost or on the loss of true matches.

Database; Probabilistic record linkage; Blocking; Epidemiology
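
As an illustration of the best single-step strategy described above, the following is a minimal sketch of blocking on a key built from the Soundex code of the first name plus sex. It is not the authors' implementation: the record layout (`first_name`, `sex`), the function names, and the sample names are assumptions made for this example, and the Soundex variant shown is the standard American one, which ignores accented characters.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name (ASCII letters only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:      # collapse runs of the same digit
                code += digit
            prev = digit
        elif ch not in "HW":       # vowels break a run; H and W do not
            prev = ""
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def blocking_key(record):
    """Hypothetical key: Soundex of the first name followed by sex."""
    return soundex(record["first_name"]) + record["sex"]


def candidate_pairs(file_a, file_b):
    """Pair only records that fall in the same block (share a key)."""
    blocks = defaultdict(list)
    for j, rec in enumerate(file_b):
        blocks[blocking_key(rec)].append(j)
    return [(i, j)
            for i, rec in enumerate(file_a)
            for j in blocks.get(blocking_key(rec), [])]


# Made-up sample records, for illustration only.
hospital = [{"first_name": "Maria", "sex": "F"},
            {"first_name": "Jose", "sex": "M"}]
mortality = [{"first_name": "Mariah", "sex": "F"},
             {"first_name": "Josue", "sex": "M"},
             {"first_name": "Joao", "sex": "M"},
             {"first_name": "Marta", "sex": "F"}]

pairs = candidate_pairs(hospital, mortality)
print(len(pairs), "candidate pairs instead of", len(hospital) * len(mortality))
```

Only the pairs inside a block go on to the comparison stage, which is where the saving in processing cost comes from. A true match whose two records receive different keys is lost in a single pass. A multi-step strategy presumably addresses this by repeating the procedure with a different key in each pass and keeping the pairs found in any of them.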


Associação Brasileira de Saúde Coletiva, Av. Dr. Arnaldo, 715 - 2º andar - sl. 3 - Cerqueira César, 01246-904 São Paulo SP, Brazil. Tel./Fax: +55 11 3085-5411
E-mail: revbrepi@usp.br