Automatic indexing by assignment of scientific articles written in Portuguese from the Information Science area

Abstract

This work proposes and evaluates a process of automatic indexing by assignment for representing full-text articles written in Portuguese, in the context of building a scientific database for the area of Information Science in Brazil. The methodology combines exploratory, bibliographic, and empirical research. The empirical part is an experiment conducted as a case study: the proposed process is applied to a corpus of 60 scientific articles, and the quality of the automatic indexing is assessed through consistency, precision, recall, and F-measure. The authors’ keywords served as the gold standard. The automatic indexing process uses the Brazilian Thesaurus of Information Science and the SISA software. The results, considered satisfactory, were an average consistency index of 19%, an average precision of 30%, an average recall of 37%, and a mean F-measure of 30%. The analysis shows that the thesaurus strongly influences the results of automatic indexing by assignment, although general-term relations contributed little to the quality of the automatic indexing. In addition, we point out intervening factors in automatic indexing.
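The four quality measures named above can be illustrated with a minimal sketch for a single article. The abstract does not give the formulas, so this assumes the usual set-based definitions of precision, recall, and F-measure, and assumes a Hooper-style consistency index (terms in common divided by all distinct terms used by either side). The function name, the case-insensitive exact matching, and the sample terms are hypothetical and are not taken from the study or from SISA.

```python
def evaluate_indexing(assigned_terms, gold_terms):
    """Compare automatically assigned terms with gold-standard keywords.

    Assumption: terms match only when equal after trimming and lowercasing.
    """
    assigned = {t.strip().lower() for t in assigned_terms}
    gold = {t.strip().lower() for t in gold_terms}
    common = assigned & gold

    # Precision: share of assigned terms that appear in the gold standard.
    precision = len(common) / len(assigned) if assigned else 0.0
    # Recall: share of gold-standard terms that were assigned.
    recall = len(common) / len(gold) if gold else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    # Consistency (assumed Hooper-style): common terms over all distinct terms.
    union = assigned | gold
    consistency = len(common) / len(union) if union else 0.0

    return {
        "consistency": consistency,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical example: four assigned terms, three author keywords, two in common.
assigned_example = ["Indexação automática", "Tesauro", "Periódico científico", "Biblioteca"]
gold_example = ["Indexação automática", "Tesauro", "Ciência da Informação"]
scores = evaluate_indexing(assigned_example, gold_example)
```

Under these assumptions, corpus-level figures such as those reported would be obtained by computing the scores per article and averaging each measure over the 60 articles. Because the F-measure is averaged per article, its mean need not equal the harmonic mean of the averaged precision and recall.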

Keywords
Automatic indexing; Automatic indexing by assignment; Thesaurus; Scientific journal; Information Science

Pontifícia Universidade Católica de Campinas Núcleo de Editoração SBI - Campus II - Av. John Boyd Dunlop, s/n. - Prédio de Odontologia, Jd. Ipaussurama - 13059-900 - Campinas - SP, Tel.: +55 19 3343-6875 - Campinas - SP - Brazil
E-mail: transinfo@puc-campinas.edu.br