Evaluation of selection criteria for noun phrases with relevance for information retrieval

Abstract

This study assesses criteria for selecting the most representative noun phrases from Portuguese-language documents in the field of law. The research methods were a literature review and an experiment. In the experiment, ten selection criteria were applied to noun phrases extracted from a set of abstracts of theses and dissertations, and the effectiveness of each criterion was assessed in terms of how well it selected noun phrases relevant for information retrieval. The most effective criteria identified were the removal of noun phrases that function as stopwords or that contain pronouns, and selection based on position of occurrence, level of the noun phrase, inverse document frequency, and document occurrence frequency.
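As an illustration only, two of the criteria named above (removing noun phrases that contain pronouns, and filtering by inverse document frequency) might be sketched as follows. The pronoun list, the sample phrases, the `min_idf` threshold, and all function names are hypothetical and are not taken from the study; the IDF shown is the common textbook form log(N / df), which may differ from the formula the authors used.

```python
import math

# Hypothetical pronoun list (illustrative, not the study's list).
PRONOUNS = {"ele", "ela", "este", "esta", "esse", "essa", "seu", "sua"}


def contains_pronoun(phrase):
    """Return True if any token of the noun phrase is in the pronoun list."""
    return any(token in PRONOUNS for token in phrase.lower().split())


def inverse_document_frequency(phrase, documents):
    """Textbook IDF: log(N / df), where df counts documents containing the phrase."""
    df = sum(1 for doc in documents if phrase in doc)
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)


def select_phrases(documents, min_idf):
    """Keep, per document, phrases without pronouns whose IDF is at least min_idf."""
    selected = []
    for doc in documents:
        kept = sorted(
            phrase
            for phrase in doc
            if not contains_pronoun(phrase)
            and inverse_document_frequency(phrase, documents) >= min_idf
        )
        selected.append(kept)
    return selected


# Each document is modeled as the set of noun phrases extracted from one abstract.
# The phrases below are made-up sample data.
documents = [
    {"direito penal", "este trabalho", "código civil"},
    {"direito penal", "este trabalho", "responsabilidade civil"},
    {"direito penal", "este trabalho", "processo penal"},
]

print(select_phrases(documents, min_idf=0.5))
```

In this toy collection, "este trabalho" is dropped because it contains a pronoun, and "direito penal" is dropped because it occurs in every document and therefore has an IDF of zero. Only the phrases specific to a single abstract remain.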

Keywords
Automatic indexing; Legal information; Information representation; Noun phrase selection; Noun phrases

Pontifícia Universidade Católica de Campinas Núcleo de Editoração SBI - Campus II - Av. John Boyd Dunlop, s/n. - Prédio de Odontologia, Jd. Ipaussurama - 13059-900 - Campinas - SP, Tel.: +55 19 3343-6875 - Campinas - SP - Brazil
E-mail: transinfo@puc-campinas.edu.br