Use of noun phrases in automatic classification of electronic documents

Maia, Luiz Cláudio; Souza, Renato Rocha

doi:10.1590/S1413-99362010000100009

Perspectivas em Ciência da Informação

Articles • Perspect. ciênc. inf. 15 (1) • Apr 2010 • https://doi.org/10.1590/S1413-99362010000100009

This study proposes a method for classifying electronic documents using natural language processing techniques and algorithms, indexing documents by noun phrases in addition to plain keywords. Two tools, OGMA and Weka, were used in the experiments. OGMA was developed by the author to automate the extraction of noun phrases and to calculate the weight of each term during document indexing, for each of the six proposed methods. Weka was used to analyze the OGMA results with the clustering algorithm "SimpleKMeans" and the classification algorithm "NaiveBayes". The process yields a percentage indicating how many documents were classified correctly. The best-performing methods were those that used terms with stopwords removed and those that used classified and scored noun phrases.

Text analysis; Clustering; Automatic indexing; Noun phrases; Natural language processing
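
The classification and scoring step described in the abstract can be illustrated with a minimal sketch. It is not OGMA and does not call Weka: it is a standard-library Python stand-in, assuming a multinomial Naive Bayes model with add-one smoothing over plain keywords with stopwords removed. The stopword list, the toy documents, the labels, and all function names are hypothetical, and noun phrase extraction, which requires a parser, is not reproduced.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list and toy corpus, for illustration only.
STOPWORDS = {"the", "of", "a", "and", "in", "is", "for", "to"}

train_docs = [
    "automatic indexing of documents",
    "noun phrases in indexing",
    "clustering algorithms group data",
    "k-means clustering of vectors",
]
train_labels = ["indexing", "indexing", "clustering", "clustering"]
test_docs = ["indexing with noun phrases", "clustering of data"]
test_labels = ["indexing", "clustering"]


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def train_naive_bayes(docs, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def classify(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Add-one smoothing: every vocabulary term gets one extra count.
        denom = sum(term_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # terms unseen in training are ignored
                score += math.log((term_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def percent_correct(model, docs, labels):
    """Percentage of documents whose predicted class matches the label."""
    hits = sum(classify(model, d) == y for d, y in zip(docs, labels))
    return 100.0 * hits / len(docs)


if __name__ == "__main__":
    model = train_naive_bayes(train_docs, train_labels)
    print(f"{percent_correct(model, test_docs, test_labels):.1f}% classified correctly")
```

Under these assumptions, comparing indexing methods amounts to swapping `tokenize` for a different term extractor and comparing the resulting percentages.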

Escola de Ciência da Informação da UFMG, Antonio Carlos, 6627 - Pampulha, 31270-901 - Belo Horizonte - MG, Brazil, Tel: (031) 3499-5227, Fax: (031) 3499-5200
E-mail: pci@eci.ufmg.br
