Ciência da Informação
Print version ISSN 0100-1965
GOMES, Georgia Regina Rodrigues and MORAES FILHO, Rubens de Oliveira. Automatic categorization of digital documents. Ci. Inf. [online]. 2011, vol.40, n.1, pp. 68-76. ISSN 0100-1965. http://dx.doi.org/10.1590/S0100-19652011000100005.
The evolution of information technology and the spread of digital documents on the Web call for mechanisms to organize these documents so that they are easier to search for and retrieve. In digital libraries and repositories of electronic works, for example, tools that classify documents automatically are needed, because classification (categorization) is still done manually. Such a tool would be an important resource in support of cataloging. This article presents the development of a tool whose main objective is to categorize digital documents automatically into pre-established categories, with each document assigned to one or more categories according to its content, making classification more efficient and faster. Text-mining techniques and algorithms were used to develop and validate the tool. In the case study, three categories (information technology, law and physics) were defined, together with their related terms.
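The abstract does not say which text-mining algorithm the tool uses. As a rough illustration only, the following Python sketch assumes a simple term-matching approach. Each pre-established category has a list of related terms, and a document receives every category whose terms occur at least `min_hits` times, so it can belong to one or more categories. The term lists, the `categorize` function and the `min_hits` threshold are hypothetical, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical category -> related-terms lists (the paper's actual lists are not given).
CATEGORY_TERMS = {
    "information technology": {"software", "computer", "network", "database", "algorithm"},
    "law": {"court", "contract", "statute", "lawsuit", "legislation"},
    "physics": {"energy", "particle", "quantum", "velocity", "mass"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract runs of Unicode letters (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def categorize(text: str, category_terms=CATEGORY_TERMS, min_hits: int = 2) -> list[str]:
    """Return every category whose related terms occur at least `min_hits` times.

    Multi-label: a document may receive zero, one or several categories.
    """
    counts = Counter(tokenize(text))
    labels = []
    for category, terms in category_terms.items():
        # Total occurrences of this category's terms in the document.
        hits = sum(counts[term] for term in terms)
        if hits >= min_hits:
            labels.append(category)
    return labels


if __name__ == "__main__":
    print(categorize("A quantum algorithm running on a computer can estimate particle energy."))
```

Run on the sample sentence, the sketch assigns both "information technology" and "physics". A real system would more likely learn term weights from labeled training documents (for example, TF-IDF with a trained classifier) than rely on fixed term lists.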
Keywords: Information technology; Categorization; Digital libraries; Text mining; Digital documents.