SciELO - Scientific Electronic Library Online

Journal of the Brazilian Computer Society

Print version ISSN 0104-6500
On-line version ISSN 1678-4804


MILIDIU, Ruy Luiz; SANTOS, Cícero Nogueira dos and DUARTE, Julio Cesar. Portuguese corpus-based learning using ETL. J. Braz. Comp. Soc. [online]. 2008, vol.14, n.4, pp.17-27. ISSN 1678-4804.

We present Entropy Guided Transformation Learning (ETL) models for three Portuguese Language Processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each task, the ETL modeling phase is quick and simple: ETL requires only the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain performance competitive with the state of the art in all six corpus-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems.

Keywords: Entropy Guided Transformation Learning; transformation-based learning; decision trees; natural language processing.

Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.