Revista Brasileira de Meteorologia
Print version ISSN 0102-7786
SALVADOR, Henrique Gonçalves; CUNHA, Adilson Marques Da and CORREA, Cleber Souza. Vedalogic: a Method of Climatologic Data Verification based on Data Mining Models. Rev. bras. meteorol. [online]. 2009, vol.24, n.4, pp. 448-460. ISSN 0102-7786. http://dx.doi.org/10.1590/S0102-77862009000400007.
This work presents VEDALOGIC, a Method for Climatologic Data Verification based on Data Mining Models, developed for use by the "Instituto de Controle do Espaço Aéreo Brasileiro" (ICEA). VEDALOGIC verifies data by means of models built with Data Mining algorithms. Clustering models generated from a historical series identify homogeneous groups in the Climatologic Data Base (CDB), and records that do not conform to these groups are detected as outliers. Each detected outlier is then classified/predicted with decision tree models, which are also built from historical data, and the value predicted by the tree is offered as a suggested correction for the outlier, helping to improve the consistency of the CDB data. In this study, the Expectation-Maximization (EM) and K-means algorithms were used to generate the clustering models, and the REPTree and M5P algorithms were used to generate the decision (classification/prediction) tree models. To evaluate the method, noisy data were artificially inserted into the CDB. After VEDALOGIC was applied, all of the inserted noisy data were detected, and the suggested corrections achieved an average precision above 98%.
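The two-stage idea described above (cluster historical records, flag records that lie far from their cluster, then predict a replacement value from the remaining attributes with a tree model) can be sketched in plain Python. This is not the authors' implementation: plain k-means stands in for the EM/K-means clustering models, a one-split regression tree (decision stump) stands in for REPTree/M5P, and the outlier rule (distance greater than `factor` times the median distance within the cluster), the deterministic initialisation, the function names `kmeans`, `fit_stump`, `predict_stump` and `verify`, and the toy data are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, median


def kmeans(points, k, iters=100):
    """Lloyd's k-means on equal-length numeric tuples.

    Initial centroids are k evenly spaced records (a simple deterministic
    choice assumed for this sketch; random or k-means++ seeding is more usual).
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every record to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(map(mean, zip(*members))) if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, labels


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def fit_stump(X, y):
    """One-split regression tree: pick the feature/threshold minimising SSE."""
    best = (sse(y), None, None, mean(y), mean(y))  # fallback: no split
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            err = sse(left) + sse(right)
            if err < best[0]:
                best = (err, j, t, mean(left), mean(right))
    return best[1:]


def predict_stump(stump, row):
    """Return the mean target value of the leaf that `row` falls into."""
    j, t, left_mean, right_mean = stump
    if j is None:
        return left_mean
    return left_mean if row[j] <= t else right_mean


def verify(records, target, k=2, factor=3.0):
    """Flag outliers by cluster distance, then suggest values for `target`."""
    centroids, labels = kmeans(records, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(records, labels)]
    outliers = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Hypothetical rule: farther than `factor` x the cluster's median distance.
        cutoff = factor * median(dists[i] for i in idx)
        outliers += [i for i in idx if dists[i] > cutoff]

    def features(r):
        # All attributes except the one being corrected.
        return tuple(v for j, v in enumerate(r) if j != target)

    # Train the tree only on records that were not flagged.
    clean = [i for i in range(len(records)) if i not in outliers]
    stump = fit_stump([features(records[i]) for i in clean],
                      [records[i][target] for i in clean])
    return {i: predict_stump(stump, features(records[i])) for i in sorted(outliers)}


if __name__ == "__main__":
    # Toy (temperature in C, relative humidity in %) records: a dry and a wet
    # regime, plus one dry-regime record where 40 % was keyed as 400 %.
    dry = [(29, 39), (30, 40), (31, 41), (30, 41), (29, 40), (31, 39)]
    wet = [(21, 89), (22, 90), (23, 91), (22, 91), (21, 90), (23, 89)]
    # Flags record 12 and suggests a humidity of 40.
    print(verify(dry + wet + [(30, 400)], target=1))
```

Because distances are computed on raw values, an attribute with a large range dominates the clustering; normalising attributes first would be a natural refinement. A sufficiently extreme value can also end up alone in its own cluster and escape this distance rule, which is one reason the choice of clustering model and number of clusters matters.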
Keywords: Data Mining; Climatological Database; Clustering; Data Verification.