Spatial pattern recognition of extreme temperature climatology: assessing HadCM3 simulations via NCEP reanalyses over Europe

P. S. Lucio (I, II); F. C. Conde (I, III); A. M. Ramos (I)

(I) Centro de Geofísica de Évora (CGE), Évora, Portugal. E-mails: pslucio@uevora.pt, andreara@uevora.pt

(II) Universidade Federal do Rio Grande do Norte (UFRN), Natal - RN, Brasil

(III) Instituto Nacional de Meteorologia (INMET/CDP), Brasília - DF, Brasil. E-mail: fabio.conde@inmet.gov.br

Corresponding author: Paulo Sérgio Lucio, Departamento de Estatística - CCET - UFRN, Lagoa Nova, Natal-RN, CEP 59.072-970. E-mail: pslucio@ccet.ufrn.br

ABSTRACT

This article attempts to quantify the spatial uncertainties associated with the response of extreme temperatures by assessing data derived from a climate model. The spatial pattern of a long-term time-series aggregation (1960/61–1989/90) of extreme temperatures simulated by a particular GCM (the UK Met Office Hadley Centre climate model, HadCM3) is compared with that of the USA NCAR NCEP Reanalyses, which are taken as ‘truth’, over the MICE (Modelling the Impacts of Climate Extremes, an EU project) spatial domain. Since the evaluation of models is crucial to assessing future scenarios, the aim of this study is to investigate whether the extreme values simulated by the HadCM3 climate model reproduce those given by the NCEP Reanalyses, assuming that the extremes of both are realizations of the same spatial stochastic process. To obtain more useful information about the uncertainties surrounding spatial climate projections, the pattern of temperature extremes is also analysed in terms of anomalies. A common technique for assessing numerical spatial models is based on Principal Components Analysis and Bayesian Classification for spatial pattern recognition. These methodologies are useful for guiding an evolutionary statistical model-building process. The study concludes that the HadCM3 Simulations do not realistically reproduce the NCEP Reanalyses, even though the climatology of extremes shows very similar spatial patterns. It is therefore likely that such instability may persist in the future.

Keywords: anomaly; empirical Bayesian classification; compositing; principal components loading.

1. INTRODUCTION

The purpose of this manuscript is to evaluate the spatial ability of a GCM output to reproduce the present-day occurrence of temperature extremes over the European geographical domain. Basically, one would like to confirm whether, stochastically and visually, the spatiotemporally aggregated extreme patterns (long-term climate signal) obtained from the HadCM3 Simulations can be considered as realizations of the NCEP Reanalyses. To evaluate the hypothesis of model performance, one uses a straightforward interpolation rescaling followed by Principal Components Analysis and Bayesian Classification, two alternative methods used to corroborate the results in Lucio (2004a) and Lucio (2004b).

A wrong strategic decision concerning future scenarios and impacts has costly economic and social consequences, and the risk of making one is high. Hypotheses on climate change derived from global climate models can be rejected through comparison with observations or reanalysis databases, but can hardly be confirmed. Assessments of GCM simulations by statistical techniques (Fuentes et al., 2003) have historically concentrated on monthly and seasonal means. When extreme measures from climate models and reanalyses are considered, the large-scale spatial prognosis is an important component. Neglecting this spatial information (Lucio, 2004a; Lucio, 2004b) can lead to errors in decision-making.

Hence, one would like to confirm whether, stochastically, the long-term aggregated extreme values obtained from the HadCM3 Simulations can be considered realizations of the NCEP Reanalyses over the MICE (Modelling the Impacts of Climate Extremes: MICE EVK2-CT-2001-00118, a European research project) geographical domain. For this reason, the climatology and anomaly spatial patterns from the HadCM3 Simulations are compared with the corresponding ones from the NCEP Reanalyses over the same spatial domain.

2. EXPERIMENTAL DATASET

Climate scenarios are defined by a particular climate forcing in the dynamical simulations of climate models. General circulation models (GCMs) are the most complex of these models and, as such, the most powerful tools available for making realistic estimates of climate change. However, they involve much uncertainty, especially when estimating regional or local change, and above all with respect to extremes. In this work, the spatial patterns of extreme temperature climatology and anomaly over the MICE spatial domain (Figure 1) are analysed to assess the output of the HadCM3 Simulations against the NCEP Reanalyses.


The HadCM3 Simulation is performed with a coupled atmosphere-ocean general circulation model (AOGCM) developed at the Hadley Centre (UK) and described by Gordon et al. (2000) and Pope et al. (2000). The HadCM3 model produces values for grid squares over a temporal window. The NCEP/NCAR Reanalyses (National Centers for Environmental Prediction/National Center for Atmospheric Research, USA), which are currently available back to 1948, contain several meteorological parameters at a global spatial resolution of 2.5° x 2.5° (latitude x longitude) and with a vertical extent from the surface up to the 10 hPa level; cf. Kalnay et al. (1996). The spatial window for HadCM3 covering the MICE domain is a regular grid of 2.5° latitude by 3.75° longitude (16 long x 19 lat = 304 points), whereas the NCEP Reanalyses are represented on a Gaussian grid, which preserves equal areas, with a spacing of 1.875° longitude by 1.903801° latitude (31 long x 25 lat = 775 points).

The elementary dataset under study consists of the temporally aggregated coldest and warmest temperatures for winter (December, January and February, DJF) and summer (June, July and August, JJA). The climatology was calculated separately for each grid point as the average value over the period of record (1960/61–1989/90). Hence, the climatology of nocturnal and diurnal extreme temperatures was calculated from the 30 recorded years, subdivided into the following four work-ensembles:

1. Spatial climatology of the winter coldest minimum temperature (TMIN.W);

2. Spatial climatology of the winter warmest maximum temperature (TMAX.W);

3. Spatial climatology of the summer coldest minimum temperature (TMIN.S);

4. Spatial climatology of the summer warmest maximum temperature (TMAX.S).

These measures were chosen because they represent the warmest and coldest days and nights of the winter and summer seasons. Furthermore, the dependence of temperature extremes on geographical attributes was considered, since extreme temperature depends on latitude (solar forcing), altitude, and the land-water and land-topography contrasts. Both spatial patterns are therefore expected to be very similar, and predictable by a simple spatial autoregressive moving-average interpolation.

In practice, to obtain comprehensible large-scale information, it is more useful to analyse meteorological extremes through their anomalies (d): the deviation of the absolute extreme measure from its long-term average value (i.e., the deviation of the coldest or warmest temperature of a given year from the long-term average of the coldest or warmest daily temperature). Anomalies are used in an attempt to remove the influence of latitude, longitude, and land/ocean differences. The anomalies were calculated separately for each grid point as the difference between the largest maximum (smallest minimum) temperature for a particular season and year and the average value, over the period of record, of the largest maximum (smallest minimum) temperature at that grid point. The anomalies of minimum and maximum extreme temperature events were thus calculated from the 30 recorded years, subdivided into the following four work-ensembles:

1. Spatial anomalies of the winter minimum temperature (dTMIN.W);

2. Spatial anomalies of the winter maximum temperature (dTMAX.W);

3. Spatial anomalies of the summer minimum temperature (dTMIN.S);

4. Spatial anomalies of the summer maximum temperature (dTMAX.S).
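The climatology and anomaly definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the array layout (year, latitude, longitude) and the synthetic random input are all hypothetical stand-ins for one work-ensemble on the 19 x 16 HadCM3 window.

```python
import numpy as np


def seasonal_extreme_climatology(seasonal_extremes):
    """Average the yearly seasonal extreme over the record at every grid point.

    seasonal_extremes: array of shape (n_years, n_lat, n_lon) holding one
    extreme value (e.g. the coldest winter minimum) per year and grid point.
    """
    return seasonal_extremes.mean(axis=0)


def seasonal_extreme_anomalies(seasonal_extremes):
    """Deviation of each year's extreme from the long-term mean at that grid point."""
    return seasonal_extremes - seasonal_extreme_climatology(seasonal_extremes)


# Synthetic stand-in for 30 winters of coldest minimum temperature on a
# 19 x 16 grid (random numbers, NOT model output).
rng = np.random.default_rng(0)
tmin_w = -5.0 + 3.0 * rng.standard_normal((30, 19, 16))

clim_tmin_w = seasonal_extreme_climatology(tmin_w)  # shape (19, 16)
anom_tmin_w = seasonal_extreme_anomalies(tmin_w)    # shape (30, 19, 16)
```

By construction the anomalies average to zero over the record at every grid point, which is a convenient sanity check on real data as well.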

The NCEP Reanalyses are obtained at individual centred grid points and have a different spatial resolution from the HadCM3 Model output. Consequently, it is not straightforward to compare the NCEP Reanalyses with the HadCM3 Model output directly. Rather, some "regridding" or rescaling of the data by means of spatial interpolation is needed for comparability. This requires a statistical procedure with adequate precision, such as compositing interpolation onto a common grid (Kysely, 2002a; Kysely, 2002b). The basic idea of statistical rescaling (compositing, or change of support) for assessing spatial information is to project an empirical large-scale spatial variable onto a different space domain (Cressie, 1996). This technique is based on the assumption that all small-scale variability can be explained by, or recovered from, large-scale variability (von Storch, 1999).

3. DATA REGRIDDING

A two-dimensional spatial process can be characterized by a simple realisation of a stochastic model $\{dT(s) : s \in \mathbb{R}^2\}$, where $s \in \mathbb{R}^2$ represents a generic data location. The trend surface model is a polynomial in the data location that summarizes the dependence of the response variable $\{dT(s) : s \in \mathbb{R}^2\}$ on the explanatory factors $\{s = (x, y) \in \mathbb{R}^2\}$ and, occasionally, on factors that have distinct levels, such as {ocean/sea, land} dummy variables (Draper and Smith, 1998). These are included through the trend surface stochastic equation (Cressie, 1996; Bailey and Gatrell, 1996) to take into account that this information may have separate deterministic effects on the extreme temperature anomalies.

If the primary interest is to understand and describe the nature of the variation in the observed attribute values by isolating any systematic large-scale trend, it is important to diagnose the trends in the models. One method that may be used to identify, explore and characterize the effect of potential large-scale factors on an outcome variable is compositing (creating solutions by combining objects from different sources). The main advantage of this method is its simplicity in both implementation and understanding. It is extremely flexible, able to capture both localized and moving features and to quantify non-linearity in the behaviour (by comparing opposite phases of the extreme events).

In this work one interpolates through the data points with polynomial cubic splines to determine the predicted extreme anomalies $\{dT(s) : s \in \mathbb{R}^2\}$. For n given points with distinct abscissae there exists a unique polynomial of degree n-1 or less that passes through these points. A cubic spline is a piecewise cubic polynomial such that the function, its first derivative and its second derivative are continuous at the interpolation nodes. The natural cubic spline has zero second derivatives at the endpoints. It is the smoothest of all possible interpolating curves in the sense that it minimizes the integral of the square of the second derivative (de Boor, 1978).
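To make the spline regridding step concrete, the following sketch evaluates a natural cubic spline (zero second derivative at the endpoints) and applies it separably, first along longitude and then along latitude. It is only an illustration under assumptions: the function names, the coordinate ranges and the separable two-pass scheme are hypothetical choices, since the text does not specify how the two-dimensional interpolation was organised.

```python
import numpy as np


def natural_cubic_spline(x, y, x_new):
    """Evaluate the natural cubic spline through the nodes (x, y) at x_new.

    x must be strictly increasing and y one-dimensional with len(y) == len(x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(x)
    h = np.diff(x)  # widths of the n - 1 intervals

    # Linear system for the second derivatives m at the nodes.
    # Natural end conditions: m[0] = m[n - 1] = 0.
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = 1.0
    A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)

    # Index of the interval [x[k], x[k + 1]] containing each target point.
    k = np.clip(np.searchsorted(x, x_new) - 1, 0, n - 2)
    left = x_new - x[k]
    right = x[k + 1] - x_new
    return (
        m[k] * right**3 / (6.0 * h[k])
        + m[k + 1] * left**3 / (6.0 * h[k])
        + (y[k] / h[k] - m[k] * h[k] / 6.0) * right
        + (y[k + 1] / h[k] - m[k + 1] * h[k] / 6.0) * left
    )


def regrid(field, lat, lon, lat_new, lon_new):
    """Separable regridding: splines along longitude, then along latitude."""
    along_lon = np.apply_along_axis(
        lambda row: natural_cubic_spline(lon, row, lon_new), 1, field
    )
    return np.apply_along_axis(
        lambda col: natural_cubic_spline(lat, col, lat_new), 0, along_lon
    )


# Hypothetical coordinates: a 19 x 16 coarse window regridded to 25 x 31 points.
lat = 30.0 + 2.5 * np.arange(19)
lon = -11.25 + 3.75 * np.arange(16)
lat_new = np.linspace(lat[0], lat[-1], 25)
lon_new = np.linspace(lon[0], lon[-1], 31)

coarse = 2.0 * lat[:, None] - 0.5 * lon[None, :]  # synthetic planar field
fine = regrid(coarse, lat, lon, lat_new, lon_new)
```

A planar test field is used because a natural cubic spline reproduces linear data exactly, so the regridded values can be checked against the analytic plane.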

The spatial similarities between both models are illustrated in Figure 2a-b and Figure 3a-b, where significant spatial misfits are detected locally, mainly for the minimum temperature systems (Figure 2a and Figure 3a). The differences are largely influenced by topography and by the latitudinal heat and energy balances. The compositing method involves constructing an indicator that correlates with the variability of the phenomenon of interest. Regressing the data onto this "index" yields composite spatial patterns of variability.
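The regression form of compositing mentioned above can be illustrated as follows. This is a hedged sketch: the function name, the standardisation of the index and the synthetic noise-free data are assumptions made for the example, and the index actually used in the study is not specified in the text.

```python
import numpy as np


def regression_composite(index, fields):
    """Least-squares slope of every grid-point series on a standardized index.

    index:  array of shape (n_times,)
    fields: array of shape (n_times, n_lat, n_lon)
    Returns the composite pattern of shape (n_lat, n_lon), in field units per
    one standard deviation of the index.
    """
    z = (index - index.mean()) / index.std()
    centred = fields - fields.mean(axis=0)
    return np.tensordot(z, centred, axes=(0, 0)) / len(z)


# Synthetic, noise-free example: a fixed pattern modulated by the index.
rng = np.random.default_rng(1)
index = rng.standard_normal(30)
pattern = np.outer(np.linspace(-1.0, 1.0, 19), np.ones(16))
fields = index[:, None, None] * pattern

composite = regression_composite(index, fields)
```

In this noise-free case the composite recovers the imposed pattern scaled by the standard deviation of the index; with real data the same computation gives the part of the field that co-varies linearly with the index.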



A potential problem with climate data is that they undoubtedly contain a spatially dependent component, which can result in spatial autocorrelation and cause difficulties for statistical methods that assume independent residuals. Tests to detect spatial autocorrelation can be performed as an exploratory technique to decide whether results from spatial regression modelling can be used. If spatial residual autocorrelation exists, it violates the assumption of independent residuals, and the validity of hypothesis testing on the model fit is called into question. The main effect of such violations is that the residual sum of squares is underestimated, which inflates the value of the test statistic and increases the chance of a Type I error (incorrect rejection of the null hypothesis, H0: no spatial autocorrelation). Hence, to characterize the spatial pattern of the residuals, the level, nature and strength of the interdependence intrinsic to each temperature extreme ensemble were measured by an index of spatial association of the residuals, Moran's index (Moran, 1948).
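As an illustration of the residual diagnostic, Moran's index can be computed on a gridded field as sketched below. The choice of rook (edge-sharing) neighbours with unit weights and the function name are assumptions for the example; the text does not state which spatial weights were used.

```python
import numpy as np


def morans_i(field):
    """Moran's I for a 2-D grid with rook neighbours and unit weights."""
    dev = field - field.mean()
    # Every horizontally or vertically adjacent pair counts in both directions.
    cross = 2.0 * (
        (dev[:, :-1] * dev[:, 1:]).sum() + (dev[:-1, :] * dev[1:, :]).sum()
    )
    n_rows, n_cols = field.shape
    total_weight = 2.0 * (n_rows * (n_cols - 1) + (n_rows - 1) * n_cols)
    return (field.size / total_weight) * cross / (dev**2).sum()


# A smooth north-south gradient is strongly positively autocorrelated ...
gradient = np.repeat(np.arange(6.0)[:, None], 6, axis=1)
i_gradient = morans_i(gradient)

# ... whereas a checkerboard is perfectly negatively autocorrelated.
rows, cols = np.indices((4, 4))
checkerboard = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
i_checkerboard = morans_i(checkerboard)
```

Values near zero would indicate residuals with little spatial association, while values approaching 1 signal the kind of dependence that undermines the independence assumption.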

Taking the spatial regression residuals into account, a reasonably low interdependence is detected for the HadCM3 Simulations, in contrast to an appreciable dependence for the NCEP Reanalyses. Note that using a simple interpolation to produce a common grid for both the NCEP and HadCM3 models may appear inappropriate for validating the extreme temperatures simulated by both models. In effect, it can add another source of uncertainty to the analyses.

4. SPATIAL PATTERN PCA RECOGNITION

A method for finding the spatial pattern that explains the maximal variance of the data fields is needed to support the assessment of the models after interpolation onto a common grid (spatial confirmatory analysis). Hence, to quantify spatially how well the interpolated HadCM3 climate model reproduces the current climate of extreme events, the authors use alternative methods to capture local indications of lack of information and to identify the strengths and weaknesses of the GCM in terms of regional-scale differences in model performance: Principal Components Analysis and Bayesian Classification designed for spatial pattern recognition.

Principal Components Analysis (PCA) is widely used in climate data analysis. PCA is intended to facilitate the understanding of the underlying spatial data structure. Consider k spatial stochastic processes $\{Z_j(s) : s \in \mathbb{R}^2\}$, $j = 1, \ldots, k$, where s represents a generic data location in a common two-dimensional Euclidean space (Bailey and Gatrell, 1996). The result of the PCA is a set of indices, called principal components, that describe the spatial variations in the original data set. Each principal component is a linear combination of the original variables. The amount of information carried by a principal component is its variance, and the principal components are derived in decreasing order of variance. In PCA, the set of all observations at the aggregated time is treated as a vector whose coordinates represent points in space, and is then decomposed into orthogonal components.

The principal components are obtained from an orthogonal projection of the spatial ensemble $\{Z_j(s)\}_{j = 1, \ldots, k}$ onto discretized eigenfunctions (Johnson and Wichern, 1998). Usually most of the variance in $\{Z_j(s)\}_{j = 1, \ldots, k}$ can be described by a few principal components. PCA identifies a small number of linear combinations that account for a large proportion of the spatial variability. This methodology is very useful for describing the spatial variation of the climatology and long-term anomaly of temperature extremes.

The spatial pattern that maximizes the variance is the leading eigenvector of the covariance/correlation matrix. This is the direction in which most of the variance lies. Subsequent spatial patterns can be found in order of decreasing importance, corresponding to decreasing eigenvalues. The lth eigenvector is the direction that explains the maximum of the residual variance in the data and is orthogonal to all previous eigenvectors. This eigenvector is called the lth EOF (Empirical Orthogonal Function). The associated amplitude, or component loading, is called the lth Principal Component (PC) of the field; it is simply the data projected onto the lth EOF. There is no rule of thumb for knowing a priori how many PCs or EOFs should be retained. Commonly, a "low number" of PCs capture most of the variance, and the PC/EOF pairs form an efficient low-dimensional description of the field. In this case, projection of the field onto these PCs is a useful spatial filter. The EOFs may combine independent spatial patterns of variability: since higher EOFs are forced to be orthogonal to their predecessors, they may not be physically feasible. North et al. (1982) describe a rule for identifying degenerate EOF modes, in which the statistical uncertainties of the EOFs are related to a significance test. The test, also used by Kim and North (1993), was slightly modified by considering analogies with classic tests of independence (chi-square, likelihood ratio, Fisher's exact, Cramer's V and Yule's Q) and implemented in MathSoft, 2000 for the purpose of this research.
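The eigen-decomposition described above can be sketched as follows. This is an illustrative example under assumptions, not the implementation used in the study: the data are synthetic, the layout (grid points as rows, the k = 4 work-ensembles as columns) is a hypothetical reading of the text, and the sampling-error line uses the plain North et al. (1982) approximation, eigenvalue times sqrt(2/N), with N taken naively as the number of grid points rather than the modified test mentioned above.

```python
import numpy as np


def pca_covariance(X):
    """PCA of X (n_locations, k_variables) based on the covariance matrix.

    Returns the eigenvalues in decreasing order, the eigenvectors (EOFs) as
    columns in the same order, and the component scores at every location.
    """
    centred = X - X.mean(axis=0)
    cov = np.cov(centred, rowvar=False)     # k x k sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs              # data projected onto the EOFs
    return eigvals, eigvecs, scores


# Synthetic stand-in: 304 grid points, 4 correlated variables (not real data).
rng = np.random.default_rng(2)
latent = rng.standard_normal((304, 1))
X = latent @ np.array([[3.0, 2.5, 2.0, 1.5]]) + 0.3 * rng.standard_normal((304, 4))

eigvals, eigvecs, scores = pca_covariance(X)
explained = eigvals / eigvals.sum()                      # variance fractions
sampling_error = eigvals * np.sqrt(2.0 / X.shape[0])     # assumes independent samples
```

Neighbouring eigenvalues whose separation is smaller than the sampling error would be flagged as potentially degenerate; because grid points are spatially correlated, the effective sample size is smaller than the number of points and the error shown here is optimistic.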

Rotated PCA separates distinct physical modes of behaviour or quantifies the coupling between different modes in the system (ocean/land or longitude/latitude). First the spatial data are filtered through PCA. Then one looks at linear combinations of the EOFs that enforce a certain spatial localization of the variability patterns. In this way, the EOFs found may have a simpler physical interpretation. As expected, the rather trivial land-sea contrast dominates the resulting loading plots, which introduces a negligible source of uncertainty. A linear method based on the Singular Value Decomposition (SVD) of the covariance matrix was used to isolate patterns of coupled variability between the different data fields. This finds optimally coupled spatial structures by maximizing the covariance between various possible patterns. The spatial patterns given by the singular vectors of the SVD are orthonormal.
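A minimal sketch of the SVD step is given below, assuming that "covariance matrix" refers to the cross-covariance between two fields sampled over the same set of cases. The function name, the array shapes and the random inputs are hypothetical, and the squared covariance fraction is a conventional summary that the text itself does not mention.

```python
import numpy as np


def coupled_patterns(left, right):
    """SVD of the cross-covariance matrix between two fields.

    left:  array (n_samples, p), right: array (n_samples, q), same samples.
    Returns the left patterns (columns of u), the singular values, the right
    patterns (columns of v) and the squared covariance fraction of each mode.
    """
    l = left - left.mean(axis=0)
    r = right - right.mean(axis=0)
    cross_cov = l.T @ r / (l.shape[0] - 1)
    u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)
    scf = s**2 / np.sum(s**2)
    return u, s, vt.T, scf


# Synthetic example: 30 samples of two small fields (random, not real data).
rng = np.random.default_rng(3)
left = rng.standard_normal((30, 5))
right = rng.standard_normal((30, 4))
u, s, v, scf = coupled_patterns(left, right)
```

The columns of u and v are orthonormal, as stated above, and the first pair maximizes the covariance between the projections of the two fields.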

Figure 4 and Figure 5 show the spatial PC distribution for the mean climatology of the minimum (coldest) and maximum (warmest) extremes, respectively, where the spatial similarity (statistically significant regions) between both models can be easily detected. Note that the figure labels also give the standard deviation for each eigenvalue associated with each eigenvector component. Figure 6 and Figure 7 show the spatial distribution of the PCs for the anomaly of the minimum and maximum extremes, where a strong contrast between both models is readily seen, mainly for the maximum system; this could be explained by model uncertainties concerning the energy balance. There is a very high variability in the climatology of the minimum extremes.





The plot of the first component (Figure 4) basically shows a north-south (i.e., latitudinal) gradient, and the isolines of the second component outline the European and North African land mass. The main spatial structure can be identified from the first two PCs, based upon the covariance matrix (these PCs capture most of the variance, and the PC/EOF pairs form an efficient two-dimensional description of the field). For the minimum, these two components account for about 98.9% of the total variation for the HadCM3 Simulations and 98.5% for the NCEP Reanalyses. The highest positive loadings on the first component are related to the winter climatology (Figure 4). This component describes the main characteristic of latitudinal, longitudinal and ocean/land variation in extreme temperatures, from the warmest environment of the south-west to the coldest of the north-east zones. The Mediterranean region and the southern part of the Atlantic give high scores, while negative scores predominate through the north and the north-east. The variable with the highest positive loading on the second component is related to the summer maximum climatology, in opposition to the winter minimum climatology. This component separates the contrasting warmest summer zones from the coldest winter ones, describing the role of latitudinal variation and land/ocean balance in determining extreme temperatures.

For the maximum, the first two components explain about 97.8% of the total variation for the HadCM3 Simulations and 98% for the NCEP Reanalyses (Figure 5). The variable with the highest positive loading on the first component is the winter minimum climatology for the HadCM3 Simulations, and a mixed contrast between the winter minimum climatology and the summer maximum climatology for the NCEP Reanalyses (Figure 5). This component describes the main characteristic of latitudinal variation in extreme temperatures, from the warmest environment of the south to the coldest of the north zones. The variable with the highest positive loading on the second component is related to the summer maximum climatology, in opposition to the winter minimum climatology. This component describes the main characteristic of latitudinal, longitudinal and ocean/land variation in extreme temperatures, from the warmest environment of the south-west to the coldest of the north-east zones.

Concerning the PCA constructed for the anomaly of long-term extreme seasonal temperature regimes, the elementary spatial structure can be identified from the first two PCs, based upon the covariance matrix (Figure 6). For the minimum, these two components account for about 95.3% of the total variation for the HadCM3 Simulations and 98.2% for the NCEP Reanalyses. The highest positive loadings on the first component are related to the winter climatology (Figure 6). This component describes the main characteristic of ocean/land variation in extreme temperature anomalies. The variable with the highest positive loading on the second component is related to the winter minimum, in opposition to the summer maximum. This component separates the contrasting largest summer anomalies from the prevalent winter ones, describing the role of latitudinal variation and land/ocean balance in determining extreme temperature anomalies. For the maximum, the first two components explain about 89.3% of the total variation for the HadCM3 Simulations and 98.6% for the NCEP Reanalyses (Figure 7). The variable with the highest negative loading on the first component is the winter minimum climatology for the HadCM3 Simulations, in opposition to the summer maximum anomaly for the NCEP Reanalyses (Figure 7). This component describes the main characteristic of latitudinal variation in extreme anomalies, mainly for the NCEP Reanalyses. The variable with the highest negative loading on the second component is related to the summer maximum for the HadCM3 Simulations, in opposition to the summer maximum for the NCEP Reanalyses. This component describes the main characteristic of latitudinal, longitudinal and ocean/land variation in extreme anomalies.

Based on the PCA, one concludes that for extreme anomalies there is no realistic spatial agreement between the HadCM3 Simulations and the NCEP Reanalyses throughout the entire target window. Local discrepancies could be explained by model uncertainties concerning the orography and the latitudinal heat and energy balances. The spatial pattern of the NCEP Reanalyses climatology, however, is strongly recognized by the HadCM3 Simulations.

PCA does not use external information about the phenomenon, such as underlying measurements. This prior information can be extracted to quantify spatially how well the general circulation climate model reproduces the current extreme events given by the reanalyses. Compositing all the events reduces the number of spatial degrees of freedom through cluster analysis ("where the system spends most of its time"); such a region gives the corresponding spatial pattern, and additional information can be obtained by observing the spatial character of the categorized events.

Naive Bayesian classification, or clustering, is a simple probabilistic classification method. The term naive Bayes refers to the fact that the probability model can be derived using Bayes' theorem and that it incorporates strong independence assumptions that often have no bearing in reality, and hence are deliberately naive. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. Abstractly, the probability model for a classifier is a conditional model over a dependent class variable with a small number of outcomes or classes, conditional on several feature variables.

5. SPATIAL PATTERN BAYESIAN CLASSIFICATION

The procedure of Bayesian Classification (BC) is based on the estimation of the prior distribution from certain global aspects of the data set (Domingos and Pazzani, 1997). It is an alternative approach to Discriminant Analysis via probability models (Ripley, 1996). The BC summarizes the evidence provided by a data set in favour of one model classification, represented by a statistical model, where the prior knowledge is constructed from the overall attribute across the spatial data (i.e., it provides a measure of how the HadCM3 Simulations react to a specific NCEP Reanalyses classification). This methodology can be used for description as well as for prediction. The following exercise was developed simply to exemplify, and to attempt to elucidate, the kind of interpretation that is expected to come out of this scheme.

Let $p_C = P(C)$ denote the prior spatial probabilities of the classes, assumed to have arisen under the classification according to a climate signal conditional probability function from the NCEP Reanalyses, $P(\mathrm{NCEP} \mid C)$. Given the prior probability functions, the data C produce posterior probabilities $P(C \mid \mathrm{NCEP})$. Since any prior information is transformed into posterior information through consideration of the spatial ensemble C, the transformation itself represents the evidence provided by C. In fact, the same transformation is used to obtain the posterior probability regardless of the prior probability, and this transformation takes a simple form given by Bayes' theorem (Kass and Raftery, 1995; Carlin and Louis, 2002).

In practice, consider the Boolean classification problem concerning the multinomial climate signal domain, presupposing the stochastic function {Z(s), s ∈ ℝ²} taking values 1, 2, 3 or 4. Consequently, the HadCM3 Simulations can be represented by a realization of {Z(s), s ∈ ℝ²} given by {z(si), i = 1, ..., n}, Z = 1, 2, 3, 4, by means of the inter-quartile subdivision of the NCEP Reanalyses, pCj = P(Cj(s)), ∀ j = 1, 2, 3, 4, representing the spatial target-classifier (Figure 8). According to Bayes’ theorem, the probability of {Z(s)} = {z(si), i = 1, ..., n} being class Cj is:
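With the prior pCj and a class-conditional likelihood P(Z | Cj) for the realization, Bayes’ theorem takes the standard form sketched below. The second expression, which factorizes the likelihood over the n locations, is the naive independence assumption and is written here as an assumption of this sketch rather than as a stated property of the data.

```latex
P\bigl(C_j \mid Z\bigr)
  = \frac{P\bigl(Z \mid C_j\bigr)\, p_{C_j}}
         {\sum_{k=1}^{4} P\bigl(Z \mid C_k\bigr)\, p_{C_k}},
\qquad
P\bigl(Z \mid C_j\bigr) = \prod_{i=1}^{n} P\bigl(z(s_i) \mid C_j\bigr).
```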


The probability ratios, also called discriminant functions, can be expressed in terms of a series of likelihood ratios:
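As a sketch, and assuming the naive independence of the values z(si) given the class, the ratio of the posterior probabilities of two classes Cj and Ck reduces to the prior ratio multiplied by one likelihood ratio per location, because the normalizing constant of Bayes’ theorem cancels:

```latex
\frac{P\bigl(C_j \mid Z\bigr)}{P\bigl(C_k \mid Z\bigr)}
  = \frac{p_{C_j}}{p_{C_k}}
    \prod_{i=1}^{n}
    \frac{P\bigl(z(s_i) \mid C_j\bigr)}{P\bigl(z(s_i) \mid C_k\bigr)} .
```

A ratio above one favours class Cj over class Ck.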

The predicted probabilities are crucial for optimal decision making. The cost of predicting class Ck for z(si) when the true class is Cj, k ≠ j, is designated by c(k, j, z). The Bayes optimal prediction for z(si) is the class Ck that minimizes the expected loss (i.e., the expected cost of predicting that z(si) belongs to a given class Ck):
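In the usual formulation, the expected cost of predicting Ck is the sum over j of c(k, j, z) P(Cj | z(si)), and the optimal class is the k with the smallest sum. The Python sketch below illustrates this rule for one location. The posterior values and both cost matrices are hypothetical, and the cost is assumed not to depend on z, which simplifies the c(k, j, z) of the text.

```python
def expected_costs(posterior, cost):
    """Expected cost of predicting each class k.

    posterior[j] is P(C_j | z) and cost[k][j] is the cost of predicting
    class k when the true class is j (assumed independent of z here).
    """
    return [sum(c_kj * p_j for c_kj, p_j in zip(row, posterior)) for row in cost]


def bayes_optimal_class(posterior, cost):
    """Return (index of the class with minimal expected cost, all expected costs)."""
    risks = expected_costs(posterior, cost)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


# Hypothetical posterior over the four quartile classes at one location.
posterior = [0.40, 0.05, 0.20, 0.35]

# Zero-one cost: every misclassification costs 1.
zero_one = [[0.0 if k == j else 1.0 for j in range(4)] for k in range(4)]

# Ordinal cost: the penalty grows with the distance between quartile classes.
ordinal = [[float(abs(k - j)) for j in range(4)] for k in range(4)]

k_zero_one, risk_zero_one = bayes_optimal_class(posterior, zero_one)
k_ordinal, risk_ordinal = bayes_optimal_class(posterior, ordinal)
```

Under the zero-one cost the rule simply picks the class with the highest posterior probability. Under the ordinal cost it can prefer a middle class when the posterior mass is split between distant quartiles, which is the case for the made-up values above.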

In fact, one is interested in the spatial true rate of a classifier, that is, the empirical frequency of locations correctly classified. In practice, this naive Bayes approach is more powerful than might be expected from the extreme simplicity of its model; in particular, it is fairly robust in the presence of non-independent attributes (Zhang et al. 2000). Recent theoretical analysis has shown why the naive Bayes classifier is so robust (Rish et al. 2001).
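The true rate can be illustrated with a short Python sketch. It is a simplified stand-in and not the Bayesian classifier itself: the reference field is split into four classes at its quartiles, the simulated field is classified with the same cut points, and the share of matching locations is counted. The temperature values and the function names are hypothetical.

```python
from statistics import quantiles


def quartile_classes(values, cut_points):
    """Assign class 1..4 to each value according to three quartile cut points."""
    q1, q2, q3 = cut_points
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]


def true_rate(predicted, target):
    """Empirical frequency of locations whose predicted class equals the target class."""
    hits = sum(p == t for p, t in zip(predicted, target))
    return hits / len(target)


# Hypothetical extreme temperatures (degrees C) at eight locations.
reference = [-12.0, -9.5, -7.0, -4.0, -2.5, 0.0, 1.5, 3.0]   # plays the role of the reanalysis
simulated = [-11.0, -10.0, -5.0, -4.5, -1.0, -0.5, 1.0, 6.0]  # plays the role of the model

cuts = quantiles(reference, n=4)              # quartile cut points of the reference field
target = quartile_classes(reference, cuts)    # target classification
predicted = quartile_classes(simulated, cuts) # simulated field under the same cut points

rate = true_rate(predicted, target)
misclassification = 1.0 - rate
```

With these made-up values seven of the eight locations fall in the same class, so the true rate is 0.875 and the misclassification is 0.125.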

There is a slight visual inconsistency between the HadCM3 Simulations (isotherms) and the NCEP Reanalyses inter-quartile classification of extreme temperature climatology over the target domain. This unpredictability can be established by the reciprocal classification. The HadCM3 Simulations produce unrealistically extreme temperature events for the winter ensemble, with a misclassification of about 5%, and diverge significantly for the summer ensemble, where the misfit is about 25% (Figure 8 and Figure 9).


Model Calibration by Relative Risk Assessment: The risk is a measure of the probability that "damages" will occur as a result of a given hazard. The risk assessment was employed to quantify the "evaluation" of the risk posed to the HadCM3 Simulations by the potential presence or use of the NCEP Reanalyses. The associated relative risks (Figure 10) are reasonable. The biggest sources of uncertainty are due to the displacement from the north-east bias to the south-west temperature classification. The Mediterranean region and the southern part of the Atlantic give small uncertainties, while through the north, the mid-latitudes and the north-east there is a preponderance of high risks. These relative risk maps separate contrasting hot summer zones from cold winter ones, describing the characteristic latitudinal variation in temperature extremes. The HadCM3 Simulations produce unrealistically extreme temperature climatology for the winter in North-eastern Europe and for the summer in North-western Europe, where the cost of a false diagnostic seems to be very high.


6. SUMMARY

This manuscript is an attempt to compare patterns of extreme temperature anomalies simulated by the GCM (HadCM3) with the NCEP/NCAR Reanalyses over a sector in Europe. The spatial patterns of extreme anomalies in the NCEP Reanalyses are not recognized in the HadCM3 model. Such instability may persist in the future. In all variables the same spatial climatology pattern is detected, suggesting that the ensemble describes the main characteristic of spatial variation in temperature extremes, from the temperate climates of the southwest zones to the extreme cold of the northeast ones. The HadCM3 Simulations are not able to reasonably fit the regimes defined by the NCEP Reanalyses. This is evidenced by the analyses carried out in this paper.

The procedures described here are useful for two reasons. First, they help to prevent model misspecification, which can lead to incorrect conclusions regarding accuracy and efficacy. Second, they provide information about the relationship between prognostic factors (predictive location-based) and extreme risk. As a matter of fact, all diagnostics based on PCA and BC reject the hypothesis that the HadCM3 Simulations are adequate for modeling present-day scenarios of extremes based on the NCEP Reanalyses. Nevertheless, the HadCM3 Simulations fit quite well the spatial climate regimes defined by the NCEP Reanalyses. Climatic risk analysis and forecasting concern, among other topics, the assessment and interpretation of changes in the intensity and frequency of recurrence of long-duration extreme values (heat/cold waves). The incorporation of other extreme indices into the analysis is therefore strongly advisable.

Though uncertainties are inherent in any statistical model, they can be reduced by judicious choices of other methods. In the context of the extreme value analysis, possibilities include the use of alternative models that confirm the information obtained in this exploratory analysis. Additionally, only the prognostic factors (predictive location-based) and not the extreme hazard (vulnerability severe-extreme-based) were considered in this study. The objective of this basic analysis was just to present a way to assess spatial extremes of a particular GCM, analyzing the uncertainty introduced by the inability of HadCM3 Simulations to adequately reproduce the current climate by statistical modeling.

PCA and BC are very powerful tools to model extreme hazards and future uncertainties by means of location-based prediction. They offer an alternative method to capture local suspicions of lack of information. This exploratory work is essential, as it identifies the strengths and weaknesses of the GCM with respect to regional-scale differences in model performance, providing insights into regional-scale climate interactions and processes.

7. ACKNOWLEDGMENTS

This work is an autonomous contribution to the EU-funded MICE project (Modelling the Impacts of Climate Extremes: MICE EVK2-CT-2001-00118). Model data were supplied by the Climate Impacts LINK project (funded by the UK Department of the Environment, Food and Rural Affairs) on behalf of the Hadley Centre for Climate Prediction and Research, which is part of the Met Office in the UK (the National Meteorological Service). The financial support of FCT (Portugal) and ICAT-FCUL (Portugal) is also acknowledged. Thanks are also due to the referees and the journal editor; their comments were very useful in improving this paper.

8. REFERENCES

Received October 2005

Accepted May 2006

  • BAILEY, T. C.; GATRELL, A. Interactive Spatial Data Analysis. Prentice Hall, 1996. 413p.
  • de BOOR, C. A Practical Guide to Splines. Reprint edition, Springer, 1994. 416p.
  • CARLIN, B. P.; LOUIS, T. A. Bayes and Empirical Bayes Methods for Data Analysis. 2nd edition, Chapman and Hall/CRC, 2002. 440p.
  • CRESSIE, N. A. C. Change of support and the modifiable area unit problem. Geographical Systems, v. 3, p. 159-180, 1996.
  • DOMINGOS, P.; PAZZANI, M. Beyond independence: Conditions of the optimality of the simple Bayesian classifier. Machine Learning, v. 29, p. 103-130, 1997.
  • DRAPER, N. R.; SMITH, H. Applied Regression Analysis. 3rd edition, Wiley-Interscience, 1998. 736p.
  • FUENTES, M.; GUTTORP, P.; CHALLENOR, P. Statistical Assessment of Numerical Models. NRCSE Technical Report Series 076, University of Washington Press, 2003.
  • GORDON, C.; COOPER, C.; SENIOR, C. A.; BANKS, H.; GREGORY, J. M.; JOHNS, T. C.; MITCHELL, J. F. B.; WOOD, R. A. The simulation of SST, sea ice extents and ocean heat transports in a version of the Hadley Centre coupled model without flux adjustments. Climate Dynamics, v. 16, p. 147-168, 2000.
  • JOHNSON, R. A.; WICHERN, D. W. Applied Multivariate Statistical Analysis. 4th edition, Prentice Hall, 1998. 799p.
  • KALNAY, E.; KANAMITSU, M.; KISTLER, R.; COLLINS, W.; DEAVEN, D.; GANDIN, L.; IREDELL, M.; SAHA, S.; WHITE, G.; WOOLLEN, J.; ZHU, Y.; CHELLIAH, M.; EBISUZAKI, W.; HIGGINS, W.; JANOWIAK, J.; MO, K. C.; ROPELEWSKI, C.; WANG, J.; LEETMAA, A.; REYNOLDS, R.; JENNE, R.; JOSEPH, D. The NCEP/NCAR 40-year reanalysis project. Bull. Am. Meteorol. Soc., v. 77, p. 437-471, 1996.
  • KASS, R.; RAFTERY, A. Bayes factors and model uncertainty. J. Amer. Statist. Assoc., v. 90, p. 773-795, 1995.
  • KIM, K.-Y.; NORTH, G. R. EOF analysis of surface temperature field in a stochastic climate model. J. Climate, v. 6, p. 1681-1690, 1993.
  • KYSELY, J. Comparison of extremes in GCM-simulated, downscaled and observed central-European temperature series. Climate Research, v. 20, p. 211-222, 2002a.
  • KYSELY, J. Probability estimates of extreme temperature events: Stochastic modelling approach vs. extreme value distributions. Studia Geophysica et Geodaetica, v. 46, p. 93-112, 2002b.
  • LUCIO, P. S. Geostatistical assessment of HadCM3 simulations via NCEP reanalyses over Europe. Atmospheric Science Letters, v. 5, p. 118-133, 2004a.
  • LUCIO, P. S. Assessing HadCM3 simulations from NCEP reanalyses over Europe: Diagnostics of block-seasonal extreme temperatures regimes. Global and Planetary Change, v. 44, p. 39-57, 2004b.
  • MATHSOFT, 2000. S-Plus 2000. Guide to Statistics. Data Analysis Products Division, MathSoft: Seattle.
  • MORAN, P. A. P. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, v. 10, p. 243-251, 1948.
  • NORTH, G. R.; BELL, T. L.; CAHALAN, R. F.; MOENG, F. J. Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., v. 110, p. 699-706, 1982.
  • POPE, V. D.; GALLANI, M. L.; ROWNTREE, P. R.; STRATTON, R. A. The impact of new physical parametrizations in the Hadley Centre climate model - HadAM3. Climate Dynamics, v. 16, p. 123-146, 2000.
  • RIPLEY, B. D. Pattern Recognition and Neural Networks. Cambridge University Press, 1996. 415p.
  • RISH, I.; HELLERSTEIN, J.; JAYRAM, T. An analysis of data characteristics that affect naive Bayes performance. IBM T. J. Watson Research Center, Technical Report RC 21993, 2001.
  • von STORCH, H. On the use of "inflation" in statistical downscaling. Journal of Climate, v. 12, p. 3505-3506, 1999.
  • ZHANG, H.; LING, C. X.; ZHAO, Z. The learnability of naive Bayes. Lecture Notes in Computer Science, v. 1822, p. 432-442, 2000.
  • Corresponding author:
    Paulo Sérgio Lucio
    Departamento de Estatística - CCET - UFRN
    Lagoa Nova – Natal-RN
    CEP 59.072-970
    E-mails:
  • Publication Dates

    • Publication in this collection
      12 Nov 2007
    • Date of issue
      Aug 2007

    Sociedade Brasileira de Meteorologia Rua. Do México - Centro - Rio de Janeiro - RJ - Brasil, +55(83)981340757 - São Paulo - SP - Brazil
    E-mail: sbmet@sbmet.org.br