Estimation of the Agricultural Probability of Loss: evidence for soybean in Paraná State

In any agricultural insurance program, the accurate quantification of the probability of loss is of great importance. In order to estimate this quantity, it is necessary to assume some parametric probability distribution. The objective of this work is to estimate the probability of loss using extreme value theory to model the left tail of the yield distribution. The estimated values are then compared to the values estimated under the normality assumption. Finally, we discuss the implications of assuming a symmetric distribution instead of a more flexible family of distributions when estimating the probability of loss and pricing insurance contracts. Results show that, for the selected regions, the probability distributions present a relative degree of skewness. As a consequence, the probability of loss, and hence the premium rate, differs from that estimated under the Normal distribution, which is commonly assumed by Brazilian insurers.

Key-words: Risk, extreme value theory, probability of loss, crop insurance.

JEL Classification: C00.

1. Professor in the Department of Economics, Administration and Sociology at ESALQ/USP and coordinator of GESER/ESALQ. E-mail: vitorozaki@yahoo.com.br
2. Professor in the Department of Statistics, UEPB. E-mail: ricardo.estat@yahoo.com.br
3. Professor in the Faculty of Mathematics, UFU. E-mail: prineves.est@gmail.com
4. Professor in the Department of Statistics, UFPEL. E-mail: rogerio.c.campos@hotmail.com


Introduction
Historically, the Brazilian agricultural insurance market has faced some well-recognized problems. More specifically, high insurance costs, the systemic nature of risk, adverse selection, the lack of accurate farm-level information, and moral hazard can be pointed out as the main drawbacks.
Recently, the Federal Government has implemented some initiatives to develop crop insurance in Brazil. For instance, under Federal Law 10,823, part of the total insurance cost is subsidized. The next step is to provide the information on which insurance companies can base the rules for offering their products. This implies overcoming the lack and inaccuracy of the information available in some areas.
The insurance market relies on individual information, which is needed to reveal the risk profile of each insured. In the absence of reliable and accurate information, insurers avoid offering contracts. In the agricultural sector, farm-level yield data are almost nonexistent. Some cooperatives have gathered yield information from their member producers, but it is still far from enough to provide the spatial density of information that crop insurance demands. Municipality-level yields have been recorded and released by the Brazilian Institute of Geography and Statistics (IBGE) and used as an alternative. However, such aggregation is not desirable for local (field-level) risk analysis (OZAKI, 2008).
According to the IBGE, Paraná State is the major grain producer in Brazil. In 2011, the IBGE estimated a soybean production of 13.4 million tons, 41% larger than the total reached in the previous harvest year. However, less than 10% of the planted area is covered by crop insurance. In this context, the analysis of the probability of loss can help insurance companies deal with seasonal fluctuations in grain production. Insurers commonly assume the Normal distribution to quantify and price the risk. Nevertheless, the Normal distribution cannot take into account the skewness and bimodality present in the probability distributions of agricultural yields (GOODWIN and KER, 1998).
Moreover, the shape of the distribution is particularly important in the context of crop insurance studies, because it reflects the risk (probability of loss) of the producer. When modeling agricultural yields, one must look at the density concentrated in the left tail of the distribution. When yields are assumed to be normally distributed, the probability of loss will be underestimated if the true distribution exhibits heavier tails, as in extreme value distributions (OZAKI et al., 2008).
This study applies extreme value theory to model the left tail of the agricultural yield probability distribution for the mesoregions⁵ of Paraná State. The estimates are then compared to those obtained under the Normal distribution that is commonly used by insurance companies.
The paper is organized as follows: in Section 2 we briefly review some aspects of crop insurance. In Section 3 we present extreme value theory, and in Section 4 we describe the Brazilian yield data for soybean. In Section 5 we present our empirical findings and discuss their implications, and in Section 6 we conclude the paper.

Crop insurance
Basically, the compensation mechanism is triggered by the farm-level yield. Producers are indemnified when the agricultural yield observed at the end of the harvest (in the unit or farm) falls below the yield guaranteed in the contract. This type of agricultural insurance is called individual yield crop insurance. The indemnity $I_i$ for each farm $i$ depends on the following quantities: $\varphi_i$ is the deductible, $0 < \varphi_i < 1$; $x_i^c$ is the critical yield; and $x_i$ is the observed (final) yield. The critical yield is defined according to the equation

$$x_i^c = \alpha_i \mu_i,$$

in which $\alpha_i$ is the level of coverage chosen by the agricultural producer, $0 < \alpha_i < 1$, and $\mu_i$ is the farmer's expected yield. In what follows, $I_i$ represents the indemnity due to each producer when the agricultural yield $x_i$ falls below the guaranteed yield $x_i^c$.

5. Groups of municipalities within a State with common characteristics.
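As a minimal sketch of the trigger described above, the critical yield and the resulting yield shortfall can be computed as follows. The function names and the example figures are hypothetical, and the sketch deliberately stops at the shortfall: how the deductible enters the indemnity is not reproduced here, so it is left out rather than guessed.

```python
def critical_yield(coverage_level, expected_yield):
    """Critical (guaranteed) yield x_c = alpha * mu, with 0 < alpha < 1."""
    if not 0.0 < coverage_level < 1.0:
        raise ValueError("coverage level must lie strictly between 0 and 1")
    return coverage_level * expected_yield


def yield_shortfall(observed_yield, coverage_level, expected_yield):
    """Amount by which the observed yield falls below the critical yield.

    Returns 0.0 when the observed yield is at or above the critical yield,
    i.e. when no indemnity is triggered.
    """
    x_c = critical_yield(coverage_level, expected_yield)
    return max(0.0, x_c - observed_yield)
```

For a hypothetical producer with an expected yield of 3000 kg/ha and 70% coverage, `yield_shortfall(1800.0, 0.7, 3000.0)` gives the shortfall in kg/ha that would enter the indemnity calculation.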

Methodology
Extreme value theory (EVT) is widely applied in finance, economics and insurance. One of the main challenges for the risk manager is to implement risk management models that allow for rare and damaging events with severe consequences (KOEDIJK et al., 1990; LORETAN and PHILLIPS, 1994; LONGIN, 1996; EMBRECHTS et al., 1997; DANIELSSON and DE VRIES, 2000; NEFTCI, 2000; MCNEIL and FREY, 2000; GENCAY et al., 2003; DIEBOLD et al., 1998).
Regulators expect insurance companies to be able to honor their contracts even under crisis scenarios. Thus, it is mandatory to keep adequate reserves to avoid insolvency in case of catastrophic events.
EVT has become one of the main theories for developing statistical models of extreme insurance losses. The approach focuses on a special class of probability distributions, the Generalized Extreme Value (GEV) distribution, which encompasses the Gumbel, Fréchet and Weibull distributions. The Generalized Pareto Distribution (GPD), which encompasses the Exponential, Pareto and Beta distributions, is also used in the EVT approach. In standard form, the GEV and the GPD depend on a single parameter, called the tail index.
There are two main approaches to dealing with extreme random variables. The Peaks Over Threshold (POT) approach fits a probability distribution (usually a GPD) to the values exceeding some threshold. The Block Maxima (or Gumbel) approach addresses the set of maximum values taken from blocks of observations. In order to assess market risk, for instance, Block Maxima is used to estimate the return probability of the maximum (minimum) event for a given time interval (e.g., months, years) (DE HAAN and PEREIRA, 2006).

Extreme value theory
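Suppose that the sequence $X_1, X_2, \ldots, X_n$ is independent and identically distributed (i.i.d.) with distribution function $F(x)$, and let $\Upsilon_n = \max(X_1, X_2, \ldots, X_n)$, with distribution function

$$P(\Upsilon_n \le x) = F^n(x).$$

Suppose further that there is a non-degenerate distribution function $H$ such that

$$\lim_{n \to \infty} P\left(\frac{\Upsilon_n - b_n}{a_n} \le x\right) = \lim_{n \to \infty} F^n(a_n x + b_n) = H(x), \qquad (1.1)$$

in which $a_n > 0$ and $b_n$ are normalizing constants.

If (1.1) holds, we say that $F$ (or $X$) belongs to the (maximum) domain of attraction (MDA) of $H$ and write $F \in MDA(H)$ (or $X \in MDA(H)$). Note that $H$ has one of the following three parametric forms, which are generally called Extreme Value Distributions (EVD):

$$\text{I:}\quad \Lambda(x) = \exp\{-e^{-x}\}, \quad x \in \mathbb{R};$$

$$\text{II:}\quad \Phi_\alpha(x) = \begin{cases} 0, & x \le 0, \\ \exp\{-x^{-\alpha}\}, & x > 0; \end{cases}$$

$$\text{III:}\quad \Psi_\alpha(x) = \begin{cases} \exp\{-(-x)^{\alpha}\}, & x \le 0, \\ 1, & x > 0. \end{cases} \qquad (1.2)$$

In II and III, $\alpha$ is any positive number. The three types are also often called the Gumbel, Fréchet and Weibull distributions, respectively.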

Fisher-Tippett Theorem
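Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables and let $\Upsilon_n = \max(X_1, X_2, \ldots, X_n)$. If, for some constants $a_n > 0$ and $b_n$,

$$\lim_{n \to \infty} P\left(\frac{\Upsilon_n - b_n}{a_n} \le x\right) = H(x)$$

for some non-degenerate $H$, then $H$ belongs to one of the three types described in (1.2).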
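Based on the Fisher-Tippett theorem, it is possible to estimate the asymptotic distribution of the maximum using the family $H$, without knowing the distribution of $X$. The three types of extreme value distributions can be written in a single generalized extreme value (GEV) form given by

$$H(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\,\frac{x - \mu}{\sigma}\right]^{-1/\xi}\right\},$$

in which $\mu$, $\sigma$ and $\xi$ are the location, scale and shape parameters, respectively. Moreover, $1 + \xi(x - \mu)/\sigma > 0$ and $\sigma > 0$. The case $\xi = 0$ is interpreted as the limit $\xi \to 0$, that is,

$$H(x; \mu, \sigma, 0) = \exp\left\{-\exp\left[-\frac{x - \mu}{\sigma}\right]\right\},$$

which is the Gumbel (Type I) distribution. Types II and III correspond to $\xi > 0$ and $\xi < 0$, respectively.

The Fisher-Tippett theorem gives a limit distribution for the maximum collected in a block of observations. Let $x_1, x_2, \ldots, x_m$ be observations of the random variable $X$. The sample is partitioned into $n$ blocks of size $k$ such that $nk \le m$, and we let

$$\Upsilon_j = \max\left(x_{(j-1)k+1}, \ldots, x_{jk}\right), \qquad j = 1, \ldots, n.$$

The estimates $\hat\mu$, $\hat\sigma$ and $\hat\xi$ are obtained from this new sample, $\Upsilon_1, \Upsilon_2, \ldots, \Upsilon_n$.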
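The GEV distribution function can be illustrated with a short sketch. It assumes the standard parameterization $\exp\{-[1 + \xi(x-\mu)/\sigma]^{-1/\xi}\}$ with the Gumbel form as the limit $\xi \to 0$; the function name `gev_cdf` and the numerical tolerance used to detect $\xi = 0$ are choices made here for illustration only.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma and shape xi."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel (Type I) limit case, xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

Evaluating the function with a very small shape parameter gives values close to the Gumbel case, which is a quick way to check the limit numerically.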
From the GEV distribution function we obtain the probability density function of the GEV distribution, given by

$$h(x; \mu, \sigma, \xi) = \frac{1}{\sigma}\left[1 + \xi\,\frac{x - \mu}{\sigma}\right]^{-(1/\xi) - 1} \exp\left\{-\left[1 + \xi\,\frac{x - \mu}{\sigma}\right]^{-1/\xi}\right\},$$

in which $-\infty < x < \mu - \sigma/\xi$ for $\xi < 0$, and $\mu - \sigma/\xi < x < +\infty$ for $\xi > 0$, and $x$ is a value of the random variable associated with the maximum values. Smith (1990) details the statistical treatment, applications and estimation of the GEV.

Suppose now that $X_i$, $i = 1, 2, \ldots$, is a stationary sequence of random variables with a continuous marginal distribution function $F(x)$, and $\hat X_i$, $i = 1, 2, \ldots$, is the so-called associated sequence of i.i.d. random variables with the same marginal distribution function $F$. Note that $\Upsilon_n$ stands for the maximum as usual, as in (1.1), while $\hat\Upsilon_n$ denotes the corresponding maximum of $\hat X_1, \ldots, \hat X_n$. The limiting distribution of $\Upsilon_n$ can be related to the limiting distribution of $\hat\Upsilon_n$ via a quantity called the extremal index of the sequence $\{X_n\}$ (CARTWRIGHT, 1958; NEWELL, 1964; O'BRIEN, 1974).

Selection of the extreme value distribution
The likelihood ratio statistic ($T_{LR}$) is defined by

$$T_{LR} = -2\left[l_G(\hat\theta_G) - l_{GEV}(\hat\theta_{GEV})\right], \qquad (1.4)$$

in which $l_{GEV}(\hat\theta_{GEV})$ and $l_G(\hat\theta_G)$ are the maximized log-likelihoods of the GEV and the Gumbel distributions, and $\hat\theta_{GEV} = (\hat\mu, \hat\sigma, \hat\xi)$ and $\hat\theta_G = (\hat\mu, \hat\sigma)$ are the vectors of estimated parameters. Under the null hypothesis, $T_{LR}$ has an asymptotic $\chi^2$ distribution with one degree of freedom. Hosking et al. (1985) suggest the use of a modified test statistic to improve the approximation to the asymptotic distribution of (1.4), given by

$$T^*_{LR} = \left(1 - \frac{2.8}{n}\right) T_{LR},$$

in which $n$ is the sample size.

To test the null hypothesis $H_0: \xi = 0$ versus $H_1: \xi \ne 0$, one compares the test statistic $T^*_{LR}$ with the tabulated value of the $\chi^2$ distribution with one degree of freedom at significance level $\vartheta$. If $T^*_{LR} > \chi^2_{1;\vartheta}$, $H_0$ is rejected. In other words, there is strong evidence that the distribution is not of type I (Gumbel).
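The test of $H_0: \xi = 0$ can be sketched with SciPy as below. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the correction factor $(1 - 2.8/n)$ is the commonly cited form of the modification, and the GEV fit is started from the Gumbel estimates so that the nested comparison is numerically stable. Note that SciPy's `genextreme` uses a shape parameter `c` with the opposite sign, $c = -\xi$.

```python
import numpy as np
from scipy import stats


def gumbel_vs_gev_lr_test(sample, significance=0.05):
    """Modified likelihood ratio test of H0: xi = 0 (Gumbel) against the GEV."""
    x = np.asarray(sample, dtype=float)
    n = x.size

    # Restricted model: Gumbel (xi = 0), fitted by maximum likelihood.
    mu_g, sigma_g = stats.gumbel_r.fit(x)
    ll_gumbel = stats.gumbel_r.logpdf(x, mu_g, sigma_g).sum()

    # Full model: GEV, with the optimizer started at the Gumbel solution.
    c, mu, sigma = stats.genextreme.fit(x, 0.0, loc=mu_g, scale=sigma_g)
    ll_gev = stats.genextreme.logpdf(x, c, mu, sigma).sum()

    # Likelihood ratio statistic; clipped at zero to absorb optimizer noise.
    t_lr = max(0.0, 2.0 * (ll_gev - ll_gumbel))
    # Small-sample correction (assumed form, see the note above).
    t_lr_modified = (1.0 - 2.8 / n) * t_lr
    critical = stats.chi2.ppf(1.0 - significance, df=1)

    return {
        "xi": -c,  # convert SciPy's shape to the xi convention of the text
        "t_lr": t_lr,
        "t_lr_modified": t_lr_modified,
        "critical": critical,
        "reject_gumbel": t_lr_modified > critical,
    }
```

With a short series, such as one value per year, the correction factor shrinks the statistic noticeably, which makes the test less prone to rejecting the Gumbel model by chance.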

Diagnostic of the extreme value distribution
In order to test the assumption that the data follow a GEV distribution, it is possible to use the Kolmogorov-Smirnov test (SANSIGOLO, 2008; BAUTISTA et al., 2004). The $D$ statistic of the Kolmogorov-Smirnov test is defined by

$$D = \max_{1 \le i \le n} \left| F(x_{(i)}) - \hat F(x_{(i)}) \right|,$$

in which $F(x_{(i)})$ is the theoretical GEV distribution function and $\hat F(x_{(i)})$ is the empirical distribution function.

The test procedure consists of sorting the data in ascending order, $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$. The distribution function assumed for the data is $F(x_{(i)})$, and the empirical distribution function of $X$ is the proportion of observations that do not exceed $x_{(i)}$:

$$\hat F(x_{(i)}) = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{x_j \le x_{(i)}\}.$$

The hypothesis that the data follow the GEV distribution ($H_0$) is rejected if $D \ge D_{[n, \vartheta]}$, where $D_{[n, \vartheta]}$ is the critical value for sample size $n$ and significance level $\vartheta$. The fit should also be inspected graphically using a Q-Q plot (quantile-quantile chart).
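A minimal sketch of the $D$ statistic follows. The helper name `ks_statistic` is hypothetical. The sketch uses the usual two-sided form, which compares the theoretical distribution function with the empirical step function on both sides of each jump; this is what `scipy.stats.kstest` computes.

```python
import numpy as np
from scipy import stats


def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov distance between a sample and a cdf.

    `cdf` is any callable returning the theoretical distribution function,
    for example the `cdf` method of a frozen SciPy distribution.
    """
    x = np.sort(np.asarray(sample, dtype=float))  # ascending order
    n = x.size
    theoretical = cdf(x)
    # Empirical cdf just after each jump (i/n) and just before it ((i-1)/n).
    d_plus = (np.arange(1, n + 1) / n - theoretical).max()
    d_minus = (theoretical - np.arange(0, n) / n).max()
    return float(max(d_plus, d_minus))
```

In practice the same value is returned by `stats.kstest(sample, cdf).statistic`. When the parameters of the theoretical distribution are estimated from the same sample, the tabulated critical values are only approximate.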
To obtain the probability distribution under the normality assumption, we used the maximum likelihood method to estimate the first two moments of the distribution. We then fitted the Normal distribution and compared it to the GEV distribution.
The next step is to calculate the probability of loss by integrating the area under the density curve below a predetermined level (a percentage of the average yield). In the crop insurance market, this percentage, often called the level of coverage, ranges from 60% to 80% of the average yield for each mesoregion. In this study we consider only three levels: 60%, 70% and 80%.
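The calculation described above amounts to evaluating a fitted distribution function at the critical yield. The sketch below assumes the fitted distribution is available as a frozen SciPy distribution; the function names and the numbers in the usage note are hypothetical.

```python
from scipy import stats


def fit_normal(yields):
    """Normal distribution with mean and standard deviation fitted by ML."""
    mu, sigma = stats.norm.fit(yields)
    return stats.norm(loc=mu, scale=sigma)


def probability_of_loss(dist, mean_yield, coverage_level):
    """P(yield < coverage_level * mean_yield) under the distribution `dist`."""
    return float(dist.cdf(coverage_level * mean_yield))
```

For example, with a hypothetical Normal distribution of mean 3000 kg/ha and standard deviation 500 kg/ha, `probability_of_loss(stats.norm(3000, 500), 3000, 0.8)` evaluates the Normal distribution function at 2400 kg/ha. Passing a frozen `stats.genextreme` or `stats.gumbel_r` object instead gives the corresponding extreme value estimate for the same coverage level.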
The shape of the distribution is of great importance when calculating the probability of loss. Consider the situation in which the true distribution is asymmetric and one fits a symmetric distribution to the data. If the asymmetry is negative, the probability of loss will be underestimated relative to the true distribution. The opposite also holds.

Data description
The municipality-level soybean yield data, in kg/hectare, were acquired from the Secretary of Agriculture of the State of Paraná (Seab) for the period from 1981 to 2007. The municipality-level yield is an average crop yield in the municipality and is based on the subjective methodology created by the Brazilian Institute of Geography and Statistics (IBGE). These statistics are based on a consensus among agricultural players in a municipality (farmers, bank managers, crop extensionists, etc.). Thus, the IBGE releases the average yield for each municipality. For each mesoregion, we select the minimum soybean yield among the municipalities within the mesoregion in each year. In other words, if there are ten mesoregions, we have ten observations in each year. These data are collected and released annually (with a two-year lag).
Thus, the empirical application considers mesoregions (sets of municipalities) defined by the Brazilian Institute of Geography and Statistics (IBGE) (Figure 1). Since each mesoregion contains a set of municipalities, the minimum yield across these municipalities in a given year is used to create the time series of minimum values for each mesoregion. We applied this procedure to each year from 1981 to 2007. Figure 2 shows the evolution of the average of the minimum values from 1981 to 2007.
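The construction of the series of minima can be sketched as follows. The record layout `(year, mesoregion, municipality, yield)` and the function name are assumptions made for illustration; the actual Seab files may be organized differently.

```python
from collections import defaultdict


def mesoregion_minima(records):
    """Build, for each mesoregion, the yearly series of minimum yields.

    `records` is an iterable of (year, mesoregion, municipality, yield_kg_ha)
    tuples. Returns {mesoregion: {year: minimum municipal yield}} with the
    years in ascending order.
    """
    minima = {}
    for year, mesoregion, _municipality, yield_kg_ha in records:
        key = (mesoregion, year)
        if key not in minima or yield_kg_ha < minima[key]:
            minima[key] = yield_kg_ha

    series = defaultdict(dict)
    for (mesoregion, year), value in minima.items():
        series[mesoregion][year] = value
    return {m: dict(sorted(s.items())) for m, s in series.items()}
```

Each mesoregion then contributes one observation per year, so a 27-year record yields a series of 27 minima per mesoregion.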

Results and discussion
Table 1 and Figure 3 show, respectively, the descriptive statistics and the boxplot of the soybean yield for all mesoregions. The descriptive statistics show that the median differs systematically from the mean: where the median is higher than the mean the distribution is asymmetric to the left, and where it is lower the distribution is asymmetric to the right.
The coefficients of asymmetry and kurtosis can be used in the exploratory analysis to recognize the shape (skewness, kurtosis) of the distribution, or even to recognize mixtures of distributions. Table 2 shows the point estimates of the shape parameter $\xi = 1/\alpha$. The shape parameter estimates were negative for 7 regions and positive for 3 regions, suggesting that the extreme value distributions are either Weibull or Gumbel.
Comparing the value of the statistic $T^*_{LR}$ presented in Table 3 with the $\chi^2$ critical value with one degree of freedom at the 5% significance level ($\chi^2_{1;0.05} = 3.84$), one can note that the distribution to be fitted is the Gumbel distribution in mesoregions 2, 3, 5, 6, 7, 8, 9 and 10. On the other hand, mesoregions 1 and 4 will be modeled using the Weibull distribution. This is confirmed by the $D$ statistics (Kolmogorov-Smirnov test) presented in Table 3. According to the test, most of the mesoregions could be fitted by the Gumbel distribution.
It is important to note that the Normal distribution could also be used as an alternative probability distribution according to the Shapiro-Wilk normality test. In Table 4 one can note that in mesoregions 2, 3 and 4 the Normal distribution could not be rejected at the 10% level. On the other hand, Atwood et al. (2003) raise critical issues about the use of the Normal distribution when modeling crop yields. Figure 4 shows the distributions fitted for each mesoregion. Most of the distributions are asymmetric to the right. In this context, the probability of loss is expected to be lower than under the Normal distribution commonly used by insurance companies. Figure 5 shows the diagnostics of the fitted distributions through the Q-Q plot. At the 1% significance level, Table 4 shows that the data follow the GEV distribution. In other words, the null hypothesis cannot be rejected.
Once the distribution of interest is chosen, the next step is to determine the probability of loss for each mesoregion and compare the results with those obtained under the Normal distribution. Tables 5 and 6 show the probabilities of loss for the selected mesoregions using both sets of distributions (Weibull-Gumbel and Normal). Note that the level of coverage is a parameter chosen by the insured at the beginning of the insurance contract. In this context, the agricultural producer can choose levels from 60% to 80% of the average yield of municipality $i$ ($\mu_i$), which results in the critical yield. If the final yield at the end of the harvest, $x_i$, is lower than the critical yield, the insured receives the indemnity. In this study, because we aggregate the information into mesoregions, the critical yield refers to each mesoregion instead of each municipality. Thus, the premium rate is calculated for each mesoregion.
In Tables 5 and 6 one can notice that, for all levels of coverage, the probability of loss estimated using the Normal distribution is greater than the probability estimated using the Weibull-Gumbel distributions. A direct implication of this fact is the overpricing of the premium rate, given by (GOODWIN and KER, 1998):

$$\text{Premium rate} = \frac{F(x^c)\, E\left[x^c - (x \mid x < x^c)\right]}{x^c},$$

in which $E$ is the expectation operator, $F$ is the distribution function and $x^c$ is the critical yield.
The probability of loss is represented by the distribution function in the premium rate formula. In the calculation of the rate, the expectation term and the term in the denominator are treated as constant.
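As a worked illustration of a premium rate of the form probability of loss times expected shortfall, divided by the critical yield, the sketch below evaluates it in closed form under a Normal yield distribution. The function names and the parameter values in the usage note are hypothetical and are not taken from the paper's tables. For a Normal yield with mean $\mu$ and standard deviation $\sigma$, the numerator equals $(x^c - \mu)\Phi(z) + \sigma\phi(z)$ with $z = (x^c - \mu)/\sigma$.

```python
import math


def normal_cdf(z):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def premium_rate_normal(mu, sigma, coverage_level):
    """Return (probability of loss, premium rate) under a Normal yield.

    The rate is F(x_c) * E[x_c - X | X < x_c] / x_c, where x_c is the
    critical yield coverage_level * mu.
    """
    x_c = coverage_level * mu
    z = (x_c - mu) / sigma
    prob_loss = normal_cdf(z)
    # F(x_c) * E[x_c - X | X < x_c] = E[max(x_c - X, 0)] in closed form.
    expected_shortfall = (x_c - mu) * prob_loss + sigma * normal_pdf(z)
    return prob_loss, expected_shortfall / x_c
```

With a hypothetical mean of 3000 kg/ha, a standard deviation of 500 kg/ha and 80% coverage, `premium_rate_normal(3000.0, 500.0, 0.8)` returns the probability of loss together with the rate. A distribution that assigns more probability to the left tail than the true one raises both quantities, which is the overpricing effect discussed above.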
In what follows, we present a hypothetical example to illustrate this fact. Consider a soybean producer in Mesoregion 1. The premium rate calculated for the 80% level of coverage is equal to 2%.

Figure 2. Boxplot of the yields for each mesoregion

Figure 3. Density estimation

Figure 4. Kolmogorov-Smirnov test of the empirical cumulative distribution (dotted line) and theoretical distribution (continuous line)

Table 1. Soybean yield descriptive statistics. Source: Results of the study.

Table 2. Estimates of the location, scale and shape parameters

Table 3. 95% confidence intervals for the shape parameter ($\xi$) and values of the Modified Likelihood Ratio Test ($T^*_{LR}$). Source: Results of the study.

Table 6. Probability of loss assuming the Weibull and Gumbel distributions, levels of coverage of 60, 70 and 80%. Source: Results of the study.

Table 7. Probability of loss assuming the Normal distribution, levels of coverage of 60, 70 and 80%. Source: Results of the study.