
Pricing Crop Revenue Insurance Using Parametric Copulas*

* We thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for financial support.

Abstract

Crop revenue insurance has been widely discussed in recent years. It has become an important mechanism for managing the risks associated with crop yields and prices. However, a more comprehensive study is needed to investigate the dependence structure between the variables analyzed and to calculate the actuarially fair premium rate for revenue insurance. This study proposes alternative ways to calculate premium rates for revenue insurance using parametric copula functions. The results suggest that the average commercial rates calculated by the insurer are underestimated when compared with the copula model. Underestimated rates can lead to serious losses for insurers, since they assume a lower risk than should be taken into account.

Keywords
Crop risk management; OLLN distribution; parametric copulas; revenue insurance

Resumo

In recent years, crop revenue insurance has been widely discussed. This type of insurance has become an important mechanism for managing climate risks (which affect yield) and market risks (crop prices). However, a more comprehensive study is needed to investigate the dependence structure between the variables analyzed and to calculate the actuarially fair premium rate for revenue insurance. This study proposes alternative ways to calculate revenue insurance premium rates using parametric copula functions. The results suggest that the average commercial rates calculated by the insurer are underestimated when compared with the copula model. Underestimated rates can lead to serious losses for insurers, since they assume a lower risk than should be taken into account.

1. Introduction

Agriculture is extremely important for Brazil, both economically and socially; nevertheless, many risks threaten all agribusiness chains to a greater or lesser degree. Among them, climate risks affect crop yield, and market risks can lead to significant changes in commodity prices.

The development of risk management strategies requires understanding the nature of the risk, its origin, its probability distribution, its correlation with other risks and the capability of instruments to reduce it. The agricultural sector can use several strategies to manage risks, namely crop diversification, new production techniques, agricultural derivatives and crop insurance.

Crop insurance is an efficient way to protect the producer's income under adverse conditions and an important instrument for transferring risk from producers to other economic agents. Insurance guarantees income when an event causes economic damage: in exchange for a premium, the insurer pays an indemnity in case of damage, determined by comparing the producer's income with the scenario without such an event.

In Brazil, revenue insurance was first offered in 2010. In 2011, a second company also began to offer this type of insurance. This insurance product is relatively recent in the Brazilian agricultural market; however, demand for it has grown each year (Carvalho et al., 2013).

Because it is a recent product in the Brazilian market, no studies have evaluated actuarial procedures for pricing this type of insurance. Further research is therefore required on the theoretical aspects of pricing, taking into account the specific features of Brazilian soybean yield and price series, which justifies the purpose of this work.

According to Goodwin and Mahul (2004), the main issue in revenue insurance is risk modeling, which consists of estimating the joint distribution, the marginal distributions and the correlation between the random variables yield and price.

In multivariate problems, the multivariate normal distribution is usually used, mostly because of its mathematical simplicity. However, the normality assumption restricts the association between the margins to a linear relationship. In addition, it assumes a symmetric association, which may not capture the idiosyncrasies of real data.

In the actuarial field, data tend to exhibit heavier tails and dependence structures that may be linear, non-linear, or present only in the tails of the distribution. Therefore, improper use of the normality hypothesis may result in large financial losses and lead to an underestimation of the probability and severity of loss events.

This study presents alternative methods to estimate the premium rate of revenue insurance using parametric copulas for the joint distribution of price and yield. The next subsection presents a literature review.

1.1 Literature review

Precise modeling of the random variables agricultural yield and price is necessary to calculate the actuarially fair premium rate. Three approaches are used to model these variables. The first estimates the parameters of a parametric distribution. The second comprises a wide variety of nonparametric methods commonly used to approximate distributions. The third, semi-parametric methods, combines elements of both the parametric and nonparametric approaches.

Nonparametric methods are advantageous because they do not require prior specification of the form of the distribution, meaning that the "data speak for themselves". In this case, features such as positive or negative asymmetry and bimodality can be revealed. However, despite their greater flexibility to describe various density shapes, these methods require a large sample (Goodwin & Mahul, 2004).
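As an illustration of the nonparametric approach, the sketch below estimates a density with a Gaussian kernel using `scipy.stats.gaussian_kde`, which imposes no distributional form. The bandwidth rule (Scott's rule, the scipy default) and the simulated bimodal sample are assumptions made for illustration, not choices taken from the cited studies.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_density(sample, grid, bw_method=None):
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`.

    No parametric form is imposed; bw_method=None uses scipy's default (Scott's rule).
    """
    kde = gaussian_kde(np.asarray(sample, dtype=float), bw_method=bw_method)
    return kde(np.asarray(grid, dtype=float))


# Hypothetical bimodal sample: a mixture of two normal components
rng = np.random.default_rng(4)
sample = np.concatenate([rng.normal(-2.0, 0.5, 2000), rng.normal(2.0, 0.5, 2000)])
grid = np.linspace(-6.0, 6.0, 1201)
density = kde_density(sample, grid)
print("estimated mass on grid:", density.sum() * (grid[1] - grid[0]))
```

With only a few dozen annual observations, as in the yield series analyzed here, the same estimator would typically be much less stable, which is the sample-size limitation noted above.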

For parametric modeling, the normal distribution has been suggested for yield (Just & Weninger, 1999). However, Ramirez, Misra, and Field (2003) found evidence against normality, and the Beta distribution is the most used parametric distribution (Babcock & Hennessy, 1996; Hennessy, Babcock, & Hayes, 1997).

Duarte, Braga, Miquelluti, and Ozaki (2017) compared the Normal, Beta, skew-normal and skew-t distributions and the Odd Log-Logistic Normal (OLLN) distribution to model soybean yield in four municipalities in Parana State. In these municipalities, the OLLN distribution provided the best fit to the data because it captures bimodality. In addition, they compared the yield insurance premium rates calculated with the OLLN distribution with the commercial rates applied by insurance companies in the country, finding that the rates calculated with the OLLN distribution were lower than those applied by insurers.
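The OLLN density is not written out in this section. As a hedged sketch, the code below assumes the usual odd log-logistic construction applied to a normal baseline, F(x) = G(x)^ν / [G(x)^ν + (1 - G(x))^ν] with G the N(μ, σ²) distribution function; this parameterization is our assumption about Braga et al. (2016), and the names `mu`, `sigma` and `nu` are illustrative. It shows how a small shape parameter ν can split the mass into two modes, while ν = 1 recovers the normal.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm


def olln_cdf(x, mu=0.0, sigma=1.0, nu=1.0):
    """Assumed OLLN CDF: G^nu / (G^nu + (1 - G)^nu), with G the N(mu, sigma^2) CDF."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    # log G and log(1 - G) are computed separately to avoid underflow in the tails
    return expit(nu * (norm.logcdf(z) - norm.logsf(z)))


def olln_pdf(x, mu=0.0, sigma=1.0, nu=1.0):
    """Derivative of olln_cdf: nu g G^(nu-1) (1-G)^(nu-1) / (G^nu + (1-G)^nu)^2."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    log_g = norm.logpdf(z) - np.log(sigma)          # baseline normal density
    log_G, log_S = norm.logcdf(z), norm.logsf(z)    # log G and log(1 - G)
    log_num = np.log(nu) + log_g + (nu - 1.0) * (log_G + log_S)
    log_den = 2.0 * np.logaddexp(nu * log_G, nu * log_S)
    return np.exp(log_num - log_den)


# nu = 0.1 is a hypothetical value: the density at 2 exceeds the density at the centre
for point in (0.0, 2.0):
    print(point, olln_pdf(point, nu=0.1))
```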

For prices, the lognormal distribution has been widely used in crop insurance studies (Goodwin, Roberts, & Coble, 2000). Alternatives to the lognormal distribution have been studied, and semi-parametric and nonparametric approaches are commonly used.

According to Goodwin and Ker (2002), the main difficulty in modeling crop revenue lies in determining the degree of correlation between price and yield, since they are rarely independent.

This dependence between variables can be studied through copulas. Copula theory has often been used to analyze financial series and risk factors. In crop insurance, only a small number of studies have used it, such as Ahmed and Serra (2015) and Miqueleto (2011).

This study discusses different parametric copulas for modeling the joint distribution, with margins approximated both by the nonparametric empirical method and by the parametric method, using the Odd Log-Logistic Normal (OLLN) distribution for yield and the Skew-t distribution for the price series. In addition, the rates calculated with the copula methodology were compared with those obtained from the bivariate Normal distribution, generally used in the insurance market, and with the commercial rates applied by the insurance market.

The next subsection describes how the revenue insurance premium rate is calculated.

1.2 Revenue Insurance Premium Rate

The insurance model used as a reference in this work was Revenue Assurance (RA), adopted in the United States market. The theoretical concepts for calculating revenue insurance premium rates were described in Miqueleto (2011).

This work considers revenue (F) as a function of two variables, yield (X) and price (Y): F = XY. The guaranteed yield is defined as $x_g = \lambda x_e$, where $0 < \lambda < 1$ is the coverage level (CL) chosen by the producer and $x_e$ is the expected yield, typically calculated as the average of the last five seasons. The guaranteed price is defined as $y_g = \lambda y_e$, where λ is the same coverage level and $y_e$ is the expected price. In this work, the expected price was the average of the last 15 prices of the simulated distribution. In this type of insurance, the indemnity paid to the producer per unit area is $I = \max(x_g y_g - XY, 0)$.
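The guarantee and indemnity defined above can be written as two small functions. This is a minimal sketch: the function names and the example values are hypothetical, and it assumes yield is expressed in 60 kg bags per hectare (kg/ha divided by 60) so that XY is in R$ per hectare when the price is in R$ per bag.

```python
def guaranteed_revenue(expected_yield, expected_price, coverage):
    """x_g * y_g with x_g = coverage * x_e and y_g = coverage * y_e."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage level must lie strictly between 0 and 1")
    return (coverage * expected_yield) * (coverage * expected_price)


def indemnity(yield_obs, price_obs, expected_yield, expected_price, coverage):
    """Indemnity per unit area, I = max(x_g y_g - X Y, 0)."""
    guarantee = guaranteed_revenue(expected_yield, expected_price, coverage)
    return max(guarantee - yield_obs * price_obs, 0.0)


# Hypothetical values: yield in bags/ha, price in R$ per bag, 80% coverage
print(indemnity(35.0, 50.0, 50.0, 60.0, 0.8))  # -> 170.0
```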

The optimal premium rate is given by

(1) $$\pi = \Pr\left(XY < x_g y_g\right) \, \frac{x_g y_g - \mathbb{E}\left[XY \mid XY < x_g y_g\right]}{x_g y_g}.$$
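Equation (1) equals the expected indemnity $\mathbb{E}[\max(x_g y_g - XY, 0)]$ divided by the guaranteed revenue $x_g y_g$, so it can be estimated by Monte Carlo from pairs simulated from any fitted joint model (bivariate normal or copula). The sketch below assumes those pairs are already available; the independent normal yields and log-normal prices in the usage lines are hypothetical placeholders and deliberately ignore the yield-price dependence that the copula models capture.

```python
import numpy as np


def premium_rate_mc(yields, prices, expected_yield, expected_price, coverage):
    """Monte Carlo estimate of equation (1) from simulated (yield, price) pairs."""
    x = np.asarray(yields, dtype=float)
    y = np.asarray(prices, dtype=float)
    guarantee = (coverage * expected_yield) * (coverage * expected_price)  # x_g * y_g
    revenue = x * y
    below = revenue < guarantee                 # event XY < x_g y_g
    if not below.any():
        return 0.0
    prob = below.mean()                         # Pr(XY < x_g y_g)
    cond_mean = revenue[below].mean()           # E[XY | XY < x_g y_g]
    return prob * (guarantee - cond_mean) / guarantee


# Placeholder draws (hypothetical parameters), independent by construction
rng = np.random.default_rng(1)
yields = rng.normal(50.0, 8.0, 200_000)
prices = rng.lognormal(np.log(60.0), 0.15, 200_000)
print(premium_rate_mc(yields, prices, 50.0, 60.0, 0.8))
```

Writing the rate as probability times conditional shortfall mirrors equation (1) term by term; it is algebraically identical to averaging the simulated indemnities and dividing by the guarantee.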

The next section describes the data used in the research. Section 3 presents the models used for the margins and the copula methodology for the bivariate distribution. Section 4 shows the results. Section 5 presents the premium rates. Finally, Section 6 presents the conclusions.

2. Data description

To model crop yield, the annual soybean yield series (kg/ha) of the municipalities of Toledo, Cascavel, Guarapuava and Castro, in the state of Parana (Brazil), were analyzed. They were provided by the Institute of Social and Economic Development of Parana (IPARDES, 2015) and are available from 1980 to 2015.

The price series used is the nominal monthly price received by producers in Parana State, in Brazilian reais (R$) per 60 kg bag, provided by the Secretariat of Agriculture and Supply of Parana State (SEAB, 2015). To match the periodicity of the two series, the average nominal price over the crop sales period was used, corresponding to the soybean harvest months of March, April and May of each year. In addition, the price series was deflated by the General Price Index - Domestic Availability (IGP-DI), available at IPEADATA (2015).

The commercial rates of insurance company A used in this study were provided by the Ministry of Agriculture, Livestock and Food Supply (MAPA, 2017).

3. Methodology

This section presents the methodology used to model the price and yield series, as well as the parametric copulas used to model the dependence structure between the variables.

To remove the trend (bias) from the yield and price series, the procedure of Gallagher (1987) was used. This procedure is described in Appendix A.

Appendix A. Trend in Yield and Prices Series

Between 1980 and 2015, a decreasing trend in prices and an increasing trend in yields are observed. The latter is due to great advances in the technologies used in crops. In addition, temporal dependence and non-constant variance over time are also expected. Thus, before fitting any probabilistic model to the series, statistical techniques are needed to make the data trend-free, independent and homoscedastic. To remove the trend, the procedure of Gallagher (1987) was used. This procedure first estimates a regression of the series on time,

$$y_t = \alpha + \beta T + \gamma T^2 + e_t, \qquad e_t \sim \mathcal{N}(0, \sigma^2),$$

where $y_t$ is the yield or price vector, T is the time vector, and α, β and γ are the regression parameters. The regression residuals $\hat{e}_t$ and the fitted value for the last observation, $\hat{y}_{2015}$, are then used to remove the trend according to

$$\tilde{y}_t = \hat{y}_{2015}\left(1 + \frac{\hat{e}_t}{\hat{y}_t}\right).$$

To check for temporal dependence, the test proposed by Ljung and Box (1978) was applied, whose null hypothesis is that the series is independent. In addition, to check the homoscedasticity of variances, the Ljung-Box test was applied to the squared series.

Yield modeling used the Normal and Skew-Normal (SN) distributions, the latter proposed by Azzalini (1985), as well as the OLLN and Skew-t (ST) distributions (Braga, Cordeiro, Ortega, & Cruz, 2016; Azzalini & Capitanio, 2003). Price modeling used the Log-Normal and Skew-t distributions.
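The detrending formula and the Ljung-Box check of Appendix A can be sketched with plain NumPy and SciPy. This is an illustrative sketch, not the authors' code: it always fits the quadratic term (the paper keeps only significant terms, which turned out to be first order for yield), it uses the last fitted value in place of $\hat{y}_{2015}$, and the default of 5 lags is an arbitrary choice not given in the paper.

```python
import numpy as np
from scipy.stats import chi2


def gallagher_detrend(series):
    """Fit y_t = a + b*T + c*T**2 + e_t by least squares and return
    y~_t = yhat_last * (1 + e_t / yhat_t), i.e. residuals rescaled to the last fitted level."""
    y = np.asarray(series, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)          # time index T = 1, ..., n
    design = np.column_stack([np.ones_like(t), t, t ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    return fitted[-1] * (1.0 + resid / fitted)          # fitted[-1] plays the role of yhat_2015


def ljung_box(series, lags=5):
    """Ljung-Box Q = n(n+2) * sum_k r_k^2 / (n-k); H0: no autocorrelation up to `lags`."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    ks = np.arange(1, lags + 1)
    acf = np.array([np.sum(x[k:] * x[:-k]) / denom for k in ks])
    q = n * (n + 2) * np.sum(acf ** 2 / (n - ks))
    return q, chi2.sf(q, df=lags)
```

For a detrended series `z`, the dependence check is `ljung_box(z)`; for the heteroscedasticity check, this sketch would call `ljung_box((z - z.mean()) ** 2)`, where centering before squaring is our assumption.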

To select the model that best fits the data, information criteria and statistical tests were used. The criteria most used in practice for model selection are the corrected Akaike Information Criterion (AICc), based on Akaike (1998), and the Bayesian Information Criterion (BIC) (Schwarz, 1978). Another approach to choosing the best model is through the modified Anderson-Darling (A*) and Cramér-von Mises (W*) statistics, proposed by Lin, Huang, and Balakrishnan (2008) and Pakyari and Balakrishnan (2012), respectively.
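The information criteria can be computed from each model's maximized log-likelihood as AIC = 2k - 2 ln L, AICc = AIC + 2k(k + 1)/(n - k - 1) and BIC = k ln n - 2 ln L, where k is the number of parameters and n the sample size; lower values are preferred. The sketch below compares two candidate price models fitted with `scipy.stats`. As an assumption, the skew-normal (`scipy.stats.skewnorm`) stands in for the Skew-t used in the paper, and the simulated prices are hypothetical.

```python
import numpy as np
from scipy import stats


def information_criteria(loglik, n_params, n_obs):
    """AIC, small-sample corrected AICc and BIC from a maximized log-likelihood."""
    aic = 2 * n_params - 2 * loglik
    aicc = aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = n_params * np.log(n_obs) - 2 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def compare_price_models(prices):
    """Fit two candidate price models by maximum likelihood and score them."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    # Log-normal with location fixed at 0: free parameters are shape and scale
    s, loc, scale = stats.lognorm.fit(prices, floc=0)
    ll_lognorm = stats.lognorm.logpdf(prices, s, loc=loc, scale=scale).sum()
    # Skew-normal (stand-in for the Skew-t): shape, location and scale
    a, loc_sn, scale_sn = stats.skewnorm.fit(prices)
    ll_skewnorm = stats.skewnorm.logpdf(prices, a, loc=loc_sn, scale=scale_sn).sum()
    return {
        "lognormal": information_criteria(ll_lognorm, 2, n),
        "skew-normal": information_criteria(ll_skewnorm, 3, n),
    }


# Hypothetical deflated prices: 36 annual values, as in a 1980-2015 series
rng = np.random.default_rng(3)
scores = compare_price_models(rng.lognormal(np.log(60.0), 0.2, 36))
for model, crit in scores.items():
    print(model, {k: round(float(v), 2) for k, v in crit.items()})
```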

3.1 Copulas

Copula theory is a multivariate modeling tool widely used in different branches of science where the interest lies in multivariate dependence and the assumption of multivariate normality is questionable. In finance and actuarial science, copulas are used to model correlated events and competing risks (Miqueleto, 2011). Copulas are attractive because they cover a wide range of dependence structures and can fully describe the dependence structure of the data.

Furthermore, copulas make it possible to relate a joint distribution H(x, y) to its marginal distributions F(x) and G(y). Sklar's theorem (Sklar, 1959) guarantees the conditions under which the distribution function H(x, y) has a unique representation by means of a copula, as stated below.

Theorem 1. Let H be a bivariate distribution function with margins F and G. Then there exists a bivariate copula C such that, for all (x, y) ∈ ℝ²,

(2) $$H(x, y) = C\big(F(x), G(y)\big).$$

Conversely, if C is a bivariate copula and F and G are univariate distribution functions, then the function H defined in equation (2) is a bivariate distribution function with margins F and G. If the margins are continuous, C is unique; otherwise, C is uniquely determined on Ran F × Ran G.
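The converse part of Sklar's theorem is constructive: drawing (U, V) from a copula and setting X = F⁻¹(U), Y = G⁻¹(V) yields pairs with joint distribution H. The sketch below does this with a Gaussian copula. The normal yield margin, log-normal price margin and the value δ = -0.4 are hypothetical placeholders (the paper uses OLLN and Skew-t margins), and the function name is ours.

```python
import numpy as np
from scipy import stats


def sample_gaussian_copula_pairs(n, delta, yield_margin, price_margin, seed=None):
    """Draw (X, Y) with joint CDF H(x, y) = C_delta(F(x), G(y)), C_delta the Gaussian copula.

    yield_margin and price_margin are frozen scipy.stats distributions (F and G).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, delta], [delta, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = stats.norm.cdf(z)                   # (U, V) has uniform margins and copula C_delta
    x = yield_margin.ppf(u[:, 0])           # X = F^{-1}(U)
    y = price_margin.ppf(u[:, 1])           # Y = G^{-1}(V)
    return x, y


# Hypothetical margins and dependence parameter, for illustration only
x, y = sample_gaussian_copula_pairs(
    20000, -0.4, stats.norm(50, 8), stats.lognorm(s=0.15, scale=60), seed=7
)
# Spearman's rho depends only on the copula: 6/pi * arcsin(delta/2) for the Gaussian copula
print(stats.spearmanr(x, y)[0], 6 / np.pi * np.arcsin(-0.4 / 2))
```

Because rank correlation is invariant to the margins, swapping the placeholder margins for fitted OLLN or Skew-t margins would leave the dependence between yield and price unchanged, which is exactly the separation that equation (2) expresses.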

Appendix B. Copulas

Copulas cover a wide range of dependence structures and can fully describe the dependence structure of the data. The copula function is one of the most useful tools for dealing with multivariate distributions when the univariate margins are given or known. Given a joint distribution function H with continuous margins F and G, as in Sklar's theorem, it is easy to build the corresponding copula:

(B-1) $$C(u, v) = H\big(F^{-1}(u), G^{-1}(v)\big),$$

where $F^{-1}$ and $G^{-1}$ are the generalized inverses of F and G, that is, $F^{-1}(u) = \sup\{z : F(z) < u\}$ and $G^{-1}(v) = \sup\{z : G(z) < v\}$. If X and Y are continuous random variables with these distribution functions, then C is the joint distribution function of $U = F(X)$ and $V = G(Y)$, which, by the probability integral transform, are uniformly distributed on (0, 1). When the derivative exists, the density of the bivariate copula is

$$c(u, v) = \frac{\partial^2 C(u, v)}{\partial u \, \partial v}.$$

From Sklar's theorem, the joint density of X and Y is $h(x, y) = c\big(F(x), G(y)\big) f(x) g(y)$, where f and g are the probability density functions of F and G, respectively. Therefore, any joint distribution function that meets the requirements of the theorem has a copula representation. There are several families of copulas, such as the normal (Gaussian) copula, the t copula and Archimedean copulas. Most depend on one or more parameters, denoted δ, which characterize the dependence between the variables (Cherubini, Luciano, & Vecchiato, 2004). For example:

Gaussian copula: the copula of a bivariate normal distribution with correlation parameter δ,

$$C_N(u, v; \delta) = \Phi_\delta\big(\Phi^{-1}(u), \Phi^{-1}(v)\big) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1-\delta^2}} \exp\left(-\frac{x^2 - 2\delta x y + y^2}{2(1-\delta^2)}\right) dy \, dx,$$

where $\Phi_\delta$ is the standard bivariate normal distribution function with correlation coefficient δ and $\Phi^{-1}$ is the inverse of the univariate standard normal distribution function.

t copula:

$$C_{\nu,\Sigma}(u, v) = t_{\nu,0,\Sigma}\big(t_\nu^{-1}(u), t_\nu^{-1}(v)\big),$$

where $t_{\nu,0,\Sigma}$ is the bivariate t distribution function with mean 0 and correlation matrix Σ, and $t_\nu$ is the univariate t distribution function, both with ν degrees of freedom.

Archimedean copulas: these may be written as

$$C(u, v) = \phi^{-1}\big(\phi(u) + \phi(v)\big)$$

for a continuous, strictly decreasing, convex function $\phi : [0, 1] \to \mathbb{R}^+$ with $\phi(1) = 0$. The function φ is called the generator of the copula C(u, v). Archimedean copulas are widely used in financial studies because they encompass multiple dependence structures (symmetric, and asymmetric with tail dependence). The most used are the Gumbel, Clayton and Frank copulas.

According to Morettin (2008), parametric, nonparametric and semi-parametric estimators can be used to estimate the copula C. In the first case, maximum likelihood estimators are used. In the second case, empirical copulas (based on ranks) and smoothed estimators (via kernels or wavelets) are used. In the third case, pseudo-maximum likelihood estimators are used.

B.1 Parametric Estimators of the Likelihood Function

Given a sample $(X_i, Y_i)$, $i = 1, \ldots, n$, from a bivariate distribution H with margins F and G, the joint density is

(B-2) $$h(x_i, y_i; \eta) = c\big(F(x_i; \alpha_1), G(y_i; \alpha_2); \theta\big) \, f(x_i; \alpha_1) \, g(y_i; \alpha_2),$$

where $\alpha_1$ contains the parameters of F, $\alpha_2$ the parameters of G, and θ the parameters of c. With $\eta = (\alpha_1, \alpha_2, \theta)$, the log-likelihood is

(B-3) $$\ell(\eta) = \sum_{i=1}^{n} \ln c\big(F(x_i; \alpha_1), G(y_i; \alpha_2); \theta\big) + \sum_{i=1}^{n} \ln f(x_i; \alpha_1) + \sum_{i=1}^{n} \ln g(y_i; \alpha_2).$$

The maximum likelihood estimators (MLE) are obtained by maximizing this function.
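The Archimedean construction can be checked numerically. The sketch below uses the Frank copula, chosen by us because its parameter θ may be negative, which gives the kind of negative dependence expected between yield and price. The generator $\phi(t) = -\log[(e^{-\theta t} - 1)/(e^{-\theta} - 1)]$, its inverse and the closed-form density are standard results supplied here, not formulas from this paper; `frank_loglik` is the copula term of (B-3) and a hypothetical helper name.

```python
import numpy as np


def frank_generator(t, theta):
    """phi(t) = -log[(exp(-theta*t) - 1) / (exp(-theta) - 1)], theta != 0."""
    return -np.log(np.expm1(-theta * t) / np.expm1(-theta))


def frank_generator_inv(s, theta):
    """phi^{-1}(s) = -(1/theta) * log[1 + exp(-s) * (exp(-theta) - 1)]."""
    return -np.log1p(np.exp(-s) * np.expm1(-theta)) / theta


def frank_cdf(u, v, theta):
    """Archimedean construction C(u, v) = phi^{-1}(phi(u) + phi(v))."""
    return frank_generator_inv(frank_generator(u, theta) + frank_generator(v, theta), theta)


def frank_density(u, v, theta):
    """Closed-form copula density c(u, v) = d^2 C(u, v) / du dv."""
    a = np.expm1(-theta * u)
    b = np.expm1(-theta * v)
    d = np.expm1(-theta)
    return -theta * d * np.exp(-theta * (u + v)) / (d + a * b) ** 2


def frank_loglik(theta, u, v):
    """Copula term of the log-likelihood: sum_i log c(u_i, v_i; theta)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sum(np.log(frank_density(u, v, theta)))


theta = -3.0  # hypothetical negative value, giving negative dependence
print(frank_cdf(0.3, 1.0, theta))  # boundary condition C(u, 1) = u: should be 0.3 up to rounding
print(frank_density(0.1, 0.1, theta), frank_density(0.1, 0.9, theta))
```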
In general, maximizing the full log-likelihood (B-3) can be difficult when there are many parameters, that is, in higher-dimensional cases (for example, more than five variables) and when the margins are complex. The literature therefore recommends a two-stage procedure, called inference functions for margins (IFM), in which estimation is split into two steps:

• Step 1: estimate the parameters of the margins, $\hat{\alpha}_1 = \arg\max_{\alpha_1} \sum_{i=1}^{n} \ln f(x_i; \alpha_1)$ and $\hat{\alpha}_2 = \arg\max_{\alpha_2} \sum_{i=1}^{n} \ln g(y_i; \alpha_2)$.
• Step 2: estimate the copula parameters, $\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \ln c\big(F(x_i; \hat{\alpha}_1), G(y_i; \hat{\alpha}_2); \theta\big)$.

According to Joe and Xu (1996), this procedure yields consistent and asymptotically normal estimators.

B.2 Pseudo Maximum Likelihood Estimators - Semi-Parametric Method

In this case, F and G are estimated with nonparametric models, namely the empirical distribution function (e.d.f.) or a combination of the e.d.f. with extreme-value distributions fitted to the tails. The pseudo maximum likelihood estimators proposed by Genest, Ghoudi, and Rivest (1995) are obtained as follows:

• Obtain the pseudo-sample for the copula, $(\hat{u}_i, \hat{v}_i) = \big(\hat{F}(x_i), \hat{G}(y_i)\big)$, $i = 1, \ldots, n$, where $\hat{F}(x) = n^{-1}\left[\sum_{j=1}^{n} I(x_j \le x) - 0.5\right]$, and analogously for $\hat{G}$.
• Compute the log-likelihood and maximize it with respect to θ by numerical methods:

(B-4) $$\ell(\theta; \hat{u}, \hat{v}) = \sum_{i=1}^{n} \ln c(\hat{u}_i, \hat{v}_i; \theta).$$

B.3 Goodness-of-fit Tests

Goodness-of-fit procedures are used to identify the parametric copula that best fits the data. The procedure described in this subsection is based on Berg (2009), Genest et al. (2009), and Kojadinovic, Yan, and Holmes (2011). Given a random vector $X = (X_1, \ldots, X_d)$, its joint distribution is assumed to be $H(x) = C\big(F_1(x_1), \ldots, F_d(x_d)\big)$, $x \in \mathbb{R}^d$, where C is an unknown copula. For example, to test whether a set of observations was generated by a Gaussian copula, the hypotheses are

$$H_0: C \in \mathcal{C}_\theta \quad \text{versus} \quad H_a: C \notin \mathcal{C}_\theta,$$

where $\mathcal{C}_\theta$ is a parametric family of copulas indexed by θ, for example the Gaussian family. According to Joe (2014), goodness-of-fit procedures are based on the Rosenblatt transform, Cramér-von Mises or Kolmogorov-Smirnov statistics, and other empirical distribution functions, so as to measure an overall distance between the hypothesized model and the empirical distribution. These procedures may not be sensitive to tail behavior. Among the various tests of this type of hypothesis, the Cramér-von Mises test stands out as the most robust and the best supported by evidence of effectiveness (Berg, 2009; Genest et al., 2009; Kojadinovic et al., 2011). It is based on the empirical process

$$\mathbb{C}_n(u) = \sqrt{n}\,\big(C_n(u) - C_{\theta_n}(u)\big), \quad u \in [0, 1]^d,$$

where $C_n$ is the empirical copula of the sample and $C_{\theta_n}$ is the copula estimated under $H_0$. The Cramér-von Mises statistic is

$$S_n = \int_{[0,1]^d} \mathbb{C}_n(u)^2 \, dC_n(u) = \sum_{i=1}^{n} \big(C_n(\hat{U}_i) - C_{\theta_n}(\hat{U}_i)\big)^2,$$

where $\hat{U}_i$ are the pseudo-observations. An approximate p-value for $S_n$ can be obtained with a parametric bootstrap. However, this approach is computationally very expensive, because each iteration requires generating random numbers from the copula and re-estimating its parameters, so it becomes prohibitive as the sample size increases. An approach based on the Central Limit Theorem has been proposed to reduce this computational cost.

B.4 Vuong's procedure for comparison of copula parametric models

To compare the parametric copula models fitted to the data, the procedure proposed by Vuong (1989) is described.
This procedure uses the Kullback-Leibler Information Criterion to measure the closeness of a model to the truth, and statistics based on the likelihood ratio are used to test the null hypothesis that the competing models are equally close to the true data-generating process against the alternative that one model is closer. The tests are directional and are derived for the cases in which the competing models are non-nested, overlapping or nested, and in which both, one or neither of them is misspecified. As a prerequisite, the asymptotic distribution of the likelihood ratio statistic is fully characterized under general conditions: it is a weighted sum of chi-square variables or a normal distribution, depending on whether the distributions in the competing models closest to the truth are identical. A test for this last condition is also proposed. According to Joe (2014), Vuong's procedure can be described as the sample version of the Kullback-Leibler divergence calculation, together with the sample size needed to differentiate two models, which may be nested. Consider two copula densities $f_1 = c_1(u, v; \theta_1)$ and $f_2 = c_2(u, v; \theta_2)$ for two bivariate copulas $C_1(\Theta_1)$ and $C_2(\Theta_2)$, with parameters $\Theta_1$ and $\Theta_2$, respectively. For a bivariate sample $(u_i, v_i)$ of size n, the difference between the Kullback-Leibler divergences of the two copulas from the true copula density can be estimated by

$$\hat{D}_{12} = n^{-1} \sum_{i=1}^{n} D_i = n^{-1} \, \mathrm{LLR},$$

where LLR is the log-likelihood ratio and $D_i = \log\big[f_2(u_i, v_i; \hat{\theta}_2) / f_1(u_i, v_i; \hat{\theta}_1)\big]$. For nested or non-nested models, an asymptotic 95% confidence interval for $D_{12}$ is

(B-5) $$\hat{D}_{12} \pm 1.96 \, n^{-1/2} \sqrt{\hat{\sigma}^2_{12}},$$

where $\hat{\sigma}^2_{12} = (n-1)^{-1} \sum_{i=1}^{n} (D_i - \hat{D}_{12})^2$. There are also versions with the LLR adjusted by the Akaike information criterion (AIC) or by the Bayesian information criterion (BIC), respectively (Vuong, 1989):

(B-6a) $$\hat{D}_{12} - n^{-1}\big[\dim(\Theta_2) - \dim(\Theta_1)\big] \pm 1.96 \, n^{-1/2} \sqrt{\hat{\sigma}^2_{12}},$$

(B-6b) $$\hat{D}_{12} - \tfrac{1}{2} \, n^{-1} \log(n) \big[\dim(\Theta_2) - \dim(\Theta_1)\big] \pm 1.96 \, n^{-1/2} \sqrt{\hat{\sigma}^2_{12}}.$$

If the interval in equation (B-5), (B-6a) or (B-6b) contains 0, models M1 and M2 (copulas $C_1$ and $C_2$) are not considered significantly different. If it does not contain 0, model M1 or M2 is the better fit depending on whether the interval lies entirely below or above 0, respectively.

Appendix B presents the parametric and semi-parametric estimators for copulas. In addition, the goodness-of-fit procedures proposed by Genest, Rémillard, and Beaudoin (2009) were used to identify the parametric copula that best fits the data. Finally, to compare the parametric copula models fitted to the data, the procedure proposed by Vuong (1989) was used.
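Vuong's intervals reduce to simple statistics of the per-observation log-density differences. The sketch below assumes the two copulas have already been fitted and their log-densities evaluated at the same pseudo-observations; the function names are hypothetical, `k1` and `k2` stand for dim(Θ1) and dim(Θ2), and the AIC/BIC shifts follow the corrections described in Appendix B.

```python
import numpy as np


def vuong_interval(logdens1, logdens2, k1, k2, correction=None, z=1.96):
    """Interval for D12 from per-observation log-densities of models M1 and M2.

    correction=None gives the plain interval; "aic" or "bic" subtracts the
    corresponding penalty for the difference in parameter counts k2 - k1.
    """
    l1 = np.asarray(logdens1, dtype=float)
    l2 = np.asarray(logdens2, dtype=float)
    n = len(l1)
    d = l2 - l1                          # D_i = log f2 - log f1
    d12 = d.mean()                       # D12_hat = LLR / n
    sd = d.std(ddof=1)                   # sqrt(sigma^2_12) with the (n - 1) divisor
    if correction == "aic":
        d12 -= (k2 - k1) / n
    elif correction == "bic":
        d12 -= 0.5 * np.log(n) * (k2 - k1) / n
    elif correction is not None:
        raise ValueError("correction must be None, 'aic' or 'bic'")
    half_width = z * sd / np.sqrt(n)
    return d12 - half_width, d12 + half_width


def vuong_decision(interval):
    """Interval below 0 favours M1, above 0 favours M2, otherwise no significant difference."""
    lower, upper = interval
    if lower > 0:
        return "M2"
    if upper < 0:
        return "M1"
    return "no significant difference"
```

In practice, `logdens1` and `logdens2` would be, for example, Gaussian and Frank copula log-densities evaluated at the same pseudo-observations.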

4. Results

The trend models described in Section 3 were fitted to the yield and price series to remove the trend over time. Table 1 presents the estimates of the linear model and the significance of its parameters for all series. At the 1% significance level, a first-order (linear) trend is present in all yield series and a second-order (quadratic) trend in the price series. Figure 1 shows the original and corrected yield series of the four municipalities and the price series.

Table 1
Results of fitting the linear trend model.

Figure 1
Original and corrected yield and price series.

Table 2 presents the p-values of the Ljung-Box test applied to the corrected series and to their squares, to check for temporal dependence and heterogeneous variances. At the 1% significance level, the series show no evidence of serial dependence or of heterogeneous variances over time. The parametric copula models were estimated in R, version 3.0.3, with the copula package (Yan, 2007).
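The Ljung-Box check can be sketched as follows. This is an illustrative re-implementation, not the R code used in the paper; the lag choice (10) and the use of squared deviations from the mean for the variance check are assumptions, and the series are synthetic.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(x, lags):
    """Ljung-Box Q statistic and p-value for autocorrelation up to `lags`.

    Q = n(n + 2) * sum_{k=1}^{h} r_k^2 / (n - k), referred to a chi-square
    distribution with h degrees of freedom (no fitted ARMA terms are removed).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    denom = np.sum(d**2)
    k = np.arange(1, lags + 1)
    r = np.array([np.sum(d[j:] * d[:-j]) / denom for j in k])  # r_1, ..., r_h
    q = n * (n + 2) * np.sum(r**2 / (n - k))
    return q, chi2.sf(q, df=lags)


def check_series(x, lags=10):
    """p-values for the series itself and for its squared deviations from the mean."""
    x = np.asarray(x, dtype=float)
    _, p_level = ljung_box(x, lags)
    _, p_sq = ljung_box((x - x.mean()) ** 2, lags)  # checks for changing variance
    return p_level, p_sq


rng = np.random.default_rng(0)
noise = rng.normal(size=200)
ar1 = np.empty(200)
ar1[0] = noise[0]
for t in range(1, 200):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]  # strongly autocorrelated AR(1)

print("white noise:", check_series(noise))
print("AR(1):      ", check_series(ar1))
```

A p-value above the chosen level (1% in the text) means the hypothesis of no autocorrelation is not rejected.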

Table 2
Results of the Ljung-Box test for the yield and price series and for their squares, used to verify the independence of observations over time and the homogeneity of variances.

4.1 Semi-Parametric Inference

To estimate the copulas under the semi-parametric approach, the empirical distribution functions of yield and price were used as margins, and the parametric copulas were then estimated. Table 3 lists the estimates of the copula dependence parameters, with standard errors in parentheses. For all copulas, the dependence parameter 𝛿 is negative, indicating negative dependence between price and yield. This is expected from supply and demand: when yield increases and the supply of soybean rises, product prices generally fall, and vice versa.
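The semi-parametric (pseudo maximum likelihood) step can be sketched for the Gaussian copula as below. The rescaled ranks R_i/(n + 1) are a common choice of pseudo-observations and may differ from the exact scaling used in the paper; the function names and the simulated yield and price data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata


def pseudo_observations(x):
    """Rescaled ranks R_i / (n + 1): empirical margins kept strictly inside (0, 1)."""
    x = np.asarray(x, dtype=float)
    return rankdata(x) / (x.size + 1)


def gaussian_copula_loglik(delta, u, v):
    """Sum of log Gaussian-copula densities at (u_i, v_i) for correlation delta."""
    x, y = norm.ppf(u), norm.ppf(v)
    one_m = 1.0 - delta**2
    return np.sum(-0.5 * np.log(one_m)
                  - (delta**2 * (x**2 + y**2) - 2 * delta * x * y) / (2 * one_m))


def fit_gaussian_copula(x, y):
    """Pseudo maximum likelihood estimate of delta and the maximized log-likelihood."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    res = minimize_scalar(lambda d: -gaussian_copula_loglik(d, u, v),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x, -res.fun


rng = np.random.default_rng(7)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.5], [-0.5, 1.0]], size=1000)
yield_sim = 3.0 + 0.4 * z[:, 0]        # hypothetical yield
price_sim = np.exp(3.5 + 0.2 * z[:, 1])  # hypothetical skewed price; ranks are unaffected
delta_hat, loglik = fit_gaussian_copula(yield_sim, price_sim)
print(f"delta = {delta_hat:.3f}, log-likelihood = {loglik:.2f}")
```

Since only ranks enter the copula likelihood, the estimate of 𝛿 does not depend on any parametric assumption about the margins.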

Table 3
Estimates for copulas parameters (standard error in parentheses).

Table 4 presents the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the log-likelihood value (LLV), the Cramér-von Mises statistic (Sn) and the p-value of the goodness-of-fit test described in Appendix B. For all municipalities and all fitted copulas, the test did not reject any copula at the 0.01 significance level. Therefore, the copula with the lowest AIC and BIC values was chosen as the one that best represents the dependence between the variables.
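The AIC/BIC selection can be sketched for two of the candidate families, the Gaussian and the Ali-Mikhail-Haq (AMH) copulas, fitted by pseudo maximum likelihood on rank-based margins. This covers only two families and uses synthetic data; the `FAMILIES` table and `select_copula` helper are hypothetical names, not the copula package's API.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata


def gaussian_logdensity(u, v, delta):
    x, y = norm.ppf(u), norm.ppf(v)
    one_m = 1.0 - delta**2
    return -0.5 * np.log(one_m) - (delta**2 * (x**2 + y**2) - 2 * delta * x * y) / (2 * one_m)


def amh_logdensity(u, v, theta):
    """Ali-Mikhail-Haq copula log-density, theta in [-1, 1)."""
    a, b = 1.0 - u, 1.0 - v
    num = 1.0 + theta * ((1.0 + u) * (1.0 + v) - 3.0) + theta**2 * a * b
    return np.log(num) - 3.0 * np.log(1.0 - theta * a * b)


FAMILIES = {  # name -> (log-density, parameter bounds); one parameter each
    "Gaussian": (gaussian_logdensity, (-0.99, 0.99)),
    "AMH": (amh_logdensity, (-0.99, 0.99)),
}


def select_copula(x, y):
    """Fit each family by pseudo ML and return the lowest-AIC family and all criteria."""
    n = len(x)
    u, v = rankdata(x) / (n + 1), rankdata(y) / (n + 1)
    results = {}
    for name, (logdens, bounds) in FAMILIES.items():
        res = minimize_scalar(lambda p: -np.sum(logdens(u, v, p)),
                              bounds=bounds, method="bounded")
        loglik, k = -res.fun, 1
        results[name] = {"param": res.x, "AIC": 2 * k - 2 * loglik,
                         "BIC": k * np.log(n) - 2 * loglik}
    best = min(results, key=lambda m: results[m]["AIC"])
    return best, results


rng = np.random.default_rng(3)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.3], [-0.3, 1.0]], size=500)
best, table = select_copula(z[:, 0], z[:, 1])
for name, r in table.items():
    print(f"{name:8s} param={r['param']:.3f} AIC={r['AIC']:.2f} BIC={r['BIC']:.2f}")
print("selected:", best)
```

With one parameter per family, AIC and BIC differ by the same constant for every model, so they always rank these two copulas identically.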

Table 4
AIC and BIC Selection Criteria for parametric copulas with empirical margins (EM).

For Toledo × price and Cascavel × price, the AMH copula was chosen by these criteria, whereas Castro × price and Guarapuava × price are best fitted by the Gaussian copula.

The next subsection presents copula estimation under the parametric approach, in which a parametric distribution is specified separately for price and for yield, and the copula is then fitted to the variables simulated from these distributions.
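For comparison, the standard two-stage inference functions for margins (IFM) estimator described in Appendix B can be sketched as below. It plugs the fitted marginal CDFs directly into the copula likelihood rather than simulating from them, and it uses normal and lognormal margins only as stand-ins for the OLLN and Skew-t models, which are not available in SciPy. The helper names and data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import lognorm, norm


def gaussian_copula_loglik(delta, u, v):
    """Sum of log Gaussian-copula densities at (u_i, v_i)."""
    x, y = norm.ppf(u), norm.ppf(v)
    one_m = 1.0 - delta**2
    return np.sum(-0.5 * np.log(one_m)
                  - (delta**2 * (x**2 + y**2) - 2 * delta * x * y) / (2 * one_m))


def ifm_fit(yield_obs, price_obs):
    """Two-stage IFM: (1) ML fit of each margin, (2) ML fit of the copula on F(x; alpha_hat)."""
    # Step 1: margins estimated separately (stand-ins for the paper's OLLN and Skew-t)
    mu, sd = norm.fit(yield_obs)
    s, loc, scale = lognorm.fit(price_obs, floc=0)
    u = norm.cdf(yield_obs, loc=mu, scale=sd)
    v = lognorm.cdf(price_obs, s, loc=loc, scale=scale)
    # Step 2: copula parameter with the margin parameters held fixed
    res = minimize_scalar(lambda d: -gaussian_copula_loglik(d, u, v),
                          bounds=(-0.99, 0.99), method="bounded")
    return {"yield": (mu, sd), "price": (s, scale), "delta": res.x}


rng = np.random.default_rng(11)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.5], [-0.5, 1.0]], size=800)
fit = ifm_fit(3.0 + 0.4 * z[:, 0], np.exp(3.5 + 0.2 * z[:, 1]))
print(fit)
```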

4.2 Parametric Inference

Figure 2 shows the fitted price and yield densities for Toledo. The Normal, Skew-Normal (SN) and OLLN distributions were fitted to yield, and the LogNormal and Skew-t distributions to price. Table 5 presents the fitting statistics used to choose the yield and price models for Toledo. For yield, the OLLN distribution was chosen because it has the lowest values of the A* and W* statistics; by the same criterion, the Skew-t distribution was chosen for the price series. The same analysis was performed for Guarapuava, Cascavel and Castro, and the OLLN model also gave the best fit to yield in all of these municipalities (Duarte et al., 2017).
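The A* and W* statistics in Table 5 are modified versions of the Anderson-Darling and Cramér-von Mises statistics. The sketch below computes only the unmodified A² and W² of a sample against a fitted CDF; smaller values indicate a closer fit. Normal and skew-normal candidates are used as stand-ins because OLLN is not available in SciPy, and the data are hypothetical.

```python
import numpy as np
from scipy.stats import norm, skewnorm


def w2_a2(x, cdf):
    """Cramér-von Mises W^2 and Anderson-Darling A^2 of a sample against a fitted CDF."""
    z = np.sort(cdf(np.asarray(x, dtype=float)))
    z = np.clip(z, 1e-12, 1.0 - 1e-12)  # keep the logarithms finite
    n = z.size
    i = np.arange(1, n + 1)
    w2 = np.sum((z - (2 * i - 1) / (2 * n)) ** 2) + 1.0 / (12 * n)
    a2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1.0 - z[::-1])))
    return w2, a2


rng = np.random.default_rng(5)
data = skewnorm.rvs(4.0, loc=2.5, scale=0.6, size=300, random_state=rng)  # hypothetical yields

candidates = {
    "Normal": norm(*norm.fit(data)),
    "SkewNormal": skewnorm(*skewnorm.fit(data)),
}
for name, dist in candidates.items():
    w2, a2 = w2_a2(data, dist.cdf)
    print(f"{name:10s} W2={w2:.4f} A2={a2:.4f}")  # smaller is better
```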

Figure 2
Fitted distributions for the corrected Toledo yield series and the SEAB price series.

Table 5
Statistics and information criteria AICc, BIC, A* and W* for the univariate series.

The next step is to select the copula that best represents the dependence structure of the data, using the AIC and BIC information criteria.

Table 6 presents the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Cramér-von Mises statistic (Sn) and the p-value of the goodness-of-fit test described in Appendix B. As in the semi-parametric case, for all municipalities and all fitted copulas, the test did not reject any copula at the 0.01 significance level.
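The Cramér-von Mises statistic Sn from Appendix B, the sum over pseudo-observations of the squared distance between the empirical copula and the fitted copula, can be sketched for the AMH copula, whose CDF has a closed form. Only the statistic is computed; its p-value requires the parametric bootstrap or the faster approximation discussed in Appendix B, which is not shown. Function names and data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata


def empirical_copula(u, v):
    """C_n at each pseudo-observation: share of points with U_j <= U_i and V_j <= V_i."""
    return np.mean((u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None]), axis=1)


def amh_cdf(u, v, theta):
    """Ali-Mikhail-Haq copula C(u, v) = uv / [1 - theta (1 - u)(1 - v)]."""
    return u * v / (1.0 - theta * (1.0 - u) * (1.0 - v))


def amh_logdensity(u, v, theta):
    a, b = 1.0 - u, 1.0 - v
    num = 1.0 + theta * ((1.0 + u) * (1.0 + v) - 3.0) + theta**2 * a * b
    return np.log(num) - 3.0 * np.log(1.0 - theta * a * b)


def cvm_sn_amh(x, y):
    """Fit the AMH copula by pseudo ML and return (theta_hat, S_n)."""
    n = len(x)
    u, v = rankdata(x) / (n + 1), rankdata(y) / (n + 1)
    res = minimize_scalar(lambda t: -np.sum(amh_logdensity(u, v, t)),
                          bounds=(-0.99, 0.99), method="bounded")
    sn = np.sum((empirical_copula(u, v) - amh_cdf(u, v, res.x)) ** 2)
    return res.x, sn


rng = np.random.default_rng(9)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.2], [-0.2, 1.0]], size=300)
theta_hat, sn = cvm_sn_amh(z[:, 0], z[:, 1])
print(f"AMH theta = {theta_hat:.3f}, S_n = {sn:.4f}")
```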

Table 6
Selection Criteria AIC, BIC, Sn statistic for parametric copulas with parametric margins (PM).

Therefore, the copula with the lowest AIC and BIC values was chosen as the one that best represents the dependence between the variables. For yield in Toledo, Cascavel and Castro with price, the AMH copula best characterizes the dependence structure and is therefore the preferred model for calculating revenue premium rates in these municipalities. For Guarapuava yield and price, the Gaussian copula best represents the dependence structure.

4.3 Comparison between models

Table 7 presents the models chosen by both inference procedures. For Toledo, Cascavel and Guarapuava, both procedures selected the same model: the AMH copula, the AMH copula and the Gaussian copula, respectively.

Table 7
Models selected by semi-parametric (SPI) and parametric (PI) inference, with dependency parameter (δ).

In these cases, where both procedures selected the same copula, the procedure proposed by Vuong (1989) could not be applied; instead, the models with the lowest AIC and BIC values in Tables 4 and 6 were chosen. Therefore, for Cascavel and Toledo, the AMH copula estimated by parametric inference, that is, with parametric margins, was chosen. For Guarapuava, the Gaussian copula estimated by semi-parametric inference, that is, with empirical margins, was selected.

Table 7 also shows that, for Castro, the two inference procedures (SPI and PI) selected different models, so Vuong's procedure was applied. The 95% confidence interval for 𝐷12 given by equation (B-5) is (-0.5352385; 0.8787669), with 𝐷̂12 = 0.1717642. Because the interval contains zero, the two models are not significantly different.
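The interval in equations (B-5), (B-6a) and (B-6b) can be computed from the pointwise log-densities of the two fitted models, as in the sketch below. The log-densities in the usage example are synthetic, not the Castro data, and `vuong_interval` and `verdict` are hypothetical helper names.

```python
import numpy as np


def vuong_interval(loglik1, loglik2, k1=1, k2=1, adjust=None, z=1.96):
    """Vuong (1989) interval for D12 = E[log f2 - log f1], equations (B-5), (B-6a), (B-6b).

    loglik1, loglik2: pointwise log-densities of models M1 and M2 at each (u_i, v_i).
    adjust: None for (B-5), "aic" for (B-6a), "bic" for (B-6b); k1, k2 = parameter counts.
    """
    d = np.asarray(loglik2, dtype=float) - np.asarray(loglik1, dtype=float)  # D_i
    n = d.size
    d_hat = d.mean()
    sigma = d.std(ddof=1)  # square root of (n - 1)^{-1} sum (D_i - D_hat)^2
    if adjust == "aic":
        d_hat -= (k2 - k1) / n
    elif adjust == "bic":
        d_hat -= 0.5 * np.log(n) * (k2 - k1) / n
    half = z * sigma / np.sqrt(n)
    return d_hat, (d_hat - half, d_hat + half)


def verdict(interval):
    """Decision rule from the text: 0 inside -> no difference; above 0 -> M2; below 0 -> M1."""
    lo, hi = interval
    if lo <= 0.0 <= hi:
        return "not significantly different"
    return "M2 preferred" if lo > 0.0 else "M1 preferred"


rng = np.random.default_rng(2)
ll_m1 = rng.normal(0.10, 0.5, size=40)            # hypothetical log-densities, model M1
ll_m2 = ll_m1 + rng.normal(0.02, 0.8, size=40)    # hypothetical log-densities, model M2
d_hat, ci = vuong_interval(ll_m1, ll_m2)
print(d_hat, ci, verdict(ci))
```

When both copulas have a single parameter, as the Gaussian and AMH copulas do, the corrections in (B-6a) and (B-6b) vanish and all three intervals coincide.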

On the other hand, when the goodness-of-fit test described in Appendix B, section B.3, is applied, Table 8 shows that the model fitted by semi-parametric inference (SPI) has the lowest Cramér-von Mises statistic (Sn). Furthermore, at the 0.05 significance level, the null hypothesis is not rejected for the semi-parametric model. Therefore, the Gaussian copula model with empirical margins fits the data better.

Table 8
Goodness-of-fit test results for Castro × price under SPI and PI.

Table 9 summarizes the selected models for the different municipalities.

Table 9
Summary of the models selected for the different municipalities by SPI and PI.

The next step is to calculate the revenue insurance premium rates using the selected models and compare the results with the bivariate normal distribution, which is widely used by the insurance market.

5. Revenue Insurance Premium Rate

At the beginning of the insurance contract, the producer chooses the coverage level (CL), assumed here to range from 60% to 80%, which defines the guaranteed revenue Fg = XgYg. If, at harvest, the realized revenue XY falls below Fg, because the yield X obtained is lower than the insured yield Xg, the price Y obtained is lower than the insured price Yg, or both, the insured receives an indemnity.

Table 10 presents the pure rates (PR) calculated according to equation (1), using the selected copulas and the bivariate normal distribution, for all municipalities. For Castro and Cascavel, the bivariate normal distribution underestimates the revenue insurance rate relative to the copulas at all coverage levels. For Guarapuava, the bivariate normal rate is also underestimated relative to the copula for coverage levels up to 75%. This underestimation of the premium rate may lead to serious losses for insurers, since it assumes a lower risk than should be taken into account.
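A Monte Carlo sketch of how a pure rate can be obtained from a copula model, assuming that equation (1) takes the usual revenue insurance form PR = E[max(0, Fg − XY)]/Fg and that Fg = CL × E[X] × E[Y]. Equation (1) itself lies outside this section and may differ. The Gaussian copula, the normal yield and lognormal price margins and all parameter values are hypothetical; replacing the lognormal price margin by a normal one with the same mean and standard deviation gives the bivariate normal benchmark.

```python
import numpy as np
from scipy.stats import lognorm, norm


def simulate_gaussian_copula(delta, n, rng):
    """n draws (U, V) from a bivariate Gaussian copula with correlation delta."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, delta], [delta, 1.0]], size=n)
    return norm.cdf(z[:, 0]), norm.cdf(z[:, 1])


def pure_rate(x, y, guarantee):
    """Expected indemnity E[max(0, Fg - X*Y)] divided by the guaranteed revenue Fg."""
    return np.mean(np.maximum(guarantee - x * y, 0.0)) / guarantee


rng = np.random.default_rng(1)
u, v = simulate_gaussian_copula(-0.4, 200_000, rng)
# Hypothetical margins: yield ~ Normal(3.0, 0.5) t/ha, price ~ LogNormal(s=0.25, scale=60)
x = np.maximum(norm.ppf(u, loc=3.0, scale=0.5), 0.0)  # yield cannot be negative
y = lognorm.ppf(v, 0.25, scale=60.0)
mean_y, sd_y = lognorm.mean(0.25, scale=60.0), lognorm.std(0.25, scale=60.0)
# Bivariate normal benchmark: same dependence, normal price margin with matched mean and sd
y_bn = np.maximum(norm.ppf(v, loc=mean_y, scale=sd_y), 0.0)  # price cannot be negative
expected_revenue = 3.0 * mean_y  # E[X] * E[Y], the assumed basis for Fg

levels = (0.60, 0.65, 0.70, 0.75, 0.80)
rates_copula = {cl: pure_rate(x, y, cl * expected_revenue) for cl in levels}
rates_bn = {cl: pure_rate(x, y_bn, cl * expected_revenue) for cl in levels}
for cl in levels:
    print(f"CL {cl:.0%}: copula {rates_copula[cl]:.4%}  bivariate normal {rates_bn[cl]:.4%}")
```

Reusing the same simulated draws across coverage levels keeps the comparison between levels and between models free of extra Monte Carlo noise.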

Table 10
Pure Rates for Revenue Insurance Premium using Selected Copula and Normal Distribution for all municipalities.

For Toledo, compared with the AMH copula, the bivariate normal distribution underestimates the revenue premium rate for coverage levels up to 65% and overestimates it from 70% upward. For Guarapuava and Toledo at the 80% coverage level, the bivariate normal distribution overestimates the premium rate relative to the copula. Overestimating the rate may hinder the widespread use of crop insurance in Brazil and attract producers with higher risk profiles, worsening adverse selection.

According to Brisolara (2013), the pure premium rate represents the intrinsic business risk, without any loading (additional costs). To compare it with the rates offered by Brazilian insurance companies, it must be loaded with average market parameters for the technical margin (20%), administrative expenses (20%), commercial expenses (10%) and the insurer's profit margin (10%). The commercial rate (CR) is then calculated as follows:

$$\mathrm{CR} = \mathrm{PR} \times \frac{1.2}{1 - (0.1 + 0.2 + 0.1)}.$$
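A small helper for this loading formula, assuming the factor 1.2 is (1 + technical margin) and the three terms in the denominator are the commercial, administrative and profit loads listed above. With these loads the factor is 1.2/0.6 = 2, so, for example, a 4.25% pure rate becomes an 8.5% commercial rate. The function name and keyword arguments are hypothetical.

```python
def commercial_rate(pure_rate, technical_margin=0.20, administrative=0.20,
                    commercial=0.10, profit=0.10):
    """Load a pure premium rate into a commercial rate.

    CR = PR * (1 + technical margin) / [1 - (commercial + administrative + profit)]
    """
    return pure_rate * (1.0 + technical_margin) / (1.0 - (commercial + administrative + profit))


print(commercial_rate(0.0425))  # about 0.085: the default loads double the pure rate
```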

Table 11 presents the average commercial revenue insurance premium rates obtained with the selected copula and the bivariate normal distribution, together with the rates of insurance company A, for coverage levels of 60-69% and 70-79%.

Table 11
Average Commercial Rates (%) of Revenue Insurance Premium using the selected copula, Normal Distribution and insurance company A.

The rates calculated in this study are higher than those applied in the insurance market. For example, for Toledo with CL of 60-69% and 70-79%, the rates of insurance company A correspond to 50.64% and 29.84%, respectively, of the rates calculated with the AMH copula.

For Castro, the difference is even greater: the average commercial rates of company A correspond to 56.73% and 38.29%, respectively, of the rates calculated with the Gaussian copula.

Therefore, the insurance company underestimates the premium rate, which may result in large losses, because it assumes a lower risk than should be taken into account.

On the other hand, according to MAPA (2017), 93.30% of the revenue insurance policies for the soybean crop in Brazil are sold by insurance company A, which is almost the only provider of this type of insurance and offers coverage in all regions of Brazil, spreading its risks. This allows the company to lower the insurance rate in higher-risk regions, such as southern Brazil.

This gap between the rates applied in the market and those calculated in this study may be explained by the fact that this type of insurance is relatively recent in the Brazilian market. The lack of studies on which actuarial procedures should be used to price this type of risk may have led the insurer to adopt a subjective pricing procedure.

6. Conclusions

In this work, alternative approaches were proposed to calculate revenue insurance premium rates using parametric copulas with parametric marginal and empirical marginal distributions. These methods were applied to soybean yield data from the municipalities of Toledo, Cascavel, Guarapuava and Castro of Parana State and nominal monthly prices received by producers in Parana State. It is concluded that the parametric models that best fit the yield and price series of all municipalities were the OLLN and Skew-t models, respectively.

In addition, the copula that best represents the dependence structure between the variables for Cascavel and Toledo is the AMH copula. For Guarapuava and Castro, the Gaussian copula with empirical margins was selected.

For Castro and Cascavel, at all coverage levels, the revenue insurance rate calculated with the bivariate normal distribution is underestimated relative to the copulas. For Guarapuava, the bivariate normal rate is also underestimated relative to the copula for coverage levels up to 75%. For Toledo, relative to the AMH copula, the rate calculated with the normal distribution is underestimated for coverage levels up to 65% and overestimated from 70% upward.

Underestimating the premium rate could lead to serious losses for insurers, since it assumes a lower risk than should be taken into account. Overestimating the rate, in turn, may hinder the widespread adoption of this type of insurance across Brazil and attract producers with higher risk profiles, which aggravates adverse selection.

In addition, the average commercial rates calculated in this study were much higher than those applied by insurance company A. This gap may be explained by an overestimation of the additional costs, which are confidential and vary across insurance companies. One might suspect a commercial practice such as tie-in sales by the insurer, but this is not common practice. Another possible explanation is that insurance companies diversify their portfolios across different regions and products, which results in lower rates than those found in this study. In other words, when accounting for the technical margin, commercial and administrative expenses, and profit margin of the insurance company used in this work, we may have overstated the additional costs, leading to a higher premium rate. Furthermore, this type of insurance is relatively recent in Brazil, so there are very few actuarial studies and established procedures for its pricing, which leads insurance companies to adopt subjective and simpler pricing procedures.

Further studies should better reflect the risk and the insurance premium rate by investigating the three-dimensional dependence structure among yield, the prices of futures contracts traded at the Chicago Mercantile Exchange (CME), and exchange rates, since most of the Brazilian soybean harvest is exported and traded at the CME.


Appendix A. Trend in Yield and Prices Series

Between 1980 and 2015, a decreasing trend in prices and an increasing trend in yields are observed. The latter is due to major technological advances in crop production. Temporal dependence and non-constant variance over time are also expected. Thus, before fitting any probabilistic model to the series, statistical techniques must be used to make the data trend-free, independent and homoscedastic. To remove the trend, the procedure described in Gallagher (1987) was used. This procedure first estimates a quadratic trend model (linear in the parameters) relating yield to time, $y_t = \alpha + \beta T + \gamma T^2 + e_t$, where $e_t \sim \mathcal{N}(0, \sigma^2)$, $y_t$ is the yield or price vector, $T$ is the time vector, and $\alpha$, $\beta$ and $\gamma$ are the regression parameters. Using the regression residuals $\hat{e}_t$ and the fitted value for the last observation, $\hat{y}_{2015}$, the trend is removed according to $\tilde{y}_t = \hat{y}_{2015}\left(1 + \hat{e}_t/\hat{y}_t\right)$. Temporal dependence was checked with the test proposed by Ljung and Box (1978), whose null hypothesis is that the series is independent. In addition, homoscedasticity was checked by applying the Ljung and Box (1978) test to the squared series.
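The detrending and dependence checks above can be sketched in Python as follows. This is a minimal illustration, assuming NumPy and SciPy are available; the function names `detrend_gallagher` and `ljung_box` are hypothetical, the Ljung-Box statistic is computed by hand rather than with a specific library routine, and the yield series in the usage block is synthetic, not the data used in this study.

```python
import numpy as np
from scipy import stats


def detrend_gallagher(y, t):
    """Remove a quadratic time trend and rescale to the last period's trend level.

    Fits y_t = alpha + beta*T + gamma*T**2 + e_t by ordinary least squares and
    returns y_tilde_t = y_hat_last * (1 + e_hat_t / y_hat_t).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    X = np.column_stack([np.ones_like(t), t, t**2])  # design matrix [1, T, T^2]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS estimates of alpha, beta, gamma
    y_hat = X @ coef                                  # fitted trend
    resid = y - y_hat                                 # residuals e_hat_t
    return y_hat[-1] * (1.0 + resid / y_hat)          # normalized to the last year's level


def ljung_box(x, lags=5):
    """Ljung-Box Q statistic and chi-square p-value for H0: no autocorrelation up to `lags`."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.sum(xc**2)
    ks = np.arange(1, lags + 1)
    rho = np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in ks])  # sample autocorrelations
    q = n * (n + 2) * np.sum(rho**2 / (n - ks))
    return q, stats.chi2.sf(q, df=lags)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(36)  # 1980, ..., 2015 coded as 0, ..., 35
    y = 1500 + 40 * t + 0.5 * t**2 + rng.normal(0, 150, t.size)  # synthetic yields
    y_tilde = detrend_gallagher(y, t)
    print(ljung_box(y_tilde))                          # temporal dependence
    print(ljung_box((y_tilde - y_tilde.mean()) ** 2))  # squared (centred) series: constant variance
```

Applying the same test to the squared series is a common way to detect remaining volatility clustering; centring before squaring is a choice made in this sketch.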

Appendix B. Copulas

Copula theory is attractive because copulas cover a wide range of dependence structures and fully characterize the dependence structure of the data. The copula function is one of the most useful tools for building multivariate distributions from known univariate margins. Given a joint distribution function H with continuous margins F and G, Sklar's theorem makes it easy to build the corresponding copula:

$$C(u, v) = H\left(F^{-1}(u), G^{-1}(v)\right), \tag{B-1}$$

where $F^{-1}$ and $G^{-1}$ denote the generalized inverses of $F$ and $G$, that is, $F^{-1}(u) = \sup\{z : F(z) < u\}$ and $G^{-1}(v) = \sup\{z : G(z) < v\}$.

If X and Y are continuous random variables with the distribution functions defined above, then C is the joint distribution function of the random variables U = F(X) and V = G(Y), obtained by the probability integral transform, so that U and V are each uniformly distributed, 𝑈(0,1).
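A short sketch of the probability integral transform, assuming NumPy and SciPy: pairs are drawn from a Gaussian copula (chosen only because it is easy to simulate) and then given arbitrary margins through the inverse distribution functions, as in (B-1). The helper names and the margins in the usage block are hypothetical. Because increasing transforms do not change ranks, Spearman's rho is the same on (U, V) and on (X, Y), which illustrates that the copula alone carries the dependence.

```python
import numpy as np
from scipy import stats


def sample_gaussian_copula(n, delta, rng):
    """Draw n pairs (U, V) from a Gaussian copula with correlation delta.

    Correlated standard normals (Z1, Z2) are mapped to the unit square by the
    probability integral transform U = Phi(Z1), V = Phi(Z2).
    """
    cov = np.array([[1.0, delta], [delta, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return stats.norm.cdf(z)  # each column is Uniform(0, 1)


def attach_margins(uv, F, G):
    """Sklar in reverse: X = F^{-1}(U), Y = G^{-1}(V) for frozen scipy distributions F and G."""
    return F.ppf(uv[:, 0]), G.ppf(uv[:, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    uv = sample_gaussian_copula(2000, 0.4, rng)
    # hypothetical margins, chosen only for illustration
    x, y = attach_margins(uv, stats.lognorm(s=0.2, scale=3000), stats.norm(loc=50, scale=8))
    rho_xy, _ = stats.spearmanr(x, y)
    rho_uv, _ = stats.spearmanr(uv[:, 0], uv[:, 1])
    print(rho_xy, rho_uv)  # should coincide: ranks are unchanged by increasing transforms
```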

The bivariate copula density $c$ is defined by

$$c(u, v) = \frac{\partial^2 C(u, v)}{\partial u \, \partial v},$$

if the derivative exists. From Sklar's theorem, the joint density of X and Y is given by $f(x, y) = c(F(x), G(y))\, f(x)\, g(y)$, where $f$ and $g$ are the probability density functions of $F$ and $G$, respectively. Therefore, any joint distribution function that meets the requirements of the theorem has a copula representation. Examples of copulas include the normal (Gaussian) copula, the $t$ copula and the Archimedean copulas. Most depend on one or more parameters, denoted $\delta$, which characterize the dependence between the variables (Cherubini, Luciano, & Vecchiato, 2004). For example:

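Gaussian copula: the copula of a bivariate normal distribution with correlation parameter $\delta$, given by

$$C^{N}(u, v; \delta) = \Phi_\delta\left(\Phi^{-1}(u), \Phi^{-1}(v)\right) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1-\delta^2}} \exp\left\{-\frac{x^2 - 2xy\delta + y^2}{2(1-\delta^2)}\right\} dy \, dx,$$

where $\Phi_\delta$ is the standard bivariate normal distribution function with correlation coefficient $\delta$ and $\Phi^{-1}$ is the inverse of the standard univariate normal distribution function.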
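The following sketch, assuming SciPy's `multivariate_normal`, evaluates the Gaussian copula both numerically (as $\Phi_\delta$ at the normal quantiles) and through its closed-form density, and then builds a joint density via $f(x, y) = c(F(x), G(y)) f(x) g(y)$. Function names are illustrative only. With standard normal margins the result should reproduce the bivariate normal density, which makes a convenient sanity check.

```python
import numpy as np
from scipy import stats


def gaussian_copula_cdf(u, v, delta):
    """C^N(u, v; delta) = Phi_delta(Phi^{-1}(u), Phi^{-1}(v)), integrated numerically by SciPy."""
    mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, delta], [delta, 1.0]])
    return mvn.cdf([stats.norm.ppf(u), stats.norm.ppf(v)])


def gaussian_copula_density(u, v, delta):
    """Closed-form c(u, v; delta) = d^2 C^N / du dv."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    q = delta**2 * (x**2 + y**2) - 2.0 * delta * x * y
    return np.exp(-q / (2.0 * (1.0 - delta**2))) / np.sqrt(1.0 - delta**2)


def joint_density(x, y, delta, F, G):
    """Sklar's theorem: f(x, y) = c(F(x), G(y); delta) f(x) g(y) for frozen margins F and G."""
    return gaussian_copula_density(F.cdf(x), G.cdf(y), delta) * F.pdf(x) * G.pdf(y)


if __name__ == "__main__":
    print(gaussian_copula_cdf(0.5, 0.5, 0.5))
    print(joint_density(0.3, -1.1, 0.6, stats.norm(), stats.norm()))
```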

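t copula: $C_{\nu,\Sigma}(u, v) = t_{\nu,0,\Sigma}\left(t_\nu^{-1}(u), t_\nu^{-1}(v)\right)$, where $t_{\nu,0,\Sigma}$ is the bivariate t distribution function with mean 0, correlation matrix $\Sigma$ and $\nu$ degrees of freedom, and $t_\nu$ is the univariate t distribution function with $\nu$ degrees of freedom.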
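A t copula can be simulated from the stochastic representation of the bivariate t distribution, sketched below under the assumption that NumPy and SciPy are available; `sample_t_copula` is a hypothetical helper. The share of pairs in the joint upper tail printed by the usage block gives an informal look at the tail dependence that distinguishes the t copula from the Gaussian one.

```python
import numpy as np
from scipy import stats


def sample_t_copula(n, rho, nu, rng):
    """Draw n pairs (U, V) from a bivariate t copula with correlation rho and nu degrees of freedom.

    Uses T = Z / sqrt(W / nu), with Z ~ N(0, Sigma) and W ~ chi-square(nu) independent,
    followed by the probability integral transform U = t_nu(T1), V = t_nu(T2).
    """
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=sigma, size=n)
    w = rng.chisquare(nu, size=n)
    t = z / np.sqrt(w / nu)[:, None]  # bivariate t with mean 0 and matrix Sigma
    return stats.t.cdf(t, df=nu)      # univariate t_nu margins -> Uniform(0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    uv = sample_t_copula(5000, rho=0.5, nu=4, rng=rng)
    # share of pairs where both components exceed their 95% quantile (joint upper tail)
    print(np.mean((uv[:, 0] > 0.95) & (uv[:, 1] > 0.95)))
```

For elliptical copulas such as this one, Kendall's tau equals $(2/\pi)\arcsin(\rho)$, which gives a simple check on simulated samples.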

Archimedean copulas: can be written as $C(u, v) = \phi^{-1}\left(\phi(u) + \phi(v)\right)$ for a continuous, strictly decreasing and convex function $\phi: [0,1] \to \mathbb{R}^{+}$ such that $\phi(1) = 0$. The function $\phi(\cdot)$ is called the generator of the copula $C(u, v)$.
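To illustrate the generator construction, the sketch below builds $C(u, v) = \phi^{-1}(\phi(u) + \phi(v))$ from standard generators of the Clayton, Gumbel, Frank and Ali-Mikhail-Haq (AMH) families. The generator formulas are textbook forms rather than code from this study; the dictionary `GENERATORS` and the parameter ranges in the comments are illustrative. For Clayton and AMH the result should match their familiar closed forms.

```python
import numpy as np

# Generator phi(t; theta) and its inverse for one-parameter Archimedean families.
GENERATORS = {
    "clayton": (lambda t, th: (t ** -th - 1.0) / th,                      # theta > 0
                lambda s, th: (1.0 + th * s) ** (-1.0 / th)),
    "gumbel": (lambda t, th: (-np.log(t)) ** th,                          # theta >= 1
               lambda s, th: np.exp(-s ** (1.0 / th))),
    "frank": (lambda t, th: -np.log(np.expm1(-th * t) / np.expm1(-th)),   # theta != 0
              lambda s, th: -np.log1p(np.exp(-s) * np.expm1(-th)) / th),
    "amh": (lambda t, th: np.log((1.0 - th * (1.0 - t)) / t),            # -1 <= theta < 1
            lambda s, th: (1.0 - th) / (np.exp(s) - th)),
}


def archimedean_cdf(u, v, family, theta):
    """C(u, v) = phi^{-1}(phi(u) + phi(v)) for the chosen generator family."""
    phi, phi_inv = GENERATORS[family]
    return phi_inv(phi(u, theta) + phi(v, theta), theta)


if __name__ == "__main__":
    for fam, th in [("clayton", 2.0), ("gumbel", 1.5), ("frank", 3.0), ("amh", 0.5)]:
        print(fam, archimedean_cdf(0.3, 0.7, fam, th))
```

Because $\phi(1) = 0$ for every generator, each family satisfies the copula boundary condition $C(u, 1) = u$, which the code reproduces numerically.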

Archimedean copulas are widely used in financial studies, since they encompass multiple dependence structures (symmetric, and asymmetric with tail dependence). The most widely used are the Gumbel, Clayton and Frank copulas. According to Morettin (2008), parametric, nonparametric and semiparametric estimators can be used to estimate the copula 𝐶. In the first case, maximum likelihood estimators are used; in the second, empirical copulas (based on ranks) and smoothed estimators (via kernels or wavelets); and in the third, pseudo maximum likelihood estimators.

B.1 Parametric Estimators of the Likelihood Function

Given a sample $(X_i, Y_i)$, $i = 1, \ldots, n$, from a bivariate distribution $H$ with marginal distributions $F$ and $G$ as in equation (B-1), the joint density is given by

$$f(x_i, y_i; \eta) = c\left(F(x_i; \alpha_1), G(y_i; \alpha_2); \theta\right) f(x_i; \alpha_1)\, g(y_i; \alpha_2), \tag{B-2}$$

where $\alpha_1$ contains the parameters of $F$, $\alpha_2$ those of $G$, and $\theta$ those of $c$. With $\eta = (\alpha_1, \alpha_2, \theta)$, the log-likelihood is given by

$$\ell(x, y; \eta) = \sum_{i=1}^{n} \ln c\left(F(x_i; \alpha_1), G(y_i; \alpha_2); \theta\right) + \sum_{i=1}^{n} \ln f(x_i; \alpha_1) + \sum_{i=1}^{n} \ln g(y_i; \alpha_2). \tag{B-3}$$

The maximum likelihood estimators (MLE) are obtained by maximizing this function. This can be difficult when there are many parameters, that is, in multivariate cases with many variables (e.g., more than five) and more complex margins. The literature therefore recommends a two-stage procedure, called “inference functions for margins” (IFM), in which estimation is divided into two steps:

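  • In step 1, the parameters of the margins are estimated separately:

    $$\hat{\alpha}_1 = \arg\max_{\alpha_1} \sum_{i=1}^{n} \ln f(x_i; \alpha_1), \qquad \hat{\alpha}_2 = \arg\max_{\alpha_2} \sum_{i=1}^{n} \ln g(y_i; \alpha_2).$$

  • In step 2, the copula parameters are estimated given the fitted margins:

    $$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \ln c\left(F(x_i; \hat{\alpha}_1), G(y_i; \hat{\alpha}_2); \theta\right).$$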

According to Joe and Xu (1996), this procedure leads to consistent and asymptotically normal estimators.
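The two IFM steps can be sketched as follows, assuming normal margins and a Gaussian copula purely for illustration (the OLLN and Skew-t margins used in this study are not reproduced here). `ifm_fit` and `gaussian_copula_logpdf` are hypothetical names; SciPy's `norm.fit` supplies the marginal MLEs in step 1, and `minimize_scalar` maximizes the copula log-likelihood in step 2.

```python
import numpy as np
from scipy import optimize, stats


def gaussian_copula_logpdf(u, v, delta):
    """log c(u, v; delta) for the Gaussian copula."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    q = delta**2 * (x**2 + y**2) - 2.0 * delta * x * y
    return -0.5 * np.log(1.0 - delta**2) - q / (2.0 * (1.0 - delta**2))


def ifm_fit(x, y):
    """Two-stage IFM estimation with normal margins and a Gaussian copula."""
    # Step 1: maximum likelihood for each margin separately
    mu_x, sd_x = stats.norm.fit(x)
    mu_y, sd_y = stats.norm.fit(y)
    # Plug in the fitted margins: u_i = F(x_i; alpha1_hat), v_i = G(y_i; alpha2_hat)
    u = stats.norm.cdf(x, loc=mu_x, scale=sd_x)
    v = stats.norm.cdf(y, loc=mu_y, scale=sd_y)
    # Step 2: maximize the copula log-likelihood over delta only
    res = optimize.minimize_scalar(
        lambda d: -np.sum(gaussian_copula_logpdf(u, v, d)),
        bounds=(-0.99, 0.99), method="bounded")
    return (mu_x, sd_x), (mu_y, sd_y), res.x


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    cov = [[300.0**2, 0.4 * 300.0 * 8.0], [0.4 * 300.0 * 8.0, 8.0**2]]
    xy = rng.multivariate_normal([3000.0, 50.0], cov, size=1000)  # synthetic (yield, price) pairs
    print(ifm_fit(xy[:, 0], xy[:, 1]))
```

Splitting the problem this way replaces one joint optimization over $(\alpha_1, \alpha_2, \theta)$ with three small ones, which is the computational motivation for IFM.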

B.2 Pseudo Maximum Likelihood Estimators - Semi-Parametric Method

In this case, F and G are estimated nonparametrically, using the empirical distribution function (e.d.f.) or a combination of the e.d.f. with an extreme-value distribution fitted to the tails. The pseudo maximum likelihood estimator proposed by Genest, Ghoudi, and Rivest (1995) is obtained through the following steps:

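  • Obtain pseudo-samples for the copula, $(\hat{u}_i, \hat{v}_i) = (\hat{F}(x_i), \hat{G}(y_i))$, $i = 1, \ldots, n$, where, for each margin $j$ with observations $x_{1j}, \ldots, x_{nj}$,

    $$\hat{F}_j(y) = n^{-1}\left(\sum_{i=1}^{n} \mathbb{I}(x_{ij} \le y) - 0.5\right).$$

  • Write the log-likelihood and maximize it with respect to $\theta$ by numerical methods:

    $$\ell(\theta; \hat{u}, \hat{v}) = \sum_{i=1}^{n} \ln c(\hat{u}_i, \hat{v}_i; \theta). \tag{B-4}$$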
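A minimal sketch of the pseudo maximum likelihood estimator, assuming NumPy and SciPy. The pseudo-observations use the rescaled ranks (rank - 0.5)/n; the Clayton copula, its conditional-inversion sampler and all function names are illustrative assumptions rather than the models selected in this study. The margins in the usage block are deliberately non-normal to show that the estimate depends on the data only through their ranks.

```python
import numpy as np
from scipy import optimize, stats


def pseudo_observations(x):
    """Rescaled ranks (rank_i - 0.5) / n, which always lie strictly inside (0, 1)."""
    x = np.asarray(x, dtype=float)
    return (stats.rankdata(x) - 0.5) / x.size


def clayton_logpdf(u, v, theta):
    """log c(u, v; theta) for the Clayton copula (theta > 0)."""
    s = u ** -theta + v ** -theta - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def sample_clayton(n, theta, rng):
    """Simulate (U, V) from a Clayton copula by inverting the conditional distribution of V given U."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def pseudo_mle_clayton(x, y):
    """Maximize the pseudo log-likelihood (B-4) over theta, using ranks only."""
    u_hat, v_hat = pseudo_observations(x), pseudo_observations(y)
    res = optimize.minimize_scalar(
        lambda th: -np.sum(clayton_logpdf(u_hat, v_hat, th)),
        bounds=(1e-3, 20.0), method="bounded")
    return res.x


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    u, v = sample_clayton(2000, 2.0, rng)
    x = stats.lognorm(s=0.25, scale=3000).ppf(u)  # hypothetical skewed "yield" margin
    y = stats.norm(loc=50, scale=8).ppf(v)        # hypothetical "price" margin
    print(pseudo_mle_clayton(x, y))               # should be close to 2
```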

B.3 Goodness-of-fit Tests

Goodness-of-fit procedures are used to identify the parametric copula that best fits the data. The procedure described in this subsection is based on the work of Berg (2009), Genest et al. (2009) and Kojadinovic, Yan, and Holmes (2011).

Given independent observations of a random vector $X = (X_1, X_2, \ldots, X_d)$, it is assumed that they were generated by a joint distribution $H(x) = C[F_1(x_1), F_2(x_2), \ldots, F_d(x_d)]$, $\forall x \in \mathbb{R}^d$, where $C[\cdot]$ is an unknown copula. The hypothesis to be tested is whether $C$ belongs to a given parametric family, for example, whether the observations were generated by a Gaussian copula:

$$H_0: C \in \mathcal{C}_\theta \quad \text{versus} \quad H_a: C \notin \mathcal{C}_\theta,$$

where $\mathcal{C}_\theta$ denotes a parametric family of copulas with parameter $\theta$, for example, the Gaussian copulas.

According to Joe (2014), goodness-of-fit procedures are based on the Rosenblatt transform, on Cramér-von Mises or Kolmogorov-Smirnov statistics, and on other functionals of empirical distribution functions, which measure the overall distance between the hypothesized model and the empirical distribution. These procedures may not be sensitive to tail behavior.

Among the various methods for testing this type of hypothesis, the Cramér-von Mises test stands out, as it has shown greater robustness and power in the studies of Berg (2009), Genest et al. (2009) and Kojadinovic et al. (2011). It is based on the empirical process $\mathbb{C}_n(\mathbf{u}) = \sqrt{n}\left(C_n(\mathbf{u}) - C_{\theta_n}(\mathbf{u})\right)$, $\mathbf{u} \in [0,1]^d$, where $C_n$ is the empirical copula of the sample (the empirical distribution function of the pseudo-observations $\mathbf{U}_i$) and $C_{\theta_n}$ is the copula estimated under $H_0$. The Cramér-von Mises statistic is then given by:

$$S_n = \int_{[0,1]^d} \mathbb{C}_n(\mathbf{u})^2 \, dC_n(\mathbf{u}) = \sum_{i=1}^{n} \left\{C_n(\mathbf{U}_i) - C_{\theta_n}(\mathbf{U}_i)\right\}^2.$$
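The statistic $S_n$ can be computed directly from its definition, as in the sketch below (NumPy only; function names are hypothetical). The empirical copula is evaluated by brute-force pairwise comparison, which needs memory proportional to $n^2$ and is therefore meant only for moderate samples. The p-value, obtained by parametric bootstrap or by the faster large-sample approach discussed next, is not shown.

```python
import numpy as np


def empirical_copula(points, pseudo_obs):
    """C_n(u) = (1/n) #{i : U_i <= u componentwise}, evaluated at each row of `points`."""
    points = np.asarray(points, dtype=float)
    pseudo_obs = np.asarray(pseudo_obs, dtype=float)
    # below[k, i] is True when pseudo-observation i is <= evaluation point k in every coordinate
    below = np.all(pseudo_obs[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)


def cramer_von_mises(pseudo_obs, fitted_cdf):
    """S_n = sum_i (C_n(U_i) - C_theta_n(U_i))^2 for a fitted bivariate copula CDF."""
    pseudo_obs = np.asarray(pseudo_obs, dtype=float)
    c_emp = empirical_copula(pseudo_obs, pseudo_obs)
    c_fit = fitted_cdf(pseudo_obs[:, 0], pseudo_obs[:, 1])
    return float(np.sum((c_emp - c_fit) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(6)
    u = rng.uniform(size=(300, 2))  # stand-in for pseudo-observations
    amh_theta = 0.3                 # hypothetical fitted AMH parameter
    amh_cdf = lambda a, b: a * b / (1.0 - amh_theta * (1.0 - a) * (1.0 - b))
    print(cramer_von_mises(u, amh_cdf))
```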

An approximate p-value for $S_n$ can be obtained by parametric bootstrap. However, this approach is computationally very expensive, because each iteration requires generating random numbers from the copula and re-estimating its parameters. As the sample size increases, the parametric bootstrap test therefore becomes prohibitive. An approach based on the central limit theorem (Kojadinovic et al., 2011) has been proposed to reduce this computational cost.

B.4 Vuong's Procedure for Comparing Parametric Copula Models

To select the parametric copula model that best fits the data, we use the procedure proposed by Vuong (1989). It uses the Kullback-Leibler information criterion to measure each model's closeness to the truth, and statistics based on the likelihood ratio to test the null hypothesis that the competing models are equally close to the true data-generating process against the alternative that one model is closer.

The tests are directional and are derived successively for the cases in which the competing models are non-nested, overlapping or nested, and in which both, one or neither model is misspecified. As a prerequisite, the asymptotic distribution of the likelihood ratio statistic is fully characterized under general conditions: it is either a weighted sum of chi-square variables or a normal distribution, depending on whether the distributions in the competing models closest to the truth are observationally identical. A test of this latter condition is also proposed.

According to Joe (2014), Vuong's procedure can be described as the sample version of the Kullback-Leibler divergence calculations, together with the sample size needed to distinguish two models that need not be nested. Consider two copula densities $f_1 = c_1(u, v; \theta_1)$ and $f_2 = c_2(u, v; \theta_2)$ for two bivariate copula models $C_1(\cdot; \theta_1)$ and $C_2(\cdot; \theta_2)$ (models M1 and M2), with parameter spaces $\Theta_1$ and $\Theta_2$, respectively.

For a bivariate sample $(u_i, v_i)$, $i = 1, \ldots, n$, the difference between the Kullback-Leibler divergences of the two copulas from the true copula density can be estimated by:

$$\hat{D}_{12} = n^{-1}\sum_{i=1}^{n} D_i = n^{-1}\,\mathrm{LLR},$$

where LLR is the log-likelihood ratio and $D_i = \log\left[f_2(u_i, v_i; \hat{\theta}_2) / f_1(u_i, v_i; \hat{\theta}_1)\right]$.

For nested or non-nested models, an asymptotic (large-sample) 95% confidence interval for $D_{12}$ is

$$\hat{D}_{12} \pm 1.96\, n^{-1/2}\hat{\sigma}_{12}, \tag{B-5}$$

where the sample variance of the $D_i$ is given by $\hat{\sigma}_{12}^2 = (n-1)^{-1}\sum_{i=1}^{n}\left(D_i - \hat{D}_{12}\right)^2$.

There are also versions in which the LLR is adjusted by the Akaike information criterion (AIC) or by the Bayesian information criterion (BIC); following Vuong (1989), they are, respectively:

$$\hat{D}_{12} - n^{-1}\left[\dim(\Theta_2) - \dim(\Theta_1)\right] \pm 1.96\, n^{-1/2}\hat{\sigma}_{12}, \tag{B-6a}$$

$$\hat{D}_{12} - \tfrac{1}{2}\, n^{-1}\log(n)\left[\dim(\Theta_2) - \dim(\Theta_1)\right] \pm 1.96\, n^{-1/2}\hat{\sigma}_{12}. \tag{B-6b}$$

If the interval in equation (B-5), (B-6a) or (B-6b) contains 0, models M1 and M2 are not considered significantly different. If it does not contain 0, model M1 or M2 provides the better fit depending on whether the interval lies entirely below or above 0, respectively.
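The intervals (B-5), (B-6a) and (B-6b) and the resulting decision can be sketched as below, assuming the pointwise log-densities of the two fitted copulas are already available (for instance, from estimators like those in B.1 or B.2). `vuong_intervals` and `vuong_decision` are hypothetical helpers; the standard deviation uses the $n - 1$ divisor given for $\hat{\sigma}_{12}^2$.

```python
import numpy as np


def vuong_intervals(logf1, logf2, dim1, dim2, z=1.96):
    """Intervals (B-5), (B-6a) and (B-6b) for D_12 from pointwise log-densities.

    logf1, logf2: log f_1(u_i, v_i; theta1_hat) and log f_2(u_i, v_i; theta2_hat).
    dim1, dim2: number of parameters of each copula model.
    """
    d = np.asarray(logf2, dtype=float) - np.asarray(logf1, dtype=float)  # D_i
    n = d.size
    d_bar = d.mean()                           # D_hat_12 = LLR / n
    half = z * np.std(d, ddof=1) / np.sqrt(n)  # z * n^{-1/2} * sigma_hat_12
    centers = {
        "plain": d_bar,                                          # (B-5)
        "aic": d_bar - (dim2 - dim1) / n,                        # (B-6a)
        "bic": d_bar - 0.5 * np.log(n) * (dim2 - dim1) / n,      # (B-6b)
    }
    return {k: (c - half, c + half) for k, c in centers.items()}


def vuong_decision(interval):
    """'M1' if the interval lies below 0, 'M2' if above 0, otherwise 'tie'."""
    lo, hi = interval
    if hi < 0:
        return "M1"
    if lo > 0:
        return "M2"
    return "tie"


if __name__ == "__main__":
    logf1 = np.zeros(4)
    logf2 = np.array([0.1, 0.3, 0.1, 0.3])
    iv = vuong_intervals(logf1, logf2, dim1=1, dim2=1)
    print({k: vuong_decision(i) for k, i in iv.items()})
```

A result of `"tie"` means the interval contains 0, so neither copula is preferred.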

References

  • Ahmed, O., & Serra, T. (2015). Economic analysis of the introduction of agricultural revenue insurance contracts in Spain using statistical copulas. Agricultural Economics, 46(1), 69-79. http://dx.doi.org/10.1111/agec.12141
  • Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected papers of Hirotugu Akaike (pp. 199-213). Springer.
  • Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12(2), 171-178. https://www.jstor.org/stable/4615982
  • Azzalini, A., & Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2), 367-389. http://dx.doi.org/10.1111/1467-9868.00391
  • Babcock, B. A., & Hennessy, D. A. (1996). Input demand under yield and revenue insurance. American Journal of Agricultural Economics, 78(2), 416-427. http://dx.doi.org/10.2307/1243713
  • Berg, D. (2009). Copula goodness-of-fit testing: An overview and power comparison. The European Journal of Finance, 15(7-8), 675-701. http://dx.doi.org/10.1080/13518470802697428
  • Braga, A. d. S., Cordeiro, G. M., Ortega, E. M. M., & Cruz, J. N. d. (2016). The odd log-logistic normal distribution: Theory and applications in analysis of experiments. Journal of Statistical Theory and Practice, 10(2), 311-335. http://dx.doi.org/10.1080/15598608.2016.1141127
  • Brisolara, C. S. (2013). Proposições para o desenvolvimento do seguro de receita agrícola no Brasil: Do modelo teórico ao cálculo das taxas de prêmio (Doctoral dissertation, Universidade de São Paulo, Escola Superior de Agricultura “Luiz de Queiroz”, Piracicaba). http://dx.doi.org/10.11606/I11.2013.tde-02102013-141823
  • Carvalho, A. L., Paredes, C. A., Miquelluti, D., Ruis, D., Passarelli, E., & Duarte, G. V. (2013). Aspectos gerais do seguro de faturamento. http://geser.imagenet.com.br/
  • Cherubini, U., Luciano, E., & Vecchiato, W. (2004). Copula methods in finance. John Wiley & Sons. http://dx.doi.org/10.1002/9781118673331
  • Duarte, G. V., Braga, A., Miquelluti, D. L., & Ozaki, V. A. (2017). Modeling of soybean yield using symmetric, asymmetric and bimodal distributions: implications for crop insurance. Journal of Applied Statistics, 45(11), 1-18. http://dx.doi.org/10.1080/02664763.2017.1406902
  • Gallagher, P. (1987). U.S. soybean yields: Estimation and forecasting with nonsymmetric disturbances. American Journal of Agricultural Economics, 69(4), 796-803. http://dx.doi.org/10.2307/1242190
  • Genest, C., Ghoudi, K., & Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82(3), 543-552. http://dx.doi.org/10.1093/biomet/82.3.543
  • Genest, C., Rémillard, B., & Beaudoin, D. (2009). Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics, 44(2), 199-213. http://dx.doi.org/10.1016/j.insmatheco.2007.10.005
  • Goodwin, B. K., & Ker, A. P. (2002). Modeling price and yield risk. In R. E. Just & R. D. Pope (Eds.), A comprehensive assessment of the role of risk in U.S. agriculture (Natural Resource Management and Policy, Vol. 23, pp. 289-323). Boston, MA: Springer. http://dx.doi.org/10.1007/978-1-4757-3583-3_14
  • Goodwin, B. K., & Mahul, O. (2004, September). Risk modeling concepts relating to the design and rating of agricultural insurance contracts (Policy Research Working Paper No. 3392). Washington, D.C.: World Bank. http://dx.doi.org/10.1596/1813-9450-3392
  • Goodwin, B. K., Roberts, M. C., & Coble, K. H. (2000). Measurement of price risk in revenue insurance: Implications of distributional assumptions. Journal of Agricultural and Resource Economics, 25(1), 195-214. https://www.jstor.org/stable/40987056
  • Hennessy, D. A., Babcock, B. A., & Hayes, D. J. (1997). Budgetary and producer welfare effects of revenue insurance. American Journal of Agricultural Economics, 79(3), 1024-1034. http://dx.doi.org/10.2307/1244441
  • IPARDES - Instituto Paranaense de Desenvolvimento Econômico e Social. (2015). Base de Dados do Estado (BDEweb). Retrieved July 2, 2015, from http://www.ipardes.pr.gov.br
  • IPEADATA. (2015). Índice Geral de Preços - Disponibilidade Interna (IGP-DI). Retrieved November 19, 2015, from http://www.ipeadata.gov.br
  • Joe, H. (2014). Dependence modeling with copulas [Chapman & Hall/CRC Monographs on Statistics and Applied Probability 134]. CRC Press.
  • Joe, H., & Xu, J. J. (1996, October). The estimation method of inference functions for margins for multivariate models (Technical Report No. 166). Vancouver: University of British Columbia. http://dx.doi.org/10.14288/L0225985
  • Just, R. E., & Weninger, Q. (1999). Are crop yields normally distributed? American Journal of Agricultural Economics, 81(2), 287-304. http://dx.doi.org/10.2307/1244582
  • Kojadinovic, I., Yan, J., & Holmes, M. (2011). Fast large-sample goodness-of-fit tests for copulas. Statistica Sinica, 21(2), 841-871. https://www.jstor.org/stable/24309543
  • Lin, C.-T., Huang, Y.-L., & Balakrishnan, N. (2008). A new method for goodness-of-fit testing based on type-II right censored samples. IEEE Transactions on Reliability, 57(4), 633-642. http://dx.doi.org/10.1109/TR.2008.2005860
  • Ljung, G. M., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297-303. http://dx.doi.org/10.1093/biomet/65.2.297
  • MAPA - Ministério da Agricultura, Pecuária e Abastecimento. (2017). Atlas do seguro rural: Indicador das taxas. Retrieved February 21, 2017, from http://indicadores.agricultura.gov.br/atlasdoseguro/index.htm
  • Miqueleto, G. J. (2011). Contribuições para o desenvolvimento do seguro agrícola de renda para o Brasil: Evidências teóricas e empíricas (Doctoral dissertation, Universidade de São Paulo, Escola Superior de Agricultura “Luiz de Queiroz”, Piracicaba). http://dx.doi.org/10.11606/T.11.2011.tde-12092011-163544
  • Morettin, P. A. (2008). Econometria financeira: Um curso em séries temporais financeiras. Edgard Blücher.
  • Pakyari, R., & Balakrishnan, N. (2012). A general purpose approximate goodness-of-fit test for progressively type-II censored data. IEEE Transactions on Reliability, 61(1), 238-244. http://dx.doi.org/10.1109/TR.2012.2182811
  • Ramirez, O. A., Misra, S., & Field, J. (2003). Crop-yield distributions revisited. American Journal of Agricultural Economics, 85(1), 108-120. http://dx.doi.org/10.1111/1467-8276.00106
  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464. https://www.jstor.org/stable/2958889
  • SEAB - Secretaria da Agricultura e do Abastecimento do Paraná, Departamento de Economia Rural (DERAL). (2015). Preços. Retrieved September 30, 2015, from http://www.agricultura.pr.gov.br/modules/conteudo/conteudo.php?conteudo=195
  • Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229-231.
  • Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society, 57(2), 307-333. http://dx.doi.org/10.2307/1912557
  • Yan, J. (2007). Enjoy the joy of copulas: With a package copula. Journal of Statistical Software, 21(4), 1-21. http://dx.doi.org/10.18637/jss.v021.i04

Publication Dates

  • Publication in this collection
    25 Nov 2019
  • Date of issue
    Jul-Sep 2019

History

  • Received
    02 July 2018
  • Accepted
    17 Sept 2018
Fundação Getúlio Vargas Praia de Botafogo, 190 11º andar, 22253-900 Rio de Janeiro RJ Brazil, Tel.: +55 21 3799-5831 , Fax: +55 21 2553-8821 - Rio de Janeiro - RJ - Brazil
E-mail: rbe@fgv.br