Predicting survival function and identifying associated factors in patients with renal insufficiency in the metropolitan area of Maringá , Paraná State , Brazil

Renal insufficiency is a serious medical and public health problem worldwide. Recently, although many surveys have been developed to identify factors related to the lifetime of patients with renal insufficiency, controversial results from several studies suggest that researches should be conducted by region. Thus, in this study we aim to predict and identify factors associated with the lifetime of patients with chronic renal failure (CRF) in the metropolitan area of Maringá, Paraná State, Brazil, based on the generalized additive models for location, scale and shape (GAMLSS) framework. Data used in this study were collected from the Maringá Kidney Institute and comprehends 177 patients (classified with CRF and mostly being treated under the Brazilian Unified National Health System) enrolled in a hemodialysis program from 1978 up to 2010. By using this approach, we concluded that in other regions, gender, kidney transplant indicator, antibodies to hepatitis B and antibodies to hepatitis C are significant factors that affect the expected lifetime. Survival Analysis; Renal Insufficiency; Kidney Diseases Correspondence L. R. Nakamura Universidade Federal de Santa Catarina. Campus Reitor João David Ferreira Lima s/n, Florianópois, SC 88040-900, Brasil. luiz.nakamura@ufsc.br 1 Universidade Tecnológica Federal do Paraná, Curitiba, Brasil. 2 Universidade Federal de Santa Catarina, Florianópolis, Brasil. 3 Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Piracicaba, Brasil. 4 Universidade Federal de Pernambuco, Recife, Brasil. doi: 10.1590/0102-311X00075517 Cad. Saúde Pública 2018; 34(1):e00075517 QUESTÕES METODOLÓGICAS MEthodologiCal iSSuES This article is published in Open Access under the Creative Commons Attribution license, which allows use, distribution, and reproduction in any medium, without restrictions, as long as the original work is correctly cited.


Introduction
Renal failure, also known as renal insufficiency and kidney failure, is a serious medical and public health problem, characterized by the inability of the kidneys to excrete toxic substances from the body in a proper way 1 . There are numerous causes of kidney failure. Some of them lead to a rapid decline in renal function, with values less than 1 to 2% of normal levels, named acute renal failure (ARF). In addition, a gradual and progressive loss of a large part of the functioning nephrons can be observed, called chronic renal failure (CRF). According to Marques et al. 2 , symptoms of kidney failure include headache, weakness, anorexia, nausea, vomiting, cramps, diarrhea, oliguria (insufficient secretion of urine), edema, confusion, thirst, loss of smell and taste, somnolence, hypertension and tendency to hemorrhage resulting from renal failure, as well as skin pallor, xerosis (pathological skin dryness), dysmenorrhea (cramping before or during menstruation), amenorrhea (absence of menses), testicular atrophy, impotence, attention deficit disorder, muscle spasms, and coma.
Patients who, for some reason, lost their kidney function and irretrievably reached the terminal stage of the disease, now have three different methods for treatment: hemodialysis, peritoneal dialysis and kidney transplantation. According to the Brazilian Society of Nephrology, hemodialysis promotes the removal of toxic substances, water and minerals from the body by filtering the blood. It is generally held three times per week in sessions lasting an average of three up to four hours, with the aid of a machine, in specialized clinics.
De-Lima et al. 3 and Andrade et al. 4 report that despite various technological innovations incorporated in hemodialysis, Brazilian's studies did not show any great improvement in survival of patients with CRF in recent decades. The lack of increased survival of CRF carriers maintained on kidney dialysis could be explained by the increase in the average age of people who have started this treatment in recent years. Interest in the effects of age is justified by the recent growth in the number of elderly who start hemodialysis. According to Sims et al. 5 , from the approximately 1 million people kept in this regular treatment in the world, more than half are older than 65 years.
The aging population and increasing lifetime expectancy, resulting from a demographic change in recent decades in Brazil, have contributed to changes in the morbidity and mortality profile and increased prevalence of chronic diseases, including chronic kidney disease (CKD) 6 . According to Bommer 7 , hypertension and diabetes are the main risk factors for CKD and are becoming more frequent in the general population, contributing for the increased incidence of CKD. Oliveira et al. 8 reported that in Brazil and Latin America about 15% of dialysis patients are diabetics.
In addition to age, hypertension and diabetes, other complication factors in patients undergoing hemodialysis are hepatitis B (HBS) and C (HCV). Despite the control of hepatitis transmission, it remains a huge problem in chronic hemodialysis units. Leão et al. 9 and Ferreira et al. 10 detected a statistically significant association between infection with hepatitis C and B and the duration of hemodialysis treatment.
In Brazil, there are insufficient studies that evaluate, at a national level, the survival and associated factors of patients in both dialysis modalities 11 . Thus, regional and local studies are essential to identify such factors and to evaluate the survival of patients with CRF. Therefore, a survey was conducted at Maringá Kidney Institute to predict survival and identify associated factors in the lifetime of patients with renal insufficiency from the metropolitan region of Maringá, Paraná State, Brazil.

Methodology Data set description
Data were collected from the Maringá Kidney Institute, for patients classified with CRF enrolled in a hemodialysis program from 1978 up to 2010. Due to the lack of specialized hospitals in the region, this institute is responsible for the majority of patients with renal failure in the metropolitan region of Maringá, which is formed by 26 cities and has more than 764,000 inhabitants.
The data set has a total of 177 patients (80 women and 97 men), of whom 23 underwent kidney transplant and 22 were diabetic. Most patients treated at the institute came from the Brazilian public health system and all patients in the survey were undergoing hemodialysis. The total of failure times (death) is 119, and 58 observations were considered censored (32% of censure) if the patient did not continue in the program for any reason or if patients did not die until the end of study. Moreover, hepatitis B and C antibodies were detected in 73 and 14 patients, respectively. Finally, average age of onset of treatment is 56 years old.

Regression models
In practical applications, patients' lifetimes are affected by explanatory variables, such as age, tumor size, lymph node status, among others, which can affect the healing probability of an individual. Thus, they should be added to statistical models to obtain better estimates as well as individual interpretations for these variables. Several surveys using the class of location models have been proposed in literature. A disadvantage of the class of location models is that the variance is not modelled explicitly regarding explanatory variables, but implicitly, through its dependence on the location parameter. Hence, an important consideration in regression models is the assumption that all observations have the same dispersion parameter. The noncompliance with this assumption may affect the efficiency of the estimates of the parameters.
As an alternative to the class of location models, Rigby & Stasinopoulos 12 proposed the generalized additive models for location, scale and shape (GAMLSS), in which the systematic part of the model is expanded, allowing not only the location but all the parameters of the conditional distribution of the response variable to be modelled as parametric and/or nonparametric functions of a set of explanatory variables.
Let T~D(t;θ), where D represents the distribution of T and θ T denotes the vector of its parameter. Consider independent observations t i 's conditional on the parameters θ i (for i = 1,…,n) having probability density function (pdf) f(t i ;θ i ). To exemplify, consider a distribution having two parameters, say μ and σ, and define the elements of the vector θ T using two appropriate link functions, μ = g 1 (Χ 1 β 1 ) and σ = g 2 (Χ 2 β 2 ), (1) where g k (.), k = 1,2, denote appropriate link functions, β k = (β ok ,…,β mk k) T is a parameter vector of length (m k +1), m k denotes the number of explanatory variables related to the k th parameter and Χ k is a known model matrix of order nx(m k +1). The choice of parameters to be modelled by explanatory variables will be discussed later in this study.
Next, consider a sample of n independent observations t 1 ,…,t n . Let c i denote the censoring time, , (2) where S(t i ;θ i ) is the survival function relative to the distribution D, F and C denote the sets of individuals for which t i (lifetime or censoring) and the vector of parameters are defined in (1) by specifying appropriate link functions. The numerical maximization of (2) can be easily performed using the GAMLSS package 14 in the R software (The R Foundation for Statistical Computing, Vienna, Austria; http://www.r-project.org). Computational aspects are given in Figure 1.

Figure 1
Some computational aspects.

Probabilistic distributions
In this survey, we use three different distributions as possible models to represent the lifetime of the patients. The simplest one-parameter distribution that may be appropriate to represent the response variable is the exponential distribution with E(T) = μ and Var(T) = μ 2 , where the log link function is used to model μ with explanatory variables. Secondly, the log-normal distribution (two parameters) is also selected to explain the lifetime of the patients, with E(T) = exp(μ+ ) and Var(T) = [exp(σ 2 )-1] exp(2μ+ σ 2 ). In this case, we use the identity and log link functions to model μ and σ as functions of the explanatory variables, respectively. The last distribution used in this study is the Weibull distribution.
We highlight here that the parameterization of this last model differs from the more common one

Model selection
Here, we consider the model selection process in three sequential steps. The first step consists in choosing the best distribution to represent lifetime and cure proportions. In the second step, we present a method to select the explanatory variables to fit the parameters of the model. Finally, in the third step, we investigated the model assumptions through a residual analysis.

Selecting the best model
In the first stage, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values are used to assess different fitted distributions. The AIC and BIC criteria are defined by AIC = GD+2k and BIC = GD+log(n)k, respectively, where GD represents the global deviance given by GD = -2l(θ), l(θ) is the maximized total log-likelihood function, n denotes the sample size and k denotes the number of fitted parameters. The model presenting the smallest values for these criteria is then selected. These criteria can be used to compare models disregarding regression structure as well as the class of location and heteroscedastic models. In general, the selected distribution to represent failure and censored times also provides a good fitted model in the presence of explanatory variables (see Cordeiro et al. 13 and Ramires et al. 15 ). Moreover, an important function in survival analysis is the hazard function, which is defined as the event rate at time t conditional on survival until time t or later, that is, T ≥ t. In medicine, the hazard function is commonly called the force of mortality of the survival function. The empirical scaled total time on test (TTT) transform can be used to identify the shape of the hazard function, and then, we can verify if the chosen model was able to capture such shape.

Selecting the explanatory variables
The selection of the terms for all parameters is performed using a stepwise AIC procedure. Many different strategies could be applied for selection of the terms used to model all the distribution parameters. Here, we adopt the strategy described by Nakamura et al. 16 . Basically, the strategy consists of two steps. In the first step, we adopt a forward selection procedure to choose an appropriate model for μ, with σ fitted as constants. Afterwards, we repeat the same procedure to select the model for σ, using the models already obtained in the previous steps as constants. For the second step, we perform a backward selection procedure to choose an appropriate model for μ, where σ is fitted as constant.
At the end of the steps described, the final model may contain different terms for μ and σ. Note that if unnecessary terms are selected by the forward method in the first step, they will be removed in the backward method in the second step, thus the combination of these two methods is called stepwise AIC procedure. Details of this method can be found in Figure 1.

Residual diagnostics
To study departures from the error assumption and the presence of outlying observations, we can use the diagnostic tools in the GAMLSS package. The first technique is based on the normalized (randomized) quantile residuals 17  Although quantile residuals are widely used in the literature, it is not possible to identify specifically failures to fit the mean, variance, skewness and kurtosis in the response variable. As an alternative, we can use the worm plots (WP) 18 . These plots were pioneered by identifying regions (intervals) of an explanatory variable in which the model does not adequately fit the data. This is a diagnostic tool for checking the residuals for different ranges of one or two explanatory variables. In this study, we adopt an approach proposed by van Buuren & Fredriks 18 , which is to fit cubic models to each of the detrended Q-Q plots with the resulting constant, linear, quadratic and cubic coefficients indicating differences (misfits) between the empirical and model residual mean, variance, skewness and kurtosis, respectively, within the range in the WP (for further details, see Stasinopoulos et al. 19 ).

Results and discussion
We start the analysis by fitting exponential, Weibull and log-normal models disregarding regression variables. Table 1 shows maximum likelihood estimates (MLEs) and corresponding standard errors (SEs) (in parentheses) of the model parameters and GD, AIC and BIC statistic values for the fitted models. Note that we are considering the structure presented in (1), that is, for positive parameters we use the logarithmic link function μ = exp(β 0 ). We also provide in Figure 2a the plots of estimated and empirical survival functions. Thus, we conclude that the Weibull distribution provides a good fit to these data.
The TTT plot showed in Figure 2b reveals that the hazard function has decreasing shape. On the other hand, Figure 2c shows that the estimated hazard functions of exponential, Weibull and lognormal models are constant, decreasing and unimodal (increasing and then decreasing), respectively, indicating that the Weibull distribution is appropriate to fit these data.
After selecting the Weibull distribution as the best one to fit these data, we propose a Weibull regression model. Using the steps described in the previous sections to select additive terms for the parameters, we obtain μ = exp(β 01 +β 11 x 1 +β 21 x 2 +β 51 x 5 +β 61 x 6 +β 71 x 7 ) (3) and σ = exp(β 02 +β 12 x 1 +β 52 x 2 +β 62 x 6 +β 72 x 7 ) (4) For the selected explanatory variables to explain the parameters μ and σ from the Weibull model, we present in Figure 3 the empirical survival curves (Kaplan-Meier) stratified for the categorical variables. In these panels, we can see a great difference between the levels for each variable (gender, kidney transplant indicator, antibodies to hepatitis B and C), agreeing with the selected covariates in the final model. Table 2 provides estimates, SEs, and p-values obtained from the fitted Weibull model based on GAMLSS. We highlight that all selected parameters are significant at the 10% significance level. Results in Table 2 reveal that the explanatory variables -age at the beginning of treatment, gender, kidney transplant indicator and antibodies to hepatitis B and C -are significant factors to explain the lifetime mean μ of the patients, which is expected from a wide literature that associates these variables as high-risk factors for patients (e.g. Tangri et al. 20 and Fabrizi et al. 21 ). Moreover, the age at the beginning of treatment, kidney transplant indicator and antibodies to hepatitis B and C are significant factors to explain the variability of the shape parameter σ of survival times. A limitation of this study is that few explanatory variables were considered because only those variables were collected by the hospital. Further research, including other variables, may reveal factors not yet discovered in the literature.
Partial effects of the explanatory variables in the mean μ are displayed in Figure 4. From the model for the mean μ, we can see that the older the patient when he starts treatment, the lower is their lifetime expectancy (panel (a)). More precisely, for each additional year at the beginning of treatment, the patient's average lifetime expectancy decreases 0.036 years ( Table 2). The lifetime expectancy is lower in female patients (panel (b)) and in patients who have not undergone kidney transplantation Table 1 Maximum likelihood estimates (MlEs) of the exponential, Weibull, and lognormal distributions, the corresponding standard errors (SEs) and global deviance (gd), akaike information criterion (aiC) and Bayesian information criterion (BiC) statistics.

Figure 2
Estimated and empirical survival functions, total time on test (ttt plot) and estimated hazard rate function of the proposed distributions.
(panel (c)). Moreover, panels (d) and (e) indicate that patients who have antibodies to hepatitis C and B, respectively, have greater lifetime expectancy. Regarding the shape parameter σ ( Figure 5), as we can see in panel (a), the variable age at the beginning of treatment has a positive linear relationship with the variability of patients' lifetime , i.e. the variability of lifetime increases when patients waited to start treatment. Furthermore, according to panels (b)-(d), the variability also increases when the patient has undergone kidney transplantation and when there are antibodies to hepatitis B and/or C. Here, it is clear that the variability of patient's lifetime must be modelled as well, and not only the mean (as more traditional models such as classical linear regression or generalized linear model usually do), which could mislead us on the selection of false risk factors for kidney failure. Figure 6 shows some residual plots that can help to verify the adequacy and the assumptions of the chosen fitted model given in (3) and (4). Panel (a) suggests that the normalized quantile residuals have approximately normal distribution. Panel (b) shows that there are few points off the line in the low end of the range. Finally, as we can see in panel (c), the WP also suggests that there is a slightly problem in the lower tail of the distribution of T. The correct would be the fitted cubic model (red curve) to be a line on the X-axis. Nevertheless, the Weibull regression model based on the GAMLSS framework provides a reasonable fit to these data.

Figure 3
Empirical survival curves for gender, kidney transplant indicator, antibodies to hepatitis B and antibodies to hepatitis C.

Table 2
Estimates, corresponding standard errors (SEs) and p-values from the fitted Weibull generalized additive models for location, scale and shape (gaMlSS) regression.  Fitted terms for the mean μ: age (in years) at the beginning of treatment, gender, kidney transplant, antibodies to hepatitis C and antibodies to hepatitis B.

Parameter
Consider the high-risk group (x 1 = maxx 1 = 88, x 2 = 0, x 5 = 0, x 6 = 0 and x 7 = 0) and the low-risk group (x 1 = minx 1 = 17, x 2 = 1, x 5 = 1, x 6 = 1 and x 7 = 1). The average of the lifetime for high-risk and low-risk groups is 310 and 21,568 days, respectively, which correspond to 0.84 and 59 years, respectively. The same idea can be used to estimate the average lifetime for any level of explanatory variables.

Concluding remarks
Renal insufficiency is a serious medical and public health problem. Thus, regional studies are needed to identify associated factors that affect patients' lifetime. Based on the fit of the Weibull model, we verify that the late onset of renal insufficiency, female gender, absence of kidney transplant and no antibodies to hepatitis B and C can be identified as negative factors that affect the lifetime of patients from the metropolitan area of Maringá. Hence, patients having such characteristics require special attention. Furthermore, the age at the beginning of treatment, kidney transplant indicator and antibodies to hepatitis B and C are significant factors to explain the variability of the survival time.

Figure 5
Fitted terms for σ: age (in years) at the beginning of treatment, kidney transplant, antibodies to hepatitis C and antibodies to hepatitis B.

Figure 6
Residual from the Weibull regression model based on the generalized additive models for location, scale and shape (gaMlSS) framework: density of the quantile residuals, Q-Q plot and worm plots (WP).