Survival analysis of women with breast cancer: competing risk models

This study aimed to estimate the effects of prognostic factors on breast cancer survival, such as age, staging, and extension of the tumor, using the proportional hazards model proposed by Cox and the competing risks model proposed by Fine and Gray. This is a retrospective cohort study based on a population of 524 women resident in the city of Campinas, São Paulo, Brazil, who were diagnosed with breast cancer between 1993 and 1995 and followed until 2011. The cutoff points for the age variable were defined with simple Cox models. In the simple and multiple Fine-Gray models, age was not significant in the presence of competing risks, nor was it significant in the Cox models. For both models, death from breast cancer was the event of interest. The survival functions estimated by the Kaplan-Meier method showed significant differences between deaths from breast cancer and deaths from competing risks. Survival functions for breast cancer did not differ significantly between age groups according to the log-rank test. The Cox and Fine-Gray models identified the same prognostic factors influencing breast cancer survival.


Introduction
Breast cancer is an important public health problem, in Brazil and worldwide, due to its growing incidence, morbidity, and mortality, as well as the high costs of treatment. Several factors are already established as triggers for breast cancer development in women, among them the woman's reproductive history (early menarche, nulliparity, first pregnancy after 30 years of age, use of oral contraceptives, late menopause, and hormone replacement therapy). Age has also been considered an important risk factor for breast cancer onset 1.
In 2012, there were approximately 14 million new cases of cancer in the world and 8.2 million deaths (excluding non-melanoma skin cancer). Approximately 1.7 million of these new cases were breast cancer, with about 552,000 deaths. In the United States of America, it is estimated that more than 200,000 new cases of breast cancer were diagnosed in women this year alone, with 44,000 deaths 2.
In Brazil, there were 14,388 deaths from breast cancer in 2013, of which 14,207 were women. For 2016, the National Cancer Institute José Alencar Gomes da Silva (INCA) pointed out, in its latest report, that 57,960 new cases of breast cancer are expected, with an estimated incidence rate of 56.20 cases per 100,000 women 3. Among these new cases, 51% will be in the Southeast Region of Brazil. The Northern Region has the lowest incidence projection for 2016. These projections are based on information provided by the Population-Based Cancer Registries (RCBP) and the Mortality Information System (SIM). There are more than 20 RCBP in Brazil, located mainly in the state capitals.
Cancer registries and institutes, such as INCA, gather information and generate reports with graphs and estimates related to the survival of people with various types of cancer. These reports are disseminated through both printed and electronic means. They are important sources of information to support health agencies in their actions and positions, and make it possible to trace an epidemiological profile of the Brazilian regions by sex, age, and age group. In general, the survival analyses are performed with classical survival techniques and thus consider a single cause for the event of interest (death, remission, relapse, etc.).
In epidemiology, some indicators, such as mortality rates or coefficients, risk ratios (or relative risks), and odds ratios, are frequently used to express the magnitude and strength of the association between exposure factors and the event of interest. Especially in studies on the survival time of diseases such as cancer, the probability or "odds" of surviving a given follow-up time is the measure used. Such a measure can be summarized in graphs or survival functions that describe the behavior of the survival probabilities for an event of interest over time. In addition, survival time and risk ratios can be adjusted for the exposure variables.
The classical techniques most used to study survival time are the non-parametric Kaplan-Meier estimator and proportional hazards models such as Cox's. The first estimates survival functions, and the second assesses the effect of covariates on the hazard ratio. Both assume that there is a single cause for the event of interest (such as death in studies on cancer) [4][5][6]. However, it is more realistic to assume that an individual in the population is subject to several causes of death that compete with each other. Bearing this in mind, more appropriate techniques have been proposed. Among them is the Fine-Gray competing risks model, which incorporates the influence of those risks in the survival estimate 7. The Fine-Gray model is an extension of the Cox model that accommodates several causes for the event of interest and is easy to apply and interpret. Its advantage is that other causes of death are taken into account in the parameter estimates, so that the risks are estimated more appropriately.
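For reference, the two quantities contrasted in this paragraph can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the original text: T is the time to the first event, ε is its cause (1 = death from breast cancer, 2 = death from other causes), and x is the covariate vector.

```latex
% Cause-specific hazard for cause k (the target of a cause-specific Cox model)
\lambda_k^{cs}(t \mid x)
  = \lim_{\Delta t \to 0}
    \frac{P(t \le T < t + \Delta t,\ \varepsilon = k \mid T \ge t,\ x)}{\Delta t}
  = \lambda_{k,0}^{cs}(t)\, \exp(x^{\top} \beta_k)

% Cumulative incidence function for cause 1
F_1(t \mid x) = P(T \le t,\ \varepsilon = 1 \mid x)

% Subdistribution hazard for cause 1 (the target of the Fine-Gray model)
\lambda_1^{sd}(t \mid x)
  = -\frac{d}{dt} \log\{1 - F_1(t \mid x)\}
  = \lambda_{1,0}^{sd}(t)\, \exp(x^{\top} \gamma)
```

In the cause-specific model, subjects who die from a competing cause leave the risk set as if censored; in the subdistribution formulation they remain in it, which is why the two sets of coefficients answer different questions.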
Gooley et al. 8 point out the incorrect practice of treating the risk function as the complement of the survival function in the presence of risks that compete with the event of interest. This is inappropriate because, when there are several causes for the same event of interest, the relations and statistical properties of classical survival analysis (which considers only one cause) are not valid.
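A minimal sketch of this point, on invented toy data rather than the study cohort: it computes the complement of the Kaplan-Meier estimate (with competing deaths treated as censored) and a nonparametric cumulative incidence estimate for the same cause. The function name and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are assumptions made for illustration.

```python
def km_complement_and_cif(times, status, cause=1):
    """Return (1 - KM, cumulative incidence) for `cause` at the last observed time.

    status: 0 = censored, 1 = event of interest, 2 = competing event.
    The KM estimate treats competing events as censored observations.
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    km_cause = 1.0   # Kaplan-Meier survival for the cause of interest
    surv_all = 1.0   # survival free of any event, just before the current time
    cif = 0.0        # cumulative incidence of the cause of interest
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_t = 0
        # Gather all subjects sharing this time (handles ties).
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            n_t += 1
            d_any += s != 0
            d_cause += s == cause
            i += 1
        cif += surv_all * d_cause / n_at_risk
        km_cause *= 1 - d_cause / n_at_risk
        surv_all *= 1 - d_any / n_at_risk
        n_at_risk -= n_t
    return 1 - km_cause, cif


# Toy data (hypothetical): eight subjects, no ties.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8]
toy_status = [1, 2, 1, 2, 0, 1, 2, 0]
naive, cif = km_complement_and_cif(toy_times, toy_status)
print(f"1 - KM = {naive:.3f}, cumulative incidence = {cif:.3f}")
```

On this toy data the complement of the Kaplan-Meier estimate is larger than the cumulative incidence, because subjects removed by a competing death are treated as if they could still experience the event of interest.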
Since classical survival analysis techniques are inadequate in this setting, it becomes essential to apply more appropriate models, such as competing risks models.
In this scenario, some studies, which motivated our own, analyze survival time in the presence of competing risks through cause-specific Cox proportional hazards models. In other words, a model is fitted for each event of interest and analyzed together with the competing risks models [9][10][11].
We aim to introduce the use of Fine-Gray competing risks models and cause-specific Cox proportional hazards models to estimate the effects of prognostic factors on breast cancer survival in the presence of competing events.

Methodology
The population under study is a retrospective population-based cohort obtained from the Population-Based Cancer Registry of the School of Medical Sciences of the University of Campinas (RCBP-FCM/Unicamp). It comprises the records of 524 women resident in the municipality of Campinas, São Paulo, who were diagnosed with breast cancer from January 1st, 1993 to December 31st, 1995, with follow-up until December 31st, 2011. The information provided by RCBP-FCM/Unicamp is sourced from several medical institutions (public and private) in this municipality, but not from all existing ones.
When not listed in the RCBP-FCM/Unicamp database, the date of death was obtained from the Mortality Information System (SIM) of Campinas.
Campinas is located in the northeast of São Paulo state, and had a population of 1,080,113 (Census of 2010) and an MHDI (Municipal Human Development Index) of 0.805 in 2010 12.
Among the variables of the RCBP-FCM/Unicamp, those studied were: age (years), tumor staging (I, II, III, IV, other), tumor extension (localized, regional, metastatic, other), and region of residence (North, Northeast, South, Southeast, East, Central, ignored); the last variable was constructed according to the geographical location of the address. The staging degrees were grouped, for example: Ia and Ic as category I; IIa and IIb as category II; IIIa and IIIb as category III; and so forth.
The tumor staging followed the TNM classification proposed by the Union for International Cancer Control (UICC) in 1988. In the acronym, T refers to the primary tumor, N to lymph nodes (cancer spread to nearby lymph nodes), and M to metastasis (cancer spread to distant parts of the body).
The RCBP-FCM/Unicamp had 564 women recorded who were diagnosed with breast cancer in the studied period, but only 524 were considered eligible. The inclusion criteria were: women resident in the municipality of Campinas, diagnosed with breast cancer in the period from 1993 to 1995, and registered in the RCBP-FCM/Unicamp. We excluded the records of women diagnosed with breast cancer through autopsy, i.e., after death, those with missing dates of diagnosis or birth, and those with tumor staging and extension in the in situ category, on the grounds that women with these characteristics were not at the same risk of death as the others.
The survival time was measured in years and defined as the period between the date of diagnosis and the occurrence of the event of interest (death from breast cancer or from other causes), or the end of the study. Records of women who had no event of interest during the study period (1993 to 2011) were considered censored observations. Causes of death other than breast cancer are referred to in the results as competing risks (CR). Although breast cancer can, conversely, be seen as a risk competing with death from other causes, it was not labeled in this way. The several risk denominations (relative risks, death risks, hazard ratios, among others) must be understood within the context in which they appear 13.
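The definition of survival time above can be sketched as follows. The end-of-study date comes from the text; the function name and the 365.25-day year are assumptions made for illustration, since the text does not state how days were converted to years.

```python
from datetime import date

STUDY_END = date(2011, 12, 31)  # end of follow-up stated in the text


def survival_years(diagnosis, death=None, study_end=STUDY_END):
    """Years from diagnosis to death, or to the end of the study if no death
    was recorded within follow-up (a censored observation).

    Assumption: one year is taken as 365.25 days.
    """
    died_in_follow_up = death is not None and death <= study_end
    end = death if died_in_follow_up else study_end
    return (end - diagnosis).days / 365.25


# Hypothetical records, not taken from the registry.
print(round(survival_years(date(1993, 1, 1), date(1998, 1, 1)), 2))
print(round(survival_years(date(1995, 12, 31)), 2))
```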
Two configurations were considered for the categorical censoring variable. In the first, the variable was equal to 1 when the woman died from breast cancer, 2 when death occurred from a cause competing with breast cancer, and 0 for censoring (when the woman did not die during the study period). In the second, there was no category 2.
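The two configurations can be illustrated with a short sketch. The function names are hypothetical, and treating competing deaths as censored in the two-level version is an assumption based on how the Cox models are described later in this section.

```python
def code_status(died, cause_is_breast_cancer):
    """Three-level status: 0 = censored, 1 = death from breast cancer,
    2 = death from a competing cause."""
    if not died:
        return 0
    return 1 if cause_is_breast_cancer else 2


def collapse_status(status, event=1):
    """Two-level status for a cause-specific analysis: the chosen event
    stays 1 and everything else (including the other cause of death)
    becomes 0, i.e. censored."""
    return 1 if status == event else 0


three_level = [code_status(True, True), code_status(True, False), code_status(False, False)]
print(three_level)                                         # three-level coding
print([collapse_status(s) for s in three_level])           # breast cancer as the event
print([collapse_status(s, event=2) for s in three_level])  # other causes as the event
```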
A descriptive analysis of the studied variables was performed using the mean, median, interquartile range, and proportions. The survival function was estimated with the Kaplan-Meier estimator 14 for each event or outcome of interest, with and without stratification by the levels of the categorical variables. The curves were compared through the log-rank test, with a significance level of 5%.
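As an illustration of the curve comparison described above, the following is a minimal sketch of a two-group log-rank test written from the standard formula. It is not the authors' code; the function name and the toy data are assumptions, and a real analysis would rely on an established statistical package.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-sample log-rank test.

    events: 1 = event, 0 = censored; groups: 0 or 1.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for u in times if u >= t)  # at risk, both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical data: group 1 experiences all of its events first.
chi2, p = logrank_two_groups([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [1, 1, 1, 1, 0, 0, 0, 0])
print(f"chi2 = {chi2:.2f}, significant at 5%: {p < 0.05}")
```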
To estimate the effects of covariates on survival, the Cox proportional hazards and the Fine-Gray competing risks models were fitted, considering simple and multiple regressions. Simple models were fitted for each variable separately. Multiple models were fitted with two variables and with all the variables in the model. All models were adjusted for age, because of its well-known influence on breast cancer survival in the literature. The Fine-Gray model was fitted considering death from breast cancer as the event of interest and the other causes as competing risks. For the Cox models, in turn, two types of fit were made: considering death from breast cancer as the event of interest and the other causes as censoring, and vice versa. The significance level used to assess covariates in the fitted models was 0.05.
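To make the cause-specific Cox fit concrete, here is a toy sketch that maximises the Cox partial likelihood for a single covariate by Newton-Raphson (Breslow handling of ties). It is an illustration under stated assumptions, not the procedure used in the study, which was run in R: the function name, the fixed iteration count, and the data are hypothetical, and deaths from the other cause would enter with event = 0.

```python
import math


def cox_one_covariate(times, events, x, iters=25):
    """Fit log-hazard ratio `beta` for one covariate by Newton-Raphson.

    events: 1 = event of interest, 0 = censored (including, in a
    cause-specific fit, deaths from the competing cause).
    Returns (beta, hazard ratio).
    """
    beta = 0.0
    for _ in range(iters):
        score = 0.0
        info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue
            # Risk set: everyone still under observation at time t_i.
            risk = [(math.exp(beta * x_j), x_j)
                    for t_j, x_j in zip(times, x) if t_j >= t_i]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * x_j for w, x_j in risk)
            s2 = sum(w * x_j * x_j for w, x_j in risk)
            score += x_i - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        beta += score / info
    return beta, math.exp(beta)


# Hypothetical data: x = 1 marks a group that tends to fail earlier.
beta, hr = cox_one_covariate([1, 3, 4, 6, 2, 5, 7, 8], [1] * 8, [1, 1, 1, 1, 0, 0, 0, 0])
print(f"beta = {beta:.3f}, HR = {hr:.2f}")
```

Running the same function twice on the three-level status, once with breast cancer deaths as the event and once with other-cause deaths as the event, mirrors the two types of Cox fit described above.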
To compare the survival functions according to age, this variable was categorized. To this end, we followed the proposal of Cai et al. 15 for defining a cutoff point by fitting Cox models at several candidate points. We considered the ages of 40, 45, 50, 55, 60, 65, and 70 years as possible cutoff points. After fitting the Cox models, the age of 50 years was defined as a reasonable cutoff point (which coincides with the period of menopause onset). Thus, a categorical age variable was built, assigning the value 1 to women aged 50 years or over and the value 0 to those under this age 16,17.
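The dichotomization step can be sketched as below. The candidate cutoffs are those listed in the text; the function names and the example ages are hypothetical, and placing women aged exactly 50 in the older group is an assumption taken from the age groups reported in the results (< 50 and ≥ 50 years).

```python
CANDIDATE_CUTOFFS = (40, 45, 50, 55, 60, 65, 70)  # ages listed in the text


def age_group(age, cutoff=50):
    """Return 1 if age >= cutoff, else 0 (the >= boundary is an assumption)."""
    return 1 if age >= cutoff else 0


def group_sizes(ages, cutoff):
    """Return (number below cutoff, number at or above cutoff)."""
    older = sum(age_group(a, cutoff) for a in ages)
    return len(ages) - older, older


# Hypothetical ages, for illustration only.
example_ages = [25, 38, 49, 50, 57, 67, 93]
for c in CANDIDATE_CUTOFFS:
    print(c, group_sizes(example_ages, c))
```

Each candidate cutoff would then define a binary covariate for a separate Cox fit, and the fits would be compared to choose the cutoff.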
The proportionality assumption was verified through graphical methods and statistical tests appropriate for each model, such as Schoenfeld residual plots, plots of the cumulative incidence functions, and the log-rank and Gray tests.
The analyses were performed, and the results and graphs obtained, using the R software, version 3.0.2 18.
This research project was approved by the signing of the Informed Consent Form by the Research Ethics Committee of the School of Medical Sciences of the University of Campinas (Resolution 466/2012 CNS/MS).

Results
The highest percentages of women diagnosed with breast cancer were in the southern and northern regions (28% and 22%, respectively), and the lowest was in the northwestern region of Campinas. Regarding tumor staging, approximately 36% of the women were at degree II or III on the date of diagnosis. Regarding tumor extension, 62% of the women had localized or regional extensions. Among the women who died from breast cancer, 40% were at staging degree II or III on the date of diagnosis and 57% had localized or regional extensions. It was also observed that approximately 10% were at staging degree IV or had a metastatic extension. Approximately 98% of the women were diagnosed through anatomopathological examination (Table 1).
By the end of the study, of the 524 women in follow-up, 191 had died from breast cancer, 81 had died from other identified causes, and 252 were censored. On the date of diagnosis, the mean age was 57 years (interquartile range from 45 to 67 years); the youngest woman was 25 years old and the oldest was 93. We also observed that approximately 64% of the women were over 50 years of age.
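The counts reported above can be turned into shares of the cohort with a few lines of arithmetic. The counts come from the text; the variable names are illustrative.

```python
# Outcome counts reported in the text for the 524 women in follow-up.
counts = {
    "death from breast cancer": 191,
    "death from other causes": 81,
    "censored": 252,
}

total = sum(counts.values())
shares = {outcome: round(100 * n / total, 1) for outcome, n in counts.items()}

print(total)
for outcome, pct in shares.items():
    print(f"{outcome}: {pct}%")
```

The three counts add up to the 524 women in the cohort, and deaths from other causes account for roughly 15% of it.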
Considering death from breast cancer as the event of interest, the overall survival estimated by the Kaplan-Meier estimator was 79.7% at five years, 68.9% at ten years, and 60.8% at the end of the study. There was no significant difference between the survival functions of the age groups (< 50 and ≥ 50 years) according to the log-rank test, p = 0.2204 (upper graph of Figure 1). The survival functions for the covariates of staging and tumor extension differed significantly among their respective categories (figures not shown).
In the simple Cox models, with death from breast cancer as the event of interest, the continuous covariate of age was not significant, while the covariates of staging and tumor extension had categories that differed significantly from their baseline categories: staging degrees III and IV, and all extension categories. In the multiple Cox model, age remained non-significant in the presence of the other covariates, indicating a lack of influence of age on the risk of death from breast cancer. Of the covariates of tumor staging and extension, only the latter remained significant, in the Regional and Other categories, as seen in Table 2. Thus, a model intended to explain the risk of death from breast cancer would be one that includes only tumor extension as an explanatory variable.
In the bivariate Cox models, fitted with two variables and with death from breast cancer as the event of interest, age was non-significant in all of them. In the model fitted with age and tumor staging, the only two significant categories were degrees III and IV. In the model fitted with age and tumor extension, all categories of the latter covariate were significant. In the model fitted with tumor staging and extension, the first covariate had no significant categories, and the latter had the Regional and Other categories as significant. These bivariate models corroborate the proposed model, which considers only tumor extension to explain the risk of death from breast cancer. The tables with these results are not presented here.
In the Cox models fitted with death from other causes (risks competing with breast cancer) as the event of interest, age was significant in both the simple and the multiple models. While the covariates of tumor staging and extension were non-significant in the simple Cox models, each of them showed one significant category in the multiple model, staging degree IV and metastatic extension, as seen in Table 3. Thus, a model intended to explain the risk of death from other causes would be one that has age as the only predictor variable.
With death from other causes as the event of interest, the bivariate Cox models showed age as significant when controlled, separately, for staging and for tumor extension, although these two variables were non-significant in those models. The Cox model with the covariates of tumor staging and extension showed staging degree IV and metastatic tumor extension as significant categories. In this case, a model to explain the risk of death from other causes is one that contains age, tumor extension, and staging.
In the presence of competing risks, and with death from breast cancer as the event of interest, the covariates of tumor extension and staging were important for the incidence of death in the simple Fine-Gray models. Tumor staging had two significant categories, degrees III and IV. All the categories of tumor extension were significant. In the multiple model, only the covariate of tumor extension was significant (Table 4). As with the Cox model, the Fine-Gray model explains the risk of death from breast cancer by tumor extension alone.
In the Fine-Gray models fitted with two covariates, age was not significant in the presence of either of the other covariates (tumor staging and extension). In the model fitted with age and tumor staging, the two significant categories were degrees III and IV. In the model fitted with age and tumor extension, all categories of the latter were significant. Finally, in the model fitted with tumor staging and extension, the first had no significant categories, while all categories of the latter were significant. Thus, analogously to the Cox model, the bivariate Fine-Gray models showed tumor extension as the variable explaining the risk of death from breast cancer.
Proportionality for each fitted model, Cox and Fine-Gray, was verified through Schoenfeld residuals and plots of the cumulative incidence function, respectively. Proportionality was also verified for the covariate of age (graphs not shown).

Discussion
In our study, which retrospectively followed a cohort of women diagnosed with breast cancer, the overall survival estimated by the Kaplan-Meier method (K-M) was 60.8%, which is the estimated probability of surviving breast cancer for about 16 to 18 years after the date of diagnosis. However, there was a considerable proportion of censored data, which includes women who died of competing causes, so the probabilities might be overestimated. With values very close to those obtained in our study, Guerra et al. 19 found a five-year survival of 76.3%, with 62% and 68% of these women in the postmenopausal period (separated by type of access to service, public or private, respectively).
A study on breast cancer survival conducted in a municipality in the South of Brazil found an overall survival of 68.1% (in the period from 1980 to 2000), with 87.7% at five years and 78.7% at ten years 20. Abreu et al. 21, in their study of a population cohort from the municipality of Goiânia, from 1988 to 1990, found a survival of 57.1% at five years and 41.5% at ten years, values much lower than those obtained in our study. This may be due to regional differences in the accessibility of health services and in the implementation of cancer prevention programs 21.
The effects of tumor staging or extension were independent of age, since there was no significant interaction with this variable. An interesting observation is that the age groups (ages under and over 50 years) were homogeneous in the categories of tumor staging and extension.
In their study, Cai et al. 15 point out age as an important prognostic factor associated with high incidences of cancer mortality (specifically, for localized renal cell carcinoma), a conclusion based on fitted Cox models. In contrast, in our findings age was not an important prognostic factor for the survival of the women who died of breast cancer; younger and older women had virtually the same survival probabilities. This is noteworthy, since in the studied cohort the proportion of women over 50 years is far higher than that of younger women (64.3% of the women were over 50 years at the time of diagnosis), and a stronger influence of age on the risk of death from breast cancer was expected. Since it is not feasible to randomize women who do not yet have breast cancer into the two age groups, the imbalance in the age variable may have influenced the importance of this variable in the fitted models. Brazil has markedly regional socioeconomic, cultural, and environmental characteristics, which also interfere with the epidemiological profile of each region 22. This effect of regional differences is also perceived in municipal indicators, as evidenced in this study by the greater incidence of breast cancer in the southern and northern regions of the municipality of Campinas.
Figure 1. Survival functions, estimated by the Kaplan-Meier method, for deaths from breast cancer and from competing risks (upper and lower graphs, respectively).
In addition, we point out that breast cancer survival may be influenced by the presence of competing risks. The Cox and Fine-Gray models identified basically the same covariates to explain the risks or incidences of death from breast cancer, noting that the latter considers the presence of competing risks. The Fine-Gray model describes the cumulative incidence function, which other authors consider appropriate for estimating the probabilities of death.
Contrary to expectations, treating the other causes of death as censoring in the Cox model did not lead to the identification of covariates different from those in the Fine-Gray model, which incorporates competing risks. This may be due to the low percentage of deaths from other causes in the dataset studied.
It is important to note, from the results obtained in our study, that the relative risks are reduced when competing risks are considered in the estimates of the model parameters. Thus, treating as censored data the records of women who died of any cause competing with breast cancer produces overestimated relative risks (or hazard ratios). In other words, although present in a low percentage in the selected cohort, the other causes of death (competing risks) affected the estimates of the model parameters.
The models we fitted, both Cox and Fine-Gray, indicated that only one of the variables, staging or tumor extension, should be included in the model to explain death from breast cancer. In particular, tumor extension was selected (based on the p-values of the tests), corroborating other studies 23,24.
Other studies indicate that a high incidence of cancers detected in the early stages and a low incidence in later stages can result from effective screening programs. Overall, in developing countries such as Brazil, screening programs are said to be opportunistic, i.e., the breast cancer diagnosis is made by chance during a visit to the health services for other reasons. In some developed countries, the screening programs are said to be effective, i.e., they are carried out in a systematic and organized manner. Because of the over-diagnosis arising from screening programs, many women are diagnosed with early-stage cancers that would probably never develop clinical symptoms and are submitted to unnecessary, long, and painful treatments, which negatively affect their quality of life 25,26.
One reason to present the applicability of competing risks models in our study is that the incidence function is still incorrectly estimated through the complement of the Kaplan-Meier survival function 27. This may be due to computational difficulties, as many statistical packages do not yet calculate the cumulative incidence function. Computational support can be found in Pintilie's work 28.
Estimating survival in a more appropriate way is an important support for the development and implementation of adequate programs for the treatment of the disease. The classical survival analysis techniques overestimate the survival probabilities. On the other hand, they underestimate the risks of death, since the presence of competing risks is not considered in the analysis, which shows the relevance of models such as Fine-Gray's, singled out in this research.
Some limitations are intrinsic to the type of study we used. The use of secondary data made it unfeasible to analyze other prognostic factors for the survival of women with breast cancer, such as the type of treatment the women received. It also hindered the recording of censoring that occurred during follow-up, since the only censored observations identified occurred at the end of the study. In addition, a selection bias may exist because RCBP-FCM/Unicamp has records from some medical institutions and does not cover all existing ones in the municipality of Campinas. Hence, people who did not have access to these institutions were not considered in the analysis.
The RCBP, even with all their limitations and restrictions for analyzing the health situation, are still important sources of information to trace the epidemiological (oncological) profile of a particular region, supporting the health services in prevention and treatment programs for people with cancer 29.
In Brazil, although there are awareness campaigns on the importance of periodic breast examinations, whether by breast self-examination, clinical breast examination (CBE), or radiological exams such as mammography (MMG), crude mortality rates from breast cancer have grown over the last ten years, showing no decline or stability 26. This may be a consequence of late diagnosis or of behavioral factors related to the development of the cities. The municipality of Campinas has been no different: the breast cancer mortality rate is increasing over time.
Breast cancer is considered an important public health problem in Brazil and, although there are studies addressing survival, few have considered the existence of competing risks. There is much yet to be explored in this regard. It is therefore important to record other causes of death when collecting data, instead of only the causes of interest, to enrich further analyses and to obtain more reliable survival estimates that reflect reality.
A suggestion for future studies is the use of cure fraction models, also called long-term survival models, when there is a high percentage of censored data 30.

Conclusion
We conclude that the Cox and Fine-Gray models identified virtually the same covariates as influential in the survival time of women with breast cancer, noting that the Fine-Gray model considers the presence of competing events in the estimates of the model parameters. It is likely that the low percentage of competing risks contributed to these findings, since the model proposed by Fine and Gray is an extension of the Cox model.
Hazard ratios are overestimated when the competing events are treated as censored data. In other words, the parameter estimates of the models are influenced by the presence or absence of competing causes.
Contrary to expectations, age did not influence the survival of women with breast cancer; that is, younger and older women had virtually the same survival probabilities. In contrast, survival with respect to death from competing causes was influenced by age and by tumor staging and extension.
In addition, the fitted Cox models showed age as an important prognostic factor to explain the risk of death from other causes (risks competing with breast cancer). In the Fine-Gray model, by contrast, age was not important to explain death from breast cancer in the presence of competing events.

Collaborations
RO Ferraz and DC Moreira-Filho were responsible for the bibliographic review, the conception and planning of the research project, obtaining and complementing the database, data analysis and interpretation, and the writing and critical review of the final version. Both are responsible for approving the version to be published and for reviewing it to ensure the integrity and accuracy of every part of the work.

Table 1. Absolute and relative frequencies of the variables of women resident in the municipality of Campinas and diagnosed with breast cancer in the period from 1993 to 1995.

Table 2. Cause-specific Cox proportional hazards models for death from breast cancer in women resident in the municipality of Campinas, diagnosed in the period from 1993 to 1995.

Table 3. Cause-specific Cox proportional hazards models for death from other causes in women resident in the municipality of Campinas, diagnosed in the period from 1993 to 1995.
(a) Age is continuous; (b) HR: Hazard Ratio; (c) SE: Standard Error; (d) p-value of the test.

Table 4. Fine-Gray (F-G) competing risks models for death from breast cancer in women resident in the municipality of Campinas, diagnosed in the period from 1993 to 1995.
(a) Age is continuous; (b) HR: Hazard Ratio; (c) SE: Standard Error; (d) p-value of the test.