Health expectancy indicators: what do they measure?

Background: Health expectancy indicators aim at capturing the quality dimension of total life expectancy; however, the underlying approach, definition of health, and information source differ considerably among the indicators available. Objective: (1) Review the main concepts and approaches used to estimate health expectancy focusing on two widely used European health indicators: Health-Adjusted Life Expectancy (HALE) and Healthy Life Years (HLY); (2) identify underlying differences between the results yielded by these two indicators. Method: Statistical differences between the HALE and HLY indicators by sex at ages 50, 60, and 70 were tested using pairwise and global Student's t-tests and z-scores based on standard deviation. Data for 29 European countries were collected from the European Health Expectancy Monitoring Unit (EHEMU) information system and the World Health Organization (WHO) Global Burden of Disease Study 2016 (GBD 2016). Results: The HALE indicator estimates were smoother across European countries than those of the HLY indicator, presented a narrower sex gap in morbidity, had higher z-scores compared with the average distribution across Europe, and were less sensitive to cross-national variations. Conclusion: The HALE estimates indicate that morbidity is more compressed for both sexes, whereas the HLY estimates suggest that morbidity is more compressed for males but more expanded for females. These contrasting results demonstrate that health expectancy indicators should be interpreted with caution.


INTRODUCTION
The idea of combining health and mortality in a summary measure was first proposed in the 1960s 1 and later developed in the 1970s by Sullivan, whose name became synonymous with the method 2. Sullivan combined population health state prevalence data with mortality data to generate estimates of expected years of life lived in various health states. This measure was called disability-free life expectancy (DFLE). It was first calculated for a set of countries in the 1980s and later in the 1990s. In the same decade, the REVES (Réseau Espérance de Vie en Santé) network was established with the aim of promoting the use of healthy life expectancy as a tool for monitoring health and policy making. In 1993, the DFLE measure was included among the health indicators of the Organisation for Economic Co-Operation and Development (OECD). The REVES network then started to systematically assess the comparability and stability of health indicators across countries, health dimensions, age and time, with a focus on Europe 3-6. The REVES network developed a series of research projects that aimed at adding the quality dimension of life lived to the quantity dimension of European populations. In its first project (2004-2007), called the European Health Expectancy Monitoring Unit (EHEMU), supported by the Public Health Programme of the European Union, several summary measures and methods of population health were developed, particularly healthy life expectancies free from chronic disease, disability, and in good perceived health 7,8. The REVES network also developed survey instruments that were included in the European Union Statistics on Income and Living Conditions (EU-SILC) and in the Survey of Health, Ageing, and Retirement in Europe (SHARE) 7,9.
In the second project (2007-2010), called European Health and Life Expectancy Information System (EHLEIS), the REVES network monitored the results provided by the instrument and indicators constructed in the previous project 10. In its most recent project (2011-2014), called Joint Action European Health and Life Expectancy Information System (JA EHLEIS), the group consolidated these measures and survey instruments, tested their comparability, and broadened the EHLEIS 11-13. From this joint effort throughout the three projects, the group suggested a harmonized and comparable indicator of healthy life expectancy in the European context, named the healthy life years (HLY) indicator. The HLY indicator is a DFLE measure based on the Global Activity Limitation Indicator (GALI) included in the EU-SILC survey and estimated using the Sullivan's method (more details on this measurement can be found further in the text). In 2004, as part of the Lisbon Strategy, the European Union (EU) member states selected HLY as one of the European structural indicators to be monitored annually. It was regarded as a key economic outcome measure for social policies related to retirement age and spending for health and long-term care for their rapidly aging population 14. Later, in 2006, the European Commission (EC) sponsored a study by the RAND Europe research and development corporation to assess the uptake of the HLY structural indicator in the EU and ministries in member states 15. The EC concluded from this study that the HLY indicator is relevant to guide policy making and monitoring regarding labor force participation, pensions, health conditions, and lifestyles. As a result, since 2004, the EU has monitored the HLY indicator of its member countries based on a standard set of questions from the EU-SILC 9,14.
However, other research shows that because DFLE indicators are estimated using a dichotomous weighting scheme (healthy vs. unhealthy), they do not consider varying levels of severity, making them more sensitive to the definition of disability. In an effort to tackle this issue, in 1999, the WHO published the first disability-adjusted life expectancy (DALE) estimates for 191 countries, a measure the organization deemed more sensitive to disability severity levels 16. DALE measures the number of years of life that one expects to live in full health by weighting the severity of disability prevalence of disease and injury burden. In the following year, the GBD study updated these WHO estimates including different measurement approaches and cross-population comparable survey data from 63 surveys in 55 countries 17. In this new set of estimates, WHO incorporated a disease-specific approach to estimate the disability and loss of healthy years associated with an exhaustive set of health conditions, and derived what was called the health-adjusted life expectancy (HALE) indicator. In addition to HALE, another health indicator used by WHO and GBD is the disability-adjusted life year (DALY). DALY is the sum of years of life lost due to mortality (YLL) and years of healthy life lost due to disability (YLD). Contrary to healthy life expectancy indicators, which quantify how much of total life expectancy is lived in good health, DALY is a gap measure that assesses the distance between a population's actual health and some desired goal or target 17,18. Because the components of DALY (both YLL and YLD) are used in the construction of the health state valuation weights in the estimation process of HALE, both HALE and DALY are the current health indicators used by WHO and the GBD study 17,19.
Despite the common aim in estimating healthy life expectancies, the underlying approach, definition of health, and information source employed by the HALE and HLY indicators are different. This affects how one evaluates and interprets the overall health states of populations. This study particularly addresses these differences and their consequences for health and mortality research.
This study aims to (1) review the main concepts and approaches used to estimate health expectancy focusing on two widely used European health indicators (HALE and HLY); and (2) identify underlying differences between the results yielded by these two indicators and address their impact on policy making and health research.

METHOD
A conceptual overview of the framework underlying health expectancies is presented to address objective (1); to address objective (2), an empirical application to 29 European countries for 2016 is conducted to underline the differences between results stratified by age, sex, and health indicator. The application focuses on this selected group of European countries because HLY is the indicator used by the EU. The HLY indicator and total life expectancy (LE) are retrieved from the EurOhex database from the EHEMU information system for 2016 20. The HALE estimates are retrieved from the WHO-GBD website 21. The information is available by age and sex. Both health indicators are readily available in the data sources previously described. Descriptive analyses show the differences in country ranking by total LE and proportion of total LE spent in unhealthy states for both the HLY and HALE indicators by sex at ages 50, 60 and 70. Statistical differences between the HALE and HLY indicators by sex and age cutoffs are tested using pairwise and global Student's t-tests, since they come from two different independent samples 22. To further discuss differences between the HLY and HALE indicators, z-scores based on standard deviation are estimated, since HLY and HALE are not directly comparable. The z-scores represent the number of standard deviations that each data point lies above or below the mean of its own indicator's distribution. To calculate the z-score, the mean is subtracted from each individual data point and the result is divided by the standard deviation 23. Although this still does not enable a direct comparison in terms of magnitude, it enables analysis of the relationship between mortality and health underlying the HLY and HALE indicators. It also allows evaluation of how European countries perform in terms of proportion of total LE spent in healthy states at age 50 and by sex.
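The two statistics named in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input values are hypothetical (they are not the EHEMU or GBD estimates), the sample standard deviation (n - 1 denominator) is assumed for the z-scores, and the t statistic is the classical pooled-variance Student form for two independent samples.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardise each value: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / s for v in values]


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns the statistic and its degrees of freedom.
    Assumes independent samples with equal variances.
    """
    na, nb = len(sample_a), len(sample_b)
    var_a, var_b = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical proportions of LE spent in good health for five countries.
hly = [0.55, 0.60, 0.65, 0.70, 0.75]
hale = [0.80, 0.82, 0.84, 0.86, 0.88]

hly_z = z_scores(hly)
t_stat, dof = student_t(hly, hale)
```

Because each indicator is standardised against its own mean and standard deviation, the z-scores describe relative position within an indicator, not differences in magnitude between indicators.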

A review of healthy life expectancy indicators

The concept of healthy life expectancy
There is a set of health indicators and healthy life expectancy measures that are used by different institutions. In the European case, HLY is the official structural indicator for assessing and monitoring health. In the case of WHO and GBD, the HALE and DALY indicators are used. Despite being differently estimated, they have in common the underlying conceptual framework of survival, and how to slice the survival curve into different parts that represent the health states. Let $S(x)$ be the survivorship curve for a population at any given point in time $t$ and over ages $x$, ranging from $\alpha$ to $\omega$, where $\omega$ is the last age at which there are survivors left.
The grey shaded area $\alpha$ is the area under the curve $S(x)$, and its integral yields the total life expectancy $e_x$, as depicted in panel a of Figure 1. To the right, in panel b, the same survivorship curve $S(x)$ is divided into health states I to IV, where state I is the state of full health. In theory, those health states are a continuum, but in practice they are generally conceptualized and measured as a set of mutually exclusive and exhaustive discrete states ordered on one or more dimensions. A gap measure such as DALY, contrary to a healthy LE indicator, does not refer to the area $\alpha$ under the curve, but instead measures the distance between a population's actual health and some desired goal or target to be reached, which refers to area $\beta$. From a theoretical perspective, the ideal goal would be the distance from $S$ to a complete rectangularization of the curve. In practical terms, researchers use a limit function or threshold age.
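As a rough numerical illustration of these two areas, the sketch below approximates the area under a survivorship curve with the trapezoidal rule and, for contrast, the gap to a fully rectangular curve up to a threshold age. The survivorship values, the function name `life_expectancy`, and the threshold age of 100 are all hypothetical and are not taken from Figure 1.

```python
def life_expectancy(ages, survivors):
    """Trapezoidal area under S(x) from ages[0] onward, per survivor at ages[0]."""
    area = 0.0
    for i in range(len(ages) - 1):
        width = ages[i + 1] - ages[i]
        area += width * (survivors[i] + survivors[i + 1]) / 2
    return area / survivors[0]


# Hypothetical survivorship curve, scaled so that S(50) = 1.
ages = [50, 60, 70, 80, 90, 100]
survivors = [1.0, 0.9, 0.7, 0.4, 0.1, 0.0]

e50 = life_expectancy(ages, survivors)  # area under the curve

# Gap to a rectangular curve in which everyone survives to the
# threshold age (assumed here to be the last age in the table).
gap = (ages[-1] - ages[0]) - e50
```

A healthy LE indicator partitions the first quantity into health states, whereas a gap measure such as DALY is conceptually related to the second.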
Healthy LE indicators are aimed at considering the specific areas under the survival curve as years of life lived in less than full health in order to quantify how much of total LE is lived in good health. What distinguishes the various healthy LE indicators from each other are mainly the approach or methodology employed, the source of health data, and how the information is incorporated. The HLY indicator is within the category of DFLE estimates. In this case, health expectancy gives a weight of 1 to health states with no disability and a weight of 0 to health states with any level of disability above a given threshold (other types of health expectancy not discussed here, such as active life expectancy and independent life expectancy, also use this approach 24-26). This is a dichotomous approach, since there are only two mutually exclusive health states defined. This means that the indicator is usually defined in terms of two shaded areas under the curve $S(x)$, with or without a given health state. The definition of health can vary (e.g., with/without chronic morbidity, good/bad self-rated health, with/without dementia), and in the case of the HLY indicator, the dimension of health analyzed is activity limitation, based on the Global Activity Limitation Indicator (GALI), as described in more detail later in the text.
The HALE indicator, on the other hand, is within the category that employs polychotomous or continuous weights. These weights are based on health state valuations that are defined in terms of severity-weighted disability prevalence. The weight of 1 is attributed to years of good health and non-zero weights to some states of less than good health. The WHO and GBD argue that dichotomous weighting, such as the one employed by the HLY measurement, is not sensitive to differences in the severity distribution of disability, since time spent in any health state categorized as disabled is assigned a weight of zero 16 .
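The contrast between the two weighting schemes can be made concrete with a small sketch. The years lived in each state, the weights, and the placement of the disability threshold between states I and II are all hypothetical values chosen for illustration; they are not GBD disability weights or EU-SILC results.

```python
def weighted_expectancy(years_in_state, weights):
    """Sum of years lived in each health state multiplied by that state's weight."""
    return sum(years_in_state[s] * weights[s] for s in years_in_state)


# Hypothetical years lived in health states I (full health) to IV.
years = {"I": 20.0, "II": 5.0, "III": 3.0, "IV": 2.0}

# Dichotomous scheme: any state beyond the threshold counts as zero.
dichotomous = {"I": 1.0, "II": 0.0, "III": 0.0, "IV": 0.0}

# Continuous scheme: states of less than good health keep a non-zero weight.
continuous = {"I": 1.0, "II": 0.8, "III": 0.5, "IV": 0.2}

dfle_like = weighted_expectancy(years, dichotomous)
hale_like = weighted_expectancy(years, continuous)
```

With identical time spent in each state, the continuous scheme necessarily yields a value at least as high as the dichotomous one, which is one reason the two families of indicators are not directly comparable in magnitude.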

How is healthy LE estimated?
The three most usual ways to estimate healthy LE are (1) the Sullivan's, (2) the Multistate, and (3) the Double Decrement methods. The Sullivan's method essentially combines life table information on survivorship with prevalence rates by age. It requires a population life table and prevalence data for the health state or states of interest. The prevalence data are usually derived from cross-sectional surveys. Because of its parsimony and tested consistency, it is the most often used approach 27,28. The Multistate approach is a generalization of the life table (which can be conceived as a single state life table) where it is possible to estimate the transition probability matrix for the various non-absorbing states of health before death, including remission and recovery states. This allows calculation of health expectancies for specific health states of a selected population subgroup, while the prevalence-based Sullivan's method provides only the average health expectancy for the entire population at a given age. The Multistate method is based on incidence measures representing current health transitions, and it allows death rates to differ by health state 29-31. However, it requires detailed health information, usually derived from longitudinal studies 29,30. The Double Decrement method is a special case of the Multistate method where the only possible transition is from disability to death, and thus the probability of remission in a given health state is zero 32. This method is appropriate when the disability state is considered either irreversible (e.g., senile dementia) or when probabilities of recovery are negligible. Other currently less used approaches are the Microsimulation, Grade of Membership (GoM) and Bayesian Inference methods. For more details and a brief introduction on these methods see 30.
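To illustrate how an incidence-based approach differs from a prevalence-based one, the following sketch projects a cohort through a three-state Markov chain (healthy, disabled, dead) and accumulates the expected years spent in each living state. It is a deliberately simplified toy: the transition probabilities are hypothetical, they are held constant over age (real multistate life tables use age-specific transitions), and years are credited to the state occupied at the start of each year. Setting the recovery probability to zero gives a no-remission variant in the spirit of the Double Decrement method.

```python
def expected_years(p, start, steps):
    """Expected years in 'healthy' and 'disabled' for a person starting in `start`.

    p[src][dst] is the annual probability of moving from src to dst.
    """
    dist = {"healthy": 0.0, "disabled": 0.0, "dead": 0.0}
    dist[start] = 1.0
    totals = {"healthy": 0.0, "disabled": 0.0}
    for _ in range(steps):
        # Credit one year to each living state, weighted by occupancy.
        for state in totals:
            totals[state] += dist[state]
        # Move the cohort forward by one year.
        new_dist = {state: 0.0 for state in dist}
        for src, row in p.items():
            for dst, prob in row.items():
                new_dist[dst] += dist[src] * prob
        dist = new_dist
    return totals


# Hypothetical annual transition probabilities with recovery allowed.
with_recovery = {
    "healthy": {"healthy": 0.90, "disabled": 0.07, "dead": 0.03},
    "disabled": {"healthy": 0.10, "disabled": 0.70, "dead": 0.20},
    "dead": {"dead": 1.0},
}

# Same process with the probability of remission set to zero.
no_recovery = {
    "healthy": {"healthy": 0.90, "disabled": 0.07, "dead": 0.03},
    "disabled": {"disabled": 0.80, "dead": 0.20},
    "dead": {"dead": 1.0},
}

rec = expected_years(with_recovery, "healthy", 500)
norec = expected_years(no_recovery, "healthy", 500)
```

Note that death rates differ by health state here (0.03 versus 0.20), which is something the prevalence-based Sullivan's method cannot represent.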
An alternative, not often used approach to indirectly measure healthy LE is the Intercensal method, where age-specific proportions of healthy persons at two successive and independent cross-sectional health surveys are combined with mortality information to generate a set of transition probabilities 33 . This indirect estimation of health expectancy relies on a multistate approach, but uses widely available data. This method has been proven suitable to estimate healthy LE in contexts where nationally representative longitudinal health studies are limited, precluding HLY estimates for the population as a whole 34 . This study focuses on the specific characteristics of the Sullivan's method, because it is the one used to estimate both the HLY and HALE indicators.

The Sullivan's method and its use to estimate the HLY and HALE indicators
The Sullivan's method partitions the total number of person-years lived from the life table into disability and DFLE based on the proportion of the population disabled at each age, as shown in Figure 1 2 . Therefore, this method can use period health data from surveys and period life tables to derive its estimate. Due to its simplicity and parsimony, as well as to its ease of interpretation, the Sullivan's method has been used to estimate DFLE in many populations and according to different definitions of disability 28,35 , racial and regional disparities 36,37 , educational levels 25 , gender 38 , and time 39 , to mention a few. Additionally, it is the approach used by many of the international health organizations, governments, and research groups, including the WHO, the U.S. National Center of Health Statistics (CDC-NCHS), Eurostat, and the GBD study, performed by the Institute for Health Metrics and Evaluation (IHME) 19,40 .

The dichotomous health measure, HLY
Eurostat's HLY is a composite indicator that combines mortality data with health status data based on the GALI foreseen in the annual EU-SILC survey question: "For at least the past 6 months, to what extent have you been limited because of a health problem in activities people usually do?". The GALI has been thoroughly tested, and its robustness has been rigorously assessed for the European context 7,9,10,30 .
Consider that $L_x$ is the number of years lived between ages $x$ and $x+5$ from the life table, and that $prev_x$ is the prevalence rate of a given health dimension retrieved from a particular survey. Then the number of years lived in state $H$ between ages $x$ and $x+5$ is given by:
$$L_x^{H} = L_x \cdot prev_x$$
To estimate the health expectancy for the given health state $H$, DFLE should be subtracted from the total life expectancy. This procedure can be done for any health dimension considered. The REVES network estimated this indicator for chronic morbidity, self-reported health, and activity limitation. After thorough analyses, the network concluded that activity limitation was the most appropriate health dimension to capture overall health conditions. This particular healthy life expectancy is the HLY indicator, which is used by Eurostat 7,9-13,30.
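The calculation described here can be sketched as follows. This is a minimal illustration of the standard Sullivan calculation, not the exact routine used by Eurostat: the function name `sullivan`, the survivors at the starting age (`l_x`, a life-table quantity not defined in the text above), and all input values are assumptions introduced for the example.

```python
def sullivan(l_x, person_years, prevalence):
    """Sullivan-type partition of life expectancy at the first age of the table.

    l_x:          survivors at the starting age (assumed life-table input)
    person_years: years lived in each age interval (the L values)
    prevalence:   prevalence of state H in each age interval
    Returns (total LE, DFLE, expectancy in state H).
    """
    total_le = sum(person_years) / l_x
    # Years lived in state H in each interval: L * prev, summed over ages.
    years_in_h = sum(L * p for L, p in zip(person_years, prevalence)) / l_x
    dfle = total_le - years_in_h
    return total_le, dfle, years_in_h


# Hypothetical abridged life table from some starting age, 5-year intervals.
l_start = 100_000
L = [400_000, 300_000, 200_000, 100_000]
prev = [0.1, 0.2, 0.3, 0.4]  # hypothetical prevalence of activity limitation

le, dfle, he_h = sullivan(l_start, L, prev)
```

By construction the expectancy in state H equals total LE minus DFLE, which matches the subtraction described in the text.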

The continuous or polychotomous health measure, HALE
The HALE indicators can be estimated based on Equation 1, but the number of years lived in state $H$ is replaced by a weighted sum of the years lived across all health states defined. The sum of the prevalence rates across all states 0 to $S$ is 1. On a scale from 0 to 1, where 1 is equal to good health, there will be several weights measured on this scale. Consider the weights $w_0, w_1, w_2, \dots, w_S$ and the prevalence rates of the corresponding health states. The difference between Equation 1 and Equation 2 is that the numerator of Equation 2 is a weighted sum over all health states ($S$) defined. In graphical terms, this means taking the health states as described in panel b of Figure 1 and attributing a specific severity-weighted prevalence to each of them. Considering the 4 states schematically drawn in Figure 1, this makes HALE additively decomposable into $HALE = w_{I} HE_{I} + w_{II} HE_{II} + w_{III} HE_{III} + w_{IV} HE_{IV}$, where $HE$ are the health expectancies. The issue that is widely debated in this approach is how the $w_s$ severity-weighted prevalence rates of disability are estimated 41. These values are derived from the years of life lived with disability (YLD) estimated by the GBD study. The YLD is a component of the DALY measure.

An empirical application

HLY: the Eurostat indicator
Figure 2 shows how EU countries are ranked according to total LE at age 50 (left) and proportion of total LE spent in HLY (right). The ranking is based on estimates for males. It is evident that women live longer than men in all countries, with contrasting differences, but also that they spend a higher proportion of their total LE in poorer health in all countries. In the literature, this phenomenon is reported as the male-female health-survival paradox 42, and has been explored in the European context through both the HLY indicator and other health dimensions 12,13,43,44.

HALE: the WHO and GBD indicator
Regarding the HALE indicator in Figure 3, the scenario of the gender paradox is not universal across all countries as it is for the HLY indicator, as shown by the overlap or even reversal in the proportion of total LE spent in equivalent good years by gender (right side of the figure). The HLY and HALE indicators are not directly comparable, but when the proportion within the same indicator is computed, one would expect the pattern observed in Figure 2 to reappear in Figure 3. It is noteworthy that, in Figure 3, the countries are ranked according to female total life expectancy, but that should not change the pattern observed.
The magnitude of the differences in Figure 2 and Figure 3 is shown in Table 1. For all European countries considered, SILC sex ratios indicate a considerably better scenario for males in terms of their total LE spent in good health relative to their female counterparts. Portuguese and Romanian males expect to spend 30% more healthy life years compared with females at age 50. On the other hand, HALE sex ratios are not always indicative of male advantage regarding equivalent years spent in good health, and the sex ratios hover around 1, with Eastern European countries (Croatia, Czech Republic, Poland, Slovakia, and Slovenia) presenting ratios a little below 1. Figure 4 shows HALE vs. HLY boxplots by sex and different ages for European countries in absolute years. The scattered dots around the boxplot are the countries. The HALE indicator from the GBD study provides a higher level of health expectancy compared to the HLY indicator at all ages and for both sexes considered. In addition, the distribution of its estimates is less dispersed across countries, with HLY being the only indicator to present outliers. The named countries represent the higher end of the distribution (i.e., the best performers in terms of health).

Eurostat vs. GBD: What do they measure?
The t-tests indicate that the absolute differences observed between the means of the two distributions are significant, both for the pairwise and global comparisons. The forerunner countries are the same for females at every age, but differ between indicators, with France and Sweden estimated as the healthiest countries by the HALE and HLY indicators, respectively. For males, there are differences for both the ages and indicators considered. The HALE indicator estimates Swiss men to be the healthiest at ages 50 and 60, while French men take the lead at age 70. The HLY indicator estimates Swedish males to be the forerunners at age 50, but their Scandinavian counterparts from Norway lead at ages 60 and 70. Figure 5 shows how country rankings differ in terms of proportion of healthy life by sex at age 50 considering the SILC source (HLY) or the GBD source (HALE). Although the pattern of difference in rankings is similar for both sexes, ranking differences by indicator are substantial.
Since it is not possible to directly compare magnitude differences in the HLY and HALE indicators, sex ratios of both indicators, proportion of healthy life, and z-scores are used to show the extent to which each indicator establishes a relationship between mortality and health. Panel a of Figure 6 shows the sex ratio distribution for both the HALE and HLY indicators in terms of proportion of total life expectancy lived in good or equivalent healthy years. The most striking aspect is how smooth HALE is compared with HLY. The ratio across European countries barely deviates from 1 for the HALE indicator, while the HLY indicator is much more sensitive and well above 1 for all countries, indicating the health advantage that men experience compared with women at age 50. Panel b shows a different facet of the same aspect highlighted in panel a, but now the distributions are shown, with the two endpoints being males at the top end and females at the bottom end of the boxplot. This highlights the variance between sexes within the countries for each indicator. The GBD study indicator presents lower variance between sexes, with the gap between women and men barely existing. In addition, the fact that the proportion of healthy life according to the GBD study indicator is much higher than that of the EU-SILC indicator suggests that the GBD study indicator is much more closely correlated with the mortality dimension. Since the GBD study weighting scheme accounts for the levels of disease severity, it is possible that its health indicator mirrors mortality. The fact that higher levels of severity are correlated with a higher probability of dying suggests that the HALE indicator probably reflects the mortality aspect more than the health aspect. To briefly assess the latter aspect, z-scores were computed for each indicator and the results were expressed in terms of lower and higher sex ratio. A z-score of zero indicates that the data point equals the mean. A z-score of one indicates that the point is one standard deviation above the mean, and when data points are below the mean, the z-score is negative. First, there is a striking difference between the countries that deviate positively and negatively between the two indicators. Second, the HALE measure has a higher proportion of z-scores that deviate positively from the mean of countries, so that the scenario is more optimistic for the selected European countries. In contrast, the countries that deviate negatively do so in a greater magnitude. The opposite is observed for the HLY indicator. The distribution presents a more balanced pattern of both positive and negative deviations, with positive deviations being of larger magnitude.

DISCUSSION
This study reviewed the main concepts and approaches used to estimate health expectancy and focused on two widely used health indicators to stress the underlying differences between them and the impact they have on policy making and health research. The HALE indicator estimates were smoother across European countries than those of the HLY indicator, and presented a narrower gender gap in morbidity and higher z-scores compared with the average distribution across Europe. These results matter for understanding the relationship between health and mortality. One of the most important conceptual frameworks in health and mortality is whether increases in longevity imply a compression 45, expansion 46, or dynamic equilibrium of morbidity 47. Health expectancy indicators are attempts to assess whether the life years gained are accompanied by an increase or a decrease in healthy life years. Despite the absence of a time series analysis, the HALE indicator estimates indicate that morbidity is more compressed for both sexes, since the proportion of total LE lived in a healthy state is high. In contrast, the HLY indicator suggests a greater variance in the relationship between morbidity and LE, with a more compressed pattern of morbidity for males and an expanded pattern for females. It also shows more influence of cross-national characteristics. This has important policy implications both for decision makers, as they rely on the performance of the best standing countries to set targets, and for health researchers who aim at assessing how mortality correlates with morbidity.
The sources of those differences are not the aim of this study, but previous research has shown that the health-valuation approach employed by the GBD study lacks parsimony and is often too complex and unclear 41,48. In addition, that approach incorporates more than 135 disease and injury categories and different disease stages, severity levels, and sequelae. Many different data sources are used to calculate it, and an iterative process, combined with approximate Bayesian computation, is used in estimating its indicator, with heavy modelling of data 16. In the case of the HLY indicator, the source of information is mainly the SILC survey and the GALI; thus the sensitivity of this indicator is more easily assessed than that of the HALE indicator. On the other hand, the HALE indicator provides a health expectancy indicator for over 191 countries worldwide, which has aided researchers in assessing health from a global perspective, whereas the HLY indicator is restricted to European countries.
Lastly, there is the issue that both indicators are estimated by the Sullivan's method. Some authors contend that these indicators are not purely cross-sectional because the prevalence rates are cumulative, and hence partly dependent on the earlier health conditions of each age cohort. The prevalence of disability is a stock variable that depends on the past, while incidence of disability is a flow variable 27,49 . Because of this mismatch between stock and flow of health variables, when sudden changes in population health occur, the Sullivan's approach cannot detect them appropriately, nor monitor the resultant change 27,50 . However, other studies have shown that in cases where changes in transition rates are stable and smooth, the Sullivan's method provides acceptable results for estimating trends in health expectancy 27,31 .
In 2014, because of the emergence of a plethora of healthy LE indicators that followed different guidelines and concepts of health, the WHO assembled a working group, the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER), which was aimed at promoting good practice in defining and reporting health estimates. These guidelines were first published in 2016 in the Lancet and PLOS Medicine journals. The effort was to have more consistent health results by incorporating a list of items that should be reported whenever health estimates are published 51. However, despite following these protocols, health expectancy indicators still yield very different results, and this needs to be accounted for when measuring the health status of a population and drawing conclusions from it. The European Innovation Partnership on Active and Healthy Ageing (EIP-AHA) was launched in 2011 with the aim of increasing healthy life years by 2 years by 2020 and promoting healthy and active ageing within the EU 52. The partnership uses the HLY indicator to monitor its goals and establish the parameter of its guidelines. Countries like New Zealand, on the other hand, do not have a clear target, but use the GBD study indicators to outline major causes of health loss in the country and advise policy makers and other stakeholders 53. However, the profiles of health that indicators such as the HLY and HALE provide are very different, and actually suggest alternative health environments, including when stratifying by age and sex. Both the HLY and HALE indicators present limitations, since no health indicator can fully encompass all health dimensions in their complexity. Because its severity weights make the HALE indicator correlate more closely with mortality, the outputs of the HALE indicator are actually mirroring mortality patterns. This does not mean that the HALE pattern is not useful for setting health policies, but it suggests that the policies it would be more successful in establishing and monitoring are "curative" and associated with more lethal conditions. The HLY indicator, on the other hand, suggests a more "preventive" outlook, as this type of indicator reflects the level of debilitating conditions of a population regardless of their degree of lethality. In addition, due to its global availability, the HALE indicator enables cross-national comparisons and assessment of the performance of countries in terms of health, even if this relies on heavy modelling, making it particularly interesting for global overviews. However, for setting more local, preventive care targets, indicators that are more sensitive to health, such as the HLY indicator, seem more suitable.