Differences between self-reported and laboratory measures of diabetes, chronic kidney disease, and hypercholesterolemia

This paper aims to compare the prevalence of self-reported diagnoses with the prevalence measured by laboratory tests, and to assess false positives and false negatives, for diabetes, chronic kidney disease, and hypercholesterolemia. We used information from the interview and laboratory tests of the National Health Survey (2013, 2014-2015). Sensitivity and specificity were calculated by gender, age, schooling, having health insurance, and time since the last medical visit. We used logistic regression to analyze factors associated with false positives and negatives. Sensitivity was higher for diabetes, among older adults, and among those who had a more recent medical visit. Specificity was high for all diseases, with better performance among younger people, those with high schooling, and those whose last visit was more than one year earlier. The likelihood of false positives and negatives decreased with schooling and increased with age. Low sensitivity suggests that prevalence might be higher than indicated by self-reported measures.


Introduction
The self-reported health measure is often used as an alternative to surveys that, in general, require a more complicated data collection process and entail higher costs 1 . However, Moreira et al. 1 emphasize that self-reported prevalence (often found in household surveys) may be biased. Respondents may classify themselves as ill and not carry the disease (false positive) or not report the disease and be diagnosed with the disease (false negative).
Several factors may explain why individuals lack understanding of their own health situation. For example, Johnston et al. 2 highlight that individuals may not recognize their condition. They may have provided incorrect information to doctors, forgotten or misinterpreted medical advice, or even received incorrect information from professionals. The same authors point out that this lack of understanding can vary by socioeconomic level. Thus, the analysis of self-reported health conditions will probably be influenced by factors such as income and education 3 .
According to Velakkal et al. 4 , epidemiological studies on chronic noncommunicable diseases (NCDs) in developing countries tend to show lower prevalence in groups with lower socioeconomic status, mainly due to the difficulty of accessing health services. In India, the authors compared the self-reported diagnosis with the diagnosis based on standard tests (structured questionnaires for identifying some diseases) for some NCDs and identified important differences. Self-reported diagnoses were more prevalent in groups with higher income and education. However, the differences were not maintained when the standard test was used, which, in the authors' opinion, could be associated with the difficulty that groups in worse socioeconomic situations have in accessing health services, even if they can identify their poorer health condition.
In Brazil, considering the population over 18, the prevalence of self-reported diabetes mellitus (DM) was higher than that diagnosed from laboratory tests, according to PNS data (2014-2015) 5 . The difference between the two DM prevalence measures was larger for the group aged 60 and over, in which the self-reported measure was more prevalent. In the 35-44 years age group, laboratory diagnoses showed a higher prevalence than self-reports.
Regarding chronic kidney disease (CKD), laboratory estimates (PNS 2014-2015) show that the number of individuals with the disease is approximately four times higher than the self-reported number 6 . Concerning cholesterol, the situation is very similar to that observed with CKD. The prevalence measured by laboratory tests is considerably higher than that indicated by the self-reported diagnoses 7 . Diabetes, high cholesterol, and CKD have different symptomatic and clinical manifestations, which implies different probabilities of their identification by individuals [8][9][10] . The pathological profile influences sociodemographic characteristics, access to health services, and how people are aware of their health [8][9][10] . However, the joint analysis, for different diseases, of the differences between self-reported and laboratory-measured diagnoses can help us better measure the quality of self-reported health information and the differences between the outcomes. The analysis stratified by different sociodemographic variables can also be of great importance for the planning of public health policies by pointing out groups that may be more vulnerable. The National Health Survey (PNS) contains a series of biological markers that make the proposal feasible.
In this context, this study aims to compare the prevalence of the self-reported diagnosis (collected in the first stage of the National Health Survey (PNS), in 2013) with that measured through laboratory tests in the second stage of the research, in the 2014-2015 period, based on sensitivity and specificity, for diabetes, chronic kidney disease, and high cholesterol. These measures will be analyzed considering sociodemographic characteristics and access to health services. We also intend to analyze how these characteristics influence the probability of false-positive and false-negative diagnoses.

Methods
This is a cross-sectional study using information from the two stages of the National Health Survey (PNS). The self-reported data were constructed from the answers to questions on whether any doctor had ever diagnosed the respondent with each of the diseases of interest. A specific question for each of the outcomes is included in the PNS chronic disease module collected in 2013, namely, Diabetes (Q030), Cholesterol (Q060), and Chronic Kidney Disease (Q124). These outcomes were compared with laboratory tests of Glycosylated Hemoglobin (GH), Creatinine, Glomerular Filtration Rate (GFR), Total and Fractional Cholesterol (LDL, HDL, and TC), performed on the same individuals between 2014 and 2015.
The base with PNS laboratory exams consists of 8,952 observations, from a subsample of the PNS. The construction of the PNS laboratory base can be better understood in other studies 5,[11][12][13] . The number of tests analyzed here ranged from 7,211 to 8,528 due to losses in the laboratory sample, processing of analyses, insufficient material, and others. The sample considered for each of the outcomes, and each of the different metrics had observations with self-reported and laboratory information. That is, cases with missing values in any of these variables were disregarded.
The laboratory diagnosis of diabetes was based on glycosylated hemoglobin (HgA1), and was assigned to individuals with HgA1 ≥ 6.5%. This cutoff point was used in a recently published work with the same database 5 . It should be noted that cases where individuals had HgA1 < 6.5% (which would represent the absence of diabetes) but declared having used medication to lower sugar or insulin in the two weeks before the research (Q03401 or Q03402) were excluded from the analysis on diabetes. Keeping these cases in the base could artificially inflate false-positive cases and, as a result, bias specificity downward. The same exercise could not be performed for the other outcomes because the necessary information was not available.
Two different metrics were used for Chronic Kidney Disease (CKD): the glomerular filtration rate (GFR) and creatinine (CR). The definition of cutoff points followed the work of Malta et al. 6 , with the diagnosis of CKD attributed to men with CR ≥ 1.3 and to women with CR ≥ 1.1. Based on the GFR, all those with values below 60 mL/min/1.73 m² were considered to have CKD. The calculation of the GFR in the database was based on the equation of the MDRD study without adjustment for ethnicity and skin color 14 . Three different metrics and cutoff points were employed to define the diagnosis of altered cholesterol: total cholesterol ≥ 200 mg/dL; low-density lipoproteins (LDL) ≥ 130 mg/dL; and high-density lipoproteins (HDL) < 40 mg/dL for men and < 50 mg/dL for women. It should be noted that the question used for self-reported identification refers to high (and not altered) cholesterol, which could lead to some confusion concerning HDL, for which the altered state is a low value. The definition of limits was based on a study with the same database 7 and a Brazilian guideline on dyslipidemia 15 .
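The cutoff points described above can be summarized in a short sketch. The record layout, field names, and sex coding below are hypothetical and chosen only for illustration; they are not the PNS variable names, and creatinine is assumed to be expressed in the same unit as the cutoffs cited in the text.

```python
from dataclasses import dataclass


@dataclass
class LabResult:
    """Hypothetical record of one participant's laboratory values."""
    sex: str           # "M" or "F" (illustrative coding, not the PNS coding)
    hba1c: float       # glycosylated hemoglobin, %
    creatinine: float  # same unit as the cutoffs cited in the text
    gfr: float         # mL/min/1.73 m^2
    total_chol: float  # mg/dL
    ldl: float         # mg/dL
    hdl: float         # mg/dL


def lab_diagnoses(r: LabResult) -> dict:
    """Apply the cutoffs stated in the Methods to one record."""
    male = r.sex == "M"
    return {
        "diabetes_hba1c": r.hba1c >= 6.5,
        "ckd_creatinine": r.creatinine >= (1.3 if male else 1.1),
        "ckd_gfr": r.gfr < 60,
        "chol_total": r.total_chol >= 200,
        "chol_ldl": r.ldl >= 130,
        # For HDL the altered state is a LOW value, unlike TC and LDL.
        "chol_hdl": r.hdl < (40 if male else 50),
    }
```

Each outcome is kept as a separate flag because the analyses treat every metric (HgA1, CR, GFR, TC, LDL, HDL) as its own laboratory reference.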
The first strategy for analyzing the differences between laboratory-measured and self-reported diagnoses was the assessment of sensitivity and specificity for each of the outcomes of interest, considering the set of sociodemographic variables: gender, age, schooling, whether or not they have health insurance, and the time elapsed since the last medical appointment. Sensitivity indicates the proportion of individuals who reported having one of the diseases among those for whom the laboratory test indicated the disease's presence. In turn, specificity is the proportion of individuals who responded that they did not have the disease among those for whom the test confirmed its absence 16 ; that is, it indicates the proportion of laboratory-negative cases in which the self-reported diagnosis was also negative. In the analysis, the laboratory base's outcomes were treated as the gold standard. The sensitivity and specificity of each of the categories of variables of interest for each outcome and metric were calculated separately, and confidence intervals (95% CI) were generated for each subgroup.
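As an illustration of the definitions above, the following minimal sketch computes sensitivity and specificity from paired self-reported and laboratory indicators. The text does not state how the 95% CIs were obtained; the Wilson score interval used here is an assumption, and the sketch ignores survey weights and the complex sample design. Function names are hypothetical.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (an assumed choice of method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def sensitivity_specificity(self_report, lab):
    """Treat the laboratory result as the gold standard.

    self_report, lab: equal-length sequences of truthy/falsy values.
    Requires at least one lab-positive and one lab-negative case.
    """
    pairs = [(bool(s), bool(l)) for s, l in zip(self_report, lab)]
    tp = sum(s and l for s, l in pairs)              # reported, lab-positive
    fn = sum((not s) and l for s, l in pairs)        # not reported, lab-positive
    tn = sum((not s) and (not l) for s, l in pairs)  # not reported, lab-negative
    fp = sum(s and (not l) for s, l in pairs)        # reported, lab-negative
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_ci(tn, tn + fp),
    }
```

Running the function once per subgroup (for example, per age group) would mirror the stratified presentation described in the text.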
Then, logistic regression models were developed for two dependent variables (for each outcome), false-positive and false-negative cases. That is, for diabetes, CKD, and cholesterol, the false-positive variable was assigned a value of "1" in cases where individuals declared the presence of the disease, but the information was not corroborated by the laboratory test, and zero for the other cases. The value of "1" in the false-negative variable was assigned to individuals whose laboratory diagnosis indicated the presence of the disease, not confirmed by the respondents, and a value of zero was attributed in the other cases. The same sociodemographic and access to health services variables described in the sensitivity and specificity analysis were used as explanatory variables. The model was developed for the three outcomes considering each of the metrics used (HgA1, GFR, CR, TC, LDL, and HDL).
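The coding of the two dependent variables can be sketched as follows. The function names are hypothetical. The second helper computes a crude (unadjusted) odds ratio from group counts, included only to show what an odds ratio compares; it is not the adjusted logistic regression estimated in the study.

```python
def fp_fn_indicators(self_report, lab_positive):
    """Return (false_positive, false_negative) as 0/1 values for one person."""
    fp = int(bool(self_report) and not lab_positive)  # reported, not confirmed
    fn = int(bool(lab_positive) and not self_report)  # lab-positive, not reported
    return fp, fn


def crude_odds_ratio(events_group, n_group, events_ref, n_ref):
    """Unadjusted odds ratio of an event in a group versus a reference group.

    Illustrative only: a logistic regression with several covariates
    yields adjusted odds ratios, which generally differ from this value.
    """
    odds_group = events_group / (n_group - events_group)
    odds_ref = events_ref / (n_ref - events_ref)
    return odds_group / odds_ref
```

Under this coding, true positives and true negatives both receive zero in each indicator, so each model contrasts one type of misreport against all other cases.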
The 2013 PNS was approved by the National Research Ethics Committee (CONEP) of the National Health Council, Ministry of Health. The research participants signed an informed consent form (ICF), and subsequently, collections of peripheral blood were performed at any time of the day. All estimates and analyses were made using Data Analysis and Statistical Software (Stata), version 2014.

Diabetes
The prevalence of diabetes measured based on glycosylated hemoglobin (HgA1) in the sample analyzed here was 7.4% (95% CI, 6.7%-8.2%), while the self-reported information points to a prevalence of 5.5% (95% CI, 4.9%-6.3%). The difference between the prevalence levels was statistically significant. Table 1 shows that 59% (95% CI, 53.8%-63.4%) of individuals with the laboratory diagnosis of diabetes reported having the disease.
Sensitivity for diabetes was practically the same for men and women, at 58.9% (95% CI, 50.5%-66.9%) and 59% (95% CI, 52.4%-65.2%), respectively. Considering the age groups, the sensitivity for diabetes was lower in the younger group (less than 50 years), and the difference between this group and the others (50-59 and 60 and over) was statistically significant. Schooling levels did not show significant differences in diabetes-related sensitivity.
Among individuals with a laboratory diagnosis of diabetes, those who had visited the doctor more than a year earlier reported the disease less often (26.8%; 95% CI, 17.0%-39.5%) than the groups whose medical visit occurred more recently (both above 60% sensitivity). Having a health insurance plan was not associated with a significant difference in sensitivity for diabetes.
Table 2 shows the specificity of the self-reported measures relative to the laboratory tests for all outcomes. Concerning diabetes, 98.8% (95% CI, 98.4%-99.1%) of individuals with glycosylated hemoglobin below 6.5% reported not having the disease. In general, the specificity for diabetes was high. Among the variables analyzed here, only age showed a significant difference between the subgroups. Specificity for people under 50 years (99.3%; 95% CI, 98.9%-99.6%) was higher than among those aged 60 and over (97%).
Tables 3 and 4 show, for each outcome, the odds ratios estimated by logistic regression models for two dependent variables: false-negative (FN) and false-positive (FP) cases, respectively. Considering diabetes, the likelihood of false negatives increased significantly with age. Compared with the group under 50, both individuals aged 50-59 years (OR = 2) and those aged 60 and over (OR = 3.18) were more likely to be FNs. Regarding schooling, the model shows that individuals with access to higher education were 51% less likely (OR = 0.49) to record a false negative than those in the illiterate or incomplete elementary group. The other variables (gender, last visit, and health insurance plan) did not show statistical significance in the model estimated to analyze FNs for diabetes. Table 4 indicates that being over 60 significantly increased (183%, OR = 2.83) the likelihood of a false-positive occurrence for diabetes compared to the group under 50. The other variables were not statistically significant.
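The percentages quoted alongside the odds ratios follow from a simple conversion, sketched below with a hypothetical helper name. Strictly, the result is the relative change in the odds of the event, which approximates a change in probability only when the event is rare.

```python
def or_to_percent_change(odds_ratio):
    """Relative change in the odds, in percent, implied by an odds ratio."""
    return round((odds_ratio - 1) * 100, 1)


# Values reported in the text for diabetes:
higher_education_fn = or_to_percent_change(0.49)  # 51% lower odds of FN
over_60_fp = or_to_percent_change(2.83)           # 183% higher odds of FP
```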

Chronic Kidney Disease
The prevalence of chronic kidney disease was 5% (95% CI, 4.5%-5.6%) based on creatinine and 6.6% (95% CI, 6%-7.4%) based on the Glomerular Filtration Rate. In turn, the self-reported prevalence of CKD was 1.4% in both samples used to analyze outcomes. The sensitivity of the self-reported measure was low for both GFR and creatinine, at 4.4% (95% CI, 2.7%-7%) and 3.2% (95% CI, 2.1%-5.1%), respectively. That is, only a small portion of individuals with the laboratory diagnosis of CKD reported the presence of the disease. Considering both metrics, the sensitivity analysis of chronic kidney disease (Table 1) did not show significant differences between the analyzed variables' subgroups. However, the low levels of sensitivity observed for all subgroups are noteworthy.
As for specificity, Table 2 shows high values (98.7%; 95% CI, 98.3%-99%) for both metrics (Creatinine and GFR). Together with diabetes, this was the highest specificity observed among the outcomes. As in the sensitivity analysis, no significant differences were identified between the subgroups of sociodemographic variables and access to health services.
Table 3 shows that the likelihood of FN was higher for the group over 60 (OR = 3.63) compared to the group under 50, considering creatinine as the laboratory measure. In the same model, having a health insurance plan increased the likelihood of FN for CKD by 40% (OR = 1.4). The analysis of the occurrence of FN for CKD when GFR was the metric used points to a higher likelihood of FN for women (OR = 1.65). Individuals from both the 50-59 years group (OR = 3.35) and the 60 years and over group (OR = 6.95) were more likely to declare that they were not sick when the GFR indicated otherwise. The same model indicated a lower likelihood of false negatives (OR = 0.73) for those with a medical visit more than a year earlier, compared to the group whose visit took place less than three months earlier.
Table 4 shows statistically significant differences in the likelihood of false positives only for the 50-59 years group compared to those under 50, and only in the creatinine-based model. None of the explanatory variables showed statistical significance in the GFR-based analysis.

Cholesterol
The sensitivity (Table 1) related to cholesterol was considerably lower than that observed in the analysis of diabetes. The sensitivity was highest for LDL, at 27.4% (95% CI, 24.6%-30.3%), followed by total cholesterol (24.2%; 95% CI, 22.2%-26.3%) and HDL (16.5%; 95% CI, 15.1%-17.2%). TC and HDL showed a higher sensitivity for women. In the three cholesterol measurements, the sensitivity of the group under 50 was at least half of that of those aged 60 and over. No statistically significant difference was observed among the schooling subgroups in any of the three cholesterol metrics. Table 1 indicates that the time elapsed since the last visit influences the sensitivity for practically all outcomes. The same was observed in analyses based on TC, LDL, and HDL, particularly when comparing the group that had visited a doctor more than one year earlier with the group that had visited a doctor in the three months before the interview.
Specificity was lower for the three cholesterol metrics than for diabetes and CKD, ranging from 85.7% (95% CI, 84.2%-87.1%) to 89.1% (Table 2). Specificity measured by HDL and LDL was higher for men. Table 2 shows that the age gradient of specificity has the opposite sign to that observed for sensitivity (Table 1). In other words, specificity decreased with age. For all outcomes, the younger population (less than 50) had a higher proportion of non-sick (laboratory-confirmed) individuals who reported not having the disease than the 60 years and over group. For example, in the case of HDL, the absolute difference in specificity between these groups was 21.6 percentage points. In turn, considering schooling, the differences were only statistically significant between illiterates or those with incomplete primary education and those with complete primary or secondary education. Specificity was higher for individuals who went to the doctor over a year earlier. Measured by LDL, specificity was higher for those who did not have health insurance.
The likelihood of false negatives (Table 3) was higher for women when measured by HDL (OR = 1.39). Compared with the younger group (under 50 years), the likelihood of FN almost always increased with age. The likelihood of false negatives decreased with age only for HDL. Concerning time since the last visit, those whose last visit was one year or more earlier were more likely to record FNs than those who had gone to the doctor less than three months earlier: 31% for TC (OR = 1.31), 41% for LDL (OR = 1.41), and 21% for HDL (OR = 1.21). In turn, having a health insurance plan reduced the likelihood of FNs only for cholesterol measured by HDL (OR = 0.75). Table 4 shows that women were 32% more likely than men (OR = 1.32) to declare having been diagnosed with high cholesterol while having laboratory-confirmed LDL within the expected range (FP). The likelihood of FP for women was also higher considering HDL (OR = 1.05). The likelihood of someone over 60 years of age reporting FP was greater for TC (OR = 2.98), LDL (OR = 3.15), and HDL (OR = 3.93).
Schooling level affected the likelihood of FP for few outcomes. For total cholesterol, someone with complete elementary or high school was 27% less likely to be an FP case than illiterates or those with incomplete elementary school. The longer the time elapsed since the last visit, the lower the likelihood of FP, especially for the group in which the last visit to the doctor occurred over a year earlier. Having a health insurance plan was associated with a higher likelihood of FPs.

Discussion
Considering the metrics used, the results indicate low agreement between self-reported and laboratory-measured diagnoses. Biological markers can inform about health conditions before the individual's perception of symptoms 3 . Crimmins and Vasunilashorn 17 highlight that the availability of biomarkers allows researchers to know the health situation and the effectiveness of services simultaneously. The difference between self-reported and laboratory-measured diagnoses, especially concerning sensitivity, indicates that a substantial proportion of the population may be ill and unaware of this condition.
PNS laboratory data show that chronic kidney disease estimates were up to four times higher than self-reported data 7 . The results showed that sensitivity was higher in diabetes, where about 60% of those who tested positive reported having the disease. Concerning cholesterol, this measure was lower (close to a quarter), and it was less than 5% for CKD. Therefore, the low sensitivity identified here may reflect underdiagnosis in the country, especially of CKD. However, the comparison of agreement measures between different diseases should be performed with caution. The accuracy of self-reported health measurements is related to socioeconomic and demographic characteristics, the presence or absence of morbidities, and the pathology's features [8][9][10] . The perception of symptoms is one of the factors related to disease recognition [8][9][10] .
Sensitivity was higher for diabetes than for the other diseases. To a large extent, this result may be related to the manifestation of the disease's typical symptoms, not observed in cholesterol and CKD. Okura et al. 10 note that diabetes is not an aggressive disease at the onset of its manifestation, but requires many interactions with the health system for its control, which increases the likelihood of individuals recognizing the disease. This may be one reason for the higher sensitivity observed for diabetes compared to CKD and high cholesterol. While high cholesterol remains asymptomatic for a long time, cholesterol tests are more commonly requested in routine examinations than tests for CKD. The clinical manifestation of the condition increases the likelihood of individuals reporting it 9 .
In this sense, Harris and Schorpp 18 highlight that, unlike self-reported measures, biometric markers offer objective health measures and can point out risks in cases where symptoms do not manifest early and consistently. This reinforces the importance of research, such as the PNS, for planning health actions at the national level.
Specificity, or the proportion of individuals who responded that they did not have the disease among those for whom the test confirmed its absence, was high for all measures (diabetes, CKD, and cholesterol), although lower for cholesterol. Higher specificity was found in younger individuals, in those with a high schooling level, and in those who had visited a doctor over a year earlier. However, these same groups have a lower prevalence of the diseases analyzed [5][6][7] .
The current study confirms the literature results that point to low sensitivity and high specificity (96%) between measured and self-reported measures 10,[19][20][21][22] . Analyzing diabetes and hypertension, Ning et al. 16 observed a higher sensitivity for the older population. A cohort study on older adults in Bambuí also identified a higher sensitivity for diabetes among older adults who visited the doctor most recently and among those with some schooling 19,23 . In general, sensitivity was higher among older people.
In most of the outcomes and metrics considered, the group that had visited a doctor less than three months earlier had a higher sensitivity than those with a last visit in a more extended period. Greater access to health services, with a higher likelihood of having the disease diagnosed, and the ability to remember medical guidelines and change attitudes can explain this outcome 19,23 .
Other studies also point out that the use of health services is the factor most associated with the validity of self-reported morbidity 17,19,[24][25][26] .
As for education, studies point in different directions. In China, Wu et al. 20 identified a greater ability to correctly report the disease among those with higher education and a higher number of chronic diseases. In turn, Goldman et al. 22 associate the quality of self-reported information with better cognitive function. Another study in China on diabetes showed that the group with primary education had a higher sensitivity than the group with secondary education 16 . In the Dutch context, Molenaar et al. 21 observed a higher sensitivity in the intermediate level of schooling than among those with higher education. The PNS showed greater sensitivity in the groups with lower education (illiterate or incomplete elementary school), especially when compared to the intermediate level (complete elementary or complete high school). In general, the group that reached higher education showed higher sensitivity than the intermediate level but lower than the first group.
However, it should be noted that this indicator is very likely to suffer from the effect of age composition. Lower schooling levels are more common in the older population, the same group that has higher sensitivity. In this sense, the results are influenced by the age structure of the education groups. Considering only the population aged 60 and over with diabetes, to illustrate the argument, sensitivity shows the expected gradient, 74.7% (Higher Education and over), 65.8% (Elementary or High School), and 62.7% (Illiterate or Incomplete Elementary). In turn, in general terms, specificity grows with the schooling level, agreeing with other studies 16,21 . Concerning the composition effect on specificity, we can also understand that the age groups with the highest schooling level are, in general, younger, the same group with the highest specificity 27 .
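The composition argument above can be made explicit with a short decomposition. The notation is introduced here for illustration only and does not appear in the original analysis: $g$ indexes a schooling group, $a$ an age group, $P_{a,g}$ is the number of laboratory-positive individuals in age group $a$ within schooling group $g$, and $TP_{a,g}$ is the number of them who reported the disease.

```latex
\[
Se_g \;=\; \frac{\sum_a TP_{a,g}}{\sum_a P_{a,g}}
      \;=\; \sum_a \underbrace{\frac{P_{a,g}}{\sum_{a'} P_{a',g}}}_{w_{a \mid g}}
            \cdot \underbrace{\frac{TP_{a,g}}{P_{a,g}}}_{Se_{a,g}}
      \;=\; \sum_a w_{a \mid g}\, Se_{a,g}
\]
```

The overall sensitivity of a schooling group is therefore a weighted average of its age-specific sensitivities, with weights given by the age distribution of its laboratory-positive members. A group with a larger share of older members (larger $w_{a \mid g}$ for the ages with high $Se_{a,g}$) can show a higher overall sensitivity even when, within each age group, its sensitivity is lower, which is why the comparison restricted to those aged 60 and over is informative.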
Education and access to health insurance plans can be considered income proxies. Dallas et al. 3 emphasize that, unlike self-reported health measurements, biological markers may reveal an association between disease and income before the illness becomes explicit. Harris and Schorpp 18 emphasize that identifying risks from biological markers allows planners to intervene in the social determinants of health risk before the disease manifests itself or the condition worsens.
The regression models reinforce the impression that age is a crucial factor in analyzing the agreement between measured and self-reported diagnoses. The likelihood of incorrectly declaring the diagnosis, for all outcomes, increased substantially with age for both false-positives (individuals without a laboratory diagnosis, but who self-reported the disease's presence) and for false-negatives (positive laboratory diagnosis, not endorsed by the self-reported information). This result is interesting in light of the higher sensitivity for the older age groups. While having proportionally greater knowledge about their health condition, older adults are also very likely to misreport their condition, regardless of the outcome analyzed.
This relationship can be identified in both directions in the literature. For example, Johnston et al. 2 observe a positive relationship between age and the occurrence of false negatives for chronic hypertension. In turn, Onur and Valmuri 28 observed a negative effect of age on the occurrence of FN for hypertension and pulmonary disease. It should be noted that these authors used models and specifications different from those used here. In general, the likelihood of FP or FN in the more educated groups did not differ significantly from that of the less educated group. In India, Onur and Valmuri 28 note that higher schooling levels reduce the propensity to record FN.
The declining likelihood of false-positives as the time since the last visit increases is probably related to patients' low ability to understand the diagnoses. Velakkal et al. 4 emphasize that one of the factors behind the discrepancy between self-reported measures and laboratory measures is often the inability to understand the diagnosis received. More vulnerable individuals (older, less educated) have lower levels of knowledge about their health.
It should be noted that false-positive cases (and specificity) can be artificially influenced by cases of individuals with the medication-controlled disease. In these cases, the self-reported diagnosis will be positive (correctly), but the laboratory measurement will point to another direction due to disease control. Likewise, some previous tests may have returned a positive result at another time in life and, even if the parameters are no longer altered, individuals may report the presence of the disease. Concerning diabetes, the database allowed us to exclude from the analysis cases in which diabetes could be medication-controlled. However, the same exercise could not be performed for the other outcomes. Another situation that must be considered is the possibility of PNS testing errors and potential errors in other previous tests. In general, health services usually repeat tests to clear doubts.
This work has significant limitations. The interval between the two data collections, on average two years between the first and second stages of the PNS, undoubtedly increases the likelihood of disagreement between the measurements due to possible changes in the individuals' health conditions. Another limitation, already mentioned, is that cases in which the diseases (cholesterol and CKD) may be medication-controlled could not be identified and excluded from the analyses. It is likely that people who declared they had received a medical diagnosis but whose condition was medication-controlled were counted as false positives.

Conclusion
The study pointed out the low sensitivity of self-reported morbidity to detect diseases, especially for chronic kidney disease and altered cholesterol. Self-reported measurements can underestimate diseases' prevalence, showing that surveys using only self-reported measurements should be analyzed with some caution. Thus, given the feasibility, laboratory components should be routinely used in population surveys to estimate the prevalence of diseases in the Brazilian population. This study's outcomes indicate that the use of health services was an essential determinant of the population's ability to correctly inform their health condition, contributing to proper disease monitoring and care.

Collaborations
PC Pinheiro, MBA Barros, CL Szwarcwald, IE Machado, DC Malta conceived and designed the study. PC Pinheiro developed the management, exploration, and analysis of the data, the elaboration and interpretation of the results, and the discussion. All authors critically reviewed the manuscript and contributed to the entire process. All authors read, contributed to, and approved the final manuscript.