Latent class analysis of COVID-19 symptoms in Brazil: results of the PNAD-COVID19 survey

The lack of mass testing for COVID-19 diagnosis creates the need to determine the magnitude of the disease based on its clinical symptoms. The study aimed to analyze the profile of COVID-19 symptoms and related aspects in Brazil. The author analyzed the sample of participants from the Brazilian National Household Sample Survey (PNAD-COVID19) conducted in May 2020. Latent class analysis (LCA) was performed with sociodemographic covariables and 11 symptoms reported by 346,181 individuals. Rao-Scott test and standardized residual analysis were used to measure the association with the pattern of health services use. Spatial scan analysis was performed to identify areas at risk of COVID-19 cases. LCA showed six classes of symptoms based on the pattern of answers by participants: (1) all the symptoms; (2) high prevalence of symptoms; (3) predominance of fever; (4) predominance of cough/sore throat; (5) mild symptoms with predominance of headache; and (6) absence of symptoms. Female sex, brown skin color, the North and Northeast regions of Brazil, and all three older age brackets showed stronger association with the class with all the symptoms (class 1). Most use of health services was also by this group of individuals, but with different profiles of use. Spatial analysis showed juxtaposition of this class with areas at greater risk of COVID-19. These findings underline the importance of investigating symptoms for the epidemiological identification of possible cases in a scenario with low population testing rates.

Keywords: COVID-19; Latent Class Analysis; Pandemics; Epidemiological Monitoring

Correspondence: R. S. Moreira, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz. Av. Professor Moraes Rego s/n, Cidade Universitária, Recife, PE 50670-420, Brasil. rafael.moreira@fiocruz.br

1 Instituto Aggeu Magalhães, Fundação Oswaldo Cruz, Recife, Brasil.
2 Faculdade de Medicina, Universidade Federal de Pernambuco, Recife, Brasil.
This article is published in Open Access under the Creative Commons Attribution license, which allows use, distribution, and reproduction in any medium, without restrictions, as long as the original work is correctly cited.


Introduction
The pandemic caused by the novel coronavirus and called COVID-19 (coronavirus disease, identified in 2019), first reported in December 2019 in Wuhan, China 1,2 , emerged in association with severe forms of pneumonia and high transmissibility. Patients generally present shortness of breath, dry cough, fever, headache, and dyspnea. Fatal cases usually involve progressive respiratory failure with severe lung damage 3,4,5 .
The first case in Brazil was identified on February 26, 2020, and the first death was reported on March 17, 2020. By August 5, 2020, just over five months since the first case, there had been a total of 2,859,073 cases and 97,256 deaths (Brazilian Ministry of Health. https://covid.saude.gov.br/, accessed on 05/Aug/2020). With models showing exponential progression, containment measures include testing more cases, enhanced hygiene, and social isolation 6 . Although there are cases of asymptomatic infection 7,8,9 , the demand for health services is generally accompanied by complaints of symptoms associated with the disease. The mean incubation period has been reported at 5.2 days 10 .
Meanwhile, the circulation of asymptomatic cases can mean an increase in the infection rate. Considering that the time span from onset of the disease to death ranges from 6 to 41 days, with a median of 14 days 11 , early identification of suspected cases is a crucial window of opportunity for better management of the epidemic, both pharmacological and nonpharmacological. Countries that have achieved success in controlling the epidemic have shown high population testing rates, follow-up of cases and contacts, and greater political and social engagement in the maintenance of social distancing measures. Mass testing also allows identifying infections that are still at the subclinical level 10,12,13 . Low and middle-income countries with difficulties in the implementation of mass testing must prioritize screening based on definition of clinical cases or presumptive diagnosis 14 .
Brazil presents a much lower testing rate than expected for adequate control of the epidemic, relaxation of social distancing measures due to economic pressures, and instabilities in the coordination of public health measures. The epidemic has thus surged at different paces on various regional geographic scales. This inequality mirrors Brazil's social inequity, and the pandemic struck a country already debilitated by low economic growth and a public sector (science, education, and health) weakened by cutbacks in public policy investments, resulting from the neoliberal fiscal austerity model, a reality shared by other countries of Latin America 15 .
In the absence of mass testing, knowledge of notified cases generally occurs when there is a search by health services, when these have the means for confirmation of suspected cases. Another form of notification occurs with the confirmation of underlying cause of death. In both situations, notification is limited to patients that appeared for treatment (and obtained access to it), characterizing a scenario prone to the collapse of health systems for failing to display a strategy for active epidemiological surveillance of suspected cases.
The Brazilian Institute of Geography and Statistics (IBGE) has systematically conducted a specific Brazilian National Household Sample Survey (PNAD) for COVID-19, called PNAD-COVID19 16 . Participants answer questions on self-reported symptoms related to COVID-19, the profile of health services use, and their labor market status. Given the low mass testing in Brazil, it is essential to determine the pattern of COVID-19 symptoms. The way this pattern appears in the population, plus scientific knowledge on associated factors, can enhance the support for management focused on patients with higher likelihood of positive confirmation of the disease. This should also foster rational and epidemiologically directed use of ICU beds and mechanical ventilators 17 , besides allowing better detection, diagnosis, and timely monitoring of suspected cases. Within this scope, the study aimed to elucidate the pattern of COVID-19 symptoms and the main associated socio-spatial factors.
Cad. Saúde Pública 2021; 37(1):e00238420

Methodology
A cross-sectional analysis was performed with data from PNAD-COVID19 in May 2020. In this sample, questions were collected on COVID-19 symptoms, the profile of health services use related to these symptoms, and the repercussions on the labor market. Details on the sampling plan are available from the IBGE 16 . The PNAD-COVID19 survey was based on the sample of households from the Permanent PNAD, first quarter of 2019. A linkage was performed for integration with other databases in order to obtain telephone numbers for each household. This allowed identifying a sample with at least one available telephone in each of 193,662 households. These data represent 92% of the basic sample, distributed across sets of some 48 thousand households per week. This sample is fixed, in the sense that the households interviewed in the first month of data collection remained in the sample in the subsequent months until the end of the survey. All the residents in the selected households were invited to answer the survey, conducted by computer-assisted telephone interviewing (CATI).
The study considered data on symptoms, sociodemographic profile, and use of health services. The questions on symptoms referred to the presence in the reference week (the week prior to the interview) of 11 symptoms: fever, cough, sore throat, shortness of breath, headache, chest pain, nausea, stuffy or runny nose, fatigue, sore eyes, and loss of sense of smell or taste.
In relation to the profile of health services use, the survey asked whether household members had gone to health services and which measures had been taken (stayed at home, called a healthcare worker, self-medication, prescribed medication, or received a visit from a health worker from the SUS or private services). The following healthcare services were listed: basic healthcare unit (UBS in Portuguese) or family health team, urgent care unit (UPA in Portuguese), hospital of the Brazilian Unified National Health System (SUS), private physician's office, private urgent care service, or private hospital. Finally, the survey asked about hospitalization for COVID-19.
Sociodemographic data were sex, skin color, schooling, age, and major geographic region. In order to spatially contextualize COVID-19 cases with symptoms, the study used the number of accumulated confirmed cases as of June 2020 per 100,000 inhabitants according to Health Regions and states of the country. The basis was the number of accumulated cases as of May 31, 2020, provided by the Ministry of Health (https://covid.saude.gov.br/, accessed on 05/Aug/2020).
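As an illustration of the rate used for spatial contextualization, the following minimal sketch computes cases per 100,000 inhabitants and assigns each area to a quintile. The function names and the example counts are hypothetical and are not taken from the study data; the quintile cut points rely on the default method of Python's `statistics.quantiles`, which may differ slightly from the method used by mapping software.

```python
from bisect import bisect_right
from statistics import quantiles

def rate_per_100k(cases, population):
    """Accumulated cases per 100,000 inhabitants."""
    return cases / population * 100_000

def quintile_of(value, cut_points):
    """Return 1..5 given the four cut points that split rates into quintiles."""
    return bisect_right(cut_points, value) + 1

# Hypothetical (cases, population) pairs for ten areas; illustrative only.
areas = [(50, 200_000), (120, 300_000), (30, 100_000), (400, 500_000),
         (10, 150_000), (90, 120_000), (75, 250_000), (220, 260_000),
         (15, 90_000), (310, 280_000)]

rates = [rate_per_100k(c, p) for c, p in areas]
cuts = quantiles(rates, n=5)          # four cut points
classes = [quintile_of(r, cuts) for r in rates]
```

Each area then carries a class from 1 (lowest rates) to 5 (highest rates), which is the kind of grouping a thematic map by quintile would display.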
Latent class analysis (LCA) with covariables was used to establish the profile of COVID-19 symptoms (dependent variable). This statistical procedure aims to group individuals according to similar patterns of answers, modeled with covariables. This forms classes with greater intraclass homogeneity and greater interclass heterogeneity. Latent classes were generated from the 11 symptoms self-reported by participants in the PNAD-COVID19 survey. However, to guarantee local independence between the indicator variables, it was necessary to perform three groupings. These were: (1) cough with sore throat; (2) shortness of breath with chest pain; and (3) nausea with fatigue. We thus ended up with 8 indicator variables rather than the initial 11 variables.
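To make the role of local independence concrete, the sketch below shows how a fitted latent class model assigns an individual to a class: within each class the indicator probabilities are simply multiplied, and Bayes' rule gives the posterior class probabilities. This is a generic illustration, not the Mplus estimation used in the study; the two-class setup, the function name, and all probabilities are hypothetical.

```python
from math import prod

def class_posteriors(responses, class_priors, item_probs):
    """Posterior probability of each latent class given binary responses.

    responses:    list of 0/1 answers, one per indicator
    class_priors: unconditional class probabilities (sum to 1)
    item_probs:   item_probs[c][j] = P(indicator j = 1 | class c)
    """
    joint = []
    for prior, probs in zip(class_priors, item_probs):
        # Local independence: within a class, indicator probabilities multiply.
        likelihood = prod(p if y == 1 else 1.0 - p
                          for y, p in zip(responses, probs))
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class, three-indicator model (illustrative values only).
priors = [0.1, 0.9]
probs = [[0.90, 0.80, 0.70],   # class with many symptoms
         [0.05, 0.10, 0.02]]   # class with few symptoms
post = class_posteriors([1, 1, 0], priors, probs)
assigned = max(range(len(post)), key=post.__getitem__)  # most likely class
```

If two indicators are strongly dependent within a class, the product above misstates the likelihood, which is one reason for merging such indicators before fitting.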
In the construction of this latent variable, models with different numbers of latent classes (categories) were created and tested until finding the ideal model to describe this variable. The following criteria were observed in this choice: Akaike information criterion (AIC), Bayesian information criterion (BIC), and adjusted Bayesian information criterion (adjusted BIC), always observing the lowest values when comparing the current model with the previous one. The highest entropy value was also considered 18 . Besides these criteria, two other statistical tests were performed (Vuong-Lo-Mendell-Rubin and Lo-Mendell-Rubin) to verify whether the chosen number of classes was better in terms of the model's fit when compared to the number of classes in the previous model.
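The fit criteria named above can be sketched as follows, assuming their usual textbook definitions (the sample-size-adjusted BIC replaces n with (n + 2)/24, and relative entropy rescales posterior classification uncertainty to the 0-1 range). The function names and the numeric inputs are hypothetical; the values reported in the study came from Mplus, not from this code.

```python
from math import log

def information_criteria(log_lik, n_params, n_obs):
    """AIC, BIC, and sample-size-adjusted BIC; lower values indicate better fit."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * log(n_obs)
    # Adjusted BIC: same penalty form, with n replaced by (n + 2) / 24.
    abic = -2.0 * log_lik + n_params * log((n_obs + 2.0) / 24.0)
    return {"AIC": aic, "BIC": bic, "aBIC": abic}

def relative_entropy(posteriors):
    """1 means perfectly separated classes; 0 means no separation.

    posteriors: one row per individual, one posterior probability per class.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * log(p) for row in posteriors for p in row if p > 0.0)
    return 1.0 - uncertainty / (n * log(k))

# Hypothetical log-likelihood, parameter count, and sample size.
ic = information_criteria(log_lik=-1000.0, n_params=10, n_obs=500)
sharp = relative_entropy([[1.0, 0.0], [0.0, 1.0]])   # fully certain assignment
flat = relative_entropy([[0.5, 0.5]])                # fully uncertain assignment
```

In practice these quantities are compared across models with successive numbers of classes, as described in the paragraph above.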
To test the association between latent classes and independent socioeconomic variables, we estimated simple and multiple multinomial logistic regression models, with odds ratio (OR) as the measure of effect and its respective 95% confidence intervals (95%CI). The reference category was the latent class without symptoms. Estimation in the LCA models with sociodemographic covariables used the Mplus 6.12 (https://www.statmodel.com/) package, considering the characteristics of the complex sample design. Level of significance was set at 5%. Figure 1 illustrates the model representing the latent class analysis with covariables.
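For readers less familiar with the effect measure, the sketch below shows how an odds ratio and its 95% confidence interval follow from a logistic regression coefficient and its standard error, and how an OR translates into a statement such as "31% higher odds". The coefficient and standard error are hypothetical values chosen for illustration, and the helper names are not from any statistical package.

```python
from math import exp, log

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

def percent_change_in_odds(odds_ratio):
    """Express an OR as a percent difference relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0

# Hypothetical coefficient: log(1.31), with an assumed standard error of 0.05.
or_, lower, upper = odds_ratio_ci(beta=log(1.31), se=0.05)
pct = percent_change_in_odds(or_)   # about 31
```

An interval that excludes 1 corresponds to significance at the 5% level in this Wald approximation.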
Analysis of the association with the profile of health services use was performed with Rao-Scott tests 19 , an adaptation of the chi-square test to complex samples. Standardized residual analysis (SRA) was performed to identify excess observations in each category of the dependent variable with the categories of the independent variables. Standardized residuals were considered significant when they were greater than 1.96 in a one-tailed test with 2.5% significance. SRA consists of examining the residuals (differences between observed and expected values in contingency tables) in standardized form, that is, expressed as units of standard deviation. In this sense, in the distribution of probabilities of occurrence, standardized residuals higher than 1.96 or lower than -1.96 have low odds of occurrence (2.5% in each tail) 20,21 .
In complex samples, the standardized residuals can be quite large, since the standard errors produced without Rao-Scott correction tend to be smaller than the standard errors with the proper correction. Thus, the SPSS for Windows version 20 (https://www.ibm.com/) offers the option of analysis of crossed tables in complex samples. With this option, both Rao-Scott correction of the chi-square test and calculation of standardized residuals are adjusted for the complex sampling plan.
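The logic of standardized residual analysis can be sketched with the adjusted standardized residual for a plain contingency table. This version assumes simple random sampling and therefore does not apply the complex-sample correction described above; it only illustrates how observed and expected counts are compared. The function name and the example counts are hypothetical.

```python
from math import sqrt

def adjusted_residuals(table):
    """Adjusted standardized residuals for a two-way table of counts.

    Assumes simple random sampling (no survey weights or design correction).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out_row.append((observed - expected) / sqrt(variance))
        result.append(out_row)
    return result

# Hypothetical 2x2 table: rows are two classes, columns are sought care yes/no.
res = adjusted_residuals([[30, 10],
                          [20, 40]])
excess = [[r > 1.96 for r in row] for row in res]  # cells with excess observations
```

Cells flagged in `excess` are those where the observed count exceeds the expected count by more than 1.96 standard deviations.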
Non-incorporation of variables in the profile of health services use in the estimation of latent classes (LCA with covariables) can produce biases due to measurement errors. Still, according to Wang & Wang 22 , the use of LCA (without covariables) mixed with other databases, with the aid of other packages like SPSS, can be executed when the most likely model for class membership is adequate, that is, when entropy is high (greater than 0.80). In our case, entropy with six classes was 0.928. So, even assuming a possible limitation due to measurement bias, high entropy allows an analysis of classes via other alternatives besides the model with covariables. Another important issue was that due to the high number of individuals that did not seek health services, especially in the absence of symptoms, we preferred to conduct only a bivariate analysis with contingency tables, since the inclusion of these variables in a multiple multinomial logistic regression model would violate both the model's parsimony and the absence of collinearity between these independent variables.
All analyses included the complex sample's parameters. Since this was a large sample, sampling design should be considered in the analysis of statistical significance in the tests and resulting estimates.
Spatial analysis included scan test with confirmed COVID-19 cases. The maximum circle allowed by the program is 50% of the population at risk. However, this high proportion would include half of the Brazilian population, hindering the identification of areas at lower risk. Thus, the size of the spatial window was based on the value indicated on the results of the Gini index in SaTScan (https://www.satscan.org/), provided after a first standard spatial scan (50%), generating an ideal value of 2% of the population at risk. When the analysis finds a circle with relative risk greater than the risk outside the circle, Monte Carlo type replicates are processed to calculate the p-value, considering 5% significance 23 . Thus, a thematic map was built, presenting the number of cases per 100,000 inhabitants according to quintile, proportional distribution of latent classes, and areas with relative risk with statistical significance.
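The scan procedure can be sketched for a single candidate circle, assuming the Poisson model commonly used for case counts: a log-likelihood ratio compares the cases inside the circle with the number expected under a uniform risk, and the p-value is the rank of the observed statistic among Monte Carlo replicates. This is a simplified illustration, not the SaTScan implementation; the function names and all numbers are hypothetical, and expected counts are assumed to be scaled so that they sum to the total number of cases.

```python
from math import log

def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio for one circle, scanning for high rates only."""
    if cases_in <= expected_in:
        return 0.0  # risk inside is not higher than outside
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * log(cases_out / expected_out)
    return llr

def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank of the observed statistic among the simulated replicates."""
    rank = 1 + sum(1 for s in simulated_stats if s >= observed_stat)
    return rank / (len(simulated_stats) + 1)

# Hypothetical circle: 30 cases observed where 15 were expected, 100 cases overall.
observed = poisson_llr(cases_in=30, expected_in=15.0, total_cases=100)

# Hypothetical replicate statistics; in a real analysis each value is the maximum
# LLR over all circles for a dataset simulated under the null hypothesis.
replicates = [1.0] * 999
p_value = monte_carlo_p_value(observed, replicates)
```

With 999 replicates, the smallest attainable p-value is 0.001, which is why the number of replicates is usually chosen as one less than a power of ten.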
Since the data are publicly available on the Internet and contain no form of individual identification, the study did not require submission to the Institutional Review Board for research in human subjects.

Results
Figure 2 illustrates the unconditional distribution of latent classes and the conditional distribution of affirmative answers to each of the eight symptoms according to latent class analysis classification. The model that displayed the best fit was with six classes. They were classified as: (1) all the symptoms; (2) high prevalence of symptoms; (3) predominance of fever; (4) predominance of cough/sore throat; (5) mild symptoms with predominance of headache; and (6) absence of symptoms.

Figure 2
Distribution of unconditional probabilities of latent classes and conditional probabilities of eight symptoms related to COVID-19 according to six latent classes. Brazil, 2020.

Table 1 presents the results of the LCA multiple multinomial logistic regression model with sociodemographic covariables. Females showed higher odds of being classified in the classes with more symptoms. Females showed 31% higher odds of presenting all the symptoms (class 1). Brown color showed 34% higher odds of presenting all the symptoms, while black color presented 55% higher odds of predominance of fever, when compared to white color. Asian-descendant and indigenous individuals presented 80% higher odds of high prevalence of symptoms (class 2).
Categories with more schooling showed higher odds of having all the symptoms (class 1), high prevalence of symptoms (class 2), and predominance of cough/sore throat (class 4). The same was true for all quartiles of more advanced age in all the classes, except for the class of mild symptoms (class 5). Compared to the Southeast region, the North and Northeast of Brazil presented high odds of belonging to the classes with more symptoms (classes 1 and 2). The South of Brazil showed 14% higher odds of belonging to the class with predominance of cough/sore throat (class 4).

Table 1
Multiple multinomial logistic regression model with estimates of odds ratios (OR) and 95% confidence intervals (95%CI) for latent class analysis with sociodemographic covariables, Brazil, 2020.

Table 2 presents the distribution of the relative frequency of latent classes according to the profile of health services use. Health services use was associated with the prevalence of the latent classes of all the symptoms (class 1), high prevalence of symptoms (class 2), and predominance of fever (class 3). Except for classes 5 (mild symptoms with predominance of headache) and 6 (absence of symptoms), all the individuals classified in the other classes stayed at home. However, the acts of calling a healthcare worker, taking medication (prescribed or self-medicated), and receiving a visit from a private healthcare worker were associated with the classes with the most severe symptoms. Receiving a visit from a healthcare worker from the SUS was associated with both having all the symptoms and mild symptoms with predominance of headache. The most common healthcare service for individuals with all the symptoms was the UPA under the SUS. For individuals with higher prevalence of cough and sore throat, the UBS was the most frequently used service. Hospitals, both public and private, were not the priority services for COVID-19 care, nor were private physicians' offices. Private urgent care services were used by individuals with high prevalence of symptoms. There was an excess of individuals in class 1, with all the symptoms, who were hospitalized or attempted hospital admission but were unsuccessful.

Table 2
Percent distributions of latent classes of COVID-19 symptoms according to profile of health services use. Brazil, 2020.

As for spatial distribution of the quintiles of confirmed cases per 100,000 inhabitants and circular areas with greater relative risk of the disease, we observed a juxtaposition of a large share of the regions/areas with higher proportions of individuals classified as having all the symptoms (class 1). The class of individuals without symptoms (class 6) was not illustrated spatially in order to facilitate viewing the composition of the other classes, whose order of magnitude is quite different (Figure 3).

Figure 3
Spatial distribution of latent classes of symptoms (bars), COVID-19 case rates (per 100,000 inhabitants in quintiles), and areas at spatial risk in the COVID-19 case rate (circles). Brazil, 2020.

Discussion
LCA allowed understanding the way COVID-19 symptoms are grouped among individuals with similar prodromal patterns. It is thus a better alternative than the imposition of a cutoff point that only classifies whether the individual had one or more symptoms. There can often be individuals with the same number of symptoms but of a sharply distinct nature. Considering the epidemiology and pathogenesis of COVID-19, the most common symptoms at the onset of the disease are fever, cough, and fatigue, while other symptoms include the production of sputum, headache, hemoptysis, diarrhea, dyspnea, and lymphopenia 24 . Thus, other symptoms like diarrhea and hemoptysis could be added to the investigation of symptoms in order to capture more suspected cases.
The demographic profile associated with the class of all the symptoms (class 1) consisted mainly of brown women from the North and Northeast and all three older age brackets. The profile was thus not limited to the older quartile (53 years and older), emphasizing the groups of young adults (19-36 years) and middle-aged adults (37-53 years) with higher odds of belonging to all the classes, except mild symptoms (class 5). Having any level of schooling was associated with classes 1, 2, and 4. This may suggest greater understanding and capacity for voicing the magnitude of self-reported symptoms.
The search for health services was associated with three classes of symptoms, ranging from all the symptoms to predominance of fever. On the one hand, this finding shows the concern for seeking medical care in the presence of these symptoms in general, even if it is a fever. This may reflect the awareness-raising campaigns on the most common COVID-19 symptoms. On the other hand, it may generate higher probability of infection among those that are not confirmed, besides generating greater overload on health services. In such cases, teleconsultation initiatives have emerged as a possibility for follow-up of patients classified with less severe symptoms.
While telehealth was already growing in recent years in Brazil, the pandemic accelerated the field's legislation and regulation, given this strategy's potential for dealing with COVID-19. The possibilities for remote care and treatment, screening, triage, prevention, monitoring, detection, and surveillance shape a scenario of strengthening telehealth in the healthcare services' practice in the territory. The applications range from teleconsultations to telediagnosis, telemonitoring, teleregulation, tele-education, and second opinions. However, the full consolidation of telehealth requires investments in organizational models, systems, services, human resources, and infrastructure, including allocation of more budget funds and time for implementation 25 . Considering Brazil's historical social inequality and the pandemic's rapid growth, the country is still falling far short of universal coverage for these services.
The fact that the UPA of the SUS were the service most used by individuals with highest prevalence of all the symptoms makes the UPA the portal of entry for most individuals with multiple symptoms. This finding suggests this point of care in the system as the priority for testing, and not only the hospitals or UBS. However, Medina et al. 26 emphasize the key role of primary healthcare (PHC) to deal with the pandemic. They propose action by PHC along four main lines: (1) health surveillance in the territories; (2) care for patients with COVID-19; (3) social support for the most vulnerable; and (4) continuity of routine activities in PHC.
In relation to health surveillance, the typical capillarity of PHC, oriented by the Family Health Strategy (ESF), provides for the detection, notification, and follow-up of cases. In addition, the role of community leadership and territorial proximity of the community health agents favor engagement in measures of social isolation and local empowerment, dissemination of accurate information, and support for educational activities. An important part of care for COVID-19 patients is the organization of different patient flows depending on the severity of the disease, expressed by the symptoms and a history of contacts with infected persons. In this line of care, triage, teleconsultation, monitoring of symptoms, and linkage with other points in the network of care are essential for the continuity of healthcare. Social support for vulnerable groups, especially elderly individuals and those with comorbidities, should be articulated with the other mechanisms of social protection. This line of action features shelters in hotels, schools, home support, and other initiatives in community solidarity. Finally, other routine activities proper to PHC cannot be neglected, since they aim to keep the population protected through continuity of the vaccination calendar, monitoring of the main health problems, and follow-up of individuals whose diseases are targets of intervention and resolution in this line 26 . In this sense, LCA of the information collected systematically by the PNAD-COVID19 survey proved to be a powerful tool to support the four lines of action in PHC, to the extent that it allows surveillance, screening, and monitoring for adequate linkage to the other points in the network.
However, there was an excess contingent of individuals classified with all the symptoms (class 1) who sought hospitalization but were unsuccessful. This calls attention to the collapse of health systems that were unable in many cases to meet the demand for hospitalization of more severe cases.
A recent longitudinal study in London, United Kingdom, using the COVID Symptom Study smartphone app, detected and validated six distinct groups/clusters of COVID-19 symptoms that could require different levels of medical care 27 . Fourteen questions were used on symptoms reported by study participants. Clusters 1 and 2 represented forms of COVID-19 with 1.5% and 4.4%, respectively, requiring respiratory support. These clusters predominantly presented upper respiratory tract symptoms and were distinguished from each other by pain in cluster 2 compared to cluster 1 and slightly increased reports of fever in cluster 2. Cluster 3 showed stronger isolated gastrointestinal symptoms (diarrhea, missed meals) and a relative reduction in the need for respiratory support (3.7%). Clusters 4, 5, and 6 included participants that reported more severe COVID-19 symptoms, with 8.6%, 9.9%, and 19.8% of individuals who required respiratory support, respectively. These three groups had distinct presentations, with cluster 4 marked by the early presence of severe fatigue and cluster 5 by persistent pain and cough. Individuals in cluster 5 also reported confusion and severe fatigue. Finally, individuals in cluster 6 reported more severe symptoms such as difficulty breathing, including early onset of shortness of breath accompanied by chest pain.
In the current study, although most individuals were grouped in the class without symptoms, specific knowledge of the characteristics associated with the other latent classes is essential for identification of groups at increased risk. It also allows understanding the pattern of health services use in order to provide the best and timeliest diagnostic and patient care strategy. In this sense, seeking to apply epidemiological reasoning to the diagnostic criteria for COVID-19, Tan 28 identifies as the best detection method the one conducted in three stages: suspected case, diagnosed clinical case, and definitively diagnosed case. The CDC defines COVID-19 patients as those with cough, shortness of breath or difficulty breathing, fever, chills, muscle pain, sore throat, and loss of taste or smell as symptoms, besides close contact with confirmed COVID-19 patients 29 .
In fact, spatial analysis showed the juxtaposition of regions with higher case rates and circular areas at high risk with higher frequency of multiple symptoms (class 1). These findings provide an approximation between the symptoms and confirmed cases, corroborating this association. However, this approximation can only be done at the ecological level, considering the cluster effect. Thus, we cannot attribute a COVID-19 diagnosis to individuals classified with all the symptoms (class 1) and must thus avoid the ecological fallacy.
In a brief synthesis, considering the pandemic hecatomb and the absence of a vaccine or scientifically proven pharmacological treatment, the person-space-time epidemiological triad should be revisited in the same investigative spirit of John Snow in confronting the cholera epidemic in London in the 19th century 30 . In this sense, the recording of different symptom profiles should become a permanent activity in epidemiological surveillance systems. This study's results showed similarity between the factors associated with COVID-19 cases and latent classes of symptoms. The difference lies in the operational ease based on self-report of symptoms recorded by the PNAD-COVID19 survey in relation to the still-distant scenario of mass testing.