Lifestyle predictors of depression and anxiety during COVID-19: a machine learning approach

Abstract
Introduction: Recent research has suggested an increase in the global prevalence of psychiatric symptoms during the COVID-19 pandemic. This study aimed to assess whether lifestyle behaviors can predict the presence of depression and anxiety in the Brazilian general population, using a model developed in Spain.
Methods: A web survey was conducted during April-May 2020, which included the Short Multidimensional Inventory Lifestyle Evaluation (SMILE) scale, assessing lifestyle behaviors during the COVID-19 pandemic. Depression and anxiety were examined using the PHQ-2 and the GAD-7, respectively. Elastic net, random forest, and gradient tree boosting were used to develop predictive models. Each technique used a subset of the Spanish sample to train the models, which were then tested internally (vs. the remainder of the Spanish sample) and externally (vs. the full Brazilian sample), evaluating their effectiveness.
Results: The study sample included 22,562 individuals (19,069 from Brazil, and 3,493 from Spain). The models developed performed similarly and were equally effective in predicting depression and anxiety in both tests, with internal test AUC-ROC values of 0.85 (depression) and 0.86 (anxiety), and external test AUC-ROC values of 0.85 (depression) and 0.84 (anxiety). Meaning of life was the strongest predictor of depression, while sleep quality was the strongest predictor of anxiety during the COVID-19 epidemic.
Conclusions: Specific lifestyle behaviors during the early COVID-19 epidemic successfully predicted the presence of depression and anxiety in a large Brazilian sample using machine learning models developed on a Spanish sample. Targeted interventions focused on promoting healthier lifestyles are encouraged.


Introduction
The global COVID-19 crisis continues to affect people in different ways, with many becoming vulnerable to mental health challenges during the pandemic. 1 Since the onset of the pandemic, fears arising from uncertainties about their well-being have led to major changes in people's lifestyles around the world. 2 The sudden deviation from daily routines has resulted in an increased prevalence of psychiatric symptoms relative to before the COVID-19 pandemic. 1 In particular, web surveys have been used to assess symptoms of mental disorders and have found an increase in the prevalence of symptoms of common mental disorders, such as depression and anxiety, in the general population of many countries, including China, Italy, and Denmark. 3-5 Two of the most severely affected countries in the world were Spain and Brazil, where citizens experienced high levels of psychological distress in the early stages of the COVID-19 epidemic. 6,7 Using online assessment tools, studies conducted during the initial stages of the pandemic in Spain indicated a high prevalence of depressive symptoms, ranging between 18.7% 8 and 41%, 9 as well as of anxiety symptoms, ranging between 21.6% 8 and 25% 9 among the general population. Similarly, a study in Brazil conducted during the COVID-19 epidemic found anxiety and depression to be the most prevalent psychiatric symptoms in the general population, with 81.9% of participants indicating symptoms of anxiety and 68% presenting symptoms of depression. 10 Furthermore, another Brazilian study found a positive association between psychological symptoms (i.e., depression and anxiety) and social isolation variables (i.e., loneliness, days in isolation, level of concern about the COVID-19 situation in Brazil), suggesting the impact that these challenging routine changes may have had on people's mental well-being. 11
The major lifestyle adjustments people have had to make during this pandemic might be considered risk factors for the appearance of unstable psychological symptoms during the quarantine period. 2 Unhealthy behaviors, such as poor dietary habits, poor sleep quality, and lack of exercise, have been found to contribute to the burden of mental health problems around the globe. 12,13 Given the unusual circumstances affecting people across the world, it is not uncommon for people to have developed unhealthy behaviors during the quarantine period that may trigger stress-related symptoms of depression and anxiety. 2 A recent systematic review including studies from various countries suggests a global increase in prevalence of psychiatric symptoms during the COVID-19 pandemic, 1

the final score is obtained by summing the scores for all questions (noting that some questions are reverse-scored). The higher the score, the better (healthier) the lifestyle. In addition, self-rated health (SRH) was measured using the question "How would you rate your health in general?", with response options of "Very good", "Good", "Regular", "Bad", and "Very bad", scored from 1 to 5, respectively. For the purpose of the present study, all the items were independently included in the model.

Outcome
Our main outcome was the presence of a positive screening result for depression or anxiety. Current depression was assessed using the Patient Health Questionnaire-2 (PHQ-2), 17 with a cut-off of ≥ 3, and current anxiety using the Generalized Anxiety Disorder 7-item (GAD-7) scale, 18 with a cut-off of ≥ 10. Two dichotomous variables were created: scores equal to or above the respective cut-off were defined as "Positive Depression" and "Positive Anxiety".
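The dichotomization described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code (the study reports using R). The function name `screen` and the input layout are hypothetical. The 0 to 3 range per item is the standard scoring of both instruments and is not stated in this text.

```python
PHQ2_CUTOFF = 3   # PHQ-2 total >= 3 is treated as "Positive Depression"
GAD7_CUTOFF = 10  # GAD-7 total >= 10 is treated as "Positive Anxiety"


def screen(phq2_items, gad7_items):
    """Return (positive_depression, positive_anxiety) from item-level scores.

    Hypothetical helper: each item is assumed to be scored 0-3, the standard
    scoring of the PHQ-2 (2 items) and the GAD-7 (7 items).
    """
    if len(phq2_items) != 2 or len(gad7_items) != 7:
        raise ValueError("expected 2 PHQ-2 items and 7 GAD-7 items")
    if any(not 0 <= s <= 3 for s in (*phq2_items, *gad7_items)):
        raise ValueError("item scores must lie between 0 and 3")
    # Sum the items of each scale and compare the total with its cut-off.
    return sum(phq2_items) >= PHQ2_CUTOFF, sum(gad7_items) >= GAD7_CUTOFF


# Example: PHQ-2 total of 3 (positive), GAD-7 total of 7 (negative).
depression, anxiety = screen([1, 2], [1, 1, 1, 1, 1, 1, 1])
```

Keeping the two outcomes as separate boolean variables mirrors the text, where depression and anxiety are modeled as two distinct dichotomous outcomes.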

Dealing with non-responses
First, all participants with missing values in variables required to build the final model were excluded from the analysis. Second, columns containing more than 5% missing data were removed. Finally, missing values in the remaining variables were imputed as follows: (1) for every variable, the mode (for categorical data) or the mean (for continuous data) was computed from the training set; (2) the internal test set was imputed with these training-set modes and means; (3) the same was done for the external test set (Brazil).
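The key point of the imputation procedure is that the fill values are learned from the training set only and then reused unchanged on both test sets, so no information from the test data leaks into the model. The sketch below illustrates this in Python with the standard library. It is not the authors' R code, and the names `fit_imputer` and `apply_imputer`, the row-as-dictionary layout, and the use of `None` for a missing value are all assumptions made for illustration.

```python
from statistics import mean, mode


def fit_imputer(train_rows, categorical):
    """Compute one fill value per column from the training rows only.

    Hypothetical helper: rows are dicts, missing values are None, and
    `categorical` is the set of column names imputed with the mode.
    """
    fill = {}
    for col in train_rows[0]:
        observed = [row[col] for row in train_rows if row[col] is not None]
        fill[col] = mode(observed) if col in categorical else mean(observed)
    return fill


def apply_imputer(rows, fill):
    """Replace missing values with the fill values learned on the training set."""
    return [
        {col: (fill[col] if value is None else value) for col, value in row.items()}
        for row in rows
    ]


# Made-up toy data, for illustration only.
train = [
    {"age": 30.0, "sex": "F"},
    {"age": None, "sex": "F"},
    {"age": 50.0, "sex": "M"},
]
fill_values = fit_imputer(train, categorical={"sex"})
internal_test = apply_imputer([{"age": None, "sex": None}], fill_values)
```

The same `fill_values` dictionary would be passed to `apply_imputer` for the external (Brazilian) test set, matching step (3) of the procedure.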

Statistical analysis
The statistical analyses comparing groups in terms of sociodemographic and clinical characteristics were conducted using SPSS 21. Independent variables were described by outcome. All variables, with the exception of SMILE scores, were categorized and compared between the respective outcome groups using chi-square tests; SMILE scores were compared using Student's t test for independent samples.
All machine learning experiments were conducted using R software (version 4.0.4) and the caret library (version 6.0-86). 19 The data were analyzed using three different machine learning algorithms: elastic net, random forest, and gradient tree boosting (extreme gradient boosting [XGBoost] library). Elastic net is a regularized linear model that penalizes large weights and thus favors generalization of the model to unseen samples. 20 Random forest is a machine learning algorithm that combines and averages the predictions of multiple decision trees, each built on random subsets of features and instances, resulting in a single predictive model. 21 XGBoost is a scalable technique that creates a predictive model by sequentially adding new models that correct the errors of existing models (also known as 'gradient boosting'), until no further improvement is obtained. 22
Each of the three models was trained and tested on the dataset separately so that their performance could be compared. For each model, the data from Brazil were separated from the dataset and were not used until testing time (referred to as the external test). Then, the Spanish dataset was split into two parts (75% as a training sample and 25% as an internal test sample) using class-stratified sampling. The training sample was then used to train the model, according to the training procedure of each machine learning technique.
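Class-stratified sampling means that the 75/25 split is made within each outcome class, so the training and internal test samples keep roughly the same proportion of positive screens. A minimal Python sketch of the idea follows. It is not the authors' R/caret code, and the function name `stratified_split`, its parameters, and the fixed seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=0):
    """Split row indices into train/test while preserving class proportions.

    Hypothetical helper: `labels` holds one class label per participant.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random order within the class
        cut = round(len(indices) * train_frac)    # 75% of this class goes to training
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy example: 40 positive and 160 negative screens (made-up numbers).
labels = [1] * 40 + [0] * 160
train_idx, test_idx = stratified_split(labels)
```

In this toy example both parts retain the original 20% share of positive cases, which a purely random split would only achieve approximately.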
For each model, a grid search over the caret default hyperparameter grid was used to identify the best model in a 10-fold cross-validation procedure. Downsampling of the majority class was used to address class imbalance.
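Downsampling the majority class can be sketched as follows: every class is randomly reduced to the size of the smallest class, so the model sees a balanced set of positive and negative cases. This is an illustrative Python sketch rather than the authors' R/caret code. The function name `downsample` and the fixed seed are hypothetical, and applying it to training data only is an assumption not spelled out in the text.

```python
import random
from collections import defaultdict


def downsample(rows, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class.

    Hypothetical helper, assumed here to be applied to training data only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    n_min = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        # The minority class is kept whole; larger classes are subsampled.
        keep.extend(rng.sample(indices, n_min))
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy example: 20 positive and 80 negative training rows (made-up numbers).
rows = list(range(100))
labels = [1] * 20 + [0] * 80
balanced_rows, balanced_labels = downsample(rows, labels)
```

The trade-off of this design is that majority-class observations are discarded, in exchange for a classifier that is not dominated by the more frequent outcome.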
Variables with more than 5% missing data were removed, and the remaining variables were imputed with either the mode (for categorical variables) or the mean (for numeric variables). Generalization of the model was then assessed in the internal test sample, which was used to generate all performance metrics (e.g., accuracy and sensitivity) for the Spanish sample. Finally, the model was evaluated on every individual from the Brazilian sample without any retraining or fine-tuning. Model predictions were also used to create risk quintiles: participants were sorted by their predicted probabilities and separated into five groups (the 20% with the highest predictions were allocated to group 1, the next 20% to group 2, and so on), and the percentage of participants with the outcome was then calculated for each group. This approach provides a broad idea of how predicted probabilities translate into actual probabilities of the outcome in test data (i.e., model calibration).
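The risk-quintile calculation can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the authors' R code. The function name `risk_quintiles` is hypothetical, outcomes are assumed to be coded 1 (present) or 0 (absent), and the toy probabilities are invented.

```python
def risk_quintiles(probs, outcomes, n_groups=5):
    """Percentage of participants with the outcome in each risk group.

    Hypothetical helper: group 1 holds the highest predicted probabilities,
    and `outcomes` is assumed to be coded 1 (present) / 0 (absent).
    """
    n = len(probs)
    if n < n_groups:
        raise ValueError("need at least one participant per group")
    # Sort participant indices from highest to lowest predicted probability.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    rates = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        positives = sum(outcomes[i] for i in members)
        rates.append(100.0 * positives / len(members))
    return rates


# Toy data (made up): ten participants, two per quintile.
probs = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
rates = risk_quintiles(probs, outcomes)
```

For a well-calibrated model the observed percentages should fall steadily from group 1 to group 5, as they do in this toy example.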

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Sample size and sociodemographic characteristics
The final sample for this study comprised 22,562 individuals, with 19,069 subjects from Brazil and 3,493 from Spain. The sociodemographic differences are described in Table 1 (depression screening) and Table 2 (anxiety screening).

Model performance
Model performance was assessed using several metrics (Figure 1A). The calibration of the model was also analyzed by evaluating the concentration of the outcome within the percentiles of predictions (Figures 1C and 1D).
In addition, the importance of each question for the model predictions was examined (Figure 2B).

Discussion
Findings from this study indicate that variables that

sleep disturbances and anxiety are bidirectionally associated, 33 and this association was, as expected, evident during the early stages of the pandemic. 34 Stressful life events, such as the COVID-19 pandemic, which threaten one's psychological and physical well-being are likely to cause increased sleep disturbances in the population. 35 Indeed, sleep problems have been highly prevalent during COVID-19, with approximately 40% of the general population reporting poor sleep quality in the early stages of the pandemic. 36 The confinement period during the pandemic has led to changes in social and environmental cues important for circadian rhythms and the sleep-wake cycle, including the lack of fixed schedules for working, eating, exercising, socializing, and similar daily routines. 35 Changes in the sleep-wake cycle can result in desynchronization between the circadian rhythm and important immune functions, which can affect a person's physical and mental well-being. 35,37 In particular, reduced sleep quality was associated with higher levels of depression and anxiety symptoms early on during the COVID-19 lockdown in Italy. 38

pandemic. In addition, to our knowledge, this is the first study using machine learning methods to predict

it is important to highlight that the presence of anxiety and depression was assessed using a screening test, and replication of these findings using a structured, albeit much more labor-intensive, clinical interview is encouraged. There are also limitations regarding the lack of fine-tuning of the model for the Brazilian sample: although we showed that the model performs similarly in both countries, it was not further trained on a subset of Brazilian data, which could potentially improve its performance in that sample.
In contrast, a major strength of this study is the consistency in performance metrics across three well-established machine learning methods for the development of predictive models, all of which achieved high AUC-ROC values in the internal and external tests. Another major strength is that the model was trained on a sample from Spain, one of the first countries affected by the pandemic, and then tested in Brazil, where the epidemic started later. This is an encouraging sign for the development of prevention guides using this model. Lastly, the findings from this paper are in accordance with previous research indicating important mental health burdens related to lifestyle behaviors, especially during the COVID-19 pandemic.
The findings of this study could inform the development of targeted approaches to promote healthy lifestyles, which might help reduce the burden of common mental disorders.