Machine learning analysis to predict health outcomes among emergency department users in Southern Brazil: a protocol study

REV BRAS EPIDEMIOL 2021; 24: E210050

ABSTRACT: Objective: Emergency services are essential to the organization of the health care system. Nevertheless, they face several operational difficulties, including overcrowding, which is largely explained by inappropriate use and repeated visits by users. Although this situation is well known, information on the topic is scarce in Brazil, particularly from longitudinal follow-up of users. Thus, this project aims to evaluate the predictive performance of different machine learning algorithms for estimating inappropriate use of emergency services, repeated visits, and mortality. Methods: A study will be conducted in the municipality of Pelotas, Rio Grande do Sul, with slightly more than five thousand users of the municipal emergency department. Results: If the study is successful, we will provide an algorithm that could be used in clinical practice to assist health professionals in decision-making within hospitals. Different knowledge dissemination strategies will be used to increase the study's capacity to produce innovations for the organization of the health system and services. Conclusion: A high-performance predictive model may help decision-making in emergency services, improving the quality of care.


INTRODUCTION
In Brazil, urgent and emergency services are a fundamental part of the health care system, ensuring timely assistance for individuals involved in accidents and in life-threatening situations 1 . Although urgent and emergency services existed before the creation of the public health system, the guidelines for organizing them as a network were established only recently, to improve access and the quality of health care, guarantee timely and effective care for the population, and promote comprehensive care with interaction between the different levels of the system 1 .
Given the lack of specific funding for urgent and emergency services, overcrowding has become one of the main difficulties in organizing the Urgent and Emergency Care Network and a historical problem in the country, causing concern among managers and the general population. However, the problem is hard to solve because it has multiple sources. The large number of patients who seek emergency services results from social, cultural, and organizational aspects of health services 2 . These factors increase the demand on emergency departments for problems that could be treated at other levels of care, culminating in inappropriate use of emergency services 3 , which is usually associated with frequent users [3][4][5] . In 2011, a study carried out in Southern Brazil showed that practically all frequent users had chronic diseases 4 . In principle, however, emergency services should be restricted to cases of imminent risk to life or potentially life-threatening situations.
The inappropriate use of emergency care is considered an indicator of the quality of health system services, given its potential to reflect how well the care network functions. Evidence on the problem is scarce in Brazil. In one of the few studies on the subject to date, inappropriate use had a prevalence of 24.2% in the municipality of Pelotas, Rio Grande do Sul, in 2004. In other words, roughly one in every four patients who sought the emergency department was using it inappropriately, increasing demand and costs for the service and, above all, causing losses due to the lack of comprehensive care for these individuals 6 .

Several reasons lead people to seek emergency services and, consequently, to use them inappropriately. Still, inappropriate use is primarily a consequence of the low problem-solving capacity of primary care services 3 . There is a popular belief that emergency services are more effective because of their higher technological density compared with primary care units. In addition, access to primary care services is usually restricted to daytime hours (making access difficult for the economically active population), with a daily cap on appointments (a fixed number of appointment cards) and long waiting times for specialized care and exams 3 . Further, the activities carried out in primary care services remain focused on curative care and centered on certain health professionals, with little emphasis on disease prevention and health promotion. Barriers to access to primary health care (PHC) also include difficulties in scheduling appointments, obtaining information about health problems, obtaining medications for long-term control, and participating in educational groups. This context increases the demand for emergency services for conditions that could be treated at other care levels better suited to longitudinal and comprehensive care 7 .
However, in Brazil, longitudinal information for assessing adverse outcomes among users of emergency services is scarce, limiting the capacity of the health system to produce evidence-based actions and case-prioritization strategies. Some international studies suggest the potential use of models to predict adverse outcomes among users of emergency services 8,9 . At the same time, the evidence does not yet point to effective interventions to reduce the frequent use of these services 5 .
Among the main predictors of inappropriate use of emergency services and subsequent adverse events, the occurrence of multiple chronic health problems, known as multimorbidity, stands out. Multimorbidity appears to increase the capacity of risk-stratification indices to predict adverse health outcomes 10 , including mortality 11 and use of emergency services 12 . A systematic review published in 2015 showed that multimorbidity increased the accuracy of risk-stratification indices, especially for hospital admission/readmission 10 . Multimorbidity also plays a crucial role in predicting outcomes among frequent users of emergency services 13 . The rapid demographic and epidemiological transition that Brazil has been undergoing reinforces the importance of concurrent chronic conditions in the same individual as determinants of adverse outcomes, and challenges the health system to manage and prevent these problems properly 1,14,15 . Multimorbidity is frequent in Brazil 16,17 , especially among older adults and users of health services 18 .
Despite the relevance of emergency services and of assessing outcomes among their users, Brazil has produced few studies on the topic, and most are limited to cross-sectional estimates of the demand served 19 . Monitoring these individuals can help identify population groups at higher risk of inappropriate use of emergency services, repeated visits, and adverse outcomes, especially death. These assessments can be helpful for organizing the health system because their findings have repercussions for other services and levels of care, supporting the adoption of health innovations that can improve the effectiveness of the system, optimize resources, and prevent adverse events in the population. Thus, this study aims to evaluate the predictive performance of different machine learning algorithms for estimating inappropriate use of emergency services, repeated visits, and mortality. Secondary objectives are: 1. to measure the percentage of inappropriate use of the emergency department according to the Hospital Urgency Appropriateness Protocol; 2. to predict the risk of death and of repeated visits to emergency services within one year after the index visit to the emergency department; 3. to identify demographic and socioeconomic characteristics, health status, and access to primary care and emergency-related services associated with the outcomes under study.

METHODS
The study will use two designs: a cross-sectional study and a prospective cohort. The cross-sectional component will measure the prevalence of inappropriate use of the emergency service and develop models to predict it, while the prospective cohort will estimate the risk of death and of repeated visits to the emergency service within one year.

STUDY SITE
The study will be carried out in an emergency department in a municipality in the south of the state of Rio Grande do Sul. The department is a point of entry for urgent and emergency care in the public health system and is managed by the Municipal Health Secretariat together with two universities, one federal (public) and one private. It is located in the facilities of a university hospital and provides urgent and emergency care to the city and to more than 20 municipalities in the region. The service is open 24 hours a day, seven days a week, serves only the public system, and treats an average of 300 patients per day, according to data provided by the unit.
The unit is divided into four sectors: intake with Risk Assessment and Classification, adult emergency, pediatric emergency, and hospitalization. Intake consists of the reception, vital signs assessment room, and waiting room.
The adult emergency department has a physician's office, a stabilization room, and an observation room with seven beds. The pediatric emergency includes a physician's office, a stabilization and observation room, and a 16-bed ward. Finally, the hospitalization ward comprises 10 beds, 29 stretchers in the corridors, an observation and medication room with 14 reclining chairs, two isolation rooms, a physician's office for surgical evaluation, a laboratory test collection station, a nursing station, a medication room, the administrative sector, and a dirty utility (sluice) room.

TARGET POPULATION
The target population will comprise individuals aged 18 years or older who use the municipal emergency service in Pelotas, Rio Grande do Sul, over a three-month period, extended if necessary.

CRITERIA FOR INCLUSION AND EXCLUSION OF STUDY PARTICIPANTS
We will include individuals aged 18 years or over, admitted to the municipal emergency department, and undergoing the risk classification process. Individuals using the service for forensic examination escorted by police officers will be excluded from the study.

SAMPLE CALCULATION, SAMPLING, AND DATA COLLECTION
Sample size was calculated in two steps. The first step estimated the sample needed to measure the prevalence of inappropriate use of the emergency service: assuming a prevalence of 24.2% 6 , a margin of error of ±2 percentage points, and a 95% confidence level, 1,759 individuals will be required. The second step used the most conservative parameters from similar studies in Brazil to calculate the sample needed for association analyses: 5% significance level (95% confidence), 80% power, an exposed/unexposed ratio of 0.1, an outcome frequency of 20% among the unexposed, and a minimum detectable prevalence ratio of 1.3. With these parameters, 4,892 individuals are needed. Adding 10% for possible losses and/or refusals, the final sample size is 5,381.
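The prevalence step above follows the usual normal-approximation formula for estimating a proportion, n = z^2 p(1 - p) / e^2. The Python sketch below is an illustration, not the authors' calculation. With p = 0.242, e = 0.02, and z = 1.96, it gives about 1,762, slightly above the protocol's 1,759. A finite-population correction with a very large population reproduces 1,759, but N = 999,999 is an assumed value here, and we do not know which software or correction the authors used. The association step is not recomputed because the protocol does not name its formula. The code only takes the stated 4,892 and adds the 10% allowance. Rounding that result to the nearest whole number reproduces 5,381, whereas rounding up would give 5,382.

```python
import math
from typing import Optional

def prevalence_sample_size(p: float, margin: float, z: float = 1.96,
                           population: Optional[int] = None) -> int:
    """Sample size to estimate a prevalence p within +/- margin (absolute).

    Normal approximation: n0 = z^2 * p * (1 - p) / margin^2.
    If a population size N is given, applies the finite-population
    correction n = n0 / (1 + (n0 - 1) / N). The result is rounded up.
    """
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

def inflate_for_losses(n: int, loss_rate: float) -> int:
    """Add a fixed proportion for expected losses and refusals (rounded to nearest)."""
    return round(n * (1 + loss_rate))

# Step 1: prevalence of inappropriate use, 24.2% +/- 2 points, 95% confidence
print(prevalence_sample_size(0.242, 0.02))                      # 1762 (no correction)
print(prevalence_sample_size(0.242, 0.02, population=999_999))  # 1759 (assumed N)

# Step 2: association sample as stated in the protocol, plus 10% for losses
print(inflate_for_losses(4892, 0.10))                            # 5381
```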
Eligible service users will be selected to achieve the estimated sample. According to 2017 usage records provided by the emergency service for this study, the department had an average of 3,735 visits per month, or about 120 per day, among adults and older adults. If demand remains at this level, users will be selected systematically until the required sample is reached; if demand is lower, all eligible users will be interviewed. This strategy is intended to give all eligible users the same probability of selection and to ensure that the sample represents emergency service users.
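One way to implement the systematic selection described above is to take three steps. First, fix a sampling interval k from the expected number of eligible visits and the required sample. Second, draw a random start within the first interval. Third, interview every k-th eligible user. For illustration only, the sketch assumes recruitment lasts three months at the 2017 average of 3,735 adult visits per month. The function names are hypothetical. When expected demand does not exceed the required sample, k falls to 1 and every eligible user is selected, which matches the fallback the protocol describes.

```python
import random

def sampling_interval(expected_visits: int, required_sample: int) -> int:
    """Largest whole interval that still yields at least the required sample.

    Falls back to 1 (interview everyone) when demand is too low.
    """
    return max(1, expected_visits // required_sample)

def systematic_sample(arrivals: list, k: int, start: int) -> list:
    """Select every k-th eligible arrival, beginning at position start (0 <= start < k)."""
    return arrivals[start::k]

# Assumed scenario: three months of adult demand at the 2017 monthly average.
expected_visits = 3735 * 3                    # 11,205 expected eligible visits
k = sampling_interval(expected_visits, 5381)  # 11,205 // 5,381 = 2
start = random.randrange(k)                   # random start: each position has probability 1/k
arrivals = list(range(expected_visits))       # stand-in for users in order of arrival
selected = systematic_sample(arrivals, k, start)
print(k, len(selected))  # k is 2; 5,603 or 5,602 users depending on the random start
```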

The study will be conducted in the emergency department; if necessary, interviews will be completed by telephone. The service's management is aware of the proposal and has offered logistical and infrastructure support for the study.

DEPENDENT VARIABLES
Inappropriate use of the emergency service will be measured with the Hospital Urgency Appropriateness Protocol 20 . The protocol uses five groups of criteria: severity of the case, need for treatment, diagnostic intensity, prolonged observation and/or transfer to another service, and criteria for patients who seek the service without a referral. A visit that meets none of these criteria will be classified as inappropriate use.
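The decision rule in the paragraph above can be expressed as a small function: a visit is inappropriate only if it meets none of the criteria. The data structure is hypothetical. Its field names are illustrative, and each criteria group is reduced to a single yes/no flag for simplicity rather than reproducing the protocol instrument's individual items.

```python
from dataclasses import dataclass

@dataclass
class HuapAssessment:
    """Hypothetical record of the five criteria groups for one visit.

    Each field is True if the visit meets that group of criteria.
    """
    severity: bool
    treatment: bool
    diagnostic_intensity: bool
    observation_or_transfer: bool
    no_referral_criteria: bool

    def is_inappropriate(self) -> bool:
        # Inappropriate use: the visit meets none of the five criteria groups.
        return not any((
            self.severity,
            self.treatment,
            self.diagnostic_intensity,
            self.observation_or_transfer,
            self.no_referral_criteria,
        ))

def prevalence_of_inappropriate_use(assessments: list) -> float:
    """Proportion of assessed visits classified as inappropriate."""
    return sum(a.is_inappropriate() for a in assessments) / len(assessments)

visits = [
    HuapAssessment(False, False, False, False, False),  # meets no criterion
    HuapAssessment(False, True, False, False, False),   # needed treatment
]
print(prevalence_of_inappropriate_use(visits))  # 0.5
```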
Mortality data for participants recruited in the emergency service will be obtained from the official Mortality Information System. In addition, participants will be actively traced by telephone and/or home visits to identify deaths. Each user will be followed up for one year after the index visit to the emergency department.
Finally, repeated visits to emergency services will be identified among returning users, and we will count the number of times each individual uses the service over one year. This information will be obtained systematically from the service admission records.
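The two cohort outcomes could be derived for each participant from three inputs: the baseline visit date, later admission-record dates, and a death date from the Mortality Information System or active search. This sketch rests on three assumptions. The one-year window is 365 days. A death on the day of the index visit counts. `later_visits` excludes the index visit itself. The function name and its inputs are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

FOLLOW_UP = timedelta(days=365)  # assumption: "one year" read as 365 days

def one_year_outcomes(index_date: date, later_visits: list,
                      death_date: Optional[date] = None) -> tuple:
    """Return (died_within_one_year, number_of_repeat_visits_within_one_year).

    index_date: date of the baseline emergency visit (interview).
    later_visits: dates of subsequent visits from the admission records,
                  excluding the index visit itself.
    death_date: date of death, or None if no death was found.
    """
    end = index_date + FOLLOW_UP
    died = death_date is not None and index_date <= death_date <= end
    repeats = sum(1 for d in later_visits if index_date <= d <= end)
    return died, repeats

# Example with made-up dates
print(one_year_outcomes(date(2021, 5, 10),
                        [date(2021, 6, 1), date(2022, 5, 10), date(2022, 5, 11)]))
# (False, 2): the visit on 2022-05-11 falls outside the 365-day window
```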

INDEPENDENT VARIABLES
Independent variables will be selected based on previous studies in the literature 3,6,10,21 and will include demographic and socioeconomic information, health status with a focus on chronic diseases and multimorbidity, and access to and use of health services, collected during the participant's stay in the emergency service. The questionnaire used to gather this information will draw on questions from previous research.

DATA ANALYSIS
Machine learning algorithms will be tested to predict death and repeated visits to emergency services within one year after the index visit to the emergency department. In recent years, machine learning has grown rapidly and has been applied to major public health problems, such as supporting disease diagnosis and predicting the risk of adverse events and death [22][23][24] .
Predictive algorithms aim to improve health care and provide decision-making support for professionals in the field. For the present study, baseline characteristics of individuals will be used to train popular machine learning algorithms, such as neural networks, random forests, support vector machines, penalized regressions, and gradient boosting.
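Of the algorithm families listed, penalized regression is the simplest to show end to end. The sketch below fits an L2-penalized (ridge) logistic regression by batch gradient descent in plain Python, purely to illustrate what "penalized" means: the penalty term pulls the coefficients toward zero. The penalty strength `lam`, the learning rate, and the number of epochs are arbitrary illustrative values. In the study, such hyperparameters would be tuned by cross-validation, and a maintained library would normally replace hand-written code like this.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """L2-penalized logistic regression fitted by batch gradient descent.

    Minimizes mean log-loss + (lam / 2) * sum(w_j^2); the intercept b is not penalized.
    X: list of rows of standardized predictors; y: list of 0/1 outcomes.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient step: average log-loss gradient plus the ridge term lam * w_j
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the outcome for one row of predictors."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

X = [[-2.0], [-1.0], [1.0], [2.0]]  # one standardized predictor (toy data)
y = [0, 0, 1, 1]
w, b = fit_ridge_logistic(X, y)
print(predict_proba(w, b, [2.0]) > 0.5)  # True
```

Raising `lam` shrinks the coefficients further. This trades some fit on the training data for stability on new patients, and that trade-off is what cross-validation is used to tune.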

After data collection ends, participants will be divided into a training set (70% of patients, used to set the parameters and hyperparameters of each algorithm) and a test set (30%, used to assess the predictive ability of the models on new, unseen data).
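A minimal sketch of a 70/30 hold-out split follows. The protocol does not state how patients are allocated, so the sketch assumes a simple random split with a fixed, arbitrary seed for reproducibility. `holdout_split` is a hypothetical helper, not a library function. Death within one year may be relatively infrequent, so a split stratified by outcome could be a reasonable alternative.

```python
import random

def holdout_split(records, test_fraction=0.30, seed=2021):
    """Randomly split records into a training set and a held-out test set."""
    rng = random.Random(seed)          # fixed (arbitrary) seed for a reproducible split
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = holdout_split(range(5381))
print(len(train), len(test))  # 3767 1614
```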
Standard pre-processing steps will also be carried out to support the performance of the algorithms, including standardization of continuous variables, one-hot encoding of categorical predictors, exclusion of strongly correlated variables, dimensionality reduction with principal component analysis, and hyperparameter tuning with 10-fold cross-validation.
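Several of these pre-processing steps are simple enough to sketch directly. The code covers four of them:

- Standardization, with the mean and standard deviation learned on training data only.
- One-hot encoding.
- A greedy filter that drops a variable when its absolute Pearson correlation with an already kept variable reaches a threshold.
- 10-fold cross-validation index splits.

The 0.9 threshold, the greedy order, and all function names are assumptions for illustration. Principal component analysis is omitted to keep the example short.

```python
import random
from statistics import mean, pstdev

def fit_standardizer(train_column):
    """Learn mean and SD on the training data only, to avoid test-set leakage."""
    mu, sd = mean(train_column), pstdev(train_column)
    return lambda v: (v - mu) / sd if sd > 0 else 0.0

def one_hot(values, categories):
    """Encode each categorical value as one 0/1 indicator per known category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either column is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if |r| < threshold with every column kept so far."""
    kept = {}
    for name, col in columns.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]
```

In a full pipeline, each transformation would be fitted inside each training fold and then applied to the matching validation fold. That keeps information from validation data out of the fitted parameters.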
The final objective is to develop a tool, based on the best-performing algorithm, to identify separately the risk of death and the risk of repeated visits to emergency services, with predictive performance measured by the area under the ROC curve. Recent studies in developed countries have shown that this is an achievable goal using machine learning 21,25 . If the study is successful, we will test the applicability of and adherence to these algorithms in clinical practice to assist health professionals in decision-making within hospitals.
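The area under the ROC curve used to compare algorithms has a direct interpretation. It is the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not, with ties counting one half. This equals the normalized Mann-Whitney U statistic. The sketch below computes it from that definition. It compares every pair of patients, which is acceptable for a test set of a few thousand patients but slower than rank-based implementations.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise definition.

    labels: 1 if the patient had the outcome (e.g., death), else 0.
    scores: predicted risks from a model.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # case ranked above non-case
            elif p == q:
                wins += 0.5   # tie counts one half
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```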

COVID-19
Because of the pandemic caused by the SARS-CoV-2 virus, the study was delayed and will start once primary data collection activities are cleared to resume, following the recommendations of the Universidade Federal de Pelotas. As a result, general questions about symptoms, testing, and possible effects of SARS-CoV-2 infection were added to the questionnaire. The machine learning algorithms will also use this pandemic-related information to predict outcomes, and findings will be contextualized accordingly.

QUALITY ASSURANCE AND CONTROL
Data control and quality assurance will involve a series of measures aimed at preventing bias. First, a research protocol will be developed, followed by an instruction manual included in the kit each interviewer will receive. Interviewers will be trained so that they are standardized in all necessary aspects. Before data collection, each interviewer will conduct practice interviews with family members or friends to improve accuracy and become familiar with the research instruments.
REDCap software will be used for data collection. REDCap is a data collection platform that can be used on mobile devices without an internet connection. Its use is possible through a partnership with Vanderbilt University (available at https://www.project-redcap.org/). Questions will be designed with structured response options so that answers follow the expected format.

In addition, if a question is left unanswered, REDCap warns the interviewer so that the missing answer can be completed.
The database will be checked daily for possible inconsistencies. Finally, we intend to make random telephone calls to 10% of the sample, administering a shortened questionnaire whose answers will be compared with those of the main questionnaire.
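The telephone re-check could be summarized in two steps: draw the 10% subsample at random, then measure agreement between the original and re-interview answers for each repeated question. The protocol only says the answers will be compared. Cohen's kappa, which corrects observed agreement for the agreement expected by chance, is an assumed choice of measure here. The function names are hypothetical.

```python
import random
from collections import Counter

def select_for_recheck(participant_ids, fraction=0.10, seed=None):
    """Randomly choose a fraction of participants for a telephone re-interview."""
    ids = list(participant_ids)
    k = round(len(ids) * fraction)
    return random.Random(seed).sample(ids, k)

def cohens_kappa(first, second):
    """Chance-corrected agreement between original and re-interview answers."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Agreement expected by chance from each round's marginal answer frequencies
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    if expected == 1:
        return float("nan")  # both rounds used one identical answer: kappa is undefined
    return (observed - expected) / (1 - expected)

print(len(select_for_recheck(range(5381), seed=1)))                        # 538
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```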

SCHEDULE
The project will start with the preparation of the electronic questionnaire. Data collection in the emergency service will begin in May 2021. Database verification and cleaning will occur concurrently with data collection and continue until the end of the study. Once data collection is complete, data analysis and the production and submission of scientific articles will begin. The findings will then be presented at scientific events and disseminated, and the final report will be prepared and presented at the close of the study.

ETHICAL PRINCIPLES
This study will be conducted based on informed consent, in accordance with the ethical provisions of Resolution 466/2012 of the National Health Council of the Ministry of Health and the Code of Ethics for Nursing Professionals, particularly the duties in Chapter IV (articles 35, 36, and 37) and the prohibitions in Chapter V (articles 53 and 54), which address research involving human beings. These provisions ensure participants' voluntary participation, anonymity, and right to withdraw from the study at any time, fully respecting participants and seeking to maximize benefits and minimize harms. After identifying and selecting the participants, we will inform them about the research objectives and ask them to sign the Informed Consent Form. The Research Ethics Committee of the School of Medicine at the Universidade Federal de Pelotas approved the project on March 7, 2020, under opinion number 3,530,616 and CAAE 17785219.1.0000.5317.

EXPECTED RESULTS, SCIENTIFIC CONTRIBUTIONS, AND IMPACT
The study will test algorithms to predict inappropriate use of emergency services, repeated visits to the service, and death within one year after the interview, with the aim of obtaining results that can be used in clinical practice. In this sense, the scientific contributions of the study include the development of methods and techniques for using these algorithms in clinical practice to assist health professionals and managers in decision-making in the hospital and to inform possible adjustments to the health care network.
Machine learning algorithms can support PHC by providing up-to-date information, easier access, case management, and care coordination, building a detailed map of the user's condition and service demand, and helping prevent inappropriate use of and repeated visits to emergency services. Expanding information and communication technology resources in PHC will improve access and service use and enable the application of algorithms to achieve better performance and comprehensive care.
Moreover, the project's products will include scientific articles, oral presentations at conferences in the field, science communication strategies for disseminating the data, and a report for FAPERGS. The project will also lead to new academic collaborations, contributing to the training of local human resources in the field.