Assessment of participation bias in cohort studies: systematic review and meta-regression analysis

The proportion of non-participation in cohort studies, if associated with both the exposure and the probability of occurrence of the event, can introduce bias in the estimates of interest. The aim of this study is to evaluate the impact of participation and its characteristics in longitudinal studies. A systematic review (MEDLINE, Scopus and Web of Science) was performed for articles describing the proportion of participation at the baseline of cohort studies. Of the 2,964 articles initially identified, 50 were selected. The average proportion of participation was 64.7%. Using a meta-regression model with mixed effects, only age, year of baseline contact and study region (borderline) were associated with participation. Considering the decrease in participation in recent years and the cost of cohort studies, it is essential to gather information to assess the potential for non-participation before committing resources. Finally, journals should require the presentation of this information in papers.

Selection Bias; Cohort Studies; Epidemiologic Methods

http://dx.doi.org/10.1590/0102-311X00133814

Silva Junior SHA et al. Cad. Saúde Pública, Rio de Janeiro, 31(11):2259-2274, nov, 2015



Background
Among observational studies, the advantages of prospective cohort studies are that they can estimate incidence measures directly and are less vulnerable to information bias. However, participation refusal at baseline or follow-up can introduce selection bias when simultaneously associated with both the exposure and the outcome 1,2. As a result, the association between exposure and outcome may differ between participants and non-participants.
Morton et al. 3 observed a tendency for participation in cohort studies to decrease between 1970 and 2003. As the non-participation proportion rises, vulnerability to selection bias tends to increase. Therefore, it is recommended to report the participation proportion in observational studies 4, to design methodological studies that evaluate the impacts of non-participation, and to evaluate study characteristics that may influence participation 5.
To the best of our knowledge, and in spite of its importance, no systematic evaluation of participation in observational cohort studies is available to guide choices and the scientific assessment of the validity of conclusions. The present study aims to perform a systematic review and meta-regression of papers describing non-participation bias in cohort studies, and to evaluate the study characteristics associated with the participation proportion.

Methods
We performed a systematic review and meta-regression following the methodology proposed by Higgins & Green 6 and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) criteria 7. For databases other than MEDLINE, the specific syntax corresponding to each database was used.

Article titles and abstracts were evaluated by two reviewers working independently to ascertain whether they met the criteria for inclusion in the study. Disagreements were assessed by a third reviewer.

Eligibility criteria and data extraction
As specific populations and health problems may induce large differences in participation proportions related to these specificities, we only included population-based cohort studies on healthy adults (18 to 75 years old). We excluded studies that addressed specific populations (e.g., pregnant women, patients with specific ailments), review studies and others (e.g., genetic studies, surgery, drug therapies). Figure 1 depicts the review flow chart.
The references identified were stored and processed using the JabRef 2.10 software (http://jabref.sourceforge.net/). We collected the participation proportion and the general characteristics of the study (year of baseline contact, place, selection strategy and study outcome). We also evaluated the characteristics of the study population, including type (general population vs. working population), participation of women and mean age. The relevant data were extracted by reading the full paper.

Data analysis
A meta-analysis of participation proportion was conducted using mixed-effects models, often called binomial-normal models 8. Given the heterogeneity of the studies (I² = 99.97%; τ² = 0.54; p < 0.001), we investigated the variables associated with the participation proportion, initially by simple meta-regression models. When the variance accounted for (VAF) by the model was greater than 5%, the variable was included in the multiple model. VAF indicates the percentage of total heterogeneity that is explained by each moderator. The goodness of fit of the multiple model was evaluated by the likelihood ratio test (LRT).
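The quantities named in this paragraph (τ², I² and VAF) can be illustrated with a small sketch. Note that it swaps in the simpler DerSimonian-Laird estimator on logit-transformed proportions rather than the binomial-normal model used in the study, and that the counts and the function names are hypothetical, chosen only for illustration. VAF is computed here as the relative reduction in τ² between a model without and a model with moderators, which is one common definition and an assumption about the exact formula applied.

```python
import math

def logit_effects(events, totals):
    """Logit-transformed proportions and their approximate sampling variances."""
    effects, variances = [], []
    for x, n in zip(events, totals):
        effects.append(math.log(x / (n - x)))
        variances.append(1.0 / x + 1.0 / (n - x))
    return effects, variances

def dersimonian_laird(effects, variances):
    """Random-effects pooling: returns (pooled logit, tau^2, I^2 in percent)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, i2

def variance_accounted_for(tau2_null, tau2_model):
    """Percentage of between-study variance explained by the moderators."""
    return max(0.0, (tau2_null - tau2_model) / tau2_null) * 100.0

def inv_logit(z):
    """Back-transform a logit to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (participants, invited) counts, NOT taken from the review.
events = [650, 410, 880, 300]
totals = [1000, 900, 1100, 800]
y, v = logit_effects(events, totals)
pooled, tau2, i2 = dersimonian_laird(y, v)
pooled_proportion = inv_logit(pooled)
```

With strongly differing proportions and large samples, as in these made-up counts, I² approaches 100%, which is the situation that motivates looking for moderators through meta-regression.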
We analyzed the following variables: year of the baseline contact; participant mean age; proportion of women; selection strategy; population type (general population vs. employee population); study outcome, classified as cardiovascular (baseline category), general health or others (cancer, accident, substance use, incapacity and smoking); and study region, as divided by the United Nations Statistics Division 9 into Continental Europe (baseline category), Northern Europe, USA, and Others (Asia or Oceania). The Spearman correlation coefficient was used to evaluate the relation between the year of the baseline contact and the participation proportion.
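As a reminder of what the Spearman coefficient measures, the following minimal sketch computes it as the Pearson correlation of average ranks. The years and participation values are hypothetical and serve only to show the calculation; they are not data from the reviewed studies.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical baseline years and participation proportions.
years = [1980, 1988, 1995, 2001, 2008]
participation = [0.82, 0.75, 0.78, 0.61, 0.55]
rho = spearman(years, participation)
```

Because only ranks enter the calculation, the coefficient captures any monotonic trend over time and is not driven by a few studies with extreme participation values.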
The analyses were performed using the metafor 10 library of the R software (The R Foundation for Statistical Computing, Vienna, Austria; http://www.r-project.org).

Results
Of the 2,964 original papers initially identified, 50 were selected. Figure 1 summarizes the study selection process.
Table 1 describes the objectives, database, analysis and main results of the selected papers. To evaluate participation, 29 (58%) papers compared participants and non-participants using secondary databases, 15 (30%) used the information available at baseline, and six (12%) contacted the non-respondents in some way with short questionnaires. Logistic regression models were the most used technique to evaluate participation, used in 18 (40%) of the papers. Passive follow-up studies applied survival (7) and Poisson regression models (4), and a few used some combination of different techniques. In eight papers the evaluation was based on comparison of frequencies, using baseline characteristics and/or questionnaires. Imputation, weighted regression and simulations were applied in four papers to evaluate and propose analytical methods for correcting potential bias.
Table 2 describes the overall study characteristics and the sample characteristics potentially associated with the participation proportion. Most of the publications are concentrated in the years from 2005 to 2014, the oldest having been published in 1978. Forty studies (80%) were geographically defined population-based cohorts, while the remainder were of workers (8), students (1) and recruits (1).
Most of the studies were conducted in Northern Europe (40%). Regarding participant selection, 60% were random samples and the remainder were census-based. The most frequent outcomes were overall health condition in twenty-three (46%) and cardiovascular health in fourteen. Other outcomes included cancer, accident, substance use, incapacity and smoking. Participant mean age was 49.5 years (SD = 8.2 years). Mean participation proportion was 64.7%, and ranged from 32.2% to 87.3%. The participation of women was slightly higher (52.6%) (Table 2).
A negative correlation was found between study year and participation proportion (ρ = -0.38). Figure 2 shows the downward trend in participation proportion. The dotted line indicates the linear regression, with an annual rate of decrease of 0.66% (R² = 0.1; p = 0.01). The continuous line (a smoothing spline) indicates a downward trend in participation since 1985. The diameter of each circle, identified by the number of the study (id) in Table 1, is proportional to the inverse of the corresponding standard error in the meta-regression. The larger circles are more influential in the meta-regression.
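The dotted line in Figure 2 is an ordinary least-squares fit of participation on baseline year. A minimal sketch of how such a slope and R² are obtained is given below; the data points are hypothetical and do not reproduce the reported slope of 0.66% per year.

```python
def ols_line(x, y):
    """Least-squares line y = intercept + slope * x, plus R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical baseline years and participation percentages.
years = [1975, 1985, 1995, 2005, 2012]
participation_pct = [80.0, 74.0, 66.0, 61.0, 52.0]
slope, intercept, r_squared = ols_line(years, participation_pct)
# slope is the change in participation (percentage points) per calendar year.
```

An unweighted fit like this treats every study equally, whereas the meta-regression weights studies by their precision, which is why the circle sizes in the figure matter for the latter.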
The simple meta-regression showed an association only between participation proportion and year of the baseline contact (OR = 0.97; 95%CI: 0.95-0.99). The multiple meta-regression showed an association between participation proportion and both year of the baseline contact (OR = 0.97; 95%CI: 0.95-0.99) and age (OR = 0.97; 95%CI: 0.95-1.00) (Table 3). In other words, for each one-year increase in the year of the baseline contact of the study we expect a 3% decrease in the odds of study participation. Likewise, for each one-year increase in the mean age of the study participants we expect a 3% reduction in the odds of study participation.
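The interpretation of an odds ratio of 0.97 can be made concrete with a short calculation. The sketch below converts the OR into a percentage change in odds, compounds it over ten years, and, purely as an illustration, applies it to the mean participation of 64.7%. Applying a meta-regression OR to the overall mean is a simplifying assumption, and the helper names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)

def shift_participation(p, odds_ratio, units=1):
    """Participation after applying an odds ratio for a number of units."""
    return prob(odds(p) * odds_ratio ** units)

OR_PER_YEAR = 0.97  # reported odds ratio per one-year increase

# Percentage change in the odds for a one-unit increase.
pct_change_per_year = (OR_PER_YEAR - 1.0) * 100.0

# Odds ratios multiply, so ten years compound to 0.97 ** 10.
decade_or = OR_PER_YEAR ** 10

# Illustration only: start from the mean participation of 64.7%.
after_one_year = shift_participation(0.647, OR_PER_YEAR)
after_ten_years = shift_participation(0.647, OR_PER_YEAR, units=10)
```

Under these assumptions a 3% yearly reduction in odds corresponds to well under one percentage point of participation per year near the mean, which is of the same order as the annual decrease seen in the unweighted linear trend.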
The analysis shows residual heterogeneity τ² = 0.41 (p < 0.001) for the participation proportion, suggesting that 18.1% of total heterogeneity can be accounted for by including year of the baseline contact and age. The test for residual heterogeneity is significant (LRT = 42,252.5, df = 33, p = 0.00), indicating that other covariates not considered in the model are influencing the participation proportion.

Discussion
We found high heterogeneity in participation proportions among the papers evaluating non-participation bias. The characteristics most often described in the systematically reviewed papers were sociodemographic profile, hospitalization and cancer incidence. Mortality was higher among non-participants. However, in the meta-regression only year of the baseline contact and age were associated with participation.
Several strategies involving comparison between participants and non-participants have been proposed to evaluate the potential selection bias in cohort studies: questionnaires to non-participants, comparison of participants according to recruitment moment 4, and passive monitoring of the eligible population using secondary databases to assess the outcome 11, which was the approach used in the majority of papers in our study.
The results show a decrease in participation in studies over time. The reasons for this decline are not clear, but social changes, changes in selection and recruitment, and changes in study designs may influence participation 3. The decrease in participation may be related particularly to the increasing number of studies in recent decades, as well as the proliferation of political and marketing surveys 5. In addition, increased requests for biological material in epidemiological studies may influence adherence negatively 3.
Previous studies have reported an association between young age and participation in cohort studies. Contrary to other articles 12,13,14,15, the proportion of women in the studies showed no association with participation, not even in the simple model. The outcome of the studies was not associated with participation, in spite of its importance in some of them 11,18,19,20,21,22,23,24,25. Study region showed no association with participation, in spite of the diversity of places evaluated. Participating in studies voluntarily, giving time, information and biological material, is related to ideas of social capital and volunteering 16, and we expected variation according to local cultural components.

Response rates were similar for white participants, both male and female, and in all study centers. In general, respondents presented higher socioeconomic status and health, but differences were smaller for women
27 Veenstra et al. 29. To assess association between health status at baseline and nonresponse; to analyze survival in a 5-year follow-up. Secondary data. Logistic model. Among respondents, prevalence of coronary heart disease was higher. However, their mortality was lower
Participation in studies has also been associated with behavioral variables and with general state of health. Non-participants report greater consumption of alcohol, more smoking and poorer general state of health 12,15,19,20,21,22,23,26,27,28,29,30,31,32,33,34. This information, however, is not available in most publications, limiting the scope of our study.
Strategies to increase the participation proportion have been proposed in terms of persuading individuals who are reluctant or hesitant; however, willingness to participate is not always accompanied by commitment to adhere to the study in the long term 35. Lastly, we agree with the argument of Morton et al. 3 that more information should be requested on the profile of participation and its potential bias.
There is a major need to pursue methodological studies to evaluate the impacts of non-participation on measures of effect in cohort studies. Strategies for that kind of evaluation include comparing participants with non-participants through administrative databases (sex, age, place of residence), application of summary questionnaires, and passive follow-up of the eligible population to evaluate mortality 4. Recent publications from journals with high impact factors show that non-participation is mostly ignored or dismissed by many authors, although some attempt to reduce it or mention it as a limitation of their study 36.
In conclusion, our findings suggest that the drive for participation and compliance should be assessed before funding the cohort study, and that specific local knowledge should be included when addressing the potential participants.

Contributors
S. H. A. Silva Junior was responsible for the first draft of the manuscript and the data analyses, and contributed to the conception and design of the study. S. M. Santos, C. M. Coeli and M. S. Carvalho contributed to the conception and design of the study. All authors contributed significantly to interpreting the results, commented extensively on subsequent revisions, and read and approved the final manuscript.
We searched the MEDLINE, Scopus and Web of Science databases for papers published between January 1978 and November 2014. The query used for the MEDLINE search strategy was: (cooperation[Title/Abstract/MESH] or noncooperation[Title/Abstract/MESH] or non-cooperation[Title/Abstract/MESH] or participant*[Title/Abstract/MESH] or nonparticipant*[Title/Abstract/MESH] or non-participant*[Title/Abstract/MESH] or compliance[Title/Abstract/MESH] or noncompliance[Title/Abstract/MESH] or non-compliance[Title/Abstract/MESH]) AND bias*[Title/Abstract/MESH] AND (cohort*[Title/Abstract/MESH] OR prospective[Title/Abstract/MESH] OR longitudinal[Title/Abstract/MESH]).

Figure 1
Flowchart of the search and selection of studies included in the meta-analysis.
symptoms in 4 out of 30 comparisons suggest that imputation of quality-of-life scores of non-participants in palliative care is biased based on the available predictors
scheme were small
associated with loss-to-follow-up were: education (lower), non-English-speaking origin, current smoker, poorer health and difficulty managing their income, varying according to cohort age
29 Caetano et al. 57. To identify characteristics of nonrespondents in a survey among couples on violence and drinking. Secondary data. Logistic model. Male non-respondents were younger, less educated, more often unemployed and drinkers. Among women, having been an abuse victim during childhood increased response
30 Garcia et al. To evaluate attrition in a Spanish population-based cohort. Baseline information. Logistic model. Death and moving to another town were the main reasons for nonresponse. Refusals were associated with working status (disabled and retired) and place of birth (other regions of Spain or in foreign countries); emigration with civil status, age and education as
) and age (younger) presented lower participation rates. The survey location (easy access to participants' residence) and reminders sent to subjects significantly improved the participation
model. Refusals increased 4.3% in seven years (from 1987 to 1994). Nonrespondents were defined by a combination of sociodemographic characteristics. Nonrespondents' hospital admission rates were higher than respondents' six months before data collection, and similar afterwards

Figure 2
Correlation between the baseline year and the participation rate.

Table 1
Characteristics of studies potentially associated with participation.

Table 2
Objectives, database, analysis and results of the selected papers.

Table 2 (continued)
* Objectives presented here were those most related to the objective of this review.