Factors associated with preterm birth: from logistic regression to structural equation modeling

This study proposed the application of structural equation modeling (SEM) to investigate variables associated with preterm birth based on a theoretical model analyzed previously by hierarchical logistic regression. The data came from a population-based case-control observational study of hospital births to mothers residing in Londrina, Paraná State, Brazil (June 2006 to March 2007). For the SEM, the study considered the association between socioeconomic characteristics and psychosocial aspects pertaining to reproductive history, work and physical activity, complications during the pregnancy, and fetal characteristics. It also considered the relationship between these associations and the outcome preterm birth mediated by adequacy of prenatal care. The weighted least square mean and variance adjusted estimator (WLSMV) was used for categorical data and robust maximum likelihood (MLR) for odds ratios. Three latent variables were created: socioeconomic vulnerability, family vulnerability, and unwanted pregnancy. The effect of socioeconomic and family vulnerability and unwanted pregnancy on prematurity occurred indirectly through inadequacy of prenatal care. The proposed methodology allowed using constructs, verifying the role of mediation by inadequacy of prenatal care, and identifying the variables' direct and indirect effects on the outcome preterm birth.
Keywords: Statistical Models; Logistic Models; Risk Factors; Premature Birth
Correspondence: G. P. Alencar, Faculdade de Saúde Pública, Universidade de São Paulo. Av. Dr. Arnaldo 715, São Paulo, SP 01246-904, Brasil. gizelton@usp.br
1 Faculdade de Saúde Pública, Universidade de São Paulo, São Paulo, Brasil.
2 Departamento de Saúde de Jequié, Universidade Estadual do Sudoeste da Bahia, Jequié, Brasil.
3 Centro de Ciências da Saúde, Universidade Estadual de Londrina, Londrina, Brasil.
doi: 10.1590/0102-311X00211917
Oliveira AA et al. Cad. Saúde Pública 2019; 35(1):e00211917. Methodological Issues.
This article is published in Open Access under the Creative Commons Attribution license, which allows use, distribution, and reproduction in any medium, without restrictions, as long as the original work is correctly cited.


Introduction
Theoretical models for studying preterm birth have demonstrated the importance of multiple associated variables. Gestational complications, including bleeding, high blood pressure, eclampsia, altered amniotic fluid volume, genital tract infection, and diabetes, have a direct effect on gestational age at birth 1. Various authors have pointed to multiple gestation as a risk factor for preterm birth, including the presence of complications, which may be related to the occurrence of premature rupture of membranes or other maternal-fetal disorders 1,2,3,4. Maternal age, reproductive history including parity, and characteristics of previous children (low birth weight and prematurity) have been identified by various studies as risk factors for preterm birth 1,5,6,7. Aspects of inadequate prenatal care, such as late initiation (third trimester) and unsatisfactory frequency (fewer than 6 consultations), are also risk factors for preterm birth 5,6,8.
Studies have also shown the importance of prenatal care for preventing preterm birth. Some studies also identify the impact of socioeconomic variables, complications, and reproductive history on inadequate prenatal care and highlight the role of prenatal care as a determinant of maternal and perinatal indicators, citing such factors as low socioeconomic status, low schooling, and alcohol use in pregnancy 8,9,10,11, low maternal age, marital status, planning of the pregnancy, public healthcare services, high parity, previous premature childbirth 9,10,12, and unwanted pregnancy 13. These determinants are part of the chain of factors associated with both inadequate prenatal care and negative outcomes such as preterm birth. We thus include the mediating role of prenatal care in the modeling.
Family vulnerability involves poverty, exploitation, abuse, and psychosocial and cultural factors. In parallel, families headed by women result partly from early or unwanted pregnancy, family instability, and abandonment 14. The social vulnerability experienced by women may also be associated with a degree of emotional vulnerability, involving feelings of abandonment, violence, and disempowerment, which undermines the conditions for adequate prenatal care and access to medicines 15. Unplanned pregnancy is associated with smoking during pregnancy, using less folic acid, late initiation of prenatal care, and interruption of pregnancy 16.
Epidemiological studies on preterm birth routinely use hierarchical logistic regression models to identify risk factors; these models draw on conceptual frameworks to represent the relations between one variable and others in a causal model 17,18. In such models, the variables are included simultaneously and can have their effects overestimated or underestimated. When excluded, they may carry with them part of the information that makes the connection between the variables. Although hierarchical regression considers important elements such as temporal ordering and the logic of the relations between the variables, such models fail to adequately address the relations with confounding and mediating variables 19.
Structural equation models (SEM) are more appropriate, since they allow multiple simultaneous equations to incorporate confounding and mediation. They also incorporate latent variables, which represent more complex measures that cannot be captured by a single variable and are created on the basis of the covariance between observed variables. SEM minimize the effect of residual confounding in associations, especially in observational studies 20,21.
This study proposes to analyze the relations between variables and the outcome preterm birth via SEM, using a model previously tested with logistic regression and hierarchical selection of variables. We used data from a study in Londrina, Paraná State, Brazil, in 2009, headed by Silva et al. 2. The model was redefined (Figure 1) on the basis of the original study's theoretical and conceptual framework and incorporated latent variables. SEM also allowed including variables with a mediating effect between the exposure and outcome variables.

Figure 1
Theoretical and conceptual framework for factors associated with preterm birth.

Methods
The data are from a population-based case-control observational study of hospital births in Londrina (June 2006 to March 2007) in which cases were defined as preterm births (< 37 gestational weeks) and controls were non-preterm births (≥ 37 weeks). The original study by Silva et al. 2 used all the variables in categorical form. In the current study, we opted to use the variables in continuous form, because categorical variables do not follow an approximately normal distribution and may entail problems of asymmetry that affect the standard errors, the estimates of residual variance, and the chi-square statistic used in the SEM fit indices. The records with complete continuous variables totaled 296 cases and 329 controls, representing 90% and 89% of the original database, respectively.
In order to detect possible biases due to the difference in the number of records relative to the original database, Pearson's chi-square test of independence was used 22. It showed that the distributions of the variables in the two databases are statistically similar, indicating absence of bias in the selection of the events.
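As an illustration of the comparison described above, the following minimal sketch computes Pearson's chi-square statistic for a 2 x 2 table (database by category of one variable). The counts are hypothetical and chosen only for demonstration; the function names are not from the original study, and the p-value shortcut shown is valid only for one degree of freedom.

```python
import math


def chi_square_independence(table):
    """Return Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts (NOT from the study): rows are the two databases,
# columns are the two categories of a single binary variable.
table = [
    [120, 210],  # original database
    [108, 188],  # records with complete continuous variables
]
stat, dof = chi_square_independence(table)
p_value = p_value_one_dof(stat)
similar = p_value > 0.05  # 5% significance level, as in the Methods
```

With these illustrative counts the two distributions are almost identical, so the test does not reject independence at the 5% level.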
The theoretical framework proposed by Silva et al. 2 was organized in blocks, which were transcribed in the current study to represent possible constructs (Figure 1). Block 1 in the original model is represented by maternal and family socioeconomic characteristics. Block 2 was separated into pregestational characteristics and maternal reproductive history. Block 3 was separated into psychosocial conditions, maternal habits including physical activity and work, and prenatal care. Block 4 included maternal complications during the index pregnancy and block 5 covered fetal characteristics.
Based on this model, the following steps were adopted in the modeling:
(1) Measurement model: verification of the factors' composition via confirmatory factor analysis (CFA). Creation of latent variables: based on the constructs elaborated and suggested by the blocks of variables and the literature that defined the theoretical model, via CFA, considering the number of factors with eigenvalues greater than 1 23.
(2) Structural model: relates the latent and observed variables to the outcome preterm birth, drawing on models for a binary outcome. Direct and indirect effects on the outcome were obtained. Modification indices were used to identify relations not considered previously and improve the model's fit 24. The weighted least square mean and variance adjusted estimator (WLSMV) was used with geomin rotation 25. Once the final model had been obtained and its fit tested, it was re-estimated by robust maximum likelihood (MLR) in order to obtain odds ratio (OR) estimates. The fit indices used for the model were the Tucker-Lewis index (TLI; reference for good fit: TLI > 0.95), comparative fit index (CFI; reference: > 0.95), root mean square error of approximation (RMSEA; reference: < 0.05), and weighted root mean square residual (WRMR; reference: < 0.90). Significance was set at 5%. The analyses used the MPlus software package (https://www.statmodel.com/) and the graphs used CmapTools (https://cmap.ihmc.us/).
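The reference values for the fit indices can be applied mechanically, as in the minimal sketch below. The cut-offs are those quoted in the text; the dictionary layout, the function name, and the example index values are illustrative assumptions and do not come from the study or from MPlus output.

```python
# Reference cut-offs quoted in the Methods: (direction, threshold)
FIT_CRITERIA = {
    "TLI": ("greater", 0.95),
    "CFI": ("greater", 0.95),
    "RMSEA": ("less", 0.05),
    "WRMR": ("less", 0.90),
}


def check_fit(indices):
    """Map each supplied fit index to True if it meets its reference value."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = FIT_CRITERIA[name]
        if direction == "greater":
            results[name] = value > cutoff
        else:
            results[name] = value < cutoff
    return results


# Hypothetical index values, for demonstration only
example_indices = {"TLI": 0.97, "CFI": 0.98, "RMSEA": 0.04, "WRMR": 0.85}
fit_ok = check_fit(example_indices)
```

A strict comparison of this kind is only a screening aid; values close to a cut-off still call for judgment.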

Maternal and family socioeconomic characteristics
These characteristics included the following variables: maternal age in years (< 19; 20 to 34; ≥ 35), per capita family income in minimum wage, number of household residents per room, mother's schooling in years, head-of-household's schooling in years, head-of-household's occupation (skilled; semiskilled/manual; housekeeper; student/retired; unemployed), head-of-household's type of employment (formal; informal; not recorded), household location (non-slum; slum), type of family (nuclear, consisting of husband, wife, and children; only mother and children; non-nuclear; other family arrangements), mother living with husband/partner for less than two years (yes; no), person responsible for supporting household (mother; father; other), presence of elderly over 60 years of age (yes; no), mother's race/color (white/Asian/indigenous; black/brown), type of housing construction (finished; unfinished), maternal migration (yes; no), number of children under 10 years of age (yes; no), and head-of-household's age.

Psychosocial aspects
The gestational conditions covered psychosocial questions at the beginning and during pregnancy: attempted abortion (yes; no), planned pregnancy (yes; no), reactions to the pregnancy by the mother, father, and family and worries during the pregnancy (negative; positive), arguments/fights with the husband/partner (yes; no), separation (yes; no), and worries during the pregnancy (yes; no).

Maternal complications during the index pregnancy
Complications were: vaginal bleeding, hypertension, eclampsia, altered amniotic fluid volume, hospitalization during the pregnancy, anemia, diabetes, genital tract infection, vaginal discharge, placenta previa, and urinary tract infection (yes; no).

Maternal reproductive history
The variables used to measure maternal reproductive characteristics were parity, previous cesareans, previous preterm infant, previous low birth weight infant, maternal age, and assisted reproduction in this pregnancy (yes; no).

Work and physical activity
The variables used to measure activities were whether the mother worked during the pregnancy, whether the work required physical effort, and strenuous housework.

Prenatal care
Inadequacy of prenatal care involved the following observed variables: first consultation (1st trimester), number of consultations (minimum of 3), laboratory tests (urine, blood, ultrasound), procedures, and basic prenatal orientation. The categories were: adequate prenatal care (all items positive); inadequate level I (one or more negative answers); inadequate level II (three or more negative answers); or no prenatal care. The modeling considered two indicator variables to represent inadequacy of prenatal care. Inadequacy of prenatal care I was scored 1 if the prenatal care was inadequate and 0 if adequate. Inadequacy of prenatal care II was scored 1 if the prenatal care was inadequate level II, which included mothers who had no prenatal care at all, and 0 if inadequate level I or adequate.
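The coding of the two indicator variables can be sketched as follows. The category labels and the function name are hypothetical. The sketch also assumes that indicator I is scored 1 for every non-adequate category (level I, level II, and no prenatal care), which is one reading of the description above.

```python
ADEQUATE = "adequate"
INADEQUATE_I = "inadequate_I"
INADEQUATE_II = "inadequate_II"
NO_CARE = "no_prenatal_care"

VALID_CATEGORIES = {ADEQUATE, INADEQUATE_I, INADEQUATE_II, NO_CARE}


def prenatal_indicators(category):
    """Return (inadequacy_I, inadequacy_II) as 0/1 indicators for one mother."""
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown prenatal care category: {category!r}")
    # Assumption: indicator I flags any inadequacy, including no prenatal care
    inadequacy_i = 0 if category == ADEQUATE else 1
    # Indicator II flags level II inadequacy, which includes no prenatal care
    inadequacy_ii = 1 if category in (INADEQUATE_II, NO_CARE) else 0
    return inadequacy_i, inadequacy_ii


coded = {category: prenatal_indicators(category) for category in sorted(VALID_CATEGORIES)}
```

Under this coding the two indicators are nested: a mother scored 1 on indicator II is always scored 1 on indicator I.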

Other variables
In order to cover all the variables selected by the original study 2, the following were included as observed variables: body mass index (BMI), physical activity, alcohol consumption, and multiple pregnancy.

Ethical aspects
The study complied with the ethical principles in Resolution n. 196/1996 of the Brazilian National Health Council, which regulates research involving human subjects, and was approved by the Institutional Review Board of the Public Health Faculty/São Paulo University (protocol n. 404211/2013).

Results
During the modeling, changes were made to the initial model. Based on the blocks Maternal and Family Socioeconomic Characteristics and Psychosocial Aspects, the following latent variables were generated.

Maternal and family socioeconomic characteristics
Two factors were selected whose percentage of explained variance totaled 71.5%, with adequate fits in the model (RMSEA = 0.053; CFI = 0.988; TLI = 0.977) (Table 1). Variables whose correlation with the factors was less than 0.30 were excluded: mother's race/color, type of finishing on the housing, maternal migration, number of children under 10 years in the household, maternal age, head-of-household's age, and head-of-household's type of occupation. The two factors represented socioeconomic vulnerability (SEV) and family vulnerability (FV), respectively. Based on the way the variables were categorized, higher scores for the latent variable SEV corresponded to lower maternal and head-of-household's schooling and higher rates of slum residence; per capita family income has a negative sign, indicating that lower-income households have greater vulnerability. Meanwhile, higher scores for FV correspond to recent relations with the husband/partner, non-nuclear families, and presence of elderly in the household.

Psychosocial aspects
This latent variable aims to measure maternal psychological conditions affected by moments of worry and stress during pregnancy (Table 2). The variables attempted abortion, fights with husband/partner, separation, and worries during pregnancy were excluded due to low correlation (< 0.30) with the other variables. The selected variables were: planned pregnancy (yes; no) and reactions to the pregnancy by the mother, father, and family. A single factor was considered whose explained variance totaled 75.3%, with good fit indices (RMSEA = 0.056; CFI = 0.991; TLI = 0.972).
Based on the variables' categorization, higher scores on the unwanted pregnancy (UP) latent variable correspond to negative reactions to the pregnancy by the mother, father, and family. This factor also indicates emotional vulnerability, as measured by the level of support received by the mother after becoming pregnant.

Maternal reproductive history
This block was represented by a single variable combining parity, previous low birth weight infant, previous preterm infant, and maternal age. Its categories were associated with the outcome in a graded manner according to their OR, and the variable represents the interaction between these characteristics (Table 3).

Maternal complications in the index pregnancy
In the original study's logistic regression model, the block related to maternal complications in the pregnancy showed an important fit in the multiple regression 2, suggesting that more than one of these conditions could be present, and that some of them may have led to the mother's hospitalization during pregnancy. Thus, the block was represented by the number of complications (0; 1 or more) (Table 3). Hospitalization also increased as the number of complications increased: 40% of the mothers with at least one complication had been hospitalized.

Work and physical activity: physical effort
Physical effort consisted of the following variables: whether the mother had worked during pregnancy, if the work required physical effort, and whether she had done strenuous housework (Table 3).
The graphic representation of the final model (Figure 2) shows the standardized estimates. For some relations, the standardized estimates and OR are presented. For some variables, both values indicate the effects of inadequate prenatal care I and II.
BMI during pregnancy also showed an indirect effect via complications (0.099), and its total effect on prematurity was 0.205. Besides a direct effect, alcohol consumption also showed an indirect effect

Discussion
Inadequacy of prenatal care showed a direct effect with a positive sign on preterm birth in the current model, corroborating other studies 5,6. In prenatal care, both the pregnant woman and her unborn child benefit from preventive follow-up, orientation, clarifications, and diagnosis of any altered health condition in the mother or fetus 26. The effects of SEV, FV, reproductive history, and UP on prematurity occur indirectly through inadequate prenatal care, indicating this variable's mediating role in prematurity. Prenatal care plays an important role in negative pregnancy outcomes and reflects the mother's social, economic, and psychological conditions. According to a comprehensive analysis of quality indicators for prenatal care in Brazil, 15% of pregnant women receive adequate prenatal care and 60% receive all the recommended orientation and complementary tests 15. Their living conditions and emotional and affective vulnerabilities can influence their knowledge on health during pregnancy 8,13,27,28 [...] pregnancies, without husbands or partners, without paid employment, with little schooling, and with low socioeconomic status 29. Based on our results, the relationship between these factors suggests the influence of socioeconomic, family, and psychological vulnerability and maternal conditions on the adequacy of prenatal care.
Gestational complications, namely bleeding, high blood pressure, eclampsia, altered amniotic fluid volume, genital tract infection, and diabetes, have a direct effect on preterm birth 1. In addition to the direct effect, multiple pregnancy also displays an indirect effect via complications, which may be related to the occurrence of premature rupture of membranes or other maternal-fetal complications 1,2,5. The variable maternal reproductive history showed a direct effect on the outcome. In the current study, this variable consisted of maternal age and reproductive history with the number of pregnancies and characteristics of any previous infants (low birth weight and prematurity), all of which are factors already described in the literature 1,5,6,7.
The separation of socioeconomic and psychosocial factors into three factors aided the understanding of the various roles in this complex dimension. As a complement to the psychosocial characteristics, the variable worries expresses the nature of the stress experienced during pregnancy, which was statistically significant in the original study. The current study did not identify a direct association with gestational age, but an indirect effect via complications, which points to another order of concerns, namely with health 1,2. SEV, as a distal latent variable, showed correlations with both distal and proximal variables, such as physical effort, BMI, walks, maternal reproductive history, and FV. SEV was related indirectly to prematurity in the current study by two paths, the first via inadequate prenatal care and the second via UP. The socioeconomic dimension is quite complex: it is not limited to unmet material needs of well-being but also involves the denial of opportunities in social relations such as access to work and healthcare 30,31. Both inadequate prenatal care and UP capture these vulnerabilities.
As for physical effort, the results here point to a negative association with prematurity, unlike other studies, including a systematic review in which either no effect was observed (in the majority of studies) or a moderate effect from work conditions involving physical effort was seen. The variable physical effort, as constructed here, can translate the mother's life phase in relation to her full productive activity/work, whether due to her age or household per capita income, pointing not to SEV (correlation with the variable SEV verified in the model). Meanwhile, Behrman & Butler 28 found that strenuous work can also be seen as an indicator of favorable socioeconomic circumstances, i.e., the capacity to have and keep a job, along with the employment benefits, while also signaling the psychological satisfaction resulting from some types of work. Thus, this result in the current study is not fully explained, and other approaches are needed to better understand the role of physical effort. The same pattern was also seen in the variable taking walks or other forms of physical activity, that is, a direct and negative effect on prematurity.
The BMI variable had a direct effect on gestational age at birth and an indirect effect via complications, corroborating the findings by Padilha et al. 32. The current study also found a correlation between BMI and SEV, indicating BMI levels outside the normal range, namely < 19 kg/m² (underweight) and ≥ 30 kg/m² (obese), in mothers with greater SEV.
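The combination of a direct and an indirect effect can be written out explicitly. The sketch below assumes the usual additive decomposition of path analysis with a single indirect path through complications; the symbols $c'$, $a$, and $b$ are illustrative notation, not taken from the original study. The values 0.099 (indirect) and 0.205 (total) are the standardized estimates reported for BMI in the Results, and the direct effect shown is merely the value implied by that assumption.

```latex
\begin{align}
\text{total effect} &= \underbrace{c'}_{\text{direct: BMI} \to \text{preterm birth}}
  + \underbrace{a \, b}_{\text{indirect: BMI} \to \text{complications} \to \text{preterm birth}} \\
0.205 &= c' + 0.099 \quad \Longrightarrow \quad c' = 0.205 - 0.099 = 0.106
\end{align}
```

Here $a$ denotes the path from BMI to complications and $b$ the path from complications to preterm birth, so the indirect effect is their product.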
This same pattern was also seen in alcohol consumption during pregnancy, with direct and indirect effects on gestational age at birth. The indirect effect was via UP and inadequate prenatal care. Other studies have shown an association between heavy alcohol consumption and prematurity, but the mechanisms were not clear 1,28,33, while still others found no association 34.
The current model also indicated a path to prematurity via UP and inadequate prenatal care, and no references were found that discuss these possible relations. Future studies could explore these paths better. Again, the role of prenatal care as a channel of care for the mother and infant deserves attention.
The original case-control study 2 used a theoretical framework designed in hierarchical blocks using a logistic regression model 35 in which variables were grouped into blocks according to some common characteristic. The current study built on this model, extending its structure with non-observed (latent) variables. In principle, it is possible to create a structural equation model drawing on a hierarchical model, since the structure in blocks bears an important element of the relations between these variables and the outcome: temporality, characterized by the concepts "distal", "intermediate", and "proximal".
However, there were difficulties in rewriting the model: the blocks combine variables that are not always correlated, so that each relationship should be reassessed for inclusion in the model.
In addition to the low correlations between the variables comprising the factors, various other problems may arise with the use of the structural equation model: estimation problems, non-convergence, lack of identification, non-positive definite matrices, and estimation of negative variances are not uncommon 26.
For some variables, estimation problems occurred due to empty cells (zero counts) in the bivariate combinations, such as "previous pregnancy" and "nulliparous mothers", where the intersection is structurally zero. This prevents estimation of the covariance and requires using the solution proposed by Muthén & Muthén 36, namely to combine the variables into a new variable, that is, to create an interaction variable. Its relationship to the outcome was taken into account (via OR). An alternative is to apply a perturbation to the cell, i.e., to replace the zero with a small value 37.
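Both ways of handling a structural zero in a cross-tabulation can be illustrated with a minimal sketch. The category labels, the function names, the counts, and the perturbation value of 0.5 are all hypothetical choices for demonstration; they are not the coding used in the study nor a procedure taken from MPlus.

```python
def combine_parity_and_previous_preterm(parity, previous_preterm):
    """Collapse two structurally dependent variables into one categorical variable."""
    if parity < 0:
        raise ValueError("parity cannot be negative")
    if parity == 0:
        # Structural zero: a mother with no previous birth cannot have
        # a previous preterm infant.
        if previous_preterm:
            raise ValueError("no previous birth but previous preterm infant reported")
        return "no_previous_birth"
    if previous_preterm:
        return "previous_birth_preterm"
    return "previous_birth_not_preterm"


def perturb_zero_cells(table, epsilon=0.5):
    """Replace zero counts with a small value (epsilon is an arbitrary choice)."""
    return [[epsilon if cell == 0 else cell for cell in row] for row in table]


# Hypothetical cross-tabulation (NOT study data):
# rows = no previous birth (yes, no); columns = previous preterm infant (yes, no)
crosstab = [
    [0, 150],
    [40, 260],
]
perturbed = perturb_zero_cells(crosstab)
combined = [
    combine_parity_and_previous_preterm(0, False),
    combine_parity_and_previous_preterm(2, True),
    combine_parity_and_previous_preterm(1, False),
]
```

The combined variable has three categories and no empty cell, whereas the perturbation keeps both original variables and only alters the zero count.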
Estimating the OR allows the interpretation familiar in epidemiology. Re-estimating the model to obtain the OR, after having fitted it with WLSMV, was the alternative found to perform the most adequate estimation of the data 38. Muthén et al. 39 reported that WLSMV was developed with small and moderate samples. Beauducel & Herzberg 40 also reported more satisfactory results with WLSMV in variables with few categories (2 or 3, compared to 4 or more).
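The step from a regression coefficient to an OR can be sketched as below, assuming that the maximum likelihood estimates are on the logit scale, so that the OR is the exponential of the coefficient. The function name, the Wald-type confidence interval, and the example coefficient and standard error are illustrative assumptions, not results from the study.

```python
import math


def odds_ratio(beta, se=None, z=1.96):
    """Convert a logit coefficient to an odds ratio, with an optional Wald interval."""
    point = math.exp(beta)
    if se is None:
        return point, None
    # Interval built on the log-odds scale, then exponentiated
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return point, (lower, upper)


# Hypothetical coefficient and standard error, for demonstration only
example_or, example_ci = odds_ratio(beta=math.log(2.0), se=0.25)
```

A coefficient of zero corresponds to an OR of 1, that is, no association on the odds scale.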
The study's limitations include its observational design, which does not allow conclusions as to causality. The number of individuals was small given the number of relations studied, although this was not a problem for obtaining the models' fits. In addition, the study was not designed to observe the relations in the form proposed by the modeling.
From the methodological point of view, the SEM was built on a previously constructed model (the original model) that drew on hierarchical regression. This is an advantage to the extent that the original model can be used as an initial model and allows including paths that represent the relations between the model's variables (among themselves and with the outcome), besides identifying proposed mediations. The proposed alterations to the initial model aimed to grasp the variables involved in the original study. The process of building the SEM highlighted: (1) the need to reanalyze the role of the variables when choosing to use the SEM; (2) that complex models require studies that already consider the creation of latent variables in their formulation; (3) that the combination of variables to form a new variable is a resource in the absence of well-defined prior latent variables and that it solves problems of empty cells when cross-analyzing variables; and (4) that the same variables were significant using the different estimators (WLSMV and MLR) in the model's structural component.

Conclusion
It was possible to use the result of the original work to develop the SEM based on a revision of the theoretical model, drawing on the known relations with the outcome and the logistic regression model. Application of the proposed methodology identified the presence of constructs (SEV, UP, FV), verified inadequacy of prenatal care as a mediator, and identified direct and indirect effects of variables on the outcome preterm birth.

Figure 2
Resulting model for preterm birth.

Table 3
Distribution of cases and controls according to selected variables. Londrina, Paraná State, Brazil, 2009.

Table 4
Models' estimated parameters for preterm birth. Results with weighted least square mean and variance adjusted estimator (WLSMV) and robust maximum likelihood (MLR). Londrina, Paraná State, Brazil, 2009.