Evaluation and revision of questionnaires for use among low-literacy immigrant Latinos



Karen T. D'Alonzo

RN, Ph.D in Nursing, Assistant Professor, Rutgers, The State University of New Jersey, College of Nursing, New Jersey, USA. E-mail:





As more Spanish speaking immigrants participate in and become the focus of research studies, questions arise about the appropriateness of existing research tools. Questionnaires have often been adapted from English language instruments and tested among college-educated Hispanic-Americans. Little has been written regarding the testing and evaluation of research tools among less educated Latino immigrants. The purpose of this study was to evaluate and revise a battery of Spanish-language questionnaires for an intervention among immigrant Hispanic women. A three-step process was used to evaluate, adapt and test Spanish versions of the Self-Efficacy and Exercise Habits Survey, an abbreviated version of the Hispanic Stress Inventory-Immigrant version and the Latina Values Scale. The revised tools demonstrated acceptable validity and reliability. The adaptations improved the readability of the tools, resulting in a higher response rate, less missing data and fewer extreme responses. Psychometric limitations to the adaptation of Likert scales are discussed.

Descriptors: Psychometrics; Immigrant Health; Health Disparities.




Latinos, or persons of Hispanic descent, comprise nearly 16% of the total United States (US) population and are the fastest growing minority group in the US(1). Approximately 40% of the more than 45 million Latinos living in the US are foreign born, and close to 75% of first-generation Latino immigrants speak Spanish most of the time. Furthermore, many Latino immigrants have lower levels of formal education and are unlikely to be familiar with research terminology. As more Spanish-speaking immigrants participate in and become the focus of research studies in the US, questions arise about the appropriateness of existing research tools. Questionnaires designed for use among Latinos in the US have often been translated from English and tested among second- and third-generation college-educated Hispanic-Americans. This approach assumes that persons of Hispanic ancestry are a homogeneous, well-educated group. Little has been written regarding the evaluation and testing of research tools among less educated Latino immigrants, particularly those who are less familiar with the research process.

Methodological issues

For some time, researchers have reported procedural difficulties in conducting research among Hispanic populations(2-3). In addition to translation issues, there are a number of culturally-based methodological concerns for non-Hispanic researchers working with Latinos(4). Most frequently cited is a cultural bias associated with the use of Likert scales among Latinos(2). Likert scales are popular in the social sciences, but low-literacy immigrant Latinos may have a poor understanding of the graded response format(3,5). Specific response trends include social desirability responses(6), extreme response sets (excessive use of the endpoints of the scale)(7) and missing data(8). Lastly, population-related extraneous variables may affect Latinos' responses to Likert scales, including age, level of education, acculturation and country of origin(2,3,7). Younger adults with more years of education and higher acculturation scores are less likely to report difficulties completing Likert scales(3). Historically, there has been little agreement among researchers regarding the most appropriate way to handle these methodological dilemmas(4,9).

Purpose. The purpose of this study was to evaluate and revise a battery of Spanish-language questionnaires for an intervention among immigrant Hispanic women. The study consisted of three phases: Phase 1, assessment of existing instruments; Phase 2, revision of instruments; and Phase 3, preliminary assessment of the psychometric properties of the revised instruments. In Phases 1 and 2, the primary investigator was interested in assessing the appropriateness of the instrument response format, evidence of social desirability issues, extreme response styles, missing data and the time needed for completion. Phase 3 focused on pilot testing the revisions and assessing the psychometric properties of the adapted instruments.



Phase 1- Assessment of existing instruments: A purposive sample of 13 females was recruited from a group of immigrant Latinas attending a promotora (community health worker) training program in New Jersey. The descriptive statistics for the subjects in Phases 1, 2 and 3 are presented in Table 1.

Data collection measures: (1) Self-Efficacy and Exercise Habits Survey (SEHS)(10); (2) Latina Values Scale (LVS)(11) and (3) abbreviated version of the Hispanic Stress Inventory-Immigrant (HSI-I)(12). The SEHS and the HSI-I had previously been translated into Spanish with good validity and reliability among US Spanish-speaking populations. A Spanish version of the LVS was not available during Phase 1, so the author and three bi-lingual community consultants translated the tool into Spanish and then back-translated it into English. The English and Spanish versions of the questionnaires were compared, and differences in terms and conceptual accuracy were resolved.

Self-Efficacy and Exercise Habits Survey(10): The SEHS measures self-efficacy for exercise in relation to two factors: "Making time for exercise" and "Resisting relapse". Alpha coefficients for the SEHS have been reported at .83 and .85, with a test-retest reliability of .78. Both the English and Spanish versions of the SEHS have a six-point Likert scale; choices 1 and 2 correspond to "I am sure that I cannot" ("Estoy seguro que no puedo"), choices 3 and 4 represent "Perhaps I can" ("Quizás sí puedo"), while choices 5 and 6 correspond to "I am sure that I can" ("Estoy seguro que puedo"). The English version, developed among college students, contains 12 items, with a possible range of 12-72. The 15-item Spanish version was developed among community-dwelling adults and has a possible range of 15-90(13). Higher scores are associated with a greater degree of exercise self-efficacy.
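The response format and score ranges described above can be illustrated with a short Python sketch. It is offered only as an illustration: the names `SEHS_LABELS`, `sehs_label` and `score_range` are hypothetical and are not part of the published instrument, while the labels and item counts are taken from the description above.

```python
# Illustrative sketch only; all names here are hypothetical.

# Six-point SEHS response scale: adjacent pairs of choices share one label.
SEHS_LABELS = {
    1: "I am sure that I cannot",
    2: "I am sure that I cannot",
    3: "Perhaps I can",
    4: "Perhaps I can",
    5: "I am sure that I can",
    6: "I am sure that I can",
}


def sehs_label(choice: int) -> str:
    """Return the verbal label attached to a numeric choice (1-6)."""
    if choice not in SEHS_LABELS:
        raise ValueError(f"choice must be between 1 and 6, got {choice}")
    return SEHS_LABELS[choice]


def score_range(n_items: int, lowest: int, highest: int) -> tuple[int, int]:
    """Possible total-score range when every item is scored lowest..highest."""
    return n_items * lowest, n_items * highest


if __name__ == "__main__":
    print(score_range(12, 1, 6))  # English version: (12, 72)
    print(score_range(15, 1, 6))  # Spanish version: (15, 90)
```

The same arithmetic applies to any summed scale: the possible range is simply the number of items multiplied by the lowest and highest anchor values.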

Latina Values Scale (LVS)(11): The 31-item two-stage Likert scale tool was developed among young Hispanic women in the US to measure beliefs about marianismo, traditional roles assigned to women in Hispanic culture. The LVS has demonstrated an inter-item reliability of .87. A significant inverse correlation (r=-.65, p=.01) was noted between scores from the LVS and those reported from a measure of assertiveness. Exploratory factor analysis (EFA) revealed three subscales: Responsibility, Assertion and Satisfaction. The "main scale" of the original English version is a six-point Likert scale, with the responses ranging from 1- "I totally disagree" to 6- "I totally agree". Scores in the main scale range from 31-186. Higher scores are associated with stronger marianismo beliefs. The "Satisfaction" scale contains 31 items, which correspond to the items in the main scale. For each question, the subject is asked, "How satisfied are you of your response?" The subject then chooses one of four anchors, ranging from 1- "I am very dissatisfied" to 4- "I am very satisfied." Scores on the satisfaction scale range from 31-124. High scores indicate the subject is very comfortable with her responses. A draft of a Spanish version of the tool was created as previously described.

Abbreviated version of the Hispanic Stress Inventory-Immigrant version (HSI-I)(12): The 17-item tool was developed for immigrant Latinos. The five subscales deal with occupational or economic stress, parental stress, marital stress, immigration stress and family/culture stress. Internal consistencies across all subscales ranged from .68 to .83. Convergent validity of the revised tool is supported by moderately positive relations with self-report measures of depression, anxiety, and anger mood levels. Respondents are asked to answer "Yes" or "No" to whether or not they have experienced a series of acculturation-related stressors in the past three months. If the answer is "Yes", the subject is then instructed to indicate (on a scale of 1-5) how worried or tense the situation made them feel. A rating of one means the subject felt "Not at all worried/tense" ("Nada preocupada(o)/tensa(o)"), while a score of five indicates the subject felt "Extremely worried/tense" ("Muy preocupada(o)/tensa(o)"). Scores on the HSI-I range from 17-85. Higher scores correspond to higher levels of subject anxiety.

Procedure: Following study explanation and informed consent, each of the women received a packet containing the questionnaires in random order. Subjects were instructed to answer the questions, spot the need for grammatical changes and identify any questions felt to be ambiguous or overly sensitive. Three bi-lingual research assistants aided women who requested clarification or who wished to have the questions read to them. Data were managed and demographic and psychometric analyses performed using SPSS version 16.0 for Windows (SPSS Inc. Chicago. Ill).



Only eight of the 13 women were able to complete the questionnaires within one hour. Literacy level and number of years of education were prominent factors in completion of the questionnaires. The four subjects who had less than six years of formal education were most likely to omit multiple items. Six of the women noted difficulty in responding to questions containing double negatives. Validity and reliability were not assessed due to the small sample size and the preponderance of missing data.

All of the women voiced displeasure with the Likert format. Half of the questionnaires showed evidence of the subjects' preference for extreme responses. Likewise, half of the group favored choices which avoided putting themselves in an unfavorable light (social desirability responses) or simply answered all of the questions with the same response (halo effect). The four completed packets contained significant amounts of missing data. For the SEHS, only four questionnaires were completed in their entirety. All contained a large number of positively skewed extreme responses to situations where exercising might be difficult ("I am sure that I can") ("Estoy seguro que sí puedo"). Hence, the total scores for the SEHS were quite high (mean=41.5, SD=2.4), although none of the women were regular exercisers. The LVS was the longest instrument and contained some questions of a sexual nature which the women considered "too personal" to answer. Many of the subjects did not answer the second half of each question (Satisfaction subscale), which was a crucial component of the questionnaire. For those women who did respond, many later indicated they understood the phrase, "How satisfied are you of your response?" ("¿Cuán satisfecho está Usted con su respuesta?") to mean "How sure are you of your response?" ("¿Cuán seguro está Usted con su respuesta?"). Rather than appear indecisive, the majority of the women answered, "I am very satisfied" ("Estoy muy satisfecho") without fully understanding the nature of the question. In the HSI-I, more than one-half of the women omitted the second half of each question and three of the questionnaires contained responses with strong halo effects.

Phase 2 - Revision of instruments: Prior to revisions, the primary investigator met with two bi-lingual community consultants and an experienced Latino researcher to discuss the difficulties encountered during the assessment of the questionnaires. Based upon the findings from Phase 1, the following changes were suggested and subsequently implemented: 1) Likert scales were collapsed into three choices (Yes / No / I don't know or I'm not sure); 2) a set of simple instructions, sample questions and responses were added to each questionnaire; 3) all of the tools were administered in the same order, beginning with the simplest tool and ending with the longest tool and/or the one with the most sensitive questions; 4) local community women were trained as community research assistants (CRAs) to allow for a 2:1 ratio of subjects to assistants; 5) child care was provided during data collection; 6) scales were all written in the same direction; 7) tools were rewritten in a larger font with more "white space" around each question and 8) the LVS was replaced with the Latina Values Scale-Revised (LVS-R)(14), a 28-item Spanish language version developed for young to middle-aged Latinas. In the LVS-R, the Satisfaction scale was replaced by the Conflict Scale and the second part of each question was rewritten as: "Has your response to this question caused problems or conflicts in your life?" ("¿La respuesta a esta pregunta ha causado problemas o conflictos en su vida?"). Study participants felt that this phrase better reflected the intent of the questionnaire. Cronbach's alpha was .94 for the LVS-R and .95 for the Conflict scale. In addition to the Conflict Scale, EFA of the LVS-R(14) revealed six other factors: Self-Sacrifice, Assertion, Guilt, Self-Blame, Putting Others First and Responsibility. The presence of seven factors in the LVS-R as opposed to only three factors in the LVS demonstrates the complexity of the concept of marianismo(14).
All of the revised tools were adapted to assure a sixth grade reading level in Spanish. Once permission was obtained from the authors, the revised tools were reviewed for face validity and cultural appropriateness by community consultants prior to administration to six promotoras. Characteristics of this sample are described in Table 1.

Results: All six participants completed the full battery of questionnaires. Mean completion time was 35 minutes, with all women answering close to 100% of the questions. Psychometric testing was not performed due to the small number of subjects.

Phase 3- Psychometric properties of the revised instruments: The revised tools were administered to 81 immigrant Hispanic women, ages 18-55 yrs., who participated in the Physical Activity Intervention for Latinas (PAIL) Study (manuscript in progress). The promotoras assisted subjects by reading/clarifying the meaning of selected questions and checking to see that all questions were answered. Mean completion time for the set of tools was 40 minutes, with less than two percent of missing data for all of the women.

Psychometrics: Reliability of the revised tools was assessed through corrected item-total correlations and the internal consistency of each tool and adapted subscales. Construct validity was assessed through confirmatory factor analysis (CFA), carried out on each revised tool through principal axis extraction with Varimax rotation(15). Prior to CFA, the assumptions for factor analysis were tested and verified. Bartlett's Test of Sphericity, which measures the strength of the relationship among variables, and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy were used to assess the factorability of the correlation matrix(15). A priori theory, the original tools and eigenvalues (factors with eigenvalues of less than 1.0 were excluded) guided factor extraction. Items with a factor loading greater than .32 were retained(15). A minimum difference of .20 between an item's loading on theoretically aligned and opposed factors was recommended. Scree plots were used to corroborate decisions regarding factor extraction. The results of the psychometric analysis are presented in Tables 2, 3, and 4.
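For readers who wish to see how the two factorability checks named above are computed, the following Python sketch applies the standard textbook formulas for Bartlett's chi-square statistic and the overall KMO index to a correlation matrix. It is not the SPSS procedure used in this study; the function names and the example matrix are hypothetical, and only the sample size of 81 is taken from the text.

```python
import math


def invert_with_det(matrix):
    """Gauss-Jordan inversion with partial pivoting; returns (inverse, determinant)."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = len(corr)
    _, det = invert_with_det(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inv, _ = invert_with_det(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                r2 += corr[i][j] ** 2
                # Partial correlation of items i and j given all other items.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                q2 += partial ** 2
    if r2 + q2 == 0:
        raise ValueError("items are completely uncorrelated")
    return r2 / (r2 + q2)


if __name__ == "__main__":
    # Hypothetical correlation matrix for three items (not study data).
    corr = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
    print(bartlett_sphericity(corr, n_obs=81))
    print(kmo(corr))
```

The chi-square statistic would still need to be compared against a chi-square distribution with the returned degrees of freedom to obtain a p-value; that step is omitted here.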

Self-Efficacy and Exercise Habits Survey: The Likert scale was reduced to three anchors: 1) "I am sure I cannot"; 2) "Maybe I can" and 3) "I am sure that I can". The revised Spanish version of the SEHS contained 15 items, with a possible range of 15-45. Mean score for the questionnaire was 37.5 (SD=4.81), with a range of 28-45. The histogram plot of the data revealed skewness of -.22 and kurtosis of -1.0. Although descriptive statistics were not calculated in the earlier sample, it appears there was less of a tendency for the women to choose extreme positive responses with the revised tool. All of the item-total subscale correlation coefficients ranged from .41 to .72. Cronbach's alpha for the SEHS was .81. KMO was .64 and Bartlett's Test of Sphericity was statistically significant (p<.001), both of which are acceptable(15). CFA results are presented in Table 2. Since only two factors had eigenvalues over 1.0 and the scree plot supported a two-component solution, CFA revealed the same two-factor solution identified in the English version(10). Internal consistency for each of the two subscales, "Making time for exercise" and "Resisting relapse", was satisfactory at .75. These two factors accounted for 42.2% of the total variance.
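The two reliability statistics reported here, Cronbach's alpha and corrected item-total correlations, can be sketched in plain Python as follows. This is a minimal illustration using the usual formulas, not the SPSS routine used in the study; the function names and the small response matrix are hypothetical and do not reproduce study data.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    item_variances = sum(variance(col) for col in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def corrected_item_total(rows):
    """Correlation of each item with the sum of the remaining items."""
    result = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    # Hypothetical answers on a three-anchor scale (1, 2 or 3), four items.
    responses = [
        [3, 3, 2, 3],
        [2, 2, 2, 1],
        [1, 2, 1, 1],
        [3, 2, 3, 3],
        [2, 1, 2, 2],
    ]
    print(round(cronbach_alpha(responses), 2))
    print([round(r, 2) for r in corrected_item_total(responses)])
```

The correction matters for short scales: correlating an item with a total that includes the item itself inflates the coefficient, which is why the item is removed from the total before the correlation is computed.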

Latina Values Scale-Revised: Twenty-three of the twenty-eight items in the LVS-R were retained and tested. The five items omitted were either of a sexual nature and not relevant to this study or were considered redundant. The five-point Likert scale used in the original LVS-R was reduced to three anchors: 1) "I disagree"; 2) "Not sure" and 3) "I agree". The five anchors used in the original LVS-R Conflict scale were reduced to three choices: 1) "Never"; 2) "Sometimes" and 3) "Always". Possible total scores ranged from 23-69. Mean score on the LVS-R was 45 (SD=7.23) with a range of 33-69. The histogram plot demonstrated a slight negative skew (-.28) and kurtosis of -.78, suggesting that subjects did not tend to adhere to the extreme response style. Cronbach's alpha for the LVS-R was acceptable at .74. Two items had item to subscale correlations less than the acceptable level of .40. These were: "I often put myself down in relation to men/ A menudo me sentido inferior en comparación a los hombres" and "I find myself believing that criticism is caused by my faults/ Creo que los conflictos y problemas son mi culpa." Based upon informal discussion with a subset of the women, the first question should have been omitted in this study as it was related to the other sexually-themed items. For the second question, the women felt the wording of the Spanish version of the question was confusing and evoked a different type of response than was intended by the English version. As a result, both items were subsequently excluded from analysis for this study. The KMO was .50, indicating minimally acceptable sample size(16), and Bartlett's Test of Sphericity was statistically significant (p<.001). CFA supported the seven-factor model, which accounted for 66.5% of the variance. Internal consistency for each of the subscales was greater than .70. Results of the CFA are presented in Table 3.

Hispanic Stress Inventory: The first half of each two-stage question pertaining to sources of immigration-related stress was not altered. The five-point Likert scale used in the second half of each question was reduced to three anchors: 1) "Not at all worried"; 2) "A little worried" and 3) "Very worried". Mean score for the main scale was 4.06 (SD=3.95), indicating that most of the subjects reported few instances of immigration-related stress. Potential range of scores for the second half of the revised tool was 0-54. Mean score for the second half of the questions was 7.79 (SD=9.41) with a range of 0-36. In the revised version, fewer than five percent of values were missing. Item to subscale correlation coefficients were acceptable, ranging from .41 to .77. Cronbach's alpha for the entire tool was .74. KMO was .67 and Bartlett's Test of Sphericity was statistically significant (p<.001). Since only two factors had eigenvalues over 1.0, CFA supported the two-factor model(13) of Interfamilial and Extrafamilial stressors. Internal consistency for the two subscales was .72 and .76 respectively. These two factors accounted for 40.3% of the total variance. A scree plot confirmed the two-factor solution. One item, "I have felt that my children's ideas about sexuality are too liberal", did not load well on to either scale, probably because of the young age of the parents. The results are presented in Table 4. The item that was added to the adapted tool, "I think that if I would go to a social service or government agency, I would be deported", exhibited an item to subscale correlation coefficient of .42. The internal consistency of the Extrafamilial subscale remained the same whether or not the item was included, suggesting that it was a good fit in the subscale.
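The eigenvalue rule and the "percentage of total variance" figures reported for each tool can be illustrated with a brief Python sketch. It assumes the conventional calculation in which each factor's share of variance is its eigenvalue divided by the sum of all eigenvalues (for a correlation matrix, the number of items); the function names and the eigenvalues shown are hypothetical and are not taken from Tables 2 to 4.

```python
def retained_factors(eigenvalues, cutoff=1.0):
    """Keep factors whose eigenvalue is at least the cutoff (values below it are excluded)."""
    return [ev for ev in eigenvalues if ev >= cutoff]


def percent_variance_explained(eigenvalues, cutoff=1.0):
    """Share of total variance carried by the retained factors, in percent."""
    kept = retained_factors(eigenvalues, cutoff)
    return 100.0 * sum(kept) / sum(eigenvalues)


if __name__ == "__main__":
    # Hypothetical eigenvalues for an eight-item tool (they sum to 8).
    eigenvalues = [3.0, 2.0, 0.9, 0.8, 0.5, 0.4, 0.3, 0.1]
    print(len(retained_factors(eigenvalues)))       # number of factors kept
    print(percent_variance_explained(eigenvalues))  # percent of total variance
```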



The results support earlier studies that indicate the use of Likert scales among immigrant Latinos is often problematic. This is true even when the instruments have been previously translated into Spanish and have demonstrated adequate validity and reliability among Spanish speakers in the US. This study demonstrated that adapted Spanish language tools can be used successfully among groups of low-literacy immigrants who are unfamiliar with the research process. The adaptations appeared to improve the readability of the tools, resulting in a greater response rate, less missing data and a diminished trend toward extreme responses.

Although condensing the number of Likert anchors from five to three seemed to solve some of the subjects' ambiguities, it also lowered the variability of the responses. In the CFA process, the isolated factors explained only 40-65% of the total variance. Although identifying enough factors to account for 80-90% of the variance is desirable, this criterion could be as low as 50% when the goal is to explain variance with as few factors as possible(17). Given that the CFA of each of the three instruments confirmed the original factor structure, this lends support for the format of the adapted tools. The issue of the optimum number of anchors in a Likert scale is a controversial one. Five to seven anchors are often considered ideal to assure a thorough representation of responses, but three points may be sufficient if the emphasis is on group rather than individual data. In this study, using more points than the subjects could understand might have resulted in increased variability, but not necessarily increased validity or reliability. Another potential issue is the loss of variability when the three-point Likert scale is used in a pre-test/post-test format. When the ability to capture change is necessary, a larger number of anchors may be needed, and subjects may need to be thoroughly briefed before data collection begins. It is notable that the subjects had little difficulty completing other Spanish language tools (e.g. the Center for Epidemiologic Studies Depression Scale (CES-D)) using multiple response scales when the choices were more quantifiable. This same format could be implemented in the design of new instruments to measure physical activity concepts. Rather than asking how confident the subject is about getting up early to exercise, the question could be worded, "How many times/week do you feel that you could get up early to exercise?" In this case, the responses would be based on the number of days/week (e.g. 0, 1-2, 3-4, 5-6, 7).
Such an approach would improve clarity and conceivably preserve response variability. While the women in this study had a mean of nine years of education, 21% had only completed the sixth grade. Since conversational literacy in a language is likely to be greater than familiarity with research terminology, it is not surprising that many of the subjects in Phase 1 struggled to complete the tools. Researchers who work with immigrant populations should take into account that subjects are not likely to be "research literate", so additional time and attention are needed for data collection instructions. One limitation of Phase 3 of the study was the small sample size. KMO values for two of the three instruments suggest the sample size was likely adequate. If there are four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used, and when communalities are high (>.60), sample sizes well below 100 will still be adequate. Repeated CFA of the instruments with larger sample sizes is advisable.
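The quantifiable days-per-week response format suggested above could be coded as in the following Python sketch. The five bands are those given as an example in the text; the names `DAY_BANDS` and `days_per_week_band` are hypothetical, and the sketch is one possible coding rather than a tested instrument.

```python
# Illustrative sketch only; names are hypothetical.

# Response bands from the example: (lowest day count, highest day count, label).
DAY_BANDS = [
    (0, 0, "0"),
    (1, 2, "1-2"),
    (3, 4, "3-4"),
    (5, 6, "5-6"),
    (7, 7, "7"),
]


def days_per_week_band(days: int) -> str:
    """Map a reported number of days per week (0-7) to its response band."""
    if not 0 <= days <= 7:
        raise ValueError(f"days must be between 0 and 7, got {days}")
    for low, high, label in DAY_BANDS:
        if low <= days <= high:
            return label
    raise AssertionError("unreachable: bands cover 0-7")


if __name__ == "__main__":
    for d in range(8):
        print(d, days_per_week_band(d))
```

Because the bands are ordered and tied to a countable quantity, the five categories keep more response variability than a three-anchor agreement scale while remaining concrete for the respondent.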



Adaptation of existing research tools remains a significant instrumentation challenge for researchers who work with immigrant Latinos. Given the growing diversity in the US population, nurse researchers need to consider a variety of culturally appropriate methods to encourage participation in research studies by low-literacy immigrant populations. Transnational collaborations among nurse researchers committed to the health of Latin American immigrants may be one approach to identify such culturally appropriate methods.



Corresponding Author:
Karen T. D'Alonzo
Rutgers, The State University of New Jersey
College of Nursing
180 University Avenue, Rm. 226
Newark, NJ USA 07102



Received: Jan. 18th 2011
Accepted: Aug. 15th 2011

