Female Genital Self-image Scale (FGSIS): cut-off point, reliability, and validation of measurement properties in Brazilian women

ABSTRACT This study aimed to translate, create a cut-off point for, and assess the measurement properties of the female genital self-image scale (FGSIS) in Brazilian women. Content, structural, and construct validity, internal consistency, test-retest reliability, and measurement errors were assessed in this online study. The FGSIS cut-off point for classifying satisfaction with genital self-image (GSI) was derived using the Partial Credit Model (PCM). In total, 614 women (28.92±9.80 years) participated in the study. The FGSIS had a one-factor structure and adequate measurement properties. An FGSIS score ≥22 points classifies women as satisfied with their GSI. Therefore, FGSIS is a simple, valid, and reliable measure to assess GSI in Brazilian women.


INTRODUCTION
Genital self-image (GSI) is defined as the individual's perception of their genitalia 1 and can be associated with sexual dysfunctions, fewer gynecological exams, and a poorer quality of life [2][3][4]. Women dissatisfied with their GSI present an increased level of anxiety when exposing their genitalia during sexual activity 1, which can reduce their sensation of pleasure and generate pain during penetration 5. Moreover, a dissatisfying GSI may increase the demand for unnecessary genital cosmetic surgery, especially in Brazil, where the rates of plastic surgery, mainly labiaplasty, are high 6.
In the literature, several studies 5,[7][8][9] have measured GSI with the female genital self-image scale (FGSIS) 4. This patient-reported outcome measure (PROM) assesses a woman's feelings and opinions about her genitals based on seven items and has been validated in different populations [7][8][9]. In Brazil, the FGSIS 10 and the male genital self-image scale (MGSIS) 11 were translated and validated. However, unlike that of the MGSIS, the translation process of the Brazilian Portuguese version of the FGSIS is unclear, the validation study is not fully available, and the authors only included women seeking abdominoplasty, who do not represent the population of Brazilian women 10. This is an important methodological flaw when this PROM is used in a general population of women, as PROMs must be validated for the population of interest 12.
Although some Brazilian studies 5,13 used the translation from reference 10 in scientific research and clinical practice, the use of PROMs with high quality of evidence related to validity and reliability is recommended. This ensures, for example, that the PROM measures what it is intended to measure, that is, that its items correctly address the construct to be measured, and that the measure is free from measurement error 12. Thus, the frequent use of the FGSIS in Brazilian research reflects an important methodological flaw: reliance on a PROM without established measurement properties. Furthermore, due to the influence of GSI on sexual function and quality of life [2][3][4], the use of PROMs with high-quality measurement properties should be encouraged to obtain valid and reliable measurements 12. This may also help clinicians evaluate women dissatisfied with their GSI who seek genital cosmetic surgery 4. Thus, this study aimed to translate, create a cut-off point for, and assess the measurement properties of the FGSIS in a sample of Brazilian women.

METHODOLOGY
This is an online validation study conducted in Brazil. The link for participation was posted on social media and instant messaging apps. All data collection instruments were entered into Google Forms, and participants could only answer the questions after reading the research objectives and evaluation methods and clicking on "I agree to participate". Women over 18 years, of Brazilian nationality, and literate in Brazilian Portuguese were included in the study. Transgender individuals were excluded from the study because the FGSIS was not developed for this specific population.
Sample size estimation was based on the COSMIN guideline 14 , which considers seven subjects per validated instrument item and over 100 subjects as an adequate minimum.As the FGSIS has seven items, 100 women would be needed for the analysis.
The translation and evaluation of the FGSIS measurement properties followed the COSMIN guideline 14. This study evaluated the following measurement properties: content validity (degree to which a measuring instrument seems to be an adequate reflection of the construct); structural validity (degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured); internal consistency (degree of interrelation between items); test-retest reliability (degree to which a measurement is free from measurement errors); measurement errors (systematic and random error of a patient's score not attributed to real changes in the construct); and hypothesis testing for construct validity (degree to which an instrument's scores are consistent with hypotheses based on the assumption that the instrument validly measures the construct) 12.
For the retest evaluation, a link to the FGSIS was sent together with the following question: "Did you undergo treatment (surgery/physical therapy/medication on genitals) between the first and second assessments of this research?". The answer options were "yes" or "no." Only women who answered "no" to this question were included in the test-retest reliability and measurement error analyses. Retest responses between 10 and 14 days after the first assessment were also considered. According to COSMIN 15, this ensures that the construct does not change between test and retest. This study was conducted from April 2021 to July 2022.

Sociodemographic characteristics
A questionnaire with sociodemographic, gynecological, and obstetric questions was used to characterize the sample.

Female genital self-image scale
FGSIS is a 7-item PROM that assesses female GSI. FGSIS items are scored on a 4-point Likert-type scale, ranging from 1 (strongly disagree) to 4 (strongly agree). The items are added together for the total score, which ranges from 7 to 28 points. Higher scores indicate a more positive GSI. The FGSIS development study found a one-dimensional structure with adequate internal consistency (α=0.91) for the 7-item version, and adequate internal consistency (α=0.86) and good test-retest reliability (r=0.62-0.78) for the 4-item alternate version. Test-retest reliability of the 7-item version was not assessed in the original FGSIS study 4.
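The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `fgsis_total` and the input format (a list of seven integer responses) are assumptions, not part of the instrument.

```python
def fgsis_total(responses):
    """Sum seven FGSIS item responses, each coded 1 (strongly disagree) to 4 (strongly agree).

    Hypothetical helper: the name and input format are illustrative only.
    """
    if len(responses) != 7:
        raise ValueError("the FGSIS has exactly seven items")
    for r in responses:
        if r not in (1, 2, 3, 4):
            raise ValueError(f"item response out of range: {r!r}")
    # The total score is the plain sum of the items, so it lies between 7 and 28.
    return sum(responses)
```

For example, `fgsis_total([3, 4, 3, 2, 4, 3, 3])` gives a total score of 22.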

Female sexual function index
Female sexual function index (FSFI) assesses the sexual function of sexually active women in the previous four weeks. This PROM consists of 19 items with six different answer options for each item. FSFI items are grouped into the following domains of sexual function: desire (1-2), arousal (3-6), lubrication (7-10), orgasm (11-13), satisfaction (14-16), and pain (17-19). The total FSFI score ranges from 2 to 36 points and is represented by the sum of the scores for each domain multiplied by a factor that equalizes the influence of each weighted score on the total score 16,17. FSFI was validated for the Brazilian population with high internal consistency (α=0.96) and excellent test-retest reliability (ICC=1.00) for the total score 18.
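As a rough sketch of the weighted scoring, the snippet below sums the raw item scores of each domain and multiplies by a domain factor. The factors (0.6, 0.3, 0.3, 0.4, 0.4, 0.4) are not stated in this text; they are taken here, as an assumption, from the scoring published with the original FSFI, and the dictionary layout and function name are illustrative.

```python
# Domain -> (item numbers, weighting factor).
# ASSUMPTION: factors follow the scoring published with the original FSFI;
# they are not given in the text above.
FSFI_DOMAINS = {
    "desire": ((1, 2), 0.6),
    "arousal": ((3, 4, 5, 6), 0.3),
    "lubrication": ((7, 8, 9, 10), 0.3),
    "orgasm": ((11, 12, 13), 0.4),
    "satisfaction": ((14, 15, 16), 0.4),
    "pain": ((17, 18, 19), 0.4),
}


def fsfi_scores(answers):
    """Return (domain_scores, total) from a dict mapping item number (1-19) to raw score."""
    domain_scores = {}
    for name, (items, factor) in FSFI_DOMAINS.items():
        # Each domain score is the weighted sum of its raw item scores.
        domain_scores[name] = factor * sum(answers[i] for i in items)
    total = sum(domain_scores.values())
    return domain_scores, total
```

With every item at its maximum raw score of 5, each domain contributes 6 points and the total is 36, the upper bound given above. The lower bound of 2 arises if items 1, 2, 15, and 16 have a minimum raw score of 1 while the remaining items can be 0, which is also an assumption drawn from the original scoring.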

Body appreciation scale
Body appreciation scale (BAS-2) validated for Brazilian Portuguese was used to assess body appreciation. This one-dimensional PROM showed excellent test-retest reliability (ICC=0.81) and adequate invariance between sexes (female Omega=0.91; male Omega=0.92) in Brazilian adults. The BAS-2 has ten 5-point items ranging from 1 (never) to 5 (always), with higher scores indicating greater body appreciation 19.

Procedures
Initially, the authorization for translation of FGSIS was granted by the developer of the instrument, Dr. Debby Herbenick. FGSIS translation and content validity assessment were conducted in four steps 15. In the first step, two Brazilian Portuguese-speaking translators who are fluent in English independently translated the original version of the FGSIS. One of these translators had experience in the assessed construct and the other translator did not know the construct. Afterwards, both FGSIS translations were synthesized by the researchers into a single version. In the second step, the synthesized version of the FGSIS was back-translated to the source language by two English-speaking translators. Both back-translations were performed independently, and a single version was synthesized. Discrepancies between back-translations were resolved by the researchers. In the third step, the Brazilian Portuguese version of the FGSIS was assessed by a committee of experts. In this step, online cognitive debriefings were conducted by a trained researcher. The committee was composed of three physical therapists with experience in women's health, two gynecologists, two nurses with experience in gynecology, and a psychologist. The committee reviewed the Brazilian Portuguese version of the FGSIS and suggested modifications. Then, a new round was conducted by the same experts. At this stage, the experts were asked about the comprehensiveness of the items and relevance of the FGSIS instructions, items, and response options. In the fourth step, individual cognitive debriefings were conducted and recorded over telephone by a trained researcher.
The cognitive debriefings had a semi-structured script and were conducted with 13 Brazilian women to assess the comprehensiveness of items, and the relevance and intelligibility of instructions, items, and response options of the FGSIS. A second round of cognitive debriefings with 13 other women was conducted after suggestions about the intelligibility of the items. The saturation of responses was then controlled in a spreadsheet with the suggestions for each FGSIS item. All interviews were recorded and transcribed by two other independent researchers. Content validity assessed comprehensiveness, relevance, and intelligibility during the stages of the cognitive debriefing of the expert committee and Brazilian women.

Statistical analysis
Structural validity was assessed by exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). First, EFA was assessed by oblimin rotation, the Kaiser-Meyer-Olkin (KMO) test, and Bartlett's test of sphericity. KMO>0.80 was considered ideal and p<0.05 in Bartlett's test shows the factorability of the data. Maximum likelihood estimation and a polychoric matrix were implemented with parallel analysis to decide the number of factors to be retained. Then, χ² (df), root mean square error of approximation (RMSEA), comparative fit index (CFI), and the Tucker-Lewis index (TLI) were used in CFA. RMSEA<0.08 and CFI and TLI>0.90 were considered ideal. EFA and CFA were performed using Factor 10.10.02 and JASP 0.14.1, respectively.
Cronbach's alpha was used to assess the internal consistency of the FGSIS total score, with ≥0.7 considered ideal 20. For test-retest reliability, intraclass correlation coefficient (ICC) with a two-way mixed effect model with interaction for absolute agreement between mean measures was used. ICC>0.75 was considered as excellent reliability 21. For measurement errors, standard error of the measurement (SEM), smallest detectable change (SDC) at the individual level, and Bland and Altman graph were used. SEM was estimated as $\mathrm{SEM} = SD_{\mathrm{diff}}/\sqrt{2}$, in which $SD_{\mathrm{diff}}$ is the standard deviation (SD) of the difference between the test and retest scores of the FGSIS 22. SDC was estimated as $\mathrm{SDC} = \mathrm{SEM} \times 1.96 \times \sqrt{2}$. The limits of agreement (LoA) of the Bland and Altman graph were estimated as $\bar{d} \pm 1.96 \times SD_{\mathrm{diff}}$, in which $\bar{d}$ is the mean of the differences between the test and retest scores of the FGSIS 15.
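The reliability and measurement-error formulas in this paragraph can be sketched with the Python standard library. This is not the SPSS procedure used in the study; the function names, the input layout, and the sign convention for the difference (test minus retest) are assumptions made for illustration.

```python
import math
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def measurement_error(test, retest):
    """SEM, individual-level SDC, and Bland-Altman limits of agreement.

    ASSUMPTION: differences are computed as test minus retest.
    """
    diffs = [t - r for t, r in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # SD of the test-retest differences
    sem = sd_diff / math.sqrt(2)               # SEM = SD_diff / sqrt(2)
    sdc = sem * 1.96 * math.sqrt(2)            # SDC = SEM * 1.96 * sqrt(2)
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return {"mean_diff": mean_diff, "sem": sem, "sdc": sdc, "loa": loa}
```

Note that, with these definitions, the SDC equals 1.96 times the SD of the differences, so it is also the half-width of the limits of agreement.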
The hypothesis test for construct validity was assessed by Pearson's correlation, with r>0.5 indicating strong correlation, r=0.3-0.5 medium correlation, and r<0.3 weak correlation 23. The hypothesis was that the FGSIS total score has a significant, positive, and medium-to-strong correlation with BAS-2, and no significant correlation or a weak correlation with FSFI, according to the FGSIS development study 4. Reliability and construct validity tests were performed with SPSS 22.
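A minimal sketch of the correlation step and the strength bands quoted above follows. The helper names are hypothetical, and applying the bands to the absolute value of r is an assumption, since the text only states them for positive r.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Label r using the bands in the text: >0.5 strong, 0.3-0.5 medium, <0.3 weak.

    ASSUMPTION: the bands are applied to the magnitude of r.
    """
    magnitude = abs(r)
    if magnitude > 0.5:
        return "strong"
    if magnitude >= 0.3:
        return "medium"
    return "weak"
```

Under the stated hypotheses, the FGSIS-BAS-2 correlation would be expected to fall in the "medium" or "strong" band and the FGSIS-FSFI correlation in the "weak" band or be non-significant.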
To create a cut-off point on the FGSIS total score for satisfaction with GSI, the Partial Credit Model (PCM) of Item Response Theory was used and compared with a score generated by Classical Test Theory (CTT) in RStudio. CTT was used to determine the latent trait's level (θ) of the respondent in PCM 24. The parameters of the FGSIS items were estimated on a measurement scale (0±1) with 0 as the mean of the θ of the participants and 1 as the SD. Thus, the items were positioned on the scale to allow their interpretation in the context of the θ measured. Then, each item was positioned at the point on the scale at which the probability of a participant responding to a certain category of the item was ≥0.60.
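To make the item-positioning idea concrete, the sketch below computes category probabilities under the standard Partial Credit Model and then searches a θ grid for the first point where the probability reaches 0.60. It is not the authors' R code. The step parameters are hypothetical inputs, and reading "probability of responding to a certain category" as the probability of responding in that category or higher is an assumption.

```python
import math


def pcm_category_probs(theta, thresholds):
    """PCM probabilities for categories 0..m, given m step (threshold) parameters."""
    # Category k has exponent sum_{j<=k} (theta - delta_j); category 0 has exponent 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def anchor_theta(thresholds, category, p=0.60):
    """Smallest theta on a -4..4 grid (step 0.01) with P(X >= category) >= p.

    ASSUMPTION: the anchoring uses the probability of answering in the given
    category or a higher one; returns None if the grid never reaches p.
    """
    for i in range(-400, 401):
        theta = i / 100
        probs = pcm_category_probs(theta, thresholds)
        if sum(probs[category:]) >= p:
            return theta
    return None
```

For a 4-category FGSIS item, `thresholds` would hold three step parameters estimated from the data, and `anchor_theta(thresholds, 3)` would give the θ at which "strongly agree" becomes likely under this reading.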

Content validity
Content validity and face validity of the FGSIS were assessed by the expert committee and by 26 Brazilian women during cognitive debriefing in two steps. In both steps, the women in the cognitive debriefing had different mean age (26.85±8.93 years in step 1; 29±7.65 years in step 2), schooling level, skin color, and relationship status. The expert committee suggested minor changes in the naming of FGSIS items, such as modifying the word "genitais" to "órgãos genitais" for a better adaptation to Brazilian Portuguese. After the modifications, a new stage was conducted with the expert committee, who considered the modified version of FGSIS adequate. For items 2 and 5, women suggested adding an explanation of the general appearance and functioning of the genitals, respectively. Thus, the terms "aparência geral, incluindo pelos, coloração, etc." were added to item 2, and "como, por exemplo, para a relação sexual e menstruação" were added to item 5. The middle answer options "concordo" and "discordo" were also changed to "concordo parcialmente" and "discordo parcialmente." After this, a new cognitive debriefing stage included 13 other women, and no modification was suggested. Participants considered the final version of the FGSIS comprehensive, relevant, and intelligible. The final translated version of the FGSIS is presented in Appendix A, in Brazilian Portuguese.

Reliability
In the total sample, 355 (57.82%) women returned the questionnaires between 14 and 20 days for retest, and 22 (6.20%) women were excluded for having undergone treatment on the genitals. Thus, test-retest reliability analysis was performed with 333 (93.80%) women and considered excellent (ICC=0.923; 95%CI 0.904-0.938). For the total sample, Cronbach's α for total FGSIS score was 0.822.
The mean difference ($\bar{d}$) between the test and retest results was −0.285. SEM and SDC at the individual level were 1.469 and 4.071, respectively.
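As a consistency check, the reported values can be plugged back into the formulas from the statistical analysis. The SD of the differences is not reported directly, so it is recovered here from the SEM, which is an inference rather than a reported figure.

```latex
\begin{align*}
SD_{\mathrm{diff}} &= \mathrm{SEM} \times \sqrt{2} = 1.469 \times 1.414 \approx 2.077 \\
\mathrm{SDC} &= \mathrm{SEM} \times 1.96 \times \sqrt{2} = 1.469 \times 1.96 \times 1.414 \approx 4.07 \\
\mathrm{LoA} &= \bar{d} \pm 1.96 \times SD_{\mathrm{diff}} = -0.285 \pm 4.07 \approx (-4.36,\ 3.79)
\end{align*}
```

These figures agree, up to rounding, with the reported SDC of 4.071 and with the limits of agreement of −4.359 and 3.788 given for the Bland and Altman plot.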

DISCUSSION
We translated and assessed the measurement properties of the Brazilian Portuguese version of the FGSIS. The final version of the FGSIS was considered comprehensive, relevant, and intelligible. Unlike other FGSIS validation studies 2,8-10, ours and that by Ellibes Kaya et al. 7 used adequate qualitative methods to assess the content validity of the FGSIS. According to COSMIN, widely recognized qualitative methods (e.g., cognitive debriefing and focus groups) must assess relevance and comprehensiveness of the PROM by the experts and relevance, comprehensiveness, and intelligibility by the target population 12.
In this study, EFA identified a one-factor structure in FGSIS, which was confirmed by CFA. Although other studies 2,7,9 have identified a two-factor structure for the FGSIS, Ellibes Kaya et al. 7 considered a one-factor structure the most adequate. Thus, the Brazilian Portuguese version of the FGSIS can be used to assess a single construct: GSI. In the FGSIS validation studies for Turkish and Iranian populations, principal component analysis was used as an EFA technique 7,9. This may have overestimated the factor loadings of the items, indicating a two-factor structure.
The values of internal consistency and test-retest reliability showed that the items in the Brazilian Portuguese version of the FGSIS are consistent with the construct it intends to measure and reliable after a period of time. Similar results were also found in the populations of Turkish 7 and Iranian women 9. Although other studies did not assess test-retest reliability with ICC, internal consistency was also satisfactory for the seven FGSIS items 4,9. This shows that the test-retest reliability and internal consistency of the FGSIS do not vary much in different populations.
Among the studies that evaluated the measurement properties of the FGSIS, only ours and that by Ellibes Kaya et al. 7 presented values for measurement errors. Although the measurement errors in this study were higher than those reported in Turkish women (SEM=0.28; SDC ind=0.78; LoA=−0.213 to 2.818) 7, the results of both studies are free from systematic error. The low values of measurement errors in the study by Ellibes Kaya et al. 7 possibly occurred due to its small sample size (n=32) compared to our larger sample (n=333).
For the assessment of the hypothesis test for construct validity, we found a medium correlation between GSI and general sexual function in sexually active women, and a strong correlation between GSI and body appreciation in the total sample. Although our initial hypothesis was of at most a weak correlation between GSI and sexual function, we believe this result was due to the relationship between frequency of sexual activity, sexual function, and GSI found in other studies 1,2. Similar results were found in the studies by Ellibes Kaya et al. 7, Mohammed and Hassan 8, and Pakpour et al. 9. Our hypothesis predicting that the FGSIS total score would have a medium to strong correlation with BAS-2 was confirmed. The relationship between body image and GSI is also discussed in other studies, in which GSI is considered an integral part of body image 9,25.
By comparing the PCM and CTT analyses, we could distinguish women satisfied and dissatisfied with their GSI by the cut-off of 22 points in the FGSIS. Thus, FGSIS scores ≤21 points classify the woman as dissatisfied with GSI, and scores ≥22 classify the woman as satisfied with GSI. With this cut-off point, future studies can perform other forms of analysis on the GSI (e.g., tests for categorical variables), and health professionals could more clearly identify satisfaction with GSI or the interference of this construct on other aspects of the patient's health.
Despite the existence of a translation and validation study of the FGSIS for Brazilian women seeking abdominoplasty 10, we followed the COSMIN checklist to assess the measurement properties of the FGSIS in a sample of Brazilian women. This provides a higher-quality validation and greater coverage of the Brazilian population. However, this study has some limitations. First, the sample was mostly composed of women with complete or incomplete higher education, which makes the generalization of the measurement property values questionable. This may have happened because college women and young people have greater access to the internet in Brazil. Moreover, the number of people with higher education has also increased recently in Brazil 26. In this regard, we suggest that future studies include women of different schooling levels. Second, the number of responses to the retest was lower than expected, as we received retest data from just over half the number of participants in the first assessment. This may be because people consider sexuality a taboo and feel uncomfortable talking about it 27. Thus, communication on the subject is still difficult and surrounded by repression 28. Finally, we did not assess the criterion validity and responsiveness of the FGSIS. For criterion validity, a gold standard method to assess GSI is needed, which does not yet exist. The evaluation of responsiveness, the ability of an instrument to detect changes over time in the construct to be measured, was beyond our scope. Therefore, we suggest that future studies assess FGSIS responsiveness in the Brazilian population.
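The classification rule stated above reduces to a single threshold on the total score. The sketch below is illustrative; the function and constant names are hypothetical.

```python
FGSIS_CUTOFF = 22  # scores at or above this value indicate satisfaction with GSI


def classify_gsi(total_score):
    """Classify an FGSIS total score (7-28) as 'satisfied' or 'dissatisfied'."""
    if not 7 <= total_score <= 28:
        raise ValueError("FGSIS total score must be between 7 and 28")
    return "satisfied" if total_score >= FGSIS_CUTOFF else "dissatisfied"
```

A dichotomous variable produced this way is what makes the categorical analyses mentioned above possible, for example comparing the proportion of satisfied women between groups.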

CONCLUSIONS
FGSIS is a simple, valid, and reliable measure to assess GSI in Brazilian women. The FGSIS cut-off point can also be used to classify women as satisfied or dissatisfied with their GSI. Health professionals and researchers can use the FGSIS to better understand female sexuality in clinical practice and scientific research. This PROM may also be useful in assessing patients dissatisfied with their GSI seeking genital cosmetic surgery.

Figure 1.
Figure 1 shows the Bland and Altman plot with the lower (−4.359) and upper (3.788) limits of agreement (LoA).
Scale and positioning of female genital self-image scale items according to Partial Credit Model and Classical Test

Table 1. Characteristics of the study participants (n=614), Brazil, 2021-2022
SD: standard deviation; BMI: body mass index; BAS-2: body appreciation scale-2; FSFI: female sexual function index; FGSIS: female genital self-image scale. ¥ Analysis performed only with women sexually active in the previous four weeks.

Table 2. Factor loads for items of female genital self-image scale by confirmatory factor analysis, Brazil, 2021-2022
FGSIS: female genital self-image scale; CFA: confirmatory factor analysis.