Acta Botanica Brasilica

Free listing is a data collection technique used in different subject areas to characterize a given cultural domain. Analysis of a set of lists from a human population allows inferences to be made about the cultural salience of the items in that domain. However, the challenge with the salience index is establishing a threshold value for determining whether an item can be considered salient. The present analysis shows how to determine which items of a list have a citation frequency and order that are not due to chance. Monte Carlo techniques were used to create a hypothetical null scenario. The analysis not only objectively identifies which items stand out in relation to the others, but also reveals which items can be considered idiosyncratic and how order and frequency independently influence the salience index. It is a useful tool for analyzing data collected through free listing. It can also contribute to understanding processes related to the cultural relevance of items and to testing future hypotheses in different areas of knowledge.


Introduction
The free list is one of the most commonly used data collection techniques in different areas of knowledge. It is an extremely useful tool that allows a rapid survey of people's knowledge about a given cultural domain (Brewer 1985).
Although free listing has some limitations (e.g. Quinlan 2005; Sousa et al. 2016; Zambrana et al. 2018), it has been widely used in ethnobotanical studies with different approaches. Examples of its application include the investigation of plants with pharmacological potential (Cartaxo et al. 2010), the comparison of knowledge about plants in different communities (Ladio et al. 2007) and the study of the structure and resilience of medical systems (Santoro et al. 2015).
In addition to identifying the items belonging to a given domain, the analysis of different lists from the same human population allows inferences to be made about their cultural salience. From an etic perspective, salience is a measure of the cultural importance of the mentioned items, expressed by the relation between the frequency and the order of citation of each element (Quinlan 2005).
For a few decades, the frequency of each item (Borgatti 1990; Weller & Romney 1988) and the order of citation (Romney & D'Andrade 1964) were used separately to express salience. However, Smith (1993) proposed the joint use of these two variables in the Salience Index. The Salience Index has since been adjusted by Smith & Borgatti (1997) and Smith et al. (1995), while other versions were proposed by Sutrop (2001) and Robbins et al. (2017), all maintaining the basic idea of combining the average position and frequency of citation of each item to determine the relative importance of items.
Calculating the Salience Index is relatively simple, and can be done using software for analyzing free lists (e.g. Borgatti 1996; Borgatti et al. 2002; Pennec et al. 2012; Purzycki & Jamieson-Lane 2017). A salience value is calculated for each item registered in a set of free lists, and the items are then ranked according to their relevance. The salience index is very important in ethnobotanical studies because of its usefulness in identifying plants with high general use value (Lozano et al. 2014) or species of better quality used for specific purposes (Nunes et al. 2016). It is also important for prospecting for species with pharmacological potential (Leitão et al. 2013). The interpretation of these values, however, is quite subjective. In the analysis of a set of free lists it is difficult to establish a salience threshold (Quinlan 2005). Weller & Romney (1988), for example, suggested that items with a citation frequency of about 75 % should be considered the most important. Quinlan (2005), on the other hand, points out that breaks are often observed in descending tabulations of the Salience Indices of items in free lists, and that these breaks can be used to distinguish the most important items. According to this author, items listed prior to the first break should be considered "highly salient", while other breaks can be used to distinguish other groups of less salient items. Borgatti & Halgin (2013) also discussed the difficulty of interpreting values of the Salience Index in research on "cultural domains" through free lists. These authors emphasize the need to define the boundaries of a "cultural domain" and to identify which items in a set of lists can be considered "idiosyncrasies".
In addition, these authors suggest that when analyzing descending tabulations of salience, graphical analysis of a scree plot of frequency values can help identify breaks (or elbows) and suggest the threshold between items belonging to the cultural domain and those of little or no relevance for study (Borgatti & Halgin 2013).
Since this method is very subjective, the identification of such breaks is often quite complicated (Thompson & John 2006). To the best of our knowledge, the literature on the analysis of free lists offers no objective technique for identifying salient items. To reduce subjectivity in interpreting the Salience Index, we present a mathematical analysis that identifies which elements of a set of free lists have a frequency and order of citation that are not due to chance. To achieve this, we use the probability of occurrence of the salience value of each item in a hypothetical null scenario.

Materials and methods
The "p-value" of the Salience Index
When we claim that certain elements mentioned in the free lists are more salient than others, we are assuming that some of these elements are cited at a greater frequency and/or in a different order than expected by chance. Thus, the salience values calculated for these elements, in addition to being higher, are not expected to occur for randomly cited items.
Given this assumption, deciding on the statistical significance of the salience value of the items of a free list first requires a null scenario that has the same characteristics as the real scenario studied. To achieve that, we used the number of informants interviewed in the population, the total number of items cited and the average size of the free lists to randomly generate the free lists of 1000 simulated populations, using Monte Carlo techniques (Robert & Casella 2010). Each simulated population has the same number of items and informants as the actual one, although the frequency and order of the items in the lists are completely random. For each simulated population, we calculated the Salience Index of each item, creating a null distribution of the salience values of items cited at random.
Then, from the data collected from the real population, we calculated the Salience Index of each item cited, followed by the probability of occurrence of these values in the null scenario (p-value). We accepted as significant all items whose p-value fell below the 5 % probability threshold (p-value < 0.05). All analyses were performed in the R environment (R Core Team 2017), and the script developed is available as supplementary material online (R Script S1 in supplementary material) or on request.
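The p-value step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R script (R Script S1). The function name `empirical_p_value` and the two-sided tail rule are assumptions, since the text does not state how the tail probability is computed; the null sample is a toy sequence, not data from the study.

```python
def empirical_p_value(observed, null_values):
    """Two-sided empirical p-value of `observed` against a null sample.

    Assumption (not stated in the text): the smaller of the two tail
    proportions is doubled and capped at 1.0.
    """
    n = len(null_values)
    if n == 0:
        raise ValueError("null_values must not be empty")
    upper = sum(1 for v in null_values if v >= observed) / n
    lower = sum(1 for v in null_values if v <= observed) / n
    return min(1.0, 2 * min(upper, lower))


# Toy null sample (illustrative numbers only): 0.01, 0.02, ..., 1.00
null_sample = [i / 100 for i in range(1, 101)]
p_typical = empirical_p_value(0.50, null_sample)  # value in the middle of the null
p_extreme = empirical_p_value(1.00, null_sample)  # value at the edge of the null
```

Under this rule, a value is flagged when it lies in either tail of the null distribution, which is consistent with the text reporting significant p-values for both very high and very low salience values.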

The cut-off point of Salience Index: Using the new method
To exemplify the suggested analysis, we used free lists on the knowledge of medicinal plants in the "Horizonte" community, located in the surroundings of the National Forest of Araripe-Apodi (Ceará, Northeast Brazil). These lists are part of two already published papers (Lozano et al. 2014; Nascimento et al. 2016). The botanical identification of the plants listed by common name in Table 1 is available in Lozano et al. (2014).
The free-list technique was used to investigate the communities' knowledge about medicinal plants, with the generative question: "Which medicinal plants do you know?" The shortest list in our subsample has three items and the longest has 63, and a total of 216 items were cited across these 153 lists.
For each simulated population, we created 153 other lists that followed the parameters observed in the original lists. The size of each of the 153 lists was set randomly between the minimum and maximum size of the actual lists. Then, to compose each simulated list, items were drawn without replacement from the 216 items mentioned in the actual lists, ensuring that the items appeared with random frequency and order. Finally, the Salience Index of each item was calculated according to the formula $S = \left(\sum \frac{L - R_j + 1}{L}\right) / N$, where $L$ is the length of a list, $R_j$ is the rank of item $j$ in the list and $N$ is the number of lists in the sample, as proposed by Smith & Borgatti (1997). The procedure was repeated 300 times, creating a null scenario with 45,900 randomly generated salience values (Fig. 1).
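The simulation and the salience formula can be sketched together as below. This is a hedged Python illustration, not the authors' R code: the function names, the `item_k` labels, the uniform draw of list sizes and the small number of repetitions are all assumptions made to keep the example short.

```python
import random
from collections import defaultdict


def salience_index(lists):
    """Smith & Borgatti (1997) salience: sum of (L - R_j + 1) / L over lists, divided by N."""
    totals = defaultdict(float)
    for items in lists:
        length = len(items)
        for rank, item in enumerate(items, start=1):
            totals[item] += (length - rank + 1) / length
    # Lists that omit an item contribute zero, so divide by the number of lists N.
    return {item: total / len(lists) for item, total in totals.items()}


def simulate_population(items, n_lists, min_len, max_len, rng):
    """Random free lists: random size, items drawn without replacement."""
    return [rng.sample(items, rng.randint(min_len, max_len)) for _ in range(n_lists)]


def null_salience_values(items, n_lists, min_len, max_len, n_populations, seed=0):
    """Pool the salience values of every simulated population into one null sample."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_populations):
        population = simulate_population(items, n_lists, min_len, max_len, rng)
        pooled.extend(salience_index(population).values())
    return pooled


# Hand-checkable toy example (not study data).
example = salience_index([["a", "b"], ["b"], ["a", "b", "c"]])

# Parameters taken from the text: 216 items, 153 lists, sizes between 3 and 63.
# Hypothetical labels; only 5 repetitions here instead of the 300 reported.
items = [f"item_{k}" for k in range(216)]
null_values = null_salience_values(items, n_lists=153, min_len=3, max_len=63, n_populations=5)
```

In the toy example, item "a" is ranked first in two of the three lists, so its salience is (1 + 1) / 3; item "c" appears once, in last place of a three-item list, so its salience is (1/3) / 3.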
After establishing the null scenario, we calculated the actual salience values of the items cited in the free lists (Tab. 1, Column 2), and then the probability of occurrence (p-value) of the actual salience value of each item within the null scenario (Tab. 1, Column 3). The salience values calculated using the Smith & Borgatti (1997) index can vary between 0.0 (items with extremely low salience) and 1.0 (items with extremely high salience), depending on the frequency and position of each item in the analyzed set of lists. To verify the influence of each of these variables on the composition of the salience index, we calculated the frequency (Tab. 1, Column 4) and the average position (Tab. 1, Column 6) of each cited item.
We then used the data from the simulated populations to create a null scenario for each variable. From these, we verified whether the values of frequency and mean position differ from those expected at random (Tab. 1, Columns 5 and 7).
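The two components of salience can be extracted from a set of lists as in the sketch below. This is an illustrative Python example with a hypothetical function name and toy lists, not the authors' script; the same function would be applied to the real lists and to each simulated population to obtain the observed values and their null distributions.

```python
from collections import defaultdict


def frequency_and_mean_position(lists):
    """Return {item: (number of lists citing it, mean rank within those lists)}."""
    positions = defaultdict(list)
    for items in lists:
        for rank, item in enumerate(items, start=1):
            positions[item].append(rank)
    return {item: (len(ranks), sum(ranks) / len(ranks)) for item, ranks in positions.items()}


# Toy lists (illustrative only, not study data).
toy_lists = [["a", "b"], ["b"], ["a", "b", "c"]]
summary = frequency_and_mean_position(toy_lists)
```

Here the mean position is averaged only over the lists that cite the item, which is an assumption; the text does not say how lists that omit an item are handled for this variable.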
The salience values calculated for our data ranged from 0.4805 ("arruda") to 0.0001 ("azeitona preta"). Only the 38 items with the highest salience values (0.4805 to 0.0487) had p-values low enough to be considered different from randomly generated values and can therefore be considered salient. The 42 subsequent items showed salience values between 0.0729 and 0.0266, and their p-values are not significant. Finally, the 136 items with the lowest salience values (0.0266 to 0.0001) also presented p-values < 0.05. The item with the highest citation frequency (barbatimão) was cited in 111 lists, and 52 items were cited only once. Citation frequency values greater than 20 or lower than 9 were considered statistically significant. Regarding the average position, only items with a mean ranking above 9 or below 20 presented statistically significant values.
Although the identification of breaks in the sequence of salience values or in a scree plot of the frequency of the cited items (Quinlan 2005; Borgatti & Halgin 2013) is the least arbitrary method available for the analysis of free lists, its subjectivity does not allow us to unambiguously decide the boundaries between the salient items. When analyzing our data through the method of identifying breaks in salience values, for example, we can consider that the first break occurs between the salience values of "hortelã" (0.3976) and "jatobá" (0.2796) (Tab. 1, Column 2), since the distance between the two salience values is about 0.12, representing a drop of almost 30 % in the sequence of values. When we plot our frequency values on a scree plot (Fig. 2), we find a first break between "janaguba" (104 citations) and "malva do reino" (95 citations). The second observed break is slightly larger than the first and coincides with the break identified in the sequence of salience values (hortelã, 89 citations, and jatobá, 73 citations). The results of the analyses performed with the method we propose suggest that, in addition to the difficulty in identifying the boundaries between the salient items, the break identification method (Quinlan 2005; Borgatti & Halgin 2013) underestimates the number of items that should be considered salient. Through this procedure, only the six items with the highest salience values would be considered salient. That is about six times fewer than the number of items identified with the method we propose.
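The arithmetic behind these breaks can be checked with a short sketch. The Python function below is a hypothetical helper, not part of the published method; it only uses the values quoted in the text for the items on either side of each break.

```python
def consecutive_drops(values):
    """Absolute and relative drop between neighbours in a descending sequence."""
    drops = []
    for higher, lower in zip(values, values[1:]):
        absolute = higher - lower
        drops.append((absolute, absolute / higher))
    return drops


# Salience values reported in the text for "hortelã" and "jatobá".
salience_break = consecutive_drops([0.3976, 0.2796])[0]

# Citation counts reported in the text for the two candidate breaks.
first_break = consecutive_drops([104, 95])[0]   # janaguba -> malva do reino
second_break = consecutive_drops([89, 73])[0]   # hortelã -> jatobá
```

The salience break corresponds to a drop of about 0.118, or roughly 29.7 % of the higher value. The frequency breaks correspond to drops of 9 citations (about 8.7 %) and 16 citations (about 18.0 %). Deciding which of these drops counts as "the" break is the subjective step that the null-scenario approach avoids.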
Since the scree plot is a graph of only the frequency values of the items cited in the lists, it implicitly assumes that this variable is sufficient to define salience. However, by decomposing the salience value, we find that an item need not be more frequent than expected at random to present a statistically significant salience value, as is the case for "lorma" in our data set (Tab. 1). Likewise, an item may present a citation frequency higher than expected at random and still not be among the items with statistically significant salience values, such as "quebra-pedra" and "Lemon" (Tab. 1), because it does not occur among the highest-ranking positions. Finally, our data also showed that frequency and mean position may act in opposite directions in the composition of the salience index. This is the case of "quixaba" and "avocado", which are cited more frequently than expected at random but also have a statistically significant mean position, because they occupy very low positions in the lists.

Discussion
The significantly higher salience values show that the importance the interviewed individuals attribute to these items favors them when the free list is constructed. This results in a higher frequency of citation and/or a higher average position than expected by chance.
The understanding that, among the elements recorded through free lists, only some items with higher salience values can be truly salient has been emphasized in the literature for some time (Borgatti & Halgin 2013; Quinlan 2005). The choice of these items, however, is often made arbitrarily. Tol et al. (2018), for example, when analyzing free lists on maternal health problems, explain that items with a Salience Index close to "1" have "high salience values". Nevertheless, the five most salient items in the free lists (S = 0.51 to 0.21) are discussed without any justification for the threshold. Similarly, Wong et al. (2015) emphasized that high salience values indicate items of high importance; however, they did not define how high these values should be. After analyzing the salience values, the authors highlight one to three items with higher salience values (in each category considered) among those cited in a study on the perception of potential clients about the advantages and disadvantages of acquiring health insurance.
The 42 elements whose salience index is not statistically significant when compared with the randomly generated distribution represent items that respondents cite without giving them any particular prominence. These items may correspond to the content responsible for the heterogeneity of knowledge existing in human populations. This result corroborates the pattern found in studies with medicinal plants, for example, which demonstrate that only a small group of species is known to most people (Ferreira Júnior & Albuquerque 2015). These widely known items would be represented in the analysis suggested here by higher salience values and p < 0.05. However, it should be noted that knowledge about medicinal plants is dynamic and subject to intracultural variation related to factors such as age, gender, income and social roles played locally, among others (Almeida et al. 2012; Quinlan & Quinlan 2007; Hanazaki et al. 2000; Torres-Avilez et al. 2016). An alternative explanation for the items with non-significant salience values is that they may represent this socio-cultural heterogeneity, reflecting the most relevant sets of plants for people in different intracultural contexts. The free lists of different socio-cultural groups (gender, age, income, work specialization, etc.) within the same human population can be analyzed to verify whether the subgroups present different items with statistically significant salience values, which would reinforce this hypothesis.
Ethnobiological studies that evaluate intracultural variation of knowledge tend to use only the number of items cited by each social group as the main descriptor of this variation. This methodology is quite frequent in studies on the effect of gender (Torres-Avilez & Albuquerque 2017). Comparing the salient items (p < 0.05) of the whole community with those of the social groups may favor a more comprehensive understanding of the local differences in a given cultural domain, by indicating which items are characteristic of each group.
Of the 216 items present in the lists that we used, 136 can be considered idiosyncrasies by the same criteria adopted in our analysis. These items presented unusually low salience values, with a very low probability (p-value) of being produced in randomly generated lists. Such items are known by very few people or cited last in the free lists, and for these reasons have been interpreted in the literature as items of little or no cultural importance (Borgatti & Halgin 2013) or as mistakes (Quinlan 2005).
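The three-way partition of items (salient, not significant, idiosyncratic) can be expressed as a small decision rule. The Python sketch below is illustrative only: the function name, the use of the null median as the reference point for "higher" or "lower" than expected, and the example numbers are assumptions, not values or procedures taken from the study.

```python
def classify_item(observed, p_value, null_median, alpha=0.05):
    """Label an item from its salience, its p-value and the centre of the null.

    Assumption: the null median separates 'higher than expected' from
    'lower than expected'; the text does not specify the reference point.
    """
    if p_value >= alpha:
        return "not significant"
    return "salient" if observed > null_median else "idiosyncratic"


# Hypothetical inputs (illustrative only, not taken from Table 1).
labels = [
    classify_item(observed=0.40, p_value=0.001, null_median=0.05),
    classify_item(observed=0.05, p_value=0.600, null_median=0.05),
    classify_item(observed=0.001, p_value=0.001, null_median=0.05),
]
```

With these inputs the three calls return "salient", "not significant" and "idiosyncratic" respectively, mirroring the three groups of items described in the results.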
The need to identify idiosyncrasies among the items cited in free lists has already been discussed by Borgatti & Halgin (2013). The authors suggest that items cited by only one informant should be discarded. However, they recognize that this is not sufficient to eliminate all the necessary items, and the criterion is based on only one aspect of salience, the frequency of citation.
The exclusion of items described in these studies, besides being totally arbitrary, does not allow an understanding of the factors that may be leading to these idiosyncrasies. Depending on how the local medical system is structured, for example, knowledge about the use of some medicinal plants may be concentrated in a few people who perform very specific functions, such as local experts. This would give plants of high cultural importance very low salience values. Cultural information is also subject to "errors" during the process of social transmission (Laland & Brown 2002), a factor that can contribute to the appearance of these rarely transmitted local items. In addition, human beings have the capacity to create knowledge and to innovate, usually as an adaptive response to environmental situations (Boyd et al. 2011), a factor that can also generate idiosyncratic information.
Moreover, the sharing of cultural information in human populations is subject to temporal changes; that is, information that was very frequent in the past may become infrequent in the present depending on the environmental situation experienced, and the opposite is also true (Mesoudi 2011). For example, extreme drought events in caatinga areas in Brazil lead people to use a set of emergency food plants that are resistant to drought and require more complex preparation (Nascimento et al. 2012). It is likely that, after a long period without scarcity, only older people will cite most of these emergency plants. Thus, the cultural salience of these emergency food plants will vary depending on the current environmental situation and the constancy of extreme drought events. Understanding the characteristics of the least salient items and how they are distributed across different socio-cultural groups in the community may therefore reveal the dynamics of entry and exit of cultural information in human populations.

Final considerations
In our calculations, we applied the formula proposed by Smith & Borgatti (1997) to calculate the Salience Index, because this index is the most used in ethnobiological studies. However, the analysis we propose is applicable to the interpretation of the salience calculated by any other formula (e.g. Sutrop 2001;Robbins et al. 2017) or even for free-recall studies.
The use of simulated populations to generate a null model and subsequent verification of statistical significance of the salience values opens up new perspectives for the studies that use the free list as a technique for collecting data. In addition to objectively indicating which items are more prominent in relation to the others as to their frequency and position in the lists, the present analysis also shows which items have salience values significantly lower than expected by chance. This could be a result of the production of individual knowledge (innovations), recent information inputs (immigration), changes of the original information (mutations) or low mnemonic relevance.
The present analysis is, therefore, a useful tool for understanding processes related to the cultural and/or mnemonic relevance of items, thus contributing to the testing of future hypotheses in different areas of social sciences and ethnobiology.

from FACEPE (Foundation for Support to Science and Technology of the State of Pernambuco - Grant number: APQ-0562-2.01/17).