Methodological Criteria for Scoring Clustering and Switching in Verbal Fluency Tasks

The objective of this study was to describe an adaptation to Brazilian Portuguese of the methodological criteria for analysis of clustering and switching in semantic verbal fluency (SVF) and phonemic verbal fluency (PVF) tasks. The adaptation process consisted of six steps, including the selection of the clustering and switching variables based on data from a sample of 419 children and the analysis of inter-rater reliability (six raters). The following variables were scored: the total number of words generated, the raw number of clusters, the mean cluster size, and the raw number of switches. There was a significant association between raters (intra-class correlation coefficients between 0.95 and 0.99), showing that the analytical method was reliable. Our study provides an evaluation of SVF and PVF tasks that goes beyond the overall score, making it possible to investigate the cognitive processes underlying this neuropsychological function.

Our study describes the methodological criteria adapted from Troyer, Moscovitch, & Winocur (1997), Troyer (2000), and Lopes, Brucki, Giampaoli, & Mansur (2009) for the evaluation of the underlying cognitive components (clustering and switching) in the overall performance on verbal fluency (VF) tasks based on the cognitive neuropsychology approach. VF tasks have been used in psychology for a long time. These tasks were first used by Thurstone (1938, cited by Miller, 1984) in his tests of "primary mental abilities". The overall score of these tasks is widely used as a diagnostic tool in clinical and experimental contexts in individuals of different ages and educational levels (Oberg & Ramírez, 2006). These tasks are also used in research on normal aging (Mayr, 2002).
VF tasks, both PVF and SVF, evaluate language skills, such as lexical retrieval, vocabulary size, and lexical access, that is, the ability to access the semantic lexicon and retrieve the semantic and formal information of the words (Jaichenco, Wilson, & Ruiz, 2007). These tasks also evaluate access to working memory and semantic memory (Lezak, Howieson, & Loring, 2004). In addition, these tasks assess speed and facilitation of verbal production and response, mental organization, search strategies, and the ability to initiate a behavior as a response to a new activity. These characteristics are related to the following components of executive functions: volition, flexibility, and inhibitory control.
VF performance improves throughout childhood and adolescence, but the peak of development varies according to the type of task. International studies have found significant differences in SVF performance between ages 7-8 and 9-10, whereas in PVF the number of words retrieved increases significantly starting at 11-12 years (Sauzéon, Lestage, Raboutet, Kaoua, & Claverie, 2004; Tallberg, Carlsson, & Liberman, 2011). The literature shows that both adults and children perform better in SVF tasks than in PVF tasks (Sauzéon et al., 2004; Strauss et al., 2006). This performance pattern is explained by the different cognitive components recruited during the tasks. PVF tasks are associated with increased use of search strategies to retrieve words; therefore, these tasks depend more on the executive functions (such as cognitive flexibility and switching), which require greater cognitive effort (Sauzéon et al., 2004). As for SVF tasks, mental imaging strategies (imagining scenarios based on the category of words) facilitate response retrieval. In addition, responses to SVF seem to be more related to the lexical-semantic networks that are part of the semantic memory (Charchat-Fichman et al., 2011), which develops prior to the executive functions. Thus, the improvement of SVF is more apparent in early childhood. The later occurrence of changes in PVF performance compared with SVF can be explained in terms of the development of strategic search skills, cognitive flexibility, switching, and inhibitory control. All these components depend on executive functions and the maturation of the frontal lobes, which reaches its peak between 10 and 12 years of age (Tallberg et al., 2011). In addition, PVF also depends on educational level and growth of the orthographic lexicon (Charchat-Fichman et al., 2011).
Studies of patients with brain lesions have suggested that impaired performances on PVF tasks are related to frontal lobe lesions, particularly in the left hemisphere (Birn et al., 2010; Troyer, Moscovitch, Winocur, Alexander, & Stuss, 1998; Troyer, Moscovitch, Winocur, Leach, & Freedman, 1998). In contrast, SVF tasks require the use of a smaller number of sets of words, because these tasks recruit words within a particular semantic category (Troyer et al., 1997). In this case, studies using the lesion paradigm have shown impaired performances on these tasks associated with lesions in the temporal lobes (Birn et al., 2010; Troyer, Moscovitch, Winocur, Alexander et al., 1998; Troyer, Moscovitch, Winocur, Leach et al., 1998).
The score analyzed in VF tasks usually is the total number of correct words generated within a certain amount of time (Strauss et al., 2006). However, this score does not provide much information on the cognitive processes involved in the performance of fluency. Additionally, it does not answer the question of why a particular group of patients has impaired performance on the task (Troyer et al., 1997). One of the reasons why fluency is a multifactorial process can be related to the theories of lexical access (Levelt, Roelofs, & Meyer, 1999; Rapp & Goldrick, 2006). Briefly, these theories propose that word production originates from a complex process that involves the activation of three representational levels: semantic, lexical, and phonological. These levels interact simultaneously, translating a concept into a set of phonemes with the mediation of lexical forms. Therefore, changes in any of these levels could lead to impaired performance on VF tasks.
Hence, the overall score of the PVF and SVF tasks does not show which of these levels are impairing the overall performance. Only an analysis of the cognitive strategies used during retrieval is able to reveal the deficient processes.
Some authors have suggested the use of qualitative assessment of the performance on VF tasks for the analysis of the cognitive processes underlying these tasks (Troyer, 2000; Troyer et al., 1997). In addition to the qualitative analysis of types of errors (e.g., perseverative vs. non-perseverative errors), Troyer et al. (1997) proposed another method of qualitative and quantitative analysis: analyses of clustering and switching. The analysis of clustering involves phonemic analysis in PVF and semantic categorization in SVF. Clustering is a relatively automatic process related to semantic memory, especially lexical storage of words. Switching involves search processes and cognitive flexibility in order to switch from one subcategory to another. This is a controlled process related to executive functions. Troyer (2000) argues that, in order to achieve good overall performance on VF tasks, the individual needs to generate words within the same subcategory and only switch to the next subcategory after using all available words in the previous one. In the case of PVF using the phonemic criterion of letter "M", the process would consist of using the words starting with "MO" followed by those starting with "MA", as shown in this sequence: "moon, mother, maple, mate, marmot, mango, mail, married, and mad." In the SVF task using the semantic category of animals, an example of balanced use of both strategies is the following sequence of retrieved words: "giraffe, alligator, ladybug, butterfly, eagle, and dove." This pattern would correspond to switching between subcategories of wild animals, insects, and birds.
There are few Brazilian studies using this method in VF tasks. We could find studies of adults (Brucki & Rocha, 2004), healthy elderly (Silva et al., 2011), elderly with mild cognitive decline (Bertola et al., 2014), patients with Alzheimer's disease (Lopes et al., 2009), patients with lesions in the right hemisphere (Becker, Muller, Rodrigues, Villavicencio, & Salles, 2014), and children with anxiety disorders (Toazza et al., 2014). However, we could not find specific studies discussing the criteria used to form clusters and to analyze switching between them. The original studies by Troyer et al. (Troyer, 2000; Troyer et al., 1997) established some criteria for the formation of phonemic and semantic subcategories; however, they are adapted to the Canadian context (English), not the Brazilian context. Brazilian studies have briefly explained how the categories were determined, but failed to list the items included in each category.
Therefore, the objective of this study was to describe an adaptation to Brazilian Portuguese of the analysis of clustering and switching in VF tasks according to the proposal designed by Troyer et al. (1997), Troyer (2000), and Lopes et al. (2009), and to investigate its reliability. Specifically, this paper aims to: 1) describe the selection of the clustering and switching variables based on data from a sample of children; and 2) investigate the inter-rater reliability. In this way, we sought to describe the criteria adapted to the Brazilian context for the analysis of clustering and switching in PVF and SVF tasks for clinical use in childhood.

Method
Participants, instrument, and procedures will be described step by step. The selection of the clustering and switching variables followed these steps: 1) development of a database of words generated by the sample of children; 2) determination of the categories for PVF and SVF; 3) scoring words and selection of the variables by the raters independently; 4) analysis of inter-rater reliability; 5) new scoring of discordant variables; and 6) final version of the scoring variables.

Step 1: Development of a Database of Generated Words
The selection of the clustering and switching variables was based on the words generated by 419 children on a PVF task and an SVF task. These participants were from the standardization sample of the "Child Brief Neuropsychological Assessment Battery - NEUPSILIN-INF" (Salles, Fonseca, Cruz-Rodrigues, Mello, Barbosa, & Miranda, 2011; Salles, Sbicigo, Machado, Miranda, & Fonseca, 2014); they were primary school students aged 6 to 12 years from public and private schools located in the state of Rio Grande do Sul, Brazil. This project was approved by the Research Ethics Committee (information removed by the journal), and the children's parents allowed them to participate in the study by signing a written consent form.
The VF tasks used in our study followed the Child Brief Neuropsychological Assessment Battery - NEUPSILIN-INF. The PVF task used the letter M as the phonemic criterion, whereas the SVF task used the category animals as the semantic criterion. Both tasks allowed 60 seconds for word retrieval. All the words retrieved by each participant were entered into an Excel spreadsheet in the order in which they were generated, including errors (repetitions and intrusions).

Step 2: Determination of the Categories for PVF and SVF
Six raters participated in the selection of the clustering and switching variables. Four of them were undergraduate students of psychology, one was a Master's degree student in psychology with training in neuropsychology, and one was a university professor, who was a speech therapist with expertise in cognitive neuropsychology and language. Based on the criteria established in the studies by Troyer et al. (1997) and Lopes et al. (2009) for the formation of clustering and switching variables, the raters discussed the necessary adaptations to the Brazilian context for the PVF and SVF tasks. Initially, the raters independently scored the words generated by 50 participants in each task. The goal was to adapt the categories created by Troyer et al. (1997) for the PVF and SVF tasks and the categories created by Lopes et al. (2009) for the SVF task based on the retrieval patterns of the sample. The initial scoring included the original categories of the studies. Next, the raters discussed which of these categories would actually be used in the study and which new items would be included in each category.

Step 3: Scoring Words and Selection of Variables
After defining the categories, four raters independently scored the words generated by 419 participants in each VF task. Three variables were selected for each VF task related to the components of clustering and switching: the raw number of clusters, the mean cluster size, and the raw number of switches. In addition, we also analyzed the classical measure: the total number of words generated according to the standards determined by Strauss et al. (2006). For each participant, three raters scored the variables independently.

Step 4: Analysis of Inter-Rater Reliability
Intraclass correlation coefficient (ICC) analyses were performed to investigate the reliability of each variable analyzed by the four raters. We used the statistical package SPSS 20.0. The significance level was set at 5%. ICC values higher than or equal to 0.75 were considered to be excellent correlations (Shrout & Fleiss, 1979).
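For readers who wish to see how such a coefficient is obtained, the following is a minimal sketch in Python. It assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), from Shrout and Fleiss (1979); the text does not state which ICC model was requested in SPSS, so this choice, the function name `icc_2_1`, and the example ratings are illustrative assumptions only.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per participant,
    each row holding one score per rater (same raters for everyone).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between participants
    ms_cols = ss_cols / (k - 1)                  # between raters
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: raw number of clusters given by four raters to five children.
ratings = [
    [3, 3, 3, 3],
    [5, 5, 4, 5],
    [2, 2, 2, 2],
    [4, 4, 4, 4],
    [6, 6, 6, 5],
]
print(round(icc_2_1(ratings), 3))
```

Under the threshold adopted above, a value of 0.75 or higher returned by such a function would be read as excellent agreement.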

Step 5: New Scoring of Discordant Items
For items where there was no agreement between at least two raters, the scoring was performed again by three raters independently. With that purpose, the raters were trained again by the principal researcher. This procedure was based on the original study by Troyer et al. (1997) in which the principal researcher was also considered the gold standard rater.
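The agreement rule can be illustrated with a short Python sketch. It only shows how items lacking agreement between at least two raters could be flagged; the function name `consensus_value`, the participant identifiers, and the scores are hypothetical and were not part of the original procedure.

```python
from collections import Counter


def consensus_value(values, min_agree=2):
    """Return the score shared by at least `min_agree` raters, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= min_agree else None


# Hypothetical scores: raw number of switches given by three raters.
switch_scores = {
    "p01": [4, 4, 4],
    "p02": [3, 4, 3],
    "p03": [2, 5, 3],   # no two raters agree, so the item is scored again
    "p04": [6, 6, 5],
}

needs_rescoring = [pid for pid, vals in switch_scores.items()
                   if consensus_value(vals) is None]
print(needs_rescoring)  # ['p03']
```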

Step 6: Final Version of the Scoring Variables
After performing all these steps, we determined the criteria for scoring each of the six variables of analysis (the raw number of clusters, the mean cluster size, and the raw number of switches in the PVF and SVF tasks). Three categories were established for the formation of clustering and switching on the PVF task and six categories on the SVF task.

Steps 1 and 2: Development of Database of Generated Words and Determination of the Categories for PVF and SVF
Based on the database developed in Step 1, we conducted Step 2 (determination of the categories for PVF and SVF). In the original study by Troyer et al. (1997), four categories were used to select the clustering and switching variables in the PVF task: words starting with the same two letters (e.g., arm and art), words that rhyme (e.g., sand and stand), words that differ only by a vowel sound (e.g., sat, seat, and soot), and homonymous words (e.g., sum and some). In the PVF task, there was no need to implement many changes to the patterns used in the Canadian study; we only removed the category of homonymous words because this type of word is not common in Portuguese and there were no cases in this sample. Thus, the chosen categories were: words starting with the same two letters, words that rhyme, and words starting and ending with the same sounds, differing by a vowel sound (examples in Table 1).
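As a rough illustration of how the three phonemic criteria could be checked for a pair of consecutive words, the sketch below uses spelling as a stand-in for sound. This is an assumption made for the example: the criteria in the text refer to sounds, and in particular the rhyme test here (sharing the last `n` letters) is only a crude proxy. All function names are hypothetical.

```python
VOWELS = set("aeiouáàâãéêíóôõú")


def same_first_two_letters(a, b):
    a, b = a.lower(), b.lower()
    return len(a) >= 2 and len(b) >= 2 and a[:2] == b[:2]


def rhymes(a, b, n=3):
    """Spelling-based proxy for rhyme: both words end in the same `n` letters."""
    a, b = a.lower(), b.lower()
    return a != b and min(len(a), len(b)) >= n and a[-n:] == b[-n:]


def differ_only_by_vowel(a, b):
    """Same first and last letters and same consonant sequence, different vowels."""
    a, b = a.lower(), b.lower()
    if not a or not b or a == b or a[0] != b[0] or a[-1] != b[-1]:
        return False

    def skeleton(word):
        return [ch for ch in word if ch not in VOWELS]

    return skeleton(a) == skeleton(b)


def shared_criteria(a, b):
    """List the criteria under which two consecutive words could be clustered."""
    checks = {
        "same first two letters": same_first_two_letters,
        "rhyme": rhymes,
        "vowel difference": differ_only_by_vowel,
    }
    return [name for name, check in checks.items() if check(a, b)]


print(shared_criteria("arm", "art"))     # ['same first two letters']
print(shared_criteria("sand", "stand"))  # ['rhyme']
print(shared_criteria("meal", "mail"))   # ['vowel difference']
```

A spelling-based check of this kind would still need review by a trained rater, since rhyme and vowel differences depend on pronunciation.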
The SVF task required greater changes to the original categories, mainly because the original Canadian study included animals that are unusual in the Brazilian context. The Canadian study (Troyer et al., 1997) used the following groups: living environment (Africa, Australia, Arctic/Far North, Farm, North America, Water), human use (beasts of burden, fur, pets), and zoological categories (bird, bovine, canine, deer, feline, fish, insect, insectivores, primate, rabbit, reptile or amphibian, rodent, and weasel). Many of these categories either did not appear in the words generated by our study sample (such as weasels, Australian animals, etc.) or were too broad. When categories were too broad, the whole sequence of the participant's retrieval fit in only one category (one cluster). There was no variation of patterns (variations of clusters, that is, switching between clusters or single words), as is the case for the zoological categories, which include almost all kinds of animals. Therefore, this would not be an illustrative analysis of strategies underlying lexical retrieval. Lopes et al. (2009) studied a Brazilian sample of adults and adapted and refined the categories proposed by Troyer et al. (1997) because of this problem. These authors used the following categories: wild, domestic environment, breeding, small (for arthropods and suchlike), winged, and aquatic animals. We discussed the use of these categories, and we further refined them because we could not find sequences involving the pattern of small animals in our data. Also, we chose to include the category insects because this was a common pattern in our sample. In addition, we included the living environment animals mentioned by Troyer et al. (1997) in the wild category, and we removed animals that are unusual in the Brazilian context, such as bison and musk ox. The following categories were the same as those in the study by Lopes et al. (2009): domestic environment, breeding, winged, and aquatic animals (see Table 2 for the final scoring).

Step 3: Scoring Words and Selection of Variables
The criteria for variable selection were the same as those of the original study by Troyer et al. (1997). See below the description of the selection process for each variable.
Raw number of clusters (categories): sum of all clusters of each participant. Clusters are groups of words generated successively and belonging to the same category. There is a cluster when at least two words are successively generated in the same category (e.g., in macaroni, mango, and mittens, the first two words make up a cluster and the last one is considered a single word). Single words are not considered to be a cluster.
Mean cluster size: the size of each cluster is counted starting with the second word generated (e.g., model, morning, and motion is a cluster of two words); the sizes of all clusters are summed, and this sum is divided by the total number of clusters generated by the participant.
Raw number of switches: sum of the number of switches between the clusters, also including exchanges between single words, which are not included in the two previous scores. The sequence mail, machine, madam, meal, mule, music has two clusters (made up of words beginning with ma and mu) and a single word between them, with two switches, i.e., two strategy changes.
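The three definitions can be summarized in a small Python sketch. It is a simplification under stated assumptions: each word is given exactly one category label by the rater (or `None` when it shares no category with its neighbors), so overlapping clusters are not handled here. The function name `fluency_scores` and the labels are hypothetical.

```python
def fluency_scores(labels):
    """Return (raw number of clusters, mean cluster size, raw number of switches).

    `labels` holds one category label per generated word, in order of production.
    A label of None marks a word that stands alone.
    """
    runs = []  # each run is [label, number_of_words]
    for label in labels:
        if runs and label is not None and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])

    # A cluster needs at least two words; its size is counted from the second word.
    cluster_sizes = [count - 1 for _, count in runs if count >= 2]
    n_clusters = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / n_clusters if n_clusters else 0.0
    # Every transition between consecutive clusters or single words is a switch.
    n_switches = max(len(runs) - 1, 0)
    return n_clusters, mean_size, n_switches


# giraffe, alligator, ladybug, butterfly, eagle, dove
animals = ["wild", "wild", "insect", "insect", "winged", "winged"]
print(fluency_scores(animals))  # (3, 1.0, 2)

# macaroni, mango, mittens: one cluster of size 1 followed by a single word
print(fluency_scores(["ma", "ma", None]))  # (1, 1.0, 1)
```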
Table 1 (excerpt)
Words that differ only in vowel sounds, keeping the first and last letters: meal, mail; mesh, mash.
Errors and repetitions are included in the calculation of these scores because they provide information about the cognitive processes underlying the generated words (Troyer et al., 1997). The score total number of words generated (quantitative analysis) was also analyzed independently by the raters using the criteria shown below.
Total number of correct words generated: sum of the words retrieved within the time limit set for the task, excluding errors and repetitions. For both tasks, errors are considered to be morphological derivatives of the same word, such as gender variations (e.g., boy, girl / rooster, chicken), variations of number without representing collectives (e.g., boy, boys / rooster, roosters), variations of size (e.g., boy, little boy / cat, kitten), different forms of the same verb (e.g., bite, bites, bit), and names (Salles et al., in press).
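A minimal sketch of this count is given below, assuming that the rater has already flagged the words considered errors (morphological derivatives, names, intrusions). The function name `total_correct` and the `error_words` argument are hypothetical; deciding which words are errors remains a rater judgment that the code does not attempt to automate.

```python
def total_correct(words, error_words=frozenset()):
    """Count words excluding repetitions and words flagged as errors by the rater."""
    seen = set()
    total = 0
    for word in words:
        key = word.strip().lower()
        if key in seen or key in error_words:
            continue  # repetition or rater-flagged error
        seen.add(key)
        total += 1
    return total


# "cat" is repeated, "kitten" is a size variation of "cat", "table" is an intrusion.
produced = ["cat", "dog", "cat", "kitten", "table"]
print(total_correct(produced, error_words={"kitten", "table"}))  # 2
```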

Step 4: Analysis of Inter-Rater Reliability
Table 3 shows the ICCs for the raw number of clusters, the mean cluster size, the raw number of switches, and the total number of words generated in the PVF and SVF tasks evaluated by the four raters independently. The ICC for the PVF task showed high values (the lowest value for the confidence interval was 0.91), demonstrating excellent inter-rater reliability. The ICC values for the same variables in the SVF task were also satisfactory because the lowest value for the confidence interval was 0.90.

Step 5: New Scoring of Discordant Items
After performing the ICC analyses, the raters participated in additional training sessions with the purpose of scoring the items again when there was no agreement between at least two raters (considering the three raters who analyzed each task). In the PVF task, there was no need to discuss the rules because two out of three raters agreed on the values of variables during the independent analysis.
In the SVF task, it was necessary to revise some items, mainly in terms of the difference between wild animals (the broadest category of the rules) and aquatic and winged animals, because animals included in the latter categories overlapped with the former category. There was also disagreement regarding the categories domestic environment and winged, as in the sequence dog, cat, birdie, bird, and ostrich. In this case, the word birdie should be included in two clusters: in the first cluster of domestic environment and in the second cluster of winged animals. To resolve these disagreements, the items in question were reviewed by the research team, the rules for scoring were discussed again, and the scores were determined independently. The final value of each variable was the one agreed upon by at least two raters after this second analysis.

Table 2 (excerpt)
Wild animals: elk, tapir, baboon, sloth, buffalo, camel, chameleon, kangaroo, beaver, cheetah, chimpanzee, snake, rabbit, coyote, crocodile, dinosaur, elephant, emu, squirrel, seal, ferret, skunk, giraffe, gorilla, hippo, hyena, alligator, ocelot, python, leopard, lion, lemur, lynx, wolf, monkey, platypus, porcupine, manatee, penguin, puma, panther, fox, rhino, snake, anteater, tiger, bear, polar bear, deer, zebra.
Aquatic animals: animals that live in aquatic environments and can be sea animals or not.
Domestic environment animals: found in the domestic environment (pets or those living in the home garden).
Winged animals: animals that belong to the zoological class of birds.
Insects: animals that belong to the zoological class of insects.

Step 6: Final Version of the Scoring Variables

Criteria for Cluster Scoring
According to Troyer et al. (1997) and Troyer (2000), clusters are groups of words retrieved successively and belonging to the same category. A cluster must include at least two words retrieved successively belonging to the same category. For example, in the PVF task, model, morning, and motion is a cluster of two words (because the number of words per cluster is always counted starting with the second word). When two clusters overlap, i.e., some of the words retrieved in the sequence may be included in more than one category while other words belong exclusively to one of the categories, the overlapping words are counted twice, once in each of the corresponding clusters: the first time in the first cluster and again in the second cluster, that is, in the second category. For instance, in the sequence mail, mass, mansion, melon, the word mansion is counted in the cluster mail, mass, mansion (words starting with the same two letters) and in the cluster mansion, melon (words that rhyme), scoring two clusters, one with two words and another with one word.
In cases where smaller clusters are encompassed by larger clusters, only the largest cluster should be considered, encompassing all words. For example, in the sequence hippo, monkey, camel, and rabbit, the last two words could form a cluster of breeding animals, but because they may also belong to the group of wild animals, they are considered only for the latter, totaling a cluster of three words. Thus, in these cases, we should always choose the category that encompasses more words (Troyer et al., 1997).
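The overlap and nesting rules can be sketched in Python as follows. The sketch assumes that the rater has already listed, for each word, every category it may belong to; representing the phonemic criteria as shared tags (such as a tag for a common rhyme) is a simplification made for this example, and the function name `find_clusters` and the tag names are hypothetical.

```python
def find_clusters(word_categories):
    """Return clusters as sorted (start, end) index pairs, both inclusive.

    `word_categories` is a list of sets: the categories each successive word
    may belong to. Overlapping clusters are all kept; a cluster fully
    contained in a larger one is dropped.
    """
    spans = set()
    all_categories = set().union(*word_categories)
    for category in all_categories:
        start = None
        # The empty set at the end acts as a sentinel that closes the last run.
        for i, cats in enumerate(list(word_categories) + [set()]):
            if category in cats:
                if start is None:
                    start = i
            else:
                if start is not None and i - start >= 2:
                    spans.add((start, i - 1))
                start = None

    kept = [s for s in spans
            if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]
    return sorted(kept)


# hippo, monkey, camel, rabbit: the breeding cluster is nested in the wild cluster.
animals = [{"wild"}, {"wild"}, {"wild", "breeding"}, {"wild", "breeding"}]
print(find_clusters(animals))  # [(0, 3)]

# mail, mass, mansion, melon: "mansion" belongs to two overlapping clusters.
m_words = [{"ma"}, {"ma"}, {"ma", "rhyme"}, {"rhyme"}]
print(find_clusters(m_words))  # [(0, 2), (2, 3)]
```

With the convention that size is counted from the second word, the size of a cluster is `end - start`, which gives 3 for the animal example and 2 and 1 for the two phonemic clusters.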

a) Phonemic verbal fluency: phonemic clusters are considered successively retrieved words that share the categories presented in Table 1.
b) Semantic verbal fluency: semantic clusters are considered successively retrieved words belonging to the same semantic categories. These words were arranged as described in Table 2. Importantly, there are words that can belong to more than one category, such as camel, which appears in the categories wild and breeding animals. In such cases, the words preceding and following the generation of the word camel in the sequence should be taken into account to decide whether that word will be counted in wild or breeding animals, or even in both, as explained above in the criteria for cluster formation.

Final Considerations
The present study describes the steps included in the analyses of clustering and switching in PVF and SVF tasks adapted to the Brazilian context. We also assessed the inter-rater reliability of this adaptation. The results of inter-rater agreement were satisfactory for both VF tasks (PVF and SVF), demonstrating that the adaptation performed to determine the clustering and switching scoring was consistent, at least considering the context of data collection using a sample of children. We attempted to contribute to the analysis of the cognitive components underlying VF performance to assist in refining the diagnosis and prognosis of neuropsychological performances. To date, the Brazilian studies using this method (Becker et al., 2014; Bertola et al., 2014; Brucki & Rocha, 2004; Lopes et al., 2009; Silva et al., 2011; Toazza et al., 2014) to answer their research questions have not described the scoring rules in enough detail to enable accurate replication.
The original study by Troyer et al. (1997) is available in the international literature, providing a detailed description of the development of this method. The authors described the procedures required for the analysis of the following variables: the raw number of clusters, the mean cluster size, and the raw number of switches, as well as the categories considered for the phonemic analysis in the PVF task and the semantic analysis in the SVF task. However, because the study was conducted in Canada, the categories used required adaptation to the Brazilian context because cultural factors have an influence on the performance of VF tasks (Oberg & Ramírez, 2006).
Other authors have sought to expand the qualitative scoring system created by Troyer et al. (1997). Abwender, Swan, Bowerman, & Connolly (2001) extended the system used by Troyer et al. (1997), including other categories and using phonemic and semantic clusters in PVF tasks. Furthermore, these authors argued that switching between single words (hard switching) reflects only an inability to perform clustering, whereas switching between clusters (cluster switching) reflects cognitive flexibility and switching skills, thus distinguishing between these two scores. However, Ross et al. (2007) compared the two methods (Troyer et al., 1997; Abwender et al., 2001) to check the reliability and validity of the qualitative analysis methods. The authors found good reliability rates related to the results of the original study by Abwender et al. (2001); however, there was no evidence that this method of analysis was better than the method proposed by Troyer et al. (1997). Therefore, we chose to keep the scoring used in the original method. Nevertheless, future studies should further investigate the reliability and validity of the method because no consensus has been reached (Ross et al., 2007).
Our study describes in detail the steps required for the analysis of clustering and switching scoring adapted to Brazilian Portuguese with strong inter-rater correlation, demonstrating the reliability of the method. However, there are some limitations in our adaptation. Our results should be used with caution in adult populations, considering that the data we used for the analysis of clustering and switching components are derived from a sample of children aged 6-12 years. These children are still developing their language skills (especially in terms of vocabulary), and the brain structures that mediate the processes related to VF and language are still maturing. In addition, phonemic analysis in the SVF task and semantic analysis in the PVF task could also have been explored and adapted to the Brazilian context. International studies have also used this type of analysis when investigating the components of clustering and switching (Sauzéon et al., 2004; Tallberg et al., 2011). However, the adaptation of the method carried out in our study is an initial step to reinforce the use of PVF and SVF tasks, which is important to stimulate other researchers to further develop adaptations of appropriate methods for the Brazilian population.
It is noteworthy that VF tasks are a quick and inexpensive method for clinical use; nevertheless, these tasks allow in-depth analyses showing the functioning of complex cognitive processes. In that sense, there is a need for analyses that go beyond the overall score of the task, and the clustering and switching scores prove helpful for achieving a deeper understanding of the task results (Troyer, Moscovitch, Winocur, Alexander et al., 1998). Information about the cognitive processes underlying the overall performance can contribute to further studies because it makes it possible to understand normal cognitive functioning and brain-behavior relations, thus providing better diagnosis and treatment of neuropsychological deficits (Rapp & Goldrick, 2006).
In future studies, we intend to compare the results of this adaptation with those of an automated, computer-based scoring procedure in order to obtain a new measure of reliability, a process that could facilitate the work of researchers when scoring clustering and switching variables. To achieve that, a syllable separation system is used, i.e., the syllables of each utterance are separated automatically using the system proposed by Oliveira (2007). After this step, two different approaches are used to accomplish clustering. In the first approach, the researchers check whether the first syllable of an utterance is equal to the first syllable of the following utterance. In the second approach, the researchers check whether two consecutive utterances rhyme. Also, the data can be modeled as a network by analyzing graphs, using computing techniques (Albert & Barabasi, 2002). This method has been used to analyze word association (Steyvers & Tenenbaum, 2005; Zortea, Menegola, Villavicencio, & Salles, 2014), as well as in studies of clustering and switching analysis in the SVF tasks of adults with right hemisphere lesions (Becker et al., 2014) because it provides greater understanding of the semantic associations between words and lexical-semantic differences between individuals.

Table 1
Criteria for Phonemic Clustering Score in the PVF Task

Table 2
Criteria for Semantic Clustering Score in the SVF Task

Table 3
Results of the Intraclass Correlation Coefficient Analysis in the Scores of PVF and SVF Tasks
Note. Total nº. of words generated = total number of words generated; Nº of clusters = raw number of clusters; Nº of switches = raw number of switches; CI = confidence interval.