On the role of assumptions in cladistic biogeographical analyses

The biogeographical Assumptions 0, 1, and 2 (respectively A0, A1 and A2) are theoretical terms used to interpret and resolve incongruence in order to find general areagrams. The aim of this paper is to suggest the use of A2 instead of A0 and A1 in solving uncertainties during cladistic biogeographical analyses. In a theoretical example, using Component Analysis and Primary Brooks Parsimony Analysis (primary BPA), A2 allows for the reconstruction of the true sequence of disjunction events within a hypothetical scenario, while A0 adds spurious area relationships. A0, A1 and A2 are interpretations of the relationships between areas, not between taxa. Since area relationships are not equivalent to cladistic relationships, it is inappropriate to use the distributional information of taxa to resolve ambiguous patterns in areagrams, as A0 does. Although ambiguity in areagrams is virtually impossible to explain, A2 is better and more neutral than any other biogeographical assumption.

Key-Words: Assumption 2; Brooks Parsimony Analysis; Cladistic Biogeography; Component Analysis; Vicariance.


Introduction
Cladistic biogeography aims to discover biogeographical congruence among areagrams (sometimes called area cladograms) based on the assumption that there is a direct correspondence between cladistic and area relationships (Nelson & Platnick, 1981; Morrone & Crisci, 1995; Humphries & Parenti, 1999; Crisci, 2001; Ebach, 2001; Santos & Amorim, 2007). The procedure begins by replacing the terminal taxa on a cladogram with the areas in which they occur: the result is an areagram. Although the areagram resembles a cladogram, it only represents the relationships among areas. When added together, a set of geographical patterns may reveal a single common pattern, that is, a general areagram. It is the result of the congruence among individual areagrams, allowing for interpretation of a common geographical history. The aim of cladistic biogeography, therefore, is to discover biogeographical congruence among areagrams.
As pointed out by Ebach (2001), Ebach & Humphries (2002) and Ebach & Williams (2004), both cladistic analysis and cladistic biogeography are about finding congruent patterns: the former related to character distribution in topologies, and the latter to taxonomic distribution in space. According to cladistic biogeography, the first explanation for the coincidence among different areagrams is that there exists a strong correlation between the evolution of space and the evolution of biotas within it; i.e., the coincidental relationships among areas in distinct areagrams are not due to chance only, but reveal underlying common causes. The cladistic approach to biogeography focuses on information about area relationships contained in one or more (taxonomic) cladograms (Nelson & Ladiges, 1991). Some cladistic biogeographical methods deal with incongruence in areagrams using the distributional information of taxa, such as Brooks Parsimony Analysis (BPA: Wiley, 1986, 1988a, 1988b; Brooks, 1985, 1990; Brooks et al., 2001, 2004), but some consider only the area relationships revealed by the areagrams, such as component analysis (proposed by Nelson & Platnick, 1981) and paralogy-free subtrees (Nelson & Ladiges, 1996, 2003).
When different taxa reveal identical area relationships, a general historical pattern is said to be shared by these taxa. The real world, however, is much more complex. There are few examples of completely congruent patterns of area relationships derived from different taxa because ambiguity is common in biogeographical reconstructions. It prevents the identification of general patterns, obscuring the relationships among areas. Thus, the depicted historical pattern is often vague, poorly resolved, and unreliable. The sources of incongruence are many: multiple areas on a single terminal-branch (MAST), paralogous nodes (redundant areas, when different areas have the same taxa), missing areas (when, in comparison with other patterns, there is no species distributed in a certain area, or areas), and inadequate methods (Nelson & Ladiges, 1996, 2003; Humphries & Parenti, 1999; Espinosa-Organista et al., 2002; Crisci et al., 2003; Ebach et al., 2005; Parenti & Ebach, 2009). The origin of a barrier or the split of an area without speciation, as well as random dispersal, extinction, and sympatric speciation are some of the probable causes of incongruence in biogeographical patterns. Cladistic biogeography, however, is silent about the causes of ambiguity, and it cannot be implemented to choose between vicariance, dispersal, and any other kind of explanation. Cladistic biogeography relies on pattern analysis, the next step being the interpretation of such patterns under a given causal scenario. In the words of Ebach & Humphries (2002:429-430), "… cladistic biogeographical methodology may provide evidence for or against geographical congruence, rather than recreate a scenario of earth's biotic history … [It] aims to discover geographical congruence, rather than generating its presence".
In methods such as BPA, Phylogenetic Analysis for Comparing Trees (PACT: Wojcicki & Brooks, 2005), Component Analysis, and three-item statement analysis (Nelson & Ladiges, 1995), theoretical terms called 'Assumptions' are used to interpret and resolve incongruence (ambiguities) in order to find general areagrams. There are three Assumptions, A0 (Zandee & Ross, 1987), A1, and A2 (Nelson & Platnick, 1981) (Figure 1; see description below). The aim of this paper is to suggest the use of A2 over A1 and (especially) A0 in solving biogeographical problems. An analysis of a theoretical example in which the history of the areas is previously known is performed to illustrate the behavior of A0 and A2 when facing biogeographical uncertainties (A1 will not be tested because of its incompleteness when compared to A2).

Biogeographical assumptions
Under A0, multiple areas on a single terminal-branch (MAST) are always considered to form a clade because the presence of a widespread taxon is treated as a "synapomorphy" of the set of areas it inhabits, which means that the distributional information of the taxon resolves the conflict presented in the areagram (Figure 1). Vicariance is the first-order explanation (van Veller et al., 2000). A0 considers widespread distribution as the result of a failure to speciate in response to vicariance events affecting other populations or species in the same area. According to van Veller et al. (1999:397), widespread taxa are "… the result of isolation or break-up without yet triggering speciation".
Under A1, MASTs could form monophyletic or paraphyletic groups of areas (Figure 1). The widespread distribution is seen as the result of a failure to vicariate, possibly in combination with succeeding extinction. In the areagram, the unambiguous area relationships are maintained, and the conflicting areas are positioned on every node within the areagram (Nelson & Platnick, 1981).
Under A2, MASTs may constitute poly-, para- or monophyletic groups of areas (Figure 1). To explain widespread distributions, A2 allows extinction, dispersal, failure to vicariate, or any combination of these events. A2 attempts to solve the problem of MASTs by trying all possible combinations of area relationships, which offers the best chance of elucidating conflicting distributional patterns (Nelson & Platnick, 1981; Ebach, 2001; Ebach & Humphries, 2002). Even the unambiguous relationships in the areagram can be modified, since the conflicting areas are positioned within all the different nodes during areagram searches.
Each occurrence of a redundant distribution is considered as equally valid (representing duplicated area patterns) under A0 and A1. Under A2, each occurrence of redundant distributions is taken separately. Missing areas are treated as missing data under A1 and A2, and explained by primitive absence, extinction or inadequate sampling. A0 considers missing areas as true absence due to primitive absence or extinction.

A theoretical example
The vicariance model predicts that if a group of organisms (1) had a primitive cosmopolitan distribution (i.e., its ancestors were widely distributed in a certain area); (2) had responded to the geological or ecological vicariance events that occurred (i.e., to every barrier that appeared) after the origin of its ancestors; (3) had undergone no extinction; and (4) had undergone no dispersal, then it is possible, by reconstructing the interrelationships of its members, to describe a detailed spatial history of the group's ancestors and their ancestral areas (Nelson & Platnick, 1981). Simulations and models provide a context in which the phylogeny and complete biogeographical history are known with certainty. Obviously, simulated data sets do not match the complexity of real world examples, and generalizing from a specific case is problematic. However, such unrealistic simplicity helps to understand the general mechanisms and analytical tools that influence phylogenetic accuracy (Wiens, 2006) and, in general, biogeographical accuracy.
The hypothetical example of Figure 2 illustrates this point of view. At time zero, species 1 is widely distributed in area A (Figure 2A). The first disjunction event separates ancestral area A into two areas, B and C. Consequently, there is a cladogenetic event, and ancestral species 1 gives rise to species 2 and 3 (Figure 2B), the first species distributed in area B and the latter distributed in area C. The second disjunction event separates ancestral area C into two areas, D and E; area B is not affected and, thus, remains with the same endemic taxon (species 2). The disjunction causes a cladogenesis, and ancestral species 3 gives rise to species 4 and 5, respectively distributed in areas D and E (Figure 2C). The third disjunction event separates ancestral area E into two areas, F and G (area D is not affected). This vicariance event splits ancestral species 5 into two different species, 6 and 7, the first species distributed in area F and the latter in area G (Figure 2D). According to this hypothetical example, the cladistic relationships among extant species are represented by the cladogram (2(4(6,7))). The sequence of splits resulting in the actual pattern of area relationships (Figure 2D) is given by the areagram (B(D(F,G))), which describes the history of the areas since the first disjunction event. The purpose of any cladistic biogeographical method should be to recover (which means to discover, not to generate or create) precisely this kind of pattern.
However, biological evolution is a complex set of interrelated episodes, some of them unpredictable, often obscuring the real history. The addition of some ambiguities to the hypothetical scenario simulates the complexity and randomness of the natural world. Given the sequence of splits above (Figure 2), a population of species 6 dispersed from area F into area B (Figure 2E) after the third vicariance event. Species 6 is now distributed in two different areas (B and F), and is considered widespread (substituting the taxon in the cladogram with the areas it inhabits results in a MAST). Thus, based on the cladogram and on the current distribution of species, the pattern of area relationships is (B(D(BF,G))) (Figure 2F). This areagram does not directly reflect the real history of disjunctions but it is the only pattern that the evidence reveals, since we do not know a priori the past events that shaped the region. The presence of a MAST (represented by an underscore) is a source of ambiguity, since it allows for more than one possible meaning, and prevents the discovery of completely resolved areagrams. It is the aim of biogeography to elucidate this ambiguity or, even better, to extract from it some useful area relationships. At this point, A0, A1, and A2 become necessary.
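The substitution of taxa by areas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple representation, the function names `to_areagram` and `show`, and the convention of writing a MAST as the concatenated area letters (which presumes single-letter area names) are all hypothetical choices made here, not part of any published method.

```python
def to_areagram(cladogram, distribution):
    """Replace every terminal taxon by the area(s) it occupies."""
    if isinstance(cladogram, tuple):
        return tuple(to_areagram(child, distribution) for child in cladogram)
    # A taxon present in several areas becomes a MAST, written here as "BF".
    return "".join(sorted(distribution[cladogram]))


def show(node):
    """Write a nested tuple in the parenthetical notation used in the text."""
    if isinstance(node, str):
        return node
    text = ""
    for part in (show(child) for child in node):
        # The text omits the comma before a nested clade, e.g. (B(D(F,G))).
        text += part if not text or part.startswith("(") else "," + part
    return "(" + text + ")"


# Cladogram of the extant species in the hypothetical example: (2(4(6,7))).
cladogram = (2, (4, (6, 7)))

# Distributions before and after species 6 disperses from F into B.
before_dispersal = {2: {"B"}, 4: {"D"}, 6: {"F"}, 7: {"G"}}
after_dispersal = {2: {"B"}, 4: {"D"}, 6: {"B", "F"}, 7: {"G"}}

print(show(to_areagram(cladogram, before_dispersal)))  # (B(D(F,G)))
print(show(to_areagram(cladogram, after_dispersal)))   # (B(D(BF,G)))
```

The second areagram carries the MAST "BF" and is the only pattern available to the analyst; the first is the history that a method should ideally recover.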

Recovering historical patterns
The observable pattern (Figure 2F) has an ambiguity caused by the widespread taxon 6. Under A0, the presence of taxon 6 in both areas B and F is taken as a "synapomorphy" shared by these areas, "resolving" the MAST through the addition of a "character" shared by the two conflicting areas (Figure 3A). A0 does not allow for any removal of information (Zandee & Ross, 1987; see also Brooks et al., 2001), but creates a new relationship where there once was only ambiguous information. The result is the areagram (B(D(G(B,F)))). Despite the "resolution" of the widespread taxon problem, the general pattern resulting from the application of A0 shows another conflict: the redundancy (paralogy) of area B, simultaneously the sister-group of area F and of all the remaining areas (Figure 3A). Both occurrences of redundant distribution are equally valid under A0, representing duplicated areas.
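The A0 treatment of the MAST just described amounts to turning the widespread terminal into a clade of its areas. A minimal sketch follows, assuming areagrams are stored as nested tuples with single-letter area names and a MAST written as a multi-letter string; the names `resolve_a0`, `terminals` and `show` are hypothetical helpers introduced only for this illustration.

```python
def resolve_a0(node):
    """Turn every MAST (a terminal naming several areas) into a clade of those areas."""
    if isinstance(node, tuple):
        return tuple(resolve_a0(child) for child in node)
    return node if len(node) == 1 else tuple(node)


def terminals(node):
    """List the terminal areas of an areagram, left to right."""
    if isinstance(node, str):
        return [node]
    return [area for child in node for area in terminals(child)]


def show(node):
    """Parenthetical notation of the text; sibling order carries no meaning."""
    if isinstance(node, str):
        return node
    # Single areas are listed before nested clades, as in (B(D(G(B,F)))).
    parts = sorted((show(child) for child in node), key=lambda p: p.startswith("("))
    text = parts[0]
    for part in parts[1:]:
        text += part if part.startswith("(") else "," + part
    return "(" + text + ")"


observed = ("B", ("D", ("BF", "G")))        # Figure 2F, with the MAST "BF"
resolved = resolve_a0(observed)

print(show(resolved))                       # (B(D(G(B,F))))
print(terminals(resolved).count("B"))       # 2
```

Area B now appears twice among the terminals, which is the redundancy (paralogy) noted in the text, and the clade (B,F) corresponds to no ancestral area in the hypothetical history.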
The analysis under A2 of the observable pattern in Figure 2F leads to different scenarios. A2 allows conflicting areas to be positioned in every node of the areagram, and each occurrence of redundant distributions is considered independently. From eight possible solutions, two of them are identical to the areagram (B(D(F,G))) (Figure 3B).
A0 and A2 produce different solutions to the pattern with ambiguities. The analysis under A0 presents an areagram (B(D(G(B,F)))) which is different from the real pattern of disjunctions (Figure 2D). For example, an ancestral area B+F never existed during the history of land breaks of the hypothetical example. A0 simply did not find the real pattern. In fact, with this assumption a spurious relationship was added to the already problematic observable pattern. Under A2, in contrast, an areagram depicting the exact sequence of splits from time zero to the last disjunction event (Figure 3B) is among the several possible solutions to the MAST in the observable pattern. In this particular hypothetical case, A0 is not able to extract the 'true history' from an ambiguous pattern. In the search for a common pattern, the addition of areagrams derived from other distinct taxa is needed, since "congruence is the main target of comparative biology" (Santos & Capellari, 2009, p. 410). Geographical congruence within two or more areagrams strongly suggests the existence of a common cause rather than numerous independent causes (Nelson & Platnick, 1981; Llorente et al., 1996; Amorim et al., 2009; Crisp et al., 2011).

Component analysis and BPA
Component analysis derives sets of fully resolved areagrams from taxon cladograms, applying biogeographical assumptions to solve ambiguity (Nelson & Platnick, 1981; Page, 1988, 1989, 1990; Morrone & Crisci, 1995; Humphries & Parenti, 1999; Espinosa-Organista et al., 2002). It includes A0, A1 and A2. The aim of this method is to obtain a classification of areas despite the unavailability of fully resolved (non-conflicting) biogeographical information (Nelson & Platnick, 1981). The intersection of the sets of areagrams is taken as the general areagram (the common pattern) or, when intersection leads to more than one areagram, a consensus tree is constructed.
Brooks Parsimony Analysis (BPA) (Brooks et al., 2001), as well as its developments (secondary BPA and modified BPA), tries to resolve biogeographical ambiguity via a generational procedure that uses cladistics for describing evolutionary scenarios rather than simply determining the relationships of areas (Ebach & Humphries, 2002). Following the application of A0, each node of the areagrams is coded as an entry in an area versus taxon matrix, which is used to derive general areagrams of minimal length by means of a maximum parsimony algorithm.
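The coding step can be illustrated with a short Python sketch. It assumes the usual additive binary coding, in which an area scores 1 for a terminal taxon it harbours and for every ancestral node of that taxon, and it adds an all-zero outgroup row for polarization. The function name `bpa_matrix` and the data layout are hypothetical; the sketch codes only the single cladogram (2(4(6,7))) with widespread species 6, not a multi-areagram matrix, and it does not perform the parsimony search.

```python
def bpa_matrix(cladogram, distribution):
    """Code one taxon cladogram (nested tuples) as an area-by-node binary matrix."""
    columns = []  # (label, areas) for every terminal and internal node, post-order

    def visit(node):
        if isinstance(node, tuple):
            results = [visit(child) for child in node]
            label = "(" + ",".join(lbl for lbl, _ in results) + ")"
            # An internal node occurs wherever any of its descendants occurs.
            areas = set().union(*(a for _, a in results))
        else:
            label, areas = str(node), set(distribution[node])
        columns.append((label, areas))
        return label, areas

    visit(cladogram)
    all_areas = sorted(set().union(*(a for _, a in columns)))
    matrix = {area: [int(area in areas) for _, areas in columns] for area in all_areas}
    matrix["outgroup"] = [0] * len(columns)  # hypothetical all-zero outgroup
    return [label for label, _ in columns], matrix


cladogram = (2, (4, (6, 7)))
distribution = {2: {"B"}, 4: {"D"}, 6: {"B", "F"}, 7: {"G"}}

labels, matrix = bpa_matrix(cladogram, distribution)
print(labels)       # ['2', '4', '6', '7', '(6,7)', '(4,(6,7))', '(2,(4,(6,7)))']
print(matrix["B"])  # [1, 0, 1, 0, 1, 1, 1]
print(matrix["F"])  # [0, 0, 1, 0, 1, 1, 1]
```

Areas B and F both score 1 in the column for species 6. In a parsimony analysis that shared score acts as the "synapomorphy" that A0 attributes to a widespread taxon.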
One way or another, both component analysis and BPA deal with ambiguity. Herein, they were used to analyze the following situation. In Figure 4A, the observable pattern is represented by the areagram (D(BF,G)), with taxon 6 distributed in both areas B and F (a MAST). In this scenario species 6 dispersed from area F into area B, and species 2 went extinct in the invaded area. Figure 4B shows an areagram in which area F is missing: the observable pattern is (B(D,G)). In Figure 4C, there is a redundant distribution, with the duplication of area B in the areagram (B(D(F(B,G)))).
Although not obvious, there is a common pattern valid for all the described situations. It is possible to extract the general pattern from a combination of these three problematic distributions, regardless of the assumption used for such a task. Nevertheless, the simple agreement among areagrams does not guarantee the reliability of a common pattern or its biogeographical relevance as a description of the disjunction events that shaped current distributions (Crisp et al., 2011).
Through primary BPA, the MAST is "resolved". Under A0, the nodes of the areagrams (Figures 5A, 5B and 5C) are coded as entries in an area versus taxon matrix (Figure 5D). To polarize data, a hypothetical out-group with all zeros is added (van Veller et al., 2000). The primary BPA resulted in two equally parsimonious solutions (two general areagrams), the areagrams (D(G(B,F))) and (D(F(B,G))) (Figure 5E). Neither general areagram is in accordance with the sequence of land breaks and cladogenetic events presented in Figure 2A-D, and therefore neither represents the 'real history' of ancestral area A. In this example, a general pattern consistent with the hypothetical scenario (Figure 2D) is obtained only with component analysis after A2 (Figures 6A, 6B, and 6C).
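The way A2 handles the widespread taxon of Figure 4A can be sketched as follows. The sketch makes one simplifying assumption that should be read as such: area F is kept at the position of the MAST, and the other area of the widespread taxon, B, is allowed to "float", being attached as sister to every node of the remaining areagram (D(F,G)). The functions `placements` and `canon` are hypothetical helpers; this is not the algorithm of any component analysis program, and it is not claimed to reproduce the solution sets shown in Figures 3B or 6.

```python
def placements(tree, area):
    """Yield every tree obtained by attaching `area` as sister to a node of `tree`."""
    yield (area, tree)  # sister to the whole (sub)tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree):
            for new_child in placements(child, area):
                yield tree[:i] + (new_child,) + tree[i + 1:]


def canon(node):
    """Order-independent form, so that (F,G) and (G,F) compare equal."""
    if isinstance(node, str):
        return node
    return frozenset(canon(child) for child in node)


remaining = ("D", ("F", "G"))               # Figure 4A with B removed from the MAST
candidates = list(placements(remaining, "B"))
candidate_set = {canon(tree) for tree in candidates}

true_history = ("B", ("D", ("F", "G")))     # the known sequence of disjunctions
a0_solution = ("D", (("B", "F"), "G"))      # B and F forced to be sister areas

print(len(candidates))                      # 5
print(canon(true_history) in candidate_set) # True
print(canon(a0_solution) in candidate_set)  # True
```

Under these assumptions the known history (B(D(F,G))) is one of five candidate placements, and the A0-style grouping of B with F is simply another candidate rather than the only admissible answer. Choosing among candidates then depends on congruence with areagrams from other taxa.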
In this theoretical example, A0 and BPA generate new area relationships (Ebach, 2001; Ebach & Humphries, 2002; Siddall & Perkins, 2003; Siddall, 2004, 2005). Although designed to discover geographical congruence, A0 and BPA add spurious information, resulting in even more conflicting and incongruent patterns. Moreover, A0 is in general limited to a vicariancist perspective and it negatively influences causal interpretations of biogeographical patterns (see geodispersal of Lieberman & Eldredge, 1996, for instance). It is generally accepted among historical biogeographers that dispersal explanations should not be used as first-order biogeographical explanations (e.g., Santos, 2007a, and Amorim et al., 2009), since they are untestable individual narratives. However, to ignore dispersal a priori and to assume it a posteriori (as in Secondary BPA) seems to deny (or at least to question) the relevance of dispersal to biogeography.

Conclusions
Despite some claims (van Veller et al., 1999), A0, A1 and A2 are interpretations of the relationships between areas, not between taxa. An areagram is not a cladogram, and, as the representation of a certain biogeographical pattern, it yields little evidence regarding biogeographical processes (speciation, vicariance, dispersal, extinction) (Ebach, 2001). According to Nelson & Platnick (1981), the geographical relationships are not necessarily the same as cladistic relationships. For this reason, it is spurious to use distribution alone to resolve ambiguous patterns in areagrams, which is exactly what A0 tries to do. The presence of a taxon in more than one area is taken by A0 as evidence of an ancient relationship between these areas, and the ambiguity of the areagram, due to the presence of a MAST, appears to be 'resolved' by considering the areas as sister-taxa. This is not what happens under A1 and A2, as they both allow for other area relationships not strictly dependent on taxa.
The hypothetical scenario presented here is an instance of a general rule, and shows that the multiple solutions provided by A2 are more wide-ranging than the patterns generated by A0 and BPA, correctly leading to the reconstruction of the chain of events that resulted in the current observable pattern. Despite the simplicity of the example, A0 and BPA were not able to depict the 'real history', which casts a degree of doubt on their ability to deal with more complicated situations. Nevertheless, A0 remains one of the essential elements of PACT, a method created by Wojcicki & Brooks (2005), as well as of primary, secondary, and modified BPA (Brooks et al., 2001).
Criticisms of BPA are numerous (Platnick, 1988; Nelson & Ladiges, 1991; Page, 1994; Siddall & Perkins, 2003; Siddall, 2004, 2005; Santos, 2007b; but see Brooks et al., 2004). According to Ebach & Humphries (2002), BPA is a method that uses cladistics for "describing evolutionary scenarios rather than determining the relationships of areas using cladistics" (Ebach & Humphries, 2002, p. 433). By treating species (or supraspecific taxa) as characters and areas as taxa, BPA causes spurious results, introducing area relationships on the basis of widespread distributions rather than sister-group relationships between areas. Secondary BPA is also controversial. The method tries to resolve ambiguity by duplicating redundant areas (Brooks et al., 2001) using a non-objective procedure (Siddall, 2005). The theoretical example presented here, in which the general areagrams resulting from A0 and BPA are completely different from the 'real history' of the hypothetical disjunction events, reinforces previous criticisms of A0 and BPA.
Despite the great number of possible solutions, A2 does not explain the sources of ambiguity. However, it is a much less restrictive assumption than A0. Along with methods such as component analysis, A2 can be very useful to find common (congruent) patterns among different areagrams. Certainly there are critics who question the reliability of results obtained through the available biogeographical methods; there is an ongoing debate, and new methods and tools to depict the historical affinities among areas continue to arise. For example, philosophical issues such as reciprocal illumination and consilience (Santos & Capellari, 2009) should be considered. They are steps toward a less instrumentalist biogeography, that is, one not based solely on the application of analytical methods without considering the explanatory power of the resultant biogeographical hypothesis when compared to other taxonomic groups.
Regarding biogeographical assumptions, the perspective of Humphries (1989) on the subject is still applicable: A2 remains a powerful tool, allowing "an analytical escape from such accidental biological events as dispersal, extinction, and failures by taxa to respond to vicariance" (Humphries, 1989, p. 101), which are common in the investigation of the natural world. Although ambiguity in areagrams may be impossible to explain, A2 seems better and more neutral than any other biogeographical assumption.

Figure 3: Applying assumptions to the observable pattern. A: assumption 0, resulting in areagram (B(D(G(B,F)))); B: assumption 2, resulting in eight areagrams, two of them identical to the actual disjunction pattern (B(D(F,G))).

Figure 4: The same hypothetical area with different biogeographical problems. A: widespread taxon; B: missing area; C: paralogy (redundant distribution).

Figure 5: Analysis of ambiguity under assumption 0 and BPA. A-C: each node of the areagram corresponds to an entry in the BPA data matrix; D: area versus taxon matrix used in primary BPA; E: areagrams resulting from matrix analysis.