BOT300 Home Page

Cladistic analyses 1 (1-Apr-03 and subsequent lectures)


Both cladistic analyses and the phenetic analyses described last week start with a matrix that consists of scores on a suite of characters for a sample of objects (operational taxonomic units or OTUs: taxa, specimens, etc.). However, phenetic analyses typically proceed to taxonomic conclusions by way of a resemblance matrix describing either overall character covariation (R-mode) or OTU (dis)similarity (Q-mode), whereas cladistic analyses proceed directly from the characters (and the distribution of character states across the sample) to a taxonomic conclusion. Moreover, the classifications that result from phenetic analyses can be seen as following in a tradition established by Adanson and, to some extent, Bernard de Jussieu, aimed at producing a natural system of classification that may succeed in reflecting phylogenetic relationships. In contrast, classifications derived from cladistic analyses are explicitly phylogenetic ones.
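To make the idea of a Q-mode resemblance matrix concrete, here is a minimal sketch in Python. The OTU names and scores are hypothetical, and the simple matching distance is only one of many coefficients a phenetic study might use; it is chosen here as an assumption for illustration.

```python
from itertools import combinations

# Hypothetical binary scores for three OTUs on four characters.
scores = {
    "otu1": [1, 0, 1, 1],
    "otu2": [1, 0, 0, 1],
    "otu3": [0, 1, 0, 0],
}

def simple_matching_distance(u, v):
    """Proportion of characters on which two OTUs differ."""
    mismatches = sum(a != b for a, b in zip(u, v))
    return mismatches / len(u)

def q_mode_matrix(scores):
    """Pairwise OTU dissimilarities, keyed by (otu, otu) pairs."""
    return {
        (p, q): simple_matching_distance(scores[p], scores[q])
        for p, q in combinations(scores, 2)
    }

for pair, distance in q_mode_matrix(scores).items():
    print(pair, distance)
```

A phenetic analysis would go on to cluster the OTUs from these pairwise values, whereas a cladistic analysis works from the character columns themselves.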

Filiation of manuscripts

Just as phenetic analyses are linked philosophically to 18th-century efforts to achieve a natural classification, cladistic analyses have antecedents in the study of ancient writings, notably manuscript copies of classical authors and of Christian sacred texts. A 15th-century study of letters by the Roman writer Cicero concluded that an error in an early copy that was found also in several later copies demonstrated that the earlier copy was the ancestor of these later copies (Robinson & O'Hara 1996). Similar work in the 18th and 19th centuries led to detailed comparisons, by means of which it was argued that the original texts could be inferred from the patterns of changes and errors found in a sample of copies (see www.earlham.edu/~seidti/iam/text_crit.html for an example). In the last 35 years students of ancient texts have adopted explicitly cladistic methods (e.g. Robinson & O'Hara 1996), and have even been able to expand their scope by focusing on the problem of contamination (the analogues in biological systematics are lateral transfer and perhaps reticulation; Ragan & Lee 1991).


Character polarity & Hennigian analysis

Cladistic analysis was the brainchild of the German entomologist, Willi Hennig (1913-1976). Hennig sought a method with which to deduce the phylogenetic relationships required in order to produce a classification that reflected phylogeny (a phylogenetic classification). According to Hennig, phylogenetic relationships were to be inferred not from similarities but rather from sister-group relationships defined by shared, derived character-states (synapomorphies). Synapomorphies are analogous to the changes and errors found in manuscript copies; they represent new features shared by descendants of a common ancestor.

Since characters are fundamental to cladistic analysis, how are they to be chosen? The critical issue is one of homology, as opposed to analogy: are you looking at the same thing in each of the study objects? In morphological comparisons, homologous structures resemble each other in origin and composition, but might differ in function. Analogous structures, by contrast, differ in origin and composition, but are similar in function. Bell (1991) provides a botanical example.

In the same way, the states of a character must also be homologous. For example, some workers have considered that vessels are found in members of the gymnosperm group Gnetales. Other workers dispute this, saying that the character state "vessels present" is not the same thing in the Gnetales and in the flowering plants, because the structure and development of these xylem elements differ between the two groups.

These issues are hardly unique to cladistic analysis. Comparing homologous characters across all of the objects in a study is equally important in phenetic analyses.

How are characters to be polarized, so that one knows which of two (or more) states is the ancestral one? The most widely accepted method is outgroup comparison, whereby the character state found (immediately) outside the group under study is considered to represent the ancestral state (typically coded as 0, while the derived state is coded as 1). Click HERE to go to a page from the University of Tennessee with an exercise involving outgroup comparison and character polarities. Note that this page includes links to other pages with lecture material that will help you do the exercises correctly.
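The coding step in outgroup comparison can be sketched as follows. This is a minimal illustration, assuming a single outgroup that is not polymorphic for any character; the taxon names and raw states are hypothetical, and `polarize` is a helper name introduced here.

```python
def polarize(matrix, outgroup):
    """Recode every character so that the outgroup's state is 0
    (taken as ancestral) and any other state is 1 (taken as derived)."""
    ancestral = matrix[outgroup]
    return {
        taxon: [0 if state == anc else 1 for state, anc in zip(states, ancestral)]
        for taxon, states in matrix.items()
    }

# Hypothetical raw scores for three characters.
raw = {
    "outgroup": ["smooth", "absent", "five"],
    "taxon_p": ["smooth", "present", "five"],
    "taxon_q": ["hairy", "present", "five"],
    "taxon_r": ["hairy", "present", "four"],
}

coded = polarize(raw, "outgroup")
for taxon, states in coded.items():
    print(taxon, states)
```

With more than two states per character, or with several outgroups that disagree, polarity decisions need more care than this sketch allows.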

The distribution of the states of each polarized character across the study objects provides a phylogenetic hypothesis. The next task is to integrate the hypotheses implied by an entire data matrix. Two methods for doing so are illustrated HERE. Both result in a branching diagram describing the overall phylogenetic hypothesis supported by the data.


Character compatibility

Consider the data matrix below (Meacham 1981), representing the distribution of the states of three characters (1-3) across four taxa (a-d) and their common ancestor, x.

Taxon  1  2  3
a      A  A  B
b      A  B  A
c      B  A  B
d      A  B  A
x      B  A  A

Draw the three character-state trees. Now begin combining the trees by adding the character-state transitions that you've discovered for, say, character 2 to the tree for character 1. Do the same for characters 2 and 3. Can you make a single tree on which all three characters agree? The agreement or disagreement that you discover is known as character compatibility (or incompatibility). In larger, real-life data sets it can be useful to have a formal way of determining the extent of character compatibility. Theoreticians have shown that incompatibility arises if and only if all four combinations of character states are present in any given comparison of two characters. The table below formalizes this for our data matrix.

Characters  AA  AB  BA  BB
1 and 2     +   +   +   -
2 and 3     +   +   +   -
1 and 3     +   +   +   +

(+ = the combination occurs in the data matrix; - = it does not)

Characters 1 and 2, and 2 and 3, are pairwise compatible, but characters 1 and 3 are incompatible (all four combinations of the states A and B occur in the original data matrix). Each of these sets of mutually compatible characters (here, pairs) is called a clique. In class we will review the formalism for tree construction using cliques of compatible characters. This can also be reviewed HERE using a page from Mark Wilson's systematics course website (his lecture notes on characters and character compatibility are found HERE). Another site with a Java applet for finding cliques of compatible characters is available HERE (you may find that you need to view this page with Internet Explorer for this applet to work).
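The four-combination test is easy to automate. The sketch below, in Python, applies it to the Meacham (1981) data matrix given above; the function names are introduced here for illustration, and the test as written assumes two-state characters only.

```python
from itertools import combinations

# Data matrix from the text (Meacham 1981): taxa a-d and their ancestor x,
# each scored for characters 1-3.
MATRIX = {
    "a": "AAB",
    "b": "ABA",
    "c": "BAB",
    "d": "ABA",
    "x": "BAA",
}

def state_pairs(matrix, i, j):
    """Set of state combinations observed for characters i and j (0-based)."""
    return {row[i] + row[j] for row in matrix.values()}

def compatible(matrix, i, j):
    """Two binary characters are compatible unless all four combinations occur."""
    return len(state_pairs(matrix, i, j)) < 4

def compatible_pairs(matrix):
    """All compatible pairs of characters, reported with 1-based numbers."""
    n_chars = len(next(iter(matrix.values())))
    return [
        (i + 1, j + 1)
        for i, j in combinations(range(n_chars), 2)
        if compatible(matrix, i, j)
    ]

print(compatible_pairs(MATRIX))  # [(1, 2), (2, 3)]
```

For larger matrices the pairwise results can be treated as a graph, with characters as nodes and compatibility as edges; finding the largest clique in that graph is a separate step not shown here.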

Building a phylogeny on the basis of one clique of mutually compatible characters has come to be seen as an unnecessarily restrictive condition for phylogeny reconstruction. Moreover, it raises the question of how we are to understand alternative phylogenies based on other (smaller) cliques. Nevertheless, character compatibility methods can be extremely useful not only as a means of evaluating the data available for a phylogenetic study, but also as a tool in other kinds of work. For example, van der Hulst et al. (2000) used character compatibility to examine binary molecular fingerprint data (presence/absence of polymorphic amplification fragments) for evidence of exclusively clonal reproduction. Pairs of incompatible fragments were taken as indicating the occurrence of recombination, hence of sexual reproduction.


Parsimony

We have seen that a dataset can support more than one phylogenetic hypothesis. How then are we to choose among the alternatives? Early on, when computer-based cladistic methods were first being developed, it became clear that there is a huge number of possible trees for even modest numbers of OTUs (e.g. over 2 × 10^6 for merely 9 terminal taxa, and 17 times as many for 10 terminal taxa). The solution to this problem follows the principle of parsimony that was suggested by the 14th c. Franciscan monk, William of Occam: the hypothesis requiring the fewest assumptions is most likely to be true. In the case of phylogenetic trees, the assumptions involved are the character state changes that must be placed on a tree to account for the distribution of character states observed in the study group. Other solutions to this problem exist, such as Maximum Likelihood, but for our purposes discussion of how to choose among alternative phylogenies will be restricted to the criterion of Maximum Parsimony.
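The growth in the number of trees can be checked with a short calculation. The sketch below assumes that rooted, fully bifurcating trees are being counted, for which the number of distinct trees on n labelled terminal taxa is the product of the odd numbers 3 × 5 × ... × (2n - 3); the figure quoted for 9 taxa is consistent with that assumption.

```python
def rooted_binary_trees(n):
    """Number of distinct rooted, fully bifurcating trees for n labelled
    terminal taxa (n >= 2): the product 3 * 5 * ... * (2n - 3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

for n in (4, 9, 10):
    print(n, rooted_binary_trees(n))
# 4 15
# 9 2027025
# 10 34459425
```

Each added taxon multiplies the count by the next odd number, which is why exhaustive evaluation of every tree quickly becomes impractical.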

Consider the following dataset (Wiley et al. 1991),

Characters
Taxon 1 2 3 4 5 6 7
X 0 0 0 0 0 0 0
M 1 1 0 0 1 1 1
N 1 1 1 1 1 1 1
O 1 1 1 1 0 0 0

and give yourself the exercise of building the seven character trees for this dataset, and figuring out which characters are compatible. After assembling the tree for all characters you should have something like each of the following two trees. Which is the correct phylogeny?

Count the number of character state changes in each tree. On the left there are only 9, whereas on the right there are 10. The principle of parsimony suggests that the tree on the left is slightly more likely to be true. This difference can be expressed using a consistency index (C.I.), calculated as the minimum number of character state changes implied by the dataset (in this case 7, one for each of the 7 binary characters) divided by the total number of state changes required to evolve the characters on the tree. In the examples above, the C.I. takes the following values: C.I.(left) = 7/9 = 0.78, and C.I.(right) = 7/10 = 0.70. What would the C.I. be for a tree based on a clique of compatible characters?
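The counts of 9 and 10 changes, and the two C.I. values, can be reproduced with a small script. This is a sketch only: it uses Fitch's counting procedure for unordered characters, which is a choice made here and not something specified in these notes, and the two topologies are assumptions inferred from the clades described for the two trees ({N, M} within {O, N, M} on the left, {O, N} within {O, N, M} on the right).

```python
# Dataset from the text (Wiley et al. 1991): 0 = ancestral, 1 = derived.
DATA = {
    "X": [0, 0, 0, 0, 0, 0, 0],
    "M": [1, 1, 0, 0, 1, 1, 1],
    "N": [1, 1, 1, 1, 1, 1, 1],
    "O": [1, 1, 1, 1, 0, 0, 0],
}

# Assumed topologies, written as nested pairs.
LEFT = ("X", ("O", ("M", "N")))
RIGHT = ("X", ("M", ("N", "O")))

def fitch(node, data, char):
    """Return (possible state set, minimum changes) for one character."""
    if isinstance(node, str):  # terminal taxon
        return {data[node][char]}, 0
    (s1, c1), (s2, c2) = (fitch(child, data, char) for child in node)
    common = s1 & s2
    if common:  # the two descendants can share a state: no extra change
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # otherwise one more change is needed

def tree_length(tree, data):
    """Total number of state changes required on the tree."""
    n_chars = len(next(iter(data.values())))
    return sum(fitch(tree, data, c)[1] for c in range(n_chars))

def consistency_index(tree, data):
    """Minimum conceivable changes (states - 1 per character) / tree length."""
    n_chars = len(next(iter(data.values())))
    minimum = sum(len({row[c] for row in data.values()}) - 1 for c in range(n_chars))
    return minimum / tree_length(tree, data)

for name, tree in (("left", LEFT), ("right", RIGHT)):
    print(name, tree_length(tree, DATA), round(consistency_index(tree, DATA), 2))
# left 9 0.78
# right 10 0.7
```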

Left tree: note the synapomorphies (in red) for {O, N, M} (characters 1-4), and for {N, M} (characters 5-7). Note also that reversals (in blue) in characters 3 and 4 are required as autapomorphies for M.

Right tree: note the synapomorphies (in red) for {O, N, M} (characters 1-2), and for {O, N} (characters 3-4). Note also that convergence or parallel evolution (in green) has resulted in characters 5-7 being required as autapomorphies for M and N.

In converting the trees above into a classification, the following concepts are worth mentioning:



Further reading

Bell, A. D. (1991). Plant form - an illustrated guide to flowering plant morphology. Oxford, Oxford University Press.

Hull, D. L. (1988). Science as process. Chicago, University of Chicago Press.

Judd et al. (2002), ch. 2.

Meacham, C. A. (1981). A manual method for character compatibility analysis. Taxon 30: 591-600.

Ragan, M. A. & A. R. Lee III (1991). Making phylogenetic sense of biochemical and morphological diversity among the protists, pp. 432-441 in The unity of evolutionary biology, Proceedings of the 4th International Congress of Systematic and Evolutionary Biology, Vol. I, E. C. Dudley (ed.). Portland OR, Dioscorides Press.

Robinson, P. M. W. & R. J. O'Hara (1996). Cladistic Analysis of an Old Norse Manuscript Tradition. Research in Humanities Computing 4: 115-137.

Sneath, P. H. A. & R. R. Sokal (1973). Numerical Taxonomy. San Francisco, W. H. Freeman.

van der Hulst, R. G. M., Mes, T. H. M., den Nijs, J. C. M. & K. Bachmann (2000). Signatures of both sexual and asexual reproduction in triploid dandelion (Taraxacum Weber) populations based on AFLP fingerprints. Molecular Ecology 9: 1-8.

Wiley, E. O., Siegel-Causey, D., Brooks, D. R. & V. A. Funk (1991). The compleat cladist. University of Kansas Museum of Natural History Special Publication No. 19.



© 2003 Botany Department, University of Toronto.

Please send your comments to tim.dickinson@utoronto.ca; last updated 06-Apr-2003