**Simple bivariate plots** can often be extremely effective for small datasets. The plot below is based on a famous dataset of four flower measurements for three species of *Iris*, two of which were collected on the Gaspé Peninsula (Anderson 1935; Fisher 1936).

This plot was made using Rweb.

**With even as few as four measurements** it can be difficult to take in all (4 × 3)/2 = 6 pairwise relationships between these characters. Fortunately, there are methods that summarize multivariate data efficiently, so that these relationships can be seen very clearly. These methods are based on the matrices of R- and Q-mode resemblances discussed on 25 March 2003.
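For p characters the number of pairwise scatterplots is p(p − 1)/2, so it grows quickly: 10 characters already give 45 plots. A minimal Python sketch that lists the pairs for the four measurements in Fisher's Iris data (sepal length and width, petal length and width) and checks the arithmetic:

```python
from itertools import combinations

# The four measurements recorded for each flower in Fisher's (1936) Iris data.
characters = ["sepal length", "sepal width", "petal length", "petal width"]

# Each unordered pair of characters is one possible bivariate scatterplot.
pairs = list(combinations(characters, 2))

p = len(characters)
n_pairs = p * (p - 1) // 2  # (4 * 3) / 2

for x, y in pairs:
    print(f"{x} vs. {y}")
print(n_pairs, "pairwise plots")
```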

R-mode analyses

**Principal components analysis** (PCA) allows us to rotate our multidimensional data so as to see the directions (say 2 or 3 at a time) in which it varies most. It is based on eigenanalysis of a symmetric dispersion matrix, such as a matrix of variances and covariances or a matrix of correlations. In effect, eigenanalysis examines the shape of the cloud of data points in hyperspace (a space of more than 3 dimensions) and finds eigenvalues that measure the dispersion of the data points along a succession of mutually orthogonal (at 90° to each other) directions, each chosen to capture as much of the remaining dispersion as possible. Eigenanalysis also finds a vector (eigenvector) for each eigenvalue. Together, the eigenvectors define a rotation of the data into a new coordinate system in which the variance along each successive axis is as large as possible and equals the corresponding eigenvalue.
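A minimal sketch of the eigenanalysis behind PCA, written in plain Python so that it runs without add-on libraries. It diagonalizes the variance-covariance matrix with the cyclic Jacobi method, one standard technique for symmetric matrices (an implementation choice of this example, not something these notes prescribe). The toy data are made up for illustration; in practice a library routine such as R's `prcomp()` would be used.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix (divisor n - 1) and the centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def jacobi_eigen(sym, max_sweeps=50):
    """Eigenanalysis of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvector k is column k of the returned matrix.
    """
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= 1e-22 * scale:
            break
        for k in range(p - 1):
            for m in range(k + 1, p):
                if a[k][m] == 0.0:
                    continue
                # Angle in the (k, m) plane that makes a[k][m] zero.
                diff = a[m][m] - a[k][k]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[k][m] / diff)
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == j) for j in range(p)] for i in range(p)]
                rot[k][k], rot[m][m], rot[k][m], rot[m][k] = c, c, s, -s
                a = matmul(matmul(transpose(rot), a), rot)
                v = matmul(v, rot)  # accumulate the rotations
    order = sorted(range(p), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[r][i] for i in order] for r in range(p)]
    return values, vectors


def pca(rows):
    """PCA of the variance-covariance matrix: eigenvalues, eigenvectors, scores."""
    cov, centered = covariance_matrix(rows)
    values, vectors = jacobi_eigen(cov)
    scores = matmul(centered, vectors)  # rotate the centered data onto the new axes
    return values, vectors, scores


# Hypothetical toy data: 6 objects x 3 characters (not the Iris data).
data = [
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
]
values, vectors, scores = pca(data)
print("eigenvalues:", [round(lam, 4) for lam in values])
```

Because the rotation is orthogonal, the eigenvalues sum to the total variance (the trace of the covariance matrix), and the variance of the scores on each new axis equals its eigenvalue.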

**Consider the
following** 2-dimensional data, created using Rweb.

This plot was made using Rweb.

**It's pretty clear** in which directions the data vary most, and that these directions are not the same as those of the x- and y-axes. A PCA of these data rotates the cloud of points so that the direction of greatest variation becomes the first axis and the direction perpendicular to it becomes the second.

This plot was made using Rweb.
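For two variables the rotation has a closed form: the first principal axis lies at angle θ = ½ atan2(2s_xy, s_xx − s_yy) from the x-axis, where s_xx, s_yy and s_xy are the sample variances and covariance. The sketch below applies it to a simulated cloud (hypothetical data, generated with an arbitrary random seed along a line at 30°), so the printed angle should be close to 30°.

```python
import math
import random


def rotate_to_principal_axes(xs, ys):
    """Rotate 2-D data onto its principal axes.

    Returns (theta, pc1, pc2): theta is the angle of the first principal
    axis from the x-axis (radians); pc1 and pc2 are the rotated, centered
    coordinates of each point.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Angle that maximizes the variance of the points projected onto it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(dx, dy)]
    pc2 = [-s * a + c * b for a, b in zip(dx, dy)]
    return theta, pc1, pc2


# Hypothetical cloud: spread (sd 3) along a line at 30 degrees, narrow (sd 0.5) across it.
rng = random.Random(42)
angle = math.radians(30)
xs, ys = [], []
for _ in range(100):
    along, across = rng.gauss(0, 3), rng.gauss(0, 0.5)
    xs.append(along * math.cos(angle) - across * math.sin(angle))
    ys.append(along * math.sin(angle) + across * math.cos(angle))

theta, pc1, pc2 = rotate_to_principal_axes(xs, ys)
print(f"first principal axis: {math.degrees(theta):.1f} degrees from the x-axis")
```

After the rotation the two new coordinates are uncorrelated, the first carries the larger variance, and the total variance is unchanged.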

**PCA of the Iris data**, done up fully, will look like this. Note the biplot, which displays the PCA scores together with the component-character correlations, and the screeplot, which shows the relative magnitudes of the eigenvalues.
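Both features can be computed directly from the eigenanalysis. The sketch below uses made-up eigenvalues to tabulate the proportion of total variance on each axis (what a screeplot displays), and applies the relation r_jk = v_jk √λ_k / s_j for the correlation between character j and component k, where v_jk is an eigenvector element and s_j the standard deviation of character j. The two-character correlation matrix at the end is chosen only because its eigenvectors are known exactly.

```python
import math


def scree_table(eigenvalues):
    """Rows of (axis, eigenvalue, proportion of total variance, cumulative proportion)."""
    total = sum(eigenvalues)
    table, running = [], 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam / total
        table.append((k, lam, lam / total, running))
    return table


def component_character_correlations(vectors, eigenvalues, variances):
    """r[j][k] = v[j][k] * sqrt(lambda_k) / s_j (eigenvector k is column k)."""
    return [[vectors[j][k] * math.sqrt(eigenvalues[k]) / math.sqrt(variances[j])
             for k in range(len(eigenvalues))] for j in range(len(variances))]


# Hypothetical eigenvalues, for illustration only; a text "screeplot".
eigenvalues = [2.8, 0.9, 0.2, 0.1]
for k, lam, share, cum in scree_table(eigenvalues):
    bar = "#" * round(40 * share)
    print(f"PC{k}  {lam:5.2f}  {share:6.1%}  {cum:6.1%}  {bar}")

# Correlation matrix [[1, r], [r, 1]]: eigenvalues 1 + r and 1 - r, with
# eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2); every variance is 1.
r = 0.6
h = 1 / math.sqrt(2)
corr = component_character_correlations([[h, h], [h, -h]], [1 + r, 1 - r], [1.0, 1.0])
print(corr)
```

For each character, the squared correlations with all components sum to 1, since together the components account for all of its variance.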

Q-mode analyses

**Eigenanalysis** of a *Q-mode resemblance matrix* can be used in a similar way, to obtain a Principal Coordinates Analysis (PCoA) of the data. The results of PCA and PCoA will be essentially the same to the extent that the original data meet the assumptions of both methods; for example, a PCoA of Euclidean distances reproduces, up to the signs of the axes, the scores of a PCA of the variance-covariance matrix of the same data. In general, PCA is restricted to ratio scale metric data for which character covariation and correlation are meaningful. PCoA can be applied much more widely, since resemblance functions are available for most data types, and for mixed data (handout, 25 March 2003).
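A minimal sketch of classical PCoA in plain Python, assuming a matrix of Euclidean distances between made-up objects: the squared distances are halved and negated, Gower-centered, and eigenanalysed, and each eigenvector is scaled by the square root of its eigenvalue. The Jacobi eigen-solver is repeated so the block runs on its own; R's `cmdscale()` performs the same computation. With Euclidean distances the retained axes reproduce the original distances exactly. With other resemblance coefficients some eigenvalues can be negative, and this sketch simply drops those axes.

```python
import math


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def jacobi_eigen(sym, max_sweeps=50):
    """Cyclic Jacobi eigenanalysis; eigenvalues descending, eigenvectors as columns."""
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= 1e-22 * scale:
            break
        for k in range(p - 1):
            for m in range(k + 1, p):
                if a[k][m] == 0.0:
                    continue
                # Angle in the (k, m) plane that makes a[k][m] zero.
                diff = a[m][m] - a[k][k]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[k][m] / diff)
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == j) for j in range(p)] for i in range(p)]
                rot[k][k], rot[m][m], rot[k][m], rot[m][k] = c, c, s, -s
                a = matmul(matmul(transpose(rot), a), rot)
                v = matmul(v, rot)
    order = sorted(range(p), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for i in order] for r in range(p)]


def pcoa(dist):
    """Principal coordinates from a symmetric matrix of distances."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in a]  # row means equal column means (symmetric)
    grand = sum(means) / n
    # Gower centering: b_ij = a_ij - mean_i - mean_j + grand mean.
    b = [[a[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]
    values, vectors = jacobi_eigen(b)
    # Keep only axes with clearly positive eigenvalues.
    tol = 1e-9 * abs(values[0])
    keep = [k for k, lam in enumerate(values) if lam > tol]
    # Coordinate on axis k = eigenvector element times sqrt(eigenvalue k).
    coords = [[vectors[i][k] * math.sqrt(values[k]) for k in keep] for i in range(n)]
    return values, coords


# Hypothetical objects with two characters each; Euclidean distances between them.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 2.5), (1.5, -1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
values, coords = pcoa(dist)
print("eigenvalues:", [round(lam, 4) for lam in values])
```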

**Cluster analysis** is the other major type of Q-mode analysis. It represents the distances between objects (OTUs) in the form of a tree diagram (dendrogram). Once discontinuities in the data have been detected by means of an ordination method like PCA or PCoA, you can use an algorithmic method to sort objects into groups on the basis of their resemblances to each other. Typically, this is done bottom-up (agglomeratively), but with increased computational power it is now practical to do the sorting top-down (divisively) instead, where that is more appropriate.

This plot, like the biplot and screeplot above, was created using S-Plus, the commercial version of the S language. Clustering of this kind can also be done using R and Rweb by attaching the cluster package ("library(cluster)").
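These notes do not specify a linkage rule; the sketch below assumes average linkage (UPGMA), one common agglomerative choice. At each step it joins the two closest clusters and replaces their distances to every other cluster with the size-weighted average. The five OTUs and their positions are hypothetical. In R, `hclust(d, method = "average")` or `agnes(d, method = "average")` from the cluster package builds the same kind of tree, and `diana()` in that package clusters divisively.

```python
def upgma(dist, labels):
    """Average-linkage (UPGMA) agglomerative clustering.

    Returns the merges bottom-up as (left, right, height); clusters are
    represented as nested tuples of labels.
    """
    n = len(labels)
    # cluster id -> (tree, number of objects in the cluster)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), height = min(d.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merges.append((tree_i, tree_j, height))
        # Average linkage: size-weighted mean of the two old distances.
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        d = {key: val for key, val in d.items() if i not in key and j not in key}
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return merges


# Hypothetical OTUs positioned along a single character axis.
positions = {"A": 0, "B": 2, "C": 10, "D": 14, "E": 30}
labels = list(positions)
dist = [[abs(positions[a] - positions[b]) for b in labels] for a in labels]
merges = upgma(dist, labels)
for left, right, height in merges:
    print(f"join {left} + {right} at height {height}")
```

Each merge height is the average distance between all members of the two groups being joined, which is what the branch heights of a UPGMA dendrogram show.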

R is free software, available for multiple platforms (as well as on the web, through Rweb), that implements the S language for data analysis and graphics originally developed at Bell Laboratories in Murray Hill, NJ. A separate program for multivariate data analysis, also called the R Package but unrelated to the R language, is distributed as freeware from the University of Montréal.

Anderson, E. (1935). The irises of the Gaspé Peninsula. *Bulletin of the American Iris Society,* 59, 2-5.

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. *Annals of Eugenics,* 7, Part II, 179-188.

Judd et al. (2002). Box 6C, ch. 6 [PCA as a tool in studying hybridization].

Legendre, P. & L. Legendre (1998). *Numerical Ecology,* 2nd English ed. Amsterdam, Elsevier.

Manly, B. F. J. (1994). *Multivariate Statistical Methods: A Primer,* 2nd ed. London, Chapman & Hall.

Podani, J. (2000). *Introduction to the Exploration of Multivariate Biological Data.* Leiden, Backhuys.


**© 2003 Botany Department, University
of Toronto.**