Maximum Parsimony tutorial using PAUP
Step 1: Downloading and installing software
For this tutorial the programs we will use are SeaView, PAUP, and the text editor of your choice. SeaView has many uses, including:
- Viewing molecular sequences
- Algorithmic alignment of molecular sequences
- Manual editing and alignment of molecular sequences
- Estimating phylogenetic trees from molecular sequences
- Viewing phylogenetic trees
If you are running Windows or macOS, you can download the latest version of
SeaView from the SeaView web site. If you are running Ubuntu, then
SeaView is available from the package manager. You can install it from the
“Ubuntu Software” GUI, or manually using apt install seaview.
PAUP is used to infer trees from molecular data, and incorporates many different methods and models for doing so. These include:
- Maximum parsimony
- Maximum likelihood
- Distance-based methods like neighbor-joining
- SVDquartets, which is statistically consistent under the multispecies coalescent
If you are running macOS or Linux, please download the latest command line
version of PAUP for your platform from the PAUP test-version downloads
web site. Extract PAUP, and make sure the program is executable by opening the
command line, navigating to the directory it was stored in, and running
chmod +x paup4a164_ubuntu64 on Ubuntu or chmod +x paup4a164_osx on macOS.
If you are running Windows, download the Windows GUI version from the
same web site.
If you do not have a favorite text editor already, I recommend Sublime Text or Visual Studio Code. You can download and install either program from their respective web sites.
After downloading the software, download the workshop materials archive to your computer, and extract its contents.
Step 2: Exploring the true tree and sequence data
Launch SeaView, and then open the fz.tree file in the phylogenetics-workshop
folder. This will show you an ultrametric tree that was randomly generated
for this workshop (using a coalescent model).
Still in SeaView, open the fz.nexus multiple sequence alignment file. This
is a 100,000 character alignment that was simulated along the tree you just
opened, using a Jukes-Cantor model of molecular evolution.
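To make the Jukes-Cantor model mentioned above more concrete, here is a minimal Python sketch of its substitution probabilities along a single branch. Under this model all four nucleotides are equally frequent and every substitution is equally likely. The function names and the branch_length parameter (expected substitutions per site) are assumptions made for illustration; they are not part of SeaView, PAUP, or the workshop files.

```python
import math


def jc69_prob(branch_length, same):
    """Jukes-Cantor probability for one site along one branch.

    branch_length is in expected substitutions per site.
    same=True  -> probability the end state equals the start state.
    same=False -> probability of ending in one specific different state.
    """
    decay = math.exp(-4.0 * branch_length / 3.0)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def jc69_expected_difference(branch_length):
    """Expected proportion of sites that differ after the branch."""
    # There are three possible "different" end states, each equally likely.
    return 3.0 * jc69_prob(branch_length, same=False)


if __name__ == "__main__":
    for d in (0.01, 0.1, 1.0):
        print(d, jc69_expected_difference(d))
```

The expected difference approaches 0.75 as the branch gets longer, because repeated substitutions at the same site overwrite each other. That saturation is one reason counting observed differences underestimates the true amount of change on long branches.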
Step 3: Inferring the maximum parsimony tree with PAUP
We will use PAUP to infer a phylogenetic tree. Open the command line on your
computer, and navigate to the extracted phylogenetics-workshop folder. On
Windows, run paup fz.nexus. On macOS or Linux, replace paup with the
path to the PAUP executable on your computer. For example, if you saved it
to the Downloads folder on a Mac, this might be ~/Downloads/paup4a164_osx.
Run the following lines of PAUP code:
Set Criterion=Parsimony;
This tells PAUP that the parsimony score of a tree should be used to judge its goodness of fit.
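As an illustration of what a parsimony score is, the sketch below counts the minimum number of state changes a tree requires, using the Fitch algorithm for unordered characters. This is a teaching example, not PAUP's implementation. The nested-tuple tree representation, the taxon names A to D, and the three-site toy alignment are all hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one site.

    tree is a taxon name (leaf) or a pair (left, right) of subtrees.
    states maps each taxon name to its character state at this site.
    """
    if isinstance(tree, str):  # leaf: its state is observed, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:  # children can agree, so no extra change is needed here
        return common, left_cost + right_cost
    # children disagree: one extra change is required at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    # Hypothetical toy data, not taken from fz.nexus.
    toy = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
    print(parsimony_score((("A", "B"), ("C", "D")), toy))
    print(parsimony_score((("A", "C"), ("B", "D")), toy))
```

On this toy alignment the first tree needs fewer changes than the second, so it is the better tree under the parsimony criterion. For unordered characters like these, the score does not depend on where the tree is rooted.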
BandB;
This command will identify the best fitting tree according to the parsimony criterion. Normally we have to use some kind of heuristic search, such as hill-climbing, to infer trees, as the number of possible trees is so large that we cannot score them all. Because this data set contains only 12 taxa (with 100,000 sites), we can instead use an exact “branch-and-bound” algorithm, which is guaranteed to find the most parsimonious tree or trees.
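To give a sense of how quickly the number of possible trees grows, the following sketch counts the distinct unrooted, fully bifurcating topologies for a given number of taxa, using the standard formula (2n - 5)!! = 3 × 5 × ... × (2n - 5). The function name is an assumption chosen for this example.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted binary tree topologies for n_taxa labeled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 8, 12, 20):
        print(n, num_unrooted_trees(n))
```

For the 12 taxa in this workshop the count is 654,729,075. Branch-and-bound avoids scoring every one of these by abandoning any partial tree whose score already exceeds the best complete tree found so far. With many more taxa, even that pruning is not enough and a heuristic search becomes necessary.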
SaveTrees file=mp.tree replace=yes;
This saves the inferred tree to a file named mp.tree, overwriting any existing file with that name.
Quit;
This exits PAUP.
Step 4: Exploring the inferred tree
Open the inferred tree in SeaView. Make sure the true tree is still open. The kind of inference we used produces an unrooted tree without branch lengths, so you may have to reroot it or rotate nodes in SeaView. Experiment with the “Swap” and “Re-root” options in SeaView so that the trees match.
Which nodes, if any, differ between the true tree and the estimated tree topology?
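If you would like to check your visual comparison programmatically, one common approach is to compare the bipartitions (splits) that each internal branch induces on the taxa. Two trees with the same unrooted topology share all their splits regardless of how they are rooted or rotated. The sketch below assumes each tree has already been converted into nested Python tuples of taxon names. Reading fz.tree or mp.tree would need a Newick parser, which is not shown, and the five taxon names used here are hypothetical.

```python
def leaf_sets(tree, out):
    """Append the leaf set below every internal node to out; return this node's set."""
    if isinstance(tree, str):  # leaf
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial splits of a tree, ignoring its rooting."""
    clades = []
    all_taxa = leaf_sets(tree, clades)
    reference = min(all_taxa)  # fixed taxon used to pick one side of each split
    result = set()
    for clade in clades:
        # Always store the side of the split that excludes the reference taxon.
        side = all_taxa - clade if reference in clade else clade
        # Skip splits that separate no taxa or only a single tip.
        if 1 < len(side) < len(all_taxa) - 1:
            result.add(side)
    return result


def differing_splits(tree_a, tree_b):
    """Splits found in exactly one of the two trees (empty if topologies match)."""
    return splits(tree_a) ^ splits(tree_b)


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    rerooted = ("A", ("B", ("C", ("D", "E"))))
    other = ((("A", "C"), "B"), ("D", "E"))
    print(differing_splits(true_tree, rerooted))
    print(differing_splits(true_tree, other))
```

The number of differing splits is the Robinson-Foulds distance between the two trees. A result of zero means the estimated topology matches the true one once rooting and node rotation are ignored.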