
PAUP tutorial - Macintosh

The following tutorial is based on the tutorial provided with the PAUP package (but modified to emphasise morphological analysis, and updated to refer to the current command-line Mac version).

It provides a very brief overview of the basic usage of PAUP* 4.0. This tutorial was designed for people with no prior experience using PAUP. If you are already familiar with PAUP* then you will probably wish to skip this tutorial. We assume that users are familiar with basic phylogenetic terminology and specific issues. As you become experienced using PAUP, you will discover that there are many alternative ways to execute the operations described below. For obvious reasons, we have chosen not to describe all the possibilities in this tutorial; however, we encourage you to explore other menu and command-line options as your time permits.

Several versions of PAUP* 4.0 are currently available. These versions fall under three general interface types: Macintosh PowerPC (now obsolete), Windows, and Portable (including current Mac). The Macintosh PowerPC interface allows you to execute commands via menus and the command line, but this version will no longer run on any current Mac. The Windows and Portable (DOS, and current Mac) interfaces are almost entirely command-line driven. Some menu functions are available in the Windows interface; however, these functions mostly include file and edit operations. This tutorial will use the command-line syntax of the Mac version.

While you will be entering commands separately (one at a time) during this tutorial, note that PAUP can run in batch mode (where multiple commands are combined into a single file and processed sequentially by executing that command file). Some batch file examples are at the end of this tutorial.

In this tutorial, commands to be typed are in bold courier font. PAUP is generally NOT case-sensitive, but other phylogenetic programs (notably BEAST) are. Beware.


1. EXAMINE THE DATA FILE

Choose your favorite plain-text editor (e.g., TextWrangler, TextEdit) to open the sample file named squamates.nex.

TextWrangler is included in the tutorial folder but is also available here as a free download: www.barebones.com/products/textwrangler/

The data are morphological and molecular data for a range of lizards and snakes. In particular, many extinct marine lizards are included, in order to test their possible affinities to snakes.

Scroll through the file. Note the following:

The datafile consists of a character-by-taxon matrix of discrete traits, with additional Nexus “stuff” before and after.

Anything in square brackets [e.g. comments describing the Nexus file] is ignored by PAUP*.

The file is divided into blocks of text, delimited by the words "begin" and "end". The word following "begin" defines the block type, e.g. a TAXA block.

In this example, the following types of blocks are used: ASSUMPTIONS, SETS and PAUP. As an example, the SETS block defines sets of characters and taxa, which can then be readily excluded and included using simple PAUP commands. You can have several blocks of the same type, e.g. multiple PAUP blocks specifying different analyses, or multiple SETS blocks specifying different sets of taxa and/or characters.

There are, however, numerous other Nexus block-types. One of the advantages of the Nexus format is that applications will simply skip over blocks that they do not recognize. For a more detailed discussion of the Nexus format see Maddison, et al. (1997). For this example, you will not need to modify the original sample file.

Notice that spaces in taxon names must be replaced with an underscore character _. Also, PAUP* does not pay attention to the character case in taxon labels.

To create a new PAUP datafile: You can enter new data matrices in Mesquite, which will save the matrix as a Nexus file readable by PAUP (see the Mesquite tutorial). Otherwise, you can use Excel, Word or a similar program to generate a matrix. If using Excel or Word, export the data as a space-delimited plain-text file. PAUP does not mind if there are spaces between characters in the matrix [but DOES require a space after the taxon name]: Mosasauria 1 0 1 0 and Mosasauria 1010 are both treated the same. After you export the bare matrix, you will then need to manually add the Nexus “stuff” before and after.
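The manual step of adding the Nexus "stuff" around a bare matrix can also be scripted. The following is a minimal Python sketch, not part of the PAUP package. It assumes one taxon per line, a taxon name without spaces, and one symbol per character (no polymorphism codes); the function name matrix_to_nexus and the toy taxon Taxon_B are hypothetical. Check the generated file against squamates.nex before relying on it.

```python
def matrix_to_nexus(bare_matrix: str) -> str:
    """Wrap a bare taxon-by-character matrix in a minimal NEXUS DATA block.

    Assumption: each line is "<taxon_name> <states>", with the states either
    run together ("1010") or space-separated ("1 0 1 0").
    """
    rows = []
    for line in bare_matrix.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        name = tokens[0]              # taxon name: use underscores, not spaces
        states = "".join(tokens[1:])  # "1 0 1 0" and "1010" become identical
        rows.append((name, states))
    if not rows:
        raise ValueError("no taxa found")

    nchar = len(rows[0][1])
    for name, states in rows:
        if len(states) != nchar:
            raise ValueError(f"{name} has {len(states)} characters, expected {nchar}")

    # Symbols actually used, excluding the missing (?) and gap (-) codes.
    used = {symbol for _, states in rows for symbol in states} - {"?", "-"}
    symbols = "".join(sorted(used))
    width = max(len(name) for name, _ in rows)

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(rows)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="{symbols}";',
        "  MATRIX",
    ]
    lines += [f"  {name.ljust(width)}  {states}" for name, states in rows]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from squamates.nex.
    print(matrix_to_nexus("Mosasauria 1 0 1 0\nTaxon_B 1?11\n"))
```

Because the states are joined before writing, the spaced and unspaced forms of a row produce the same output, mirroring how PAUP treats them.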


2. OPEN PAUP

Create a new folder on your desktop (not in a subfolder) called Paup_Tutorial (using underscore_ and capitals as shown).

Copy squamates.nex to this folder.

For Mac, PAUP operates via the Terminal interface – the command-line system that “hides” behind the icon-driven desktop you normally use.

If you double-click the PAUP icon you will automatically open and run PAUP in this interface. However, you can also open this interface first and then run PAUP: open the Terminal application which comes with all Macs (under Applications / Utilities / Terminal), drag-and-drop the PAUP program to the command line, and hit return.

In the Terminal window where PAUP is running, make Paup_Tutorial your working folder by typing the following (cd means change directory): cd /Users/user/Desktop/Paup_Tutorial

Hit return to submit the command. The above command can be case sensitive on some Mac settings – so make sure your capitalization matches your folder.

Alternatively, just type cd followed by a space, and then drag-and-drop the folder Paup_Tutorial to the command line – the path and folder name will appear automatically!


IMPORTANT: folder names along the entire path to your working folder cannot have a space in the folder name, e.g. a folder name called “my stuff” is not permitted, but “my_stuff” is OK.
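If you are unsure whether a long path is safe, a quick check like the one below lists any folder names containing a space. This is a small Python sketch for convenience only; the function name folders_with_spaces is hypothetical and the second example path is invented for illustration.

```python
from pathlib import PurePosixPath


def folders_with_spaces(path: str) -> list:
    """Return every component of a Mac/Unix-style path that contains a space."""
    return [part for part in PurePosixPath(path).parts if " " in part]


if __name__ == "__main__":
    print(folders_with_spaces("/Users/user/Desktop/Paup_Tutorial"))   # []
    print(folders_with_spaces("/Users/user/my stuff/Paup_Tutorial"))  # ['my stuff']
```

An empty list means no folder along the path needs renaming.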

Unless told otherwise (by putting a different path before filenames), PAUP* will read and write to files in the working folder, currently set to /Users/user/Desktop/Paup_Tutorial

To change the working folder at any time (e.g. if you want output to be written into a different folder), type cd followed by a space, then drag-and-drop the new working folder onto the command-line. PAUP will read and write to files in the new folder.

3. EXECUTE THE DATA FILE

Type: EXECUTE squamates.nex; [hit return to submit command]

Note: you can often abbreviate commands to the first three letters: EXE squamates.nex;

If your PAUP file is not in your working folder, you can type EXECUTE followed by a space, and then drag-and-drop the PAUP file onto the command line. The path to the file will be auto-populated (e.g. /Users/user/Desktop/Some_Other_Folder/Some_Other_SubFolder/squamates.nex)

PAUP should execute the file and give you a summary of the contents.

4. LOG RESULTS TO FILE

Ordinarily, you will want to log the results of a PAUP* session to a disk file to have a record of the results of your analyses. To log the screen output to a file, type: LOG file=practice.log;

By default, this log file will be created in your working folder (so it adds this path to the filename, i.e. /Users/user/Desktop/Paup_Tutorial/practice.log) and all screen output will be sent to it. Have a look and you should see this file appear. Again, you can change the folder for this log file by changing your working folder (or by adding a different path to the filename).

To stop logging at any time, type: LOG stop;

5. SUMMARISING THE DATA

Now that the data matrix has been processed, you can use PAUP* to obtain basic summary information about the data set. To start, you will display information about the characters included in the sample data set. Type: cstatus;

PAUP* will display a summary of the current character status (i.e., types, weights, etc.). Remember, if logging was turned on, the summary information displayed to your screen will also be saved to the log file. You may also choose to display a summary of the taxa (tstatus), the entire data matrix (showmatrix), and more.

6. INCLUDE AND EXCLUDE CHARACTERS

PAUP* provides several ways to restrict analyses to a subset of the taxa and characters included in a data matrix. For example, the sample data set includes morphological and molecular characters. Suppose we wish to analyze only the morphological traits. These characters have already been identified in the sample file using the command charset (as charset morph). Character sets simplify certain procedures by allowing you to refer to a group of characters by a single name. You will start by excluding all characters in the data set except for the morphological characters.

EXCLUDE all; [this excludes all characters; the charset “all” is automatically recognized by PAUP and does not need to be defined]

INCLUDE morph; [this includes all characters in the character set named “morph”, defined with the charset command in the Nexus file you examined earlier]

In other words, all molecular characters are now excluded.

7. DELETE AND RESTORE TAXA

In the same way, you can define sets of taxa using the TAXSET command (see PAUP file in TextWrangler), and delete and restore them.

DELETE fossils; [this deletes all taxa in the taxset fossils, defined using the TAXSET command in the nexus file]

To restore these taxa –

RESTORE fossils;

You can also delete and restore single taxa by using their taxon name.


8. SET CHARACTER TYPES (unordered and ordered multistate)

Morphological characters. By default, PAUP treats all multistate characters as unordered. However, if you wish to order multistate characters (012 etc), you can use the TYPESET and ASSUME commands.

This command defines a typeset (named MorphoclineChars) that sets certain multistate characters as ordered – all other characters are the default, ie unordered.

TYPESET MorphoclineChars = ord: 55 63 67-70 83 94 124 134 152-153 157;

Hint: To save typing, you can copy and paste any commands here into the PAUP window.

You can define as many typesets as you want – none are actually enforced/activated until you employ the ASSUME command.

Enforce the typeset called MorphoclineChars with:

ASSUME Typeset= MorphoclineChars;
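To make the typeset above concrete, the Python sketch below expands the character list (single numbers and ranges such as 67-70) and compares how a change between two states is counted for an unordered versus an ordered character. It only illustrates the standard definitions (unordered: any change costs 1 step; ordered: the cost is the difference between state numbers) and is not PAUP's own code; the function names are hypothetical.

```python
def expand_char_list(spec: str) -> list:
    """Expand a list such as '55 63 67-70' into [55, 63, 67, 68, 69, 70]."""
    chars = []
    for token in spec.split():
        if "-" in token:
            start, end = (int(x) for x in token.split("-"))
            chars.extend(range(start, end + 1))  # ranges are inclusive
        else:
            chars.append(int(token))
    return chars


def change_cost(state_a: int, state_b: int, ordered: bool) -> int:
    """Steps needed to change between two states of one character."""
    if ordered:
        return abs(state_a - state_b)       # 0 -> 2 passes through 1: 2 steps
    return 0 if state_a == state_b else 1   # unordered: any change is 1 step


if __name__ == "__main__":
    ordered_chars = expand_char_list("55 63 67-70 83 94 124 134 152-153 157")
    print(len(ordered_chars))                 # 13
    print(change_cost(0, 2, ordered=False))   # 1
    print(change_cost(0, 2, ordered=True))    # 2
```

So the typeset orders 13 characters, and for those characters a change from state 0 to state 2 counts as two steps rather than one.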

DNA Characters. By default, PAUP* considers all transformation costs to be equal. However, you can invoke a character type that will assign a higher weight to transversions than to transitions. More specifically, you can for instance assume that transversions (changes from a purine (A or G) to a pyrimidine (C or T), or vice versa) cost twice as much as transitions (changes from a purine to a purine, or from a pyrimidine to a pyrimidine). One way to incorporate this assumption into the analysis is to set up a transition/transversion “step matrix” in the Nexus file; an example of this is in the PAUP examples folder. You can also set up similar step matrices for morphological characters.
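The costs in such a step matrix follow directly from the purine/pyrimidine rule. The Python sketch below builds the 4 x 4 table of costs for the 2:1 weighting described above. It illustrates the arithmetic only and does not reproduce the step matrix syntax of the Nexus file; the function name substitution_cost and its default costs are assumptions for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_cost(a: str, b: str, transition_cost: int = 1,
                      transversion_cost: int = 2) -> int:
    """Cost of changing nucleotide a into nucleotide b."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a nucleotide: {base}")
    if a == b:
        return 0
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) = transition.
    same_class = (a in PURINES) == (b in PURINES)
    return transition_cost if same_class else transversion_cost


if __name__ == "__main__":
    bases = "ACGT"
    print("  " + " ".join(bases))
    for a in bases:
        print(a, " ".join(str(substitution_cost(a, b)) for b in bases))
```

Each row of the printed table gives the cost of changing from the row base to each column base: 0 on the diagonal, 1 for transitions and 2 for transversions.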

9. SET OUTGROUP(S).

Define Sphenodontida as the outgroup. If you do not define an outgroup, the first taxon in the matrix will be the default outgroup, which is often NOT what you want.

OUTGROUP Sphenodontida; [This is already set as the outgroup in the nexus file, so you will get a warning that nothing has changed]

Now use the above outgroup to root the tree: SET root=outgroup;

You can list several taxa as outgroups, e.g. outgroup X Y Z. PAUP will then root the tree somewhere amongst these outgroups. However, PAUP will not enforce monophyly or paraphyly of the outgroups with respect to the ingroup unless you specifically add separate commands:

SET root=outgroup outroot=monophyl; [outgroup monophyletic]

SET root=outgroup outroot=paraphyl; [outgroup paraphyletic with respect to the ingroup]

The outroot=paraphyl command does not specify a particular outgroup order; to set one outgroup to be the furthest outgroup, for instance, you will need to use another command (the constraint command, discussed later).

9. SEARCHING FOR TREES, AND GETTING A CONSENSUS TREE

PAUP* 4.0 has the advantage of being able to analyze molecular data using several different optimality criteria: parsimony, likelihood, and distance. However, it can only analyse morphology under maximum parsimony. To begin with, you will use the default criterion, maximum parsimony, to search for optimal trees. Note that other programs, e.g. RAxML, MrBayes, BEAST, can employ likelihood models on morphological data.

Before doing the search, ensure you only include morphology, include all taxa, and have ordered characters. Type (or cut-and-paste) the following commands:

EXCLUDE all; [exclude all characters]
INCLUDE morph; [include morphological characters]
RESTORE all; [include all taxa]
ASSUME Typeset= MorphoclineChars; [order characters which form morphoclines]

Now begin the search:

HSEARCH addseq=random nreps=1000;

This tells the program to do 1000 heuristic (quick) searches, each starting from a tree built by adding the taxa in a different random order, and saving the best tree(s) found. The best tree PAUP finds can depend on the starting tree used to start the search – so doing the process 1000 times (or more) makes it more likely the globally best tree is found.

Wait a minute or so, and you should see that the 1000 replicate searches have converged on several different “islands” of best trees. The best of these islands was found in about 96% of the searches, and consists of 2 trees each 539 steps long.

To see these trees:

SHOWTREES all;

Have a look at the closest relatives of snakes. What does this tree imply for snake origins?

Save these trees to file (so you can open them with nicer graphics in other programs such as FigTree): SAVETREES File=allbest.trees root=yes brlens=yes;

To get a consensus tree, and save it to a file, use the commands below. If you don’t specify a consensus type, it will be a strict consensus (only clades found in every one of the most-parsimonious trees are retained). CONTREE / treefile=con.tree root=outgroup; [strict consensus, default]

To get a majority-rule consensus, you have to over-ride the defaults: CONTREE / strict=no majrule=yes treefile=majrule_con.tree root=outgroup; [majority-rule consensus]
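The difference between the two consensus types can be shown with a few lines of Python. In this sketch each tree is reduced to the set of clades it contains, and each clade is a set of taxon names. The taxa A to D and the three toy trees are invented for illustration, and this is a sketch of the idea rather than how PAUP computes consensus trees.

```python
from collections import Counter


def clade_counts(trees):
    """Count in how many trees each clade appears (each tree is a set of clades)."""
    return Counter(clade for tree in trees for clade in tree)


def strict_consensus(trees):
    """Clades found in every tree."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n == len(trees)}


def majority_rule_consensus(trees):
    """Clades found in more than half of the trees."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n / len(trees) > 0.5}


if __name__ == "__main__":
    ab, abc, abd = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
    toy_trees = [{ab, abc}, {ab, abc}, {ab, abd}]   # hypothetical trees
    print(strict_consensus(toy_trees) == {ab})              # True
    print(majority_rule_consensus(toy_trees) == {ab, abc})  # True
```

Clade A+B is in all three toy trees, so it survives in both consensus trees; clade A+B+C is in two of three, so it appears only in the majority-rule consensus.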

Getting Nicer Trees: You can look at your saved trees in better resolution (and obtain publication-quality figures) by opening and editing/colouring the files in Tree viewing programs such as FigTree (included in Paup Tutorial folder, but also available here - http://tree.bio.ed.ac.uk/software/figtree/ )

10. OPTIMISING (MAPPING) CHARACTERS ON A TREE

Note: for various reasons, it is not advisable to map characters on poorly-resolved strict consensus trees. It is best to do it on each of your most-parsimonious trees or a resolved majority rule consensus tree.

To examine the evolutionary history of characters on your tree (e.g. on what branches has a particular character changed, and how many convergences/reversals have occurred?), type:

DESCRIBETREES 1 / plot=cladogram opt=deltran apolist=yes;

[The “1” after “Describetrees” tells it to describe the first tree in memory. Describetrees all would map characters, sequentially, on every tree in memory, in this case both trees.] deltran (delayed transformation) is an optimization strategy which says: if there is any uncertainty about where a character changes on a tree (e.g. because of missing data and/or extensive homoplasy), place the changes as late as possible. The converse is acctran (accelerated transformation), which places changes as early in the tree as possible.

You will get a tree with numbered nodes, tree length and consistency index information, and character change information, as below:

This says that character 4 has changed with 1 step on the branch between nodes 52 and 50; the character has homoplasy because the consistency index is 0.33 (ie 1/3, so it has changed 3 times), and the “1 step” change is from state 0 to state 1. The single arrow means the change is ambiguous, i.e. it only applies under the current optimization strategy deltran but does not occur on this branch if an alternative optimization (acctran) is used. A double arrow (e.g. for character 26) is better as it means the change definitely occurs on this branch no matter what optimization strategy is used.
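The consistency index quoted above is simply the minimum possible number of steps for a character divided by the number of steps it actually takes on the tree. A short Python sketch of that arithmetic follows; the function name is hypothetical, and a binary character is assumed to need a minimum of 1 step.

```python
def consistency_index(min_steps: int, observed_steps: int) -> float:
    """CI = minimum possible steps / steps observed on the tree."""
    if min_steps < 0 or observed_steps <= 0:
        raise ValueError("steps must be positive")
    if observed_steps < min_steps:
        raise ValueError("observed steps cannot be fewer than the minimum")
    return min_steps / observed_steps


if __name__ == "__main__":
    # Binary character (minimum 1 step) that changes 3 times on the tree.
    print(round(consistency_index(1, 3), 2))  # 0.33
    # The same character with no homoplasy.
    print(consistency_index(1, 1))            # 1.0
```

A CI of 1.0 means the character fits the tree perfectly; lower values mean more convergences or reversals.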

The clade of snakes and marine lizards is identified as clade 40 here; you will need this information later.

11. BOOTSTRAPPING

Bootstrapping is a simple resampling technique to assess clade support. Characters are randomly re-sampled (with replacement) so that a resampled matrix has the same number of total characters as the original matrix – but in the resampled matrix, some characters are represented multiple times and others are not represented at all. This resampled matrix is analysed and a tree (or trees) obtained. The process is repeated many times (>100 is best).

Clades which appear in most resampled matrices (e.g. >70% or >95% of resampled matrices) are considered highly supported, as it means there is strong support in the original matrix (since most random samples of characters from this matrix give you trees with the same clade).
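The resampling step itself is easy to picture in code. The Python sketch below draws character columns with replacement from a toy matrix to build one bootstrap replicate matrix. The taxon names and states are invented for illustration, the function name is hypothetical, and PAUP performs this step internally; the sketch does not include the tree search.

```python
import random


def bootstrap_matrix(matrix: dict, rng: random.Random):
    """Resample characters (columns) with replacement.

    matrix maps taxon name -> string of character states (all the same length).
    Returns the resampled matrix and the list of sampled column indices.
    """
    nchar = len(next(iter(matrix.values())))
    # Draw nchar column indices with replacement: some repeat, some are missed.
    columns = [rng.randrange(nchar) for _ in range(nchar)]
    resampled = {
        taxon: "".join(states[c] for c in columns)
        for taxon, states in matrix.items()
    }
    return resampled, columns


if __name__ == "__main__":
    toy = {"Taxon_A": "01201", "Taxon_B": "11000", "Taxon_C": "00211"}
    rng = random.Random(1)  # fixed seed so the run is repeatable
    replicate, columns = bootstrap_matrix(toy, rng)
    print(columns)
    print(replicate)
```

The same column indices are applied to every taxon, so each resampled character keeps its original pattern of states across taxa.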

To run a bootstrap, cut-and-paste this command into the PAUP window:

BOOTSTRAP nreps=100 conlevel=50 grpfreq=yes keepall=yes brlens=yes treefile=bootstrap100.trees replace=yes search=heuristic / addseq=random nreps=50 nchuck=1000 chuckscore=1;

It will generate 100 bootstrap replicate matrices, do a quick heuristic search on each matrix, and write all trees with branch lengths to the file bootstrap100.trees. The settings after the / prevent the analysis getting stuck with too many trees at any particular bootstrap replicate (which can happen if a resampled matrix has a lot of character conflict).

The screen output is below.

You will get a tree with bootstrap values for each clade, followed by a table with similar information (e.g. the first line of the table says taxa 23 and 24 in the tree appear 100% of the time). What is the support for clade 40 (the mosasnake clade) shown in step 10? Is this strong or weak?


All the individual sampled bootstrap trees are in the file bootstrap100.trees. This will be a large file with lots of trees. Open this in TextWrangler and look at the file structure.

To save the bootstrap consensus tree to a file: SAVETREES from=1 to=1 file=bootstrap_consensus.tree savebootp=nodelabels;

You can open this file in Figtree to get a nicer looking graphic to edit. Play around with Node Labels options to get the “” field to display in Figtree, this will be the bootstrap values.

12. CALCULATING BRANCH (BREMER) SUPPORT

Branch (Bremer) support is a simple measure of clade robustness - how many steps does it take to “break up” a clade? Higher is always better, but this measure is dataset-dependent and cannot be interpreted in statistical/probabilistic terms. A Bremer support of 10 would often be considered very strong in a small morphological dataset of 50 characters - but probably very weak in a genomic dataset with 100 000 characters.

Let us consider the “marine lizard plus snake” or “mosasnake” clade found in the 2 most-parsimonious trees (labeled as clade 40 in the screenshot in Section 10). These trees are 539 steps long, as we saw from previous analyses. We are interested in finding out: how many steps long are the best trees that do not contain this clade?

To check this, first we have to define this clade as a constraint:

CONSTRAINTS mosasnakes = ((Aigialosauridae Mosasauridae Dolichosauridae Adriosaurus Serpentes Haasiophis Pachyrhachis));

The above constraint obviously enforces only a single clade – you can define multiple constraints simultaneously by loading a partially-resolved nexus treefile you constructed in Mesquite or some other program, or manually.

To see the constraints, type: SHOWCONSTR;

Now, you want the tree length of the best tree which violates this constraint – this represents the shortest tree without these taxa as a clade. To find this tree, type:

HSEARCH addseq=random nreps=1000 Constraints=mosasnakes Enforce=yes converse=yes;

This is the heuristic search you used before but with three extra options: identifying the constraint you defined, enforcing it, and telling PAUP to enforce it as a “converse constraint”, i.e. only considering trees that do not satisfy the constraint.

The output (below) will tell you that there are 3 shortest trees where the clade mosasnakes is “broken up” and these are all 555 steps long – 16 steps longer than the (unconstrained) shortest trees (539 steps). So the Bremer or Branch support for the mosasnake clade is 16, i.e. it takes 16 steps to break up this clade. You can see these 3 trees with showtrees all – examine how the mosasnakes clade is broken up in each of these “converse constraint” trees.
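The Bremer calculation is a single subtraction: the length of the shortest trees lacking the clade minus the length of the shortest unconstrained trees. A tiny Python sketch with the tree lengths reported in this section (the function name is hypothetical):

```python
def bremer_support(unconstrained_length: int, converse_length: int) -> int:
    """Extra steps needed before the clade no longer appears."""
    if converse_length < unconstrained_length:
        # The converse search should never beat the unconstrained optimum;
        # if it does, the unconstrained search probably missed the best trees.
        raise ValueError("converse-constraint trees are shorter than the best trees")
    return converse_length - unconstrained_length


if __name__ == "__main__":
    print(bremer_support(539, 555))  # 16
```

If the function raises an error with your own numbers, rerun the unconstrained search with more replicates before interpreting the support value.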

Doing this type of analysis for every clade on a large tree can be tedious, but fortunately programs like TreeRot (http://people.bu.edu/msoren/TreeRot.html) automatically generate PAUP batch files which sequentially get Bremer supports for every clade on any given tree.

Don’t forget to turn off logging before step 13.

LOG stop;

13. RUNNING PAUP IN BATCH MODE

Most of the commands you executed above are concatenated into a plain text file called batchfile.txt, and can all be executed sequentially with a single command:

Execute batchfile.txt; [make sure the file is in your working directory]

Note that you must have a data matrix already loaded in PAUP to execute any batchfile (for obvious reasons).

The commands in the batchfile will then execute automatically in sequential order. Open batchfile.txt in a text editor to see them; the file contains the following:

#NEXUS

BEGIN PAUP;
LOG start file=batch.log;
INCLUDE morph;
TYPESET MorphoclineChars = ord: 55 63 67-70 83 94 124 134 152-153 157;
ASSUME Typeset= MorphoclineChars;
OUTGROUP Sphenodontida;
SET root=outgroup;
HSEARCH addseq=random nreps=100;
SAVETREES File=allbest.trees root=yes brlens=yes;
CONTREE / strict=no majrule=yes treefile=majrule_con.tree root=outgroup;
DESCRIBETREES 1 / plot=cladogram opt=acctran apolist=yes;

END;

MORE EXAMPLE BATCH COMMAND BLOCKS ARE IN THE APPENDIX


APPENDIX

SOME USEFUL BATCH COMMANDS

USE THESE TOGETHER WITH THE PAUP COMMAND REFERENCE MANUAL, which describes all available commands.

NOTE: the command #nexus must be at the start of every nexus file, and only at the start. If you are adding these batch commands to an existing nexus file, delete the #nexus (you don’t want this command appearing in the middle of the file).

*********General Commands************ [These are useful to add to every file – compresses trees so you can see the whole tree, ladderises trees, and sets a big limit on maxtrees etc]

#nexus [see above]

BEGIN PAUP;
OUTGROUP Taxon1 Taxon2 etc / only; [set one taxon, or a clade, as the most distant outgroup]
SET MAXTREES=200000 tcompress=yes torder=left showtaxnum=yes taxlabels=full;
SET ROOT=OUTGROUP OUTROOT=monophyl CRITERION=parsimony;
SET storetreewts=yes;
END;

*********** BOOTSTRAPPING (batch mode) *****************

Use the commands in this batchfile to do a simple bootstrap (with 100 replicates) and generate a bootstrap consensus tree.

#nexus

BEGIN PAUP;

LOG start file=Screenoutput.txt replace=yes;

BOOTSTRAP nreps=100 conlevel=50 grpfreq=yes keepall=yes brlens=yes treefile=bootstrap1000.trees replace=yes search=heuristic / addseq=random nreps=50 nchuck=1000 chuckscore=1;
SAVETREES from=1 to=1 file=bootstrap_consensus.tree savebootp=nodelabels;
CONTREE / treefile=consensus.tree root=outgroup outroot=monophyl;

LOG stop;

END;

[************ Good Random Addition Search - for data with lots of tree "islands" ************]

Saves up to 1000 trees per random search / island. [For cleaner data with fewer multiple trees, delete "nchuck=1000 chuckscore=1" to save all trees from each search]

#Nexus

BEGIN PAUP;

HSEARCH addseq=random nreps=100 nchuck=1000 chuckscore=1;
SAVETREES File=heuristic.trees brlens=yes root=yes;
CONTREE / treefile=strictconsensus.tree root=outgroup outroot=monophyl;
CONTREE / treefile=strictandmajruleconsensus.trees majrule=yes le50=yes percent=50 root=outgroup outroot=monophyl; [This gives 2 trees, a strict and a majority-rule]

END;

[***********DESCRIBE TREES******************************]

Plots apomorphies for each node.

Acctran = accelerated transformation optimisation (makes changes as early in the tree as possible)
Deltran = delayed transformation optimisation (makes changes as late in the tree as possible)

A double arrow in the output means the change occurs on that branch under both acctran and deltran; a single arrow means the change depends on the optimization used (it differs between acctran and deltran); see tutorial above.

DESCRIBETREES / plot=cladogram [phylogram] root=outgroup outroot=monophyl apolist=yes diag=yes opt=deltran [acctran];


[******* CHECK WHICH CHARACTERS VARY IN FIT ACROSS 2 OR MORE TREES UNDER PARSIMONY.]

This loads 2 trees up, and finds characters which favour one tree over the other (and vice versa), and sees if one tree is significantly favoured over the other. It is often called the Templeton Test or Nonparametric test.

You need to save your two trees as separate nexus files (e.g. construct them in Mesquite tree window and export them), and then make sure you get the path and filename correct. If you are not sure of the path, drag and drop the files onto the command line to check.

#nexus

BEGIN PAUP;
GETTREES file=PATH+FILENAME_TREE1 mode=3; [replace trees in memory]
GETTREES file=PATH+FILENAME_TREE2 mode=7; [add to trees in memory]
[SHOWTREES all; this will show trees in memory if you want to check]
LOG start file=CompareTrees_screenoutput.txt replace=yes;
PSCORES / NonparamTest=yes single=var ci=yes ri=yes scorefile=treescores.txt; [This outputs characters which vary across your trees to the screen and log file CompareTrees_screenoutput.txt, and generates a separate file treescores.txt with ALL characters and their lengths and CI and RI]
LOG stop;
END;