This tutorial explains how to make Super Trees using the SFT1 approach.

The SFT1 approach builds a tree at the level of each TCDB sequence. To build an SFT2 instead, skip getNcbiSeq.pl and build FASTA files containing homologs for all the members of a subfamily or group (for example, FASTA files that each represent an entire TC family). Once you have a FASTA file for each group that will be included in your SFT2, run supertree.pl.

The programs used here have recently been updated to run faster, to take care of some labeling problems (caused by PHYLIP’s limit of 10 characters per label), and to run fitch and consense. These instructions reflect the new usage.

Check whether the programs are installed on your computer by typing:

which getNcbiSeq.pl
which supertree.pl

Each command should respond with the full path to the program if it is installed, and with “Command not found” if it is not. If a program is missing, ask whether it can be installed.
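The same check can be scripted so that both programs are tested at once. This is only a sketch: `check_tools` is a hypothetical helper name, not part of the lab's programs, and it relies on the standard shell built-in `command -v` rather than `which`.

```shell
# Report whether each required program is on the PATH.
# check_tools is a hypothetical helper, not one of the tutorial's scripts.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf '%s: %s\n' "$tool" "$location"
        else
            printf '%s: not found\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools getNcbiSeq.pl supertree.pl || echo "Ask whether the missing programs can be installed, or use the server."
```

The function returns a non-zero status if any program is missing, so it can also be used at the top of a longer script.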

If the programs are not installed, or you cannot reasonably run them on your computer, open a terminal and SSH into the “.21” server:

ssh saierlab@132.239.144.21

Either way, the following instructions apply to your computer if the programs are installed there, or to the “.21” server otherwise.

Always stay well organized, so that the server(s) do not fill up with meaninglessly named files and directories scattered everywhere. Start by making a directory to store your work:

mkdir mysupertree
cd mysupertree

You can name your directory anything you prefer instead of “mysupertree.” A meaningful name would be better than a very generic name. You could, for example, make a directory with your name, and then make another directory within that one for your superfamily tree.
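The nested layout suggested above (a directory with your name, containing one directory per tree) can be created in one step. The names below are placeholders for illustration only; substitute your own.

```shell
# Placeholder names for illustration; substitute your own.
owner_dir="john_smith"
tree_dir="FamXFamYtree"

# -p creates the parent directory if needed and does not fail if it already exists.
mkdir -p "$owner_dir/$tree_dir"
cd "$owner_dir/$tree_dir"
pwd
```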

Create a text file containing representative sequences, in FASTA format, for the families within the superfamily you’ll be working with.

For example, for each family within MFS you would add one representative of each subfamily to the list. You might select more representatives from a subfamily (also called a cluster) if one representative does not bring up all its members in a TC-Blast.

Name each FASTA sequence carefully. The name after the “>” symbol is what will actually appear on the branches of your tree, so make sure it is meaningful. We recommend using the full TCID. For example:

>2.A.1.1.1
ASDAFJAJJSJSJJSJAJJAJAJFFFF
>2.A.1.2.1
ARPLLLLQQQQWWWWWWWWVVVVMM

But you may use any names, as long as each sequence has a different one.

There are several ways to get your file onto the “.21” server: you can use sftp or rsync, or you can copy and paste into a file opened on the server.

Suppose you made a directory on the “.21” server named “john_smith,” and that your superfamily tree directory inside it is “FamXFamYtree.” Your local file (the file on your computer) is called “FamXFamY.fasta.” You would then use rsync to send your file as follows:

rsync -av -e ssh FamXFamY.fasta saierlab@132.239.144.21:john_smith/FamXFamYtree/

Enter the password when prompted. Done! Your file is now on the “.21” server.

The copy/paste method works, for example, as follows. Select the text in your file and copy it to the clipboard (CTRL+C, or CMD+C on a Mac).

Then, in the remote shell type the following:

vi FamXFamY.fasta
a
(CMD+V) then (ESC) then (SHIFT+;) then wq [ENTER]

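Type the first line and press ‘return’. Then press a, which puts vi in insert mode (no ‘return’ is needed here, and pressing it would add a blank line at the top of the file). Finally, paste and follow the key sequence on the last line to save and quit.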
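Whichever transfer method you used, it may be worth confirming that every sequence name is unique before continuing, since each name must be different. The sketch below assumes one header line per sequence starting with “>”; `list_duplicate_names` is a hypothetical helper name, and the file name is the one from the example above.

```shell
# list_duplicate_names is a hypothetical helper: it prints every header
# (the text after ">") that occurs more than once in a FASTA file.
list_duplicate_names() {
    sed -n 's/^>//p' "$1" | sort | uniq -d
}

fasta="FamXFamY.fasta"   # file name used in this tutorial's example
if [ -f "$fasta" ]; then
    dups=$(list_duplicate_names "$fasta")
    if [ -n "$dups" ]; then
        printf 'Duplicate sequence names:\n%s\n' "$dups"
    else
        echo "All sequence names are unique."
    fi
fi
```

The comparison uses the whole header line, so two headers that differ only in trailing text count as different names.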

Retrieve NCBI sequences from BLAST results by typing the following (remember that you have to work in the directory where you have the FASTA file):

getNcbiSeq.pl -i FamXFamY.fasta -o SFT1

When this is done, create your distance matrix, and your tree, by simply typing:

cd SFT1
supertree.pl

This program will build 100 matrices, and then run fitch and consense to produce a file with a consensus tree (superfam.tree file).
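The two steps above can be wrapped in a small function so that a failure in the first step stops the run. This is a sketch, not part of the lab's programs: `build_sft1` is a hypothetical name, it uses only the commands and options shown above, and it assumes getNcbiSeq.pl creates the output directory (as the `cd SFT1` step implies).

```shell
# build_sft1 is a hypothetical wrapper around the two commands shown above.
# Usage: build_sft1 FamXFamY.fasta SFT1
build_sft1() {
    fasta=$1
    outdir=$2
    if [ ! -f "$fasta" ]; then
        echo "No such FASTA file: $fasta" >&2
        return 1
    fi
    # Step 1: retrieve the NCBI sequences into the output directory.
    getNcbiSeq.pl -i "$fasta" -o "$outdir" || return 1
    # Step 2: build the matrices and the consensus tree inside that directory.
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$outdir" && supertree.pl ) || return 1
    if [ -f "$outdir/superfam.tree" ]; then
        echo "Consensus tree written to $outdir/superfam.tree"
    else
        echo "supertree.pl finished but $outdir/superfam.tree was not found" >&2
        return 1
    fi
}
```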

You can draw the tree using this website or this other website (this one requires you to register, but is very powerful). Another alternative is to use the program R (if you know how to use R). Or you can open it with FigTree. Just paste the contents of ‘superfam.tree’ into the window. To display the contents of ‘superfam.tree’ in the shell, just type:

cat superfam.tree

If fitch and consense are not installed on your computer, you may run “supertreeNP.pl” instead of “supertree.pl,” and build the trees by running fitch and consense yourself on some other computer.

If that’s the case, then you would run fitch and consense as follows:

Once you have an infile from supertreeNP.pl (could take a while), the next step is to create your fitch tree, then your consensus tree. Type the following:

fitch

Fitch will give you a list of options. Type ‘M’ and set that to 100. At the next prompt, enter any odd number; just type 3 if you can’t think of one.
Let the program re-write whatever it needs to. When it’s done, rename “outtree” to “intree” by typing:

mv outtree intree

The final step is to type the following into the shell:

consense

Check the options to make sure that consense is running under the assumption of an unrooted tree; otherwise, type “R” to tell consense that the trees are not rooted. After that, type “Y” to let consense run.

When this is finished your final tree will be saved in the ‘outtree’ file.

This file contains your tree information in Newick format.
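Before pasting the tree into a viewer, a quick sanity check can catch a truncated file. The sketch below is only a rough test, not a Newick parser: `newick_looks_ok` is a hypothetical helper that counts parentheses and looks for the terminating semicolon, and it could be fooled by labels that themselves contain parentheses.

```shell
# newick_looks_ok is a hypothetical helper. It only checks that the file has
# at least one "(", as many "(" as ")", and a ";" somewhere; it does not
# parse the tree.
newick_looks_ok() {
    opens=$(( $(tr -cd '(' < "$1" | wc -c) ))
    closes=$(( $(tr -cd ')' < "$1" | wc -c) ))
    [ "$opens" -gt 0 ] && [ "$opens" -eq "$closes" ] && grep -q ';' "$1"
}

tree_file="outtree"   # the same check works for superfam.tree
if [ -f "$tree_file" ]; then
    if newick_looks_ok "$tree_file"; then
        echo "$tree_file looks like a complete Newick tree."
    else
        echo "$tree_file may be incomplete; check that consense finished."
    fi
fi
```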

You can draw the tree using this website or this other website (this one requires you to register, but is very powerful). Another alternative is to use the program R (if you know how to use R). Or you can open it with FigTree. Just paste the contents of ‘outtree’ into the window. To display the contents of ‘outtree’ in the shell, just type:

cat outtree