Archive for the ‘Methods’ Category

Creating Super Family Trees

This tutorial explains how to make Super Trees using the SFT1 approach.

The SFT1 approach builds a tree at the level of each TCDB sequence. To build an SFT2, instead of using getNcbiSeq.pl, build FASTA files containing homologs for all the members of a subfamily or group, like the FASTA files that represent an entire TC family. Once you have a FASTA file for each group that will be included in your SFT2, run supertree.pl.

The programs used here have recently been updated to run faster, to fix some labeling problems (caused by PHYLIP's limit of 10 characters per label), and to run fitch and consense. These instructions reflect the new usage protocols.
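
One common way to work around a 10-character label limit is to swap each long label for a short fixed-width placeholder before building the tree, then swap the original labels back into the resulting Newick string. The sketch below illustrates that idea only. It is not the code used by the programs described here, and the function names and example labels are hypothetical.

```python
def shorten_labels(labels, width=10):
    """Map each long label to a unique placeholder of exactly `width` characters.

    Hypothetical helper: placeholders look like S000000000, S000000001, ...
    """
    short_to_long = {}
    for index, label in enumerate(labels):
        short = f"S{index:0{width - 1}d}"
        if len(short) > width:
            raise ValueError("too many labels for the chosen width")
        short_to_long[short] = label
    return short_to_long


def restore_labels(newick, short_to_long):
    """Put the original labels back into a Newick string."""
    for short, long_label in short_to_long.items():
        newick = newick.replace(short, long_label)
    return newick


# Example with made-up labels and a made-up tree.
mapping = shorten_labels(["2.A.1.1.1-ACC00001", "2.A.1.2.3-ACC00002"])
tree = "(S000000000:0.1,S000000001:0.2);"
print(restore_labels(tree, mapping))
```

Because every placeholder has the same width and a distinct number, no placeholder can be a substring of another, so plain string replacement is safe here. Labels containing Newick punctuation such as parentheses or colons would need extra care.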

Using Ancient-Rep to find internal TMS repeat units

Ancient-Rep will find internal TMS repeats using a list of homologs.

Enter ‘ancient’ in the Terminal app to begin.

Create a list of homologs to represent an entire TC Family

Having a FASTA file that defines an entire family is very useful if you want to find repeats with Ancient or use TSSearch/Protocol2 to compare two entire families for homology.
A TC family ID looks like this: 2.A.1 (it has three components).
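
As a quick sanity check before running anything, a family ID can be tested for the three-component shape described above. This is a minimal sketch that assumes the pattern is a single digit, a single uppercase letter, and a number, separated by dots. The function name is hypothetical and is not part of define_family.py.

```python
import re

# Assumed shape of a family-level ID: digit, uppercase letter, number (e.g. 2.A.1).
TC_FAMILY_PATTERN = re.compile(r"^\d\.[A-Z]\.\d+$")


def is_tc_family(tcid):
    """Return True if `tcid` has exactly three dot-separated components."""
    return bool(TC_FAMILY_PATTERN.match(tcid))


print(is_tc_family("2.A.1"))      # family-level ID
print(is_tc_family("2.A.1.1.1"))  # longer ID, not family level
```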

The program we are using is called define_family.py.

Usage: define_family.py FAMILY <P/PSI> OUTPUT

Open up your terminal application and type:

cd ~/Desktop/ # Changes your working directory to your desktop.
define_family.py 2.A.1 P output.faa # P or PSI

The “P” option refers to BLASTP. Alternatively, we can use “PSI” if we are looking for more distant homologs. When comparing families or looking for repeats, it is best to start with the “P” option and switch to “PSI” only if no good results are found.
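
If you script this step, the argument order from the usage line can be captured in a small helper. The sketch below only builds the argument list and does not run the program, since define_family.py prompts for input while it runs. The helper names and the output file naming scheme are assumptions made for illustration.

```python
VALID_MODES = ("P", "PSI")  # the order reflects the advice: try P first, then PSI


def build_define_family_command(family, mode, output):
    """Return the argument list for: define_family.py FAMILY <P/PSI> OUTPUT."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return ["define_family.py", family, mode, output]


def commands_in_search_order(family, output_stem):
    """Build one command per mode, with hypothetical output names like stem_P.faa."""
    return [
        build_define_family_command(family, mode, f"{output_stem}_{mode}.faa")
        for mode in VALID_MODES
    ]


for command in commands_in_search_order("2.A.1", "output"):
    print(" ".join(command))
```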

When prompted, enter 0.7 for the CD-Hit threshold if you are about to compare this family to another. Enter 0.9 if you are searching for repeats. CD-Hit will then remove proteins that are at least 70% or 90% identical, respectively, to the representative sequence of their cluster.

We use forgiving thresholds because a very large FASTA list does not cost us much time, as long as we are using TSSearch. When looking for repeats, we don’t want to eliminate too many sequences. This becomes apparent when doing a vertical search with Ancient: a good example of a TMS repeat across two homologs can be masked if the threshold is any lower.
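
To see why a lower threshold discards more sequences, here is a toy sketch of greedy redundancy removal. It is not CD-Hit's actual algorithm: the identity measure is a crude stand-in built on Python's difflib, and the sequences are made up. It only shows how the 0.7 and 0.9 cutoffs behave on the same input.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude stand-in for sequence identity: matched characters / shorter length."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def reduce_redundancy(seqs, threshold):
    """Keep a sequence only if it is below `threshold` identity to every kept one."""
    representatives = []
    # Longest sequences are considered first and become cluster representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        if all(identity(seqs[name], seqs[rep]) < threshold for rep in representatives):
            representatives.append(name)
    return representatives


# Made-up sequences: b is 90% and c is 80% identical to a; d is unrelated.
toy = {
    "a": "MKTLLVAGGA",
    "b": "MKTLLVAGGS",
    "c": "MKTLLVAGQQ",
    "d": "WWWWWWWWWW",
}
print(reduce_redundancy(toy, 0.9))
print(reduce_redundancy(toy, 0.7))
```

With the 0.9 cutoff only the near-duplicate is dropped, while the 0.7 cutoff also drops the more divergent homolog, which is the kind of sequence a repeat search may still need.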

How to run CD-HIT/Maketable5 on multiple FASTA files

If you have several accession numbers from which you want to build a single database and run CD-HIT, you can do this using our BioTools Work Environment.

Semi-automated Genome Analysis (Ujjwal Kumar)
