The first step in most of your projects will be to collect a list of homologs. If you already have an accession number, a sequence, or a GI number, you are ready to start.

Protocol1 takes any protein sequence, GI number, or accession number and performs an NCBI PSI-BLAST search, returning a list of close homologs that have e-values of 0.005 or better. The user is presented with the results and is given the option to perform a second iteration. Once the user is satisfied with the results, Protocol1 removes any redundancies from the GI list and submits it to NCBI’s Protein Entrez tool, where it downloads a Tiny XML format file containing information about each GI number. This file is then run through Maketable5, which removes short and abnormally long sequences, removes similar sequences, and annotates the remaining sequences.
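The e-value cutoff and the redundancy removal described above can be sketched in a few lines of Python. This is only an illustration, not Protocol1's own code: the function name select_unique_hits, the (GI, e-value) tuple layout, and the GI numbers are made up for the example, and it assumes that "0.005 or better" means an e-value less than or equal to the threshold.

```python
def select_unique_hits(hits, threshold=0.005):
    """Keep hits at or below the e-value threshold, dropping repeated IDs.

    `hits` is an iterable of (gi, evalue) pairs. This layout is an
    assumption for the sketch, not Protocol1's internal format.
    """
    seen = set()
    selected = []
    for gi, evalue in hits:
        if evalue > threshold:
            continue  # worse than the cutoff, so not a close homolog
        if gi in seen:
            continue  # same GI already kept from an earlier hit
        seen.add(gi)
        selected.append(gi)
    return selected


# Made-up GI numbers and e-values, for illustration only.
example_hits = [("111", 1e-40), ("222", 0.004), ("111", 1e-12), ("333", 0.2)]
print(select_unique_hits(example_hits))
```

Lowering the threshold argument keeps only hits with smaller e-values, which is the same effect as tightening the cutoff at Protocol1's threshold prompt.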

How to run Protocol1

1.) From your terminal app, type:


2.) As soon as you’ve run the above command, you’ll be prompted for what to investigate:

"Enter the Accession, GI, or Full protein sequence"

Type the GI or accession number. You can also paste an entire sequence. Just make sure there are no FASTA headers and that the sequence is on one line.

3.) Next you’ll be prompted with "Directory Name (session files will be stored here):". Type in the directory name. This directory will be created on your desktop.

4.) At the next prompt, "e Value Threshold (Default: 0.005):", type the desired e-value threshold. The default is usually fine. You can make this number smaller to restrict your results to closer homologs.

5.) Protocol1 will then BLAST NCBI based on the accession number and present you with the following status messages:

>> Blasting NCBI Database...Done.
>> Fetching Results.
>> Waiting for NCBI.........Done.
>> Results Recieved.
++ Found 500 Unique GI Numbers.
++ GI List Written to <directory name>
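Step 2 requires a pasted sequence to have no FASTA header and to sit on a single line. If your sequence comes from a FASTA file, a small helper like the one below can prepare it before you paste. It is a sketch and not part of Protocol1; the function name flatten_sequence and the example record are made up for illustration.

```python
def flatten_sequence(text):
    """Drop FASTA header lines and join the remaining lines into one."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        parts.append(line.replace(" ", ""))  # remove stray spaces
    return "".join(parts)


# Made-up record, for illustration only.
record = ">example protein\nMKTAYIAKQR\nQISFVKSHFS\n"
print(flatten_sequence(record))
```

The printed line can be pasted directly at the sequence prompt.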

You will be given an option to view the BLAST results. If you say yes, an HTML page with BLAST results will open.

View BLAST Results? (Leave blank to skip):

The tool then asks you to key in the maximal number of BLAST iterations.

# Max iteration (leave blank to ignore): 

If your first result pulled very few GI numbers (< 100), it is best not to perform a second iteration. The number you enter here is the maximum number of results to retrieve with your second iteration.

200 is a safe number to use if you decide to perform a second iteration.

>> Waiting for NCBI..........Done.
>> Results Recieved.
++ Found 2 Unique GI Numbers. // Note! You won't always find many new results!
++ GI List Written to <directory name>

You can now view these BLAST results.

View New BLAST Results? (Leave blank to skip):
>> Uploading GI File to Entrez...Done!
>> Downloading TinyXML Sequence...

Finally, Protocol1 will run Maketable5. This program will annotate and run CD-HIT based on your results.

When you are prompted to enter a cutoff range, pick any number from 0.7 to 1.

This corresponds to 70% to 100%. CD-HIT will remove sequences that are at least x% similar to another sequence in the list, so a lower cutoff removes more sequences.
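To get a feel for what the cutoff does, here is a toy redundancy filter in Python. It is not CD-HIT's algorithm and not what Maketable5 runs: it uses difflib.SequenceMatcher from the standard library as a rough stand-in for sequence identity, and the function name remove_similar and the example sequences are made up. It only shows the general idea of keeping one representative and dropping anything at or above the cutoff.

```python
from difflib import SequenceMatcher


def remove_similar(sequences, cutoff=0.9):
    """Greedy filter: keep a sequence only if it is below `cutoff`
    similarity to every sequence already kept.

    Similarity here is difflib's ratio, a rough stand-in for the
    sequence identity that CD-HIT computes.
    """
    kept = []
    # Consider longer sequences first so they become the representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        is_redundant = any(
            SequenceMatcher(None, seq, rep, autojunk=False).ratio() >= cutoff
            for rep in kept
        )
        if not is_redundant:
            kept.append(seq)
    return kept


# Made-up sequences: the first two differ only in the last residue.
example = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGGGGGGG"]
print(remove_similar(example, cutoff=0.9))
print(remove_similar(example, cutoff=1.0))
```

With a cutoff of 0.9 the near-duplicate is dropped, while a cutoff of 1.0 removes only exact duplicates, so all three made-up sequences survive.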


When this is finished, a folder on your desktop should pop up containing a FASTA (.faa) file. This is your generated list of homologs.