README

 

We developed the HangOut procedure to build clean profiles for accurately defined single domains (e.g. SCOP domains). To our surprise, our recent analysis (manuscript submitted) showed that even single domains can have corrupted profiles.

We implemented and tested the HangOut program under Python 2.3 (it should also work with higher versions) to fix those profile corruptions. HangOut reads a FASTA file or a PDB file and builds a profile (much like the output of the PSI-BLAST -m6 option) from the sequence extracted from that file. Besides the HangOut procedure, the program offers two other convenient procedures for building profiles: original PSI-BLAST and RemoveHit.

HangOut was developed to generate clean profiles for a given domain definition, especially for discontinuously defined domains such as those found in SCOP. For example, the SCOP domain d1xzpa1 is defined as the ranges A:118-211 and A:372-450 of chain A, because another domain is inserted between them. The user should therefore give a domain definition as a command line argument; for input without a domain definition, HangOut automatically runs PSI-BLAST instead.

The domain definition can be given in one of two ways, depending on the type of input protein sequence:
1) The input sequence is extracted from a PDB file.
2) The input sequence is an amino acid sequence in a simple FASTA file.

 

Method1: HangOut with PDB file

HangOut can read a PDB file together with a range definition similar to the one used in the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop/).

Shell% hangout [optional parameters] <pdb file> <range definition>

The <range definition> can be like A:118-211,A:372-450.

Here, the first contiguous segment runs from residue number 118 to 211 (PDB numbering) of chain A.
The second contiguous segment runs from residue 372 to 450 of chain A.
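
The range syntax above can be illustrated with a short Python sketch. This is not HangOut's own code: the function name parse_pdb_ranges is hypothetical, the example is written for a current Python interpreter rather than Python 2.3, and it assumes a one-character chain identifier and plain non-negative residue numbers without insertion codes.

```python
import re

# Hypothetical helper, not part of HangOut.
# Assumes segments look like "<chain>:<start>-<end>" and are joined by ",".
_SEGMENT = re.compile(r"^([A-Za-z0-9]):(\d+)-(\d+)$")


def parse_pdb_ranges(definition):
    """Return one (chain, start, end) tuple per contiguous segment."""
    segments = []
    for part in definition.split(","):
        match = _SEGMENT.match(part.strip())
        if match is None:
            raise ValueError("cannot parse segment: %r" % part)
        chain = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3))
        if start > end:
            raise ValueError("start is after end in segment: %r" % part)
        segments.append((chain, start, end))
    return segments


print(parse_pdb_ranges("A:118-211,A:372-450"))
# [('A', 118, 211), ('A', 372, 450)]
```

The two tuples correspond to the two contiguous segments of the d1xzpa1 example.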

Note that to use the full power of the HangOut methodology, it is better to use unmodified PDB files (downloaded from the PDB website) rather than the SCOP domain PDB files downloaded from ASTRAL (the SCOP domain deposition website). If ASTRAL domain PDB files are used, the inserted domain sequences cannot be detected automatically, because ASTRAL has already removed them (which is usually very convenient). In that case you can define the inserted domain yourself (see the -n option below).

 

Method2: HangOut with Input FASTA amino acid sequence file

To give a domain definition for a sequence in a FASTA file, use the following command line.

Shell% hangout [optional parameters] <fasta file> [<range def>]

<range def> is written as <start position>-<end position>, where the start and end positions are integers giving positions of amino acids in the sequence. Multiple ranges are concatenated with “,”.

For example, suppose we have a FASTA file, test.fa, containing an amino acid sequence of 10 residues.

################ test.fa #################

>test domain

MDLTSAEAVR

########################################

If this test domain had an inserted domain between the two residues “TS” (and the insertion has already been removed, so it does not appear in test.fa),
then the range should be written explicitly as 1-4,5-10.

The hangout command for this case would be:
Shell% hangout test.fa 1-4,5-10

If you want to use the whole sequence of the test domain, set the range with the keyword “all” or give the range “1-10”. Note that when the range is defined this way, the HangOut procedure cannot be used; RemoveHit or PSI-BLAST should be used instead.
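
As an illustration of the FASTA range syntax, the following Python sketch turns a range definition into the sequence segments it selects. It is not taken from HangOut: the function names parse_fasta_ranges and extract_segments are hypothetical, positions are assumed to be 1-based and inclusive as in the test.fa example, and the code targets a current Python interpreter rather than Python 2.3.

```python
# Hypothetical helpers, not part of HangOut.
def parse_fasta_ranges(range_def, sequence_length):
    """Turn '1-4,5-10' or 'all' into 1-based inclusive (start, end) pairs."""
    if range_def == "all":
        return [(1, sequence_length)]
    ranges = []
    for part in range_def.split(","):
        start_text, end_text = part.split("-")
        start, end = int(start_text), int(end_text)
        if not 1 <= start <= end <= sequence_length:
            raise ValueError("range %s is outside 1-%d" % (part, sequence_length))
        ranges.append((start, end))
    return ranges


def extract_segments(sequence, ranges):
    """Slice the sequence once per range (converting to 0-based slicing)."""
    return [sequence[start - 1:end] for start, end in ranges]


sequence = "MDLTSAEAVR"  # the sequence from test.fa
ranges = parse_fasta_ranges("1-4,5-10", len(sequence))
print(extract_segments(sequence, ranges))
# ['MDLT', 'SAEAVR']
```

The split between the two segments falls between “T” and “S”, which is where the removed insertion sat in the example.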

 

Optional parameters

Selecting profile building methodology
-m <method_name>

<method_name> should be one of the following choices;
hangout (default)
removehit
psiblast

Instead of the default HangOut procedure, you may choose RemoveHit or PSI-BLAST as the profile building method. RemoveHit is expected to produce less clean profiles than HangOut (though slightly better than PSI-BLAST), but it does not require a domain definition.

 

Setting a new query domain name.
-t <new_name>

The new_name is used for the output filename and as the query domain name in the profiles, instead of the name found in the original query FASTA file or PDB file.

 

Setting neighboring domain ranges.
-n <neighbor_def>

Usage:
-n auto (default)
-n "1c0p.pdb A:1128-1208"
-n "1c0p.fa 1128-1208"

The HangOut method uses inserted domains to build clean, corruption-free profiles. In addition, users are free to add N-terminal or C-terminal neighboring domains, or one or more other domains around the input domain definition, for cleaner results.

Note that HangOut expects the file given with the -n option to be in the same format as the main input file; i.e. if the main input is a FASTA file (Method2), then the -n option must also be given a FASTA file, and if the main input is a PDB file (Method1), then -n must be given a PDB file.

 

If no domain definition is given (or the whole domain sequence is continuous),
HangOut == PSI-BLAST.

 

Currently, if no range definitions or insertion positions are given, or if the given range is continuous, HangOut runs as normal PSI-BLAST, since it cannot check the profile without a clear domain boundary. We plan to lift the strict requirement for a domain definition and make HangOut a solution for the general multidomain problem.
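
The fallback rule described above can be summarized in a small Python sketch. This is only an illustration under stated assumptions, not HangOut's actual logic: the function name effective_method is hypothetical, a range definition is treated as discontinuous whenever it has more than one comma-separated segment, and neighbor domains given with the -n option are ignored.

```python
# Hypothetical helper, not part of HangOut.
def effective_method(requested_method, range_def=None):
    """Return the method name that would run under the fallback rule."""
    if requested_method != "hangout":
        # removehit and psiblast do not depend on the domain definition
        return requested_method
    if range_def is None or range_def == "all":
        # no definition, or the whole sequence: nothing to check against
        return "psiblast"
    # Assumption: more than one segment signals a split (discontinuous) domain.
    segment_count = len(range_def.split(","))
    if segment_count > 1:
        return "hangout"
    return "psiblast"


print(effective_method("hangout", "1-4,5-10"))
print(effective_method("hangout", "1-10"))
```

Under these assumptions the first call keeps the HangOut procedure, while the second falls back to PSI-BLAST because the range is a single continuous segment.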

 

Note for users who want to run RemoveHit:

RemoveHit runs in a similar manner to HangOut, but it does not require a domain definition (so the -n option does not affect its results). However, RemoveHit is not recommended for building profiles, since it performed worse than HangOut.