Profile-based sequence similarity search is an essential step in structure-function studies of proteins. However, inclusion of non-homologous sequence segments into a profile causes its corruption and results in false positives. Profile corruption is frequent in multidomain proteins, and single domains with long insertions are a significant source of errors. We developed a procedure (HangOut) that, for a given single domain, effectively removes non-homologous sequence segments from erroneously extended PSI-BLAST alignments and generates cleaner profiles.
HangOut is implemented in Python 2.3 and runs from the command line on all Unix-compatible platforms.
The source code is available under the GNU GPL license.
The source code is available here.
The test set is available here.
A short tutorial on how to use HangOut is available here.
Installation:
Download the hangout script and its two libraries (blast_lib.py and pdb_lib.py), packaged as hangout.zip.
Keep all three files in the same directory.
Note that HangOut requires the standalone PSI-BLAST program (blastpgp) and the NR database from NCBI.
Make sure to download the PSI-BLAST "Legacy executables" rather than the BLAST+ executables, because HangOut was developed and tested with the legacy software.
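
The installation requirements above can be verified with a small pre-flight script. The following is a minimal sketch and is not part of HangOut. It is written for Python 3 and run separately from HangOut itself. It assumes that the main script is named "hangout" exactly as written above (adjust the name if your copy has an extension) and that blastpgp is expected on the PATH. The function names are hypothetical. It does not check for the NR database, whose location depends on the local BLAST setup.

```python
import os
import shutil

# Files the installation notes say must sit in one directory.
# ASSUMPTION: the main script is named "hangout"; rename if yours differs.
REQUIRED_FILES = ("hangout", "blast_lib.py", "pdb_lib.py")


def missing_files(install_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from install_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(install_dir, name))]


def find_blastpgp(path=None):
    """Return the full path of the legacy blastpgp executable, or None.

    path is a PATH-style search string; None means use the environment.
    """
    return shutil.which("blastpgp", path=path)


def check_install(install_dir, path=None):
    """Collect readable problems; an empty list means both checks passed."""
    problems = ["missing file: " + name for name in missing_files(install_dir)]
    if find_blastpgp(path) is None:
        problems.append(
            "blastpgp not found (the legacy BLAST executables are required)")
    return problems


if __name__ == "__main__":
    # Check the current directory and report each problem on its own line.
    for line in check_install(os.getcwd()) or ["all checks passed"]:
        print(line)
```

Run the script from the directory that holds the unpacked files. Any line beginning with "missing file" names a file that still has to be copied there.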
Reference:
B.H. Kim, Q. Cong, N.V. Grishin (2010) "HangOut: generating clean PSI-BLAST profiles for domains containing long insertions".
(submitted to Bioinformatics)
For questions and suggestions: kim@chop.swmed.edu or grishin@chop.swmed.edu