promals with 3D information

This program is free for academic use.

1. Install promals

Operating system: Linux; Unix; Mac OS X
System requirements: g++; gcc; g77; python; awk

The whole package takes about 8GB. Most of the disk space is used by the
sequence and structure databases. Please check that your computer has
enough free disk space before installation.

Installation instructions:

1.1 Download 'promals_package.tar.gz' to your computer. Extract the files
    using gunzip and tar.

        gunzip promals_package.tar.gz
        tar xf promals_package.tar

1.2 Go to the directory 'promals_package':

        cd promals_package

1.3 Run the following command:

        python install.py

    (pay attention to error or warning messages)

    This will install promals and the other programs promals uses,
    including:
    - blastpgp (for running PSI-BLAST)
    - cd-hit (for initial clustering of sequences)
    - mafft (for alignment of closely related sequences)
    - psipred (for secondary structure prediction)
    - TMalign (for structural alignment)

    You need to install the following programs yourself:
    - 'DaliLite' (for structural alignment) is not installed by this
      package because of its license. To use it, install it yourself and
      put the 'DaliLite' executable in the 'bin/' directory inside this
      package. DaliLite is available at:
      http://ekhidna.biocenter.helsinki.fi/dali_lite/downloads
    - 'fast' (for structural alignment) is not installed by this package.
      To use it, install it yourself and put the 'fast' executable in the
      'bin/' directory inside this package. 'fast' is available at:
      http://bu.wenglab.org/FAST/download.htm
      Make sure you have the 'fast' executable for the correct platform.

1.4 Check that the python script file 'promals' has been generated in the
    'bin/' directory, which should also include the executables of
    blastpgp, cd-hit, fast, TMalign, mafft, runpsipred1, makemat,
    al2co_consensus and promals_c.

2. Run promals

Use the 'promals' script (a python script) in the 'bin/' directory.
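Before the first run, the check described in step 1.4 can be automated.
The following is a minimal sketch, not part of the promals package. It
assumes that the files in 'bin/' are named exactly as listed in step 1.4
(the actual file names on your system may differ), and the function name
missing_programs is made up for this illustration.

```python
import os
import sys

# Names taken from step 1.4 of this README. That the files in bin/ carry
# exactly these names is an assumption of this sketch.
REQUIRED = ["promals", "blastpgp", "cd-hit", "TMalign", "mafft",
            "runpsipred1", "makemat", "al2co_consensus", "promals_c"]

# Programs that install.py does not provide (see step 1.3).
OPTIONAL = ["fast", "DaliLite"]


def missing_programs(bin_dir, names):
    """Return the names that do not exist as regular files in bin_dir."""
    return [n for n in names
            if not os.path.isfile(os.path.join(bin_dir, n))]


if __name__ == "__main__":
    # The bin/ directory can be given as the first argument.
    bin_dir = sys.argv[1] if len(sys.argv) > 1 else "bin"

    for name in missing_programs(bin_dir, REQUIRED):
        print("missing: %s" % name)

    # 'fast' is needed for -fast 1 and 'DaliLite' for -dali 1.
    for name in missing_programs(bin_dir, OPTIONAL):
        print("not installed (optional): %s" % name)
```

Run it from inside 'promals_package'. If 'fast' is reported as not
installed, either install it or run promals with '-fast 0'.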
Command:

    promals input_file [options] > input_file.log

or

    python promals input_file [options] > input_file.log

Example:

    promals yfp.fa > yfp.fa.log
    promals yfp.fa -id_thr 0.6 -blast_dir yfp.fa_blast -outfile yfp.promals.aln > yfp.fa.log

Input:

    input_file needs to be in fasta format.

Output:

    Two alignment files will be generated. One is a clustal format
    alignment (the file name can be specified with the option -outfile).
    The other is an html file of the colored alignment.

Options:

Any option name (starting with '-') and its value are separated by white
space.

For alignment strategies:

  -id_thr [0, 1]
      Identity threshold that determines the partition between the fast
      and slow alignment processes. If two groups of sequences have an
      average identity above this threshold, they are aligned in a fast
      way. Otherwise, a slower but more accurate way is used (profile-
      profile alignment with predicted secondary structures and available
      3D constraints).
      Default: 0.6 (corresponding to 60% identity)

For using 3D information:

  -dali [0 or 1]
      Use DaliLite structural alignment (1) or not (0). The 'DaliLite'
      executable needs to be present in the 'bin/' directory.
      Default: 0 (it is relatively slow to run DaliLite)

  -fast [0 or 1]
      Use fast structural alignment (1) or not (0). The 'fast' executable
      needs to be present in the 'bin/' directory.
      Default: 1

  -tmalign [0 or 1]
      Use TMalign structural alignment (1) or not (0). The 'TMalign'
      executable needs to be present in the 'bin/' directory.
      Default: 1

  -struct_weight [0, inf[
      Weight of structural constraints relative to sequence constraints.
      Default: 1.5

For profile scoring:

  -ss_weight [0, inf[
      Weight of predicted secondary structure in profile-profile scoring.
      Default: 0.2

  -score_weight [0, inf[
      Weight of amino acids in profile-profile scoring.
      Default: 0.8

For running PSI-BLAST to get sequence profile:

  -iter_number
      Number of PSI-BLAST iterations for profile generation.
      Default: 3

  -evalue [0, inf[
      PSI-BLAST evalue cutoff for inclusion.
      Default: 0.001

  -low_id_thr [0, 1]
      Remove PSI-BLAST hits with identity to the query less than this
      value.
      Default: 0.2

  -blast_dir
      Directory in which PSI-BLAST is run and other intermediate results
      are stored.

  -clean_blast_before [0 or 1]
      Remove any file in the directory that stores intermediate results
      (specified by the -blast_dir option) before running PSI-BLAST.
      Default: 0

  -clean_blast_after [0 or 1]
      Remove any file in the PSI-BLAST directory after running PSI-BLAST.
      Default: 0

For output:

  -outfile
      The name of the output alignment file.

  -blocksize
      Number of letters in clustal-format alignment blocks.
      Default: 70

  -resnum [0 or 1]
      In the colored html alignment, show residue numbers for alignment
      blocks.
      Default: 1

  -caa_freq [0, 1]
      In the colored html alignment, show the amino acid consensus symbol
      if the fraction of a class of residues is higher than this
      threshold.
      Default: 0.8

3. Contact

Jimin Pei (jpei [AT] chop [DOT] swmed [DOT] edu)
Please report bugs encountered during installation or when running
promals.

4. References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ:
Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res 1997, 25:3389-3402.

Holm L, Sander C: Mapping the protein universe. Science 1996, 273:595-603.

Jones DT: Protein secondary structure prediction based on position-specific
scoring matrices. J Mol Biol 1999, 292:195-202.

Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy
of multiple sequence alignment. Nucleic Acids Res 2005, 33:511-518.

Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large
sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.

Pei J, Grishin NV: PROMALS: towards accurate multiple sequence alignments
of distantly related proteins. Bioinformatics 2007, 23:802-808.

Pei J, Kim BH, Grishin NV: PROMALS3D: a tool for multiple protein sequence
and structure alignments. Nucleic Acids Res 2008, 36:2295-2300.

Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm
based on the TM-score. Nucleic Acids Res 2005, 33:2302-2309.

Zhu J, Weng Z: FAST: a novel protein structure alignment algorithm.
Proteins 2005, 58:618-627.