PCMA - Profile Consistency Multiple sequence Alignment


******************************************************************************

Please send bug reports, comments etc. to one of:
	jpei@mednet.swmed.edu
	grishin@chop.swmed.edu

******************************************************************************

                  POLICY ON DISTRIBUTION OF PCMA

PCMA adapts code from Clustal W, version 1.81.  Clustal W, developed by
Thompson JD, Higgins DG and Gibson TJ, is freely available to the user
community.  Commercial distributors of Clustal W must take out a
NON-EXCLUSIVE LICENCE from the authors of Clustal W (gibson@embl-heidelberg.de,
thompson@embl-heidelberg.de or d.higgins@ucc.ie).  In accordance with this
policy, PCMA is free for non-commercial use.  Commercial use is not permitted.

******************************************************************************

Version: 2.0

What is new in version 2.0:

     - pcma can now build an alignment by combining several input alignments.
     - N-terminal gaps now look nicer in the output alignment.
     - pcma is now faster, especially for large numbers of sequences, owing
       to changes in library generation and local profile-profile alignment.
       Below is a performance comparison on 49 large SMART database
       alignments, each containing between 100 and 200 sequences.  The PCMA
       ave_grp_id threshold was set to 50.  Alignment evaluation routines are
       available at: ftp://iole.swmed.edu/pub/PCMA/evalscore.

                            PCMA-v2.0 PCMA-v1.0  T-Coffee  ClustalW
     Sum-of-pairs accuracy    0.870     0.870     0.841     0.780
     Column-score accuracy    0.263     0.258     0.246     0.210
     Average CPU time (s)       732      1334     16284        28
     Median CPU time (s)        311       565     15386        15
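
The CPU times in the table can be turned into relative running times with a
few lines of Python.  This is only an illustrative sketch: the numbers are
copied from the table above, and the helper name time_ratio is hypothetical
(it is not part of PCMA).  Version 1.0 comes out at roughly 1.8 times the
running time of version 2.0 for both the average and the median.

```python
# CPU times in seconds, copied from the comparison table above.
cpu_seconds = {
    "PCMA-v2.0": {"average": 732, "median": 311},
    "PCMA-v1.0": {"average": 1334, "median": 565},
    "T-Coffee": {"average": 16284, "median": 15386},
    "ClustalW": {"average": 28, "median": 15},
}


def time_ratio(program, baseline="PCMA-v2.0", stat="average"):
    """Return how many times longer `program` ran than `baseline`."""
    return cpu_seconds[program][stat] / cpu_seconds[baseline][stat]


for name in ("PCMA-v1.0", "T-Coffee", "ClustalW"):
    print(f"{name}: {time_ratio(name):.2f}x average, "
          f"{time_ratio(name, stat='median'):.2f}x median")
```

A ratio above 1 means the program was slower than PCMA version 2.0 on this
benchmark; a ratio below 1 (ClustalW) means it was faster.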


******************************************************************************

				PCMA help

PCMA - Profile Consistency Multiple sequence Alignment

1.A quick start
  To align a sequence set in fasta format, use the following command:
      pcma <target_sequences>
  Two output files will be generated:
     <target_sequences>.aln - A multiple sequence alignment in clustal format
     <target_sequences>.dnd - A dendrogram in phylip format 

2.Usage:      pcma <target_sequences> <options>   
   The first command-line argument <target_sequences> should be the name of the
   file containing the sequences in FASTA format.  IMPORTANT: the sequences
   must not contain gap characters; otherwise the results may be incorrect.
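
If the input sequences were extracted from an existing alignment, the gap
characters can be removed before running pcma.  The Python sketch below is
one possible way to do this; it is not part of PCMA.  It assumes that "-"
and "." are the gap characters in use and that header lines start with ">".
The function name strip_gaps is hypothetical.

```python
GAP_CHARS = "-."  # assumption: the characters treated as gaps


def strip_gaps(fasta_text, gap_chars=GAP_CHARS):
    """Return FASTA text with gap characters removed from sequence lines."""
    table = str.maketrans("", "", gap_chars)
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)  # header lines are kept unchanged
        else:
            stripped = line.translate(table)
            if stripped:  # skip lines that consisted only of gaps
                cleaned.append(stripped)
    return "\n".join(cleaned) + "\n"


example = ">seq1\nMK-TA..YI\n>seq2\nM--TAQYI\n"
print(strip_gaps(example), end="")
```

Header lines are left untouched so that sequence names containing "-" or "."
are not altered.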
   
   Options take the form -optionName or -optionName=option.
   NOTE: there must be no spaces between "optionName", "=" and "option".
   Although PCMA supports many of the original ClustalW options, changing
   most of them from their default values is not recommended.
   
   An example:
           pcma yfp.fa -ave_grp_id=50 -outfile=yfp.pcma50.aln

3.Commonly used options
  -ave_grp_id=     PERCENTAGE sequence identity threshold.  Neighboring groups
                   with identity above the threshold are aligned by ClustalW;
                   those below it are aligned using the profile consistency
                   measure.  If the number of sequences is very large, lowering
                   the threshold below the default value is recommended.
                      Range [0..100]
                      Default: -ave_grp_id=40

  -outfile=        Name of the output alignment file.
                     If this option is not used, the output alignment is
                     written in clustal format to a file with the .aln suffix.
                  
  -output=         The output alignment format.
                     Default: -output=clustal
                     Other formats include gcg, phylip and pir.
 
  -help or -options   Help and options.
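
When pcma is called from a script, the options above can be assembled into a
command line programmatically.  The Python sketch below is a hypothetical
helper (build_pcma_command is not part of PCMA) that uses only the three
options documented in this section, writes each one in the
-optionName=option form without spaces, and checks the documented
[0..100] range of -ave_grp_id.

```python
def build_pcma_command(input_file, ave_grp_id=None, outfile=None, output=None):
    """Build an argument list for pcma using the -optionName=option form."""
    command = ["pcma", input_file]
    if ave_grp_id is not None:
        if not 0 <= ave_grp_id <= 100:
            raise ValueError("ave_grp_id must be in the range [0..100]")
        command.append(f"-ave_grp_id={ave_grp_id}")
    if outfile is not None:
        command.append(f"-outfile={outfile}")
    if output is not None:
        command.append(f"-output={output}")
    return command


cmd = build_pcma_command("yfp.fa", ave_grp_id=50, outfile="yfp.pcma50.aln")
print(" ".join(cmd))
# To run the command (assumes the pcma executable is on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids
accidentally introducing spaces inside an option.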
 
4.Newly added function for PCMA
   PCMA now supports aligning several alignments to one another.  In this case,
   the first command-line parameter should be a file listing the file names of
   the alignments to be combined.  To distinguish this file from a FASTA format
   sequence file, its first line should start with the character "@".  Each of
   the remaining lines contains the file name of one alignment.  Here, input
   alignments CAN contain gap characters (they usually do).  No two sequences
   across these alignments may share the same name.
   For example, below is the content of file "alnlist": 
   
        @
        alignment1.aln
        alignment2.aln
        alignment3.aln
   
   Running "pcma alnlist" will then generate a new alignment named "alnlist.aln".
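
The list file and the unique-name requirement can both be handled with a
short script.  The Python sketch below is illustrative only and not part of
PCMA: alignment_list_text produces the "@" line followed by one file name
per line, and duplicated_names reports sequence names that occur in more
than one alignment.  Both function names are hypothetical, and the caller is
assumed to have already collected the set of sequence names in each
alignment file.

```python
from collections import Counter


def alignment_list_text(alignment_files):
    """Return list-file content: an '@' line, then one file name per line."""
    return "@\n" + "".join(name + "\n" for name in alignment_files)


def duplicated_names(names_by_file):
    """Return sequence names found in more than one alignment.

    `names_by_file` maps each alignment file name to the set of
    sequence names it contains.
    """
    counts = Counter(
        name for names in names_by_file.values() for name in set(names)
    )
    return sorted(name for name, count in counts.items() if count > 1)


files = ["alignment1.aln", "alignment2.aln", "alignment3.aln"]
print(alignment_list_text(files), end="")

# Hypothetical sequence names, used only to demonstrate the check.
print(duplicated_names({
    "alignment1.aln": {"seqA", "seqB"},
    "alignment2.aln": {"seqB", "seqC"},
}))
```

The text returned by alignment_list_text can be saved to a file such as
"alnlist" and passed to pcma as the first argument.  Any name reported by
duplicated_names should be made unique before the alignments are combined.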


******************************************************************************

				References

Pei, J., Sadreyev, R. and Grishin, N.V. (2003) PCMA: a program for fast
and accurate multiple sequence alignment.  Bioinformatics, 19(3):427-428.

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice.  Nucleic
Acids Research, 22:4673-4680.

Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method
for fast and accurate multiple sequence alignment.  Journal of Molecular
Biology, 302:205-217.