PCMA - Profile Consistency Multiple sequence Alignment

******************************************************************************

Please send bug reports, comments etc. to one of:

        jpei@mednet.swmed.edu
        grishin@chop.swmed.edu

******************************************************************************

                POLICY ON DISTRIBUTION OF PCMA

PCMA has adapted code from Clustal W, version 1.81. Clustal W, developed by
Thompson JD, Higgins DG and Gibson TJ, is freely available to the user
community. Commercial distributors of Clustal W must take out a NON-EXCLUSIVE
LICENCE from the authors of Clustal W (gibson@embl-heidelberg.de,
thompson@embl-heidelberg.de or d.higgins@ucc.ie).

According to this policy, PCMA is free for non-commercial use. Commercial use
is not permitted.

******************************************************************************

Version: 2.0

What is new in version 2.0:

- pcma can now make an alignment by combining a number of input alignments.
- The output alignment looks nicer for N-terminal gaps.
- pcma is now faster due to modifications of library generation and local
  profile-profile alignment, especially for large numbers of sequences.

Below is a comparison of performance on 49 large SMART database alignments,
each containing between 100 and 200 sequences. The PCMA ave_grp_id threshold
was set to 50. Alignment evaluation routines are available at:
ftp://iole.swmed.edu/pub/PCMA/evalscore.

                          PCMA-v2.0   PCMA-v1.0   T-Coffee   ClustalW
Sum-of-pairs accuracy     0.870       0.870       0.841      0.780
Column-score accuracy     0.263       0.258       0.246      0.210
Average CPU time (s)      732         1334        16284      28
Median CPU time (s)       311         565         15386      15

******************************************************************************

PCMA help

PCMA - Profile Consistency Multiple sequence Alignment

1. A quick start

To align a set of sequences in FASTA format, use the following command:

        pcma <input_file>

Two output files will be generated:

        .aln - A multiple sequence alignment in clustal format
        .dnd - A dendrogram in phylip format

2. Usage:

        pcma <input_file> [options]

The first command-line argument should be the name of the file containing
the sequences in FASTA format. IMPORTANT: the sequences should not contain
gap characters, otherwise the results might be incorrect.

Options are in the format -optionName or -optionName=option. NOTE that
there should be no space(s) between "optionName", "=" and "option". Although
many of the original ClustalW options are supported in PCMA, changing most of
them from their default values is not recommended.

An example:

        pcma yfp.fa -ave_grp_id=50 -outfile=yfp.pcma50.aln

3. Commonly used options

-ave_grp_id=
        Threshold of PERCENTAGE sequence identity. Neighboring groups above
        the threshold are aligned by ClustalW; neighboring groups below it
        are subject to the profile consistency measure. If the number of
        sequences is very large, lowering the threshold below the default
        value is recommended.
        Range: [0..100]
        Default: -ave_grp_id=40

-outfile=
        Name of the output alignment. If this option is not used, the
        output alignment will be in clustal format with the .aln suffix.

-output=
        The output alignment format.
        Default: -output=clustal
        Other formats include gcg, phylip and pir.

-help or -options
        Help and options.

4. Newly added function for PCMA

PCMA now supports the alignment of several alignments.
In this case, the first command-line parameter should be a file containing
a list of the file names of the alignments to be aligned together. To
distinguish this format from a FASTA format sequence file, the first line
should start with the character "@". Each of the other lines contains the
file name of one alignment. Here, input alignments CAN have gap characters
(usually they do). No two sequences across these alignments may share the
same name.

For example, below is the content of file "alnlist":

        @
        alignment1.aln
        alignment2.aln
        alignment3.aln

PCMA (command: pcma alnlist) will generate a new alignment named
"alnlist.aln".

******************************************************************************

References

Pei, J., Sadreyev, R. and Grishin, N.V. (2003) PCMA: a program for fast and
accurate multiple sequence alignment. Bioinformatics, 19(3):427-428.

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving
the sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Research, 22:4673-4680.

Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: A novel method
for fast and accurate multiple sequence alignment. J. Mol. Biol.,
302:205-217.
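
******************************************************************************

Appendix: checking input files (illustrative sketch)

The two input requirements stated in the help text (no gap characters in a
FASTA sequence file, and a list file whose first line starts with "@") can
be checked or prepared before running pcma. The Python helpers below are a
minimal sketch and are not part of PCMA: the function names gapped_records
and alignment_list_text are hypothetical, and treating "-" and "." as the
gap characters is an assumption, since the help text does not list them.

```python
def gapped_records(fasta_text, gap_chars="-."):
    """Return names of FASTA records whose sequence lines contain a gap.

    gap_chars is an assumption: the PCMA help text does not say which
    characters count as gaps.
    """
    bad, name = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Record name: first whitespace-delimited token after ">".
            fields = line[1:].split()
            name = fields[0] if fields else ""
        elif name is not None and any(c in gap_chars for c in line):
            if name not in bad:
                bad.append(name)
    return bad


def alignment_list_text(alignment_files):
    """Build list-file text: "@" on the first line, one file name per line."""
    return "@\n" + "".join(f"{name}\n" for name in alignment_files)
```

Writing the result of alignment_list_text(["alignment1.aln",
"alignment2.aln", "alignment3.aln"]) to a file named "alnlist" gives the
list file from the example in section 4, which can then be passed to pcma
as "pcma alnlist". The sketch does not check that sequence names are unique
across the listed alignments; that still has to be verified separately.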