Options for alignment parameters

Weight for amino acid scores and Weight for predicted secondary structure scores

These two weights determine the relative contributions of amino acid similarity and predicted secondary structure similarity in the profile-profile hidden Markov model used in PROMALS. Each weight can be set to any positive value. A larger weight gives the corresponding score a larger contribution to the total alignment score. The default and recommended values are 0.8 for amino acid scores and 0.2 for secondary structure scores. These values were optimized on SCOP domain pairs with less than 20% sequence identity.
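As a rough illustration of how two such weights trade off, the sketch below combines an amino acid score and a secondary structure score as a simple weighted sum. This is an assumption made for illustration only: the exact way PROMALS applies the weights inside its hidden Markov model is not described here, and the function names `combined_score` and `relative_contribution` are hypothetical.

```python
def combined_score(aa_score, ss_score, w_aa=0.8, w_ss=0.2):
    """Weighted sum of two per-position scores (illustrative assumption,
    not the actual PROMALS scoring function)."""
    if w_aa <= 0 or w_ss <= 0:
        # The documentation states that the weights must be positive.
        raise ValueError("weights must be positive")
    return w_aa * aa_score + w_ss * ss_score


def relative_contribution(w_aa=0.8, w_ss=0.2):
    """Return the share of each weight in the total, as fractions summing to 1."""
    total = w_aa + w_ss
    return w_aa / total, w_ss / total


if __name__ == "__main__":
    # Only the ratio of the weights matters for the relative contribution.
    print(relative_contribution(0.8, 0.2))
    print(relative_contribution(4.0, 1.0))
```

Under this assumption, scaling both weights by the same factor leaves their relative contribution unchanged, so 0.8 and 0.2 give the same balance as 4 and 1.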

Identity threshold

The "Identity threshold" parameter is the sequence identity threshold that marks the boundary between the fast, less accurate alignment stage and the slow, more accurate alignment stage.

To properly balance alignment speed and accuracy, we have applied a two-stage alignment strategy similar to the one used in our program PCMA (web server). In the first stage, highly similar sequences are progressively aligned in a fast way without consistency scoring. The scoring function in this stage is a weighted sum-of-pairs measure of BLOSUM62 scores. If two groups neighboring on a tree have an average sequence identity higher than a certain threshold (default is 0.6), they are aligned in this fast way. The result of the first stage is a set of pre-aligned groups that are relatively divergent from each other. One representative sequence is selected from each pre-aligned group. In the second alignment stage, these representative sequences are subjected to the more time-consuming probabilistic consistency measure and are aligned progressively according to the consistency scoring function. Finally, the pre-aligned groups obtained in the first stage are merged according to the alignment of the representatives to obtain the alignment of all sequences.
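The first-stage decision described above can be sketched as follows. This is a minimal illustration, not the PROMALS or PCMA implementation: it assumes that the sequences of both groups are already placed in a common set of alignment columns (equal-length strings with "-" for gaps), that identity is counted over columns where both sequences have a residue, and that the average is taken over all cross-group pairs. The function names are hypothetical.

```python
from itertools import product


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must share the same alignment columns")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def average_group_identity(group_a, group_b):
    """Mean pairwise identity over all pairs with one sequence from each group."""
    scores = [pairwise_identity(a, b) for a, b in product(group_a, group_b)]
    return sum(scores) / len(scores)


def choose_stage(group_a, group_b, threshold=0.6):
    """Return "fast" if the groups are similar enough for the first stage,
    otherwise "consistency" (second stage)."""
    if average_group_identity(group_a, group_b) > threshold:
        return "fast"
    return "consistency"


if __name__ == "__main__":
    print(choose_stage(["ACDEF"], ["ACDEY"]))  # 4 of 5 columns identical
    print(choose_stage(["ACDEF"], ["AWWWW"]))  # 1 of 5 columns identical
```

With the default threshold of 0.6, the first pair of groups (identity 0.8) would be merged in the fast stage, while the second pair (identity 0.2) would be left for consistency-based alignment of representatives.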

If "Identity threshold" is equal to or larger than 1, all sequences are subjected to the consistency measure and the alignment process is the most time-consuming. If it is set to 0, all sequences are aligned in the fast way; in this case the alignment quality for divergent sequences is expected to be low, since the consistency-based scoring function is not used. The default value is 0.6 because, for sequences with identity above 60%, the fast stage still produces good-quality alignments, while the alignment proceeds about 6 times faster than when all sequences are aligned using the consistency measure.
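The effect of the extreme settings can be summarized in a small sketch. The function name `alignment_mode` and its return labels are hypothetical, and the handling of intermediate values assumes the "higher than the threshold" rule stated for the first stage.

```python
def alignment_mode(avg_identity, threshold=0.6):
    """Return which alignment procedure applies to two neighboring groups.

    avg_identity: average sequence identity between the groups, in [0, 1].
    threshold:    the "Identity threshold" parameter.
    """
    if threshold >= 1:
        # Everything goes through the slow, consistency-based procedure.
        return "consistency"
    if threshold <= 0:
        # Everything is aligned in the fast way, without consistency scoring.
        return "fast"
    # Intermediate settings: fast only when identity is higher than the threshold.
    return "fast" if avg_identity > threshold else "consistency"


if __name__ == "__main__":
    for t in (0, 0.6, 1):
        print(t, [alignment_mode(i, t) for i in (0.3, 0.7, 0.95)])
```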

Reference: Pei J, Sadreyev R, Grishin NV: PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 2003, 19(3):427-428