Identity threshold

The "Identity threshold" parameter is the sequence identity cutoff that separates the fast, less accurate alignment stage from the slow, more accurate alignment stage.

To balance alignment speed and accuracy, we apply a two-stage alignment strategy similar to the one used in our program PCMA (PCMA server). In the first stage, highly similar sequences are aligned progressively in a fast way, without consistency scoring. The scoring function in this stage is a weighted sum-of-pairs measure of BLOSUM62 scores. Two groups that are neighbors on the tree are aligned in this fast way if their average sequence identity is higher than a certain threshold (default is 0.6). The result of the first stage is a set of pre-aligned groups that are relatively divergent from each other. One representative sequence is selected from each pre-aligned group. In the second stage, these representative sequences are subjected to the more time-consuming probabilistic consistency measure and are aligned progressively according to the consistency scoring function. Finally, the pre-aligned groups obtained in the first stage are merged according to the alignment of their representatives to obtain the alignment of all sequences.

If "Identity threshold" is set to 1, no group is aligned in the fast way, so all sequences are subject to the consistency measure and the alignment process is the most time-consuming. If it is set to 0, all sequences are aligned in the fast way; in this case the alignment quality for divergent sequences is expected to be low, since the consistency-based scoring function is not used. The default value is 0.6 because, for sequences with identity above 60%, the fast stage can still produce good-quality alignments, while alignment proceeds about 6 times faster than when all sequences are aligned using the consistency measure.
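
The way the threshold splits sequences between the two stages can be illustrated with a minimal Python sketch. It is not the program's actual implementation. The function names, the nested-tuple guide tree, the identity table, and the choice to average identity over all cross-group sequence pairs are assumptions made for illustration. The sketch only decides which sequences end up in the same pre-aligned group; it performs no alignment.

```python
from itertools import product


def average_identity(group_a, group_b, identity):
    """Mean pairwise identity between two groups (assumed definition)."""
    pairs = list(product(group_a, group_b))
    return sum(identity[frozenset(pair)] for pair in pairs) / len(pairs)


def _collect(tree, identity, threshold):
    """Post-order walk; returns (groups, fully_merged) for a subtree."""
    if isinstance(tree, str):  # leaf: a single sequence name
        return [[tree]], True
    left, right = tree
    left_groups, left_merged = _collect(left, identity, threshold)
    right_groups, right_merged = _collect(right, identity, threshold)
    # Two neighboring groups are joined in the fast stage only if each
    # subtree is already one group and their identity exceeds the threshold.
    if left_merged and right_merged:
        a, b = left_groups[0], right_groups[0]
        if average_identity(a, b, identity) > threshold:
            return [a + b], True
    return left_groups + right_groups, False


def first_stage_groups(tree, identity, threshold=0.6):
    """Hypothetical helper: pre-aligned groups left after the fast stage."""
    groups, _ = _collect(tree, identity, threshold)
    return groups


# Toy input: A and B are close, C and D are more distant (made-up values).
tree = (("A", "B"), ("C", "D"))
identity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("C", "D")): 0.5,
    frozenset(("A", "C")): 0.3,
    frozenset(("A", "D")): 0.3,
    frozenset(("B", "C")): 0.3,
    frozenset(("B", "D")): 0.3,
}

for threshold in (0.0, 0.6, 1.0):
    print(threshold, first_stage_groups(tree, identity, threshold))
# 0.0 [['A', 'B', 'C', 'D']]
# 0.6 [['A', 'B'], ['C'], ['D']]
# 1.0 [['A'], ['B'], ['C'], ['D']]
```

In this toy case, a threshold of 0 puts every sequence into one fast-aligned group, the default of 0.6 leaves three groups whose representatives would go to the consistency stage, and a threshold of 1 sends every sequence to the consistency stage.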

Reference: Pei J, Sadreyev R, Grishin NV: PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 2003, 19(3):427-428.