Identity threshold

To properly balance alignment speed and accuracy, we have applied a
two-stage alignment strategy similar to the one used in our program PCMA (web server).
In the first stage, highly similar sequences are progressively aligned
in a fast way without consistency scoring. The scoring function in this
stage is a weighted sum-of-pairs measure of BLOSUM62 scores. If two groups
neighboring on a tree have an average sequence identity higher than a
certain threshold (default is 0.6), they are aligned in this fast way.
The result of the first stage is a set of pre-aligned groups that are
relatively divergent from each other. One representative sequence is
selected from each pre-aligned group. In the second alignment stage,
these representative sequences are subject to the more time-consuming
probabilistic consistency measure, and are aligned progressively
according to the consistency scoring function. Finally, the pre-aligned
groups obtained in the first stage are merged according to the alignment
of the representatives to obtain the alignment of all sequences. If "Identity threshold"
is equal to or larger than 1, all sequences are subject to the consistency measure and
the alignment process is the most time-consuming. If it is set to 0,
all sequences are aligned in the fast way; in this case the alignment
quality for divergent sequences is expected to be low since the
consistency-based scoring function is not used. The default value is set
to 0.6 since for sequences with identity above 60% the fast stage
can still produce good-quality alignments, while alignment proceeds about
6 times faster than when all sequences are aligned using the consistency
measure.

Reference:
Pei J, Sadreyev R, Grishin NV: PCMA: fast and accurate multiple sequence
alignment based on profile consistency. Bioinformatics 2003, 19(3):427-428.
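
The first-stage grouping rule described above can be illustrated with a minimal sketch. This is not the program's actual code. The function names, the nested-tuple guide tree, and the identity lookup table are all hypothetical, chosen only to show how the threshold decides which neighboring groups are merged in the fast stage and which are left for the consistency-based stage.

```python
from itertools import product


def average_identity(group_a, group_b, identity):
    """Mean pairwise identity between the members of two groups.

    `identity` is a hypothetical lookup keyed by frozenset({name1, name2}).
    """
    pairs = list(product(group_a, group_b))
    return sum(identity[frozenset(pair)] for pair in pairs) / len(pairs)


def first_stage_groups(tree, identity, threshold=0.6):
    """Walk a guide tree bottom-up and return the pre-aligned groups.

    `tree` is either a sequence name (leaf) or a (left, right) tuple.
    Two neighboring groups are merged only if each side has already
    collapsed into a single group and their average identity is higher
    than `threshold`; otherwise they stay separate for the second stage.
    """
    if isinstance(tree, str):
        return [[tree]]
    left, right = tree
    left_groups = first_stage_groups(left, identity, threshold)
    right_groups = first_stage_groups(right, identity, threshold)
    if len(left_groups) == 1 and len(right_groups) == 1:
        a, b = left_groups[0], right_groups[0]
        if average_identity(a, b, identity) > threshold:
            return [a + b]
    return left_groups + right_groups


# Toy data (made up for illustration): A/B and C/D are close pairs,
# but the two pairs are divergent from each other.
identity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("C", "D")): 0.7,
    frozenset(("A", "C")): 0.3,
    frozenset(("A", "D")): 0.25,
    frozenset(("B", "C")): 0.2,
    frozenset(("B", "D")): 0.3,
}
tree = (("A", "B"), ("C", "D"))

print(first_stage_groups(tree, identity))  # [['A', 'B'], ['C', 'D']]
```

With the toy data, a threshold of 1.0 leaves every sequence in its own group (everything goes to the consistency stage), while a threshold of 0.0 merges all four sequences into one group in the fast stage, matching the two extremes described above.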