MUMMALS (MUltiple alignment with
Multiple MAtch state models of Local Structure) is a program for
constructing multiple alignments of protein sequences (Pei and
Grishin, submitted). It implements complex hidden Markov models (HMMs)
of pairwise alignment with multiple match states that capture local
structural information. MUMMALS adopts a progressive alignment method
that applies a probabilistic consistency-based scoring function similar
to the one used in ProbCons (Do et al., 2005). First, a guide tree is
built quickly using a k-mer counting method (Edgar, 2004). An initial
alignment is then built progressively along this tree with a simple
sum-of-pairs scoring function. A second tree is then built with the UPGMA
method based on sequence identities calculated from the initial
alignment. The probabilistic consistency strategy is applied in the same
way as in the ProbCons program (Do et al., 2005). For each sequence
pair, the match probabilities of residue pairs are calculated using one
of the HMMs listed below. These probability matrices then undergo a
consistency transformation, which involves multiplying the matrices.
Finally, MUMMALS progressively aligns the sequences guided by the second
tree using the consistency-based scoring function.

In this web server, users can select one of the five hidden Markov models listed below:

HMM_1_1_0: 1 match state, which models any residue pair.

HMM_1_1_1: 2 match states. One match state models residue pairs in core blocks; the other models residue pairs in unaligned (structurally divergent) regions.

HMM_1_3_1: 4 match states. Residue pairs in core blocks are modeled by 3 match states corresponding to three secondary structure types (helix, strand and coil). Residue pairs in unaligned regions are modeled by 1 match state.

HMM_3_1_1: 4 match states. Residue pairs in core blocks are modeled by 3 match states corresponding to three categories of relative sidechain solvent accessibility (<11.2, 11.2-51.7, and >51.7). Residue pairs in unaligned regions are modeled by 1 match state.

HMM_3_3_1: 10 match states. Residue pairs in core blocks are modeled by 9 match states corresponding to the combinations of three secondary structure types and three solvent accessibility categories. Residue pairs in unaligned regions are modeled by 1 match state.

Here are diagrams of HMM_1_1_0, HMM_1_1_1 and HMM_1_3_1.

Running time, from fastest to slowest:
HMM_1_1_0 < HMM_1_1_1 < HMM_1_3_1 ≈ HMM_3_1_1 < HMM_3_3_1

Alignment quality, generally from lowest to highest:
HMM_1_1_0 < HMM_1_1_1 < HMM_3_1_1 < HMM_1_3_1 ≈ HMM_3_3_1

HMM_1_3_1 is the default because it balances running time and alignment quality. The two best-performing HMMs (HMM_1_3_1 and HMM_3_3_1) give, on average, slightly better results (by several percent) than ProbCons (version 1.1), MAFFT (version 5.667) and MUSCLE (version 3.52) on several multiple alignment testing datasets (Pei and Grishin, submitted).
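To illustrate the consistency step mentioned above, in which the pairwise match-probability matrices are multiplied together, here is a minimal Python sketch. It is not taken from the MUMMALS source code. It assumes the ProbCons-style re-estimation, where the new matrix for a pair (x, y) is the average over all sequences z of the product P(x,z) * P(z,y), with P(x,x) taken to be the identity matrix. The function names and the input layout (a dictionary of dense matrices keyed by sequence index pairs) are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*m)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(lengths, pair_probs):
    """Re-estimate match probabilities for every sequence pair (assumed scheme).

    lengths:    list of sequence lengths, indexed by sequence number.
    pair_probs: dict {(x, y): matrix} for x < y, where matrix[i][j] is the
                probability that residue i of x is matched to residue j of y.
    """
    n = len(lengths)

    def get(x, y):
        # P(x, x) is assumed to be the identity; P(y, x) is P(x, y) transposed.
        if x == y:
            return identity(lengths[x])
        if x < y:
            return pair_probs[(x, y)]
        return transpose(pair_probs[(y, x)])

    result = {}
    for x in range(n):
        for y in range(x + 1, n):
            total = [[0.0] * lengths[y] for _ in range(lengths[x])]
            # Sum the matrix products over every intermediate sequence z.
            for z in range(n):
                product = matmul(get(x, z), get(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        total[i][j] += product[i][j]
            # Average over the number of sequences.
            result[(x, y)] = [[v / n for v in row] for row in total]
    return result
```

In this sketch, a residue pair that has no direct support between sequences 1 and 2 can still gain probability when both residues match the same residue in sequence 0, which is the intuition behind consistency-based scoring. A real implementation would likely use sparse matrices, since most match probabilities are close to zero.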
Do, C. B., Mahabhashyam, M. S., Brudno, M. and Batzoglou, S. (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 15, 330-340.