SFESA (Shift to Fix secondary structure ElementS in Alignments) is a tool that refines pairwise protein sequence alignments by locally shifting secondary structure elements, using a combination of sequence and structural scoring.

Overview of alignment refinement process

SFESA refines an original alignment (for example, a PROMALS [1] pairwise alignment) by evaluating local shifts of template-defined secondary structure elements with a novel scoring function (see flowchart). First, if the input is a pairwise alignment, it is used as the original alignment; if the input is two unaligned sequences, SFESA generates a PROMALS alignment and uses it as the original alignment. Second, if no template structure is provided, the template sequence is searched with PSI-BLAST [2] against our database of representative structures to retrieve the closest structure. Then, if the sequence identity between the structure and the template sequence in the input is above the threshold (default 50%), the original alignment and the template structure are passed to the alignment refinement pipeline; otherwise, the original alignment is returned without refinement. There are four refinement modes, SFESA_O, SFESA_O+G, SFESA_O+G+M and SFESA_O+G+M+S, which differ in how gaps are processed and which contact-based energy matrix is used (see parameters). Finally, both the original and the refined alignments are shown as the result.
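The decision steps above can be sketched as a small control-flow function. This is a minimal illustration, not SFESA's code: `align_pair`, `identity` and `refine` are hypothetical stand-ins for PROMALS, the structure-versus-template identity calculation and the shifting step, and the PSI-BLAST structure search is assumed to have already produced `structure_seq`.

```python
from typing import Callable, Optional, Tuple

# A pairwise alignment as (query_row, template_row), "-" marking gaps.
Alignment = Tuple[str, str]


def refine_or_return(
    aligned_input: Optional[Alignment],
    unaligned_input: Optional[Tuple[str, str]],
    structure_seq: str,
    align_pair: Callable[[str, str], Alignment],   # hypothetical stand-in for PROMALS
    identity: Callable[[str, str], float],         # hypothetical structure-vs-template identity
    refine: Callable[[Alignment], Alignment],      # hypothetical stand-in for SFESA shifting
    threshold: float = 0.5,
) -> Tuple[Alignment, Optional[Alignment]]:
    """Return (original, refined); refined is None when refinement is skipped."""
    # Use the input alignment as the original, or build one from two unaligned sequences.
    if aligned_input is not None:
        original = aligned_input
    elif unaligned_input is not None:
        original = align_pair(*unaligned_input)
    else:
        raise ValueError("either an alignment or two unaligned sequences is required")
    # The template sequence is the template row without gaps.
    template_seq = original[1].replace("-", "")
    # Refine only when the structure matches the template sequence closely enough
    # (">=" follows the parameter section's wording "at least 50%").
    if identity(structure_seq, template_seq) >= threshold:
        return original, refine(original)
    return original, None
```

Passing the external tools in as functions keeps the branching logic testable without running PROMALS or PSI-BLAST.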

SFESA input

Two unaligned sequences (or a pairwise alignment) and, optionally, a template structure can be input to SFESA. Providing a template structure is strongly recommended because it improves alignment quality.

Input sequences of query and template: either (1) a pairwise alignment in FASTA format, or (2) two unaligned sequences in FASTA format.

Input sequences should be in FASTA format: Each sequence record consists of a description line followed by line(s) of the sequence.
Note: The first character of each description line should be a greater-than (">") sign.
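The record structure described above (a ">" description line followed by one or more sequence lines) can be read with a few lines of code. A minimal sketch, not SFESA's own parser:

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (description, sequence) records."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For a pairwise alignment input, the two records should also have equal length (gaps included).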

(1). Input of a pairwise alignment in FASTA format.

For this input format, SFESA will make local shifts of secondary structure elements starting from this input alignment.

>d1ja1a3 (query)
>d2piaa2 (template)

(2). Input of two unaligned sequences in FASTA format.

For this input format, SFESA first generates a pairwise alignment of the two sequences with PROMALS and then locally shifts secondary structure elements.

Note: Please select the "Two unaligned sequences" option above the input box; otherwise, the input is treated as a pairwise alignment by default.

>d1ja1a3 (query)
>d2piaa2 (template)

Any non-alphabetical character in the input sequences is ignored by SFESA.

A note for sequence names: certain characters in sequence names are replaced with "_", including space, tab, and *?'`"&|\/{}()[]$; (the dot "." and the hyphen "-" are kept).
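The two input-cleaning rules above (non-alphabetical sequence characters are ignored; listed name characters become "_") can be sketched as follows. Two assumptions are labeled in the code: ";" is treated as part of the replaced set, and the hypothetical `keep_gaps` flag keeps "-" because an aligned input needs its gap characters.

```python
import re

# Characters the help text lists as replaced by "_" in sequence names;
# "." and "-" are kept. Treating ";" as part of the list is an assumption.
NAME_CHARS = set(" \t*?'`\"&|\\/{}()[]$;")


def sanitize_name(name: str) -> str:
    return "".join("_" if c in NAME_CHARS else c for c in name)


def clean_sequence(seq: str, keep_gaps: bool = False) -> str:
    """Drop non-alphabetical characters; keep_gaps (hypothetical) preserves '-'."""
    pattern = r"[^A-Za-z-]" if keep_gaps else r"[^A-Za-z]"
    return re.sub(pattern, "", seq)
```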

Input structures: It is recommended to upload a structure file for the template or a template homolog. Uploaded structure files should be in PDB format. If no structure is provided, SFESA searches our database for the closest homologous structure of the template. The sequence found to be closest to the provided structure (or to the structure database) is assigned as the Template; the other sequence is assigned as the Query. The alignment is refined only when the sequence identity between the structure and the template sequence (identical residues divided by all aligned positions) is above the threshold (default 50%).
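The identity definition above (identical residues divided by aligned positions, with gap columns ignored) translates directly into code. A minimal sketch that assumes the structure sequence and template sequence have already been aligned to each other:

```python
def aligned_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Identical residues divided by aligned positions; columns with a gap are ignored."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    aligned = identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    return identical / aligned if aligned else 0.0


def structure_is_used(structure_row: str, template_row: str, threshold: float = 0.5) -> bool:
    # The parameter section phrases the default as "at least 50%", hence ">=".
    return aligned_identity(structure_row, template_row) >= threshold
```

For example, rows "ACDE-G" and "ACDF-G" have five aligned positions and four identities, giving 0.8.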

Example of template structure input (partial):
ATOM    795  N   GLU A   1       8.804  32.358  19.734  1.00 12.73           N  
ATOM    796  CA  GLU A   1       9.685  32.086  18.600  1.00 14.50           C  
ATOM    797  C   GLU A   1      10.448  33.301  18.068  1.00 15.83           C  
ATOM    798  O   GLU A   1      11.330  33.155  17.221  1.00 16.89           O  
ATOM    799  CB  GLU A   1       8.880  31.462  17.446  1.00 17.54           C  
ATOM    800  CG  GLU A   1       8.258  30.101  17.753  1.00 16.58           C  
ATOM    801  CD  GLU A   1       7.417  29.543  16.614  1.00 16.05           C  
ATOM    802  OE1 GLU A   1       6.651  30.296  15.981  1.00 20.00           O  
ATOM    803  OE2 GLU A   1       7.468  28.325  16.391  1.00 18.78           O  
ATOM    804  N   PHE A   2      10.062  34.497  18.492  1.00 13.99           N  
ATOM    805  CA  PHE A   2      10.683  35.714  17.982  1.00 13.09           C  
ATOM    806  C   PHE A   2      10.788  36.758  19.102  1.00 13.51           C  
ATOM    807  O   PHE A   2      10.157  37.807  19.040  1.00 15.07           O  
ATOM    808  CB  PHE A   2       9.859  36.249  16.806  1.00 13.31           C  
ATOM    809  CG  PHE A   2      10.557  37.286  15.962  1.00 11.78           C  
ATOM    810  CD1 PHE A   2      11.882  37.124  15.579  1.00 12.00           C  
ATOM    811  CD2 PHE A   2       9.856  38.378  15.473  1.00 12.73           C  
ATOM    812  CE1 PHE A   2      12.486  38.033  14.712  1.00 15.85           C  
ATOM    813  CE2 PHE A   2      10.452  39.286  14.611  1.00 14.41           C  
ATOM    814  CZ  PHE A   2      11.765  39.114  14.228  1.00 11.11           C  
ATOM    815  N   PRO A   3      11.539  36.447  20.176  1.00 16.07           N  
ATOM    816  CA  PRO A   3      11.504  37.278  21.381  1.00 12.77           C  
ATOM    817  C   PRO A   3      12.422  38.484  21.330  1.00 13.95           C  
ATOM    818  O   PRO A   3      13.440  38.486  20.635  1.00 14.16           O  
ATOM    819  CB  PRO A   3      11.925  36.315  22.485  1.00 15.51           C  
ATOM    820  CG  PRO A   3      12.873  35.417  21.805  1.00 14.27           C  
ATOM    821  CD  PRO A   3      12.264  35.186  20.429  1.00 13.84           C  
ATOM    822  N   LEU A   4      11.972  39.548  21.985  1.00 13.07           N  
ATOM    823  CA  LEU A   4      12.773  40.728  22.293  1.00 14.75           C  
ATOM    824  C   LEU A   4      14.026  40.348  23.073  1.00 17.28           C  
ATOM    825  O   LEU A   4      13.915  39.740  24.129  1.00 16.56           O  
ATOM    826  CB  LEU A   4      11.947  41.662  23.163  1.00 18.57           C  
ATOM    827  CG  LEU A   4      11.437  42.989  22.645  1.00 21.68           C  
ATOM    828  CD1 LEU A   4      11.549  43.062  21.139  1.00 24.97           C  
ATOM    829  CD2 LEU A   4      10.010  43.146  23.114  1.00 21.31           C  
ATOM    830  N   ASP A   5      15.198  40.802  22.629  1.00 16.86           N  
ATOM    831  CA  ASP A   5      16.432  40.581  23.384  1.00 17.70           C  
ATOM    832  C   ASP A   5      16.420  41.250  24.772  1.00 18.70           C  
ATOM    833  O   ASP A   5      16.162  42.443  24.876  1.00 19.51           O  
ATOM    834  CB  ASP A   5      17.623  41.095  22.590  1.00 21.09           C  
ATOM    835  CG  ASP A   5      18.938  40.534  23.084  1.00 21.43           C  
ATOM    836  OD1 ASP A   5      19.512  41.116  24.021  1.00 23.83           O  
ATOM    837  OD2 ASP A   5      19.407  39.536  22.504  1.00 27.41           O 
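Comparing the uploaded structure with the template sequence first requires the structure's own sequence. A minimal sketch using the fixed ATOM-record columns of the PDB format (residue name in columns 18-20, chain in 22, residue number and insertion code in 23-27); HETATM records such as modified residues are ignored here, which SFESA may handle differently. On the excerpt above it yields EFPLD.

```python
from typing import Optional

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_sequence(pdb_text: str, chain: Optional[str] = None) -> str:
    """One-letter sequence from ATOM records, one letter per residue."""
    seq = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]                    # column 22
        if chain is not None and chain_id != chain:
            continue
        res_key = (chain_id, line[22:27])      # residue number + insertion code
        if res_key in seen:
            continue                           # further atoms of the same residue
        seen.add(res_key)
        seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))  # residue name
    return "".join(seq)
```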

OR enter a PDB identifier and chain identifier: entering a four-character PDB identifier (e.g., 1ja1) and a chain identifier loads the structure directly from the PDB database. If no chain identifier is given, the whole structure is used.
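As an illustration of loading a structure by identifier, the sketch below checks the classic four-character PDB ID format (a digit 1-9 followed by three letters or digits) and builds the public RCSB download URL for the PDB-format file. How the SFESA server itself fetches structures is not documented here, and very large entries are distributed only in mmCIF format, so this URL does not exist for every ID.

```python
import re

# Classic four-character PDB IDs: a digit 1-9 followed by three letters or digits.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_download_url(pdb_id: str) -> str:
    """RCSB URL of the entry's PDB-format file; chain selection happens after parsing."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
```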

Input Email: SFESA jobs usually take from minutes to hours, and longer for long sequences or sequences with many homologs. It is therefore highly recommended to provide an email address so that a link to your results is sent to you when the alignment is finished.

Input job name: Assigning a short name to your job helps identify it. This name will appear in the subject line of the email sent to you.

SFESA output

The SFESA web server reports results that include the following information.

1. Original alignment with secondary structure and colored alignment blocks (based on secondary structure of template).

The first line in this block shows the secondary structure of the query predicted by PSIPRED [3]. The last two lines show the secondary structure of the template from DSSP [4] and PALSSE [5] ("H": helix, "S": strand, "C": coil). The template elements are based on PALSSE [5]. The "Query" and "Template" lines are from the starting alignment (the PROMALS pairwise alignment when two unaligned sequences are input, otherwise the input alignment). The lines starting with "Number" above the query and below the template show the residue position numbers in the query and template, respectively.

If the sequence identity between the structure and the template sequence in the input (identical residues divided by all aligned positions) is below the threshold (default 50%), only this original alignment is generated as the final result.
"Helix" alignment blocks are shown alternately in red and orange.
"Strand" alignment blocks are shown alternately in blue and dark green.

Starting alignment example:

Original Pairwise Alignment (no SFESA refinement)

PSIPRED : CCC-CCCCCCEEEEECCCCHHHHHHHHHHHHHHHHCCCCCCCEEEEEEEECCCCCCCHHHHHHHHHHCCCCCEEEEEEECCCCCCCHHHHHHHHHHHHHHHHHHCCCCEEEEECCCCCHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHHCCCEEEECCC
Number  :          10        20        30        40        50        60        70        80        90       100       110       120       130       140       150       160
Number  :         10        20        30             40         50        60         70         80                 90               100       110                    120

2. Refined alignment with secondary structure and colored alignment blocks (based on secondary structure of template).

If the sequence identity between the structure and the template sequence in the input (identical residues divided by all aligned positions) is above the threshold (default 50%), the following parts, including the refined alignment and the shifting details, are generated.
The lines starting with "Cm1" and "Cm2" compare the refined alignment with the original one. "Cm1" shows the direction of the query residue shift ("+": shifted towards the C-terminus; "-": shifted towards the N-terminus), and "Cm2" shows the number of positions shifted. If the query residue is aligned to a gap in both the original and the refined alignment, "Cm1" is blank and "Cm2" shows "-". If the query residue is aligned to a residue in the original alignment but to a gap in the refined alignment, "Cm1" is blank and "Cm2" shows "*". If the template residue is aligned to a gap, both "Cm1" and "Cm2" are blank. The other lines are the same as in the original alignment described above.
"Helix" alignment blocks are shown alternately in red and orange.
"Strand" alignment blocks are shown alternately in blue and dark green.
In the SFESA alignment, shifted alignment blocks are underlined.
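The Cm1/Cm2 rules above can be sketched by mapping every template residue to its aligned query residue in both alignments and comparing the two. This is an illustration under stated assumptions: the sign convention (shift = original query index minus refined query index) is inferred from the block 4 example, where "Original[-2]" appears with "-" in Cm1 and "2" in Cm2; the case "gap before, residue after" is not described in the help text and is marked "?" here; shifts are assumed to be single digits.

```python
from typing import Dict, Optional, Tuple


def template_to_query(query_row: str, template_row: str) -> Dict[int, Optional[int]]:
    """Map each template residue index to its aligned query residue index (None = gap)."""
    mapping: Dict[int, Optional[int]] = {}
    qi = ti = 0
    for q, t in zip(query_row, template_row):
        if t != "-":
            mapping[ti] = qi if q != "-" else None
            ti += 1
        if q != "-":
            qi += 1
    return mapping


def cm_lines(original: Tuple[str, str], refined: Tuple[str, str]) -> Tuple[str, str]:
    """Build Cm1 and Cm2 strings, one character per refined-alignment column."""
    before = template_to_query(*original)
    after = template_to_query(*refined)
    cm1, cm2 = [], []
    ti = 0
    for t in refined[1]:
        if t == "-":
            mark = (" ", " ")                      # template gap: both blank
        else:
            old, new = before[ti], after[ti]
            ti += 1
            if new is None:
                # "-": gap in both alignments; "*": residue before, gap after refinement
                mark = (" ", "-" if old is None else "*")
            elif old is None:
                mark = (" ", "?")                  # case not covered by the help text
            else:
                shift = old - new                  # sign convention inferred from block 4
                sign = "+" if shift > 0 else "-" if shift < 0 else " "
                mark = (sign, str(abs(shift)))     # assumes single-digit shifts
        cm1.append(mark[0])
        cm2.append(mark[1])
    return "".join(cm1), "".join(cm2)
```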

Refined alignment example:

Refined Alignment by SFESA (SFESA_O+G+M)

PSIPRED : CCC-CCCCCCEEEEECCCCHHHHHHHHHHHHHHHHCCCCCCCEEEEEEEECCCCCCCHHHHHHHHHHCCCCCEEEEEEEC--CCCCCCHHHHHHHHHHHHHHHHHHCCCCEEEEECCCCCHHHHHHHHHH-HHHHHCCCCCHHHHHHHHHHHHHCCCEEEECCC
Number  :          10        20        30        40        50        60        70          80        90       100       110       120        130       140       150       160
Cm1     :                                                                         --------                                             +++++                                  
Cm2     : 000 000000000000000000000000000000-----0000000000000-0000000000000000-**22222222  00-000000000000---------0000000000*########77777 *******0000000-------------000---
Number  :         10        20        30             40         50        60           70         80                 90        100       110                            120

3. List of all alignment blocks in SFESA.

This part is a table listing the shifting results for all alignment blocks (defined by the secondary structure of the template). The 1st column is the alignment block number; clicking it opens the details of every shift tried for that block.
The 2nd and 3rd columns are the starting and ending residue positions of the template secondary structure element in this block.
The 4th column shows the secondary structure type.
The 5th column shows the original alignment block.
The 6th column shows the shifting result for the block. If the block is shifted, the result is shown as "Gap mode [shifting number]". There are three gap modes: Original (the original alignment block is not changed), Left (before shifting, the residues in the block are packed to the leftmost positions and all gaps are moved to the opposite side) and Right (before shifting, the residues are packed to the rightmost positions and all gaps are moved to the opposite side). The default shifting range is -4 to +4.
The 7th column shows the modified alignment block if the block is shifted; otherwise it shows "-", meaning no change.
Colored rows indicate blocks that are shifted in the refined alignment.


List of all elements in SFESA

No.  Start  End  Type    Original Alignment Block  Shift by SFESA (Gap Mode [Shifting Number])  Refined Alignment Block
1    9      16   Strand  TPVIMVGP                  No Shift                                     -
2    ...    ...  ...     ...                       No Shift                                     -
3    36     46   Strand  GETLLYYGCRR               No Shift                                     -
4    64     71   Strand  LTQLNVAF                  Original[-2]                                 QLNVAFSR
5    79     87   Helix   YVQHLLKRD                 No Shift                                     -
6    90     97   Strand  GAHIYVAG                  No Shift                                     -

4. Shifting details of all elements in SFESA

This part shows a table for each alignment block listing all of its variants (blocks without gaps: 1 original + 8 variants; blocks with gaps: 1 original + 9 "Left" variants + 9 "Right" variants). The 1st column of each table is the gap mode. If the block contains gaps, there are three gap modes: Original (the original alignment block is not changed), Left (before shifting, the residues in the block are packed to the leftmost positions and all gaps are moved to the opposite side) and Right (before shifting, the residues are packed to the rightmost positions and all gaps are moved to the opposite side).
The 2nd column is the shifting number; the default range is -4 to +4.
The 3rd column indicates whether the variant is unique or identical to one shown earlier; if it is not unique, the earlier shifting result is shown.
The 4th column shows the alignment block variant extended by residues at both ends. Residues in blue and pink mark the original alignment block in the query and template, respectively; the outermost boundaries of these marked residues define the boundaries of the displayed variant.
The 5th, 6th, 7th and 8th columns show the sequence score, structure score, combined score I and combined score II of the variant.
The row in red is the variant chosen for the refined alignment.


Scoring details of shifts in alignment block number 4:

Gap Mode  Shift  Unique?  Alignment Variant    Sequence Score  Structure Score  Score I  Score II
Original  -4     Yes      ALTQLNVAFSREQ----A    0.5110         0.7999           0.7653   0.7940
Original  -3     Yes      GALTQLNVAFSRE---QA    0.0422         0.8231           0.7294   0.8074
Original  -2     ...      ...                   1.7973         0.8078           0.9265   0.8276
Original  -1     ...      ...                  -0.9663         0.6966           0.4970   0.6634
Original   0     ...      ...                   0.1343         0.7539           0.6795   0.7414
Original  +1     ...      ...                  -0.6102         0.7326           0.5715   0.7057
Original  +2     ...      ...                  -0.8445         0.6599           0.4793   0.6298
Original  +3     ...      ...                  -0.9510         0.6068           0.4200   0.5757
Original  +4     ...      ...                  -1.5960         0.5539           0.2958   0.5109

Alignment parameters are listed below

SFESA refined alignment mode

When we analyzed PROMALS alignments, we found that secondary structure elements are often misaligned by a few residues. The main reason is that PROMALS tends not to introduce gaps, in order to avoid gap penalties. In these cases, the correct alignment can often be found within a limited set of local shifts of secondary structure elements: our data show that about 80% of the improvable alignment blocks can be corrected by local shifts of up to 4 residues. Therefore, the variants generated for an alignment block are limited to shifts of up to ±4 residues, giving at most 8 variants.
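For a gapless block, the ±4 shifting above amounts to sliding the query window against the fixed template residues. A minimal sketch: the sign convention (a negative shift moves the window to later query residues) is inferred from block 4, where "Original[-2]" turns LTQLNVAF into QLNVAFSR, and how SFESA treats blocks near the sequence ends is not stated, so out-of-range shifts are simply skipped here.

```python
from typing import Dict


def shift_variants(query: str, start: int, length: int, max_shift: int = 4) -> Dict[int, str]:
    """Query segments placed against a gapless template block for each nonzero shift.

    start: 0-based query index aligned to the first template residue of the block
    length: number of template residues in the block
    """
    variants: Dict[int, str] = {}
    for s in range(-max_shift, max_shift + 1):
        if s == 0:
            continue                      # shift 0 is the original block itself
        begin = start - s                 # negative shift -> later query residues (as in block 4)
        if begin >= 0 and begin + length <= len(query):
            variants[s] = query[begin:begin + length]
    return variants
```

With query "GALTQLNVAFSREQA" and the original block LTQLNVAF starting at index 2, shift -2 returns "QLNVAFSR", matching the example table.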
Another scenario is that the query or template alignment block contains gaps. The ±4 shifting above also works in this scenario, but more alignment variants become available if the gaps are pre-processed. Since most secondary structure elements should be aligned to secondary structure elements without inserted gaps, the gaps can be moved to one side so that they do not interrupt the element: the residues in the block are packed to the leftmost or rightmost positions, and all gaps are moved to the opposite side. Each of these two gap-processed blocks is treated as a new original block, from which 8 shifted variants are produced. Computational experiments show that, in this case, it is better to consider only the gap-processed variants. Thus, for blocks with gaps, SFESA generates up to 18 alignment variants (1 + 8 from the left-packed block and 1 + 8 from the right-packed block) in addition to the original block.
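The gap pre-processing described above (packing residues to one side and moving gaps to the other) can be sketched per block row. One assumption is marked in the code: if both rows contain gaps, packing can create gap-against-gap columns, which this sketch simply drops; the help text does not say how SFESA handles that case.

```python
from typing import Dict, Tuple


def pack(row: str, side: str, gap: str = "-") -> str:
    """Move all residues of one block row to the given side; gaps go to the other side."""
    residues = row.replace(gap, "")
    gaps = gap * (len(row) - len(residues))
    if side == "left":
        return residues + gaps
    if side == "right":
        return gaps + residues
    raise ValueError("side must be 'left' or 'right'")


def gap_mode_blocks(query_row: str, template_row: str, gap: str = "-") -> Dict[str, Tuple[str, str]]:
    """The Left and Right gap-processed versions of an alignment block."""
    blocks = {}
    for mode, side in (("Left", "left"), ("Right", "right")):
        q, t = pack(query_row, side, gap), pack(template_row, side, gap)
        # Drop columns that became gap-gap after packing both rows (an assumption).
        cols = [(a, b) for a, b in zip(q, t) if not (a == gap and b == gap)]
        blocks[mode] = ("".join(a for a, _ in cols), "".join(b for _, b in cols))
    return blocks
```

Each returned block would then serve as a new starting point for the same ±4 shifting used for gapless blocks.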
In addition, contacting residues are defined as residue pairs within a distance cutoff, and the network of contacts within a protein supports its global structure. In the template of an alignment, this contact network can be identified from the known template structure. A template residue can contact not only its sequence neighbors but also structurally close residues that are far away in sequence. These contacts define the structural environment of each template residue, and correctly aligned equivalent residues in the query should have a similar structural environment, and vice versa. Using the residue-residue contact energy matrix derived by Miyazawa and Jernigan [6], the total contact energy (structure score) of the query residues in an alignment block can be used to select a better alignment variant.
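The structure score idea above, placing query residues onto the template contact network and summing pair energies, can be sketched as below. The energy table is a placeholder supplied by the caller, not the Miyazawa-Jernigan values, and how SFESA normalizes the total into the structure scores shown in the tables is not described here.

```python
from typing import Dict, Iterable, Optional, Tuple


def block_contact_energy(
    contacts: Iterable[Tuple[int, int]],
    t2q: Dict[int, Optional[int]],
    query: str,
    energy: Dict[Tuple[str, str], float],
) -> float:
    """Sum pair energies of query residues placed on template contacts.

    contacts: template residue index pairs in contact (at least one inside the block)
    t2q: template residue index -> aligned query residue index (None for a gap)
    energy: symmetric residue-pair energies (placeholder values, not the real matrix)
    """
    total = 0.0
    for i, j in contacts:
        qi, qj = t2q.get(i), t2q.get(j)
        if qi is None or qj is None:
            continue  # a partner aligned to a gap contributes nothing in this sketch
        a, b = query[qi], query[qj]
        total += energy.get((a, b), energy.get((b, a), 0.0))
    return total
```

Each shifted variant changes `t2q`, so scoring every variant with the same template contacts yields comparable totals.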
Our study shows that a contact matrix derived from alignments, together with a different definition of residue-residue contact, better distinguishes correct from incorrect alignments. In the new definition, two residues are in contact if any of their atoms are within 6.5 Å of each other. The matrix was derived from the original PROMALS alignments of the training dataset: each alignment block was shifted by up to ±4 residues (8 variants), the best of the 9 candidates was selected by DALI-dependent accuracy [7], and the other 8 (the remaining variants or the original block) were treated as negative (decoy) cases.
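The any-atom contact definition above (6.5 Å between any atoms of two residues) can be written as a brute-force pairwise check. A minimal sketch: residues are given as lists of atom coordinates, "within" is taken as less than or equal to the cutoff, and the quadratic loop is chosen for clarity, not speed.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def residue_contacts(residues: Dict[int, List[Coord]], cutoff: float = 6.5) -> Set[Tuple[int, int]]:
    """Residue pairs with any two atoms within `cutoff` angstroms (any-atom definition)."""
    contacts = set()
    for (i, atoms_i), (j, atoms_j) in combinations(sorted(residues.items()), 2):
        if any(math.dist(a, b) <= cutoff for a in atoms_i for b in atoms_j):
            contacts.add((i, j))
    return contacts
```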

SFESA (O) uses up to 8 variants and the Miyazawa-Jernigan contact matrix;
SFESA (O+G) adds gap processing and uses up to 18 variants;
SFESA (O+G+M) additionally uses our derived contact matrix;
SFESA (O+G+M+S) additionally uses SSVM as the second filter instead of Scomb_II.

The default mode is SFESA (O+G+M), which has the highest overall alignment accuracy on our in-house data. However, the other three modes give better results in some cases.

Structure sequence identity threshold (Identity threshold above which structure of template is applied)

The parameter "structure sequence identity threshold" is the sequence identity threshold between the template structure (the input structure or, if no structure is input, the closest homologous structure found in our structure database) and the template sequence in the input. Sequence identity is calculated as identical residues divided by all aligned positions (positions aligned to gaps are ignored).
Our alignment refinement method relies on contacts from the template structure: the more accurate the template structure, the higher the quality of the refined alignment, and an inaccurate structure may even lower the quality of the original alignment. Uploading an accurate template structure is therefore strongly recommended.
The default value is 0.5 (range 0-1). This means the original alignment is refined only if at least 50% of the aligned residues between the template structure and the template sequence are identical.

Maximal residue positions to shift

After secondary structure elements are identified in the template structure, the corresponding alignment blocks are generated. For each block, SFESA shifts it locally to generate several variants and then selects a better variant (or keeps the original) based on a combination of sequence and structural scoring. The choice of the maximal shift is therefore important: trying more shifts increases the chance that a better variant is generated, but makes it harder to select that variant correctly. Our computational results show that shifts of up to 4 residues are the most efficient choice, producing a better variant for about 80% of the improvable alignment blocks in our data.
The default setting is 4 (range: integers above 1).
This parameter can be changed to explore variants with a different maximal shift. For a maximal shift of n residues, 2n variants are generated without gap processing and 4n+2 variants with gap processing.
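The variant counts above follow from the shifting scheme: without gap processing there are n shifts in each direction, and with gap processing each of the two packed blocks (Left and Right) contributes itself plus 2n shifts. A small sketch of that arithmetic:

```python
def variant_count(n: int, gap_processing: bool) -> int:
    """Number of variants generated for a maximal shift of n residues."""
    if n < 1:
        raise ValueError("maximal shift must be a positive integer")
    if not gap_processing:
        return 2 * n                # shifts -n..-1 and +1..+n
    return 2 * (2 * n + 1)          # Left and Right blocks, each unshifted + 2n shifts = 4n + 2
```

For the default n = 4 this gives 8 and 18, matching the counts used for the SFESA (O) and SFESA (O+G) modes.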

Non-gap threshold above which SFESA is applied to template elements

Secondary structure elements are identified from the template structure. If most residues of such an element are aligned to gaps in the query, the element is probably an insertion, and refining the corresponding alignment block is meaningless. This threshold prevents such blocks from being processed.
The default value is 0.5 (range [0, 1)). This means SFESA refines an alignment block only if more than 50% of the positions in the block (defined by a template secondary structure element) are aligned without gaps.
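A minimal sketch of the non-gap check above, assuming the percentage is taken over the template-element positions of the block and counts those aligned to a query residue; the strict "more than" comparison follows the text.

```python
def non_gap_fraction(query_row: str, template_row: str, gap: str = "-") -> float:
    """Share of template-element positions in the block aligned to a query residue."""
    paired = [q for q, t in zip(query_row, template_row) if t != gap]
    if not paired:
        return 0.0
    return sum(q != gap for q in paired) / len(paired)


def block_is_refined(query_row: str, template_row: str, threshold: float = 0.5) -> bool:
    return non_gap_fraction(query_row, template_row) > threshold   # "more than 50%"
```

A block with exactly half of its positions aligned to query gaps, such as "LT----NV" against "ABCDEFGH", is therefore skipped at the default threshold.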

Automatically assign structure for sequence in input

The default setting is "Yes": the sequence found to be closest to the provided structure (or to the structure database) is assigned as the Template (T), and the other sequence is assigned as the Query (Q). If "No" is selected, the input is assumed to list the query sequence first, followed by the template sequence (the structure corresponds to the second sequence in the input).

Parameters for generating the PSI-BLAST profile


Maximum number of iterations run by PSI-BLAST. The range is an integer between 1 and 8.


Only hits with an E-value below this threshold are included in the next iteration.

Identity cutoff below which distant homologs are removed

Any PSI-BLAST hit with a sequence identity to the query below this cutoff is removed, because divergent homologs can reduce the quality of the sequence profile (default: 0.2, corresponding to 20% sequence identity; range 0-1).
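The three profile parameters above can be illustrated with an NCBI BLAST+ `psiblast` argument list plus a post-filter on hit identity. This is only a sketch: the server's actual PSI-BLAST invocation is not documented here, the values 3 and 0.001 in the usage are placeholders rather than SFESA defaults, and the identity values are assumed to have been parsed from the search output already.

```python
from typing import List, Sequence, Tuple


def psiblast_command(query_fasta: str, db: str, iterations: int, inclusion_evalue: float) -> List[str]:
    """argv for a BLAST+ psiblast run using the iteration and E-value parameters."""
    if not 1 <= iterations <= 8:
        raise ValueError("iterations must be an integer between 1 and 8")
    return [
        "psiblast", "-query", query_fasta, "-db", db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
    ]


def drop_distant_hits(hits: Sequence[Tuple[str, float]], cutoff: float = 0.2) -> List[Tuple[str, float]]:
    """Keep (hit_id, identity) pairs whose identity to the query is not below the cutoff."""
    return [(hit_id, ident) for hit_id, ident in hits if ident >= cutoff]
```

Usage: `psiblast_command("query.fa", "nr", 3, 0.001)` builds the list that could be passed to `subprocess.run`.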


1. Pei, J. and Grishin, N.V. (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics, 23, 802-808.

2. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.

3. Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202.

4. Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.

5. Majumdar, I., Krishna, S.S. and Grishin, N.V. (2005) PALSSE: a program to delineate linear secondary structural elements from protein structures. BMC Bioinformatics, 6, 202.

6. Miyazawa, S. and Jernigan, R.L. (1999) An empirical energy potential with a reference state for protein fold and sequence recognition. Proteins, 36, 357-369.

7. Holm, L. and Sander, C. (1996) Mapping the protein universe. Science, 273, 595-603.