MESSA - Meta Server for Sequence Analysis
General Description:
MESSA is a meta server to facilitate protein sequence analysis. It predicts the sequence features, structure and
function for a given protein sequence. For an input sequence, the server exploits a number of select tools to predict
local sequence properties such as secondary structure, structurally disordered regions, coiled coils, signal peptides
and transmembrane helices; detect homologous proteins and assign the query to related protein families; identify 3D
structure templates and generate structure models; provide predictive statements about the protein's function, including
functional annotations transferred from close homologs, Gene Ontology terms, enzyme classification and possible
functionally associated proteins.
Submission:
MESSA asks the user to provide a protein sequence and a non-commercial email address to initiate a job.
In addition, it is recommended to provide information about the organism to which the input sequence belongs and
a MODELLER KEY to enable the full functionality of MESSA. Details about the inputs are listed below:
Protein Sequence: (required)
The protein sequence should be in FASTA format or plain text. You can paste the sequence into the text area
or upload it as a file. A valid sequence should contain no fewer than 30 and no more than 5000 amino acids.
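The input rules above can be checked locally before submitting. The following is a minimal sketch, not MESSA's own validation code; the function names parse_sequence and is_valid_length are hypothetical, and treating the 30 and 5000 limits as inclusive is an assumption based on the wording above.

```python
def parse_sequence(text: str) -> str:
    """Return the residue string from FASTA or plain-text input."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Drop FASTA header lines (those starting with '>') and blank lines.
    residues = "".join(line for line in lines if line and not line.startswith(">"))
    # Remove any remaining internal whitespace and normalize to upper case.
    return "".join(residues.split()).upper()


def is_valid_length(sequence: str, minimum: int = 30, maximum: int = 5000) -> bool:
    """Check the documented length limits (assumed inclusive)."""
    return minimum <= len(sequence) <= maximum
```

For example, parse_sequence(">q1\nMKT\nAYI\n") returns the six-residue string MKTAYI, which is_valid_length rejects as too short.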
Organism Type: (optional)
This input is required for signal peptide prediction by SignalP (version 3.0). You need to specify the type of
organism your sequence belongs to. You can select from "Eukaryote", "Gram-negative bacterium" and
"Gram-positive bacterium". If none of these three categories is applicable, select "other" instead; in that
case, the signal peptide prediction by SignalP is still indicative, but not as reliable.
Organism Name: (optional)
This input is useful for testing the potential orthologous relationship between your query and its homologs
by the Reciprocal Best Hit method and for MESSA to map your sequence to its genomic locus. Simply type in the name
of the organism to which your sequence belongs and MESSA will check whether this name matches any
organism in our database of whole genome sequences. The top 10 possible matches for the name (or partial
name) you typed will show up in the drop-down menu right below, from which you can select the one you want.
[Warning: as only 10 matching names are shown, you might need to input a few more
characters to find the name you want].
MODELLER KEY: (optional)
If you would like MESSA to generate structure models for your input sequence, you need to provide a
MODELLER KEY as required by the MODELLER license. A MODELLER KEY can be obtained here.
Please note that if we cannot identify any confident structure templates for your query, structure models will
not be generated, even if you provide this key.
Email Address: (required)
MESSA will send an email to this address when your job is done. Only a non-commercial email address
(ending with ".org" or ".edu") will be accepted.
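The email rule above can be expressed as a simple suffix check. This is a sketch of the stated rule only; the function name is_accepted_email is hypothetical, and the server may apply additional checks that are not described here.

```python
def is_accepted_email(address: str) -> bool:
    """Return True if the address has a local part and a domain ending in .org or .edu."""
    address = address.strip().lower()
    # Split on the last '@'; sep is empty when no '@' is present.
    local, sep, domain = address.rpartition("@")
    return bool(local) and bool(sep) and domain.endswith((".org", ".edu"))
```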
Job Name: (optional)
You can assign a name for your job. Otherwise, MESSA will assign a random number.
Explanation of Results (full version):
The full version offers extensive information and is designed for manual analysis of a protein. It presents important
information from all programs and provides links to the original results. It contains the following eight sections:
Part I. Prediction of Local Sequence Features:
MESSA uses multiple tools to predict local sequence features such as secondary structure, transmembrane helices
and signal peptides. The result from each predictor is represented as one string reporting each residue's predicted
status. These strings are all aligned to the original protein sequence for ease of comparison. When the sequence
is too long and cannot fit in the horizontal dimension of the web page, a scroll bar will appear under these predictions,
and you can move it to view the whole sequence.
Protein Sequence
The input protein sequence shown in this line is highlighted according to amino acid property: positively
charged residues in blue (light blue for partially positive), negatively charged residues in red (pink for
partially negative), hydrophobic residues in yellow and no highlight for small residues.
Secondary Structure (PSIPRED)
This program predicts 3-state secondary structure, alpha-helix (including 3-turn and 5-turn helix), beta-strand
and coils (including any secondary structure other than helix and strand). In the result, "H" represents alpha-
helix; "E" stands for beta-strand; and "c" refers to coils.
Secondary Structure (SSPRO)
This program predicts 3-state secondary structure, alpha-helix (including 3-turn and 5-turn helix), beta-strand
and coils (including any secondary structure other than helix and strand). In the result, "H" represents alpha-
helix; "E" stands for beta-strand; and "c" refers to coils.
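If you copy a PSIPRED or SSPRO prediction string from the results page, it can be collapsed into segments for further analysis. The sketch below assumes only the "H", "E" and "c" alphabet described above; the function name segments is hypothetical, and positions are reported as 1-based and inclusive.

```python
from itertools import groupby


def segments(prediction: str):
    """Return (state, start, end) tuples with 1-based inclusive positions."""
    result = []
    position = 1
    # groupby yields one group per run of identical consecutive characters.
    for state, run in groupby(prediction):
        length = len(list(run))
        result.append((state, position, position + length - 1))
        position += length
    return result
```

For the string "ccHHHHccEEEc" this reports a helix at residues 3 to 6 and a strand at residues 9 to 11, with coils in between.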
Coil and Loop (DISEMBL)
This program can be considered a secondary structure predictor. Residues whose secondary structure is neither
alpha-helix nor beta-strand are predicted by this program. Predicted coils and loops are highlighted in pink.
Flexible Loop (DISEMBL)
Loops and coils that are likely to be structurally flexible will be predicted and highlighted in pink.
Low Complexity Region (SEG)
Regions with biased amino acid composition are called low-complexity regions. These regions are usually
disordered, and sometimes fold as alpha-helices. In addition, such regions could cause problems in
sequence comparison as similar low-complexity regions could appear in totally unrelated proteins.
Predicted low-complexity regions are colored in light red.
Disordered Region (IsUnstruct)
Disordered regions lack stable tertiary structure and identifying them is useful for structure prediction.
Predicted disordered regions are highlighted in red.
Disordered Region (DISOPRED)
Disordered regions lack stable tertiary structure and identifying them is useful for structure prediction.
Predicted disordered regions are highlighted in red.
Disordered Region (DISEMBL)
Disordered regions lack stable tertiary structure and identifying them is useful for structure prediction.
Predicted disordered regions are highlighted in red.
Disordered Region (DISPRO)
Disordered regions lack stable tertiary structure and identifying them is useful for structure prediction.
Predicted disordered regions are highlighted in red.
Transmembrane Helix (TMHMM)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
machinery on the membrane (such as the electron transport chain or the flagellum). This prediction is indicative of localization
and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) When at least one transmembrane helix is predicted, the characters "o" and "i" are used to represent the
topology of transmembrane proteins, with "o" standing for the periplasmic side and "i" for the cytoplasmic side.
When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned by the program and
is thus meaningless.
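The reporting rules for topology strings can be applied programmatically. The sketch below is an illustration under the rules stated above, not part of MESSA; the function name summarize_topology is hypothetical. It locates runs of "H" and reports the side of the N-terminus only when at least one helix is present, because the "o"/"i" state is meaningless otherwise.

```python
import re

# Meaning of the topology characters as described in the text above.
SIDE = {"o": "periplasmic", "i": "cytoplasmic"}


def summarize_topology(prediction: str):
    """Return helix positions (1-based, inclusive) and the N-terminal side."""
    helices = [(m.start() + 1, m.end()) for m in re.finditer(r"H+", prediction)]
    if not helices:
        # Without helices the o/i assignment carries no information.
        return {"helices": [], "n_terminus": None}
    # SIDE.get returns None if the string starts directly with a helix.
    return {"helices": helices, "n_terminus": SIDE.get(prediction[0])}
```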
Transmembrane Helix (TOPPRED)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
machinery on the membrane (such as the electron transport chain or the flagellum). This prediction is indicative of localization
and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) Predicted transmembrane helices with low confidence will be marked as "h"; they are usually false
predictions.
(3) Regions that are unlikely to belong to transmembrane helices are marked as "x".
Transmembrane Helix (HMMTOP)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
machinery on the membrane (such as the electron transport chain or the flagellum). This prediction is indicative of localization
and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) When at least one transmembrane helix is predicted, the characters "o" and "i" are used to represent the
topology of transmembrane proteins, with "o" standing for the periplasmic side and "i" for the cytoplasmic side.
When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned by the program and
is thus meaningless.
Transmembrane Helix (MEMSAT)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
machinery on the membrane (such as the electron transport chain or the flagellum). This prediction is indicative of localization
and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) Predicted transmembrane helices with low confidence will be marked as "h"; they are usually false
predictions.
(3) When at least one transmembrane helix is predicted, the characters "o" and "i" are used to represent the
topology of transmembrane proteins, with "o" standing for the periplasmic side and "i" for the cytoplasmic side.
When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned by the program and
is thus meaningless.
TM Helix and Signal Peptide (MEMSAT_SVM) (TM: Transmembrane)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
machinery on the membrane (such as the electron transport chain or the flagellum). Proteins with signal peptides are usually
secreted into the extracellular space or the periplasmic space of Gram-negative bacteria. Secreted proteins play
important roles in exchanging materials and information with the environment. These predictions are indicative
of localization and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) Predicted transmembrane helices with low confidence will be marked as "h"; they are usually false
predictions.
(3) When at least one transmembrane helix is predicted, the characters "o" and "i" are used to represent the
topology of transmembrane proteins, with "o" standing for the periplasmic side and "i" for the cytoplasmic side.
When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned by the program and
is thus meaningless.
(4) A predicted signal peptide will be marked as "S" and highlighted in green.
(5) Sometimes a helix partly inserts itself into the membrane without passing through it.
Such helices are predicted by this program, marked as "R" and highlighted in yellow.
TM Helix and Signal Peptide (Phobius) (TM: Transmembrane)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
machinery on the membrane (such as the electron transport chain or the flagellum). Proteins with signal peptides are usually
secreted into the extracellular space or the periplasmic space of Gram-negative bacteria. Secreted proteins play
important roles in exchanging materials and information with the environment. These predictions are indicative
of localization and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) When at least one transmembrane helix is predicted, the characters "o" and "i" are used to represent the
topology of transmembrane proteins, with "o" standing for the periplasmic side and "i" for the cytoplasmic side.
When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned by the program and
is thus meaningless.
(3) A predicted signal peptide will be marked as "S" and highlighted in green.
Signal Peptide (SignalP HMM Mode) (HMM: Hidden Markov Model)
Proteins with signal peptides are usually secreted into the extracellular space or the periplasmic space of
Gram-negative bacteria. Secreted proteins play important roles in exchanging materials and information with the
environment.
(1) A predicted signal peptide will be marked as "S" and highlighted in green.
(2) Residues that are unlikely to belong to a signal peptide are marked as "x".
Signal Peptide (SignalP NN Mode) (NN: Neural Network)
Proteins with signal peptides are usually secreted into the extracellular space or the periplasmic space of
Gram-negative bacteria. Secreted proteins play important roles in exchanging materials and information with the
environment.
(1) A predicted signal peptide will be marked as "S" and highlighted in green.
(2) Residues that are unlikely to belong to a signal peptide are marked as "x".
Coiled Coils (COILS)
A coiled coil is a structural motif in which several alpha-helices are coiled together like strands of a rope.
Detecting them is useful for 3D structure prediction. Coiled coils predicted by this program are marked as "x"
and highlighted in yellow.
Positional Conservation
A multiple sequence alignment of confident BLAST hits, filtered to less than 90% identity and more than 40%
coverage, is used to calculate the positional conservation indices of residues in the sequence. The
conserved residues usually play important roles in maintaining the function or structure of a protein. The
residues are highlighted from white, through yellow to dark red as the conservation level increases.
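The text above does not state which conservation index MESSA computes. As an illustration only, the sketch below scores each alignment column by the fraction of non-gap residues that match the most common residue; this measure, the "-" gap character and the function name column_conservation are all assumptions, not MESSA's method.

```python
from collections import Counter


def column_conservation(aligned_sequences):
    """Score each column from 0.0 to 1.0 by the share of its most common residue."""
    if len({len(s) for s in aligned_sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned_sequences):
        residues = [r for r in column if r != "-"]  # ignore gaps
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores
```

Columns scoring close to 1.0 would correspond to the dark red end of the color scale described above.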
Part II. Close Homologs for Annotation Transfer:
Close Homologs in the Swiss-Prot Database Detected by BLAST
Since annotations in Swiss-Prot database are of high quality, a close homolog (for example, a reciprocal
best hit for the query or a hit that shares above 40% sequence identity with the query) in this section can be
used for annotation transfer. The top 10 confident hits (e-value cutoff: 0.001) detected by BLAST from the
Swiss-Prot database are shown in this section.
A summary table about these hits is on top, containing the following information:
(1) ID: The Swiss-Prot ID for the hit, linking to the web page for this protein in the Swiss-Prot database.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about the
hit shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 30 characters in the annotation of the hit (not a complete annotation).
We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) RBH(Q2H): Reciprocal Best Hit, Query to Hit. Indicates whether the hit is the most similar to (measured
by BLAST) the query among all proteins in the same organism as the hit. This measurement is not
available (N/A) when the hit is from an organism without an available whole genome sequence.
(6) RBH(H2Q): Reciprocal Best Hit, Hit to Query. Indicates whether the query is the most similar to
(measured by BLAST) the hit among all proteins in the same organism as the query. This measurement is
not available (N/A) when the query is from an organism without an available whole genome sequence.
(7) Q cover: BLAST alignment coverage for the Query.
(8) H cover: BLAST alignment coverage for the Hit.
(9) Identity: Identity between the query and the hit over the aligned region.
(10) E-value: E-value from BLAST, indicating the likelihood of finding a protein with the same level of similarity
as this hit in a random database of the same size as the Swiss-Prot database.
More information about each hit is listed below the summary, including:
(1) Hit header: the Swiss-Prot ID, Swiss-Prot name and complete Swiss-Prot annotation of the
hit, formatted as the sequence headers in BLAST database.
(2) The alignment between the query and the hit
(3) The function description, assigned Enzyme Commission number of the hit (if any), and the organism
from which this hit comes.
Items (2) and (3) are hidden for most hits by default and can be retrieved by clicking the "show" button on
the right of the hit header.
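The two RBH columns in the summary table can be read as two independent best-hit tests. The sketch below restates that logic under the definitions given above; it is not MESSA's implementation, and the function and parameter names are hypothetical. Each argument holding a best hit is the ID of the top BLAST hit found in the relevant genome, or None when no whole genome sequence is available.

```python
def rbh_status(query_id, hit_id, best_for_query_in_hit_genome, best_for_hit_in_query_genome):
    """Return (Q2H, H2Q); each value is True, False, or None for N/A."""
    # Q2H: is this hit the best match to the query within the hit's organism?
    q2h = None
    if best_for_query_in_hit_genome is not None:
        q2h = best_for_query_in_hit_genome == hit_id
    # H2Q: is the query the best match to this hit within the query's organism?
    h2q = None
    if best_for_hit_in_query_genome is not None:
        h2q = best_for_hit_in_query_genome == query_id
    return q2h, h2q
```

A hit is a reciprocal best hit of the query only when both values are True.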
Close Homologs in the NCBI Non-Redundant Database Detected by BLAST
This section contains the closest sequences to your query available in the Non-Redundant (NR) database.
The annotations of these proteins are not as reliable as the Swiss-Prot entries. However, if the
annotations of the best hits from NR match those from Swiss-Prot, it is additional evidence for
annotation transfer. The top 10 confident hits (e-value cutoff: 0.001) from the NR database are shown.
A summary table about them is on top, containing the following information:
(1) GI: The unique ID for the hit in the NCBI database, linking to the page for this protein in NCBI.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about
the hit that are shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters of the hit annotation in NR database (not a complete annotation).
We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) Q cover: BLAST alignment coverage for the Query.
(6) H cover: BLAST alignment coverage for the Hit.
(7) Identity: Identity between the query and the hit over the aligned region.
(8) E-value: E-value from BLAST, indicating the likelihood of finding a protein with the same level of similarity
as this hit in a random database of the same size as the NR database.
More information about each hit is listed below the summary, including:
(1) Hit header: the GI and the full definition of the hit in the NR database.
(2) The alignment between the query and the hit.
(3) The organism from which this sequence comes and the lineage of this organism.
(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on
the right of the hit headers.
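The "Q cover", "H cover" and "Identity" columns can be approximated from the pairwise alignment shown for each hit. The sketch below uses one common convention (identity over the full alignment length including gap columns, coverage as aligned residues divided by total sequence length); the exact formulas MESSA uses are not given in this text, so treat these definitions and the function name alignment_stats as assumptions.

```python
def alignment_stats(aligned_query, aligned_hit, query_length, hit_length):
    """Return identity and coverage percentages for a gapped pairwise alignment."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned strings must have equal length")
    # Count columns where both sequences carry the same residue (gaps never count).
    identical = sum(1 for q, h in zip(aligned_query, aligned_hit) if q == h and q != "-")
    query_residues = sum(1 for q in aligned_query if q != "-")
    hit_residues = sum(1 for h in aligned_hit if h != "-")
    return {
        "identity": 100.0 * identical / len(aligned_query),
        "q_cover": 100.0 * query_residues / query_length,
        "h_cover": 100.0 * hit_residues / hit_length,
    }
```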
Part III. Prediction of Gene Ontology (GO) Terms
Close Homologs with Gene Ontology terms Detected by BLAST
GO terms are the standard representations of protein attributes and they are widely used by researchers.
MESSA predicts the GO terms associated with the query using the AMIGO server. The AMIGO server uses
BLAST to detect homologs of the query in the GO database. These homologs, if close enough to the query,
could be used as sources for GO term transfer. The top 10 closest homologs (E-value cutoff 0.001) in the
GO databases detected by AMIGO are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the GO database, linking to the web page for it in the GO database.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about
the hit shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit. This is not a complete annotation. We
use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) Q cover: BLAST alignment coverage for the Query.
(6) H cover: BLAST alignment coverage for the Hit.
(7) Identity: Identity between the query and the hit over the aligned region.
(8) E-value: E-value from BLAST, indicating the likelihood of finding a protein with the same level of similarity
as this hit in a random database of the same size as the GO database.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the GO database.
(2) The alignment between the query and the hit.
(3) The GO terms associated with this hit.
(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on
the right of the hit header.
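Several sections of the results apply the same selection rule: keep hits that pass an E-value cutoff and show the 10 best. A minimal sketch of that rule follows; the dictionary layout with an "evalue" key and the function name top_confident_hits are hypothetical, and whether the cutoff is inclusive is an assumption.

```python
def top_confident_hits(hits, evalue_cutoff=0.001, limit=10):
    """Return up to `limit` hits with E-value at or below the cutoff, best first."""
    confident = [hit for hit in hits if hit["evalue"] <= evalue_cutoff]
    # Lower E-values indicate more significant hits, so sort ascending.
    return sorted(confident, key=lambda hit: hit["evalue"])[:limit]
```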
Part IV. Prediction of Enzyme Commission (EC) Number
EC numbers describe the types of reactions enzymes catalyze and they are essential for understanding the function
of proteins in the context of metabolic pathways. This section contains EC number predictions by three methods, as
introduced below:
EC Number Prediction by Annotation Transfer from Swiss-Prot Entries
The close homologs are selected by MESSA from all confident BLAST hits (e-value cutoff: 0.001) from the
Swiss-Prot database. These homologs can be used as sources from which to transfer assigned EC numbers
to the query. For each of them, the following information is provided:
(1) ID: Swiss-Prot ID for the entry, linking to the web page for it in the Swiss-Prot database.
(2) Name: name of the entry in Swiss-Prot database, usually composed of the short name of
the protein and the short name of the organism the protein comes from.
(3) Annotated EC number: The EC number of this entry annotated by Swiss-Prot curators.
(4) Identity: Sequence identity between this entry and the query calculated from BLAST alignment.
(5) Query coverage: The coverage of BLAST alignment for the query.
(6) Hit coverage: The coverage of BLAST alignment for this Swiss-Prot entry.
(7) RBH(Q2H): Reciprocal Best Hit, Query to Hit. Indicates whether this Swiss-Prot entry is the most
similar to (measured by BLAST) the query among all proteins from the same organism as this entry.
This measurement is not available (N/A) when the entry is from an organism without available whole
genome sequence.
(8) RBH(H2Q): Reciprocal Best Hit, Hit to Query. Indicates whether the query is the most similar to
(measured by BLAST) this Swiss-Prot entry among all proteins from the same organism as the query.
This measurement is not available (N/A) when the query is from an organism without an available whole
genome sequence.
EC Number Prediction by Ezypred Server
Ezypred first predicts whether the query is an enzyme. If it is, Ezypred predicts the first two
levels of its EC number (such as 1.1.-.-). For the prediction at each level, the predicted EC number and its
explanation are listed.
EC Number Prediction by EFICAz Software
EFICAz predicts whether a query is an enzyme. If it is, EFICAz tries to predict all four levels of its EC
number (such as 1.1.1.1). MESSA lists the predicted EC number, the description of this EC number and
the confidence of this prediction provided by EFICAz.
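Because the three methods report EC numbers at different depths (two levels such as 1.1.-.-, or four levels such as 1.1.1.1), comparing them requires checking only the levels both predictions specify. The sketch below shows one way to do this; the function names ec_levels and ec_agree are hypothetical, and the code assumes purely numeric levels with "-" marking unspecified ones.

```python
def ec_levels(ec_number: str):
    """Return the specified levels of an EC number, e.g. '1.1.-.-' gives [1, 1]."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError("an EC number must have four dot-separated fields")
    levels = []
    for part in parts:
        if part == "-":
            break  # remaining levels are unspecified
        levels.append(int(part))
    return levels


def ec_agree(first: str, second: str) -> bool:
    """True if the two EC numbers match on every level that both specify."""
    a, b = ec_levels(first), ec_levels(second)
    shared = min(len(a), len(b))
    return a[:shared] == b[:shared]
```

For example, a two-level prediction of 1.1.-.- is consistent with a four-level prediction of 1.1.1.1 but not with 2.7.1.1.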
Part V. Prediction of functionally associated proteins
This section lists proteins that are likely to function together with the query. The prediction mostly relies on the
STRING database, which assigns functional associations between proteins by multiple criteria, such as physical
interaction, expression pattern and genomic context. This information is most helpful for bacterial queries.
Genomic Context
When the query comes from a user-specified organism with a complete genome sequence available, MESSA
will map it to its genomic locus and provide a link, namely "View genomic context of your protein in NCBI", to
the NCBI Gene database to show the genomic context of the query.
Functionally Associated Proteins Detected by STRING:
STRING assigns functionally associated proteins to a query by finding its close homolog in the STRING database
and transferring the associations of that close homolog to the query. Predictions are available only
for cases with close homologs present in the STRING database. MESSA presents the results in a format
similar to that of STRING.
First, the closest homolog to the query in the STRING database used for association transfer is provided. We
suggest that you verify whether it is an ortholog of your input. If not, the results may not be meaningful.
Second, the "Predicted Functional Partners" are listed in a table, providing the following information:
(1) The name and annotation of the partner.
(2) The evidence that suggests an association between this protein and your input. A dot appears
under each criterion listed in the header of this table that is used as evidence to suggest the
functional association. The color of this dot indicates the confidence of the evidence, with black being
confident and grey being less confident.
(3) The overall confidence score for this predicted association.
Part VI. Conserved Domains and Related Protein Families:
Proteins usually contain one or several conserved sequence domains. Identifying these domains will assign the query
sequence to a certain protein family and apply any relevant functional information about these conserved domains to
the query.
Conserved Domains Detected by RPS-BLAST
Confident homologs (E-value cutoff: 0.005) in the Conserved Domain Database (CDD) are shown. In addition
to its own domain library, CDD also contains domains and protein families from the Pfam, COG, KOG, SMART
and TIGRFAM databases.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the CDD, linking to the web page of this conserved domain in the CDD.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about
the hit shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation).
We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) E-value: E-value from BLAST, indicating the likelihood of finding a hit with the same level of
similarity as this hit in a random database of the same size as CDD.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the CDD.
(2) The alignment between the query and the hit.
(3) The detailed description of this domain or protein family.
(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on
the right of the hit header.
Conserved Domains Detected by HHsearch
The confident homologous domains and protein families detected by HHsearch (probability cutoff: 90%) in
the CDD are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the CDD, linking to the web page of this conserved domain in the CDD.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
about the hit shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation).
We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) Probability: HHsearch probability, indicating the likelihood for the hit being a true homolog.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the CDD.
(2) The alignment between the query and the hit.
(3) The description of this domain or protein family in CDD.
(2) and (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button
on the right of the hit header.
Part VII. Homologous Structure Templates:
So far, homology modeling is still the most reliable and effective way of structure prediction. MESSA utilizes several
procedures to find homologous structures. Because these procedures differ in sensitivity and accuracy, combining them
increases the chance of finding reliable templates for homology-based structure prediction.
Structure Templates Detected by BLAST
Confident homologs (e-value cutoff: 0.001) detected by BLAST in the Protein Data Bank (PDB) are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit, a combination of the PDB ID and the chain ID. It links to the web page of
this entry in PDB.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
about the hit that are shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). We
use this truncated annotation just to provide a brief idea of the hit, and the full annotation is listed below.
(5) E-value: E-value from BLAST, indicating the likelihood of finding a protein with the same level
of similarity as this hit in a random database of the same size as PDB.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the PDB.
(2) The alignment between the query and the hit.
(3) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon
diagram and colored in rainbow from the N- to C-terminus.
(2) and (3) are hidden by default for most hits and they can be easily retrieved by clicking the "show" button
on the right of the hit header.
Structure Templates Detected by RPS-BLAST
The confident homologs (e-value cutoff: 0.001) detected by RPS-BLAST in the PDB filtered at 70% sequence
identity (PDB70) are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the PDB, a combination of the PDB ID and the chain ID. It links to the web
page for this entry in PDB.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
about the hit that are shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). We
use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) E-value: E-value from RPS-BLAST, indicating the likelihood of finding a hit with the same level of
similarity as this hit in a random database of the same size as PDB70.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the PDB.
(2) The alignment between the query and the hit.
(3) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon
diagram and colored in rainbow from N- to C-terminus.
(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on
the right of the hit header.
Structure Templates Detected by HHsearch
The confident structure templates detected by HHsearch (probability cutoff: 80%) in the PDB70 are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the PDB, a combination of the PDB ID and the chain ID. The ID links to the
web page of this entry in the PDB.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
about the hit that are shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). We
use this truncated annotation just to provide a brief idea, and the full annotations are listed below.
(5) Probability: Probability from HHsearch, indicating the likelihood of the hit being a true homolog.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the PDB.
(2) The alignment between the query and the hit
(3) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon
diagram and colored in rainbow from N- to C-terminus.
(2) and (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button
on the right of the hit header.
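As a rough illustration of how such a summary table could be assembled from raw hits, the sketch below keeps hits at or above the probability cutoff and truncates each definition to its first 40 characters. The Hit record, its field names, the inclusive cutoff and the sort order are assumptions made for this example; they are not taken from MESSA's code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one HHsearch hit (field names are assumptions)."""
    hit_id: str         # PDB ID plus chain ID
    length: int         # total sequence length of the hit
    definition: str     # full annotation of the hit
    probability: float  # HHsearch probability, in percent


def summary_rows(hits, cutoff=80.0, width=40):
    """Return (id, length, truncated definition, probability) for confident hits."""
    confident = [h for h in hits if h.probability >= cutoff]
    # Sorting by decreasing probability is an assumption of this sketch.
    confident.sort(key=lambda h: h.probability, reverse=True)
    return [(h.hit_id, h.length, h.definition[:width], h.probability)
            for h in confident]


# Made-up hits, for illustration only.
hits = [
    Hit("1abc_A", 250, "Example hydrolase from a made-up organism, full-length annotation", 99.2),
    Hit("2xyz_B", 120, "Hypothetical protein", 45.0),
]
rows = summary_rows(hits)
```

Passing cutoff=90.0 would mimic the stricter threshold quoted for the SCOP70 search.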
Part VIII. Homologous Structure Domains:
Structure domains are the folding and evolutionary units of proteins. The presence of a certain structure domain usually implies
a particular function. Identifying such structure units in the protein will not only reveal the architecture of the target protein,
but also provide hints about protein function.
Structure Domains Detected by RPS-BLAST
The confident homologs detected by RPS-BLAST (e-value cutoff: 0.001) in the Structure Classification of
Proteins database, filtered by 70% identity (SCOP70) are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the SCOP database.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
about the hit that are shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation).
We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) E-value: E-value from RPS-BLAST, indicating the likelihood of finding a hit with the same level
of similarity as this hit in a random database of the same size as SCOP70.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the SCOP database.
(2) The classification of the domain in SCOP.
(3) The alignment between the query and the hit.
(4) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon
diagram and colored in rainbow from N- to C-terminus.
(2), (3) and (4) are hidden for most hits by default and they can be easily retrieved by clicking the "show"
button on the right of the hit header.
Structure Domains Detected by HHsearch
Confident homologs detected by HHsearch (probability cutoff: 90%) in the SCOP70 are shown.
A summary table of them is on top, containing the following information:
(1) ID: The ID for the hit in the SCOP database.
(2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
about the hit that are shown below. Gaps in the query are neglected in this graph.
(3) Length: the total sequence length of the hit.
(4) Definition: the first 40 characters in the annotation of the hit. This is not a complete annotation.
We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
(5) Probability: Probability from HHsearch, indicating the likelihood of the hit being a true homolog.
More information about each hit is listed below the summary, including:
(1) Hit header: the ID and the full definition of the hit in the SCOP database.
(2) The classification of the domain in SCOP.
(3) The alignment between the query and the hit.
(4) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon
diagram and colored in rainbow from N- to C-terminus.
(2), (3) and (4) are hidden for most hits by default and they can be easily retrieved by clicking the "show"
button on the right of the hit header.
Explanation of Results (summary version):
By integrating results from different methods, we generate the consensus-based final predictions for local sequence
features, three-dimensional structure and function. We present these predictions as a summary page, which contains
three sections:
Part I. Local Sequence Features Prediction:
This section contains prediction of secondary structure, disordered regions, transmembrane helices, signal peptide,
coiled coils and positional conservation indices. Except for the last two, the predictions are based on the consensus
between multiple predictors. The result for each prediction is represented as one string reporting each residue's
predicted status. These strings are all aligned to the original protein sequence for ease of comparison. When the
sequence is too long and cannot fit in the horizontal dimension of the web page, a scroll bar will appear under these
predictions, and you can move it to view the whole sequence.
Protein Sequence
The input protein sequence shown in this line is highlighted according to amino acid property:
positively charged residues in blue (light blue for partially positive), negatively charged residues in red (pink
for partially negative), hydrophobic residues in yellow and no highlight for small residues.
Secondary Structure (Consensus)
The 3-state secondary structure of the query is predicted: alpha-helix (including 3-turn and 5-turn helices),
beta-strand and coil (including any secondary structure other than helix and strand). In the result, "H"
represents alpha-helix, "E" stands for beta-strand and "c" refers to coil.
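The 3-state reduction and the consensus step can be sketched as follows. This is a minimal illustration, assuming that individual predictors report DSSP-style 8-state letters and that the consensus is a per-residue majority vote; the treatment of isolated bridges ("B") as coil and the tie rule are assumptions of the sketch, and MESSA's actual consensus procedure may differ.

```python
from collections import Counter

# Reduction of DSSP-style 8-state letters to the 3 states used on this page.
# G (3-turn helix) and I (5-turn helix) are merged into H; everything that is
# neither helix nor strand becomes "c". Treating B as coil is an assumption.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E"}


def to_three_state(dssp):
    """Map an 8-state string to the H/E/c alphabet."""
    return "".join(DSSP_TO_3.get(state, "c") for state in dssp)


def consensus(predictions):
    """Per-residue majority vote over equal-length 3-state strings.

    Ties fall back to coil ("c"); this tie rule is an assumption.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result.append("c" if tied else top)
    return "".join(result)


final = consensus(["HHEc", "HHcc", "HEEc"])
```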
Disordered Region (Consensus)
Disordered regions lack stable tertiary structure and identifying them is useful for structure prediction. Here
the predicted disordered regions are highlighted in red.
Transmembrane Helix (Consensus)
Proteins with transmembrane helices are located in the membrane (inner membrane for Gram-negative
bacteria). These transmembrane proteins typically function as receptors, transporters or components of
membrane machinery (such as the electron transport chain or the flagellum). This prediction is indicative of
localization and function. The prediction is reported following these rules:
(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
(2) Regions that are not likely to belong to transmembrane helices are marked as "x".
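To make the two rules concrete, here is a small sketch that turns a list of predicted segments into such a per-residue string aligned to the sequence. The function name, the 1-based inclusive coordinates and the example sequence are assumptions made for illustration only.

```python
def mark_regions(length, regions, mark="H", blank="x"):
    """Build a per-residue string from 1-based, inclusive (start, end) regions."""
    track = [blank] * length
    for start, end in regions:
        if not (1 <= start <= end <= length):
            raise ValueError(f"region {start}-{end} is outside 1..{length}")
        for i in range(start - 1, end):
            track[i] = mark
    return "".join(track)


sequence = "MKTLLILAVVAAALAQETSGKKR"  # made-up sequence, for illustration only
tm_track = mark_regions(len(sequence), [(4, 15)])
print(sequence)
print(tm_track)
```

The same helper would work for any single-letter per-residue track by changing the mark argument.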
Signal Peptide (Consensus)
Proteins with signal peptides are usually secreted into the extracellular space (or the periplasmic space of
Gram-negative bacteria). They play an important role in exchanging materials and information with the environment.
(1) A predicted signal peptide will be marked as "S" and highlighted in green.
(2) Residues that are not likely to belong to a signal peptide are marked as "x".
Coiled Coils (COILS)
A coiled coil is a special structure motif in which several alpha-helices are coiled together like the strands of a rope.
Detecting them is useful for 3D structure prediction. Coiled coils predicted by this program are marked as "x"
and highlighted in yellow.
Positional Conservation
A multiple sequence alignment of confident BLAST hits, filtered to less than 90% identity and more than 40%
coverage, is used to calculate the positional conservation indices of residues in the sequence.
Conserved residues usually play important roles in maintaining the function or structure of a protein. Here the
residues are highlighted from white, through yellow to dark red as the conservation level increases.
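The filtering and scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the hits are given as rows of equal length already aligned to the query, identity and coverage are measured against the query, and the conservation index is taken to be one minus the normalized Shannon entropy of each column. The index MESSA actually uses is not specified here, and all function names are hypothetical.

```python
import math
from collections import Counter

GAP = "-"


def identity_and_coverage(query, hit):
    """Identity over aligned residue pairs, and fraction of query residues aligned."""
    aligned = [(q, h) for q, h in zip(query, hit) if q != GAP and h != GAP]
    query_len = sum(1 for q in query if q != GAP)
    if not aligned or query_len == 0:
        return 0.0, 0.0
    identity = sum(q == h for q, h in aligned) / len(aligned)
    return identity, len(aligned) / query_len


def filter_hits(query, hits, max_identity=0.9, min_coverage=0.4):
    """Keep hits with less than 90% identity and more than 40% coverage."""
    kept = []
    for hit in hits:
        identity, coverage = identity_and_coverage(query, hit)
        if identity < max_identity and coverage > min_coverage:
            kept.append(hit)
    return kept


def conservation(query, hits):
    """One score per query residue: 1 - normalized Shannon entropy (0 to 1)."""
    rows = [query] + list(hits)
    scores = []
    for col, q in enumerate(query):
        if q == GAP:
            continue  # columns that are gaps in the query are skipped
        residues = [row[col] for row in rows if row[col] != GAP]
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        scores.append(1.0 - entropy / math.log(20))
    return scores


# Made-up alignment rows, for illustration only.
query = "ACDE"
kept = filter_hits(query, ["ACDE", "AGDQ", "G---"])
scores = conservation(query, kept)
```

A score of 1.0 means that every sequence in the filtered alignment carries the same residue at that position; lower scores indicate more variable columns.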
Part II. Function Prediction:
This section contains predicted function annotation, GO terms and EC numbers (if the query is an enzyme). A
confidence level ("very confident", "confident" or "probable") is provided for each prediction.
Annotation Transferred From Closely Related Swiss-Prot Entries
Since annotations in Swiss-Prot database are of high quality, a closely related Swiss-Prot entry could be
used as a reference for function prediction. MESSA detects homologous Swiss-Prot entries to the query,
evaluates their relationship and annotates the query by annotation transfer from potential orthologs in the
Swiss-Prot database. The predicted annotations and their confidence levels assigned by MESSA are listed:
(1) Annotation: the predicted annotation. A "very confident" annotation will be
highlighted in green; a "confident" annotation in yellow green and a "probable" annotation in yellow.
(2) Function Description: a detailed explanation of the function annotation.
(3) Confidence Level: confidence of this prediction assigned by MESSA.
(4) Reference Protein: Swiss-Prot entry used for annotation transfer, linking to the
web page for that entry in the Swiss-Prot database.
Prediction of Gene Ontology Terms
GO terms are the standard representations of protein attributes and they are widely used by researchers.
GO terms can be divided into three sub-ontologies, i.e. Molecular Function (MF), Biological Process (BP) and
Cellular Component (CC). They describe the function of proteins from different aspects. MF terms focus
on the microscopic and molecular level; BP terms imply the macroscopic effect of a protein's function; and CC
terms annotate the subcellular localization of proteins. MESSA predicts the GO terms associated with the
query using the AMIGO server. The AMIGO server uses BLAST to detect homologs of the query in the GO database.
The GO terms associated with these hits, and their parent GO terms, are candidates to transfer to the query.
Their relevance to the query is evaluated by the similarity between the hits and the query, consensus in GO
terms annotated for different hits and the evidence used to assign these GO terms to the hits in the GO database.
The putative GO terms for the query are listed, providing the following information:
(1) GO Term: The ID for GO term, linking to the web page for it in the GO database.
The characters in the brackets ([MF], [BP] or [CC]) indicate which sub-ontology this term belongs to.
MF means molecular function; BP indicates biological process and CC stands for cellular component.
A "very confident" annotation will be highlighted in green; a "confident" annotation in yellow green and a
"probable" annotation in yellow.
(2) Description: detailed explanation of the GO term.
(3) Confidence Level: confidence for this prediction assigned by MESSA.
(4) Parent GO Terms: the parent terms for the predicted GO term. Once a GO term is
assigned to a protein, its parent GO terms are automatically assigned as well. Thus, the probability that these
parent GO terms are associated with the query should be no less than that of the predicted term.
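The rule that a parent term is at least as likely as the predicted term can be illustrated with a small propagation sketch. The dictionaries, the function name and the placeholder term names are assumptions for this example; they are not real GO identifiers and this is not MESSA's implementation.

```python
def propagate_to_parents(scores, parents):
    """Give every ancestor at least the score of its best-scoring descendant.

    scores:  dict mapping a term to its predicted probability
    parents: dict mapping a term to the list of its direct parent terms
    """
    result = dict(scores)
    stack = list(scores.items())
    while stack:
        term, score = stack.pop()
        for parent in parents.get(term, []):
            # Only raise a parent's score, never lower it.
            if result.get(parent, 0.0) < score:
                result[parent] = score
                stack.append((parent, score))
    return result


# Toy hierarchy with placeholder names (not real GO identifiers).
parents = {"child": ["parent"], "parent": ["root"]}
scores = {"child": 0.8, "parent": 0.5}
propagated = propagate_to_parents(scores, parents)
```

In this toy case both "parent" and "root" end up with 0.8, the score of "child", because assigning the child term implies both of them.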
Prediction of Enzyme Commission Number
EC numbers describe the types of reactions enzymes catalyze and they are essential for understanding the
function of proteins in the context of metabolic pathways. MESSA combines three resources to predict whether a
query is an enzyme and, if so, its EC number: EC number transfer
from closely related Swiss-Prot entries, the EFICAz software and the EzyPred server. The predictions from
these three resources are combined to generate the final prediction, together with the confidence of these predictions.
If your query is predicted as an enzyme, the following information will be provided:
(1) EC Number: predicted EC number, linking to the web page about this EC number in the Enzyme
Nomenclature database. A "very confident" annotation will be highlighted in green; a "confident" annotation in
yellow green and a "probable" annotation in yellow.
(2) Description: explanation of the predicted EC number.
(3) Confidence Level: confidence for this prediction assigned by MESSA.
If your query is not likely to be an enzyme, MESSA will output the following statement:
"No EC number assigned to the protein, probably not an enzyme!"
A "very confident" statement will be highlighted in green; a "confident" one in yellow green and a "probable" one
in yellow.
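How the three predictions are combined is not detailed here. Purely as an illustration of one possible scheme, the sketch below reports the most specific EC prefix that at least two sources agree on. The voting rule, the source labels and the function names are all assumptions of this example and should not be read as MESSA's actual procedure.

```python
from collections import Counter


def ec_prefixes(ec):
    """'1.2.3.4' -> ['1', '1.2', '1.2.3', '1.2.3.4']; fields from '-' on are dropped."""
    fields = []
    for field in ec.split("."):
        if field == "-":
            break
        fields.append(field)
    return [".".join(fields[:i]) for i in range(1, len(fields) + 1)]


def combine_ec(predictions, min_votes=2):
    """Return the most specific EC prefix supported by at least min_votes sources.

    predictions: dict mapping a source label to an EC string, or None when that
    source predicts that the query is not an enzyme. Returns None when no
    prefix has enough support.
    """
    votes = Counter()
    for ec in predictions.values():
        if ec:
            votes.update(ec_prefixes(ec))
    supported = [p for p, n in votes.items() if n >= min_votes]
    if not supported:
        return None
    # Prefer the deepest level of the EC hierarchy, then the higher vote count.
    return max(supported, key=lambda p: (p.count("."), votes[p]))


# Source labels are hypothetical; the EC strings are example inputs only.
combined = combine_ec({"swissprot": "3.1.3.48", "eficaz": "3.1.3.16", "ezypred": None})
```

Here two sources agree down to the third level of the hierarchy but disagree on the fourth, so the sketch returns the three-level prefix rather than either full number.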
Part III. Spatial Structure Prediction:
So far, homology modeling is still the most reliable and effective way of structure prediction. MESSA uses several
procedures to find homologous structures and generate 3D structure models for your query.
This section displays the 3D structure models in Jmol for the query if a MODELLER KEY is provided.
Otherwise, templates selected by MESSA, their alignments to the query and confidence levels will be listed as follows:
(1) ID: a combination of the PDB ID and the chain ID, linking to the web page of this entry in PDB.
(2) Alignment Graph: graphic view of the alignment between the query and hit. Gaps in the query
are neglected in this graph.
(3) Confidence Level: the confidence of this predicted template assigned by MESSA.
(4) View Alignment and Template Structure: provide links to the alignment between the query and
this template, and to the PDB file of this template.