MESSA - Meta Server for Sequence Analysis

General Description:

MESSA is a meta server that facilitates protein sequence analysis. It predicts sequence features, structure and
function for a given protein sequence. For an input sequence, the server runs a selection of tools to predict local
sequence properties such as secondary structure, structurally disordered regions, coiled coils, signal peptides and
transmembrane helices; detect homologous proteins and assign the query to related protein families; identify 3D
structure templates and generate structure models; and provide predictive statements about the protein's function,
including functional annotations transferred from close homologs, Gene Ontology terms, enzyme classification and
possible functionally associated proteins.

Submission:

MESSA asks the user to provide a protein sequence and a non-commercial email address to initiate a job.
In addition, we recommend providing the organism to which the input sequence belongs and a MODELLER KEY
to enable the full functionality of MESSA. Details about the inputs are listed below:

Protein Sequence:  (required) 
	The protein sequence should be in FASTA format or plain text. You can paste the sequence into the text area
	or upload it as a file. A valid sequence must contain at least 30 and at most 5000 amino acids.
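	For illustration, the input rules above can be checked locally before submission. The sketch below is a minimal
	Python example that assumes a single FASTA record with a one-line header and accepts any letter as a residue
	code; the helper names `parse_query` and `validate_query` are hypothetical and are not part of MESSA.

```python
import re

MIN_LEN, MAX_LEN = 30, 5000  # length limits stated for MESSA submissions


def parse_query(text: str) -> str:
    """Return the bare residue string from FASTA or plain-text input."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the single FASTA header line
    # Remove whitespace, digits (position numbers) and '*' stop markers.
    return re.sub(r"[\s\d*]", "", "".join(lines)).upper()


def validate_query(text: str) -> str:
    """Raise ValueError unless the input is a single sequence of 30-5000 residues."""
    seq = parse_query(text)
    if not re.fullmatch(r"[A-Z]+", seq):
        raise ValueError("sequence is empty or contains non-letter characters")
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"length {len(seq)} is outside {MIN_LEN}-{MAX_LEN}")
    return seq
```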

Organism Type:  (optional)
	This input is required for signal peptide prediction by SignalP (version 3.0). Specify the type of organism
	your sequence belongs to by selecting "Eukaryote", "Gram-negative bacterium" or "Gram-positive bacterium".
	If none of these three categories applies, select "other" instead; in that case, the signal peptide prediction
	by SignalP is still indicative, but less reliable.

Organism Name: (optional)
	This input is used to test potential orthologous relationships between your query and its homologs
	by the Reciprocal Best Hit method, and to map your sequence to its genomic locus. Type the name of the
	organism to which your sequence belongs, and MESSA will check whether it matches any organism in our
	database of whole genome sequences. The top 10 matches for the name (or partial name) you typed will
	appear in the drop-down menu below, from which you can select the one you want.
	[Warning: as only 10 matching names are shown, you might need to type a few more characters to find
	the name you want.]
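	The name lookup can be pictured as a case-insensitive substring search that returns at most 10 names. MESSA's
	actual matching and ranking rules are not documented here, so the prefix-first ordering below is an assumption,
	and `suggest_organisms` is a hypothetical helper.

```python
from typing import List


def suggest_organisms(query: str, names: List[str], limit: int = 10) -> List[str]:
    """Return up to `limit` organism names containing `query` (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return []
    matches = [name for name in names if q in name.lower()]
    # Assumed ranking: names that start with the query first, then alphabetical.
    matches.sort(key=lambda name: (not name.lower().startswith(q), name))
    return matches[:limit]
```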

MODELLER KEY: (optional)
	If you would like MESSA to generate structure models for your input sequence, you need to provide a
	MODELLER KEY as required by the MODELLER license. A MODELLER KEY can be obtained from the MODELLER website.
	Please note that if we cannot identify any confident structure templates for your query, structure models will
	not be generated, even if you provide this key.

Email Address: (required)
	MESSA will send an email to this address when your job is done. Only a non-commercial email address 
	(ending with ".org" or ".edu") will be accepted.
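	A minimal sketch of the address rule as stated (a domain ending in ".org" or ".edu"), assuming only the final
	domain suffix is checked; the server's real validation may be stricter, and `is_accepted_email` is a
	hypothetical name.

```python
def is_accepted_email(address: str) -> bool:
    """True if the address has a local part and a domain ending in .org or .edu."""
    local, at, domain = address.strip().lower().rpartition("@")
    if not at or not local or "." not in domain:
        return False
    return domain.endswith((".org", ".edu"))
```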

Job Name: (optional)
	You can assign a name for your job. Otherwise, MESSA will assign a random number. 

Explanation of Results (full version):

The full version offers extensive information and is designed for manual analysis of a protein. It presents important 
information from all programs and provides links to the original results. It contains the following eight sections:

Part I. Prediction of Local Sequence Features:
MESSA uses multiple tools to predict local sequence features such as secondary structure, transmembrane helices 
and signal peptides. The result from each predictor is represented as one string reporting each residue's predicted 
status. These strings are all aligned to the original protein sequence for ease of comparison. When the sequence
is too long to fit the width of the web page, a scroll bar appears under these predictions; move it to view the
whole sequence.

    Protein Sequence 
	The input protein sequence in this line is highlighted according to amino acid properties: positively
	charged residues in blue (light blue for partially positive), negatively charged residues in red (pink for
	partially negative), hydrophobic residues in yellow, and small residues without highlighting.

    Secondary Structure (PSIPRED)
	This program predicts 3-state secondary structure: alpha-helix (including 3-turn and 5-turn helices), beta-strand
	and coil (any state other than helix and strand). In the result, "H" represents alpha-helix, "E" stands for
	beta-strand and "c" refers to coil.

    Secondary Structure (SSPRO)
	This program predicts 3-state secondary structure: alpha-helix (including 3-turn and 5-turn helices), beta-strand
	and coil (any state other than helix and strand). In the result, "H" represents alpha-helix, "E" stands for
	beta-strand and "c" refers to coil.
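	Because the PSIPRED and SSPRO strings are aligned to the same sequence, their agreement can be measured
	position by position. The sketch below assumes both strings use the "H"/"E"/"c" alphabet described above and
	compares them case-insensitively; `ss_agreement` is a hypothetical helper, not a MESSA output.

```python
def ss_agreement(pred_a: str, pred_b: str) -> float:
    """Fraction of residues with the same 3-state call in two aligned predictions."""
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("predictions must be non-empty and of equal length")
    same = sum(a.upper() == b.upper() for a, b in zip(pred_a, pred_b))
    return same / len(pred_a)
```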

    Coil and Loop (DISEMBL)
	This program can be considered a secondary structure predictor: residues in neither an alpha-helix nor a
	beta-strand are predicted as coils and loops, which are highlighted in pink.

    Flexible Loop (DISEMBL)
	Loops and coils that are likely to be structurally flexible will be predicted and highlighted in pink.

    Low Complexity Region (SEG)
	Regions with biased amino acid composition are called low-complexity regions. These regions are usually
	disordered, and sometimes fold as alpha-helices. In addition, such regions can cause problems in sequence
	comparison, because similar low-complexity regions can appear in totally unrelated proteins.
	Predicted low-complexity regions are colored in light red.
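	SEG finds such regions by scanning the sequence with a window and measuring the compositional (Shannon)
	entropy of each window. The sketch below implements only that first, trigger stage, using a window of 12
	residues and a trigger complexity of 2.2 bits, which we believe are SEG's defaults; the real program also
	extends and refines the segments (for example with a second, higher threshold), so this is a simplified
	illustration rather than SEG itself.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def low_complexity_mask(seq: str, window: int = 12, trigger: float = 2.2) -> str:
    """Mark with 'x' every residue covered by a window at or below the trigger entropy."""
    mask = ["-"] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= trigger:
            for j in range(i, i + window):
                mask[j] = "x"
    return "".join(mask)
```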

    Disordered Region (IsUnstruct)
	Disordered regions lack stable tertiary structure, and identifying them is useful for structure prediction.
	Predicted disordered regions are highlighted in red.

    Disordered Region (DISOPRED)
	Disordered regions lack stable tertiary structure, and identifying them is useful for structure prediction.
	Predicted disordered regions are highlighted in red.

    Disordered Region (DISEMBL)
	Disordered regions lack stable tertiary structure, and identifying them is useful for structure prediction.
	Predicted disordered regions are highlighted in red.

    Disordered Region (DISPRO)
	Disordered regions lack stable tertiary structure, and identifying them is useful for structure prediction.
	Predicted disordered regions are highlighted in red.

    Transmembrane Helix (TMHMM)
	Proteins with transmembrane helices are located in the membrane (the inner membrane for Gram-negative
	bacteria). These transmembrane proteins typically function as receptors, transporters or components of
	membrane-bound machinery (such as the electron transport chain or the flagellum). This prediction is
	indicative of localization and function. The prediction is reported following these rules:
	(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
	(2) When at least one transmembrane helix is predicted, the characters "o" and "i" represent the topology
	of the transmembrane protein, with "o" standing for the non-cytoplasmic (e.g. periplasmic) side and "i" for
	the cytoplasmic side. When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned
	by the program and is thus meaningless.
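	The per-residue TMHMM string can be turned into a list of helix boundaries. The following Python sketch
	assumes the string uses "H", "i" and "o" exactly as described above and reports 1-based, inclusive positions;
	it returns no N-terminal orientation when no helix is predicted, since the "i"/"o" states are meaningless in
	that case. The function names are hypothetical; the same code applies to the HMMTOP and MEMSAT strings.

```python
import re
from typing import List, Optional, Tuple


def tm_segments(topology: str) -> List[Tuple[int, int]]:
    """1-based, inclusive (start, end) of every run of 'H' in a per-residue string."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"H+", topology)]


def n_terminal_side(topology: str) -> Optional[str]:
    """'i' or 'o' for the N-terminus, or None when no helix is predicted."""
    if not tm_segments(topology):
        return None
    return next((c for c in topology if c in "io"), None)
```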

    Transmembrane Helix (TOPPRED)
	Proteins with transmembrane helices are located in the membrane (the inner membrane for Gram-negative
	bacteria). These transmembrane proteins typically function as receptors, transporters or components of
	membrane-bound machinery (such as the electron transport chain or the flagellum). This prediction is
	indicative of localization and function. The prediction is reported following these rules:
	(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
	(2) Predicted transmembrane helices with low confidence will be marked as "h"; these are usually false
	predictions.
	(3) Residues unlikely to belong to transmembrane helices are marked as "x".

    Transmembrane Helix (HMMTOP)
	Proteins with transmembrane helices are located in the membrane (the inner membrane for Gram-negative
	bacteria). These transmembrane proteins typically function as receptors, transporters or components of
	membrane-bound machinery (such as the electron transport chain or the flagellum). This prediction is
	indicative of localization and function. The prediction is reported following these rules:
	(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
	(2) When at least one transmembrane helix is predicted, the characters "o" and "i" represent the topology
	of the transmembrane protein, with "o" standing for the non-cytoplasmic (e.g. periplasmic) side and "i" for
	the cytoplasmic side. When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned
	by the program and is thus meaningless.

    Transmembrane Helix (MEMSAT)
	Proteins with transmembrane helices are located in the membrane (the inner membrane for Gram-negative
	bacteria). These transmembrane proteins typically function as receptors, transporters or components of
	membrane-bound machinery (such as the electron transport chain or the flagellum). This prediction is
	indicative of localization and function. The prediction is reported following these rules:
	(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
	(2) Predicted transmembrane helices with low confidence will be marked as "h"; these are usually false
	predictions.
	(3) When at least one transmembrane helix is predicted, the characters "o" and "i" represent the topology
	of the transmembrane protein, with "o" standing for the non-cytoplasmic (e.g. periplasmic) side and "i" for
	the cytoplasmic side. When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned
	by the program and is thus meaningless.

    TM Helix and Signal Peptide (MEMSAT_SVM) (TM: Transmembrane)
	Proteins with transmembrane helices are located in the membrane (the inner membrane for Gram-negative
	bacteria). These transmembrane proteins typically function as receptors, transporters or components of
	membrane-bound machinery (such as the electron transport chain or the flagellum). Proteins with signal
	peptides are usually secreted into the extracellular space, or into the periplasmic space of Gram-negative
	bacteria. Secreted proteins play important roles in exchanging materials and information with the environment.
	These predictions are indicative of localization and function. The prediction is reported following these rules:
	(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
	(2) Predicted transmembrane helices with low confidence will be marked as "h"; these are usually false
	predictions.
	(3) When at least one transmembrane helix is predicted, the characters "o" and "i" represent the topology
	of the transmembrane protein, with "o" standing for the non-cytoplasmic (e.g. periplasmic) side and "i" for
	the cytoplasmic side. When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned
	by the program and is thus meaningless.
	(4) A predicted signal peptide will be marked as "S" and highlighted in green.
	(5) Some helices insert partly into the membrane without crossing it (re-entrant helices); these are
	predicted by this program, marked as "R" and highlighted in yellow.

    TM Helix and Signal Peptide (Phobius) (TM: Transmembrane)
	Proteins with transmembrane helices are located in the membrane (the inner membrane for Gram-negative
	bacteria). These transmembrane proteins typically function as receptors, transporters or components of
	membrane-bound machinery (such as the electron transport chain or the flagellum). Proteins with signal
	peptides are usually secreted into the extracellular space, or into the periplasmic space of Gram-negative
	bacteria. Secreted proteins play important roles in exchanging materials and information with the environment.
	These predictions are indicative of localization and function. The prediction is reported following these rules:
	(1) Predicted transmembrane helices will be marked as "H" and highlighted in blue.
	(2) When at least one transmembrane helix is predicted, the characters "o" and "i" represent the topology
	of the transmembrane protein, with "o" standing for the non-cytoplasmic (e.g. periplasmic) side and "i" for
	the cytoplasmic side. When no transmembrane helices are predicted, the "o" or "i" state is randomly assigned
	by the program and is thus meaningless.
	(3) A predicted signal peptide will be marked as "S" and highlighted in green.
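	Since the TMHMM, TOPPRED, HMMTOP, MEMSAT, MEMSAT_SVM and Phobius strings are all aligned to the query, one
	simple way to weigh them against each other is a per-residue vote. MESSA itself does not report such a
	consensus; the sketch below is a hypothetical aid that counts only uppercase "H" calls (so low-confidence "h"
	helices are ignored), and its `min_votes` threshold is an arbitrary choice.

```python
from typing import Dict


def tm_consensus(predictions: Dict[str, str], min_votes: int = 3) -> str:
    """Return 'H' where at least `min_votes` predictors call a confident helix, '-' elsewhere."""
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same sequence")
    (n,) = lengths
    out = []
    for i in range(n):
        # Only uppercase 'H' counts; low-confidence 'h' calls are not votes.
        votes = sum(p[i] == "H" for p in predictions.values())
        out.append("H" if votes >= min_votes else "-")
    return "".join(out)
```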

    Signal Peptide (SignalP HMM Mode) (HMM: Hidden Markov Model)
	Proteins with signal peptides are usually secreted into the extracellular space, or into the periplasmic space
	of Gram-negative bacteria. Secreted proteins play important roles in exchanging materials and information with
	the environment.
	(1) A predicted signal peptide will be marked as "S" and highlighted in green.
	(2) Residues unlikely to belong to a signal peptide are marked as "x".

    Signal Peptide (SignalP NN Mode) (NN: Neural Network)
	Proteins with signal peptides are usually secreted into the extracellular space, or into the periplasmic space
	of Gram-negative bacteria. Secreted proteins play important roles in exchanging materials and information with
	the environment.
	(1) A predicted signal peptide will be marked as "S" and highlighted in green.
	(2) Residues unlikely to belong to a signal peptide are marked as "x".
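	SignalP places the signal peptide at the N-terminus, so the predicted "S" run also marks the cleavage site:
	the mature protein starts right after the last "S". The sketch below assumes the prediction string is aligned
	to the sequence as described above; `split_signal_peptide` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_signal_peptide(seq: str, pred: str) -> Optional[Tuple[str, str]]:
    """Return (signal_peptide, mature_chain), or None if no N-terminal 'S' run is predicted."""
    if len(seq) != len(pred):
        raise ValueError("prediction must be aligned to the sequence")
    n = len(pred) - len(pred.lstrip("S"))  # length of the leading run of 'S'
    if n == 0:
        return None
    return seq[:n], seq[n:]
```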

    Coiled Coils (COILS)
	A coiled coil is a structural motif in which two or more alpha-helices wind around each other like the strands
	of a rope. Detecting coiled coils is useful for 3D structure prediction. Coiled coils predicted by this program
	are marked as "x" and highlighted in yellow.

    Positional Conservation 
	A multiple sequence alignment of confident BLAST hits (keeping hits with less than 90% identity and more than
	40% coverage) is used to calculate the positional conservation index of each residue in the sequence.
	Conserved residues usually play important roles in maintaining the function or structure of a protein.
	Residues are colored from white, through yellow, to dark red as the conservation level increases.
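	The filtering step can be sketched as follows. The text does not specify how MESSA computes its conservation
	index, so this Python example substitutes one minus the normalized Shannon entropy of each alignment column,
	and it assumes each hit has already been aligned to the query (one character per query position, "-" for a
	gap). The `Hit` record and function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    identity: float  # fraction of identical residues, 0-1
    coverage: float  # fraction of the query covered by the alignment, 0-1
    aligned: str     # hit residue at each query position ('-' for a gap)


def keep_for_conservation(hits: List[Hit]) -> List[Hit]:
    """Apply the stated filter: identity below 90% and coverage above 40%."""
    return [h for h in hits if h.identity < 0.90 and h.coverage > 0.40]


def column_conservation(query: str, hits: List[Hit]) -> List[float]:
    """Per-position score 1 - entropy / log2(20); 1.0 means fully conserved."""
    rows = [query] + [h.aligned for h in keep_for_conservation(hits)]
    scores = []
    for i in range(len(query)):
        column = [row[i] for row in rows if row[i] != "-"]
        n = len(column)
        entropy = -sum(c / n * math.log2(c / n) for c in Counter(column).values())
        scores.append(1 - entropy / math.log2(20))
    return scores
```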



Part II. Close Homologs for Annotation Transfer:

    Close Homologs in the Swiss-Prot Database Detected by BLAST
	Since annotations in the Swiss-Prot database are of high quality, a close homolog in this section (for example,
	a reciprocal best hit of the query or a hit that shares more than 40% sequence identity with the query) can be
	used for annotation transfer. The top 10 confident hits (E-value cutoff: 0.001) detected by BLAST in the
	Swiss-Prot database are shown in this section.
	A summary table of these hits is on top, containing the following information:
	    (1) ID: The Swiss-Prot ID for the hit, linking to the web page for this protein in the Swiss-Prot database.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about the
	    hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 30 characters in the annotation of the hit (not a complete annotation).
	    We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
	    (5) RBH(Q2H): Reciprocal Best Hit, Query to Hit. Indicates whether the hit is the protein most similar to
	    the query (measured by BLAST) among all proteins in the hit's organism. This measurement is not
	    available (N/A) when the hit is from an organism without an available whole genome sequence.
	    (6) RBH(H2Q): Reciprocal Best Hit, Hit to Query. Indicates whether the query is the protein most similar
	    to the hit (measured by BLAST) among all proteins in the query's organism. This measurement is
	    not available (N/A) when the query is from an organism without an available whole genome sequence.
	    (7) Q cover: BLAST alignment coverage for the Query.
	    (8) H cover: BLAST alignment coverage for the Hit.
	    (9) Identity: Identity between the query and the hit over the aligned region.
	    (10) E-value: E-value from BLAST, i.e. the number of hits with at least this level of similarity that would
	    be expected by chance in a random database of the same size as the Swiss-Prot database.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the Swiss-Prot ID, Swiss-Prot name and complete Swiss-Prot annotation of the 
	    hit, formatted as the sequence headers in BLAST database. 
	    (2) The alignment between the query and the hit
	    (3) The function description, assigned Enzyme Commission number of the hit (if any), and the organism 
	    from which this hit comes.
	(2) and (3) are hidden for most hits by default; they can be retrieved by clicking the "show" button on
	the right of the hit header.
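	The two RBH columns can be expressed as two best-hit lookups. In this sketch, `scores` is a hypothetical table
	of BLAST bit scores keyed by (searched protein, database protein), and a proteome of `None` stands for an
	organism without a whole genome sequence, which yields N/A. Ties are broken arbitrarily (by protein ID), which
	is a simplification.

```python
from typing import Dict, Optional, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (searched protein, database protein) -> bit score


def best_hit(scores: Scores, source: str, proteome: Set[str]) -> Optional[str]:
    """Highest-scoring protein of `proteome` when `source` is searched against it."""
    hits = [(s, target) for (q, target), s in scores.items()
            if q == source and target in proteome]
    return max(hits)[1] if hits else None


def rbh_flags(scores: Scores, query: str, hit: str,
              query_proteome: Optional[Set[str]],
              hit_proteome: Optional[Set[str]]) -> Tuple[Optional[bool], Optional[bool]]:
    """Return (RBH(Q2H), RBH(H2Q)); None plays the role of N/A."""
    q2h = None if hit_proteome is None else best_hit(scores, query, hit_proteome) == hit
    h2q = None if query_proteome is None else best_hit(scores, hit, query_proteome) == query
    return q2h, h2q
```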
  
    Close Homologs in the NCBI Non-Redundant Database Detected by BLAST
	This section contains the closest sequences to your query available in the Non-Redundant (NR) database. 
	The annotations of these proteins are not as reliable as those of Swiss-Prot entries. However, if the
	annotations of the best hits from NR match those from Swiss-Prot, this provides additional evidence for
	annotation transfer. The top 10 confident hits (E-value cutoff: 0.001) from the NR database are shown.
	A summary table of them is on top, containing the following information:
	    (1) GI: The unique ID for the hit in the NCBI database, linking to the page for this protein in NCBI.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about
	    the hit that are shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters of the hit annotation in NR database (not a complete annotation).
	    We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
	    (5) Q cover: BLAST alignment coverage for the Query.
	    (6) H cover: BLAST alignment coverage for the Hit.
	    (7) Identity: Identity between the query and the hit over the aligned region.
	    (8) E-value: E-value from BLAST, i.e. the number of hits with at least this level of similarity that would
	    be expected by chance in a random database of the same size as the NR database.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the GI and the full definition of the hit in the NR database.
	    (2) The alignment between the query and the hit.
	    (3) The organism from which this sequence comes and the lineage of this organism.
	(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on 
	the right of the hit headers.
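	The Q cover, H cover and E-value columns in these tables can be reproduced from BLAST alignment coordinates.
	The sketch below assumes coverage is the aligned span divided by the sequence length (MESSA may instead count
	only aligned, non-gap residues) and applies the 0.001 cutoff and top-10 limit mentioned above; the `Hsp`
	record is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hsp:
    q_start: int  # 1-based, inclusive alignment bounds on the query
    q_end: int
    h_start: int  # 1-based, inclusive alignment bounds on the hit
    h_end: int
    q_len: int    # full query length
    h_len: int    # full hit length
    evalue: float


def coverage(h: Hsp) -> Tuple[float, float]:
    """(Q cover, H cover) as fractions of the query and hit lengths."""
    return ((h.q_end - h.q_start + 1) / h.q_len,
            (h.h_end - h.h_start + 1) / h.h_len)


def confident_hits(hsps: List[Hsp], cutoff: float = 1e-3, top: int = 10) -> List[Hsp]:
    """Keep hits at or below the E-value cutoff, best first, at most `top`."""
    return sorted((h for h in hsps if h.evalue <= cutoff), key=lambda h: h.evalue)[:top]
```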



Part III. Prediction of Gene Ontology (GO) Terms
    Close Homologs with Gene Ontology terms Detected by BLAST
	GO terms are the standard representations of protein attributes and are widely used by researchers.
	MESSA predicts the GO terms associated with the query using the AmiGO server, which uses BLAST to detect
	homologs of the query in the GO database. These homologs, if close enough to the query, can be used as
	sources for GO term transfer. The top 10 closest homologs (E-value cutoff: 0.001) in the GO database
	detected by AmiGO are shown.
	A summary table of them is on top, containing the following information:
	    (1) ID: The ID for the hit in the GO database, linking to the web page for it in the GO database.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about 
	    the hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters in the annotation of the hit. This is not a complete annotation. We
	    use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
	    (5) Q cover: BLAST alignment coverage for the Query.
	    (6) H cover: BLAST alignment coverage for the Hit.
	    (7) Identity: Identity between the query and the hit over the aligned region.
	    (8) E-value: E-value from BLAST, i.e. the number of hits with at least this level of similarity that would
	    be expected by chance in a random database of the same size as the GO database.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the GO database.
	    (2) The alignment between the query and the hit.
	    (3) The GO terms associated with this hit. 
	(2) and (3) are hidden for most hits by default; they can be retrieved by clicking the "show" button on
	the right of the hit header.
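	To pool GO evidence across the hits listed in this section, one option is to count how many sufficiently close
	homologs carry each term. This is not a documented MESSA step: the 40% identity threshold echoes the
	annotation-transfer example in Part II, the 70% query-coverage threshold is a hypothetical choice, and each hit
	is assumed to be a dictionary with "identity", "q_cover" and "go_terms" keys.

```python
from collections import Counter
from typing import Dict, List, Tuple


def go_term_support(hits: List[Dict], min_identity: float = 0.4,
                    min_q_cover: float = 0.7) -> List[Tuple[str, int]]:
    """Rank GO terms by how many sufficiently close hits carry them."""
    support: Counter = Counter()
    for hit in hits:
        if hit["identity"] >= min_identity and hit["q_cover"] >= min_q_cover:
            support.update(set(hit["go_terms"]))  # count each term once per hit
    return support.most_common()
```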



Part IV. Prediction of Enzyme Commission (EC) Number
EC numbers describe the types of reactions enzymes catalyze and they are essential for understanding the function 
of proteins in the context of metabolic pathways. This section contains EC number predictions by three methods, as 
introduced below:

    EC Number Prediction by Annotation Transfer from Swiss-Prot Entries
	MESSA selects close homologs from all confident BLAST hits (E-value cutoff: 0.001) in the Swiss-Prot
	database. These homologs can serve as sources for transferring their assigned EC numbers to the query.
	For each of them, the following information is provided:
	    (1) ID: Swiss-Prot ID for the entry, linking to the web page for it in the Swiss-Prot database.
	    (2) Name: name of the entry in Swiss-Prot database, usually composed of the short name of
	    the protein and the short name of the organism the protein comes from.
	    (3) Annotated EC number: The EC number of this entry annotated by Swiss-Prot curators.
	    (4) Identity: Sequence identity between this entry and the query, calculated from the BLAST alignment.
	    (5) Query coverage: The coverage of BLAST alignment for the query.
	    (6) Hit coverage: The coverage of BLAST alignment for this Swiss-Prot entry.
	    (7) RBH(Q2H): Reciprocal Best Hit, Query to Hit. Indicates whether this Swiss-Prot entry is the most 
	    similar to (measured by BLAST) the query among all proteins from the same organism as this entry. 
	    This measurement is not available (N/A) when the entry is from an organism without available whole 
	    genome sequence.
	    (8) RBH(H2Q): Reciprocal Best Hit, Hit to Query. Indicates whether the query is the most similar to
	    (measured by BLAST) this Swiss-Prot entry among all proteins from the same organism as the query.
	    This measurement is not available (N/A) when the query is from an organism without an available whole
	    genome sequence.

    EC Number Prediction by Ezypred Server
	Ezypred first predicts whether the query is an enzyme. If it is, Ezypred predicts the first two levels of its
	EC number (such as 1.1.-.-). For the prediction at each level, the predicted EC number and its explanation
	are listed.

    EC Number Prediction by EFICAz Software
	EFICAz predicts whether the query is an enzyme. If it is, EFICAz tries to predict all four levels of its EC
	number (such as 1.1.1.1). MESSA lists the predicted EC number, the description of this EC number and the
	confidence of the prediction provided by EFICAz.
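	The three methods in this part report EC numbers of different depth: Ezypred gives two levels (such as
	1.1.-.-), EFICAz up to four, and Swiss-Prot entries carry curated numbers. To compare them, one can count how
	many leading levels agree, treating "-" as unspecified. The helpers below are a hypothetical sketch, not part of
	MESSA.

```python
from typing import List


def ec_levels(ec: str) -> List[str]:
    """Split an EC number such as '1.1.-.-' into its four levels."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"not a four-level EC number: {ec!r}")
    return parts


def shared_levels(ec_a: str, ec_b: str) -> int:
    """Number of leading levels on which two EC numbers agree ('-' counts as unknown)."""
    n = 0
    for a, b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if a == "-" or b == "-" or a != b:
            break
        n += 1
    return n
```

	For example, an Ezypred call of 1.1.-.- and an EFICAz call of 1.1.1.1 agree on two levels.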



Part V. Prediction of functionally associated proteins
This section lists proteins that are likely to function together with the query. The prediction mostly relies on the 
STRING database that assigns functional association between proteins by multiple criteria, such as physical 
interaction, expression pattern and genomic context. This information is most helpful for bacterial queries.

    Genomic Context
	When the query comes from a user-specified organism with a complete genome sequence available, MESSA
	will map it to its genomic locus and provide a link, namely "View genomic context of your protein in NCBI", to
	the NCBI Gene database to show the genomic context of the query.

    Functionally Associated Proteins Detected by STRING:
	STRING assigns functionally associated proteins to a query by finding its close homolog in the STRING
	database and transferring the associations of that homolog to the query. Predictions are available only
	when a close homolog is present in the STRING database. MESSA presents the results in a format similar
	to that of STRING.
	First, the closest homolog to the query in the STRING database, used for association transfer, is shown.
	We suggest that you verify whether it is an ortholog of your query; if not, the results may not be meaningful.
	Second, the "Predicted Functional Partners" are listed in a table, providing the following information:
	    (1) The name and annotation of the partner.
	    (2) The evidence that suggests an association between this protein and your query. A dot appears under
	    each criterion listed in the table header that is used as evidence for the functional association. The
	    color of the dot indicates the confidence of the evidence: black for confident, grey for less confident.
	    (3) The overall confidence score for this predicted association. 
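	STRING's published scoring scheme treats its evidence channels as independent and combines them roughly as
	1 - prod(1 - s_i), after first correcting each channel score for the probability of a random association. The
	sketch below follows that idea in simplified form; the `prior` argument defaults to 0 because the exact prior
	depends on the STRING version, so treat any value you pass in as an assumption, and `combined_score` as a
	hypothetical helper rather than STRING's code.

```python
from typing import List


def combined_score(channel_scores: List[float], prior: float = 0.0) -> float:
    """Combine per-channel scores (0-1) assuming independent evidence."""
    remaining = 1.0  # probability that no channel indicates a real association
    for s in channel_scores:
        s_corr = max(0.0, (s - prior) / (1 - prior))  # remove the random-association prior
        remaining *= 1 - s_corr
    combined = 1 - remaining
    return combined + prior * (1 - combined)  # add the prior back once
```

	With a single channel the combined score equals that channel's score; each additional independent channel
	can only raise it.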

 

Part VI. Conserved Domains and Related Protein Families:
Proteins usually contain one or several conserved sequence domains. Identifying these domains assigns the query
sequence to a protein family and allows relevant functional information about these domains to be applied to
the query.

    Conserved Domains Detected by RPS-BLAST
	Confident homologs (E-value cutoff: 0.005) in the Conserved Domain Database (CDD) are shown. In addition 
	to its own domain library, CDD also contains domains and protein families from the Pfam, COG, KOG, SMART 
	and TIGRFAM databases.
	A summary table of them is on top, containing the following information:
	    (1) ID: The ID for the hit in the CDD, linking to the web page of this conserved domain in the CDD.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to details about 
	    the hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). 
	    We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
	    (5) E-value: E-value from RPS-BLAST, i.e. the number of hits with at least this level of similarity that
	    would be expected by chance in a random database of the same size as CDD.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the CDD.
	    (2) The alignment between the query and the hit.
	    (3) The detailed description of this domain or protein family. 
	(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on 
	the right of the hit header.

    Conserved Domains Detected by HHsearch
	The confident homologous domains and protein families detected by HHsearch (probability cutoff: 90%) in 
	the CDD are shown. 
	A summary table of them is on top, containing the following information:
	    (1) ID: The ID for the hit in the CDD, linking to the web page of this conserved domain in the CDD.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details 
	    about the hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). 
	    We use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
	    (5) Probability: HHsearch probability, indicating the likelihood of the hit being a true homolog.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the CDD.
	    (2) The alignment between the query and the hit.
	    (3) The description of this domain or protein family in CDD. 
	(2) and (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button 
	on the right of the hit header.



Part VII. Homologous Structure Templates:
So far, homology modeling is still the most reliable and effective way of structure prediction. MESSA uses several
procedures to find homologous structures. Because these procedures differ in sensitivity and accuracy, combining them
increases the chance of finding reliable templates for homology-based structure prediction.

    Structure Templates Detected by BLAST
	Confident homologs (e-value cutoff: 0.001) detected by BLAST in the Protein Data Bank (PDB) are shown. 
	A summary table of them is on top, containing the following information:
	    (1) ID: The ID for the hit, a combination of the PDB ID and the chain ID. It links to the web page of
	    this entry in the PDB.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
	    about the hit that are shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). We 
	    use this truncated annotation just to provide a brief idea of the hit, and the full annotation is listed below.
	    (5) E-value: E-value from BLAST, i.e. the number of hits with at least this level of similarity that would
	    be expected by chance in a random database of the same size as PDB.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the PDB.
	    (2) The alignment between the query and the hit.
	    (3) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon 
	    diagram and colored in rainbow from the N- to C-terminus.
	(2) and (3) are hidden by default for most hits and they can be easily retrieved by clicking the "show" button 
	on the right of the hit header.

    Structure Templates Detected by RPS-BLAST
	Confident homologs (E-value cutoff: 0.001) detected by RPS-BLAST in PDB70 (the PDB filtered to at most
	70% sequence identity) are shown.
	A summary table of them is on top, containing the following information:
	    (1) ID: The ID for the hit in the PDB, a combination of the PDB ID and the chain ID. It links to the web
	    page for this entry in the PDB.
	    (2) Alignment Graph: graphic view of the alignment between the query and hit, linking to the details
	    about the hit that are shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters in the annotation of the hit (not a complete annotation). We 
	    use this truncated annotation just to provide a brief idea, and the full annotation is listed below.
	    (5) E-value: E-value from RPS-BLAST, i.e. the number of hits with at least this level of similarity that
	    would be expected by chance in a random database of the same size as PDB70.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the PDB.
	    (2) The alignment between the query and the hit.
	    (3) The 3D structure of the regions in the hit that are aligned to the query. It is displayed in Jmol as ribbon 
	    diagram and colored in rainbow from N- to C-terminus.
	(2), (3) are hidden for most hits by default and they can be easily retrieved by clicking the "show" button on 
	the right of the hit header.
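	As a hedged illustration of the E-value cutoff above, the sketch below filters hits from BLAST-style tabular
	output (the 12 default columns of BLAST+ `-outfmt 6`) at an E-value of 0.001. It keeps only the most
	significant alignment per hit, because a summary table lists each hit once. MESSA's internal parsing is not
	described here. The input format, the inclusive cutoff (`<=`) and the one-row-per-hit rule are assumptions,
	and the hit IDs in the example are placeholders.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    subject: str
    evalue: float
    qstart: int
    qend: int


def confident_hits(lines, evalue_cutoff=1e-3):
    """Keep hits with E-value <= cutoff (assumed inclusive), one row per subject,
    sorted from most to least significant."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hit = Hit(row["sseqid"], float(row["evalue"]),
                  int(row["qstart"]), int(row["qend"]))
        if hit.evalue > evalue_cutoff:
            continue
        # A subject can have several alignments (HSPs); keep the most significant one.
        if hit.subject not in best or hit.evalue < best[hit.subject].evalue:
            best[hit.subject] = hit
    return sorted(best.values(), key=lambda h: h.evalue)


example = [
    "query\t1abcA\t45.0\t120\t60\t2\t5\t124\t1\t118\t2e-30\t110",
    "query\t2xyzB\t30.0\t80\t50\t3\t10\t89\t20\t99\t0.5\t30",
    "query\t1abcA\t40.0\t50\t25\t1\t200\t249\t300\t349\t1e-5\t45",
]
for h in confident_hits(example):
    print(h.subject, h.evalue, h.qstart, h.qend)
```

	The second row fails the cutoff, and the third row is a weaker alignment to a hit that is already kept.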

    Structure Templates Detected by HHsearch
	Confident structure templates detected by HHsearch (probability cutoff: 80%) in PDB70 are shown.
	A summary table at the top contains the following information:
	    (1) ID: the ID of the hit in the PDB, a combination of the PDB ID and the chain ID. The ID links to the 
	    web page of this entry in the PDB.
	    (2) Alignment Graph: graphic view of the alignment between the query and the hit, linking to the details 
	    about the hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters of the hit's annotation (not the complete annotation). This 
	    truncated annotation gives only a brief idea of the hit; the full annotation is listed below.
	    (5) Probability: the probability reported by HHsearch that the hit is a true homolog of the query.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the PDB.
	    (2) The alignment between the query and the hit.
	    (3) The 3D structure of the regions in the hit that are aligned to the query, displayed in Jmol as a ribbon 
	    diagram colored in rainbow from the N- to the C-terminus.
	(2) and (3) are hidden by default for most hits and can be displayed by clicking the "show" button 
	on the right of the hit header.
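	The note that gaps in the query are neglected can be made concrete with a small sketch. Alignment columns
	with a gap in the query have no query coordinate and are skipped, so insertions in the hit do not stretch
	the graph. The function names, the 1-based `query_start` argument and the text rendering are illustrative
	assumptions, not MESSA's code.

```python
def query_coverage(query_aln: str, hit_aln: str, query_start: int) -> dict:
    """Map an aligned query/hit pair onto 1-based query positions.

    Returns {query_position: True if a hit residue is aligned there}.
    Columns with a gap ('-') in the query are skipped (neglected).
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned sequences must have equal length")
    covered = {}
    pos = query_start
    for q, h in zip(query_aln, hit_aln):
        if q == "-":
            continue  # insertion in the hit: no query coordinate
        covered[pos] = h != "-"
        pos += 1
    return covered


def render_bar(covered: dict, query_length: int) -> str:
    """Text version of an alignment graph: '#' aligned, '.' not aligned."""
    return "".join("#" if covered.get(i, False) else "."
                   for i in range(1, query_length + 1))


cov = query_coverage("MK-LVA", "MKQL-A", query_start=10)
print(render_bar(cov, query_length=15))
```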



Part VIII. Homologous Structure Domains:
Structure domains are the folding and evolutionary units of proteins, and the presence of a particular structure domain 
often implies a particular function. Identifying such structural units in a protein not only reveals the architecture of 
the protein but also provides hints about its function.

    Structure Domains Detected by RPS-BLAST
	Confident homologs detected by RPS-BLAST (E-value cutoff: 0.001) in the Structural Classification of 
	Proteins database filtered at 70% identity (SCOP70) are shown.
	A summary table at the top contains the following information:
	    (1) ID: the ID of the hit in the SCOP database.
	    (2) Alignment Graph: graphic view of the alignment between the query and the hit, linking to the details
	    about the hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters of the hit's annotation (not the complete annotation).
	    This truncated annotation gives only a brief idea of the hit; the full annotation is listed below.
	    (5) E-value: the E-value reported by RPS-BLAST, i.e. the number of hits with at least this level 
	    of similarity expected by chance in a random database of the same size as SCOP70.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the SCOP database.
	    (2) The classification of the domain in SCOP.
	    (3) The alignment between the query and the hit.
	    (4) The 3D structure of the regions in the hit that are aligned to the query, displayed in Jmol as a ribbon 
	    diagram colored in rainbow from the N- to the C-terminus.
	(2), (3) and (4) are hidden by default for most hits and can be displayed by clicking the "show" 
	button on the right of the hit header.

    Structure Domains Detected by HHsearch
	Confident homologs detected by HHsearch (probability cutoff: 90%) in SCOP70 are shown.
	A summary table at the top contains the following information:
	    (1) ID: the ID of the hit in the SCOP database.
	    (2) Alignment Graph: graphic view of the alignment between the query and the hit, linking to the details 
	    about the hit shown below. Gaps in the query are neglected in this graph.
	    (3) Length: the total sequence length of the hit.
	    (4) Definition: the first 40 characters of the hit's annotation (not the complete annotation).
	    This truncated annotation gives only a brief idea of the hit; the full annotation is listed below.
	    (5) Probability: the probability reported by HHsearch that the hit is a true homolog of the query.
	More information about each hit is listed below the summary, including:
	    (1) Hit header: the ID and the full definition of the hit in the SCOP database.
	    (2) The classification of the domain in SCOP.
	    (3) The alignment between the query and the hit.
	    (4) The 3D structure of the regions in the hit that are aligned to the query, displayed in Jmol as a ribbon 
	    diagram colored in rainbow from the N- to the C-terminus.
	(2), (3) and (4) are hidden by default for most hits and can be displayed by clicking the "show" 
	button on the right of the hit header.

Explanation of Results (summary version):

By integrating results from different methods, we generate consensus-based final predictions of local sequence 
features, three-dimensional structure and function. These predictions are presented on a summary page with 
three sections:

Part I. Local Sequence Features Prediction:
This section contains predictions of secondary structure, disordered regions, transmembrane helices, signal peptides, 
coiled coils and positional conservation indices. Except for the last two, each prediction is based on the consensus 
of multiple predictors. Each prediction is shown as a string giving the predicted state of every residue, and these 
strings are aligned to the original protein sequence for ease of comparison. When the sequence is too long to fit the 
width of the web page, a scroll bar appears under the predictions, and you can move it to view the whole sequence.
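        The consensus step is described only in outline. A minimal sketch follows. It assumes a simple per-residue
        majority vote over predictor strings that are already aligned to the query, with ties resolved to a
        fallback state. Both the voting rule and the tie rule are assumptions, not MESSA's documented method.

```python
from collections import Counter


def consensus(predictions, tie_state="c"):
    """Per-residue majority vote over prediction strings aligned to the query.

    Ties fall back to `tie_state` (an assumed rule; here coil).
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions of equal length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_state, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result.append(tie_state if tied else top_state)
    return "".join(result)


predictors = ["HHHEc", "HHcEE", "HcHEc"]  # three hypothetical predictor outputs
print(consensus(predictors))
```

        For other per-residue predictions, such as disorder or transmembrane helices, a different fallback state
        (for example "x") would be the natural tie value.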

    Protein Sequence 
        The input protein sequence is shown in this line, with residues highlighted by amino acid property: 
        positively charged residues in blue (light blue for partially positive), negatively charged residues in red (pink 
        for partially negative) and hydrophobic residues in yellow. Small residues are not highlighted.

    Secondary Structure (Consensus)
        The secondary structure of the query is predicted in three states: alpha-helix (including 3-turn and 5-turn 
        helices), beta-strand, and coil (any conformation other than helix or strand). In the result, "H" 
        represents alpha-helix, "E" stands for beta-strand and "c" refers to coil.
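        The three states are commonly derived from the eight DSSP states. In DSSP, G is the 3-turn (3-10) helix,
        H the 4-turn (alpha) helix and I the 5-turn (pi) helix. The sketch below applies the grouping described
        above. Assigning the isolated bridge B to coil rather than strand is an assumption, since some reduction
        schemes map it to strand. This is also not necessarily how the underlying predictors were trained.

```python
# DSSP states: H (4-turn/alpha helix), G (3-turn/3-10 helix), I (5-turn/pi helix),
# E (extended strand), B (isolated bridge), T (turn), S (bend), ' ' or '-' (none).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E"}


def reduce_to_three_states(dssp: str) -> str:
    """Map a DSSP 8-state string to H/E/c; every unlisted state becomes coil 'c'."""
    return "".join(EIGHT_TO_THREE.get(state, "c") for state in dssp)


print(reduce_to_three_states("HHHGGGIIEEBTS -"))
```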

    Disordered Region (Consensus)
        Disordered regions lack a stable tertiary structure, and identifying them is useful for structure prediction. 
        The predicted disordered regions are highlighted in red.

    Transmembrane Helix (Consensus)
        Proteins with transmembrane helices are located in the membrane (the inner membrane in Gram-negative 
        bacteria). Such proteins typically function as receptors, transporters or components of membrane-bound 
        machinery (such as the electron transport chain or the flagellum), so this prediction is informative about 
        both localization and function. The prediction is reported following these rules:
        (1) Predicted transmembrane helices are marked as "H" and highlighted in blue. 
        (2) Residues unlikely to belong to transmembrane helices are marked as "x". 

    Signal Peptide (Consensus) 
        Proteins with signal peptides are usually secreted into the extracellular space (or, in Gram-negative 
        bacteria, into the periplasmic space). They play important roles in exchanging materials and information 
        with the environment. The prediction is reported following these rules:
        (1) A predicted signal peptide is marked as "S" and highlighted in green.
        (2) Residues unlikely to belong to a signal peptide are marked as "x".
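        Because both predictions are per-residue strings, the predicted segments can be read off as runs of the
        marker symbol ("H" for transmembrane helices, "S" for signal peptides). The short sketch below assumes you
        have copied the prediction line as plain text. The helper name is hypothetical.

```python
import re


def segments(prediction: str, symbol: str) -> list:
    """Return 1-based, inclusive (start, end) ranges of consecutive `symbol` residues."""
    pattern = re.escape(symbol) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, prediction)]


tm_line = "xxHHHHxxxHHx"
print(segments(tm_line, "H"))       # transmembrane helix ranges
print(len(segments(tm_line, "H")))  # number of predicted helices
print(segments("SSSSxxxx", "S"))    # signal peptide range
```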

    Coiled Coils (COILS)
        A coiled coil is a structural motif in which two or more alpha-helices wind around each other like the 
        strands of a rope. Detecting coiled coils is useful for 3D structure prediction. Coiled coils predicted by 
        COILS are marked as "x" and highlighted in yellow.

    Positional Conservation 
        A multiple sequence alignment of confident BLAST hits, filtered to those with less than 90% identity and 
        more than 40% coverage, is used to calculate the positional conservation index of each residue in the 
        sequence. Conserved residues usually play important roles in maintaining the function or structure of a 
        protein. Residues are highlighted from white through yellow to dark red as the conservation level increases.
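        The conservation index itself is not specified above. The sketch below substitutes a common entropy-based
        score: 1 minus the Shannon entropy of each alignment column, normalized by log2(20). It computes this score
        after applying the two filters. The sketch assumes each hit has already been placed on query coordinates
        ('-' where unaligned) and that identity and coverage are measured against the query. These are
        assumptions, and MESSA's actual index may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def identity_and_coverage(query: str, row: str):
    """Identity and coverage of one hit row relative to the query.

    `row` holds the hit residues placed on query positions, '-' where unaligned.
    """
    aligned = [(q, r) for q, r in zip(query, row) if r != "-"]
    coverage = len(aligned) / len(query)
    identity = sum(q == r for q, r in aligned) / len(aligned) if aligned else 0.0
    return identity, coverage


def conservation(query: str, rows, max_identity=0.9, min_coverage=0.4):
    """Entropy-based conservation per query position (1 = invariant, lower = variable)."""
    if any(len(r) != len(query) for r in rows):
        raise ValueError("every row must be placed on query coordinates")
    kept = [query]
    for row in rows:
        identity, coverage = identity_and_coverage(query, row)
        if identity < max_identity and coverage > min_coverage:
            kept.append(row)
    scores = []
    for i in range(len(query)):
        column = [r[i] for r in kept if r[i] in AMINO_ACIDS]  # gaps are ignored
        n = len(column)
        if n == 0:
            scores.append(0.0)
            continue
        entropy = -sum(c / n * math.log2(c / n) for c in Counter(column).values())
        scores.append(1.0 - entropy / math.log2(20))
    return scores


query = "ACDE"
hits = ["ACDE", "AC--", "AWDK", "A---"]  # only "AWDK" passes both filters
print([round(s, 3) for s in conservation(query, hits)])
```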



Part II. Function Prediction:
This section contains predicted functional annotations, GO terms and EC numbers (if the query is predicted to be an 
enzyme). A confidence level ("very confident", "confident" or "probable") is provided for each prediction.

    Annotation Transferred From Closely Related Swiss-Prot Entries 
        Because annotations in the Swiss-Prot database are of high quality, a closely related Swiss-Prot entry can 
        serve as a reference for function prediction. MESSA detects Swiss-Prot entries homologous to the query, 
        evaluates their relationship to it and annotates the query by transferring annotations from potential 
        orthologs in Swiss-Prot. The predicted annotations and the confidence levels assigned by MESSA are listed:
            (1) Annotation: the predicted annotation. A "very confident" annotation is highlighted in green, a 
            "confident" annotation in yellow green and a "probable" annotation in yellow.
            (2) Function Description: a detailed explanation of the function annotation.
            (3) Confidence Level: the confidence of this prediction assigned by MESSA.
            (4) Reference Protein: the Swiss-Prot entry used for annotation transfer, linking to the 
            web page for that entry in the Swiss-Prot database.
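        One common way to flag a homolog as a potential ortholog is the Reciprocal Best Hit (RBH) test, which the
        Submission section mentions for the Organism Name input. Under this test, the query's best hit in the
        reference set must in turn have the query as its best hit in the query's organism. The minimal sketch
        below assumes that best hits are chosen by bit score. The protein IDs and scores in the example are made up.

```python
def best_hits(scores):
    """scores: iterable of (source, target, bit_score); returns {source: best target}."""
    best = {}
    for source, target, bits in scores:
        if source not in best or bits > best[source][1]:
            best[source] = (target, bits)
    return {source: target for source, (target, _) in best.items()}


def is_reciprocal_best_hit(query_id, forward, reverse):
    """forward: query-organism protein -> best reference hit;
    reverse: reference protein -> best hit in the query organism."""
    target = forward.get(query_id)
    return target is not None and reverse.get(target) == query_id


forward = best_hits([("myQuery", "REF1", 300.0), ("myQuery", "REF2", 250.0)])
reverse_ok = best_hits([("REF1", "myQuery", 320.0), ("REF1", "paralogX", 310.0)])
reverse_bad = best_hits([("REF1", "myQuery", 290.0), ("REF1", "paralogX", 310.0)])
print(is_reciprocal_best_hit("myQuery", forward, reverse_ok))   # REF1 points back
print(is_reciprocal_best_hit("myQuery", forward, reverse_bad))  # REF1 prefers a paralog
```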

    Prediction of Gene Ontology Terms
        GO terms are standard representations of protein attributes and are widely used by researchers. They are 
        divided into three sub-ontologies, Molecular Function (MF), Biological Process (BP) and Cellular Component 
        (CC), which describe protein function from different aspects: MF terms describe activities at the molecular 
        level, BP terms describe the larger biological processes to which a protein's function contributes, and CC 
        terms describe the subcellular localization of proteins. MESSA predicts the GO terms associated with the 
        query using the AmiGO server, which uses BLAST to detect homologs of the query in the GO database. 
        The GO terms associated with these hits, and their parent GO terms, are candidates for transfer to the query. 
        Their relevance to the query is evaluated from the similarity between the hits and the query, the consensus 
        among GO terms annotated for different hits and the evidence used to assign these GO terms to the hits in 
        the GO database. The putative GO terms for the query are listed with the following information:
            (1) GO Term: the ID of the GO term, linking to its web page in the GO database. 
            The characters in brackets ([MF], [BP] or [CC]) indicate which sub-ontology the term belongs to: 
            MF means molecular function, BP biological process and CC cellular component.
            A "very confident" annotation is highlighted in green, a "confident" annotation in yellow green and a 
            "probable" annotation in yellow.
            (2) Description: a detailed explanation of the GO term.
            (3) Confidence Level: the confidence of this prediction assigned by MESSA.
            (4) Parent GO Terms: the parent terms of the predicted GO term. Because a protein annotated with a GO 
            term is implicitly annotated with all of that term's parents, the probability that a parent term 
            applies to the query is no lower than that of the predicted term.
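        The parent-term rule above, often called the true path rule, can be sketched as upward propagation through
        the GO graph, in which every ancestor of a predicted term receives at least that term's probability. The
        term IDs below are placeholders rather than real GO identifiers, and the graph is a toy. MESSA's own
        scoring is not reproduced here.

```python
def propagate_to_ancestors(predicted: dict, parents: dict) -> dict:
    """predicted: {term: probability}; parents: {term: [direct parent terms]}.

    Returns a copy in which every ancestor of a predicted term has a
    probability no lower than that term's. GO is a directed acyclic graph,
    so a term may have several parents; `seen` avoids revisiting shared ancestors.
    """
    result = dict(predicted)
    for term, prob in predicted.items():
        stack = list(parents.get(term, []))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            result[ancestor] = max(result.get(ancestor, 0.0), prob)
            stack.extend(parents.get(ancestor, []))
    return result


toy_parents = {"TERM_child": ["TERM_mid"], "TERM_mid": ["TERM_root"],
               "TERM_other": ["TERM_root"]}
toy_predicted = {"TERM_child": 0.8, "TERM_other": 0.6, "TERM_root": 0.3}
print(propagate_to_ancestors(toy_predicted, toy_parents))
```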

    Prediction of Enzyme Commission Number
        EC numbers describe the types of reactions enzymes catalyze and are essential for understanding the 
        function of proteins in the context of metabolic pathways. MESSA combines three resources to predict whether 
        the query is an enzyme and, if so, its EC number: transfer of EC numbers from closely related Swiss-Prot 
        entries, the EFICAz software and the EzyPred server. The predictions from these three resources are combined 
        into the final prediction, together with a confidence level for it.
        If your query is predicted to be an enzyme, the following information is provided:
            (1) EC Number: the predicted EC number, linking to the web page about this EC number in the Enzyme 
            Nomenclature database. A "very confident" annotation is highlighted in green, a "confident" annotation 
            in yellow green and a "probable" annotation in yellow.
            (2) Description: explanation of the predicted EC number.
            (3) Confidence Level: the confidence of this prediction assigned by MESSA.
        If your query is not likely to be an enzyme, MESSA outputs the following statement:
            "No EC number assigned to the protein, probably not an enzyme!" 
        A "very confident" statement is highlighted in green, a "confident" one in yellow green and a "probable" one 
        in yellow.
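        How MESSA weighs the three sources is not described here. Purely as an illustration, the sketch below uses
        a hypothetical majority vote. Each source contributes either an EC number or `None` (not an enzyme), and
        the number of agreeing sources is mapped to the three confidence labels. Both the vote and the mapping are
        assumptions, not MESSA's rule.

```python
from collections import Counter

# Hypothetical mapping from number of agreeing sources to a confidence label.
LEVELS = {3: "very confident", 2: "confident", 1: "probable"}
NOT_ENZYME = "No EC number assigned to the protein, probably not an enzyme!"


def combine_ec(predictions):
    """predictions: one entry per source (Swiss-Prot transfer, EFICAz, EzyPred),
    each an EC string such as '3.4.21.4' or None for 'not an enzyme'.
    Ties go to the source listed first (most_common keeps insertion order)."""
    if len(predictions) != 3:
        raise ValueError("expected one prediction from each of the three sources")
    winner, count = Counter(predictions).most_common(1)[0]
    return (winner if winner is not None else NOT_ENZYME), LEVELS[count]


print(combine_ec(["3.4.21.4", "3.4.21.4", "3.4.21.4"]))
print(combine_ec([None, None, "2.7.11.1"]))
```

        A real combination would likely also credit partial agreement, for example on the first three digits of an
        EC number. That refinement is left out here.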



Part III. Spatial Structure Prediction:
To date, homology modeling remains the most reliable and effective approach to structure prediction. MESSA uses several 
procedures to identify homologous structures and generate 3D structure models for your query. 
If a MODELLER KEY is provided, this section displays the 3D structure models of the query in Jmol. 
Otherwise, the templates selected by MESSA, their alignments to the query and their confidence levels are listed as follows:
    (1) ID: a combination of the PDB ID and the chain ID, linking to the web page of this entry in the PDB.
    (2) Alignment Graph: graphic view of the alignment between the query and the template. Gaps in the query 
    are neglected in this graph.
    (3) Confidence Level: the confidence of this predicted template assigned by MESSA.
    (4) View Alignment and Template Structure: links to the alignment between the query and
    this template, and to the PDB file of this template.