C.L.asiaticus



Brief Description of the Webpage

On this website, each protein is represented by one webpage, and the pages are sorted by the protein's position in the genome sequence. For each protein, information and analysis are provided from the following perspectives. First, known information from various databases is gathered or linked for easy access. Second, important sequence features are predicted by multiple programs. Third, homologous proteins and protein families are detected by several procedures to support function prediction. Finally, closely related protein structures and protein domain structures found by several programs are listed as templates for structure modeling. Specifically, each webpage consists of the following parts:

Part I. Basic Information:

Explanation:

Since the genome of C. L. asiaticus was published, related information has accumulated in different databases, especially in NCBI resources. In this part, we gather useful IDs, function annotations and links to other databases as references for structure and function analysis.

Components:

1.1   GeneID in NCBI database: the GeneID links to NCBI Entrez Gene database
1.2   Locus tag: the tag links to NCBI Genome database, and the graph shows how this gene and its neighbours map to the whole genome
1.3   Protein GI in NCBI database: the GI links to the Genpept format data for this protein in NCBI Protein Database
1.4   Protein Accession: another identifier for this protein
1.5   Gene range: shows the range of this gene (unit: base pair) in the genome, and links to the mapping of this protein to the genome sequence in NCBI Nucleotide Database
1.6   Protein Length
1.7   Gene description: the definition line of the protein in NCBI Protein Database
1.8   COG prediction: the functional prediction from NCBI Nucleotide Database based on homology search against Clusters of Orthologous Groups (COG) database
1.9   Pfam domain: the most confident Pfam domain detected in this protein by RPS-BLAST (generated by our group)
1.10   Effector prediction: result of an effector predictor, EffectiveT3
1.11   Pathway involved: link to more information about this protein in KEGG
1.12   Sequence: link to the sequence of this protein in "fasta" format
1.13   Sequence profile: link to the sequence profile of this protein in "ClustalW" format (generated by our group)
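
For readers who download the "fasta" file from item 1.12, the following is a minimal sketch of how such a file could be read to recover the protein length reported in item 1.6. The function name `read_fasta` and the example record are hypothetical and are not part of this website.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted text."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[header] += line
    return records


# Made-up record for illustration only (not a real C. L. asiaticus protein).
example = """>hypothetical_protein_1 made-up record
MKTLVAG
LLS
"""
for name, seq in read_fasta(example).items():
    print(name, len(seq))
```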

Part II. Prediction of Local Features:

Explanation:

Several essential sequence features are predicted, and the result of each predictor is shown on one line for easy comparison. Secondary structure, low complexity regions, disordered regions and coiled coils are predicted to assist structure prediction; conserved residues are mapped to help function analysis. Because transmembrane proteins and secreted proteins are more likely to be targets for controlling the bacterium, several transmembrane helix and signal peptide predictions are also carried out. For important and difficult features, multiple predictors are applied so that a more confident conclusion can be drawn from the consensus between different predictors.
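
One simple way to read a consensus from several per-residue predictions is a majority vote. The sketch below is only an illustration: the one-character-per-residue encoding, the function name `consensus_mask` and the `min_votes` threshold are assumptions, and the webpage itself simply lists each predictor on its own line.

```python
from typing import Sequence


def consensus_mask(predictions: Sequence[str], positive: str = "H",
                   min_votes: int = 2) -> str:
    """Return a per-residue consensus string.

    Each prediction holds one character per residue; `positive` marks
    residues assigned to the feature (an assumed, hypothetical encoding).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    consensus = []
    for column in zip(*predictions):
        # Count how many predictors call this residue positive.
        votes = sum(1 for symbol in column if symbol == positive)
        consensus.append(positive if votes >= min_votes else "-")
    return "".join(consensus)


# Toy calls from three hypothetical transmembrane helix predictors.
calls = [
    "--HHHHH---",
    "---HHHHH--",
    "--HHHH----",
]
print(consensus_mask(calls))
```

Raising `min_votes` to the number of predictors keeps only residues on which every predictor agrees.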

Components:

2.1   Sequence (highlighted according to the property of amino acid) from NCBI database
2.2   Secondary structure prediction by PSIPRED
2.3   Secondary structure prediction by SSPRO
2.4   Coil and loop (highlighted in pink) prediction by DISEMBL
2.5   Flexible loop (highlighted in pink) prediction by DISEMBL
2.6   Low complexity region (highlighted in light red) prediction by SEG
2.7   Disordered region (highlighted in red) prediction by DISOPRED
2.8   Disordered region (highlighted in red) prediction by DISEMBL
2.9   Disordered region (highlighted in red) prediction by DISPRO
2.10   Transmembrane helix (highlighted in blue) prediction by TMHMM
2.11   Transmembrane helix (highlighted in blue) prediction by TOPPRED2
2.12   Transmembrane helix (highlighted in blue) prediction by HMMTOP
2.13   Transmembrane helix (highlighted in blue) prediction by MEMSAT
2.14   Transmembrane helix (highlighted in blue), re-entrant helix (highlighted in yellow) and signal peptide (highlighted in green) prediction by MEMSAT_SVM
2.15   Transmembrane helix (highlighted in blue) and signal peptide (highlighted in green) prediction by Phobius
2.16   Signal peptide (highlighted in green) prediction by SignalP Hidden Markov Model mode
2.17   Signal peptide (highlighted in green) prediction by SignalP Neural Network mode
2.18   Coiled coils (highlighted in yellow) prediction by COILS
2.19   Conserved pattern (highlighted from white, through yellow to dark red as the level of conservation increases) generated by Multiple Sequence Alignment of confident PSI-BLAST hits in the first 2 iterations, filtered by 70% identity
2.20   Conserved pattern (highlighted from white, through yellow to dark red as the level of conservation increases) generated by Multiple Sequence Alignment of confident PSI-BLAST hits in the first 2 iterations, filtered by 90% identity
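
Items 2.19 and 2.20 can be pictured with a short sketch. It assumes that "filtered by 70% identity" means no two retained sequences share more than 70% identity, and it scores conservation as the frequency of the most common residue in each column. Both are assumptions made for illustration; the measure actually used to color the webpage is not stated here, and all function names are hypothetical.

```python
from collections import Counter
from typing import List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where both rows are ungapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_redundant(rows: List[str], max_identity: float) -> List[str]:
    """Greedily keep rows at most `max_identity` identical to every kept row."""
    kept: List[str] = []
    for row in rows:
        if all(pairwise_identity(row, other) <= max_identity for other in kept):
            kept.append(row)
    return kept


def column_conservation(rows: List[str]) -> List[float]:
    """Frequency of the most common residue in each column, ignoring gaps."""
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


# Toy alignment (made-up sequences); the first row plays the role of the query.
alignment = ["MKTLV", "MKTLV", "MRTIV", "MKSL-"]
kept = filter_redundant(alignment, max_identity=0.7)
print(kept)
print(column_conservation(kept))
```

A stricter identity cutoff removes more near-duplicate hits, so a few over-represented species are less likely to dominate the conservation score.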

Part III. Close Homologs:

Explanation:

Close homologs usually preserve the same function inherited from their common ancestor, so detecting close homologs is essential for function prediction. In addition, analyzing the taxonomic distribution of these close homologs can reveal possible horizontal gene transfer events in the genome, which have been shown to be a common feature of bacterial virulence factors. In this part, the close homologs and their taxonomy information are therefore provided. We also noticed some internal duplication events within this small genome, which may reveal interesting internal symmetry of the C. L. asiaticus genome, so we specifically checked for close homologous relationships within the genome to aid the analysis.

Components:

3.1   Close homologs (if any) from the same genome detected by BLAST or by the first 2 iterations of PSI-BLAST
3.2   Top 10 hits from BLAST or from the first 2 iterations of PSI-BLAST with e-value below 0.005 (click the "show" button to see the alignment and organism information for each hit)
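
The selection in item 3.2 can be sketched as a filter over search hits. The example assumes the hits are available in the 12-column tabular format written by BLAST+ with `-outfmt 6`. This is an assumption about the input, not a description of the pipeline behind this website, and the subject identifiers are made up.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse the 12 default columns of BLAST+ tabular output (-outfmt 6)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns: 2 = subject ID, 3 = % identity, 11 = e-value, 12 = bit score.
        hits.append(Hit(fields[1], float(fields[2]),
                        float(fields[10]), float(fields[11])))
    return hits


def top_hits(hits: List[Hit], max_evalue: float = 0.005,
             limit: int = 10) -> List[Hit]:
    """Keep hits with e-value below the cutoff, best e-value first."""
    confident = [h for h in hits if h.evalue < max_evalue]
    return sorted(confident, key=lambda h: h.evalue)[:limit]


# Made-up hits for illustration only.
example = "\n".join([
    "query1\tsubjA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "query1\tsubjB\t30.0\t150\t105\t2\t10\t160\t5\t155\t0.02\t35",
    "query1\tsubjC\t45.0\t190\t100\t3\t5\t195\t3\t190\t3e-10\t60",
])
for hit in top_hits(parse_tabular(example)):
    print(hit.subject, hit.evalue)
```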

Part IV. Conserved Domains:

Explanation:

Proteins usually consist of one or several conserved sequence domains. Identifying these domains helps us assign our target to a protein family and draw on the abundant and reliable information available for that family. Thus, we applied RPS-BLAST and a more sensitive method, HHsearch, to identify the conserved domains.

Components:

4.1   Conserved domains in the protein detected by RPS-BLAST against CDD database with e-value below 0.005 (click the "show" button to see the alignment and the domain information for each hit)
4.2   Conserved domains in the protein detected by HHsearch against CDD database with probability higher than 90.0% (click the "show" button to see the alignment and the domain information for each hit)

Part V. Homologous Structures:

Explanation:

So far, homology modeling is still the most reliable and effective approach to structure prediction. In this part we use several procedures to find homologous structures. Because these procedures differ in sensitivity and accuracy, combining them improves the chance of finding a reliable template for homology-based structure prediction.

Components:

5.1   Homologous structures detected by PSI-BLAST against Nonredundant database with e-value below 0.005 (click the "show" button to see the alignment and the corresponding structure for each hit)
5.2   Homologous structures detected by RPS-BLAST against PDB70 (PDB database filtered for 70% identity) database with e-value below 0.005 (click the "show" button to see the alignment and the corresponding structure for each hit)
5.3   Homologous structures detected by HHsearch against PDB70 database with probability higher than 90.0% (click the "show" button to see the alignment and the corresponding structure for each hit)
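
When comparing the three lists above, it can help to see which candidate templates are supported by more than one procedure. The sketch below applies the thresholds quoted in items 5.1 to 5.3 (e-value below 0.005, HHsearch probability above 90%) to made-up scores; the template names and the `merge_templates` helper are hypothetical and are not part of this website.

```python
from collections import defaultdict
from typing import Dict, List


def passes(method: str, score: float) -> bool:
    """Apply the quoted cutoffs: probability above 90% for HHsearch,
    e-value below 0.005 for the BLAST-based procedures."""
    if method == "HHsearch":
        return score > 90.0
    return score < 0.005


def merge_templates(results: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Map each template ID to the sorted list of methods that report it confidently."""
    support = defaultdict(list)
    for method, hits in results.items():
        for template, score in hits.items():
            if passes(method, score):
                support[template].append(method)
    return {template: sorted(methods) for template, methods in support.items()}


# Made-up scores: e-values for the BLAST-based methods, probabilities for HHsearch.
results = {
    "PSI-BLAST": {"tmplA": 1e-30, "tmplB": 0.5},
    "RPS-BLAST": {"tmplA": 1e-12},
    "HHsearch": {"tmplA": 99.8, "tmplC": 95.1, "tmplB": 40.0},
}
merged = merge_templates(results)
# List templates supported by the most methods first.
for template in sorted(merged, key=lambda t: -len(merged[t])):
    print(template, merged[template])
```

A template found by only one procedure is not necessarily wrong; the more sensitive method may simply detect a more distant relationship.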

Part VI. Homologous Structure Domains:

Explanation:

Structure domains are the folding and evolutionary units of proteins. As known protein structures have accumulated, such units have been analyzed and collected into databases. The presence of a certain structure domain usually implies a certain function. Identifying such structure units in the protein will not only reveal the architecture of the target protein, but also provide hints about its function.

Components:

6.1   Homologous SCOP domains detected by RPS-BLAST against SCOP70 (SCOP database filtered for 70% identity, version 1.75) database with e-value below 0.005 (click the "show" button to see the alignment, the domain information and the corresponding structure for each hit)
6.2   Homologous SCOP domains detected by HHsearch against SCOP70 (version 1.75) database with probability higher than 90.0% (click the "show" button to see the alignment, the domain information and the corresponding structure for each hit)
6.3   Homologous MMDB (NCBI's protein domain database) domains detected by RPS-BLAST against MMDB70 (MMDB database filtered for 70% identity) database with e-value below 0.005 (click the "show" button to see the alignment and the corresponding structure for each hit)
6.4   Homologous MMDB domains detected by HHsearch against MMDB70 database with probability higher than 90.0% (click the "show" button to see the alignment and the corresponding structure for each hit)

Part VII. Area for you to post your comments:

We appreciate your suggestions and comments. Please provide your email address so that we can contact you :)