HOME | ABOUT DATABASE | DATABASE STATISTICS | DOCUMENTATION | CONTACT |
M2SG: A MUTATION DATABASE VALIDATED AT PROTEOMIC AND GENOMIC LEVEL |
|||
M2SG web server could be queried by several types of inputs, including formats as protein sequence, OMIM entry number, UniProt accession, protein name, gene name, and disease phenotype. It would return the web page for the most relevant protein and the mapped mutations instantly.
You could input your protein sequence in FASTA format or plain-text format to query M2SG.
1.A protein sequence in FASTA format:
A protein sequence in FASTA-format contains a definition line followed by the actual sequence. The first line (definition line) starts with a ">" (greater-than) symbol and is usually a description of the sequence. From the second line is the actual sequence itself in standard one-letter code. Anything other than a valid code would be ignored (including spaces, tabulators, etc...). An exmaple is shown below,
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus] LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX IENY
2.A protein sequence in plain-text format:
In this format, the sequence should only contain the standard one-letter codes (without the definition line). Anything other than a valid code would be ignored (including spaces, tabulators, etc...). An example is given below,
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX IENY
M2SG uses BLAST to detect homologous disease-related proteins with mutations.
The top hit in the BLAST result will be shon only if it satisfiese either of the two criteria below: Other hits are omitted.
1. The "HIT" WITH HIGH CONFIDENCE: the hit with sequence identity greater than 90% and hit coverage (The coverage of BLAST alignment for the hit) greater than 90%;
2. The "HIT" WITH WARNING AND CAUTION: the hit with E-value lower than 10-10 and hit coverage greater 50%.
Other hits will be omitted.
One OMIM ID represents one OMIM entry indicated by a unique six-digit number, such as 202110. You can select the OMIM ID of your interest in the drop-down box to query our database. The server will return all disease-related mutations associated with this OMIM ID.
Each sequence is indicated by an accession number (AC) in the Uniprot database. Accession numbers are stable from release to release. You can select the UniProt AC of your protein, such as P05093, in the drop-down box to query our database. The server will return all disease related mutations associated with this UniProt AC as a result.
The protein name here means the full name recommended by the UniProt consortium to represent a human protein. You can input the protein name, such as "vasopressin V2 receptor", to query our database. In case that you cannot remember the whole name, the protein name will be autocompleted in the drop-down box if it matches one in our database. The server will return all disease-related mutations associated with this protein as a result.
The gene name here presents the recommended name that is used to officially represent a human gene. You can input the gene namei, such as "AVPR2", to query our database. In case that you cannot remember the whole name, the gene name will be autocompleted in the drop-down box if it matches one in our database. The server will return all disease-related mutations associated with this gene as a result.
You can input the disease phenotype name of your interest, such as "diabetes insipidus", to query our database. In case that you cannot remember the whole name, the disease phenotype name will be autocompleted in the drop-down box if it matches one in our database. The server will return all disease related mutations associated with the disease as a result. If there is more than one protein in our database associated with the disease, you can select your proteins from a list to find all disease related mutations.
A graphic view labeled by residue positions represents all the matches (of the native residues) of the mutations. The hit in M2SG is represented as a grey strip with its UniProt accession number. The red bars in the graph represent the mutations in coresponding positions. Details about the mutation will be shown when the mouse hovers over the red bar.
This protein sequence is used for mapping mutations. All positional information in that entry is based on this protein sequence.