Protein input is accepted in three formats:
I. FASTA format protein sequence
A FASTA format file contains a definition line followed by the actual sequence.
The first line (definition line) in a FASTA file starts with a ">" (greater-than)
symbol and is usually a description of the sequence.
Following the initial line is the actual sequence itself in standard one-letter code.
Anything other than a valid code is ignored (including spaces, tabs, etc.).
A FASTA format sequence contains at least two lines: the first is the definition line
and the remaining lines are the protein sequence.
Example of FASTA format protein sequence:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
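The FASTA rules above can be sketched in Python as follows. This is only an illustration, not the tool's actual parser: the function name parse_fasta and the set VALID_CODES are hypothetical, and treating the 20 standard amino-acid letters plus X as the "valid codes" (and accepting lower case) is an assumption, since the text does not list them.

```python
# Assumption: "valid codes" are the 20 standard amino-acid letters plus X
# (unknown residue), which appears in the example sequence above.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text):
    """Split one FASTA record into (definition, sequence).

    The definition line must start with ">"; every character in the
    following lines that is not a valid code is silently dropped.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or not lines[0].startswith(">"):
        raise ValueError("a FASTA record needs a '>' definition line and a sequence")
    definition = lines[0][1:].strip()
    # Assumption: lower-case letters are accepted and upper-cased.
    sequence = "".join(
        ch for line in lines[1:] for ch in line.upper() if ch in VALID_CODES
    )
    return definition, sequence


if __name__ == "__main__":
    record = ">example definition line\nLCLY THI\nGRN1\n"
    print(parse_fasta(record))
```

In this sketch the space and the digit in the sample record are discarded, so only the residue letters remain in the returned sequence.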
II. Plain-text format protein sequence
A plain-text format sequence is the actual sequence itself in standard one-letter code, with no definition line.
Anything other than a valid code is ignored (including spaces, tabs, etc.).
Example of plain-text format protein sequence:
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
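As a rough illustration of how plain-text input might be normalized, the sketch below joins the lines and drops every character that is not a valid code. The name clean_sequence is hypothetical, and the set of valid codes (20 standard amino-acid letters plus X, case-insensitive) is an assumption rather than something the text specifies.

```python
# Assumption: valid codes are the 20 standard amino-acid letters plus X.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWYX")


def clean_sequence(text):
    """Return the sequence with spaces, tabs, line breaks, and any other
    character that is not a valid one-letter code removed."""
    return "".join(ch for ch in text.upper() if ch in VALID_CODES)


if __name__ == "__main__":
    print(clean_sequence("LCLY THI\tGRN\nIENY"))
```

Because line breaks are ignored like any other invalid character, a sequence wrapped over several lines, as in the example above, would come out as one continuous string.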
III. Protein GI number
A GI number (sometimes written in lower case, "gi") is simply a series of digits
assigned consecutively to each sequence record processed by NCBI.
Example of gi number:
The gi number of the FASTA format sequence in Part I is: 5524211
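The relationship between a definition line and its gi number can be sketched as below. The helper names extract_gi and is_gi_number are hypothetical, and the pattern assumes the "gi|digits|" layout shown in the Part I example; definition lines in other layouts would simply return None.

```python
import re


def extract_gi(definition_line):
    """Return the gi number found in a 'gi|digits|...' definition line,
    or None if the line does not follow that layout."""
    match = re.search(r"(?:^|\|)gi\|([0-9]+)", definition_line.lstrip(">"))
    return match.group(1) if match else None


def is_gi_number(text):
    """True if the input consists only of ASCII digits (ignoring
    surrounding whitespace), i.e. it looks like a gi number."""
    return re.fullmatch(r"[0-9]+", text.strip()) is not None


if __name__ == "__main__":
    line = ">gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]"
    print(extract_gi(line))
    print(is_gi_number("5524211"))
```

A check like is_gi_number only tells whether the input has the shape of a gi number; it does not confirm that a record with that number exists at NCBI.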