Pclust accepts the following input formats.

I. Seq2Ref result link
Seq2Ref is a web server designed to facilitate the functional interpretation of a query protein. Its result link points to a report on the reference proteins (proteins that have been experimentally studied or manually curated) identified for the query protein.

An example of Seq2Ref result link:
http://prodata.swmed.edu/wenlin/server/user_data/seq2ref/S2RGsaCMG/result.html 



II. Sequence(s) in FASTA format

A FASTA file consists of a definition line followed, on the next line(s), by the sequence itself. The definition line starts with a ">" (greater-than) symbol and usually describes the sequence. The sequence is written with the one-letter codes of the 20 standard amino acids and may be split across several lines. Any other character (including spaces, tabs, etc.) is ignored. Thus, any valid FASTA input has at least two lines. For more details, please refer to http://en.wikipedia.org/wiki/FASTA_format 
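The reading rule described above (a ">" definition line, then sequence lines from which every non-standard character is dropped) can be sketched as follows. This is a minimal illustration, not Pclust's own parser; upper-casing the input and ignoring any text before the first ">" are assumptions made here.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (definition_line, sequence) pairs from FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Assumption: input is upper-cased first. Every character that is
            # not a standard residue (spaces, tabs, 'X', ...) is dropped.
            chunks.append("".join(c for c in line.upper() if c in STANDARD_AA))
        # Lines before the first ">" are ignored (an assumption of this sketch).
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Applied to the multiple-sequence example below, this returns two records, with the "X" characters removed from the sequences.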

Examples of FASTA input:

1. single sequence:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY

2. multiple sequences:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
>gi|5524215|gb|AAD44168.1| cytochrome b [Elephas maximus indicus]
THIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIW
GGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLIL
ILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSILXXGLMP
XLHTSKHRSMMLRPLSQALFWTLTMDLLXLTWIGXQPVEYXYTIIGQMASXLYFSIILAFLPIAGXIENY
LX
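When several sequences have to be assembled into one input, as in the multiple-sequence example above, a small helper can write them in the same layout. This is a hypothetical sketch: the function name `to_fasta` is illustrative, and the 70-residue line width only mirrors the examples above; it is not stated to be a Pclust requirement.

```python
def to_fasta(records, width=70):
    """Format (definition_line, sequence) pairs as FASTA text.

    width=70 mirrors the line length used in the examples; it is a
    formatting choice, not a documented Pclust requirement.
    """
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        # Wrap the sequence into lines of at most `width` residues.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n" if lines else ""
```

For example, `to_fasta([("gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]", seq)])` produces a single-sequence input like example 1.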



III. Network data with protein IDs
The network data define the structure and topology of a network. Each line describes one link and contains four columns separated by spaces: 1) the first GI number, 2) the second GI number, 3) the strength of the link between the two GIs, and 4) optional notes about the link, which may themselves contain spaces. A header line labeling the columns is optional. Pclust uses the GI numbers to check whether PDB structures, Swiss-Prot functional annotations, and PubMed references are available.
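Because the notes column may contain spaces, a line can be split on whitespace into at most four fields, leaving everything after the third column as the notes. The sketch below is illustrative only: `parse_link_line` is a hypothetical name, and treating any line whose third column is not a number as the optional header is an assumption. A node token that itself contains spaces (such as a "ref|" list written as "21803992, 2415431") would not survive this simple split; how Pclust handles that case is not described here.

```python
def parse_link_line(line):
    """Split one network line into (node1, node2, strength, notes).

    Returns None for blank or short lines and for a header line, which is
    assumed here to be any line whose third column is not a number.
    """
    parts = line.split(None, 3)  # at most 4 fields; notes keep inner spaces
    if len(parts) < 3:
        return None
    node1, node2, strength = parts[0], parts[1], parts[2]
    try:
        value = float(strength)  # e.g. "1e-3" -> 0.001
    except ValueError:
        return None  # e.g. the "GI1 GI2 link notes(optional)" header
    notes = parts[3] if len(parts) == 4 else ""
    return node1, node2, value, notes
```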

Pclust recognizes some special formats for marking nodes in the network. They are attached to a GI number in parentheses, e.g. "25478(sp|YIDD_SALTI)", and several of them can be combined in one pair of parentheses, e.g. "2323(sp|P61472|pdb|2onk)":
1. "sp|Swiss-Prot_ID" (such as "sp|P61472" or "sp|YIDD_SALTI"). The node is colored yellow and linked to its Swiss-Prot annotation.
2. "pdb|PDB_ID" (such as "pdb|2onk"). The node is colored purple and linked to the PDB structure.
3. "ref|PubMed_IDs" (such as "ref|21803992, 2415431"). The node is colored green and linked to the PubMed articles.
Note that Pclust does NOT check the validity of any IDs given in these special formats.
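The node tags listed above can be read with a small regular expression, and tags for the same GI can be merged across lines, as the last line of the example below notes for GI 25478. This is a hypothetical sketch, not Pclust's code: the function names are illustrative, splitting "ref" IDs on commas is an assumption, and because the source does not say which color wins when a node carries several tags, the sketch simply lists every applicable color.

```python
import re
from collections import defaultdict

# Node colors for each special tag, as described above.
TAG_COLORS = {"sp": "yellow", "pdb": "purple", "ref": "green"}

# A node token is a GI number, optionally followed by "(tag|ID|tag|ID...)".
NODE_RE = re.compile(r"^(\d+)(?:\((.*)\))?$")


def parse_node(token):
    """Return (gi, {tag: [ids]}) for tokens like '2323(sp|P61472|pdb|2onk)'."""
    match = NODE_RE.match(token)
    if match is None:
        raise ValueError("unrecognized node token: " + token)
    gi, inner = match.group(1), match.group(2)
    tags = defaultdict(list)
    if inner:
        fields = inner.split("|")
        # Fields alternate: tag name, then its ID (IDs are not validated).
        for name, value in zip(fields[0::2], fields[1::2]):
            if name == "ref":
                # Assumption: several PubMed IDs are separated by commas.
                tags[name].extend(v.strip() for v in value.split(",") if v.strip())
            else:
                tags[name].append(value)
    return gi, dict(tags)


def merge_nodes(tokens):
    """Combine the tags of every token that names the same GI."""
    merged = defaultdict(lambda: defaultdict(list))
    for token in tokens:
        gi, tags = parse_node(token)
        node = merged[gi]  # create an entry even for untagged GIs
        for name, ids in tags.items():
            node[name].extend(ids)
    return {
        gi: {"tags": dict(tags),
             "colors": [TAG_COLORS[t] for t in tags if t in TAG_COLORS]}
        for gi, tags in merged.items()
    }
```

For instance, `merge_nodes(["25478(sp|YIDD_SALTI)", "25478(pdb|1ky2)"])` gives GI 25478 both its Swiss-Prot and PDB tags.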


An example is shown below.

GI1    GI2    link    notes(optional)
80193 16767126 1e-3 this is a pseudo-link
25478(sp|YIDD_SALTI) 2323 1e-30 all the GI numbers are imaginary.
80193 2323(sp|P61472|pdb|2onk) 1e-11 here is an example for multiple notes for nodes
2323  25478(pdb|1ky2) 1e-22 example: 25478 will get info both from the 2nd record and here