Pclust accepts the following input formats.

I. Seq2Ref result link

Seq2Ref is a web server designed to facilitate the functional interpretation of proteins. A Seq2Ref result link points to a report that lists reference proteins (proteins that have been experimentally studied or manually curated) for a query protein. An example of a Seq2Ref result link:
http://prodata.swmed.edu/wenlin/server/user_data/seq2ref/S2RGsaCMG/result.html

II. Sequence(s) in FASTA format

Each entry in a FASTA file consists of a definition line followed by the sequence, which begins on the next line and may span several lines. The definition line starts with a ">" (greater-than) symbol and usually contains a description of the sequence. The sequence is written with the 20 standard one-letter amino acid codes; any other character (including spaces and tabs) is ignored. Thus, any valid FASTA input has at least two lines. For more details, please refer to http://en.wikipedia.org/wiki/FASTA_format

Examples of FASTA input:

1. Single sequence:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY

2. Multiple sequences:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
>gi|5524215|gb|AAD44168.1| cytochrome b [Elephas maximus indicus]
THIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIW
GGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLIL
ILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSILXXGLMP
XLHTSKHRSMMLRPLSQALFWTLTMDLLXLTWIGXQPVEYXYTIIGQMASXLYFSIILAFLPIAGXIENY
LX

III. Network data with protein IDs

Network data define the structure and topology of a network. Each line contains up to four columns separated by spaces: (1) the first GI number, (2) the second GI number, (3) the strength of the link between the two GIs, and (4) optional notes for the link, which take up the rest of the line and may themselves contain spaces. An optional header line may label the columns. Pclust uses the GI numbers to check whether PDB structures, Swiss-Prot functional annotations, and PubMed references are available for each protein.

Pclust also recognizes the following special formats for marking nodes in the network. A tag is written in parentheses directly after a GI number, e.g. "25478(sp|YIDD_SALTI)", and several tags can be combined, e.g. "2323(sp|P61472|pdb|2onk)":
1. "sp|Swiss-Prot_ID" (such as "sp|P61472" or "sp|YIDD_SALTI"): the node is colored yellow and linked to its Swiss-Prot annotation.
2. "pdb|PDB_ID" (such as "pdb|2onk"): the node is colored purple and linked to its PDB structure.
3. "ref|PubMed_IDs" (such as "ref|21803992, 2415431"): the node is colored green and linked to the PubMed articles.

Note that we do NOT check whether the IDs that users supply in these special formats are valid.

An example:

GI1 GI2 link notes(optional)
80193 16767126 1e-3 this is a pseudo-link
25478(sp|YIDD_SALTI) 2323 1e-30 all the GI numbers are imaginary.
80193 2323(sp|P61472|pdb|2onk) 1e-11 here is an example for multiple notes for nodes
2323 25478(pdb|1ky2) 1e-22 example: 25478 will get info both from the 2nd record and here
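
The network format in section III is straightforward to read programmatically. The following Python sketch parses it under a few assumptions that the description leaves open: node fields contain no spaces, tags and IDs alternate inside the parentheses, and a header line is recognized only when its first field is not a GI number. The names `parse_node`, `parse_network`, and `node_colors` are hypothetical illustrations, not part of Pclust. As the last line of the example indicates, tags for the same GI are merged across all lines.

```python
import re
from collections import defaultdict

# Node colors as described for the special tag formats in section III.
TAG_COLORS = {"sp": "yellow", "pdb": "purple", "ref": "green"}

# A node field is a GI number, optionally followed by tags in parentheses,
# e.g. "2323" or "2323(sp|P61472|pdb|2onk)".
NODE_RE = re.compile(r"^(\d+)(?:\((.*)\))?$")


def parse_node(field):
    """Split a node field into its GI number and a set of (tag, ID) pairs."""
    m = NODE_RE.match(field)
    if m is None:
        raise ValueError(f"unrecognized node field: {field!r}")
    gi, tag_str = m.group(1), m.group(2)
    parts = tag_str.split("|") if tag_str else []
    # Assumption: tags and IDs alternate, so "sp|P61472|pdb|2onk"
    # becomes {("sp", "P61472"), ("pdb", "2onk")}.
    return gi, set(zip(parts[0::2], parts[1::2]))


def parse_network(text):
    """Return (edges, annotations) parsed from whitespace-separated network data."""
    edges = []
    annotations = defaultdict(set)  # GI -> (tag, ID) pairs merged over all lines
    first = True
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # maxsplit=3 keeps multi-word notes together in the fourth column.
        fields = line.split(None, 3)
        if first:
            first = False
            if NODE_RE.match(fields[0]) is None:
                continue  # optional header such as "GI1 GI2 link notes(optional)"
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns: {line!r}")
        gi1, tags1 = parse_node(fields[0])
        gi2, tags2 = parse_node(fields[1])
        annotations[gi1] |= tags1
        annotations[gi2] |= tags2
        notes = fields[3] if len(fields) == 4 else ""
        edges.append((gi1, gi2, float(fields[2]), notes))
    return edges, dict(annotations)


def node_colors(annotations):
    """Map each GI to the colors implied by its tags (a node may have several)."""
    return {gi: {TAG_COLORS[tag] for tag, _ in tags if tag in TAG_COLORS}
            for gi, tags in annotations.items()}


EXAMPLE = """\
GI1 GI2 link notes(optional)
80193 16767126 1e-3 this is a pseudo-link
25478(sp|YIDD_SALTI) 2323 1e-30 all the GI numbers are imaginary.
80193 2323(sp|P61472|pdb|2onk) 1e-11 here is an example for multiple notes for nodes
2323 25478(pdb|1ky2) 1e-22 example: 25478 will get info both from the 2nd record and here
"""

if __name__ == "__main__":
    edges, annotations = parse_network(EXAMPLE)
    print(len(edges), sorted(annotations["25478"]))
    print(node_colors(annotations)["25478"])
```

On the example data, `parse_network` returns four edges, and GI 25478 carries both its Swiss-Prot tag from the second record and its PDB tag from the fourth, so `node_colors` reports two colors for it. How Pclust itself colors a node that has several tags is not specified here.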
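
The FASTA rules in section II can also be sketched in Python. The `parse_fasta` helper below is hypothetical, not Pclust's own code: it starts a new entry at each ">" line and keeps only the 20 standard one-letter amino acid codes from the sequence lines, dropping every other character as the text describes. Read literally, that rule also drops the ambiguity code X that appears in the examples; whether Pclust itself does so is an assumption, as is the case-sensitive matching used here.

```python
# The 20 standard one-letter amino acid codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA text into a list of (definition, sequence) pairs.

    Sequence characters outside the 20 standard codes (spaces, tabs,
    digits, 'X', ...) are dropped. Assumption: matching is case-sensitive.
    """
    records = []
    definition, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new definition line closes the previous entry, if any.
            if definition is not None:
                records.append((definition, "".join(chunks)))
            definition, chunks = line[1:].strip(), []
        elif definition is not None:
            # Lines before the first ">" are ignored.
            chunks.append("".join(c for c in line if c in STANDARD_AA))
    if definition is not None:
        records.append((definition, "".join(chunks)))
    # Reject input with no entries, or with an entry whose sequence is
    # empty after filtering (e.g. a definition line with no sequence line).
    if not records or any(not seq for _, seq in records):
        raise ValueError("each FASTA entry needs a '>' line followed by a sequence")
    return records
```

Because sequence lines are concatenated after filtering, sequences wrapped over several lines, as in the examples above, are reassembled into a single string per entry.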