README

PAIR

A pair is named by concatenating the SCOP identifier (sid) of the first motif with the SCOP identifier (sid) of the second motif. If a motif involves more than one SCOP identifier, we pick one at random and use it in the pair name.
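A minimal sketch of the naming rule above, assuming each motif is represented by a list of its sids and that the two sids are joined with no separator (the README says "concatenating" but does not show a sample name). The function name and the sid strings are illustrative only.

```python
import random


def pair_name(sids1, sids2, rng=random):
    """Build a pair name from the sids covered by the first and second motif.

    If a motif covers several sids, one is chosen at random, as described
    above. Plain concatenation with no separator is an assumption.
    """
    return rng.choice(sids1) + rng.choice(sids2)


# Single-sid motifs give a deterministic name.
print(pair_name(["d1dlwa_"], ["d1idra_"]))
```

Passing a seeded `random.Random` instance as `rng` makes the random pick reproducible.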

MOTIF

A motif name is the PDB ID, followed by an underscore, then the chain ID (or an underscore if there is no chain ID), then a lowercase letter indicating the motif category
(a: an artificial motif; c: a core motif; h: a hybrid motif; i: an interface motif).
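The naming scheme above can be decoded with a short parser. This sketch assumes a 4-character PDB ID and a single-character chain ID, so every motif name is exactly 7 characters; the name `1abc_Ac` used below is hypothetical.

```python
CATEGORIES = {"a": "artificial", "c": "core", "h": "hybrid", "i": "interface"}


def parse_motif_name(name):
    """Split a motif name such as '1abc_Ac' into (pdb_id, chain_id, category).

    Assumes a 4-character PDB ID and a 1-character chain ID. A chain
    position holding '_' means there is no chain ID and yields None.
    """
    if len(name) != 7 or name[4] != "_":
        raise ValueError(f"unexpected motif name: {name!r}")
    pdb_id, chain, letter = name[:4], name[5], name[6]
    if letter not in CATEGORIES:
        raise ValueError(f"unknown category letter: {letter!r}")
    return pdb_id, (None if chain == "_" else chain), CATEGORIES[letter]


print(parse_motif_name("1abc_Ac"))  # core motif on chain A
print(parse_motif_name("1abc__i"))  # interface motif, no chain ID
```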

RANGE

The range of a motif is given by its starting and ending residue numbers. Capital letters in a range denote the PDB chain identifier.

SCOPid

The unique identifier of a domain in the SCOP database, i.e., the SCOP sid.
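For reference, a SCOP sid is a 7-character string that starts with "d" and embeds the PDB ID. The sketch below splits one into its parts, assuming the layout "d" + 4-character PDB ID + chain character + domain character, where "_" in the domain position means the domain covers the whole chain. The function name is hypothetical.

```python
def parse_sid(sid):
    """Split a 7-character SCOP sid such as 'd1dlwa_' into its parts.

    Assumed layout: 'd' + 4-character PDB ID + chain character + domain
    character ('_' in the domain position means the whole chain).
    """
    if len(sid) != 7 or sid[0] != "d":
        raise ValueError(f"unexpected sid: {sid!r}")
    return {"pdb_id": sid[1:5], "chain": sid[5], "domain": sid[6]}


print(parse_sid("d1dlwa_"))
```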

CLASS

In the SCOP database, domains in the same class have similar secondary structure compositions. We abbreviate SCOP classes as follows:
All alpha proteins --- all a
All beta proteins --- all b
Alpha and beta proteins (a/b) --- a/b
Alpha and beta proteins (a+b) --- a+b
Multi-domain proteins (alpha and beta) --- multi
Membrane and cell surface proteins and peptides --- membrane
Small proteins --- small
Designed proteins --- design
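When reading these files programmatically, the abbreviation table above maps naturally onto a dictionary, with a reverse lookup to recover the full SCOP class name. The variable and function names are illustrative.

```python
# Full SCOP class name -> abbreviation used in this README.
SCOP_CLASS_ABBREV = {
    "All alpha proteins": "all a",
    "All beta proteins": "all b",
    "Alpha and beta proteins (a/b)": "a/b",
    "Alpha and beta proteins (a+b)": "a+b",
    "Multi-domain proteins (alpha and beta)": "multi",
    "Membrane and cell surface proteins and peptides": "membrane",
    "Small proteins": "small",
    "Designed proteins": "design",
}

# Abbreviations are unique, so the mapping can be inverted safely.
SCOP_CLASS_FULL = {abbr: full for full, abbr in SCOP_CLASS_ABBREV.items()}


def full_class_name(abbr):
    """Return the full SCOP class name for an abbreviation such as 'a/b'."""
    return SCOP_CLASS_FULL[abbr]


print(full_class_name("a+b"))
```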

FOLD

In the SCOP database, domains in the same fold share the same overall three-dimensional architecture and topology.

SUPERFAMILY

In the SCOP database, domains in the same superfamily are all homologs.

FAMILY

In the SCOP database, domains in the same family are close homologs.

PROTEIN

The protein to which the domain belongs.

Aligned-Length

Aligned length is the number of aligned residue pairs, i.e., alignment columns in which both sequences show an uppercase letter.
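A minimal sketch of this count, assuming the alignment is given as two equal-length strings in which uppercase letters mark aligned residues, lowercase letters mark unaligned residues, and "-" marks gaps. The README only states the uppercase rule; the other conventions are assumptions.

```python
def aligned_length(aln1, aln2):
    """Count alignment columns where both rows hold an uppercase letter."""
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have equal length")
    # str.isupper() is False for '-' and for lowercase letters.
    return sum(1 for a, b in zip(aln1, aln2) if a.isupper() and b.isupper())


print(aligned_length("ACDeF-G", "AC-eFHg"))
```

Only the columns A/A, C/C and F/F count here, so the aligned length is 3.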

Sequence-Identity

Sequence identity equals the number of identical residue pairs in the aligned positions divided by the aligned length.
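The same string conventions as for the aligned length (uppercase = aligned residue) are assumed in this sketch, which returns sequence identity as a fraction between 0 and 1.

```python
def sequence_identity(aln1, aln2):
    """Identical aligned residue pairs divided by the aligned length."""
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have equal length")
    # Aligned positions: both rows show an uppercase letter.
    aligned = [(a, b) for a, b in zip(aln1, aln2) if a.isupper() and b.isupper()]
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a == b)
    return identical / len(aligned)


print(sequence_identity("ACDKF", "ACEKW"))
```

Three of the five aligned pairs (A, C, K) are identical, giving 0.6.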

RMSD

RMSD (root-mean-square deviation) is calculated over the CA atoms of the aligned residues.
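A minimal sketch of the RMSD over aligned CA atoms. It assumes the two structures are already superimposed and that the coordinate lists are ordered by aligned residue pair; the README does not say how the superposition is obtained, so no fitting step is included.

```python
import math


def rmsd(coords1, coords2):
    """RMSD between paired (x, y, z) CA coordinates of aligned residues.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, q in zip(coords1, coords2):
        # Squared Euclidean distance between one pair of CA atoms.
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / len(coords1))


print(rmsd([(0, 0, 0), (1, 0, 0)], [(2, 0, 0), (3, 0, 0)]))
```

Each pair is 2 angstroms apart, so the RMSD is 2.0.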

GDT-TS

GDT-TS = 100 * (n1 + n2 + n4 + n8) / (4 * a), where n1, n2, n4 and n8 are the numbers of aligned residue pairs within 1, 2, 4 and 8 angstroms of each other, respectively, and a is the aligned length. Note that GDT-TS is a percentage. (Zemla, A. (2003) Nucleic Acids Research, 31, 3370)
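The formula above can be computed directly from the per-pair distances of the aligned residues. This sketch assumes the distances (in angstroms) are already available, for example CA-CA distances after superposition, and treats "within" as less than or equal to the cutoff.

```python
def gdt_ts(distances):
    """GDT-TS = 100 * (n1 + n2 + n4 + n8) / (4 * a).

    distances: one distance (angstroms) per aligned residue pair, so
    a = len(distances). 'Within' is taken to mean <= cutoff.
    """
    a = len(distances)
    if a == 0:
        return 0.0
    counts = [sum(1 for d in distances if d <= cutoff) for cutoff in (1, 2, 4, 8)]
    return 100.0 * sum(counts) / (4 * a)


print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))
```

Here n1 = 1, n2 = 2, n4 = 3, n8 = 4 and a = 5, so GDT-TS = 100 * 10 / 20 = 50.0.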

COMPASS

This score is based on the scoring function used in the profile-profile comparison program COMPASS (Sadreyev, R. and Grishin, N. (2003) Journal of Molecular Biology, 326, 317).
Consider a structural alignment between domain 1 and domain 2. A sequence profile is constructed for each domain by running PSI-BLAST (Altschul, S. et al. (1997) Nucleic Acids Research, 25, 3389) for eight iterations against the NCBI nr database. The columns corresponding to the aligned positions in the structural alignment are extracted from the two sequence profiles. For every aligned position, the two extracted columns, one from the domain 1 profile and the other from the domain 2 profile, are compared and scored by the COMPASS scoring function. The COMPASS score for the whole alignment is the sum of these scores over all aligned positions divided by the aligned length.
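The aggregation step described above (score each aligned column pair, sum, divide by the aligned length) can be sketched as follows. The actual COMPASS column scoring function is not reproduced here: `dot_score` is a hypothetical placeholder passed in as `column_score`, and profiles are assumed to be lists of numeric columns indexed by residue position.

```python
from typing import Callable, Sequence, Tuple

Column = Sequence[float]


def alignment_profile_score(
    profile1: Sequence[Column],
    profile2: Sequence[Column],
    aligned_pairs: Sequence[Tuple[int, int]],
    column_score: Callable[[Column, Column], float],
) -> float:
    """Average a column-vs-column score over the structurally aligned positions.

    aligned_pairs holds (i, j) residue indices aligned between domain 1 and
    domain 2; its length is the aligned length.
    """
    if not aligned_pairs:
        raise ValueError("empty alignment")
    total = sum(column_score(profile1[i], profile2[j]) for i, j in aligned_pairs)
    return total / len(aligned_pairs)


def dot_score(c1: Column, c2: Column) -> float:
    """Placeholder column score (NOT the COMPASS function): a dot product."""
    return sum(x * y for x, y in zip(c1, c2))


p1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
p2 = [[1.0, 0.0], [0.0, 1.0]]
print(alignment_profile_score(p1, p2, [(0, 0), (2, 1)], dot_score))
```

Keeping the column scorer as a parameter separates the averaging described in this README from the scoring function itself, so a faithful COMPASS implementation could be substituted without changing the aggregation.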