Database
Database of numerical profiles representing a set of multiple sequence alignments.
The search detects similarities between the query and the alignments in this set.
The following databases are available:
Full chains of known 3D structures:
PDB70: PSI-BLAST alignments produced with sequences of PDB full-chain representatives (<70% sequence identity) as queries
Structural domains, delineated and classified by expert analysis:
SCOP40: PSI-BLAST alignments produced with sequences of SCOP domain representatives in ASTRAL (<40% identity) as queries (http://astral.berkeley.edu/)
SCOP_TEST_SUBSET: the dataset of ~4000 profiles originally used for PROCAIN parameter optimization and testing. These are PSI-BLAST alignments produced from ASTRAL domain representatives filtered by 20% sequence identity in structure-based alignments within SCOP superfamilies.
Note: the SCOP domain names displayed as hit names contain PDB codes.
The PDB code is preceded by the letter 'd' and followed by the chain and domain identifiers; e.g., d1psda1 corresponds to PDB 1psdA, domain 1.
Sequence families, including proteins of known and unknown structure and function:
PFAM (http://pfam.janelia.org/): full sequence alignments for PFAM families (from the distributed file Pfam-A.full)
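The SCOP domain naming convention described in the note above (letter 'd', PDB code, chain, domain identifier) can be unpacked programmatically when post-processing hit names. The following is a minimal sketch, not part of the server: the function name parse_scop_domain_name and the returned dictionary keys are hypothetical, and it assumes a 7-character name with a 4-character PDB code followed by one chain character and one domain character, as in the d1psda1 example.

```python
import re

# Assumed layout: 'd' + 4-character PDB code + 1 chain character + 1 domain character.
# This mirrors the d1psda1 example; other name variants are not handled here.
SCOP_NAME_PATTERN = re.compile(r"^d([0-9][a-z0-9]{3})(.)(.)$", re.IGNORECASE)


def parse_scop_domain_name(name):
    """Split a SCOP domain name such as 'd1psda1' into its parts (hypothetical helper)."""
    match = SCOP_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized SCOP domain name: {name!r}")
    pdb_code, chain, domain = match.groups()
    return {
        "pdb_code": pdb_code.lower(),  # e.g. '1psd'
        "chain": chain.upper(),        # e.g. 'A', written as in '1psdA'
        "domain": domain,              # e.g. '1'
    }


if __name__ == "__main__":
    print(parse_scop_domain_name("d1psda1"))
    # {'pdb_code': '1psd', 'chain': 'A', 'domain': '1'}
```

The extracted PDB code and chain can then be used to look up the corresponding full chain, for instance to compare a SCOP40 hit with a PDB70 hit.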
The PDB representatives are full chains extracted from the whole set of available 3D structures, using a 70% sequence identity cutoff.
The SCOP representatives are structural domains defined and classified by expert analysis into families, superfamilies, folds, and classes. These representatives are filtered at 40% identity and are taken from the ASTRAL database.
The PDB and ASTRAL sequences are used as queries for PSI-BLAST searches against the NCBI nr database. The resulting multiple sequence alignments (MSAs) of detected homologs are used to generate COMPASS profiles. To allow for different levels of sequence divergence within the MSAs, the user can choose profiles corresponding to different numbers of PSI-BLAST iterations.
The PFAM database includes families of both known and unknown 3D structure, covering the protein sequence space more completely. PFAM families provide an alternative family classification: typically tighter sequence grouping, with more consideration of protein function and clustering of orthologs from different genomes.
PFAM profiles are generated from full family alignments provided by PFAM.