Database
Database of numerical profiles, each representing one of a set of multiple sequence alignments.
The search detects similarities between the query and the alignments in this set.

The following databases are available:

Full chains of known 3D structures:
PDB70_iter3: PSI-BLAST alignments (after iteration 3) produced with sequences of PDB full chain representatives (<70% sequence identity) as queries
PDB70_iter5: PSI-BLAST alignments (after iteration 5) produced with sequences of PDB full chain representatives (<70% sequence identity) as queries

Structural domains, delineated and classified by expert analysis:
SCOP40_iter3: PSI-BLAST alignments (after iteration 3) produced with sequences of SCOP domain representatives in ASTRAL (<40% identity) as queries (http://astral.berkeley.edu/)
SCOP40_iter5: PSI-BLAST alignments (after iteration 5) produced with sequences of SCOP domain representatives in ASTRAL (<40% identity) as queries (http://astral.berkeley.edu/)

Note: PDB codes are present in the names of SCOP domains that are displayed as hit names.
The PDB code is preceded by the letter 'd' and followed by the chain and domain identifiers; e.g. d1psda1 corresponds to PDB 1psd, chain A, domain 1.
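
The naming rule in the note above can be sketched in Python as follows. This is a minimal illustration and not part of the server: the function name parse_scop_domain and the ScopDomain type are hypothetical, and accepting '_' or '.' in the chain position and '_' in the domain position is an assumption about SCOP naming that this page does not state.

```python
import re
from typing import NamedTuple


class ScopDomain(NamedTuple):
    """Parts of a SCOP domain name such as d1psda1 (hypothetical helper type)."""
    pdb_code: str  # four-character PDB code, e.g. "1psd"
    chain: str     # chain identifier, upper-cased, e.g. "A"
    domain: str    # domain identifier within the chain, e.g. "1"


# 'd' + 4-character PDB code + 1 chain character + 1 domain character.
# Allowing '_' and '.' here is an assumption, not something stated on this page.
_SCOP_NAME = re.compile(r"^d([0-9][a-z0-9]{3})([a-z0-9_.])([a-z0-9_])$")


def parse_scop_domain(name: str) -> ScopDomain:
    """Split a SCOP domain name into PDB code, chain, and domain."""
    match = _SCOP_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised SCOP domain name: {name!r}")
    pdb_code, chain, domain = match.groups()
    return ScopDomain(pdb_code, chain.upper(), domain)


if __name__ == "__main__":
    print(parse_scop_domain("d1psda1"))
```

For the example given in the note, parse_scop_domain("d1psda1") yields PDB code 1psd, chain A, domain 1.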

Sequence families, including proteins of known and unknown structure and function:
PFAM (http://pfam.janelia.org/) : full sequence alignments for PFAM families (from distributed file Pfam-A.full)

Clusters of orthologous genes, mainly from prokaryotic genomes:
COG (http://www.ncbi.nlm.nih.gov/COG/) : alignments of all members in each COG (produced by MUSCLE)

Clusters of orthologous genes, including eukaryotic genomes:
KOG (http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi) : alignments of all members in each KOG (produced by MUSCLE)


The PDB representatives are full chains selected from the whole set of available 3D structures, using a 70% sequence identity cutoff.
The SCOP representatives are structural domains defined and classified by expert analysis into families, superfamilies, folds, and classes. These representatives are selected at 40% identity and are taken from the ASTRAL database.
The PDB and ASTRAL sequences are used as queries for PSI-BLAST searches against the NCBI nr database. The resulting multiple sequence alignments (MSAs) of detected homologs are used to generate COMPASS profiles. To allow for different levels of sequence divergence within the MSAs, the user can choose profiles corresponding to different numbers of PSI-BLAST iterations.
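
As a rough illustration of the iteration choice described above, the sketch below assembles a PSI-BLAST command line for a given number of iterations. It assumes the psiblast program from NCBI BLAST+ and a locally installed nr database; the command actually used to build these profile databases is not given on this page, and the function name and file names are hypothetical.

```python
from typing import List


def psiblast_command(query_fasta: str, iterations: int,
                     db: str = "nr", out_path: str = "hits.txt") -> List[str]:
    """Build (but do not run) a BLAST+ psiblast command line.

    Assumption: BLAST+ psiblast is the tool in use; the flags below are
    standard BLAST+ options, not settings taken from this page.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return [
        "psiblast",
        "-query", query_fasta,              # representative sequence (PDB chain or ASTRAL domain)
        "-db", db,                          # sequence database to search
        "-num_iterations", str(iterations), # e.g. 3 or 5, as in the *_iter3 / *_iter5 sets
        "-out", out_path,                   # file receiving the search report
    ]


if __name__ == "__main__":
    # Hypothetical input file name; print the two variants side by side.
    for n in (3, 5):
        print(" ".join(psiblast_command("d1psda1.fasta", n)))
```

More iterations generally pull in more distant homologs, which is why the iter5 alignments tend to be more divergent than the iter3 ones.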

The PFAM, COG, and KOG databases include families of both known and unknown 3D structure, covering the protein sequence space more completely. They provide alternative ways of classifying families: typically tighter sequence grouping, with more consideration of protein function, and clustering of orthologs from different genomes.
PFAM profiles are generated by COMPASS from the full family alignments provided by PFAM. COG and KOG profiles are generated from MSAs produced from the database sequences by the program MUSCLE.
The profile databases are regularly updated when new versions of the original databases become available.