Documentation

Guide to the ECOD classification system, search tools, and data formats.

About ECOD

ECOD (Evolutionary Classification of Protein Domains) is a hierarchical classification of protein domains according to their evolutionary relationships. It combines automated sequence and structure analysis with manual curation to organize protein domains into a five-level hierarchy.

ECOD integrates domain classifications from both experimental structures in the Protein Data Bank and computationally predicted structures from the AlphaFold Protein Structure Database, covering 48 reference proteomes.

Classification Hierarchy

ECOD organizes protein domains into five hierarchical levels, from broad architectural similarity to close evolutionary families.

Architecture

Architecture groups domains with similar secondary structure compositions and geometric shapes.

X-group (Possible Homology)

X-group groups domains that are possible homologs but for which the evidence is not yet adequate to establish homology. In practice, domains in the same X-group frequently share structural similarity.

H-group (Homology)

H-group groups domains that are thought to be homologous based on various considerations, such as high sequence or structure scores, functional similarity, unusual features, and literature.

T-group (Topology)

T-group groups domains with similar topological connections. Homologs with distinct topologies are placed in separate T-groups under the same H-group.

F-group (Family)

F-group groups domains with significant sequence similarity into a family. Currently, a large proportion of F-groups are mapped Pfam families, and some are HHsearch-based clusters.

Provisional Representative

An automatically assigned domain is designated as a provisional representative when the F-group does not contain any manual representative. These are marked with an asterisk (*) in the tree view.

Figure: ECOD hierarchical levels, showing the relationship between Architecture, X-group, H-group, T-group, and F-group (from Cheng et al., 2014).

Domain Identifiers

UID (Unique Identifier)

A numeric identifier unique to each domain. UIDs are 6–9 digit numbers that serve as the primary key for domain lookups (e.g., 002083261).

ECOD Domain ID

A human-readable identifier derived from the source structure. For experimental structures, the format is e[pdb_id][chain][domain_number]; for example, e1abcA1 refers to the first domain on chain A of PDB structure 1abc. For AlphaFold models, the identifier is built from the UniProt accession instead of the PDB ID and chain.
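As an illustration of the experimental-structure format, the following minimal sketch splits a domain ID into its parts. The regular expression is an assumption based only on the e[pdb_id][chain][domain_number] description (a 4-character PDB ID starting with a digit, then the shortest possible chain ID, then a numeric domain number); the function name parse_domain_id is hypothetical, and AlphaFold-based IDs are not handled.

```python
import re

# Assumed layout: "e" + 4-character PDB ID + chain ID + domain number.
# The chain group is lazy so that trailing digits go to the domain number.
DOMAIN_ID_RE = re.compile(
    r"e(?P<pdb_id>[0-9][A-Za-z0-9]{3})"
    r"(?P<chain>[A-Za-z0-9]+?)"
    r"(?P<domain_number>[0-9]+)"
)


def parse_domain_id(domain_id):
    """Return the parts of an experimental-structure domain ID, or None."""
    match = DOMAIN_ID_RE.fullmatch(domain_id)
    if match is None:
        return None
    return {
        "pdb_id": match["pdb_id"],
        "chain": match["chain"],
        "domain_number": int(match["domain_number"]),
    }


print(parse_domain_id("e1abcA1"))
```

Chain IDs that themselves end in a digit are ambiguous under this pattern, so the split should be treated as a best guess rather than an authoritative parse.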

Classification ID

Hierarchical IDs use dot notation. For example, 1.2.3.4 represents X-group 1, H-group 1.2, T-group 1.2.3, F-group 1.2.3.4. The number of dots indicates the level: 0 dots = X-group, 1 = H-group, 2 = T-group, 3 = F-group.
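The dot-counting rule can be sketched in a few lines of Python. The function names classification_level and lineage are hypothetical helpers written for this example, not part of any ECOD tool.

```python
LEVELS = ("X-group", "H-group", "T-group", "F-group")


def classification_level(class_id):
    """Map a dotted classification ID to its level by counting components."""
    parts = class_id.split(".")
    valid = all(p.isascii() and p.isdigit() for p in parts)
    if not valid or len(parts) > len(LEVELS):
        raise ValueError(f"not a classification ID: {class_id!r}")
    # 0 dots -> X-group, 1 -> H-group, 2 -> T-group, 3 -> F-group
    return LEVELS[len(parts) - 1]


def lineage(class_id):
    """List (level, id) pairs from the X-group down to the given ID."""
    classification_level(class_id)  # validates the input
    parts = class_id.split(".")
    return [
        (LEVELS[i], ".".join(parts[: i + 1]))
        for i in range(len(parts))
    ]


print(classification_level("1.2.3"))
print(lineage("1.2.3.4"))
```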

Searching ECOD

ECOD supports several search methods. The main search bar auto-detects the query type.

Method	Example	Description
UID	`002083261`	Directly opens the domain detail page
Domain ID	`e1abcA1`	ECOD domain identifier
PDB ID	`1abc`	Shows all domains from a PDB structure
UniProt	`P12345`	UniProt accession search
Cluster ID	`1.2.3`	Browse domains in a classification group
Keyword	`kinase`	Searches cluster names and protein annotations
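How the search bar actually decides the query type is not documented here. The sketch below shows one plausible way to classify the example queries in the table; every pattern is an assumption inferred from the identifier descriptions on this page, except the UniProt pattern, which follows the accession format published by UniProt. The name detect_query_type is hypothetical.

```python
import re

# Checked in order; the first pattern that matches the whole query wins.
# These rules are illustrative assumptions, not the site's real logic.
PATTERNS = [
    ("uid", re.compile(r"[0-9]{6,9}")),
    ("domain_id", re.compile(r"e[0-9][A-Za-z0-9]{3}[A-Za-z0-9]+[0-9]+")),
    ("pdb_id", re.compile(r"[0-9][A-Za-z0-9]{3}")),
    (
        "uniprot",
        re.compile(
            r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
            r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
        ),
    ),
    ("cluster_id", re.compile(r"[0-9]+(?:\.[0-9]+){0,3}")),
]


def detect_query_type(query):
    """Guess the query type; anything unrecognized is a keyword."""
    text = query.strip()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "keyword"


for example in ["002083261", "e1abcA1", "1abc", "P12345", "1.2.3", "kinase"]:
    print(example, "->", detect_query_type(example))
```

Some inputs are inherently ambiguous under these rules. A four-digit number, for instance, could be a PDB ID or an X-group; this sketch resolves it as a PDB ID simply because that pattern is checked first.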

Sequence Search (BLAST)

Find ECOD domains with similar protein sequences using BLASTP. Paste a FASTA sequence on the home page or use the dedicated BLAST search page. Searches run against the ECOD representative domain database.

Structure Search (Foldseek)

Find ECOD domains with similar 3D folds by uploading a PDB or mmCIF file. Uses Foldseek for fast structural alignment. See the structure search page.

Advanced Taxonomic Search

Filter domains by superkingdom, taxonomic rank, structure source (PDB/AlphaFold), and protein name. See the advanced search page.

Data Formats

Files available on the download page use the following formats:

domains.txt

Tab-separated file with one domain per line. Columns include UID, ECOD domain ID, classification hierarchy (A/X/H/T/F), source PDB/UniProt identifier, chain, residue range, and protein annotation.
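A minimal sketch of reading such a file with Python's csv module follows. The column names and their order in ASSUMED_COLUMNS are assumptions based on the list above, and skipping lines that start with "#" is also an assumption; check the header of the downloaded file and adjust accordingly. The sample row is made up for illustration.

```python
import csv
import io

# Hypothetical column names, in the order the description lists them.
ASSUMED_COLUMNS = [
    "uid", "domain_id", "classification",
    "source_id", "chain", "range", "annotation",
]


def read_domains(handle, columns=ASSUMED_COLUMNS):
    """Yield one dict per domain line, skipping blank and '#' lines."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Values stay as strings so UIDs keep their leading zeros.
        yield dict(zip(columns, fields))


# Made-up sample content standing in for a real domains.txt file.
sample = io.StringIO(
    "# illustrative header comment\n"
    "002083261\te1abcA1\t1.2.3.4\t1abc\tA\t1-100\texample protein\n"
)
for row in read_domains(sample):
    print(row["uid"], row["domain_id"], row["classification"])
```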

hierarchy.txt

Tab-separated file describing the A/X/H/T/F group tree. Each line gives a group ID, its parent, level, and name.
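The parent links make it straightforward to rebuild the tree in memory. The sketch below assumes the four fields appear in the order given above (group ID, parent, level, name) and that comment lines start with "#"; both are assumptions, as is the way the sample data marks a top-level group with an empty parent field. The group names in the sample are invented.

```python
from collections import defaultdict


def build_hierarchy(lines):
    """Return (children, info): parent -> child IDs, and ID -> (level, name)."""
    children = defaultdict(list)
    info = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Assumed column order: group ID, parent ID, level, name.
        group_id, parent_id, level, name = line.split("\t")[:4]
        info[group_id] = (level, name)
        children[parent_id].append(group_id)
    return children, info


# Invented sample lines; real files may encode roots and levels differently.
sample = [
    "1\t\tX\tdemo X-group\n",
    "1.1\t1\tH\tdemo H-group\n",
    "1.1.1\t1.1\tT\tdemo T-group\n",
    "1.1.2\t1.1\tT\tanother demo T-group\n",
]
children, info = build_hierarchy(sample)
print(children["1.1"])
print(info["1.1.1"])
```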

FASTA files (.fa)

Standard FASTA-format protein sequences for all domains. Headers contain the ECOD domain ID and classification.
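Because the files are standard FASTA, a small generic reader is enough to load them. This sketch returns each header line unparsed, since the exact layout of the ID and classification within the header is not specified here; the sample records are invented.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Invented records standing in for a real .fa download.
sample = io.StringIO(">e1abcA1 1.2.3.4\nMKT\nAYI\n>second\nGG\n")
for header, sequence in read_fasta(sample):
    print(header, len(sequence))
```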

Clustering representatives (F40/F70/F99)

Subsets of domains selected as cluster representatives at 40%, 70%, and 99% sequence identity thresholds. Useful for reducing redundancy in analyses.

API Access

ECOD provides a public REST API for programmatic access. All endpoints return JSON with CORS enabled, and no authentication is required. Requests are rate-limited to 100 per minute per IP address.

Endpoint	Description
`/api/v1/domains/:uid`	Domain details and classification
`/api/v1/domains/:uid/pdb`	Download pre-cut domain PDB coordinates
`/api/v1/domains/:uid/fasta`	Download domain FASTA sequence
`/api/v1/domains/uniprot/:acc`	All domains for a UniProt accession
`/api/v1/domains/pdb/:pdbId`	All domains from a PDB entry
`/api/v1/domains/pfam/:acc`	All domains mapped to a Pfam family
`/api/v1/domains/clan/:acc`	All domains in a Pfam clan
`/api/v1/domains/unclassified/:groupId`	Unclassified domains in an ECOD group
`/api/v1/health`	Service health check
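A minimal client sketch using only the Python standard library is shown below. BASE_URL is a placeholder, not the real host, and must be replaced with the address of the ECOD server; the helper names domain_url and get_json are hypothetical. The client spaces requests at least 0.6 seconds apart, which is one simple way to stay under 100 requests per minute. The example only builds URLs and does not contact any server.

```python
import json
import time
import urllib.request
from urllib.parse import quote

BASE_URL = "https://ecod.example.org"  # placeholder host (assumption)
MIN_INTERVAL = 60.0 / 100  # seconds between calls for 100 requests/minute

_last_request = None


def domain_url(uid, resource=None, base_url=BASE_URL):
    """Build /api/v1/domains/:uid, optionally with /pdb or /fasta."""
    path = "/api/v1/domains/" + quote(str(uid), safe="")
    if resource is not None:
        if resource not in ("pdb", "fasta"):
            raise ValueError(f"unknown resource: {resource!r}")
        path += "/" + resource
    return base_url.rstrip("/") + path


def get_json(url):
    """Fetch a JSON document, sleeping if the previous call was too recent."""
    global _last_request
    if _last_request is not None:
        wait = _last_request + MIN_INTERVAL - time.monotonic()
        if wait > 0:
            time.sleep(wait)
    _last_request = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


print(domain_url("002083261"))
print(domain_url("002083261", "fasta"))
```

Once BASE_URL points at the real server, get_json(domain_url("002083261")) would request the domain details; the shape of the returned JSON is not described on this page, so inspect a response before relying on specific keys.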

See the full API reference for interactive examples.

Citation

If you use ECOD in your research, please cite:

Most Recent

Schaeffer RD, Medvedev KE, Andreeva A, Chuguransky SR, Pinto BL, Zhang J, Cong Q, Bateman A, Grishin NV. (2025) ECOD: integrating classifications of protein domains from experimental and predicted structures. Nucleic Acids Research, gkae1029. doi:10.1093/nar/gkae1029

Schaeffer RD, Liao Y, Cheng H, Grishin NV. (2017) ECOD: new developments in the evolutionary classification of domains. Nucleic Acids Research, 45(D1): D296-D302. doi:10.1093/nar/gkw1137

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV. (2014) ECOD: An evolutionary classification of protein domains. PLoS Comput Biol, 10(12): e1003926. doi:10.1371/journal.pcbi.1003926

Contact

For questions, feedback, or to report issues, email ecod.database@gmail.com. ECOD is developed and maintained by the Grishin Lab at UT Southwestern Medical Center.