Documentation
Guide to the ECOD classification system, search tools, and data formats.
About ECOD
ECOD (Evolutionary Classification of Protein Domains) is a hierarchical classification of protein domains according to their evolutionary relationships. It combines automated sequence and structure analysis with manual curation to organize protein domains into a five-level hierarchy.
ECOD integrates domain classifications from both experimental structures in the Protein Data Bank and computationally predicted structures from the AlphaFold Protein Structure Database, covering 48 reference proteomes.
Classification Hierarchy
ECOD organizes protein domains into five hierarchical levels, from broad architectural similarity to close evolutionary families.
Architecture
Architecture groups domains with similar secondary structure compositions and geometric shapes.
X-group (Possible Homology)
X-group groups domains that are possible homologs, where some evidence suggests homology but not enough to confirm it. In practice, these domains frequently share structural similarity.
H-group (Homology)
H-group groups domains that are thought to be homologous based on various considerations, such as high sequence or structure scores, functional similarity, unusual features, and literature.
T-group (Topology)
T-group groups domains with similar topological connections. Homologs with distinct topologies are separated in different T-groups under the same H-group.
F-group (Family)
F-group groups domains with significant sequence similarity into a family. Currently, a large proportion of F-groups correspond to mapped Pfam families, and some are HHsearch-based clusters.
Provisional Representative
An automatically assigned domain is designated as a provisional representative when the F-group does not contain any manual representative. These are marked with an asterisk (*) in the tree view.

ECOD hierarchical levels (from Cheng et al., 2014)
Domain Identifiers
UID (Unique Identifier)
A numeric identifier unique to each domain. UIDs are 6- to 9-digit numbers that serve as the primary key for domain lookups (e.g., 002083261).
ECOD Domain ID
A human-readable identifier derived from the source structure. For experimental structures, the format is e[pdb_id][chain][domain_number]; for example, e1abcA1 refers to the first domain on chain A of PDB structure 1abc. For AlphaFold models, the identifier is built from the UniProt accession instead of a PDB ID and chain.
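The experimental-structure format can be split into its parts with a regular expression. The following is a minimal sketch, not ECOD's own parser: the name `parse_ecod_domain_id` is hypothetical, and the pattern assumes a four-character PDB ID (a digit followed by three alphanumerics), a chain ID made of letters only, and a trailing numeric domain number. Identifiers that do not fit, such as those for AlphaFold models, return `None`.

```python
import re
from typing import NamedTuple, Optional


class EcodDomainId(NamedTuple):
    pdb_id: str
    chain: str
    domain_number: int


# Assumed layout: 'e' + 4-character PDB ID + letter-only chain ID + digits.
# Chain IDs containing digits would be ambiguous and are not handled here.
_DOMAIN_ID = re.compile(r"e([0-9][A-Za-z0-9]{3})([A-Za-z]+)([0-9]+)")


def parse_ecod_domain_id(domain_id: str) -> Optional[EcodDomainId]:
    """Split an experimental-structure domain ID, or return None if it does not fit."""
    match = _DOMAIN_ID.fullmatch(domain_id)
    if match is None:
        return None
    pdb_id, chain, number = match.groups()
    return EcodDomainId(pdb_id, chain, int(number))


print(parse_ecod_domain_id("e1abcA1"))
```

Because chain IDs in real PDB entries can contain digits, a production parser would need the authoritative ID grammar rather than this heuristic.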
Classification ID
Hierarchical IDs use dot notation. For example, 1.2.3.4 represents X-group 1, H-group 1.2, T-group 1.2.3, F-group 1.2.3.4. The number of dots indicates the level: 0 dots = X-group, 1 = H-group, 2 = T-group, 3 = F-group.
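The dot-counting rule translates directly into code. This sketch uses hypothetical helper names (`classification_level`, `lineage`) and assumes every component of the ID is a plain integer, as in the example above.

```python
from typing import List

# Index = number of dots in the ID.
LEVELS = ("X-group", "H-group", "T-group", "F-group")


def _split(classification_id: str) -> List[str]:
    parts = classification_id.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a valid classification ID: {classification_id!r}")
    return parts


def classification_level(classification_id: str) -> str:
    """Return the hierarchy level implied by the number of dots."""
    return LEVELS[len(_split(classification_id)) - 1]


def lineage(classification_id: str) -> List[str]:
    """Return the ID of every enclosing group, from X-group down to the ID itself."""
    parts = _split(classification_id)
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


print(classification_level("1.2.3"))  # T-group
print(lineage("1.2.3.4"))             # ['1', '1.2', '1.2.3', '1.2.3.4']
```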
Searching ECOD
ECOD supports several search methods. The main search bar auto-detects the query type.
| Method | Example | Description |
|---|---|---|
| UID | 002083261 | Directly opens the domain detail page |
| Domain ID | e1abcA1 | ECOD domain identifier |
| PDB ID | 1abc | Shows all domains from a PDB structure |
| UniProt | P12345 | UniProt accession search |
| Cluster ID | 1.2.3 | Browse domains in a classification group |
| Keyword | kinase | Searches cluster names and protein annotations |
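To pre-classify queries in a script before submitting them, a pattern-based heuristic along the following lines may be useful. This is an illustrative sketch only: the rules the ECOD search bar actually applies are not documented here, and `guess_query_type` and its patterns are assumptions. The UniProt pattern follows the accession format published by UniProt. The cluster ID pattern requires at least one dot, so a bare X-group number falls through to another type.

```python
import re

# Patterns are tried in order; the first full match wins.
_PATTERNS = [
    ("uid", re.compile(r"[0-9]{6,9}")),
    ("cluster_id", re.compile(r"[0-9]+(?:\.[0-9]+){1,3}")),
    ("domain_id", re.compile(r"e[0-9][A-Za-z0-9]{3}[A-Za-z]+[0-9]+")),
    ("pdb_id", re.compile(r"[0-9][A-Za-z0-9]{3}")),
    ("uniprot", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    )),
]


def guess_query_type(query: str) -> str:
    """Return a best-guess query type, falling back to 'keyword'."""
    text = query.strip()
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "keyword"


for example in ["002083261", "e1abcA1", "1abc", "P12345", "1.2.3", "kinase"]:
    print(example, "->", guess_query_type(example))
```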
Sequence Search (BLAST)
Find ECOD domains with similar protein sequences using BLASTP. Paste a FASTA sequence on the home page or use the dedicated BLAST search page. Searches run against the ECOD representative domain database.
Structure Search (Foldseek)
Find ECOD domains with similar 3D folds by uploading a PDB or mmCIF file. Uses Foldseek for fast structural alignment. See the structure search page.
Advanced Taxonomic Search
Filter domains by superkingdom, taxonomic rank, structure source (PDB/AlphaFold), and protein name. See the advanced search page.
Data Formats
Files available on the download page use the following formats:
domains.txt
Tab-separated file with one domain per line. Columns include UID, ECOD domain ID, classification hierarchy (A/X/H/T/F), source PDB/UniProt identifier, chain, residue range, and protein annotation.
hierarchy.txt
Tab-separated file describing the A/X/H/T/F group tree. Each line gives a group ID, its parent, level, and name.
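A file of this shape can be loaded into a parent-to-children map and walked as a tree. The sketch below assumes the four columns appear in the order group ID, parent, level, name, that root groups have an empty parent field, and that comment lines start with `#`; check these assumptions against the actual file. The sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_children(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each parent ID to the IDs of its direct children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not fit the assumed layout
        group_id, parent_id = fields[0], fields[1]
        children[parent_id].append(group_id)
    return children


def walk(children: Dict[str, List[str]], root: str, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, group ID) pairs in depth-first order starting at root."""
    yield depth, root
    for child in children.get(root, []):
        yield from walk(children, child, depth + 1)


# Made-up rows in the assumed column order: group ID, parent, level, name.
sample = [
    "1\t\tX\texample X-group",
    "1.1\t1\tH\texample H-group",
    "1.1.1\t1.1\tT\texample T-group",
]
tree = read_children(sample)
for depth, group_id in walk(tree, "1"):
    print("  " * depth + group_id)
```

For the real file, pass an open file handle to `read_children` in place of `sample`.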
FASTA files (.fa)
Standard FASTA-format protein sequences for all domains. Headers contain the ECOD domain ID and classification.
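Because the files are standard FASTA, they can be read without special tooling. The sketch below yields each record as a (header, sequence) pair and makes no assumption about how the domain ID and classification are laid out within the header; `read_fasta` is a hypothetical helper name and the sample records are made up.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for each record; the header excludes the '>'."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration.
sample = [">dom1 example header", "MKT", "AYI", ">dom2", "GG"]
for header, sequence in read_fasta(sample):
    print(header, len(sequence))
```

To read a downloaded file, pass an open text-mode file handle instead of `sample`.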
Clustering representatives (F40/F70/F99)
Subsets of domains selected as cluster representatives at 40%, 70%, and 99% sequence identity thresholds. Useful for reducing redundancy in analyses.
API Access
ECOD provides a public REST API for programmatic access. All endpoints return JSON with CORS enabled, and no authentication is required. Requests are rate-limited to 100 per minute per IP address.
| Endpoint | Description |
|---|---|
| /api/v1/domains/:uid | Domain details and classification |
| /api/v1/domains/:uid/pdb | Download pre-cut domain PDB coordinates |
| /api/v1/domains/:uid/fasta | Download domain FASTA sequence |
| /api/v1/domains/uniprot/:acc | All domains for a UniProt accession |
| /api/v1/domains/pdb/:pdbId | All domains from a PDB entry |
| /api/v1/domains/pfam/:acc | All domains mapped to a Pfam family |
| /api/v1/domains/clan/:acc | All domains in a Pfam clan |
| /api/v1/domains/unclassified/:groupId | Unclassified domains in an ECOD group |
| /api/v1/health | Service health check |
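A client for these endpoints might look like the following sketch, which uses only the Python standard library. The host in `BASE_URL` is a placeholder, not the real ECOD address, and `endpoint_url`, `Throttle`, and `fetch_json` are hypothetical helpers. The throttle spaces calls evenly (one every 0.6 seconds for 100 per minute), which is stricter than a per-minute window requires but keeps a script safely under the limit.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://ecod.example.org"  # placeholder host; substitute the real one


def endpoint_url(base_url: str, *segments: str) -> str:
    """Join path segments under /api/v1/, percent-encoding each one."""
    path = "/".join(urllib.parse.quote(str(s), safe="") for s in segments)
    return f"{base_url.rstrip('/')}/api/v1/{path}"


class Throttle:
    """Space calls so that at most max_calls are issued per period (seconds)."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_json(url: str, throttle: Throttle, timeout: float = 30.0):
    """GET a JSON endpoint after waiting for the throttle."""
    throttle.wait()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(endpoint_url(BASE_URL, "domains", "002083261"))
print(endpoint_url(BASE_URL, "domains", "uniprot", "P12345"))
```

A call such as `fetch_json(endpoint_url(BASE_URL, "domains", "002083261"), Throttle())` would then return the parsed JSON body once `BASE_URL` points at the real service. Note that the `/pdb` and `/fasta` endpoints are described as downloads, so their responses may need to be read as text rather than parsed with `fetch_json`.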
Citation
If you use ECOD in your research, please cite:
Schaeffer RD, Medvedev KE, Andreeva A, Chuguransky SR, Pinto BL, Zhang J, Cong Q, Bateman A, Grishin NV. (2025) ECOD: integrating classifications of protein domains from experimental and predicted structures. Nucleic Acids Research, gkae1029. doi:10.1093/nar/gkae1029
Schaeffer RD, Liao Y, Cheng H, Grishin NV. (2017) ECOD: new developments in the evolutionary classification of domains. Nucleic Acids Research, 45(D1): D296-D302. doi:10.1093/nar/gkw1137
Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV. (2014) ECOD: An evolutionary classification of protein domains. PLoS Comput Biol, 10(12): e1003926. doi:10.1371/journal.pcbi.1003926
Contact
For questions, feedback, or to report issues, email ecod.database@gmail.com. ECOD is developed and maintained by the Grishin Lab at UT Southwestern Medical Center.