# ECOD v294 Release Notes

**Release Date:** March 8, 2026

## Overview

ECOD v294 is a major update combining three workstreams: full Pfam 38.2 reconciliation, archaeal proteome expansion, and curator-driven consistency fixes. This release introduces 25,921 new protein families (F-groups), adds 157,335 archaeal domains from 65 target classes, and incorporates boundary corrections and reclassifications across the hierarchy.

## Key Changes

### Pfam 38.2 Reconciliation

Complete bidirectional reconciliation against Pfam database version 38.2:

- **661,822 domain reassignments** based on updated Pfam HMM profiles
- **25,921 new F-groups** created from Pfam 38.2 families
- **1,168 F-groups deprecated** (emptied by reassignment)
- **273,016 domains demoted to .0** (no valid Pfam 38.2 hit)
- **21,389 domains promoted** from .0 to classified F-groups
- **97,153 marginal AFDB fragment domains** deprecated (48P set)

### Archaeal Proteome Expansion

First systematic incorporation of archaeal predicted structures:

- **157,335 new domains** from **98,146 proteins** across 65 archaeal species
- Sources: AlphaFold DB (96,651), Prodigal gene predictions (33,700), UniParc (26,984)
- **1,630 new F-groups** created via the DPAM + Pfam annotation pipeline
- Domain assignment: 115,717 family-specific + 41,618 topology-only (.0)

### Consistency Curation

Curator-driven corrections improving classification accuracy:

- X-group merges and domain splits
- Domain boundary corrections
- Family reclassifications
- Beta propeller remediation
- Helicase domain fixes

## Statistics

### Domain Coverage

| Metric | Count |
|--------|-------|
| Total domains | 2,720,205 |
| PDB domains | 1,146,444 |
| AlphaFold/predicted domains | 1,573,761 |

### Classification Hierarchy

| Level | Count |
|-------|-------|
| X-groups (Possible Homology) | 2,459 |
| H-groups (Homology) | 3,719 |
| T-groups (Topology) | 3,954 |
| F-groups (Family) | 40,663 |

### Representatives

| Type | Count |
|------|-------|
| F40 clustering representatives | 464,955 |
| F70 clustering representatives | 889,259 |
| F99 clustering representatives | 1,491,147 |

## Distribution Files

All distribution files are available at: http://prodata.swmed.edu/ecod/distributions/

### Main Files

- `ecod.v294.domains.txt` - Complete domain list with classifications
- `ecod.v294.names.txt` - Simplified domain index
- `ecod.v294.fa` - FASTA sequences for all domains
- `ecod.v294.hierarchy.txt` - Classification hierarchy with domain counts
- `ecod.v294.f_id_pfam_acc.txt` - ECOD F-group to Pfam accession mapping

### Clustering Representatives

- `ecod.v294.F40.*` - 40% sequence identity representatives
- `ecod.v294.F70.*` - 70% sequence identity representatives
- `ecod.v294.F99.*` - 99% sequence identity representatives

### Structure Files

- `ecod.v294.F40.pdb.tar.gz` - F40 representative PDB structures (13 GB)

### Search Databases

- `ecod.v294.F40.hhm_db.tar.gz` - HHsuite profile database (F40 representatives, 3.9 GB)
- `ecod.v294.blast.*` - BLAST-formatted database (domain-level)
- `ecod.v294.chainwise.*` - BLAST-formatted database (chain-level)

### Checksums

- `ecod.v294.md5` - MD5 checksums for all files

## Changes from v293.1

| Metric | v293.1 | v294 | Change |
|--------|--------|------|--------|
| Total domains | 2,626,470 | 2,720,205 | +93,735 |
| PDB domains | 1,110,671 | 1,146,444 | +35,773 |
| Predicted domains | 1,515,799 | 1,573,761 | +57,962 |
| F-groups | 20,632 | 40,663 | +20,031 |
| X-groups | 2,455 | 2,459 | +4 |
| T-groups | 3,948 | 3,954 | +6 |

## Data Sources

- PDB structures current as of February 2026
- AlphaFold DB v4 (48 proteomes + SwissProt set)
- Archaeal proteomes from AlphaFold DB, Prodigal, and UniParc
- Pfam 38.2 HMM profiles
- UniProt 2024_06 annotations

## Citation

Please cite ECOD as:

> R. D. Schaeffer, K. E. Medvedev, A. Andreeva, S. R. Chuguransky, B. L. Pinto, J. Zhang, Q. Cong, A. Bateman, N. V. Grishin. (2025) ECOD: integrating classifications of protein domains from experimental and predicted structures. Nucleic Acids Res. 53(D1): D411-D418.

## Contact

For questions or issues, please contact ecod.database@gmail.com.
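
## Verifying Downloads

The `ecod.v294.md5` file can be used to confirm that large downloads such as `ecod.v294.F40.pdb.tar.gz` arrived intact. The Python sketch below assumes the checksum file follows the standard `md5sum` layout (`<hex digest>  <filename>` per line) and sits in the same directory as the downloaded files; both are assumptions, not documented details of this release. The helper names `md5_of` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(md5_file):
    """Check each '<digest>  <filename>' line against files next to md5_file.

    Assumes md5sum-style lines (an assumption about ecod.v294.md5).
    Returns {filename: status} with status 'OK', 'MISMATCH', or 'MISSING'.
    """
    md5_path = Path(md5_file)
    results = {}
    for line in md5_path.read_text().splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = md5_path.parent / name
        if not target.exists():
            results[name] = "MISSING"
        elif md5_of(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "MISMATCH"
    return results
```

Run it from the download directory, for example `verify_checksums("ecod.v294.md5")`, and re-download any file reported as `MISMATCH`; files listed in the checksum file but not yet downloaded show up as `MISSING`. Reading in fixed-size chunks keeps memory use flat even for the 13 GB structure archive. On most Linux systems, `md5sum -c ecod.v294.md5` performs the same check from the command line.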