# ECOD v293.1 Release Notes

**Release Date:** January 13, 2026

## Overview

ECOD v293.1 is a significant update featuring the migration from Pfam 37.4 to Pfam 38.1, introducing 3,358 new protein families and reclassifying approximately 47,000 domains based on updated Pfam clan and family definitions.

## Key Changes

### Pfam 38.1 Migration

This release incorporates Pfam database version 38.1 (released July 2024), which includes:

- Updated HMM profiles with improved sensitivity
- Revised clan memberships reflecting new evolutionary relationships
- New protein families discovered since Pfam 37.4

The migration involved:

- **47,095 domains** reclassified through updated Pfam-to-F-group mappings
- **3,358 new F-groups** created from novel Pfam 38.1 families
- Clan-aware assignment using `--cut_ga` (gathering threshold) for reliable domain boundaries

### New F-group Composition

The 3,358 new F-groups include families from Pfam 38.1 that were not present in the previous Pfam-to-F-group mappings. These families contain both experimental PDB structures and AlphaFold predicted models.

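The clan-aware assignment mentioned under Pfam 38.1 Migration is not specified in detail in these notes. As a rough illustration only, the sketch below assumes one common convention: among hits that have already passed the gathering threshold, overlapping hits from the same Pfam clan compete and only the highest-scoring one is kept. The `PfamHit` class, the `resolve_clan_overlaps` function, and the accessions in the example are hypothetical and are not part of the ECOD pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PfamHit:
    """One hit that already passed --cut_ga (hypothetical record layout)."""
    family: str           # Pfam family identifier (placeholder values below)
    clan: Optional[str]   # Pfam clan identifier, or None if the family has no clan
    start: int            # 1-based inclusive start on the query sequence
    end: int              # 1-based inclusive end on the query sequence
    score: float          # bit score


def overlaps(a: PfamHit, b: PfamHit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_clan_overlaps(hits: List[PfamHit]) -> List[PfamHit]:
    """Keep the best-scoring hit among overlapping hits of the same clan.

    Assumption: hits without a clan, or from different clans, never compete.
    """
    kept: List[PfamHit] = []
    # Visit hits from best to worst score so the winner is always seen first.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        clash = any(
            hit.clan is not None and hit.clan == k.clan and overlaps(hit, k)
            for k in kept
        )
        if not clash:
            kept.append(hit)
    # Return survivors in sequence order for readability.
    return sorted(kept, key=lambda h: h.start)


example = [
    PfamHit("PF_A", "CL_X", 10, 120, 85.0),
    PfamHit("PF_B", "CL_X", 15, 110, 60.5),  # same clan, overlaps PF_A: dropped
    PfamHit("PF_C", None, 150, 240, 42.0),   # no clan: kept
]
resolved = resolve_clan_overlaps(example)
print([h.family for h in resolved])  # prints ['PF_A', 'PF_C']
```

Processing hits in descending score order keeps the logic to a single pass. A production pipeline would likely also need tie-breaking rules and handling for nested or discontinuous domains, which this sketch leaves out.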
## Statistics

### Domain Coverage

| Metric | Count |
|--------|-------|
| Total domains | 2,626,470 |
| PDB domains | 1,110,671 |
| AlphaFold domains | 1,515,799 |
| Sequences | 2,709,251 |

### Classification Hierarchy

| Level | Count |
|-------|-------|
| X-groups (Possible Homology) | 2,455 |
| H-groups (Homology) | 3,948 |
| T-groups (Topology) | 3,948 |
| F-groups (Family) | 20,632 |

### Representatives

| Type | Count |
|------|-------|
| Manual representatives | 26,664 |
| F40 clustering representatives | 483,474 |
| F70 clustering representatives | 851,573 |
| F99 clustering representatives | 1,434,264 |

## Distribution Files

All distribution files are available at: http://prodata.swmed.edu/ecod/distributions/

### Main Files

- `ecod.v293.1.domains.txt` - Complete domain list with classifications
- `ecod.v293.1.names.txt` - Simplified domain index
- `ecod.v293.1.fa` - FASTA sequences for all domains
- `ecod.v293.1.hierarchy.txt` - Classification hierarchy with domain counts

### Clustering Representatives

- `ecod.v293.1.F40.*` - 40% sequence identity representatives
- `ecod.v293.1.F70.*` - 70% sequence identity representatives
- `ecod.v293.1.F99.*` - 99% sequence identity representatives

### BLAST Database

- `ecod.v293.1.blast.*` - BLAST-formatted database for sequence searches

### Checksums

- `ecod.v293.1.md5` - MD5 checksums for all files

## Technical Details

### Pfam Assignment Parameters

- Pfam-A HMM database version: 38.1
- hmmscan with `--cut_ga` (gathering threshold)
- Clan-aware assignment respecting Pfam clan definitions

### Clustering Parameters

- CD-HIT used for sequence clustering within F-groups
- F40: 40% sequence identity, 80% alignment coverage
- F70: 70% sequence identity, 80% alignment coverage
- F99: 99% sequence identity, 80% alignment coverage

### Data Sources

- PDB structures through July 2025
- AlphaFold DB v4 (48 proteomes + SwissProt set)
- UniProt 2024_06 annotations

## Changes from v293

| Metric | v293 | v293.1 | Change |
|--------|------|--------|--------|
| Total domains | 2,626,478 | 2,626,470 | -8 |
| F-groups | 17,274 | 20,632 | +3,358 |
| Pfam mappings | 17,274 | 20,632 | +3,358 |

Note: The slight decrease in total domains reflects obsolescence of 8 domains, while F-group count increases due to Pfam 38.1 migration.

## Acknowledgments

This release was developed using the ECOD release management infrastructure with automated Pfam clan integration and per-F-group clustering pipelines.

## Citation

Please cite ECOD as:

> Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV. (2014) ECOD: An evolutionary classification of protein domains. PLoS Comput Biol 10(12): e1003926.

## Contact

For questions or issues, please contact: ecod.database@gmail.com
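
## Appendix: Verifying Downloads

The release ships an MD5 checksum file (`ecod.v293.1.md5`), but these notes do not describe its layout. The sketch below assumes the conventional `md5sum` layout of one `<checksum>  <filename>` pair per line; if the actual file differs, the parser would need adjusting. The function names are illustrative and are not part of any ECOD tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_manifest(text: str) -> Dict[str, str]:
    """Map file name -> expected checksum.

    Assumes md5sum-style lines: "<checksum>  <filename>". Lines that do not
    split into exactly two fields are skipped.
    """
    expected: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        checksum, name = parts
        # md5sum marks binary mode with a leading '*' on the file name.
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def verify_distribution(manifest: Path, directory: Path) -> Dict[str, bool]:
    """Return file name -> True if the file exists and its MD5 matches."""
    expected = parse_md5_manifest(manifest.read_text())
    results: Dict[str, bool] = {}
    for name, checksum in expected.items():
        target = directory / name
        results[name] = target.is_file() and md5_of_file(target) == checksum
    return results
```

With the distribution files downloaded into the current directory, `verify_distribution(Path("ecod.v293.1.md5"), Path("."))` would return a dictionary in which any `False` value marks a missing or corrupted file.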