TriCyp

Three-state cysteine classification (disulfide-bonded, metal-binding, or free thiol) across ECOD F70 representative domains, combining ESM2 predictions with PDB structural evidence.

© 2026 Schaeffer & Cong Labs, UT Southwestern Medical Center

Data: paper-v1, refreshed 2026-05-06

Three-state cysteine classification across ~700,000 ECOD representative domains.

Companion deposition site for "Classification of cysteine fates in structure predictions using a protein language model" (Yuan, Durham, Cong, Schaeffer). Preprint link pending.

Domains: 691,078
Cysteines: 2,706,778
Disulfide: 461,306 (17.0%)
Metal-binding: 168,171 (6.2%)
Free thiol: 2,077,301 (76.7%)
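
The percentages above are each class's share of the total cysteine count. A minimal Python sketch of that arithmetic follows; the counts are copied from this page, while the dictionary keys and the function name `class_percentages` are illustrative and not part of any TriCyp download or API.

```python
# Counts copied from the summary statistics on this page.
TOTAL_CYSTEINES = 2_706_778
COUNTS = {
    "disulfide": 461_306,
    "metal_binding": 168_171,
    "free_thiol": 2_077_301,
}


def class_percentages(counts, total):
    """Return each class's share of `total` as a percentage, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = class_percentages(COUNTS, TOTAL_CYSTEINES)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The three class counts sum exactly to the total cysteine count; the rounded percentages sum to 99.9 because each is rounded independently.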

Cysteine fates by classification source

Cysteine fate breakdown stratified by classification source. PDB-geom uses geometric ground truth (Sγ–Sγ disulfides plus PDB metal LINK records); PDB-ESM and AFDB-ESM are ESM2-3state predictions on PDB-source and AFDB-source F70 representatives, respectively.
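
As an illustration of the geometric criterion, the sketch below flags cysteine pairs whose Sγ atoms lie within a distance cutoff. It covers only the Sγ–Sγ part, not the metal LINK records. The 2.5 Å cutoff, the function name `find_disulfides`, and the toy coordinates are assumptions for this example; the page does not state the threshold TriCyp actually uses.

```python
import math

# Assumed cutoff in angstroms; not taken from the TriCyp methods.
SG_SG_CUTOFF = 2.5


def find_disulfides(sg_coords, cutoff=SG_SG_CUTOFF):
    """Return residue-id pairs whose S-gamma atoms are within `cutoff` angstroms.

    sg_coords maps a residue id to the (x, y, z) coordinates of its S-gamma atom.
    """
    ids = sorted(sg_coords)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(sg_coords[a], sg_coords[b]) <= cutoff:
                pairs.append((a, b))
    return pairs


# Toy coordinates (hypothetical): residues 10 and 42 are 2.04 angstroms apart.
toy = {10: (0.0, 0.0, 0.0), 42: (2.04, 0.0, 0.0), 77: (10.0, 0.0, 0.0)}
bonded = find_disulfides(toy)
print(bonded)
```

A production pipeline would typically also check chain identity, alternate locations, and occupancy before accepting a pair; those steps are omitted here.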

Domain vs cysteine fraction by kingdom

Eukaryotic domains contribute disproportionately to the total cysteine count, while bacterial and archaeal domains are cysteine-poor by comparison. The key result of this panel is the gap between each kingdom's domain fraction and its cysteine fraction.

Per-kingdom classification rates

Per-kingdom three-state classification rates. Eukaryotic cysteines are enriched for disulfides; archaeal cysteines retain a higher metal-binding rate, consistent with the prevalence of iron–sulfur clusters in archaea.

Subcellular localization (eukaryotic)

Eukaryotic-only subcellular gradient: extracellular and secretory-pathway compartments are disulfide-rich; cytoplasmic, nuclear, and mitochondrial compartments are metal-binding-rich. Source: UniProt Subcellular Location annotations cross-referenced to ECOD F70 representative domains.

ESM2 classification confidence

Distribution of max-class probability across all classified cysteines. The long right tail in disulfide and metal-binding classes shows that the positive predictions are made with high confidence; free-thiol calls dominate the lower-probability bins where the model is appropriately uncertain.
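
The max-class probability for a cysteine is the largest of its three class probabilities, and the predicted class is the one that attains it. A minimal sketch, assuming the model emits one probability per class in the order shown; the class ordering, the function name `max_class`, and the example values are hypothetical and not taken from the TriCyp data files.

```python
# Assumed class order; the actual output layout of ESM2-3state is not given here.
CLASSES = ("disulfide", "metal_binding", "free_thiol")


def max_class(probs):
    """Return (label, probability) for the highest-probability class.

    probs is a sequence of three probabilities in CLASSES order.
    """
    if len(probs) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} probabilities, got {len(probs)}")
    idx = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[idx], probs[idx]


# Hypothetical prediction for a single cysteine.
label, confidence = max_class([0.05, 0.02, 0.93])
print(label, confidence)
```

Binning the returned probability per predicted class gives a confidence distribution of the kind described in this panel.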

Source-type breakdown

Source-type breakdown across F70 representative domains. PDB-source domains have experimental coverage; AFDB and other predicted-source domains rely on ESM2-3state predictions only.