Paper Fig 2 + Fig S1
Held-out benchmarking of ESM2-3state against the structure-aware baselines used in the manuscript. Disulfide prediction is compared against SSBONDPredict; metal-binding prediction against LMetalSite and GPSite. Operating thresholds were chosen on the held-out validation set (metal-binding ≥ 0.972, disulfide ≥ 0.742).
Held-out evaluation · v2 (Apr 2026)
Disulfide
| Tool | AUROC | AP |
|---|---|---|
| ESM2-3state | 0.987 | 0.966 |
| SSBONDPredict | 0.971 | 0.894 |
Metal-binding (all metals)
| Tool | AUROC | AP |
|---|---|---|
| ESM2-3state | 0.994 | 0.943 |
| LMetalSite | 0.979 | 0.892 |
| GPSite | 0.975 | 0.881 |
On the v2 (zinc-rebalanced) benchmark, all three metal-binding tools score in the same band (AUROC 0.975–0.994) because most held-out positives are zinc, a metal that LMetalSite and GPSite were trained on. Stratifying by metal type isolates where ESM2-3state actually differs. On the shared metals (Zn / Ca / Mg / Mn) the AUROC values are essentially tied (0.994–0.996); the residual difference in the all-metals number comes entirely from iron coordination (Fe-S clusters and especially heme), which the specialist tools were not designed to predict. The per-stratum table below makes this scope-versus-architecture reading explicit.
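A minimal sketch of how a metal-type-stratified AUROC could be computed is shown here. It assumes that each stratum keeps only that stratum's positives and scores them against every negative; the page does not state the exact protocol, so treat this definition as an assumption. The function names and the toy scores are hypothetical and are not benchmark data.

```python
from typing import Optional, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def stratified_auroc(
    scores: Sequence[float],
    labels: Sequence[int],
    strata: Sequence[Optional[str]],
    stratum: str,
) -> float:
    """AUROC for one stratum: its positives against all negatives (assumed protocol)."""
    keep = [
        i for i, (y, s) in enumerate(zip(labels, strata))
        if y == 0 or s == stratum
    ]
    return auroc([scores[i] for i in keep], [labels[i] for i in keep])


# Toy, made-up scores for illustration only.
scores = [0.99, 0.98, 0.60, 0.95, 0.30, 0.10, 0.70]
labels = [1, 1, 1, 1, 0, 0, 0]
strata = ["Zn", "Zn", "heme", "heme", None, None, None]

print("all metals:", auroc(scores, labels))
print("Zn only:   ", stratified_auroc(scores, labels, strata, "Zn"))
print("heme only: ", stratified_auroc(scores, labels, strata, "heme"))
```

In this toy case the zinc positives rank above every negative while one heme positive does not, so the pooled number sits between the two strata. That is the same kind of effect the stratified comparison is meant to expose.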
Fig 2
Held-out benchmarking of three-state cysteine classification. Panels A–C compare ESM2-3state against SSBONDPredict for disulfide prediction (ROC, PR, and threshold-tuning curves). Panels D–F repeat the comparison for metal-binding against LMetalSite and GPSite.

Fig S1
Metal-type-stratified ROC. On the metals all three tools were trained for (Zn / Ca / Mg / Mn), AUROC values are essentially tied (0.994–0.996). The differences in the all-metals comparison come entirely from the iron stratum (Fe-S clusters and heme), which the specialist tools were not designed to predict. Heme is the worst case for them (LMetalSite 0.838, GPSite 0.711 vs ESM2-3state 0.981). The iron stratum is roughly 15% of held-out metal positives in the v2 (zinc-rebalanced) benchmark.

Iron stratum (Fe / Fe-S / heme)
The iron-stratum AUROC gap reflects training-set scope rather than algorithmic superiority. ESM2-3state was trained directly on cysteine 3-state labels covering Fe / Fe-S / heme coordination; LMetalSite and GPSite were trained on Zn / Ca / Mg / Mn binding. On the metals for which all three tools share training coverage (Zn / Ca / Mg / Mn), the AUROC values are essentially tied (0.994–0.996). The difference shows up specifically on iron coordination, and particularly on heme (ESM2-3state 0.981 vs LMetalSite 0.838 vs GPSite 0.711), a sub-domain the specialist tools were not designed for. Read the iron-stratum advantage as a coverage statement, not a head-to-head outperformance claim.
AUROC · Fe: ESM2-3state 0.993 · LMetalSite 0.917 · GPSite 0.877
| Tool | Task | Stratum | AUROC | AP |
|---|---|---|---|---|
| ESM2-3state | Disulfide | n/a | 0.987 | 0.966 |
| SSBONDPredict | Disulfide | n/a | 0.971 | 0.894 |
| ESM2-3state | Metal-binding | All metals | 0.994 | 0.943 |
| LMetalSite | Metal-binding | All metals | 0.979 | 0.892 |
| GPSite | Metal-binding | All metals | 0.975 | 0.881 |
| ESM2-3state | Metal-binding | Shared metals (Zn/Ca/Mg/Mn) | 0.996 | 0.946 |
| LMetalSite | Metal-binding | Shared metals (Zn/Ca/Mg/Mn) | 0.994 | 0.932 |
| GPSite | Metal-binding | Shared metals (Zn/Ca/Mg/Mn) | 0.996 | 0.944 |
| ESM2-3state | Metal-binding | Iron only | 0.993 | 0.594 |
| LMetalSite | Metal-binding | Iron only | 0.917 | 0.209 |
| GPSite | Metal-binding | Iron only | 0.877 | 0.114 |
| ESM2-3state | Metal-binding | Iron · [4Fe-4S] | 0.995 | 0.431 |
| LMetalSite | Metal-binding | Iron · [4Fe-4S] | 0.919 | 0.130 |
| GPSite | Metal-binding | Iron · [4Fe-4S] | 0.871 | 0.053 |
| ESM2-3state | Metal-binding | Iron · heme | 0.991 | 0.152 |
| LMetalSite | Metal-binding | Iron · heme | 0.838 | 0.003 |
| GPSite | Metal-binding | Iron · heme | 0.706 | 0.001 |
| ESM2-3state | Metal-binding | Iron · [2Fe-2S] / [3Fe-4S] | 0.992 | 0.360 |
| LMetalSite | Metal-binding | Iron · [2Fe-2S] / [3Fe-4S] | 0.920 | 0.103 |
| GPSite | Metal-binding | Iron · [2Fe-2S] / [3Fe-4S] | 0.926 | 0.063 |
Tabular summary of the held-out benchmark (v2, Apr 2026). The all-metals AUROC and AP, the shared-metal subset (Zn/Ca/Mg/Mn), and the per-iron-cofactor strata ([4Fe-4S], heme, [2Fe-2S]/[3Fe-4S]) are all transcribed from the v2 protocol.
The classification published on TriCyp uses fixed thresholds chosen on the held-out validation set: a cysteine is called metal-binding when P(Met) ≥ 0.972, disulfide when P(Dis) ≥ 0.742, and otherwise free thiol. The two thresholds were tuned independently to the same per-task precision target on held-out data; raw probabilities for every cysteine remain available via the per-cysteine TSV download (see Downloads) so users can re-threshold for their own use case.
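The thresholding described above can be sketched in two steps: picking a threshold that meets a precision target, then applying the three-state rule to downloaded probabilities. This is a rough illustration under stated assumptions, not the procedure used for TriCyp. The precision target is not given on this page, the TSV column names (`residue`, `p_met`, `p_dis`) are hypothetical, and checking the metal-binding threshold before the disulfide one is an assumption taken from the order in which the rule is stated.

```python
import csv
import io

MET_THRESHOLD = 0.972  # published metal-binding threshold, from the text
DIS_THRESHOLD = 0.742  # published disulfide threshold, from the text


def threshold_for_precision(scores, labels, target):
    """Lowest threshold t where precision of (score >= t) is >= target, else None."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best, tp, fp, i = None, 0, 0, 0
    while i < len(ranked):
        t = ranked[i][0]
        # Consume every example tied at score t before measuring precision.
        while i < len(ranked) and ranked[i][0] == t:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / (tp + fp) >= target:
            best = t
    return best


def call_state(p_met, p_dis, met_thr=MET_THRESHOLD, dis_thr=DIS_THRESHOLD):
    """Three-state call; metal-binding is checked first (assumed precedence)."""
    if p_met >= met_thr:
        return "metal-binding"
    if p_dis >= dis_thr:
        return "disulfide"
    return "free thiol"


def rethreshold(tsv_text, met_thr=MET_THRESHOLD, dis_thr=DIS_THRESHOLD):
    """Re-call every cysteine in a per-cysteine TSV (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["residue"],
         call_state(float(row["p_met"]), float(row["p_dis"]), met_thr, dis_thr))
        for row in reader
    ]


# Toy, made-up rows for illustration only.
toy_tsv = (
    "residue\tp_met\tp_dis\n"
    "C12\t0.980\t0.010\n"
    "C45\t0.100\t0.800\n"
    "C78\t0.500\t0.300\n"
)
print(rethreshold(toy_tsv))               # published thresholds
print(rethreshold(toy_tsv, met_thr=0.5))  # a looser, user-chosen threshold
```

Lowering the metal-binding threshold trades precision for recall, so a cysteine that is called free thiol at the published thresholds can flip to metal-binding under a looser setting.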