TriCyp

Three-state cysteine classification across ECOD F70 representative domains — disulfide-bonded, metal-binding, or free thiol — combining ESM2 predictions with PDB structural evidence.

© 2026 Schaeffer & Cong Labs, UT Southwestern Medical Center

Data release: paper-v1 · refreshed 2026-05-06

Downloads & API

This page is the canonical landing page referenced by the manuscript's Software and data availability statement. Bulk TSVs are rebuilt nightly from the live database; figure-data CSVs mirror the manuscript's paper/figure_data/ exports; predictor source and model weights are deposited on Zenodo.

Bulk data last refreshed 2026-05-06T19:11:57.000Z

Bulk data

cysteine-classifications.tsv (TSV)

Canonical full dump: one row per classified cysteine across all F70 representative domains, with three-state classification, ESM2-3state per-class probabilities, structural-evidence tags, and ECOD hierarchy.

Columns: domain_id, cys_position, classification, p_neg, p_dis, p_met, evidence_tags, f_group_id, h_group_id, x_group_id, source_type

Regenerated nightly via scripts/dump-tsv.sh. SHA-256 sidecar published alongside the file.

313.2 MB · updated 2026-05-06 · sha256: 471e0bb19c18aee2…
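
Checking a downloaded file against its SHA-256 sidecar could look like the sketch below. It assumes the `.sha256` sidecar stores the hex digest as its first whitespace-separated token (the usual `sha256sum` layout); this page does not specify the sidecar format, and the function names and local paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so a multi-hundred-MB dump never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(data_path: Path, sidecar_path: Path) -> bool:
    """Compare the file digest with the first token of the sidecar (assumed layout)."""
    expected = sidecar_path.read_text().split()[0].lower()
    return sha256_of(data_path) == expected
```

A call such as `matches_sidecar(Path("cysteine-classifications.tsv"), Path("cysteine-classifications.tsv.sha256"))` would then return `True` only for an intact download.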

domain-summary.tsv (TSV)

Per-domain classification counts and source-type. One row per F70 representative domain.

Columns: domain_id, source_type, pdb_id, uniprot_acc, total_cys, n_disulfide, n_metal_binding, n_unclassified, f_group_id, h_group_id, x_group_id

38.1 MB · updated 2026-05-06 · sha256: e49831710f44e8dd…
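
Because domain-summary.tsv has one row per domain, it can be aggregated in a single streaming pass. The sketch below tallies cysteine counts per source_type. It assumes the file starts with a header row carrying the column names listed above and that the count columns hold plain integers; neither is confirmed here, and `tally_by_source` is a hypothetical helper name.

```python
import csv
from collections import Counter
from typing import TextIO

COUNT_COLUMNS = ("total_cys", "n_disulfide", "n_metal_binding", "n_unclassified")


def tally_by_source(handle: TextIO) -> dict[str, Counter]:
    """Sum per-domain cysteine counts for each source_type, one row at a time."""
    totals: dict[str, Counter] = {}
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        bucket = totals.setdefault(row["source_type"], Counter())
        bucket["domains"] += 1
        for column in COUNT_COLUMNS:
            bucket[column] += int(row[column])
    return totals
```

Opening the file with `open(path, newline="", encoding="utf-8")` and passing the handle in keeps memory use flat regardless of file size.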

hgroup-aggregates.tsv (TSV)

Per-H-group aggregate underlying Fig 3C (per-kingdom rates) and Fig 5A,B (confusion matrices). One row per H-group.

Columns: h_group_id, h_group_name, x_group_id, n_pdb_reps, n_afdb_reps, pdb_total_cys, afdb_total_cys, pdb_disulfide_pct, pdb_metal_pct, afdb_disulfide_pct, afdb_metal_pct

242.5 KB · updated 2026-05-06 · sha256: 8bb3c59e339071d5…

Figure data

One CSV per main and supplementary figure. Several panels also expose a client-side "Download CSV" button right next to the chart on the corresponding page; those exports use the same column conventions as the canonical files here.

Fig 3A: fig3a_source_stratification.csv (CSV)

Source-stratified cysteine fates: PDB-geom / PDB-ESM / AFDB-ESM × free-thiol / disulfide / metal-binding fractions.

161 B · updated 2026-05-07 · sha256: 7c1ef9ae9e84f4a2…

Fig 3B: fig3b_kingdom_fractions.csv (CSV)

Domain fraction vs cysteine fraction by superkingdom.

154 B · updated 2026-05-08 · sha256: 0f1625571c197063…

Predictor source & model weights

cys3state predictor

Source for the ESM2-3state per-cysteine classifier, snapshotted at the paper-publication commit.

Available on GitHub; Zenodo DOI pending.

Model weights (best_modelA.pth … best_modelE.pth)

Five fine-tuned ESM2 model checkpoints used by the published 3-state classifier. Not hosted on this site; deposited on Zenodo.

Zenodo DOI pending

REST API

Common contract

Read-only JSON endpoints. No authentication required. All responses use the same envelope: { success: boolean, data?: <route-specific>, error?: { code, message } }. A successful response never carries an error field, and vice versa, so a single truthy check on success is enough to branch.
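
A client can centralise the envelope check in one place. The sketch below branches on `success` and turns a failed envelope into an exception; `ApiError` and `unwrap` are hypothetical names introduced for illustration, not part of the service.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the envelope reports success: false."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(body: str) -> Any:
    """Return the route-specific data payload, or raise ApiError on failure."""
    envelope = json.loads(body)
    if envelope.get("success"):
        return envelope.get("data")
    # On failure the envelope is documented to carry { code, message }.
    error = envelope.get("error") or {}
    raise ApiError(error.get("code", "UNKNOWN"), error.get("message", ""))
```

Callers then work with the payload directly and handle `ApiError.code` against the error-code table.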

Rate limit: 1500 requests / 60 s per IP. Responses include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. Exceeding the limit returns 429 with a Retry-After header.
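
The limit works out to 25 requests per second, so spacing calls 0.04 s apart stays under the cap. A minimal sketch of client-side back-off follows. It assumes Retry-After carries a number of seconds, as the error-code table suggests, and falls back to the full 60 s window when the header is missing or unparseable; the fallback is this sketch's choice, not documented behaviour. The format of X-RateLimit-Reset is not specified on this page, so the sketch does not interpret it.

```python
from typing import Mapping

WINDOW_SECONDS = 60
MAX_REQUESTS = 1500
# Spacing requests this far apart keeps a single client under the documented cap.
MIN_INTERVAL = WINDOW_SECONDS / MAX_REQUESTS


def retry_delay(status: int, headers: Mapping[str, str]) -> float:
    """Return how long to sleep before retrying; 0.0 means no wait is needed."""
    if status != 429:
        return 0.0
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "")
    try:
        return max(float(raw), 0.0)
    except ValueError:
        # Assumed fallback: wait out one full rate-limit window.
        return float(WINDOW_SECONDS)
```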

GET /api/domain/{domainId}

Per-cysteine predictions and structural evidence for one domain.

Path parameter accepts the ECOD domain identifier (`e3h35A1`) or the numeric database id. Returns the domain header, every classified cysteine with its three-state call, and structural-evidence streams (geometric SS bonds, PDB SSBOND records, PDB metal LINK records, ESM2 per-class probabilities).

Example request

GET /api/domain/e3h35A1

Example response

{
  "success": true,
  "data": {
    "domain": {
      "domainId": "e3h35A1",
      "rangeDefinition": "A:5-150",
      "sourceType": "pdb",
      "pdbId": "3h35",
      "chainId": "A",
      "uniprotAcc": "P12345",
      "xGroupId": "131",
      "hGroupId": "131.1",
      "tGroupId": "131.1.1",
      "fGroupId": "131.1.1.0"
    },
    "classifications": [
      {
        "cysPosition": 23,
        "classification": "DISULFIDE",
        "confidence": 0.984,
        "evidence": "SSBOND"
      },
      {
        "cysPosition": 67,
        "classification": "METAL_BINDING",
        "confidence": 0.992,
        "evidence": "METAL_LINK:ZN"
      }
    ],
    "evidence": {
      "esm2Predictions": [
        {
          "cysPosition": 23,
          "negProb": 0.012,
          "disProb": 0.984,
          "metProb": 0.004
        }
      ],
      "geometricDisulfides": [
        {
          "chain1": "A",
          "resnum1": 27,
          "chain2": "A",
          "resnum2": 134,
          "sgSgDistance": 2.04
        }
      ],
      "pdbSsbonds": [
        {
          "pdbId": "3h35",
          "chain1": "A",
          "resnum1": 27,
          "chain2": "A",
          "resnum2": 134,
          "bothInDomain": true
        }
      ],
      "pdbMetalLinks": [
        {
          "pdbId": "3h35",
          "metal": "ZN",
          "metalChain": "A",
          "metalResnum": 401,
          "coordResname": "CYS",
          "coordChain": "A",
          "coordResnum": 67,
          "coordAtom": "SG",
          "cofactor": null
        }
      ]
    }
  }
}
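
Consumers often want each three-state call side by side with its ESM2 probabilities. The sketch below joins the two lists on `cysPosition`, using only the field names shown in the example response. `annotate_calls` is a hypothetical helper, and positions without an ESM2 entry (such as 67 in the abbreviated example) map to `None`.

```python
from typing import Any, Optional


def probabilities_by_position(data: dict[str, Any]) -> dict[int, dict[str, float]]:
    """Index the ESM2 per-class probabilities by cysteine position."""
    return {
        p["cysPosition"]: {"neg": p["negProb"], "dis": p["disProb"], "met": p["metProb"]}
        for p in data["evidence"]["esm2Predictions"]
    }


def annotate_calls(
    data: dict[str, Any],
) -> list[tuple[int, str, str, Optional[dict[str, float]]]]:
    """Pair every classified cysteine with its probabilities, if present."""
    probs = probabilities_by_position(data)
    return [
        (c["cysPosition"], c["classification"], c["evidence"], probs.get(c["cysPosition"]))
        for c in data["classifications"]
    ]
```

The input is the `data` object of the envelope, not the whole response.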

How to cite

Please cite the manuscript when reusing TriCyp data: Classification of cysteine fates in structure predictions using a protein language model. Yuan, Durham, Cong, Schaeffer. Preprint DOI pending.

Data is released under CC-BY 4.0; the predictor source carries its existing license. The Zenodo deposition holds the versioned snapshot pinned to the paper-publication commit.

Figure data (continued)

Fig 3C: fig3c_kingdom_rates.csv (CSV)

Per-kingdom three-state classification rates.

186 B · updated 2026-05-08 · sha256: cc08b5f63f18b4f7…

Fig 3D: fig3d_subcellular.csv (CSV)

Disulfide and metal-binding rates per eukaryotic subcellular compartment.

436 B · updated 2026-05-06 · sha256: b389a6b5a04c55a6…

Fig 2: fig2_benchmark.csv (CSV)

ROC + PR curves and threshold-tuning data for ESM2-3state vs SSBONDPredict (disulfide) and vs LMetalSite / GPSite (metal-binding).

314 B · updated 2026-05-06 · sha256: 3996247c7414a29f…

Fig S1: figS1_metal_stratification.csv (CSV)

Metal-type-stratified ROC: shared-metals (Zn / Ca / Mg / Mn) and iron-only strata across ESM2-3state, LMetalSite, GPSite.

965 B · updated 2026-05-06 · sha256: 9a4afad821fd0c78…

Fig S2: figS2_source_breakdown.csv (CSV)

Source-type breakdown across PDB / AFDB / Prodigal / UniParc.

179 B · updated 2026-05-06 · sha256: 5fb89ce2ef24afa1…

Fig S3: figS3_confidence_distribution.csv (CSV)

Distribution of max-class probability across all classified cysteines.

398 B · updated 2026-05-06 · sha256: 7b60125c99651e74…

Fig 4: fig4_af_geometric.csv (CSV)

AlphaFold geometric scanning vs PDB ground truth: distance distributions, recall, and PAE attenuation.

248.1 KB · updated 2026-05-06 · sha256: cd83ddd2df3ee23a…

Fig 5A,B: fig5ab_hgroup_confusion.csv (CSV)

H-group confusion-matrix data underlying Fig 5A (disulfide) and Fig 5B (metal-binding).

341.7 KB · updated 2026-05-06 · sha256: 9313cd8035f10ba3…

Possible error codes for GET /api/domain/{domainId}

INVALID_ID, NOT_FOUND, RATE_LIMITED, DOMAIN_ERROR

GET /api/family/{fGroupId}

Aggregate stats and paginated domain list for one F-group.

Path parameter is the dotted F-group identifier (`131.1.1.0`). Returns a family header, classification totals, and a slice of the F70 representative domain list governed by the `page`, `limit`, `sortBy`, and `sortDir` query parameters.

Query parameters

Name     Default    Description
page     1          1-indexed page number.
limit    50         Page size (max 100).
sortBy   domain_id  One of domain_id, source_type, total_cys, n_disulfide, n_metal_binding, n_unclassified.
sortDir  asc        asc | desc.
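
The parameter rules above can be enforced on the client before a request is sent. The sketch below builds the request path and rejects values outside the documented ranges. `family_url` and its `base_url` argument are hypothetical (this page does not state the service host), and how the server itself treats out-of-range values is not documented here.

```python
from urllib.parse import quote, urlencode

SORT_COLUMNS = frozenset(
    {"domain_id", "source_type", "total_cys", "n_disulfide", "n_metal_binding", "n_unclassified"}
)


def family_url(
    f_group_id: str,
    page: int = 1,
    limit: int = 50,
    sort_by: str = "domain_id",
    sort_dir: str = "asc",
    base_url: str = "",
) -> str:
    """Build a /api/family request URL, validating against the documented limits."""
    if page < 1:
        raise ValueError("page is 1-indexed")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if sort_by not in SORT_COLUMNS:
        raise ValueError(f"unsupported sortBy: {sort_by}")
    if sort_dir not in ("asc", "desc"):
        raise ValueError("sortDir must be asc or desc")
    query = urlencode({"page": page, "limit": limit, "sortBy": sort_by, "sortDir": sort_dir})
    return f"{base_url}/api/family/{quote(f_group_id, safe='')}?{query}"
```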

Example request

GET /api/family/131.1.1.0?page=1&limit=2&sortBy=n_metal_binding&sortDir=desc

Example response

{
  "success": true,
  "data": {
    "family": {
      "fGroupId": "131.1.1.0",
      "fGroupName": "Example F-group",
      "xGroupId": "131",
      "xGroupName": "Example X-group",
      "hGroupId": "131.1",
      "tGroupId": "131.1.1",
      "domainCount": 47,
      "totalCys": 312,
      "nDisulfide": 88,
      "nMetalBinding": 42,
      "nUnclassified": 182
    },
    "domains": [
      {
        "domainId": "e3h35A1",
        "sourceType": "pdb",
        "pdbId": "3h35",
        "totalCys": 9,
        "nDisulfide": 2,
        "nMetalBinding": 4,
        "nUnclassified": 3
      }
    ],
    "pagination": {
      "total": 47,
      "page": 1,
      "limit": 2,
      "totalPages": 24
    }
  }
}
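
The `pagination` block is enough to plan the remaining requests. In the example, 47 domains at 2 per page give 24 pages, which matches a ceiling division of `total` by `limit`. The sketch below assumes that relationship holds in general; the helper names are hypothetical.

```python
import math
from typing import Iterator, Mapping


def expected_total_pages(total: int, limit: int) -> int:
    """Ceiling division: the number of pages needed to cover `total` rows."""
    return math.ceil(total / limit)


def remaining_pages(pagination: Mapping[str, int]) -> Iterator[int]:
    """Yield the 1-indexed page numbers that still have to be fetched."""
    return iter(range(pagination["page"] + 1, pagination["totalPages"] + 1))
```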

Possible error codes

INVALID_ID, NOT_FOUND, RATE_LIMITED, FAMILY_ERROR

GET /api/hgroup/{hGroupId}

Per-H-group aggregate plus the F70 representative list.

Path parameter is the dotted H-group identifier (`3380.1`). Returns aggregate PDB-source vs AFDB-source classification fractions and the full representative list with per-domain classification counts. Backs the /h-group/[id] detail page.

Example request

GET /api/hgroup/3380.1

Example response

{
  "success": true,
  "data": {
    "hGroupId": "3380.1",
    "hGroupName": "Candidate-novel metal-binding H-group",
    "xGroupId": "3380",
    "xGroupName": "Parent X-group",
    "nPdbReps": 12,
    "nAfdbReps": 318,
    "pdbTotalCys": 84,
    "afdbTotalCys": 2204,
    "pdbDisulfidePct": 0,
    "pdbMetalPct": 0,
    "afdbDisulfidePct": 1.4,
    "afdbMetalPct": 96.7,
    "representatives": [
      {
        "domainId": "e1abcA1",
        "sourceType": "pdb",
        "pdbId": "1abc",
        "uniprotAcc": null,
        "chainId": "A",
        "rangeDefinition": "A:5-110",
        "fGroupId": "3380.1.1.1.0",
        "fGroupName": "Example",
        "totalCys": 7,
        "nDisulfide": 0,
        "nMetalBinding": 6,
        "nUnclassified": 1
      }
    ]
  }
}
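
One way to use this payload is to contrast the PDB-source and AFDB-source metal-binding rates and to list the representatives that drive them. The sketch below assumes the `*Pct` fields are on a 0 to 100 scale, as the example values suggest; the helper names and the 0.5 threshold are illustrative choices only.

```python
from typing import Any


def metal_gap(data: dict[str, Any]) -> float:
    """AFDB minus PDB metal-binding percentage for one H-group."""
    return data["afdbMetalPct"] - data["pdbMetalPct"]


def metal_rich_representatives(data: dict[str, Any], min_fraction: float = 0.5) -> list[str]:
    """Domain ids whose metal-binding share of cysteines reaches min_fraction."""
    hits = []
    for rep in data["representatives"]:
        total = rep["totalCys"]
        # Skip domains without cysteines to avoid dividing by zero.
        if total and rep["nMetalBinding"] / total >= min_fraction:
            hits.append(rep["domainId"])
    return hits
```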

Possible error codes

INVALID_ID, NOT_FOUND, RATE_LIMITED, HGROUP_ERROR

GET /api/search

Search domain ID, PDB ID, UniProt accession, or X/H/F-group dotted notation.

Same matcher as the header search bar. The query string must be at least 2 characters. A single dotted query (`3380.1`) may resolve to several ECOD levels simultaneously — the response includes one entry per matching level so the caller can pick the right surface.

Query parameters

Name  Default  Description
q     none     Search string. Minimum 2 characters.

Example request

GET /api/search?q=3380.1

Example response

{
  "success": true,
  "data": [
    {
      "type": "family",
      "id": "3380.1.1.1.0",
      "label": "F-group 3380.1.1.1.0",
      "description": "Example F-group"
    },
    {
      "type": "hgroup",
      "id": "3380.1",
      "label": "H-group 3380.1",
      "description": "Candidate-novel metal-binding H-group"
    },
    {
      "type": "xgroup",
      "id": "3380",
      "label": "X-group 3380",
      "description": "Parent X-group"
    }
  ]
}
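
Since one query can match several ECOD levels, a caller typically validates the query first and then maps each hit to a follow-up endpoint. The sketch below uses the `type` values from the example response and only the two detail routes documented on this page; X-group hits map to `None` because no X-group API route is listed here. The helper names are hypothetical.

```python
from typing import Any, Optional

# Only routes documented on this page; other types have no known API endpoint.
DETAIL_ROUTES = {"family": "/api/family/{id}", "hgroup": "/api/hgroup/{id}"}


def validate_query(q: str) -> str:
    """Enforce the documented 2-character minimum before sending the request."""
    cleaned = q.strip()
    if len(cleaned) < 2:
        raise ValueError("search query must be at least 2 characters")
    return cleaned


def detail_paths(results: list[dict[str, Any]]) -> dict[str, Optional[str]]:
    """Map each hit id to its detail endpoint, or None when none is documented."""
    paths: dict[str, Optional[str]] = {}
    for hit in results:
        template = DETAIL_ROUTES.get(hit["type"])
        paths[hit["id"]] = template.format(id=hit["id"]) if template else None
    return paths
```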

Possible error codes

INVALID_QUERY, RATE_LIMITED, SEARCH_ERROR

GET /api/summary

Top-level dashboard counts.

Snapshot of the totals shown on the dashboard stat strip: total domains, total cysteines, the three classification counts, plus a top-X-group breakdown kept for legacy callers (it predates the dedicated /x-group surface).

Example request

GET /api/summary

Example response

{
  "success": true,
  "data": {
    "summary": {
      "totalDomains": 691078,
      "totalCysteines": 2706778,
      "nDisulfide": 461306,
      "nMetalBinding": 168171,
      "nUnclassified": 2077301,
      "pdbDomains": 48048,
      "predictedDomains": 643030
    },
    "xGroups": [
      {
        "xGroupId": "131",
        "xGroupName": "Example",
        "nDisulfide": 1234,
        "nMetal": 567,
        "nUnclassified": 4321,
        "total": 6122,
        "metalFraction": 0.092
      }
    ]
  }
}
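
In the example response the three classification counts add up to `totalCysteines`, and `pdbDomains` plus `predictedDomains` equals `totalDomains`. The sketch below converts the counts to fractions and checks both identities. Whether they hold for every snapshot is an assumption drawn from this one example, and the helper names are hypothetical.

```python
from typing import Mapping

CLASS_KEYS = ("nDisulfide", "nMetalBinding", "nUnclassified")


def class_fractions(summary: Mapping[str, int]) -> dict[str, float]:
    """Share of all classified cysteines falling in each of the three states."""
    total = summary["totalCysteines"]
    return {key: summary[key] / total for key in CLASS_KEYS}


def counts_are_consistent(summary: Mapping[str, int]) -> bool:
    """Check the two sum identities observed in the example payload."""
    cysteines_ok = sum(summary[key] for key in CLASS_KEYS) == summary["totalCysteines"]
    domains_ok = summary["pdbDomains"] + summary["predictedDomains"] == summary["totalDomains"]
    return cysteines_ok and domains_ok
```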

Possible error codes

RATE_LIMITED, SUMMARY_ERROR

Error codes

Code           HTTP  Description
INVALID_ID     400   Path identifier missing or not in the expected shape.
INVALID_QUERY  400   Query string missing or below the minimum length (search: 2 chars).
NOT_FOUND      404   The requested resource (domain / family / H-group) does not exist or has no classified F70 representative.
RATE_LIMITED   429   IP exceeded the rate-limit window. Retry after the seconds in the Retry-After header.
DOMAIN_ERROR   500   Domain detail query failed (DB unreachable or schema drift).
FAMILY_ERROR   500   Family detail query failed.
HGROUP_ERROR   500   H-group detail query failed.
SEARCH_ERROR   500   Search query failed unexpectedly.
SUMMARY_ERROR  500   Dashboard summary query failed.
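
The table translates directly into a lookup that a client can use to decide whether a retry is worthwhile. The sketch below treats 429 and 500-level codes as retryable and 400/404 codes as permanent. That policy is an assumption made for illustration, not guidance from the service, and `is_retryable` is a hypothetical helper.

```python
# Error code -> HTTP status, copied from the table above.
ERROR_STATUS = {
    "INVALID_ID": 400,
    "INVALID_QUERY": 400,
    "NOT_FOUND": 404,
    "RATE_LIMITED": 429,
    "DOMAIN_ERROR": 500,
    "FAMILY_ERROR": 500,
    "HGROUP_ERROR": 500,
    "SEARCH_ERROR": 500,
    "SUMMARY_ERROR": 500,
}


def is_retryable(code: str) -> bool:
    """Assumed policy: retry rate limits and server-side failures only."""
    status = ERROR_STATUS.get(code)
    return status is not None and (status == 429 or status >= 500)
```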