Where are the new folds?

Where did the new folds go? They were a prominent category in early CASPs. Now it is not even possible to define them as a "category". New folds are approaching extinction, and fast. In fact, it is likely that the majority of structure types are already known for water-soluble proteins. We agree with the seminal paper from Jeff Skolnick's group ¹⁾ that knowledge of structure space is close to complete as far as distinct types of secondary structure packing are concerned. However, this is very far from saying that we know how to map sequence space onto structure space: many folds currently cannot be deduced from sequence, and the CASP8 results show it. Many families with an "old" fold are not predictable from sequence. For instance, no server found the correct template for T0460, and confident predictions were not possible for T0465, T0466, etc. Thus, structural biologists still have a long road ahead, as structural genomics reveals, and will continue to reveal, many wonderful examples of such sequence-unpredictable "old" folds hiding among the semi-random families being structurally characterized.

Back to new folds, i.e. distinct cores of secondary structural elements with connections and spatial arrangement not observed before. Although fold definition is subject to debate, experts frequently agree on what looks like a "new fold". Several experts in our group inspected CASP8 target structures and concluded that only two domains represent new folds. These were T0397_1 and T0496_1:

Figure: cartoon of the N-domain of T0397 (3d4r chain A, residues -7 to 82)

Figure: structure and topology diagrams of the ferredoxin fold, the fold closest to T0397_1

Figure: cartoon of the N-domain of T0496 (3do9 chain A, residues 4 to 126)

Figure: structure and topology diagrams of the RNase H fold, the fold closest to T0496_1

Nevertheless, there are some similarities between each of these domains and known PDB structures. The N-domain of T0397 has some topological resemblance to the ferredoxin-like fold, with a curved β-sheet and α-helices deteriorated into loops. The ferredoxin-like core is elaborated with a loop and a β-strand (green) inserted into its β-hairpin, and with a β-strand (red) at its C-terminus. The N-domain of T0496 shares similarity with the RNase H fold, and may even be viewed as a circular permutation of it.
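
To make the idea of a circular permutation concrete, here is a minimal sketch in Python. It assumes that each fold has already been reduced to an ordered list of equivalent secondary structure elements, which is a strong simplification: it compares sequential order only and ignores spatial arrangement. The element labels are hypothetical and do not describe T0496 or the RNase H fold.

```python
def is_circular_permutation(order_a, order_b):
    """Return True if order_b is a rotation of order_a.

    Both arguments are sequences of labels for equivalent secondary
    structure elements, listed from N- to C-terminus. Only sequential
    order is compared; 3D packing is not considered.
    """
    a, b = list(order_a), list(order_b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a is a window of length len(a) in a + a
    n = len(a)
    return any(doubled[i:i + n] == b for i in range(n))


# Hypothetical element labels, for illustration only.
reference = ["b1", "a1", "b2", "b3", "a2"]
permuted = ["b3", "a2", "b1", "a1", "b2"]   # same cycle, different start point
shuffled = ["b1", "b2", "a1", "b3", "a2"]   # order changed, not a rotation

print(is_circular_permutation(reference, permuted))
print(is_circular_permutation(reference, shuffled))
```

In practice, deciding which elements are equivalent requires a structural superposition, so a check like this could only be a final step after that mapping is made.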

Server predictions for both of these new folds were quite poor. However, our analysis shows that two other targets with clearly known folds, T0407_2 (an Ig-like domain) and T0465 (a FYSH domain), were predicted very poorly as well. Apparently, as far as structure prediction is concerned, there is little difference between new folds and known folds for which templates are not detectable from sequence. Given this observation and the small number of new folds, the CASP category definition should be based on a different criterion, e.g. the quality of a few best server predictions.
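
One way such a criterion could be expressed is sketched below in Python. This is an illustration under stated assumptions, not the procedure used in the assessment: it assumes each server prediction carries a single quality score on a 0 to 100 scale (for example a GDT_TS-like measure), and the function names, the choice of the top 3 predictions, the threshold of 50 and the target names and scores are all hypothetical.

```python
from statistics import mean


def top_k_mean(scores, k=3):
    """Mean of the k highest scores (uses all scores if fewer than k exist)."""
    if not scores:
        raise ValueError("no predictions for this target")
    best = sorted(scores, reverse=True)[:k]
    return mean(best)


def assign_category(scores, k=3, threshold=50.0):
    """Label a target by the quality of its few best server predictions.

    k and threshold are arbitrary illustrative values, not assessment settings.
    """
    return "hard" if top_k_mean(scores, k) < threshold else "easy"


# Hypothetical per-target server scores on a 0-100 scale.
example_scores = {
    "target_A": [82.1, 79.4, 75.0, 40.2],
    "target_B": [31.0, 28.5, 27.9, 12.3],
}

categories = {name: assign_category(s) for name, s in example_scores.items()}
print(categories)
```

Averaging a few top predictions rather than taking the single best one makes the label less sensitive to one lucky model, and the category no longer depends on whether the fold is formally new.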

¹⁾ Y. Zhang, I.A. Hubner, A.K. Arakaki, E.I. Shakhnovich and J. Skolnick (2006) "On the origin and highly likely completeness of single-domain protein structures." Proc Natl Acad Sci USA 103(8):2605-2610. PMID: 16478803
