CASP8

Where are the new folds?

Where did the new folds go? They were a prominent category in early CASPs. Now it is not even possible to define them as a "category". New folds are approaching extinction, and fast. In fact, it is likely that the majority of structure types are already known for water-soluble proteins. We agree with the seminal paper from Jeff Skolnick's group 1) that knowledge of structure space is close to complete as far as distinct types of secondary structure packing are concerned. However, this is very far from saying that we know how to map sequence space onto structure space: deducing many folds from sequence is currently not possible, and CASP8 results show it. Many families with an "old" fold are not predictable from sequence. For instance, no server found the correct template for T0460, and confident predictions were not possible for T0465, T0466, etc. Thus, structural biologists still have a long road ahead, as structural genomics reveals, and will continue to reveal, many wonderful examples of such sequence-unpredictable "old" folds hiding among the semi-random families being structurally characterized.

Back to new folds, i.e. distinct cores of secondary structural elements with connections and spatial arrangement not observed before. Although the definition of a fold is subject to debate, experts frequently agree on what looks like a "new fold". Several experts in our group inspected the CASP8 target structures and concluded that only two domains represent new folds: T0397_1 and T0496_1.

Nevertheless, there are some similarities between each of these domains and known PDB structures. The N-domain of T0397 has some topological resemblance to the ferredoxin-like fold, with a curved β-sheet and α-helices deteriorated into loops. The ferredoxin-like core is elaborated with a loop and a β-strand (green) inserted into its β-hairpin and a β-strand (red) at its C-terminus. The N-domain of T0496 shares similarity with the RNase H fold, and may even be viewed as a circular permutation of it.

Server predictions for both of these new folds were quite poor. However, our analysis shows that two other targets with clearly known folds, T0407_2 (an Ig-like domain) and T0465 (a FYSH domain), were predicted very poorly as well. Apparently, as far as structure prediction is concerned, there is little difference between new folds and known folds for which templates are not detectable from sequence. Given this observation and the small number of new folds, the CASP category definition should be based on a different criterion, e.g. the quality of the few best server predictions.
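
One way such a criterion could be sketched is shown below. This is only an illustration under stated assumptions, not the procedure used by the assessors: the per-model score is assumed to be a 0-100 quality measure (GDT_TS would be one candidate), and the choice of the top three servers, the two cutoffs, the category names, and the target labels and numbers are all hypothetical.

```python
from statistics import mean


def top_k_mean(scores, k=3):
    """Mean of the k highest server scores (all of them if fewer than k)."""
    if not scores:
        raise ValueError("no server scores for this target")
    best = sorted(scores, reverse=True)[:k]
    return mean(best)


def assign_category(scores, k=3, hard_below=30.0, easy_above=60.0):
    """Assign a difficulty category from the best few server predictions.

    The cutoffs and category names are hypothetical placeholders.
    """
    quality = top_k_mean(scores, k)
    if quality < hard_below:
        return "hard"
    if quality >= easy_above:
        return "easy"
    return "medium"


# Made-up scores for illustration only; these are not CASP8 results.
example_scores = {
    "target_A": [82.0, 79.5, 77.0, 40.0],
    "target_B": [25.0, 22.0, 18.5, 10.0],
    "target_C": [55.0, 48.0, 41.0],
}

categories = {t: assign_category(s) for t, s in example_scores.items()}
for target, category in categories.items():
    print(target, category)
```

Under a scheme of this kind, a new fold and an "old" fold with an undetectable template would fall into the same category whenever the best servers do equally poorly on both, which is the point made above.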

 


1) Y. Zhang, I.A. Hubner, A.K. Arakaki, E.I. Shakhnovich and J. Skolnick (2006) "On the origin and highly likely completeness of single-domain protein structures." Proc Natl Acad Sci USA 103(8):2605-2610. PMID: 16478803
