CASP9

Targets
515 516 517 518
519 520 521 522
523 524 525 526
527 528 529 530
531 532 533 534
535 536 537 538
539 540 541 542
543 544 545 546
547 548 549 550
551 552 553 554
555 556 557 558
559 560 561 562
563 564 565 566
567 568 569 570
571 572 573 574
575 576 577 578
579 580 581 582
583 584 585 586
587 588 589 590
591 592 593 594
595 596 597 598
599 600 601 602
603 604 605 606
607 608 609 610
611 612 613 614
615 616 617 618
619 620 621 622
623 624 625 626
627 628 629 630
631 632 633 634
635 636 637 638
639 640 641 642
643      

Target Categories in CASP9

Some targets are easy to predict because they have very close templates among known structures; other targets are quite challenging. Predictions must be evaluated with target difficulty in mind, since the performance of different algorithms depends on it. Grouping targets into categories of approximately equal prediction difficulty brings out how each method deals with different target types.

In CASP9, targets were classified into two general categories, TBM (template-based modelling) and FM (free modelling), to reflect the method used to obtain models. Clearly, templates have something to do with it: TBM assumes the presence of templates by definition. Does FM assume the absence of a template by definition? We would say it does not. Traditionally, predictors have thought of FM as 'hard'. FM, or "free modelling", is a category where predictors are 'free' to do whatever they can, as they can't get it right anyway :). How, then, should a CASP9 category be defined? It became clear with time that the best approach is to use a combination of various methods, since what matters is the quality of the final prediction. Therefore, it is logical to group targets into categories by prediction quality.

During the CASP8 season, we developed a general approach that leads to well-defined boundaries between categories, emerging naturally from the data 1). For CASP9, we applied a very similar approach to define the target categories based on the quality of predictions. We resorted to a traditional model quality metric that has stood the test of time: LGA GDT_TS scores. Targets for which domain-based evaluation is essential, as established by our analysis, were split into domains; the other targets remained as whole chains and were treated as single-domain targets for evaluation purposes. This procedure resulted in 147 "domains" gathered from 116 targets. For each of these domains, we took the GDT_TS scores of the best server models and calculated the median GDT_TS of the above-random models. In addition, the rank of the random model was recorded. These two values were used to measure each target's difficulty.
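The two difficulty measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `difficulty_measures`, the input layout (one best GDT_TS score per server, plus the GDT_TS of a random model), and the example numbers are all hypothetical, and the rank is taken here to be one plus the number of server models that score above the random model.

```python
from statistics import median


def difficulty_measures(server_scores, random_score):
    """Return (median GDT_TS of above-random models, rank of the random model).

    server_scores: best GDT_TS per server for one domain (assumed layout).
    random_score:  GDT_TS of a random model for the same domain (assumed input).
    """
    # Keep only the models that score strictly above the random model.
    above = [s for s in server_scores if s > random_score]
    # Assumed rank definition: 1 + number of models that beat the random model.
    rank = len(above) + 1
    # If no model beats random, the median is undefined; report None.
    med = median(above) if above else None
    return med, rank


# Made-up scores for a single domain, for illustration only.
scores = [62.5, 48.0, 55.1, 20.3, 18.9, 70.2]
med, rank = difficulty_measures(scores, random_score=21.0)
print(med, rank)
```

A low median together with a low (good) rank for the random model would indicate a hard domain, since few server models did better than random.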

We looked for naturally emerging clusters in these median GDT_TS scores and ranks of random models using Gaussian kernel density estimation. The Gaussian kernel density estimator is the function $\rho(x) = \frac{1}{\sqrt{2\pi}\,\sigma n} \sum_{i=1}^{n} e^{-(x - \mu_i)^2 / (2\sigma^2)}$, where $n$ is the number of domains, $\mu_i$ is the median GDT_TS score or the rank of the random model for domain $i$, and $\sigma$ is a standard deviation, called the bandwidth. Conceptually, each domain score generates a Gaussian centered at that score with standard deviation $\sigma$. Averaging these Gaussians gives a density function $\rho(x)$ that reveals score groups. Maxima of this function correspond to the group centers, and minima mark the boundaries between groups. When the bandwidth is very narrow (variance very small), each domain forms its own group. When the bandwidth is broad (variance very large), all domains are in one group. Some optimal bandwidth setting should reveal meaningful groups in the data.
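The density estimate and the search for its minima can be sketched as follows. This is a minimal illustration, not the code used for the assessment: it assumes the standard Gaussian normalization 1/(sqrt(2*pi) * sigma * n), and the function names, the grid search, and the example scores are hypothetical.

```python
import math


def gaussian_kde(x, centers, sigma):
    """Average of Gaussians centered at each domain value, with bandwidth sigma."""
    n = len(centers)
    norm = math.sqrt(2 * math.pi) * sigma * n  # assumed standard normalization
    return sum(math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) for mu in centers) / norm


def find_boundaries(centers, sigma, lo=0.0, hi=100.0, steps=1000):
    """Return grid points that are strict local minima of the density.

    These are candidate boundaries between groups. Perfectly flat
    stretches of the density are not reported by this simple test.
    """
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    ys = [gaussian_kde(x, centers, sigma) for x in xs]
    return [
        xs[k]
        for k in range(1, steps)
        if ys[k] < ys[k - 1] and ys[k] < ys[k + 1]
    ]


# Made-up median GDT_TS values forming two obvious groups.
medians = [20.0, 22.0, 24.0, 70.0, 72.0, 74.0]
print(find_boundaries(medians, sigma=5.0))   # moderate bandwidth
print(find_boundaries(medians, sigma=50.0))  # very broad bandwidth
```

With the moderate bandwidth, the two groups are separated by a single minimum between them; with the very broad bandwidth, the density has one maximum and no interior minimum, so all values fall into one group.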

 

We combine the median GDT_TS of the above-random models with the rank of the random model and plot the CASP9 category definition in 2D, as shown below:

From the 2D plot of the CASP9 category definition, the domains located in the lower left corner should clearly go to the FM category. In addition, we treat domains that have no template available as FM targets, which results in 30 FM targets in total.

 


1) S. Shi, J. Pei, R.I. Sadreyev, L.N. Kinch, I. Majumdar, J. Tong, H. Cheng, B.H. Kim, N.V. Grishin (2009) "Analysis of CASP8 targets, predictions and assessment methods." Database (Oxford) 2009: bap003; PMID: 20157476