Scores we used to evaluate predictions in the FM category

Having a good score to evaluate predictions is crucial for method development. Since many approaches are trained to produce models that score better under some evaluation method, flaws in that method will yield better-scoring models that represent real protein structure no better. As assessors of the CASP9 FM category, we analyzed predictions using four scoring systems: the classic LGA GDT-TS, contact score (CS), TenS and QCS.

CS: We developed the contact score during the CASP8 season, and it was shown to give the best performance in a study by the official CASP8 assessors. Please go here for details of the algorithm.

TenS: We developed TenS during the CASP5 season, when we assessed the FR category (PubMed). Since FR and FM are close to each other, we applied it to the assessment of CASP9 FM targets. TenS is an automatic numerical scoring system that contains six structural measures (GDT, intra-molecular distance, Dali, TM, Mammoth and SOV) and four alignment scores (Qlga, QDali, QTM, and Qmammoth). Among these measures, intra-molecular distance, Dali, CE, Mammoth and SOV are sequence independent; GDT, Qlga, QDali, QCE, and Qmammoth are sequence dependent. Thus, the scoring system is balanced. We rescaled each individual score to a Z-score and combined them with equal weight. The summed score (TenS) was used to compare the overall performance of participating groups. Please see TenS slides for details.
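
The Z-score rescaling and equal-weight sum described above can be sketched as follows. This is a minimal illustration, not the assessors' implementation: the measure names and values are made up, and it assumes that Z-scores are computed across the models of one target using the population standard deviation.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale raw scores to Z-scores (population standard deviation; an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All models scored the same, so no model stands out on this measure.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def tens_like_score(raw_scores):
    """Sum per-measure Z-scores with equal weight.

    raw_scores maps a measure name to a list of raw scores,
    one per model, with models in the same order for every measure.
    """
    per_measure = [z_scores(vals) for vals in raw_scores.values()]
    # zip(*...) regroups the Z-scores by model; each model gets the plain sum.
    return [sum(model_z) for model_z in zip(*per_measure)]


# Hypothetical raw scores for three models on two measures (not real CASP data).
raw = {
    "measure_a": [10.0, 20.0, 30.0],
    "measure_b": [0.2, 0.4, 0.9],
}
combined = tens_like_score(raw)
```

Because every measure is first brought to the same scale, no single measure dominates the sum, whatever its original units.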

QCS: We developed a new scoring system, QCS, to mimic manual assessment, and it was quite successful according to our critical inspection. Please see QCS slides for details.

Ratio score: Top-performing groups use similar strategies: they rank server models with various scoring functions and refine the top picks. A natural question is: who did better than the servers? To answer this question, we developed an additional score: the ratio of the best group model score to the top server model score for each of the main scores (GDT, CS, TenS and QCS). The procedure is: (1) ignore ratio scores below 1; (2) average the ratios of the four scores for each target; (3) sum these per-target averages. The sum of average ratios (which are rarely much larger than 1) indicates the number of times each group outperformed the servers. Please see TenS slides for details.
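
The ratio-score procedure above can be sketched as follows. This is one possible reading, not the assessors' code: it assumes that ignored ratios are dropped from the per-target average, that a target with no ratio of at least 1 contributes 0, and that non-positive server scores are skipped because a ratio is not meaningful for them. Target names and values are hypothetical.

```python
def ratio_score(group_best, server_best):
    """Sum over targets of the average ratio (group best / server best).

    Both arguments map target -> {score name -> value}.
    Ratios below 1 are ignored (assumed here to mean: left out of the average).
    """
    total = 0.0
    for target, group_scores in group_best.items():
        ratios = []
        for name, group_value in group_scores.items():
            server_value = server_best[target][name]
            if server_value <= 0:
                # Assumption: a ratio against a non-positive score is skipped.
                continue
            ratio = group_value / server_value
            if ratio >= 1.0:
                ratios.append(ratio)
        if ratios:
            total += sum(ratios) / len(ratios)
    return total


# Hypothetical values for two targets (not real CASP data).
group = {
    "target_A": {"GDT": 44.0, "CS": 40.0, "TenS": 2.2, "QCS": 55.0},
    "target_B": {"GDT": 30.0, "CS": 35.0, "TenS": 1.0, "QCS": 40.0},
}
server = {
    "target_A": {"GDT": 40.0, "CS": 50.0, "TenS": 2.0, "QCS": 50.0},
    "target_B": {"GDT": 35.0, "CS": 40.0, "TenS": 1.5, "QCS": 45.0},
}
score = ratio_score(group, server)
```

In this example the group beats the servers on target_A only, so the result is close to 1, which matches the reading of the sum as a count of targets on which the group outperformed the servers.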

Targets
515	516	517	518
519	520	521	522
523	524	525	526
527	528	529	530
531	532	533	534
535	536	537	538
539	540	541	542
543	544	545	546
547	548	549	550
551	552	553	554
555	556	557	558
559	560	561	562
563	564	565	566
567	568	569	570
571	572	573	574
575	576	577	578
579	580	581	582
583	584	585	586
587	588	589	590
591	592	593	594
595	596	597	598
599	600	601	602
603	604	605	606
607	608	609	610
611	612	613	614
615	616	617	618
619	620	621	622
623	624	625	626
627	628	629	630
631	632	633	634
635	636	637	638
639	640	641	642
643