Scores we used to evaluate predictions in the FM category
Having a good score to evaluate predictions is crucial for method development. Many approaches are trained to produce models that score better under some evaluation method, so flaws in that method yield better-scoring models that represent real protein structure no better than before. As assessors of the CASP9 FM category, we analyzed predictions using four scoring systems: the classic LGA GDT-TS, Contact score (CS), TenS and QCS.
CS: We developed the contact score during the CASP8 season, and it gave the best performance according to a study by the official CASP8 assessors. Please go here for the details of the algorithm.
TenS: We developed TenS during the CASP5 season, when we assessed the FR category (PubMed). Since FR and FM are close to each other, we applied it to the assessment of CASP9 FM targets. TenS is an automatic numerical scoring system that contains six different structural measures (GDT, intra-molecular distance, Dali, TM, Mammoth and SOV) and four alignment scores (Qlga, QDali, QTM, and Qmammoth). Among these measures, intra-molecular distance, Dali, CE, Mammoth and SOV are sequence independent; GDT, Qlga, QDali, QCE, and Qmammoth are sequence dependent. Thus, the scoring system is balanced between the two kinds of measure. We rescaled each individual score to a Z-score and combined them with equal weight. The summed score (TenS) was used to compare the overall performance of participating groups. Please see TenS slides for details.
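The rescale-and-sum step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the assessment: the function names, the use of the population standard deviation, and the assumption that every measure is oriented so that higher is better are all ours, and the example numbers are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale one measure to Z-scores across all models.

    Assumption: population standard deviation; a measure with no
    spread contributes 0 for every model.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_equal_weight(raw_scores):
    """Sum the per-measure Z-scores of each model with equal weight.

    raw_scores maps a measure name to a list of scores, one per model,
    with the models in the same order for every measure.
    """
    per_measure = [z_scores(vals) for vals in raw_scores.values()]
    # zip(*...) regroups the Z-scores by model; the sum is the combined score.
    return [sum(model_z) for model_z in zip(*per_measure)]


# Hypothetical scores for three models on two of the measures.
example = {
    "GDT": [30.0, 40.0, 50.0],
    "SOV": [0.2, 0.4, 0.6],
}
combined = combine_equal_weight(example)
```

Because each measure is converted to a Z-score before summing, measures on very different scales (here 0 to 100 and 0 to 1) contribute equally to the combined score.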
QCS: We developed a new scoring system, QCS, to mimic manual assessment, and it was quite successful according to our critical inspection. Please see QCS slides for details.
Ratio score: Top-performing groups use similar strategies: they rank server models with various scoring functions and refine the top picks. A natural question is: who did better than the servers? To answer this question, we developed an additional score: the ratio of the best group model score to the top server model score, computed for each of the main scores (GDT, CS, TenS and QCS). The procedure is: ignore ratios below 1, average the ratios of the four scores for each target, then sum these averages over targets. Because the ratios are rarely much larger than 1, the sum of average ratios indicates the number of times each group outperformed the servers. Please see TenS slides for details.
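A minimal sketch of this procedure is given below. The text does not say exactly how an ignored ratio enters the average, so the sketch assumes it contributes zero while the average is still taken over all four scores; it also assumes that a non-positive server score makes the ratio undefined and is treated as ignored. The function names, data layout and numbers are hypothetical.

```python
MAIN_SCORES = ("GDT", "CS", "TenS", "QCS")


def target_ratio_average(group_best, server_top):
    """Average the four score ratios for one target.

    group_best / server_top map each main score to the best group model
    score and the top server model score for this target.
    Assumption: ratios below 1 count as 0 in the average.
    """
    kept = []
    for name in MAIN_SCORES:
        server = server_top[name]
        if server <= 0:
            # Assumption: ratio is undefined here, so treat it as ignored.
            kept.append(0.0)
            continue
        ratio = group_best[name] / server
        kept.append(ratio if ratio >= 1.0 else 0.0)
    return sum(kept) / len(MAIN_SCORES)


def ratio_score(targets):
    """Sum the per-target averages over all targets of one group."""
    return sum(target_ratio_average(g, s) for g, s in targets)


# Hypothetical data: two targets, each a (group best, server top) pair.
targets = [
    ({"GDT": 44.0, "CS": 50.0, "TenS": 2.0, "QCS": 60.0},
     {"GDT": 40.0, "CS": 50.0, "TenS": 4.0, "QCS": 80.0}),
    ({"GDT": 20.0, "CS": 30.0, "TenS": 1.0, "QCS": 40.0},
     {"GDT": 25.0, "CS": 35.0, "TenS": 2.0, "QCS": 50.0}),
]
total = ratio_score(targets)
```

In the first target the group matches or beats the servers on GDT and CS only, so half of the ratios survive; in the second target none do, and that target adds nothing to the sum.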