HorA Server Documentation

Component
As shown in this flowchart, the HorA server has three components: a negative filter, a positive filter, and an SVM model. 'neg' stands for the negative filter, 'hh' stands for the positive filter (because the positive filter is based on the HHsearch score), and 'svm' stands for the SVM model.

Score
If the component is 'hh', the score is the HHsearch probability, ranging from 0 (least likely to be homologous) to 1 (most likely to be homologous). If the component is 'svm', the score is given by the SVM model. The higher the SVM score, the more likely the pair is to be homologous.

Scaled score
Suppose S12 is the raw score between domain 1 and domain 2, Srandom is the random score between domain 1 and domain 2, S11 is the raw score between domain 1 and itself, and S22 is the raw score between domain 2 and itself. The scaled score between domain 1 and domain 2 equals (S12-Srandom) / ((S11+S22)/2 - Srandom). Scaled scores and modified z-scores (below) are used as input to the SVM model.
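
The formula above can be sketched in a few lines of Python. This is only an illustration, not the server's own code: the function and argument names are hypothetical, and the check for a zero denominator is an added assumption.

```python
def scaled_score(s12, s_random, s11, s22):
    """Scaled score = (S12 - Srandom) / ((S11 + S22) / 2 - Srandom)."""
    # Average of the two self-comparison scores, corrected by the random score.
    denominator = (s11 + s22) / 2.0 - s_random
    if denominator == 0:
        # Assumption: the source does not say how this case is handled.
        raise ValueError("self scores equal the random score; scaled score undefined")
    return (s12 - s_random) / denominator
```

With made-up values S12 = 30, Srandom = 10, and S11 = S22 = 50, the scaled score is (30 - 10) / (50 - 10) = 0.5. A pair that scores as well as each domain scores against itself gets a scaled score of 1.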

Modified z-score
Suppose S12 is the raw score between domain 1 and domain 2. Let M1 and VAR1 be the mean and variance of the score distribution generated by comparing domain 1 to every domain in the database, and let M2 and VAR2 be the corresponding values for domain 2. We calculate M12 = (M1 + M2) / 2 and STD12 = square_root((VAR1 + VAR2) / 2). The modified z-score is calculated as Z = (S12 - M12) / STD12. Z is then transformed to Ztran: Ztran = 1 / (1 + e**(-Z)), where ** denotes exponentiation. Modified z-scores and scaled scores (above) are used as input to the SVM model.
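
As a minimal sketch of the two steps described above (not the server's actual code), the calculation could look like this in Python. The function names are hypothetical. Splitting the logistic transform into two branches is an implementation choice made here to avoid floating-point overflow for very negative Z; it is mathematically the same as 1 / (1 + e**(-Z)).

```python
import math

def modified_z(s12, m1, var1, m2, var2):
    """Z = (S12 - M12) / STD12, with the pooled mean and standard deviation."""
    m12 = (m1 + m2) / 2.0
    std12 = math.sqrt((var1 + var2) / 2.0)
    return (s12 - m12) / std12

def transform_z(z):
    """Ztran = 1 / (1 + e**(-Z)), computed without overflowing for large |Z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe here because z is negative
    return e / (1.0 + e)
```

For example, with made-up values S12 = 20, M1 = M2 = 10, and VAR1 = VAR2 = 16, we get M12 = 10, STD12 = 4, and Z = 2.5. A Z of 0 is transformed to Ztran = 0.5.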

DALI match
HHsearch alignments are displayed with a DALI match line above them. Where the HHsearch aligned positions exactly match the DALI alignment (or another structural alignment, FAST or TMalign, if DALI is not available), the aligned positions are marked with asterisks (*). Where the aligned positions are not exactly the same as in DALI but are within 4 residues of the DALI aligned positions, they are marked with dots (.).
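
One possible reading of this marking rule is sketched below in Python. The data layout is an assumption made for illustration: the HHsearch alignment is a list of (query residue, hit residue) pairs, and the DALI alignment is a dictionary from query residue to hit residue. "Within 4 residues" is interpreted here as the DALI-aligned hit residue lying at most 4 positions from the HHsearch-aligned hit residue; the server may define it differently.

```python
def match_line(hh_pairs, dali_map, window=4):
    """Build a match line with one character per HHsearch aligned position.

    hh_pairs: list of (query_residue, hit_residue) tuples (hypothetical layout).
    dali_map: dict mapping query_residue -> hit_residue in the DALI alignment.
    """
    marks = []
    for query_res, hit_res in hh_pairs:
        dali_hit = dali_map.get(query_res)
        if dali_hit is None:
            marks.append(" ")   # DALI does not align this query residue
        elif dali_hit == hit_res:
            marks.append("*")   # exact agreement with DALI
        elif abs(dali_hit - hit_res) <= window:
            marks.append(".")   # close to the DALI aligned position
        else:
            marks.append(" ")   # disagreement larger than the window
    return "".join(marks)
```

For the made-up input hh_pairs = [(1, 1), (2, 2), (3, 5), (4, 20), (5, 6)] and dali_map = {1: 1, 2: 2, 3: 3, 4: 4}, the first two positions get asterisks, the third gets a dot, and the last two are left blank.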

Consensus
The consensus score is an aggregate agreement value among the three structural alignment programs (DALI, FAST, and TMalign) that the HorA server runs to infer homology. A consensus of 0 means that all of the alignments differ; values toward 1 mean that at least two alignments agree over a significant portion of the alignment. The details are as follows. First, agreement values between alignments are calculated. The agreement between two alignments is the number of exactly matching aligned positions. Since there are three alignment programs, three agreement values are calculated (DALI vs. FAST, FAST vs. TMalign, and TMalign vs. DALI). The final consensus value is the maximum of the three agreement values divided by the length of the query protein or the hit protein, whichever is shorter.
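
The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: each alignment is represented as a set of (query position, hit position) pairs, and the function name is hypothetical.

```python
from itertools import combinations

def consensus(alignments, query_len, hit_len):
    """Maximum pairwise agreement divided by the shorter protein length.

    alignments: list of three sets of (query_pos, hit_pos) pairs, one per
    program (DALI, FAST, TMalign). The set representation is an assumption.
    """
    # Agreement = number of aligned position pairs shared by two alignments.
    agreements = [len(a & b) for a, b in combinations(alignments, 2)]
    return max(agreements) / min(query_len, hit_len)
```

For example, if the DALI and FAST alignments share 2 aligned pairs, the other two comparisons share none, and the shorter protein has 8 residues, the consensus is 2 / 8 = 0.25.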

Contact
The contact value (c) is the number of long-range contacts in a DALI alignment. Suppose residues Ai and Aj in domain A are aligned to residues Bi and Bj in domain B, respectively. If Ai and Aj (and likewise Bi and Bj) are separated by at least 10 amino acids in the primary sequence and are within 14 Angstroms of each other in the 3D structure, we count one long-range contact. Scanning all possible residue pairs in the aligned region and summing these contacts gives the total contact number c.
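
A rough Python sketch of this counting procedure is given below. Several details are assumptions, since the text above does not specify them: residues are identified by integer sequence indices, "separated by at least 10 amino acids" is taken as an index difference of at least 10, and distances are measured between a single representative coordinate per residue (for example the C-alpha atom). The function name and data layout are hypothetical.

```python
import math
from itertools import combinations

def count_long_range_contacts(aligned_pairs, coords_a, coords_b,
                              min_sep=10, max_dist=14.0):
    """Count long-range contacts shared by both domains in an alignment.

    aligned_pairs: list of (a_index, b_index) aligned residue indices.
    coords_a, coords_b: dicts mapping residue index -> (x, y, z) in Angstroms.
    """
    count = 0
    for (ai, bi), (aj, bj) in combinations(aligned_pairs, 2):
        # Both residue pairs must be far apart in the primary sequence.
        if abs(ai - aj) < min_sep or abs(bi - bj) < min_sep:
            continue
        # Both residue pairs must be close together in the 3D structure.
        close_in_a = math.dist(coords_a[ai], coords_a[aj]) <= max_dist
        close_in_b = math.dist(coords_b[bi], coords_b[bj]) <= max_dist
        if close_in_a and close_in_b:
            count += 1
    return count
```

Note that math.dist requires Python 3.8 or later. Scanning every pair of aligned positions is quadratic in the alignment length, which should be acceptable for domain-sized alignments.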