S2 covers the simplest compact motifs

The expected S2 frequencies based on superfamily hits and motif hits are summarized in the table below. The theoretical expected frequencies are identical for all SSPs, except that EE is split according to the presence of a hydrogen bond, whereas the observation-based expected frequencies vary with the component SSEs. For example, the observation-based expected frequency of HH is 16.2%, while that of HE is 24.1%. To compare the two methods, we measured the difference between the expected and observed frequencies for each method (table below; * marks the expected calculation closer to the observed value). For motif frequencies, the observation-based expected frequency is closer to the observed frequency for three of the five SSPs (EE, HH and HE). Similarly, for superfamily frequencies, the observation-based expected frequency is closer to the observed frequency for three SSPs (EE, EH and HE). Thus, employing observed frequencies of the constituent SSPs in the expected frequency improves accuracy. The trend is more pronounced in larger SSPs. Therefore, in our analysis of S2-S5, we consider only the observation-based expected frequency.
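The comparison described above amounts to taking, for each SSP, the absolute difference between an expected and an observed frequency and noting which method deviates less. The following minimal Python sketch illustrates this using only the observation-based expected and observed motif frequencies quoted in this section; the theoretical expected values appear only in the table and are not reproduced here. The function names `deviation` and `closer_method` are illustrative assumptions, not part of the original analysis, and EH is assigned the same expected value as HE because the text states that the two are identical.

```python
# Motif frequencies (%) quoted in this section; EE refers to the combined EE SSP.
# Assumption: EH shares the HE expected value (the text says they are identical).
expected_motif = {"HH": 16.2, "HE": 24.1, "EH": 24.1, "EE": 17.8}
observed_motif = {"HH": 15.7, "HE": 22.49, "EH": 25.04, "EE": 22.7}


def deviation(expected, observed):
    """Absolute expected-vs-observed difference, in percentage points, per SSP."""
    return {ssp: round(abs(expected[ssp] - observed[ssp]), 2) for ssp in expected}


def closer_method(dev_a, dev_b):
    """For each SSP, report which method ('a' or 'b') deviates less, or 'tie'."""
    result = {}
    for ssp in dev_a:
        if dev_a[ssp] < dev_b[ssp]:
            result[ssp] = "a"
        elif dev_b[ssp] < dev_a[ssp]:
            result[ssp] = "b"
        else:
            result[ssp] = "tie"
    return result


motif_deviation = deviation(expected_motif, observed_motif)
for ssp, diff in sorted(motif_deviation.items(), key=lambda item: item[1]):
    print(f"{ssp}: |expected - observed| = {diff} percentage points")
```

Given a second deviation dictionary computed from the theoretical expected values, `closer_method` would yield the per-SSP markers corresponding to the asterisks in the table.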

S2 consists of five SSPs: EH, HE, HH, hydrogen-bonded EE and non-hydrogen-bonded EE (frequencies in table below). The three simplest units of supersecondary structure analyzed by others are the β-hairpin, α-hairpin and βαβ units. Two of these stable units are included in the S2 SSPs, i.e. EE with hydrogen bonds (β-hairpin and the β-strands of βαβ) and HH (α-hairpin), while EH and HE alternate in the βαβ unit. The non-hydrogen-bonded EE SSP represents a special case that does not exist as an independent unit (each of its β-strands requires hydrogen bonding to another β-strand outside the SSP), but it can be used in assembling larger SSPs. It is observed much less commonly than the other SSPs (table below).

EH and HE have identical expected frequencies. However, the observed motif frequency of EH (25.04%) is higher than that of HE (22.49%). Two contributing factors are the prevalence of the classic Rossmann fold (doubly wound), which usually starts with a β-strand and ends with an α-helix, and of the TIM barrel, in which EH and HE overlap with each other. For instance, a typical P-loop containing nucleoside triphosphate hydrolase has 5 EH units and 4 HE units. At the superfamily level, however, both HE and EH have an observed frequency (20.56%) that is lower than that of α-hairpins and β-hairpins. This discrepancy is potentially caused by the clustering of Rossmann-like proteins into large SCOP superfamilies. For instance, the single superfamily of P-loop containing nucleoside triphosphate hydrolases contains 2433 domains with 26104 EH motif hits. This classification is reflected in the SCOP superfamily counts (Table 4): SCOP has fewer superfamilies in the α/β class (244 superfamilies) than in the all-β class (354 superfamilies).
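The asymmetry between EH and HE counts in a fold that starts with a β-strand and ends with an α-helix can be illustrated with a short Python sketch. It assumes an idealized, strictly alternating string of five β-strands and five α-helices as a stand-in for a P-loop NTPase-like domain, and it counts only sequence-adjacent SSE pairs; both are simplifying assumptions for illustration, not the motif-matching procedure used in this work. The same block also computes the average number of EH motif hits per domain from the two counts quoted above.

```python
from collections import Counter


def count_adjacent_pairs(sse_string):
    """Count each pair of sequence-adjacent SSEs (e.g. 'EH', 'HE') in an SSE string."""
    return Counter(a + b for a, b in zip(sse_string, sse_string[1:]))


# Hypothetical idealized SSE string: starts with a strand (E), ends with a helix (H).
idealized_domain = "EH" * 5
pairs = count_adjacent_pairs(idealized_domain)
print(pairs["EH"], pairs["HE"])  # one more EH than HE

# Counts quoted in the text for the P-loop NTPase superfamily.
eh_hits = 26104
domains = 2433
hits_per_domain = eh_hits / domains
print(f"Average EH hits per domain: {hits_per_domain:.2f}")
```

Because the alternating string begins with E and ends with H, every helix follows a strand but the last helix is not followed by a strand, which gives one more EH pair than HE pairs. An average of more than ten EH hits per domain indicates that many hits involve overlapping or non-consecutive strand-helix pairs, and that a large share of EH motif hits falls within a single superfamily.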

EE displays the highest observed superfamily frequency (24.7%) and the second highest observed motif frequency (22.7%) among all S2 SSPs. These frequencies are higher than the expected values of 14.6% and 17.8%, respectively. EE is in fact a combination of EE with and without hydrogen bonds, which raises its observed motif frequency. Further, most of the non-bonded EE motif hits correspond to nearby β-strand pairs in two different sheets of β-sandwich structures. This also leads to higher than expected observed superfamily frequencies. Compared with the parallel β-sheet configuration established by the EH and HE motifs, the antiparallel β-hairpin represented by EE is the energetically more favored β-sheet configuration because of its well-aligned hydrogen bonds.

The HH (α-helical hairpin) motif shows the second lowest observed motif frequency (15.7%), which is close to its expected motif frequency of 16.2% (the lowest). This correlates with the lower number of repeating α-helical hairpin units present in α-helical proteins. The superfamily with the largest number of HH motif hits is the Globin-like superfamily, but it contains only 3 pairs of α-helical hairpins, which are perpendicular to each other and packed in a folded leaf topology. In contrast, the observed superfamily frequency of HH ranks second highest (23.66%) among S2 SSPs (higher than that of EH and HE), consistent with its expected superfamily frequency (second highest, 21.07%). The higher superfamily frequency correlates with the prevalence of superfamilies belonging to the all-α class (507) and with the very high percentage (62%) of superfamilies in the all-β class that contain α-helices.
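The ordinal statements about superfamily frequencies can be checked with a small Python sketch that ranks the observed values quoted in this section and computes how far EE and HH exceed their expected values. The dictionaries contain only numbers given in the text (EE refers to the combined EE SSP, and EH and HE share the quoted value of 20.56%); the variable names are illustrative assumptions.

```python
# Observed and expected superfamily frequencies (%) quoted in this section.
observed_superfamily = {"EE": 24.7, "HH": 23.66, "EH": 20.56, "HE": 20.56}
expected_superfamily = {"EE": 14.6, "HH": 21.07}

# Rank SSPs from highest to lowest observed superfamily frequency.
ranking = sorted(observed_superfamily, key=observed_superfamily.get, reverse=True)
print("Ranking by observed superfamily frequency:", ranking)

# Observed minus expected, in percentage points, where both values are quoted.
excess = {
    ssp: round(observed_superfamily[ssp] - expected_superfamily[ssp], 2)
    for ssp in expected_superfamily
}
for ssp, diff in excess.items():
    print(f"{ssp}: observed exceeds expected by {diff} percentage points")
```

The ranking places EE first and HH second, ahead of EH and HE, and the excess of observed over expected is considerably larger for EE than for HH.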