Random model of protein structure superposition

Two four-residue fragments from non-homologous protein chains are shown as spheres representing Cα atoms, connected by virtual bonds. One fragment ("protein 1") is colored black and the other ("protein 2") gray. Gapless equivalences between Cα atoms (e.g., the atoms labeled b and b′ in the figure) were imposed regardless of structural similarity, and a minimum-RMSD superposition was then performed. Such a forced superposition is termed a "random" superposition. The coordinate difference vector projections (DVPs) and their standard deviation, σ_obs, were computed from each random superposition. As an example, the three projections Δv_x, Δv_y and Δv_z (short thick lines on the x, y and z axes) of one difference vector Δv (arrow) are indicated.

This procedure was repeated for all fragment pairs in the dataset (32,004 pairs). Two size and shape parameters of each superposition, R_g (radius of gyration) and c, were then used in a generalized linear least-squares fit, solved by singular value decomposition (SVD), to determine the coefficients of a polynomial f. The polynomial f estimates the expected standard deviation of the projections, σ_exp, from the size and shape of a superposition, and σ_exp was calculated for every random superposition in the set. Finally, the distributions of σ_obs within narrow ranges of σ_exp (0.2 Å increments) were extracted and approximated with a probability density function (PDF), the Nakagami distribution. The parameters of this PDF therefore depend on the size and shape of the random superpositions falling within each σ_exp range. These parameters were subsequently fitted to a continuous curve with σ_exp as the independent variable. The continuous curve allows a random model, and hence a p-value, to be constructed for a superposition of arbitrary size and shape.
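The pipeline in this caption can be sketched as follows. This is an illustrative sketch, not the authors' code, and it rests on several assumptions. The "minimum RMSD superposition" is implemented with the Kabsch algorithm. σ_obs is taken as the population standard deviation of all 3n projections. R_g is computed for a single coordinate set. The 0.2 Å bins come from the caption, while `min_count` is a hypothetical cut-off. The p-value is taken as the lower-tail Nakagami probability, so a small p-value means few random superpositions are as tight. The parameter c, the polynomial f and the continuous parameter curve are not specified here, so σ_exp values and the Nakagami parameters (m, Ω) are simply passed in. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import nakagami


def kabsch_superpose(P, Q):
    """Minimum-RMSD superposition of Q onto P (Kabsch algorithm, assumed).

    P, Q: (n, 3) arrays of Cα coordinates; row i of P is forced to match
    row i of Q (a gapless equivalence). Returns both sets centred at the
    origin, with Q rotated onto P.
    """
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Qc.T @ Pc                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc, Qc @ R.T


def difference_vector_projections(P, Q):
    """x, y and z projections of every difference vector (3n values)."""
    Pc, Qr = kabsch_superpose(P, Q)
    return (Pc - Qr).ravel()


def sigma_obs(P, Q):
    # Population std of the projections (assumption). Because both sets are
    # centred, the projections have zero mean and this equals RMSD / sqrt(3).
    return float(np.std(difference_vector_projections(P, Q)))


def radius_of_gyration(X):
    Xc = X - X.mean(axis=0)
    return float(np.sqrt((Xc ** 2).sum(axis=1).mean()))


def fit_nakagami_bins(s_obs, s_exp, width=0.2, min_count=30):
    """Fit a Nakagami PDF to sigma_obs within each sigma_exp bin.

    Returns {bin centre: (m, Omega)}; min_count is a hypothetical threshold.
    """
    s_obs = np.asarray(s_obs, dtype=float)
    bins = np.floor(np.asarray(s_exp, dtype=float) / width).astype(int)
    params = {}
    for b in np.unique(bins):
        sample = s_obs[bins == b]
        if sample.size < min_count:
            continue
        m, _, scale = nakagami.fit(sample, floc=0)   # MLE, location fixed at 0
        params[float((b + 0.5) * width)] = (m, scale ** 2)  # Omega = scale^2
    return params


def p_value(s, m, omega):
    """P(random sigma_obs <= s) under a Nakagami(m, Omega) model (assumed lower tail)."""
    return float(nakagami.cdf(s, m, scale=np.sqrt(omega)))
```

In SciPy's parameterization, the Nakagami shape `nu` is m and `scale**2` is the spread Ω = E[σ_obs²], which is why the fitted scale is squared before it is stored.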
As explained in the text but not shown in the figure, the PDF parameters describing a distribution of σ_obs simultaneously describe the corresponding distribution of the projections Δv_x, Δv_y and Δv_z through a second PDF, the Variance-Gamma distribution.
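The derivation from the text is not reproduced here. One standard route that links the two PDFs, which this sketch assumes is the relationship meant, is a normal scale mixture. Suppose σ ~ Nakagami(m, Ω), so that σ² ~ Gamma(shape m, rate m/Ω), and each projection is zero-mean normal with variance σ². Then the projection follows a symmetric Variance-Gamma distribution with λ = m and α = √(2m/Ω), and its variance equals Ω. The function names below are hypothetical.

```python
import numpy as np
from scipy.special import gamma, kv


def vg_params_from_nakagami(m, omega):
    # sigma ~ Nakagami(m, Omega)  =>  sigma^2 ~ Gamma(shape=m, rate=m/Omega).
    # N(0, sigma^2) mixed over that gamma law is Variance-Gamma with
    # lambda = m and alpha^2 / 2 = m / Omega (beta = 0, mu = 0).
    return m, np.sqrt(2.0 * m / omega)


def variance_gamma_pdf(x, lam, alpha):
    """Symmetric (beta = 0, mu = 0) Variance-Gamma density; returns an array."""
    ax = np.abs(np.atleast_1d(np.asarray(x, dtype=float)))
    nu = lam - 0.5
    const = alpha ** (2 * lam) / (np.sqrt(np.pi) * gamma(lam) * (2 * alpha) ** nu)
    out = np.empty_like(ax)
    nz = ax > 0
    out[nz] = const * ax[nz] ** nu * kv(nu, alpha * ax[nz])
    # Limit at x = 0: |x|^nu K_nu(alpha|x|) -> Gamma(nu) 2^(nu-1) alpha^(-nu)
    # for nu > 0; the density diverges at 0 when lam <= 1/2.
    out[~nz] = const * gamma(nu) * 2 ** (nu - 1) * alpha ** (-nu) if nu > 0 else np.inf
    return out
```

As a sanity check, m = 1 gives λ = 1, and the density reduces to the Laplace law (α/2)·exp(−α|x|).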