Compute the Pnear Quality Metric for an RMSD Funnel
Pnear.Rd
In protein structure prediction, a key measure of accuracy is how well the predicted energy or score correlates with the distance to the native conformation. A common distance measure is the all-atom root mean squared deviation (RMSD). A challenge, however, is that the energy is not expected to be discriminating far from the native conformation, so the assessment should be biased toward structures near it. The Pnear metric, defined in (Bhardwaj, et al., Nature, 2016), measures how "funnel-like" a score-vs-RMSD plot is. See the Pnear Rosetta Documentation.
Arguments
- score
a vector of scores, e.g. Rosetta energies computed with the ref2015 scorefunction.
- rmsd
root mean squared deviation (RMSD) values, one per score, e.g. computed over backbone atoms.
- lambda
Lambda is a value in Angstroms indicating the breadth of the Gaussian used to define "native-like-ness". The bigger the value, the more permissive the calculation is to structures that deviate from native. Typical values for peptides range from 1.5 to 2.0, and for proteins from 2.0 to perhaps 4.0.
- kbt
The value of k_B*T, in energy units, determines how large an energy gap must be in order for a sequence to be said to favor the native state. The default value, 0.62, should correspond to physiological temperature for ref2015 or any other scorefunction with units of kcal/mol.
- verbose
give verbose output.
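To get a feel for how lambda and kbt shape the calculation, the two weighting factors can be evaluated on their own. The following is a minimal sketch in base R that uses the Gaussian and Boltzmann factors as they are written in the Details section. The helper names native_weight and boltzmann_weight are hypothetical and are not part of this package.

```r
# Hypothetical helpers for illustration only; not part of the package API.

# Gaussian "native-like-ness" weight for an RMSD given in Angstroms.
native_weight <- function(rmsd, lambda) {
  exp(-rmsd^2 / lambda^2)
}

# Boltzmann weight of each score relative to the lowest score in the set.
boltzmann_weight <- function(score, kbt = 0.62) {
  exp(-(score - min(score)) / kbt)
}

rmsd <- c(0.5, 1, 2, 4)

# A larger lambda gives more weight to structures farther from native.
w_narrow <- native_weight(rmsd, lambda = 1.5)
w_broad  <- native_weight(rmsd, lambda = 4.0)

# A larger kbt lets higher-scoring structures contribute more.
score  <- c(-10, -9, -8)
b_cold <- boltzmann_weight(score, kbt = 0.62)
b_warm <- boltzmann_weight(score, kbt = 2.0)
```

Comparing w_narrow with w_broad, and b_cold with b_warm, shows the direction in which each parameter relaxes the metric.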
Details
# subtract off the min-score as is done in the Rosetta Code
scores = scores - min(scores)
# write down the equation in more code-like notation
Pnear = Sum_i[exp(-RMSD[i]^2/lambda^2)*exp(-scores[i]/k_BT)] /
Sum_i[exp(-scores[i]/k_BT)]
# combine the terms in the first exponential
Pnear = Sum_i[exp(-RMSD[i]^2/lambda^2 - scores[i]/k_BT)] /
Sum_i[exp(-scores[i]/k_BT)]
# name the exponents in the numerator and the denominator
let a_i = -RMSD[i]^2/lambda^2 - scores[i]/k_BT
b_i = -scores[i]/k_BT
Pnear = Sum_i[exp(a_i)] / Sum_i[exp(b_i)]
# Use the log-sum-exponential trick
log(Pnear) = log_sum_exp(-RMSD[i]^2/lambda^2 - scores[i]/k_BT)
- log_sum_exp(-scores[i]/k_BT)
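The steps above can be sketched in base R as follows. This is an illustrative reading of the derivation, not the package's actual source: the names log_sum_exp and pnear_sketch are hypothetical, and only the kbt default of 0.62 is taken from the Arguments section.

```r
# Numerically stable log(sum(exp(x))): factor out the maximum first.
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Hypothetical sketch of the Pnear calculation described above.
pnear_sketch <- function(score, rmsd, lambda, kbt = 0.62) {
  stopifnot(length(score) == length(rmsd), lambda > 0, kbt > 0)
  score <- score - min(score)  # subtract off the minimum score
  log_num <- log_sum_exp(-rmsd^2 / lambda^2 - score / kbt)
  log_den <- log_sum_exp(-score / kbt)
  exp(log_num - log_den)       # back-transform log(Pnear)
}

# Toy funnel: the lowest score sits at the lowest RMSD.
rmsd  <- c(0.3, 1.0, 3.0, 6.0, 9.0)
score <- c(-50, -47, -42, -40, -39)
p_funnel <- pnear_sketch(score, rmsd, lambda = 2)

# Toy anti-funnel: the lowest score sits far from native.
p_anti <- pnear_sketch(rev(score), rmsd, lambda = 2)
```

On these made-up data, p_funnel comes out close to 1 and p_anti close to 0, which is the behaviour expected of a funnel-quality metric. Working in log space keeps the intermediate sums finite even when individual exponentials would underflow.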
Note
Unlike the Conway discrimination score, the Pnear calculation uses no hard cutoffs. This is advantageous for repeated testing: if the scatter of points on the score-vs-RMSD plot changes very slightly from run to run, the Pnear value will only change by a small amount, whereas any metric dependent on hard cutoffs could change by a large amount if a low-energy point crosses an RMSD threshold.
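The sensitivity argument can be illustrated with a toy comparison. In the sketch below, cutoff_gap is a deliberately simple hard-cutoff metric made up for this example. It is not the Conway discrimination score. pnear_sketch is a hypothetical direct transcription of the formula in Details. The data are invented.

```r
# Hypothetical direct transcription of the Pnear formula from Details.
pnear_sketch <- function(score, rmsd, lambda, kbt = 0.62) {
  w <- exp(-(score - min(score)) / kbt)  # Boltzmann weights
  sum(exp(-rmsd^2 / lambda^2) * w) / sum(w)
}

# Illustrative hard-cutoff metric (NOT the Conway discrimination score):
# lowest score at or beyond the cutoff minus lowest score inside it.
cutoff_gap <- function(score, rmsd, cutoff) {
  min(score[rmsd >= cutoff]) - min(score[rmsd < cutoff])
}

score  <- c(-50, -49, -45, -40)
rmsd_a <- c(1.95, 0.5, 4.0, 8.0)  # lowest-score point just inside 2 A
rmsd_b <- c(2.05, 0.5, 4.0, 8.0)  # same point nudged just outside 2 A

gap_a <- cutoff_gap(score, rmsd_a, cutoff = 2)
gap_b <- cutoff_gap(score, rmsd_b, cutoff = 2)

pnear_a <- pnear_sketch(score, rmsd_a, lambda = 2)
pnear_b <- pnear_sketch(score, rmsd_b, lambda = 2)
```

Moving one point by 0.1 Angstrom flips the sign of the cutoff-based gap, while the two Pnear values differ only slightly.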
Author
Vikram K. Mulligan vmulligan@flatironinstitute.org adapted from Rosetta.