Abstract
In this work, we used the ‘comparison’ concept to generate Random Forest (RF)
models from unbalanced data sets. The models assign different importance
factors to atom pair potentials, improving the discrimination of native
proteins from decoy proteins. Individual and combined data sets consisting of twelve
decoy sets were used to test the performance of the RF models. We find that RF
models increase the recognition of native structures without affecting their
ability to identify the best decoy structures. We also created models using
scrambled atom types, which yield physically unrealistic probability
functions, to test whether the RF algorithm can build useful models from
scrambled input. In this test, the resulting models did not match the quality
of those built from the unscrambled probability functions. Next, we created
uniform probability
functions in which the peak positions are the same as in the original
functions but every interaction has the same peak height. With these uniform
potentials we recovered models as good as those built from the full
potentials, suggesting that the experimental peak positions are all that
matters in these models.
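
The two control experiments can be illustrated with a minimal sketch. It is not the procedure used in this work: the atom pair labels, the Gaussian toy curves, the distance grid, and the helper names `uniform_peak_height` and `scramble_atom_types` are all assumptions made for illustration. The sketch assumes each probability function is stored as a list of values on a shared distance grid, that "uniform" means rescaling every curve to a common maximum, and that "scrambled" means reassigning curves to different atom pair labels.

```python
import math
import random


def uniform_peak_height(functions, height=1.0):
    """Rescale every curve so its maximum equals `height`.

    Multiplying by a positive constant leaves the peak position unchanged.
    """
    uniform = {}
    for pair, values in functions.items():
        peak = max(values)
        if peak <= 0:
            raise ValueError(f"curve for {pair} has no positive peak")
        uniform[pair] = [v * height / peak for v in values]
    return uniform


def scramble_atom_types(functions, seed=0):
    """Reassign the existing curves to atom pair labels in shuffled order."""
    pairs = list(functions)
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    return {new: functions[old] for new, old in zip(pairs, shuffled)}


# Hypothetical toy data: one Gaussian-shaped curve per atom pair.
grid = [2.0 + 0.1 * i for i in range(61)]  # distances, arbitrary units
toy_peaks = {"C-C": (3.0, 0.5), "C-N": (4.5, 2.0), "N-O": (6.0, 1.2)}
functions = {
    pair: [h * math.exp(-((r - center) ** 2) / 0.1) for r in grid]
    for pair, (center, h) in toy_peaks.items()
}

uniform = uniform_peak_height(functions)
scrambled = scramble_atom_types(functions)
```

In this sketch, `uniform` keeps each curve's peak position but discards the relative peak heights, while `scrambled` keeps every curve intact but breaks the link between a curve and its atom pair.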