The Critical Assessment of Techniques for Protein Structure
Prediction (CASP) experience suggests that the conserved regions of
multiple predicted structures (called decoys or models) for a given
protein can be utilized for protein structure prediction. Most previous studies focused on the
identification of conserved regions with the help of alignment
information. In cases where alignment information is
unavailable, the identification of conserved regions remains a
challenge. Based on our previous work on approximating the bottleneck distance,
we proposed a formal definition of conserved
regions, and designed an O(m^2*n^2*log n) time algorithm to extract
the maximum set of conserved regions from m decoys for a protein
with n residues.
Using the algorithm to identify conserved regions, we first
investigated whether conserved regions of ab initio decoys are
similar to their counterparts in the native structure. We observed that
for 16 out of 25 TBM (template-based modeling) CASP7 targets, our
method identifies over 70% of native-like regions and filters out
over 90% of non-native-like regions, simultaneously. In addition,
we obtained more than half of the native-like regions and filtered out
over 80% of non-native-like regions for 10 out of 12 FM (free
modeling) CASP7 targets.
We further investigated whether these conserved regions improve
protein structure prediction. We observed that for 10 out of 12
FM CASP7 targets, our method improves the accuracy of ROSETTA.
In particular, by identifying conserved regions, the
quality of four targets was improved from meaningless (TM-score <
0.4) to meaningful (TM-score > 0.4).
Experimental results show that our definition of conserved regions
is effective, and that most identified conserved regions
are similar to the corresponding regions in native structures. In
addition, when coupled with an iterative strategy, the identified conserved
regions can improve the quality of the final generated structures.
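
The two percentages reported above for each target (native-like regions identified, non-native-like regions filtered out) can be illustrated with a minimal sketch. It assumes each candidate region already carries a native-like or non-native-like label; the function name region_rates, the integer region identifiers, and the toy data are hypothetical and are not taken from the study.

```python
def region_rates(identified, native_like, all_regions):
    """Return (fraction of native-like regions identified,
               fraction of non-native-like regions filtered out).

    Assumes both the native-like and the non-native-like sets are non-empty.
    """
    identified = set(identified)
    native_like = set(native_like)
    non_native = set(all_regions) - native_like

    # Native-like regions that the method kept.
    kept_native = len(identified & native_like) / len(native_like)
    # Non-native-like regions that the method did not report.
    filtered_non_native = len(non_native - identified) / len(non_native)
    return kept_native, filtered_non_native


# Hypothetical toy data: ten candidate regions labelled 0..9.
all_regions = range(10)
native_like = {0, 1, 2, 3}
identified = {0, 1, 2, 7}

kept, filtered = region_rates(identified, native_like, all_regions)
print(f"native-like identified: {kept:.0%}")
print(f"non-native-like filtered out: {filtered:.0%}")
```

Under this reading, a target counts toward the "16 out of 25" figure only when both rates clear their thresholds at the same time, which is why the two fractions are computed from the same set of identified regions.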