Mirny, L. & Shakhnovich, E. Evolutionary conservation of the folding nucleus. Journal of Molecular Biology 308, 2, 123 - 129 (2001). Publisher's VersionAbstract
Here, we present statistical analysis of conservation profiles in families of homologous sequences for nine proteins whose folding nucleus was determined by protein engineering methods. We show that in all but one protein (AcP) folding nucleus residues are significantly more conserved than the rest of the protein. Two aspects of our study are especially important: (i) grouping of amino acid residues into classes according to their physical-chemical properties and (ii) proper normalization of amino acid probabilities that reflects the fact that evolutionary pressure to conserve some amino acid types may itself affect concentration of various amino acid types in protein families. Neglect of any of those two factors may make physical and biological "signals" from conservation profiles disappear.
Tiana, G., Broglia, R.A. & Shakhnovich, E.I. Hiking in the energy landscape in sequence space: A bumpy road to good folders. Proteins - Structure, function and bioinformatics 39, 3, 244 - 251 (2000). Publisher's VersionAbstract
With the help of a simple 20-letter lattice model of heteropolymers, we investigated the energy landscape in the space of designed good-folder sequences. Low-energy sequences form clusters, interconnected via neutral networks, in the space of sequences. Residues that play a key role in the foldability of the chain and in the stability of the native state are highly conserved, even among the chains belonging to different clusters. If, according to the interaction matrix, some strong attractive interactions are almost degenerate (i.e., they can be realized by more than one type of amino acid contacts), sequence clusters group into a few superclusters. Sequences belonging to different superclusters are dissimilar, displaying very small (≈10%) similarity, and residues in key sites are, as a rule, not conserved. Similar behavior is observed in the analysis of real protein sequences.
Shimada, J., Ishchenko, A. v & Shakhnovich, E.I. Analysis of knowledge-based protein-ligand potentials using a self-consistent method. Protein Science 9, 4, 765 - 775 (2000). Publisher's VersionAbstract
We propose a self-consistent approach to analyze knowledge-based atom-atom potentials used to calculate protein-ligand binding energies. Ligands complexed to actual protein structures were first built using the SMoG growth procedure (DeWitte & Shakhnovich, 1996) with a chosen input potential. These model protein-ligand complexes were used to construct databases from which knowledge-based protein-ligand potentials were derived. We then tested several different modifications to such potentials and evaluated their performance on their ability to reconstruct the input potential using the statistical information available from a database composed of model complexes. Our data indicate that the most significant improvement resulted from properly accounting for the following key issues when estimating the reference state: (1) the presence of significant nonenergetic effects that influence the contact frequencies and (2) the presence of correlations in contact patterns due to chemical structure. The most successful procedure was applied to derive an atom-atom potential for real protein-ligand complexes. Despite the simplicity of the model (pairwise contact potential with a single interaction distance), the derived binding free energies showed a statistically significant correlation (∼0.65) with experimental binding scores for a diverse set of complexes.
Li, L., Mirny, L.A. & Shakhnovich, E.I. Kinetics, thermodynamics and evolution of non-native interactions in a protein folding nucleus. Nat Struct Mol Biol 7, 4, 336 - 342 (2000). Publisher's VersionAbstract
A lattice model with side chains was used to investigate protein folding with computer simulations. In this model, we rigorously demonstrate the existence of a specific folding nucleus. This nucleus contains specific interactions not present in the native state that, when weakened, slow folding but do not change protein stability. Such a decoupling of folding kinetics from thermodynamics has been observed experimentally for real proteins. From our results, we conclude that specific non-native interactions in the transition state would give rise to -values that are negative or larger than unity. Furthermore, we demonstrate that residue Ile 34 in src SH3, which has been shown to be kinetically, but not thermodynamically, important, is universally conserved in proteins with the SH3 fold. This is a clear example of evolution optimizing the folding rate of a protein independent of its stability and function.
Dokholyan, N.V., Buldyrev, S.V., Stanley, H.E. & Shakhnovich, E.I. Identifying the protein folding nucleus using molecular dynamics. Journal of Molecular Biology 296, 5, 1183 - 1188 (2000). Publisher's VersionAbstract
Molecular dynamics simulations of folding in an off-lattice protein model reveal a nucleation scenario, in which a few well-defined contacts are formed with high probability in the transition state ensemble of conformations. Their appearance determines folding cooperativity and drives the model protein into its folded conformation. Amino acid residues participating in those contacts may serve as "accelerator pedals” used by molecular evolution to control protein folding rate.
Grzybowski, B.A., et al. Modeling the Kinetics of Acylation of Insulin using a Recursive Method for Solving the Systems of Coupled Differential Equations. Biophysical Journal 78, 2, 652 - 661 (2000). Publisher's VersionAbstract
This paper describes a theoretical method for solving systems of coupled differential equations that describe the kinetics of complicated reaction networks in which a molecule having multiple reaction sites reacts irreversibly with multiple equivalents of a ligand (reagent). The members of the network differ in the number of equivalents of reagent that have reacted, and in the patterns of sites of reaction. A recursive algorithm generates series, asymptotic, and average solutions describing this kinetic scheme. This method was validated by successfully simulating the experimental data for the kinetics of acylation of insulin.
Gutman, L. & Shakhnovich, E. Orientational ordering in sequence-disordered liquid crystalline polymers. Chemical Physics Letters 325, 4, 323 - 329 (2000). Publisher's VersionAbstract
We study the nematic/isotropic (N/I) phase diagram and orientational ordering in heteropolymers consisting of stiff and flexible segments by field theory. It is shown that a finite and small variance in sequence disorder is sufficient to destroy orientational ordering; in the lyotropic case, the presence of sequence disorder triples the (N/I) coexistence density difference compared with that of stiff homopolymers at similar conditions. The typical N/I domain scale predicted using nucleation theory and field theory is ∼μm in agreement with recent optical microscopy experimental results.
Grzybowski, B.A., Ishchenko, A.V., DeWitte, R.S., Whitesides, G.M. & Shakhnovich, E.I. Development of a Knowledge-Based Potential for Crystals of Small Organic Molecules:  Calculation of Energy Surfaces for C=0···H−N Hydrogen Bonds. J. Phys. Chem. B 104, 31, 7293 - 7298 (2000). Publisher's VersionAbstract
This paper describes the derivation of a Knowledge-Based Potential for intermolecular interactions from the statistical information stored in the Cambridge Structural Database. We develop a statistical mechanical method that relates the occurrences of intermolecular contacts in the database to their energies. Our approach allows us to quantify (in the form of energy) the geometrical preferences of interactions. We use our method to construct energy maps for a hydrogen bond between carbonyl oxygen and amino hydrogen. Our results demonstrate high orientational selectivity of this type of hydrogen bonding.
Abkevich, V.I. & Shakhnovich, E.I. What can Disulfide Bonds Tell Us about Protein Energetics, Function and Folding: Simulations and Bioninformatics Analysis. Journal of Molecular Biology 300, 4, 975 - 985 (2000). Publisher's Version
Vendruscolo, M., Mirny, L.A., Shakhnovich, E.I. & Domany, E. Comparison of two optimization methods to derive energy parameters for protein folding: Perceptron and Z score. Proteins: Structure, Function, and Bioinformatics 41, 2, 192 - 201 (2000). Publisher's VersionAbstract
Two methods were proposed recently to derive energy parameters from known native protein conformations and corresponding sets of decoys. One is based on finding, by means of a perceptron learning scheme, energy parameters such that the native conformations have lower energies than the decoys. The second method maximizes the difference between the native energy and the average energy of the decoys, measured in terms of the width of the decoys' energy distribution (Z-score). Whereas the perceptron method is sensitive mainly to "outlier” (i.e., extremal) decoys, the Z-score optimization is governed by the high density regions in decoy-space. We compare the two methods by deriving contact energies for two very different sets of decoys: the first obtained for model lattice proteins and the second by threading. We find that the potentials derived by the two methods are of similar quality and fairly closely related. This finding indicates that standard, naturally occurring sets of decoys are distributed in a way that yields robust energy parameters (that are quite insensitive to the particular method used to derive them). The main practical implication of this finding is that it is not necessary to fine-tune the potential search method to the particular set of decoys used. Proteins 2000;41:192–201. © 2000 Wiley-Liss, Inc.
Choe, S.E., Li, L., Matsudaira, P.T., Wagner, G. & Shakhnovich, E.I. Differential stabilization of two hydrophobic cores in the transition state of the villin 14T folding reaction. Journal of Molecular Biology 304, 1, 99 - 115 (2000). Publisher's VersionAbstract
We report the distribution of hydrophobic core contacts during the folding reaction transition state for villin 14T, a small 126-residue protein domain. The solution structure of villin 14T contains a central β-sheet with two flanking hydrophobic cores; transition states for this protein topology have not been previously studied. Villin 14T has no disulfide bonds or cis-proline residues in its native state; it folds reversibly, and in an apparently two-state manner under some conditions. To map the hydrophobic core contacts in the transition state, 27 point mutations were generated at positions spread throughout the two hydrophobic cores. After each point mutation, comparison of the change in folding kinetics with the equilibrium destabilization indicates whether the site of mutation is stabilized in the transition state. The results show that the folding nucleus, or the sub-region with the strongest transition state contacts, is located in one of the two hydrophobic cores (the predominantly aliphatic core). The other hydrophobic core, which is mostly aromatic, makes much weaker contacts in the transition state. This work is the first transition state mapping for a protein with multiple major hydrophobic cores in a single folding unit; the hydrophobic cores cannot be separated into individual folding subdomains. The stabilization of only one hydrophobic core in the transition state illustrates that hydrophobic core formation is not intrinsically capable of nucleating folding, but must also involve the right specific interactions or topological factors in order to be kinetically important.
Wilder, J. & Shakhnovich, E.I. Proteins with selected sequences: A heteropolymeric study. Phys. Rev. E 62, 5, 7100 - 7110 (2000). Publisher's VersionAbstract
Protein sequences are expected not to be random but selected in order to form a stable native structure that is kinetically accessible. Therefore our model contains a selective temperature in sequence space (see [S. Ramanathan and E. Shakhnovich, Phys. Rev. E 50, 1303 (1994)] ) to optimize the sequence for the target conformation statistically. Replica calculations, which go beyond quadratic approximations in the field-theoretical Hamiltonian, are presented. A phase diagram indicating the temperatures and selective temperatures at which transitions to a frozen globule, i.e., the native state, occur is obtained. It is shown that going beyond the quadratic approximation in the field Hamiltonian is very important, since it results in a significant change of the phase diagram. Moreover, we suggest that a one-step replica permutation symmetry scheme is sufficient to solve the model. In addition to this we present a result for the sequence correlation function along the chain in the case of a short-ranged potential between the monomers. A correlation function between monomers that form a contact in the native state is given depending on the temperature and the interaction parameter.
Mirny, L.A., Finkelstein, A.V. & Shakhnovich, E.I. Statistical significance of protein structure prediction by threading. Proc. Natl. Acad. Sci. USA 97, 18, 9978 - 9983 (2000). Publisher's VersionAbstract
In this study, we estimate the statistical significance of structure prediction by threading. We introduce a single parameter ɛ that serves as a universal measure determining the probability that the best alignment is indeed a native-like analog. Parameter ɛ takes into account both length and composition of the query sequence and the number of decoys in threading simulation. It can be computed directly from the query sequence and potential of interactions, eliminating the need for sequence reshuffling and realignment. Although our theoretical analysis is general, here we compare its predictions with the results of gapless threading. Finally we estimate the number of decoys from which the native structure can be found by existing potentials of interactions. We discuss how this analysis can be extended to determine the optimal gap penalties for any sequence-structure alignment (threading) method, thus optimizing it to maximum possible performance.
DeWitte, R.S., Ishchenko, A.V. & Shakhnovich, E.I. SMoG:  de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 2. Case Studies in Molecular Design. J. Am. Chem. Soc. 119, 20, 4608 - 4617 (1997). Publisher's VersionAbstract
In this paper, we summarize three ligand design studies performed using the program SMoG, which was developed in our lab. The aim of this presentation is to communicate through examples the potential of this method:  the richness of the molecules that can be developed and the ease with which they are found. In particular, we present suggestions for ligands to Src SH3 domain (specificity pocket and LP site) and CD4.
DeWitte, R.S. & Shakhnovich, E.I. SMoG:  de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 1. Methodology and Supporting Evidence. J. Am. Chem. Soc. 118, 47, 11733 - 11744 (1996). Publisher's VersionAbstract
In this paper, we present SMoG (Small Molecule Growth), a novel, straightforward method for de novo lead design and the evidence for its effectiveness. It is based on a simple model for ligand-protein interactions and a scoring that is directly related to the free energy through a knowledge-based potential. A large number of structures are examined by an efficient metropolis Monte Carlo molecular growth algorithm that generates molecules through the adjoining of functional groups directly in the binding region. Thus SMoG is a method that is able to rank a large number of potential compounds according to binding free energy in a short time. In this sense, SMoG represents a step toward an ideal computational tool for ligand design.