Here, we present a statistical analysis of conservation profiles in families of homologous sequences for nine proteins whose folding nucleus was determined by protein engineering methods. We show that in all but one protein (AcP), folding nucleus residues are significantly more conserved than the rest of the protein. Two aspects of our study are especially important: (i) grouping of amino acid residues into classes according to their physical-chemical properties and (ii) proper normalization of amino acid probabilities that reflects the fact that evolutionary pressure to conserve some amino acid types may itself affect the concentration of various amino acid types in protein families. Neglecting either of these two factors may make physical and biological "signals" from conservation profiles disappear.
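The class-grouping step in (i) can be illustrated with a minimal Python sketch. It assumes a hypothetical six-class partition of the 20 amino acids (the abstract does not list its classes) and uses plain Shannon entropy per alignment column as the conservation measure; the normalization described in (ii) is not implemented here.

```python
import math
from collections import Counter

# Hypothetical grouping of the 20 amino acids into physical-chemical classes.
# The abstract does not specify its classes, so this partition is an assumption.
CLASSES = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
AA_TO_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def class_entropy(column):
    """Shannon entropy (in nats) of class frequencies in one alignment column."""
    # Gap characters and unknown symbols are skipped.
    labels = [AA_TO_CLASS[aa] for aa in column if aa in AA_TO_CLASS]
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy per column for a list of equal-length aligned sequences."""
    return [class_entropy(col) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["AVK", "LIK", "VDE"]  # made-up sequences, for illustration only
    print(conservation_profile(toy_alignment))
```

Lower entropy means stronger conservation: in the toy alignment the first column holds three different residues (A, L, V) but a single class, so it scores as fully conserved once residues are grouped.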
With the help of a simple 20-letter lattice model of heteropolymers, we investigated the energy landscape in the space of designed good-folder sequences. Low-energy sequences form clusters, interconnected via neutral networks, in the space of sequences. Residues that play a key role in the foldability of the chain and in the stability of the native state are highly conserved, even among the chains belonging to different clusters. If, according to the interaction matrix, some strong attractive interactions are almost degenerate (i.e., they can be realized by more than one type of amino acid contacts), sequence clusters group into a few superclusters. Sequences belonging to different superclusters are dissimilar, displaying very small (≈10%) similarity, and residues in key sites are, as a rule, not conserved. Similar behavior is observed in the analysis of real protein sequences.
We propose a self-consistent approach to analyze knowledge-based atom-atom potentials used to calculate protein-ligand binding energies. Ligands complexed to actual protein structures were first built using the SMoG growth procedure (DeWitte & Shakhnovich, 1996) with a chosen input potential. These model protein-ligand complexes were used to construct databases from which knowledge-based protein-ligand potentials were derived. We then tested several different modifications to such potentials and evaluated their performance on their ability to reconstruct the input potential using the statistical information available from a database composed of model complexes. Our data indicate that the most significant improvement resulted from properly accounting for the following key issues when estimating the reference state: (1) the presence of significant nonenergetic effects that influence the contact frequencies and (2) the presence of correlations in contact patterns due to chemical structure. The most successful procedure was applied to derive an atom-atom potential for real protein-ligand complexes. Despite the simplicity of the model (pairwise contact potential with a single interaction distance), the derived binding free energies showed a statistically significant correlation (∼0.65) with experimental binding scores for a diverse set of complexes.
A lattice model with side chains was used to investigate protein folding with computer simulations. In this model, we rigorously demonstrate the existence of a specific folding nucleus. This nucleus contains specific interactions not present in the native state that, when weakened, slow folding but do not change protein stability. Such a decoupling of folding kinetics from thermodynamics has been observed experimentally for real proteins. From our results, we conclude that specific non-native interactions in the transition state would give rise to φ-values that are negative or larger than unity. Furthermore, we demonstrate that residue Ile 34 in src SH3, which has been shown to be kinetically, but not thermodynamically, important, is universally conserved in proteins with the SH3 fold. This is a clear example of evolution optimizing the folding rate of a protein independent of its stability and function.
Molecular dynamics simulations of folding in an off-lattice protein model reveal a nucleation scenario, in which a few well-defined contacts are formed with high probability in the transition state ensemble of conformations. Their appearance determines folding cooperativity and drives the model protein into its folded conformation. Amino acid residues participating in those contacts may serve as "accelerator pedals" used by molecular evolution to control protein folding rate.
This paper describes a theoretical method for solving systems of coupled differential equations that describe the kinetics of complicated reaction networks in which a molecule having multiple reaction sites reacts irreversibly with multiple equivalents of a ligand (reagent). The members of the network differ in the number of equivalents of reagent that have reacted, and in the patterns of sites of reaction. A recursive algorithm generates series, asymptotic, and average solutions describing this kinetic scheme. This method was validated by successfully simulating the experimental data for the kinetics of acylation of insulin.
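The kind of kinetic scheme described above can be illustrated with a deliberately simplified Python sketch. It is not the recursive algorithm of the paper: it assumes the reagent is in large excess (pseudo-first-order kinetics) and that sites react independently, in which case the population of each pattern of reacted sites has a closed form. The function names and rate constants are hypothetical.

```python
import math
from itertools import product


def state_populations(rates, t):
    """Fraction of molecules in each modification pattern at time t.

    rates: pseudo-first-order rate constant for each site. Excess reagent and
    independent sites are simplifying assumptions made for this sketch.
    Returns {pattern: fraction}, where pattern is a tuple of 0/1 flags per site.
    """
    reacted = [1.0 - math.exp(-k * t) for k in rates]
    populations = {}
    for pattern in product((0, 1), repeat=len(rates)):
        fraction = 1.0
        for hit, r in zip(pattern, reacted):
            fraction *= r if hit else (1.0 - r)
        populations[pattern] = fraction
    return populations


def average_equivalents(rates, t):
    """Mean number of reacted sites per molecule at time t."""
    pops = state_populations(rates, t)
    return sum(sum(pattern) * fraction for pattern, fraction in pops.items())


if __name__ == "__main__":
    hypothetical_rates = [1.0, 0.5, 0.1]  # illustrative values, arbitrary units
    for t in (0.0, 1.0, 5.0):
        print(t, round(average_equivalents(hypothetical_rates, t), 4))
```

A molecule with n sites has 2^n patterns, which is why the network grows quickly and why the general case, where sites influence one another or reagent is depleted, calls for the more systematic treatment the abstract describes.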
We study the nematic/isotropic (N/I) phase diagram and orientational ordering in heteropolymers consisting of stiff and flexible segments by field theory. It is shown that a finite and small variance in sequence disorder is sufficient to destroy orientational ordering; in the lyotropic case, the presence of sequence disorder triples the (N/I) coexistence density difference compared with that of stiff homopolymers at similar conditions. The typical N/I domain scale predicted using nucleation theory and field theory is ∼μm in agreement with recent optical microscopy experimental results.
This paper describes the derivation of a Knowledge-Based Potential for intermolecular interactions from the statistical information stored in the Cambridge Structural Database. We develop a statistical mechanical method that relates the occurrences of intermolecular contacts in the database to their energies. Our approach allows us to quantify (in the form of energy) the geometrical preferences of interactions. We use our method to construct energy maps for a hydrogen bond between carbonyl oxygen and amino hydrogen. Our results demonstrate high orientational selectivity of this type of hydrogen bonding.
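The idea of converting contact occurrences into energies can be sketched in Python. The abstract does not give its formula, so this example assumes the standard inverse Boltzmann relation, E = -kT ln(N_obs / N_exp), with a hypothetical reference state supplied as expected counts; the contact names and numbers are made up.

```python
import math


def contact_energies(observed, expected, kT=1.0):
    """Inverse Boltzmann estimate: E = -kT * ln(N_obs / N_exp) per contact type.

    observed: counts of each contact type found in the database.
    expected: counts predicted by a reference state (assumed to be given here).
    Contact types with a zero count on either side are skipped.
    """
    energies = {}
    for key, n_obs in observed.items():
        n_exp = expected.get(key, 0)
        if n_obs > 0 and n_exp > 0:
            energies[key] = -kT * math.log(n_obs / n_exp)
    return energies


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    observed = {"O...H-N": 200, "C...C": 50}
    expected = {"O...H-N": 100, "C...C": 100}
    print(contact_energies(observed, expected))
```

Contacts seen more often than the reference state predicts come out with negative (favorable) energies. Binning the counts by distance and angle instead of by contact type alone would give an energy map of the kind mentioned for the hydrogen bond.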
We report the distribution of hydrophobic core contacts during the folding reaction transition state for villin 14T, a small 126-residue protein domain. The solution structure of villin 14T contains a central β-sheet with two flanking hydrophobic cores; transition states for this protein topology have not been previously studied. Villin 14T has no disulfide bonds or cis-proline residues in its native state; it folds reversibly, and in an apparently two-state manner under some conditions. To map the hydrophobic core contacts in the transition state, 27 point mutations were generated at positions spread throughout the two hydrophobic cores. After each point mutation, comparison of the change in folding kinetics with the equilibrium destabilization indicates whether the site of mutation is stabilized in the transition state. The results show that the folding nucleus, or the sub-region with the strongest transition state contacts, is located in one of the two hydrophobic cores (the predominantly aliphatic core). The other hydrophobic core, which is mostly aromatic, makes much weaker contacts in the transition state. This work is the first transition state mapping for a protein with multiple major hydrophobic cores in a single folding unit; the hydrophobic cores cannot be separated into individual folding subdomains. The stabilization of only one hydrophobic core in the transition state illustrates that hydrophobic core formation is not intrinsically capable of nucleating folding, but must also involve the right specific interactions or topological factors in order to be kinetically important.
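The comparison of kinetic and equilibrium effects for each mutation is commonly expressed as a φ-value. The Python sketch below assumes the usual two-state definition, φ = ΔΔG(transition state) / ΔΔG(equilibrium), with the kinetic term taken from the ratio of wild-type to mutant folding rates; the function name and the numbers in the example are hypothetical and are not data from this study.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def phi_value(kf_wt, kf_mut, ddG_eq, temperature=298.0):
    """phi = ddG_ts / ddG_eq, with ddG_ts = R * T * ln(kf_wt / kf_mut).

    kf_wt, kf_mut: folding rate constants of wild type and mutant (same units).
    ddG_eq: equilibrium destabilization caused by the mutation, in kcal/mol
            (positive when the mutant is less stable).
    """
    if abs(ddG_eq) < 1e-9:
        raise ValueError("ddG_eq is too close to zero for a meaningful phi-value")
    ddG_ts = R * temperature * math.log(kf_wt / kf_mut)
    return ddG_ts / ddG_eq


if __name__ == "__main__":
    # Hypothetical mutant: folds five times slower, destabilized by 1.5 kcal/mol.
    print(round(phi_value(kf_wt=100.0, kf_mut=20.0, ddG_eq=1.5), 2))
```

A φ-value near 1 suggests that the mutated site is as structured in the transition state as in the native state, while a value near 0 suggests it is still unstructured there. In practice, small ΔΔG values make φ unreliable, which is why the sketch refuses a near-zero denominator.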
Protein sequences are expected not to be random but selected in order to form a stable native structure that is kinetically accessible. Therefore, our model contains a selective temperature in sequence space (see [S. Ramanathan and E. Shakhnovich, Phys. Rev. E 50, 1303 (1994)]) to optimize the sequence for the target conformation statistically. Replica calculations, which go beyond quadratic approximations in the field-theoretical Hamiltonian, are presented. A phase diagram indicating the temperatures and selective temperatures at which transitions to a frozen globule, i.e., the native state, occur is obtained. It is shown that going beyond the quadratic approximation in the field Hamiltonian is very important, since it results in a significant change of the phase diagram. Moreover, we suggest that a one-step replica permutation symmetry scheme is sufficient to solve the model. In addition, we present a result for the sequence correlation function along the chain in the case of a short-ranged potential between the monomers. A correlation function between monomers that form a contact in the native state is given as a function of the temperature and the interaction parameter.
In this study, we estimate the statistical significance of structure prediction by threading. We introduce a single parameter ɛ that serves as a universal measure determining the probability that the best alignment is indeed a native-like analog. Parameter ɛ takes into account both length and composition of the query sequence and the number of decoys in threading simulation. It can be computed directly from the query sequence and potential of interactions, eliminating the need for sequence reshuffling and realignment. Although our theoretical analysis is general, here we compare its predictions with the results of gapless threading. Finally we estimate the number of decoys from which the native structure can be found by existing potentials of interactions. We discuss how this analysis can be extended to determine the optimal gap penalties for any sequence-structure alignment (threading) method, thus optimizing it to maximum possible performance.
In this paper, we summarize three ligand design studies performed using the program SMoG, which was developed in our lab. The aim of this presentation is to communicate through examples the potential of this method: the richness of the molecules that can be developed and the ease with which they are found. In particular, we present suggestions for ligands to Src SH3 domain (specificity pocket and LP site) and CD4.
In this paper, we present SMoG (Small Molecule Growth), a novel, straightforward method for de novo lead design and the evidence for its effectiveness. It is based on a simple model for ligand-protein interactions and a scoring that is directly related to the free energy through a knowledge-based potential. A large number of structures are examined by an efficient Metropolis Monte Carlo molecular growth algorithm that generates molecules through the adjoining of functional groups directly in the binding region. Thus, SMoG is a method that is able to rank a large number of potential compounds according to binding free energy in a short time. In this sense, SMoG represents a step toward an ideal computational tool for ligand design.
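The acceptance rule at the heart of any Metropolis Monte Carlo scheme can be sketched in Python. This is a generic illustration, not SMoG's implementation: it assumes that a lower score means better predicted binding, and the function name, the temperature parameter, and the injectable random source are all hypothetical.

```python
import math
import random


def metropolis_accept(delta_score, temperature, rng=random):
    """Standard Metropolis criterion for a proposed change.

    delta_score: score after the proposed step minus score before it
                 (lower is assumed to be better in this sketch).
    temperature: sampling temperature in the same units as the score.
    rng: any object with a random() method returning a float in [0, 1).
    """
    if delta_score <= 0:
        return True  # improvements (or ties) are always accepted
    # Worsening moves are accepted with probability exp(-delta / T).
    return rng.random() < math.exp(-delta_score / temperature)


if __name__ == "__main__":
    random.seed(0)
    # Fraction of uphill moves of size 1.0 accepted at temperature 1.0;
    # this should fluctuate around exp(-1), roughly 0.37.
    trials = 10000
    accepted = sum(metropolis_accept(1.0, 1.0) for _ in range(trials))
    print(accepted / trials)
```

In a growth setting, each proposed step would be the attachment of a candidate functional group, and the score change would come from the knowledge-based potential. Accepting some worsening moves lets the search escape locally favorable but globally poor structures.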