Computational lead design procedures require fast and accurate scoring functions to rank millions of generated virtual ligands for protein targets. In this article, we present an improved version of the SMoG scoring function, called SMoG2001. This function is based on a knowledge-based approach; that is, the free energy parameters are derived from the observed frequencies of atom-atom contacts in the database of three-dimensional structures of protein-ligand complexes via a procedure based on statistical mechanics. We obtained the statistics from a set of 725 complexes. SMoG2001 reproduces the experimental binding constants of the majority of 119 complexes of the testing set with good accuracy. On similar testing sets, SMoG2001 performs better than two other widely used scoring functions, PMF and SCORE1(LUDI), and comparably to DrugScore. SMoG2001 poorly predicts the affinities of ligands interacting via quantum mechanical forces with metal ions and ligands that are large and flexible. We attribute significant improvement in accuracy over previous versions of the SMoG scoring function to a better description of the reference state, that is, the state of no interactions.
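The knowledge-based idea described above, turning observed contact frequencies into free energy parameters, can be illustrated with a minimal sketch. The function name `contact_potentials`, the atom-type labels, and in particular the reference state used here (contacts expected if atom types paired independently of each other) are assumptions made for illustration; the abstract does not specify the reference state actually used in SMoG2001.

```python
import math

def contact_potentials(counts, kT=1.0):
    """Inverse-Boltzmann sketch: g_ij = -kT * ln(N_obs / N_expected).

    counts maps (ligand_atom_type, protein_atom_type) -> observed contacts.
    ASSUMPTION: the reference (no-interaction) state is the count expected
    if ligand and protein atom types paired independently.
    """
    total = sum(counts.values())
    lig_tot, prot_tot = {}, {}
    for (a, b), n in counts.items():
        lig_tot[a] = lig_tot.get(a, 0) + n
        prot_tot[b] = prot_tot.get(b, 0) + n
    potentials = {}
    for (a, b), n in counts.items():
        if n == 0:
            continue  # no statistics for this pair; leave it undefined
        expected = lig_tot[a] * prot_tot[b] / total
        potentials[(a, b)] = -kT * math.log(n / expected)
    return potentials
```

Pairs observed more often than the reference predicts receive a negative (favorable) parameter, and a ligand pose could then be scored by summing the parameters over its contacts.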
Protein G is folded with an all-atom Monte Carlo simulation by using a Gō potential. When folding is monitored by using burial of the lone tryptophan in protein G as the reaction coordinate, the ensemble kinetics is single exponential. Other experimental observations, such as the burst phase and mutational data, are also reproduced. However, more detailed analysis reveals that folding occurs over three distinct, three-state pathways. We show that, because of this tryptophan's asymmetric location in the tertiary fold, its burial (i) does not detect certain intermediates and (ii) may not correspond to the folding event. This finding demonstrates that ensemble averaging can disguise the presence of multiple pathways and intermediates when a non-ideal reaction coordinate is used. Finally, all observed folding pathways eventually converge to a common rate-limiting step, which is the formation of a specific nucleus involving hydrophobic core residues. These residues are conserved in the ubiquitin superfamily and in a phage display experiment, suggesting that fold topology is a strong determinant of the transition state.
We present an analytical theory for heteropolymer deformation, as exemplified experimentally by stretching of single protein molecules. Using replica mean-field theory, we determine phase diagrams for stress-induced unfolding of typical random sequences. This transition is sharp in the limit of infinitely long chain molecules. However, for chain lengths relevant to biological macromolecules, partially unfolded conformations prevail over an intermediate range of stress. These necklacelike structures, comprised of alternating compact and extended subunits, are stabilized by quenched variations in the composition of finite chain segments. The most stable arrangements of these subunits are largely determined by preferential extension of segments rich in solvophilic monomers. This predicted significance of necklace structures explains recent observations in protein stretching experiments. We examine the statistical features of select sequences that give rise to mechanical strength and may thus have guided the evolution of proteins that carry out mechanical functions in living cells.
A method for deriving all-atom protein folding potentials is presented and tested on a three-helix bundle protein, as well as on hairpin and helical sequences. The potentials obtained are composed of a contact term between pairs of atoms, and a local density term for each atom, mimicking solvent exposure preferences. Using this potential in an all-atom protein folding simulation, we repeatedly folded the three-helix bundle, with the lowest energy conformations having a Cα distance rms from the native structure of less than 2 Å. Similar results were obtained for the hairpin and helices by using different potentials. We derived potentials for several different proteins and found a high correlation between the derived parameters, suggesting that a potential of this form eventually could be found that folds multiple, unrelated proteins at the atomic level of detail.
The bottom-up approach to understanding the evolution of organisms is by studying molecular evolution. With the large number of protein structures identified in the past decades, we have discovered peculiar patterns that nature imprints on protein structural space in the course of evolution. In particular, we have discovered that the universe of protein structures is organized hierarchically into a scale-free network. By understanding the cause of these patterns, we attempt to glance at the very origin of life.
We perform a detailed analysis of the thermodynamics and folding kinetics of the SH3 domain fold with discrete molecular dynamic simulations. We propose a protein model that reproduces some of the experimentally observed thermodynamic and folding kinetic properties of proteins. Specifically, we use our model to study the transition state ensemble of the SH3 fold family of proteins, a set of unstable conformations that fold to the protein native state with probability 1/2. We analyze the participation of each secondary structure element formed at the transition state ensemble. We also identify the folding nucleus of the SH3 fold and test extensively its importance for folding kinetics. We predict that a set of amino acid contacts between the RT-loop and the distal hairpin are the critical folding nucleus of the SH3 fold and propose a hypothesis that explains this result.
The folding of many small proteins is kinetically a two-state process that represents overcoming the major free-energy barrier. A kinetic characteristic of a conformation, its probability to descend to the native state domain in the amount of time that represents a small fraction of total folding time, has been introduced to determine to which side of the free-energy barrier a conformation belongs. However, which features make a protein conformation on the folding pathway become committed to rapidly descending to the native state has been a mystery. Using two small, well characterized proteins, CI2 and C-Src SH3, we show how topological properties of protein conformations determine their kinetic ability to fold. We use a macroscopic measure of the protein contact network topology, the average graph connectivity, by constructing graphs that are based on the geometry of protein conformations. We find that the average connectivity is higher for conformations with a high folding probability than for those with a high probability to unfold. Other macroscopic measures of protein structural and energetic properties such as radius of gyration, rms distance, solvent-accessible surface area, contact order, and potential energy fail to serve as predictors of the probability of a given conformation to fold.
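A graph-based measure of this kind can be sketched as follows. The abstract does not define the connectivity measure, so this example uses the mean shortest-path length of a contact graph as an assumed stand-in (smaller values mean a better connected graph); the cutoff value and the function names are likewise hypothetical.

```python
import math
from collections import deque

def contact_graph(coords, cutoff=7.5):
    """Adjacency lists: nodes are residues, edges join residues within cutoff."""
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def mean_shortest_path(adj):
    """Average shortest-path length over all connected node pairs (BFS)."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for t in range(s + 1, n):
            if dist[t] > 0:  # skip pairs that are not connected
                total += dist[t]
                pairs += 1
    return total / pairs if pairs else float("inf")
```

Under this assumed definition, a compact conformation yields a shorter mean path than an extended chain with the same number of residues.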
A combinatorial small molecule growth algorithm was used to design inhibitors for human carbonic anhydrase II. Two enantiomeric candidate molecules were predicted to bind with high potency (with R isomer binding stronger than S), but in two distinct conformations. The experiments verified that computational predictions concerning the binding affinities and the binding modes were correct for both isomers. The designed R isomer is the best-known inhibitor (Kd ~ 30 pM) of human carbonic anhydrase II.
We use a simple off-lattice Langevin model of protein folding to characterize the folding and unfolding of a fast-folding, 46 residue three-helix bundle. Under conditions at which the C-terminal helix is 30 % stable, we observe a clear three-state folding mechanism. In the on-pathway intermediate state, the middle and C-terminal helices are folded and in contact with each other, while the N-terminal region remains disordered. Nevertheless, under these conditions this intermediate is thermodynamically unstable relative to its unfolded state. The first and highest folding barrier corresponds to the organization of the hinge between the middle and C-terminal helices. A subsequent major barrier corresponds to the organization of the hinge between the middle and N-terminal helices. Hyperstabilizing the hinge regions leads to twice the folding rate that is obtained from hyperstabilizing the helices, even though far fewer contacts are involved in hinge hyperstabilization than in helix hyperstabilization. Unfolding follows single-exponential kinetics, even at temperatures only slightly above the folding transition temperature.
Experimentally, protein engineering and φ-value analysis is the method of choice to characterize the structure in folding transition state ensemble (TSE) of any protein. Combining experimental φ values and computer simulations has led to a deeper understanding of how proteins fold. In this report, we construct the TSE of chymotrypsin inhibitor 2 from published φ values. Importantly, we verify, by means of multiple independent simulations, that the conformations in the TSE have a probability of ≈0.5 to reach the native state rapidly, so the TSE consists of true transition states. This finding validates the use of transition state theory underlying all φ-value analyses. Also, we present a method to dissect and study the TSE by generating conformations that have a disrupted α-helix (α-disrupted states) or disordered β-strands 3 and 4 (β-disrupted states). Surprisingly, the α-disrupted states have a stronger tendency to fold than the β-disrupted states, despite the higher φ values for the α-helix in the TSE. We give a plausible explanation for this result and discuss its implications on protein folding and design. Our study shows that, by using both experiments and computer simulations, we can gain many insights into protein folding.
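The verification step described above, checking that a conformation reaches the native state with probability ≈0.5, amounts to launching many independent short simulations from the same starting point and counting how many fold. The sketch below shows only that counting logic; the one-dimensional random walk `toy_trajectory` is a hypothetical stand-in for a real folding simulation, and the boundary values are arbitrary assumptions.

```python
import random

def toy_trajectory(start, rng, unfolded=0, native=10):
    """ASSUMPTION: an unbiased 1D walk stands in for a folding simulation.
    Returns True if the walk reaches the native boundary first."""
    x = start
    while unfolded < x < native:
        x += rng.choice((-1, 1))
    return x >= native

def estimate_pfold(start, n_runs=2000, seed=0):
    """Fraction of independent runs from `start` that reach the native state."""
    rng = random.Random(seed)
    hits = sum(toy_trajectory(start, rng) for _ in range(n_runs))
    return hits / n_runs
```

In this toy model a start halfway between the two boundaries gives an estimate near 0.5, which is the signature the study looks for in transition state conformations.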
The excluded volume occupied by protein side-chains and the requirement of high packing density in the protein interior should severely limit the number of side-chain conformations compatible with a given native backbone. To examine the relationship between side-chain geometry and side-chain packing, we use an all-atom Monte Carlo simulation to sample the large space of side-chain conformations. We study three models of excluded volume and use umbrella sampling to effectively explore the entire space. We find that while excluded volume constraints reduce the size of conformational space by many orders of magnitude, the number of allowed conformations is still large. An average repacked conformation has 20 % of its χ angles in a non-native state, a marked reduction from the expected 67 % in the absence of excluded volume. Interestingly, well-packed conformations with up to 50 % non-native χ angles exist. The repacked conformations have native packing density as measured by a standard Voronoi procedure. Entropy is distributed non-uniformly over positions, and we partially explain the observed distribution using rotamer probabilities derived from the Protein Data Bank database. In several cases, native rotamers that occur infrequently in the database are seen with high probability in our simulation, indicating that sequence-specific excluded volume interactions can stabilize rotamers that are rare for a given backbone. In spite of our finding that 65 % of the native rotamers and 85 % of χ1 angles can be predicted correctly on the basis of excluded volume only, 95 % of positions can accommodate more than one rotamer in simulation. We estimate that, in order to quench the side-chain entropy observed in the presence of excluded volume interactions, other interactions (hydrophobic, polar, electrostatic) must provide an additional stabilization of at least 0.6 kT per residue in order to single out the native state.
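To make the notion of per-position side-chain entropy concrete, the following minimal sketch computes an entropy in units of k_B from the rotamer occupancies observed at one position. The estimator used in the study is not given in the abstract, so the Gibbs/Shannon form and the name `rotamer_entropy` are assumptions for illustration.

```python
import math

def rotamer_entropy(probabilities):
    """S / k_B = -sum(p * ln p) over the rotamer states of one position.
    ASSUMPTION: probabilities are normalized occupancies from sampling."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)
```

A position locked into a single rotamer has zero entropy, while one that samples three rotamers equally has ln 3 ≈ 1.1 in these units.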
There have been many studies about the effect of circular permutation on the transition state/folding nucleus of proteins, with sometimes conflicting conclusions from different proteins and permutations. To clarify this important issue, we have studied two circular permutations of a lattice protein model with side-chains. Both permuted sequences have essentially the same native state as the original (wild-type) sequence. Circular permutant 1 cuts at the folding nucleus of the wild-type sequence. As a result, the permutant has a drastically different nucleus and folds more slowly than wild-type. In contrast, circular permutant 2 involves an incision at a site unstructured in the wild-type transition state, and the wild-type nucleus is largely retained in the permutant. In addition, permutant 2 displays both two-state and multi-state folding, with a native-like intermediate state occasionally populated. Neither the wild-type nor permutant 1 has a similar intermediate, and both fold in an apparently two-state manner. Surprisingly, permutant 2 folds at a rate identical with that of the wild-type. The intermediate in permutant 2 is stabilised by native and non-native interactions, and cannot be classified simply as on or off-pathway. So we advise caution in attributing experimental data to on or off-pathway intermediates. Finally, our work illuminates the results on α-spectrin SH3, chymotrypsin inhibitor 2 and β-lactoglobulin, and supports a key assumption in the experimental efforts to locate potential nucleation sites of real proteins via circular permutations.
We propose a model that explains the hierarchical organization of proteins in fold families. The model, which is based on the evolutionary selection of proteins by their native state stability, reproduces patterns of amino acids conserved across protein families. Due to its dynamic nature, the model sheds light on the evolutionary time-scales. By studying the relaxation of the correlation function between consecutive mutations at a given position in proteins, we observe separation of the evolutionary time-scales: at short time intervals families of proteins with similar sequences and structures are formed, while at long time intervals the families of structurally similar proteins that have low sequence similarity are formed. We discuss the evolutionary implications of our model. We provide a "profile" solution to our model and find agreement between predicted patterns of conserved amino acids and those actually observed in nature.
We study the Langevin dynamics of a heteropolymer by means of a mode-coupling approximation scheme, giving rise to a set of coupled integro-differential equations relating the response and correlation functions. The analysis shows that there is a regime at low temperature characterized by out-of-equilibrium dynamics, with violation of time-translational invariance and of the fluctuation-dissipation theorem. The onset of aging dynamics at low temperatures gives insight into the nature of the slow dynamics of a disordered polymer. We also introduce a renormalization-group treatment of our mode-coupling equations, which supports our analysis, and might be applicable to other systems.
A numerical study of the energy landscape of the space of model protein sequences is carried out. As a consequence of the heterogeneity of the contact energies among amino acids, the energy landscape displays a very rough profile, a behaviour typical of frustrated systems. This gives rise to a hierarchical clustering of low-energy sequences and can have evolutionary consequences.
We present a novel Monte Carlo simulation of protein folding, in which all heavy atoms are represented as interacting hard spheres. This model includes all degrees of freedom relevant to folding, all side-chain and backbone torsions, and uses a Gō potential. In this study, we focus on the 46 residue α/β protein crambin and two of its structural components, the helix and helix hairpin. For a wide range of temperatures, we recorded multiple folding events of these three structures from random coils to native conformations that differ by less than 1 Å Cα dRMS from their crystal structure coordinates. The thermodynamics and kinetic mechanism of the helix-coil transition obtained from our simulation show excellent agreement with currently available experimental and molecular dynamics data. Based on insights obtained from folding its smaller structural components, a possible folding mechanism for crambin is proposed. We observed that the folding occurs via a cooperative, first order-like process, and that many folding pathways to the native state exist. One particular sequence of events constitutes a "fast-folding" pathway where kinetic traps are avoided. At very low temperatures, a kinetic trap arising from the incorrect packing of side-chains was observed. These results demonstrate that folding to the native state can be observed in a reasonable amount of time on desktop computers even when an all-atom representation is used, provided the energetics sufficiently stabilize the native state.
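The defining feature of a Gō potential is that only contacts present in the native structure are rewarded. The sketch below illustrates this at a coarse, one-bead-per-residue level, which is a simplifying assumption; the model in the text is all-atom, and the cutoff, sequence-separation threshold, and function names here are hypothetical.

```python
import math
from itertools import combinations

def contacts(coords, cutoff=6.0, min_sep=3):
    """Pairs (i, j) at least min_sep apart in sequence and within cutoff."""
    return {(i, j) for i, j in combinations(range(len(coords)), 2)
            if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff}

def go_energy(coords, native_contacts, eps=1.0, cutoff=6.0):
    """Go-type energy: -eps for every native contact that is formed.
    Non-native contacts contribute nothing in this sketch."""
    formed = contacts(coords, cutoff)
    return -eps * len(formed & native_contacts)
```

Here `native_contacts` would be computed once by calling `contacts` on the native coordinates; any other conformation is then scored by how many of those contacts it has formed, so the native state is the energy minimum by construction.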
This review focuses on recent advances in understanding protein folding kinetics in the context of nucleation theory. We present basic concepts such as nucleation, folding nucleus, and transition state ensemble and then discuss recent advances and challenges in theoretical understanding of several key aspects of protein folding kinetics. We cover recent topology-based approaches as well as evolutionary studies and molecular dynamics approaches to determine protein folding nucleus and analyze other aspects of folding kinetics. Finally, we briefly discuss successful all-atom Monte-Carlo simulations of protein folding and conclude with a brief outlook for the future.
We study solutions of statistically neutral polyampholyte chains containing a large fraction of neutral monomers. It is known that such solutions phase separate at very low concentrations, even if the quality of the solvent with respect to the neutral monomers is good. The precipitate is semidilute if the chains are weakly charged. This paper considers θ solvents and good solvents, and we calculate the dynamic charge density correlation function g(k,t) in the precipitate, using the quadratic approximation to the Martin-Siggia-Rose generating functional. It is convenient to express the results in terms of dimensionless space and time variables. Let ξ be the blob size, and let τ be the characteristic time scale at the blob level. Define the dimensionless wave vector q=ξk, and the dimensionless time s=t/τ. In the regime q<1, corresponding to length scales larger than the blob size, and 10.1, where entanglements are unimportant.