Shakhnovich, B.E. & Shakhnovich, E.I. Improvisation in evolution of genes and genomes: whose structure is it anyway? Current Opinion in Structural Biology 18, 3, 375 - 381 (2008).
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales — from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.
Pereira de Araújo, A.F., Gomes, A.L.C., Bursztyn, A.A. & Shakhnovich, E.I. Native atomic burials, supplemented by physically motivated hydrogen bond constraints, contain sufficient information to determine the tertiary structure of small globular proteins. Proteins: Structure, Function, and Bioinformatics 70, 3, 971 - 983 (2008).
We investigate the possibility that atomic burials, as measured by their distances from the structural geometrical center, contain sufficient information to determine the tertiary structure of globular proteins. We report Monte Carlo simulated annealing results of all-atom hard-sphere models in continuous space for four small proteins: the all-β WW-domain 1E0L, the α/β protein-G 1IGD, the all-α engrailed homeo-domain 1ENH, and the α + β engineered monomeric form of the Cro protein 1ORC. We used as energy function the sum over all atoms, labeled by i, of |R_i − R_i*|, where R_i is the atomic distance from the center of coordinates, or central distance, and R_i* is the "ideal" central distance obtained from the native structure. Hydrogen bonds were taken into consideration by the assignment of two ideal distances for backbone atoms forming hydrogen bonds in the native structure depending on the formation of a geometrically defined bond, independently of bond partner. Lowest energy final conformations turned out to be very similar to the native structure for the four proteins under investigation and a strong correlation was observed between energy and distance root mean square deviation (DRMS) from the native in the case of all-β 1E0L and α/β 1IGD. For all-α 1ENH and α + β 1ORC the overall correlation between energy and DRMS among final conformations was not as high because some trajectories resulted in high DRMS but low energy final conformations in which α-helices adopted a non-native mutual orientation. Comparison between central distances and actual accessible surface areas corroborated the implicit assumption of correlation between these two quantities.
The Z-score obtained with this native-centric potential in the discrimination of native 1ORC from a set of random compact structures confirmed that it contains a much smaller amount of native information when compared to a traditional contact Gō potential but indicated that simple sequence-dependent burial potentials still need some improvement in order to attain a similar discriminability. Taken together, our results suggest that central distances, in conjunction with physically motivated hydrogen bond constraints, contain sufficient information to determine the native conformation of these small proteins and that a solution to the folding problem for globular proteins could arise from sufficiently accurate burial predictions from sequence followed by minimization of a burial-dependent energy function. Proteins 2008. © 2007 Wiley-Liss, Inc.
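The burial energy described in this abstract, the sum over atoms of |R_i − R_i*|, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy coordinates are hypothetical, the center is taken as the plain average of the coordinates, and the hydrogen-bond rule (two ideal distances for some backbone atoms) is deliberately left out.

```python
import math

def central_distances(coords):
    """Distance of each atom from the geometrical center (mean of all coordinates)."""
    n = len(coords)
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [math.dist(c, center) for c in coords]

def burial_energy(coords, ideal_distances):
    """Sum over atoms of |R_i - R_i*|, with R_i* supplied by the caller.

    Assumption: one ideal distance per atom; the hydrogen-bond-dependent
    second ideal distance mentioned in the abstract is not modeled here.
    """
    current = central_distances(coords)
    if len(current) != len(ideal_distances):
        raise ValueError("need exactly one ideal distance per atom")
    return sum(abs(r - r_star) for r, r_star in zip(current, ideal_distances))

# Hypothetical four-atom "native" structure used only for illustration.
native = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
ideal = central_distances(native)

# Uniformly stretched copy: every central distance doubles.
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in native]

print(burial_energy(native, ideal))     # zero by construction
print(burial_energy(stretched, ideal))  # positive: atoms are too far from the center
```

By construction the native structure sits at the minimum of this function, which is what makes the potential native-centric. A simulated annealing run of the kind the abstract reports would propose conformational moves and accept or reject them according to changes in this energy.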
Zhang, J., Maslov, S. & Shakhnovich, E.I. Constraints imposed by non-functional protein–protein interactions on gene expression and proteome size. Mol Syst Biol 4, 1, 210 (2008).
Crowded intracellular environments present a challenge for proteins to form functional specific complexes while reducing non-functional interactions with promiscuous non-functional partners. Here we show how the need to minimize the waste of resources to non-functional interactions limits the proteome diversity and the average concentration of co-expressed and co-localized proteins. Using the results of high-throughput Yeast 2-Hybrid experiments, we estimate the characteristic strength of non-functional protein–protein interactions. By combining these data with the strengths of specific interactions, we assess the fraction of time proteins spend tied up in non-functional interactions as a function of their overall concentration. This allows us to sketch the phase diagram for baker's yeast cells using the experimentally measured concentrations and subcellular localization of their proteins. The positions of yeast compartments on the phase diagram are consistent with our hypothesis that the yeast proteome has evolved to operate close to the upper limit of its size, while keeping individual protein concentrations sufficiently low to reduce non-functional interactions. These findings have implications for conceptual understanding of intracellular compartmentalization, multicellularity and differentiation.
Lukatsky, D.B. & Shakhnovich, E.I. Statistically enhanced promiscuity of structurally correlated patterns. Phys. Rev. E 77, 2, 020901 (2008).
We predict that patterns with correlated surface density of atoms have statistically higher promiscuity (ability to bind more strongly to an arbitrary pattern) as compared with noncorrelated patterns with the same average surface density. We suggest that this constitutes a generic design principle for highly connected proteins (hubs) in protein interaction networks. We develop an analytical theory for this effect. We show that our key predictions are generic and qualitatively independent of the specific form of the interatomic interaction potential, provided it has a finite range.
Zeldovich, K.B. & Shakhnovich, E.I. Understanding Protein Evolution: From Protein Physics to Darwinian Selection. Annual Review of Physical Chemistry 59, 1, 105 - 127 (2008).
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Faísca, P.F.N., Travasso, R.D.M., Ball, R.C. & Shakhnovich, E.I. Identifying critical residues in protein folding: Insights from ϕ-value and Pfold analysis. J. Chem. Phys. 129, 9, 095108 (2008).
We apply a simulational proxy of the ϕ-value analysis and perform extensive mutagenesis experiments to identify the nucleating residues in the folding "reactions" of two small lattice Gō polymers with different native geometries. Our findings show that for the more complex native fold (i.e., the one that is rich in nonlocal, long-range bonds), mutation of the residues that form the folding nucleus leads to a considerably larger increase in the folding time than the corresponding mutations in the geometry that is predominantly local. These results are compared to data obtained from an accurate analysis based on the reaction coordinate folding probability Pfold and on structural clustering methods. Our study reveals a complex picture of the transition state ensemble. For both protein models, the transition state ensemble is rather heterogeneous and splits up into structurally different populations. For the more complex geometry the identified subpopulations are actually structurally disjoint. For the less complex native geometry we found a broad transition state with microscopic heterogeneity. These findings suggest that the existence of multiple transition state structures may be linked to the geometric complexity of the native fold. For both geometries, the identification of the folding nucleus via the Pfold analysis agrees with the identification of the folding nucleus carried out with the ϕ-value analysis. For the more complex geometry, however, the applied methodologies give more consistent results than for the more local geometry. The study of the transition state structure reveals that the nucleus residues are not necessarily fully native in the transition state. Indeed, it is only for the more complex geometry that two of the five critical residues show a considerably high probability of having all their native bonds formed in the transition state.
Therefore, one concludes that, in general, the ϕ-value correlates with the acceleration/deceleration of folding induced by mutation, rather than with the degree of nativeness of the transition state, and that the "traditional" interpretation of ϕ-values may provide a more realistic picture of the structure of the transition state only for more complex native geometries.
Yang, J.S., Wallin, S. & Shakhnovich, E.I. Universality and diversity of folding mechanics for three-helix bundle proteins. Proc. Natl. Acad. Sci. USA 105, 3, 895 - 900 (2008).
In this study we evaluate, at full atomic detail, the folding processes of two small helical proteins, the B domain of protein A and the Villin headpiece. Folding kinetics are studied by performing a large number of ab initio Monte Carlo folding simulations using a single transferable all-atom potential. Using these trajectories, we examine the relaxation behavior, secondary structure formation, and transition-state ensembles (TSEs) of the two proteins and compare our results with experimental data and previous computational studies. To obtain detailed structural information on the folding dynamics viewed as an ensemble process, we perform a clustering analysis procedure based on graph theory. Moreover, rigorous Pfold analysis is used to obtain representative samples of the TSEs and a good quantitative agreement between experimental and simulated Φ values is obtained for protein A. Φ values for Villin also are obtained and left as predictions to be tested by future experiments. Our analysis shows that the two-helix hairpin is a common partially stable structural motif that gets formed before entering the TSE in the studied proteins. These results together with our earlier study of Engrailed Homeodomain and recent experimental studies provide a comprehensive, atomic-level picture of folding mechanics of three-helix bundle proteins.
Donald, J.E., Chen, W.W. & Shakhnovich, E.I. Energetics of protein–DNA interactions. Nucl. Acids Res. 35, 4, 1039 - 1047 (2007).
Protein–DNA interactions are vital for many processes in living cells, especially transcriptional regulation and DNA modification. To further our understanding of these important processes on the microscopic level, it is necessary that theoretical models describe the macromolecular interaction energetics accurately. While several methods have been proposed, there has not been a careful comparison of how well the different methods are able to predict biologically important quantities such as the correct DNA binding sequence, total binding free energy and free energy changes caused by DNA mutation. In addition to carrying out the comparison, we present two important theoretical models developed initially in protein folding that have not yet been tried on protein–DNA interactions. In the process, we find that the results of these knowledge-based potentials show a strong dependence on the interaction distance and the derivation method. Finally, we present a knowledge-based potential that gives comparable or superior results to the best of the other methods, including the molecular mechanics force field AMBER99.
Perlstein, E.O., Deeds, E.J., Ashenberg, O., Shakhnovich, E.I. & Schreiber, S.L. Quantifying fitness distributions and phenotypic relationships in recombinant yeast populations. Proc. Natl. Acad. Sci. USA 104, 25, 10553 - 10558 (2007).
Studies of the role of sex in evolution typically involve a longitudinal comparison of a single ancestor to several intermediate descendants and to one terminally evolved descendant after many generations of adaptation under a given selective regime. Here we take a complementary, statistical approach to sex in evolution, by describing the distribution of phenotypic similarity in a population of yeast F1 meiotic recombinants. By applying graph theory to fitness measurements of thousands of Saccharomyces cerevisiae recombinants treated with 10 mechanistically distinct, growth-inhibitory small-molecule perturbagens (SMPs), we show that the network of phenotypic similarity among F1 recombinants exhibits a scale-free degree distribution. F1 recombinants are often phenotypically unique and sometimes exceptional, and their fitness strengths are unevenly distributed across the 10 compound treatments. By contrast, highly phenotypically similar F1 recombinants constitute failing hubs that display below-average fitness across all compound treatments and are candidate substrates for purifying selection. Comparison of the F1 generation with the parental strains reveals that (i) there is a specialist more fit in any given single condition than any of the parents but (ii) only rarely are there generalists that exhibit greater fitness than both parental strains across a majority of conditions. This analysis allows us to evaluate and to gain better theoretical understanding of the costs and benefits of sex in the F1 generation.
Zeldovich, K.B., Chen, P., Shakhnovich, B.E. & Shakhnovich, E.I. A First-Principles Model of Early Evolution: Emergence of Gene Families, Species, and Preferred Protein Folds. PLoS Comput Biol 3, 7 (2007).
Author Summary

Here, we address the question of how Darwinian evolution of organisms determines molecular evolution of their proteins and genomes. We developed a microscopic ab initio model of early biological evolution where the fitness (essentially lifetime) of an organism is explicitly related to the evolving sequences of its proteins. The main assumption of the model is that the death rate of an organism is determined by the stability of the least stable of its proteins. A lattice model is used to calculate stability of all proteins in a genome from their amino acid sequence. The simulation of the model starts from 100 identical organisms, each carrying the same random gene, and proceeds via random mutations, gene duplication, organism births via replication, and organism deaths. We find that exponential population growth is possible only after the discovery of a very small number of specific advantageous protein structures. The number of genes in the evolving organisms depends on the mutation rate, demonstrating the intricate relationship between the genome sizes and protein stability requirements. Further, the model explains the observed power-law distributions of protein family and superfamily sizes, as well as the scale-free character of protein structural similarity graphs. Together, these results and their analysis suggest a plausible comprehensive scenario of emergence of the protein universe in early biological evolution.

Lukatsky, D.B., Shakhnovich, B.E., Mintseris, J. & Shakhnovich, E.I. Structural Similarity Enhances Interaction Propensity of Proteins. Journal of Molecular Biology 365, 5, 1596 - 1606 (2007).
We study statistical properties of interacting protein-like surfaces and predict two strong, related effects: (i) statistically enhanced self-attraction of proteins; (ii) statistically enhanced attraction of proteins with similar structures. The effects originate in the fact that the probability to find a pattern self-match between two identical, even randomly organized interacting protein surfaces is always higher compared with the probability for a pattern match between two different, promiscuous protein surfaces. This theoretical finding explains the statistical prevalence of homodimers in protein–protein interaction networks reported earlier. Further, our findings are confirmed by the analysis of a curated database of protein complexes that showed highly statistically significant overrepresentation of dimers formed by structurally similar proteins with highly divergent sequences ("superfamily heterodimers"). We suggest that promiscuous homodimeric interactions pose strong competitive interactions for heterodimers evolved from homodimers. Such an evolutionary bottleneck is overcome using the negative design evolutionary pressure applied against promiscuous homodimer formation. This is achieved through the formation of highly specific contacts formed by charged residues as demonstrated both in model and real superfamily heterodimers.
Zeldovich, K.B., Chen, P. & Shakhnovich, E.I. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc. Natl. Acad. Sci. USA 104, 41, 16152 - 16157 (2007).
Classical population genetics a priori assigns fitness to alleles without considering molecular or functional properties of proteins that these alleles encode. Here we study population dynamics in a model where fitness can be inferred from physical properties of proteins under a physiological assumption that loss of stability of any protein encoded by an essential gene confers a lethal phenotype. Accumulation of mutations in organisms containing Γ genes can then be represented as diffusion within the Γ-dimensional hypercube with adsorbing boundaries determined, in each dimension, by loss of a protein's stability and, at higher stability, by lack of protein sequences. Solving the diffusion equation whose parameters are derived from the data on point mutations in proteins, we determine a universal distribution of protein stabilities, in agreement with existing data. The theory provides a fundamental relation between mutation rate, maximal genome size, and thermodynamic response of proteins to point mutations. It establishes a universal speed limit on rate of molecular evolution by predicting that populations go extinct (via lethal mutagenesis) when mutation rate exceeds approximately six mutations per essential part of genome per replication for mesophilic organisms and one to two mutations per genome per replication for thermophilic ones. Several RNA viruses function close to the evolutionary speed limit, whereas error correction mechanisms used by DNA viruses and nonmutant strains of bacteria featuring various genome lengths and mutation rates have brought these organisms universally ≈1,000-fold below the natural speed limit.
Berezovsky, I.N., Zeldovich, K.B. & Shakhnovich, E.I. Positive and Negative Design in Stability and Thermal Adaptation of Natural Proteins. PLoS Comput Biol 3, 3 (2007).
Author Summary

What mechanisms does Nature use in her quest for thermophilic proteins? It is known that stability of a protein is mainly determined by the energy gap, or the difference in energy, between native state and a set of incorrectly folded (misfolded) conformations. Here we show that Nature makes thermophilic proteins by widening this gap from both ends. The energy of the native state of a protein is decreased by selecting strongly attractive amino acids at positions that are in contact in the native state (positive design). Simultaneously, energies of the misfolded conformations are increased by selection of strongly repulsive amino acids at positions that are distant in native structure; however, these amino acids will interact repulsively in the misfolded conformations (negative design). These fundamental principles of protein design are manifested in the "from both ends of the hydrophobicity scale" trend observed in thermophilic adaptation, whereby proteomes of thermophilic proteins are enriched in extreme amino acids (hydrophobic and charged) at the expense of polar ones. Hydrophobic amino acids contribute mostly to the positive design, while charged amino acids that repel each other in non-native conformations of proteins contribute to negative design. Our results provide guidance in rational design of proteins with selected thermal properties.

Deeds, E.J., Ashenberg, O., Gerardin, J. & Shakhnovich, E.I. Robust protein–protein interactions in crowded cellular environments. Proc. Natl. Acad. Sci. USA 104, 38, 14952 - 14957 (2007).
The capacity of proteins to interact specifically with one another underlies our conceptual understanding of how living systems function. Systems-level study of specificity in protein–protein interactions is complicated by the fact that the cellular environment is crowded and heterogeneous; interaction pairs may exist at low relative concentrations and thus be presented with many more opportunities for promiscuous interactions compared with specific interaction possibilities. Here we address these questions by using a simple computational model that includes specifically designed interacting model proteins immersed in a mixture containing hundreds of different unrelated ones; all of them undergo simulated diffusion and interaction. We find that specific complexes are quite robust to interference from promiscuous interaction partners only in the range of temperatures T_design > T > T_rand. At T > T_design, specific complexes become unstable, whereas at T < T_rand, formation of specific complexes is suppressed by promiscuous interactions. Specific interactions can form only if T_design > T_rand. This condition requires an energy gap between binding energy in a specific complex and set of binding energies between randomly associating proteins, providing a general physical constraint on evolutionary selection or design of specific interacting protein interfaces. This work has implications for our understanding of how the protein repertoire functions and evolves within the context of cellular systems.
Yang, J.S., Chen, W.W., Skolnick, J. & Shakhnovich, E.I. All-Atom Ab Initio Folding of a Diverse Set of Proteins. Structure 15, 1, 53 - 63 (2007).
Natural proteins fold to a unique, thermodynamically dominant state. Modeling of the folding process and prediction of the native fold of proteins are two major unsolved problems in biophysics. Here, we show successful all-atom ab initio folding of a representative diverse set of proteins by using a minimalist transferable-energy model that consists of two-body atom-atom interactions, hydrogen bonding, and a local sequence-energy term that models sequence-specific chain stiffness. Starting from a random coil, the native-like structure was observed during replica exchange Monte Carlo (REMC) simulation for most proteins regardless of their structural classes; the lowest energy structure was close to native—in the range of 2–6 Å root-mean-square deviation (rmsd). Our results demonstrate that the successful folding of a protein chain to its native state is governed by only a few crucial energetic terms.
Gomes, A.L.C., de Rezende, J.R., de Araújo, A.P.F. & Shakhnovich, E.I. Description of atomic burials in compact globular proteins by Fermi-Dirac probability distributions. Proteins: Structure, Function, and Bioinformatics 66, 2, 304 - 320 (2007).
We perform a statistical analysis of atomic distributions as a function of the distance R from the molecular geometrical center in a nonredundant set of compact globular proteins. The number of atoms increases quadratically for small R, indicating a constant average density inside the core, reaches a maximum at a size-dependent distance R_max, and falls rapidly for larger R. The empirical curves turn out to be consistent with the volume increase of spherical concentric solid shells and a Fermi-Dirac distribution in which the distance R plays the role of an effective atomic energy ε(R) = R. The effective chemical potential μ governing the distribution increases with the number of residues, reflecting the size of the protein globule, while the temperature parameter β decreases. Interestingly, βμ is not as strongly dependent on protein size and appears to be tuned to maintain approximately half of the atoms in the high density interior and the other half in the exterior region of rapidly decreasing density. A normalized size-independent distribution was obtained for the atomic probability as a function of the reduced distance, r = R/R_g, where R_g is the radius of gyration. The global normalized Fermi distribution, F(r), can be reasonably decomposed in Fermi-like subdistributions for different atomic types τ, F_τ(r), with Σ_τ F_τ(r) = F(r), which depend on two additional parameters μ_τ and h_τ. The chemical potential μ_τ affects a scaling prefactor and depends on the overall frequency of the corresponding atomic type, while the maximum position of the subdistribution is determined by h_τ, which appears in a type-dependent atomic effective energy, ε_τ(r) = h_τ r, and is strongly correlated to available hydrophobicity scales. Better adjustments are obtained when the effective energy is not assumed to be necessarily linear, or ε_τ*(r) = h_τ* r^α, in which case a correlation with hydrophobicity scales is found for the product α_τ h_τ*.
These results indicate that compact globular proteins are consistent with a thermodynamic system governed by hydrophobic-like energy functions, with reduced distances from the geometrical center, reflecting atomic burials, and provide a conceptual framework for the eventual prediction from sequence of a few parameters from which whole atomic probability distributions and potentials of mean force can be reconstructed. Proteins 2007. © 2006 Wiley-Liss, Inc.
Wallin, S., Zeldovich, K.B. & Shakhnovich, E.I. The Folding Mechanics of a Knotted Protein. Journal of Molecular Biology 368, 3, 884 - 893 (2007).
An increasing number of proteins are being discovered with a remarkable and somewhat surprising feature, a knot in their native structures. How the polypeptide chain is able to "knot" itself during the folding process to form these highly intricate protein topologies is not known. Here we perform a computational study on the 160-amino-acid homodimeric protein YibK, which, like other proteins in the SpoU family of MTases, contains a deep trefoil knot in its C-terminal region. In this study, we use a coarse-grained Cα-chain representation and Langevin dynamics to study folding kinetics. We find that specific, attractive nonnative interactions are critical for knot formation. In the absence of these interactions, i.e., in an energetics driven entirely by native interactions, knot formation is exceedingly unlikely. Further, we find, in concert with recent experimental data on YibK, two parallel folding pathways that we attribute to an early and a late formation of the trefoil knot, respectively. For both pathways, knot formation occurs before dimerization. A bioinformatics analysis of the SpoU family of proteins reveals further that the critical nonnative interactions may originate from evolutionarily conserved hydrophobic segments around the knotted region.
Zeldovich, K.B., Berezovsky, I.N. & Shakhnovich, E.I. Protein and DNA Sequence Determinants of Thermophilic Adaptation. PLoS Comput Biol 3, 1 (2007).
There have been considerable attempts in the past to relate a phenotypic trait, the habitat temperature of organisms, to their genotypes, most importantly the compositions of their genomes and proteomes. However, despite accumulation of anecdotal evidence, an exact and conclusive relationship between the former and the latter has been elusive. We present an exhaustive study of the relationship between amino acid composition of proteomes, nucleotide composition of DNA, and optimal growth temperature (OGT) of prokaryotes. Based on 204 complete proteomes of archaea and bacteria spanning the temperature range from −10 °C to 110 °C, we performed an exhaustive enumeration of all possible sets of amino acids and found a set of amino acids whose total fraction in a proteome is correlated, to a remarkable extent, with the OGT. The universal set is Ile, Val, Tyr, Trp, Arg, Glu, Leu (IVYWREL), and the correlation coefficient is as high as 0.93. We also found that the G + C content in 204 complete genomes does not exhibit a significant correlation with OGT (R = −0.10). On the other hand, the fraction of A + G in coding DNA is correlated with temperature, to a considerable extent, due to codon patterns of IVYWREL amino acids. Further, we found strong and independent correlation between OGT and the frequency with which pairs of A and G nucleotides appear as nearest neighbors in genome sequences. This adaptation is achieved via codon bias. These findings present a direct link between principles of protein structure and stability and evolutionary mechanisms of thermophilic adaptation. On the nucleotide level, the analysis provides an example of how nature utilizes codon bias for evolutionary adaptation to extreme conditions. Together these results provide a complete picture of how compositions of proteomes and genomes in prokaryotes adjust to the extreme conditions of the environment.
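The central quantity in this abstract, the total fraction of IVYWREL residues in a proteome and its correlation with OGT, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, sequences are assumed to be one-letter amino acid strings, and the toy sequences in the usage lines are invented for the example and are not data from the paper.

```python
import math

# The seven amino acids named in the abstract: Ile, Val, Tyr, Trp, Arg, Glu, Leu.
IVYWREL = frozenset("IVYWREL")

def ivywrel_fraction(proteome):
    """Fraction of all residues in a proteome that belong to the IVYWREL set.

    `proteome` is an iterable of one-letter amino acid sequences;
    non-letter characters (gaps, stop symbols) are ignored.
    """
    total = 0
    hits = 0
    for sequence in proteome:
        for residue in sequence.upper():
            if residue.isalpha():
                total += 1
                hits += residue in IVYWREL
    if total == 0:
        raise ValueError("proteome contains no residues")
    return hits / total

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy input, invented for illustration only.
toy_proteome = ["AAAA", "IVGG"]
print(ivywrel_fraction(toy_proteome))  # 2 of 8 residues are in the set
```

With one fraction per organism and the matching list of growth temperatures, `pearson(fractions, temperatures)` gives the kind of coefficient the abstract reports. The exhaustive search over all amino acid subsets described in the abstract would repeat this calculation for each candidate set.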
Chen, W.W., Yang, J.S. & Shakhnovich, E.I. A knowledge-based move set for protein folding. Proteins: Structure, Function, and Bioinformatics 66, 3, 682 - 688 (2007).
The free energy landscape of protein folding is rugged, occasionally characterized by compact, intermediate states of low free energy. In computational folding, this landscape leads to trapped, compact states with incorrect secondary structure. We devised a residue-specific, protein backbone move set for efficient sampling of protein-like conformations in computational folding simulations. The move set is based on the selection of a small set of backbone dihedral angles, derived from clustering dihedral angles sampled from experimental structures. We show in both simulated annealing and replica exchange Monte Carlo (REMC) simulations that the knowledge-based move set, when compared with a conventional move set, shows statistically significant improved ability at overcoming kinetic barriers, reaching deeper energy minima, and achieving correspondingly lower RMSDs to native structures. The new move set is also more efficient, being able to reach low energy states considerably faster. Use of this move set in determining the energy minimum state and for calculating thermodynamic quantities is discussed. Proteins 2007. © 2006 Wiley-Liss, Inc.
Shakhnovich, E.I. Physics and evolution of protein-protein interactions. The FASEB Journal 20, A1473 (2006).
We will discuss recent developments in bioinformatics analysis and theoretical studies of protein-protein interactions in living cells, an important aspect of systems biology. First, we consider the network of protein-protein interactions and demonstrate that two published independent measurements of these interactions produce graphs that are only weakly correlated with one another despite their strikingly similar topology. We then propose a physical model based on the fundamental principle that (de)solvation is a major physical factor in protein-protein interactions. This model reproduces not only the scale-free nature of such graphs but also a number of higher-order correlations in these networks. A key support of the model is provided by the discovery of a significant correlation between the number of interactions made by a protein and the fraction of hydrophobic residues on its surface. Next, we discuss a number of fundamental models for specific protein-protein interactions that provide deep insight into how specific protein multimers could have evolved. In particular we show that homodimers are most likely to have been precursors of modern protein complexes (homodimers are prevalent in modern cells, the phenomenon of "molecular narcissism"). Subsequent evolution created heterodimers in the process of "negative design" against non-specific and homodimeric associations.