Why highly stable proteins are not commonly observed in mesophilic organisms is an important evolutionary question. It has been suggested that high stability has a fitness cost in the form of loss of activity. However, this hypothesis has not yet been demonstrated experimentally. Here, we use the essential bacterial enzyme adenylate kinase (Adk) to explore this hypothesis and show that, as Adk’s stability increases, one of its own substrates inhibits its activity. Bacterial strains carrying such stable Adks show substantial fitness defects, which can be mapped to the loss of Adk activity due to substrate inhibition. Overall, our study adds “substrate inhibition” to the “toolbox” used to rationalize the stability distribution of proteins.
Proteins are only moderately stable. It has long been debated whether this narrow range of stabilities results solely from neutral drift toward lower stability, or whether purifying selection against excess stability, for which no experimental evidence has been found so far, is also at work. Here, we show that mutations outside the active site in the essential Escherichia coli enzyme adenylate kinase (Adk) result in a stability-dependent increase in substrate inhibition by AMP, thereby impairing overall enzyme activity at high stability. Such inhibition caused substantial fitness defects not only in the presence of excess substrate but also under physiological conditions. In the latter case, substrate inhibition caused differential accumulation of AMP in the stationary phase for the inhibition-prone mutants. Furthermore, we show that changes in flux through Adk accurately describe the variation in fitness effects. Taken together, these data suggest that selection against substrate inhibition, and hence against excess stability, may be an important factor determining the stability observed for modern-day Adk.
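Substrate inhibition of the kind described above can be illustrated with the textbook single-substrate (Haldane) rate law. This is a simplified sketch under stated assumptions: Adk actually binds two substrates (ATP and AMP in the forward direction), whereas here only one substrate concentration varies, and all parameter values are hypothetical. Lowering the inhibition constant `ki` stands in, hypothetically, for the stability-dependent strengthening of AMP inhibition.

```python
import math


def substrate_inhibition_rate(s: float, vmax: float, km: float, ki: float) -> float:
    """Haldane rate law: v = Vmax * S / (Km + S + S**2 / Ki).

    Single-substrate simplification (an assumption); ki is the
    substrate-inhibition constant, and a smaller ki means stronger inhibition.
    """
    return vmax * s / (km + s + s * s / ki)


def optimal_substrate(km: float, ki: float) -> float:
    """Substrate level that maximizes v; setting dv/dS = 0 gives S* = sqrt(Km * Ki)."""
    return math.sqrt(km * ki)


if __name__ == "__main__":
    km, vmax = 1.0, 1.0  # hypothetical units
    for ki in (10.0, 2.0):  # weaker vs. stronger (hypothetical) substrate inhibition
        s_opt = optimal_substrate(km, ki)
        print(f"Ki={ki}: S*={s_opt:.2f}, "
              f"v(S*)={substrate_inhibition_rate(s_opt, vmax, km, ki):.3f}, "
              f"v(S=5)={substrate_inhibition_rate(5.0, vmax, km, ki):.3f}")
```

Beyond S* the rate falls as substrate rises, so excess substrate lowers activity. A smaller Ki both lowers the peak rate, Vmax / (1 + 2·sqrt(Km/Ki)), and shifts it to lower substrate levels.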
Mutation rate is a key determinant of both the pace and the outcome of evolution, and variability in this rate has been shown in different scenarios to play a key role in evolutionary adaptation and in the evolution of resistance under stress imposed by selective pressure. Here we investigate the dynamics of resistance fixation in a bacterial population with variable mutation rates, and we show that evolutionary outcomes are most sensitive to mutation rate variations when the population is subject to environmental and demographic conditions that suppress the evolutionary advantage of high-fitness subpopulations. By directly mapping a biophysical fitness function to the system-level dynamics of the population, we show that both low and very high, but not intermediate, levels of stress in the form of an antibiotic result in a disproportionate effect of hypermutation on resistance fixation. We demonstrate how this behavior is directly tied to the extent of genetic hitchhiking in the system, that is, the propagation of high-mutation-rate cells through their association with high-fitness mutations. Our results indicate a substantial role for mutation rate flexibility in the evolution of antibiotic resistance under conditions in which resistant cells have only a weak advantage over wild-type cells.
A central goal of protein-folding theory is to predict the stochastic dynamics of transition paths—the rare trajectories that transit between the folded and unfolded ensembles—using only thermodynamic information, such as a low-dimensional equilibrium free-energy landscape. However, commonly used one-dimensional landscapes typically fall short of this aim, because an empirical coordinate-dependent diffusion coefficient has to be fit to transition-path trajectory data in order to reproduce the transition-path dynamics. We show that an alternative, first-principles free-energy landscape predicts transition-path statistics that agree well with simulations and single-molecule experiments without requiring dynamical data as an input. This “topological configuration” model assumes that distinct, native-like substructures assemble on a time scale that is slower than native-contact formation but faster than the folding of the entire protein. Using only equilibrium simulation data to determine the free energies of these coarse-grained intermediate states, we predict a broad distribution of transition-path transit times that agrees well with the transition-path durations observed in simulations. We further show that both the distribution of finite-time displacements on a one-dimensional order parameter and the ensemble of transition-path trajectories generated by the model are consistent with the simulated transition paths. These results indicate that a landscape based on transient folding intermediates, which are often hidden by one-dimensional projections, can form the basis of a predictive model of protein-folding transition-path dynamics.
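Transition-path durations like those compared above are typically extracted from a simulated trajectory of a one-dimensional order parameter by recording the time from the last frame in one basin to the first frame in the other. The sketch below assumes the basins are defined by two hypothetical thresholds, `q_a` and `q_b`, on an order parameter such as the fraction of native contacts. It is not the topological configuration model itself, only one way to measure the statistic that the model predicts.

```python
from typing import List, Optional, Sequence


def transition_path_times(q: Sequence[float], q_a: float, q_b: float,
                          dt: float = 1.0) -> List[float]:
    """Durations of A->B and B->A transition paths in a 1D order-parameter trajectory.

    Basin A is q <= q_a and basin B is q >= q_b (q_a < q_b). Each duration runs
    from the last frame spent in one basin to the first frame in the other.
    """
    times: List[float] = []
    last_basin: Optional[str] = None  # basin most recently visited
    last_frame = 0                    # index of the latest frame spent in that basin
    for i, x in enumerate(q):
        if x <= q_a:
            basin = "A"
        elif x >= q_b:
            basin = "B"
        else:
            continue  # frame lies in the transition region
        if last_basin is not None and basin != last_basin:
            times.append((i - last_frame) * dt)
        last_basin, last_frame = basin, i
    return times


if __name__ == "__main__":
    # Toy trajectory (hypothetical values): unfolded -> folded -> unfolded.
    traj = [0.05, 0.1, 0.4, 0.6, 0.9, 0.95, 0.5, 0.1]
    print(transition_path_times(traj, q_a=0.2, q_b=0.8, dt=0.5))  # [1.5, 1.0]
```

Excursions that leave a basin and return to it without reaching the other basin are skipped automatically, which is what distinguishes transition paths from ordinary fluctuations.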
The relationship between the dynamics of a community and its constituent pairwise interactions is a fundamental problem in ecology. Higher-order ecological effects beyond pairwise interactions may be key to complex ecosystems, but mechanisms to produce these effects remain poorly understood. Here we model microbial growth and competition to show that higher-order effects can arise from variation in multiple microbial growth traits, such as lag times and growth rates, on a single limiting resource with no other interactions. These effects produce a range of ecological phenomena: an unlimited number of strains can exhibit multistability and neutral coexistence, potentially with a single keystone strain; strains that coexist in pairs do not coexist all together; and a strain that wins all pairwise competitions can go extinct in a mixed competition. Since variation in multiple growth traits is ubiquitous in microbial populations, our results indicate that these higher-order effects may also be widespread, especially in laboratory ecology and evolution experiments.
Increased light scattering in the eye lens due to aggregation of the long-lived lens proteins, crystallins, is the cause of cataract disease. Several mutations in the gene encoding human γD-crystallin (HγD) cause misfolding and aggregation. Cataract-associated substitutions at Trp42 cause the protein to aggregate in vitro from a partially unfolded intermediate locked by an internal disulfide bridge, and proteomic evidence suggests a similar aggregation precursor is involved in age-onset cataract. Surprisingly, WT HγD can promote aggregation of the W42Q variant while itself remaining soluble. Here, a search for a biochemical mechanism for this interaction has revealed a previously unknown oxidoreductase activity in HγD. Using in vitro oxidation, mutational analysis, cysteine labeling, and MS, we have assigned this activity to a redox-active internal disulfide bond that is dynamically exchanged among HγD molecules. The W42Q variant acts as a disulfide sink, reducing oxidized WT and forming a distinct internal disulfide that kinetically traps the aggregation-prone intermediate. Our findings suggest a redox “hot potato” competition among WT and mutant or modified polypeptides wherein variants with the lowest kinetic stability are trapped in aggregation-prone intermediate states upon accepting disulfides from more stable variants. Such reactions may occur in other long-lived proteins that function in oxidizing environments. In these cases, aggregation may be forestalled by inhibiting disulfide flow toward mutant or damaged polypeptides.
Despite considerable efforts, no physical mechanism has been shown to explain N-terminal codon bias in prokaryotic genomes. Using a systematic study of synonymous substitutions in two endogenous E. coli genes, we show that interactions between the coding region and the upstream Shine-Dalgarno (SD) sequence modulate the efficiency of translation initiation, affecting both intracellular mRNA and protein levels due to the inherent coupling of transcription and translation in E. coli. We further demonstrate that far-downstream mutations can also modulate mRNA levels by occluding the SD sequence through the formation of non-equilibrium secondary structures. By contrast, a non-endogenous RNA polymerase that decouples transcription and translation largely alleviates the effects of synonymous substitutions on mRNA levels. Finally, a complementary statistical analysis of the E. coli genome specifically implicates avoidance of intra-molecular base pairing with the SD sequence. Our results provide general physical insights into the coding-level features that optimize protein expression in prokaryotes.
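A crude proxy for potential intra-molecular pairing with the SD sequence is to scan a coding region for short stretches that are the reverse complement of part of the SD. The sketch below assumes the commonly cited E. coli SD consensus AGGAGG, uses the DNA alphabet for convenience, ignores G-U wobble pairs, and does not compute folding energies; a real analysis would use an RNA secondary-structure tool. Function names are hypothetical.

```python
from typing import List, Tuple

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(_PAIR[b] for b in reversed(seq))


def sd_pairing_sites(cds: str, sd: str = "AGGAGG", k: int = 4) -> List[Tuple[int, str]]:
    """Positions where a k-mer of `cds` is the reverse complement of some k-mer of `sd`,
    i.e. could form k consecutive Watson-Crick pairs with the SD (G-U wobble ignored)."""
    cds, sd = cds.upper(), sd.upper()
    targets = {reverse_complement(sd[i:i + k]) for i in range(len(sd) - k + 1)}
    return [(i, cds[i:i + k]) for i in range(len(cds) - k + 1) if cds[i:i + k] in targets]


if __name__ == "__main__":
    # Hypothetical coding-region fragment containing a CTCC/TCCT stretch.
    print(sd_pairing_sites("ATGCTCCTAAA"))  # [(3, 'CTCC'), (4, 'TCCT')]
```

Because the SD lies upstream in the same transcript, a downstream segment can pair with it only antiparallel, which is why the scan looks for reverse complements rather than plain complements.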
Viral evolutionary pathways are determined by the fitness landscape, which maps viral genotype to fitness. However, a quantitative description of the landscape and of the evolutionary forces acting on it remains elusive. Here, we apply a biophysical fitness model based on capsid folding stability and antibody binding affinity to predict the evolutionary pathway of norovirus escaping a neutralizing antibody. The model is validated by experimental evolution in bulk culture and in a drop-based microfluidic system that propagates millions of independent small viral subpopulations. We demonstrate that along the axis of binding affinity, selection for escape variants and drift due to random mutations have the same direction, an atypical case in evolution. However, along folding stability, selection and drift are opposing forces whose balance is tuned by viral population size. Our results demonstrate that predictable epistatic tradeoffs between molecular traits of viral proteins shape viral evolution.
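A common way to write a fitness model of this kind multiplies the probability that the capsid protein is folded by the probability that it is not bound by antibody. The sketch below assumes a two-state folding model and simple 1:1 binding with antibody in excess; the exact functional form used in the study may differ, and all numbers in the example are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_fold: float, temp: float = 310.0) -> float:
    """Two-state folded fraction; dg_fold = G_folded - G_unfolded in kcal/mol (negative = stable)."""
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL * temp)))


def fraction_unbound(kd: float, antibody: float) -> float:
    """Fraction of sites free of antibody for 1:1 binding, antibody in excess (same units as kd)."""
    return 1.0 / (1.0 + antibody / kd)


def fitness(dg_fold: float, kd: float, antibody: float, temp: float = 310.0) -> float:
    """Hypothetical fitness: the capsid must be folded AND escape the neutralizing antibody."""
    return fraction_folded(dg_fold, temp) * fraction_unbound(kd, antibody)


if __name__ == "__main__":
    wild_type = fitness(dg_fold=-4.0, kd=1.0, antibody=10.0)
    # Hypothetical escape mutant: 100-fold weaker binding but 3 kcal/mol less stable.
    escape = fitness(dg_fold=-1.0, kd=100.0, antibody=10.0)
    print(f"wild type: {wild_type:.3f}, escape mutant: {escape:.3f}")
```

In this form, raising Kd always increases fitness, while destabilizing escape mutations pay a cost through the folded fraction, which captures the kind of trade-off between molecular traits described above.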
Motivation: Protein evolution spans multiple time scales, and its effects span multiple levels of biological organization. We developed a web app, ProteomeVis, to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, in which edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long-time-scale effects of protein structure evolution. The short-time-scale effects of protein sequence evolution are studied by correlating the sequence evolutionary rate (ER) with protein properties that span from the molecular to the organismal level.
Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Among the properties in ProteomeVis, protein abundance correlates most strongly with ER, with Spearman correlations of −0.49 (P-value < 10⁻¹⁰) and −0.46 (P-value < 10⁻¹⁰) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant.
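One standard way to estimate a scaling exponent like those quoted above is the continuous maximum-likelihood estimator of Clauset, Shalizi and Newman, alpha = 1 + n / Σ ln(k_i / k_min), applied to the n node degrees k_i ≥ k_min. This is a sketch of that estimator, not necessarily the fitting procedure ProteomeVis uses; the edge list and `x_min` are inputs you would supply, and for integer degrees, replacing k_min by k_min − 1/2 inside the logarithm is a common discrete correction.

```python
import math
import random
from collections import Counter
from typing import Iterable, List


def degrees(edges: Iterable[tuple]) -> List[int]:
    """Degree of every node that appears in an undirected edge list [(u, v), ...]."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def powerlaw_alpha(values: Iterable[float], x_min: float) -> float:
    """Continuous MLE of alpha for p(x) ~ x**(-alpha) on x >= x_min.

    Requires at least one value strictly above x_min.
    """
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


if __name__ == "__main__":
    # Sanity check on synthetic Pareto data with a known exponent (alpha = 2.5),
    # drawn by inverse-CDF sampling: x = (1 - u) ** (-1 / (alpha - 1)).
    random.seed(0)
    alpha_true = 2.5
    sample = [(1.0 - random.random()) ** (-1.0 / (alpha_true - 1.0)) for _ in range(50_000)]
    print(f"estimated alpha: {powerlaw_alpha(sample, 1.0):.3f}")
```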
Protein thermodynamics are an integral determinant of viral fitness and one of the major drivers of protein evolution. Mutations in the influenza A virus (IAV) hemagglutinin (HA) protein can eliminate neutralizing antibody binding to mediate escape from preexisting antiviral immunity. Prior research on the IAV nucleoprotein suggests that protein stability may constrain seasonal IAV evolution; however, the role of stability in shaping the evolutionary dynamics of the HA protein has not been explored. We used the full coding sequence of 9,797 H1N1pdm09 HA sequences and 16,716 human seasonal H3N2 HA sequences to computationally estimate relative changes in the thermal stability of the HA protein between 2009 and 2016. Phylogenetic methods were used to characterize how stability differences impacted the evolutionary dynamics of the virus. We found that pandemic H1N1 IAV strains split into two lineages that had different relative HA protein stabilities and that later variants were descended from the higher-stability lineage. Analysis of the mutations associated with the selective sweep of the higher-stability lineage found that they were characterized by the early appearance of highly stabilizing mutations, the earliest of which was not located in a known antigenic site. Experimental evidence further suggested that H1N1 HA stability may be correlated with in vitro virus production and infection. A similar analysis of H3N2 strains found that surviving lineages were also largely descended from viruses predicted to encode more-stable HA proteins. Our results suggest that HA protein stability likely plays a significant role in the persistence of different IAV lineages.
Mutations in a microbial population can increase the frequency of a genotype not only by increasing its exponential growth rate, but also by decreasing its lag time or adjusting the yield (resource efficiency). The contribution of multiple life-history traits to selection is a critical question for evolutionary biology as we seek to predict the evolutionary fates of mutations. Here we use a model of microbial growth to show that there are two distinct components of selection corresponding to the growth and lag phases, while the yield modulates their relative importance. The model predicts rich population dynamics when there are trade-offs between phases: multiple strains can coexist or exhibit bistability due to frequency-dependent selection, and strains can engage in rock–paper–scissors interactions due to non-transitive selection. We characterize the environmental conditions and patterns of traits necessary to realize these phenomena, which we show to be readily accessible to experiments. Our results provide a theoretical framework for analysing high-throughput measurements of microbial growth traits, especially interpreting the pleiotropy and correlations between traits across mutants. This work also highlights the need for more comprehensive measurements of selection in simple microbial systems, where the concept of an ordinary fitness landscape breaks down.
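The growth-cycle model can be sketched under simple assumptions: each strain has a lag time, after which it grows exponentially at a fixed rate, and all growth stops once the strains have jointly consumed a fixed amount of resource, with each strain's yield converting resource into cells. The tuple layout and function names are hypothetical. The example shows how a strain with a faster growth rate but a longer lag can lose when resources are scarce and win when they are plentiful.

```python
import math
from typing import List, Tuple

# (initial abundance N0, lag time, growth rate g, yield Y): hypothetical layout
Strain = Tuple[float, float, float, float]


def abundance(n0: float, lag: float, rate: float, t: float) -> float:
    """No growth during the lag, then exponential growth: N0 * exp(rate * (t - lag))."""
    return n0 if t <= lag else n0 * math.exp(rate * (t - lag))


def saturation_time(strains: List[Strain], resource: float) -> float:
    """Time when total consumption, sum_i (N_i(t) - N_i(0)) / Y_i, reaches `resource`.

    Found by bisection; assumes at least one strain has a positive growth rate.
    """
    def consumed(t: float) -> float:
        return sum((abundance(n0, lag, g, t) - n0) / y for n0, lag, g, y in strains)

    lo, hi = 0.0, 1.0
    while consumed(hi) < resource:  # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if consumed(mid) < resource:
            lo = mid
        else:
            hi = mid
    return hi


def selection_coefficient(strains: List[Strain], resource: float) -> float:
    """Log fold-change of strain 0 minus that of strain 1 over one growth cycle."""
    t_sat = saturation_time(strains, resource)
    (n1, l1, g1, _), (n2, l2, g2, _) = strains[0], strains[1]
    return (math.log(abundance(n1, l1, g1, t_sat) / n1)
            - math.log(abundance(n2, l2, g2, t_sat) / n2))


if __name__ == "__main__":
    fast_long_lag = (0.5, 3.0, 1.0, 1.0)   # hypothetical trait values
    slow_short_lag = (0.5, 1.0, 0.8, 1.0)
    for resource in (1e2, 1e6):
        s = selection_coefficient([fast_long_lag, slow_short_lag], resource)
        print(f"resource={resource:g}: s = {s:+.2f}")
```

When the saturation time t_sat exceeds both lags, the selection coefficient reduces to g1(t_sat − λ1) − g2(t_sat − λ2), which makes explicit how the resource amount and the yields, acting through t_sat, set the relative weight of the growth and lag components.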
We present a study on the evolution of the Fenna–Matthews–Olson bacterial photosynthetic pigment–protein complex. This protein complex functions as an antenna: it transfers the energy of absorbed photons, in the form of excitons, to a reaction center where the photosynthetic reactions are initiated. The efficiency of exciton transport is therefore fundamental to the photosynthetic bacterium’s survival. We have reconstructed an ancestor of the complex to establish whether coherence in exciton transport was selected for or optimized over time. We have also investigated whether the free energy change upon folding was optimized during evolution, by asking whether the mutations that connect the ancestor to present-day species were stabilizing or destabilizing from a thermodynamic viewpoint. From this study, we established that most of these mutations were thermodynamically neutral. Furthermore, we did not see a large change in exciton transport efficiency or coherence, and thus our results predict that exciton coherence was not specifically selected for.
Recent experiments and simulations have demonstrated that proteins can fold on the ribosome. However, the extent and generality of fitness effects resulting from cotranslational folding remain open questions. Here we report a genome-wide analysis that uncovers evidence of evolutionary selection for cotranslational folding. We describe a robust statistical approach to identify loci within genes that are both significantly enriched in slowly translated codons and evolutionarily conserved. Surprisingly, we find that domain boundaries can explain only a small fraction of these conserved loci. Instead, we propose that regions enriched in slowly translated codons are associated with cotranslational folding intermediates, which may be smaller than a single domain. We show that the intermediates predicted by a native-centric model of cotranslational folding account for the majority of these loci across more than 500 Escherichia coli proteins. By making a direct connection to protein folding, this analysis provides strong evidence that many synonymous substitutions have been selected to optimize translation rates at specific locations within genes. More generally, our results indicate that kinetics, and not just thermodynamics, can significantly alter the efficiency of self-assembly in a biological context.
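The locus search described above can be sketched as a sliding-window enrichment test. This is a simplified illustration under stated assumptions: which codons count as slow, the window length, the binomial null model with a genome-wide slow-codon fraction `p_slow`, and the cutoff `alpha` are all hypothetical choices, and both the conservation filter across orthologs and any correction for testing many overlapping windows are omitted.

```python
from math import comb
from typing import List, Sequence


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_windows(is_slow: Sequence[bool], window: int, p_slow: float,
                     alpha: float = 0.01) -> List[int]:
    """Start indices (in codons) of windows with improbably many slow codons.

    Null model (an assumption): each codon is slow independently with the
    genome-wide probability p_slow, so the count in a window is binomial.
    """
    hits = []
    for start in range(len(is_slow) - window + 1):
        k = sum(is_slow[start:start + window])
        if binomial_tail(k, window, p_slow) < alpha:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical gene: a run of 10 slow codons embedded in 40 fast ones.
    gene = [False] * 20 + [True] * 10 + [False] * 20
    print(enriched_windows(gene, window=10, p_slow=0.2))
```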
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
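To make the non-factorizable state space concrete, the sketch below assumes the simplest "noninteracting" energy, J times the number of elements out of place (an illustrative choice, not necessarily a Hamiltonian from the paper), and sums over the symmetric group by brute force. Even though each energy term depends on a single element, Z does not factor into single-element terms, because the elements must together form a permutation. For n = 3, for example, Z = 1 + 3e^(−2βJ) + 2e^(−3βJ), since no permutation has exactly one element out of place.

```python
import math
from itertools import permutations
from typing import Sequence


def misplaced(perm: Sequence[int]) -> int:
    """Energy (in units of J): number of elements not at their correct position."""
    return sum(1 for pos, item in enumerate(perm) if pos != item)


def partition_function(n: int, beta: float, j: float = 1.0) -> float:
    """Z = sum over all n! orderings of exp(-beta * J * misplaced); brute force, small n only."""
    return sum(math.exp(-beta * j * misplaced(p)) for p in permutations(range(n)))


def mean_energy(n: int, beta: float, j: float = 1.0) -> float:
    """Thermal average <E> = sum_p E(p) exp(-beta E(p)) / Z."""
    z = partition_function(n, beta, j)
    return sum(j * misplaced(p) * math.exp(-beta * j * misplaced(p))
               for p in permutations(range(n))) / z


if __name__ == "__main__":
    for beta in (0.0, 1.0, 5.0):
        print(beta, mean_energy(6, beta))
```

At β = 0 the mean energy is (n − 1)J, because a uniformly random permutation has one fixed point on average; as β grows, the system settles into the correct ordering.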
The BACE-1 enzyme is a prime target in the search for a cure for Alzheimer's disease. In this article, we used the MM-PBSA approach to compute the binding free energies of 46 reported ligands to this enzyme. After showing that the most probable protonation state of the catalytic dyad is mono-protonated (on Asp32), we performed a thorough analysis of the parameters influencing the sampling of the conformational space (in total, more than 35 μs of simulations were performed). We show that ten simulations of 2 ns give better results than one of 50 ns. We also investigated the influence of the protein force field, the water model, periodic boundary condition artifacts (box size), and the ionic strength. The combination of Amber03 with TIP3P, a minimal distance of 1.0 nm between the protein and the box edges, and an ionic strength of I = 0.2 M provides the optimal correlation with experiments. Overall, when using these parameters, a Pearson correlation coefficient of R = 0.84 (R² = 0.71) is obtained for the 46 ligands, spanning eight orders of magnitude of Kd (from 0.017 nM to 2000 μM, i.e., from −14.7 to −3.7 kcal/mol), with ligand sizes from 22 to 136 atoms (from 138 to 937 g/mol). After a two-parameter fit of the binding affinities for 12 of the ligands, an RMSD of 1.7 kcal/mol was obtained for the remaining ligands.
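The free-energy range quoted above follows from ΔG = RT ln Kd (1 M standard state) at about 298 K. The sketch below checks that conversion and shows one way to compute the reported statistics. Treating the two-parameter fit as a linear rescaling, exp = a·calc + b, of the MM-PBSA energies is an assumption, as are the function names.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar: float, temp: float = 298.15) -> float:
    """Binding free energy dG = RT ln(Kd / 1 M) in kcal/mol (1 M standard state)."""
    return R_KCAL * temp * math.log(kd_molar)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient R."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_then_rmsd(calc_train: Sequence[float], exp_train: Sequence[float],
                  calc_test: Sequence[float], exp_test: Sequence[float]) -> float:
    """Least-squares fit exp = a * calc + b on training ligands; RMSD on held-out ligands."""
    mx = sum(calc_train) / len(calc_train)
    my = sum(exp_train) / len(exp_train)
    a = (sum((x - mx) * (y - my) for x, y in zip(calc_train, exp_train))
         / sum((x - mx) ** 2 for x in calc_train))
    b = my - a * mx
    sq = [(a * x + b - y) ** 2 for x, y in zip(calc_test, exp_test)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    # The Kd range quoted above, converted at 298.15 K.
    print(f"{kd_to_dg(0.017e-9):.1f} kcal/mol")  # -14.7 kcal/mol
    print(f"{kd_to_dg(2000e-6):.1f} kcal/mol")   # -3.7 kcal/mol
```

The reported R² = 0.71 is simply the square of R = 0.84, rounded.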
To assess the mutational robustness of nucleic acids, many genome- and protein-level studies have been performed, in which nucleic acids are treated as carriers and transmitters of genetic information. However, the molecular mechanisms through which mutations alter the structural, dynamic, and functional properties of nucleic acids are poorly understood. Here we performed a SELEX in silico study to investigate the fitness distribution of the l-Arm-binding aptamer genotype neighborhoods. Two novel functional genotype neighborhoods were isolated and experimentally verified to have fitness comparable to that of the wild type. The experimental aptamer fitness landscape suggests that mutational robustness is strongly influenced by the local base environment and ligand-binding mode, whereas bases distant from the binding pocket provide potential evolutionary pathways to approach the global fitness maximum. Our work provides an example of the successful application of SELEX in silico to optimize an aptamer and demonstrates the strong sensitivity of mutational robustness to the site of genetic variation.
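Mutational robustness is often quantified as the fraction of a sequence's mutational neighbors that remain functional. In the sketch below, a genotype neighborhood is taken, as an assumption, to be the set of sequences one substitution away, and the fitness function is a caller-supplied placeholder; in the study it would come from SELEX in silico scoring or experiment. Function names are hypothetical.

```python
from typing import Callable, List


def single_mutants(seq: str, alphabet: str = "ACGU") -> List[str]:
    """All sequences one substitution away from `seq` (3 * len(seq) for a 4-letter alphabet)."""
    return [seq[:i] + b + seq[i + 1:]
            for i, base in enumerate(seq) for b in alphabet if b != base]


def robustness(seq: str, fitness: Callable[[str], float], threshold: float,
               alphabet: str = "ACGU") -> float:
    """Fraction of single mutants whose fitness stays at or above `threshold`."""
    mutants = single_mutants(seq, alphabet)
    return sum(fitness(m) >= threshold for m in mutants) / len(mutants)


def toy_fitness(s: str) -> float:
    """Hypothetical fitness: only the first base matters, mimicking a binding-pocket base."""
    return 1.0 if s[0] == "G" else 0.0


if __name__ == "__main__":
    print(robustness("GGAC", toy_fitness, threshold=0.5))  # 0.75
```

In this toy, mutations at the one "pocket" base are all deleterious while mutations elsewhere are neutral, so robustness depends on where the variation falls, the kind of site sensitivity described above.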
In this Letter we investigate a direct relationship between a graph’s topology and the free energy of a spin system on the graph. We develop a method of separating topological and energetic contributions to the free energy, and find that considering the topology is sufficient to qualitatively compare the free energies of different graph systems at high temperature, even when the energetics are not fully known. This method was applied to the metal lattice system with defects, and we found that it partially explains why point defects are more stable than high-dimensional defects. Given the energetics, we can even quantitatively compare free energies of different graph structures via a closed form of linear graph contributions. The closed form is applied to predict the sequence-space free energy of lattice proteins, which is a key factor determining the designability of a protein structure.
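The separation of topological and energetic contributions can be illustrated, assuming the spin system is a zero-field Ising model with uniform coupling J (an assumption; the Letter's spin systems and method may be more general). By the standard high-temperature expansion, Z = 2^N cosh(βJ)^E Σ tanh(βJ)^|G'|, summed over edge subsets G' in which every node has even degree. For a tree only the empty subset contributes, so trees with equal numbers of nodes and edges have identical free energies, while each cycle of length l contributes a term tanh(βJ)^l that shrinks at high temperature. The brute-force sketch below checks this on small graphs.

```python
import math
from itertools import product
from typing import Iterable, Tuple


def ising_z(n_nodes: int, edges: Iterable[Tuple[int, int]], beta_j: float) -> float:
    """Brute-force Z = sum over spins s_i = +/-1 of exp(beta*J * sum_edges s_u s_v), zero field."""
    edges = list(edges)
    return sum(
        math.exp(beta_j * sum(s[u] * s[v] for u, v in edges))
        for s in product((-1, 1), repeat=n_nodes)
    )


def tree_z(n_nodes: int, n_edges: int, beta_j: float) -> float:
    """Closed form for any tree: Z = 2**N * cosh(beta*J)**E, independent of its shape."""
    return 2 ** n_nodes * math.cosh(beta_j) ** n_edges


if __name__ == "__main__":
    k = 0.3  # beta*J, hypothetical high-temperature value
    path = [(0, 1), (1, 2), (2, 3)]
    star = [(0, 1), (0, 2), (0, 3)]
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(ising_z(4, path, k), ising_z(4, star, k), tree_z(4, 3, k))
    # A 3-cycle multiplies the tree-like term 2**3 * cosh(k)**3 by (1 + tanh(k)**3).
    print(ising_z(3, triangle, k) / (2 ** 3 * math.cosh(k) ** 3), 1 + math.tanh(k) ** 3)
```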
We present the third generation of our scoring function for the prediction of protein–ligand binding free energy. This function is now a hybrid between a knowledge-based potential and an empirical function. We constructed a diversified set of ∼1000 complexes from the PDBbind-CN database to train the function, and we show that this number of complexes provides enough data to build the potential. The occurrence of 420 different types of atomic pairwise interactions is computed in up to five different distance ranges to derive the knowledge-based part. All of the parameters were optimized, and we were able to considerably improve the accuracy of the scoring function, reaching a Pearson correlation coefficient against experimental binding free energies of up to 0.57, which ranks our new scoring function as one of the best currently available and the second best in terms of standard deviation (SD = 1.68 kcal/mol). The function was then further improved by including terms that account for repulsion and the loss of entropy upon binding, and we show that it is capable of recovering native binding poses up to 80% of the time. All of the programs, tools, and protein sets are released in the Supporting Information or as open-source programs.
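The knowledge-based part of such a function can be sketched with the generic inverse-Boltzmann relation, w(r) = −kT ln[N_obs(r) / N_ref(r)], evaluated per atom-pair type and distance bin and then summed over a complex's contacts. The reference counts, pseudocount, and bin layout below are assumptions; the paper's reference state and its empirical repulsion and entropy terms are not reproduced, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

KT = 0.593  # kcal/mol, roughly RT at 298 K


def pair_potential(observed: Sequence[float], reference: Sequence[float],
                   pseudocount: float = 1.0, kt: float = KT) -> List[float]:
    """Inverse-Boltzmann potential per distance bin: w = -kT * ln((n_obs + c) / (n_ref + c))."""
    return [-kt * math.log((o + pseudocount) / (r + pseudocount))
            for o, r in zip(observed, reference)]


def score(contacts: Sequence[Tuple[str, int]], potentials: Dict[str, List[float]]) -> float:
    """Sum the potential over a complex's contacts, each given as (pair_type, distance_bin)."""
    return sum(potentials[pair][b] for pair, b in contacts)


if __name__ == "__main__":
    # Hypothetical pair type with counts in three distance bins.
    pots = {"N-O": pair_potential(observed=[30, 12, 8], reference=[10, 10, 10])}
    print(pots["N-O"])
    print(score([("N-O", 0), ("N-O", 2)], pots))
```

Bins where a pair type is seen more often than the reference expects get negative (favorable) energies, and the pseudocount keeps sparsely populated bins from producing infinite values.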
In drug discovery, systematic variations of substituents on a common scaffold and bioisosteric replacements are often used to generate diversity and obtain molecules with better biological effects. However, this could saturate the small-molecule diversity pool, resulting in drug resistance. In addition, conventional drug discovery relies on targeting known pockets on protein surfaces, which leads to drug resistance through mutations of critical pocket residues. Here, we present a two-pronged strategy of designing novel drugs that target unique pockets on a protein’s surface to overcome these problems. Dihydrofolate reductase (DHFR) is a critical enzyme involved in thymidine and purine nucleotide biosynthesis. Several classes of compounds that are structural analogues of the substrate dihydrofolate have been explored for their antifolate activity. Here, we describe 10 novel small-molecule inhibitors of Escherichia coli DHFR (EcDHFR), belonging to the stilbenoid, deoxybenzoin, and chalcone families of compounds, discovered by a combination of pocket-based virtual ligand screening and systematic scaffold hopping. These inhibitors show a unique uncompetitive or noncompetitive inhibition mechanism, distinct from those reported for all known inhibitors of DHFR, indicative of binding to a unique pocket distinct from either the substrate- or the cofactor-binding pocket. Furthermore, we demonstrate that rescue mutants of EcDHFR, with reduced affinity for all known classes of DHFR inhibitors, are inhibited at the same concentration as the wild type. These compounds also exhibit antibacterial activity against E. coli harboring the drug-resistant variant of DHFR. This is the first report of a novel class of inhibitors targeting a unique pocket on EcDHFR.
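The distinction between competitive, uncompetitive, and noncompetitive inhibition can be made concrete with the textbook linear mixed-inhibition rate law, v = Vmax·S / (Km(1 + I/Kic) + S(1 + I/Kiu)). The sketch treats dihydrofolate as the only varied substrate with NADPH saturating, an assumption since DHFR is a two-substrate enzyme; parameter values are hypothetical.

```python
import math
from typing import Tuple


def mixed_inhibition_rate(s: float, i: float, vmax: float, km: float,
                          k_ic: float = math.inf, k_iu: float = math.inf) -> float:
    """Linear mixed inhibition: v = Vmax*S / (Km*(1 + I/Kic) + S*(1 + I/Kiu)).

    k_ic: inhibitor dissociation constant from free enzyme (competitive part);
    k_iu: from the enzyme-substrate complex (uncompetitive part).
    """
    return vmax * s / (km * (1 + i / k_ic) + s * (1 + i / k_iu))


def apparent_parameters(i: float, vmax: float, km: float,
                        k_ic: float = math.inf, k_iu: float = math.inf) -> Tuple[float, float]:
    """Apparent (Vmax, Km) at inhibitor concentration i."""
    a, a_prime = 1 + i / k_ic, 1 + i / k_iu
    return vmax / a_prime, km * a / a_prime


if __name__ == "__main__":
    i = 1.0  # hypothetical inhibitor concentration, same units as the K_i values
    print("competitive:   ", apparent_parameters(i, 1.0, 1.0, k_ic=1.0))
    print("uncompetitive: ", apparent_parameters(i, 1.0, 1.0, k_iu=1.0))
    print("noncompetitive:", apparent_parameters(i, 1.0, 1.0, k_ic=1.0, k_iu=1.0))
```

Only the competitive case leaves Vmax unchanged, so a drop in apparent Vmax that added substrate cannot overcome is the kinetic signature of the uncompetitive and noncompetitive mechanisms reported here.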
Homology modeling is a powerful tool for predicting a protein’s structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, whereas structural similarity deteriorates rapidly once sequence identity falls below this 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events in which the fitness barriers obstructing structure evolution are overcome and new structures are discovered. We outline a biophysical and evolutionary rationale for the broad variation in protein family sizes, the prevalence of compact structures among ancient proteins, and the more rapid structure evolution of proteins with lower packing density.
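The 30% threshold refers to pairwise sequence identity between aligned sequences. Definitions differ in their denominator; the sketch below, an illustrative choice rather than the study's definition, counts only alignment columns where neither sequence has a gap.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free aligned columns")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments.
    print(percent_identity("MKV-LAEG", "MRVALSEG"))
```

Dividing by the full alignment length or by the shorter sequence length instead would give lower values for gapped alignments, so identity thresholds are only comparable when the same definition is used.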
Mutations provide the variation that drives evolution, yet their effects on fitness remain poorly understood. Here we explore how mutations in the essential enzyme adenylate kinase (Adk) of Escherichia coli affect multiple phases of population growth. We introduce a biophysical fitness landscape for these phases, showing how they depend on molecular and cellular properties of Adk. We find that Adk catalytic capacity in the cell (the product of activity and abundance) is the major determinant of mutational fitness effects. We show that bacterial lag times are at a well-defined optimum with respect to Adk’s catalytic capacity, while exponential growth rates are only weakly affected by variation in Adk. Direct pairwise competitions between strains show how environmental conditions modulate the outcome of a competition where growth rates and lag times have a tradeoff, shedding light on the multidimensional nature of fitness and its importance in the evolutionary optimization of enzymes.