Human γD-crystallin (HγD) is an abundant and highly stable two-domain protein in the core region of the eye lens. Destabilizing mutations and post-translational modifications in this protein are linked to onset of aggregation that causes cataract disease (lens turbidity). Wild-type HγD greatly accelerates aggregation of the cataract-related W42Q variant, without itself aggregating. The mechanism of this “inverse prion” catalysis of aggregation remained unknown. Here we provide evidence that an early unfolding intermediate with an opened domain interface enables transient dimerization of the C-terminal domains of wild-type and mutant, or mutant and mutant, HγD molecules, which deprives the mutant’s N-terminal domain of intramolecular stabilization by the native domain interface and thus accelerates its misfolding to a distinct, aggregation-prone intermediate. A detailed kinetic model predicts universal power-law scaling relationships for lag time and rate of the resulting aggregation, which are in excellent agreement with the data. The mechanism reported here, which we term interface stealing, can be generalized to explain how common domain-domain interactions can have surprising consequences, such as conformational catalysis of unfolding, in multidomain proteins.
Fragment-based assembly is widely used in ab initio protein folding simulation because it effectively reduces the conformational space and thus accelerates sampling. The efficiency of fragment-based movement and the quality of the fragment library determine whether the folding process can reach the global minimum of the free energy landscape and bring the protein to a near-native folded state. We designed an improved fragment-based movement, "fragmove", which substitutes multiple backbone dihedral angles in every simulation step. This movement strategy is derived from the fragment library generated by LRFragLib, an effective fragment detection algorithm based on a logistic regression model. We show in replica exchange Monte Carlo (REMC) simulations that "fragmove", compared with a set of existing REMC movements, significantly improves the accuracy of predicted models at the secondary and tertiary structure levels by 11.24% and 17.98%, respectively, and reaches energy minima that are lower by 5.72%. Our results demonstrate that this improved movement guides proteins faster to low-energy regions of conformational space, improving both folding efficiency and predicted model accuracy.
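The two ingredients named in this abstract, a fragment substitution move and replica exchange Monte Carlo, can be illustrated with a minimal sketch. The function names, the representation of the backbone as a list of (phi, psi) pairs, and the toy values are assumptions made for illustration; this is not the authors' "fragmove" implementation or the LRFragLib interface. The acceptance rules are the standard Metropolis and replica exchange criteria.

```python
import math
import random


def apply_fragment(dihedrals, start, fragment):
    """Return a copy of a (phi, psi) list with one window replaced by a fragment.

    Hypothetical helper: several backbone dihedral pairs are substituted at once.
    """
    if start < 0 or start + len(fragment) > len(dihedrals):
        raise ValueError("fragment does not fit at this position")
    updated = list(dihedrals)
    updated[start:start + len(fragment)] = fragment
    return updated


def metropolis_accept(delta_e, beta, rng=random):
    """Standard Metropolis criterion for a move that changes the energy by delta_e."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-beta * delta_e)


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Probability of exchanging configurations between replicas i and j.

    Standard replica exchange rule: min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


if __name__ == "__main__":
    chain = [(0.0, 0.0)] * 4
    helix_like = [(-60.0, -45.0), (-60.0, -45.0)]  # toy fragment, not library data
    print(apply_fragment(chain, 1, helix_like))
    print(swap_probability(1.0, 0.5, 1.0, 2.0))
```

In a full simulation, each replica would propose fragment moves and accept them with `metropolis_accept` at its own inverse temperature, and neighboring replicas would periodically attempt swaps with `swap_probability`.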
Many proteins must adopt a specific structure to perform their functions, and failure to do so has been linked to disease. Although small proteins often fold rapidly and spontaneously to their native conformations, larger proteins are less likely to fold correctly due to the myriad incorrect arrangements they can adopt. Here, we provide mechanistic insights into how this problem can be alleviated if proteins start folding while they are being translated by the ribosome. This process of cotranslational folding biases certain proteins away from misfolded states that tend to hinder spontaneous refolding. Signatures of unusually slow translation suggest that some of these proteins have evolved to fold cotranslationally. Many large proteins suffer from slow or inefficient folding in vitro. It has long been known that this problem can be alleviated in vivo if proteins start folding cotranslationally. However, the molecular mechanisms underlying this improvement have not been well established. To address this question, we use an all-atom simulation-based algorithm to compute the folding properties of various large protein domains as a function of nascent chain length. We find that for certain proteins, there exists a narrow window of lengths that confers both thermodynamic stability and fast folding kinetics. Beyond these lengths, folding is drastically slowed by nonnative interactions involving C-terminal residues. Thus, cotranslational folding is predicted to be beneficial because it allows proteins to take advantage of this optimal window of lengths and thus avoid kinetic traps. Interestingly, many of these proteins’ sequences contain conserved rare codons that may slow down synthesis at this optimal window, suggesting that synthesis rates may be evolutionarily tuned to optimize folding. Using kinetic modeling, we show that under certain conditions, such a slowdown indeed improves cotranslational folding efficiency by giving these nascent chains more time to fold. In contrast, other proteins are predicted not to benefit from cotranslational folding due to a lack of significant nonnative interactions, and indeed these proteins’ sequences lack conserved C-terminal rare codons. Together, these results shed light on the factors that promote proper protein folding in the cell and how biomolecular self-assembly may be optimized evolutionarily.
The protein misfolding avoidance hypothesis explains the universal negative correlation between protein abundance and sequence evolutionary rate across the proteome by identifying protein folding free energy (ΔG) as the confounding variable. Abundant proteins resist toxic misfolding events by being more stable, and more stable proteins evolve more slowly because their mutations are more destabilizing. Direct supporting evidence consists only of computer simulations. A study taking advantage of a recent experimental breakthrough in measuring protein stability proteome-wide through melting temperature (Tm) (Leuenberger et al. 2017) found weak support for the misfolding avoidance hypothesis in the Escherichia coli proteome and no support in the Saccharomyces cerevisiae, Homo sapiens, and Thermus thermophilus proteomes (Plata and Vitkup 2018). I find that the nontrivial relationship between Tm and ΔG, together with inaccuracy in the Tm measurements of Leuenberger et al. (2017), can be responsible for the failure to observe strong positive abundance–Tm and strong negative Tm–evolutionary rate correlations.
The mechanisms of adaptation to inactivation of essential genes remain unknown. Here we inactivate E. coli dihydrofolate reductase (DHFR) by introducing D27G,N,F chromosomal mutations in a key catalytic residue with subsequent adaptation by an automated serial transfer protocol. The partial reversal G27→C occurred in three evolutionary trajectories. Conversely, in one trajectory for D27G and in all trajectories for D27F,N, strains adapted to grow at very low metabolic supplement (folAmix) concentrations but did not escape entirely from supplement auxotrophy. Major global shifts in metabolome and proteome occurred upon DHFR inactivation, which were partially reversed in adapted strains. Loss-of-function mutations in two genes, thyA and deoB, ensured adaptation to low folAmix by rerouting the 2-Deoxy-D-ribose-phosphate metabolism from glycolysis towards synthesis of dTMP. Multiple evolutionary pathways of adaptation converged to a suboptimal solution due to the high accessibility of loss-of-function mutations that block the path to the highest, yet least accessible, fitness peak.
Directed evolution by random mutation in a vast sequence space yields a low probability of obtaining target proteins. Emerging engineering strategies that use computational tools are being developed to give more reliable outcomes. We used semi-rational design methods to modify an industrial enzyme, cellobiose 2-epimerase (CE). A mutant was selected for its improved thermostability and isomerization activity. The tradeoffs between thermostability, epimerization activity and isomerization activity differed among the CE mutants. To investigate how well computation predicts changes in protein stability upon point mutations, molecular dynamics (MD) simulation analyses were conducted. The root mean square deviation (RMSD) and hydrogen bond analyses reproduced the correct trends in stability changes of the wild-type and mutated CEs with relatively high accuracy (correlation coefficients r ~ 0.5–0.8). The simulation temperature and time are important factors that influence the prediction performance. Our results show that thermostability predictors calculated from MD simulation predict the thermostability changes of the mutated enzymes better than predictors that use static-state information about the enzymes.
While reverse genetics and functional genomics have long affirmed the role of individual mutations in determining protein function, there have been fewer studies addressing how large-scale changes in protein sequences, such as in entire modular segments, influence protein function and evolution. Given how recombination can reassort protein sequences, these types of changes may play an underappreciated role in how novel protein functions evolve in nature. Such studies could aid our understanding of whether certain organismal phenotypes related to protein function, such as growth in the presence or absence of an antibiotic, are robust with respect to the identity of certain modular segments. In this study, we combine molecular genetics with biochemical and biophysical methods to gain a better understanding of protein modularity in dihydrofolate reductase (DHFR), an enzyme target of antibiotics also widely used as a model for protein evolution. We replace an integral α-helical segment of Escherichia coli DHFR with segments from a number of different organisms (many nonmicrobial) and examine how these chimeric enzymes affect organismal phenotypes (e.g., resistance to an antibiotic) as well as biophysical properties of the enzyme (e.g., thermostability). We find that organismal phenotypes and enzyme properties are highly sensitive to the identity of DHFR modules, and that this chimeric approach can create enzymes with diverse biophysical characteristics.
Human β2-microglobulin (β2m) protein is classically associated with dialysis-related amyloidosis (DRA). Recently, the single point mutant D76N was identified as the causative agent of a hereditary systemic amyloidosis affecting visceral organs. To gain insight into the early stage of the β2m aggregation mechanism, we used molecular simulations to perform an in-depth comparative analysis of the dimerization phase of the D76N mutant and the ΔN6 variant, a cleaved form lacking the first six N-terminal residues, which is a major component of ex vivo amyloid plaques from DRA patients. We also provide first glimpses into the tetramerization phase of D76N at physiological pH. Results from extensive protein–protein docking simulations predict an essential role of the C- and N-terminal regions (both variants), as well as of the BC-loop (ΔN6 variant), DE-loop (both variants) and EF-loop (D76N mutant) in dimerization. The terminal regions are more relevant under acidic conditions while the BC-, DE- and EF-loops gain importance at physiological pH. Our results recapitulate experimental evidence according to which Tyr10 (A-strand), Phe30 and His31 (BC-loop), Trp60 and Phe62 (DE-loop) and Arg97 (C-terminus) act as dimerization hot-spots, and further predict the occurrence of novel residues with the ability to nucleate dimerization, namely Lys75 (EF-loop) and Trp95 (C-terminus). We propose that D76N tetramerization is mainly driven by the self-association of dimers via the N-terminus and DE-loop, and identify Arg3 (N-terminus), Tyr10, Phe56 (D-strand) and Trp60 as potential tetramerization hot-spots.
Cellobiose 2-epimerase (CE) is a promising industrial enzyme that can be utilized in the dairy industry. More thermostable CEs from different microorganisms are still needed for higher lactulose productivity. This study demonstrates the feasibility of using molecular dynamics (MD) simulation as a preliminary computational filter when screening for thermostable enzymes. The sequences of eleven uncharacterized CEs were chosen for analysis by MD simulation. The CE from Dictyoglomus thermophilum (Dith-CE) was determined experimentally to be one of the most thermostable CEs, with the highest epimerization (160 ± 6.5 U mg−1) and isomerization (3.52 ± 0.23 U mg−1) activities among all reported CEs. This enzyme shows its highest isomerization activity at 85 °C and pH 7.0. The kinetic parameters (kcat and Km) of the isomerization activity of this CE are 3.98 ± 0.3 s−1 and 235.2 ± 11.2 mM, respectively. These results suggest that Dith-CE is a promising lactulose-producing enzyme.
Why highly stable proteins are not commonly observed in mesophilic organisms is an important evolutionary question. It has been suggested that high stability has a fitness cost in the form of loss of activity. However, this hypothesis has not yet been experimentally demonstrated. Here, we use an essential bacterial enzyme adenylate kinase (Adk) to explore this hypothesis and show that, as Adk’s stability increases, one of its own substrates inhibits its activity. Bacterial strains carrying such stable Adks show substantial fitness defects, which can be mapped to the loss of Adk activity due to substrate inhibition. Overall, our study adds “substrate inhibition” to the “toolbox” that is used to rationalize the stability distribution of proteins. Proteins are only moderately stable. It has long been debated whether this narrow range of stabilities is solely a result of neutral drift toward lower stability or whether purifying selection against excess stability (for which no experimental evidence has been found so far) is also at work. Here, we show that mutations outside the active site in the essential Escherichia coli enzyme adenylate kinase (Adk) result in a stability-dependent increase in substrate inhibition by AMP, thereby impairing overall enzyme activity at high stability. Such inhibition caused substantial fitness defects not only in the presence of excess substrate but also under physiological conditions. In the latter case, substrate inhibition caused differential accumulation of AMP in the stationary phase for the inhibition-prone mutants. Furthermore, we show that changes in flux through Adk could accurately describe the variation in fitness effects. Taken together, these data suggest that selection against substrate inhibition and hence excess stability may be an important factor determining stability observed for modern-day Adk.
Mutation rate is a key determinant of the pace as well as outcome of evolution, and variability in this rate has been shown in different scenarios to play a key role in evolutionary adaptation and resistance evolution under stress caused by selective pressure. Here we investigate the dynamics of resistance fixation in a bacterial population with variable mutation rates, and we show that evolutionary outcomes are most sensitive to mutation rate variations when the population is subject to environmental and demographic conditions that suppress the evolutionary advantage of high-fitness subpopulations. By directly mapping a biophysical fitness function to the system-level dynamics of the population, we show that both low and very high, but not intermediate, levels of stress in the form of an antibiotic result in a disproportionate effect of hypermutation on resistance fixation. We demonstrate how this behavior is directly tied to the extent of genetic hitchhiking in the system, the propagation of high-mutation rate cells through association with high-fitness mutations. Our results indicate a substantial role for mutation rate flexibility in the evolution of antibiotic resistance under conditions that present a weak advantage over wildtype to resistant cells.
A central goal of protein-folding theory is to predict the stochastic dynamics of transition paths—the rare trajectories that transit between the folded and unfolded ensembles—using only thermodynamic information, such as a low-dimensional equilibrium free-energy landscape. However, commonly used one-dimensional landscapes typically fall short of this aim, because an empirical coordinate-dependent diffusion coefficient has to be fit to transition-path trajectory data in order to reproduce the transition-path dynamics. We show that an alternative, first-principles free-energy landscape predicts transition-path statistics that agree well with simulations and single-molecule experiments without requiring dynamical data as an input. This “topological configuration” model assumes that distinct, native-like substructures assemble on a time scale that is slower than native-contact formation but faster than the folding of the entire protein. Using only equilibrium simulation data to determine the free energies of these coarse-grained intermediate states, we predict a broad distribution of transition-path transit times that agrees well with the transition-path durations observed in simulations. We further show that both the distribution of finite-time displacements on a one-dimensional order parameter and the ensemble of transition-path trajectories generated by the model are consistent with the simulated transition paths. These results indicate that a landscape based on transient folding intermediates, which are often hidden by one-dimensional projections, can form the basis of a predictive model of protein-folding transition-path dynamics.
The relationship between the dynamics of a community and its constituent pairwise interactions is a fundamental problem in ecology. Higher-order ecological effects beyond pairwise interactions may be key to complex ecosystems, but mechanisms to produce these effects remain poorly understood. Here we model microbial growth and competition to show that higher-order effects can arise from variation in multiple microbial growth traits, such as lag times and growth rates, on a single limiting resource with no other interactions. These effects produce a range of ecological phenomena: an unlimited number of strains can exhibit multistability and neutral coexistence, potentially with a single keystone strain; strains that coexist in pairs do not coexist all together; and a strain that wins all pairwise competitions can go extinct in a mixed competition. Since variation in multiple growth traits is ubiquitous in microbial populations, our results indicate these higher-order effects may also be widespread, especially in laboratory ecology and evolution experiments.
Increased light scattering in the eye lens due to aggregation of the long-lived lens proteins, crystallins, is the cause of cataract disease. Several mutations in the gene encoding human γD-crystallin (HγD) cause misfolding and aggregation. Cataract-associated substitutions at Trp42 cause the protein to aggregate in vitro from a partially unfolded intermediate locked by an internal disulfide bridge, and proteomic evidence suggests a similar aggregation precursor is involved in age-onset cataract. Surprisingly, WT HγD can promote aggregation of the W42Q variant while itself remaining soluble. Here, a search for a biochemical mechanism for this interaction has revealed a previously unknown oxidoreductase activity in HγD. Using in vitro oxidation, mutational analysis, cysteine labeling, and MS, we have assigned this activity to a redox-active internal disulfide bond that is dynamically exchanged among HγD molecules. The W42Q variant acts as a disulfide sink, reducing oxidized WT and forming a distinct internal disulfide that kinetically traps the aggregation-prone intermediate. Our findings suggest a redox “hot potato” competition among WT and mutant or modified polypeptides wherein variants with the lowest kinetic stability are trapped in aggregation-prone intermediate states upon accepting disulfides from more stable variants. Such reactions may occur in other long-lived proteins that function in oxidizing environments. In these cases, aggregation may be forestalled by inhibiting disulfide flow toward mutant or damaged polypeptides.
Despite considerable efforts, no physical mechanism has been shown to explain N-terminal codon bias in prokaryotic genomes. Using a systematic study of synonymous substitutions in two endogenous E. coli genes, we show that interactions between the coding region and the upstream Shine-Dalgarno (SD) sequence modulate the efficiency of translation initiation, affecting both intracellular mRNA and protein levels due to the inherent coupling of transcription and translation in E. coli. We further demonstrate that far-downstream mutations can also modulate mRNA levels by occluding the SD sequence through the formation of non-equilibrium secondary structures. By contrast, a non-endogenous RNA polymerase that decouples transcription and translation largely alleviates the effects of synonymous substitutions on mRNA levels. Finally, a complementary statistical analysis of the E. coli genome specifically implicates avoidance of intra-molecular base pairing with the SD sequence. Our results provide general physical insights into the coding-level features that optimize protein expression in prokaryotes.
Viral evolutionary pathways are determined by the fitness landscape, which maps viral genotype to fitness. However, a quantitative description of the landscape and the evolutionary forces on it remains elusive. Here, we apply a biophysical fitness model based on capsid folding stability and antibody binding affinity to predict the evolutionary pathway of norovirus escaping a neutralizing antibody. The model is validated by experimental evolution in bulk culture and in a drop-based microfluidic system that propagates millions of independent small viral subpopulations. We demonstrate that along the axis of binding affinity, selection for escape variants and drift due to random mutations have the same direction, an atypical case in evolution. However, along folding stability, selection and drift are opposing forces whose balance is tuned by viral population size. Our results demonstrate that predictable epistatic tradeoffs between molecular traits of viral proteins shape viral evolution.
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level.
Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of –0.49 (P-value < 10^-10) and –0.46 (P-value < 10^-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant.
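The Spearman correlation reported between protein abundance and ER is the Pearson correlation of the ranks of the two variables. The sketch below computes it in plain Python with average ranks for ties. The function names and the toy numbers are assumptions made for illustration; they are not ProteomeVis code or data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values only: higher abundance paired with lower evolutionary rate.
    abundance = [120.0, 15.0, 800.0, 40.0, 3.0]
    evolutionary_rate = [0.10, 0.35, 0.05, 0.40, 0.60]
    print(spearman(abundance, evolutionary_rate))
```

A negative value indicates that more abundant proteins tend to have lower evolutionary rates, which is the direction of the correlations quoted above.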
Protein thermodynamics are an integral determinant of viral fitness and one of the major drivers of protein evolution. Mutations in the influenza A virus (IAV) hemagglutinin (HA) protein can eliminate neutralizing antibody binding to mediate escape from preexisting antiviral immunity. Prior research on the IAV nucleoprotein suggests that protein stability may constrain seasonal IAV evolution; however, the role of stability in shaping the evolutionary dynamics of the HA protein has not been explored. We used the full coding sequence of 9,797 H1N1pdm09 HA sequences and 16,716 human seasonal H3N2 HA sequences to computationally estimate relative changes in the thermal stability of the HA protein between 2009 and 2016. Phylogenetic methods were used to characterize how stability differences impacted the evolutionary dynamics of the virus. We found that pandemic H1N1 IAV strains split into two lineages that had different relative HA protein stabilities and that later variants were descended from the higher-stability lineage. Analysis of the mutations associated with the selective sweep of the higher-stability lineage found that they were characterized by the early appearance of highly stabilizing mutations, the earliest of which was not located in a known antigenic site. Experimental evidence further suggested that H1N1 HA stability may be correlated with in vitro virus production and infection. A similar analysis of H3N2 strains found that surviving lineages were also largely descended from viruses predicted to encode more-stable HA proteins. Our results suggest that HA protein stability likely plays a significant role in the persistence of different IAV lineages.
Mutations in a microbial population can increase the frequency of a genotype not only by increasing its exponential growth rate, but also by decreasing its lag time or adjusting the yield (resource efficiency). The contribution of multiple life-history traits to selection is a critical question for evolutionary biology as we seek to predict the evolutionary fates of mutations. Here we use a model of microbial growth to show that there are two distinct components of selection corresponding to the growth and lag phases, while the yield modulates their relative importance. The model predicts rich population dynamics when there are trade-offs between phases: multiple strains can coexist or exhibit bistability due to frequency-dependent selection, and strains can engage in rock–paper–scissors interactions due to non-transitive selection. We characterize the environmental conditions and patterns of traits necessary to realize these phenomena, which we show to be readily accessible to experiments. Our results provide a theoretical framework for analysing high-throughput measurements of microbial growth traits, especially interpreting the pleiotropy and correlations between traits across mutants. This work also highlights the need for more comprehensive measurements of selection in simple microbial systems, where the concept of an ordinary fitness landscape breaks down.
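How lag time, growth rate, and yield can each contribute to selection is illustrated by the minimal sketch below. It assumes a simple model that the abstract does not spell out: each strain stays at its initial abundance during its lag, then grows exponentially, and all growth stops when a single shared resource is exhausted, with yield defined as cells produced per unit resource. The function names, dictionary keys, and parameter values are hypothetical.

```python
import math


def abundance(n0, growth_rate, lag, t):
    """Abundance at time t: constant during the lag, exponential afterwards."""
    if t <= lag:
        return n0
    return n0 * math.exp(growth_rate * (t - lag))


def resource_consumed(strains, t):
    """Resource used by all strains up to time t (new cells divided by yield)."""
    return sum(
        (abundance(s["n0"], s["g"], s["lag"], t) - s["n0"]) / s["yield"]
        for s in strains
    )


def saturation_time(strains, resource):
    """Time at which the shared resource runs out, found by bisection."""
    hi = 1.0
    while resource_consumed(strains, hi) < resource:
        hi *= 2.0  # expand the bracket until the resource is exhausted
    lo = 0.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if resource_consumed(strains, mid) < resource:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def selection_coefficient(strain_1, strain_2, resource):
    """Change in log(N2/N1) over one growth cycle of a two-strain competition."""
    t_sat = saturation_time([strain_1, strain_2], resource)
    n1 = abundance(strain_1["n0"], strain_1["g"], strain_1["lag"], t_sat)
    n2 = abundance(strain_2["n0"], strain_2["g"], strain_2["lag"], t_sat)
    return math.log(n2 / n1) - math.log(strain_2["n0"] / strain_1["n0"])


if __name__ == "__main__":
    wild_type = {"n0": 1.0, "g": 1.0, "lag": 2.0, "yield": 1.0}
    short_lag = {"n0": 1.0, "g": 1.0, "lag": 1.0, "yield": 1.0}
    print(selection_coefficient(wild_type, short_lag, resource=1000.0))
```

In this sketch, a mutant with the same growth rate but a shorter lag gains a head start equal to the growth rate times the lag difference, while the yields affect the outcome only through the saturation time, which sets how long the growth phase lasts.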