Despite considerable efforts, no physical mechanism has been shown to explain N-terminal codon bias in prokaryotic genomes. Using a systematic study of synonymous substitutions in two endogenous E. coli genes, we show that interactions between the coding region and the upstream Shine-Dalgarno (SD) sequence modulate the efficiency of translation initiation, affecting both intracellular mRNA and protein levels due to the inherent coupling of transcription and translation in E. coli. We further demonstrate that far-downstream mutations can also modulate mRNA levels by occluding the SD sequence through the formation of non-equilibrium secondary structures. By contrast, a non-endogenous RNA polymerase that decouples transcription and translation largely alleviates the effects of synonymous substitutions on mRNA levels. Finally, a complementary statistical analysis of the E. coli genome specifically implicates avoidance of intra-molecular base pairing with the SD sequence. Our results provide general physical insights into the coding-level features that optimize protein expression in prokaryotes.
How post-translational modifications alter the structures and interactions of proteins is of great interest for understanding proteomic changes during aging and disease. Oxidative modifications of the long-lived cysteine-rich lens γ-crystallins are strongly associated with their aggregation into light-scattering structures that result in cataracts, the leading cause of age-related vision loss. How oxidation leads to aggregation is not well understood. Our previous computational and experimental work showed that formation of a particular non-native intramolecular disulfide bond in cataract-associated W42Q/R human γD-crystallin variants trapped a partially unfolded intermediate state prone to aggregation. Surprisingly, it also revealed that the wild-type protein was able to specifically promote aggregation of these variants without itself aggregating. The search for a biochemical mechanism behind this unprecedented “inverse-prion” interaction has now revealed that human γD-crystallin exhibits oxidoreductase activity. This activity depended on formation of a specific internal disulfide bond, which we mapped by LC/MS/MS and by comprehensive Cys mutagenesis. All-atom Monte-Carlo simulations with a statistical potential revealed conformational strain upon formation of this disulfide, which was confirmed by differential scanning fluorimetry. Disulfide exchange occurred among purified γD-crystallin molecules in solution. Both the Cys-oxidized (disulfide-bonded) wild-type protein and the destabilized (Trp-oxidation mimicking) W42Q variant were highly soluble at physiological temperature and pH. When the two were mixed, however, the disulfide bond transferred from the WT to the mutant. Once oxidized, the mutant became aggregation-prone, its insolubilization helping drive the disulfide transfer. Destabilized or damaged γ-crystallins may act as oxidation sinks in the lens, forming light-scattering aggregates as a consequence.
There is evidence that human γD-crystallin's newly found oxidoreductase activity is enzymatically regulated in vivo.
Viral evolutionary pathways are determined by the fitness landscape, which maps viral genotype to fitness. However, a quantitative description of the landscape and the evolutionary forces on it remains elusive. Here, we apply a biophysical fitness model based on capsid folding stability and antibody binding affinity to predict the evolutionary pathway of norovirus escaping a neutralizing antibody. The model is validated by experimental evolution in bulk culture and in a drop-based microfluidic system that propagates millions of independent small viral subpopulations. We demonstrate that along the axis of binding affinity, selection for escape variants and drift due to random mutations have the same direction, an atypical case in evolution. However, along folding stability, selection and drift are opposing forces whose balance is tuned by viral population size. Our results demonstrate that predictable epistatic tradeoffs between molecular traits of viral proteins shape viral evolution.
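As a rough illustration of how a fitness model can combine folding stability and antibody binding, the sketch below multiplies a two-state folded fraction by the fraction of capsid protein left unbound by antibody. This functional form, the value of kT, and all numbers (`wild_type`, `stable_escape`, `unstable_escape`, the antibody concentration) are assumptions chosen for illustration, not the model or parameters used in the study.

```python
import math

KT = 0.616  # kcal/mol at about 37 °C (assumed temperature)

def fraction_folded(dg_fold, kt=KT):
    """Two-state folded fraction; dg_fold < 0 means the folded state is stable."""
    return 1.0 / (1.0 + math.exp(dg_fold / kt))

def fraction_unbound(kd, antibody_conc):
    """Fraction of protein not bound by antibody for simple 1:1 binding."""
    return kd / (kd + antibody_conc)

def fitness(dg_fold, kd, antibody_conc):
    """Hypothetical fitness: protein must be folded AND free of antibody."""
    return fraction_folded(dg_fold, KT) * fraction_unbound(kd, antibody_conc)

ANTIBODY = 1e-7  # mol/L, illustrative

# (folding free energy in kcal/mol, antibody Kd in mol/L); all values invented
wild_type = fitness(-5.0, 1e-9, ANTIBODY)        # stable but tightly bound
stable_escape = fitness(-3.0, 1e-6, ANTIBODY)    # weaker binding, mild destabilization
unstable_escape = fitness(1.0, 1e-6, ANTIBODY)   # weaker binding, mostly unfolded

for name, value in [("wild type", wild_type),
                    ("stable escape", stable_escape),
                    ("unstable escape", unstable_escape)]:
    print(f"{name}: {value:.3f}")
```

In this toy setting an escape mutation that weakens antibody binding raises fitness, but the gain shrinks if the same mutation also destabilizes the fold, which is the kind of tradeoff between molecular traits the abstract refers to.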
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level.
Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of –0.49 (P-value < 10⁻¹⁰) and –0.46 (P-value < 10⁻¹⁰) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant.
Protein thermodynamics are an integral determinant of viral fitness and one of the major drivers of protein evolution. Mutations in the influenza A virus (IAV) hemagglutinin (HA) protein can eliminate neutralizing antibody binding to mediate escape from preexisting antiviral immunity. Prior research on the IAV nucleoprotein suggests that protein stability may constrain seasonal IAV evolution; however, the role of stability in shaping the evolutionary dynamics of the HA protein has not been explored. We used the full coding sequence of 9,797 H1N1pdm09 HA sequences and 16,716 human seasonal H3N2 HA sequences to computationally estimate relative changes in the thermal stability of the HA protein between 2009 and 2016. Phylogenetic methods were used to characterize how stability differences impacted the evolutionary dynamics of the virus. We found that pandemic H1N1 IAV strains split into two lineages that had different relative HA protein stabilities and that later variants were descended from the higher-stability lineage. Analysis of the mutations associated with the selective sweep of the higher-stability lineage found that they were characterized by the early appearance of highly stabilizing mutations, the earliest of which was not located in a known antigenic site. Experimental evidence further suggested that H1N1 HA stability may be correlated with in vitro virus production and infection. A similar analysis of H3N2 strains found that surviving lineages were also largely descended from viruses predicted to encode more-stable HA proteins. Our results suggest that HA protein stability likely plays a significant role in the persistence of different IAV lineages.
Mutations in a microbial population can increase the frequency of a genotype not only by increasing its exponential growth rate, but also by decreasing its lag time or adjusting the yield (resource efficiency). The contribution of multiple life-history traits to selection is a critical question for evolutionary biology as we seek to predict the evolutionary fates of mutations. Here we use a model of microbial growth to show that there are two distinct components of selection corresponding to the growth and lag phases, while the yield modulates their relative importance. The model predicts rich population dynamics when there are trade-offs between phases: multiple strains can coexist or exhibit bistability due to frequency-dependent selection, and strains can engage in rock–paper–scissors interactions due to non-transitive selection. We characterize the environmental conditions and patterns of traits necessary to realize these phenomena, which we show to be readily accessible to experiments. Our results provide a theoretical framework for analysing high-throughput measurements of microbial growth traits, especially interpreting the pleiotropy and correlations between traits across mutants. This work also highlights the need for more comprehensive measurements of selection in simple microbial systems, where the concept of an ordinary fitness landscape breaks down.
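The interplay between lag and growth phases can be sketched with a minimal two-strain growth cycle. The code below assumes each strain sits idle for its lag time, then grows exponentially until the shared resource is exhausted, and it measures selection as the change in the log ratio of the two strains over one cycle. The function names, the resource-accounting rule (cells produced divided by yield), and the strain parameters are illustrative assumptions, not the exact model or values from the study.

```python
import math

def population(n0, growth_rate, lag, t):
    """Cells at time t: no growth during the lag, exponential growth afterwards."""
    return n0 * math.exp(growth_rate * max(0.0, t - lag))

def saturation_time(strains, resource, t_max=100.0, tol=1e-9):
    """Bisect for the time at which the strains have consumed all of the resource.

    Each strain is a tuple (n0, growth_rate, lag, yield_); a strain consumes
    (new cells produced) / yield_ units of resource (assumed accounting rule).
    """
    def consumed(t):
        return sum((population(n0, g, lag, t) - n0) / y
                   for n0, g, lag, y in strains)

    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if consumed(mid) < resource:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def selection_coefficient(strain_a, strain_b, resource):
    """Change in ln(N_a / N_b) over one growth cycle."""
    t_c = saturation_time([strain_a, strain_b], resource)
    final_a = population(strain_a[0], strain_a[1], strain_a[2], t_c)
    final_b = population(strain_b[0], strain_b[1], strain_b[2], t_c)
    return math.log(final_a / final_b) - math.log(strain_a[0] / strain_b[0])

# Trade-off: strain A grows faster but has the longer lag (invented numbers).
strain_a = (1.0, 1.2, 2.0, 1.0)  # (n0, growth rate, lag time, yield)
strain_b = (1.0, 1.0, 1.0, 1.0)

for resource in (10.0, 1e6):
    s = selection_coefficient(strain_a, strain_b, resource)
    print(f"resource = {resource:g}: s = {s:+.2f}")
```

With little resource per cell the cycle ends soon after the lag phases, so the short-lag strain is favored; with abundant resource the exponential phase dominates and the fast grower wins. In this sketch the amount of resource per cell, which the yields rescale, sets the relative weight of the two phases.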
We present a study on the evolution of the Fenna–Matthews–Olson bacterial photosynthetic pigment–protein complex. This protein complex functions as an antenna. It transports absorbed photons—excitons—to a reaction center where photosynthetic reactions initiate. The efficiency of exciton transport is therefore fundamental for the photosynthetic bacterium’s survival. We have reconstructed an ancestor of the complex to establish whether coherence in the exciton transport was selected for or optimized over time. We have also investigated the role of optimizing free energy variation upon folding in evolution. We studied whether mutations which connect the ancestor to current day species were stabilizing or destabilizing from a thermodynamic viewpoint. From this study, we established that most of these mutations were thermodynamically neutral. Furthermore, we did not see a large change in exciton transport efficiency or coherence, and thus our results predict that exciton coherence was not specifically selected for.
Recent experiments and simulations have demonstrated that proteins can fold on the ribosome. However, the extent and generality of fitness effects resulting from cotranslational folding remain open questions. Here we report a genome-wide analysis that uncovers evidence of evolutionary selection for cotranslational folding. We describe a robust statistical approach to identify loci within genes that are both significantly enriched in slowly translated codons and evolutionarily conserved. Surprisingly, we find that domain boundaries can explain only a small fraction of these conserved loci. Instead, we propose that regions enriched in slowly translated codons are associated with cotranslational folding intermediates, which may be smaller than a single domain. We show that the intermediates predicted by a native-centric model of cotranslational folding account for the majority of these loci across more than 500 Escherichia coli proteins. By making a direct connection to protein folding, this analysis provides strong evidence that many synonymous substitutions have been selected to optimize translation rates at specific locations within genes. More generally, our results indicate that kinetics, and not just thermodynamics, can significantly alter the efficiency of self-assembly in a biological context.
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
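To make the permutation state space concrete, the sketch below evaluates a partition function over the symmetric group for one simple noninteracting Hamiltonian, in which every element that is not in its correct position costs an energy `lam`. This Hamiltonian and the names used are assumptions chosen for illustration. Under that assumption, grouping permutations by the number j of misplaced elements gives Z = Σ_j C(N, j) D_j exp(−β·lam·j), where D_j is the number of derangements of j items; the code checks this against brute-force enumeration.

```python
import itertools
import math

def derangements(n):
    """Number of permutations of n items with no fixed point (D_0 = 1, D_1 = 0)."""
    if n == 0:
        return 1
    d_prev, d = 1, 0
    for k in range(2, n + 1):
        # Standard recurrence: D_k = (k - 1) * (D_{k-1} + D_{k-2})
        d_prev, d = d, (k - 1) * (d + d_prev)
    return d

def partition_function_closed(n, beta, lam):
    """Sum over the number j of misplaced elements."""
    return sum(math.comb(n, j) * derangements(j) * math.exp(-beta * lam * j)
               for j in range(n + 1))

def partition_function_brute(n, beta, lam):
    """Direct sum over all n! permutations (only feasible for small n)."""
    total = 0.0
    for perm in itertools.permutations(range(n)):
        misplaced = sum(1 for i, p in enumerate(perm) if i != p)
        total += math.exp(-beta * lam * misplaced)
    return total

z_closed = partition_function_closed(6, 0.7, 1.0)
z_brute = partition_function_brute(6, 0.7, 1.0)
print(z_closed, z_brute)
```

Even without interactions the sum does not factorize into independent single-element terms, because placing one element correctly or incorrectly constrains where the others can go; the derangement counts carry that constraint.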
The BACE-1 enzyme is a prime target in the search for a cure for Alzheimer's disease. In this article, we used the MM-PBSA approach to compute the binding free energies of 46 reported ligands to this enzyme. After showing that the most probable protonation state of the catalytic dyad is mono-protonated (on ASP32), we performed a thorough analysis of the parameters influencing the sampling of the conformational space (in total, more than 35 μs of simulations were performed). We show that ten simulations of 2 ns give better results than one of 50 ns. We also investigated the influence of the protein force field, the water model, periodic boundary condition artifacts (box size), and the ionic strength. Amber03 with TIP3P, a minimal distance of 1.0 nm between the protein and the box edges, and an ionic strength of I = 0.2 M provide the optimal correlation with experiments. Overall, when using these parameters, a Pearson correlation coefficient of R = 0.84 (R² = 0.71) is obtained for the 46 ligands, spanning eight orders of magnitude of Kd (from 0.017 nM to 2000 μM, i.e., from −14.7 to −3.7 kcal/mol), with a ligand size from 22 to 136 atoms (from 138 to 937 g/mol). After a two-parameter fit of the binding affinities for 12 of the ligands, an error of RMSD = 1.7 kcal/mol was obtained for the remaining ligands.
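The correspondence between the quoted Kd range and the free-energy range follows from ΔG = RT ln(Kd) with a 1 M standard state. The short check below assumes a temperature of 298.15 K (not stated in the abstract) and reads the lower bound as 0.017 nM, a concentration; the function name `kd_to_dg` is introduced here for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_to_dg(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd in mol/L, 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)

tightest = kd_to_dg(0.017e-9)  # 0.017 nM
weakest = kd_to_dg(2000e-6)    # 2000 uM

print(f"tightest binder: {tightest:.1f} kcal/mol")
print(f"weakest binder:  {weakest:.1f} kcal/mol")
print(f"span: {math.log10(2000e-6 / 0.017e-9):.1f} orders of magnitude in Kd")
print(f"R^2 for R = 0.84: {0.84 ** 2:.2f}")
```

Under these assumptions the two ends of the range come out at about −14.7 and −3.7 kcal/mol, a span of roughly eight orders of magnitude in Kd, matching the figures in the abstract.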
To assess the mutational robustness of nucleic acids, many genome- and protein-level studies have been performed, where nucleic acids are treated as genetic information carriers and transferrers. However, the molecular mechanisms through which mutations alter the structural, dynamic, and functional properties of nucleic acids are poorly understood. Here we performed a SELEX in silico study to investigate the fitness distribution of the l-Arm-binding aptamer genotype neighborhoods. Two novel functional genotype neighborhoods were isolated and experimentally verified to have fitness comparable to that of the wild type. The experimental aptamer fitness landscape suggests that mutational robustness is strongly influenced by the local base environment and ligand-binding mode, whereas bases distant from the binding pocket provide potential evolutionary pathways to approach the global fitness maximum. Our work provides an example of successful application of SELEX in silico to optimize an aptamer and demonstrates the strong sensitivity of mutational robustness to the site of genetic variation.
In this Letter we investigate a direct relationship between a graph’s topology and the free energy of a spin system on the graph. We develop a method of separating topological and energetic contributions to the free energy, and find that considering the topology is sufficient to qualitatively compare the free energies of different graph systems at high temperature, even when the energetics are not fully known. This method was applied to the metal lattice system with defects, and we found that it partially explains why point defects are more stable than high-dimensional defects. Given the energetics, we can even quantitatively compare free energies of different graph structures via a closed form of linear graph contributions. The closed form is applied to predict the sequence-space free energy of lattice proteins, which is a key factor determining the designability of a protein structure.
We present the third generation of our scoring function for the prediction of protein–ligand binding free energy. This function is now a hybrid between a knowledge-based potential and an empirical function. We constructed a diversified set of ∼1000 complexes from the PDBbind-CN database for the training of the function, and we show that this number of complexes generates enough data to build the potential. The occurrence of 420 different types of atomic pairwise interactions is computed in up to five different ranges of distances to derive the knowledge-based part. All of the parameters were optimized, and we were able to considerably improve the accuracy of the scoring function with a Pearson correlation coefficient against experimental binding free energies of up to 0.57, which ranks our new scoring function as one of the best currently available and the second-best in terms of standard deviation (SD = 1.68 kcal/mol). The function was then further improved by inclusion of different terms taking into account repulsion and loss of entropy upon binding, and we show that it is capable of recovering native binding poses up to 80% of the time. All of the programs, tools, and protein sets are released in the Supporting Information or as open-source programs.
In drug discovery, systematic variations of substituents on a common scaffold and bioisosteric replacements are often used to generate diversity and obtain molecules with better biological effects. However, this could saturate the small-molecule diversity pool, resulting in drug resistance. On the other hand, conventional drug discovery relies on targeting known pockets on protein surfaces, leading to drug resistance by mutations of critical pocket residues. Here, we present a two-pronged strategy of designing novel drugs that target unique pockets on a protein’s surface to overcome the above problems. Dihydrofolate reductase, DHFR, is a critical enzyme involved in thymidine and purine nucleotide biosynthesis. Several classes of compounds that are structural analogues of the substrate dihydrofolate have been explored for their antifolate activity. Here, we describe 10 novel small-molecule inhibitors of Escherichia coli DHFR, EcDHFR, belonging to the stilbenoid, deoxybenzoin, and chalcone families of compounds, discovered by a combination of pocket-based virtual ligand screening and systematic scaffold hopping. These inhibitors show a unique uncompetitive or noncompetitive inhibition mechanism, distinct from those reported for all known inhibitors of DHFR, indicative of binding to a unique pocket distinct from either substrate or cofactor-binding pockets. Furthermore, we demonstrate that rescue mutants of EcDHFR, with reduced affinity to all known classes of DHFR inhibitors, are inhibited at the same concentration as the wild-type. These compounds also exhibit antibacterial activity against E. coli harboring the drug-resistant variant of DHFR. This is the first report of a novel class of inhibitors targeting a unique pocket on EcDHFR.
Homology modeling is a powerful tool for predicting a protein’s structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates once sequence identity falls below the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
Mutations provide the variation that drives evolution, yet their effects on fitness remain poorly understood. Here we explore how mutations in the essential enzyme adenylate kinase (Adk) of Escherichia coli affect multiple phases of population growth. We introduce a biophysical fitness landscape for these phases, showing how they depend on molecular and cellular properties of Adk. We find that Adk catalytic capacity in the cell (the product of activity and abundance) is the major determinant of mutational fitness effects. We show that bacterial lag times are at a well-defined optimum with respect to Adk’s catalytic capacity, while exponential growth rates are only weakly affected by variation in Adk. Direct pairwise competitions between strains show how environmental conditions modulate the outcome of a competition where growth rates and lag times have a tradeoff, shedding light on the multidimensional nature of fitness and its importance in the evolutionary optimization of enzymes.
Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize recent advances in theory, computer simulations, and experiments that improve our understanding of the links between various physical scales in biology.