Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology.
O-GlcNAc transferase (OGT) is one of essential mammalian enzymes, which catalyze the transfer of N-acetylglucosamine from UDP-N-acetylglucosamine (UDP-GlcNAc) to hydroxyl groups of serines and threonines (Ser/Thr) in proteins. Dysregulations of cellular O-GlcNAc have been implicated in diabetes, neurodegenerative disease, and cancer, which brings great interest in developing potent and specific small-molecular OGT inhibitors. In this work, we performed virtual screening on OGT catalytic site to identify potential inhibitors. 7134792 drug-like compounds from ZINC (a free database of commercially available compounds for virtual screening) and 4287550 compounds generated by FOG (fragment optimized growth program) were screened and the top 116 compounds ranked by docking score were analyzed. By comparing the screening results, we found FOG program can generate more compounds with better docking scores than ZINC. The top ZINC compounds ranked by docking score were grouped into two classes, which held the binding positions of UDP and GlcNAc of UDP-GlcNAc. Combined with individual fragments in binding pocket, de novo compounds were designed and proved to have better docking score. The screened and designed compounds may become a starting point for developing new drugs.
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of ‘true’ LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
Fitness landscapes of drug resistance constitute powerful tools to elucidate mutational pathways of antibiotic escape. Here, we developed a predictive biophysics-based fitness landscape of trimethoprim (TMP) resistance for Escherichia coli dihydrofolate reductase (DHFR). We investigated the activity, binding, folding stability, and intracellular abundance for a complete set of combinatorial DHFR mutants made out of three key resistance mutations and extended this analysis to DHFR originated from Chlamydia muridarum and Listeria grayi. We found that the acquisition of TMP resistance via decreased drug affinity is limited by a trade-off in catalytic efficiency. Protein stability is concurrently affected by the resistant mutants, which precludes a precise description of fitness from a single molecular trait. Application of the kinetic flux theory provided an accurate model to predict resistance phenotypes (IC50) quantitatively from a unique combination of the in vitro protein molecular properties. Further, we found that a controlled modulation of the GroEL/ES chaperonins and Lon protease levels affects the intracellular steady-state concentration of DHFR in a mutation-specific manner, whereas IC50 is changed proportionally, as indeed predicted by the model. This unveils a molecular rationale for the pleiotropic role of the protein quality control machinery on the evolution of antibiotic resistance, which, as we illustrate here, may drastically confound the evolutionary outcome. These results provide a comprehensive quantitative genotype–phenotype map for the essential enzyme that serves as an important target of antibiotic and anticancer therapies.
Viruses constantly face the selection pressure of antibodies, either from innate immune response of the host or from administered antibodies for treatment. We explore the interplay between the biophysical properties of viral proteins and the population and demographic variables in the viral escape. The demographic and population genetics aspect of the viral escape have been explored before; however one important assumption was the a priori distribution of fitness effects (DFE). Here, we relax this assumption by instead considering a realistic biophysics-based genotype-phenotype relationship for RNA viruses escaping antibodies stress. In this model the DFE is itself an evolvable property that depends on the genetic background (epistasis) and the distribution of biophysical effects of mutations, which is informed by biochemical experiments and theoretical calculations in protein engineering. We quantitatively explore in silico the viability of viral populations under antibodies pressure and derive the phase diagram that defines the fate of the virus population (extinction or escape from stress) in a range of viral mutation rates and antibodies concentrations. We find that viruses are most resistant to stress at an optimal mutation rate (OMR) determined by the competition between supply of beneficial mutation to facilitate escape from stressors and lethal mutagenesis caused by excess of destabilizing mutations. We then show the quantitative dependence of the OMR on genome length and viral burst size. We also recapitulate the experimental observation that viruses with longer genomes have smaller mutation rate per nucleotide.
Considerable mechanistic insight has been gained into amyloid aggregation; however, a large number of non-amyloid protein aggregates are considered “amorphous,” and in most cases, little is known about their mechanisms. Amorphous aggregation of γ-crystallins in the eye lens causes cataract, a widespread disease of aging. We combined simulations and experiments to study the mechanism of aggregation of two γD-crystallin mutants, W42R and W42Q: the former a congenital cataract mutation, and the latter a mimic of age-related oxidative damage. We found that formation of an internal disulfide was necessary and sufficient for aggregation under physiological conditions. Two-chain all-atom simulations predicted that one non-native disulfide in particular, between Cys32 and Cys41, was likely to stabilize an unfolding intermediate prone to intermolecular interactions. Mass spectrometry and mutagenesis experiments confirmed the presence of this bond in the aggregates and its necessity for oxidative aggregation under physiological conditions in vitro. Mining the simulation data linked formation of this disulfide to extrusion of the N-terminal β-hairpin and rearrangement of the native β-sheet topology. Specific binding between the extruded hairpin and a distal β-sheet, in an intermolecular chain reaction similar to domain swapping, is the most probable mechanism of aggregate propagation.
We present a new open-source software, called OpenGrowth, which aims to create de novo ligands by connecting small organic fragments in the active site of proteins. Molecule growth is biased to produce structures that statistically resemble drugs in an input training database. Consequently, the produced molecules have superior synthetic accessibility and pharmacokinetic properties compared with randomly grown molecules. The growth process can take into account the flexibility of the target protein and can be started from a seed to mimic R-group strategy or fragment-based drug discovery. Primary applications of the software on the HIV-1 protease allowed us to quickly identify new inhibitors with a predicted Kd as low as 18 nM. We also present a graphical user interface that allows a user to select easily the fragments to include in the growth process. OpenGrowth is released under the GNU GPL license and is available free of charge on the authors’ website and at http://opengrowth.sourceforge.net/.
Domain swapping in proteins is an important mechanism of functional and structural innovation. However, despite its ubiquity and importance, the physical mechanisms that lead to domain swapping are poorly understood. Here, we present a simple two-dimensional coarse-grained model of protein domain swapping in the cytoplasm. In our model, two-domain proteins partially unfold and diffuse in continuous space. Monte Carlo multiprotein simulations of the model reveal that domain swapping occurs at intermediate temperatures, whereas folded dimers and folded monomers prevail at low temperatures, and partially unfolded monomers predominate at high temperatures. We use a simplified amino acid alphabet consisting of four residue types, and find that the oligomeric state at a given temperature depends on the sequence of the protein. We also show that hinge strain between domains can promote domain swapping, consistent with experimental observations for real proteins. Domain swapping depends nonmonotonically on the protein concentration, with domain-swapped dimers occurring at intermediate concentrations and nonspecific interactions between partially unfolded proteins occurring at high concentrations. For folded proteins, we recover the result obtained in three-dimensional lattice simulations, i.e., that functional dimerization is most prevalent at intermediate temperatures and nonspecific interactions increase at low temperatures.
We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions.
Gene dosage toxicity (GDT) is an important factor that determines optimal levels of protein abundances, yet its molecular underpinnings remain unknown. Here, we demonstrate that overexpression of DHFR in E. coli causes a toxic metabolic imbalance triggered by interactions with several functionally related enzymes. Though deleterious in the overexpression regime, surprisingly, these interactions are beneficial at physiological concentrations, implying their functional significance in vivo. Moreover, we found that overexpression of orthologous DHFR proteins had minimal effect on all levels of cellular organization - molecular, systems, and phenotypic, in sharp contrast to E. coliDHFR. Dramatic difference of GDT between 'E. coli's self' and 'foreign' proteins suggests the crucial role of evolutionary selection in shaping protein-protein interaction (PPI) networks at the whole proteome level. This study shows how protein overexpression perturbs a dynamic metabolon of weak yet potentially functional PPI, with consequences for the metabolic state of cells and their fitness.
Human noroviruses (HuNoVs) are positive-sense RNA viruses that can cause severe, highly infectious gastroenteritis. HuNoV outbreaks are frequently associated with recombination between circulating strains. Strain genotyping and phylogenetic analyses show that noroviruses often recombine in a highly conserved region near the junction of the viral polyprotein (ORF1) and capsid (ORF2) genes and occasionally within the RNA-dependent RNA polymerase (RdRP) gene. Although genotyping methods are useful for tracking changes in circulating viral populations, they only report the dominant recombinant strains and do not elucidate the frequency or range of recombination events. Furthermore, the relatively low frequency of recombination in RNA viruses has limited studies to cell culture or in vitro systems that do not reflect the complexities and selective pressures present in an infected organism. Using two murine norovirus (MNV) strains to model co-infection, we developed a microfluidic platform to amplify, detect, and recover individual recombinants following in vitro and in vivo co-infection. One-step RT-PCR was performed in picoliter drops with primers that identified the wild-type and recombinant progenies and scanned for recombination breakpoints at approximately 1-kb intervals. We detected recombination between MNV strains at multiple loci spanning the viral protease, RdRP, and capsid ORFs and isolated individual recombinant RNA genomes that were present at a frequency of 1/300,000 or greater. This study is the first to examine norovirus recombination following co-infection of an animal and suggests that the exchange of RNA among viral genomes in the infected host occurs in multiple locations and is an important driver of genetic diversity.
In complex, multicomponent systems, polymers often undergo phase transitions between distinct conformations. This paper reports a millimeter‐scale granular model of coil‐to‐globule transitions: one “polymer” chain—a cylinders‐on‐a‐string “pearl necklace”—and many spheres, all shaken on a horizontal surface. It is possible to describe the behavior of this granular system by using formalisms generally used in statistical physics of polymers. Two sets of experiments allowed the observation of first‐ and second‐order coil‐to‐globule transitions. The model shows that the competition between long‐ and short‐range interactions leads to a first‐order transition. Well‐designed granular system represents another kind of approach to the study of polymer phase transitions and might inspire future designs of polymer‐like mesoscale systems.
To isolate functional nucleic acids that bind to defined targets with high affinity and specificity, which are known as aptamers, the systematic evolution of ligands by exponential enrichment (SELEX) methodology has emerged as the preferred approach. Here, we propose a computational approach, SELEX in silico, that allows the sequence space to be more thoroughly explored regarding binding of a certain target. Our approach consists of two steps: (i) secondary structure-based sequence screening, which aims to collect the sequences that can form a desired RNA motif as an enhanced initial library, followed by (ii) sequence enrichment regarding target binding by molecular dynamics simulation-based virtual screening. Our SELEX in silico method provided a practical computational solution to three key problems in aptamer sequence searching: design of nucleic acid libraries, knowledge of sequence enrichment, and identification of potent aptamers. Six potent theophylline-binding aptamers, which were isolated by SELEX in silico from a sequence space containing 413 sequences, were experimentally verified to bind theophylline with high affinity: Kd ranging from 0.16 to 0.52 μM, compared with the dissociation constant of the original aptamer-theophylline, 0.32 μM. These results demonstrate the significant potential of SELEX in silico as a new method for aptamer discovery and optimization.
Linking the molecular effects of mutations to fitness is central to understanding evolutionary dynamics. Here, we establish a quantitative relation between the global effect of mutations on the E. coliproteome and bacterial fitness. We created E. coli strains with specific destabilizing mutations in the chromosomalfolA gene encoding dihydrofolate reductase(DHFR) and quantified the ensuing changes in the abundances of 2,000+ E. coli proteins in mutant strains using tandem mass tags with subsequent LC-MS/MS. mRNA abundances in the same E. coli strains were also quantified. The proteomic effects of mutations in DHFRare quantitatively linked to phenotype: the SDs of the distributions of logarithms of relative (to WT) protein abundances anticorrelate with bacterial growth rates. Proteomes hierarchically cluster first by media conditions, and within each condition, by the severity of the perturbation to DHFR function. These results highlight the importance of a systems-level layer in the genotype-phenotype relationship.
Horizontal gene transfer (HGT) plays a central role in bacterial evolution, yet the molecular and cellular constraints on functional integration of the foreign genes are poorly understood. Here we performed inter-species replacement of the chromosomal folA gene, encoding an essential metabolic enzyme dihydrofolate reductase (DHFR), with orthologs from 35 other mesophilic bacteria. The orthologous inter-species replacements caused a marked drop (in the range 10-90%) in bacterial growth rate despite the fact that most orthologous DHFRs are as stable as E.coli DHFR at 37°C and are more catalytically active than E. coli DHFR. Although phylogenetic distance between E. coli and orthologous DHFRs as well as their individual molecular properties correlate poorly with growth rates, the product of the intracellular DHFR abundance and catalytic activity (kcat/KM), correlates strongly with growth rates, indicating that the drop in DHFR abundance constitutes the major fitness barrier to HGT. Serial propagation of the orthologous strains for ~600 generations dramatically improved growth rates by largely alleviating the fitness barriers. Whole genome sequencing and global proteome quantification revealed that the evolved strains with the largest fitness improvements have accumulated mutations that inactivated the ATP-dependent Lon protease, causing an increase in the intracellular DHFR abundance. In one case DHFR abundance increased further due to mutations accumulated in folA promoter, but only after the lon inactivating mutations were fixed in the population. Thus, by apparently distinguishing between self and non-self proteins, protein homeostasis imposes an immediate and global barrier to the functional integration of foreign genes by decreasing the intracellular abundance of their products. Once this barrier is alleviated, more fine-tuned evolution occurs to adjust the function/expression of the transferred proteins to the constraints imposed by the intracellular environment of the host organism.
Design of proteins with desired thermal properties is important for scientific and biotechnological applications. Here we developed a theoretical approach to predict the effect of mutations on protein stability from non-equilibrium unfolding simulations. We establish a relative measure based on apparent simulated melting temperatures that is independent of simulation length and, under certain assumptions, proportional to equilibrium stability, and we justify this theoretical development with extensive simulations and experimental data. Using our new method based on all-atom Monte-Carlo unfolding simulations, we carried out a saturating mutagenesis of Dihydrofolate Reductase (DHFR), a key target of antibiotics and chemotherapeutic drugs. The method predicted more than 500 stabilizing mutations, several of which were selected for detailed computational and experimental analysis. We find a highly significant correlation of r=0.65-0.68 between predicted and experimentally determined melting temperatures and unfolding denaturant concentrations for WT DHFR and 42 mutants. The correlation between energy of the native state and experimental denaturation temperature was much weaker, indicating the important role of entropy in protein stability. The most stabilizing point mutation was D27F, which is located in the active site of the protein, rendering it inactive. However for the rest of mutations outside of the active site we observed a weak yet statistically significant positive correlation between thermal stability and catalytic activity indicating the lack of a stability-activity tradeoff for DHFR. By combining stabilizing mutations predicted by our method, we created a highly stable catalytically active E. coli DHFR mutant with measured denaturation temperature 7.2°C higher than WT. Prediction results for DHFR and several other proteins indicate that computational approaches based on unfolding simulations are useful as a general technique to discover stabilizing mutations.
Drug repurposing is a strategy employed to circumvent some of the bottlenecks involved in drug development, such as the cost and time needed for developing new molecular entities. Noroviruses cause recurrent epidemics and sporadic outbreaks of gastroenteritis associated with significant mortality and economic costs, but no treatment has been approved to date. Herein, a library of molecules previously used in humans was screened to find compounds with anti-noroviral activity. Antiviral testing for four selected compounds against murine norovirus infection revealed that rutin has anti-murine norovirus activity in cell-based assays.
Although heat shock response is ubiquitous in bacterial cells, the underlying physical chemistry behind heat shock response remains poorly understood. To study the response of cell populations to heat shock we employ a physics-based ab initio model of living cells where protein biophysics (i.e., folding and protein-protein interactions in crowded cellular environments) and important aspects of proteins homeostasis are coupled with realistic population dynamics simulations. By postulating a genotype-phenotype relationship we define a cell division rate in terms of functional concentrations of proteins and protein complexes, whose Boltzmann stabilities of folding and strengths of their functional interactions are exactly evaluated from their sequence information. We compare and contrast evolutionary dynamics for two models of chaperon action. In the active model, foldase chaperones function as nonequilibrium machines to accelerate the rate of protein folding. In the passive model, holdase chaperones form reversible complexes with proteins in their misfolded conformations to maintain their solubility. We find that only cells expressing foldase chaperones are capable of genuine heat shock response to the increase in the amount of unfolded proteins at elevated temperatures. In response to heat shock, cells? limited resources are redistributed differently for active and passive models. For the active model, foldase chaperones are overexpressed at the expense of downregulation of high abundance proteins, whereas for the passive model; cells react to heat shock by downregulating their high abundance proteins, as their low abundance proteins are upregulated.
It has long been known that solvation plays an important role in protein-protein interactions. Here, we use a minimalistic solvation-based model for predicting protein binding energy to estimate quantitatively the contribution of the solvation factor in protein binding. The factor is described by a simple linear combination of buried surface areas according to amino-acid types. Even without structural optimization, our minimalistic model demonstrates a predictive power comparable to more complex methods, making the proposed approach the basis for high throughput applications. Application of the model to a proteomic database shows that receptor-substrate complexes involved in signaling have lower affinities than enzyme-inhibitor and antibody-antigen complexes, and they differ by chemical compositions on interfaces. Also, we found that protein complexes with components that come from the same genes generally have lower affinities than complexes formed by proteins from different genes, but in this case the difference originates from different interface areas. The model was implemented in the software PYTHON, and the source code can be found on the Shakhnovich group webpage: http://faculty.chemistry.harvard.edu/shakhnovich/software.
Understanding the relative contributions of various evolutionary processes-purifying selection, neutral drift, and adaptation-is fundamental to evolutionary biology. A common metric to distinguish these processes is the ratio of nonsynonymous to synonymous substitutions (i.e., dN/dS) interpreted from the neutral theory as a null model. However, from biophysical considerations, mutations have non-negligible effects on the biophysical properties of proteins such as folding stability. In this work, we investigated how stability affects the rate of protein evolution in phylogenetic trees by using simulations that combine explicit protein sequences with associated stability changes. We first simulated myoglobin evolution in phylogenetic trees with a biophysically realistic approach that accounts for 3D structural information and estimates of changes in stability upon mutation. We then compared evolutionary rates inferred directly from simulation to those estimated using maximum-likelihood (ML) methods. We found that the dN/dS estimated by ML methods (ωML) is highly predictive of the per gene dN/dS inferred from the simulated phylogenetic trees. This agreement is strong in the regime of high stability where protein evolution is neutral. At low folding stabilities and under mutation-selection balance, we observe deviations from neutrality (per gene dN/dS > 1 and dN/dS < 1). We showed that although per gene dN/dS is robust to these deviations, ML tests for positive selection detect statistically significant per site dN/dS > 1. Altogether, we show how protein biophysics affects the dN/dS estimations and its subsequent interpretation. These results are important for improving the current approaches for detecting positive selection.