Bhattacharyya, S., et al. Accessibility of the Shine-Dalgarno sequence dictates N-terminal codon bias in E. coli. bioRxiv (Forthcoming). Publisher's VersionAbstract
Despite considerable effort, no physical mechanism has been experimentally shown to explain the N-terminal codon bias in prokaryotic genomes. Using a systematic study of synonymous substitutions in two endogenous E. coli genes, we show that interactions between the coding region and the upstream Shine-Dalgarno sequence modulate both the rate of mRNA degradation and the efficiency of translation-initiation via a ’ribosome-protection’ mechanism, which ultimately affects cellular fitness. We further demonstrate that occlusion of the Shine-Dalgarno sequence depends on the formation of non-equilibrium secondary structures, which may include long-range contacts, highlighting the role of co-transcriptional mRNA folding and the influence of the 5’-untranslated region in determining the optimal coding sequence. Finally, a statistical analysis of the E. coli genome also specifically implicates avoidance of intra-molecular base-pairing with the Shine-Dalgarno sequence. Our results therefore provide general physical insights into the coding-level sequence features that optimize protein expression in prokaryotes.
Manhart, M., Adkar, B.V. & Shakhnovich, E.I. Tradeoffs between microbial growth phases lead to frequency-dependent and non-transitive selection. bioRxiv (Forthcoming). Publisher's VersionAbstract
Microbial populations undergo multiple phases of growth, including a lag phase, an exponential growth phase, and a stationary phase. Therefore mutations can improve the frequency of a genotype not only by increasing its growth rate, but also by decreasing the lag time or adjusting the yield (resource efficiency). Furthermore, many mutations will be pleiotropic, affecting multiple phases simultaneously. The contribution of multiple life-history traits to selection is a critical question for evolutionary biology as we seek to predict the evolutionary fates of mutations. Here we use a simple model of microbial growth to quantify how these multiple phases contribute to selection. We find that there are two distinct components of selection corresponding to the growth and lag phases, while the yield modulates their relative importance. Despite its simplicity, the model predicts nontrivial population dynamics when mutations induce tradeoffs between phases. Multiple strains can coexist over long times due to frequency-dependent selection, and strains can engage in rock-paper-scissors interactions due to non-transitive selection. We characterize the environmental conditions and patterns of traits necessary to realize these phenomena, which we show to be readily accessible to experiments. Our results provide a theoretical framework for analyzing high-throughput measurements of microbial growth traits, especially interpreting the pleiotropy and correlations between traits across mutants. This work also highlights the need for more comprehensive measurements of selection in simple microbial systems, where the concept of an ordinary fitness landscape breaks down.
Valleau, S., et al. Absence of selection for quantum coherence in the Fenna-Matthews-Olson complex: a combined evolutionary and excitonic study. arXiv preprint arXiv:1705.01537 (Forthcoming).
Jacobs, W.M. & Shakhnovich, E.I. Evidence of evolutionary selection for co-translational folding. arXiv preprint arXiv:1703.10948 (Forthcoming).
Williams, M. Statistical physics of the symmetric group. Physical Review E 95, 4, 042126 (2017). Publisher's VersionAbstract
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
Chéron, N. & Shakhnovich, E.I. Effect of sampling on BACE-1 ligands binding free energy predictions via MM-PBSA calculations. Journal of Computational Chemistry (2017). Publisher's VersionAbstract
The BACE-1 enzyme is a prime target to find a cure to Alzheimer's disease. In this article, we used the MM-PBSA approach to compute the binding free energies of 46 reported ligands to this enzyme. After showing that the most probable protonation state of the catalytic dyad is mono-protonated (on ASP32), we performed a thorough analysis of the parameters influencing the sampling of the conformational space (in total, more than 35 μs of simulations were performed). We show that ten simulations of 2 ns gives better results than one of 50 ns. We also investigated the influence of the protein force field, the water model, the periodic boundary conditions artifacts (box size), as well as the ionic strength. Amber03 with TIP3P, a minimal distance of 1.0 nm between the protein and the box edges and a ionic strength of I = 0.2 M provides the optimal correlation with experiments. Overall, when using these parameters, a Pearson correlation coefficient of R = 0.84 (R2 = 0.71) is obtained for the 46 ligands, spanning eight orders of magnitude of Kd (from 0.017 nm to 2000 μM, i.e., from −14.7 to −3.7 kcal/mol), with a ligand size from 22 to 136 atoms (from 138 to 937 g/mol). After a two-parameter fit of the binding affinities for 12 of the ligands, an error of RMSD = 1.7 kcal/mol was obtained for the remaining ligands.
Zhou, Q., et al. Exploring the Mutational Robustness of Nucleic Acids by Searching Genotype Neighborhoods in Sequence Space. The Journal of Physical Chemistry Letters 8, 2, 407-414 (2017). Publisher's VersionAbstract
To assess the mutational robustness of nucleic acids, many genome- and protein-level studies have been performed, where nucleic acids are treated as genetic information carriers and transferrers. However, the molecular mechanisms through which mutations alter the structural, dynamic, and functional properties of nucleic acids are poorly understood. Here we performed a SELEX in silico study to investigate the fitness distribution of the l-Arm-binding aptamer genotype neighborhoods. Two novel functional genotype neighborhoods were isolated and experimentally verified to have comparable fitness as the wild-type. The experimental aptamer fitness landscape suggests the mutational robustness is strongly influenced by the local base environment and ligand-binding mode, whereas bases distant from the binding pocket provide potential evolutionary pathways to approach the global fitness maximum. Our work provides an example of successful application of SELEX in silico to optimize an aptamer and demonstrates the strong sensitivity of mutational robustness to the site of genetic variation.
Choi, J.-M., Gilson, A.I. & Shakhnovich, E.I. Graph’s Topology and Free Energy of a Spin Model on the Graph. Physical Review Letters 118, 8, 088302 (2017). Publisher's VersionAbstract
In this Letter we investigate a direct relationship between a graph’s topology and the free energy of a spin system on the graph. We develop a method of separating topological and energetic contributions to the free energy, and find that considering the topology is sufficient to qualitatively compare the free energies of different graph systems at high temperature, even when the energetics are not fully known. This method was applied to the metal lattice system with defects, and we found that it partially explains why point defects are more stable than high-dimensional defects. Given the energetics, we can even quantitatively compare free energies of different graph structures via a closed form of linear graph contributions. The closed form is applied to predict the sequence-space free energy of lattice proteins, which is a key factor determining the designability of a protein structure.
Debroise, T., Shakhnovich, E.I. & Chéron, N. A Hybrid Knowledge-Based and Empirical Scoring Function for Protein–Ligand Interaction: SMoG2016. Journal of Chemical Information and Modeling 57, 3, 584–593 (2017). Publisher's VersionAbstract
We present the third generation of our scoring function for the prediction of protein–ligand binding free energy. This function is now a hybrid between a knowledge-based potential and an empirical function. We constructed a diversified set of ∼1000 complexes from the PDBBinding-CN database for the training of the function, and we show that this number of complexes generates enough data to build the potential. The occurrence of 420 different types of atomic pairwise interactions is computed in up to five different ranges of distances to derive the knowledge-based part. All of the parameters were optimized, and we were able to considerably improve the accuracy of the scoring function with a Pearson correlation coefficient against experimental binding free energies of up to 0.57, which ranks our new scoring function as one of the best currently available and the second-best in terms of standard deviation (SD = 1.68 kcal/mol). The function was then further improved by inclusion of different terms taking into account repulsion and loss of entropy upon binding, and we show that it is capable of recovering native binding poses up to 80% of the time. All of the programs, tools, and protein sets are released in the Supporting Information or as open-source programs.
Srinivasan, B., Rodrigues, J.V., Tonddast-Navaei, S., Shakhnovich, E. & Skolnick, J. Rational design of novel allosteric dihydrofolate reductase inhibitors showing antibacterial-effects on drug-resistant E. coli escape-variants. ACS Chemical Biology (2017). Publisher's VersionAbstract
In drug discovery, systematic variations of substituents on a common scaffold and bioisosteric replacements are often used to generate diversity and obtain molecules with better biological effects. However, this could saturate the small-molecule diversity pool resulting in drug resistance. On the other hand, conventional drug discovery relies on targeting known pockets on protein surfaces leading to drug resistance by mutations of critical pocket residues. Here, we present a two-pronged strategy of designing novel drugs that target unique pockets on a protein’s surface to overcome the above problems. Dihydrofolate reductase, DHFR, is a critical enzyme involved in thymidine and purine nucleotide biosynthesis. Several classes of compounds that are structural analogues of the substrate dihydrofolate have been explored for their antifolate activity. Here, we describe 10 novel small-molecule inhibitors of Escherichia coli DHFR, EcDHFR, belonging to the stilbenoid, deoxybenzoin, and chalcone family of compounds discovered by a combination of pocket-based virtual ligand screening and systematic scaffold hopping. These inhibitors show a unique uncompetitive or noncompetitive inhibition mechanism, distinct from those reported for all known inhibitors of DHFR, indicative of binding to a unique pocket distinct from either substrate or cofactor-binding pockets. Furthermore, we demonstrate that rescue mutants of EcDHFR, with reduced affinity to all known classes of DHFR inhibitors, are inhibited at the same concentration as the wild-type. These compounds also exhibit antibacterial activity against E. coli harboring the drug-resistant variant of DHFR. This discovery is the first report on a novel class of inhibitors targeting a unique pocket on EcDHFR.
Gilson, A.I., Marshall-Christensen, A., Choi, J.-M. & Shakhnovich, E.I. The role of evolutionary selection in the dynamics of protein structure evolution. Biophysical Journal 112, 7, 1350–1365 (2017). Publisher's VersionAbstract
Homology modeling is a powerful tool for predicting a protein’s structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
Adkar, B.V., et al. Optimization of lag phase shapes the evolution of a bacterial enzyme. Nature Ecology & Evolution 1, 0149 - (2017). Publisher's VersionAbstract

Mutations provide the variation that drives evolution, yet their effects on fitness remain poorly understood. Here we explore how mutations in the essential enzyme adenylate kinase (Adk) of Escherichia coli affect multiple phases of population growth. We introduce a biophysical fitness landscape for these phases, showing how they depend on molecular and cellular properties of Adk. We find that Adk catalytic capacity in the cell (the product of activity and abundance) is the major determinant of mutational fitness effects. We show that bacterial lag times are at a well-defined optimum with respect to Adk’s catalytic capacity, while exponential growth rates are only weakly affected by variation in Adk. Direct pairwise competitions between strains show how environmental conditions modulate the outcome of a competition where growth rates and lag times have a tradeoff, shedding light on the multidimensional nature of fitness and its importance in the evolutionary optimization of enzymes.

Bershtein, S., Serohijos, A.W.R. & Shakhnovich, E.I. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. Current Opinion in Structural Biology 42, 31–40 (2017). Publisher's VersionAbstract

Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology.

Zhou, Q.-tong, Liang, H.-jun & Shakhnovich, E.I. Virtual Screening of Human O-GlcNAc Transferase Inhibitors. Chinese Journal of Chemical Physics 29, 3, 374–380 (2016).Abstract
O-GlcNAc transferase (OGT) is one of essential mammalian enzymes, which catalyze the transfer of N-acetylglucosamine from UDP-N-acetylglucosamine (UDP-GlcNAc) to hydroxyl groups of serines and threonines (Ser/Thr) in proteins. Dysregulations of cellular O-GlcNAc have been implicated in diabetes, neurodegenerative disease, and cancer, which brings great interest in developing potent and specific small-molecular OGT inhibitors. In this work, we performed virtual screening on OGT catalytic site to identify potential inhibitors. 7134792 drug-like compounds from ZINC (a free database of commercially available compounds for virtual screening) and 4287550 compounds generated by FOG (fragment optimized growth program) were screened and the top 116 compounds ranked by docking score were analyzed. By comparing the screening results, we found FOG program can generate more compounds with better docking scores than ZINC. The top ZINC compounds ranked by docking score were grouped into two classes, which held the binding positions of UDP and GlcNAc of UDP-GlcNAc. Combined with individual fragments in binding pocket, de novo compounds were designed and proved to have better docking score. The screened and designed compounds may become a starting point for developing new drugs.
Jacquin, H., Gilson, A.I., Shakhnovich, E.I., Cocco, S. & Monasson, R. Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models. PLoS Comput Biol 12, 5, e1004889 (2016). Publisher's VersionAbstract

Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of ‘true’ LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.

Rodrigues, J.V., et al. Biophysical principles predict fitness landscapes of drug resistance. Proceedings of the National Academy of Sciences 113, 13, E1470–E1478 (2016). Publisher's VersionAbstract

Fitness landscapes of drug resistance constitute powerful tools to elucidate mutational pathways of antibiotic escape. Here, we developed a predictive biophysics-based fitness landscape of trimethoprim (TMP) resistance for Escherichia coli dihydrofolate reductase (DHFR). We investigated the activity, binding, folding stability, and intracellular abundance for a complete set of combinatorial DHFR mutants made out of three key resistance mutations and extended this analysis to DHFR originated from Chlamydia muridarum and Listeria grayi. We found that the acquisition of TMP resistance via decreased drug affinity is limited by a trade-off in catalytic efficiency. Protein stability is concurrently affected by the resistant mutants, which precludes a precise description of fitness from a single molecular trait. Application of the kinetic flux theory provided an accurate model to predict resistance phenotypes (IC50) quantitatively from a unique combination of the in vitro protein molecular properties. Further, we found that a controlled modulation of the GroEL/ES chaperonins and Lon protease levels affects the intracellular steady-state concentration of DHFR in a mutation-specific manner, whereas IC50 is changed proportionally, as indeed predicted by the model. This unveils a molecular rationale for the pleiotropic role of the protein quality control machinery on the evolution of antibiotic resistance, which, as we illustrate here, may drastically confound the evolutionary outcome. These results provide a comprehensive quantitative genotype–phenotype map for the essential enzyme that serves as an important target of antibiotic and anticancer therapies.

Chéron, N., Serohijos, A.W.R., Choi, J.-M. & Shakhnovich, E.I. Evolutionary dynamics of viral escape under antibodies stress: A biophysical model. Protein Science 25, 1332-1340 (2016). Publisher's VersionAbstract

Viruses constantly face the selection pressure of antibodies, either from innate immune response of the host or from administered antibodies for treatment. We explore the interplay between the biophysical properties of viral proteins and the population and demographic variables in the viral escape. The demographic and population genetics aspect of the viral escape have been explored before; however one important assumption was the a priori distribution of fitness effects (DFE). Here, we relax this assumption by instead considering a realistic biophysics-based genotype-phenotype relationship for RNA viruses escaping antibodies stress. In this model the DFE is itself an evolvable property that depends on the genetic background (epistasis) and the distribution of biophysical effects of mutations, which is informed by biochemical experiments and theoretical calculations in protein engineering. We quantitatively explore in silico the viability of viral populations under antibodies pressure and derive the phase diagram that defines the fate of the virus population (extinction or escape from stress) in a range of viral mutation rates and antibodies concentrations. We find that viruses are most resistant to stress at an optimal mutation rate (OMR) determined by the competition between supply of beneficial mutation to facilitate escape from stressors and lethal mutagenesis caused by excess of destabilizing mutations. We then show the quantitative dependence of the OMR on genome length and viral burst size. We also recapitulate the experimental observation that viruses with longer genomes have smaller mutation rate per nucleotide.

Serebryany, E., et al. An internal disulfide locks a misfolded aggregation-prone intermediate in cataract-linked mutants of human γD-crystallin. Journal of Biological Chemistry 291, 36, 19172–19183 (2016). Publisher's VersionAbstract

Considerable mechanistic insight has been gained into amyloid aggregation; however, a large number of non-amyloid protein aggregates are considered “amorphous,” and in most cases, little is known about their mechanisms. Amorphous aggregation of γ-crystallins in the eye lens causes cataract, a widespread disease of aging. We combined simulations and experiments to study the mechanism of aggregation of two γD-crystallin mutants, W42R and W42Q: the former a congenital cataract mutation, and the latter a mimic of age-related oxidative damage. We found that formation of an internal disulfide was necessary and sufficient for aggregation under physiological conditions. Two-chain all-atom simulations predicted that one non-native disulfide in particular, between Cys32 and Cys41, was likely to stabilize an unfolding intermediate prone to intermolecular interactions. Mass spectrometry and mutagenesis experiments confirmed the presence of this bond in the aggregates and its necessity for oxidative aggregation under physiological conditions in vitro. Mining the simulation data linked formation of this disulfide to extrusion of the N-terminal β-hairpin and rearrangement of the native β-sheet topology. Specific binding between the extruded hairpin and a distal β-sheet, in an intermolecular chain reaction similar to domain swapping, is the most probable mechanism of aggregate propagation.

Chéron, N., Jasty, N. & Shakhnovich, E.I. OpenGrowth: An Automated and Rational Algorithm for Finding New Protein Ligands. Journal of Medicinal Chemistry 59, 9, 4171-4188 (2016). Publisher's VersionAbstract

We present a new open-source software, called OpenGrowth, which aims to create de novo ligands by connecting small organic fragments in the active site of proteins. Molecule growth is biased to produce structures that statistically resemble drugs in an input training database. Consequently, the produced molecules have superior synthetic accessibility and pharmacokinetic properties compared with randomly grown molecules. The growth process can take into account the flexibility of the target protein and can be started from a seed to mimic R-group strategy or fragment-based drug discovery. Primary applications of the software on the HIV-1 protease allowed us to quickly identify new inhibitors with a predicted Kd as low as 18 nM. We also present a graphical user interface that allows a user to select easily the fragments to include in the growth process. OpenGrowth is released under the GNU GPL license and is available free of charge on the authors’ website and at

Woodard, J.C., Dunatunga, S. & Shakhnovich, E.I. A Simple Model of Protein Domain Swapping in Crowded Cellular Environments. Biophysical Journal 110, 11, 2367–2376 (2016). Publisher's VersionAbstract

Domain swapping in proteins is an important mechanism of functional and structural innovation. However, despite its ubiquity and importance, the physical mechanisms that lead to domain swapping are poorly understood. Here, we present a simple two-dimensional coarse-grained model of protein domain swapping in the cytoplasm. In our model, two-domain proteins partially unfold and diffuse in continuous space. Monte Carlo multiprotein simulations of the model reveal that domain swapping occurs at intermediate temperatures, whereas folded dimers and folded monomers prevail at low temperatures, and partially unfolded monomers predominate at high temperatures. We use a simplified amino acid alphabet consisting of four residue types, and find that the oligomeric state at a given temperature depends on the sequence of the protein. We also show that hinge strain between domains can promote domain swapping, consistent with experimental observations for real proteins. Domain swapping depends nonmonotonically on the protein concentration, with domain-swapped dimers occurring at intermediate concentrations and nonspecific interactions between partially unfolded proteins occurring at high concentrations. For folded proteins, we recover the result obtained in three-dimensional lattice simulations, i.e., that functional dimerization is most prevalent at intermediate temperatures and nonspecific interactions increase at low temperatures.