Publications

Submitted
Chowdhury, S., et al. A systems-guided approach to discover the intracellular target of a novel evolution-drug lead. bioRxiv (Submitted).
Understanding intracellular antibiotic targeting and the associated mechanisms leading to bacterial growth inhibition has been a difficult problem. Here, we discovered the additional intracellular targets of the novel “evolution-drug” lead CD15-3, designed to delay the emergence of antibiotic resistance by inhibiting bacterial DHFR and its trimethoprim-resistant variants. Overexpression of DHFR only partially rescued inhibition of E. coli growth by CD15-3, suggesting that CD15-3 also inhibits a non-DHFR target in the cell. We utilized untargeted global metabolomics and metabolic network analysis, along with a structural similarity search of the putative targets, to identify the additional target of CD15-3. We validated in vivo and in vitro that, besides DHFR, CD15-3 inhibits HPPK (folK), an essential protein upstream of DHFR in bacterial folate metabolism. This bivalent cellular targeting makes CD15-3 a promising lead to develop a “monotherapy analogue” of combination drugs. Competing Interest Statement: The authors have declared no competing interest.
Serebryany, E., Chowdhury, S., Watson, N.E., McClelland, A. & Shakhnovich, E.I. A Native Chemical Chaperone in the Human Eye Lens. arXiv:2012.09805 [q-bio] (Submitted).
Cataract is one of the most prevalent protein aggregation disorders and still the biggest cause of vision loss worldwide. The human lens, in its core region, lacks turnover of any cells or cellular components; it has therefore evolved remarkable mechanisms for resisting protein aggregation for a lifetime. We now report that one such mechanism relies on an unusually abundant metabolite, myo-inositol, to suppress light-scattering aggregation of lens proteins. We quantified aggregation suppression by in vitro turbidimetry and characterized both macroscopic and microscopic mechanisms of myo-inositol action using negative-stain electron microscopy, differential scanning fluorometry, and a thermal scanning Raman spectroscopy apparatus. Given recent metabolomic evidence that it is dramatically depleted in human cataractous lenses compared to age-matched controls, we suggest that maintaining or restoring healthy levels of myo-inositol in the lens may be a simple, safe, and widely available strategy for reducing the global burden of cataract.
Wang, T., Gong, H. & Shakhnovich, E.I. Improved fragment-based movement with LRFragLib for all-atom Ab initio protein folding. arXiv preprint arXiv:1906.05785 (Submitted).
Fragment-based assembly has been widely used in ab initio protein folding simulation, where it can effectively reduce the conformational space and thus accelerate sampling. The efficiency of fragment-based movement as well as the quality of the fragment library determine whether the folding process can lead the free energy landscape to the global minimum and help the protein to reach a near-native folded state. We designed an improved fragment-based movement, "fragmove", which substituted multiple backbone dihedral angles in every simulation step. This movement strategy was derived from the fragment library generated by LRFragLib, an effective fragment detection algorithm using a logistic regression model. We show in replica exchange Monte Carlo (REMC) simulation that "fragmove", when compared with a set of existing movements in REMC, shows significantly improved ability at increasing secondary and tertiary predicted model accuracy by 11.24% and 17.98%, respectively, and reaching energy minima decreased by 5.72%. Our results demonstrate that this improved movement is more powerful at guiding proteins faster to low-energy regions of conformational space and at promoting folding efficiency and predicted model accuracy.
2021
Razban, R.M., Dasmeh, P., Serohijos, A.W.R. & Shakhnovich, E.I. Avoidance of Protein Unfolding Constrains Protein Stability in Long-Term Evolution. Biophysical Journal 120, 12, 2413–2424 (2021).
Every amino acid residue can influence a protein's overall stability, making stability highly susceptible to change throughout evolution. We consider the distribution of protein stabilities evolutionarily permittable under two previously reported protein fitness functions: flux dynamics and misfolding avoidance. We develop an evolutionary dynamics theory and find that it agrees better with an extensive protein stability data set for dihydrofolate reductase orthologs under the misfolding avoidance fitness function rather than the flux dynamics fitness function. Further investigation with ribonuclease H data demonstrates that not any misfolded state is avoided; rather, it is only the unfolded state. At the end, we discuss how our work pertains to the universal protein abundance-evolutionary rate correlation seen across organisms' proteomes. We derive a closed-form expression relating protein abundance to evolutionary rate that captures Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens experimental trends without fitted parameters.
Zhang, Y., Chowdhury, S., Rodrigues, J.V. & Shakhnovich, E. Development of Antibacterial Compounds That Constrain Evolutionary Pathways to Resistance. eLife 10, e64518 (2021).
Antibiotic resistance is a worldwide challenge. A potential approach to block resistance is to simultaneously inhibit WT and known escape variants of the target bacterial protein. Here, we applied an integrated computational and experimental approach to discover compounds that inhibit both WT and trimethoprim (TMP) resistant mutants of E. coli dihydrofolate reductase (DHFR). We identified a novel compound (CD15-3) that inhibits WT DHFR and its TMP-resistant variants L28R, P21L and A26T with IC50 of 50–75 µM against WT and TMP-resistant strains. Resistance to CD15-3 was dramatically delayed compared to TMP in in vitro evolution. Whole genome sequencing of CD15-3-resistant strains showed no mutations in the target folA locus. Rather, gene duplication of several efflux pumps gave rise to weak (about twofold increase in IC50) resistance against CD15-3. Altogether, our results demonstrate the promise of a strategy to develop evolution drugs: compounds which constrain evolutionary escape routes in pathogens.
Zhao, V.Y., Rodrigues, J.V., Lozovsky, E.R., Hartl, D.L. & Shakhnovich, E.I. Switching an Active Site Helix in Dihydrofolate Reductase Reveals Limits to Subdomain Modularity. Biophysical Journal (2021).
To what degree are individual structural elements within proteins modular such that similar structures from unrelated proteins can be interchanged? We study subdomain modularity by creating 20 chimeras of an enzyme, Escherichia coli dihydrofolate reductase (DHFR), in which a catalytically important, 10-residue α-helical sequence is replaced by α-helical sequences from a diverse set of proteins. The chimeras stably fold but have a range of diminished thermal stabilities and catalytic activities. Evolutionary coupling analysis indicates that the residues of this α-helix are under selection pressure to maintain catalytic activity in DHFR. Reversion to phenylalanine at key position 31 was found to partially restore catalytic activity, which could be explained by evolutionary coupling values. We performed molecular dynamics simulations using replica exchange with solute tempering. Chimeras with low catalytic activity exhibit nonhelical conformations that block the binding site and disrupt the positioning of the catalytically essential residue D27. Simulation observables and in vitro measurements of thermal stability and substrate-binding affinity are strongly correlated. Several E. coli strains with chromosomally integrated chimeric DHFRs can grow, with growth rates that follow predictions from a kinetic flux model that depends on the intracellular abundance and catalytic activity of DHFR. Our findings show that although α-helices are not universally substitutable, the molecular and fitness effects of modular segments can be predicted by the biophysical compatibility of the replacement segment.
Graff, D.E., Shakhnovich, E.I. & Coley, C.W. Accelerating High-Throughput Virtual Screening through Molecular Pool-Based Active Learning. Chemical Science 12, 22, 7866–7881 (2021).
Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of 10^8 molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques, previously employed in other scientific discovery problems, can aid in their exploration: a surrogate structure–property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we explore the application of these techniques to computational docking datasets and assess the impact of surrogate model architecture, acquisition function, and acquisition batch size on optimization performance. We observe significant reductions in computational costs; for example, using a directed-message passing neural network we can identify 94.8% or 89.3% of the top-50 000 ligands in a 100M member library after testing only 2.4% of candidate ligands using an upper confidence bound or greedy acquisition strategy, respectively. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
Bhattacharyya, S., Bershtein, S., Adkar, B.V., Woodard, J.C. & Shakhnovich, E.I. Metabolic Response to Point Mutations Reveals Principles of Modulation of in Vivo Enzyme Activity and Phenotype. Molecular Systems Biology 17, 6, e10200 (2021).
The relationship between sequence variation and phenotype is poorly understood. Here, we use metabolomic analysis to elucidate the molecular mechanism underlying the filamentous phenotype of E. coli strains that carry destabilizing mutations in dihydrofolate reductase (DHFR). We find that partial loss of DHFR activity causes reversible filamentation despite SOS response indicative of DNA damage, in contrast to thymineless death (TLD) achieved by complete inhibition of DHFR activity by high concentrations of antibiotic trimethoprim. This phenotype is triggered by a disproportionate drop in intracellular dTTP, which could not be explained by drop in dTMP based on the Michaelis–Menten-like in vitro activity curve of thymidylate kinase (Tmk), a downstream enzyme that phosphorylates dTMP to dTDP. Instead, we show that a highly cooperative (Hill coefficient 2.5) in vivo activity of Tmk is the cause of suboptimal dTTP levels. dTMP supplementation rescues filamentation and restores in vivo Tmk kinetics to Michaelis–Menten. Overall, this study highlights the important role of cellular environment in sculpting enzymatic kinetics with system-level implications for bacterial phenotype.
Ranganathan, S. & Shakhnovich, E. Effect of RNA on Morphology and Dynamics of Membraneless Organelles. The Journal of Physical Chemistry B 125, 19, 5035–5044 (2021).
Membraneless organelles (MLOs) are spatiotemporally regulated structures that concentrate multivalent proteins or RNA, often in response to stress. The proteins enriched within MLOs are often classified as high-valency “scaffolds” or low-valency “clients”, with the former being associated with a phase-separation promoting role. In this study, we employ a minimal model for P-body components, with a defined protein–protein interaction network, to study their phase separation at biologically realistic low protein concentrations. Without RNA, multivalent proteins can assemble into solid-like clusters only in the regime of high concentration and stable interactions. RNA molecules promote cluster formation in an RNA-length-dependent manner, even in the regime of weak interactions and low protein volume fraction. Our simulations reveal that long RNA chains act as superscaffolds that stabilize large RNA–protein clusters by recruiting low-valency proteins within them while also ensuring functional “liquid-like” turnover of components. Our results suggest that RNA-mediated phase separation could be a plausible mechanism for spatiotemporally regulated phase separation in the cell.
2020
Bitran, A., Jacobs, W.M. & Shakhnovich, E. Validation of DBFOLD: An Efficient Algorithm for Computing Folding Pathways of Complex Proteins. PLOS Computational Biology 16, 11, e1008323 (2020).
Atomistic simulations can provide valuable, experimentally-verifiable insights into protein folding mechanisms, but existing ab initio simulation methods are restricted to only the smallest proteins due to severe computational speed limits. The folding of larger proteins has been studied using native-centric potential functions, but such models omit the potentially crucial role of non-native interactions. Here, we present an algorithm, entitled DBFOLD, which can predict folding pathways for a wide range of proteins while accounting for the effects of non-native contacts. In addition, DBFOLD can predict the relative rates of different transitions within a protein's folding pathway. To accomplish this, rather than directly simulating folding, our method combines equilibrium Monte-Carlo simulations, which deploy enhanced sampling, with unfolding simulations at high temperatures. We show that under certain conditions, trajectories from these two types of simulations can be jointly analyzed to compute unknown folding rates from detailed balance. This requires inferring free energies from the equilibrium simulations, and extrapolating transition rates from the unfolding simulations to lower, physiologically-reasonable temperatures at which the native state is marginally stable. As a proof of principle, we show that our method can accurately predict folding pathways and Monte-Carlo rates for the well-characterized Streptococcal protein G. We then show that our method significantly reduces the amount of computation time required to compute the folding pathways of large, misfolding-prone proteins that lie beyond the reach of existing direct simulation. Our algorithm, which is available online, can generate detailed atomistic models of protein folding mechanisms while shedding light on the role of non-native intermediates which may crucially affect organismal fitness and are frequently implicated in disease.
Zhao, V., Jacobs, W.M. & Shakhnovich, E.I. Effect of Protein Structure on Evolution of Cotranslational Folding. Biophysical Journal 119, 6, 1123–1134 (2020).
Ranganathan, S. & Shakhnovich, E.I. Dynamic metastable long-living droplets formed by sticker-spacer proteins. eLife 9, e56159 (2020).
Multivalent biopolymers phase separate into membrane-less organelles (MLOs) which exhibit liquid-like behavior. Here, we explore formation of prototypical MLOs from multivalent proteins on various time and length scales and show that the kinetically arrested metastable multi-droplet state is a dynamic outcome of the interplay between two competing processes: a diffusion-limited encounter between proteins, and the exhaustion of available valencies within smaller clusters. Clusters with satisfied valencies cannot coalesce readily, resulting in metastable, long-living droplets. In the regime of dense clusters akin to phase-separation, we observe co-existing assemblies, in contrast to the single, large equilibrium-like cluster. A system-spanning network encompassing all multivalent proteins was only observed at high concentrations and large interaction valencies. In the regime favoring large clusters, we observe a slow-down in the dynamics of the condensed phase, potentially resulting in loss of function. Therefore, metastability could be a hallmark of dynamic functional droplets formed by sticker-spacer proteins.
Razban, R.M. & Shakhnovich, E.I. Effects of single mutations on protein stability are Gaussian distributed. Biophysical Journal (2020).
The distribution of protein stability effects is known to be well approximated by a Gaussian distribution from previous empirical fits. Starting from first-principles statistical mechanics, we more rigorously motivate this empirical observation by deriving per-residue-position protein stability effects to be Gaussian. Our derivation requires the number of amino acids to be large, which is satisfied by the standard set of 20 amino acids found in nature. No assumption is needed on the number of residues in close proximity in space, in contrast to previous applications of the central limit theorem to protein energetics. We support our derivation results with computational and experimental data on mutant protein stabilities across all types of protein residues.
Bitran, A., Jacobs, W.M., Zhai, X. & Shakhnovich, E. Cotranslational folding allows misfolding-prone proteins to circumvent deep kinetic traps. Proceedings of the National Academy of Sciences (2020).
Many proteins must adopt a specific structure to perform their functions, and failure to do so has been linked to disease. Although small proteins often fold rapidly and spontaneously to their native conformations, larger proteins are less likely to fold correctly due to the myriad incorrect arrangements they can adopt. Here, we provide mechanistic insights into how this problem can be alleviated if proteins start folding while they are being translated by the ribosome. This process of cotranslational folding biases certain proteins away from misfolded states that tend to hinder spontaneous refolding. Signatures of unusually slow translation suggest that some of these proteins have evolved to fold cotranslationally.
Many large proteins suffer from slow or inefficient folding in vitro. It has long been known that this problem can be alleviated in vivo if proteins start folding cotranslationally. However, the molecular mechanisms underlying this improvement have not been well established. To address this question, we use an all-atom simulation-based algorithm to compute the folding properties of various large protein domains as a function of nascent chain length. We find that for certain proteins, there exists a narrow window of lengths that confers both thermodynamic stability and fast folding kinetics. Beyond these lengths, folding is drastically slowed by nonnative interactions involving C-terminal residues. Thus, cotranslational folding is predicted to be beneficial because it allows proteins to take advantage of this optimal window of lengths and thus avoid kinetic traps. Interestingly, many of these proteins’ sequences contain conserved rare codons that may slow down synthesis at this optimal window, suggesting that synthesis rates may be evolutionarily tuned to optimize folding. Using kinetic modeling, we show that under certain conditions, such a slowdown indeed improves cotranslational folding efficiency by giving these nascent chains more time to fold. In contrast, other proteins are predicted not to benefit from cotranslational folding due to a lack of significant nonnative interactions, and indeed these proteins’ sequences lack conserved C-terminal rare codons. Together, these results shed light on the factors that promote proper protein folding in the cell and how biomolecular self-assembly may be optimized evolutionarily.
Chen, Q., Xiao, Y., Shakhnovich, E.I., Zhang, W. & Mu, W. Semi-rational design and molecular dynamics simulations study of the thermostability enhancement of cellobiose 2-epimerases. International Journal of Biological Macromolecules (2020).
Directed evolution using random mutation in vast sequence space leads to the low probability of obtaining target proteins. Emerging engineering strategies with computational tools are developed for more trustable outcomes. We used some semi-rational design methods to modify an industrial enzyme, namely cellobiose 2-epimerase (CE). A mutant was selected for its better thermostability and isomerization activity. The tradeoffs between thermostability, epimerization activity and isomerization activity of the CE mutants were different. To investigate the computational prediction performance of protein stability upon point mutations, molecular dynamics (MD) simulation analyses were conducted. The root mean square deviation (RMSD) and hydrogen bond analyses reproduced the correct trends in stability changes of the wild-type and mutated CEs with relatively high accuracy (correlation coefficients r ~ 0.5–0.8). The simulation temperature and time are important factors that influence the prediction performance. Our result shows that thermostability predictors calculated from MD simulation do better in predicting the thermostability changes of the mutated enzymes than the predictors using static-state information of the enzymes.
2019
Zhou, Q., et al. Common Activation Mechanism of Class A GPCRs. eLife 8, e50279 (2019).
Class A G-protein-coupled receptors (GPCRs) influence virtually every aspect of human physiology. Understanding the receptor activation mechanism is critical for discovering novel therapeutics since about one-third of all marketed drugs target members of this family. GPCR activation is an allosteric process that couples agonist binding to G-protein recruitment, with the hallmark outward movement of transmembrane helix 6 (TM6). However, what leads to TM6 movement and the key residue-level changes of this movement remain less well understood. Here, we report a framework to quantify conformational changes. By analyzing the conformational changes in 234 structures from 45 class A GPCRs, we discovered a common GPCR activation pathway comprising 34 residue pairs and 35 residues. The pathway unifies previous findings into a common activation mechanism and strings together the scattered key motifs such as CWxP, DRY, Na+ pocket, NPxxY and PIF, thereby directly linking the bottom of the ligand-binding pocket with the G-protein coupling region. Site-directed mutagenesis experiments support this proposition and reveal that rational mutations of residues in this pathway can be used to obtain receptors that are constitutively active or inactive. The common activation pathway provides the mechanistic interpretation of constitutively activating, inactivating and disease mutations. As a module responsible for activation, the common pathway allows for decoupling of the evolution of the ligand binding site and G-protein-binding region. Such an architecture might have facilitated GPCRs to emerge as a highly successful family of proteins for signal transduction in nature.
Razban, R.M. Protein melting temperature cannot fully assess whether protein folding free energy underlies the universal abundance–evolutionary rate correlation seen in proteins. Molecular Biology and Evolution (2019).
The protein misfolding avoidance hypothesis explains the universal negative correlation between protein abundance and sequence evolutionary rate across the proteome by identifying protein folding free energy (ΔG) as the confounding variable. Abundant proteins resist toxic misfolding events by being more stable, and more stable proteins evolve slower because their mutations are more destabilizing. Direct supporting evidence consists only of computer simulations. A study taking advantage of a recent experimental breakthrough in measuring protein stability proteome-wide through melting temperature (Tm) (Leuenberger et al. 2017), found weak misfolding avoidance hypothesis support for the Escherichia coli proteome, and no support for the Saccharomyces cerevisiae, Homo sapiens, and Thermus thermophilus proteomes (Plata and Vitkup 2018). I find that the nontrivial relationship between Tm and ΔG and inaccuracy in Tm measurements by Leuenberger et al. 2017 can be responsible for not observing strong positive abundance–Tm and strong negative Tm–evolutionary rate correlations.
Rodrigues, J.V. & Shakhnovich, E.I. Adaptation to mutational inactivation of an essential gene converges to an accessible suboptimal fitness peak. eLife 8, (2019).
The mechanisms of adaptation to inactivation of essential genes remain unknown. Here we inactivate E. coli dihydrofolate reductase (DHFR) by introducing D27G,N,F chromosomal mutations in a key catalytic residue with subsequent adaptation by an automated serial transfer protocol. The partial reversal G27→C occurred in three evolutionary trajectories. Conversely, in one trajectory for D27G and in all trajectories for D27F,N strains adapted to grow at very low metabolic supplement (folAmix) concentrations but did not escape entirely from supplement auxotrophy. Major global shifts in metabolome and proteome occurred upon DHFR inactivation, which were partially reversed in adapted strains. Loss-of-function mutations in two genes, thyA and deoB, ensured adaptation to low folAmix by rerouting the 2-Deoxy-D-ribose-phosphate metabolism from glycolysis towards synthesis of dTMP. Multiple evolutionary pathways of adaptation converged to a suboptimal solution due to the high accessibility to loss-of-function mutations that block the path to the highest, yet least accessible, fitness peak.
Rodrigues, J.V., Ogbunugafor, C.B., Hartl, D.L. & Shakhnovich, E.I. Chimeric dihydrofolate reductases display properties of modularity and biophysical diversity. Protein Science (2019).
While reverse genetics and functional genomics have long affirmed the role of individual mutations in determining protein function, there have been fewer studies addressing how large‐scale changes in protein sequences, such as in entire modular segments, influence protein function and evolution. Given how recombination can reassort protein sequences, these types of changes may play an underappreciated role in how novel protein functions evolve in nature. Such studies could aid our understanding of whether certain organismal phenotypes related to protein function—such as growth in the presence or absence of an antibiotic—are robust with respect to the identity of certain modular segments. In this study, we combine molecular genetics with biochemical and biophysical methods to gain a better understanding of protein modularity in dihydrofolate reductase (DHFR), an enzyme target of antibiotics also widely used as a model for protein evolution. We replace an integral α‐helical segment of Escherichia coli DHFR with segments from a number of different organisms (many nonmicrobial) and examine how these chimeric enzymes affect organismal phenotypes (e.g., resistance to an antibiotic) as well as biophysical properties of the enzyme (e.g., thermostability). We find that organismal phenotypes and enzyme properties are highly sensitive to the identity of DHFR modules, and that this chimeric approach can create enzymes with diverse biophysical characteristics.
Loureiro, R.J.S., Vila-Viçosa, D., Machuqueiro, M., Shakhnovich, E.I. & Faísca, P.F.N. The Early Phase of β2m Aggregation: An Integrative Computational Study Framed on the D76N Mutant and the ΔN6 Variant. Biomolecules 9, 8, 366 (2019).
Human β2-microglobulin (β2m) protein is classically associated with dialysis-related amyloidosis (DRA). Recently, the single point mutant D76N was identified as the causative agent of a hereditary systemic amyloidosis affecting visceral organs. To get insight into the early stage of the β2m aggregation mechanism, we used molecular simulations to perform an in-depth comparative analysis of the dimerization phase of the D76N mutant and the ΔN6 variant, a cleaved form lacking the first six N-terminal residues, which is a major component of ex vivo amyloid plaques from DRA patients. We also provide first glimpses into the tetramerization phase of D76N at physiological pH. Results from extensive protein–protein docking simulations predict an essential role of the C- and N-terminal regions (both variants), as well as of the BC-loop (ΔN6 variant), DE-loop (both variants) and EF-loop (D76N mutant) in dimerization. The terminal regions are more relevant under acidic conditions while the BC-, DE- and EF-loops gain importance at physiological pH. Our results recapitulate experimental evidence according to which Tyr10 (A-strand), Phe30 and His31 (BC-loop), Trp60 and Phe62 (DE-loop) and Arg97 (C-terminus) act as dimerization hot-spots, and further predict the occurrence of novel residues with the ability to nucleate dimerization, namely Lys-75 (EF-loop) and Trp-95 (C-terminus). We propose that D76N tetramerization is mainly driven by the self-association of dimers via the N-terminus and DE-loop, and identify Arg3 (N-terminus), Tyr10, Phe56 (D-strand) and Trp60 as potential tetramerization hot-spots.
