How do living cells achieve sufficient abundances of functional protein complexes while minimizing promiscuous nonfunctional interactions? Here we study this problem using a first-principle model of the cell whose phenotypic traits are directly determined from its genome through biophysical properties of protein structures and binding interactions in a crowded cellular environment. The model cell includes three independent prototypical pathways, whose topologies of protein–protein interaction (PPI) subnetworks are different, but whose contributions to the cell fitness are equal. Model cells evolve through genotypic mutations and phenotypic protein copy number variations. We found a strong relationship between evolved physical–chemical properties of protein interactions and their abundances due to a “frustration” effect: Strengthening of functional interactions brings about hydrophobic interfaces, which make proteins prone to promiscuous binding. The balancing act is achieved by lowering concentrations of hub proteins while raising solubilities and abundances of functional monomers. On the basis of these principles we generated and analyzed a possible realization of the proteome-wide PPI network in yeast. In this simulation we found that high-throughput affinity capture–mass spectroscopy experiments can detect functional interactions with high fidelity only for high-abundance proteins while missing most interactions for low-abundance proteins.
Fitness effects of mutations fall on a continuum ranging from lethal to deleterious to beneficial. The distribution of fitness effects (DFE) among random mutations is an essential component of every evolutionary model and a mathematical portrait of robustness. Recent experiments on five viral species all revealed a characteristic bimodal-shaped DFE featuring peaks at neutrality and lethality. However, the phenotypic causes underlying observed fitness effects are still unknown and presumably, are thought to vary unpredictably from one mutation to another. By combining population genetics simulations with a simple biophysical protein folding model, we show that protein thermodynamic stability accounts for a large fraction of observed mutational effects. We assume that moderately destabilizing mutations inflict a fitness penalty proportional to the reduction in folded protein, which depends continuously on folding free energy (ΔG). Most mutations in our model affect fitness by altering ΔG, whereas based on simple estimates, ∼10% abolish activity and are unconditionally lethal. Mutations pushing ΔG > 0 are also considered lethal. Contrary to neutral network theory, we find that, in mutation/selection/drift steady state, high mutation rates (m) lead to less stable proteins and a more dispersed DFE (i.e., less mutational robustness). Small population size (N) also decreases stability and robustness. In our model, a continuum of nonlethal mutations reduces fitness by ∼2% on average, whereas ∼10–35% of mutations are lethal depending on N and m. Compensatory mutations are common in small populations with high mutation rates. More broadly, we conclude that interplay between biophysical and population genetic forces shapes the DFE.
Mutators are clones whose mutation rate is about two to three orders of magnitude higher than the rate of wild-type clones and their roles in adaptive evolution of asexual populations have been controversial. Here we address this problem by using an ab initio microscopic model of living cells, which combines population genetics with a physically realistic presentation of protein stability and protein-protein interactions. The genome of model organisms encodes replication controlling genes (RCGs) and genes modeling the mismatch repair (MMR) complexes. The genotype-phenotype relationship posits that the replication rate of an organism is proportional to protein copy numbers of RCGs in their functional form and there is a production cost penalty for protein overexpression. The mutation rate depends linearly on the concentration of homodimers of MMR proteins. By simulating multiple runs of evolution of populations under various environmental stresses—stationary phase, starvation or temperature-jump—we find that adaptation most often occurs through transient fixation of a mutator phenotype, regardless of the nature of stress. By contrast, the fixation mechanism does depend on the nature of stress. In temperature jump stress, mutators take over the population due to loss of stability of MMR complexes. In contrast, in starvation and stationary phase stresses, a small number of mutators are supplied to the population via epigenetic stochastic noise in production of MMR proteins (a pleiotropic effect), and their net supply is higher due to reduced genetic drift in slowly growing populations under stressful environments. Subsequently, mutators in stationary phase or starvation hitchhike to fixation with a beneficial mutation in the RCGs, (second order selection) and finally a mutation stabilizing the MMR complex arrives, returning the population to a non-mutator phenotype. Our results provide microscopic insights into the rise and fall of mutators in adapting finite asexual populations.
The population dynamics theory of B cells in a typical germinal center could play an important role in revealing how affinity maturation is achieved. However, the existing models encountered some conflicts with experiments. To resolve these conflicts, we present a coarse-grained model to calculate the B cell population development in affinity maturation, which allows a comprehensive analysis of its parameter space to look for optimal values of mutation rate, selection strength, and initial antibody-antigen binding level that maximize the affinity improvement. With these optimized parameters, the model is compatible with the experimental observations such as the ~100-fold affinity improvements, the number of mutations, the hypermutation rate, and the “all or none” phenomenon. Moreover, we study the reasons behind the optimal parameters. The optimal mutation rate, in agreement with the hypermutation rate in vivo, results from a tradeoff between accumulating enough beneficial mutations and avoiding too many deleterious or lethal mutations. The optimal selection strength evolves as a balance between the need for affinity improvement and the requirement to pass the population bottleneck. These findings point to the conclusion that germinal centers have been optimized by evolution to generate strong affinity antibodies effectively and rapidly. In addition, we study the enhancement of affinity improvement due to B cell migration between germinal centers. These results could enhance our understanding of the functions of germinal centers.
Due to regulatory feedback, biological networks can exist stably in multiple states, leading to heterogeneous phenotypes among genetically identical cells. Random fluctuations in protein numbers, tuned by specific molecular mechanisms, have been hypothesized to drive transitions between these different states. We develop a minimal theoretical framework to analyze the limits of switching in terms of simple experimental parameters. Our model identifies and distinguishes between two distinct molecular mechanisms for generating stochastic switches. In one class of switches, the stochasticity of a single-molecule event, a specific and rare molecular reaction, directly controls the macroscopic change in a cell's state. In the second class, no individual molecular event is significant, and stochasticity arises from the propagation of biochemical noise through many molecular pathways and steps. As an example, we explore switches based on protein-DNA binding fluctuations and predict relations between transcription factor kinetics, absolute switching rate, robustness, and efficiency that differentiate between switching by single-molecule events or many molecular steps. Finally, we apply our methods to recent experimental data on switching in Escherichia coli lactose metabolism, providing quantitative interpretations of a single-molecule switching mechanism.
Systematic Monte Carlo simulations of simple lattice models show that the final stage of protein folding is an ordered process where native contacts get locked (i.e., the residues come into contact and remain in contact for the duration of the folding process) in a well-defined order. The detailed study of the folding dynamics of protein-like sequences designed as to exhibit different contact energy distributions, as well as different degrees of sequence optimization (i.e., participation of non-native interactions in the folding process), reveals significant differences in the corresponding locking scenarios—the collection of native contacts and their average locking times, which are largely ascribable to the dynamics of non-native contacts. Furthermore, strong evidence for a positive role played by non-native contacts at an early folding stage was also found. Interestingly, for topologically simple target structures, a positive interplay between native and non-native contacts is observed also toward the end of the folding process, suggesting that non-native contacts may indeed affect the overall folding process. For target models exhibiting clear two-state kinetics, the relation between the nucleation mechanism of folding and the locking scenario is investigated. Our results suggest that the stabilization of the folding transition state can be achieved through the establishment of a very small network of native contacts that are the first to lock during the folding process.
A previously established multiscale population genetics model posits that fitness can be inferred from the physical properties of proteins under the physiological assumption that a loss of stability by any protein confers the lethal phenotype to an organism. Here, we develop this model further by positing that replication rate (fitness) of a bacterial or viral strain directly depends on the copy number of folded proteins, which determine its replication rate. Using this model, and both numerical and analytical approaches, we studied the adaptation process of bacteria and viruses at varied environmental temperatures. We found that a broad distribution of protein stabilities observed in the model and in experiment is the key determinant of thermal response for viruses and bacteria. Our results explain most of the earlier experimental observations: the striking asymmetry of thermal response curves; the absence of evolutionary tradeoff, which was expected but not found in experiments; correlation between denaturation temperature for several protein families and the optimal growth temperature of their carrier organisms; and proximity of bacterial or viral optimal growth temperatures to their evolutionary temperatures. Our theory quantitatively and with high accuracy described thermal response curves for 35 bacterial species using, for each species, only two adjustable parameters—the number of rate-determining genes and the energy barrier for metabolic reactions.
MPORTANCE OF THE FIELD: De novo drug design serves as a tool for the discovery of new ligands for macromolecular targets as well as optimization of known ligands. Recently developed tools aim to address the multi-objective nature of drug design in an unprecedented manner.
AREAS COVERED IN THIS REVIEW: This article discusses recent advances in de novo drug design programs and accessory programs used to evaluate compounds post-generation.
WHAT THE READER WILL GAIN: The reader is introduced to the challenges inherent in de novo drug design and will become familiar with current trends in de novo design. Furthermore, the reader will be better prepared to assess the value of a tool, and be equipped to design more elegant tools in the future.
TAKE HOME MESSAGE: De novo drug design can assist in the efficient discovery of new compounds with a high affinity for a given target. The inclusion of existing chemoinformatic methods with current structure-based de novo design tools provides a means of enhancing the therapeutic value of these generated compounds
The specificity-determining residue database (SDR database) presents residue positions where mutations are predicted to have changed protein function in large protein families. Because the database pre-calculates predictions on existing protein sequence alignments, users can quickly find the predictions by selecting the appropriate protein family or searching by protein sequence. Predictions can be used to guide mutagenesis or to gain a better understanding of specificity changes in a protein family. The database is available on the web at http://paradox.harvard.edu/sdr.
We study the population dynamics of lytic viruses which replicate slowly in dividing host cells within an organism or cell culture, and find a range of viral replication rates that allows viruses to persist, avoiding extinction of host cells or dilution of viruses at too rapid or too slow viral replication. For the within-host competition between viral strains with different replication rates, a strain with a "stable” replication rate in the persistence range could outcompete another strain. However, when strains with higher and lower than the stable value replication rates are both present, competition between strains does not result in the dominance of one strain, but in their coexistence.
Recent work has shown that the incorporation of an all-hydrocarbon ?staple? into peptides can greatly increase their α-helix propensity, leading to an improvement in pharmaceutical properties such as proteolytic stability, receptor affinity, and cell permeability. Stapled peptides thus show promise as a new class of drugs capable of accessing intractable targets such as those that engage in intracellular protein?protein interactions. The extent of α-helix stabilization provided by stapling has proven to be substantially context dependent, requiring cumbersome screening to identify the optimal site for staple incorporation. In certain cases, a staple encompassing one turn of the helix (attached at residues i and i+4) furnishes greater helix stabilization than one encompassing two turns (i,i+7 staple), which runs counter to expectation based on polymer theory. These findings highlight the need for a more thorough understanding of the forces that underlie helix stabilization by hydrocarbon staples. Here we report all-atom Monte Carlo folding simulations comparing unmodified peptides derived from RNase A and BID BH3 with various i,i+4 and i,i+7 stapled versions thereof. The results of these simulations were found to be in quantitative agreement with experimentally determined helix propensities. We also discovered that staples can stabilize quasi-stable decoy conformations, and that the removal of these states plays a major role in determining the helix stability of stapled peptides. Finally, we critically investigate why our method works, exposing the underlying physical forces that stabilize stapled peptides.
We show how single-molecule unzipping experiments can provide strong evidence that the zero-force melting transition of long molecules of natural dsDNA should be classified as a phase transition of the higher-order type (continuous). Toward this end, we study a statistical-mechanics model for the fluctuating structure of a long molecule of dsDNA, and compute the equilibrium phase diagram for the experiment in which the molecule is unzipped under applied force. We consider a perfect-matching dsDNA model, in which the loops are volume-excluding chains with arbitrary loop exponent c. We include stacking interactions, hydrogen bonds, and main-chain entropy. We include sequence heterogeneity at the level of random sequences; in particular, there is no correlation in the base-pairing (bp) energy from one sequence position to the next. We present heuristic arguments to demonstrate that the low-temperature macrostate does not exhibit degenerate ergodicity breaking. We use this claim to understand the results of our replica-theoretic calculation of the equilibrium properties of the system. As a function of temperature, we obtain the minimal force at which the molecule separates completely. This critical-force curve is a line in the temperature-force phase diagram that marks the regions where the molecule exists primarily as a double helix versus the region where the molecule exists as two separate strands. We compare our random-sequence model to magnetic tweezer experiments performed on the 48 502 bp genome of bacteriophage λ. We find good agreement with the experimental data, which is restricted to temperatures between 24 and 50 °C. At higher temperatures, the critical-force curve of our random-sequence model is very different for that of the homogeneous-sequence version of our model. For both sequence models, the critical force falls to zero at the melting temperature Tc like |T−Tc|α. For the homogeneous-sequence model, α=1/2 almost exactly, while for the random-sequence model, α≈0.9. Importantly, the shape of the critical-force curve is connected, via our theory, to the manner in which the helix fraction falls to zero at Tc. The helix fraction is the property that is used to classify the melting transition as a type of phase transition. In our calculation, the shape of the critical-force curve holds strong evidence that the zero-force melting transition of long natural dsDNA should be classified as a higher-order (continuous) phase transition. Specifically, the order is 3rd or greater.
Which factors govern the evolution of mutation rates and emergence of species? Here, we address this question by using a first principles model of life where population dynamics of asexual organisms is coupled to molecular properties and interactions of proteins encoded in their genomes. Simulating evolution of populations, we found that fitness increases in punctuated steps via epistatic events, leading to formation of stable and functionally interacting proteins. At low mutation rates, species form populations of organisms tightly localized in sequence space, whereas at higher mutation rates, species are lost without an apparent loss of fitness. However, when mutation rate was a selectable trait, the population initially maintained high mutation rate until a high fitness level was reached, after which organisms with low mutation rates are gradually selected, with the population eventually reaching mutation rates comparable with those of modern DNA-based organisms. This study shows that the fitness landscape of a biophysically realistic system is extremely complex, with huge number of local peaks rendering adaptation dynamics to be a glass-like process. On a more practical level, our results provide a rationale to experimental observations of the effect of mutation rate on fitness of populations of asexual organisms.
T lymphocytes (T cells) orchestrate adaptive immune responses upon activation. T-cell activation requires sufficiently strong binding of T-cell receptors on their surface to short peptides (p) derived from foreign proteins, which are bound to major histocompatibility gene products (displayed on antigen-presenting cells). A diverse and self-tolerant T-cell repertoire is selected in the thymus. We map thymic selection processes to an extreme value problem and provide an analytic expression for the amino acid compositions of selected T-cell receptors (which enable its recognition functions).
In this work we study how mutations that change physical properties of cell proteins (stability) affect population survival and growth. We present a model in which the genotype is presented as a set folding free energies of cell proteins. Mutations occur upon replication, so stabilities of some proteins in daughter cells differ from those in the parent cell by amounts deduced from the distribution of mutational effects on protein stability. The genotype–phenotype relationship posits that the cell's fitness (replication rate) is proportional to the concentration of its folded proteins and that unstable essential proteins result in lethality. Simulations reveal that lethal mutagenesis occurs at a mutation rate close to seven mutations in each replication of the genome for RNA viruses and at about half that rate for DNA-based organisms, in accord with earlier predictions from analytical theory and experimental results. This number appears somewhat dependent on the number of genes in the organisms and the organism's natural death rate. Further, our model reproduces the distribution of stabilities of natural proteins, in excellent agreement with experiments. We find that species with high mutation rates tend to have less stable proteins compared to species with low mutation rates.
An essential feature of all practical de novo molecule generating programs is the ability to focus the potential combinatorial explosion of grown molecules on a desired chemical space. It is a daunting task to balance the generation of new molecules with limitations on growth that produce desired features such as stability in water, synthetic accessibility, or druglikeness. We have developed an algorithm, Fragment Optimized Growth (FOG), which statistically biases the growth of molecules with desired features. At the heart of the algorithm is a Markov Chain which adds fragments to the nascent molecule in a biased manner, depending on the frequency of specific fragment-fragment connections in the database of chemicals it was trained on. We show that in addition to generating synthetically feasible molecules, it can be trained to grow new molecules that resemble desired classes of molecules such as drugs, natural products, and diversity-oriented synthetic products. In order to classify our grown molecules, we developed the Topology Classifier (TopClass) algorithm that is capable of classifying compounds, for example as drugs or nondrugs. The classification accuracies obtained with TopClass compare favorably with the literature. Furthermore, in contrast to “black-box” approaches such as Neural Networks, TopClass brings to light characteristics of drugs that distinguish them from nondrugs.
Although far from routine, simulating the folding of specific short protein chains on the computer, at a detailed atomic level, is starting to become a reality. This remarkable progress, which has been made over the last decade or so, allows a fundamental aspect of the protein folding process to be addressed, namely its statistical nature. In order to make quantitative comparisons with experimental kinetic data a complete ensemble view of folding must be achieved, with key observables averaged over the large number of microscopically different folding trajectories available to a protein chain. Here we review recent advances in atomic-level protein folding simulations and the new insight provided by them into the protein folding process. An important element in understanding ensemble folding kinetics are methods for analyzing many separate folding trajectories, and we discuss techniques developed to condense the large amount of information contained in an ensemble of trajectories into a manageable picture of the folding process.
The scale free structure p(k) ~ k−γ of protein–protein interaction networks can be reproduced by a static physical model in simulation. We inspect the model theoretically, and find the key reason for the model generating apparent scale free degree distributions. This explanation provides a generic mechanism of 'scale free' networks. Moreover, we predict the dependence of γ on experimental protein concentrations or other sensitivity factors in detecting interactions, and find experimental evidence to support the prediction.
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales — from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.