We report a detailed all-atom simulation of the folding of the GCAA RNA tetraloop. The GCAA tetraloop motif is a very common and thermodynamically stable secondary structure in natural RNAs. We use our simulation methods to study the folding behavior of a 12-base GCAA tetraloop structure with a four-base helix adjacent to the tetraloop proper. We implement an all-atom Monte Carlo (MC) simulation of RNA structural dynamics using a Go potential. Molecular dynamics (MD) simulation of RNA and protein has realistic energetics and sterics, but is extremely expensive in terms of computational time. By treating non-covalent energetics coarsely while retaining all-atom sterics and entropic effects, all-atom MC techniques provide a useful method for the study of proteins and now RNA. We observe a sharp folding transition for this structure, and in simulations at room temperature the state histogram shows three distinct minima: an unfolded state (U), a narrower intermediate state (I), and a narrow folded state (F). The intermediate consists primarily of structures with the GCAA loop and some helix hydrogen bonds formed. Repeated kinetic folding simulations reveal that the number of helix base pairs forms a simple 1D reaction coordinate for the I→F transition.
Computational studies of protein folding have implicitly assumed that folding occurs from a denatured state comprising the entire protein. Cotranslational folding accounts for the linear production and release of a protein from the ribosome, allowing part of the protein to explore its conformation space before other parts have been synthesized. In a lattice folding model, this gradual ‘extrusion’ from the ribosome can yield folding kinetics different from those of direct folding from the denatured state. First, in model proteins containing chiefly short-ranged (local in sequence) contacts, cotranslational folding is shown to be significantly faster than direct folding from the denatured state. Second, for model proteins with two competing native states, cotranslational folding tilts the apparent equilibrium toward the state whose topology is more dominated by local contacts.
A flexible protein-peptide docking method has been designed to consider not only ligand flexibility but also the flexibility of the protein. The method is based on a Monte Carlo annealing process. Simulations with a distance root-mean-square (dRMS) virtual energy function revealed that the flexibility of protein side chains was as important as ligand flexibility for successful protein-peptide docking. On the basis of mean field theory, a transferable potential was designed to evaluate distance-dependent protein-ligand interactions and atomic solvation energies. The potential parameters were developed using a self-consistent process based on only 10 known complex structures. The effectiveness of each intermediate potential was judged on the basis of a Z score, approximating the gap between the energy of the native complex and the average energy of a decoy set. The Z score was determined using experimentally determined native structures and decoys generated by docking with the intermediate potentials. Using 6600 generated decoys and the Z score optimization criterion proposed in this work, the developed potential yielded an acceptable correlation of R² = 0.77 with binding free energies determined for known MHC I complexes (Class I Major Histocompatibility protein HLA-A*0201) which were not present in the training set. Test docking on 25 complexes further revealed a significant correlation between energy and dRMS, important for identifying native-like conformations. The near-native structures always belonged to one of the conformational classes with lower predicted binding energy. The lowest energy docked conformations are generally associated with near-native conformations, less than 3.0 Å dRMS (and in many cases less than 1.0 Å) from the experimentally determined structures.
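As a rough illustration of the Z score criterion mentioned in this abstract, the sketch below measures the gap between a native-complex energy and the mean decoy energy. Dividing by the decoy standard deviation is an assumption (a common convention, not confirmed by the abstract), and the function name and energy values are hypothetical.

```python
from statistics import mean, pstdev


def z_score(native_energy, decoy_energies):
    """Gap between the native energy and the mean decoy energy.

    Assumption: the gap is expressed in units of the population standard
    deviation of the decoy energies. More negative values mean the native
    complex is better separated from the decoy set.
    """
    if len(decoy_energies) < 2:
        raise ValueError("need at least two decoy energies")
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean(decoy_energies)) / sigma


# Hypothetical energies in arbitrary units (not data from the study).
decoys = [-2.0, -4.0, -6.0, -8.0]
z = z_score(-11.0, decoys)
```

Under this convention, a potential that is being tuned would be preferred when it makes the Z score more negative across the training complexes.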
The problem of functional annotation based on homology modeling is central to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods produce many false negatives due to limited specificity inherent in the system. We want to create an evolutionarily inspired organization of data that would approach the issue of structure-function correlation from a new, probabilistic perspective. Such an organization has possible applications in phylogeny, modeling of functional evolution and structural determination. ELISA (Evolutionary Lineage Inferred from Structural Analysis, http://romi.bu.edu/elisa) is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function
Understanding the evolutionary origins of protein structures is a key component of understanding molecular evolution as a whole. Here we seek to elucidate how the features of an underlying protein structural “space” might impact protein structural evolution. We approach this question using lattice polymers as a completely characterized model of this space. We develop a measure of structural comparison of lattice structures that is analogous to the one used to understand structural similarities between real proteins. We use this measure of structural relatedness to create a graph of lattice structures and compare this graph (in which nodes are lattice structures and edges are defined using structural similarity) to the graph obtained for real protein structures. We find that the graph obtained from all compact lattice structures exhibits a distribution of structural neighbors per node consistent with a random graph. We also find that subgraphs of 3500 nodes, chosen either at random or according to physical constraints, represent random graphs. We develop a divergent evolution model based on the lattice space which produces graphs that, within certain parameter regimes, recapitulate the scale-free behavior observed in similar graphs of real protein structures.
This Letter develops an analytically tractable model for determining the equilibrium distribution of mismatch repair deficient strains in unicellular populations. The approach is based on the single fitness peak model, which has been used in Eigen’s quasispecies equations in order to understand various aspects of evolutionary dynamics. As with the quasispecies model, our model for mutator-nonmutator equilibrium undergoes a phase transition in the limit of infinite sequence length. This “repair catastrophe” occurs at a critical repair error probability of ϵ_r = L_via/L, where L_via denotes the length of the genome controlling viability, while L denotes the overall length of the genome. The repair catastrophe therefore occurs when the repair error probability exceeds the fraction of deleterious mutations. Our model also gives a quantitative estimate for the equilibrium fraction of mutators in Escherichia coli.
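The threshold quoted in this abstract can be written as a short worked example. The sketch below only evaluates the stated ratio of the viability-controlling genome length to the total genome length; the function names and the genome lengths are hypothetical and chosen purely for illustration.

```python
def critical_repair_error(l_via, l_total):
    """Critical repair error probability, the ratio L_via / L from the abstract."""
    if not 0 < l_via <= l_total:
        raise ValueError("require 0 < L_via <= L")
    return l_via / l_total


def past_repair_catastrophe(eps_r, l_via, l_total):
    """True when the repair error probability exceeds the critical value."""
    return eps_r > critical_repair_error(l_via, l_total)


# Hypothetical genome: 1000 of 4000 sites control viability.
eps_crit = critical_repair_error(1000, 4000)
```

With these illustrative lengths the critical value is 0.25, so a repair error probability of 0.3 would lie past the catastrophe while 0.1 would not.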
Here we present an approximate analytical theory for the relationship between a protein structure’s contact matrix and the shape of its energy spectrum in amino acid sequence space. We demonstrate a dependence of the number of sequences of low energy in a structure on the eigenvalues of the structure’s contact matrix, and then use a Monte Carlo simulation to test the applicability of this analytical result to cubic lattice proteins. We find that the lattice structures with the most low-energy sequences are the same as those predicted by the theory. We argue that, given sufficiently strict requirements for foldability, these structures are the most designable, and we propose a simple means to test whether the results in this paper hold true for real proteins.
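For readers unfamiliar with the contact matrix referred to in this abstract, the sketch below builds one for a lattice conformation. It assumes the usual lattice convention that two monomers are in contact when they occupy nearest-neighbour sites and are not adjacent along the chain; the function name and the tiny example chain are hypothetical. The eigenvalues that the theory uses could then be obtained from this matrix with any numerical linear algebra routine, which is not shown here.

```python
def contact_matrix(coords):
    """Return the symmetric 0/1 contact matrix of a lattice conformation.

    coords: list of integer lattice points, one per monomer, in chain order.
    Assumption: monomers i and j are in contact when they sit on
    nearest-neighbour lattice sites and are not bonded (|i - j| > 1).
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            # Manhattan distance of 1 means nearest neighbours on the lattice.
            distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if distance == 1:
                matrix[i][j] = matrix[j][i] = 1
    return matrix


# Hypothetical 4-monomer chain folded into a square on a 2D lattice.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
c = contact_matrix(square)
n_contacts = sum(map(sum, c)) // 2
```

The same function works unchanged for three-dimensional coordinates, such as conformations on a cubic lattice.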
We describe a method of designing artificial sequences that resemble naturally occurring sequences in terms of their compatibility with a template structure and its functional constraints. The design procedure is a Monte Carlo simulation of the amino acid substitution process. The selective fixation of substitutions is dictated by a simple scoring function derived from the template structure and a multiple alignment of its homologs. Designed sequences represent an enlargement of sequence space around native sequences. We show that the use of designed sequences improves the performance of profile-based homology detection. The difference in position-specific conservation between designed sequences and native sequences is helpful for the prediction of functionally important residues. Our sequence selection criteria in evolutionary simulations introduce amino acid substitution rate variation among sites in a natural way, providing a better model to test phylogenetic methods.
The binding between a protein kinase (PK) and its target is highly specific, despite the fact that many different PKs exhibit significant sequence and structure homology. There must be, then, specificity-determining residues (SDRs) that enable different PKs to recognize their unique substrates. Here we use and further develop a computational procedure to discover putative SDRs (PSDRs) in protein families, whereby a family of homologous proteins is split into orthologous proteins, which are assumed to have the same specificity, and paralogous proteins, which have different specificities. We reason that PSDRs must be similar among orthologs, whereas they must necessarily be different among paralogs. Our statistical procedure and evolutionary model identify such residues by discriminating a functional signal from a phylogenetic one. As case studies we investigate the prokaryotic two-component system and the eukaryotic AGC (i.e., cAMP-dependent PK, cGMP-dependent PK, and PKC) PKs. Without using experimental data, we predict PSDRs in prokaryotic and eukaryotic PKs, and suggest precise mutations that may convert the specificity of one PK to another. We compare our predictions with current experimental results and obtain considerable agreement with them. Our analysis unifies much of the existing data on PK specificity. Finally, we find PSDRs that are outside the active site. Based on our results, as well as structural and biochemical characterizations of eukaryotic PKs, we propose the testable hypothesis of “specificity via differential activation” as a way for the cell to control kinase specificity.
Using two evolutionary lenses, a structural similarity clustering of protein domains (the protein domain universe graph, PDUG) and a hierarchical functional annotation (gene ontology, GO), we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as the structural similarity threshold for domain clustering in the PDUG is relaxed, we observe an influx of earlier-diverged domains into clusters. These domains join clusters without destroying the functional fingerprint. These results can be understood in light of a divergent evolution scenario that posits correlated divergence of structural and functional traits in protein domains from one or a few progenitors.
This chapter presents an overview of computational techniques for reconstructing the folding mechanisms of proteins from their crystal structures. The chapter describes analytical and computational tools for determining and characterizing protein-folding kinetics from crystal structures. It also discusses a new protein model and shows that the thermodynamics of Src SH3 from molecular dynamics simulations is consistent with that observed experimentally. The revolution in protein crystallography has resulted in the identification of a large number of protein structures. The chapter discusses several studies that are based on the Go model. In one such study, it identifies the most evasive protein-folding transition state ensemble for Src SH3 protein and finds that it is consistent with experimental observations. These studies suggest that protein-folding kinetics can be determined to a reasonably detailed level from the knowledge of crystal structure. The chapter also dissects the transition state ensemble by studying the wiring properties of protein graphs. The structural properties of protein graphs are related to protein topology and thus may explain the kinetics of the folding process. These studies unveil the expanding possibilities for studying protein-folding kinetics from crystal structures.
An open question of great interest in biophysics is whether variations in structure cause protein folds to differ in the number of amino acid sequences that can fold to them stably, i.e., in their designability. Recently, we have shown that a novel quantitative measure of a fold's tertiary topology, called its contact trace, strongly correlates with the fold's designability. Here, we investigate the relationship between a fold's contact trace and its relative frequency of usage in mesophilic vs. thermophilic eubacteria. We observe that thermophilic organisms exhibit a bias toward using folds of higher contact trace when compared with mesophiles. We establish this difference both for the distributions of folds at the whole-proteome level and also through more focused structural comparisons of orthologous proteins. Our findings suggest that thermophilic adaptation in bacterial genomes occurs in part through natural selection of more designable folds, pointing to designability as a key component of protein fitness.
We analyze the equilibrium response of random heteropolymers to mechanical deformation. In contrast to homopolymer response, the stress-induced transformation of a heteropolymer from globule to coil need not be sharp. For chain lengths relevant to biological macromolecules, intermediate necklacelike structures dominate over a range of applied force. Stability of these conformations is primarily a consequence of solvation: In a typical necklace, relatively solvophilic regions of the chain are extended, while solvophobic regions remain compact. In the long-chain limit, homopolymeric behavior is recovered. Our results suggest that only select polypeptide sequences should unfold reproducibly at a specific force, explaining recent experimental observations.
The concept of the protein transition state ensemble (TSE), a collection of conformations that have a 50% probability of rapidly converting to the folded state and a 50% probability of rapidly unfolding, constitutes the basis of the modern interpretation of protein engineering experiments. It has been conjectured that conformations constituting the TSE in many proteins are expanded and distorted forms of the native state built around a specific folding nucleus. This view has been supported by a number of on-lattice and off-lattice simulations. Here we report a direct observation and characterization of the TSE by molecular dynamics folding simulations of the C-Src SH3 domain, a small protein that has been extensively studied experimentally. Our analysis reveals a set of key interactions between residues, conserved by evolution, that must be formed to enter the kinetic basin of attraction of the native state.
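The 50/50 definition of the TSE given in this abstract suggests a simple numerical test, sketched below. It assumes that several short trial simulations are started from one conformation and that each is recorded as folding first (True) or unfolding first (False). The function names, the tolerance, and the outcome list are hypothetical and are not taken from the study.

```python
def estimate_pfold(trial_outcomes):
    """Fraction of trial runs from one conformation that folded first."""
    if not trial_outcomes:
        raise ValueError("need at least one trial outcome")
    return sum(1 for folded in trial_outcomes if folded) / len(trial_outcomes)


def in_tse(trial_outcomes, tolerance=0.1):
    """True when the estimated folding probability is within tolerance of 0.5.

    The tolerance is an illustrative choice, not a value from the abstract.
    """
    return abs(estimate_pfold(trial_outcomes) - 0.5) <= tolerance


# Hypothetical outcomes of ten trial runs from a single conformation.
outcomes = [True, False, True, False, False, True, True, False, True, False]
p_fold = estimate_pfold(outcomes)
```

In practice the number of trial runs limits how finely the folding probability can be resolved, so the tolerance would need to reflect that sampling error.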
Computational methods are increasingly used in the drug discovery process. In this Account, we review a novel computational method for lead discovery. This method, called CombiSMoG for “combinatorial small molecule growth”, is based on two components: a fast and accurate knowledge-based scoring function used to predict binding affinities of protein-ligand complexes, and a Monte Carlo combinatorial growth algorithm that generates large numbers of low-free-energy ligands in the binding site of a protein. We illustrate the advantages of the method by describing its application in the design of picomolar inhibitors for human carbonic anhydrase.
We introduce a modified version of protein lattice models in which monomers have several spin states, representing side-chain rotamers. Completion of folding corresponds to reaching the native backbone configuration with complete ordering of side chains. We find that as temperature is lowered, side-chain ordering becomes much slower than backbone folding. The presence of side chains leads to nonexponential kinetics and a broad distribution of relaxation times.
It has been conjectured that evolution exerted pressure to preserve amino acids bearing thermodynamic, kinetic, and functional roles. In this letter we show that the physical requirement to maintain protein stability gives rise to a sequence conservatism pattern that is in remarkable agreement with that found in natural proteins. Based on the physical properties of amino acids, we propose a model of evolution that explains conserved amino acids across protein families sharing the same fold.
We use molecular dynamics simulation to study the aggregation of Src SH3 domain proteins. For the case of two proteins, we observe two possible aggregation conformations: the closed form dimer and the open aggregation state. The closed dimer is formed by “domain swapping”: the two proteins exchange their RT-loops. All the hydrophobic residues are buried inside the dimer, so the proteins cannot further aggregate into elongated amyloid fibrils. We find that the open structure, stabilized by backbone hydrogen bond interactions, packs the RT-loops together by swapping the two strands of the RT-loop. The packed RT-loops form a β-sheet structure and expose the backbone to promote further aggregation. We also simulate more than two proteins, and find that the aggregate adopts a fibrillar double β-sheet structure, which is formed by packing the RT-loops from different proteins. Our simulations are consistent with a possible generic amyloidogenesis scenario.