Publications

2004
Morrissey, M.P., Ahmed, Z. & Shakhnovich, E.I. The role of cotranslation in protein folding: a lattice model study. Polymer 45, 2, 557 - 571 (2004).
Computational studies of protein folding have implicitly assumed that folding occurs from a denatured state comprised of the entire protein. Cotranslational folding accounts for the linear production and release of a protein from the ribosome, allowing part of the protein to explore its conformation space before other parts have been synthesized. This gradual ‘extrusion’ from the ribosome can yield different folding kinetics than direct folding from the denatured state, for a lattice folding model. First, in model proteins containing chiefly short-ranged (local in sequence) contacts, cotranslational folding is shown to be significantly faster than direct folding from the denatured state. Secondly, for model proteins with two competing native states, cotranslational folding tilts the apparent equilibrium toward the state with a more local-contact dominant topology.
Liu, Z., Dominy, B.N. & Shakhnovich, E.I. Structural Mining: Self-Consistent Design on Flexible Protein−Peptide Docking and Transferable Binding Affinity Potential. J. Am. Chem. Soc. 126, 27, 8515 - 8528 (2004).
A flexible protein–peptide docking method has been designed to consider not only ligand flexibility but also the flexibility of the protein. The method is based on a Monte Carlo annealing process. Simulations with a distance root-mean-square (dRMS) virtual energy function revealed that the flexibility of protein side chains was as important as ligand flexibility for successful protein–peptide docking. On the basis of mean field theory, a transferable potential was designed to evaluate distance-dependent protein–ligand interactions and atomic solvation energies. The potential parameters were developed using a self-consistent process based on only 10 known complex structures. The effectiveness of each intermediate potential was judged on the basis of a Z score, approximating the gap between the energy of the native complex and the average energy of a decoy set. The Z score was determined using experimentally determined native structures and decoys generated by docking with the intermediate potentials. Using 6600 generated decoys and the Z score optimization criterion proposed in this work, the developed potential yielded an acceptable correlation of R² = 0.77, with binding free energies determined for known MHC I complexes (Class I Major Histocompatibility protein HLA-A*0201) which were not present in the training set. Test docking on 25 complexes further revealed a significant correlation between energy and dRMS, important for identifying native-like conformations. The near-native structures always belonged to one of the conformational classes with lower predicted binding energy. The lowest energy docked conformations are generally associated with near-native conformations, less than 3.0 Å dRMS (and in many cases less than 1.0 Å) from the experimentally determined structures.
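As an illustration of the Z score mentioned in this abstract, the following is a minimal sketch. It assumes the common definition Z = (E_native − mean(E_decoys)) / std(E_decoys); the abstract only says the score approximates the gap between the native energy and the average decoy energy, so the normalization by the decoy standard deviation, the function name `z_score`, and the energies used are all assumptions made for illustration, not the authors' implementation.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Gap between the native energy and the mean decoy energy,
    expressed in units of the decoy standard deviation (assumed definition)."""
    mean_decoy = statistics.mean(decoy_energies)
    std_decoy = statistics.pstdev(decoy_energies)  # population standard deviation
    if std_decoy == 0:
        raise ValueError("decoy energies have zero spread; Z score is undefined")
    return (native_energy - mean_decoy) / std_decoy


# Hypothetical energies in arbitrary units, not data from the paper.
decoys = [-8.0, -10.0, -12.0]
z = z_score(-14.0, decoys)
```

Under this sign convention a more negative Z means the native complex sits further below the decoy set, which is the direction a potential optimization of this kind would favor.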
2003
Chen, W., Mirny, L. & Shakhnovich, E.I. Fold recognition with minimal gaps. Proteins: Structure, Function, and Bioinformatics 51, 4, 531 - 543 (2003).
Here we present a simplified form of threading that uses only a 20 × 20 two-body residue-based potential and restricted number of gaps. Despite its simplicity and transparency the Monte Carlo-based threading algorithm performs very well in a rigorous test of fold recognition. The results suggest that by simplifying and constraining the decoy space, one can achieve better fold recognition. Fold recognition results are compared with and supplemented by a PSI-BLAST search. The statistical significance of threading results is rigorously evaluated from statistics of extremes by comparison with optimal alignments of a large set of randomly shuffled sequences. The statistical theory, based on the Random Energy Model, yields a cumulative statistical parameter, ϵ, that attests to the likelihood of correct fold recognition. A large ϵ indicates a significant energy gap between the optimal alignment and decoy alignments and, consequently, a high probability that the fold is correctly recognized. For a particular number of gaps, the ϵ parameter reaches its maximal value, and the fold is recognized. As the number of gaps further increases, the likelihood of correct fold recognition drops off. This is because the decoy space is small when gaps are restricted to a small number, but the native alignment is still well approximated, whereas unrestricted increase of the number of gaps leads to rapid growth of the number of decoys and their statistical dominance over the correct alignment. It is shown that best results are obtained when a combination of one-, two-, and three-gap threading is used. To this end, use of the ϵ parameter is crucial for rigorous comparison of results across the different decoy spaces belonging to a different number of gaps. Proteins 2003;51:531–543. © 2003 Wiley-Liss, Inc.
Shakhnovich, B.E., et al. ELISA: Structure-Function Inferences based on statistically significant and evolutionarily inspired observations. BMC Bioinformatics 4, 1 (2003).
The problem of functional annotation based on homology modeling is primary to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods provide a lot of false negatives due to limited specificity inherent in the system. We want to create an evolutionarily inspired organization of data that would approach the issue of structure-function correlation from a new, probabilistic perspective. Such organization has possible applications in phylogeny, modeling of functional evolution and structural determination. ELISA (Evolutionary Lineage Inferred from Structural Analysis, http://romi.bu.edu/elisa) is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function
Deeds, E.J., Dokholyan, N.V. & Shakhnovich, E.I. Protein Evolution within a Structural Space. Biophysical Journal 85, 5, 2962 - 2972 (2003).
Understanding of the evolutionary origins of protein structures represents a key component of the understanding of molecular evolution as a whole. Here we seek to elucidate how the features of an underlying protein structural “space” might impact protein structural evolution. We approach this question using lattice polymers as a completely characterized model of this space. We develop a measure of structural comparison of lattice structures that is analogous to the one used to understand structural similarities between real proteins. We use this measure of structural relatedness to create a graph of lattice structures and compare this graph (in which nodes are lattice structures and edges are defined using structural similarity) to the graph obtained for real protein structures. We find that the graph obtained from all compact lattice structures exhibits a distribution of structural neighbors per node consistent with a random graph. We also find that subgraphs of 3500 nodes chosen either at random or according to physical constraints also represent random graphs. We develop a divergent evolution model based on the lattice space which produces graphs that, within certain parameter regimes, recapitulate the scale-free behavior observed in similar graphs of real protein structures.
Tannenbaum, E., Deeds, E.J. & Shakhnovich, E.I. Equilibrium Distribution of Mutators in the Single Fitness Peak Model. Phys. Rev. Lett. 91, 13, 138105 (2003).
This Letter develops an analytically tractable model for determining the equilibrium distribution of mismatch repair deficient strains in unicellular populations. The approach is based on the single fitness peak model, which has been used in Eigen’s quasispecies equations in order to understand various aspects of evolutionary dynamics. As with the quasispecies model, our model for mutator-nonmutator equilibrium undergoes a phase transition in the limit of infinite sequence length. This “repair catastrophe” occurs at a critical repair error probability of ϵ_r = L_via/L, where L_via denotes the length of the genome controlling viability, while L denotes the overall length of the genome. The repair catastrophe therefore occurs when the repair error probability exceeds the fraction of deleterious mutations. Our model also gives a quantitative estimate for the equilibrium fraction of mutators in Escherichia coli.
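The threshold stated in this abstract can be written out as a small worked sketch. It only encodes the relation quoted above (critical repair error probability equals the viability-controlling genome length divided by the total genome length); the function names and the genome lengths in the example are hypothetical and are not taken from the paper.

```python
def critical_repair_error(l_via, l_total):
    """Critical repair error probability: L_via / L, as stated in the abstract."""
    if not 0 < l_via <= l_total:
        raise ValueError("require 0 < l_via <= l_total")
    return l_via / l_total


def past_repair_catastrophe(eps_r, l_via, l_total):
    """True when the repair error probability exceeds the critical value."""
    return eps_r > critical_repair_error(l_via, l_total)


# Hypothetical genome lengths, chosen only to make the arithmetic easy to follow.
threshold = critical_repair_error(250, 1000)
below = past_repair_catastrophe(0.10, 250, 1000)
above = past_repair_catastrophe(0.40, 250, 1000)
```

With these illustrative numbers a quarter of the genome controls viability, so the transition sits at a repair error probability of one quarter.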
England, J.L. & Shakhnovich, E.I. Structural Determinant of Protein Designability. Phys. Rev. Lett. 90, 21, 218101 (2003).
Here we present an approximate analytical theory for the relationship between a protein structure’s contact matrix and the shape of its energy spectrum in amino acid sequence space. We demonstrate a dependence of the number of sequences of low energy in a structure on the eigenvalues of the structure’s contact matrix, and then use a Monte Carlo simulation to test the applicability of this analytical result to cubic lattice proteins. We find that the lattice structures with the most low-energy sequences are the same as those predicted by the theory. We argue that, given sufficiently strict requirements for foldability, these structures are the most designable, and we propose a simple means to test whether the results in this paper hold true for real proteins.
Pei, J., Dokholyan, N.V., Shakhnovich, E.I. & Grishin, N.V. Using protein design for homology detection and active site searches. Proc. Natl. Acad. Sci. USA 100, 20, 11361 - 11366 (2003).
We describe a method of designing artificial sequences that resemble naturally occurring sequences in terms of their compatibility with a template structure and its functional constraints. The design procedure is a Monte Carlo simulation of amino acid substitution process. The selective fixation of substitutions is dictated by a simple scoring function derived from the template structure and a multiple alignment of its homologs. Designed sequences represent an enlargement of sequence space around native sequences. We show that the use of designed sequences improves the performance of profile-based homology detection. The difference in position-specific conservation between designed sequences and native sequences is helpful for prediction of functionally important residues. Our sequence selection criteria in evolutionary simulations introduce amino acid substitution rate variation among sites in a natural way, providing a better model to test phylogenetic methods.
Li, L., Shakhnovich, E.I. & Mirny, L.A. Amino acids determining enzyme-substrate specificity in prokaryotic and eukaryotic protein kinases. Proc. Natl. Acad. Sci. USA 100, 8, 4463 - 4468 (2003).
The binding between a protein kinase (PK) and its target is highly specific, despite the fact that many different PKs exhibit significant sequence and structure homology. There must be, then, specificity-determining residues (SDRs) that enable different PKs to recognize their unique substrate. Here we use and further develop a computational procedure to discover putative SDRs (PSDRs) in protein families, whereby a family of homologous proteins is split into orthologous proteins, which are assumed to have the same specificity, and paralogous proteins, which have different specificities. We reason that PSDRs must be similar among orthologs, whereas they must necessarily be different among paralogs. Our statistical procedure and evolutionary model identifies such residues by discriminating a functional signal from a phylogenetic one. As case studies we investigate the prokaryotic two-component system and the eukaryotic AGC (i.e., cAMP-dependent PK, cGMP-dependent PK, and PKC) PKs. Without using experimental data, we predict PSDRs in prokaryotic and eukaryotic PKs, and suggest precise mutations that may convert the specificity of one PK to another. We compare our predictions with current experimental results and obtain considerable agreement with them. Our analysis unifies much of existing data on PK specificity. Finally, we find PSDRs that are outside the active site. Based on our results, as well as structural and biochemical characterizations of eukaryotic PKs, we propose the testable hypothesis of “specificity via differential activation” as a way for the cell to control kinase specificity.
Kussell, E., Shimada, J. & Shakhnovich, E.I. Side-chain dynamics and protein folding. Proteins: Structure, Function, and Bioinformatics 52, 2, 303 - 321 (2003).
The processes by which protein side chains reach equilibrium during a folding reaction are investigated using both lattice and all-atom simulations. We find that rates of side-chain relaxation exhibit a distribution over the protein structure, with the fastest relaxing side chains located in positions kinetically important for folding. Traversal of the major folding transition state corresponds to the freezing of a small number of side chains, belonging to the folding nucleus, whereas the rest of the protein proceeds toward equilibrium via backbone fluctuations around the native fold. The postnucleation processes by which side chains relax are characterized by very slow dynamics and many barrier crossings, and thus resemble the behavior of a glass. Proteins 2003;52:303–321. © 2003 Wiley-Liss, Inc.
Shakhnovich, B.E., Dokholyan, N.V., DeLisi, C. & Shakhnovich, E.I. Functional Fingerprints of Folds: Evidence for Correlated Structure–Function Evolution. Journal of Molecular Biology 326, 1, 1 - 9 (2003).
Using structural similarity clustering of protein domains: protein domain universe graph (PDUG), and a hierarchical functional annotation: gene ontology (GO) as two evolutionary lenses, we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as structural similarity threshold for domain clustering in the PDUG is relaxed we observe an influx of earlier-diverged domains into clusters. These domains join clusters without destroying the functional fingerprint. These results can be understood in light of a divergent evolution scenario that posits correlated divergence of structural and functional traits in protein domains from one or few progenitors.
Dokholyan, N.V., et al. Identifying Importance of Amino Acids for Protein Folding from Crystal Structures. Methods in Enzymology 374, 616 - 638 (2003).
This chapter presents an overview of computational techniques for reconstructing the folding mechanisms of proteins from their crystal structures. The chapter describes analytical and computational tools for determining and characterizing protein-folding kinetics from crystal structures. It also discusses a new protein model and shows that the thermodynamics of Src SH3 from molecular dynamics simulations is consistent with that observed experimentally. The revolution in protein crystallography has resulted in the identification of a large number of protein structures. The chapter discusses several studies that are based on the Go model. In one such study, it identifies the most evasive protein-folding transition state ensemble for Src SH3 protein and finds that it is consistent with experimental observations. These studies suggest that protein-folding kinetics can be determined to a reasonably detailed level from the knowledge of crystal structure. The chapter also dissects the transition state ensemble by studying the wiring properties of protein graphs. The structural properties of protein graphs are related to protein topology and thus may explain the kinetics of the folding process. These studies unveil the expanding possibilities for studying protein-folding kinetics from their crystal structures.
England, J.L., Shakhnovich, B.E. & Shakhnovich, E.I. Natural selection of more designable folds: A mechanism for thermophilic adaptation. Proc. Natl. Acad. Sci. USA 100, 15, 8727 - 8731 (2003).
An open question of great interest in biophysics is whether variations in structure cause protein folds to differ in the number of amino acid sequences that can fold to them stably, i.e., in their designability. Recently, we have shown that a novel quantitative measure of a fold's tertiary topology, called its contact trace, strongly correlates with the fold's designability. Here, we investigate the relationship between a fold's contact trace and its relative frequency of usage in mesophilic vs. thermophilic eubacteria. We observe that thermophilic organisms exhibit a bias toward using folds of higher contact trace when compared with mesophiles. We establish this difference both for the distributions of folds at the whole-proteome level and also through more focused structural comparisons of orthologous proteins. Our findings suggest that thermophilic adaptation in bacterial genomes occurs in part through natural selection of more designable folds, pointing to designability as a key component of protein fitness.
2002
Geissler, P.L. & Shakhnovich, E.I. Reversible stretching of random heteropolymers. Phys. Rev. E 65, 5, 056110 (2002).
We analyze the equilibrium response of random heteropolymers to mechanical deformation. In contrast to homopolymer response, the stress-induced transformation of a heteropolymer from globule to coil need not be sharp. For chain lengths relevant to biological macromolecules, intermediate necklacelike structures dominate over a range of applied force. Stability of these conformations is primarily a consequence of solvation: In a typical necklace, relatively solvophilic regions of the chain are extended, while solvophobic regions remain compact. In the long-chain limit, homopolymeric behavior is recovered. Our results suggest that only select polypeptide sequences should unfold reproducibly at a specific force, explaining recent experimental observations.
Ding, F., Dokholyan, N.V., Buldyrev, S.V., Stanley, H.E. & Shakhnovich, E.I. Direct Molecular Dynamics Observation of Protein Folding Transition State Ensemble. Biophysical Journal 83, 6, 3525 - 3532 (2002).
The concept of the protein transition state ensemble (TSE), a collection of the conformations that have 50% probability to convert rapidly to the folded state and 50% chance to rapidly unfold, constitutes the basis of the modern interpretation of protein engineering experiments. It has been conjectured that conformations constituting the TSE in many proteins are the expanded and distorted forms of the native state built around a specific folding nucleus. This view has been supported by a number of on-lattice and off-lattice simulations. Here we report a direct observation and characterization of the TSE by molecular dynamic folding simulations of the C-Src SH3 domain, a small protein that has been extensively studied experimentally. Our analysis reveals a set of key interactions between residues, conserved by evolution, that must be formed to enter the kinetic basin of attraction of the native state.
Grzybowski, B.A., Ishchenko, A.V., Shimada, J. & Shakhnovich, E.I. From Knowledge-Based Potentials to Combinatorial Lead Design in Silico. Acc. Chem. Res. 35, 5, 261 - 269 (2002).
Computational methods are becoming increasingly used in the drug discovery process. In this Account, we review a novel computational method for lead discovery. This method, called CombiSMoG for “combinatorial small molecule growth”, is based on two components: a fast and accurate knowledge-based scoring function used to predict binding affinities of protein–ligand complexes, and a Monte Carlo combinatorial growth algorithm that generates large numbers of low-free-energy ligands in the binding site of a protein. We illustrate the advantages of the method by describing its application in the design of picomolar inhibitors for human carbonic anhydrase.
Kussell, E. & Shakhnovich, E.I. Glassy Dynamics of Side-Chain Ordering in a Simple Model of Protein Folding. Phys. Rev. Lett. 89, 16, 168101 (2002).
We introduce a modified version of protein lattice models in which monomers have several spin states, representing side-chain rotamers. Completion of folding corresponds to reaching the native backbone configuration with complete ordering of side chains. We find that as temperature is lowered, side-chain ordering becomes much slower than backbone folding. The presence of side chains leads to nonexponential kinetics and a broad distribution of relaxation times.
Dokholyan, N.V., Mirny, L.A. & Shakhnovich, E.I. Understanding conserved amino acids in proteins. Physica A: Statistical Mechanics and its Applications 314, 1–4, 600 - 606 (2002).
It has been conjectured that evolution exerted pressure to preserve amino acids bearing thermodynamic, kinetic, and functional roles. In this letter we show that the physical requirement to maintain protein stability gives rise to a sequence conservatism pattern that is in remarkable agreement with that found in natural proteins. Based on the physical properties of amino acids, we propose a model of evolution that explains conserved amino acids across protein families sharing the same fold.
Ding, F., Dokholyan, N.V., Buldyrev, S.V., Stanley, H.E. & Shakhnovich, E.I. Molecular Dynamics Simulation of the SH3 Domain Aggregation Suggests a Generic Amyloidogenesis Mechanism. Journal of Molecular Biology 324, 4, 851 - 857 (2002).
We use molecular dynamics simulation to study the aggregation of Src SH3 domain proteins. For the case of two proteins, we observe two possible aggregation conformations: the closed form dimer and the open aggregation state. The closed dimer is formed by “domain swapping”: the two proteins exchange their RT-loops. All the hydrophobic residues are buried inside the dimer so proteins cannot further aggregate into elongated amyloid fibrils. We find that the open structure, stabilized by backbone hydrogen bond interactions, packs the RT-loops together by swapping the two strands of the RT-loop. The packed RT-loops form a β-sheet structure and expose the backbone to promote further aggregation. We also simulate more than two proteins, and find that the aggregate adopts a fibrillar double β-sheet structure, which is formed by packing the RT-loops from different proteins. Our simulations are consistent with a possible generic amyloidogenesis scenario.
Mukamel, E.A. & Shakhnovich, E.I. Phase diagram for unzipping DNA with long-range interactions. Phys. Rev. E 66, 3, 032901 (2002).
We present a critique and extension of the mean-field approach to the mechanical pulling transition in bound polymer systems. Our model is motivated by the theoretically and experimentally important examples of adsorbed polymers and double-stranded DNA. We focus on the case in which quenched disorder in the sequence of monomers is unimportant for the statistical mechanics, but we include excluded volume interactions between monomers. Our phase diagram for the critical pulling force shows an interesting reentrant phase at low temperatures which should be observable in both disordered and homogenous polymer systems. We also consider the case of nonequilibrium pulling, in which the external force probes the molecule’s local, rather than global structure. The dynamics of the pulling transition in such experiments could illuminate the polymer’s loop structure, which depends on the nature of excluded volume interactions.
