
Bioinformatics Algorithms

Protein structure prediction

David Hoksza
http://siret.ms.mff.cuni.cz/hoksza

Motivation

• Sequence → structure → function

• The number of available sequences grows much faster than the number of available 3D structures

• Given a protein sequence, we want to determine its structure
  • Inverse of the design problem where, given a structure, we want to find a sequence which codes for it

Structure → function

• Inferring function from structure
  • Detection of local structural motifs with functional roles
  • Analysis of surface clefts → catalytic sites
  • Conservation analysis
  • Quaternary structure (beware of false positives due to crystallization)
  • Buried and solvent-exposed residues

• Issues
  • Moonlighting
    • Multiple functions carried out by a single domain
  • Conformational change upon binding
    • Ligand-bound state (holo structure) vs. unbound state (apo structure)
  • Intrinsically disordered proteins (IDP)
    • Natively unfolded proteins

Sequence → structure

[Figure: Size of common cores as a function of protein homology. If two proteins of lengths $n_1$ and $n_2$ have $c$ residues in the common core, the fractions of each sequence in the common core are $c/n_1$ and $c/n_2$. These values, connected by a bar, are plotted against the residue identity of the core.]

[Figure: The relation of residue identity and the r.m.s. deviation of the backbone atoms of the common cores of 32 pairs of homologous proteins.]

source: Chothia, Cyrus, and Arthur M. Lesk. "The relation between the divergence of sequence and structure in proteins." The EMBO journal 5.4 (1986): 823.

Protein structure prediction tasks

• Secondary structure prediction
  • Assign each residue one of three (or more) states (helix, sheet, loop)

• Tertiary structure prediction
  • Assign each amino acid/atom its position in 3D space

• Interaction sites prediction
  • Tertiary structure (intra-molecular) contacts
  • Protein-protein/DNA/RNA sites prediction (inter-molecular quaternary structure contacts)
  • Protein-ligand (active sites/pockets) prediction

Protein structure determination

• X-ray crystallography (89%)

• NMR spectroscopy (8%)

• 3D (cryo) electron microscopy (EM) (2%)

X-ray crystallography

• A crystallized protein is exposed to an X-ray beam; the electrons scatter the X-rays and the scattered waves interfere with each other, forming a diffraction pattern which is observed
• The electron density of the crystal, determined by the positions of electrons (atoms), corresponds to the magnitudes and phases of the diffracted X-ray waves, i.e. the diffraction pattern of the crystal
• Fourier transformation is used to estimate the electron density at each position

• Works only for proteins which form a crystal → suitable for rigid proteins but unsuitable for flexible proteins

source: https://www.nature.com/news/cryo-electron-microscopy-wins-chemistry-nobel-1.22738

X-ray crystallography – quality measures

[Figure: Electron density maps at 3.7 Å, 2.4 Å, 1.5 Å and 0.8 Å resolution]

• Resolution
  • 3 Å → secondary structure
  • 2.5 Å → side chains
  • <1 Å → hydrogen atoms
• R-factor
  • After structure reconstruction, a theoretical diffraction pattern can be computed → the difference between the real and the theoretical pattern is expressed as a percentage (how well the model back-predicts the data)
  • Rule of thumb: a good structure should have an R-factor lower than resolution/10 (≤ 0.3 for 3 Å resolution)
• R(free)-factor
  • R-factor computed when set-aside data are used as the real pattern
• B-factor (temperature factor)
  • Thermal motion is present even in a crystal → extent to which the electron density is spread out for each atom
  • $B = 8\pi^2 U^2$

source: Finding the best data for your needs in the PDB archive (EBI webinar, YouTube)

NMR spectroscopy

• A purified protein in solution is placed in a strong magnetic field and probed with radio waves; the observed resonances (each atom has a characteristic resonance in the magnetic field based on its surroundings) are analyzed to build a model of atomic nuclei and bonded atoms
• Resonances indicate which atoms are close to each other → list of restraints to build the model
• An NMR structure commonly includes an ensemble of structures which fit the restraints → diverse regions correspond to flexible parts
• Proteins in solution → works also for flexible proteins which can’t be locked in a crystal

• Works for small to medium-sized proteins

[Figures: PDB ID: 6F0Y, PDB ID: 5MN3, PDB ID: 6BNH]

NMR spectroscopy – quality measures

• Completeness of resonance assignments
  • Percentage of atoms for which the resonances were measured

• Statistically unusual resonances

• Random coil index
  • How well the resonances fit usual protein conformations such as secondary structure

source: Finding the best data for your needs in the PDB archive (EBI webinar, YouTube)

3D cryo-EM

• A beam of electrons and a system of electron lenses are used to image the biomolecule directly
• Cryo-EM
  • Vitrification: the protein solution is cooled so rapidly that water does not have time to crystallize → thin layer of non-crystalline ice
  • Thousands of 2D projection images → 3D density map → fitting an atomic model to the map

• Chemistry Nobel prize in 2017: Jacques Dubochet, Joachim Frank and Richard Henderson

• Ability to analyze large, complex and flexible structures
• Works for proteins in native state
• Often breaking the 3 Å resolution barrier

[Figure: PDB ID: 3j3q]

source: ": the single global archive for 3D macromolecular structure data." Nucleic Acids Research 47, no. D1 (2018): D520-D528.

Protein folding

• Folding is the process through which a protein obtains its three-dimensional structure

• The protein tends to fold into the thermodynamically most stable state, i.e. the state with the lowest free energy

• The information needed for folding is (mostly) contained in the protein’s amino acid sequence, and folding itself is a thermodynamically driven process
  • Anfinsen’s dogma

Anfinsen’s dogma

• All information needed to fold the native structure of a protein is contained in its amino acid sequence

• Experiment with ribonuclease A (RNase A), a 124-residue extracellular enzyme with 4 disulfide bonds
  • Observation
    1. SS bonds reduced using mercaptoethanol → denaturation with 8 M urea → inactive protein, flexible random polymer
    2. Removal of urea → oxidation of –SH groups back to SS bonds → regain of 90% of activity
  • Control (proving that the protein was unfolded)
    • Reversing the order of the steps in the second phase → 1-2% of activity and a random assortment of SS bonds

Levinthal’s paradox

• Reaching the native folded state of a protein by a random search among all possible configurations would take an enormously long time
  • An unfolded polypeptide chain has many degrees of freedom
  • Even a small number of allowed $\phi$ and $\psi$ combinations per residue leads to an astronomically large number of structures

• Proteins fold in at most seconds, which is a paradox → there must be a pathway or a set of pathways leading to the energetically favorable conformation
  • Biased search
    • When some conformations are considered stabilizing and preferred (energy bias), the folding time becomes reasonable [Zwanzig et al. "Levinthal's paradox." PNAS 89.1 (1992): 20-22]
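The size of the random search can be illustrated with a back-of-the-envelope calculation. The sketch below is only an illustration: the chain length (100 residues), the number of states per residue (3) and the sampling rate (10^13 conformations per second) are assumed values, not numbers taken from the slides.

```python
# Back-of-the-envelope Levinthal estimate; all numbers are illustrative assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def random_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


years = random_search_years(100)
print(f"exhaustive search would take about {years:.1e} years")
```

Even with these modest assumptions the exhaustive search takes vastly longer than the age of the universe, which is why folding cannot be an unbiased random search.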

Structure prediction

Template existence dependency of tertiary structure prediction approaches

[Figure: spectrum of approaches along the sequence identity axis, from "night" through the "twilight zone" (20% – 30% sequence identity) to "day". Labels: "Combinatorial exploration of the folding space with the goal to find the state with the lowest energy" and "Utilization of existing structures".]

source: Krieger, E., Nabuurs, S. B. and Vriend, G. (2003) , in Structural

Model scoring

Energy/scoring functions

• Native structure is the lowest free energy conformation → need for a function capable of assessing the energy/quality of a proposed structure

• Approaches
  • Potential energy
    • Atom-level resolution
    • Based on energy terms
  • Knowledge-based scoring functions
    • Residue-level resolution
    • Recognizing good folds from existing knowledge (PDB)

Potential energy function

• A potential energy function gives the potential energy of a system as a function of the positions of all its atoms
  • Behavior of a molecular system can be described by the Schrödinger equation, which in general describes the behavior of a dynamic system
  • We need to consider not only the atoms of the molecule, but also the surrounding water molecules
  • Were we able to compute the energy of the system, we could use it to score our predictions
  • To solve the equation we need to consider all nuclei and electrons of the system and their interactions → impossible to solve for systems of more than a few atoms → approximate potential energy function
• Molecular mechanics force field
  • Consists of energetic contributions of covalent (bonded) and electrostatic (non-bonded) interactions
  • Each contribution consists of a functional form and its parametrization
  • Atoms are represented by their centers only, but that depends on the type of energy function

Potential energy function – covalent interactions

• Bond-length potential
  • Treats the bond as a spring and describes its energy by Hooke’s law
  • Bonds between chemically similar atoms have similar lengths, thus we can assume the observed equilibrium length is the one with minimum potential energy
  • $E_{bond} = K_r (r - r_{eq})^2$, where $K_r$ is the spring constant, $r$ the bond length and $r_{eq}$ the equilibrium bond length

• Bond-angle potential
  • Same as bonds
  • $E_{angle} = K_\theta (\theta - \theta_{eq})^2$

• Dihedral angle potential
  • Dihedral angles do not have a single energy minimum
  • $E_{dihedral} = \frac{V_n}{2} [1 + \cos(n\phi - \gamma)]$, where $V_n$ is the barrier height, $n$ the number of energy minima and $\gamma$ the angular offset
  • This term alone is not sufficient to represent the energy of a dihedral angle and is often combined with the electrostatic energy between the first and the last of the atoms involved in the dihedral angle

source: YouTube - Introduction to Molecular Dynamics (OpenMM)

Potential energy – electrostatic interactions

• Electrostatic potential
  • Partial charges are placed at the positions of the nuclei and their interactions are approximated by Coulomb’s law, $E_C = \frac{q_i q_j}{4\pi\epsilon_0\epsilon_r r_{ij}}$, with charges $q_i$, $q_j$, permittivity $\epsilon_0\epsilon_r$ and distance $r_{ij}$
  • The dielectric constant used to take into account the polarity of the medium does not work for proteins, because the distances between charges are of the same order of magnitude as the sizes of the microscopic dipoles → use of an empirically determined constant (here denoted $\epsilon_0$): $E_{charged} = \frac{q_i q_j}{\epsilon_0 r_{ij}}$
  • Partial charges are computed from quantum mechanical simulations → crude approximation
• Van der Waals potential
  • Pauli exclusion principle (orbitals cannot overlap)
  • $E_{repulsion} = \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}$

Potential energy

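• Approximate potential energy of a protein conformation $C$

$$E(C) = \sum_{b \in bonds} K_b (b_C - b_{eq})^2 + \sum_{\theta \in angles} K_\theta (\theta_C - \theta_{eq})^2 + \sum_{\phi \in dihedrals} \frac{V_n}{2} [1 + \cos(n\phi - \gamma)] + \sum_{(i,j) \in non\text{-}bonded} \left[ \frac{A_{ij}}{r_{ij,C}^{12}} - \frac{B_{ij}}{r_{ij,C}^{6}} + \frac{q_i q_j}{\epsilon_0 r_{ij,C}} \right]$$

where $b_C$, $\theta_C$, $\phi$ and $r_{ij,C}$ are the bond lengths, bond angles, dihedral angles and interatomic distances in conformation $C$.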
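The individual terms can be written down almost literally in code. The following is a minimal sketch, assuming each term comes with its own already known parameters; the function names, the tuple layout and all numeric values are hypothetical and do not correspond to any particular force field.

```python
import math


def bond_energy(r, r_eq, k_r):
    """Hooke's law term K_r (r - r_eq)^2 for one bond."""
    return k_r * (r - r_eq) ** 2


def angle_energy(theta, theta_eq, k_theta):
    """Harmonic term K_theta (theta - theta_eq)^2 for one bond angle (radians)."""
    return k_theta * (theta - theta_eq) ** 2


def dihedral_energy(phi, v_n, n, gamma):
    """Periodic term V_n / 2 * [1 + cos(n * phi - gamma)] for one dihedral angle."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(r_ij, a_ij, b_ij, q_i, q_j, eps):
    """Van der Waals (12-6) plus electrostatic term for one atom pair."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6 + q_i * q_j / (eps * r_ij)


def total_energy(bonds, angles, dihedrals, nonbonded):
    """Sum all terms; each argument is a list of parameter tuples for one term type."""
    return (sum(bond_energy(*t) for t in bonds)
            + sum(angle_energy(*t) for t in angles)
            + sum(dihedral_energy(*t) for t in dihedrals)
            + sum(nonbonded_energy(*t) for t in nonbonded))


# Toy conformation with made-up parameters (arbitrary units).
energy = total_energy(
    bonds=[(1.6, 1.5, 300.0)],
    angles=[(1.95, 1.91, 50.0)],
    dihedrals=[(0.0, 2.0, 3, 0.0)],
    nonbonded=[(1.0, 1.0, 2.0, 0.0, 0.0, 1.0)],
)
print(energy)
```

In a real force field the parameters are looked up by atom type, and the non-bonded sum usually skips atoms that are directly bonded to each other.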

Knowledge-based scoring

• Also known as statistical potential
• Residue-based strategy to score a conformation
• There exists a correlation between the frequency of a structural feature $f$ and its energy: $freq(f) \sim e^{-\beta E(f)} \rightarrow E(f) \sim \gamma \ln(freq(f))$

• Knowledge of the frequency of a given feature can be used to determine its energy
  • Pairwise amino acid contacts
    • How often does one see a residue of type $i$ at distance $d$ from a residue of type $j$ compared to a random conformation
  • Torsion angles
  • geometry
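A pairwise contact potential of this kind can be sketched as follows. This is an illustration only: the contact lists are made-up toy data, the reference ("random") contacts are simply passed in rather than derived, and the constant is taken as $\gamma = -kT$ with pseudocounts added to avoid a logarithm of zero; all of these are assumptions.

```python
import math
from collections import Counter


def contact_potential(observed_pairs, reference_pairs, kT=1.0, pseudocount=1.0):
    """Return E(i, j) = -kT * ln(f_obs(i, j) / f_ref(i, j)) for unordered residue-type pairs."""
    obs = Counter(tuple(sorted(p)) for p in observed_pairs)
    ref = Counter(tuple(sorted(p)) for p in reference_pairs)
    keys = set(obs) | set(ref)
    n_obs = sum(obs.values()) + pseudocount * len(keys)
    n_ref = sum(ref.values()) + pseudocount * len(keys)
    energy = {}
    for key in keys:
        f_obs = (obs[key] + pseudocount) / n_obs
        f_ref = (ref[key] + pseudocount) / n_ref
        energy[key] = -kT * math.log(f_obs / f_ref)
    return energy


# Toy data: L-V contacts are seen more often than expected, K-E contacts less often.
observed = [("L", "V")] * 8 + [("K", "E")] * 2
reference = [("L", "V")] * 5 + [("K", "E")] * 5
potential = contact_potential(observed, reference)
for pair, e in sorted(potential.items()):
    print(pair, round(e, 3))
```

Pairs that occur more often than in the reference get a negative (favorable) energy, pairs that occur less often a positive one.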

Ab initio approaches

Template-less prediction

• If there is no homologue of the target with a known structure, we cannot use any structure as a template and the model needs to be built de novo (ab initio)

• Approaches
  • Molecular dynamics
    • Model the folding process
  • Conformational space exploration
    • Sample the full conformation space
  • Fragment-based approaches
    • Restrict the conformation by considering predefined fragments

Molecular dynamics

• The potential energy function is often called a force field in molecular dynamics
  • Different levels of detail can be captured in the energy function → different force fields

• The force field gives the force acting on every atom, which accelerates the atom in the direction of the force (negative gradient of the potential energy): $F = -\nabla U(x)$
• Motion of the atom is (in molecular dynamics) determined by the force as described by Newton’s laws: $F = ma \rightarrow F_i = m_i a_i = m_i \frac{\partial^2 x_i}{\partial t^2} = -\frac{\partial}{\partial x_i} U(x_1, x_2, \ldots, x_n)$

Molecular dynamics - simulation

$F = -\nabla U(x)$, $F = ma$

• Application of the force in small time steps

$$a_i(t) = \frac{F_i(t)}{m_i}$$

$$x_i(t + \Delta t) = x_i(t) + \Delta t \, v_i(t) \qquad v_i(t + \Delta t) = v_i(t) + \Delta t \, \frac{F_i(t)}{m_i}$$
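The update rule above can be tried out on a toy system. The sketch below applies it to a single particle in a harmonic potential; the potential, the mass, the time step and the function names are illustrative assumptions. Practical molecular dynamics codes generally use more stable integrators (e.g. velocity Verlet) than this simple explicit scheme.

```python
def euler_step(x, v, force, mass, dt):
    """One time step: x(t+dt) = x(t) + dt*v(t), v(t+dt) = v(t) + dt*F(t)/m."""
    f = force(x)
    x_new = [xi + dt * vi for xi, vi in zip(x, v)]
    v_new = [vi + dt * fi / mi for vi, fi, mi in zip(v, f, mass)]
    return x_new, v_new


def harmonic_force(x, k=1.0, x_eq=0.0):
    """F = -dU/dx for the toy potential U(x) = k/2 * (x - x_eq)^2."""
    return [-k * (xi - x_eq) for xi in x]


# One particle released at x = 1 with zero velocity, simulated for one time unit.
x, v = [1.0], [0.0]
mass = [1.0]
for _ in range(1000):
    x, v = euler_step(x, v, harmonic_force, mass, dt=0.001)
print(x, v)
```

For this potential the exact solution is $x(t) = \cos t$, so after one time unit the position should be close to 0.54 and the velocity close to −0.84.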

Voelz, Vincent A., et al. "Molecular simulation of ab initio protein folding for a millisecond folder NTL9 (1−39)." Journal of the American Chemical Society 132.5 (2010): 1526-1528.

Conformational space exploration

• Exploring the whole conformational space to obtain the lowest energy state is not feasible → need of a strategy → simulated annealing

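1. Generate a conformation $C_1$
2. Randomly modify $C_1$ to obtain $C_2$
3. Accept $C_2$ if $E(C_2) < E(C_1)$, otherwise accept it with probability $e^{-\frac{E(C_2) - E(C_1)}{kT}}$
4. Decrease temperature $T$
5. Stop or go to step 2

Steps 1-3 correspond to Monte Carlo sampling; adding the temperature decrease (steps 4-5) yields simulated annealing.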
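The five steps can be sketched in a few lines. In the sketch below the "conformation" is just a list of three dihedral angles and the energy is a made-up function with a known minimum; the cooling schedule, step sizes and all names are assumptions chosen for illustration, not a recommended protocol.

```python
import math
import random


def simulated_annealing(energy, perturb, start, t_start=1.0, t_end=1e-3,
                        cooling=0.95, steps_per_t=50, k=1.0, seed=0):
    """Metropolis Monte Carlo with a geometrically decreasing temperature."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)        # step 1
    best, e_best = current, e_current
    t = t_start
    while t > t_end:                                 # step 5: stop or repeat
        for _ in range(steps_per_t):
            candidate = perturb(current, rng)        # step 2
            e_candidate = energy(candidate)
            delta = e_candidate - e_current
            # step 3: always accept improvements, otherwise accept with exp(-delta / kT)
            if delta < 0 or rng.random() < math.exp(-delta / (k * t)):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        t *= cooling                                 # step 4
    return best, e_best


# Toy problem: the energy is minimal (zero) when every angle equals its target value.
TARGET = [0.5, -1.0, 2.0]


def toy_energy(angles):
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, TARGET))


def toy_perturb(angles, rng):
    new = list(angles)
    i = rng.randrange(len(new))
    new[i] += rng.gauss(0.0, 0.3)
    return new


best, e_best = simulated_annealing(toy_energy, toy_perturb, [0.0, 0.0, 0.0])
print(best, e_best)
```

At high temperature almost every move is accepted, so the search can escape local minima; as the temperature drops, the search becomes increasingly greedy.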

Fragment-based approaches

• Short sequences have a bias towards given structural motifs
• Trying to sample the whole conformational space is highly time consuming → utilization of fragment libraries

• Algorithm outline
  • Start with an initial configuration (e.g. straight backbone)
  • Randomly pick a position and a fragment of a given size from a library of fragments (extracted from PDB)
    • Rosetta uses 9-mers and the 25 closest sequence neighbors in terms of similarity of query (QP) and database (DP) k-mer profiles → $d = \sum_{i=1}^{k} \sum_{aa} |QP(aa, i) - DP(aa, i)|$ (# of common residues)
  • Replace the fragment and accept if a condition is satisfied (simulated annealing)
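The fragment selection step can be illustrated with a small sketch. It reads the profile distance as a sum of absolute differences (city-block distance), which is an assumption about the formula above; the profile representation (one dictionary of amino acid frequencies per position), the function names and the toy library are hypothetical and are not Rosetta's actual data structures.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_distance(query_profile, db_profile):
    """City-block distance between two k-mer profiles.

    Each profile is a list of k dicts mapping amino acid -> frequency.
    """
    if len(query_profile) != len(db_profile):
        raise ValueError("profiles must have the same length k")
    return sum(abs(q.get(aa, 0.0) - d.get(aa, 0.0))
               for q, d in zip(query_profile, db_profile)
               for aa in AMINO_ACIDS)


def closest_fragments(query_profile, library, n_best=25):
    """Return the ids of the n_best library fragments closest to the query profile."""
    ranked = sorted(library, key=lambda item: profile_distance(query_profile, item[1]))
    return [frag_id for frag_id, _ in ranked[:n_best]]


# Toy example with k = 2 and a three-fragment library.
query = [{"A": 1.0}, {"L": 1.0}]
library = [
    ("f1", [{"A": 1.0}, {"L": 1.0}]),
    ("f2", [{"A": 0.5, "G": 0.5}, {"L": 1.0}]),
    ("f3", [{"W": 1.0}, {"K": 1.0}]),
]
print(closest_fragments(query, library, n_best=2))
```

The selected fragments then serve as the pool from which the random replacement step draws.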

source: https://en.wikipedia.org/wiki/Protein#/media/File:Peptide-Figure-Revised.png

Scoring function

• Coarse-grained structure representation where side chains are represented by their center and radius of gyration

• Knowledge-based Bayesian scoring – how probable is a structure given a sequence
  • Structural features such as strand distribution, helix-helix distribution, overlaps (vdW), general shape, …
  • Sequence-structure features such as the probability of a given residue being in the core, the probability of pairs of residues being at a given distance, …

$$P(structure \mid sequence) = \frac{P(sequence \mid structure) \times P(structure)}{P(sequence)}$$

• We are looking for the decoy with the highest $P(structure \mid sequence)$
• $P(sequence)$ is identical for all decoys, so it does not need to be considered

Comparative modeling

General procedure

• Input: query (target) sequence
• Output: corresponding (target) structure

1. Find structure(s) in PDB which is (are) homologous to the target → template(s)
2. Align target with template(s)
3. Identify conserved core regions and copy the respective backbone atoms
4. Build model for non-core regions / loop modelling
5. Model side chains
6. Optimize the model
7. Assess quality of the resulting model

• The steps might repeat or be skipped (steps might impact each other)
• Several possible templates (alignments, …) can be inspected in parallel

Template selection

• Any method for sequence similarity search (BLAST, PSI-BLAST, HMM, …) can be used

• It might be necessary to first split the protein into domains, since a template might match only one domain of the target protein
  • Domains have around 100 – 200 residues
  • Domain identification
    • Methods for boundary detection (usually based on AA composition, domain-linker detection, …)
    • Split the protein into overlapping fragments of a given size
    • Inspect BLAST results for a significant number of sequences mapped to regions corresponding to domains

[Figure: PDB ID: 5v56]

Alignment construction

• Pairwise alignment might not be reliable, because it is difficult to tell whether a pair of amino acids is really evolutionarily related or aligned just by chance

• Using multiple sequence alignment is preferred
  • CLUSTAL, T-COFFEE, MUSCLE, MAFFT, …
  • MSA does not need to be restricted to sequences with known structures

Core regions identification and transfer

• How to identify the core, i.e. regions which kept their structure during evolution? → heuristics
  • Strongly conserved positions in MSA are likely to be part of the core
  • Regions in the neighborhood of insertions and deletions probably changed
  • Regions corresponding to secondary structure and buried core are more likely not to change
  • Modify alignment procedure parameters (such as scoring matrix) to see which regions are stable with respect to these changes
  • If multiple templates are available → structural superposition to see common regions

• Coordinates transfer
  • Coordinates of backbone atoms can be simply copied over to the target
  • When multiple templates are available
    • Different regions can use different templates
    • Templates can be averaged

Modeling non-conserved (loop) regions

• Problematic since loop regions are not conserved, but also because they often occur on the periphery of the structure → do not stabilize the protein & not part of active sites → not much evolutionary pressure → their structure might vary substantially
• Due to the difficulties of modeling loop regions, they should be interpreted only with great care!
• Approaches
  • For short regions (3 or 4 residues) we can take the sequence and sample preferred dihedral angle conformations
  • Use rules for the structure of such regions
    • Hydrophobic residues tend to be stabilized by packing side chains inward to the secondary structure elements which the region connects
      • Holds for dissimilar loops
    • Stabilization by hydrogen bonds with neighboring amino acids or even ligands
      • Structural context can differ
  • If multiple templates are available, use alternate structures as templates for non-conserved regions if the sequence and length of the region match them better
  • If neighboring regions match other regions in e.g. PDB, use these
  • Try all stereochemically reasonable conformations and evaluate the energy of the resulting conformation

source: Tramontano, Anna, and Arthur M. Lesk. "Protein structure prediction."

Side-chain orientation

• Finding the correct combination of dihedral angles is a combinatorial problem because side chains cannot clash → testing all possible combinations and evaluating their potential energy is not feasible → filtering + search strategy

• Some side chain conformations are preferred → rotamer library derived from X-ray structures
  • Backbone-independent / backbone-dependent
  • Some backbone conformations might allow for only one rotamer, providing an anchor for surrounding side chains
  • Initially, conformations with the highest probability are taken and then modified to optimize the energy
  • Side chains of conserved residues tend to keep their conformation and thus rotamer libraries do not have to be used for such residues
  • Clashing side chains are usually grouped and each group is optimized separately

• Certain backbone conformations may favor some rotamers, e.g. when a side chain forms a hydrogen bond with the backbone, reducing the side chain's conformational freedom

Model optimization/refinement

• Predicting rotamers often leads to backbone correction, which in turn impacts the rotamers and their packing → iteratively model the rotamers and backbone structure

• Molecular dynamics simulation of the model

Fold recognition

Fold recognition background

• When sequence identity between target and template is so low that an evolutionary relationship cannot be discerned, identifying the correct alignment becomes an issue; however, that does not mean there are no structures with a similar fold

• Premise: the number of folds is small

• We can use known structures and evaluate how well they fit the target sequence
  • Sequence fitting
    • Profile-based methods
    • Threading

• Target structure is then modeled as in homology modeling

[Figures: PDB ID: 3I42, PDB ID: 2CE2]

Profile-based methods

• Fitting the local spatial and physico-chemical environment (structural profile) observed in a potential template with the (possibly derived) environment in the target sequence
  • Consider each template structure
    • Each amino acid of both the target and the considered template can be encoded using structural and physico-chemical properties
      • Secondary structure propensity, hydrophobicity, accessibility, …
      • Propensity and other statistical parameters can be either derived from analysis of PDB, or possibly by prediction algorithms
    • Align target and template sequences using dynamic programming, scoring positions by the similarity of their properties
  • Pick the structure with the best fit as the template for modeling the target structure

• A family of template structures can be grouped to form a profile and the target is then aligned with the template profile
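The alignment of property-encoded sequences can be sketched with a standard global dynamic programming (Needleman-Wunsch) recurrence. In the sketch below every residue is encoded by a made-up two-component vector (e.g. helix propensity and hydrophobicity, both scaled to 0-1); the similarity function, the gap penalty and the template names are assumptions for illustration only.

```python
def property_similarity(p, q):
    """Similarity of two property vectors: 1 minus their squared Euclidean distance."""
    return 1.0 - sum((a - b) ** 2 for a, b in zip(p, q))


def align_profiles(target, template, gap=-1.0):
    """Global (Needleman-Wunsch) alignment score of two lists of property vectors."""
    n, m = len(target), len(template)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + property_similarity(target[i - 1], template[j - 1]),
                score[i - 1][j] + gap,   # gap in the template
                score[i][j - 1] + gap,   # gap in the target
            )
    return score[n][m]


def best_template(target, templates):
    """Pick the template (by name) whose property profile fits the target best."""
    return max(templates, key=lambda name: align_profiles(target, templates[name]))


# Toy data: (helix propensity, hydrophobicity) per residue.
target = [(0.9, 0.8), (0.8, 0.7), (0.1, 0.2)]
templates = {
    "tmplA": [(0.9, 0.7), (0.7, 0.7), (0.2, 0.2)],
    "tmplB": [(0.1, 0.1), (0.2, 0.9), (0.9, 0.9)],
}
print(best_template(target, templates))
```

Because the score of a position depends only on the two aligned residues, standard dynamic programming is sufficient here.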

Threading

• Profile-based approaches do not consider residue interactions, for example
  • Are hydrophobic residues placed in the core unless there is a charged residue bound to another charged residue?
  • Are residues often seen in a given secondary structure motif present in regions corresponding to the motif?
• Threading tries to “thread” the sequence onto each of the candidate structures (library of 3D folds) and evaluate an energy function which usually consists of pair potentials (how likely it is to see two residues close to each other), a gap potential (alignment gap penalty) and solvent/structural potentials (how well a residue fits a given environment)
• Pairwise energy is non-local, sequence-distant pairs of residues are also taken into account → favorable intramolecular side-chain interactions are preferred

Threading - alignment

• Threading tries to place target sequence over positions of considered template sequence (thread) and use the corresponding coordinates → many possible threadings (alignments) to consider and evaluate

• Optimal alignment needs to consider energy of pairs, i.e. scoring a position needs to take into consideration energy for interacting partners which are only determined by the alignment being built → standard DP not possible

• Double dynamic programming (see structure superposition slides)
• Monte Carlo optimization
• Frozen approximation – interaction partners are taken from the template instead of the target

source: Tramontano, Anna, and Arthur M. Lesk. "Protein structure prediction."

Evaluation of prediction quality

Evaluation

• To evaluate quality of a method we need

• A metric which tells how well the model captures the experimental structure

• A dataset capable of testing the various cases which can be encountered
  • Existence of homologous structure
    • Various levels of similarity between target and template
    • Multiple templates
  • Existence of homologous structure without discernible sequence similarity
  • Non-existence of homologous structure

RMSD

• Mapping between the model and the experimental structure is obvious since they have the same sequence → RMSD
  • Should we consider the whole structures?
  • Shouldn’t we focus on biologically important sites only?
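A minimal sketch of the RMSD computation follows. It assumes the two structures are already superposed and given as equally long lists of (x, y, z) coordinates, one per residue; finding the optimal superposition is a separate step that is not shown. The coordinates are toy values.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation of two equally long lists of already superposed (x, y, z) points."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2 for p, q in zip(model, reference) for a, b in zip(p, q))
    return math.sqrt(total / len(model))


# Toy example: three residues match perfectly, the fourth is displaced by 8 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 8.0, 0.0)]
print(rmsd(model, reference))          # all residues
print(rmsd(model[:3], reference[:3]))  # without the displaced residue
```

The single displaced residue dominates the value computed over all residues, which illustrates why the choice of residues included in the comparison matters.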

[Figure: superposition using all residues vs. superposition using 50% of residues]

source: Tramontano, Anna, and Arthur M. Lesk. "Protein structure prediction."

GDT-TS

• RMSD is quadratic → distant residues (loop regions) have a higher impact
• We care more about the number of residues within a given distance from the experimental structure → GDT-TS (Global Distance Test Total Score)

$$GDT\text{-}TS = \frac{1}{4} \left[ \frac{\#C_\alpha \text{ within 1 Å of exp}}{\#C_\alpha} + \frac{\#C_\alpha \text{ within 2 Å of exp}}{\#C_\alpha} + \frac{\#C_\alpha \text{ within 4 Å of exp}}{\#C_\alpha} + \frac{\#C_\alpha \text{ within 8 Å of exp}}{\#C_\alpha} \right]$$

• Measures correctness of the overall structure, but does not take into account side-chain orientation
  • Modification to consider all atoms
  • Modification to consider the fraction of matched dihedral angles (e.g. $\chi_1$ and $\chi_2$)
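The score can be sketched directly from the formula. The sketch assumes a single, fixed superposition of the model onto the experimental structure and one $C_\alpha$ coordinate per residue; evaluation tools typically search for a superposition separately for each cutoff, which is not shown here. The coordinates are toy values.

```python
import math


def gdt_ts(model_ca, exp_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of C-alpha atoms within each distance cutoff (in Å) of the experimental position."""
    if len(model_ca) != len(exp_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(p, q) for p, q in zip(model_ca, exp_ca)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Toy example: the four residues deviate by 0.5, 1.5, 3.0 and 10.0 Å.
exp_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model_ca = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(model_ca, exp_ca))
```

Here the fractions are 1/4, 2/4, 3/4 and 3/4, so the residue displaced by 10 Å lowers the score only by its own share instead of dominating it as in RMSD.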

Points to consider when comparing model and experimental structure

• Atoms with a high B-factor are not well defined

• Long solvent-accessible amino acids tend to be more mobile

• Crystal contact regions can have a different conformation in solution

• …

CASP experiment (competition)

• Effort to objectively compare protein tertiary structure prediction approaches
• Takes place every two years (1994 – 2018) in many different categories
  • Tertiary structure prediction, secondary structure prediction, contacts prediction, domain boundary prediction, function prediction, …
  • Categories by template availability
    • Homology modelling, protein threading, de novo prediction
  • Categories for server only and server + human predictions
• Structures are known neither to the competitors nor to the organizers
  • Soon to be resolved structures
  • Structures on hold by PDB
• 2018 results

State of the art tools

• I-TASSER
• Rosetta
• RaptorX
• Phyre2 (successor of 3D-PSSM and Phyre)
• MODELLER
• HHpred
• AlphaFold
• SWISS-MODEL

Rosetta

• One of the most popular and best performing approaches (especially in de novo modeling) in CASP
• Both de novo and comparative modelling
• Fragment-based method using Monte Carlo search

• Rosetta Commons
  • Software suite for modeling macromolecular structures
  • Includes tools for structure prediction, design, and remodeling of proteins and nucleic acids
• Robetta
  • Free service implementing Rosetta

MODELLER

• Homology modeling

• Model built by satisfaction of spatial restraints
  • Distance and dihedral angle restraints on the target sequence derived from its alignment with template 3D structures
  • Spatial restraints and the CHARMM22 force field terms (enforcing proper stereochemistry) are combined into an objective function
  • Model generated by optimizing the objective function

• Available also as PyMOL plugin PyMod

source: https://salilab.org/modeller/manual/node11.html

HHpred + MODELLER

• Building an HMM from an MSA and searching a precomputed database of HMMs (which include secondary structure information) to find suitable templates and their alignment to the target
• Very fast in comparison to most other tools

• MPI Bioinformatics Toolkit

Phyre2

[Figure: Phyre2 pipeline. The query sequence (ARDLVIPMIYCGHGY) is searched with PSI-Blast against 10 million known sequences to build a hidden Markov model (HMM); HMM-HMM matching against a database of ~65,000 hidden Markov models of ~65,000 known 3D structures yields an alignment of the query (ARDL--VIPMIYCGHGY) with the sequence of a known structure (AFDLCDLIPV--CGMAY), from which the 3D model is built.]

source: http://www.sbg.bio.ic.ac.uk/phyre2/html/help.cgi?id=help/slides

RaptorX

• Threading-based
  • CRF (conditional random fields) used for threading

• Scoring function takes into account the sparsity of the sequence profile
  • For targets with few homologues, structural features are weighted more

• According to CASP9, better alignments for the hardest template-based modeling targets

(I-)TASSER

• The most successful method in CASP8-CASP12 homology modeling experiments

• Threading
• Fragments
• Clustering
• Molecular dynamics

AlphaFold

source: Senior, Andrew W., et al. "Improved protein structure prediction using potentials from ." Nature 577.7792 (2020): 706-710.

AlphaFold2

• Using raw MSA instead of pairwise co-evolution

source: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

• End-to-end deep learning prediction
• Attention mechanism → transformers NN?

• Pairwise distances only as a check

• Side chains

• Confidence score

source: Vaswani et al. (2017)

SWISS-MODEL

• Web server for homology modeling
  • Easy-to-use automated modelling facility
• DeepView (Swiss-PdbViewer)
  • Sequence-to-structure workbench integrating functions for protein structure visualization, analysis and manipulation
• Prediction approach
  • Ability to consider several templates and compute the positions of backbone atoms as a (target-template sequence) weighted average of the positions of matched template atoms
  • Loops reconstructed from PDB-derived structure fragments
  • Refinement using a molecular force field
  • Statistical profile calculated by summing probabilities to observe a particular residue in a particular 3D context

Databases of models

• SWISS-MODEL Repository
  • Over a million models built using the SWISS-MODEL pipeline

• ModBase
  • Over 6 million unique sequence models

• Protein Model Portal

FoldIT

https://fold.it/

Sources

• Tramontano, Anna, and Arthur M. Lesk. "Protein structure prediction." John Wiley and Sons, Inc, Weinheim (2006).

• Gu, Jenny, and Philip E. Bourne, eds. Structural bioinformatics. Vol. 44. John Wiley & Sons, 2009.
