© 2015 Nature America, Inc. All rights reserved.
Nature Protocols | VOL.10 NO.6 | 2015

PROTOCOL

The Phyre2 web portal for protein modeling, prediction and analysis

Lawrence A Kelley¹, Stefans Mezulis¹, Christopher M Yates¹,², Mark N Wass¹,² & Michael J E Sternberg¹

¹Structural Bioinformatics Group, Imperial College London, London, UK. ²Present addresses: University College London (UCL) Cancer Institute, London, UK (C.M.Y.); Centre for Molecular Processing, School of Biosciences, University of Kent, Kent, UK (M.N.W.). Correspondence should be addressed to L.A.K. ([email protected]).

Published online 7 May 2015; doi:10.1038/nprot.2015.053

Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

INTRODUCTION
In September 2014, the UniProtKB/TrEMBL protein database contained over 80 million protein sequences. The Protein Data Bank contains just over 100,000 experimentally determined 3D structures. This ever-widening gap between our knowledge of sequence space and of structure space poses serious challenges for researchers who seek the structure and function of a protein sequence of interest. Fortunately, advances in computational techniques to predict protein structure and function can substantially shrink this gap. On average, 50–70% of a typical genome can be structurally modeled using such techniques¹. The key principles on which such techniques work are: (i) that protein structure is more conserved in evolution than protein sequence, and (ii) that there is evidence of a finite and relatively small (1,000–10,000) number of unique protein folds in nature². These principles permit the protein structure prediction problem to be considered as a problem of matching a sequence of interest to a library of known structures, rather than the more complex and error-prone approach of simulated folding.

For over 30 years researchers have developed and refined computational methods for structure prediction. Such methods include simulated folding using physics-based or empirically derived energy functions, construction of models from small fragments of known structure, threading, where the compatibility of a sequence with an experimentally derived fold is determined using similar energy functions, and template-based modeling (TBM), in which a sequence is aligned to a sequence of known structure on the basis of patterns of evolutionary variation. TBM encompasses the strategies that have been called homology modeling, comparative modeling and fold recognition. It is this technique that has become the most universally reliable and widely used technique by both the modeling and wider bioscience communities. The success of TBM over other methods is due to three main factors: (i) the development of powerful statistical techniques to extract evolutionary relationships from homologous sequences; (ii) the enormous growth in sequencing projects, which provides the raw information; and (iii) the power of computing to process large databases with a fast turn-around.

Today, the most widely used and reliable methods for protein structure prediction rely on some method to compare a protein sequence of interest with a large database of sequences, to construct an evolutionary or statistical profile of that sequence and to subsequently scan this profile against a database of profiles for known structures. This results in an alignment between two sequences, one of unknown structure and one of known structure. One can then use this alignment, or a set of equivalences, to construct a model of one sequence on the basis of the structure of another. When the sequence similarity between the protein of interest and the database protein(s) is low, then detection of the relationship and the subsequent alignment can be enhanced if structural information is included to augment the sequence analysis.

Phyre2 is the successor of Phyre, which we previously published in Nature Protocols³. Although the original Phyre and the new Phyre2 share the common aim of protein modeling, the new Phyre2 system described in this updated protocol has been designed from scratch, and it shares no components with its predecessor. Phyre2 was launched in January 2011, but to date users have needed to reference the original Phyre protocol³, which has been cited over 2,400 times. Phyre2 is one of the most widely used protein structure prediction servers, and it serves ~40,000 unique users per year, processing ~700–1,000 user-submitted proteins per day. In collaboration with other groups, we have applied Phyre2 to the annotation of a wide range of genomes⁴⁻⁶.

Comparison with other methods
There exist a number of other powerful servers for structure prediction on the web. However, for the majority of modeling tasks, the differences in accuracy between such tools are minor. The key differentiating factor for Phyre2 is ease of use. One of the primary objectives of the Phyre2 server is to provide a user-friendly interface to cutting-edge bioinformatics methods.

This enables biologists inexpert in bioinformatics to use state-of-the-art techniques without the very steep learning curve typical of many online modeling tools.

Some of the most widely used web servers for protein modeling are Phyre2, i-TASSER⁸, Swiss-Model⁹, HHpred¹⁰, PSI-BLAST–based secondary structure prediction (PSIPRED)¹¹, Robetta¹² and Raptor¹³. In international blind trials of protein structure prediction methods (critical assessment of protein structure prediction or CASP)⁷, it is observed that for the majority of modeling tasks there is no major difference in the accuracy of these methods. In extremely difficult modeling tasks, in which remote homology is uncertain and in which key regions of a sequence cannot be matched to a known structure, i-TASSER⁸ has shown a small but consistent performance improvement over other methods. Phyre2 has been tested in the CASP9, 10 and 11 experiments (results can be seen at http://predictioncenter.org/index.cgi). To compare the performance with other systems, we consider TBM and the average model quality (known as GDT_TS in CASP) that fully automated systems produce over the course of the CASP experiment (120 targets in CASP9 and 98 targets in CASP10). Typically, these domains share <30% sequence identity with an identified template. As a single research group may submit multiple servers in CASP, we consider only the single best-performing server from each participating group. In the CASP9 experiment, Phyre2 ranked sixth out of ~55 unique groups. The five superior groups had an average improvement in model quality of 2.8% compared to Phyre2, with only i-TASSER showing a 5% improvement. In CASP10, Phyre2 ranked tenth out of ~45 groups. Excluding i-TASSER (8% improvement), the remaining eight superior groups showed an average improvement over Phyre2 of 3.7%.

To understand these improvements in a structural context, one should note that a 1% improvement in model quality roughly corresponds to two extra residues of a typical 200-residue protein being within 4.5 Å of the native conformation. CASP11 data for average model quality is not yet available from the prediction center website. We consider that the primary difference between these servers and Phyre2 is not in accuracy, but in the ease of use by non-bioinformaticians.

Limitations
There are two principal limitations to Phyre2 and other related servers. First, if homology cannot be detected by the methods used between a user-supplied sequence and a sequence of known structure, then modeling will be either impossible or extremely unreliable. This reflects the wider ongoing difficulty of the protein-folding problem. There are still no reliable methods to predict a protein structure purely from sequence alone without reference to known structures.

The second limitation, again applicable to all widely used methods, is predicting the structural effects of a point mutation. Phyre2 has functionality to predict the phenotypic effect of point mutations, but it is unable to accurately determine, beyond the estimated position of a side chain, the wider structural effect of a point mutation. This means that a user attempting to model several single-position variants using Phyre2 will receive essentially identical models with a different side chain at the position of the variant. Generally, no alterations of the backbone of the protein will be observed.

It is often the case that users do not want only a single-chain model of their protein but a model of a multimer. This is not yet possible in Phyre2, but work is currently under way to add this functionality by using known multimeric structures as templates for complex building.
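The model-quality percentages quoted in 'Comparison with other methods' can be made concrete with a small calculation. The sketch below is an illustration only and is not part of the Phyre2 server: it assumes that the per-residue Cα distances between a model and the native structure have already been computed after a single superposition (the real CASP measure searches over superpositions for each cutoff), and it uses the conventional GDT_TS cutoffs of 1, 2, 4 and 8 Å rather than the 4.5 Å rule of thumb quoted in the text. The function name `gdt_ts` and the example distances are hypothetical.

```python
from typing import Sequence

# Conventional GDT_TS distance cutoffs in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = GDT_TS_CUTOFFS) -> float:
    """Approximate GDT_TS (0-100) from precomputed per-residue C-alpha distances.

    Simplification: one fixed superposition is assumed, so this is an
    illustrative score rather than the exact CASP value.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical 200-residue model: 100 residues very close to native,
# 50 moderately close and 50 far away.
before = [0.5] * 100 + [3.0] * 50 + [10.0] * 50
# The same model after two of the distant residues are placed correctly.
after = [0.5] * 102 + [3.0] * 50 + [10.0] * 48

improvement = gdt_ts(after) - gdt_ts(before)
```

Under these assumptions, moving two residues of a 200-residue protein from far away to close to the native position raises the score by one point, which matches the scale of the rule of thumb given in the text.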

Finally, it is important to understand the potential limitations of modeling multidomain proteins using the 'intensive' mode of Phyre2 (described in stage 3b of the 'Modeling a single sequence' section, below). If homology models of separate domains without any mutual overlap are combined using the ab initio techniques described in stage 3b, the relative orientation of the domains in the resulting multidomain homology model is very likely to be incorrect. Such cases can be discerned by examining the table discussed in Step 12B. This can also apply to transmembrane proteins, in which a homologous crystal structure of the globular or hydrophilic domain may be found and then grafted onto a transmembrane domain from another protein. This limitation will not apply if homology can be detected to a structure that spans the entirety of the user sequence. Future versions of Phyre2 will automatically detect these cases and provide a warning to the user.

The Phyre2 server
The Phyre2 system is a combination of a large number of disparate software components created by our own group and others, written in multiple languages. The system runs on a shared Linux farm of ~300 CPU cores. The Phyre2 server may be used in several different ways depending on the focus of the user's research. The most commonly used facility is the prediction of the 3D structure of a single submitted protein sequence. Advanced facilities include: (i) Backphyre, to search a structure against a range of genomes; (ii) batch submission of a large number of protein sequences for modeling; (iii) one-to-one threading of a user sequence onto a user structure; (iv) Phyrealarm, for automatic weekly scans for proteins that are difficult to model; and (v) Phyre Investigator for in-depth analysis of model quality, function and the effects of mutations. First, modeling of a single sequence will be discussed, followed by brief explanations of these tools. The procedure will mainly deal with a single query submission to Phyre2. These advanced facilities will not be detailed in the PROCEDURE, with the exception of the use of Phyre Investigator (optional; PROCEDURE Steps 35–39), as the results that they produce and their interpretation largely follow what we describe for a single sequence.

Modeling a single sequence
The core method of Phyre2 for generating a 3D model of a protein sequence is composed of four underlying technical stages, described below and illustrated in Figure 1. There is also an optional intensive mode that attempts to create a complete full-length model of a sequence through a combination of multiple template modeling and simplified ab initio folding simulation. This is described in stage 3b and illustrated in Figure 2. These stages and their corresponding figures refer to the underlying algorithm being used for structure prediction. In contrast, the steps in the PROCEDURE are a guide to user navigation and analysis of the results of this algorithm. Throughout, the term 'query' refers to the user-submitted protein sequence.

Stage 1: gathering homologous sequences. The first stage is to determine an evolutionary profile for the query that captures the residue preferences at each position along its length.

Figure 1 | Normal mode Phyre2 pipeline showing algorithmic stages. Stage numbers are shown in circles, and elements within a stage are surrounded by a dashed box. Stage 1 (gathering homologous sequences): a query sequence is scanned against the specially curated nr20 (no sequences with >20% mutual sequence identity) protein sequence database with HHblits. The resulting multiple-sequence alignment is used to predict secondary structure with PSIPRED, and both the alignment and secondary structure prediction are combined into a query hidden Markov model. Stage 2 (fold library scanning): this is scanned against a database of HMMs of proteins of known structure. The top-scoring alignments from this search are used to construct crude backbone-only models. Stage 3 (loop modeling): indels in these models are corrected by loop modeling. Stage 4 (side-chain placement): amino acid side chains are added to generate the final Phyre2 model.

Figure 2 | Intensive mode Phyre2 pipeline. Once a set of models has been generated, as shown in stages 1–3 of Figure 1, models are chosen by heuristics to maximize both confidence and coverage of the query sequence. Pairwise Cα-Cα distances are extracted from these models and treated as linear inelastic springs in Poing. Regions not covered by templates are handled by the ab initio components of the Poing algorithm: preferential bombardment of hydrophobic residues by notional solvent molecules to encourage burial, predicted secondary structure springs to maintain α-helix or β-strand conformations, and prevention of steric clash. The new protein is 'synthesized' from a virtual ribosome in the context of these forces, and the final Cα structure is used to construct a full backbone using Pulchra followed by side chain addition using R3.

To construct an evolutionary profile, one needs to gather a large number of diverse yet true homologs. Diversity is key to the creation of a statistically representative distribution of amino acid preferences at each position in the protein, whereas avoiding false positives is vital so as not to pollute this distribution. Diversity may be achieved by searching the ever-growing protein sequence databases. In the past, the sequence database was mined using programs such as Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST)¹⁴ that iteratively evolve a profile through multiple scans of the sequence database, so-called sequence-profile matching. However, the most powerful approach to specific and sensitive collection of homologs is through profile-profile matching. Unfortunately, applying such a technique to large sequence databases is computationally prohibitive. Fortunately, powerful recent heuristics have been developed that overcome much of this computational burden. These heuristics effectively reduce profile-profile matching to sequence-profile matching by discretizing the vectors of 20 amino acid probabilities at each position into a restricted alphabet. This method, known as HHblits¹⁵, demonstrates a 50–100% increase in sensitivity (percentage of all true homologs detected) over PSI-BLAST and more accurate alignments without sacrificing speed. HHblits is used to scan the query against a sequence database in which no pair of sequences shares >20% identity, resulting in a sequence profile. In addition, the secondary structure of the query is predicted using PSIPRED¹⁶. PSIPRED is one of the most widely used methods for secondary structure prediction, and it uses neural networks trained on protein sequence profiles to predict the presence of α-helices, β-strands and coils with an average three-state accuracy of 75–80%.
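The property that defines the stage 1 sequence database, namely that no pair of sequences shares >20% identity, can be illustrated with a toy redundancy filter. The sketch below is not how the nr20 database is built; it is a minimal greedy filter under strong simplifying assumptions: the sequences are assumed to be already aligned to a common length (with '-' for gaps), identity is counted only over columns where both sequences have a residue, and the function names and example sequences are hypothetical.

```python
from typing import List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap.

    Assumes `a` and `b` are rows of the same alignment (equal length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    return sum(x == y for x, y in aligned) / len(aligned)


def filter_redundant(seqs: List[str], max_identity: float = 0.20) -> List[str]:
    """Greedily keep sequences so that no kept pair exceeds `max_identity`."""
    kept: List[str] = []
    for seq in seqs:
        # Keep the sequence only if it is sufficiently different from all kept ones.
        if all(pairwise_identity(seq, other) <= max_identity for other in kept):
            kept.append(seq)
    return kept


# Hypothetical pre-aligned sequences: the second is 90% identical to the first,
# the third shares no residues with the first.
example = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
representatives = filter_redundant(example)
```

A greedy filter of this kind depends on input order; production clustering tools use more careful strategies, so the sketch should be read only as a statement of the identity criterion.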

Stage 2: fold library scanning. The profile calculated in stage 1, together with the predicted secondary structure, is converted to a hidden Markov model (HMM). This HMM is then scanned using HMM-HMM matching against a precompiled database of HMMs of known structure known as the fold library. The fold library is composed of a representative set of experimentally determined protein structures whose profiles have been calculated using the same approach as in stage 1. The alignment algorithm used in Phyre2 is HHsearch¹⁰, which is one of the leading homology detection methods, as demonstrated in international blind trials of protein structure prediction (CASP)⁷. The end result of the fold library scan is a list of query-template alignments produced by HHsearch, ranked by their posterior probabilities. These alignments are used to generate crude backbone models that often contain insertions and deletions (indels) and that do not contain side chains.

Stage 3: loop modeling. Indels are handled using a library of fragments of known protein structures with lengths of 2–15 amino acids. This library is constructed by a complete fragmentation of the structure database, followed by structural clustering. A given gap in a model is characterized by its flanking regions and the distance between its endpoints. For insertions, a sequence-profile search is performed using the missing inserted sequence to detect fragments with similar sequence composition and endpoint distances, which creates a short list (typically 100 members) of potentially useful fragments. Similarly, for deletions, the sequence encompassing a window on either side of the deletion is used. These fragments are fitted to the crude model using cyclic coordinate descent¹⁷, which is a robotic arm algorithm that attempts to fit the ends of the fragment to the crude model while minimizing changes in the dihedral angles of the fragment. Finally, fitted fragments are ranked using a combination of empirical energy terms, and the top-scoring model is selected. In some cases, it is not possible to fit a fragment to an indel, and such gaps remain in the backbone. This is often an indication of errors in the original alignment. See Steps 25–28 of the PROCEDURE for details on alignment interpretation.

Stage 3b: multiple template modeling with Poing. This stage is only performed if intensive mode is used. The aim of this stage is to create a complete model of the query protein even when different regions or domains are modeled by separate templates, or when there are no templates at all (ab initio modeling). To do this, we use Poing¹⁸, a simplified protein-folding simulator. First, heuristics are used to select a subset of models produced in stages 2 and 3 that increase coverage of the query protein while maintaining high confidence, as reported by HHsearch. These input models provide distance constraints between pairs of residues. These restraints are modeled as linear inelastic springs. In Poing, different restraints are added as the query protein is slowly synthesized from a virtual ribosome. Residues that are not constrained by input models are modeled by Poing's solvent bombardment model, predicted secondary structure springs and penalization of steric clashes. Between 5 and 100 models are synthesized in this way depending on the protein size (fewer for large proteins owing to computational demand) and clustered to choose a final representative model. As this model is composed only of α-carbons, its backbone is reconstructed using Pulchra¹⁹.

Stage 4: side chain placement. Side chain fitting to the backbone generated in stage 3 or 3b is performed using the R3 protocol²⁰, which involves a fast graph-based technique and a side chain rotamer library to place side chains in their most probable rotamer while avoiding steric clashes. This technique is ~80% accurate if the backbone is correct.

Advanced facilities in Phyre2
Backphyre: detecting a structure across genomes. Frequently, users have a protein structure of interest and want to determine whether homologous structures exist in other genomes. For this purpose, HMM libraries must be generated for genomes of interest. Phyre2 currently contains such libraries for 30 genomes, and this number is constantly growing on the basis of user requests.

In Backphyre, a user uploads a structure in PDB format. The sequence of this structure is extracted and processed as in stage 1 above, while also including the known secondary structure within the HMM. This HMM is scanned as in stage 2 against one or more user-selected genomes from the 30 available. The final screen output is laid out similarly to that described in Steps 23–30 of the PROCEDURE.

Batch analysis. It is possible to run the single-sequence protocol on a large number of sequences uploaded by a user. By default, users are permitted to upload 100 sequences at a time, but this limit can be changed on request. Batch jobs are processed as and when spare computing power becomes available, and thus they are often somewhat slower than individually submitted jobs. Phyre2 processes on average 16,000 individual submissions per month and 7,000 batch sequences per month. Batch jobs can be monitored during processing. Summary pages for batch jobs are available, as are facilities to download detailed or summary results for the entire batch. Each sequence has associated individual results pages whose interpretation is the same as in Steps 23–34 of the PROCEDURE.

One-to-one threading. Although the detection of remote homologous structures by Phyre2 has high specificity and sensitivity, it is sometimes the case that a user wishes to use a particular structural template on which to model their protein. Perhaps a user has a newly solved structure that is not yet published or has some biological information that indicates that their chosen template would produce a more accurate model than the one(s) automatically chosen by Phyre2. One-to-one threading allows a user to upload both a sequence to be modeled and the template on which to model it. HMMs of both the sequence and uploaded structure are calculated as in stage 1 above and aligned using the HHsearch algorithm. Unlike ordinary Phyre2 results, one-to-one threading does not, of course, produce a list of hits. Instead, the user is presented with a model of the protein and an alignment view together with information on the confidence of the match. See Steps 25–28 of the PROCEDURE for how to interpret this.
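For the batch analysis facility described above, a large FASTA file may need to be divided so that each upload respects the default limit of 100 sequences. The following sketch shows one way this could be prepared locally before submission. It is an assumption-laden illustration rather than a Phyre2 tool: the function names `parse_fasta` and `split_batches` are hypothetical, the parser handles only plain '>'-headed FASTA text, and nothing here contacts the server.

```python
from typing import List, Tuple

FastaRecord = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[FastaRecord]:
    """Parse plain FASTA text into (header, sequence) records."""
    records: List[FastaRecord] = []
    header = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_batches(records: List[FastaRecord],
                  batch_size: int = 100) -> List[List[FastaRecord]]:
    """Split records into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Hypothetical input: 250 short sequences, each wrapped over two lines.
fasta_text = "".join(f">seq{i}\nACDE\nFGHI\n" for i in range(250))
batches = split_batches(parse_fasta(fasta_text))
batch_sizes = [len(batch) for batch in batches]
```

The default of 100 mirrors the limit stated in the text; because that limit can be changed on request, it is kept as a parameter rather than hard-coded.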

Phyrealarm. On the basis of statistics from 30,000 Phyre2 submissions over 2 months, on average more than 50% of all proteins submitted have had over 75% of their length modeled with >90% confidence. Of the remaining 50% of submissions, 25% have had less than a quarter of their sequence modeled and 25% have had between a quarter and three-quarters confidently modeled. A failure to detect confident structural matches for significant regions of a query is typically caused by one of three factors: (i) a lack of a sufficient number and diversity of homologous sequences to the query to create a useful profile/HMM; (ii) the evolutionary distance between the query and any known structure being too great to detect with the HMM-HMM matching method; or (iii) the query adopting a novel fold that is not present in the current structural database.

Fortunately, both the protein sequence database and structure database are growing every week, meaning that currently undetectable homology may become detectable in the near future. For this reason, the Phyrealarm service was developed. If a user query cannot be modeled confidently, the protein may be added to an automated scan of new structures and new sequence databases each week. Every week ~100 new structures are added to the Phyre2 fold library, and every few months the clustered sequence database used for profile construction is updated. If a confident match is detected to this newly released data, the user query is automatically processed through the full Phyre2 modeling pipeline, and the user is sent the results and links by e-mail.

Phyre Investigator. Given a confident model produced by Phyre2, it is often desirable to perform more in-depth analyses of model quality, potential function and the effects of mutations (see optional PROCEDURE Steps 35–39). For these purposes, we developed Phyre Investigator. Any model produced by Phyre2 can be submitted to the Phyre Investigator results page with one click for a range of analyses, including the following (see also refs. 10,21–28):
• Model quality assessment by ProQ2 (ref. 21)
• Alignment confidence from HHsearch¹⁰
• Clashes, rotamers and Ramachandran analysis by Molprobity²²
• Pocket detection by fpocket2 (ref. 23)
• Catalytic site detection from the CSA²⁴
• Mutational analysis by SuSPect²⁵
• Conservation analysis using Jensen-Shannon Divergence²⁶
• Interface detection using the ProtinDb (http://protindb.cs.iastate.edu/) and PI-site²⁷ databases
• Detection of sequence features using the Conserved Domain Database (CDD)²⁸

The Phyre Investigator interface (Fig. 3) has been designed to make this large amount of data easy to navigate and interpret simultaneously in a sequence and structural context. The screen is divided into three main sections from top to bottom: the information box, the structure view and analyses buttons, and the sequence view.

Figure 3 | Phyre Investigator user interface. (a) Information box. (b) Structure and analyses view. (c) Sequence view. The structure and analyses view shows an interactive 3D JSmol viewer, buttons to toggle different analyses and two bar graphs (in this case for residue A34) showing the sequence profile preferences and predicted likelihood of a phenotypic effect from each of the 20 possible mutations at this position.

The structure view and analyses section is itself divided into three regions, from left to right: the JSmol interactive viewer (http://www.jmol.org/), the analyses buttons and two graphs showing sequence profile and mutational predictions. Clicking on an analysis button will display, in the information box, a brief summary of whichever analysis is currently active and links to downloadable raw data. It will also color the structure in the JSmol view in accordance with the chosen analysis and display a color-coded key to the left of the structure. Finally, it will add an extra row to the sequence view, illustrating the same information but in a sequence context.
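Of the Phyre Investigator analyses listed above, conservation scoring by Jensen-Shannon divergence lends itself to a short worked example. The sketch below is a generic illustration of the divergence applied to one alignment column and should not be read as the implementation used by the server: the uniform background distribution, the small pseudocount, the omission of sequence weighting and gap penalties, and the function names are all assumptions made for clarity.

```python
import math
from collections import Counter
from typing import List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column: str, pseudocount: float = 1e-6) -> List[float]:
    """Amino acid frequencies in one alignment column (gaps and unknowns ignored)."""
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence: symmetrized KL against the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def conservation(column: str, background: Optional[List[float]] = None) -> float:
    """Score a column by its divergence from a background distribution.

    The default uniform background is an assumption for illustration only.
    """
    if background is None:
        background = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return js_divergence(column_distribution(column), background)


conserved_score = conservation("AAAAAAAA")   # one residue type dominates
variable_score = conservation("ACDEFGHI")    # eight different residue types
```

With base-2 logarithms the divergence lies between 0 and 1, so a fully conserved column scores close to the upper end of the range and a column resembling the background scores close to 0.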

The sequence view displays the predicted secondary structure of your sequence, the confidence in this prediction, the secondary structure of the model, the amino acid sequence and which regions have been modeled. In addition, clicking on an analysis button will reveal an extra row showing the corresponding information from the analysis in a sequence context.

Hovering over a sequence position will highlight that position with vertical bars to either side of the residue in question. It will also highlight that residue in the JSmol 3D viewer as a red halo around the atoms of that residue. Finally, it will show the appropriate sequence profile and mutation graphs for that residue, described later. Clicking on a residue will cause that position to be spacefilled in the JSmol viewer. You may select multiple residues by repeated clicking. At any time, you can clear your selection by clicking the 'Clear selection' button above the sequence view. You may also take a snapshot of the structure at any time using the 'Take JMol snapshot' button.

The sequence profile graph represents residue preferences in your protein sequence at a particular position. Residue preference for each amino acid type is displayed as a vertical colored bar, with tall, red bars being more favorable than shorter blue bars. These values are taken from the position-specific scoring matrix (PSSM) calculated by a PSI-BLAST scan of the query against a large sequence database (Uniref50).

The mutational analysis graph represents the predicted effect of mutations at a particular position in your sequence. Tall, red bars above a residue type indicate that a mutation to this residue is strongly predicted to have a phenotypic effect. These predictions are made using the SuSPect²⁵ method. SuSPect uses sequence conservation, solvent accessibility and protein-protein interaction network information to predict how likely a variant is to lead to disease in humans, demonstrating superior performance over other available methods, such as PolyPhen-2, on a benchmark. SuSPect is also available as a stand-alone web server with more options for uploading sets of sequences, viewing precalculated results for the entire human proteome and more. It is important that your sequence is wild type. Submitting a mutant protein to Phyre2 and then to Investigator will lead to misleading predictions from SuSPect. If the protein is a human protein, precalculated scores will be returned. For nonhuman proteins, scores will be calculated using a version of SuSPect incorporating protein structure but no network information. By incorporating network information, SuSPect performs best on mutations in human proteins.

Phyre2 job manager. If users register with Phyre2 (which is free), they gain access to various other tools including the Phyre2 job manager. This is accessed via the 'View past jobs' link at the top of the home page when logged in. Clicking the job manager takes the user to a page that allows them to see a summary and links to all of their past and running jobs. Every completed job has a link to results, which, when hovered over with the mouse, displays an image of the top scoring model with summary confidence and coverage information. Completed jobs remain on the server by default for 30 d. The job manager permits a user to select past jobs and renew them to prevent expiry, or delete them. This is also possible within the results page, as described in Step 8 of the PROCEDURE.

MATERIALS
EQUIPMENT
• A personal computer with an Internet connection and a web browser
• Data: the data required depend on the facilities used (as follows)
• Standard single-sequence modeling. Amino acid sequence of the protein of interest written in the standard one-letter code. Thus, the allowed characters are ACDEFGHIKLMNPQRSTVWY and X (unknown). Line breaks will be ignored and will not affect the predictions
• Advanced facilities, BackPhyre. A protein structure file in PDB format
• Advanced facilities, one-to-one threading. Both an amino acid sequence and PDB structure
• Advanced facilities, batch processing. A file containing multiple amino acid sequences with no gaps in FASTA format

PROCEDURE
Sequence submission
1| Go to the Phyre2 home page (http://www.sbg.bio.ic.ac.uk/phyre2).
2| Enter your e-mail address. Results will be mailed to this address on completion.
3| Enter an optional job description.
4| Copy and paste your amino acid sequence (see Equipment section for data format) into the form provided.
? TROUBLESHOOTING
5| Choose 'normal' or 'intensive' mode (see description of stages 3 and 3b in the INTRODUCTION for further information) by clicking on the appropriate radio button in the form.
CRITICAL STEP Normal is the default. It is recommended to use normal mode first and decide, on the basis of the critical point after Step 12 and Steps 20–22, whether it is worthwhile resubmitting your protein in intensive mode.
and X Spaces also and When you are using SuSPect through Phyre Investigator, it is is it Investigator, Phyre through SuSPect using are you When 2 2 9 ). ), SIFT 3 0 and Condel If users register with the Phyre2 server server Phyre2 the with register users If http://www.sb 3 1 . The SuSPect method is available as g.bio.ic.ac.uk/suspe ct / ), - - - - -
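The data format given in the Equipment section (the 20 standard one-letter codes plus X, with spaces and line breaks ignored) can be checked locally before pasting a sequence into the form in Step 4. The following is a minimal sketch, not part of the Phyre2 server; the function name `clean_sequence` is hypothetical, and the conversion to upper case is an assumption, as the protocol does not say how lower-case letters are treated.

```python
# Allowed one-letter codes as listed in the Equipment section, plus X (unknown).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}


def clean_sequence(raw: str) -> str:
    """Return the sequence with whitespace removed, or raise ValueError.

    Hypothetical helper. Upper-casing is an assumption made here for
    convenience; the protocol only lists upper-case characters.
    """
    # str.split() with no argument splits on any run of whitespace,
    # so spaces, tabs and line breaks are all dropped.
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - ALLOWED)
    if bad:
        raise ValueError("characters not allowed: " + "".join(bad))
    return seq
```

A sequence pasted with line breaks, such as `clean_sequence("MKTAY IAKQR\nQISFV")`, comes back as one unbroken string, whereas a sequence containing a character outside the allowed set raises an error naming that character.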

Figure 4 | Example Phyre2 summary results page. On the left is an image of a large all-β structure. Clicking on the image will download a PDB-formatted file containing this structure. On the right are various data regarding the model, including the following: PDB code of the template used, information about the protein template extracted from the PDB file, confidence in the model and coverage of the query sequence (100% and 28%, respectively). In this case, there is additional text informing the user that although only 28% of the query could be modeled by a single template, other high-confidence templates were also detected that could increase this coverage to 55% by using Phyre's intensive mode. Finally, there is a link to launch the JSmol 3D viewer in the browser and a link to a FAQ describing popular external molecular viewing software.

6| To submit the sequence, click the 'Phyre Search' button. On clicking the button, the user will be taken to a job monitoring page that is automatically updated every 30 s. This page shows a progress bar for the job, information on the job and an estimate of the time it will take. In addition, various user tips are periodically displayed at the bottom of the page to give users more information about the server. The user may choose to bookmark the page, or simply wait for completion. On completion, an e-mail will be sent to the user and the page will be re-directed to a generated results page.

Obtaining results
7| Upon job completion, an e-mail is sent to the user containing summary information, a unique job identifier and a link to the main results page and an attachment containing the top scoring model in PDB format. Note that job results are only visible to those with the unique job identifier (useful for sharing results) or the user's e-mail and password if they have registered. Click on the link in the e-mail. This will open a web page of results.

Main results page navigation
8| At the top of the results page is a box with information about the submission date, description, link to the input sequence and expiry date of the job. Check the expiry date. A new submission will say '30 d'. If the job is approaching expiry, the text will be orange or red. If so, click on the 'Renew for 30 d' link to reset the expiry date for the results.
9| A button labeled 'Download zip of all results' is present, which allows users to keep an off-line copy of the entire directory structure of the results page. Click this link to save a copy of results locally.
10| The page is divided into several main sections: summary (Steps 11 and 12; Fig. 4), sequence analysis (Steps 13–16), secondary structure and disorder prediction (Steps 17–19; Fig. 5a), domain analysis (Steps 20–22; Fig. 5b) and detailed template information (Steps 23 and 34; Figs. 5c and 6). In some cases, depending on the protein submitted, two further sections will appear: binding site prediction and/or transmembrane helix prediction. Most sections can be dynamically shown or hidden using the show/hide links next to the section titles to reduce screen clutter. Below each section title are links to in-depth help on that section and to PDF versions of that section. Video tutorials for each section are under construction. Scroll to the top of the web page to view the 'Summary section'.

Summary section
11| The summary section (Fig. 4) displays an image of the top-scoring model (and its dimensions in Å) produced by Phyre2. Click the image to download the model in PDB format. This file is the same as the attachment in the e-mail received by the user.
12| The information to the right of the image is dependent on whether the user selected normal (option A) or intensive mode (option B).
(A) Normal mode results
(i) Normal results will show information about the known structural template on which this top model is based, the confidence and coverage of the model, a link to display and interact with the model using JSmol within the browser and a link to FAQ regarding other available 3D molecular visualization tools. Click on the 'Interactive 3D view in JSmol' link to view and rotate the model. Click on 'Close JSmol' to return to the static image.

Figure 5 | Examples of the three main sections of a typical Phyre2 results page. (a) Example secondary structure and disorder prediction. The query sequence is colored as described in Step 17. Question marks indicate predicted disordered regions. Each type of prediction is associated with a rainbow color-coded confidence (red, highest confidence; blue, lowest confidence). (b) Example of the domain analysis results section described in Steps 20–22. The width of the box indicates the length of the query sequence. In this example, confident (red) matches have been found at the N terminus (rank 6) and the C terminus (ranks 1–5), but no confident matches have been found to the intervening segment. (c) Example of the detailed table of results described in Steps 23 and 24 and 29–32. In this example, the rank 1 and rank 2 matches have confidence of 100% and sequence identities of 23% and 24%, respectively.

(B) Intensive mode results
(i) Intensive results will show a horizontal (possibly multiline) colored bar that represents the user sequence together with a color-coded confidence key. The colors indicate the predicted confidence of the model along the sequence. Regions modeled by the ab initio approach of Phyre2 are always colored blue to indicate minimum confidence. Other colors are inherited from the confidence in the template(s) used to model that region. Click the 'Details' link to be taken lower down the page to the 'Multi-template and ab initio information' table. This table indicates which structural templates were used for which regions of the user sequence and their associated color-coded confidence. Click the browser's 'back' button to return to the top of the page of results.
CRITICAL STEP Sometimes the confidence of the top model is too low to be useful. It is not recommended to consider models with a confidence value of <90%. Similarly, it may be that the top model does not cover a substantial fraction of the user protein. Sometimes this is because there are multiple domains in the protein covered by separate templates. See Steps 20–22 and the associated TROUBLESHOOTING section to see whether intensive mode may be valuable here. Phyre2 attempts to automatically determine whether other templates covering additional portions of your protein are available and, if so, provides a message to that effect and a recommendation to try intensive mode. However, if the confidence is poor (<90%) and there are no extra templates, the user is alerted to use Phyrealarm. In this case, clicking on the Phyrealarm icon or link will take the user to a prefilled web form wherein one click will add the sequence to the Phyrealarm system. As discussed in the INTRODUCTION, once it has been added to Phyrealarm, the sequence will automatically be scanned against new structures as they become available in the fold library each week. If a confident hit is detected, a full Phyre2 modeling job is automatically run and the user is e-mailed the results. If this happens, the user can resume the protocol at Step 7.
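The decision described in the critical step after Step 12 can be written down as a small rule. This is only a sketch of the logic as stated in the text, not code from the Phyre2 server: the function name, argument names and returned strings are hypothetical, and the only threshold used is the 90% confidence value given in the protocol.

```python
def recommend_next_step(confidence_pct: float, extra_templates_found: bool) -> str:
    """Suggest a follow-up action for a normal-mode result (hypothetical helper).

    confidence_pct: confidence of the top model, 0-100.
    extra_templates_found: True if other templates covering additional
        portions of the query were reported on the summary page.
    """
    if extra_templates_found:
        # Additional regions could be covered by combining templates.
        return "resubmit in intensive mode"
    if confidence_pct < 90.0:
        # Poor confidence and no further templates: wait for new structures.
        return "add sequence to Phyrealarm"
    return "use normal-mode model"
```

For the example in Figure 4 (100% confidence, extra high-confidence templates detected), this rule returns the intensive-mode suggestion; a 60% confidence hit with no other templates leads to Phyrealarm.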

Figure 6 | Example alignment between user query sequence and known structure, as described in Steps 25–28. Sequence coloring is as described in Step 17. Identical residues between query and template have a gray background. Secondary structures (predicted and known) are displayed: in this case α-helices. Color-coded per-residue confidence in both the alignment (from HHsearch) and in secondary structure prediction is displayed. The level of residue conservation for both the query and template sequences is also shown, where thicker horizontal bars indicate greater degrees of conservation.

Sequence analysis
13| In the 'Sequence analysis' section of the main result page, click the button entitled 'View PSI-Blast pseudo-multiple sequence alignment' to open a new window. This contains the results of scanning the query sequence against an up-to-date nonredundant protein sequence library using PSI-BLAST.
14| To determine how many homologous sequences were found, assess the number of rows in the newly opened window by looking at the left-hand side that indexes the number of homologs detected. Up to 1,000 homologous sequences may be presented in this alignment. Each row of the table contains the region of the homolog matched to the user sequence, the E-value reported by PSI-BLAST, the percentage sequence identity to the query and a sequence identifier for the homolog.
15| To ascertain whether highly informative alignments that are most likely to generate an accurate secondary structure prediction were obtained, assess the number of homologs with low E-values (<0.001).
CRITICAL STEP A large number of high-confidence (E-value <0.001) homologs with extensive sequence diversity is indicative of a highly informative alignment, which is most likely to generate an accurate secondary structure prediction and powerful sequence profile. Conversely, a very small number of homologs or a large number of highly similar homologs (>50% sequence identity) are both indicators of a lack of useful evolutionary information, which can lead to error-prone secondary structure prediction, a weak sequence profile and consequently poor overall structure prediction accuracy.
16| Close the window. Next to the 'View PSI-Blast pseudo-multiple sequence alignment' button is a link to a zipped FASTA-formatted version of the multiple sequence alignment. Click this to download the alignment for importing into any standard multiple-sequence viewer.

Secondary structure and disorder prediction
17| In the secondary structure and disorder prediction section of the main results page (Fig. 5a), the position in the sequence is indicated in the top line. The sequence is represented on the next line with residues colored according to a simple property-based scheme: (A,S,T,G,P: small/polar) are yellow, (M,I,L,V: hydrophobic) are green, (K,R,E,N,D,H,Q: charged) are red and (W,Y,F,C: aromatic + cysteine) are purple. The secondary structure prediction comprises three states: α-helix, β-strand or coil. Green helices represent α-helices, blue arrows indicate β-strands and faint lines indicate coils. The 'SS confidence' line indicates the confidence in the prediction from PSIPRED, with red indicating high confidence and blue showing low confidence. Assess which regions are predicted with high and low confidence. A large amount of blue or green in the confidence line is indicative of few homologous sequences detected and a consequent low probability of modeling success.
! CAUTION Secondary structure and disorder prediction is on average 78–80% accurate (i.e., 78–80% of the residues are predicted to be in their correct state). However, this accuracy is only reached if there is a substantial number of diverse sequence homologs detectable in the sequence database (Steps 13–16). If your sequence has very few homologs (something you can check by looking at the PSI-BLAST results via the button near the top of the results page), then the accuracy falls to ~65%. In addition, there are no predictions of β-turns, β-bends, π-helices or 3₁₀-helices. These classes are merged such that β-turns and β-bends are treated as coil, and π-helices and 3₁₀-helices are considered α-helices.
18| The 'Disorder' line contains the prediction of disordered regions in your protein by DisoPred (ref. 32), and such regions are indicated by question marks (?). Assess whether a large proportion of your sequence is predicted to be disordered both visually and by looking at the statistics shown below the prediction.
! CAUTION If a large (>50%) proportion of the query is predicted to be disordered, Phyre2 presents a warning that attempting to model the protein may not be meaningful as the protein is unlikely to adopt a globular structure.
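The assessment in Step 15 and its critical step (many homologs with E-value <0.001 and extensive diversity, versus few homologs or many with >50% identity to the query) can be made concrete with a short script run on values read from the PSI-BLAST table of Step 14. This is a sketch under stated assumptions: the protocol gives the E-value and identity cutoffs but not what counts as "a very small number" or "a large number", so `min_confident` and `max_similar_fraction` are hypothetical defaults chosen only for illustration, as are all names.

```python
from typing import Dict, Iterable, NamedTuple, Union


class Hit(NamedTuple):
    """One row of the PSI-BLAST table: E-value and % identity to the query."""
    evalue: float
    pct_identity: float


def summarize_homologs(
    hits: Iterable[Hit],
    evalue_cutoff: float = 1e-3,        # from the protocol (E-value < 0.001)
    identity_cutoff: float = 50.0,      # from the protocol (>50% identity)
    min_confident: int = 10,            # hypothetical: "very small number"
    max_similar_fraction: float = 0.5,  # hypothetical: "large number"
) -> Dict[str, Union[int, float, bool]]:
    """Count confident homologs and how many of them are highly similar."""
    confident = [h for h in hits if h.evalue < evalue_cutoff]
    similar = [h for h in confident if h.pct_identity > identity_cutoff]
    n = len(confident)
    fraction = len(similar) / n if n else 0.0
    return {
        "confident": n,
        "highly_similar_fraction": fraction,
        "looks_informative": n >= min_confident and fraction < max_similar_fraction,
    }
```

The `looks_informative` flag is a rough screen only; the protocol's own guidance is to inspect the alignment and the 'SS confidence' line rather than rely on a single number.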

19| The user sequence is also scanned against the CDD (ref. 28) for features of interest. When detected, these are also highlighted as colored dots at the appropriate positions in the sequence with a color key at the bottom of this section (disorder confidence line). Click on a feature to be taken to more detail at the CDD website.

Domain analysis
20| Click 'Show' next to the heading 'Domain analysis' on the main results page. This will open a scrollable table (Fig. 5b) whose width represents the length of the user protein. Matches by Phyre2 to known structures are shown as colored rows in which the colors represent the confidence in the homology (red is high confidence, whereas blue is low confidence).
21| At most, the top 20 high-scoring matches are model built. Lower-ranked hits are not modeled to conserve computing resources. (To model lower-ranked hits, see TROUBLESHOOTING for Step 29.) These have links in the centers of their aligned regions that are named after the template used. Hover over these to see a pop-up summary picture of the model and further information. Click this link to go to the equivalent entry in the detailed table of results (Step 23).
22| Scroll the table to determine which regions of your protein have been modeled. This enables you to see whether there are regions that cannot be modeled at all, or whether there are multiple templates covering different regions of your protein. This in turn is an indicator of the probable domain structure of your protein. Return to the top of the table and click 'Hide' to collapse the table.
? TROUBLESHOOTING

Detailed template information
23| Scroll down to the 'Detailed template information' table on the main results page (Fig. 5c). This displays information on the template code, alignment coverage, 3D model, confidence, percentage sequence identity and text description of the protein template. Look for red in the 'confidence' column for reliable models.
24| The matches are ranked by a raw alignment score (not shown) that is based on the number of aligned residues and the quality of alignment. This in turn is based on the similarity of residue probability distributions for each position, secondary structure similarity and the presence or absence of insertions and deletions. Each row provides information on the template used for the model and a small graphic indicating where along your sequence the match color-coded by confidence occurs. Beneath that line graphic is an alignment button. Hover over that button to view statistics about the start, end and percentage coverage of the alignment. Each model built by Phyre2 is based on an alignment generated by HMM-HMM matching. Both the predicted secondary structure of your sequence and the known and predicted secondary structure of the template are used in conjunction with the sequence information in generating the alignment.
25| Click the alignment button to go to a new page containing detailed alignment information (Fig. 6) and the ability to interactively inspect the model using JSmol. See Step 17 for basic interpretation. One of the extra rows present here is entitled 'Template known secondary structure.' In the 'Template known secondary structure' row you will sometimes see 'S', 'T', 'G', 'I' and 'B' characters. These are assigned secondary structure types by the program DSSP. They represent the following: G = 3-turn helix (3₁₀ helix), I = 5-turn helix (pi helix), T = hydrogen bonded turn, B = residue in isolated β-bridge and S = bend. Identical residues in the alignment are highlighted with a gray background.
26| Click the links below the alignment to download a text version of the alignment, a simple pairwise representation of the alignment in FASTA format and the coordinates of the model in PDB format. Beneath these links is an image of the model. Click on the image to launch the JSmol applet inside the browser to interactively inspect the model. Click the 'Close JSmol' link when finished.
27| At the top of the screen are two buttons that can add or remove detail from the alignment. Click both buttons to display all extra information, as shown in Figure 6. The 'Conservation' rows contain information on residue conservation across the detected sequence homologs classified into three states. No symbol indicates unconserved, a thin gray bar indicates moderate conservation and a large block indicates a high degree of conservation.
28| Click on 'Return to main results' in the top left corner of the page and return to the detailed table of results.
29| The '3D Model' column contains a picture of the model constructed for your sequence based on that template. Click on the picture to download the coordinates of the model in PDB format for input to any other viewing or analysis programs you may have.
? TROUBLESHOOTING
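The DSSP codes listed in Step 25 can be collapsed to the three states used for the prediction in Step 17, following the merging rule in the caution after Step 17 (turns and bends as coil; π-helices and 3₁₀-helices as α-helix). The sketch below is illustrative only. The codes H (α-helix) and E (extended strand) are standard DSSP codes not listed in the protocol, and treating B (isolated β-bridge) as coil is an assumption made here, since the protocol does not say how it is merged.

```python
# Three-state reduction of DSSP codes: H = helix, E = strand, C = coil.
DSSP_TO_3STATE = {
    "H": "H",  # alpha-helix (standard DSSP code)
    "G": "H",  # 3-turn (3-10) helix, merged with alpha-helix per Step 17
    "I": "H",  # 5-turn (pi) helix, merged with alpha-helix per Step 17
    "E": "E",  # extended strand (standard DSSP code)
    "T": "C",  # hydrogen bonded turn, treated as coil per Step 17
    "S": "C",  # bend, treated as coil per Step 17
    "B": "C",  # isolated beta-bridge: ASSUMPTION, not stated in the protocol
}


def reduce_dssp(dssp: str) -> str:
    """Map a DSSP string to H/E/C; any unlisted symbol (blank, '-') becomes C."""
    return "".join(DSSP_TO_3STATE.get(code, "C") for code in dssp)
```

Comparing the reduced template string with the predicted secondary structure of the query along the alignment gives a quick check on how well the two agree.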

30| The next two columns are 'Confidence' and '% i.d.'. Confidence represents the probability (from 0 to 100%) that the match between your sequence and this template is a true homology. Sequence identity is the proportion of user protein residues equivalenced to identical template residues in the generated alignment. Check the value of these two columns. If both values are colored red, you are dealing with a high-confidence close homology model. If only the confidence column is red, you are dealing with a remote homology that has still been modeled well but with greater expected deviation from native than a close homolog.
! CAUTION Confidence does not represent the expected accuracy of the model, although the two are intimately related. If you have a match with confidence >90%, one can generally be very confident that your protein adopts the overall fold shown and that the core of the protein is modeled at high accuracy (2–4 Å r.m.s.d. from the native, true structure). However, surface loops will probably deviate from the native.
31| The 'Template Information' column provides either the fold, superfamily and family of the template, as determined from Structural Classification of Proteins (SCOP; http://scop.mrc-lmb.cam.ac.uk/scop/), or description fields taken from the 'PDB Header' and 'Title' fields if the structure in question is not present in the current version of SCOP. Check to see whether the functional information in this column matches any previous biological expectation you may have (ideally from experiment) about the function of your protein. Agreement in general functional class lends greater support to the prediction.
32| Also in this column is a button called 'Phyre Investigator'. When a model or match is particularly interesting, it is possible to perform a range of more in-depth analyses by clicking this button and submitting the model to Phyre Investigator. Find a model that interests you, and then click this button.
33| You will be taken to a page that explains Phyre Investigator and a 'submit' button. Click this button. This will show you a progress bar for the Phyre Investigator job. In the top left is a link 'Return to main results'. Click this link.
34| On the main results page, scroll down to the entry for the model that you submitted. You will see a message saying 'Investigator running'. Processing typically takes 5–10 min. When the job is complete, the message will change to a link saying 'View investigator results'. If you have been waiting longer than 10 min, refresh the page in your browser. Click this link and proceed with the optional Steps 35–39.

Phyre Investigator (optional)
35| The Phyre Investigator interface is composed of several sections (Fig. 3). Click the 'Quality' tab in the 'Analyses' section to see a list of buttons including ProQ2 (a model quality prediction system), clashes, rotamers and Ramachandran analysis. Click each of these in turn looking for regions of poor model quality. If these are far from residues in which you are interested or in loops that are unlikely to be functionally important, they are not a concern. However, if problematic residues lie near functional sites, you should exercise caution and investigate other alternative models that may avoid these problems.
36| Click the 'Function' tab in the 'Analyses' section to display options such as conservation, interfaces, pockets and mutational sensitivity. Conservation can give clues as to likely functional residues. Highly conserved residues that are also present in a pocket are an even stronger indication of likely functional importance.
37| Click one of the interface buttons if available. Are predicted interface residues also conserved? If so, this increases confidence in transferring the known interface of the template to your protein.
38| If a residue appears problematic from the 'Quality' measures, or if it is likely to be functionally important from the 'Function' measures, hover over it in the sequence view and look at its profile and mutational graphs. Are there strong preferences for some types of amino acid? Are some mutations strongly predicted to have a phenotypic effect? This information can guide mutagenesis experiments or aid in interpreting SNPs.
39| When finished, click on 'Return to main results' in the top left corner of the screen.

Superposition of models
40| At the bottom of the main table (Fig. 5c) is a button entitled 'Generate superposition of selected models'. Beneath each template name (column two) of the main table are two buttons: first, a radio button that allows you to select one single master model on which other models will be superposed, and second, a tick box to select models (slaves) to be superposed on the master. Superposition is performed using the MaxSub algorithm (ref. 33). This algorithm attempts to find the maximum subset of atoms between two structures that can be superposed within 4.5 Å. Typically, one would choose as the master either the top-ranked model or a model judged on some previous background biological knowledge to be most interesting. Click on the radio button to select the model that you wish to be the master model.
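Step 30 quotes accuracy as an r.m.s.d. in Å, and Step 40 describes superposition in terms of atoms that fit within 4.5 Å. Both quantities are easy to compute once two structures have already been superposed and their atoms paired. The sketch below does only that final measurement; it is not the MaxSub algorithm, which additionally searches for the best superposition and the largest fitting subset. The function name and the input layout (two equal-length lists of x, y, z tuples for paired atoms) are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def superposition_agreement(
    master: Sequence[Coord],
    slave: Sequence[Coord],
    cutoff: float = 4.5,  # distance in angstroms quoted in Step 40
) -> Tuple[float, float]:
    """Return (r.m.s.d., fraction of atom pairs within cutoff).

    Assumes the two coordinate sets are ALREADY superposed and listed
    in the same atom order; no fitting is performed here.
    """
    if len(master) != len(slave) or not master:
        raise ValueError("need two non-empty coordinate lists of equal length")
    dists = [math.dist(a, b) for a, b in zip(master, slave)]
    rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
    within = sum(1 for d in dists if d <= cutoff)
    return rmsd, within / len(dists)
```

A conserved core with variable surface loops typically shows up as a high fraction within the cutoff despite a larger overall r.m.s.d., because a few badly fitting loop atoms dominate the squared distances.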

41| After selecting a master model, you may tick as many slave models as you wish. Slave models would typically be chosen as alternative structures with comparable confidence values to the top rank model or master model.
42| Click on the 'Generate superposition' button below the table; the ticked templates will be superposed on the master model. This will then take you to a page with a large JSmol window that displays the superposition for interactive viewing, together with descriptive help text.
43| Rotate the superposition, looking for the regions of the models that are in close agreement in 3D space. Often one will observe a conserved core with variable surface loops that can indicate where there is likely to be a modeling error or structural flexibility. This can be helpful to establish which regions of the models agree and disagree, which in turn can give you a sense for which regions of the model are trustworthy and which regions you should be cautious about. Take note of the structural similarity between models as indicated by the TM score, which is explained in the text on the web page.
44| Use the back button in your browser to return to the main results page.

Binding site prediction
45| If the top-ranked model is assigned a probability of >90%, the model and sequence are submitted automatically to the 3DLigandSite (ref. 34) server for ligand binding site prediction. Scroll down the main results page beyond the detailed template information table to the 'Binding site prediction' section. If your top model is >90% confident, a message to this effect and a link to results is presented. If the top rank model is either below 90% confidence or is predicted to contain substantial (>50%) disorder, it is not sent to 3DLigandSite and a message to this effect is displayed. A link is available to submit the model regardless of confidence, but this requires user intervention. Click this link to go to the 3DLigandSite page and explore the results. See the 3DLigandSite paper (ref. 34) and the site's FAQ for how to interpret these results. Return to the main results page.

Transmembrane helix prediction
46| Your sequence and the set of homologs detected by PSI-BLAST are processed by a support vector machine (a powerful machine-learning tool) to determine whether your sequence is likely to contain transmembrane helices, as well as to predict their topology in the membrane. For this, Phyre2 uses memsat-svm (ref. 35), which has demonstrated an average accuracy of 89% on a large test set. Scroll to the bottom of the main results page just below the 'Binding site prediction' section. If transmembrane helices are predicted, an image will be seen showing the extracellular and cytoplasmic sides of the membrane and the beginning and end of each transmembrane helix, illustrated with a number indicating the residue index. This information is not explicitly used in model generation but is presented as additional useful information for the user.

? TROUBLESHOOTING
Step 4: how should long sequences be handled?
There is currently a sequence length limit of 1,200 amino acids. Work is under way to extend this limit. If the query exceeds this limit, it is advised that the query be submitted to the CDD (ref. 29) to determine probable domain boundaries. The query may then be chopped at these boundaries to ensure that the length is below the limit and resubmitted to Phyre2. Future versions of Phyre2 will automate this step and display optional cut points to the user.

Step 4: what if I only have an identifier and no sequence?
If the user has only an identifier or descriptor of the protein of interest as opposed to the sequence itself, they can click the 'sequence finder' on the main submission page. This performs a rapid keyword search of a number of sequence databases to retrieve probable matches to the user query. Matches are returned as a table of sequences, species and Uniprot descriptors. One click inserts the chosen sequence into the main form.

Step 22: should I resubmit my protein in intensive mode?
This step gives you vital information on whether you should consider using the intensive mode of Phyre2. If you see multiple, high-confidence, largely nonoverlapping hits, this indicates that your protein contains multiple domains, each of which can be modeled confidently. In this case, you should consider trying intensive mode, as it will attempt to connect these individual domains together using ab initio–modeled connecting segments where required. Note that if you observe long (>100-residue) unmodeled segments, you can try intensive mode, but such regions are extremely unlikely to be well modeled owing to the limitations of ab initio protein modeling.
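The workaround for long sequences in the Step 4 troubleshooting entry (cut the query at probable domain boundaries so that each piece fits the 1,200-residue limit) can be sketched as follows. This is not a Phyre2 feature; the function name is hypothetical, the boundaries are assumed to be supplied by the user (for example, read off a CDD result) as 0-based cut positions, and a piece of exactly 1,200 residues is assumed to be acceptable. Adjacent domains are kept together when they fit within the limit.

```python
from typing import Iterable, List


def chop_at_boundaries(seq: str, boundaries: Iterable[int], limit: int = 1200) -> List[str]:
    """Split seq at user-supplied boundaries so every piece is <= limit.

    boundaries: 0-based positions where a new domain starts (hypothetical
    input format). Neighbouring domains are merged greedily while the
    merged piece still fits within the limit.
    """
    cuts = sorted({b for b in boundaries if 0 < b < len(seq)})
    edges = cuts + [len(seq)]
    pieces: List[str] = []
    start = 0  # start of the piece currently being grown
    last = 0   # furthest edge that still keeps the piece within the limit
    for edge in edges:
        if edge - start > limit:
            if last == start:
                raise ValueError("a single segment exceeds the length limit")
            pieces.append(seq[start:last])
            start = last
            if edge - start > limit:
                raise ValueError("a single segment exceeds the length limit")
        last = edge
    pieces.append(seq[start:])
    return pieces
```

Each returned piece can then be submitted as a separate job. A segment between two boundaries that is itself longer than the limit raises an error, because the protocol offers no rule for cutting inside a domain.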

Step 29: what if a template is found but not modeled?
If a structural template of interest is present lower down the list and thus has not been automatically modeled, you can generate a model using this template by using the one-to-one threading method. Clicking on the identifier in the 'Template' column of the detailed results table takes the user to the Phyre2 fold library, where the user can download the PDB coordinates of the template. Users may then upload their sequence and this template to the one-to-one threading method. Simply return to the Phyre2 home page, switch to 'expert mode' (in the top left of the home page once you are logged in to Phyre2) and then navigate to one-to-one threading.

● TIMING
In normal mode, job completion typically takes from 20 min to several hours depending on sequence length, number of detected homologous sequences and server load. Intensive mode jobs can take considerably longer (2–6 h) if there is a substantial portion of the sequence that cannot be modeled by known homologous structures, or if the protein is large (>700 amino acids).

ANTICIPATED RESULTS
Once the job is completed, the user is notified by an e-mail that contains information on the confidence of the modeling, a link to a web page of results and an attachment containing the top scoring model in PDB format (Step 7). The web page of results contains the following:
• Facilities to interactively view all models in the browser using JSmol (Steps 12 and 26)
• Secondary structure, disorder and functional site predictions (Steps 17–19; Fig. 5a)
• Graphical summary table showing locations of matched homologs giving information on potential domain boundaries (Steps 20–22; Fig. 5b)
• The top 20 all-atom 3D models and their associated alignments and estimated confidence values (Steps 23–31; Fig. 5c)
• Ligand binding site predictions (Step 45) and transmembrane predictions (Step 46) where applicable

ACKNOWLEDGMENTS This work was supported by the UK Biotechnology and Biological Sciences Research Council (L.A.K.: BB/J019240/1, M.N.W.: BB/F020481/1), the UK Medical Research Council (MRC) (C.M.Y.: MRC Standard Research Student (DTA) G1000390-1/1) and the UK Engineering and Physical Sciences Research Council (EPSRC) (S.M.: EPSRC Standard Research Student (DTG) EP/K502856/1).

AUTHOR CONTRIBUTIONS L.A.K. designed the Phyre2 system and wrote the paper; M.J.E.S. supervised the project; S.M. developed the multiple template modeling protocol; C.M.Y. developed the SuSPect method and M.N.W. developed the 3DLigandSite web resource.

COMPETING FINANCIAL INTERESTS The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Koonin, E.V., Wolf, Y.I. & Karev, G.P. The structure of the protein universe and genome evolution. Nature 420, 218–223 (2002).
2. Kelley, L.A. & Sternberg, M.J.E. Protein structure prediction on the web: a case study using the Phyre server. Nat. Protoc. 4, 363–371 (2009).
3. Mao, C. et al. Functional assignment of Mycobacterium tuberculosis proteome by genome-scale fold-recognition. Tuberculosis 93 (2013).
4. Lewis, T.E. et al. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucl. Acids Res. 41, D499–D507 (2013).
5. Fucile, G. et al. ePlant and the 3D data display initiative: integrative systems biology on the world wide web. PLoS ONE 6, e15237 (2010).
6. Mukherjee, S., Szilagyi, A., Roy, A. & Zhang, Y. Genome-wide protein structure prediction. in Multiscale Approaches to Protein Modeling (ed. Kolinski, A.) Ch. 11, 255–279 (Springer, 2010).
7. Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T. & Tramontano, A. Critical assessment of methods of protein structure prediction (CASP)—round X. Proteins.
8. Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–738 (2010).
9. Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling. Bioinformatics.
10. Söding, J. Protein homology detection by HMM-HMM comparison.
11. Lobley, A., Sadowski, M.I. & Jones, D.T. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination.
12. Raman, S. et al. Structure prediction for CASP8 with all-atom refinement using Rosetta.
13. Källberg, M. et al. Template-based protein structure modeling using the RaptorX web server.
14. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997).
15. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods.
16. Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices.
17. Canutescu, A.A. & Dunbrack, R.L. Cyclic coordinate descent: a robotics algorithm for protein loop closure.
18. Jefferys, B.R., Kelley, L.A. & Sternberg, M.J. Protein folding requires crowd control in a simulated cell.
19. Rotkiewicz, P. & Skolnick, J. Fast procedure for reconstruction of full-atom protein models from reduced representations. J. Comput. Chem. 29, 1460–1465 (2008).
20. Wei, X. & Sahinidis, N.V. Residue-rotamer-reduction algorithm for the protein side-chain conformation problem. (2006).

9 et al. et 21 , 173–175 (2012). 173–175 ,

, 951–960 (2005). 951–960 , 77 Template-based protein structure modeling using the using modelingstructure protein Template-based Proteins Gapped BLAST and PSI-BLAST: a new generation ofgeneration new a PSI-BLAST: and BLAST Gapped (suppl. 9), 89–99 (2009). 89–99 9), (suppl. a Nat. Protoc. Nat. ) natureprotocols

22 J Mol Biol. Mol J

82 Bioinformatics. , 195–201 (2006). 195–201 , S2: 1–6 (2014). 1–6 S2:

7 , 1511–1522 (2012). 1511–1522 , J. Mol. Biol. Mol. J. Nucleic Acids Res. Acids Nucleic Protein Sci. Protein

292 , 195–202 (1999). 195–202 ,

Bioinformatics 25

, 1761–1767 (2009). 1761–1767 , |

VOL.10 NO.6VOL.10 12 397 , 963–972 (2003). 963–972 , , 1329–1338 (2010). 1329–1338 ,

25 Nat. Protoc. Nat. protocol J. Comput. Chem. Comput. J.

, 3389–3402 , 22

, 188–194 , Fig. 5

| 2015

c

| )

857
