Rosetta Scripts

Ab Initio Scripts

These scripts and executables may be used with the RosettaAbinitio package. Run the Perl scripts without arguments to see their usage. Some scripts may require editing configuration values. The lib directory contains Perl modules used by some of the scripts.

An example protocol

1. Run Rosetta to generate a silent mode file. Make sure the configuration values are set in rosettaAB.pl.

./bin/rosettaAB.pl -fasta test_.fasta -nstruct 1000

2. Run Rosetta with a different executable (built with Intel C++ instead of g++).

./bin/rosettaAB.pl -fasta test_.fasta -nstruct 1000 -binary ../../rosetta++/rosetta.intel

3. Run cluster.pl to cluster the decoys in the resulting silent mode file, aatest.out, and extract the top 10 cluster centers.

./bin/cluster.pl -silentfile aatest.out -get_centers 10

Scripts and Executables

bin/rosettaAB.pl: RosettaAbinitio wrapper. Generates a silent mode file with nstruct decoys. Depends on rosetta++/rosetta.gcc.
bin/cluster.pl: Clusters silent mode file decoys and extracts cluster centers. Depends on RosettaAbinitio/src/rosetta_cluster/rosetta_cluster.
src/rosetta_cluster/rosetta_cluster: Clustering executable.
bin/extract.pl: Extracts decoys from a silent mode file. Depends on rosetta++/rosetta.gcc.
bin/findhomologues.pl: Gets homologs from a PSI-BLAST search against the NCBI NR database. Make sure the configuration values point to the blastpgp executable and the NCBI non-redundant database (nr).
bin/reconstruct_PDB_by_index: Decoy extraction executable. Requires the GNU Scientific Library (http://www.gnu.org/software/gsl/); shared libraries must be installed in /usr/lib/.
bin/make_fragments_from_server.pl: Makes fragment libraries from the Robetta fragment server. Requires libwww-perl (http://search.cpan.org/search?dist=libwww-perl), or the individual HTTP and LWP modules, for HTTP requests. If you want to build fragment libraries yourself, use the RosettaFragments package. THIS SCRIPT IS NOT FOR COMMERCIAL USE! If you are a commercial user and would like to make fragment libraries, get a license to use the Rosetta Fragments (NNMAKE) package.

Barcode Scripts

Scripts used in processing barcodes. For an explanation of Rosetta's use of the term 'barcode', see Barcode Constraints.

amino_acids.py: Data on amino acids.
barcode_bb_silent.py
bar_code_chi_hydro.py: Barcodes for chi angles with hydrogens.
barcode_chisq.py: Chi-square values from barcodes.
barcode_flavors.py: Barcodes for flavors.
barcode_frags.py: Barcodes for fragments.
barcode_frags_ss.py: Barcodes for fragments of secondary structures.
barcode_graph.py: Generate a graph of barcode data.
barcode.py: Extract barcodes ???
barcode_tree_flavors.py: Generate score trees from barcodes for each flavor.
barcode_tree.py: Generate score trees from barcodes.
barcode_util_CA.py: I/O routines for reading decoy files and silent files.
barcode_util.py: I/O routines for reading decoy files and silent files.
basic_util.py: Basic utility routines.
cluster2barcode.pl: Convert cluster files to barcode files.
clusters2cst.py: Generate constraint files from cluster data.
compare_deviation_lists.py: Compare deviations.
condense_cst.py: Collapse constraint files together.
fig_devel.py: Figure drawing functions.
fisher_contact.py: Calculate Fisher discriminant.
frag_ss_dev.py: ???
make_scop_barcode_cst.pl: ???
plot_flavor_torsions.py: Plot torsions of flavor files.
plot_outfile_torsions.py: Plot torsions of resulting decoys.
pstat.py: Module for array and list manipulation.
res_barcode_script.py: ???
resflavor2barcode.pl: Generate a barcode file from a residue flavor file.
residue_flavor_code.py: Extract flavors from residues.
residue_flavor_code_natlabel.py: Extract flavors from residues, looking for a native consensus.
residue_phipsi_features.py: Extract phi/psi features of particular residues.
run_barcode_scripts.py: Shorthand for executing Rosetta using barcodes.
score_trees_devel.py: Module to generate trees from distances.
stats.py: A collection of basic statistical functions.

Clustering Scripts

C Scripts

cluster_info_silent: Cluster data in silent files.

Python Scripts

amino_acids.py: Data on amino acids.
blast.py: Process BLAST data.
compose_score_silent.py: Compose the scores from multiple PDBs into a silent file.
fig_devel.py: Figure drawing functions.
make_color_trees.py: Generate trees with data colored by ???
make_new_plot.py: Generate data plots.
pdb.py: Generate coord files for scop_family.
score_trees_devel.py: Module to plot score trees.

How to do clustering:

Step 1: If you have PDB files, you have to make them into a silent-mode output file. First make a list of the decoys:

/bin/ls aa*pdb > tmp.list

Then compose them into a .out file:

python/compose_score_silent.py tmp.out tmp.list

Step 2 (optional): Pre-process your native pdb file for reading by the clustering program.

python/make_coords_file.py nat.pdb A tmp.out > nat.coords

Step 3: Cluster.

C/cluster_info_silent.out tmp.out nat.coords cluster/tmp 5,15,45,75 3,4

If you don't have a native structure, replace "nat.coords" with "-". This will make a million files in the cluster/ directory that start with the characters "tmp" (that's what the third argument specifies).

Step 4: Make a contacts plot.

python/make_new_plot.py cluster/tmp.contacts

Step 5: Make a dendrogram of the clusters.

python/make_color_trees.py cluster/tmp 1 25

You can get some info about the scripts by running each one without any arguments.

The clustering is a very simple algorithm: given an RMSD threshold, find the decoy with the most neighbors within this threshold. This is your first cluster. Now delete all the members of this cluster and repeat: find the decoy with the most neighbors within this threshold. This is your second cluster, and so on.
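In case the procedure is easier to follow as code, here is a minimal Python sketch of that greedy loop. Here rmsd is an assumed pairwise-RMSD function and decoys is any list of structures; none of these names come from the Rosetta scripts themselves, and this is not the actual cluster_info_silent implementation.

def greedy_cluster(decoys, threshold, rmsd):
    remaining = list(decoys)
    clusters = []
    while remaining:
        # Find the decoy with the most neighbors within the RMSD threshold.
        neighbors = [
            [j for j in range(len(remaining))
             if rmsd(remaining[i], remaining[j]) <= threshold]
            for i in range(len(remaining))
        ]
        center = max(range(len(remaining)), key=lambda i: len(neighbors[i]))
        members = set(neighbors[center])
        clusters.append([remaining[j] for j in sorted(members)])
        # Delete all members of this cluster, then repeat.
        remaining = [d for j, d in enumerate(remaining) if j not in members]
    return clusters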

The only tricky part is how you decide what the clustering threshold should be. You could say you want N decoys in the top cluster. Or you could say you want the threshold to be 3 Angstroms. The complicated command line arguments to the clustering program are designed to allow the program to make a smart decision:

C/cluster_info_silent.out a,b,c,d e,f

Here a is the smallest cluster you want to see; b, c, and d bound the size of the top cluster; and e and f bound the clustering threshold.

The program will try to get a top cluster of size c. This will define some initial clustering threshold t. If t >= e and t <= f, you're done. If t falls outside that range, the threshold is moved into [e, f] where possible, and the top cluster size is allowed to vary between b and d instead.

In short: the top cluster size will lie between b and d, and if possible the clustering threshold will lie between e and f.
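The original passage above is truncated, so the exact decision rule is hard to pin down, but one plausible reading of it, sketched in Python, is the following. t_for_size is an assumed helper (the smallest threshold whose top cluster reaches a given size), not a real function in the Rosetta code.

def pick_threshold(t_for_size, b, c, d, e, f):
    # Aim for a top cluster of about c decoys.
    t = t_for_size(c)
    if e <= t <= f:
        return t                      # threshold already in [e, f]
    if t < e:
        # Raising the threshold to e grows the top cluster; accept e
        # unless that would push the cluster past d members.
        return e if t_for_size(d) >= e else t_for_size(d)
    # t > f: lowering the threshold to f shrinks the top cluster;
    # accept f unless that would drop the cluster below b members.
    return f if t_for_size(b) <= f else t_for_size(b)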

Memory use and speed are most sensitive to the setting "d" and to the total number of decoys, so try not to set "d" too big. If it's too small the first time, you can always run it again.

For 1000 decoys and a smallish protein you might use:

5,10,50,150 3,4

Here a=5, b=10, c=50, d=150, e=3, and f=4: the program aims for a top cluster of about 50 decoys, keeps its size between 10 and 150, and keeps the threshold between 3 and 4 Angstroms if possible. Of course, the thresholds 3 and 4 should be scaled with the length of the protein.

Decoy Stats Scripts

Scripts for processing the results of Rosetta runs that use -decoystats. These scripts are not well documented.

cluster_trees.py: Module for generating trees from clustered distances.
decoystats.py: Process results of decoystats runs.

Docking Scripts

o STARTERS
o POST-PROCESSING ROUTINES for large runs on computing clusters
o HANDY SCRIPTS for looking at scorefile output
o MISCELLANEOUS and supporting scripts

This is a collection of scripts used with the RosettaDock package. Some are essential (such as the post-processing tools), others are handy for working with or creating Rosetta input and output files, and others are obscure routines for dealing with specific types of runs (calibration runs, say).

If you add files:

1. Enter descriptions here.
2. All scripts should print a help message if arguments are not entered properly.
3. Comment your scripts and list an author/email.

Enjoy

STARTERS

rrun.sh: Main script for invoking Rosetta in dock mode.
ppk.bash: Creates a prepacked starting structure.
rosettarc: Setup file for using RosettaDock.
testrun.bash: Mike Daily's example script for testing a dock run.

POST-PROCESSING ROUTINES for large runs on computing clusters

pp_pdb2.sh: Second half of post-processing, usually done on the lab intranet where R is present.
pp_pdb.sh: Post-processes a docking run. Calls several other pp_ scripts. Usually done on the cluster; data files are then pushed to a desktop machine for further processing.
pp_compile_scorefiles.sh: Merges multiple scorefiles (aa, ab, ...) together for an analysis of the superset.
pp_extract_set.sh: Extracts the top N structures from multiple subdirectories.
pp_push_set.sh: Pushes top decoy sets off the cluster.
pp_cluster_set.sh: Creates clusters of decoys (uses R).
pp_calc_contacts.bash: Calculates the number of correct residue-residue contacts (obsolete now that Fnat is calculated by Rosetta).
pp_dwindle_byfile.sh: Observe how many correct structures pass various filters.
pp_dwindle.sh: Same as above.
pp_set.sh: Calls pp_pdb.sh for all targets in the current directory.
pp_summarize_clusters.sh: Details the results from clustering.
pp_summarize_lowscores.sh: Details the results from the final score.
pp_zip_pdb.sh: Zips up completed sets of runs.

HANDY SCRIPTS for looking at scorefile output

filter_bumps.pl: Remove structures with bad bumps.
filter_column.pl: Filter a scorefile on a particular column (by column number).
filter_on.pl: Filter by a particular score (by name).
sort_on.bash: Sort by a particular score (by name).
findColumn.pl: Find the column number for a particular score.
findIndex.pl: Find the column number for a particular score.
find_max.pl: Find the maximum value in a particular column.
find_min.pl: Find the minimum value in a particular column.
find_percent.pl: Filter at a certain percentage.
findRank.pl: Determine the rank of the first decoy fulfilling a criterion.
checkCol.pl: Check that all lines of a scorefile have the same specified number of columns.
column_filter.pl: Remove scorefile lines with the wrong number of columns.
column_unfilter.pl: Show offending scorefile lines with the wrong number of columns.

MISCELLANEOUS and supporting scripts

analyzeclusters.sh: Contacts and relative rms of top clusters.
calc_score_for_cluster.pl: Finds the best-scoring decoy in a cluster.
clean_scratch.sh: Removes temp files from computer cluster scratch drives.
docking_make_scorefilter.sh: Find cutoff values for filtering decoys by score.
do_voids.sh: Calculate voids at interfaces using voidoo.
energy_diff.py: Determine energy changes from the ppk (monomer) structure to the bound (decoy) structure.
getAllscores_complete.pl: Combine scorefiles from calibration (perturbation) runs for input into R for regressions.
getAllscores_withTargets.pl, getAllscores.pl: Ditto above.
histjoin.pl: Join two histograms.
makeprefix.sh: Convert a number to a two-letter code (useful for large runs).
pdb_dir_maker.pl: Create output directories for dock runs.
adddirs.pl: From a list of decoys, add prefix subdirectories.
rms2avglink.csh: For clustering, calculate rms between all pairs of decoys.
rms2.pl: Clustering guts.

Also:

pdb_scripts/ contains scripts for manipulating pdbs, including preparation for docking runs.
R_scripts/ contains scripts for plotting scores from scorefiles and fitting weights from scorefiles.

PDB Manipulation Scripts

The argument usage for these scripts can be found by running them with no arguments.

addChain.pl: Add a new chain to a pdb file that previously did not have one.
extractChains.pl: Extract given chains from a pdb file, followed by a TER. You can extract multiple chains at once, but there will be only one TER at the end.
changeChain.pl: Change the chainID of a given chain.
listChains.pl: List chains in a pdb file, with the position of existing TERs marked by a '-' after the appropriate chain.
openPDB.pl: Separate two docking partners (delimited by TERs) along the line of centers by a given distance.
pdb_detail.pl: Breaks a pdb file down completely by chains and residue numbers.
pdb_fasta.pl: Generate a FASTA-format sequence file from a pdb.
pdb_remove_missing_bb.pl: Remove residues with missing backbone atoms from a pdb; missing backbone atoms will cause Rosetta to crash.
renumberPDB.pl: Change the numbering in a pdb, starting over at TERs. Warning: discontinuous numbering (e.g. chain breaks) will become continuous.
translate_xyz.pl: Translate a given chain of a pdb file by a given deltaX, deltaY, and deltaZ.
truncate.pl: Truncate a given chain of a pdb before or after a given residue number.
zapHs.pl: Removes hydrogens from a pdb (usually to remove Hs that Rosetta has inserted).

Disulfides

makefixdisulf.py: Python script to make .fixdisulf files for using the -fix_disulf option in RosettaDock.

calccontacts.py, coordlib.py, and loadPDB.py are Python scripts on which makefixdisulf.py depends. Do not remove them.

More

zapChain.pl: Remove a chain from a pdb file.
pdb_sequence.pl: Extract the sequence from a pdb.
pdb_add_insert_codes.pl: Add insert letters to repeated pdb residue numbers.
orientPDB.pl: Rotate the PDB based on a particular residue.
renumberPDBandchains.pl: Starts each chain's numbering at residue 1.
renumberPDBatoms.pl: Renumber the atoms from 1.
homogenizeChain.pl: Identifies all ATOMs with a given chain and removes TERs.
pdb_subtract_scores.pl: Compare scores residue by residue.
pdb_only_ATOM_TER.pl: Remove everything from the file except ATOM and TER lines.
truncateFABs.pl <-LH -H> <-truncL x> <-truncH x>: Truncate light chain L at 112 and heavy chain H at 119.

Ligand Scripts

This directory contains scripts for using the results of the ligand code in this version of Rosetta. It contains the following files:

molecule.exe: A C++ executable written by Jens Meiler. The scripts described below rely on this program. The executable in this directory runs on an x86 Linux box ONLY.
pdb2mdl.inp: Translates a pdb-format file into an mdl-format file, which is written to stdout. Usage: pdb2mdl.inp pdbfile
mdl2rosetta.inp: Translates an mdl-format file into a pdb-format file with Rosetta atom-type names. Usage: mdl2rosetta.inp mdlfile
addhydrogens.inp: Adds hydrogens to a structure file to fill missing valences. Usage: addhydrogens.inp mdlfile
Ligand Scripts Unix Session: Gives an example of how to use the commands in this directory.
example_files: Example files for use with the scripts.

Recommendations follow on how to use these scripts to generate a small-molecule/protein pdb for input into Rosetta's ligand mode. First, create a file with only the HETATM or ATOM records of the small molecule or ligand that you want. For the example in the example_files folder, we can grep the LOV residue HETATM records using the following line.

grep LOV 2ER7.pdb | grep HETATM > 2ER7_hetatm_start.pdb

With the pdb file we can now use the pdb2mdl.inp script to call molecule.exe and produce an mdl file.

pdb2mdl.inp 2ER7_hetatm_start.pdb > 2ER7_hetatm.mdl

The mdl file lets us check that molecule.exe recognizes the bonding network of the molecule correctly. The fourth line of the file has two numbers of importance in the first columns: the first is the number of atoms and the second is the number of bonds. If you manually edit this file and remove or add a line, these need to be updated. The next lines are the atoms in the molecule, given as Cartesian coordinates followed by the atom name. The bond block follows the atom block: the first two numbers are the line numbers in the atom block that the bond connects, and the third number is the bond type (1=single bond, 2=double bond, 3=triple bond, 4=aromatic/conjugated bond). Looking at the example mdl file, we see that there is a bond of type 4 between two carbons that should in fact be a single bond. Change the 4 to a 1.
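For example, a hypothetical bond-block line (the atom indices 5 and 6 are invented for illustration) recording an aromatic/conjugated bond would read:

  5  6  4

and correcting it to a single bond means changing only the third number:

  5  6  1

Now run the addhydrogens.inp script to add hydrogens to the molecule.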

addhydrogens.inp 2ER7_hetatm.mdl

The resulting 2ER7_hetatm.mdl file will look like 2ER7_hetatm_w_hydrogens.mdl. This mdl file has all the atom and bond descriptions needed for molecule.exe to determine the appropriate Rosetta atom types. Now run the mdl2rosetta.inp script to generate the HETATM statements that can be used in the input file to Rosetta.

mdl2rosetta.inp 2ER7_hetatm_w_hydrogens.mdl | grep HETATM > 2er7_hetam_rosetta.pdb

This final file should be added to the end of the input pdb file, separated by a TER statement from the PDB ATOM records. If the molecule is charged, add a CHARGE record after the HETATM statements.
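Schematically, the assembled input file then looks like this (record contents elided):

ATOM   ...  (protein records)
TER
HETATM ...  (ligand records from mdl2rosetta.inp)
CHARGE ...  (only if the molecule is charged)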

All example files are found in the example_files/ directory.

NOTE: If the hydrogens in the final pdb file appear to be misplaced, the bond lengths or bond types for heteroatoms in the original files are off. There is no fix other than converting the ligand from pdb to mdl (pdb2mdl) and correcting the mdl file (it contains the bond types). After these corrections, hydrogens will be added properly.

Misc Scripts

countIntCont.pl: Count the number of heavy-atom contacts at the interface between chains.
rosettaRadii.pl: Put Rosetta radii in the B-factor field of a PDB file; to display them, use "spacefill temperature" in Rasmol.

Peptide Extension Scripts

These scripts help generate input files for the peptide extension protocol. Run them without arguments for usage.

addDummyAlanines.pl: Adds placeholder alanines with zero coordinates to a pdbfile; they mark the spot where the extension will go. The output pdbfile will be renumbered sequentially (atoms and residues). Note that if your input pdbfile is not sequentially numbered for residues, some odd things may happen.
makeLoopLibraryFromVall.pl, makeLoopLibraryFromPDBlist.pl: As the names suggest, these scripts generate loop-library-format files from either a vall-style file or a list of idealized pdbs. You may use this loop library for your favorite loop modeling protocol; it is also the correct format for peptide extensions.

Resfile Scripts

These scripts make resfiles in the proper format for either Rosetta design mode or for designing with Tanja's interface code. Run a script with no arguments to see its proper usage.

makeResfiles.pl

o Pass in comma-separated lists of residues to be designed, and optionally residues to be repacked or to be designed only as hydrophobic, charged, aromatic, or polar.
o Pass in the pdbfile.
o Specify whether a rosetta- or interface-type resfile should be made.
o Optionally give a name for the output resfile; otherwise it is written to STDOUT.
o Outputs a resfile in the appropriate format.

makeAllPointSubs.pl

o Makes a resfile for all 19 amino acids (no CYS) at the specified design residue.
o Pass in the residue to be designed.
o Optional: pass in a file with comma-separated residues for repacking.
o Optional: pass in a file with comma-separated one-letter amino acids (these are the only substitutions that will then be made).
o Pass in the pdb file.
o Specify whether a rosetta- or interface-type resfile should be made.

Seqparam Scripts

These scripts parameterize Rosetta using sequence profile data; the result is a soft-core potential.

Usage

1. Modify rosetta and set the ddG weight to the OPTE weight set.
2. Run PSI-BLAST for a set of proteins to get a series of multiple sequence alignments.
3. Run the script to put the natural amino acid probabilities and all the energy terms into a single file.
4. Run make to get rosetta_profile_param.

Fragments

o Making Fragments
  . Setup
o Fragment Making Tutorials
  . James Thompson's Tutorial: How to Pick Fragments
  . Making Fragments as Part of Loop Modeling
o How to Make a vall Without Knowing What You are Doing

Making Fragments

WEBSERVER FOR FRAGMENTS

To make fragments locally with make_fragments.pl:

Setup

DATABASES:

nr — downloadable from ftp://ftp.ncbi.nih.gov/blast/db/
nnmake_database — included in release.
chemshift_database — included in release.

PROGRAMS:

PSI_BLAST — ftp://ftp.ncbi.nih.gov/blast/executables/release/

PSIPRED — http://bioinf.cs.ucl.ac.uk/psipred/

JUFO — http://www.meilerlab.org/

PROFphd — http://www.predictprotein.org/newwebsite/download/index.php

SAM — http://www.soe.ucsc.edu/research/compbio/sam.html

nnmake — included in release.

chemshift — included in release.

Configure the paths at the top of nnmake/make_fragments.pl to point to these databases and programs. PSI-BLAST must be installed locally.

After PSI-BLAST and PSIPRED are installed, refer to the PSIPRED README, or see the quick directions below, for how to create a filtered "NR" sequence databank called "filtnr", which is also used by make_fragments.pl.

Quick directions for creating filtnr:

tcsh% pfilt nr.fasta > filtnr
tcsh% formatdb -t filtnr -i filtnr
tcsh% cp filtnr.p?? $BLASTDB

1. Obtain a fasta file for the desired sequence. This file must have 60 characters per line with no white space. The first line can be a comment starting with the '>' character.
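For illustration, such a fasta file might look like the following; the sequence here is an invented placeholder, not the real 2ptl sequence, and each full sequence line is exactly 60 characters:

>2ptl_ example comment line
MKVLAEGTWQMKVLAEGTWQMKVLAEGTWQMKVLAEGTWQMKVLAEGTWQMKVLAEGTWQ
MKVLAEGTWQ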

2. Obtain secondary structure predictions from web servers, or set up the shareware locally so that make_fragments.pl can run the secondary structure predictions itself.

The fragment maker can use predictions from psipred (.jones or .psipred extension), PhD (.phd), SAM-T99 rdb format (.rdb), and jufo (.jufo). Up to three predictions can be used; at least one must be used.

The getSSpred.pl script can be used to obtain predictions from the web. Edit the config portion of this script to include your email address and the correct path to the httpget script. To use the script, provide the fasta filename and the desired method (invoke the command without arguments to see the usage explanation). Retrieve the secondary structure predictions from your email inbox.

3. (Optional) Prepare files with NMR data if available. These include the .cst and .dpl files that Rosetta itself uses, and the .chsft_in file that contains chemical shift information. The information in these files can help Rosetta pick better fragments. See the file 'data_formats.README' for the formatting information.

4. Run make_fragments.pl. Invoke without arguments for usage options. Likely the only argument you need to provide is the fasta file.

$> make_fragments.pl -verbose 2ptl_.fasta

If you want to exclude homologous sequences from the fragment search, add the -nohoms argument.

$> make_fragments.pl -verbose -nohoms 2ptl_.fasta

Note that if you want to exclude homologs from the chemical shift/TALOS search, you need to edit the talos database. See the README in the chemshift_source directory for instructions.

If you do not have a particular type of secondary structure prediction (say the .jufo file) and you do NOT want make_fragments to try to run the method locally, use the -nojufo option.

$> make_fragments.pl -verbose -nohoms -nojufo 2ptl_.fasta

Two fragment files will be generated, with names like aa2ptl_03_05.200_v1_3 and aa2ptl_09_05.200_v1_3. The prefix "aa" can be changed with the -xx option. "2ptl_" is the five-letter base name, which can be specified with the -id option or is otherwise derived from the name of the fasta file. 03 and 09 indicate the lengths of the fragments.

5. Generate a loop library in addition to the fragment files. Run make_fragments.pl with the -template option (the five-letter code here is 2ptl_):

$> make_fragments.pl -template 2ptl_ 2ptl_.fasta

This requires 2ptl_.pdb and 2ptl_.zones to be present in your run directory; the pdb is a template pdb file generated by createTemplate.pl, described in README.loops. From the zones file, loops are defined and a library of loop conformations for each defined loop is compiled, based on fragment picking, into a file called "2ptl_.loops_all" (which usually contains 2000 loop conformations). The script "trimLoopLibrary.pl" is then called automatically to reduce the size of the loop library, writing the result to "2ptl_.loops". This file is later used in the Rosetta loop modeling mode to build variable loops onto the template structure. A loop library differs from a fragment library mainly in that geometric information is considered in order to pick "loop" fragments of the desired length that can roughly close the gap given the "take-off" stub positions.

A newer version of the vall database (2006-05-05) is provided in nnmake_database together with the original 2001-02-02 version. You can make fragments using either version; just modify make_fragments.pl to point at the version you want. Currently, making a loop library only works with the 2001-02-02 version, as some newly developed loop modeling methods no longer need a loop library (see README.loops for more information).

NOTES:

1. Name all your files with a five-character base name followed by the appropriate extension. The base name should be the four-letter pdb code plus the one-letter chain id.
2. See also pNNMAKE for a listing of the files involved in the fragment process.
3. If a pdb file is in the directory you're making fragments in, nnmake will evaluate the fragment match to the pdb. Note that if the pdb file disagrees with the fasta file, the program will detect an error and stop.

Fragment Making Tutorials

James Thompson's Tutorial: How to Pick Fragments

Making Fragments as Part of Loop Modeling

How to Make a vall Without Knowing What You are Doing

Jack Schonbrun May 27, 2004

This is a completely unguaranteed description of the process I went through to make a new vall from a specific set of proteins.

A "vall" is a what we call the list of idealized protein structures from which pNNMAKE picks fragments. It must contain the secondary structure (H,E or L) of each amino acid, the idealized backbone torsion angles (phi,psi,omega) of each amino acid, and a sequence profile for each position. Naturally these must be placed in a specific format.

1. Secondary structure is usually generated by dssp. I used /users/jack/bin/dsspcmbi.lnx.

2. Idealized torsion angles are obtained by running rosetta in idealize mode:

/users/jack/rosetta++/rosetta.gcc -idealize -l <pdblist> >& idealize.log

This will give you a new set of pdbs with idealized torsion angles at the end of the file. They will have names like "1pdbA_0001.pdb" if 1pdbA.pdb was your starting structure. At the end of each of these new structures will be the idealized torsion angles. Also, if you have placed your dssp files for each protein where rosetta can find them, it will include the dssp secondary structure assignments in the table of torsion angles. This is nice, because rosetta has done the parsing for you. You can check that rosetta is finding and reading your dssp files by examining your stderr log file from the idealization run.

3. Profiles are made using multiple sequence alignments from psiblast. This part is a bit tricky, because you need to have databases of sequence files. There is a set on shampoo.baker in /scratch/shared/genomes; I don't know exactly which of the files in there you need. It is recommended that you run with these files on a scratch drive. But shampoo is a single-processor machine, and I didn't have room on the /scratch partition of peake, so I put a copy in /dump/jack/genomes. Because some scripts expect things to be in /scratch/shared/genomes, I made a symbolic link on peake:

ln -s /dump/jack/genomes /scratch/shared/

But I was only putting 11 proteins in my vall. If you are doing more, it is recommended you find a computer with space on its scratch drive; if you're lucky, there may already be a copy of genomes/ on it. You will want your genomes directory to be as current as possible, but you'd have to talk to Dylan about how to get an updated one. You will need fasta files for the proteins to submit to psiblast. You can make these with Dylan's script (available via 'cvs co pdbUtil'):

/users/dylan/src/pdbUtil/getFastaFromCoords.pl -p <pdbfile> -chain <chain> > pdbfile.fasta

Now you should be able to run another script of Dylan's to make your profiles. This takes a little while (I found ~20 minutes per 300-residue protein). There are a few things you should set up first. I set my BAKER_HOME environment variable to /users/dylan; until everything is standardized, this is useful. In tcsh:

setenv BAKER_HOME /users/dylan

You run the script as:

/users/dylan/src/msaUtil/quickblast.pl 1pdbA.fasta outdir

As far as I know there is no batch processing, so I did

/bin/ls -1 *fasta | awk '{system ("./quickblast.pl "$1" outdir")}'

where outdir is a previously created directory for the output. When this is all done, you should have many files in your outdir, some of which have the suffix .checkPROFILE. These contain the profiles that you want to use in your vall: the residue preferences from psiblast, plus BLOSUM substitutions for positions with no information.

4. Now you have all the information you need to make your vall. I have a primitive awk script that will put it together for you; it is available via 'cvs co rosetta_scripts/vall'. To run it, I recommend you make a directory containing all your idealized pdbs and all your .checkPROFILE files, and nothing else. Go to that directory and run:

~/rosetta_scripts/vall/assemble.awk * > vall.dat.whatever

Your vall name *must* start with vall.dat or pNNMAKE will get mad. I believe you can have whatever you want after that.

5. Now you just need to know how to make make_fragments.pl work! There should be a readme for that too.

6. Caveats: I have not talked about discontinuous chains, or about making the files needed for homolog detection, because I don't really know how to do either.

7. Please let me know if you try this protocol, and where it fails.

Rosetta Databases

avgE_from_pdb
bb21sdep06.Jan.sortlib
bbdep02.May.sortlib
bb_hbW
bbind00.Nov.lib
disulf_jumps.dat
DunbrackBBDepRots12.dat (the Dunbrack backbone-dependent rotamer library)
dunsd
energy_quantile__atre__aa_ss_sf_nb.data
energy_quantile__dune__aa_ss_sf_nb.data
energy_quantile__hbe__aa_ss_sf_nb.data
energy_quantile__intrae__aa_ss_sf_nb.data
energy_quantile__paire__aa_ss_sf_nb.data
energy_quantile__probe__aa_ss_sf_nb.data
energy_quantile__repe__aa_ss_sf_nb.data
energy_quantile__rese__aa_ss_sf_nb.data
energy_quantile__sole__aa_ss_sf_nb.data
energy_quantile__spk__aa_ss_sf_nb.data
energy_quantile__tlje__aa_ss_sf_nb.data
Equil_AM.mean.dat
Equil_AM.stddev.dat
Equil_bp_AM.mean.dat
Equil_bp_AM.stddev.dat
Fij_AM.dat
Fij_bp_AM.dat
jump_templates.dat
jump_templates_v2.dat
Paa
Paa_n
Paa_pp
paircutoffs
pdbpairstats_fine
phi.theta.36.HS.resmooth
phi.theta.36.SS.resmooth
plane_data_table_1015.dat
Rama_smooth_dyn.dat_ss_6.4
SASA-angles.dat
SASA-masks.dat
sasa_offsets.txt
sasa_prob_cdf.txt
sc_hbW
smart_scorefilter.pl
template.pdb
unsatisfied_buried_polar__pdb__aa_at.data
unsatisfied_buried_polar__pdb__aa_at_ss.data
