Yselect Users Guide

ySelect Users’ Guide (Sequest-compatible version)

Background. During the course of a typical run, Sequest [1] examines fragmentation spectra obtained from an LC-MS/MS experiment and tries to match them with peptide sequences drawn from a protein database. It reads the peak list file (.dta format) for each fragmentation spectrum in turn and writes out a companion search results file (.out format) listing the best matches with supporting information.

The utility program DTASelect [2] may then be invoked with the “-n” option. It combs through all of the generated .out files and, among other things, assembles a list that includes one line of information for the best match to each fragmentation spectrum. A sample fragment of a DTASelect.txt file is included in the Appendix. As Sequest and DTASelect have been described elsewhere, please refer to the references for further information.

The purpose of the ySelect program is to comb through the DTASelect-generated “DTASelect.txt” file in turn, filtering out matches to spectra that fail to meet a confidence cutoff level and producing a list of those that pass for consumption by yRatios.

ySelect was implemented separately from its companion program, yRatios, to simplify the process of integrating yRatios’ functionality with the results from other search programs such as Mascot. Further information concerning yRatios is provided in a separate users’ guide.

Algorithmic Details. ySelect uses the PRISM strategy [3] to assess confidence levels. Briefly, a large dataset including decoy database hits was mapped onto a multidimensional space with axes representing such parameters as cross-correlation. By examining the relative density of protein hits and decoy hits in that space, an empirical function predicting confidence levels was derived. All necessary parameters for the function (XCorr, deltCN, spRank and charge) are present in the “DTASelect.txt” file.
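As an illustration of where these parameters come from, the following is a minimal sketch (not taken from the ySelect source) of reading XCorr, DeltCN and SpRank from one “D” line of a DTASelect.txt file. It assumes the fields are separated by whitespace and appear in the order given by the column header shown in the Appendix. The names MatchScores and parse_d_line are hypothetical. The charge is not a separate column; in the sample data it appears as the last numeric field of the spectrum file name.

```c
#include <stdio.h>

/* Hypothetical record for the per-spectrum values named in the text;
   the field names are illustrative, not taken from the ySelect source. */
typedef struct {
    char   filename[256];
    char   subdir[256];
    double xcorr;
    double deltcn;
    int    sprank;
} MatchScores;

/* Parse one "D" line, assuming whitespace-separated fields in the order
   Type Filename Subdirectory XCorr DeltCN PrecursorMass TotalIntensity SpRank ...
   Returns 1 on success, 0 if the line is not a well-formed "D" line. */
int parse_d_line(const char *line, MatchScores *m)
{
    char type[8];
    double precursor_mass, total_intensity;   /* read but not kept */
    int n = sscanf(line, "%7s %255s %255s %lf %lf %lf %lf %d",
                   type, m->filename, m->subdir,
                   &m->xcorr, &m->deltcn,
                   &precursor_mass, &total_intensity, &m->sprank);
    return n == 8 && type[0] == 'D' && type[1] == '\0';
}
```

Lines of any other type, such as the “L” locus lines, are rejected by the return value, so a caller can pass every line of the file to parse_d_line and act only on those for which it returns 1.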

How to compile ySelect. The ySelect program is written in C. To compile it on a UNIX platform, change the current working directory to the directory into which the ySelect source has been copied and enter “cc ySelect.c -lm -o ySelect” at the shell prompt.

It is also readily possible to compile ySelect on any Windows platform that has a C compiler installed. Freeware programs such as Quincy 2005 are available on the web and are more than adequate for this purpose. The ySelect executable may be run from a Windows command prompt.

Prior to compilation, one detail needs attending to, namely the PATH_SEP_CHAR defined constant. If compiling on a UNIX system, this should be set to ‘/’ (i.e. a forward slash). On a Windows system, it should instead be set to ‘\\’ (i.e. a backslash; the extra backslash is required because the backslash is the escape character in C character constants).
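For readers who build on both platforms, the constant could in principle be selected automatically. The sketch below is an assumption about how that might be done, not a description of the ySelect source: it relies on the _WIN32 macro that Windows compilers commonly predefine, and join_path is a hypothetical helper showing how the separator would be used to prefix a directory to a file name.

```c
#include <stdio.h>

/* Assumption: pick the separator from the compiler's predefined macros
   instead of editing the constant by hand. The guide only says that the
   constant must be set; this automatic selection is not from the source. */
#ifndef PATH_SEP_CHAR
#  ifdef _WIN32
#    define PATH_SEP_CHAR '\\'
#  else
#    define PATH_SEP_CHAR '/'
#  endif
#endif

/* Hypothetical helper: write "dir<sep>name" into out.
   Returns 0 on success, -1 if the result would not fit in outsize bytes. */
int join_path(char *out, size_t outsize, const char *dir, const char *name)
{
    int n = snprintf(out, outsize, "%s%c%s", dir, PATH_SEP_CHAR, name);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

Because of the #ifndef guard, a value supplied on the compiler command line or defined earlier in the source still takes precedence.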

Inputs. Invoking ySelect without any arguments, or with arguments incorrectly specified, will cause the following text to be displayed:

Usage: ySelect [option(s)] ..

Options are:
  -q set the confidence cutoff
  -L results for one locus only
  -1 ignore singly-charged matches
  -d output directory+filename

The “-q” option is used to set a confidence level cutoff. The accepted value is a real number from 0 to 99.5 inclusive. If 95 were specified, for example, a 95% confidence level cutoff would be applied. Left unspecified, the default is 99%.
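The range check described for “-q” can be sketched as follows. This is an illustrative fragment under stated assumptions, not the ySelect implementation: parse_cutoff and the three constants are hypothetical names, and the fragment handles only the text of the value itself, since the guide does not say how the value is attached to the option on the command line.

```c
#include <stdlib.h>
#include <errno.h>

#define DEFAULT_CUTOFF 99.0   /* default stated in this guide */
#define MIN_CUTOFF      0.0
#define MAX_CUTOFF     99.5

/* Hypothetical helper: convert the text given for "-q" into a confidence
   cutoff. Returns 1 and stores the value on success; returns 0 and leaves
   *cutoff unchanged if the text is not a number or lies outside the
   range 0 to 99.5 inclusive. */
int parse_cutoff(const char *text, double *cutoff)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;                               /* not a clean number */
    if (!(v >= MIN_CUTOFF && v <= MAX_CUTOFF))  /* also rejects NaN */
        return 0;
    *cutoff = v;
    return 1;
}
```

A caller would initialise its cutoff variable to DEFAULT_CUTOFF and print the usage text whenever parse_cutoff returns 0.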

Users may wish to exclude matches to spectra that are presumed to arise from singly charged peptide species. This can be accomplished by specifying the “-1” option; only matches based on the assumption of multiple charges (+2 or +3) will then be listed.
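In the sample data in this guide, the assumed charge appears as the last numeric field of each spectrum file name (for example, the “2” in “CW_042507_worm_both_step06.15642.15642.2.dta”). The sketch below shows one way the “-1” filter could be expressed on that basis. The function names are hypothetical and the code is an illustration, not the ySelect source.

```c
#include <string.h>
#include <stdlib.h>

/* Return the charge state encoded in a spectrum name of the form
   base.startscan.endscan.charge, with or without a trailing ".dta".
   Returns 0 if no charge field can be found. */
int charge_from_name(const char *name)
{
    size_t len = strlen(name);
    const char *p;
    char *stop;
    long z;

    if (len >= 4 && strcmp(name + len - 4, ".dta") == 0)
        len -= 4;                       /* ignore the extension */

    p = name + len;
    while (p > name && p[-1] != '.')    /* back up to just after the last '.' */
        p--;
    if (p == name || p == name + len)
        return 0;                       /* no '.', or empty final field */

    z = strtol(p, &stop, 10);
    if (stop != name + len || z < 1 || z > 9)
        return 0;                       /* field is not a single-digit charge */
    return (int)z;
}

/* With "-1" in effect, a match is dropped when its name encodes charge 1;
   names with an unreadable charge field are kept. */
int keep_match(const char *name, int ignore_singly_charged)
{
    return !(ignore_singly_charged && charge_from_name(name) == 1);
}
```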

It is necessary to provide the file name of at least one DTASelect.txt file as an argument to ySelect. Although an arbitrary number of these files may be specified, the ability to handle more than one was implemented for the purpose of merging the quality-filtered results of precisely two searches performed on the same set of .dta files: one normal and one presuming that the tryptic peptides have been labeled with two 18O atoms at their C-terminal carboxyl groups (this labeling is explained more fully in the yRatios Users’ Guide).

Output. All ySelect output is directed to standard output (i.e. the shell or command window). To store it in a file, redirect the output by adding “>” followed by a file name at the end of the command; this works at both a UNIX shell prompt and a Windows command prompt.

The output itself is a “scan list”, consisting of two types of lines. The locus line begins with “--locus--”, followed by a space and then a protein identifier. Until the next locus line is encountered, all spectra and peptides listed are considered to belong to that locus.

Accordingly, the second type of line names a spectrum file and the amino acid sequence of the associated peptide. The sequence of the peptide may appear on more than one line because an LC-MS/MS experiment may capture more than one spectrum that matches the peptide. It is presumed that all the spectra associated with a given peptide will appear on successive lines, rather than being scattered about in the file. A portion of a scan list file is provided as a sample below:

--locus-- B0035.7
CW_042507_worm_both_step06.15642.15642.2.dta VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW_042507_worm_both_step06.15630.15630.2.dta VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW_042507_worm_both_step06.15300.15300.2.dta VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW_042507_worm_both_step06.15289.15289.2.dta VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW_042507_worm_both_step04.14782.14782.2.dta LLAGVTIAQGGVLPNIQAVLLPK
CW_042507_worm_both_step05.12404.12404.2.dta LLAGVTIAQGGVLPNIQAVLLPK
CW_042507_worm_both_step04.15550.15550.2.dta LLAGVTIAQGGVLPNIQAVLLPK
CW_042507_worm_both_step04.14749.14749.2.dta LLAGVTIAQGGVLPNIQAVLLPK
--locus-- B0035.9
CW_042507_worm_both_step05.8653.8653.2.dta TVTAMDVVYALK
CW_042507_worm_both_step05.8639.8639.2.dta TVTAMDVVYALK
--locus-- B0041.4
CW_042507_worm_both_step03.6050.6050.1.dta NIPGVDVMNVER
CW_042507_worm_both_step03.6039.6039.1.dta NIPGVDVMNVER
CW_042507_worm_both_step05.7627.7627.2.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step05.7623.7623.2.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step04.6831.6831.2.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step04.6811.6811.2.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step04.6807.6807.2.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step04.6796.6796.2.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step05.7699.7699.1.dta GHVIDQVAEVPLVVSDK
CW_042507_worm_both_step05.7698.7698.1.dta GHVIDQVAEVPLVVSDK
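A program that consumes a scan list has to tell the two line types apart. The following is a minimal sketch of such a classifier, assuming the locus tag is written exactly as in the sample (“--locus--” followed by a space) and that the file name and peptide sequence contain no whitespace. The names LineKind and classify_scan_line are hypothetical and are not taken from ySelect or yRatios.

```c
#include <stdio.h>
#include <string.h>

#define LOCUS_TAG "--locus-- "   /* tag as it appears in the sample scan list */

typedef enum { LINE_LOCUS, LINE_SPECTRUM, LINE_OTHER } LineKind;

/* Classify one scan-list line.
   For a locus line, copy the protein identifier into a.
   For a spectrum line, copy the file name into a and the peptide into b.
   Both buffers must hold at least 256 bytes. */
LineKind classify_scan_line(const char *line, char *a, char *b)
{
    size_t taglen = strlen(LOCUS_TAG);

    if (strncmp(line, LOCUS_TAG, taglen) == 0)
        return sscanf(line + taglen, "%255s", a) == 1 ? LINE_LOCUS : LINE_OTHER;
    if (sscanf(line, "%255s %255s", a, b) == 2)
        return LINE_SPECTRUM;
    return LINE_OTHER;            /* blank or malformed line */
}
```

A reader loop would remember the identifier from the most recent locus line and attach it to every spectrum line that follows, which mirrors the rule that spectra belong to the preceding locus.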

ySelect lists the loci in alphabetical order.

To quickly examine a single protein of interest, use the “-L” option. The scan list subsequently produced will be for that locus alone, rather than the full list for all loci having matches to spectra that meet the confidence level cutoff. If the given locus doesn’t have any such spectra or is not represented in the DTASelect.txt file at all, the scan list will be empty.

In the sample above, only the names of the .dta files appear. If the “-d” option is specified, the directory that each file was in will be prefixed to the file’s name. This is useful in those cases where multi-step LC-MS/MS (MudPIT) was performed, with the spectrum files being deposited in several different subdirectories. Instead of the files having to be gathered into one directory prior to running yRatios, yRatios can be invoked with the current working directory unchanged following the DTASelect and ySelect runs.

References.

1. Eng, J.K.; McCormack, A.L.; Yates, J.R. III. An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. 1994. J. Am. Soc. Mass Spectrom. 5:976-89.
2. Tabb, D.L.; McDonald, W.H.; Yates, J.R. III. DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. 2002. J. Proteome Res. 1(1):21-6.
3. Kislinger, T.; Rahman, K.; Radulovic, D.; Cox, B.; Rossant, J.; Emili, A. PRISM, a generic large-scale proteomic strategy for mammals. 2003. Mol. Cell. Proteomics 2(2):96-106.

Appendix

Sample fragment of a DTASelect.txt file (note that extensive line-wrapping has occurred):

DTASelect v1.8 /data/search/Carl_2007/O16vsO18/both/2007-04-25

S /data/dbase/C.elegans/wormpepandreverseAug2006.fasta 0.0 0.0 0.0 0.0 true true SM 57.0 C DM 0.0 * STY 0.0 0.0 DM 0.0 # M 0.0 0.0 DM 0.0 @ KR 0.0 0.0
Type Locus Length MolWt pI Gene Name
Type Filename Subdirectory XCorr DeltCN PrecursorMass TotalIntensity SpRank IonProportion Sequence SequencePosition Tryptic UniqueToLocus
L 2L52.1 427 50017.95 8.387695 CE32090 WBGene00007063 Zinc finger, C2H2 type status:Partially_confirmed TR:Q9XWB3 protein_id:CAA21776.2 U
D CW_042507_worm_both_step04.2619.2619.1 CW_042507_worm_both_step04 0.8276 0.0152 425.8 2967.5 4 1.0 K.EFK.S 132 2 false U
D CW_042507_worm_both_step04.3621.3621.1 CW_042507_worm_both_step04 0.7774 0.1261 424.49 3990.1 2 1.0 K.EFK.S 132 2 false U
D CW_042507_worm_both_step04.5869.5869.2 CW_042507_worm_both_step04 1.7488 0.0755 1504.94 8773.7 5 0.45833334 K.MPKIEVEDSLVNK.F 387 2 true U
D CW_042507_worm_both_step04.3352.3352.1 CW_042507_worm_both_step04 0.9388 0.0334 516.16 3063.6 9 0.6666667 K.RPSR.A 404 2 false U
D CW_042507_worm_both_step04.3401.3401.1 CW_042507_worm_both_step04 1.0408 0.0774 513.29 2832.4 5 0.6666667 K.RPSR.A 404 2 false U
D CW_042507_worm_both_step04.4186.4186.1 CW_042507_worm_both_step04 0.9512 0.0163 512.6 3550.8 29 0.5 K.RPSR.A 404 2 false U
D CW_042507_worm_both_step06.9171.9171.3 CW_042507_worm_both_step06 2.0919 0.1901 3901.19 10440.4 29 0.12096774 R.CNYDSDESELESDEFWSATEMSDNEEVYVNFR.G 187 2 true U
D CW_042507_worm_both_step02.14490.14490.2 CW_042507_worm_both_step02 1.4666 0.022 2202.57 7637.2 231 0.20588236 R.EECIQPVSVEKNILHFEK.F 283 2 true U
D CW_042507_worm_both_step03.3233.3233.1 CW_042507_worm_both_step03 1.0342 0.0136 507.27 3324.0 77 0.5 R.ENNK.F 311 2 false U
L 2RSSE.1 343 37960.273 9.23877 CE32785 WBGene00007064 status:Partially_confirmed TR:Q8I133 protein_id:CAD59137.1 U
D CW_042507_worm_both_step04.11120.11120.3 CW_042507_worm_both_step04 1.9126 0.0075 3563.43 6773.9 72 0.10294118 K.CAGAYSLAAIHLAEEASPEPTPTTSKPPRGNGVGR.A 268 2 true U
