SAMPLE METHODS FOR PUBLICATIONS
Please customize any of the following sections for your publication:
General analysis version 1
Protein identification and quantification were performed with the Integrated Proteomics Pipeline (IP2, Bruker Scientific LLC, Billerica, MA, http://www.bruker.com) using ProLuCID/Sequest 1,2, DTASelect2 3,4, and Census 5,6. Tandem mass spectra were extracted from raw files into MS1 and MS2 files 10 using RawExtract 1.9.9 (http://fields.scripps.edu/downloads.php) and were searched with ProLuCID/Sequest against the IPI human protein database (version 3_57_01, released on 01-01-2009) supplemented with sequences of known contaminants such as keratin and porcine trypsin, concatenated to a decoy database in which the sequence of each entry in the original database was reversed 11. LTQ data were searched with a precursor tolerance of 3000.0 milli-amu, and fragment ions were restricted to a tolerance of 600.0 ppm. All searches were parallelized and performed on XXX LINUX cluster with xxx cores. The search space included all fully and half-tryptic peptide candidates with no missed-cleavage restrictions. Carbamidomethylation (+57.02146 Da) of cysteine was considered a static modification, and we required 2 peptides per protein and at least one tryptic terminus for each peptide identification. The ProLuCID search results were assembled and filtered using the DTASelect program (version 2.0) with a false discovery rate (FDR) of 0.05; under these filtering conditions, the estimated FDR was below 1% at the protein level in all analyses.
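The target/decoy construction described above can be illustrated with a minimal sketch. The function name, the `Reverse_` prefix, and the example sequences are assumptions made for illustration; they are not taken from IP2 or ProLuCID.

```python
def add_reversed_decoys(entries, prefix="Reverse_"):
    """Return the target entries followed by one reversed-sequence decoy per entry.

    entries: list of (accession, sequence) tuples.
    prefix:  hypothetical label used here to mark decoy accessions.
    """
    decoys = [(prefix + accession, sequence[::-1]) for accession, sequence in entries]
    return list(entries) + decoys


# Toy target database (made-up sequences, for illustration only).
targets = [("P1", "MKTAYIAK"), ("P2", "GASPVR")]
combined = add_reversed_decoys(targets)
```

Each decoy has the same length and amino acid composition as its target, which is why matches to reversed entries can serve as an estimate of random matches.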
General analysis version 2
Analysis of Tandem Mass Spectra. Protein identification and quantification were performed with the Integrated Proteomics Pipeline (IP2, Bruker Scientific LLC, Billerica, MA, http://www.bruker.com) using ProLuCID/Sequest 1,2, DTASelect2 3,4, and Census 5,6. Raw files were extracted into MS1 and MS2 files using RawExtract 1.9.9 (http://fields.scripps.edu/downloads.php) 10, and the tandem mass spectra were searched against the European Bioinformatics Institute International Protein Index (IPI) mouse protein database (www.ebi.ac.uk/IPI/IPImouse.html, downloaded on January 1, 2009). To estimate peptide probabilities and FDRs accurately, we used a target/decoy database containing the reversed sequences of all the proteins appended to the target database 11. Tandem mass spectra were matched to sequences using the ProLuCID algorithm with a mass tolerance of 50 ppm for precursor ions and 400 ppm for fragment ions. ProLuCID searches were done on an Intel Xeon cluster running under the Linux operating system.
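As a reminder of what a ppm tolerance means in absolute terms, the following sketch converts a tolerance in ppm into a mass window. The function names and the example mass of 1000.0 are illustrative assumptions, not part of ProLuCID.

```python
def ppm_window(mass, tol_ppm):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed, theoretical, tol_ppm):
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# A 50 ppm precursor tolerance at mass 1000.0 corresponds to +/- 0.05 mass units.
low, high = ppm_window(1000.0, 50.0)
```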
The search space included all fully and half-tryptic peptide candidates that fell within the mass tolerance window, with no miscleavage constraints. Carbamidomethylation (+57.02146 Da) of cysteine was considered a static modification. The validity of peptide/spectrum matches (PSMs) was assessed in DTASelect using two SEQUEST-defined parameters: the cross-correlation score (XCorr) and the normalized difference in cross-correlation scores (DeltaCN). The search results were grouped by charge state (+1, +2, +3, and greater than +3) and tryptic status (fully tryptic, half-tryptic, and nontryptic), resulting in 12 distinct subgroups. In each of these subgroups, the distributions of XCorr, DeltaCN, and DeltaMass values for (a) direct and (b) decoy database PSMs were obtained; the direct and decoy subsets were then separated by quadratic discriminant analysis.
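The grouping step can be sketched as follows: four charge bins times three tryptic states give the 12 subgroups. The dictionary keys and labels are assumptions chosen for this example, and the discriminant analysis itself is not reproduced here.

```python
from collections import defaultdict
from itertools import product

CHARGE_BINS = ("+1", "+2", "+3", ">+3")
TRYPTIC_STATES = ("fully tryptic", "half-tryptic", "nontryptic")
SUBGROUPS = list(product(CHARGE_BINS, TRYPTIC_STATES))  # 4 x 3 = 12 subgroups


def charge_bin(charge):
    """Map an integer charge state to one of the four charge bins."""
    return f"+{charge}" if charge <= 3 else ">+3"


def group_psms(psms):
    """Group PSM records by (charge bin, tryptic status)."""
    groups = defaultdict(list)
    for psm in psms:
        groups[(charge_bin(psm["charge"]), psm["tryptic"])].append(psm)
    return groups


# Toy PSM records (hypothetical field names and scores).
psms = [
    {"charge": 2, "tryptic": "fully tryptic", "xcorr": 3.1},
    {"charge": 5, "tryptic": "half-tryptic", "xcorr": 2.2},
    {"charge": 2, "tryptic": "fully tryptic", "xcorr": 2.7},
]
groups = group_psms(psms)
```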
Full separation of the direct and decoy PSM subsets is not generally possible; therefore, peptide match probabilities were calculated based on a nonparametric fit of the direct and decoy score distributions. Peptide confidence of 0.95 was set as the minimum threshold. The FDR was calculated as the percentage of reverse decoy PSMs against target PSMs that passed the confidence threshold. Each protein identified was required to have a minimum of two peptides. After this last filtering step, we estimate that protein FDRs were below 1% for each sample analysis. Each dataset was searched twice, once against light and then against heavy protein databases. After the results from SEQUEST were filtered using DTASelect2, ion chromatograms were generated using an updated version of a program previously written in our laboratory.
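The FDR definition used above (decoy PSMs as a percentage of target PSMs among those passing the confidence threshold) can be written out as a short sketch. The record fields `confidence` and `is_decoy` are hypothetical names, not DTASelect output columns.

```python
def decoy_fdr_percent(psms, min_confidence=0.95):
    """FDR as the percentage of decoy PSMs relative to target PSMs
    among PSMs that pass the confidence threshold."""
    passed = [p for p in psms if p["confidence"] >= min_confidence]
    n_decoy = sum(1 for p in passed if p["is_decoy"])
    n_target = len(passed) - n_decoy
    if n_target == 0:
        return float("nan")  # undefined when no target PSM passes
    return 100.0 * n_decoy / n_target


# Toy data: 4 targets and 1 decoy pass the 0.95 threshold.
example_psms = [
    {"confidence": 0.99, "is_decoy": False},
    {"confidence": 0.97, "is_decoy": False},
    {"confidence": 0.96, "is_decoy": False},
    {"confidence": 0.95, "is_decoy": False},
    {"confidence": 0.90, "is_decoy": False},
    {"confidence": 0.96, "is_decoy": True},
    {"confidence": 0.50, "is_decoy": True},
]
fdr = decoy_fdr_percent(example_psms)
```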
This software, called “Census”, is available from the authors for individual use and evaluation through an Institutional Software Transfer Agreement (see http://fields.scripps.edu/census for details). First, the elemental compositions and corresponding isotopic distributions for both the unlabeled and labeled peptides were calculated, and this information then was used to determine the appropriate m/z range from which to extract ion intensities, which included all isotopes with greater than 5% of the calculated isotope cluster base peak abundance. MS1 files were used to generate chromatograms from the m/z range surrounding both the unlabeled and labeled precursor peptides.
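The 5% isotope cutoff can be illustrated with a small sketch that takes an already calculated isotope cluster and returns the m/z range to extract. Computing the cluster from the elemental composition is not shown, and the peak values below are invented for the example.

```python
def extraction_range(isotopes, min_fraction=0.05):
    """Return (low m/z, high m/z) covering all isotope peaks whose abundance
    exceeds min_fraction of the base (most abundant) peak.

    isotopes: list of (mz, abundance) tuples for one peptide isotope cluster.
    """
    base = max(abundance for _, abundance in isotopes)
    kept = [mz for mz, abundance in isotopes if abundance > min_fraction * base]
    return min(kept), max(kept)


# Hypothetical isotope cluster of a doubly charged peptide.
cluster = [(500.00, 100.0), (500.50, 60.0), (501.00, 20.0), (501.50, 4.0)]
mz_low, mz_high = extraction_range(cluster)
```

In this example the last peak (4% of the base peak) falls below the cutoff and is excluded from the extraction range.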
Census calculates peptide ion intensity ratios for each pair of extracted ion chromatograms. The heart of the program is a linear least-squares correlation that is used to calculate the ratio (i.e., the slope of the line) and closeness of fit [i.e., correlation coefficient (r)] between the data points of the unlabeled and labeled ion chromatograms. Census allows users to filter peptide ratio measurements based on a correlation threshold; the correlation coefficient (values between zero and one) represents the quality of the correlation between the unlabeled and labeled chromatograms and can be used to filter out poor-quality measurements. In this study, only peptide ratios with correlation coefficient values (r²) greater than 0.5 were used for further analysis.
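The ratio-by-regression idea can be sketched with an ordinary least-squares fit of one chromatogram against the other. This is a simplified illustration rather than the Census implementation; which chromatogram is placed on which axis, and the intensity values, are assumptions of this example.

```python
def chromatogram_ratio(unlabeled, labeled):
    """Fit labeled = slope * unlabeled + intercept by least squares.

    Returns (slope, r): the slope is taken as the intensity ratio and
    r is the Pearson correlation coefficient of the paired data points.
    """
    n = len(unlabeled)
    mean_x = sum(unlabeled) / n
    mean_y = sum(labeled) / n
    sxx = sum((x - mean_x) ** 2 for x in unlabeled)
    syy = sum((y - mean_y) ** 2 for y in labeled)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(unlabeled, labeled))
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5
    return slope, r


# Two co-eluting peaks where the labeled peptide is twice as intense.
unlabeled = [1.0, 4.0, 9.0, 4.0, 1.0]
labeled = [2.0, 8.0, 18.0, 8.0, 2.0]
slope, r = chromatogram_ratio(unlabeled, labeled)
keep = r ** 2 > 0.5  # filter used in this study
```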
In addition, Census provides an automated method for detecting and removing statistical outliers. In brief, SDs are calculated for all proteins using their respective peptide ratio measurements. The Grubbs test (P < 0.01) is then applied to remove outlier peptides. The outlier algorithm is used only when more than two peptides are found in the same protein because the algorithm becomes unreliable for a small number of measurements. Final protein ratios were generated with QuantCompare, which uses log2 fold change and ANOVA P value to identify significantly regulated proteins. For a protein to be considered in our screen, it had to be “plotted” on our volcano scatter plot (Fig. XX); the y-axis of these volcano plots is the ANOVA P value, which requires each protein to be quantified in at least two of the biological replicates (so that the variance can be calculated) for both the trimmed and brushed animals.
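For readers unfamiliar with the Grubbs test, the sketch below computes the test statistic for the most extreme peptide ratio and the corresponding critical value. It is not the Census code. The critical value requires a Student's t quantile (upper alpha/(2n) point, n - 2 degrees of freedom), which is passed in by the caller here because the Python standard library does not provide it; the function names and example ratios are assumptions.

```python
import math
from statistics import mean, stdev


def grubbs_statistic(ratios):
    """Return (G, index) for the value farthest from the mean,
    where G = |x - mean| / sample standard deviation."""
    if len(ratios) <= 2:
        raise ValueError("need more than two peptide ratios")
    center = mean(ratios)
    spread = stdev(ratios)
    index = max(range(len(ratios)), key=lambda i: abs(ratios[i] - center))
    return abs(ratios[index] - center) / spread, index


def grubbs_critical(n, t_crit):
    """Two-sided Grubbs critical value for n measurements.

    t_crit: upper alpha/(2n) quantile of Student's t with n - 2 degrees
    of freedom, supplied by the caller (from a table or statistics library).
    """
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))


# Hypothetical peptide ratios for one protein; the last one looks suspicious.
ratios = [1.0, 1.1, 0.9, 1.0, 5.0]
g, suspect = grubbs_statistic(ratios)
```

A peptide would be flagged as an outlier when `g` exceeds `grubbs_critical(len(ratios), t_crit)` for the chosen significance level.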
General analysis version 3
Protein identification and quantification were performed with the Integrated Proteomics Pipeline (IP2, Bruker Scientific LLC, Billerica, MA, http://www.bruker.com) using ProLuCID/Sequest 1,2, DTASelect2 3,4, and Census 5,6. Raw files were extracted into MS1 and MS2 files using RawExtract 1.9.9 (http://fields.scripps.edu/downloads.php) 10, and the tandem mass spectra were searched against XXX protein database.
We performed label-free quantitative analysis using Census through the Integrated Proteomics Pipeline (IP2, Bruker Scientific LLC, Billerica, MA, http://www.bruker.com). We grouped biological replicates of each sample to determine protein- and peptide-level quantitative measurements. Census used protein identification results from DTASelect2 (Cociorva et al., 2007; Tabb et al., 2002) and generated a reconstructed MS1-based chromatogram for each identified peptide. When peptides were not identified in all the relevant samples, Census searched the spectra using accurate precursor mass, retention time, and charge state to retrieve them and build chromatograms. To increase the accuracy of peptide precursor detection, we applied smoothing and calculated the Pearson product-moment correlation coefficient between theoretical and experimental isotope distributions to minimize false peak detection. When peaks were not detected, we calculated the background noise to assign small values to peptides. We calculated protein abundance as the sum of peptide intensities. The statistical significance of the label-free differential expression of all proteins was assessed using a two-tailed paired t-test on their corresponding peptide quantification ratios between the two conditions. FDR-adjusted p values were calculated using the Benjamini–Hochberg correction.
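The Benjamini–Hochberg adjustment mentioned above can be sketched in a few lines. This is a generic implementation of the standard step-up procedure, not code from Census or IP2; the input p values are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-protein p values from the paired t-test.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
```

Proteins whose adjusted value falls below the chosen FDR level would then be reported as differentially expressed.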
1 Eng, J. K., McCormack, A. L. & Yates, J. R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry 5, 976-989 (1994).
2 two papers
1) Xu, T. ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program. Molecular & Cellular Proteomics 5(10):S174–S174 (2006).
2) Xu T, Park SK, Venable JD, Wohlschlegel JA, Diedrich JK, Cociorva D, Lu B, Liao L, Hewel J, Han X, Wong CCL, Fonslow B, Delahunty C, Gao Y, Shah H, ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity, J Proteomics. 2015
3 Cociorva, D., Tabb, D. L. & Yates, J. R. Validation of tandem mass spectrometry database search results using DTASelect. Current Protocols in Bioinformatics Chapter 13, Unit 13.4, doi:10.1002/0471250953.bi1304s16 (2007).
4 Tabb, D. L., McDonald, W. H. & Yates, J. R., III. DTASelect and Contrast: Tools for Assembling and Comparing Protein Identifications from Shotgun Proteomics. Journal of Proteome Research 1, 21-26 (2002).
5 Park, S. K., Venable, J. D., Xu, T. & Yates, J. R., 3rd. A quantitative analysis software tool for mass spectrometry-based proteomics. Nat Methods 5, 319-322, doi:10.1038/nmeth.1195 (2008).
6 Park, S. K. et al. Census 2: isobaric labeling data analysis. Bioinformatics 30, 2208-2209, doi:10.1093/bioinformatics/btu151 (2014).
7 Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J. & Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 24, 1285-1292, doi:10.1038/nbt1240 (2006).
8 Lu, B., Ruse, C., Xu, T., Park, S. K. & Yates, J., 3rd. Automatic validation of phosphopeptide identifications from tandem mass spectra. Anal Chem 79, 1301-1310, doi:10.1021/ac061334v (2007).
9 Robinson, P. N., Wollstein, A., Bohme, U. & Beattie, B. Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics 20, 979-981, doi:10.1093/bioinformatics/bth040 (2004).
10 McDonald, W. H. et al. MS1, MS2, and SQT- Three Unified, Compact, and Easily Parsed File Formats for the Storage of Shotgun Proteomic Spectra and Identifications. Rapid Commun. Mass Spectrom. 18, 2162-2168 (2004).
11 Peng, J., Elias, J. E., Thoreen, C. C., Licklider, L. J. & Gygi, S. P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J Proteome Res 2, 43-50 (2003).