Analytical & Bioanalytical Chemistry

Electronic Supplementary Material

Proof of principle of a generalized fuzzy Hough transform approach to peak alignment of 1D 1H-NMR data

Leonard Csenki, Erik Alm, Ralf J.O. Torgrip, K. Magnus Åberg, Lars I. Nord, Ina Schuppe-Koistinen, Johan Lindberg

Experimental

NMR spectrometer: All NMR data were acquired using a Bruker DRX-600 spectrometer operating at 600.23 MHz for 1H observation (Bruker Analytische Messtechnik GmbH, Rheinstetten, Germany).

NMR spectral data processing: The acquired free induction decays (FIDs) were zero-filled to double the number of data points (64k) and multiplied by an exponential line-broadening function of 0.3 Hz prior to Fourier transformation. The spectra were corrected for phase and baseline distortions, and referenced internally to the downfield signal of the anomeric proton signal of α-glucose at 5.236 ppm (for the plasma data) and TSP at 0 ppm (for the urine data) using in-house automatic software (PhaseCore ver. 4.0, by Dr. Ralf Torgrip) for Matlab. The spectra were not normalised prior to calculations.

Plasma dataset

Samples and batches: A total of 188 human plasma samples were analysed by 1H-NMR spectroscopy and the analysis runs were divided into 19 batches. The analytical performance between batches was assessed by analysing quality control (QC) samples prepared from a pooled sample of excess plasma. Each batch was composed of ten samples, three QC samples and one blank sample (14 in each batch). The samples were analysed twice (the second measurement ~10 h after the first), thus yielding duplicate spectra for each sample. The plasma samples were stored at -80°C before preparation for analysis, which consisted of adding 330 μL D2O to 250 μL sample and centrifugation (16060 g, 5 min). The resulting supernatants were analysed by NMR.
The NMR spectra of the QC samples were selected for analysis in this paper, thus limiting the number of peak shift effects in the dataset and the variance of the intensities of the peaks.

NMR spectroscopy: A 5 mm BBI 1H–13C Z-GRD probe was used for the study. NMR spectra were acquired using a Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence (relaxation delay–90°–{τ–180°–τ}n–acquire-FID), where n = 128 and τ = 300 μs, giving a total T2 relaxation time of 77 ms. The residual water resonances were suppressed by presaturation during the relaxation delay. The spectral acquisition parameters used were as follows: 256 free induction decays (FIDs) collected into 32k complex data points, 12019 Hz spectral width, 2.73 s acquisition time, 2.27 s relaxation delay, temperature 293 K. The setting for the receiver gain was fixed at 547.7 for all measurements.

Urine dataset

Samples and batches: The rat urine dataset has been described in an earlier paper [1]. Rat urine was sampled twice a day (at 0-7 h and 8-24 h) on five and two days before, and days 1-7 after, the commencement of dosing with ethionine. The samples were each stored on ice during collection, then 1 % (w/v) sodium azide was added to 1 mL portions, which were centrifuged at 500 g for 10 min. The resulting supernatants were stored at –80°C while awaiting analysis by NMR spectroscopy. Portions of the samples (400 μL) were then mixed with 200 μL of 0.2 M phosphate buffer (Na2HPO4/NaH2PO4, pH = 7.4), and TSP (3-trimethylsilyl-1-[2,2,3,3-2H4] propionate, internal standard) prepared in deuterium oxide (D2O) was added to a final concentration of 0.09 mg/mL.

NMR spectroscopy: A 4 mm FISEI 1H–13C Z-GRD probe was used for the study. Spectra were acquired using a standard Bruker NOESY presaturation pulse program (relaxation delay–90°–τ1–90°–τm–90°–acquire-FID), where τm = 100 ms. The residual water resonances were suppressed by presaturation during the relaxation delay and the mixing time (τm).
Spectral acquisition parameters used were as follows: 64 FIDs were collected into 32k complex data points, 12019 Hz spectral width, 2.73 s acquisition time, 4.83 s total pulse recycle delay, at 300 K.

Implementation of the algorithm

In step one of the algorithm, peak detection is performed by taking the first and second derivatives of the spectra, locating zero crossings of the first derivative and ensuring that the second derivative is negative at the zero crossings. An additional constraint on a possible peak is that the peak shape must be decreasing for two data points on both sides of the maximum.

In step two of the algorithm, for the plasma subset, the shape vector is derived from the histidine singlet at about 7.03 ppm (Fig. ESM_1). The impact of different values of the expansion factor α can be seen in Figs. ESM_2A, B and C. An important aspect is the selection of a peak from which to derive the shape vector. To obtain a well-determined shape vector it is necessary to select a peak that is as sensitive as possible to physico-chemical parameters, but the identity of the peak must be absolutely correct in order to avoid erroneous subsequent assignments. An erroneous assignment can be detected if there is a spectrum in which the number of peaks is considerably smaller than in the other spectra. Examples of shape vectors can be seen in Figs. 1, ESM_1 and 4 for the plasma dataset, and in Fig. 5 for the urine dataset.

Figure ESM_1 Expanded image of the histidine model peak. The shape vector is plotted on top of the image representation of the peak.

In step three, the generalized fuzzy Hough transform is, by definition, a double sum over all the elements of X for every k. For practical reasons the sum over j is truncated to approximately 5σ.
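The peak detection of step one can be illustrated with a minimal sketch. The Python below is not the authors' Matlab implementation. It assumes that finite differences stand in for the first and second derivatives, and the function name `detect_peaks` and the example intensities are hypothetical.

```python
def detect_peaks(y):
    """Return indices of peak maxima in a 1D spectrum given as a list of floats.

    Assumption: derivatives are approximated by finite differences.
    """
    n = len(y)
    # First derivative as forward differences: d1[i] = y[i+1] - y[i].
    d1 = [y[i + 1] - y[i] for i in range(n - 1)]
    peaks = []
    # Two points are needed on each side, so the outermost two are skipped.
    for i in range(2, n - 2):
        # Zero crossing of the first derivative: rising into i, falling after it.
        crosses_zero = d1[i - 1] > 0 and d1[i] <= 0
        # Second derivative (central second difference) must be negative.
        d2 = y[i + 1] - 2 * y[i] + y[i - 1]
        # Shape constraint: intensity decreases for two points on both sides.
        decreasing = y[i - 2] < y[i - 1] < y[i] and y[i] > y[i + 1] > y[i + 2]
        if crosses_zero and d2 < 0 and decreasing:
            peaks.append(i)
    return peaks


# Hypothetical intensities: two well-shaped peaks and one small bump.
spectrum = [0, 1, 3, 6, 3, 1, 0, 0.5, 0.2, 0, 1, 2, 5, 2, 1]
print(detect_peaks(spectrum))  # [3, 12]
```

The bump at index 7 passes the derivative tests but is rejected by the two-point shape constraint, which is the purpose of that additional condition.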
With the sum truncated in this way, the rows in H can be computed as a convolution between two vectors: x_i (the i-th row in X) and a truncated Gaussian filter g of length L (L is an odd number), with the Gaussian centered on element number (L+1)/2 + s_i. The value of L is determined by the truncation length (5σ) and the maximum of |s_i| over i in [1, N]. Convolution between two vectors is a fast operation in Matlab [2], and the total time for computing the Hough transform in our implementation is about 1.5 minutes for a 64k NMR dataset of 100 samples using a 3.0 GHz computer. The transform can be seen in Fig. ESM_2D.

In step four of the algorithm, local maxima of H are extracted. A local maximum in H is an element that has a higher value than all its neighbours, such that the centre element in a 3×3 sub-matrix has the highest value. This process, however, yields many spurious maxima that do not correspond to real peaks. Therefore the requirement for a maximum is tightened: it must be both the centre point and the maximum point of a 5×5 sub-matrix of H.

The following process is used to extract peak information from the spectra. Given a local maximum in H, h_kl, the positions of the corresponding peaks can be predicted in the spectra. This is referred to as a peak trace, that is, a predicted shape pattern. The algorithm assigns a peak, in spectrum i, within 2.5σ of the predicted location j = k – s_i α_l on row i of X and records the intensity at this position, j, in raw spectrum i. Thereafter the peak is removed from X by setting the value of that element, x_ij, to zero.

Figure ESM_2 Illustration of the principle for calculating the transform. A presents translated and expanded shape vectors sliding across the raw data. Two different expansion factors are used for demonstration. B is the same as A, but when calculating the actual transform the indicator matrix was used. C shows the transform calculated for two different values of α.
The circles and diamonds mark the ppm anchor point, which is one of the two dimensions of the transform space. The blue line and the red line represent the extracted vectors of the Hough transform. D: An image of the transform matrix when calculated for all α, in which the extracted vectors in C are shown (red circles and lines: α = 1.0; blue diamonds and lines: α = 0.1; anchor spectrum k = 56).

Algorithm parameters

Bucketing: Bucket size was set to 0.04 ppm (132 bins).

PARS: Search window size 0.012 ppm (40 bins), mismatch weight = 0.3. Score function: linear in distance and linear in relative intensity. Target spectrum: the GFHT anchor spectrum (k = 56).

GFHT: For the plasma dataset the expansion factors (α) used for computing the Hough transform were 0 to 2 with a step size of 0.1. s = 0.0005 ppm (1.6 bins). For the urine dataset the expansion factors used were –2.5 to 2.5 with a step size of 0.1. s = 0.0005 ppm (1.6 bins). GFHT maxima with a value of h_ij lower than 7 votes were discarded.

COW: Segment size 0.0183 ppm (60 bins) and max slack 0.0092 ppm (30 bins). Target spectrum: the GFHT anchor spectrum. The COW algorithm was used “as is” from [3].

References

1. Torgrip R J O, Lindberg J, Linder M, Karlberg B, Jacobsson S, Kolmert J, Gustafsson I, Schuppe-Koistinen I (2006) Metabolomics 2:1-19
2. The MathWorks Inc., 3 Apple Hill Drive, Natick, MA, 01760-2098, USA. Matlab, ver. 7.3.0.267 (R2006b). (2006)
3. Tomasi G, van den Berg F. Dept. of Food Science, The Royal Veterinary and Agricultural University, Denmark. DTW and COW, Code for signal alignment by Dynamic Time Warping and/or Correlation Optimized Warping for Matlab. (2006) http://www.models.kvl.dk/source/
