Louis et al. The goodness of fit of the modeled metabolic signal can be checked with the built-in tools of the R batman package. Hao et al. However, it is worth noting that, for multiplets in crowded-peak regions, comparing the integrated bin intensities with the BATMAN metabolite fit is less informative i. To illustrate this point, consider Fig 7, which shows the diagnostic scatterplot for alanine.
While the integration values are aligned for the doublet at 1. This is primarily due to the glucose resonances and signals from other metabolites that lie in the vicinity of the quadruplet and contribute to the integrated bin intensity (see Fig B in S1 File).
Each number corresponds to a specific spectrum. Once the metabolic signal has been correctly assigned, the residual signal captured by the wavelet component of the BATMAN model can be used to estimate the lipid concentrations.
To this end, integration regions that encompass lipid resonances were specified. Lipid resonances typically appear as broad peaks in NMR spectra. In this way, a set of lipid-specific features was obtained in addition to the relative metabolic concentrations. This approach only works when the metabolic signal has been sufficiently extracted.
Should this not be the case, the residual signal will be contaminated by other metabolites resonating in the area. The narrower spectral binning integration regions (delimited by red dashed lines) capture lipid signals, but not necessarily exclusively. As a result, the following five sets of predictors were obtained: Classifiers were built by using each set of predictors. The predictive performance of the classifiers was assessed by using a three-fold cross-validation (CV) scheme (see Fig 9).
CV works by dividing the dataset into two parts, a training set and a test set. In K-fold CV, the data are split into K roughly equal parts. Thus, in three-fold CV, one-third of the data forms the test set and the remaining two-thirds of the data i. At each iteration, the performance of the classifier is evaluated in terms of the proportion of misclassifications, the sensitivity, and the specificity of the classifier when applied to the test set.
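The K-fold scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the CMA implementation used in the study: the function names, the toy data, and the one-feature threshold classifier are all hypothetical and only serve to show how the folds are formed and how the error rate, sensitivity, and specificity are computed on each test fold.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_metrics(y_true, y_pred, positive=1):
    """Return (error rate, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    error = (fp + fn) / len(pairs)
    # A test fold may contain no positives (or no negatives); report NaN then.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return error, sensitivity, specificity


def cross_validate(X, y, fit, predict, k=3, seed=0):
    """Use each fold once as the test set and the other folds for training."""
    results = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict(model, [X[i] for i in test_idx])
        results.append(confusion_metrics([y[i] for i in test_idx], y_pred))
    return results


# Hypothetical toy classifier: threshold halfway between the two class means.
def fit_threshold(X_train, y_train):
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, X_test):
    return [1 if x[0] > threshold else 0 for x in X_test]


# Made-up data: one feature, three controls (0) and three cases (1).
X = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]
for fold, (err, sens, spec) in enumerate(cross_validate(X, y, fit_threshold, predict_threshold)):
    print(fold, err, sens, spec)
```

Because the split depends on the random seed, calling `cross_validate` with different `seed` values and averaging the per-fold metrics gives a repeated CV estimate.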
Since the splitting is not uniquely determined [ 12 ], the cross-validation procedure was repeated times. The overall performance is based on the mean classification error rate, the mean sensitivity, and the mean specificity of the classifiers. For the classification analysis involving the binning features, variable selection was based on the discriminative power of the individual bins between the two conditions.
This was assessed by using the limma-based moderated t-statistic [ 13 ], as indicated by the asterisk in Fig 9.
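As a rough illustration of univariate variable selection, the sketch below ranks features by the absolute value of a two-sample t-statistic. Note the substitution: it uses an ordinary Welch t-statistic, not the limma moderated t-statistic, which additionally shrinks the per-feature variances with an empirical Bayes step. The function names, the `top_n` parameter, and the class labels 0 and 1 are assumptions made for this example.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (no variance moderation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Both groups are constant: perfectly separated or identical.
        return math.copysign(math.inf, diff) if diff != 0 else 0.0
    return diff / se


def rank_features(X, y, top_n):
    """Return the indices of the top_n features, ordered by decreasing |t|."""
    scores = []
    for j in range(len(X[0])):
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(group1, group0)), j))
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [j for _, j in scores[:top_n]]
```

To avoid an optimistic bias, a ranking like this should be computed on the training portion of each CV split only, not on the full dataset.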
The classification analysis involving the BATMAN-estimated features proceeded using the following three subsets of the features: (1) all the BATMAN-estimated relative metabolite concentrations, (2) all the relative lipid concentrations, and (3) all the BATMAN-estimated relative metabolic concentrations together with all the relative lipid concentrations.
Five procedures that are appropriate for the analysis of large, complex datasets were used to build the classifiers, namely elastic net, lasso, orthogonal partial least squares-discriminant analysis (OPLS-DA), support vector machines (SVMs), and random forests (RF). A brief description of each method is provided in S1 File and the reader is referred to Hastie et al. Although the limma-based moderated t-statistic was not used to build the BATMAN feature-based classifiers, the test was applied in iterations of three-fold cross-validation as a univariate approach to identify the top 15 variables of each of the five sets of predictors.
These variables were identified to check whether there were any similarities in the most discriminative variables selected for each classification task. The statistical analysis was conducted by using the R statistical software version 3. Classification methods were implemented by using the default options of the R Bioconductor package CMA [ 12 ]. For the BATMAN analysis, many multiplets were best modelled by using either empirical multiplets or raster multiplets. However, as stated in [ 8 ], while this does not necessarily result in a perfect fit, it does allow the user to capture metabolites that might otherwise be missed.
The resonances are more distinguishable in the MHz spectrum compared to the MHz spectrum. For the MHz spectrum, the four integration regions from left to right, beginning at 3. The original spectrum is shown in yellow. The two components of the BATMAN model fit, that is, the component modeling the metabolic signal (metabolites fit) and the component capturing the residual signal (wavelet fit), are indicated by blue and red curves, respectively.
The fit sum, which is the sum of the metabolite fit and the wavelet fit, is shown in black. The shaded regions show the resonances from creatine (blue), creatinine (yellow), lysine (pink), and tyrosine (green) that are captured by the metabolite fit. Binning integration region limits for the region are delimited by grey dotted lines. The top 15 discriminative features, based on the univariate analysis, of each set of predictors are listed in Tables D to H in S1 File.
Only the elastic net classifiers are discussed further, as they were among the better-performing classifiers (see Fig J in S1 File). Table 1 presents the mean cross-validated classification error, sensitivity, and specificity of the MHz and MHz classifiers. Note that for the PepsNMR automatically pre-processed MHz spectra, an additional spectral alignment step was carried out to improve the homogeneity of the bins in terms of the signal captured across spectra.
Histograms of the probability of lung cancer for the different sets of features are presented in Fig Each histogram is based on the classifiers developed using the subset of features indicated by the letter a in Table 1. Assuming that a probability greater than 0. Blue corresponds to the control samples and red represents the lung cancer samples. Both spectral binning and spectral deconvolution using BATMAN require expert knowledge of the characteristic spectral signatures i.
For spectral binning, this insight is necessary to select meaningful integration regions.
For spectral deconvolution using BATMAN, this information is required to accurately specify and refine the prior information on each multiplet of interest. Although metabolites have characteristic resonances, experimental parameters and pre-processing steps influence the resultant chemical shift positions, identifiable coupling patterns, and relative peak intensities. Note that a single template file is specified for a large number of spectra which exhibit between-spectrum variation in peak shift and peak definition.
Thus, template adjustments made to improve the fit of some spectra or peaks may have the opposite effect on others. Updating the template file is a repetitive and extremely time-consuming task, especially for crowded spectral regions, but it is essential. Once the template database is developed, the process is automated. Though selecting the integration regions for spectral binning is a manual task, spectral binning is a relatively fast and straightforward method for 1H-NMR signal extraction.
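To make the binning step concrete, the following minimal sketch integrates a spectrum over user-specified chemical shift regions with the trapezoid rule. The function name and the region limits are hypothetical; in practice the regions are chosen by an expert. The same routine could in principle be applied to the residual (wavelet) signal to obtain lipid-specific features, provided the metabolic signal has been extracted sufficiently.

```python
def integrate_regions(ppm, intensity, regions):
    """Integrate a spectrum over each (low, high) ppm region (trapezoid rule).

    ppm and intensity are equal-length sequences; the ppm axis may be in
    ascending or descending order, since points are sorted before integration.
    """
    points = sorted(zip(ppm, intensity))
    features = []
    for low, high in regions:
        inside = [(x, v) for x, v in points if low <= x <= high]
        area = sum(
            (x1 - x0) * (v0 + v1) / 2.0
            for (x0, v0), (x1, v1) in zip(inside, inside[1:])
        )
        features.append(area)
    return features


# Made-up spectrum with a descending ppm axis and two illustrative regions.
ppm_axis = [2.0, 1.5, 1.0, 0.5, 0.0]
signal = [0.0, 1.0, 0.0, 2.0, 0.0]
print(integrate_regions(ppm_axis, signal, [(0.0, 1.0), (1.0, 2.0)]))
```

Each returned value is one bin feature for that spectrum. Any signal inside a region contributes to the integral, whichever metabolite it comes from.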
The magnetic field strength of the NMR spectrometer influences the resolution of the metabolic peaks. In higher resolution spectra, peaks appear with greater definition, exhibit fewer higher-order effects, and show less overlap. Fewer overlapping regions imply a greater one-to-one mapping between spectral bins and metabolites [ 5 ], and the increased signal-to-noise ratio in the higher resolution spectra is advantageous for metabolic signal extraction using BATMAN (see Fig ).

An abundance of detail pertaining to biological functions is contained within the metabolome.
There is a strong desire to eventually utilize these data to make informed clinical decisions about disease status, susceptibility, and progression.
It is expected that metabolomics will be of vital importance in reaching the goal of providing healthcare that is customized for individual patients. Therefore, obtaining interpretable, reliable, and reproducible results is essential. The variation in chemical shift locations across spectra is a challenge for spectral binning.
Therefore, the inclusion of a spectral alignment step in the pre-processing of NMR data is important in order to obtain reliable and interpretable features. However, even with good spectral alignment, overlapping peaks often prevent a one-to-one mapping between integration regions and metabolites. Integration regions, especially those of lower resolution spectra, may contain signals from two or more metabolites in conjunction with an unidentified signal (for illustration, see Tables D to H in S1 File).
Thus, a drawback of the simplicity surrounding spectral binning is the lack of biological interpretability of the resultant features.
Spectral deconvolution, particularly the BATMAN model, provides the means to obtain a single concentration estimate for each metabolite of interest. The residual signal captured by wavelets can be divided into integration regions to capture, for instance, broad lipid resonances. In the end, clinically relevant features are extracted from the 1H-NMR spectra. For the MHz spectra, the relative metabolic concentrations estimated by BATMAN excelled, producing the best performing classifier in terms of mean misclassification error.

The authors acknowledge Wibren Oosterbaan for his contribution to the characterization of the 1H-NMR metabolic resonances.
Abstract

Nuclear magnetic resonance (NMR) spectroscopy is a principal analytical technique in metabolomics. The identification and quantification of blood plasma metabolites based on 1H-NMR spectra is a challenge for the following reasons: 1H-NMR spectrometers have detection limits.
Although the number of significantly detectable peaks increases for higher magnetic field strengths, the number of existing plasma metabolites that can be reliably detected and quantified remains rather small approximately More than one metabolite can contribute to a signal at a specific location, which further complicates peak identification and metabolite quantification.
Fig 1. Illustration of a portion of a MHz spectrum before (grey spectrum) and after (blue spectrum) baseline correction.

Fig 2. Illustration of warping in the region of the lactate signal.