HOME

TheInfoList



OR:

Receiver Operating Characteristic Curve Explorer and Tester (ROCCET) is an open-access web server for performing biomarker analysis using ROC (
Receiver Operating Characteristic A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of ...
) curve analyses on metabolomic data sets. ROCCET is designed specifically for performing and assessing a standard binary classification test (disease vs. control). ROCCET accepts metabolite data tables, with or without clinical/observational variables, as input and performs extensive
biomarker In biomedical contexts, a biomarker, or biological marker, is a measurable indicator of some biological state or condition. Biomarkers are often measured and evaluated using blood, urine, or soft tissues to examine normal biological processes, ...
analysis and
biomarker In biomedical contexts, a biomarker, or biological marker, is a measurable indicator of some biological state or condition. Biomarkers are often measured and evaluated using blood, urine, or soft tissues to examine normal biological processes, ...
identification using these input data. It operates through a menu-based navigation system that allows users to identify or assess those clinical variables and/or metabolites that contain the maximal diagnostic or class-predictive information. ROCCET supports both manual and semi-automated feature selection and is able to automatically generate a variety of mathematical models that maximize the sensitivity and specificity of the biomarker(s) while minimizing the number of biomarkers used in the biomarker model. ROCCET also supports the rigorous assessment of the quality and robustness of newly discovered biomarkers using permutation testing, hold-out testing and cross-validation.


Background – ROC curves in biomarker discovery

Biomarkers are commonly defined as measured characteristics that may be used as indicators of some biological state or condition. They may be genes, chemicals, proteins, physiological parameters, imaging data or histological measurements. Biomarkers can consist of single components (i.e. blood glucose) or multiple components (a biomarker panel such as acylcarnitines). Medical biomarkers fall into 5 major categories: 1) diagnostic (used to identify if you have a disease or condition); 2)
prognostic Prognosis (Greek: πρόγνωσις "fore-knowing, foreseeing") is a medical term for predicting the likely or expected development of a disease, including whether the signs and symptoms will improve or worsen (and how quickly) or remain stable ...
(used to determine how well you will do with the disease or condition); 3) predictive (used to determine if you may get the disease); 4) efficacy or monitoring (used to determine how well a drug or treatment is doing in fighting the disease) and 5) exposure (used to determine if you have been exposed to a drug, food, toxin or other kind of substance). Good biomarkers should exhibit good sensitivity (the fraction of correctly identified true positives) and good specificity (the fraction of correctly identified true negatives). A perfect biomarker or biomarker panel would be 100% sensitive (predict all people in the sick group as being sick) and 100% specific (not predicting anyone from the healthy group as being sick). However, since few things in life are perfect, there is often a trade-off between sensitivity and specificity. In medical biomarker studies it is becoming increasingly common to report this tradeoff in sensitivity and specificity using a
Receiver Operating Characteristic A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of ...
(ROC) curve. ROC curves plot the sensitivity of a biomarker on the y axis, against the false discovery rate (1- specificity) on the x axis. An image of different ROC curves is shown in Figure 1. ROC curves provide a simple visual method for one to determine the boundary limit (or the separation threshold) of a biomarker or a combination of biomarkers for the optimal combination of sensitivity and specificity. The AUC (area under the curve) of the ROC curve reflects the overall accuracy and the separation performance of the biomarker (or biomarkers), and can be readily used to compare different biomarker combinations or models. As a rule of thumb, the fewer the biomarkers that one uses to maximize the AUC of the ROC curve, the better.


Metabolomics

ROCCET’s ROC curve generation and analysis is specifically tailored for
metabolomics Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism. Specifically, metabolomics is the "systematic study of the unique chemical fingerprin ...
datasets.
Metabolomics Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism. Specifically, metabolomics is the "systematic study of the unique chemical fingerprin ...
data sets produced by high throughput analytical chemistry techniques typically consist of large matrices containing multiple values for multiple samples. The comparison between groups or subsets of samples within the data usually involves statistical procedures employing univariate analysis and multivariate analysis such as
Partial Least Squares Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a li ...
-
Discriminant Analysis Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features ...
(PLS-DA) or machine learning classification procedures such as Support Vector Machine (SVM). As a result, ROCCET offers two different kinds of analytical modules – a univariate module and a multivariate module. In the univariate module single variables are evaluated (by a
t-test A ''t''-test is any statistical hypothesis testing, statistical hypothesis test in which the test statistic follows a Student's t-distribution, Student's ''t''-distribution under the null hypothesis. It is most commonly applied when the test stati ...
) and ranked for their separation performance (i.e. the AUC of the ROC), including
confidence intervals In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated ''confidence level''; the 95% confidence level is most common, but other levels, such as 9 ...
(CI) and a computed optimal threshold. In the multivariate module one can choose between three different techniques – SVM (
support vector machine In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratorie ...
), PLS-DA (
partial least squares Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a li ...
discriminant analysis Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features ...
) and
Random Forests Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of th ...
for classifying and selecting metabolites or clinical variables for an optimal ROC performance. The resulting analysis produces the top-performing multi-variable model(s) based on their ROC curve characteristics. This module also presents the significant variables (clinical variables and/or metabolites) contributing to the model (via “ROC explorer”). ROCCET also supports an option to manually select specific variables to be included in a given biomarker model. These variables can be analyzed using “ROC tester”. ROCCET also supports the rigorous assessment of the quality and robustness of newly discovered biomarkers or biomarker panels using permutation testing, hold-out testing and cross-validation. ROCCET generates a variety of colorful, journal-ready graphs and tables (see Figures 1 and 2) and supports the downloading of all generated files including tables ( CSV format), graphs ( PNG or
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
) and the processing history as an R file (which can be read as a simple text file). A tutorial for using ROCCET is offered in the following reference. Training datasets are available on ROCCET website for experimenting with the tool.


See also

*
Receiver operating characteristic A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of ...
*
Metabolomics Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism. Specifically, metabolomics is the "systematic study of the unique chemical fingerprin ...
*
Biomarker (medicine) In medicine, a biomarker is a measurable indicator of the severity or presence of some disease state. More generally a biomarker is anything that can be used as an indicator of a particular disease state or some other physiological state of an org ...
*
Sensitivity and specificity ''Sensitivity'' and ''specificity'' mathematically describe the accuracy of a test which reports the presence or absence of a condition. Individuals for which the condition is satisfied are considered "positive" and those for which it is not are ...
*
Biology Biology is the scientific study of life. It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field. For instance, all organisms are made up of cells that process hereditary i ...
*
Microbiology Microbiology () is the scientific study of microorganisms, those being unicellular (single cell), multicellular (cell colony), or acellular (lacking cells). Microbiology encompasses numerous sub-disciplines including virology, bacteriology, prot ...
*
Univariate In mathematics, a univariate object is an expression, equation, function or polynomial involving only one variable. Objects involving more than one variable are multivariate. In some cases the distinction between the univariate and multivariate cas ...
*
Multivariate Multivariate may refer to: In mathematics * Multivariable calculus * Multivariate function * Multivariate polynomial In computing * Multivariate cryptography * Multivariate division algorithm * Multivariate interpolation * Multivariate optical c ...
*
Biomarker In biomedical contexts, a biomarker, or biological marker, is a measurable indicator of some biological state or condition. Biomarkers are often measured and evaluated using blood, urine, or soft tissues to examine normal biological processes, ...


References

{{reflist, 30em Biological databases