
Quantitative validation methods

Patient-level data

Here we present the measures used when handling patient-level data in the context of validation:

 

See Pearson correlation test in the Hypothesis testing section.

 

See Spearman's rank test in the Hypothesis testing section.

 

See Kendall rank correlation test in the Hypothesis testing section.

 

The Mean Bias Error (MBE) compares forecasted outputs ŷ (the predicted time series) with observed data y (the observed or measured time series). MBE is not a good indicator of model reliability because positive and negative errors often compensate for each other, but it shows by how much the model overestimates or underestimates on average (1).
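The compensation effect described above can be illustrated with a minimal sketch. The function name `mean_bias_error` and the sign convention (predicted minus observed, so a positive value means overestimation) are assumptions made for this example; the source does not state a formula, and some references use the opposite sign.

```python
def mean_bias_error(predicted, observed):
    """Average of (predicted - observed).

    Assumed convention: positive = the model overestimates on average,
    negative = the model underestimates on average.
    """
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


# Errors of -1 and +1 cancel out: the MBE is 0 although no prediction is exact.
compensating = mean_bias_error([2.0, 4.0], [3.0, 3.0])

# Errors of +1 and +2 do not cancel: the model overestimates by 1.5 on average.
overestimating = mean_bias_error([4.0, 5.0], [3.0, 3.0])
```

The first call shows why an MBE close to zero does not by itself demonstrate that a model is reliable.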

 

A receiver operating characteristic curve (ROC curve) is a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters at each threshold: the True Positive Rate (TPR) and the False Positive Rate (FPR). Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives (1).

 

AUC stands for "Area under the ROC Curve." That is, AUC measures the entire two-dimensional area underneath the ROC curve (as in integral calculus) from (0,0) to (1,1). It provides an aggregate measure of performance across all possible classification thresholds. AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0 (1).
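As a rough illustration of the two definitions above, the sketch below builds the ROC points by lowering the threshold over every distinct score and then integrates them with the trapezoidal rule. The function names `roc_points` and `auc`, the use of `>=` for the threshold comparison, and the toy scores are assumptions for this example, not a prescribed implementation.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, starting at (0, 0) and ending at (1, 1).

    labels are 1 (positive) or 0 (negative); an item is classified as
    positive when its score is >= the current threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step adds more predicted positives.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


scores = [0.9, 0.8, 0.3, 0.1]
auc_all_correct = auc(roc_points(scores, [1, 1, 0, 0]))  # perfect ranking
auc_all_wrong = auc(roc_points(scores, [0, 0, 1, 1]))    # inverted ranking
```

With these toy inputs the perfectly ranked case reaches the upper bound of the AUC range and the inverted case reaches the lower bound, matching the two extremes described in the text.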

 

The Bland-Altman (B&A) analysis quantifies the agreement between two quantitative measurements by studying the mean difference and constructing limits of agreement. The B&A plot is a simple way to evaluate the bias (the mean of the differences) and to estimate an agreement interval within which 95% of the differences between the second method and the first one fall (1)(2)(3).
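A minimal sketch of the quantities behind a B&A plot is given below. It assumes the common formulation in which the limits of agreement are the bias ± 1.96 times the sample standard deviation of the differences, which presumes approximately normally distributed differences; the function name `bland_altman` and the sample values are hypothetical.

```python
import statistics


def bland_altman(first_method, second_method):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Differences are taken as second method minus first method.
    Assumption: limits of agreement = bias +/- 1.96 * sample SD.
    """
    if len(first_method) != len(second_method) or len(first_method) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(first_method, second_method)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired measurements from two methods.
bias, lower, upper = bland_altman([10, 12, 14, 16], [11, 12, 16, 17])
```

In the plot itself, each pair is usually drawn as a point with the mean of the two measurements on the x-axis and their difference on the y-axis, with horizontal lines at the bias and at the two limits.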

 

Population-level data

Here we detail the measures used when handling population-level data in the context of validation. Some of these methods are already presented in the Hypothesis testing section. Visit that section for more information.

 

See Parametric tests in the Hypothesis testing section.

See Z-test in the Hypothesis testing section.

See Student’s t-test in the Hypothesis testing section.

See Chi-squared test in the Hypothesis testing section.

 

See Non-parametric tests in the Hypothesis testing section.

See Kolmogorov-Smirnov test in the Hypothesis testing section.

See Mann-Whitney U test in the Hypothesis testing section.

See Wilcoxon signed-rank test in the Hypothesis testing section.

 

When only summary data are available, the Computational Model's (CM) ability to reproduce a time series or a Kaplan-Meier-like curve can be evaluated with visual predictive checking (VPC) combined with two metrics: coverage and precision. Both metrics are based on the widths of the predicted interval (Prediction Interval) and of the observed interval (Confidence Interval):

Coverage represents the model's ability to reproduce the range of data observed in real life:

 

Coverage = (Obs. interval ∩ Pred. interval)/Obs. interval

 

Precision represents the model's ability to provide results with a reasonable variability:

 

Precision = (Obs. interval ∩ Pred. interval)/Pred. interval

 

Both metrics range from 0 to 100%. Coverage and precision of at least 70% are considered acceptable, whereas 80% or more is considered good (1).
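The two ratios above can be sketched for a single pair of intervals as follows. Intervals are represented as hypothetical `(low, high)` tuples and the function names are assumptions; how the values are aggregated over the time points of a curve is not specified here and is left out of the sketch.

```python
def interval_overlap(a, b):
    """Width of the intersection of two (low, high) intervals, 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def coverage_and_precision(observed, predicted):
    """Return (coverage, precision) as fractions between 0 and 1.

    coverage  = overlap width / observed interval width
    precision = overlap width / predicted interval width
    """
    obs_width = observed[1] - observed[0]
    pred_width = predicted[1] - predicted[0]
    if obs_width <= 0 or pred_width <= 0:
        raise ValueError("intervals must have a positive width")
    overlap = interval_overlap(observed, predicted)
    return overlap / obs_width, overlap / pred_width


# Observed confidence interval (2, 6) and prediction interval (3, 8):
# the overlap (3, 6) has width 3, the observed width is 4, the predicted width is 5.
coverage, precision = coverage_and_precision((2.0, 6.0), (3.0, 8.0))
```

In this toy case the coverage is 75% (acceptable by the thresholds above) while the precision is 60% (below the acceptable threshold), showing that a prediction interval that is too wide is penalised by the precision even when it covers most of the observed interval.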

 

When comparing two curves (observed vs. simulated), the ratio of AUC can be useful. This metric is the area between the curves (the region below the upper curve and above the lower curve) divided by the area under the observed curve. We consider 0.3 or below an acceptable threshold for this metric: on a plot of simulated data with a limited amount of noise, an AUC ratio of 0.3 is equivalent to a difference of 20% between simulated and observed values (see reference (1) for more detail).
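The ratio described above can be approximated numerically as in the sketch below. It assumes both curves are sampled on the same time grid and uses the trapezoidal rule on the absolute differences, which is only an approximation when the curves cross between two grid points; the function names and sample values are hypothetical.

```python
def trapezoid_area(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


def auc_ratio(times, observed, simulated):
    """Area between the two curves divided by the area under the observed curve."""
    if not (len(times) == len(observed) == len(simulated)) or len(times) < 2:
        raise ValueError("need at least two samples on a shared time grid")
    gaps = [abs(s - o) for s, o in zip(simulated, observed)]
    observed_area = trapezoid_area(times, observed)
    if observed_area == 0:
        raise ValueError("area under the observed curve is zero")
    return trapezoid_area(times, gaps) / observed_area


# Hypothetical curves: the simulated curve oscillates by 1 unit around the observed one.
ratio = auc_ratio([0.0, 1.0, 2.0], [10.0, 10.0, 10.0], [11.0, 9.0, 11.0])
acceptable = ratio <= 0.3  # threshold proposed in the text above
```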

 
