Hypothesis testing

updated 2 yrs ago

This section gives an overview of the statistical tests frequently used at nova. Each test is described below.

1. Parametric tests

Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn. This is often the assumption that the population data is normally distributed (1). Some of the most commonly used parametric tests are:

1.1 Z-test

A z-test is a statistical test for data that approximately follow a normal distribution. It can be applied to one sample, two samples, or proportions. In the two-sample case, it checks whether the means of two large samples differ when the population variance is known (1). Some texts suggest that the test can be used even when the normality assumption is not met, provided the sample size is large enough (n > 30) (2). However, this rule of thumb may lead to inaccurate results, particularly when the data are strongly skewed. A slight variation of the test is required when comparing two matched (paired) samples (3).
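The two-sample case can be illustrated with a minimal sketch that uses only the Python standard library. The function name `two_sample_z_test`, the sample values and the population standard deviations are hypothetical and chosen for illustration; the sketch assumes both population standard deviations are known.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z_test(x, y, sigma_x, sigma_y):
    """Two-sided z-test for a difference in means.

    Assumes the population standard deviations sigma_x and sigma_y are known.
    """
    # Standard error of the difference between the two sample means.
    se = sqrt(sigma_x**2 / len(x) + sigma_y**2 / len(y))
    z = (mean(x) - mean(y)) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements for two groups (illustrative values only).
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
group_b = [4.8, 4.7, 5.0, 4.6, 4.9, 4.8]
z, p = two_sample_z_test(group_a, group_b, sigma_x=0.25, sigma_y=0.25)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests that the two means differ. If the population standard deviations are not actually known, the t-test in the next section is the safer choice.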

1.2 Student’s t-test

A t-test is a statistical test used to determine whether there is a significant difference between the means of two groups. It is used when the data sets follow a normal distribution and the population variances are unknown (1). Generally speaking, this test is recommended over the z-test, because the true population variance is rarely known in practice and has to be estimated from the sample. A slight variation of this test is required when comparing two matched (paired) samples (2).
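As a rough sketch of the two-sample case, the snippet below computes the pooled-variance (Student's) t statistic and its degrees of freedom with the standard library. The function name `pooled_t_statistic` and the data are hypothetical. The standard library has no t distribution, so the sketch compares the statistic with a tabulated critical value (2.306 for 8 degrees of freedom at a two-sided alpha of 0.05) instead of computing a p-value. It also assumes the two groups have equal variances.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test.

    Assumes both groups share the same (unknown) population variance.
    """
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    # Pooled estimate of the common variance from the two sample variances.
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, df


# Hypothetical data: two groups of five observations each.
t, df = pooled_t_statistic([12.1, 11.8, 12.4, 12.0, 12.3],
                           [11.2, 11.5, 11.0, 11.4, 11.1])
T_CRIT = 2.306  # tabulated two-sided critical value for df = 8, alpha = 0.05
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```

In practice a library routine such as `scipy.stats.ttest_ind` returns the p-value directly.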

1.3 Chi-squared test

A chi-squared (χ²) test is a hypothesis testing method for categorical data. The two most common variants, the goodness-of-fit test and the test of independence, both check whether observed frequencies in one or more categories match expected frequencies. The basic idea is to compare the actual data values with what would be expected if the null hypothesis were true. The data must be raw counts from a random sample, the categories must be mutually exclusive, the observations must be independent, and the sample must be large enough. For each category, you calculate the squared difference between the actual and expected values and divide it by the expected value; the sum of these terms is the test statistic. You then compare the test statistic to a theoretical value from the chi-squared distribution, which depends on both the alpha value and the degrees of freedom of your data (1,2,3).
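The calculation described above can be sketched for a goodness-of-fit test as follows. The function name `chi_square_statistic` and the die-roll counts are hypothetical. The critical value 11.070 is the tabulated chi-squared value for 5 degrees of freedom at alpha = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 120 rolls of a die, 20 expected per face if it is fair.
observed = [22, 17, 20, 26, 22, 13]
expected = [20] * 6
statistic = chi_square_statistic(observed, expected)

CHI2_CRIT = 11.070  # tabulated critical value for df = 6 - 1 = 5, alpha = 0.05
print(f"chi-squared = {statistic:.2f}, reject H0: {statistic > CHI2_CRIT}")
```

Here the statistic is about 5.1, which is below the critical value, so the hypothesis of a fair die would not be rejected. A library routine such as `scipy.stats.chisquare` also returns the p-value.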

2. Non-parametric tests

Non-parametric tests are “distribution-free” and, as such, can be used for non-Normal variables (1). Some of the most commonly used non-parametric tests are:

2.1 Kolmogorov-Smirnov test

The Kolmogorov–Smirnov test is a test of the equality of probability distributions. It can be used to compare a sample with a reference probability distribution (one-sample test) or to compare two samples with each other (two-sample test). In essence, the test answers the question "What is the probability that this collection of samples could have been drawn from that probability distribution?" (1). Because it compares entire distribution functions, it is a powerful tool for validating not only the location and spread of a distribution but also its shape as a whole (2).
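As an illustration of the two-sample case, the sketch below computes the test statistic, which is the largest vertical distance between the two empirical cumulative distribution functions. The function name `ks_two_sample_statistic` is hypothetical, and the sketch stops at the statistic: turning it into a p-value requires the Kolmogorov distribution, which a library routine such as `scipy.stats.ks_2samp` handles.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both empirical CDFs are step functions that only change at data points,
    # so it is enough to evaluate the difference at every observed value.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        d = max(d, abs(cdf_x - cdf_y))
    return d


# Hypothetical samples: partially overlapping value ranges.
print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A statistic of 0 means the two empirical distributions coincide, and a statistic of 1 means the samples do not overlap at all.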

2.2 Mann-Whitney U test

The Mann–Whitney U test is a non-parametric test of the null hypothesis that, for randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X. This test assumes that:

all the observations from both groups are independent of each other,
the responses are at least ordinal (i.e., one can at least say, of any two observations, which is the greater),
under the null hypothesis H0, the distributions of both populations are identical,
the alternative hypothesis H1 is that the distributions are not identical (1,2).
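The null hypothesis above translates directly into the U statistic: count, over all pairs (x, y), how often x is greater than y. The sketch below does this by brute force. The function name `mann_whitney_u` is hypothetical, and counting ties as one half is a common convention assumed here.

```python
def mann_whitney_u(x, y):
    """Return (u_x, u_y): how often x beats y and how often y beats x.

    Ties count as 0.5 for each side.
    """
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical samples.
u_x, u_y = mann_whitney_u([3, 5], [1, 4])
# u_x / (len(x) * len(y)) estimates the probability that X is greater than Y.
print(u_x, u_y, u_x / 4)
```

Under the null hypothesis both counts should be close to half the number of pairs. A library routine such as `scipy.stats.mannwhitneyu` adds the p-value.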

2.3 Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a non-parametric statistical test for two dependent (matched) samples. It should not be confused with the Wilcoxon rank-sum test (also known as the Mann–Whitney U test, see 2.2), which compares two independent groups. The signed-rank test calculates the difference within each pair, ranks the absolute differences, and analyzes the signed ranks to establish whether the differences are statistically significant. It is used when the corresponding variable is ordinal or continuous, but not normally distributed (1).

The signed-rank test is a paired difference test: it is applied to repeated measurements on a single sample, or to matched pairs, to assess whether their population mean ranks differ (2,3).
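For matched samples, the signed-rank statistic can be sketched as follows. The function name `wilcoxon_signed_rank` and the before/after values are hypothetical. Dropping zero differences and giving tied absolute differences their average rank are common conventions assumed here.

```python
def wilcoxon_signed_rank(before, after):
    """Return (w_plus, w_minus): rank sums of positive and negative differences."""
    # Paired differences; pairs with no change are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute differences starting at position i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical repeated measurements on the same four subjects.
print(wilcoxon_signed_rank([10, 12, 9, 11], [12, 11, 12, 15]))
```

If the two rank sums are very unequal, the paired measurements differ systematically. A library routine such as `scipy.stats.wilcoxon` provides the corresponding p-value.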

3. Correlation tests

Correlation tests are used to evaluate the association between two or more variables (1).

3.1 Pearson correlation test

The Pearson correlation coefficient is a measure of linear correlation between two sets of data. It is essentially a normalized measurement of the covariance (a measure of common variability), such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations (1,2).
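A minimal sketch of the coefficient as normalized covariance is shown below. The function name `pearson_r` and the data are hypothetical. The second call illustrates the limitation mentioned above: y depends perfectly on x, but not linearly, and the coefficient is zero.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Perfect linear relationship: coefficient close to 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
# Perfect but non-linear relationship (y = x squared): coefficient of 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

A library routine such as `scipy.stats.pearsonr` additionally returns a p-value for the null hypothesis of no linear correlation.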

3.2 Spearman's rank test

Spearman's rank correlation coefficient is a non-parametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. While Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotonic function of the other (1,2).
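One way to see the difference from Pearson's correlation is to compute Spearman's coefficient as the Pearson correlation of the ranks. The sketch below does this with hypothetical helper names (`average_ranks`, `pearson_r`, `spearman_rho`). Tied values receive their average rank, which is a common convention assumed here.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(f"Pearson: {pearson_r(x, y):.3f}, Spearman: {spearman_rho(x, y):.3f}")
```

For this data the Spearman coefficient is 1 (up to rounding) because the relationship is perfectly monotonic, while the Pearson coefficient stays below 1 because it is not linear. A library routine such as `scipy.stats.spearmanr` also returns a p-value.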

3.3 Kendall rank correlation test

The Kendall rank correlation coefficient test (often called Kendall’s τ or tau) is a non-parametric test that measures the strength of the relationship between two variables, assuming that the relationship is monotonic. It is an alternative to Pearson’s correlation (parametric) when the data fail one or more of that test's assumptions, and an alternative to Spearman's correlation (non-parametric) when the sample size is small and there are many tied ranks (1,2).
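The coefficient is based on counting concordant and discordant pairs of observations, which the following sketch does by brute force. The function name `kendall_tau_a` is hypothetical. Note that this is the simple tau-a variant, which makes no adjustment for ties; data with many tied ranks call for the tau-b variant that library routines such as `scipy.stats.kendalltau` compute by default.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one of the six pairs is out of order.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

With five concordant pairs and one discordant pair out of six, the coefficient is (5 − 1) / 6, roughly 0.67.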

4. Permutation test

Permutation tests are non-parametric hypothesis tests that use a proof-by-contradiction argument. They compute the sampling distribution of any test statistic under the strong null hypothesis that the variants being compared have absolutely no effect on the outcome, i.e. that all samples come from the same distribution. This distribution is obtained by repeatedly rearranging the observed data across the groups, which makes permutation tests a form of resampling (1,2).
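A minimal sketch of a permutation test for a difference in means is shown below. The function name `permutation_test_means`, its parameters and the data are hypothetical. Instead of enumerating every possible rearrangement, the sketch draws random shuffles (a Monte Carlo approximation), and it adds one to the numerator and denominator so that the estimated p-value is never exactly zero.

```python
import random
from statistics import mean


def permutation_test_means(x, y, n_resamples=5000, seed=0):
    """Approximate two-sided p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are interchangeable,
        # so any reshuffling of the pooled data is equally likely.
        rng.shuffle(pooled)
        shuffled = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if shuffled >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical outcomes for two variants.
variant_a = [12, 15, 14, 10, 13, 16]
variant_b = [18, 21, 17, 20, 19, 22]
print(permutation_test_means(variant_a, variant_b))
```

The same loop works for any other test statistic (a median, a ratio, a correlation) by replacing the two `mean` computations. A library routine such as `scipy.stats.permutation_test` offers a more general implementation.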

5. Equivalence tests

Equivalence tests are a variety of hypothesis tests used to draw statistical inferences from observed data; the most widely used procedure is the two one-sided tests (TOST) procedure described in section 6. In equivalence tests, the null hypothesis is defined as an effect large enough to be deemed interesting. This is specified by an upper and a lower equivalence bound based on the smallest effect size of interest, which has to come from knowledge of the domain under study. The alternative hypothesis is any effect that is less extreme than these equivalence bounds. The observed data are statistically compared against the equivalence bounds; if the comparison is significant, the presence of effects large enough to be considered worthwhile can be rejected (1,2).

6. TOST (two one-sided t-tests)

In the TOST procedure, an upper (ΔU) and a lower (–ΔL) equivalence bound are specified based on the smallest effect size of interest (e.g. a positive or negative difference of d = 0.3). Two composite null hypotheses are tested: H01: Δ ≤ –ΔL and H02: Δ ≥ ΔU. When both of these one-sided null hypotheses can be statistically rejected, we can conclude that –ΔL < Δ < ΔU, i.e. that the observed effect falls within the equivalence bounds, is statistically smaller than any effect deemed worthwhile, and can be considered practically equivalent (1,2).
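The two one-sided tests can be sketched as follows for two independent samples. The function names (`tost_statistics`, `is_equivalent`), the data and the bounds are hypothetical. The sketch assumes equal variances (pooled t statistics), expresses the bounds in raw units of the mean difference rather than as a standardized d, and expects the one-sided critical t value to be supplied by the caller, because the standard library has no t distribution. The value 1.734 is the tabulated one-sided critical value for 18 degrees of freedom at alpha = 0.05.

```python
from math import sqrt
from statistics import mean, variance


def tost_statistics(x, y, lower, upper):
    """Return (t_lower, t_upper, df) for the two one-sided t-tests.

    lower and upper are the equivalence bounds for mean(x) - mean(y),
    in the same units as the data (lower is typically negative).
    """
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / n_x + 1 / n_y))
    diff = mean(x) - mean(y)
    t_lower = (diff - lower) / se  # tests H01: difference <= lower bound
    t_upper = (diff - upper) / se  # tests H02: difference >= upper bound
    return t_lower, t_upper, df


def is_equivalent(x, y, lower, upper, t_crit):
    """True when both one-sided null hypotheses are rejected."""
    t_lower, t_upper, _ = tost_statistics(x, y, lower, upper)
    return t_lower > t_crit and t_upper < -t_crit


# Hypothetical measurements for two groups of ten observations each.
a = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
b = [10.1, 10.0, 9.9, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1]
T_CRIT = 1.734  # tabulated one-sided critical value for df = 18, alpha = 0.05
print(is_equivalent(a, b, lower=-0.5, upper=0.5, t_crit=T_CRIT))
```

With wide bounds such as ±0.5 the two groups are declared equivalent. With very narrow bounds the same data would not be sufficient to reject both null hypotheses, which shows how strongly the conclusion depends on the chosen smallest effect size of interest.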