# Survival analysis

This section introduces the concept of survival analysis and some of its most widely used techniques, which are detailed below.

## 0. Survival analysis

Survival analysis is a set of statistical approaches used to investigate the time it takes for an event of interest to occur (time-to-event data). It is used, for instance, in biology and medicine (analyses of patients’ survival times, time from a first heart attack to a second one, etc.), in sociology (“event-history analysis”) and in engineering (“failure-time analysis”) (1,2).

## 1. Cox model

A Cox model (also called a Cox proportional hazards regression model) is a statistical technique for exploring the relationship between the survival of a patient and several explanatory variables. Rather than modeling a probability directly, it models the hazard (or risk) of the event, e.g. death, at time $t$ for an individual given their prognostic variables $x_1, \dots, x_p$, as $h(t \mid x) = h_0(t)\exp(\beta_1 x_1 + \dots + \beta_p x_p)$, where the baseline hazard $h_0(t)$ is left unspecified. Each $\exp(\beta_k)$ is a hazard ratio that is assumed to be constant over time (the proportional hazards assumption). Once a model has been fitted to the observed data, it can be used to make predictions for new inputs. Cox models can handle censored data (1,2).
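As an illustration, the sketch below fits a Cox model with a single covariate. It is a minimal sketch under stated assumptions, not a reference implementation: the coefficient is estimated with Newton-Raphson by maximizing the partial log-likelihood $\ell(\beta) = \sum_{i:\,\delta_i = 1}\big[\beta x_i - \log \sum_{j \in R(t_i)} e^{\beta x_j}\big]$, where $\delta_i$ is the event indicator and $R(t_i)$ is the set of subjects still at risk at time $t_i$; tied event times are handled with the Breslow approximation. The baseline hazard cancels out of this likelihood, which is why it never has to be specified. The helper `fit_cox_1d` and the toy data are hypothetical; established implementations such as `CoxPHFitter` in the Python `lifelines` package or `coxph` in R's `survival` package handle several covariates, other tie corrections and standard errors.

```python
import math


def fit_cox_1d(times, events, x, n_iter=50, tol=1e-10):
    """Illustrative sketch (not a library API): one-covariate Cox model.

    times  : observed times (event or censoring time)
    events : 1 if the event was observed, 0 if right-censored
    x      : covariate value of each subject
    Maximizes the partial log-likelihood with Newton-Raphson; tied event
    times are handled with the Breslow approximation.
    """
    beta = 0.0
    for _ in range(n_iter):
        score = info = 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects only enter through the risk sets
            # Risk set: subjects still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0            # gradient of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # minus its second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data with a binary covariate (e.g. an exposure group).
times = [5, 8, 12, 3, 9, 15, 2, 7]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [0, 0, 0, 1, 1, 1, 1, 0]

beta = fit_cox_1d(times, events, x)
hazard_ratio = math.exp(beta)  # multiplicative change in hazard when x increases by 1
print(f"beta = {beta:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

`exp(beta)` is the estimated hazard ratio: here, the factor by which the hazard is multiplied for subjects with `x = 1` compared with subjects with `x = 0`.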

## 2. Log-rank test

The log-rank test (Mantel-Cox test) is a non-parametric hypothesis test for comparing the survival distributions of two (or more) samples, i.e. for testing the null hypothesis that there is no difference between the populations in the probability of an event (e.g. death) at any time point. It is often used in clinical trials to compare the survival experience of two groups of individuals, as it handles right-censored data and makes no assumption about the shape of the survival distributions (which are often right-skewed). It can be used, for example, to establish the efficacy of a new treatment compared with a control treatment when the outcome is the time to an event. The test is most powerful when hazards are proportional; an extension, the MaxCombo test, which combines several weighted log-rank statistics, tends to be more robust when hazards are non-proportional (1,2).
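The sketch below computes the classic two-sample log-rank statistic: at every distinct event time it compares the observed number of events in group A with the number expected if both groups shared the same hazard, sums these differences, and refers the standardized squared total to a chi-square distribution with one degree of freedom. `logrank_test` is an illustrative helper and the data are made up; libraries such as `lifelines` (`lifelines.statistics.logrank_test`) or R's `survdiff` offer tested implementations.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Illustrative two-sample log-rank test. Returns (chi-square statistic, p-value).

    events_* are 1 for an observed event and 0 for a right-censored time.
    """
    data = ([(t, e, "a") for t, e in zip(times_a, events_a)]
            + [(t, e, "b") for t, e in zip(times_b, events_b)])
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):  # distinct event times
        at_risk = [(u, e, g) for u, e, g in data if u >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == "a")
        d = sum(1 for u, e, _ in at_risk if u == t and e)
        d_a = sum(1 for u, e, g in at_risk if u == t and e and g == "a")
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A under H0
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    statistic = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail of chi-square, 1 df
    return statistic, p_value


# Hypothetical data: group A experiences the event earlier than group B,
# and two patients in group B are right-censored (event = 0).
stat, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [6, 7, 8, 9, 10], [1, 0, 1, 1, 0])
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A small p-value leads to rejecting the null hypothesis that both groups share the same survival distribution.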

## 3. Kaplan–Meier estimator

The Kaplan–Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients still alive a certain amount of time after treatment. A plot of the Kaplan–Meier estimator (KM curve) is a series of declining horizontal steps which, with a large enough sample size, approaches the true survival function of the population. An important advantage of the Kaplan–Meier curve is that the method accounts for some types of censored data, in particular right-censoring, which occurs when a patient withdraws from a study, is lost to follow-up, or is still alive without having experienced the event at the last follow-up. On the plot, small vertical tick marks indicate individual patients whose survival times have been right-censored (1,2).
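The estimator is $\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$, where $d_i$ is the number of events at time $t_i$ and $n_i$ the number of patients still at risk just before $t_i$. Censored patients count in $n_i$ until their censoring time and then simply leave the risk set, which is how right-censoring is taken into account. Below is a minimal pure-Python sketch with a hypothetical helper name and made-up data; for real work, `KaplanMeierFitter` in `lifelines` or `survfit` in R's `survival` package also provide confidence intervals and plots.

```python
def kaplan_meier(times, events):
    """Illustrative Kaplan-Meier estimate as a list of (event time, S(t)) steps.

    times  : observed follow-up times
    events : 1 if the event occurred at that time, 0 if right-censored
    """
    survival = 1.0
    steps = []
    for t in sorted({t for t, e in zip(times, events) if e}):  # distinct event times
        n_at_risk = sum(1 for u in times if u >= t)  # still followed just before t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / n_at_risk  # conditional probability of surviving past t
        steps.append((t, survival))
    return steps


# Hypothetical follow-up data; events == 0 marks right-censored patients.
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
steps = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in steps])
# → [(3, 0.833), (5, 0.667), (8, 0.444), (12, 0.0)]
```

The patient censored at time 5 still counts in the risk set at t = 5 but not afterwards, and the patient censored at time 10 would appear as a tick mark on the plotted curve.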

## 4. Random Survival Forest

A random survival forest is a non-parametric ensemble method for the analysis of right-censored survival data, built as a time-to-event extension of random forests (originally designed for classification and regression). The method can handle many covariates, including noise covariates, as well as complex nonlinear effects and interactions between covariates, without the need to specify them in advance (1,2).
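To make the ingredients concrete, the sketch below builds a toy random survival forest in pure Python: each tree is grown on a bootstrap sample, each node draws a random subset of covariates (`mtry`) and keeps the candidate split with the largest log-rank statistic between the two children, each leaf stores a Nelson-Aalen estimate of the cumulative hazard, and the forest prediction is the average of the trees' cumulative hazards. Several simplifications are assumptions of this sketch rather than features of any particular package: only `nsplit` random cut points are tried per covariate, a split is rejected when a child would have fewer than `min_events` events, and only the cumulative hazard at a single time is predicted. All function names and the simulated data (where `x[0]` raises the hazard and `x[1]` is pure noise) are hypothetical; for real analyses, see for example the R package `randomForestSRC` or `RandomSurvivalForest` in the Python package `scikit-survival`.

```python
import random


def logrank_stat(left, right):
    """Log-rank chi-square statistic between two lists of (time, event) pairs."""
    data = sorted([(t, e, 1) for t, e in left] + [(t, e, 0) for t, e in right])
    n, n_left = len(data), len(left)  # sizes of the current risk sets
    obs = expected = var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d_left = removed = removed_left = 0
        while i < len(data) and data[i][0] == t:  # all subjects with time t
            _, e, is_left = data[i]
            d, d_left = d + e, d_left + e * is_left
            removed, removed_left = removed + 1, removed_left + is_left
            i += 1
        if d and n > 1:
            obs += d_left
            expected += d * n_left / n
            var += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
        n, n_left = n - removed, n_left - removed_left  # they leave the risk set after t
    return (obs - expected) ** 2 / var if var > 0 else 0.0


def nelson_aalen(pairs):
    """Leaf estimate: (event time, cumulative hazard) steps from (time, event) pairs."""
    steps, h = [], 0.0
    for t in sorted({t for t, e in pairs if e}):
        n = sum(1 for u, _ in pairs if u >= t)
        d = sum(1 for u, e in pairs if u == t and e)
        h += d / n
        steps.append((t, h))
    return steps


def grow_tree(rows, mtry, nsplit, min_events, rng):
    """Hypothetical tree builder; rows are (covariates, time, event) triples."""
    best = None
    if sum(e for _, _, e in rows) >= 2 * min_events:
        for j in rng.sample(range(len(rows[0][0])), mtry):  # random covariate subset
            values = sorted({x[j] for x, _, _ in rows})[:-1]
            for cut in rng.sample(values, min(nsplit, len(values))):  # random cut points
                left = [(t, e) for x, t, e in rows if x[j] <= cut]
                right = [(t, e) for x, t, e in rows if x[j] > cut]
                if min(sum(e for _, e in left), sum(e for _, e in right)) < min_events:
                    continue
                stat = logrank_stat(left, right)
                if best is None or stat > best[0]:  # keep the largest log-rank statistic
                    best = (stat, j, cut)
    if best is None:
        return {"leaf": nelson_aalen([(t, e) for _, t, e in rows])}
    _, j, cut = best
    return {"feature": j, "cut": cut,
            "left": grow_tree([r for r in rows if r[0][j] <= cut], mtry, nsplit, min_events, rng),
            "right": grow_tree([r for r in rows if r[0][j] > cut], mtry, nsplit, min_events, rng)}


def tree_chf(tree, x, t):
    """Cumulative hazard at time t from the leaf that covariates x fall into."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["cut"] else tree["right"]
    return max([h for s, h in tree["leaf"] if s <= t], default=0.0)


def random_survival_forest(rows, n_trees=25, mtry=1, nsplit=5, min_events=5, seed=0):
    rng = random.Random(seed)
    # Each tree is grown on a bootstrap sample (drawn with replacement) of the rows.
    return [grow_tree([rng.choice(rows) for _ in rows], mtry, nsplit, min_events, rng)
            for _ in range(n_trees)]


def predict_chf(forest, x, t):
    """Ensemble cumulative hazard: average of the trees' leaf estimates."""
    return sum(tree_chf(tree, x, t) for tree in forest) / len(forest)


# Hypothetical simulated data: x[0] raises the hazard, x[1] is pure noise.
data_rng = random.Random(1)
rows = []
for _ in range(150):
    x = [data_rng.random(), data_rng.random()]
    event_time = data_rng.expovariate(0.2 + 2.0 * x[0])
    censor_time = data_rng.expovariate(0.3)
    rows.append((x, min(event_time, censor_time), int(event_time <= censor_time)))

forest = random_survival_forest(rows)
high_risk = predict_chf(forest, [0.9, 0.5], t=1.0)
low_risk = predict_chf(forest, [0.1, 0.5], t=1.0)
```

With the noise covariate held fixed, the forest should assign a larger cumulative hazard to the subject with the larger `x[0]`, even though the form of that relationship was never specified.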