# How to subsample a virtual population?

## Subsampling

### The principle

**Subsampling** is used to select a subset of patients that match a set of constraints.

Unlike a model calibration, the constraints here are defined at the population level. Unlike the Vpop Design, they can be applied to both inputs and outputs of the model, as long as they are scalars (typically, measures used in the Trial).

*For example*, you may run a simulation with a viral infection on a virtual population where some input parameters vary.

In this context, the available literature gives you information on the distribution of the level of seroprotection (HI titers) during infection.

A Subsampling design lets you select the subset of patients from the Vpop that best matches this distribution in the simulated trial.

Moreover, in the same context, you can also add a filter to completely remove patients with some characteristics.

*For example*, here, since you are interested in the symptomatic patients from the placebo arm, you can remove all patients with a max viral load lower than the level of detection.
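As an illustration only, the effect of such a filter can be sketched in a few lines of Python. The data layout, the descriptor name `maxViralLoad` and the detection value below are assumptions made for this example; they are not jinkō identifiers.

```python
# Hypothetical layout: one dict of scalar outputs per patient of the placebo arm.
patients = [
    {"id": "p1", "maxViralLoad": 3.2e5},
    {"id": "p2", "maxViralLoad": 4.0e1},
    {"id": "p3", "maxViralLoad": 8.9e3},
]

DETECTION_LEVEL = 1.0e2  # assumed level of detection, illustrative value only


def apply_filter(patients, descriptor, lower_bound):
    """Keep only the patients whose scalar descriptor is at or above lower_bound."""
    return [p for p in patients if p[descriptor] >= lower_bound]


filtered = apply_filter(patients, "maxViralLoad", DETECTION_LEVEL)
print([p["id"] for p in filtered])
```

Patients removed by a filter are excluded before any distribution matching takes place, so they can never appear in the subsampled Vpop.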

Run the sampling to see your subsampled Vpop’s distribution appear on the plots, and compare it with the targets.

The Subsampling design is applied to a **Trial simulation**, and contains the definitions of all the constraints.

Constraints at the population level can be applied to any couple (descriptor, arm) from the trial.

When adding these constraints, the Subsampling design graphically shows the histograms of the initial and filtered Vpops, overlaid with the theoretical target distribution.

At this stage, you can visually check that the histogram of the filtered Vpop covers the target distribution. This matters because the optimization algorithm only selects existing patients; it does not create any new ones.

If some areas of the target distribution are not covered by the filtered Vpop, a solution could be to re-run the same trial with a wider Vpop, or to loosen the filters.
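The visual check described above can also be approximated numerically. The sketch below is not part of jinkō: it assumes a normal target distribution and hypothetical descriptor values, and lists the bins where the target has noticeable probability mass but no filtered patient is available.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def uncovered_bins(values, cdf, lo, hi, n_bins=10, min_mass=0.01):
    """Return the bins holding at least min_mass of the target but no patient."""
    width = (hi - lo) / n_bins
    gaps = []
    for i in range(n_bins):
        left, right = lo + i * width, lo + (i + 1) * width
        target_mass = cdf(right) - cdf(left)
        count = sum(1 for v in values if left <= v < right)
        if target_mass >= min_mass and count == 0:
            gaps.append((left, right))
    return gaps


# Hypothetical descriptor values of the filtered Vpop, and an assumed target N(6, 1).
filtered_values = [4.2, 4.5, 5.0, 5.2, 5.8, 6.1, 6.4]
gaps = uncovered_bins(filtered_values, lambda x: normal_cdf(x, 6.0, 1.0), 3.0, 9.0, n_bins=6)
print(gaps)
```

Any bin reported here is a region that no selection of patients can fill, which is the situation where a wider Vpop or looser filters are needed.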

Note also that, depending on the number of targets you have, the initial Vpop should be significantly larger (10 or 100 times) than the Vpop you expect to subsample.

**Note** that categorical parameters are not yet supported in the subsampling (neither in the distributions nor in the filters).

#### Reusing your Subsampling design with a different trial

You may want to reuse the same targets in another trial, for example if you realize that your input Vpop is not large enough to give a satisfactory subsampled Vpop.

However, the trial must contain the same (descriptor, arm) couples as those used in the subsampling design. This can easily be ensured by using the same CM, the same measures and the same protocol.

In the top right corner, a clickable link allows you to change the Trial simulation attached to a Subsampling design, provided the design has not yet been used to subsample a Vpop.

If it has already been used, you can create a duplicate in the app panel (on the right).

### Simulated annealing algorithm

The default algorithm used in jinkō to subsample a Vpop is simulated annealing (Reeves, Colin R., ed. Modern heuristic techniques for combinatorial problems. John Wiley & Sons, Inc., 1993). Jinkō uses the GNU Scientific Library for this implementation.

Simulated annealing is a probabilistic algorithm used to approximate the global optimum of a given cost function. The cost function used here is defined as the weighted sum of two kinds of terms: the Kolmogorov–Smirnov distances between the target and subsampled distributions, and the differences between the target and actual correlations.
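To make the structure of such a cost function concrete, here is a minimal Python sketch. It is an assumption-based illustration, not jinkō's implementation: the function names, the use of the Pearson coefficient, the absolute difference for the correlation term and the way weights are passed are all choices made for this example.

```python
import math


def ks_distance(sample, target_cdf):
    """Largest gap between the empirical CDF of sample and target_cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = target_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def cost(ks_terms, corr_terms):
    """ks_terms: (weight, sample, target_cdf); corr_terms: (weight, xs, ys, target_corr)."""
    total = sum(w * ks_distance(s, cdf) for w, s, cdf in ks_terms)
    total += sum(w * abs(pearson(xs, ys) - rho) for w, xs, ys, rho in corr_terms)
    return total


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy target."""
    return min(max(x, 0.0), 1.0)


total = cost(
    ks_terms=[(1.0, [0.25, 0.5, 0.75], uniform_cdf)],
    corr_terms=[(1.0, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 0.5)],
)
print(total)
```

A cost of zero would mean that every subsampled distribution matches its target exactly and every correlation equals its target value.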

**The different options available are:**

**NumSamples** is the number of patients you want in the output population. It must be lower than the size of the filtered Vpop.

**Seed** is used to initialize the random number generation.

**Advanced options:**

numIterations is the total number of iterations of simulated annealing, default is 100. You can go up to 80k iterations if you have a lot of constraints to respect, but it may take a bit more time.

itersFixedTemperature is the number of iterations at each temperature, default is 10.

replacementRate is the proportion of samples to swap at each iteration, default is 0.01.

boltzmannConstant kT is used to define the probability p = exp((Ea - Eb) / kT) of taking a step from a state of energy Ea to one of energy Eb where Eb > Ea. It defaults to 1e-3.
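The way these options interact can be sketched with a small, self-contained annealing loop. This is a simplified illustration under stated assumptions, not the GNU Scientific Library routine that jinkō calls: the parameter names are Python renderings of the options above, while `initial_temperature`, `cooling_factor` and the geometric cooling schedule are hypothetical additions needed to make the example run. Following the usual convention, k is treated as the constant and T as the current temperature.

```python
import math
import random


def subsample(pool, cost, num_samples, seed=0, num_iterations=100,
              iters_fixed_temperature=10, replacement_rate=0.01,
              boltzmann_k=1e-3, initial_temperature=1.0, cooling_factor=0.9):
    """Select num_samples items of pool that minimise cost (illustrative sketch)."""
    if not 0 < num_samples < len(pool):
        raise ValueError("num_samples must be lower than the size of the pool")
    rng = random.Random(seed)
    indices = list(range(len(pool)))
    rng.shuffle(indices)
    selected, unselected = indices[:num_samples], indices[num_samples:]
    # Number of patients swapped per iteration, at least one.
    n_swap = min(max(1, round(replacement_rate * num_samples)), len(unselected))
    current_cost = cost([pool[i] for i in selected])
    best, best_cost = list(selected), current_cost
    temperature = initial_temperature
    for iteration in range(num_iterations):
        # Assumed schedule: cool down after each block of fixed-temperature iterations.
        if iteration > 0 and iteration % iters_fixed_temperature == 0:
            temperature *= cooling_factor
        # Propose a neighbour by swapping selected and unselected patients.
        cand_sel, cand_unsel = list(selected), list(unselected)
        out_pos = rng.sample(range(len(cand_sel)), n_swap)
        in_pos = rng.sample(range(len(cand_unsel)), n_swap)
        for o, i in zip(out_pos, in_pos):
            cand_sel[o], cand_unsel[i] = cand_unsel[i], cand_sel[o]
        cand_cost = cost([pool[i] for i in cand_sel])
        delta = cand_cost - current_cost  # Eb - Ea
        # Always accept improvements; accept a worse state with p = exp((Ea - Eb) / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / (boltzmann_k * temperature)):
            selected, unselected, current_cost = cand_sel, cand_unsel, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(selected), current_cost
    return [pool[i] for i in best], best_cost


# Toy usage: pick 20 values out of 200 whose mean is as close as possible to 50.
pool = [float(v) for v in range(200)]


def mean_gap(sample):
    return abs(sum(sample) / len(sample) - 50.0)


subset, final_cost = subsample(pool, mean_gap, num_samples=20, seed=42, num_iterations=1000)
print(len(subset), final_cost)
```

In this sketch, a small kT makes uphill moves very unlikely, so the search behaves almost greedily; larger values accept more cost-increasing swaps early on, which helps escape local optima at the price of slower convergence.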