Part 2. NOVA’s approach reconciles knowledge with data to predict clinical outcomes
TL;DR
The M&S approach in MIDD should be designed and operated with the patient's health, expressed as a clinical outcome, as its goal. It should mix the top-down problem-solving approach with the bottom-up principles of conventional systems biology and bioinformatics.
Data-driven approaches have too many limitations to predict the behaviour of living organisms well.
Knowledge is much less context- and time-dependent than raw data and thus more reliable as a source to generate in silico predictions. Multi-scale and mechanistic pathophysiological models based on knowledge extracted from the scientific literature should be at the center of any in silico framework.
In the Virtual Population (VP), data serves two purposes: to calibrate and validate the mechanistic model, and to account for between- and within-patient variability so that predictions reproduce the range of genotypic and phenotypic profiles.
The EM is used as a bridge between simulation outputs and predicted clinical outcomes. It defines the relationship between the probability of disease-related events without therapy and with therapy; the difference between the two, computed for each patient, is the individual clinical benefit. Combined with the EM, in silico approaches support decision-making, such as selecting the optimal target combination for a given condition and patient population.
The EM is estimated by combining a formal model of the disease and therapy of interest with a Virtual Population (VP) of patients representative of a specific geography or context. Summing the clinical benefit of the treatment over all patients yields the Number of Prevented Events (NPE).
What the NOVA approach is made for:
- making the complexity of biological, physiological, clinical, and epidemiological knowledge interpretable and operational
- reconciling and merging data and knowledge
- predicting, at any time point of the R&D process, the benefit on the clinical outcome that matters for patients, and enabling benchmarking on it
- reducing failure rate
- answering unanswerable questions during the R&D of new therapies
- shortening time to market
- helping to translate the knowledge and data accumulated during development into useful knowledge for real life
What is the role of M&S in treatment R&D?
The role of M&S in therapeutic R&D is to solve problems, the ultimate and most important of which is the alleviation of the patient's condition. Thus the M&S approach in model-informed drug development (MIDD) should be designed and operated with a continuous concern for the patient's health. A patient's problem is measured as a clinical outcome, ranging from premature death to impaired quality of life. Ideally, M&S should combine the top-down principle of problem solving with the bottom-up principles of conventional systems biology and bioinformatics [1]. If possible, the model should predict treatment efficacy on the clinical outcome, not on a surrogate marker or endpoint.
Limitations of data-centric approaches
Much hype has surrounded the emergence of big and smart data analytics applications in biomedicine over the past years [2]. Unfortunately, data-centric approaches are bound to fail because of a number of inescapable limitations, which are either practical or inherent to the nature of data. Omics on their own are weak predictors of clinical outcomes: living systems are not built from a limited number of standard parts with interactions encoded in the genome [3].

Furthermore, the majority of omics databases are obtained in an observational context. Genome-Wide Association Studies (GWAS), for example, which have identified multiple genetic variations that contribute to common, complex diseases, are case-control studies by design. Thus, the explanatory link between patients' treatment(s), outcomes and omics findings is derived from correlations. Correlation does not necessarily imply causation, which is why the randomized controlled trial, with an a priori protocol and randomization reducing bias and confounding, is still considered the gold standard to demonstrate a causal link between an intervention and the observed outcome. No statistical modelling or propensity score can render patients with different treatments fully comparable at baseline. The situation is even more confounded when a sequence of treatments, personalised to each study participant, has been applied.

With the exception of genomics data, within-patient variability (or variability between observational settings) is poorly understood. The dynamics of, for example, protein expression need further exploration. Cross-sectional data do not inform these dynamics, and sequential data inform them poorly because the pace of data collection cannot be harmonized with the timescale of the phenomenon of interest: a snapshot cannot capture behaviour. Another issue relates to the lack of standardization. Even the most common surrogate markers (e.g. progression-free survival (PFS) and pathology grading and staging) tend to lack sufficient robustness to predict overall survival accurately. Furthermore, even with large databases, meaningful rare events or rare variants may be missed because of insufficient statistical power. More importantly, the validity of results depends on the data and on the time and settings in which they were obtained, which limits generalizability to prospective patients as well as to patients treated in different settings. These limitations are difficult to overcome, even with sophisticated machine learning approaches: data-driven algorithms can only be tasked with making "inductive" predictions based on past data. The key resides in exploiting these increasingly large datasets in combination with the trove of knowledge available in the scientific literature.
Knowledge-driven modelling
Knowledge is the product of repeated, consistent experimental observations, e.g. that a high concentration of ligand saturates receptors at the cell surface. It is much less context- and time-dependent than raw data and thus more reliable as a source to generate in silico predictions. Knowledge therefore needs to be treated differently from the data, observational or experimental, it is derived from. The amount of knowledge gathered on cancer, immunology and other domains of biology and pathophysiology by researchers over the course of little more than half a century is tremendous. This body of knowledge, measured by the number of original articles, has doubled every decade since the 1950s [4]. As of February 2016, 25 million original articles related to biomedical sciences were indexed in PubMed. The combination of this exponential growth in the number of articles and the inherent complexity of biology explains the failure of our current approaches to translate this rich body of knowledge into therapeutic innovation and improved patient care. Multi-scale (from genes to cells, tissues and organs) and mechanistic (i.e. causal, e.g. "IkB kinase phosphorylates IkB, resulting in a dissociation of NF-kappaB from the complex with its inhibitor") pathophysiological models based on knowledge extracted from the scientific literature should be at the center of any in silico framework. These formal (i.e. mathematical and computational) models of the disease - e.g. tumors with their microenvironment (tumor microenvironment, TME), the immune system, processes such as metastasis, etc. - and of the pharmacokinetics and pharmacodynamics of the drug of interest (with time-dependent drug concentration at its target site) represent in a causal (or mechanistic) way all relationships between the various entities involved in the relevant pathophysiological mechanisms (see Box 5).
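As an illustration of how a single causal statement of this kind can be encoded formally, the sketch below turns the IkB-kinase example into a pair of ordinary differential equations. The rate constant, concentrations and time scale are illustrative assumptions, not values from any published model.

```python
# Minimal sketch: one causal statement ("IkB kinase phosphorylates IkB,
# releasing NF-kB from the IkB:NF-kB complex") written as a pair of ODEs.
# Rate constant and initial conditions are illustrative assumptions only.
import numpy as np
from scipy.integrate import solve_ivp

K_PHOS = 0.5   # 1/(uM*min), assumed phosphorylation rate constant
IKK = 1.0      # uM, assumed (constant) active IkB kinase concentration

def rhs(t, y):
    complex_, free_nfkb = y              # [IkB:NF-kB] complex, free NF-kB (uM)
    flux = K_PHOS * IKK * complex_       # phosphorylation-driven dissociation
    return [-flux, +flux]

sol = solve_ivp(rhs, t_span=(0.0, 30.0), y0=[1.0, 0.0],
                t_eval=np.linspace(0.0, 30.0, 7))
for t, free in zip(sol.t, sol.y[1]):
    print(f"t = {t:4.1f} min  free NF-kB = {free:.3f} uM")
```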
Even though knowledge derives from data, it differs from the data contained in a database in one major respect: its generality. A piece of established knowledge, i.e. a piece with the highest strength of evidence (SoE), also called a scientific fact (for example: the Earth is a flattened sphere), sits at the top of a pyramid of observations and experiments (and therefore of data) which ensures its generality, whereas a data point in a database comes from a single observation or experiment. Multiple shots, rather than a single shot, make the difference.
The middle-out approach: the Virtual Population
In this approach, data serves two important purposes. First, data is used both to calibrate and to validate the mechanistic model. Second, while the mechanistic model is deterministic, data serves to account for between- and within-patient variability, so that predictions reproduce the range of genotypic and phenotypic profiles. This variability is represented in the "Virtual Population" (VP), a cornerstone of this comprehensive approach based on knowledge-driven pathophysiological models. The variability of patient phenotypic and genotypic characteristics (including time-dependent random somatic mutation occurrence) is derived both from the parameters of the formal model of the disease of interest and from the available omics, biological, clinical and epidemiological datasets (Box 2). With this approach, data is not used as the basis to generate the predictive algorithms.
By using specific datasets to inform patient descriptor distributions, the VP can be made to represent specific real-world populations and contexts, e.g. the French melanoma population [5] [6] [7].
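As a minimal sketch of how such a Virtual Population might be instantiated, the example below samples a handful of patient descriptors from assumed distributions; the descriptor names, distribution families and parameter values are illustrative only and do not come from an actual disease model or dataset.

```python
# Minimal sketch: instantiating a Virtual Population by sampling descriptor
# distributions. Descriptor names, distribution families and parameters are
# illustrative assumptions, not taken from an actual disease model.
import numpy as np

rng = np.random.default_rng(seed=42)
N_PATIENTS = 1_000

virtual_population = {
    # Model parameters represented as descriptors (assumed distributions)
    "tumor_growth_rate":    rng.lognormal(mean=-2.0, sigma=0.4, size=N_PATIENTS),
    "drug_clearance":       rng.lognormal(mean=0.5, sigma=0.3, size=N_PATIENTS),
    # Descriptors that are model inputs without being model parameters
    "age":                  rng.normal(loc=62.0, scale=11.0, size=N_PATIENTS).clip(18, 95),
    "treatment_compliance": rng.beta(a=8.0, b=2.0, size=N_PATIENTS),
}

# Column i across the dictionary describes one virtual patient D_i.
print({k: round(float(v[0]), 3) for k, v in virtual_population.items()})
```

In practice the joint distribution of descriptors, including their correlations, would be derived from the datasets mentioned above rather than sampled independently as in this simplified sketch.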
Box 2: Reconciling knowledge & data with the Virtual Population
Legend: The Virtual Population is a cohort of virtual patients whose descriptors are derived mainly from model parameters; descriptor values are sourced from a mix of knowledge and data. Knowledge captured in the literature undergoes a systematic evaluation process before being represented in a formal disease model or a descriptor; for instance, the strength of evidence of each piece of knowledge should be evaluated. An open science platform such as Jinkö helps in formalising this process. Similarly, crude data should undergo a sequence of transformations before being represented as descriptor distributions.
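To make the bookkeeping described in the legend concrete, the sketch below records a single piece of curated knowledge together with its strength of evidence; the fields and the grading scale are assumptions for illustration and do not reproduce the actual Jinkö schema.

```python
# Minimal sketch of how a curated piece of knowledge might be recorded before
# it is encoded in a formal model. Fields and the grading scale are assumed
# for illustration; they do not reproduce the actual Jinkö schema.
from dataclasses import dataclass

@dataclass
class KnowledgeItem:
    statement: str              # causal assertion extracted from the literature
    references: list[str]       # supporting publications (PMIDs, DOIs, ...)
    strength_of_evidence: str   # e.g. "high" / "moderate" / "low" (assumed scale)
    model_element: str          # parameter or equation the assertion informs

item = KnowledgeItem(
    statement="IkB kinase phosphorylates IkB, releasing NF-kB from its inhibitor",
    references=["PMID:placeholder"],   # placeholder, not a real identifier
    strength_of_evidence="high",
    model_element="k_phos (IkB phosphorylation rate)",
)
print(item)
```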
Systems biology is a step towards the integration of knowledge with data. However, conventional systems biology approaches have failed thus far to generate predictions up to clinical outcomes, which should be the goal of in silico clinical trials [8] [9].
Predicting clinical outcomes in silico with the Effect Model
Running in silico clinical trials requires a valid methodology to bridge simulation outputs and predicted clinical outcomes [8] [10]. This methodology is derived from the Effect Model (EM) law (Box 3). It is the in silico equivalent of randomized placebo-controlled in vivo trials in terms of methodological standard. The EM defines the relationship between the probability of disease-related events (tumor progression, decreased quality of life, death, etc.) without therapy (Rc, the "control" risk, obtained by applying the disease model to a virtual patient, see Box 3) and with therapy (Rt, the risk modified by the treatment "t", the investigational compound or the inhibition/stimulation of the target, obtained by combining the disease model with the treatment or modified-target model and applying them to a virtual patient, see Box 3). The difference between these two probabilities (AB = Rc - Rt) can be calculated for each patient and provides the predicted individual therapeutic benefit, the Absolute Benefit (AB). For a cohort of patients or a population of interest, the benefit metric is obtained by summing the ABs of all patients. With the EM, the clinical efficacy of a given therapeutic modality in a given population becomes a quantitative and predictable metric.
There is one overarching consequence of the EM framework. The current R&D paradigm relies on serendipity and costly rounds of trial and error. It faces many challenges, including the multiplication of possible targets, the need for higher investments, globalization and changes in regulation [11], all of which call for changes in R&D strategies. Here, in silico approaches combined with the EM can offer decisive support for decision-making, such as selecting the optimal target combination for a given condition and patient population, driven by the prediction of downstream efficacy on clinical outcomes given by the EM-derived metrics. The best scenario is the one that maximizes the predicted clinical benefit over the VP of interest, as sketched below.
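The decision rule described above can be written down compactly: score each candidate scenario by its predicted NPE over the VP and retain the scenario that maximises it. The control risks and the per-scenario effect models below are hypothetical placeholders standing in for the outputs of the mechanistic disease and treatment models.

```python
# Minimal sketch of EM-based scenario selection: each candidate target
# combination is scored by its predicted Number of Prevented Events (NPE)
# over the Virtual Population, and the best-scoring scenario is retained.
# `rc` and the per-scenario effect models are placeholders standing in for
# simulations of the disease model without and with the modifier.
import numpy as np

rng = np.random.default_rng(7)
rc = rng.uniform(0.05, 0.60, size=1_000)   # assumed control risks over the VP

scenarios = {
    # Hypothetical effect models (Rt as a function of Rc) for three candidates
    "target_A":         lambda rc: 0.80 * rc,
    "target_B":         lambda rc: 0.70 * rc + 0.03,
    "targets_A_plus_B": lambda rc: 0.60 * rc + 0.02,
}

def npe(rc, rt):
    """Number of Prevented Events = sum of individual Absolute Benefits."""
    return float(np.sum(rc - rt))

scores = {name: npe(rc, effect_model(rc)) for name, effect_model in scenarios.items()}
best = max(scores, key=scores.get)
print(scores, "-> best scenario:", best)
```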
Box 3: The discovery of the Effect Model law
In 1987, L'Abbe, Detsky and O'Rourke recommended including a graphical representation of the individual trials when designing a meta-analysis: for each trial, the frequency (risk or rate) of the studied criterion in the control group (Rc) is plotted on the x-axis and the frequency in the treated group (Rt) on the y-axis [12].
In 1993, Boissel et al., while studying the effectiveness of antiarrhythmic drugs in the prevention of death after myocardial infarction using a meta-analysis approach, noted that regardless of the metric chosen to measure the average observed efficacy (odds ratio, relative risk or rate difference), the heterogeneity between trial results persisted, which is inconsistent with the standard statistical assumptions of meta-analyses. They showed that this can be explained by focusing on the relationship between Rt and Rc of these antiarrhythmic drugs, a relationship they called the "Effect Model" in an article published in 1993 [13] (Box 4). For these drugs, the relationship is peculiar, with an Rc threshold below which they induce more deaths than they prevent. This illustrates an intuition all doctors have, and which Pauker and Kassirer emphasized in 1980: a treatment can yield little benefit; worse, it can be more harmful than beneficial for moderately sick patients [14]. Doctors adjust their decisions to a threshold they derive from what they know about the treatment at stake.
The approach followed by Boissel et al. is based on a model that combines a beneficial effect proportional to Rc and a constant adverse effect independent of Rc. The mathematical expression of this model is a linear equation with two parameters: the slope of the line, which represents the true beneficial risk reduction, and the intercept, which represents the risk of lethal adverse events caused by the treatment:
Rt = a∗Rc+b
where (a) carries the beneficial effect and (b) carries the constant lethal adverse effect.
From this equation, the treatment's net mortality reduction follows as Rc - Rt = (1 - a)*Rc - b. By fitting the equation to the available data through a statistical regression technique, the authors estimated the parameter values and inferred the value of the threshold, Rc = b / (1 - a), above which the expected benefit outweighs the harm. In theory, only patients whose risk without treatment is above this threshold should be treated.
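The fitting and thresholding steps can be reproduced with a simple linear regression: estimate a and b from the (Rc, Rt) pairs of the trials, then solve Rc - Rt = (1 - a)*Rc - b = 0 for the threshold Rc = b / (1 - a). The data below are simulated for illustration; they are not the antiarrhythmic meta-analysis data.

```python
# Minimal sketch: fit Rt = a*Rc + b to (Rc, Rt) pairs and derive the baseline
# risk threshold below which the treatment does more harm than good.
# The trial-level data are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
rc = rng.uniform(0.02, 0.40, size=25)                  # control-group risks
rt = 0.75 * rc + 0.02 + rng.normal(0, 0.01, size=25)   # simulated with a=0.75, b=0.02

a, b = np.polyfit(rc, rt, deg=1)   # least-squares estimates of slope and intercept
threshold = b / (1.0 - a)          # AB = (1-a)*Rc - b = 0  =>  Rc* = b / (1-a)

print(f"a = {a:.3f}, b = {b:.3f}; treat only if Rc > {threshold:.3f}")
```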
In 1998, in an editorial accompanying the publication of a study about the risk of bleeding with aspirin therapy, Boissel applied what would become the Effect Model (EM) law to this medication used in cardiovascular prevention. He showed that for subjects with a low risk of cardiovascular events, aspirin is probably harmful [15]. At the end of the 1990s, a set of studies dealing with the generalization of this relationship was published [16] [17] [18] [19]. In two of these articles, simulations showed that the Effect Model is not linear in the general case. These results led to the notion of an EM law. Although the EM law was first observed with empirical data, it later appeared that it is best estimated through mathematical modelling of diseases and treatments. Furthermore, it solves the issue of predicting the treatment-related benefit on clinical outcomes in in silico clinical trials.
Operating principles of the new paradigm
The Effect Model (EM) of a treatment, a clinical candidate or even a target - each in isolation or in combination - is estimated by combining a formal model of the disease and therapy of interest with a Virtual Population (VP) of patients representative of a specific geography or context (see Box 4). Thanks to the EM law, the clinical benefit of the treatment for each patient, measured by the patient-specific AB, is summed over the VP to generate the population-level efficacy metric, the Number of Prevented Events (NPE), as shown in Box 4. The clinical event to be prevented can be defined as death, tumour progression, side effects, etc. For a given treatment, a given disease and a duration of follow-up, the Effect Model is a relationship between the course of the disease in subjects that are untreated and the same subjects but treated with the intervention of interest.
Box 4: Applying the new paradigm with the Effect Model law
Legend:
A: Model of potential target alteration(s); it integrates a target component of C and a target alteration profile.
B: Model of disease modifier(s); it can be a marketed drug, a compound under development, a combination of one or more of these, each altering a target, or any target alteration that modifies the target's function(s).
C: Formal disease model, representing what is known about the disease and the affected physiological and biological systems.
E: Virtual Population, a collection of virtual patients (D), each characterized by the values of a series of descriptors. Each model parameter is represented by one descriptor. Other descriptors (e.g. age or compliance to treatment) may not correspond to a model parameter, although their values are inputs to the model. The descriptor joint distribution is derived from data (see Box 2).
These four components and the Effect Model law (see section II.3) make it possible to compute the following:
1. When the disease model C is applied to a virtual patient D, simulation of the disease course yields a value for Rc, the outcome probability in the control setting (i.e. without treatment).
2. and 3. When A or B is applied to C, and A+C or B+C is applied to D, this yields Rt, the probability of the outcome altered by the disease modifier (target alteration or drug, respectively). The difference Rc – Rt is the predicted Absolute Benefit (AB), i.e. the reduction in clinical event risk the (virtual) patient is likely to obtain from the disease modifier. AB is an implicit function of two series of variables: the patient descriptors relevant to A and/or B, and those relevant to C. AB is the output of a perfect randomized trial, since each patient is his/her own control. AB is the Effect Model of the potential target alteration (A) or of the treatment of interest (B).
4. When C is applied to each virtual patient in E, this yields Rc for every patient D; likewise, applying A+C or B+C to every D in E yields the corresponding values of Rt. Adding all the corresponding ABs gives the number of prevented outcomes (the Number of Prevented Events, NPE).
These processes can be represented graphically in the (Rc, Rt) plane, as shown in the accompanying visual: (i) the AB of an individual virtual patient D, with the dotted line representing the no-effect line where Rt = Rc; (ii) the NPE as the sum of all individual ABs.
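Putting the four components together, the sketch below walks through steps 1 to 4 of the box for a toy setting: a stand-in disease model C produces Rc for each virtual patient D of the Virtual Population E, a stand-in treatment model B combined with C produces Rt, and summing the per-patient ABs yields the NPE. All functions, descriptors and parameter values are illustrative assumptions, not an actual mechanistic model.

```python
# Minimal end-to-end sketch of Box 4: disease model C applied to each virtual
# patient D in the Virtual Population E gives Rc; treatment model B combined
# with C gives Rt; AB = Rc - Rt per patient; NPE = sum of ABs over E.
# Both "models" below are toy stand-ins, not an actual mechanistic model.
import numpy as np

rng = np.random.default_rng(1)

# E: Virtual Population of N patients, each described by two assumed descriptors
N = 500
population = {
    "disease_severity": rng.beta(a=2.0, b=5.0, size=N),   # 0 (mild) .. 1 (severe)
    "drug_sensitivity": rng.uniform(0.3, 1.0, size=N),
}

def disease_model_c(severity):
    """Toy stand-in for C: event probability without treatment (Rc)."""
    return 0.05 + 0.6 * severity

def treatment_model_b(rc, sensitivity):
    """Toy stand-in for B applied to C: event probability under treatment (Rt)."""
    return rc * (1.0 - 0.5 * sensitivity)

rc = disease_model_c(population["disease_severity"])          # step 1
rt = treatment_model_b(rc, population["drug_sensitivity"])    # steps 2 and 3
ab = rc - rt                                                  # Absolute Benefit
npe = ab.sum()                                                # step 4: NPE

print(f"mean Rc = {rc.mean():.3f}, mean Rt = {rt.mean():.3f}, NPE = {npe:.1f}")
```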
References
1. Noble, D. The future: putting Humpty-Dumpty together again. Biochemical Society Transactions 31, part 1 (2003)
2. Eisenstein, M. The power of petabytes. Nature 527, 3–4 (2015)
3. Zuk, O., Hechter, E., Sunyaev, S. R. & Lander, E. S. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. U. S. A. 109, 1193–1198 (2012)
4. Wyatt, J. Information for clinicians: use and sources of medical knowledge. Lancet 338, 1368–1373 (1991)
5. Chabaud, S., Girard, P., Nony, P. & Boissel, J. P. Clinical trial simulation using therapeutic effect modeling: application to ivabradine efficacy in patients with angina pectoris. J. Pharmacokinet. Pharmacodyn. 29, 339–363 (2002)
6. Allen, R. J., Rieger, T. R. & Musante, C. J. Efficient generation and selection of virtual populations in quantitative systems pharmacology models. CPT Pharmacometrics Syst. Pharmacol. 5, 140–146 (2016)
7. Rieger, T. R., Allen, R. J., Bystricky, L., Chen, Y., Colopy, G. W., Cui, Y., Gonzalez, A., Liu, Y., White, R. D., Everett, R. A., Banks, H. T. & Musante, C. J. Improving the generation and selection of virtual populations in quantitative systems pharmacology models. Prog. Biophys. Mol. Biol. 139, 15–22 (2018)
8. Boissel, J.-P., Auffray, C., Noble, D., Hood, L. & Boissel, F.-H. Bridging systems medicine and patient needs. CPT Pharmacometrics Syst. Pharmacol. 4, 135–145 (2015)
9. Hood, L. & Perlmutter, R. M. The impact of systems approaches on biological problems in drug discovery. Nat. Biotechnol. 22, 1215–1217 (2004)
10. Boissel, J. P., Ribba, B., Grenier, E., Chapuisat, G. & Dronne, M. A. Modelling methodology in physiopathology. Prog. Biophys. Mol. Biol. 97, 28–39 (2008)
11. The productivity crisis in pharmaceutical R&D. Nat. Rev. Drug Discov. (2011). https://www.nature.com/articles/nrd3405
12. L'Abbe, K. A., Detsky, A. S. & O'Rourke, K. Meta-analysis in clinical research. Ann. Intern. Med. 107, 224–233 (1987)
13. Boissel, J. P., Collet, J. P., Lievre, M. & Girard, P. An effect model for the assessment of drug benefit: example of antiarrhythmic drugs in postmyocardial infarction patients. J. Cardiovasc. Pharmacol. 22, 356–363 (1993)
14. Pauker, S. G. & Kassirer, J. P. The threshold approach to clinical decision making. N. Engl. J. Med. 302, 1109–1117 (1980)
15. Boissel, J. P. Individualizing aspirin therapy for prevention of cardiovascular events. JAMA 280, 1949–1950 (1998)
16. Glasziou, P. P. & Irwig, L. M. An evidence based approach to individualising treatment. BMJ 311, 1356–1359 (1995)
17. Boissel, J. P. et al. New insights on the relation between untreated and treated outcomes for a given therapy: effect model is not necessarily linear. J. Clin. Epidemiol. 61, 301–307 (2008)
18. Wang, H., Boissel, J.-P. & Nony, P. Revisiting the relationship between baseline risk and risk under treatment. Emerg. Themes Epidemiol. 6, 1 (2009)
19. Boissel, J.-P., Kahoul, R., Marin, D. & Boissel, F.-H. Effect model law: an approach for the implementation of personalized medicine. J. Pers. Med. 3, 177–190 (2013)