An Introduction to Bayesian Artificial Intelligence in Medicine

Giovanni Briganti1 and Marco Scutari2

1Unit of Epidemiology, Biostatistics and Clinical Research, Université Libre de Bruxelles
2IDSIA Dalle Molle Institute for Artificial Intelligence

August 7, 2020

Abstract

Artificial Intelligence has become a topic of interest in the medical sciences in the last two decades, but most studies of its applications in clinical settings have been criticised for having unreliable designs and poor replicability: there is an increasing need for literacy among medical professionals in this increasingly popular domain of research. This work provides a short introduction to the specific topic of Bayesian Artificial Intelligence: we introduce Bayesian reasoning, networks and causal discovery as well as their (potential) applications in clinical practice.

Keywords: statistics, directed acyclic graphs

1 Introduction

The origins of Artificial Intelligence (AI) can be traced back to the work of Alan Turing in 19501. He proposed a test (known today as the Turing Test) in which a certain level of intelligence is required from a machine to fool a human into thinking they are carrying on a conversation with another human. Although such a level of intelligence has not yet been attained, a simple conversation with a virtual assistant like Siri serves as a clear example of how rapidly AI research has evolved in the past decades. AI has become a topic of interest in the medical sciences in the last two decades, with more and more applications being approved by health authorities worldwide. However, most physicians and medical scientists have no formal training in the disciplines from which the world of smart medical monitoring, diagnosis and follow-up has sprung. This, along with historical epistemological differences between the practice of medicine and the engineering disciplines, is one of the main obstacles to widespread collaboration between physicians and engineers in the development of AI software. It may also be argued that this is, in turn, one of the root causes of the ongoing replicability crisis in medical AI research2, characterised by unreliable study designs and poor replication3. Although there is an increasingly rich literature on how AI can be used in clinical practice, few works aim to interest physicians in the fundamental concepts and terminology of AI. The gradual shift towards quantitative methods in the last century has made them

familiar with such concepts from probability and statistics as correlation, regression and confidence intervals; it is worthwhile to expand on these concepts and link them with modern AI. The most common AI approaches in the medical literature are neural networks (in the early 2000s) and clustering (in the last decade)4. Bayesian reasoning and methods for AI are less well known, although they can be used for medical decision making, to study human physiology5, to identify interactions between symptoms6, and to investigate symptom recovery7. Bayesian models can facilitate reasoning in probabilistic terms when dealing with uncertainty, which is omnipresent in the medical sciences since we cannot construct a complete mechanistic model of diseases and of the physiological mechanisms they impact. One important characteristic of Bayesian models, and in particular of Bayesian networks, is that the variables that are used as inputs for the model and their relationships are direct representations of real-world entities and of their interplay, not a purely mathematical construct as in neural networks8. This is why such an approach is interesting to the medical field: Bayesian networks are the method of choice for explainable reasoning in AI. This work aims to introduce physicians to Bayesian AI through a clinical perspective on probabilistic reasoning and Bayesian networks. More detailed accounts of the topic may be found in popular textbooks9;10.

2 Probabilistic reasoning

Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that may be related to that event; it is expressed with the well-known mathematical notation

Pr(A | B) = Pr(B | A) Pr(A) / Pr(B),

the probability Pr(A | B) of an event A given prior knowledge of a condition B (that is, given that B has occurred). Pr(A | B) is a conditional probability, while Pr(A) and Pr(B) are known as marginal probabilities, that is, the probabilities of observing the events A and B individually. In the context of medical diagnosis, the goal is to determine the probability Pr(Di | Cp) of the presence of a particular disease or disorder Di given the clinical presentation Cp of the patient11, the prior probability P0i of the disease in the patient’s reference group, and the prior probabilities P0j of other diseases Dj; that is

Pr(Di | Cp) = P0i Pr(Cp | Di) / [ P0i Pr(Cp | Di) + Σj P0j Pr(Cp | Dj) ]

where Pr(Cp | Dj) is the probability of observing the same clinical presentation given other diseases. Bayes’ theorem makes it possible to work with the distributions of dependent (or conditionally dependent) variables. However, in order to reduce the number of variables we need to observe simultaneously in probabilistic systems, it is also important to determine whether two variables A and B are independent (A ⊥⊥ B),

Pr(A | B) = Pr(A),

or conditionally independent (A ⊥⊥ B | C) given the value of a third variable C,

Pr(A ∩ B | C) = Pr(A | C) Pr(B | C).
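The diagnostic form of Bayes’ theorem above can be evaluated numerically. In the sketch below, the disease names, prior probabilities and likelihoods are invented for illustration and carry no clinical meaning; the computation itself follows the formula exactly.

```python
# Hypothetical screening example (all numbers are illustrative):
# prior probability of each candidate diagnosis in the reference group,
# and the probability of the observed clinical presentation Cp under each.
prior = {"flu": 0.10, "covid": 0.05, "cold": 0.85}       # P0i per disease
likelihood = {"flu": 0.80, "covid": 0.70, "cold": 0.20}  # Pr(Cp | Di)

# Denominator: total probability of observing the clinical presentation Cp.
evidence = sum(prior[d] * likelihood[d] for d in prior)

# Posterior Pr(Di | Cp) for each candidate diagnosis.
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

for disease, p in posterior.items():
    print(f"Pr({disease} | Cp) = {p:.3f}")
```

Note how the most likely diagnosis after seeing the evidence need not be the one with the highest likelihood: the prior prevalence in the reference group pulls the posterior towards common conditions.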

Extracting conditional dependence tables is one of the building blocks upon which we can build Bayesian probabilistic reasoning. It allows us to take a set of variables (say, A, B and C again) and to compute the conditional probabilities of some of them (say, A | C) by summing over the values of the others,

Pr(A = y | C = x) = [ ΣB Pr(C = x, B, A = y) ] / [ ΣB,A Pr(C = x, B, A) ],

where the sums run over the possible values of B (and of A in the denominator). Conversely, we can also take variables or sets of variables that are (conditionally) independent from each other and combine them to obtain their joint probability. This joint probability will be structured as a larger conditional dependence table that is the product of the smaller tables associated with the original variables. For instance,

Pr(A = y, B = z | C = x) = Pr(A = y | B = z) Pr(B = z | C = x), assuming A ⊥⊥ C | B. The ability to explicitly merge and split sets of variables, separating the variables of interest we need for diagnostic purposes from redundant ones, is one of the reasons that make Bayesian reasoning easy and, at the same time, mathematically rigorous.
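The marginalisation formula above can be made concrete in a few lines of code. The sketch below builds a toy joint distribution over three binary variables (the factorisation and all probability values are assumptions for the example) and computes Pr(A | C) by summing out B.

```python
from itertools import product

# A toy joint distribution Pr(A, B, C) over binary variables, stored as a
# dictionary keyed by (a, b, c). Built from an assumed factorisation
# Pr(C) Pr(B | C) Pr(A | B) with made-up numbers.
joint = {}
for a, b, c in product([0, 1], repeat=3):
    pC = 0.3 if c == 1 else 0.7
    pB = (0.6 if b == 1 else 0.4) if c == 1 else (0.2 if b == 1 else 0.8)
    pA = (0.9 if a == 1 else 0.1) if b == 1 else (0.5 if a == 1 else 0.5)
    joint[(a, b, c)] = pC * pB * pA

def pr_A_given_C(a, c):
    """Pr(A = a | C = c), summing out B as in the formula above."""
    num = sum(joint[(a, b, c)] for b in [0, 1])
    den = sum(joint[(aa, b, c)] for aa in [0, 1] for b in [0, 1])
    return num / den

print(pr_A_given_C(1, 1))
```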

3 Bayesian thinking in machine learning

Machine learning (ML) is the sub-field of AI that studies the algorithms and the statistical tools that allow computer systems to perform specific and well-defined tasks without explicit instructions12. Implementing machine learning requires four components. First, we need a working model of the world that describes the tasks and their context in a way that is understandable by a computer. In practice, this means choosing a class of Bayesian models defined over the variables of interest, and over relevant exogenous variables, and implementing it in software. Generative models such as Bayesian networks describe how variables interact with each other, and therefore represent the joint probability

Pr(X1, . . . , XN); while discriminative models such as random forests and neural networks only focus on how a group of variables predicts a target variable by estimating

Pr(X1 | X2, . . . , XN).

Clearly, depending on the application, a generative model may be a more appropriate choice than a discriminative model or vice versa. If we are characterising the phenomenon we are modelling from a systems perspective, generative models should be preferred. If we are only interested in predicting some clinical event, as in the case of diagnostic devices, discriminative models provide better predictive accuracy at the cost of being less expressive. Second, we need a way to measure the performance of the model, usually through its ability to predict new events. Third, we must encode the knowledge of the world from training data, experts or both into the model: this step is called learning. The computer system can then

Figure 1: An undirected network of five nodes connected through edges.

learn what the best model within the prescribed class is by finding the one that maximizes the chosen performance metric, drawing either from observational or experimental data or from expert knowledge available from practitioners. Fourth, the computer uses the model as a proxy of reality and performs inference as new inputs come in, all while deciding if and how to perform the assigned task. Successfully implementing machine learning applications is, however, far from easy: it requires large amounts of data, and it is difficult to decide how to structure the model from a probabilistic and mathematical point of view. Principled software engineering also plays an important role13. For these reasons, machine learning models should comprise a limited number of variables so that they are easy to construct and interpret. As a side effect, compact models are unlikely to require copious amounts of computational power to learn, unlike state-of-the-art neural networks13. Clinical settings limit our knowledge of the patients to what we can learn from them: hence probability should be used to determine whether two variables are associated given other variables. Formally, if the occurrence of an event in one variable affects the probability of an event occurring in another variable, we say the two are associated with, or probabilistically dependent on, each other. Association is symmetric in probability; it does not in itself distinguish between cause and effect. Causal models, however, go beyond probability theory to represent causal effects as arcs in a graph and allow for principled causal reasoning.

4 Bayesian Networks

We introduce in this section the fundamental notions of Bayesian networks. Specialised textbooks provide a more thorough review of the subject9, and software to implement them is available from the bnlearn package14 for the R statistical environment.

4.1 The use of graphs to represent interactions among entities

Graph theory is the field of mathematics that deals with the study of graphs: such structures are meant to represent relationships between entities, like symptoms, signs or biological markers in the medical sciences.

Figure 2: A Bayesian network, or Directed Acyclic Graph composed of five nodes. Edges are directed from one node to another node.

A graph G is understood as a set V of nodes (also known as vertices) representing variables (or other features of the data) that are connected through a set A of edges (also known as arcs). Let us consider a network of five nodes, such as that shown in Figure 1. In this case, the set of nodes comprises V = {v1, v2, v3, v4, v5}, and the set of edges is A = {a12, a14, a23, a24, a25, a34},

where a12 represents the edge between node v1 and node v2. The network represented in Figure 1 is called an undirected network, because its edges do not point in a particular direction, and therefore (vi, vj) = (vj, vi). Undirected networks are commonly used to represent the pairwise interactions among psychopathological symptoms6. The edges can be unweighted, so that an edge can either be present (aij = 1) or absent (aij = 0) between two nodes; or they can be weighted, so that some edges are stronger than others in a network, and can have either a positive or negative sign. For instance, an edge weight can represent a partial correlation estimate15 to convey the existence of a conditional association relationship between two variables.

4.2 Directed Acyclic Graphs

Bayesian networks, on the other hand, are based on directed acyclic graphs (DAGs). A DAG contains only directed edges, hence (vi, vj) ≠ (vj, vi) because the former is vi → vj and the latter is vj → vi. An example is shown in Figure 2. Such arcs are often interpreted as causal relationships in which the tail node is the cause and the head of the arrow is the effect. Bayesian networks cannot contain loops (the effect of a node on itself) or cycles (for instance, A goes to B, B goes to C, and C goes to A). The primary goal of a Bayesian network is to express the set of conditional independence relationships among variables (that is, which variables do not predict each other). In addition to a DAG, Bayesian networks are defined by the global probability distribution of X (with Xi being the variable that corresponds to the node vi in the network) with

parameters Θ,

Pr(X, Θ) = ∏i=1,…,N Pr(Xi | ΠXi ; ΘXi),

where ΠXi represents the parent nodes of Xi. This factorisation derives from the Markov property of Bayesian networks10, that is, every variable Xi directly depends only on its parents ΠXi. The three most common probability distributions for Bayesian networks are discrete, Gaussian, and conditional linear Gaussian9. Discrete Bayesian networks have, for instance, been used in expert systems to differentiate between tuberculosis and lung cancer16. Gaussian Bayesian networks are common in genetics and systems biology for reconstructing direct and indirect gene effects17, and together with conditional Gaussian Bayesian networks they have been used to study various clinical treatments and conditions7;18.
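The factorisation above can be sketched in code. The example below uses a hypothetical three-node chain A → B → C with binary variables and made-up conditional probability tables: the joint probability of any full assignment is simply the product of the local tables, and summing it over all configurations confirms that it is a proper distribution.

```python
# Local conditional probability tables (CPTs) for a chain A -> B -> C,
# indexed by the parents' values; all numbers are illustrative.
cpts = {
    "A": {(): {0: 0.4, 1: 0.6}},                            # Pr(A)
    "B": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},  # Pr(B | A)
    "C": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.3, 1: 0.7}},  # Pr(C | B)
}
parents = {"A": (), "B": ("A",), "C": ("B",)}

def joint(assignment):
    """Pr(A, B, C) as the product of the local conditional tables."""
    prob = 1.0
    for node in ("A", "B", "C"):
        pa = tuple(assignment[q] for q in parents[node])
        prob *= cpts[node][pa][assignment[node]]
    return prob

# The joint must sum to 1 over all 8 configurations.
total = sum(joint({"A": a, "B": b, "C": c})
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```

The point of the factorisation is that we never need to store the full joint table: each variable only needs a table as large as its set of parent configurations.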

5 Difference between neural networks and Bayesian networks

Neural networks, often called Artificial Neural Networks (ANNs), are in widespread use in the AI field. They are networks of interconnected artificial neurons that change their state with the external or internal information that flows through the network during a learning phase19. Neural networks are usually organised in several layers: an input layer, several intermediate layers of latent variables, and an output layer. Their aim is to identify a relationship between the input and the output. Hence they are discriminative models, and provide neither insight into the interplay of the variables nor a semantic representation of causes and effects. They are known to be difficult to interpret20, to the point that post hoc methods to improve their interpretability and explainability are now a challenging new avenue for research21. The key advantage of Bayesian networks is that they are a model of the real world: the phenomenon under investigation is understood by the machine, dissected, and clearly represented as a set of causal relationships. This allows for the kind of predictive reasoning much needed in medicine: Bayesian networks can answer diagnostic and prognostic questions of the form “how will symptom/sign/disease A change if we act upon symptom/sign/disease B?” in a way that is understandable by both patients and clinicians. This is possible because of the reversibility of Bayes’ theorem (Pr(A | B) Pr(B) = Pr(A, B) = Pr(B | A) Pr(A)).

6 Structure learning of Bayesian networks

In this section we will introduce the concepts of graphical separation and probabilistic independence.

6.1 The Markov property

In Bayesian networks, if two nodes are unconnected (that is, they do not share an edge), they are also conditionally independent: this is called the Markov property10. Graphical separation implies probabilistic independence,

A ⊥⊥G B | C =⇒ A ⊥⊥P B | C.

Figure 3: A Bayesian network of six nodes to illustrate graphical separation (meaning two nodes are not connected in the network). For instance, v1 is separated from v4 and v5 through v3; v2 is separated from v4 and v5 through v3; and v3 is separated from v6 through v5.

making the network itself a clear representation of the conditional independence relationships between nodes. For this reason, the DAG is called an independence map of the variables. The Markov property makes it possible to write

Pr(X, Θ) = ∏i=1,…,N Pr(Xi | ΠXi ; ΘXi),

decomposing the larger model Pr(X, Θ) into a set of smaller models Pr(Xi | ΠXi ; ΘXi) that are easier to understand. This decomposition is only possible because of the absence of loops and cycles in the graph. Figure 3 represents a Bayesian network with six nodes. Two nodes, say v1 and v4, are graphically separated by node v3, and are therefore conditionally independent given node v3:

v1 ⊥⊥G v4 | v3 =⇒ Pr(v1, v4 | v3) = Pr(v1 | v3) Pr(v4 | v3).

Figure 3 also shows a specific kind of relationship in a Bayesian network, that is, the one among nodes v1, v2 and v3. Both v1 and v2 have an edge pointing to v3, while sharing no connection themselves. This kind of motif is commonly known as a v-structure, or a collider, and it is often considered one of the building blocks of Bayesian networks. In a collider, conditioning on the common effect (that is, studying the associations while controlling for the effect) induces a spurious, often negative, correlation between the two otherwise independent causes, which is counter-intuitive and leads to different estimates compared to studying the two causes on their own. This phenomenon is known as collider bias or Berkson’s bias, and it is an important source of bias in the medical sciences22.
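Collider bias can be demonstrated by exact enumeration in a toy model (the numbers are assumptions for illustration): two independent fair binary causes A and B, and a common effect C that occurs whenever either cause does. Marginally, A and B carry no information about each other; once we condition on C = 1, however, learning that B did not occur makes A certain.

```python
from itertools import product

# Exact joint distribution of the v-structure A -> C <- B, with
# C = 1 whenever A = 1 or B = 1, and A, B fair independent coins.
pr = {}
for a, b in product([0, 1], repeat=2):
    c = 1 if (a or b) else 0
    pr[(a, b, c)] = pr.get((a, b, c), 0.0) + 0.5 * 0.5

def cond(a, b_val):
    """Pr(A = a | B = b_val, C = 1): condition on the common effect."""
    num = sum(p for (aa, bb, cc), p in pr.items()
              if aa == a and bb == b_val and cc == 1)
    den = sum(p for (aa, bb, cc), p in pr.items()
              if bb == b_val and cc == 1)
    return num / den

print(cond(1, 1))  # Pr(A=1 | B=1, C=1)
print(cond(1, 0))  # Pr(A=1 | B=0, C=1)
```

Within the stratum C = 1, the probability of A = 1 drops from 1 to 0.5 as B switches from 0 to 1: conditioning on the collider has made two independent causes negatively associated.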

6.2 Markov blankets

A DAG is understood as an independence map of the probability distribution of the variables Xi; retrieving such a map means testing which nodes are conditionally (in)dependent. d-separation is a useful instrument to algorithmically determine whether two nodes in a network

are (in)dependent or conditionally (in)dependent23: two nodes A and B are d-separated by a conditioning set of nodes S if conditioning on all members of S blocks all paths (sequences of nodes and edges with A as starting node and B as ending node) between A and B. A collider blocks every path it lies on, unless it is part of the conditioning set. The smallest set S that d-separates a node A from all the remaining nodes is known as the Markov blanket of A in the graph G. By definition, the Markov blanket includes a node’s parents (nodes with an edge directed towards A), children (nodes that receive a directed edge from A), and spouses, that is, the children’s other parents. The Markov blanket is useful in investigating a target node of interest while forgetting about the rest of the Bayesian network: all nodes outside of the Markov blanket are independent of the node of interest given the blanket.
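The composition of a Markov blanket (parents, children and spouses) can be computed mechanically from the edge list of a DAG. The sketch below uses a small hypothetical graph; the node names are invented for the example.

```python
# A hypothetical DAG given as a list of directed edges (parent, child).
edges = [("v1", "v3"), ("v2", "v3"), ("v3", "v5"), ("v4", "v5"), ("v5", "v6")]

def markov_blanket(node, edges):
    """Parents, children and spouses (children's other parents) of a node."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    spouses = {p for p, c in edges if c in children and p != node}
    return parents | children | spouses

print(sorted(markov_blanket("v3", edges)))
```

Note that v6 is not in the blanket of v3: given v5, it carries no additional information about v3.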

6.3 Bayesian networks and Causality

Since Bayesian networks are based on DAGs, the relationships among variables are easily interpreted as causal relationships. However, three assumptions should hold before interpreting an edge as a causal effect. First, each variable (node) must be conditionally independent of its direct and indirect non-effects given its direct causes (this is the causal version of the Markov property). Second, there must exist a DAG faithful to the probability distribution of X, so that the only dependencies in the probability distribution are those that arise from d-separations in the DAG. The third assumption descends from the first two: there must be no latent variables that act as confounding factors (that is, variables exerting causal effects on one or several nodes in the network without the DAG reporting such relationships). The third assumption is particularly important in clinical settings: to safely interpret a directed connection as a causal effect, the experimental design should be set up so as to block any confounding factors. A common device to achieve that is randomisation, which severs any incoming causal link between the randomised variables and possible exogenous effects. It is important to make a distinction between the probabilistic and causal interpretations of Bayesian networks. From a causal perspective, the direction of arcs is uniquely identified by the asymmetry between cause and effect: if we act on the cause, we may influence the effect; but if we act on the effect, the cause remains unaffected. From a probabilistic perspective, this is not true because of the reversibility of Bayes’ theorem. For instance, if we consider again the DAG in Figure 2 we can write

Pr(v1, v2, v3, v4, v5) = Pr(v1) Pr(v2 | v1) Pr(v3 | v2) Pr(v4 | v2, v3) Pr(v5 | v2, v4)

where each node has a distribution conditional on its parents. However, for the nodes v1 and v2 we have that Pr(v1) Pr(v2 | v1) = Pr(v2) Pr(v1 | v2). This implies that the DAG in which the arc v1 → v2 is reversed into v2 → v1 encodes the same probability distribution as that in Figure 2, despite having different arcs. The only arcs whose direction cannot be changed in this way, and is thus uniquely identified even without making causal assumptions, are those that are part of a collider, or those that would become part of a new collider or introduce a cycle if they were reversed. Another consequence of the duality between the probabilistic and the causal interpretation of Bayesian networks is that we can compute the conditional probability of any pair of variables regardless of how we construct the DAG. Depending on the application, it may make more sense to use a prognostic DAG in which arcs point from diseases to symptoms, or a diagnostic DAG in which arcs point from symptoms to diseases. From a purely probabilistic perspective, we note that for every diagnostic DAG there is a prognostic DAG that represents

the same probability distribution and vice versa. Clearly, one will be easier to interpret than the other because the DAG will be easier to read. However, any conditional probability we may wish to compute will be identical for both.
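The equivalence between the two arc directions can be checked numerically. The sketch below assigns made-up probabilities to Pr(v1) and Pr(v2 | v1) (the prognostic direction), derives Pr(v2) and Pr(v1 | v2) (the diagnostic direction) from the resulting joint, and verifies that rebuilding the joint in the reversed direction returns the same table.

```python
# Prognostic direction v1 -> v2, with illustrative probabilities.
pr_v1 = {0: 0.3, 1: 0.7}
pr_v2_given_v1 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

# Joint Pr(v1, v2) built in the prognostic direction.
joint = {(a, b): pr_v1[a] * pr_v2_given_v1[a][b]
         for a in (0, 1) for b in (0, 1)}

# Diagnostic direction v2 -> v1, recovered from the same joint.
pr_v2 = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
pr_v1_given_v2 = {b: {a: joint[(a, b)] / pr_v2[b] for a in (0, 1)}
                  for b in (0, 1)}

# Rebuilding the joint in the reversed direction gives the same numbers.
joint_reversed = {(a, b): pr_v2[b] * pr_v1_given_v2[b][a]
                  for a in (0, 1) for b in (0, 1)}
print(max(abs(joint[k] - joint_reversed[k]) for k in joint))
```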

6.4 Structure learning algorithms

Structure learning algorithms learn the structure of Bayesian networks from data, given the assumptions underlying Bayesian networks (such as the Markov property). Constraint-based algorithms use statistical tests to learn conditional independence relationships (the constraints themselves); score-based algorithms rank candidate DAGs based on some goodness-of-fit criterion; hybrid algorithms use conditional independence tests to exclude the vast majority of candidate DAGs, and then perform a score-based search on the few that are still under consideration.

6.4.1 The Inductive Causation algorithm and its implementation

The Inductive Causation (IC) algorithm24 is the simplest example of a constraint-based structure learning algorithm. Firstly, for each pair of variables A and B in X, the algorithm searches for a set SAB such that A and B are independent given SAB and such that A and B are not part of it (A, B ∉ SAB). Secondly, for each pair of variables that are not connected by an edge but that are both connected to a common neighbour C, the algorithm checks whether C ∈ SAB: if C ∉ SAB, then the direction of edge A – C becomes A → C and that of edge C – B becomes C ← B. Thirdly, the direction of the edges that are still undirected is set following two rules: if A is adjacent to B and there is a strictly directed path from A to B, then A – B becomes A → B; if A and B are not adjacent but A → C and C – B, then C – B becomes C → B. This step ends the algorithm, which then returns a partially directed graph in which only those arc directions that can be uniquely identified from the data are represented. The IC algorithm is implemented by the Peter & Clark (PC) algorithm25, which starts from a saturated network and then performs tests that gradually increase the number of conditioning nodes.
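The first step of the IC algorithm, searching each pair of variables for a separating set, can be sketched as follows. The conditional-independence oracle is supplied by the caller (in practice it would be a statistical test on data); here a hand-written oracle encodes the independencies of a hypothetical chain A → B → C.

```python
from itertools import combinations

def skeleton(variables, indep):
    """IC step 1: indep(a, b, S) -> True if a and b are independent given S.
    Returns the undirected edges kept and the separating sets found."""
    edges, sepset = set(), {}
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        found = None
        # Search subsets of the remaining variables, smallest first.
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                if indep(a, b, set(S)):
                    found = set(S)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((a, b)))       # no separating set: keep edge
        else:
            sepset[frozenset((a, b))] = found  # record S_AB for step 2
    return edges, sepset

# Oracle for the chain A -> B -> C: A and C are independent only given B.
def chain_oracle(a, b, S):
    return {a, b} == {"A", "C"} and "B" in S

print(skeleton(["A", "B", "C"], chain_oracle))
```

The algorithm correctly keeps the edges A – B and B – C and drops A – C, recording SAC = {B}; step 2 would then use these separating sets to orient colliders.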

7 Discussion: applications and limitations of Bayesian Artificial Intelligence in Medicine

Bayesian AI and Bayesian networks are in use in several areas of the medical sciences and are a tool of interest for physicians. Applications of Bayesian AI in medicine stem from four main domains: diagnostic reasoning, that is, establishing a diagnosis in a target patient given clinical evidence; prognostic reasoning, that is, making predictions about what might happen in the future, since Bayesian networks encode temporal information even in cross-sectional data; treatment selection, that is, making predictions about the possible effects of a treatment; and studying functional interactions among clinical evidence such as symptoms, signs and biomarkers. Several examples from the four main domains described above illustrate the vast potential of Bayesian AI. First, recovering clinical evidence from the electronic medical record is a

substantial starting block for making inference: systems to construct clinical Bayesian networks from electronic medical records have been developed26. Second, prognostic Bayesian networks are used to predict mortality in patients27. Third, Bayesian networks are also used for clinical decision support and treatment selection in complicated cases28. Fourth, studying functional interactions among symptoms in the medical domain of psychiatry shows great promise: because the classification of mental disorders is rapidly shifting paradigms, the new approach of mental disorders as networks of mutually influencing components29 is a promising setting for the application of Bayesian reasoning. Works have already endeavoured to represent interacting symptoms for disorders like depression as DAGs in cross-sectional data, therefore retrieving the possible causal relationships among them6. Future work in this area may for instance include different types of variables (other than symptoms) in networks. The use of Bayesian networks is limited by the assumptions required to correctly learn their structure and perform inference on it. It is up to the researchers to design studies accordingly: blocking confounding variables is by far the most difficult task in this respect6. In conclusion, Bayesian artificial intelligence captures uncertain reasoning in medicine through the promising model of Bayesian networks: they can be learned automatically from data and combine graphs and probabilities in a rigorous way, with algorithms that automate reasoning and use the graphical part of the model to guide a computer system in computing probabilities and predicting events of interest.

References

[1] Turing AM. Computing machinery and intelligence. Mind. 1950;59(236):433.

[2] Briganti G, Moine OL. Artificial intelligence in medicine: Today and tomorrow. Frontiers in Medicine. 2020;7:27.

[3] Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health. 2019;1(6):e271–e297.

[4] Mintz Y, Brodie R. Introduction to artificial intelligence in medicine. Minimally Invasive Therapy & Allied Technologies. 2019;28(2):73–81.

[5] Lucas PJF, Van der Gaag LC, Abu-Hanna A. Bayesian networks in biomedicine and health-care. Artificial intelligence in medicine. 2004;30(3):201–214.

[6] Briganti G, Scutari M, Linkowski P. Network structures of symptoms from the Zung Depression Scale. Psychological Reports. 2020:0033294120942116.

[7] Liew BXW, Scutari M, Peolsson A, Peterson G, Ludvigsson ML, Falla D. Investigating the Causal Mechanisms of Symptom Recovery in Chronic Whiplash Associated Disorders using Bayesian Networks. The Clinical Journal of Pain. 2019;35(8):647–655.

[8] Pearl J, Russell S. Bayesian networks. UCLA Cognitive Systems Laboratory; 2011. R-277.

[9] Scutari M, Denis JB. Bayesian Networks: With Examples in R. Chapman & Hall; 2015.

[10] Korb KB, Nicholson AE. Bayesian Artificial Intelligence. 2nd ed. CRC Press; 2010.

[11] Miettinen OS, Caro JJ. Foundations of medical diagnosis: what actually are the param- eters involved in Bayes’ theorem? Statistics in medicine. 1994;13(3):201–209.

[12] Murphy KP. Machine Learning: A Probabilistic Perspective. The MIT Press; 2012.

[13] Beam AL, Manrai AK, Ghassemi M. Challenges to the Reproducibility of Machine Learning Models in Health Care. Journal of the American Medical Association. 2020;323(4):305–306.

[14] Scutari M. Learning Bayesian Networks with the bnlearn R package. Journal of Statis- tical Software. 2010;35(3):1–22.

[15] Briganti G, Chantal C, Braun S, Fried EI, Linkowski P. Network analysis of empathy items from the interpersonal reactivity index in 1973 young adults. Psychiatry Research. 2018;265:87–92.

[16] Lauritzen SL, Spiegelhalter DJ. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological). 1988;50(2):157–194.

[17] Kruijer W, Behrouzi P, Bustos-Korts D, Rodríguez-Álvarez MX, Mahmoudi SM, Yandell B, et al. Reconstruction of networks with direct and indirect genetic effects. Genetics. 2020;214(4):781–807.

[18] Dao MC, Everard A, Aron-Wisnewsky J, Sokolovska N, Prifti E, Verger EO, et al. Akkermansia muciniphila and improved metabolic health during a dietary intervention in obesity: relationship with gut microbiome richness and ecology. Gut. 2016;65(3):426– 436.

[19] Hecht-Nielsen R. On the algebraic structure of feedforward network weight spaces. In: Advanced Neural Computers. Elsevier; 1990. p. 129–135.

[20] Correa M, Bielza C, Pamies-Teixeira J. Comparison of Bayesian networks and artificial neural networks for quality detection in a machining process. Expert systems with applications. 2009;36(3):7270–7279.

[21] Holzinger A, Langs G, Denk H, Zatloukal K, Müller H. Causability and explainability of artificial intelligence in medicine. WIREs Data Mining and Knowledge Discovery. 2019;9:e1312.

[22] Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin. 1946;2(3):47–53.

[23] Geiger D, Verma T, Pearl J. d-separation: From theorems to algorithms. In: Machine Intelligence and Pattern Recognition. vol. 10. Elsevier; 1990. p. 139–148.

[24] Pearl J, Verma T. A theory of inferred causation. In: Studies in Logic and the Foundations of Mathematics. vol. 134. Elsevier; 1995. p. 789–811.

[25] Spirtes P, Glymour C, Scheines R. Discovery Algorithms for Causally Sufficient Struc- tures. In: Spirtes P, Glymour C, Scheines R, editors. Causation, Prediction, and Search. New York, NY: Springer; 1993. p. 103–162.

[26] Shen Y, Zhang L, Zhang J, Yang M, Tang B, Li Y, et al. CBN: Constructing a clinical Bayesian network based on data from the electronic medical record. Journal of biomedical informatics. 2018;88:1–10.

[27] Verduijn M, Peek N, Rosseel PMJ, de Jonge E, de Mol BAJM. Prognostic bayesian networks: I: Rationale, learning procedure, and clinical use. Journal of Biomedical Informatics. 2007;40(6):609–618.

[28] Sesen MB, Nicholson AE, Banares-Alcantara R, Kadir T, Brady M. Bayesian networks for clinical decision support in lung cancer care. PLoS ONE. 2013;8(12):e82349.

[29] Borsboom D, Cramer AOJ. Network Analysis: An Integrative Approach to the Structure of Psychopathology. Annual Review of Clinical Psychology. 2013;9(1):91–121.
