Creating discrete Bayesian Networks
In this section, we show an example of creating a Bayesian Network in pgmpy from scratch. We use the cancer model (http://www.bnlearn.com/bnrepository/#cancer) for the example. The model structure is shown below.
In pgmpy, the model structure and its parametrization (CPDs) are independent of each other. The workflow is therefore to first define the model structure, then define all the parameters (CPDs), and finally add these parameters to the model. The CPDs can later be modified, removed, or replaced without changing or redefining the model structure.
from IPython.display import Image
Image("images/cancer.png")
(Figure: structure of the cancer model.)
Step 1: Define the model structure
A BayesianNetwork can be initialized by passing a list of the edges in the model structure. In this case, there are 4 edges in the model: Pollution -> Cancer, Smoker -> Cancer, Cancer -> Xray, Cancer -> Dyspnoea.
from pgmpy.models import BayesianNetwork

# Each tuple is a directed edge (parent, child).
cancer_model = BayesianNetwork(
    [
        ("Pollution", "Cancer"),
        ("Smoker", "Cancer"),
        ("Cancer", "Xray"),
        ("Cancer", "Dyspnoea"),
    ]
)
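The edge list carries all of the structure. Each node's parents, and an order in which the nodes can be processed with parents before children, both follow from it. The sketch below shows this in plain Python using the standard library's `graphlib` (Python 3.9+). It is independent of pgmpy, and the helper `parents_from_edges` is a hypothetical name used only for illustration.

```python
from graphlib import TopologicalSorter

# The same four (parent, child) edges passed to BayesianNetwork above.
edges = [
    ("Pollution", "Cancer"),
    ("Smoker", "Cancer"),
    ("Cancer", "Xray"),
    ("Cancer", "Dyspnoea"),
]

def parents_from_edges(edges):
    """Hypothetical helper: map every node to the sorted list of its parents."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, [])
        parents.setdefault(child, []).append(parent)
    return {node: sorted(ps) for node, ps in parents.items()}

parents = parents_from_edges(edges)

# TopologicalSorter expects node -> predecessors, which here are the parents.
order = list(TopologicalSorter(parents).static_order())

print(parents)
print(order)
```

More than one order is valid. Any order that places Pollution and Smoker before Cancer, and Cancer before Xray and Dyspnoea, works.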
Step 2: Define the CPDs
Each node in a Bayesian Network has a CPD associated with it, so we need to define 5 CPDs in this case. In pgmpy, CPDs can be defined using the TabularCPD class. For details on its parameters, refer to the TabularCPD documentation: https://pgmpy.org/_modules/pgmpy/factors/discrete/CPD.html
from pgmpy.factors.discrete import TabularCPD
cpd_poll = TabularCPD(variable="Pollution", variable_card=2, values=[[0.9], [0.1]])
cpd_smoke = TabularCPD(variable="Smoker", variable_card=2, values=[[0.3], [0.7]])
cpd_cancer = TabularCPD(
    variable="Cancer",
    variable_card=2,
    values=[[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]],
    evidence=["Smoker", "Pollution"],
    evidence_card=[2, 2],
)
cpd_xray = TabularCPD(
    variable="Xray",
    variable_card=2,
    values=[[0.9, 0.2], [0.1, 0.8]],
    evidence=["Cancer"],
    evidence_card=[2],
)
cpd_dysp = TabularCPD(
    variable="Dyspnoea",
    variable_card=2,
    values=[[0.65, 0.3], [0.35, 0.7]],
    evidence=["Cancer"],
    evidence_card=[2],
)
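Each `values` array has one row per state of `variable` and one column per combination of evidence states. In the tables pgmpy prints, the last variable in `evidence` varies fastest across the columns. For `cpd_cancer` the columns are therefore (Smoker=0, Pollution=0), (Smoker=0, Pollution=1), (Smoker=1, Pollution=0), (Smoker=1, Pollution=1). No `state_names` are given, so the states are simply 0 and 1.

The plain-Python sketch below is not pgmpy code. It assumes that column ordering, uses it to look up individual entries, and evaluates the joint distribution the five CPDs define:

P(Pollution, Smoker, Cancer, Xray, Dyspnoea) = P(Pollution) P(Smoker) P(Cancer | Smoker, Pollution) P(Xray | Cancer) P(Dyspnoea | Cancer)

```python
from itertools import product

# Tables copied from the TabularCPD definitions above:
# variable -> (values, evidence, evidence_card).
cpds = {
    "Pollution": ([[0.9], [0.1]], [], []),
    "Smoker": ([[0.3], [0.7]], [], []),
    "Cancer": ([[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]],
               ["Smoker", "Pollution"], [2, 2]),
    "Xray": ([[0.9, 0.2], [0.1, 0.8]], ["Cancer"], [2]),
    "Dyspnoea": ([[0.65, 0.3], [0.35, 0.7]], ["Cancer"], [2]),
}

def column_index(evidence_states, evidence_card):
    """Mixed-radix column index; the last evidence variable varies fastest."""
    index = 0
    for state, card in zip(evidence_states, evidence_card):
        index = index * card + state
    return index

def cond_prob(variable, state, assignment):
    """P(variable = state | its evidence as set in `assignment`)."""
    values, evidence, evidence_card = cpds[variable]
    col = column_index([assignment[e] for e in evidence], evidence_card)
    return values[state][col]

def joint(assignment):
    """Chain-rule factorization: one entry from each CPD, multiplied together."""
    p = 1.0
    for variable, state in assignment.items():
        p *= cond_prob(variable, state, assignment)
    return p

variables = list(cpds)
all_assignments = [dict(zip(variables, states))
                   for states in product([0, 1], repeat=len(variables))]

total = sum(joint(a) for a in all_assignments)  # should be 1
p_cancer_0 = sum(joint(a) for a in all_assignments if a["Cancer"] == 0)
print(total, p_cancer_0)
```

For example, `cond_prob("Cancer", 0, {"Smoker": 1, "Pollution": 0})` reads column index 1 * 2 + 0 = 2, which holds 0.001. Summing the joint over every other variable gives P(Cancer=0) = 0.3·0.9·0.03 + 0.3·0.1·0.05 + 0.7·0.9·0.001 + 0.7·0.1·0.02 = 0.01163.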
Step 3: Add the CPDs to the model
After defining the model parameters, we can add them to the model using the add_cpds method. The check_model method can then be used to verify that the CPDs are correctly defined for the model structure.
# Associating the parameters with the model structure.
cancer_model.add_cpds(cpd_poll, cpd_smoke, cpd_cancer, cpd_xray, cpd_dysp)
# Checking if the cpds are valid for the model.
cancer_model.check_model()
True
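`check_model` returns True here because the CPDs agree with the graph. The plain-Python sketch below approximates the kinds of conditions such a check involves:

- every node has a CPD;
- each CPD's evidence is exactly the node's parents;
- every column of `values` sums to 1.

`is_consistent` is a hypothetical, simplified helper, not pgmpy's implementation. The sketch also shows the decoupling described at the start of this section: a CPD can be swapped without touching the structure, as long as it still conditions on the right parents.

```python
import math

# Structure: node -> parents, as implied by the four edges in Step 1.
parents = {
    "Pollution": [],
    "Smoker": [],
    "Cancer": ["Pollution", "Smoker"],
    "Xray": ["Cancer"],
    "Dyspnoea": ["Cancer"],
}

# Parameters, stored separately: node -> (values, evidence), copied from Step 2.
cpds = {
    "Pollution": ([[0.9], [0.1]], []),
    "Smoker": ([[0.3], [0.7]], []),
    "Cancer": ([[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]],
               ["Smoker", "Pollution"]),
    "Xray": ([[0.9, 0.2], [0.1, 0.8]], ["Cancer"]),
    "Dyspnoea": ([[0.65, 0.3], [0.35, 0.7]], ["Cancer"]),
}

def is_consistent(parents, cpds):
    """Hypothetical, simplified model check (not pgmpy's check_model)."""
    for node, node_parents in parents.items():
        if node not in cpds:
            return False  # every node needs a CPD
        values, evidence = cpds[node]
        if set(evidence) != set(node_parents):
            return False  # the CPD must condition on exactly the node's parents
        for column in zip(*values):
            if not math.isclose(sum(column), 1.0):
                return False  # each column is a distribution over the node's states
    return True

ok_original = is_consistent(parents, cpds)

# Swap in a different prior for Smoker; the structure is untouched.
cpds["Smoker"] = ([[0.5], [0.5]], [])
ok_swapped = is_consistent(parents, cpds)

# A CPD for Xray that conditions on the wrong variable is rejected.
cpds["Xray"] = ([[0.9, 0.2], [0.1, 0.8]], ["Smoker"])
ok_wrong = is_consistent(parents, cpds)

print(ok_original, ok_swapped, ok_wrong)
```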
Step 4: Run basic operations on the model
# Check whether two variables are d-connected (True) or d-separated (False)
print(cancer_model.is_dconnected("Pollution", "Smoker"))
print(cancer_model.is_dconnected("Pollution", "Smoker", observed=["Cancer"]))
False
True
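The two results differ because Cancer is a collider on the only trail between the two variables, Pollution -> Cancer <- Smoker. That trail is blocked while Cancer is unobserved and becomes active once Cancer is observed. The plain-Python calculation below uses the CPD values from Step 2 to show the effect numerically, often called "explaining away".

It assumes, as the small row-0 entries in the Cancer table suggest, that state 0 of Cancer means the patient has cancer. No `state_names` were given, so the states are just 0 and 1.

```python
# Priors and P(Cancer = 0 | Smoker, Pollution), copied from the CPDs in Step 2.
p_pollution = [0.9, 0.1]
p_smoker = [0.3, 0.7]
p_cancer0 = {(0, 0): 0.03, (0, 1): 0.05, (1, 0): 0.001, (1, 1): 0.02}  # key: (smoker, pollution)

def p_pollution_high(smoker, cancer_observed):
    """P(Pollution = 1 | Smoker = smoker [, Cancer = 0]) by direct enumeration."""
    weights = []
    for pollution in (0, 1):
        w = p_pollution[pollution] * p_smoker[smoker]
        if cancer_observed:
            w *= p_cancer0[(smoker, pollution)]
        weights.append(w)
    return weights[1] / sum(weights)

# Cancer unobserved: Smoker carries no information about Pollution.
print(p_pollution_high(0, False), p_pollution_high(1, False))
# Cancer = 0 observed: the value of Smoker now changes belief in Pollution.
print(p_pollution_high(0, True), p_pollution_high(1, True))
```

With these tables, both unconditional values equal the prior 0.1, up to floating-point rounding. Given Cancer = 0, the probability of Pollution = 1 is 0.15625 when Smoker = 0 and about 0.69 when Smoker = 1.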
# Get all nodes reachable from Pollution via active trails (its d-connected nodes)
cancer_model.active_trail_nodes("Pollution")
{'Pollution': {'Cancer', 'Dyspnoea', 'Pollution', 'Xray'}}
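`active_trail_nodes` returns the nodes reachable from the start node along trails that are active given the observed set. The plain-Python sketch below implements the standard two-phase reachability procedure for this. It first collects the observed nodes and their ancestors, then traverses (node, direction) pairs. It illustrates the technique and is not pgmpy's code. The helper names mirror the pgmpy methods only for readability.

```python
# Structure from Step 1: node -> parents.
parents = {
    "Pollution": [],
    "Smoker": [],
    "Cancer": ["Pollution", "Smoker"],
    "Xray": ["Cancer"],
    "Dyspnoea": ["Cancer"],
}

def active_trail_nodes(parents, start, observed=()):
    """Nodes reachable from `start` via trails that are active given `observed`."""
    children = {node: [] for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(child)

    observed = set(observed)
    # Phase 1: observed nodes and all their ancestors (these can open colliders).
    ancestors_of_observed = set()
    stack = list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors_of_observed:
            ancestors_of_observed.add(node)
            stack.extend(parents[node])

    # Phase 2: "up" = reached from a child, "down" = reached from a parent.
    reachable, visited = set(), set()
    stack = [(start, "up")]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == "up" and node not in observed:
            # Chain or fork through an unobserved node: continue both ways.
            stack.extend((p, "up") for p in parents[node])
            stack.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in observed:
                # Chain through an unobserved node: keep going down.
                stack.extend((c, "down") for c in children[node])
            if node in ancestors_of_observed:
                # Collider that is observed or has an observed descendant.
                stack.extend((p, "up") for p in parents[node])
    return reachable

def is_dconnected(parents, a, b, observed=()):
    return b in active_trail_nodes(parents, a, observed)

print(active_trail_nodes(parents, "Pollution"))
print(active_trail_nodes(parents, "Pollution", observed=["Cancer"]))
```

The collider rule also means that observing a descendant of Cancer, such as Xray, is enough to d-connect Pollution and Smoker.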
# List local independencies for a node
cancer_model.local_independencies("Xray")
(Xray ⟂ Smoker, Pollution, Dyspnoea | Cancer)
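The printed statement is an instance of the local Markov property: a node is independent of its non-descendants given its parents. The plain-Python sketch below (independent of pgmpy, with hypothetical helper names) derives that statement for every node from the structure alone.

```python
# Structure from Step 1: node -> parents.
parents = {
    "Pollution": [],
    "Smoker": [],
    "Cancer": ["Pollution", "Smoker"],
    "Xray": ["Cancer"],
    "Dyspnoea": ["Cancer"],
}

def descendants(parents, node):
    """All nodes reachable from `node` by following edges forward."""
    children = {n: [c for c, ps in parents.items() if n in ps] for n in parents}
    found, stack = set(), list(children[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found

def local_independencies(parents, node):
    """Return (non-descendants of node other than its parents, parents of node)."""
    given = set(parents[node])
    independent_of = set(parents) - {node} - descendants(parents, node) - given
    return independent_of, given

for node in parents:
    independent_of, given = local_independencies(parents, node)
    if independent_of:  # e.g. Cancer has nothing to assert
        rhs = ", ".join(sorted(independent_of))
        cond = " | " + ", ".join(sorted(given)) if given else ""
        print(f"({node} ⟂ {rhs}{cond})")
```

For Xray this gives the same statement as above, with the variables listed alphabetically.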
# Get all model implied independence conditions
cancer_model.get_independencies()
(Xray ⟂ Smoker, Pollution, Dyspnoea | Cancer)
(Xray ⟂ Pollution, Dyspnoea | Smoker, Cancer)
(Xray ⟂ Smoker, Dyspnoea | Pollution, Cancer)
(Xray ⟂ Smoker, Pollution | Cancer, Dyspnoea)
(Xray ⟂ Dyspnoea | Smoker, Pollution, Cancer)
(Xray ⟂ Pollution | Smoker, Cancer, Dyspnoea)
(Xray ⟂ Smoker | Pollution, Cancer, Dyspnoea)
(Smoker ⟂ Pollution)
(Smoker ⟂ Xray, Dyspnoea | Cancer)
(Smoker ⟂ Xray, Dyspnoea | Pollution, Cancer)
(Smoker ⟂ Dyspnoea | Xray, Cancer)
(Smoker ⟂ Xray | Cancer, Dyspnoea)
(Smoker ⟂ Dyspnoea | Pollution, Xray, Cancer)
(Smoker ⟂ Xray | Pollution, Cancer, Dyspnoea)
(Pollution ⟂ Smoker)
(Pollution ⟂ Xray, Dyspnoea | Cancer)
(Pollution ⟂ Xray, Dyspnoea | Smoker, Cancer)
(Pollution ⟂ Dyspnoea | Xray, Cancer)
(Pollution ⟂ Xray | Cancer, Dyspnoea)
(Pollution ⟂ Dyspnoea | Smoker, Xray, Cancer)
(Pollution ⟂ Xray | Smoker, Cancer, Dyspnoea)
(Dyspnoea ⟂ Smoker, Pollution, Xray | Cancer)
(Dyspnoea ⟂ Pollution, Xray | Smoker, Cancer)
(Dyspnoea ⟂ Smoker, Xray | Pollution, Cancer)
(Dyspnoea ⟂ Smoker, Pollution | Xray, Cancer)
(Dyspnoea ⟂ Xray | Smoker, Pollution, Cancer)
(Dyspnoea ⟂ Pollution | Smoker, Xray, Cancer)
(Dyspnoea ⟂ Smoker | Pollution, Xray, Cancer)
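Every statement in this list is derived from the graph alone, so it must also hold for the specific CPDs defined in Step 2. As a sanity check, the plain-Python sketch below builds the joint distribution from those CPDs. It then tests a statement (X ⟂ Y | Z) numerically by checking P(x, y, z) P(z) = P(x, z) P(y, z) for every assignment. The helper names are illustrative, not pgmpy functions.

```python
from itertools import product

VARS = ["Pollution", "Smoker", "Cancer", "Xray", "Dyspnoea"]

def joint_prob(p, s, c, x, d):
    """Chain-rule product of the five CPDs from Step 2 (states are 0/1)."""
    p_c0 = [[0.03, 0.05], [0.001, 0.02]][s][p]  # P(Cancer=0 | Smoker=s, Pollution=p)
    return ([0.9, 0.1][p] * [0.3, 0.7][s]
            * (p_c0 if c == 0 else 1 - p_c0)
            * [[0.9, 0.2], [0.1, 0.8]][x][c]      # P(Xray=x | Cancer=c)
            * [[0.65, 0.3], [0.35, 0.7]][d][c])   # P(Dyspnoea=d | Cancer=c)

# Full joint table over all 2**5 assignments, keyed in VARS order.
joint = {states: joint_prob(*states) for states in product([0, 1], repeat=5)}

def marginal(names, values):
    """P(names = values), summing the joint over all other variables."""
    idx = [VARS.index(n) for n in names]
    return sum(pr for states, pr in joint.items()
               if all(states[i] == v for i, v in zip(idx, values)))

def is_independent(xs, ys, zs=(), tol=1e-12):
    """Numerically check (xs ⟂ ys | zs) against the joint table."""
    for x in product([0, 1], repeat=len(xs)):
        for y in product([0, 1], repeat=len(ys)):
            for z in product([0, 1], repeat=len(zs)):
                lhs = marginal([*xs, *ys, *zs], [*x, *y, *z]) * marginal(zs, z)
                rhs = marginal([*xs, *zs], [*x, *z]) * marginal([*ys, *zs], [*y, *z])
                if abs(lhs - rhs) > tol:
                    return False
    return True

print(is_independent(["Xray"], ["Smoker", "Pollution", "Dyspnoea"], ["Cancer"]))
print(is_independent(["Smoker"], ["Pollution"]))
print(is_independent(["Xray"], ["Dyspnoea"]))  # not in the list above
```

The first two checks correspond to statements in the list. The third is not implied by the graph: Xray and Dyspnoea share the parent Cancer, so they are dependent when Cancer is unobserved.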
Loading example models
To quickly try out different features, pgmpy can also load some example models directly from the bnlearn repository.
from pgmpy.utils import get_example_model
model = get_example_model("cancer")
print("Nodes in the model:", model.nodes())
print("Edges in the model:", model.edges())
model.get_cpds()
Nodes in the model: ['Pollution', 'Smoker', 'Cancer', 'Xray', 'Dyspnoea']
Edges in the model: [('Pollution', 'Cancer'), ('Smoker', 'Cancer'), ('Cancer', 'Xray'), ('Cancer', 'Dyspnoea')]
[<TabularCPD representing P(Cancer:2 | Pollution:2, Smoker:2) at 0x7fbbbcdffee0>,
<TabularCPD representing P(Dyspnoea:2 | Cancer:2) at 0x7fbbbcdff4f0>,
<TabularCPD representing P(Pollution:2) at 0x7fbbbcdffa30>,
<TabularCPD representing P(Smoker:2) at 0x7fbbbcdff7f0>,
<TabularCPD representing P(Xray:2 | Cancer:2) at 0x7fbbbcdff790>]