Inference in Discrete Bayesian Network¶
In this notebook, we show a simple example for doing Exact inference in Bayesian Networks using pgmpy. We will be using the Asia network (http://www.bnlearn.com/bnrepository/#asia) for this example.
Step 1: Define the model.¶
[1]:
# Fetch the asia model from the bnlearn repository
from pgmpy.utils import get_example_model
asia_model = get_example_model("asia")
[2]:
print("Nodes: ", asia_model.nodes())
print("Edges: ", asia_model.edges())
asia_model.get_cpds()
Nodes: ['asia', 'tub', 'smoke', 'lung', 'bronc', 'either', 'xray', 'dysp']
Edges: [('asia', 'tub'), ('tub', 'either'), ('smoke', 'lung'), ('smoke', 'bronc'), ('lung', 'either'), ('bronc', 'dysp'), ('either', 'xray'), ('either', 'dysp')]
[2]:
[<TabularCPD representing P(asia:2) at 0x7f08a40e6a90>,
<TabularCPD representing P(bronc:2 | smoke:2) at 0x7f08a40e6dc0>,
<TabularCPD representing P(dysp:2 | bronc:2, either:2) at 0x7f08a40fa730>,
<TabularCPD representing P(either:2 | lung:2, tub:2) at 0x7f08a40fa100>,
<TabularCPD representing P(lung:2 | smoke:2) at 0x7f08a40fa790>,
<TabularCPD representing P(smoke:2) at 0x7f08a40fa5e0>,
<TabularCPD representing P(tub:2 | asia:2) at 0x7f08a40fac40>,
<TabularCPD representing P(xray:2 | either:2) at 0x7f08a40fab80>]
If you would like to create a model from scratch, please refer to the Creating Bayesian Networks notebook: https://github.com/pgmpy/pgmpy/blob/dev/examples/Creating%20a%20Bayesian%20Network.ipynb
Step 2: Initialize the inference class¶
Currently, pgmpy support two algorithms for inference: 1. Variable Elimination and, 2. Belief Propagation. Both of these are exact inferece algorithms. The following example uses VariableElimination
but BeliefPropagation
has an identifcal API, so all the methods show below would also work for BeliefPropagation
.
[3]:
# Initializing the VariableElimination class
from pgmpy.inference import VariableElimination
asia_infer = VariableElimination(asia_model)
Step 3: Doing Inference using hard evidence¶
[4]:
# Computing the probability of bronc given smoke=no.
q = asia_infer.query(variables=["bronc"], evidence={"smoke": "no"})
print(q)
# Computing the joint probability of bronc and asia given smoke=yes
q = asia_infer.query(variables=["bronc", "asia"], evidence={"smoke": "yes"})
print(q)
# Computing the probabilities (not joint) of bronc and asia given smoke=no
q = asia_infer.query(variables=["bronc", "asia"], evidence={"smoke": "no"}, joint=False)
for factor in q.values():
print(factor)
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
+------------+--------------+
| bronc | phi(bronc) |
+============+==============+
| bronc(yes) | 0.3000 |
+------------+--------------+
| bronc(no) | 0.7000 |
+------------+--------------+
+------------+-----------+-------------------+
| bronc | asia | phi(bronc,asia) |
+============+===========+===================+
| bronc(yes) | asia(yes) | 0.0060 |
+------------+-----------+-------------------+
| bronc(yes) | asia(no) | 0.5940 |
+------------+-----------+-------------------+
| bronc(no) | asia(yes) | 0.0040 |
+------------+-----------+-------------------+
| bronc(no) | asia(no) | 0.3960 |
+------------+-----------+-------------------+
+------------+--------------+
| bronc | phi(bronc) |
+============+==============+
| bronc(yes) | 0.3000 |
+------------+--------------+
| bronc(no) | 0.7000 |
+------------+--------------+
+-----------+-------------+
| asia | phi(asia) |
+===========+=============+
| asia(yes) | 0.0100 |
+-----------+-------------+
| asia(no) | 0.9900 |
+-----------+-------------+
[5]:
# Computing the MAP of bronc given smoke=no.
q = asia_infer.map_query(variables=["bronc"], evidence={"smoke": "no"})
print(q)
# Computing the MAP of bronc and asia given smoke=yes
q = asia_infer.map_query(variables=["bronc", "asia"], evidence={"smoke": "yes"})
print(q)
0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
{'bronc': 'no'}
{'bronc': 'yes', 'asia': 'no'}
Step 5: Inference using virtual evidence¶
[6]:
lung_virt_evidence = TabularCPD(variable="lung", variable_card=2, values=[[0.4], [0.6]])
# Query with hard evidence smoke = no and virtual evidence lung = [0.4, 0.6]
q = asia_infer.query(
variables=["bronc"], evidence={"smoke": "no"}, virtual_evidence=[lung_virt_evidence]
)
print(q)
# Query with hard evidence smoke = no and virtual evidences lung = [0.4, 0.6] and bronc = [0.3, 0.7]
lung_virt_evidence = TabularCPD(variable="lung", variable_card=2, values=[[0.4], [0.7]])
Finding Elimination Order: : : 0it [00:04, ?it/s]
Finding Elimination Order: : : 0it [00:04, ?it/s]
Finding Elimination Order: : : 0it [00:02, ?it/s]
Finding Elimination Order: : : 0it [00:02, ?it/s]
0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
+------------+--------------+
| bronc | phi(bronc) |
+============+==============+
| bronc(yes) | 0.3000 |
+------------+--------------+
| bronc(no) | 0.7000 |
+------------+--------------+
[8]:
print(asia_model.get_cpds("lung"))
+-----------+------------+-----------+
| smoke | smoke(yes) | smoke(no) |
+-----------+------------+-----------+
| lung(yes) | 0.1 | 0.01 |
+-----------+------------+-----------+
| lung(no) | 0.9 | 0.99 |
+-----------+------------+-----------+
Step 4: Troubleshooting for slow inference¶
In the case of large models, or models in which variables have a lot of states, inference can be quite slow. Some of the ways to deal with it are:
Reduce the number of states of variables by combining states together.
Try a different elimination order by specifying
elimination_order
argument. Possible options are: MinFill, MinNeighbors, MinWeight, WeightedMinFill.Try a custom elimination order: The implemented heuristics for computing the elimination order might not be efficient in every case. If you can think of a more efficient order, you can also pass it as a list to the
elimination_order
argument.If it is still too slow, try using approximate inference using sampling algorithms.