Inference in Discrete Bayesian Network

In this notebook, we show a simple example for doing Exact inference in Bayesian Networks using pgmpy. We will be using the Asia network (http://www.bnlearn.com/bnrepository/#asia) for this example.

Step 1: Define the model.

[1]:
# Fetch the asia model from the bnlearn repository

from pgmpy.utils import get_example_model

asia_model = get_example_model("asia")
[2]:
print("Nodes: ", asia_model.nodes())
print("Edges: ", asia_model.edges())
asia_model.get_cpds()
Nodes:  ['asia', 'tub', 'smoke', 'lung', 'bronc', 'either', 'xray', 'dysp']
Edges:  [('asia', 'tub'), ('tub', 'either'), ('smoke', 'lung'), ('smoke', 'bronc'), ('lung', 'either'), ('bronc', 'dysp'), ('either', 'xray'), ('either', 'dysp')]
[2]:
[<TabularCPD representing P(asia:2) at 0x7f08a40e6a90>,
 <TabularCPD representing P(bronc:2 | smoke:2) at 0x7f08a40e6dc0>,
 <TabularCPD representing P(dysp:2 | bronc:2, either:2) at 0x7f08a40fa730>,
 <TabularCPD representing P(either:2 | lung:2, tub:2) at 0x7f08a40fa100>,
 <TabularCPD representing P(lung:2 | smoke:2) at 0x7f08a40fa790>,
 <TabularCPD representing P(smoke:2) at 0x7f08a40fa5e0>,
 <TabularCPD representing P(tub:2 | asia:2) at 0x7f08a40fac40>,
 <TabularCPD representing P(xray:2 | either:2) at 0x7f08a40fab80>]

If you would like to create a model from scratch, please refer to the Creating Bayesian Networks notebook: https://github.com/pgmpy/pgmpy/blob/dev/examples/Creating%20a%20Bayesian%20Network.ipynb

Step 2: Initialize the inference class

Currently, pgmpy support two algorithms for inference: 1. Variable Elimination and, 2. Belief Propagation. Both of these are exact inference algorithms. The following example uses VariableElimination but BeliefPropagation has an identifcal API, so all the methods show below would also work for BeliefPropagation.

[3]:
# Initializing the VariableElimination class

from pgmpy.inference import VariableElimination

asia_infer = VariableElimination(asia_model)

Step 3: Doing Inference using hard evidence

[4]:
# Computing the probability of bronc given smoke=no.
q = asia_infer.query(variables=["bronc"], evidence={"smoke": "no"})
print(q)

# Computing the joint probability of bronc and asia given smoke=yes
q = asia_infer.query(variables=["bronc", "asia"], evidence={"smoke": "yes"})
print(q)

# Computing the probabilities (not joint) of bronc and asia given smoke=no
q = asia_infer.query(variables=["bronc", "asia"], evidence={"smoke": "no"}, joint=False)
for factor in q.values():
    print(factor)
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]

0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]

0it [00:00, ?it/s]


0it [00:00, ?it/s]

Finding Elimination Order: : : 0it [00:00, ?it/s]


0it [00:00, ?it/s]
+------------+--------------+
| bronc      |   phi(bronc) |
+============+==============+
| bronc(yes) |       0.3000 |
+------------+--------------+
| bronc(no)  |       0.7000 |
+------------+--------------+
+------------+-----------+-------------------+
| bronc      | asia      |   phi(bronc,asia) |
+============+===========+===================+
| bronc(yes) | asia(yes) |            0.0060 |
+------------+-----------+-------------------+
| bronc(yes) | asia(no)  |            0.5940 |
+------------+-----------+-------------------+
| bronc(no)  | asia(yes) |            0.0040 |
+------------+-----------+-------------------+
| bronc(no)  | asia(no)  |            0.3960 |
+------------+-----------+-------------------+
+------------+--------------+
| bronc      |   phi(bronc) |
+============+==============+
| bronc(yes) |       0.3000 |
+------------+--------------+
| bronc(no)  |       0.7000 |
+------------+--------------+
+-----------+-------------+
| asia      |   phi(asia) |
+===========+=============+
| asia(yes) |      0.0100 |
+-----------+-------------+
| asia(no)  |      0.9900 |
+-----------+-------------+

[5]:
# Computing the MAP of bronc given smoke=no.
q = asia_infer.map_query(variables=["bronc"], evidence={"smoke": "no"})
print(q)

# Computing the MAP of bronc and asia given smoke=yes
q = asia_infer.map_query(variables=["bronc", "asia"], evidence={"smoke": "yes"})
print(q)



0it [00:00, ?it/s]


Finding Elimination Order: : : 0it [00:00, ?it/s]



0it [00:00, ?it/s]




0it [00:00, ?it/s]



Finding Elimination Order: : : 0it [00:00, ?it/s]




0it [00:00, ?it/s]
{'bronc': 'no'}
{'bronc': 'yes', 'asia': 'no'}

Step 5: Inference using virtual evidence

[6]:
from pgmpy.factors.discrete import TabularCPD

# Get state name, which ensure lung evidence has same state names as asia_model
state_names = asia_model.get_cpds('lung').state_names

lung_virt_evidence = TabularCPD(variable="lung", variable_card=2, values=[[0.4], [0.6]],state_names=state_names)

# Query with hard evidence smoke = no and virtual evidence lung = [0.4, 0.6]
q = asia_infer.query(
    variables=["bronc"], evidence={"smoke": "no"}, virtual_evidence=[lung_virt_evidence]
)
print(q)

# Query with hard evidence smoke = no and virtual evidences lung = [0.4, 0.6] and bronc = [0.3, 0.7]
lung_virt_evidence = TabularCPD(variable="lung", variable_card=2, values=[[0.4], [0.7]])
Finding Elimination Order: : : 0it [00:04, ?it/s]
Finding Elimination Order: : : 0it [00:04, ?it/s]
Finding Elimination Order: : : 0it [00:02, ?it/s]
Finding Elimination Order: : : 0it [00:02, ?it/s]

0it [00:00, ?it/s]
Finding Elimination Order: : : 0it [00:00, ?it/s]

0it [00:00, ?it/s]
+------------+--------------+
| bronc      |   phi(bronc) |
+============+==============+
| bronc(yes) |       0.3000 |
+------------+--------------+
| bronc(no)  |       0.7000 |
+------------+--------------+

[8]:
print(asia_model.get_cpds("lung"))
+-----------+------------+-----------+
| smoke     | smoke(yes) | smoke(no) |
+-----------+------------+-----------+
| lung(yes) | 0.1        | 0.01      |
+-----------+------------+-----------+
| lung(no)  | 0.9        | 0.99      |
+-----------+------------+-----------+

Step 4: Troubleshooting for slow inference

In the case of large models, or models in which variables have a lot of states, inference can be quite slow. Some of the ways to deal with it are:

  1. Reduce the number of states of variables by combining states together.

  2. Try a different elimination order by specifying elimination_order argument. Possible options are: MinFill, MinNeighbors, MinWeight, WeightedMinFill.

  3. Try a custom elimination order: The implemented heuristics for computing the elimination order might not be efficient in every case. If you can think of a more efficient order, you can also pass it as a list to the elimination_order argument.

  4. If it is still too slow, try using approximate inference using sampling algorithms.