Extending pgmpy
It's easy to extend pgmpy to prototype your ideas quickly. pgmpy provides an abstract base class for most of its main functionality: Inference (in pgmpy.inference.base) for inference algorithms, BaseFactor for model parameters, and BaseEstimator for parameter and structure learning. To add a new feature, implement a class that inherits from one of these base classes; the new class then works with the rest of pgmpy.
This example shows how to write a new inference algorithm. It uses a deliberately simple exact algorithm: multiply all the factors (CPDs) of the network into a single joint factor, reduce it to the observed evidence, normalize, and marginalize out every variable that is neither queried nor observed.
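In symbols, with query variables Q, evidence E = e, and Z the remaining variables of the network, the algorithm computes the expression below. Each φ_i is one CPD P(X_i | Pa(X_i)) viewed as a factor, and the restriction bar means keeping only the entries consistent with E = e. The normalization happens before Z is summed out, which matches the order of operations in the code; summing out Z does not change the total mass, so the result is the same either way.

```latex
P(\mathbf{Q} \mid \mathbf{E} = \mathbf{e})
  = \sum_{\mathbf{Z}}
    \frac{\prod_{i} \phi_i \,\big|_{\mathbf{E} = \mathbf{e}}}
         {\sum_{\mathbf{Q}, \mathbf{Z}} \prod_{i} \phi_i \,\big|_{\mathbf{E} = \mathbf{e}}},
  \qquad \phi_i = P(X_i \mid \mathrm{Pa}(X_i))
```

Because the full joint is materialized before anything is summed out, time and memory grow exponentially with the number of variables. That is fine for a prototype on a six-node network, but not for large models.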
```python
# A simple exact inference algorithm
import itertools

from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product


class SimpleInference(Inference):
    def __init__(self, model):
        super().__init__(model)
        # Populates self.factors and self.cardinality from the model's CPDs
        self._initialize_structures()

    # By inheriting Inference we can use self.model, self.factors and
    # self.cardinality in our class
    def query(self, var, evidence):
        # self.factors is a dict of the form {node: [factors_involving_node]};
        # a factor is listed under every node it involves, so deduplicate it
        factors_list = set(itertools.chain(*self.factors.values()))
        # Joint distribution over all variables of the network
        product = factor_product(*factors_list)
        # Keep only the entries consistent with the evidence, then renormalize
        reduced_prod = product.reduce(evidence, inplace=False)
        reduced_prod.normalize()
        # Sum out every variable that is neither queried nor observed
        var_to_marg = (
            set(self.model.nodes()) - set(var) - {state[0] for state in evidence}
        )
        marg_prod = reduced_prod.marginalize(list(var_to_marg), inplace=False)
        return marg_prod
```
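To make the three factor operations concrete, here is a minimal NumPy sketch of the same product, reduce, normalize and marginalize pipeline on dense arrays. It is not pgmpy's DiscreteFactor: the (variable_names, array) representation and the dense_* function names are hypothetical. It uses the CPDs of A, R, J and Q from the model defined below; L and G are left out because they are unobserved and have no observed descendants, so summing them out contributes a factor of 1.

```python
import numpy as np

# Hypothetical dense factor representation (not pgmpy's DiscreteFactor):
# a pair (variable_names, array) with one array axis per variable.


def dense_product(f, g):
    """Factor product: align axes by variable name and multiply with einsum."""
    f_vars, f_arr = f
    g_vars, g_arr = g
    out_vars = f_vars + [v for v in g_vars if v not in f_vars]
    letter = {v: chr(ord("a") + i) for i, v in enumerate(out_vars)}
    spec = "{},{}->{}".format(
        "".join(letter[v] for v in f_vars),
        "".join(letter[v] for v in g_vars),
        "".join(letter[v] for v in out_vars),
    )
    return out_vars, np.einsum(spec, f_arr, g_arr)


def dense_reduce(f, evidence):
    """Fix observed variables to their states, dropping their axes."""
    f_vars, f_arr = f
    index = tuple(evidence.get(v, slice(None)) for v in f_vars)
    return [v for v in f_vars if v not in evidence], f_arr[index]


def dense_marginalize(f, drop):
    """Sum out every variable in `drop`."""
    f_vars, f_arr = f
    axes = tuple(i for i, v in enumerate(f_vars) if v in drop)
    return [v for v in f_vars if v not in drop], f_arr.sum(axis=axes)


# CPDs for A, R, J and Q; each array axis is indexed by one variable's state,
# in the order of the name list (P(J | A, R) is stored as [J, A, R]).
p_a = (["A"], np.array([0.2, 0.8]))
p_r = (["R"], np.array([0.4, 0.6]))
p_j = (
    ["J", "A", "R"],
    np.array([[0.9, 0.6, 0.7, 0.1], [0.1, 0.4, 0.3, 0.9]]).reshape(2, 2, 2),
)
p_q = (["Q", "J"], np.array([[0.9, 0.2], [0.1, 0.8]]))

# Same steps as SimpleInference.query: product, reduce, normalize, marginalize
joint = p_a
for factor in (p_r, p_j, p_q):
    joint = dense_product(joint, factor)
r_vars, r_arr = dense_reduce(joint, {"J": 0, "R": 1})
r_arr = r_arr / r_arr.sum()
vars_, posterior = dense_marginalize((r_vars, r_arr), {"Q"})
print(vars_, posterior)
```

This should give approximately 0.6 for A = 0 and 0.4 for A = 1, the same numbers SimpleInference reports for this query. Using einsum for the product keeps the axis alignment explicit by variable name, which is the part a real factor class has to get right.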
```python
# Defining a model
# (newer pgmpy releases rename BayesianModel to BayesianNetwork)
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

# Structure: A -> J <- R, J -> Q, J -> L <- G
model = BayesianModel([("A", "J"), ("R", "J"), ("J", "Q"), ("J", "L"), ("G", "L")])

cpd_a = TabularCPD("A", 2, values=[[0.2], [0.8]])
cpd_r = TabularCPD("R", 2, values=[[0.4], [0.6]])
cpd_j = TabularCPD(
    "J",
    2,
    values=[[0.9, 0.6, 0.7, 0.1], [0.1, 0.4, 0.3, 0.9]],
    evidence=["A", "R"],
    evidence_card=[2, 2],
)
cpd_q = TabularCPD(
    "Q", 2, values=[[0.9, 0.2], [0.1, 0.8]], evidence=["J"], evidence_card=[2]
)
cpd_l = TabularCPD(
    "L",
    2,
    values=[[0.9, 0.45, 0.8, 0.1], [0.1, 0.55, 0.2, 0.9]],
    evidence=["J", "G"],
    evidence_card=[2, 2],
)
cpd_g = TabularCPD("G", 2, values=[[0.6], [0.4]])
model.add_cpds(cpd_a, cpd_r, cpd_j, cpd_q, cpd_l, cpd_g)
# Raises an error if the CPDs are inconsistent with the model structure
model.check_model()
```
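The values list of a TabularCPD has one row per state of the variable and one column per combination of evidence states, with the first evidence variable varying slowest. A minimal NumPy check (plain arrays, not pgmpy itself) of how the columns of cpd_j map to (A, R) combinations, and how reshaping gives a table indexed as [J, A, R]:

```python
import itertools

import numpy as np

# Values of cpd_j copied from the model definition above: rows are states of J,
# columns are combinations of the evidence variables ["A", "R"].
values_j = np.array([[0.9, 0.6, 0.7, 0.1],
                     [0.1, 0.4, 0.3, 0.9]])

# Column k is the k-th (A, R) pair in itertools.product order,
# i.e. the first evidence variable (A) varies slowest.
columns = list(itertools.product(range(2), range(2)))

# Reshaping to (card_J, card_A, card_R) allows direct lookup of P(J | A, R)
table_j = values_j.reshape(2, 2, 2)

for k, (a, r) in enumerate(columns):
    print(f"P(J=0 | A={a}, R={r}) = {values_j[0, k]}")
```

Every column must sum to 1, since each column is a full distribution over J for one evidence combination.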
```python
# Doing inference with our SimpleInference
infer = SimpleInference(model)
# Evidence is a list of (variable, state) tuples: J = 0 and R = 1
a = infer.query(var=["A"], evidence=[("J", 0), ("R", 1)])
```
```python
print(a)
```

```
+-----+----------+
| A   |   phi(A) |
|-----+----------|
| A_0 |   0.6000 |
| A_1 |   0.4000 |
+-----+----------+
```
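These numbers can be checked by hand. Q and L are unobserved leaves and G only feeds into L, so summing them out contributes a factor of 1, and the posterior depends only on P(A), P(R) and P(J | A, R). The constant P(R = 1) = 0.6 cancels in the normalization:

```latex
\begin{aligned}
P(A{=}a \mid J{=}0, R{=}1) &\propto P(A{=}a)\,P(R{=}1)\,P(J{=}0 \mid A{=}a, R{=}1) \\
a = 0:&\quad 0.2 \times 0.6 \times 0.6 = 0.072 \\
a = 1:&\quad 0.8 \times 0.6 \times 0.1 = 0.048 \\
P(A{=}0 \mid J{=}0, R{=}1) &= \tfrac{0.072}{0.072 + 0.048} = 0.6,
\qquad P(A{=}1 \mid J{=}0, R{=}1) = \tfrac{0.048}{0.120} = 0.4
\end{aligned}
```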
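```python
# Comparing the results with the Variable Elimination algorithm
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
# Evidence is passed as a dict here; query returns a single factor over the
# query variables (older pgmpy releases returned a dict keyed by variable)
result = infer.query(["A"], evidence={"J": 0, "R": 1})
print(result)
```

```
+-----+----------+
| A   |   phi(A) |
|-----+----------|
| A_0 |   0.6000 |
| A_1 |   0.4000 |
+-----+----------+
```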
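SimpleInference.query takes evidence as a list of (variable, state) tuples, whereas VariableElimination.query takes a dict. If you want the prototype to accept the same dict format, a small conversion at the top of query is enough. The helper below is a hypothetical sketch, not part of pgmpy:

```python
def evidence_to_tuples(evidence):
    """Convert {"J": 0, "R": 1}-style evidence into [("J", 0), ("R", 1)].

    Hypothetical helper: lets SimpleInference.query accept the same
    dict-style evidence that VariableElimination.query uses.
    """
    if evidence is None:
        return []
    if isinstance(evidence, dict):
        # Dicts preserve insertion order, so the tuple order matches the input
        return list(evidence.items())
    # Assume it is already an iterable of (variable, state) pairs
    return list(evidence)


print(evidence_to_tuples({"J": 0, "R": 1}))  # [('J', 0), ('R', 1)]
```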
In the same way, you can create new factor or CPD classes, add them to networks and run inference over them, or write a new estimator class.
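The core of a parameter estimator is usually a counting step. As a sketch of what a new estimator class might compute, here is maximum-likelihood estimation of a CPD such as P(J | A, R) from complete samples, in plain Python. It does not use pgmpy's BaseEstimator interface; the function name mle_cpd, the list-of-dicts data layout and the toy samples are assumptions for illustration only.

```python
from collections import Counter
from itertools import product


def mle_cpd(samples, variable, parents, card):
    """Maximum-likelihood estimate of P(variable | parents) from complete samples.

    samples: list of dicts mapping variable name -> integer state (assumed layout).
    card:    dict mapping variable name -> number of states.
    Returns {parent_states_tuple: [P(variable=0 | ...), P(variable=1 | ...), ...]}.
    """
    # Count each (parent configuration, child state) pair
    joint = Counter((tuple(s[p] for p in parents), s[variable]) for s in samples)
    cpd = {}
    for parent_states in product(*(range(card[p]) for p in parents)):
        counts = [joint[(parent_states, k)] for k in range(card[variable])]
        total = sum(counts)
        # Design choice of this sketch: unseen parent configurations get a
        # uniform distribution instead of a division by zero
        if total:
            cpd[parent_states] = [c / total for c in counts]
        else:
            cpd[parent_states] = [1 / card[variable]] * card[variable]
    return cpd


# Toy samples, invented purely to exercise the function
data = [
    {"A": 0, "R": 1, "J": 0},
    {"A": 0, "R": 1, "J": 1},
    {"A": 0, "R": 1, "J": 0},
    {"A": 1, "R": 0, "J": 1},
]
estimate = mle_cpd(data, "J", ["A", "R"], {"A": 2, "R": 2, "J": 2})
print(estimate)
```

The estimate for each parent configuration is simply the relative frequency of each child state; a real estimator class would wrap this in the library's interface and return a TabularCPD.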