Extending pgmpy

It’s really easy to extend pgmpy to quickly prototype your ideas. pgmpy has an abstract base class for most of its main functionalities: Inference (in pgmpy.inference.base) for inference, BaseFactor for model parameters, and BaseEstimator for parameter and model learning. To add a new feature to pgmpy, we only need to implement a new class that inherits from one of these base classes. The new class can then be used with the rest of pgmpy's functionality.
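The pattern behind this is ordinary subclassing of an abstract base class. The sketch below does not use pgmpy. `InferenceBase`, `LookupInference` and `run_query` are hypothetical names chosen for illustration. It shows shared setup living in the base class, an abstract `query` that every subclass must provide, and a helper that works with any subclass.

```python
from abc import ABC, abstractmethod


class InferenceBase(ABC):
    """Hypothetical stand-in for a library base class such as pgmpy's Inference."""

    def __init__(self, model):
        # Shared setup lives in the base class, so every subclass inherits it.
        self.model = model

    @abstractmethod
    def query(self, variable, evidence):
        """Return a distribution over `variable` given `evidence`."""


class LookupInference(InferenceBase):
    # Toy subclass: here `model` is just a dict of precomputed answers.
    def query(self, variable, evidence):
        key = (variable, tuple(sorted(evidence.items())))
        return self.model[key]


def run_query(engine: InferenceBase, variable, evidence):
    # Code written against the base class works with any concrete subclass.
    return engine.query(variable, evidence)


answers = {("A", (("J", 0), ("R", 1))): {0: 0.6, 1: 0.4}}
print(run_query(LookupInference(answers), "A", {"J": 0, "R": 1}))
```

Because `query` is abstract, instantiating `InferenceBase` directly raises a `TypeError`. This forces every new algorithm to implement the one method the rest of the code depends on.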

In this example we will see how to write a new inference algorithm. We will use a very simple algorithm: multiply all the factors/CPDs of the network into one joint factor, reduce it on the evidence, and marginalize out every variable that is neither queried nor observed to get the desired query.
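To make the algorithm concrete independently of pgmpy, here is a minimal pure-Python sketch. The helpers `cpd_to_factor` and `brute_force_query` are hypothetical and are not part of pgmpy. The sketch assumes binary variables and a TabularCPD-style column order in which the last evidence variable changes fastest. The CPD values are copied from the example model defined below. The sketch sums out the non-query variables before normalizing, which gives the same result as normalizing first.

```python
import itertools

# Hypothetical minimal factor representation: (variables, table), where `table`
# maps a tuple of states (ordered like `variables`) to a number. Binary only.


def cpd_to_factor(var, values, evidence=()):
    """Turn a TabularCPD-style value grid into a factor over (var, *evidence).

    Columns are assumed to enumerate evidence states with the last evidence
    variable changing fastest.
    """
    table = {}
    columns = itertools.product(range(2), repeat=len(evidence))
    for col, ev_states in enumerate(columns):
        for state in range(2):
            table[(state, *ev_states)] = values[state][col]
    return (var, *evidence), table


def brute_force_query(factors, var, evidence):
    """P(var | evidence) by evaluating the product of all factors row by row."""
    variables = sorted({v for names, _ in factors for v in names})
    unnormalized = {}
    for states in itertools.product(range(2), repeat=len(variables)):
        assignment = dict(zip(variables, states))
        # Reduce: skip joint rows that disagree with the evidence.
        if any(assignment[e] != s for e, s in evidence.items()):
            continue
        # Product of every factor evaluated at this full assignment.
        p = 1.0
        for names, table in factors:
            p *= table[tuple(assignment[n] for n in names)]
        # Marginalize: accumulate onto the state of the query variable.
        unnormalized[assignment[var]] = unnormalized.get(assignment[var], 0.0) + p
    # Normalize so the result sums to 1.
    z = sum(unnormalized.values())
    return {state: p / z for state, p in sorted(unnormalized.items())}


factors = [
    cpd_to_factor("A", [[0.2], [0.8]]),
    cpd_to_factor("R", [[0.4], [0.6]]),
    cpd_to_factor("J", [[0.9, 0.6, 0.7, 0.1], [0.1, 0.4, 0.3, 0.9]], ("A", "R")),
    cpd_to_factor("Q", [[0.9, 0.2], [0.1, 0.8]], ("J",)),
    cpd_to_factor("L", [[0.9, 0.45, 0.8, 0.1], [0.1, 0.55, 0.2, 0.9]], ("J", "G")),
    cpd_to_factor("G", [[0.6], [0.4]]),
]
print(brute_force_query(factors, "A", {"J": 0, "R": 1}))
```

Up to floating-point rounding, this should agree with the 0.6 / 0.4 tables printed further down. The joint table has 2^n rows for n binary variables, so this approach is only practical for very small networks.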

[1]:
# A simple Exact inference algorithm

import itertools

from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product


class SimpleInference(Inference):
    def __init__(self, model):
        super().__init__(model)
        # Populates self.factors and self.cardinality from the model's CPDs
        self._initialize_structures()

    # By inheriting Inference we can use self.model, self.factors and self.cardinality in our class
    def query(self, var, evidence):
        # self.factors is a dict of the form {node: [factors_involving_node]},
        # so a factor is listed once per variable in its scope; the set removes duplicates
        factors_list = set(itertools.chain(*self.factors.values()))
        # Multiply everything into a single joint factor
        product = factor_product(*factors_list)
        # Keep only the entries consistent with the evidence, e.g. [("J", 0), ("R", 1)]
        reduced_prod = product.reduce(evidence, inplace=False)
        reduced_prod.normalize()
        # Sum out every variable that is neither queried nor observed
        var_to_marg = (
            set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])
        )
        marg_prod = reduced_prod.marginalize(list(var_to_marg), inplace=False)
        return marg_prod
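Deduplication matters because, as the comment in `query` says, `self.factors` lists each factor under every node in its scope. Without the `set`, a shared factor would be multiplied in several times. The sketch below uses plain strings as hypothetical stand-ins for factor objects and includes only a few nodes of the example network.

```python
import itertools

# Hypothetical stand-in for self.factors (only a few nodes shown):
# each factor is listed under every variable in its scope.
factors_by_node = {
    "A": ["phi(A)", "phi(J,A,R)"],
    "R": ["phi(R)", "phi(J,A,R)"],
    "J": ["phi(J,A,R)", "phi(Q,J)", "phi(L,J,G)"],
}

flat = list(itertools.chain(*factors_by_node.values()))
unique = set(flat)
# "phi(J,A,R)" appears three times in `flat` but only once in `unique`.
print(len(flat), len(unique))
```

Here `flat` has 7 entries and `unique` has 5. Multiplying the flat list would effectively cube P(J | A, R).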
[2]:
# Defining a model

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

model = BayesianModel([("A", "J"), ("R", "J"), ("J", "Q"), ("J", "L"), ("G", "L")])
cpd_a = TabularCPD("A", 2, values=[[0.2], [0.8]])
cpd_r = TabularCPD("R", 2, values=[[0.4], [0.6]])
cpd_j = TabularCPD(
    "J",
    2,
    values=[[0.9, 0.6, 0.7, 0.1], [0.1, 0.4, 0.3, 0.9]],
    evidence=["A", "R"],
    evidence_card=[2, 2],
)
cpd_q = TabularCPD(
    "Q", 2, values=[[0.9, 0.2], [0.1, 0.8]], evidence=["J"], evidence_card=[2]
)
cpd_l = TabularCPD(
    "L",
    2,
    values=[[0.9, 0.45, 0.8, 0.1], [0.1, 0.55, 0.2, 0.9]],
    evidence=["J", "G"],
    evidence_card=[2, 2],
)
cpd_g = TabularCPD("G", 2, values=[[0.6], [0.4]])

model.add_cpds(cpd_a, cpd_r, cpd_j, cpd_q, cpd_l, cpd_g)
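Each column of a conditional CPD such as `cpd_j` is one distribution over the child, given one joint state of the evidence variables. The snippet below is plain Python that mirrors the values passed to `cpd_j`. It assumes the columns enumerate evidence states with the last evidence variable changing fastest, which is taken here to be pgmpy's TabularCPD convention. The snippet prints which (A, R) combination each column belongs to and checks that every column sums to 1.

```python
import itertools

# Values and evidence copied from cpd_j above.
j_values = [[0.9, 0.6, 0.7, 0.1], [0.1, 0.4, 0.3, 0.9]]
evidence = ["A", "R"]
evidence_card = [2, 2]

# Assumed layout: one column per joint evidence state, last variable fastest.
columns = list(itertools.product(*(range(c) for c in evidence_card)))
for col, states in enumerate(columns):
    label = ", ".join(f"{v}={s}" for v, s in zip(evidence, states))
    column = [row[col] for row in j_values]
    # Each column is a distribution over J, so it must sum to 1.
    assert abs(sum(column) - 1.0) < 1e-9
    print(f"P(J | {label}) = {column}")
```

For example, the second column gives P(J=0 | A=0, R=1) = 0.6. This is one of the two entries that determine the query result below.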
[3]:
# Doing inference with our SimpleInference

infer = SimpleInference(model)
a = infer.query(var=["A"], evidence=[("J", 0), ("R", 1)])
[4]:
print(a)
+-----+----------+
| A   |   phi(A) |
|-----+----------|
| A_0 |   0.6000 |
| A_1 |   0.4000 |
+-----+----------+
[5]:
# Comparing the results with Variable Elimination Algorithm

from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
result = infer.query(["A"], evidence={"J": 0, "R": 1})
print(result["A"])
+-----+----------+
| A   |   phi(A) |
|-----+----------|
| A_0 |   0.6000 |
| A_1 |   0.4000 |
+-----+----------+
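Both algorithms agree, and the numbers can be checked by hand. With J = 0 and R = 1 fixed, the sum over G, L and Q of their CPDs equals 1, and P(R = 1) does not depend on A. Only P(A) and the (A, R = 1) columns of `cpd_j` therefore matter:

```latex
\begin{aligned}
P(A \mid J{=}0, R{=}1)
  &\propto P(A)\,P(R{=}1)\,P(J{=}0 \mid A, R{=}1)
     \sum_{G,L,Q} P(G)\,P(L \mid J{=}0, G)\,P(Q \mid J{=}0) \\
  &\propto P(A)\,P(J{=}0 \mid A, R{=}1) \\
A{=}0 &: \quad 0.2 \times 0.6 = 0.12 \\
A{=}1 &: \quad 0.8 \times 0.1 = 0.08 \\
P(A{=}0 \mid J{=}0, R{=}1) &= \frac{0.12}{0.12 + 0.08} = 0.6,
\qquad P(A{=}1 \mid J{=}0, R{=}1) = \frac{0.08}{0.20} = 0.4
\end{aligned}
```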

Similarly, we can create new Factor or CPD classes, add them to networks and run inference over them, or write a new estimator class.
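As a rough illustration of what an estimator does, here is a minimal maximum-likelihood CPD estimator written in plain Python. `CountingEstimator` and `estimate_cpd` are hypothetical names and are not pgmpy's API. pgmpy's own estimators, such as MaximumLikelihoodEstimator, build on its estimator base classes and return TabularCPD objects. The sketch assumes fully observed samples given as dicts, and falls back to a uniform distribution for parent states that never appear in the data.

```python
import itertools
from collections import Counter


class CountingEstimator:
    """Hypothetical estimator: maximum-likelihood CPDs from fully observed data."""

    def __init__(self, data, cardinality):
        self.data = data  # list of dicts, one per sample
        self.cardinality = cardinality  # {variable: number of states}

    def estimate_cpd(self, var, parents=()):
        # Count each (parent states, child state) combination in the data.
        counts = Counter(
            (tuple(row[p] for p in parents), row[var]) for row in self.data
        )
        n_states = self.cardinality[var]
        cpd = {}
        for ps in itertools.product(*(range(self.cardinality[p]) for p in parents)):
            total = sum(counts[(ps, s)] for s in range(n_states))
            # Relative frequencies; uniform if this parent state was never seen.
            cpd[ps] = [
                counts[(ps, s)] / total if total else 1 / n_states
                for s in range(n_states)
            ]
        return cpd


# Toy samples, made up for illustration only.
data = [{"A": 0, "J": 0}, {"A": 0, "J": 1}, {"A": 1, "J": 1}, {"A": 1, "J": 1}]
est = CountingEstimator(data, {"A": 2, "J": 2})
print(est.estimate_cpd("J", parents=("A",)))  # {(0,): [0.5, 0.5], (1,): [0.0, 1.0]}
```

The returned dict maps each parent-state tuple to a distribution over the child. It corresponds to one column of a TabularCPD.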