Bayesian Estimator

class pgmpy.estimators.BayesianEstimator(model: DAG | DiscreteBayesianNetwork, data: DataFrame, **kwargs)[source]

Class used to compute parameters for a model using Bayesian Parameter Estimation. See MaximumLikelihoodEstimator for constructor parameters.

estimate_cpd(node: Hashable, prior_type: str = 'BDeu', pseudo_counts: List[List[float]] | ndarray | float | int = [], equivalent_sample_size: int | float = 5, weighted: bool = False) TabularCPD[source]

Method to estimate the CPD for a given variable.

Parameters:
  • node (int, string (any hashable python object)) – The name of the variable for which the CPD is to be estimated.

  • prior_type ('dirichlet', 'BDeu', or 'K2') –

    String indicating which type of prior to use for the model parameters (see Examples below):

    • If 'prior_type' is 'dirichlet', 'pseudo_counts' must be provided: the Dirichlet hyperparameters, given either as a single number or as a 2-D array of shape (node_card, product of parents_card) with a "virtual" count for each variable state in the CPD. The virtual counts are added to the actual state counts found in the data. (If a list is provided, a lexicographic ordering of states is assumed.)

    • If 'prior_type' is 'BDeu', an 'equivalent_sample_size' must be specified instead of 'pseudo_counts'. This is equivalent to 'prior_type=dirichlet' with uniform pseudo_counts of equivalent_sample_size/(node_cardinality*np.prod(parents_cardinalities)).

    • A prior_type of 'K2' is shorthand for 'dirichlet' with every pseudo_count set to 1, regardless of the cardinality of the variable.

  • weighted (bool) – If weighted=True, the data must contain a _weight column specifying the weight of each datapoint (row). If False, assigns an equal weight to each datapoint.

Returns:

CPD – The estimated CPD for node.

Return type:

TabularCPD

Examples

>>> import pandas as pd
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import BayesianEstimator
>>> data = pd.DataFrame(data={"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 1, 0]})
>>> model = DiscreteBayesianNetwork([("A", "C"), ("B", "C")])
>>> estimator = BayesianEstimator(model, data)
>>> cpd_C = estimator.estimate_cpd(
...     node="C",
...     prior_type="dirichlet",
...     pseudo_counts=[[1, 1, 1, 1], [2, 2, 2, 2]],
... )
>>> print(cpd_C)
+------+------+------+------+--------------------+
| A    | A(0) | A(0) | A(1) | A(1)               |
+------+------+------+------+--------------------+
| B    | B(0) | B(1) | B(0) | B(1)               |
+------+------+------+------+--------------------+
| C(0) | 0.25 | 0.25 | 0.5  | 0.3333333333333333 |
+------+------+------+------+--------------------+
| C(1) | 0.75 | 0.75 | 0.5  | 0.6666666666666666 |
+------+------+------+------+--------------------+
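A BDeu or K2 prior needs no explicit pseudo_counts. The lines below are a minimal sketch reusing the same model and data as above; the equivalent sample size and the _weight values are chosen arbitrarily for illustration.

>>> cpd_C_bdeu = estimator.estimate_cpd(node="C", prior_type="BDeu", equivalent_sample_size=10)
>>> cpd_C_k2 = estimator.estimate_cpd(node="C", prior_type="K2")
>>> # Weighted data: a '_weight' column gives each row's weight (values here are hypothetical).
>>> weighted_data = data.assign(_weight=[1.0, 2.0, 1.0])
>>> weighted_estimator = BayesianEstimator(model, weighted_data)
>>> cpd_C_weighted = weighted_estimator.estimate_cpd(node="C", prior_type="K2", weighted=True)
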
get_parameters(prior_type: str = 'BDeu', equivalent_sample_size: int | Dict[Any, int] = 5, pseudo_counts: int | Dict[Any, ndarray] | None = None, n_jobs: int = 1, weighted: bool = False) List[TabularCPD][source]

Method to estimate the model parameters (CPDs).

Parameters:
  • prior_type ('dirichlet', 'BDeu', or 'K2') –

    String indicating which type of prior to use for the model parameters (see the examples below):

    • If 'prior_type' is 'dirichlet', 'pseudo_counts' must be provided: the Dirichlet hyperparameters, given either as a single number or as a dict containing, for each variable, a 2-D array of shape (node_card, product of parents_card) with a "virtual" count for each variable state in the CPD, which is added to the state counts. (A lexicographic ordering of states is assumed.)

    • If 'prior_type' is 'BDeu', an 'equivalent_sample_size' must be specified instead of 'pseudo_counts'. This is equivalent to 'prior_type=dirichlet' with uniform pseudo_counts of equivalent_sample_size/(node_cardinality*np.prod(parents_cardinalities)) for each node. 'equivalent_sample_size' can either be a numerical value or a dict that specifies the size for each variable separately.

    • A prior_type of 'K2' is shorthand for 'dirichlet' with every pseudo_count set to 1, regardless of the cardinality of the variable.

  • equivalent_sample_size (int or dict) – Refer to prior_type for more details.

  • pseudo_counts (int or dict (default: None)) – Refer to prior_type for more details.

  • n_jobs (int (default: 1)) – Number of jobs to run in parallel. Using n_jobs > 1 for small models might be slower.

  • weighted (bool) – If weighted=True, the data must contain a _weight column specifying the weight of each datapoint (row). If False, assigns an equal weight to each datapoint.

Returns:

parameters – List of TabularCPDs, one for each variable of the model

Return type:

list

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import BayesianEstimator
>>> np.random.seed(42)
>>> values = pd.DataFrame(
...     np.random.randint(low=0, high=2, size=(1000, 4)),
...     columns=["A", "B", "C", "D"],
... )
>>> model = DiscreteBayesianNetwork([("A", "B"), ("C", "B"), ("C", "D")])
>>> estimator = BayesianEstimator(model, values)
>>> estimator.get_parameters(prior_type="BDeu", equivalent_sample_size=5)
[<TabularCPD representing P(A:2) at 0x...>,
 <TabularCPD representing P(B:2 | A:2, C:2) at 0x...>,
 <TabularCPD representing P(C:2) at 0x...>,
 <TabularCPD representing P(D:2 | C:2) at 0x...>]
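
A minimal sketch of a dirichlet prior with per-variable hyperparameters, reusing the model and estimator above. Each array must have shape (node_card, product of parents_card); the all-ones counts and the per-variable equivalent sample sizes below are chosen arbitrarily for illustration.

>>> pseudo_counts = {
...     "A": np.ones((2, 1)),
...     "B": np.ones((2, 4)),  # B has parents A and C, so 2 * 2 = 4 parent configurations
...     "C": np.ones((2, 1)),
...     "D": np.ones((2, 2)),  # D has parent C with 2 states
... }
>>> cpds = estimator.get_parameters(prior_type="dirichlet", pseudo_counts=pseudo_counts)
>>> # 'equivalent_sample_size' may also be given per variable as a dict:
>>> cpds = estimator.get_parameters(
...     prior_type="BDeu", equivalent_sample_size={"A": 5, "B": 10, "C": 5, "D": 10}
... )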