Bayesian Estimator¶
- class pgmpy.estimators.BayesianEstimator(model, data, **kwargs)[source]¶
Class used to compute parameters for a model using Bayesian Parameter Estimation. See MaximumLikelihoodEstimator for constructor parameters.
- estimate_cpd(node, prior_type='BDeu', pseudo_counts=[], equivalent_sample_size=5, weighted=False)[source]¶
Method to estimate the CPD for a given variable.
- Parameters:
node (int, string (any hashable python object)) – The name of the variable for which the CPD is to be estimated.
prior_type ('dirichlet', 'BDeu', 'K2',) –
string indicting which type of prior to use for the model parameters. - If ‘prior_type’ is ‘dirichlet’, the following must be provided:
- ’pseudo_counts’ = dirichlet hyperparameters; a single number or 2-D array
of shape (node_card, product of parents_card) with a “virtual” count for each variable state in the CPD. The virtual counts are added to the actual state counts found in the data. (if a list is provided, a lexicographic ordering of states is assumed)
- If ‘prior_type’ is ‘BDeu’, then an ‘equivalent_sample_size’
must be specified instead of ‘pseudo_counts’. This is equivalent to ‘prior_type=dirichlet’ and using uniform ‘pseudo_counts’ of equivalent_sample_size/(node_cardinality*np.prod(parents_cardinalities)).
A prior_type of ‘K2’ is a shorthand for ‘dirichlet’ + setting every pseudo_count to 1, regardless of the cardinality of the variable.
weighted (bool) – If weighted=True, the data must contain a _weight column specifying the weight of each datapoint (row). If False, assigns an equal weight to each datapoint.
- Returns:
CPD – The estimated CPD for node.
- Return type:
Examples
>>> import pandas as pd >>> from pgmpy.models import BayesianNetwork >>> from pgmpy.estimators import BayesianEstimator >>> data = pd.DataFrame(data={'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 1, 0]}) >>> model = BayesianNetwork([('A', 'C'), ('B', 'C')]) >>> estimator = BayesianEstimator(model, data) >>> cpd_C = estimator.estimate_cpd('C', prior_type="dirichlet", ... pseudo_counts=[[1, 1, 1, 1], ... [2, 2, 2, 2]]) >>> print(cpd_C) ╒══════╤══════╤══════╤══════╤════════════════════╕ │ A │ A(0) │ A(0) │ A(1) │ A(1) │ ├──────┼──────┼──────┼──────┼────────────────────┤ │ B │ B(0) │ B(1) │ B(0) │ B(1) │ ├──────┼──────┼──────┼──────┼────────────────────┤ │ C(0) │ 0.25 │ 0.25 │ 0.5 │ 0.3333333333333333 │ ├──────┼──────┼──────┼──────┼────────────────────┤ │ C(1) │ 0.75 │ 0.75 │ 0.5 │ 0.6666666666666666 │ ╘══════╧══════╧══════╧══════╧════════════════════╛
- get_parameters(prior_type='BDeu', equivalent_sample_size=5, pseudo_counts=None, n_jobs=1, weighted=False)[source]¶
Method to estimate the model parameters (CPDs).
- Parameters:
prior_type ('dirichlet', 'BDeu', or 'K2') –
string indicting which type of prior to use for the model parameters. - If ‘prior_type’ is ‘dirichlet’, the following must be provided:
- ’pseudo_counts’ = dirichlet hyperparameters; a single number or a dict containing, for each
variable, a 2-D array of the shape (node_card, product of parents_card) with a “virtual” count for each variable state in the CPD, that is added to the state counts. (lexicographic ordering of states assumed)
- If ‘prior_type’ is ‘BDeu’, then an ‘equivalent_sample_size’
must be specified instead of ‘pseudo_counts’. This is equivalent to ‘prior_type=dirichlet’ and using uniform ‘pseudo_counts’ of equivalent_sample_size/(node_cardinality*np.prod(parents_cardinalities)) for each node. ‘equivalent_sample_size’ can either be a numerical value or a dict that specifies the size for each variable separately.
- A prior_type of ‘K2’ is a shorthand for ‘dirichlet’ + setting every pseudo_count to 1,
regardless of the cardinality of the variable.
equivalent_sample_size (int) – Refer prior_type for more details.
pseudo_counts (int (default: None)) – Refer prior_type for more details.
n_jobs (int (default: 1)) – Number of jobs to run in parallel. Default: 1. Using n_jobs > 1 for small models might be slower.
weighted (bool) – If weighted=True, the data must contain a _weight column specifying the weight of each datapoint (row). If False, assigns an equal weight to each datapoint.
- Returns:
parameters – List of TabularCPDs, one for each variable of the model
- Return type:
Examples
>>> import numpy as np >>> import pandas as pd >>> from pgmpy.models import BayesianNetwork >>> from pgmpy.estimators import BayesianEstimator >>> values = pd.DataFrame(np.random.randint(low=0, high=2, size=(1000, 4)), ... columns=['A', 'B', 'C', 'D']) >>> model = BayesianNetwork([('A', 'B'), ('C', 'B'), ('C', 'D')]) >>> estimator = BayesianEstimator(model, values) >>> estimator.get_parameters(prior_type='BDeu', equivalent_sample_size=5) [<TabularCPD representing P(C:2) at 0x7f7b534251d0>, <TabularCPD representing P(B:2 | C:2, A:2) at 0x7f7b4dfd4da0>, <TabularCPD representing P(A:2) at 0x7f7b4dfd4fd0>, <TabularCPD representing P(D:2 | C:2) at 0x7f7b4df822b0>]