Hill Climb Search

class pgmpy.estimators.HillClimbSearch(data: DataFrame, use_cache: bool = True, **kwargs)

Class for heuristic hill climb search over DAGs, used to learn network structure from data. The estimate method attempts to find a model with an optimal score.

Parameters:

data (pandas DataFrame object) – DataFrame where each column represents one variable. Missing values should be set to numpy.nan. Note that pandas converts each column containing numpy.nan values to dtype float.
state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states (or values) that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.
use_cache (boolean) – If True, caches scores for faster computation. Note: caching only works for decomposable scoring methods and can give wrong results with custom scoring methods.

References

Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009 Section 18.4.3 (page 811ff)
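Why caching depends on decomposability can be seen in a small sketch. A decomposable score is a sum of per-node terms that each depend only on a node and its parent set, so every term can be memoized under the key (variable, parents). The local_score function and the calls counter below are hypothetical stand-ins written for illustration; they are not pgmpy's implementation.

```python
from functools import lru_cache

calls = {"n": 0}  # counts how often the "expensive" local score is really computed


@lru_cache(maxsize=None)
def local_score(variable, parents):
    """Hypothetical local score; `parents` must be hashable (a sorted tuple)."""
    calls["n"] += 1
    return -float(len(parents))  # placeholder value, not a real score


def total_score(dag):
    """Decomposable score: sum of local scores over a {node: parent list} mapping."""
    return sum(local_score(v, tuple(sorted(ps))) for v, ps in dag.items())


dag = {"A": [], "B": ["A"], "C": ["A", "B"]}
first = total_score(dag)

# Removing the edge B -> C changes only C's parent set, so only C is rescored.
dag_after = {"A": [], "B": ["A"], "C": ["A"]}
second = total_score(dag_after)
print(first, second, calls["n"])
```

A score whose value for one node depends on the rest of the graph would break this scheme, because the cached entry for an unchanged (variable, parents) pair could be stale.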

estimate(scoring_method: str | StructureScore | None = None, start_dag: DAG | None = None, tabu_length: int = 100, max_indegree: int | None = None, expert_knowledge: ExpertKnowledge | None = None, epsilon: float = 0.0001, max_iter: int = 1000000, show_progress: bool = True) → DAG

Performs local hill climb search to estimate the DAG structure that has optimal score, according to the scoring method supplied. Starts at model start_dag and proceeds by step-by-step network modifications until a local maximum is reached. Only estimates network structure, no parametrization.

Parameters:

scoring_method (str or StructureScore instance) – The score to be optimized during structure estimation. Supported structure scores: k2, bdeu, bds, bic-d, aic-d, ll-g, aic-g, bic-g, ll-cg, aic-cg, bic-cg. Also accepts a custom score, but it should be an instance of StructureScore.
start_dag (DAG instance) – The starting point for the local search. By default, a completely disconnected network is used.
tabu_length (int) – If provided, the last tabu_length graph modifications cannot be reversed during the search procedure. This serves to enforce a wider exploration of the search space. Default value: 100.
max_indegree (int or None) – If not None, the procedure only searches among models in which every node has at most max_indegree parents. Defaults to None.
expert_knowledge (pgmpy.estimators.ExpertKnowledge instance (default: None)) – Expert knowledge to be used with the algorithm. Expert knowledge allows specification of required and forbidden edges, as well as temporal order of nodes.
epsilon (float (default: 1e-4)) – Defines the exit condition. If the improvement in score is less than epsilon, the learned model is returned.
max_iter (int (default: 1e6)) – The maximum number of iterations allowed. Returns the learned model when the number of iterations is greater than max_iter.

Returns:

Estimated model – A DAG at a (local) score maximum.

Return type:

pgmpy.base.DAG

Examples

>>> # Simulate some sample data from a known model to learn the model structure from
>>> from pgmpy.utils import get_example_model
>>> model = get_example_model("alarm")
>>> df = model.simulate(int(1e3))

>>> # Learn the model structure using HillClimbSearch algorithm from `df`
>>> from pgmpy.estimators import HillClimbSearch
>>> est = HillClimbSearch(df)
>>> dag = est.estimate(scoring_method="bic-d")
>>> len(dag.nodes())
37
>>> len(dag.edges())
45
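The search procedure described for estimate (single-edge modifications, a tabu list of recently reversed operations, and an epsilon stopping rule) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not pgmpy's code: the helper names (is_acyclic, neighbours, reverse_of, hill_climb) and the toy score are hypothetical, and the full graph is rescored at each step instead of using local score deltas.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (peel off root nodes)."""
    remaining = set(nodes)
    while remaining:
        roots = {
            n for n in remaining
            if not any(v == n and u in remaining for u, v in edges)
        }
        if not roots:
            return False  # every remaining node has a parent: a cycle exists
        remaining -= roots
    return True


def neighbours(nodes, edges):
    """Yield (operation, new_edges) for each legal add, remove, or flip."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield ("-", (u, v)), edges - {(u, v)}
            flipped = (edges - {(u, v)}) | {(v, u)}
            if is_acyclic(nodes, flipped):
                yield ("flip", (u, v)), flipped
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(nodes, added):
                yield ("+", (u, v)), added


def reverse_of(op):
    """Operation that would undo `op`; this is what goes on the tabu list."""
    kind, (u, v) = op
    if kind == "+":
        return ("-", (u, v))
    if kind == "-":
        return ("+", (u, v))
    return ("flip", (v, u))


def hill_climb(nodes, score, tabu_length=100, epsilon=1e-4, max_iter=1000):
    current, tabu = frozenset(), []  # start from the empty graph
    for _ in range(max_iter):
        base = score(current)
        best_delta, best_op, best_edges = 0.0, None, None
        for op, candidate in neighbours(nodes, current):
            if op in tabu:
                continue
            delta = score(candidate) - base
            if delta > best_delta:
                best_delta, best_op, best_edges = delta, op, candidate
        if best_op is None or best_delta < epsilon:
            break  # local maximum, or improvement below the threshold
        current = best_edges
        tabu.append(reverse_of(best_op))
        tabu = tabu[-tabu_length:] if tabu_length > 0 else []
    return current


# Toy score (hypothetical): negative edge-wise distance to a known target DAG.
target = frozenset({("A", "B"), ("B", "C")})
learned = hill_climb(["A", "B", "C"], lambda e: -len(e ^ target))
print(sorted(learned))
```

With this toy score every correct edge addition improves the score by exactly one, so the search recovers the target graph; real structure scores have many local maxima, which is what the tabu list is meant to mitigate.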

Structure Scores

BDeu Score

class pgmpy.estimators.BDeu(data, equivalent_sample_size=10, **kwargs)

BDeu structure score for discrete Bayesian networks with Dirichlet priors.

The BDeu score evaluates Bayesian network structures using an “equivalent sample size” to define Dirichlet prior hyperparameters, making it flexible for various data sizes and uncertainty levels. Use this score when you want to control the influence of your prior belief through the equivalent sample size.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a discrete variable. Missing values should be set as numpy.nan. Note: pandas converts such columns to dtype float.
equivalent_sample_size (int, optional (default: 10)) – The equivalent (imaginary) sample size for the Dirichlet hyperparameters. The score is sensitive to this value; experiment with different values as needed.
state_names (dict, optional) – Dictionary mapping variable names to their discrete states. If not specified, unique values observed in the data are used as possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import BDeu
>>> data = pd.DataFrame({"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]})
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> bdeu_score = BDeu(data, equivalent_sample_size=5)
>>> print(bdeu_score.score(model))
-241.872

Raises:

ValueError – If the data contains continuous variables, or if the model variables are not present in the data.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4–18.3.6 (esp. page 806).
[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)

Computes the local BDeu score for a given variable and its parent variables.

This method calculates how well a given variable is explained by its parents according to the BDeu scoring metric, incorporating the equivalent sample size as the Dirichlet prior.

Parameters:

variable (str) – The name of the variable for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local BDeu score for the specified variable and parent configuration.

Return type:

float

Raises:

ValueError – If variable or any parent is not found in state_names or data.
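To make the role of the equivalent sample size concrete, the textbook BDeu local score can be computed directly from counts. The sketch below is an illustration of the standard formula, not pgmpy's implementation; the function name bdeu_local_score, the rows format (a list of dicts), and the states argument are assumptions made for the example.

```python
from collections import Counter
from math import lgamma, log, prod


def bdeu_local_score(rows, variable, parents, states, ess=10):
    """Textbook BDeu local score from raw counts.

    rows   : list of dicts, one per sample
    states : dict mapping each variable to its list of possible states
    ess    : equivalent sample size
    """
    r = len(states[variable])                  # child cardinality
    q = prod(len(states[p]) for p in parents)  # number of parent configurations
    alpha_j = ess / q                          # pseudo-count per parent configuration
    alpha_jk = ess / (q * r)                   # pseudo-count per (configuration, state)

    n_jk = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)
    n_j = Counter(tuple(row[p] for p in parents) for row in rows)

    score = 0.0
    # Unobserved parent configurations contribute exactly zero, so skip them.
    for config, count in n_j.items():
        score += lgamma(alpha_j) - lgamma(alpha_j + count)
        for state in states[variable]:
            score += lgamma(alpha_jk + n_jk[(config, state)]) - lgamma(alpha_jk)
    return score


rows = [{"X": 0}, {"X": 1}, {"X": 1}, {"X": 0}]
value = bdeu_local_score(rows, "X", [], {"X": [0, 1]}, ess=2)
print(value)
```

With no parents and ess=2 both pseudo-counts equal 1 (a uniform prior), so the score reduces to log(2!·2!/5!) = log(1/30), the marginal likelihood of two zeros and two ones in a fixed order.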

BIC Score

class pgmpy.estimators.BIC(data, **kwargs)

BIC (Bayesian Information Criterion) structure score for discrete Bayesian networks.

The BIC score, also known as the Minimum Description Length (MDL) score, evaluates Bayesian network structures using a log-likelihood term with a complexity penalty to discourage overfitting. Use this score for structure learning when you want to balance model fit with simplicity.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a discrete variable. Missing values should be set as numpy.nan. Note: pandas converts such columns to dtype float.
state_names (dict, optional) – Dictionary mapping variable names to their discrete states. If not specified, unique values observed in the data are used as possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import BIC
>>> data = pd.DataFrame({"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]})
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> bic_score = BIC(data)
>>> print(bic_score.score(model))
-151.47

Raises:

ValueError – If the data contains continuous variables, or if the model variables are not present in the data.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4–18.3.6 (esp. page 802).
[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)

Computes the local BIC/MDL score for a variable and its parent variables.

This method quantifies the fit of a variable to its parent set in the network, balancing log-likelihood with a complexity penalty to discourage overfitting.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local BIC score for the specified variable and parent configuration.

Return type:

float

Examples

>>> variable = "B"
>>> parents = ["A"]
>>> score = bic_score.local_score(variable, parents)
>>> print(score)
-19.315

Raises:

ValueError – If variable or any parent is not present in state_names or data, or if the data contains unsupported types (e.g., continuous values).
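The balance between log-likelihood and complexity penalty can be worked through by hand. The following sketch uses the standard discrete BIC form, log-likelihood minus 0.5·log(N)·q·(r−1), where r is the child's cardinality and q the number of possible parent configurations. The function name, the rows format, and the toy data are assumptions for illustration and are independent of the example data used above.

```python
from collections import Counter
from math import log, prod


def bic_local_score(rows, variable, parents, states):
    """Standard discrete BIC local score: log-likelihood minus complexity penalty."""
    n = len(rows)
    r = len(states[variable])                  # child cardinality
    q = prod(len(states[p]) for p in parents)  # possible parent configurations

    n_jk = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)
    n_j = Counter(tuple(row[p] for p in parents) for row in rows)

    # Maximum-likelihood log-likelihood: sum of N_jk * log(N_jk / N_j).
    loglik = sum(c * log(c / n_j[config]) for (config, _), c in n_jk.items())
    penalty = 0.5 * log(n) * q * (r - 1)
    return loglik - penalty


# Toy data in which X simply copies its parent P.
rows = [{"P": 0, "X": 0}, {"P": 0, "X": 0}, {"P": 1, "X": 1}, {"P": 1, "X": 1}]
states = {"P": [0, 1], "X": [0, 1]}
with_parent = bic_local_score(rows, "X", ["P"], states)
without_parent = bic_local_score(rows, "X", [], states)
print(with_parent, without_parent)
```

Here the parent explains X perfectly, so the log-likelihood term is zero and the score equals the penalty alone, −log(4); without the parent the likelihood loss outweighs the smaller penalty, so the edge P → X is preferred.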

AIC Score

class pgmpy.estimators.AIC(data, **kwargs)

AIC (Akaike Information Criterion) structure score for discrete Bayesian networks.

The AIC score evaluates Bayesian network structures using a log-likelihood term with a penalty for model complexity to discourage overfitting. Unlike BIC, the penalty term does not grow with sample size, so on all but very small datasets AIC penalizes additional parameters less than BIC and gives more weight to goodness of fit.

Use this score when you want to select a network structure that balances model fit with simplicity, especially in contexts with moderate or small sample sizes.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a discrete variable. Missing values should be set as numpy.nan. Note: pandas converts such columns to dtype float.
state_names (dict, optional) – Dictionary mapping variable names to their discrete states. If not specified, unique values observed in the data are used as possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import AIC
>>> data = pd.DataFrame({"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]})
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> aic_score = AIC(data)
>>> print(aic_score.score(model))
-140.12

Raises:

ValueError – If the data contains continuous variables, or if the model variables are not present in the data.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4–18.3.6 (esp. page 802).
[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)

Computes the local AIC score for a variable and its parent variables.

This method quantifies the fit of a variable to its parent set in the network, balancing log-likelihood with a complexity penalty to avoid overfitting.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local AIC score for the specified variable and parent configuration.

Return type:

float

Examples

>>> variable = "B"
>>> parents = ["A"]
>>> score = aic_score.local_score(variable, parents)
>>> print(score)
-17.032

Raises:

ValueError – If variable or any parent is not present in state_names or data, or if the data contains unsupported types (e.g., continuous values).
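How the AIC and BIC penalties differ with sample size is easiest to see numerically. The sketch below assumes the standard forms (AIC penalty k, BIC penalty 0.5·k·log(N), with k = q·(r−1) free parameters for a discrete node); the helper names are hypothetical.

```python
from math import log


def free_parameters(child_states, parent_cardinalities):
    """Free parameters of a discrete CPD: q * (r - 1)."""
    q = 1
    for cardinality in parent_cardinalities:
        q *= cardinality
    return q * (child_states - 1)


def aic_penalty(k):
    """AIC penalty: one unit per free parameter, independent of sample size."""
    return float(k)


def bic_penalty(k, n):
    """BIC penalty: grows with the logarithm of the sample size n."""
    return 0.5 * k * log(n)


k = free_parameters(2, [2])  # binary child with one binary parent
for n in (4, 100, 10_000):
    print(n, aic_penalty(k), round(bic_penalty(k, n), 3))
```

For k = 2 the BIC penalty is about 1.386 at N = 4 and about 4.605 at N = 100, while the AIC penalty stays at 2. The two coincide at N = e² ≈ 7.39, so BIC is the stricter criterion for any sample of eight or more observations.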

K2 Score

class pgmpy.estimators.K2(data, **kwargs)

K2 structure score for discrete Bayesian networks using Dirichlet priors.

The K2 score is commonly used to evaluate the fit of a Bayesian network structure on fully discrete data, assuming all Dirichlet hyperparameters (pseudo-counts) are set to 1. This metric is suitable for structure learning when variables are categorical and no prior preference for particular parameterizations is assumed.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a discrete variable. Missing values should be set to numpy.nan. (Note: pandas will convert columns with numpy.nan to dtype float.)
state_names (dict, optional) – Dictionary mapping each variable to its discrete states. If not specified, the unique values observed in the data are used as possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import K2
>>> data = pd.DataFrame({"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]})
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> k2_score = K2(data)
>>> print(k2_score.score(model))
-356.1785

Raises:

ValueError – If the data contains continuous variables, or if the model variables are not present in the data.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4–18.3.6 (esp. page 806).
[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)

Computes the local K2 score for a discrete variable and its parent variables.

The K2 local score measures how well the conditional probability distribution of variable given its parents fits the observed data, assuming uniform Dirichlet priors (all hyperparameters set to 1). The calculation is based on marginal and conditional counts, and is suitable for fully discrete Bayesian networks.

Parameters:

variable (str) – The name of the target variable (child node).
parents (list of str) – List of parent variable names (categorical/discrete).

Returns:

score – The local K2 score for the specified variable and parent configuration.

Return type:

float

Examples

>>> variable = "B"
>>> parents = ["A"]
>>> s = k2_score.local_score(variable, parents)
>>> print(s)
-42.18

Raises:

ValueError – If variable or any parent is not present in state_names or data, or if the data is not fully discrete.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4–18.3.6 (esp. page 806).
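The K2 local score described above, with every Dirichlet hyperparameter fixed at 1, can be written in a few lines from marginal and conditional counts. This is a sketch of the textbook formula, not pgmpy's code; the function name, the rows format, and the toy data are assumptions.

```python
from collections import Counter
from math import lgamma, log


def k2_local_score(rows, variable, parents, states):
    """Textbook K2 local score: Dirichlet prior with all pseudo-counts equal to 1."""
    r = len(states[variable])  # child cardinality
    n_jk = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)
    n_j = Counter(tuple(row[p] for p in parents) for row in rows)

    score = 0.0
    # Parent configurations that never occur contribute zero and are skipped.
    for config, count in n_j.items():
        score += lgamma(r) - lgamma(count + r)
        for state in states[variable]:
            score += lgamma(n_jk[(config, state)] + 1)
    return score


# Toy data in which X simply copies its parent P.
rows = [{"P": 0, "X": 0}, {"P": 0, "X": 0}, {"P": 1, "X": 1}, {"P": 1, "X": 1}]
states = {"P": [0, 1], "X": [0, 1]}
value = k2_local_score(rows, "X", ["P"], states)
print(value)
```

Each parent configuration here contains two identical observations, whose marginal likelihood under a uniform prior is 1/3, so the total is 2·log(1/3).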

BDs Score

class pgmpy.estimators.BDs(data, equivalent_sample_size=10, **kwargs)

BDs (Bayesian Dirichlet sparse) structure score for discrete Bayesian networks.

The BDs score is a variant of the BDeu score that distributes the equivalent sample size only over the parent configurations actually observed in the data when setting the Dirichlet hyperparameters (pseudo-counts), leading to improved scoring in sparse or partially observed data scenarios.

Use this score when you expect many possible parent configurations in your data to be unobserved (common in sparse or high-dimensional discrete datasets).

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a discrete variable. Missing values should be set as numpy.nan. Note: pandas converts such columns to dtype float.
equivalent_sample_size (int, optional (default: 10)) – The equivalent (imaginary) sample size for the Dirichlet hyperparameters. The score is sensitive to this value; try different values if needed.
state_names (dict, optional) – Dictionary mapping variable names to their discrete states. If not specified, unique values observed in the data are used as possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import BDs
>>> data = pd.DataFrame({"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]})
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> bds_score = BDs(data, equivalent_sample_size=5)
>>> print(bds_score.score(model))
-210.314

Raises:

ValueError – If the data contains continuous variables, or if the model variables are not present in the data.

References

[1] Scutari, Marco. An Empirical-Bayes Score for Discrete Bayesian Networks. Journal of Machine Learning Research, 2016, pp. 438–48

local_score(variable, parents)

Computes the local BDs score for a variable and its parent variables.

The BDs local score quantifies how well the given variable is explained by its specified parent set, using a Bayesian Dirichlet sparse prior. The hyperparameters are adjusted based on the number of observed parent configurations, making the score more robust in sparse data scenarios.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local BDs score for the specified variable and parent configuration.

Return type:

float

Examples

>>> variable = "B"
>>> parents = ["A"]
>>> score = bds_score.local_score(variable, parents)
>>> print(score)
-38.215

Raises:

ValueError – If variable or any parent is not present in state_names or data, or if the data contains unsupported types (e.g., continuous values).
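The difference from BDeu lies only in how the pseudo-counts are set: BDs divides the equivalent sample size by the number of parent configurations that actually occur in the data. The sketch below follows that description and the definition in the cited reference; it is not pgmpy's implementation, and the function name, rows format, and toy data are assumptions. It assumes at least one sample.

```python
from collections import Counter
from math import lgamma, log


def bds_local_score(rows, variable, parents, states, ess=10):
    """BDs-style local score: pseudo-counts spread over observed configurations only."""
    r = len(states[variable])  # child cardinality
    n_jk = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)
    n_j = Counter(tuple(row[p] for p in parents) for row in rows)

    q_observed = len(n_j)               # parent configurations seen in the data
    alpha_j = ess / q_observed          # pseudo-count per observed configuration
    alpha_jk = ess / (q_observed * r)   # pseudo-count per (configuration, state)

    score = 0.0
    for config, count in n_j.items():
        score += lgamma(alpha_j) - lgamma(alpha_j + count)
        for state in states[variable]:
            score += lgamma(alpha_jk + n_jk[(config, state)]) - lgamma(alpha_jk)
    return score


# P has three possible states, but state 2 never appears in the data.
rows = [{"P": 0, "X": 0}, {"P": 0, "X": 0}, {"P": 1, "X": 1}, {"P": 1, "X": 1}]
states = {"P": [0, 1, 2], "X": [0, 1]}
value = bds_local_score(rows, "X", ["P"], states, ess=4)
print(value)
```

Because only two of the three parent states are observed, the equivalent sample size of 4 is split over two configurations, not three, giving a pseudo-count of 1 per cell and a score of 2·log(1/3).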

structure_prior(model)

Computes the marginal uniform prior for a Bayesian network structure.

This method assigns a marginal uniform prior to the graph structure, where the probability of an arc (edge) between any two nodes (in either direction) is 1/4, and the probability of no arc between any two nodes is 1/2. The returned value is the log prior probability for the given model structure.

Parameters:

model (DiscreteBayesianNetwork) – The Bayesian network model for which to compute the structure prior.

Returns:

score – The log prior probability of the given network structure under the marginal uniform prior.

Return type:

float

Examples

>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.estimators import BDs
>>> model = DiscreteBayesianNetwork([("A", "B"), ("C", "D")])
>>> score = BDs(data)
>>> prior = score.structure_prior(model)
>>> print(prior)
-5.545177444479562
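The marginal uniform prior can be evaluated by hand from the probabilities stated above: each node pair joined by an arc contributes log(1/4) and each pair without an arc contributes log(1/2). The following sketch is derived from that description only; the function name and the three-node graph are hypothetical.

```python
from math import log


def marginal_uniform_log_prior(nodes, edges):
    """Log prior of a DAG: log(1/4) per arc, log(1/2) per unconnected node pair."""
    n_pairs = len(nodes) * (len(nodes) - 1) // 2
    n_arcs = len(edges)
    return n_arcs * log(1 / 4) + (n_pairs - n_arcs) * log(1 / 2)


one_arc = marginal_uniform_log_prior(["X", "Y", "Z"], [("X", "Y")])
two_arcs = marginal_uniform_log_prior(["X", "Y", "Z"], [("X", "Y"), ("Y", "Z")])
print(one_arc, two_arcs - one_arc)
```

Three nodes give three pairs, so one arc yields log(1/4) + 2·log(1/2) = −4·log(2). Adding a second arc swaps one log(1/2) term for log(1/4), a change of −log(2).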

structure_prior_ratio(operation)

Computes the log ratio of prior probabilities for a proposed change to the DAG structure.

This method implements the marginal uniform prior for the graph structure, where the log prior probability ratio is -log(2) for adding an edge, log(2) for removing an edge, and 0 otherwise.

Parameters:

operation (str) – The proposed operation on the Directed Acyclic Graph (DAG). Use “+” for adding an edge, “-” for removing an edge, or other values for no change.

Returns:

prior_ratio – The log ratio of the prior probabilities for the proposed operation.

Return type:

float

Examples

>>> from pgmpy.estimators import BDs
>>> score = BDs(data)
>>> score.structure_prior_ratio("+")
-0.6931471805599453
>>> score.structure_prior_ratio("-")
0.6931471805599453
>>> score.structure_prior_ratio("noop")
0

Gaussian Log-Likelihood Score

class pgmpy.estimators.LogLikelihoodGauss(data, **kwargs)

Log-likelihood structure score for Gaussian Bayesian networks.

This score evaluates the fit of a continuous (Gaussian) Bayesian network structure by computing the (unpenalized) log-likelihood of the observed data given the model, using generalized linear modeling. It is suitable for networks with continuous variables.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a continuous variable.
state_names (dict, optional) – Dictionary mapping variable names to possible states. Not typically used for Gaussian networks.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import LogLikelihoodGauss
>>> data = pd.DataFrame(
...     {
...         "A": np.random.randn(100),
...         "B": np.random.randn(100),
...         "C": np.random.randn(100),
...     }
... )
>>> score = LogLikelihoodGauss(data)
>>> ll = score.local_score("B", ["A", "C"])
>>> print(ll)
-142.125

Raises:

ValueError – If the data contains discrete or non-numeric variables.

local_score(variable, parents)

Computes the log-likelihood score for a variable given its parent variables.

Fits a generalized linear model (GLM) for the variable as a function of its parents, and returns the resulting log-likelihood as the structure score.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The log-likelihood score for the specified variable and parent configuration.

Return type:

float

Examples

>>> ll = score.local_score("B", ["A", "C"])
>>> print(ll)
-142.125

Raises:

ValueError – If the GLM cannot be fitted due to non-numeric data or missing columns.
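For intuition, the Gaussian log-likelihood of a linear model can be computed without a GLM library in the simplest cases. The sketch below handles a variable with no parent or a single parent by ordinary least squares and uses the maximum-likelihood variance estimate (residual sum of squares divided by n). That choice is an assumption: a GLM library may estimate the scale differently, so its reported log-likelihood can differ. The function name and the data are hypothetical.

```python
from math import log, pi


def gaussian_loglik(y, x=None):
    """Log-likelihood of y under a Gaussian linear model with at most one parent x."""
    n = len(y)
    mean_y = sum(y) / n
    if x is None:
        fitted = [mean_y] * n  # intercept-only model
    else:
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / n  # maximum-likelihood variance; must be > 0
    return -0.5 * n * (log(2 * pi * sigma2) + 1)


x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 1.9, 4.1, 5.9]  # roughly y = 2x with small noise
with_parent = gaussian_loglik(y, x)
without_parent = gaussian_loglik(y)
print(with_parent, without_parent)
```

Because the log-likelihood is unpenalized, adding a parent can never decrease it; this is why the penalized BIC and AIC variants are normally used for structure search.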

Gaussian BIC Score

class pgmpy.estimators.BICGauss(data, **kwargs)

BIC (Bayesian Information Criterion) structure score for Gaussian Bayesian networks.

The BICGauss score evaluates continuous Bayesian network structures by penalizing the log-likelihood with a term proportional to the number of model parameters, discouraging overfitting. This is the Gaussian version of the BIC/MDL score, suitable for networks where all variables are continuous.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a continuous variable.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import BICGauss
>>> data = pd.DataFrame(
...     {
...         "A": np.random.randn(100),
...         "B": np.random.randn(100),
...         "C": np.random.randn(100),
...     }
... )
>>> score = BICGauss(data)
>>> s = score.local_score("B", ["A", "C"])
>>> print(s)
-111.42

Raises:

ValueError – If the GLM cannot be fitted due to missing or non-numeric data.

local_score(variable, parents)

Computes the local BIC/MDL score for a variable and its parent variables in a Gaussian Bayesian network.

The score is the log-likelihood minus a penalty term that increases with the number of model parameters and sample size.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local BICGauss score for the specified variable and parent configuration.

Return type:

float

Examples

>>> s = score.local_score("B", ["A", "C"])
>>> print(s)
-111.42

Raises:

ValueError – If the GLM cannot be fitted due to missing or non-numeric data.

Gaussian AIC Score

class pgmpy.estimators.AICGauss(data, **kwargs)

AIC (Akaike Information Criterion) structure score for Gaussian Bayesian networks.

The AICGauss score evaluates continuous Bayesian network structures by penalizing the log-likelihood with a term proportional to the number of model parameters. The penalty does not depend on sample size and, for samples of eight or more observations, is less severe than the BIC penalty, making AIC preferable for model selection with smaller datasets.

Parameters:

data (pandas.DataFrame) – DataFrame where each column represents a continuous variable.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import AICGauss
>>> data = pd.DataFrame(
...     {
...         "A": np.random.randn(100),
...         "B": np.random.randn(100),
...         "C": np.random.randn(100),
...     }
... )
>>> score = AICGauss(data)
>>> s = score.local_score("B", ["A", "C"])
>>> print(s)
-97.53

Raises:

ValueError – If the GLM cannot be fitted due to missing or non-numeric data.

local_score(variable, parents)

Computes the local AIC score for a variable and its parent variables in a Gaussian Bayesian network.

The score is the log-likelihood minus a penalty term that increases with the number of model parameters (but not sample size).

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local AICGauss score for the specified variable and parent configuration.

Return type:

float

Examples

>>> s = score.local_score("B", ["A", "C"])
>>> print(s)
-97.53

Raises:

ValueError – If the GLM cannot be fitted due to missing or non-numeric data.
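The relationship between the Gaussian log-likelihood and its AIC and BIC variants can be sketched as follows. The parameter count used here, one slope per parent plus an intercept and a residual variance, is an assumption made for illustration and may not match the count the library uses; the function name and the log-likelihood value of -140.0 are hypothetical.

```python
from math import log


def penalised_scores(loglik, n_parents, n_samples):
    """Return (AIC, BIC) scores for a Gaussian node, given its log-likelihood."""
    k = n_parents + 2  # assumption: slopes + intercept + residual variance
    aic = loglik - k
    bic = loglik - 0.5 * k * log(n_samples)
    return aic, bic


aic, bic = penalised_scores(loglik=-140.0, n_parents=2, n_samples=100)
print(aic, bic)
```

With two parents and 100 samples the AIC penalty is 4, while the BIC penalty is 2·log(100), about 9.21, so the BIC score is the lower of the two.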

Conditional Gaussian Log-Likelihood Score

class pgmpy.estimators.LogLikelihoodCondGauss(data, **kwargs)

Log-likelihood score for Bayesian networks with mixed discrete and continuous variables.

This score is based on conditional Gaussian distributions and supports networks with both discrete and continuous variables, using the methodology described in [1]. The local score computes the log-likelihood of the observed data given the network structure, handling mixed parent sets as described in the reference.

Parameters:

data (pandas.DataFrame) – DataFrame where columns can be discrete or continuous variables. Variable types should be consistent with the structure.
state_names (dict, optional) – Dictionary mapping discrete variable names to their possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import LogLikelihoodCondGauss
>>> data = pd.DataFrame(
...     {
...         "A": np.random.randn(100),
...         "B": np.random.randint(0, 2, 100),
...         "C": np.random.randn(100),
...     }
... )
>>> score = LogLikelihoodCondGauss(data)
>>> ll = score.local_score("A", ["B", "C"])
>>> print(ll)
-98.452

Raises:

ValueError – If the data or variable types are not suitable for conditional Gaussian modeling.

References

[1] Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian Networks of Mixed Variables. International Journal of Data Science and Analytics, 6(1), 3–18. https://doi.org/10.1007/s41060-017-0085-7

local_score(variable, parents)

Computes the local log-likelihood score for a variable given its parent variables in a mixed (discrete and continuous) Bayesian network.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local conditional Gaussian log-likelihood score for the specified variable and parent configuration.

Return type:

float

Examples

>>> ll = score.local_score("A", ["B", "C"])
>>> print(ll)
-98.452

Raises:

ValueError – If the log-likelihood cannot be computed due to incompatible data or variable types.
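The simplest mixed case, a continuous variable with a single discrete parent and no continuous parents, illustrates the conditional Gaussian idea: the data are partitioned by the parent's state and a separate Gaussian is fitted to each partition. The sketch below covers only that special case under a maximum-likelihood variance estimate; it is not the general method of the cited reference nor pgmpy's implementation, and all names and data are hypothetical.

```python
from collections import defaultdict
from math import log, pi


def gaussian_loglik(values):
    """Log-likelihood of values under a Gaussian with ML mean and variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # must be > 0
    return -0.5 * n * (log(2 * pi * variance) + 1)


def cond_gaussian_loglik(values, labels):
    """Sum of per-group Gaussian log-likelihoods, grouping by the discrete parent."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return sum(gaussian_loglik(group) for group in groups.values())


values = [-1, 1, -1, 1, 9, 11, 9, 11]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # discrete parent state for each sample
split = cond_gaussian_loglik(values, labels)
pooled = gaussian_loglik(values)
print(split, pooled)
```

Conditioning on the discrete parent separates the two clusters, each with unit variance, so the split log-likelihood is much higher than the pooled one, which must explain both clusters with a single wide Gaussian.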

Conditional Gaussian BIC Score

class pgmpy.estimators.BICCondGauss(data, **kwargs)

BIC (Bayesian Information Criterion) score for Bayesian networks with mixed (discrete and continuous) variables.

The BICCondGauss score evaluates network structures by penalizing the conditional log-likelihood with a term proportional to the number of free parameters and the logarithm of sample size. This approach generalizes the classic BIC to handle mixed discrete/continuous data as described in [1].

Parameters:

data (pandas.DataFrame) – DataFrame where columns may be discrete or continuous variables.
state_names (dict, optional) – Dictionary mapping discrete variable names to possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import BICCondGauss
>>> data = pd.DataFrame(
...     {
...         "A": np.random.randn(100),
...         "B": np.random.randint(0, 2, 100),
...         "C": np.random.randn(100),
...     }
... )
>>> score = BICCondGauss(data)
>>> s = score.local_score("A", ["B", "C"])
>>> print(s)
-115.37

Raises:

ValueError – If the log-likelihood or number of parameters cannot be computed for the provided variables.

References

[1] Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian Networks of Mixed Variables. International Journal of Data Science and Analytics, 6(1), 3–18. https://doi.org/10.1007/s41060-017-0085-7

local_score(variable, parents)

Computes the local BIC score for a variable and its parent set in a mixed Bayesian network.

The score is calculated as the log-likelihood minus a complexity penalty, which is proportional to the number of free parameters and the log of the sample size.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local BICCondGauss score for the specified variable and parent configuration.

Return type:

float

Examples

>>> s = score.local_score("A", ["B", "C"])
>>> print(s)
-115.37

Raises:

ValueError – If the log-likelihood or parameter count cannot be computed for the given configuration.

Conditional Gaussian AIC Score

class pgmpy.estimators.AICCondGauss(data, **kwargs)

AIC (Akaike Information Criterion) score for Bayesian networks with mixed (discrete and continuous) variables.

The AICCondGauss score evaluates network structures by penalizing the conditional log-likelihood with a term equal to the number of free parameters. This generalizes the classic AIC to handle Bayesian networks with both discrete and continuous variables [1].

Parameters:

data (pandas.DataFrame) – DataFrame where columns may be discrete or continuous variables.
state_names (dict, optional) – Dictionary mapping discrete variable names to possible states.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import AICCondGauss
>>> data = pd.DataFrame(
...     {
...         "A": np.random.randn(100),
...         "B": np.random.randint(0, 2, 100),
...         "C": np.random.randn(100),
...     }
... )
>>> score = AICCondGauss(data)
>>> s = score.local_score("A", ["B", "C"])
>>> print(s)
-99.75

Raises:

ValueError – If the log-likelihood or number of parameters cannot be computed for the provided variables.

References

[1] Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian Networks of Mixed Variables. International Journal of Data Science and Analytics, 6(1), 3–18. https://doi.org/10.1007/s41060-017-0085-7

local_score(variable, parents)

Computes the local AIC score for a variable and its parent set in a mixed Bayesian network.

The score is calculated as the log-likelihood minus the number of free parameters.

Parameters:

variable (str) – The name of the variable (node) for which the local score is to be computed.
parents (list of str) – List of variable names considered as parents of variable.

Returns:

score – The local AICCondGauss score for the specified variable and parent configuration.

Return type:

float

Examples

>>> s = score.local_score("A", ["B", "C"])
>>> print(s)
-99.75

Raises:

ValueError – If the log-likelihood or parameter count cannot be computed for the given configuration.
