Structure Score

BDeu Score

class pgmpy.estimators.BDeuScore(data, equivalent_sample_size=10, **kwargs)[source]

Class for Bayesian structure scoring for BayesianNetworks with Dirichlet priors. The BDeu score is the result of setting all Dirichlet hyperparameters/pseudo_counts to equivalent_sample_size/variable_cardinality. The score-method measures how well a model is able to describe the given data set.

Parameters:
  • data (pandas DataFrame object) – dataframe object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)

  • equivalent_sample_size (int (default: 10)) – The equivalent/imaginary sample size (of uniform pseudo samples) for the Dirichlet hyperparameters. The score is sensitive to this value, so running with several different values can be useful.

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states (or values) that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4-18.3.6 (esp. page 806)

[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)[source]

Computes a score that measures how much a given variable is “influenced” by a given list of potential parents.
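As a rough illustration of what such a local score involves, here is a minimal standard-library sketch. It is not pgmpy's implementation. It assumes the textbook BDeu formulation (Koller & Friedman), in which every pseudo-count equals equivalent_sample_size divided by (number of parent configurations × variable cardinality), and it infers the states from the observed values. The function name bdeu_local_score and the list-of-dicts data layout are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product
from math import lgamma


def bdeu_local_score(rows, variable, parents, equivalent_sample_size=10):
    """Log BDeu score of `variable` given `parents` (illustrative sketch only).

    rows: list of dicts mapping column name -> discrete value.
    """
    # Assumption: states are the values observed in the data.
    var_states = sorted({row[variable] for row in rows})
    parent_states = [sorted({row[p] for row in rows}) for p in parents]
    parent_configs = list(product(*parent_states))  # [()] when there are no parents

    r = len(var_states)       # cardinality of the variable
    q = len(parent_configs)   # number of parent configurations

    # Count how often each (parent configuration, variable state) pair occurs.
    counts = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)

    alpha_j = equivalent_sample_size / q         # pseudo-count per parent configuration
    alpha_jk = equivalent_sample_size / (q * r)  # pseudo-count per table cell

    score = 0.0
    for config in parent_configs:
        n_j = sum(counts[(config, state)] for state in var_states)
        score += lgamma(alpha_j) - lgamma(alpha_j + n_j)
        for state in var_states:
            score += lgamma(alpha_jk + counts[(config, state)]) - lgamma(alpha_jk)
    return score


# Toy data in which B always copies A.
rows = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
print(bdeu_local_score(rows, "B", ["A"]))  # B with candidate parent A
print(bdeu_local_score(rows, "B", []))     # B with no parents
```

On this toy data the score with A as a parent is higher (less negative) than the score with no parents, which matches the intuition that B is "influenced" by A.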

Bic Score

class pgmpy.estimators.BicScore(data, **kwargs)[source]

Class for structure scoring for BayesianNetworks using the BIC/MDL score. The BIC/MDL score (“Bayesian Information Criterion”, also “Minimum Description Length”) is a log-likelihood score with an additional penalty for network complexity, to avoid overfitting. The score-method measures how well a model is able to describe the given data set.

Parameters:
  • data (pandas DataFrame object) – dataframe object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states (or values) that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4-18.3.6 (esp. page 802)

[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)[source]

Computes a score that measures how much a given variable is “influenced” by a given list of potential parents.
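The trade-off between fit and complexity can be sketched with the standard library alone. This is an illustrative sketch rather than pgmpy's code. It assumes the common BIC form: maximum log-likelihood minus 0.5 · log(N) · (number of free parameters), where the number of free parameters is (number of parent configurations) × (variable cardinality − 1). The function name bic_local_score and the list-of-dicts data layout are hypothetical.

```python
from collections import Counter
from math import log


def bic_local_score(rows, variable, parents):
    """BIC-style local score of `variable` given `parents` (illustrative sketch only)."""
    # Assumption: states are the values observed in the data.
    r = len({row[variable] for row in rows})
    q = 1
    for p in parents:
        q *= len({row[p] for row in rows})

    # Counts per (parent configuration, variable state) and per parent configuration.
    joint = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)
    per_config = Counter(tuple(row[p] for p in parents) for row in rows)

    # Maximum log-likelihood using the empirical conditional frequencies.
    log_likelihood = sum(
        n * log(n / per_config[config]) for (config, _state), n in joint.items()
    )

    # Complexity penalty: grows with the sample size and the number of free parameters.
    penalty = 0.5 * log(len(rows)) * q * (r - 1)
    return log_likelihood - penalty


# B copies A: adding A as a parent improves the score.
dependent = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
print(bic_local_score(dependent, "B", ["A"]), bic_local_score(dependent, "B", []))

# B independent of A: the extra parent only adds penalty.
independent = [{"A": a, "B": b} for a in (0, 1) for b in (0, 1)] * 2
print(bic_local_score(independent, "B", ["A"]), bic_local_score(independent, "B", []))
```

In the second data set the likelihood is the same with or without the parent, so the larger penalty makes the parentless structure score higher. This is the overfitting guard described above.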

K2 Score

class pgmpy.estimators.K2Score(data, **kwargs)[source]

Class for Bayesian structure scoring for BayesianNetworks with Dirichlet priors. The K2 score is the result of setting all Dirichlet hyperparameters/pseudo_counts to 1. The score-method measures how well a model is able to describe the given data set.

Parameters:
  • data (pandas DataFrame object) – dataframe object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states (or values) that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

References

[1] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4-18.3.6 (esp. page 806)

[2] AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

local_score(variable, parents)[source]

Computes a score that measures how much a given variable is “influenced” by a given list of potential parents.
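With every pseudo-count fixed at 1, the local score reduces to a sum of log-gamma terms over the observed counts. The following standard-library sketch illustrates that computation under the usual K2 formulation. It is not pgmpy's implementation, and the function name k2_local_score and the list-of-dicts data layout are hypothetical.

```python
from collections import Counter
from itertools import product
from math import lgamma


def k2_local_score(rows, variable, parents):
    """Log K2 score of `variable` given `parents` (illustrative sketch only)."""
    # Assumption: states are the values observed in the data.
    var_states = sorted({row[variable] for row in rows})
    parent_states = [sorted({row[p] for row in rows}) for p in parents]
    parent_configs = list(product(*parent_states))  # [()] when there are no parents
    r = len(var_states)

    counts = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)

    score = 0.0
    for config in parent_configs:
        n_j = sum(counts[(config, state)] for state in var_states)
        # All r pseudo-counts are 1, so they sum to r for each parent configuration.
        score += lgamma(r) - lgamma(r + n_j)
        # lgamma(1) == 0, so the per-cell prior term vanishes.
        score += sum(lgamma(1 + counts[(config, state)]) for state in var_states)
    return score


rows = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
print(k2_local_score(rows, "B", ["A"]))  # B with candidate parent A
print(k2_local_score(rows, "B", []))     # B with no parents
```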

BDs Score

class pgmpy.estimators.BDsScore(data, equivalent_sample_size=10, **kwargs)[source]

Class for Bayesian structure scoring for BayesianNetworks with Dirichlet priors. The BDs score is the result of setting all Dirichlet hyperparameters/pseudo_counts to equivalent_sample_size/modified_variable_cardinality, where modified_variable_cardinality counts only those parent configurations for which variable counts were actually observed. The score-method measures how well a model is able to describe the given data set.

Parameters:
  • data (pandas DataFrame object) – dataframe object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.NaN. Note that pandas converts each column containing numpy.NaN values to dtype float.)

  • equivalent_sample_size (int (default: 10)) – The equivalent/imaginary sample size (of uniform pseudo samples) for the Dirichlet hyperparameters. The score is sensitive to this value, so running with several different values can be useful.

  • state_names (dict (optional)) – A dict indicating, for each variable, the discrete set of states (or values) that the variable can take. If unspecified, the observed values in the data set are taken to be the only possible states.

References

[1] Scutari, Marco. An Empirical-Bayes Score for Discrete Bayesian Networks. Journal of Machine Learning Research, 2016, pp. 438–48

local_score(variable, parents)[source]

Computes a score that measures how much a given variable is “influenced” by a given list of potential parents.
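The effect of counting only observed parent configurations can be sketched as follows. This is a standard-library illustration under the assumption that the equivalent sample size is spread uniformly over the observed parent configurations and that unobserved configurations contribute nothing, as in Scutari's description. It is not pgmpy's implementation and leaves out the structure prior. The function name bds_local_score and the list-of-dicts data layout are hypothetical.

```python
from collections import Counter
from math import lgamma


def bds_local_score(rows, variable, parents, equivalent_sample_size=10):
    """Log BDs-style score of `variable` given `parents` (illustrative sketch only)."""
    # Assumption: states are the values observed in the data.
    var_states = sorted({row[variable] for row in rows})
    r = len(var_states)

    counts = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)

    # Only parent configurations that actually occur in the data are considered.
    observed_configs = {config for config, _state in counts}
    q_observed = len(observed_configs)

    alpha_j = equivalent_sample_size / q_observed
    alpha_jk = equivalent_sample_size / (q_observed * r)

    score = 0.0
    for config in observed_configs:
        n_j = sum(counts[(config, state)] for state in var_states)
        score += lgamma(alpha_j) - lgamma(alpha_j + n_j)
        for state in var_states:
            score += lgamma(alpha_jk + counts[(config, state)]) - lgamma(alpha_jk)
    return score


# B duplicates A, so only 2 of the 4 possible (A, B) configurations are observed.
rows = [{"A": 0, "B": 0, "C": 0}] * 4 + [{"A": 1, "B": 1, "C": 1}] * 4
print(bds_local_score(rows, "C", ["A", "B"]))
print(bds_local_score(rows, "C", ["A"]))
```

In this sketch the two printed scores agree up to floating-point rounding, because the redundant parent B creates no additional observed configurations and so the pseudo-counts are not diluted.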

structure_prior(model)[source]

Implements the marginal uniform prior for the graph structure, in which arcs are independent of each other: for any two nodes, an arc in each direction has probability 1/4, and the absence of an arc between them has probability 1/2.
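Under this prior, the log probability of a graph depends only on the number of nodes and the number of arcs. The sketch below works that out with the standard library. The function name marginal_uniform_log_prior and the (nodes, edges) arguments are hypothetical and stand in for the model object that the method receives.

```python
from math import comb, log


def marginal_uniform_log_prior(nodes, edges):
    """Log prior of a DAG under the marginal uniform prior (illustrative sketch only).

    nodes: iterable of node names; edges: iterable of (parent, child) pairs.
    """
    n_pairs = comb(len(list(nodes)), 2)  # unordered node pairs
    n_edges = len(list(edges))
    # Each pair with an arc contributes log(1/4); each pair without one, log(1/2).
    return n_edges * log(0.25) + (n_pairs - n_edges) * log(0.5)


print(marginal_uniform_log_prior(["A", "B", "C"], [("A", "B")]))
```

Algebraically this equals -(n_edges + n_pairs) · log(2), so every additional arc lowers the log prior by log(2).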

structure_prior_ratio(operation)[source]

Return the log ratio of the prior probabilities for a given proposed change to the DAG.
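For the marginal uniform prior described above, this ratio takes only three values. The sketch below derives them with the standard library. The operation labels "+", "-" and "flip" and the function name structure_prior_log_ratio are assumptions made for this example and may not match the labels the library uses.

```python
from math import log


def structure_prior_log_ratio(operation):
    """Log prior ratio (proposed / current) for a single-arc change (illustrative sketch only)."""
    if operation == "+":
        # A node pair goes from "no arc" (1/2) to "arc in one direction" (1/4).
        return log(0.25) - log(0.5)
    if operation == "-":
        # A node pair goes from "arc in one direction" (1/4) to "no arc" (1/2).
        return log(0.5) - log(0.25)
    if operation == "flip":
        # Both directions have probability 1/4, so reversing an arc changes nothing.
        return 0.0
    raise ValueError(f"unknown operation: {operation!r}")


for op in ("+", "-", "flip"):
    print(op, structure_prior_log_ratio(op))
```

Adding an arc therefore costs log(2) in log prior, removing one gains log(2), and reversing one is neutral.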