AIC

class pgmpy.structure_score.AIC(data, state_names=None)

Bases: LogLikelihood

AIC structure score for discrete Bayesian networks.

AIC balances the discrete log-likelihood against model complexity using a penalty that does not depend on the sample size. The local score computed by local_score(variable, parents) is

\[\operatorname{AIC}(X_i, \Pi_i) = \ell(X_i, \Pi_i) - q_i (r_i - 1),\]

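where \(\ell(X_i, \Pi_i)\) is the local discrete log-likelihood, \(q_i\) is the number of parent configurations of \(\Pi_i\), and \(r_i\) is the cardinality of \(X_i\). The penalty \(q_i (r_i - 1)\) is the number of free parameters in the conditional distribution of \(X_i\) given \(\Pi_i\).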
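The formula can be checked by hand. The following is a minimal sketch using only the standard library, not pgmpy's implementation: `local_aic` is a hypothetical helper, and it assumes that \(q_i\) is the product of the parent cardinalities and that \(\ell\) is the maximum-likelihood log-likelihood computed from counts. It uses the same data as the Examples section.

```python
import math
from collections import Counter


def local_aic(rows, variable, parents, state_names):
    """Hypothetical helper: local log-likelihood minus q_i * (r_i - 1)."""
    # Joint counts N_ijk of (parent configuration, child state) and
    # marginal counts N_ij of each parent configuration.
    joint = Counter((tuple(row[p] for p in parents), row[variable]) for row in rows)
    marginal = Counter(tuple(row[p] for p in parents) for row in rows)

    # Maximum-likelihood log-likelihood: sum of N_ijk * log(N_ijk / N_ij).
    log_lik = sum(n * math.log(n / marginal[cfg]) for (cfg, _), n in joint.items())

    # Assumption: q_i is the product of parent cardinalities (1 if no parents);
    # r_i is the cardinality of the variable itself.
    q = math.prod(len(state_names[p]) for p in parents)
    r = len(state_names[variable])
    return log_lik - q * (r - 1)


data = {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}
rows = [dict(zip(data, values)) for values in zip(*data.values())]
state_names = {name: sorted(set(values)) for name, values in data.items()}

# Local score of B with parent A, and the sum over the structure A -> B, A -> C.
score_b = local_aic(rows, "B", ("A",), state_names)
structure = [("A", ()), ("B", ("A",)), ("C", ("A",))]
total = sum(local_aic(rows, v, ps, state_names) for v, ps in structure)

print(round(score_b, 3), round(total, 3))
```

For B with parent A, each parent configuration splits evenly between the two states of B, so the log-likelihood is \(4 \log 0.5 \approx -2.773\) and the penalty is \(2 \cdot (2 - 1) = 2\), giving about \(-4.773\).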

Parameters:
data : pandas.DataFrame

DataFrame where each column represents a discrete variable. Missing values should be set to numpy.nan.

state_names : dict, optional

Dictionary mapping each variable to its discrete states. If not specified, the unique values observed in the data are used.

Raises:
ValueError

If the data contains non-discrete variables, or if the model variables are not present in the data.

References

[1]

Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.3.4-18.3.6.

[2]

AM Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf

Examples

>>> import pandas as pd
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.structure_score import AIC
>>> data = pd.DataFrame(
...     {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}
... )
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> score = AIC(data)
>>> round(score.score(model), 3)
np.float64(-11.931)
>>> round(score.local_score("B", ("A",)), 3)
np.float64(-4.773)