LogLikelihood

class pgmpy.structure_score.LogLikelihood(data, state_names=None)

Bases: BaseStructureScore

Log-likelihood structure score for discrete Bayesian networks.

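This score evaluates a discrete Bayesian network structure by computing the unpenalized log-likelihood of the observed data under maximum-likelihood parameter estimates. Because no complexity penalty is applied, the score never decreases when parents are added to a variable. The local score for a variable \(X_i\) with parent set \(\Pi_i\) is computed as: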

\[\ell(X_i, \Pi_i) = \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}},\]

with the convention \(0 \log 0 = 0\), where \(r_i\) is the cardinality of \(X_i\), \(q_i\) is the number of parent configurations of \(\Pi_i\), \(N_{ijk}\) is the count of \(X_i = k\) in parent configuration \(j\), and \(N_{ij} = \sum_{k=1}^{r_i} N_{ijk}\).
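The formula above can be checked by hand with a small counting routine. The following is a minimal sketch, not pgmpy's implementation: `local_log_likelihood` is a hypothetical helper introduced only for illustration, and it assumes complete data (no missing values) supplied as a mapping from column names to equal-length sequences, which covers both a plain dict and a pandas.DataFrame.

```python
import math
from collections import Counter


def local_log_likelihood(data, variable, parents=()):
    """Hypothetical helper: evaluate sum_jk N_ijk * log(N_ijk / N_ij) from raw counts."""
    child = list(data[variable])
    if parents:
        # One tuple per row holding the joint parent configuration j.
        parent_rows = list(zip(*(data[p] for p in parents)))
    else:
        # No parents: every row falls into the single empty configuration.
        parent_rows = [()] * len(child)

    n_ijk = Counter(zip(parent_rows, child))  # counts of (configuration j, state k)
    n_ij = Counter(parent_rows)               # counts of configuration j

    # Only observed (j, k) pairs are stored, so 0 log 0 terms never arise.
    return sum(n * math.log(n / n_ij[j]) for (j, _k), n in n_ijk.items())


data = {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}

# B given A: each value of A splits B evenly, giving 4 * log(1/2).
print(round(local_log_likelihood(data, "B", ("A",)), 3))  # -2.773

# The score of the structure A -> B, A -> C is the sum of its local scores.
total = (
    local_log_likelihood(data, "A")
    + local_log_likelihood(data, "B", ("A",))
    + local_log_likelihood(data, "C", ("A",))
)
print(round(total, 3))  # -6.931
```

Counting only the observed combinations is what makes the \(0 \log 0 = 0\) convention automatic here: a state that never occurs in a parent configuration contributes no term to the sum.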

Parameters:
data : pandas.DataFrame

DataFrame where each column represents a discrete variable. Missing values should be set to numpy.nan.

state_names : dict, optional

Dictionary mapping each variable to its discrete states. If not specified, the unique values observed in the data are used.

Raises:
ValueError

If the data contains non-discrete variables, or if the model variables are not present in the data.

Examples

>>> import pandas as pd
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.structure_score import LogLikelihood
>>> data = pd.DataFrame(
...     {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}
... )
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> score = LogLikelihood(data)
>>> round(score.score(model), 3)
np.float64(-6.931)
>>> round(score.local_score("B", ("A",)), 3)
np.float64(-2.773)