LogLikelihoodCondGauss

class pgmpy.structure_score.LogLikelihoodCondGauss(data, state_names=None)

Bases: BaseStructureScore

Log-likelihood score for Bayesian networks with mixed discrete and continuous variables.

This score is based on conditional Gaussian distributions [1] and supports local families with both discrete and continuous variables.

For a continuous target \(C_1\) with continuous parents \(C_2\) and discrete parents \(D\), it computes

\[\ell(C_1 \mid C_2, D) = \sum_{t=1}^{n} \log \frac{p(c_{1t}, c_{2t} \mid d_t)}{p(c_{2t} \mid d_t)}.\]

For a discrete target \(D_1\) with continuous parents \(C\) and discrete parents \(D_2\), it computes

\[\ell(D_1 \mid C, D_2) = \sum_{t=1}^{n} \log \frac{p(c_t \mid d_{1t}, d_{2t}) p(d_{1t}, d_{2t})} {p(c_t \mid d_{2t}) p(d_{2t})}.\]
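Both ratios can be read as conditional likelihoods of the target given its parents. As a sketch of the reasoning, using only the definition of conditional probability and assuming the densities are exact (with separately estimated Gaussian densities the identities may hold only approximately):

\[\frac{p(c_{1t}, c_{2t} \mid d_t)}{p(c_{2t} \mid d_t)} = p(c_{1t} \mid c_{2t}, d_t), \qquad \frac{p(c_t \mid d_{1t}, d_{2t})\, p(d_{1t}, d_{2t})}{p(c_t \mid d_{2t})\, p(d_{2t})} = \frac{p(c_t, d_{1t}, d_{2t})}{p(c_t, d_{2t})} = p(d_{1t} \mid c_t, d_{2t}).\]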

Each Gaussian density is estimated from the subset of samples that share the corresponding configuration of the discrete variables.
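The continuous-target formula can be illustrated with a minimal NumPy sketch. It is not pgmpy's implementation: the helper names `fitted_gaussian_logpdf` and `cond_gauss_loglik` are hypothetical, a single discrete parent is assumed, and maximum-likelihood estimates (covariance divided by n) are assumed for every group, so the result need not match `local_score` exactly.

```python
import numpy as np


def fitted_gaussian_logpdf(x):
    """Log-density of each row of x under a Gaussian fitted to x itself."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    diff = x - x.mean(axis=0)
    # Assumption: maximum-likelihood covariance (divide by n, not n - 1).
    cov = diff.T @ diff / n
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row to the fitted mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)


def cond_gauss_loglik(c1, c2, d):
    """Sum over samples of log p(c1, c2 | d) - log p(c2 | d).

    c1: continuous target, c2: continuous parent(s), d: one discrete parent.
    Groups that are too small give a singular covariance and are not handled.
    """
    c1, c2, d = np.asarray(c1), np.asarray(c2), np.asarray(d)
    total = 0.0
    for value in np.unique(d):
        mask = d == value  # samples sharing this discrete configuration
        joint = np.column_stack([c1[mask], c2[mask]])
        parents = joint[:, 1:]  # continuous parents only
        total += fitted_gaussian_logpdf(joint).sum()
        total -= fitted_gaussian_logpdf(parents).sum()
    return float(total)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
d = rng.integers(0, 2, size=200)
c2 = rng.normal(size=200)
c1 = 0.5 * c2 + d + rng.normal(size=200)
print(cond_gauss_loglik(c1, c2, d))
```

Under these assumptions the value coincides with the Gaussian log-likelihood of a separate least-squares regression of the target on its continuous parents within each discrete group, which gives a convenient sanity check.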

Parameters:
data : pandas.DataFrame

DataFrame where columns may be discrete or continuous variables.

state_names : dict, optional

Dictionary mapping discrete variable names to their possible states.

Raises:
ValueError

If the data or variable types are not suitable for conditional Gaussian modeling.

References

[1] Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian Networks of Mixed Variables. International Journal of Data Science and Analytics, 6(1), 3-18. https://doi.org/10.1007/s41060-017-0085-7

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.structure_score import LogLikelihoodCondGauss
>>> rng = np.random.default_rng(0)
>>> data = pd.DataFrame(
...     {
...         "A": rng.normal(size=100),
...         "B": rng.integers(0, 2, size=100),
...         "C": rng.normal(size=100),
...     }
... )
>>> score = LogLikelihoodCondGauss(data)
>>> round(score.local_score("A", ("B", "C")), 3)
np.float64(-137.319)