LogLikelihoodCondGauss
- class pgmpy.structure_score.LogLikelihoodCondGauss(data, state_names=None)
Bases: BaseStructureScore
Log-likelihood score for Bayesian networks with mixed discrete and continuous variables.
This score is based on conditional Gaussian distributions [1] and supports local families with both discrete and continuous variables.
For a continuous target \(C_1\) with continuous parents \(C_2\) and discrete parents \(D\), it computes
\[\ell(C_1 \mid C_2, D) = \sum_{t=1}^{n} \log \frac{p(c_{1t}, c_{2t} \mid d_t)}{p(c_{2t} \mid d_t)}.\]
For a discrete target \(D_1\) with continuous parents \(C\) and discrete parents \(D_2\), it computes
\[\ell(D_1 \mid C, D_2) = \sum_{t=1}^{n} \log \frac{p(c_t \mid d_{1t}, d_{2t}) \, p(d_{1t}, d_{2t})}{p(c_t \mid d_{2t}) \, p(d_{2t})}.\]
The Gaussian densities are estimated from the corresponding grouped samples.
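The continuous-target formula can be illustrated with a minimal sketch. It assumes that "grouped samples" means the rows sharing one configuration of the discrete parents, and that each Gaussian is fitted by maximum likelihood within its group. The helpers `gaussian_loglik` and `cond_gauss_local_loglik` are hypothetical names introduced here for illustration; they are not part of pgmpy, and the sketch is not claimed to reproduce the values returned by `local_score`, which may handle small or degenerate groups differently.

```python
import numpy as np
import pandas as pd


def gaussian_loglik(x):
    """Sum over rows of log N(x_t; mean, cov), with mean and cov fitted by MLE on x."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    if k == 0:
        # No continuous variables: the density term is 1, so its log is 0.
        return 0.0
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n  # MLE covariance (divides by n, not n - 1)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row to the group mean.
    maha = np.einsum("ti,ij,tj->t", centered, np.linalg.inv(cov), centered)
    return float(np.sum(-0.5 * (k * np.log(2 * np.pi) + logdet + maha)))


def cond_gauss_local_loglik(df, target, cont_parents=(), disc_parents=()):
    """Illustrative l(C1 | C2, D): joint minus marginal log-density, per discrete group."""
    cont_parents, disc_parents = list(cont_parents), list(disc_parents)
    # Without discrete parents, all rows form a single group.
    groups = df.groupby(disc_parents) if disc_parents else [(None, df)]
    total = 0.0
    for _, group in groups:
        joint = group[[target] + cont_parents].to_numpy()  # samples of (c1, c2) given d
        parents = group[cont_parents].to_numpy()           # samples of c2 given d
        total += gaussian_loglik(joint) - gaussian_loglik(parents)
    return total


rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "A": rng.normal(size=100),
        "B": rng.integers(0, 2, size=100),
        "C": rng.normal(size=100),
    }
)
print(cond_gauss_local_loglik(df, "A", cont_parents=["C"], disc_parents=["B"]))
```

The sketch assumes every discrete group contains enough rows for a non-singular covariance estimate. The discrete-target case would add the log of the empirical group frequencies to the numerator and denominator in the same way.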
- Parameters:
- data : pandas.DataFrame
DataFrame where columns may be discrete or continuous variables.
- state_names : dict, optional
Dictionary mapping discrete variable names to their possible states.
- Raises:
- ValueError
If the data or variable types are not suitable for conditional Gaussian modeling.
References
[1] Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian Networks of Mixed Variables. International Journal of Data Science and Analytics, 6(1), 3-18. https://doi.org/10.1007/s41060-017-0085-7
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.structure_score import LogLikelihoodCondGauss
>>> rng = np.random.default_rng(0)
>>> data = pd.DataFrame(
...     {
...         "A": rng.normal(size=100),
...         "B": rng.integers(0, 2, size=100),
...         "C": rng.normal(size=100),
...     }
... )
>>> score = LogLikelihoodCondGauss(data)
>>> round(score.local_score("A", ("B", "C")), 3)
np.float64(-137.319)