BDeu
- class pgmpy.structure_score.BDeu(data, equivalent_sample_size=10, state_names=None)
Bases: BaseStructureScore
BDeu structure score for discrete Bayesian networks with Dirichlet priors.
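The BDeu score [1], [2] evaluates a Bayesian network structure on fully discrete data using a Dirichlet prior parameterized by an equivalent sample size. The score decomposes over the variables of the network: each variable \(X_i\) with parent set \(\Pi_i\) contributes a local score computed as: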
\[\operatorname{BDeu}(X_i, \Pi_i) = \sum_{j=1}^{q_i} \left[ \log \Gamma\left(\frac{\alpha}{q_i}\right) - \log \Gamma\left(N_{ij} + \frac{\alpha}{q_i}\right) + \sum_{k=1}^{r_i} \left( \log \Gamma\left(N_{ijk} + \frac{\alpha}{r_i q_i}\right) - \log \Gamma\left(\frac{\alpha}{r_i q_i}\right) \right) \right],\]
where \(\alpha\) is equivalent_sample_size, \(r_i\) is the cardinality of \(X_i\), \(q_i\) is the number of parent configurations of \(\Pi_i\), \(N_{ijk}\) is the count of \(X_i = k\) in parent configuration \(j\), and \(N_{ij} = \sum_{k=1}^{r_i} N_{ijk}\).
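The formula can be checked by evaluating it term by term. The following is a minimal plain-Python sketch, not pgmpy's implementation. It assumes the data are given as a dict of column lists (the same shape passed to pd.DataFrame in the example), assumes every variable's states are listed in a dict, and uses a hypothetical function name, bdeu_local_score. It loops over all \(q_i\) parent configurations, observed or not.

```python
from itertools import product
from math import lgamma


def bdeu_local_score(data, state_names, child, parents, ess):
    """Direct evaluation of the BDeu local score formula.

    Illustrative sketch (hypothetical helper), not pgmpy's code.
    data: dict mapping column name -> list of values (one entry per row)
    state_names: dict mapping column name -> list of possible states
    """
    parents = tuple(parents)
    r = len(state_names[child])             # r_i: cardinality of the child
    q = 1
    for p in parents:
        q *= len(state_names[p])            # q_i: 1 when there are no parents
    a_ij = ess / q                          # alpha / q_i
    a_ijk = ess / (r * q)                   # alpha / (r_i q_i)

    n_rows = len(data[child])
    score = 0.0
    # Sum over every parent configuration j, including unobserved ones.
    for config in product(*(state_names[p] for p in parents)):
        rows = [
            t for t in range(n_rows)
            if all(data[p][t] == v for p, v in zip(parents, config))
        ]
        n_ij = len(rows)
        score += lgamma(a_ij) - lgamma(n_ij + a_ij)
        # Inner sum over the child's states k.
        for k in state_names[child]:
            n_ijk = sum(1 for t in rows if data[child][t] == k)
            score += lgamma(n_ijk + a_ijk) - lgamma(a_ijk)
    return score


data = {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}
states = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
print(round(bdeu_local_score(data, states, "B", ("A",), ess=5), 3))  # -3.446
```

For the example data this agrees with the local_score("B", ("A",)) value shown in the Examples section.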
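In the implementation, state_counts(…, reindex=False) drops unobserved parent configurations to save memory. The gamma_counts_adj and gamma_conds_adj terms restore the missing contributions of those configurations, namely \(\log \Gamma(\alpha/(r_i q_i))\) for each of their cells and \(\log \Gamma(\alpha/q_i)\) for each configuration (written \(\log \Gamma(\beta)\) and \(\log \Gamma(\alpha)\) in the implementation's shorthand), so that the returned value still equals the full BDeu score over all parent configurations. This is consistent with the formula because an unobserved configuration \(j\) has \(N_{ij} = N_{ijk} = 0\), so its bracketed term is \(\log \Gamma(\alpha/q_i) - \log \Gamma(\alpha/q_i) + \sum_{k=1}^{r_i} 0 = 0\).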
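The bookkeeping described above can be sketched as follows. This is a hypothetical reconstruction of the idea, not pgmpy's source: counts are kept sparse (only observed parent configurations and cells), the missing log-gamma terms are added back as gamma_counts_adj and gamma_conds_adj, and the result is compared with a brute-force sum over every configuration. Both function names and the dict-of-lists input format are assumptions.

```python
from collections import Counter
from itertools import product
from math import lgamma, prod


def bdeu_sparse(data, state_names, child, parents, ess):
    """BDeu local score from observed counts plus adjustment terms (sketch)."""
    r = len(state_names[child])
    q = prod(len(state_names[p]) for p in parents)   # all parent configurations
    a_ij, a_ijk = ess / q, ess / (r * q)
    n = len(data[child])
    configs = [tuple(data[p][t] for p in parents) for t in range(n)]
    n_ij = Counter(configs)                          # observed configurations only
    n_ijk = Counter(zip(configs, data[child]))       # nonzero cells only
    q_obs = len(n_ij)

    # log Gamma(N_ijk + a_ijk) over all cells of the observed configurations:
    # nonzero cells come from the Counter, zero cells add lgamma(a_ijk) each.
    gamma_counts = sum(lgamma(c + a_ijk) for c in n_ijk.values())
    gamma_counts += (q_obs * r - len(n_ijk)) * lgamma(a_ijk)
    gamma_conds = sum(lgamma(c + a_ij) for c in n_ij.values())

    # Add back the terms of the q - q_obs configurations that were dropped.
    gamma_counts_adj = (q - q_obs) * r * lgamma(a_ijk)
    gamma_conds_adj = (q - q_obs) * lgamma(a_ij)

    return (
        (gamma_counts + gamma_counts_adj)
        - (gamma_conds + gamma_conds_adj)
        + q * lgamma(a_ij)
        - q * r * lgamma(a_ijk)
    )


def bdeu_dense(data, state_names, child, parents, ess):
    """Same score, summed bracket by bracket over every parent configuration."""
    r = len(state_names[child])
    q = prod(len(state_names[p]) for p in parents)
    a_ij, a_ijk = ess / q, ess / (r * q)
    n = len(data[child])
    total = 0.0
    for config in product(*(state_names[p] for p in parents)):
        rows = [t for t in range(n) if tuple(data[p][t] for p in parents) == config]
        total += lgamma(a_ij) - lgamma(len(rows) + a_ij)
        for k in state_names[child]:
            n_k = sum(data[child][t] == k for t in rows)
            total += lgamma(n_k + a_ijk) - lgamma(a_ijk)
    return total


# Parent "P" declares state 2, which never occurs in the data.
data = {"P": [0, 1, 1, 0, 1], "X": ["a", "b", "b", "a", "a"]}
states = {"P": [0, 1, 2], "X": ["a", "b"]}
sparse = bdeu_sparse(data, states, "X", ("P",), ess=10)
dense = bdeu_dense(data, states, "X", ("P",), ess=10)
print(abs(sparse - dense) < 1e-9)  # True
```

Leaving out the two adjustment terms would break this equality, because parent state 2 is part of the declared state space but contributes no counts.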
- Parameters:
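- data : pandas.DataFrame
DataFrame where each column represents a discrete variable. Missing values should be set to numpy.nan.
- equivalent_sample_size : int, optional
Equivalent sample size \(\alpha\) used to define the Dirichlet hyperparameters. It is the total pseudo-count of the prior for each family, spread uniformly as \(\alpha / (r_i q_i)\) per cell, so larger values give the prior more weight relative to the data. Defaults to 10.
- state_names : dict, optional
Dictionary mapping each variable to its discrete states. If not specified, the unique values observed in the data are used.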
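The two Dirichlet hyperparameters in the formula depend only on equivalent_sample_size and the state spaces. The helper below (a hypothetical name, not part of pgmpy) computes them for one family. Under the formula above, where \(r_i\) and \(q_i\) count all declared states, declaring an extra state through state_names changes both hyperparameters even when the data are unchanged.

```python
from math import prod


def bdeu_hyperparameters(state_names, child, parents, ess=10):
    """Return (alpha / q_i, alpha / (r_i * q_i)) for the family child | parents.

    Hypothetical helper for illustration; ess=10 mirrors the class default.
    """
    r = len(state_names[child])                      # cardinality r_i of the child
    q = prod(len(state_names[p]) for p in parents)   # parent configurations q_i
    return ess / q, ess / (r * q)


states = {"A": [0, 1], "B": [0, 1]}
print(bdeu_hyperparameters(states, "B", ["A"], ess=5))   # (2.5, 1.25)

# Declaring a third state for A raises q_i from 2 to 3.
states_3 = {"A": [0, 1, 2], "B": [0, 1]}
print(bdeu_hyperparameters(states_3, "B", ["A"], ess=5))
```

In every case the per-cell pseudo-counts sum to the equivalent sample size: \(r_i q_i \cdot \alpha / (r_i q_i) = \alpha\).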
- Raises:
- ValueError
If the data contains non-discrete variables, or if the model variables are not present in the data.
References
[1] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009, Sections 18.3.4-18.3.6.
[2] A. M. Carvalho, Scoring functions for learning Bayesian networks, http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf
Examples
>>> import pandas as pd
>>> from pgmpy.models import DiscreteBayesianNetwork
>>> from pgmpy.structure_score import BDeu
>>> data = pd.DataFrame(
...     {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}
... )
>>> model = DiscreteBayesianNetwork([("A", "B"), ("A", "C")])
>>> score = BDeu(data, equivalent_sample_size=5)
>>> round(score.score(model), 3)
np.float64(-9.392)
>>> round(score.local_score("B", ("A",)), 3)
np.float64(-3.446)
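As an independent cross-check of the example, and of the decomposition of the network score into local scores, the snippet below recomputes the three local scores from the formula using only the standard library and sums them. It is a sketch under the assumption that each variable's states are the values observed in the data (the case when state_names is not given); the function name local_bdeu is illustrative, not part of pgmpy.

```python
from collections import Counter
from itertools import product
from math import lgamma, prod


def local_bdeu(data, child, parents, ess):
    """BDeu local score with states taken from the observed values (sketch)."""
    states = {v: sorted(set(data[v])) for v in [child, *parents]}
    r = len(states[child])
    q = prod(len(states[p]) for p in parents)
    n = len(data[child])
    configs = [tuple(data[p][t] for p in parents) for t in range(n)]
    n_ij = Counter(configs)                       # N_ij (missing keys count as 0)
    n_ijk = Counter(zip(configs, data[child]))    # N_ijk (missing keys count as 0)
    score = 0.0
    for j in product(*(states[p] for p in parents)):
        score += lgamma(ess / q) - lgamma(n_ij[j] + ess / q)
        for k in states[child]:
            score += lgamma(n_ijk[(j, k)] + ess / (r * q)) - lgamma(ess / (r * q))
    return score


data = {"A": [0, 1, 1, 0], "B": [1, 0, 1, 0], "C": [1, 1, 1, 0]}
family = {"A": (), "B": ("A",), "C": ("A",)}   # edges A -> B and A -> C
local = {v: local_bdeu(data, v, ps, ess=5) for v, ps in family.items()}
print({v: round(s, 3) for v, s in local.items()})  # {'A': -3.088, 'B': -3.446, 'C': -2.858}
print(round(sum(local.values()), 3))               # -9.392
```

Because the score decomposes this way, a structure search only needs to recompute the local scores of variables whose parent sets change.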