Structure Learning in Bayesian Networks¶
In this notebook, we show examples of using the structure learning algorithms in pgmpy. Currently, pgmpy implements three main algorithms:

1. PC, with stable and parallel variants
2. Hill-Climb Search
3. Exhaustive Search
For PC, the following conditional independence tests can be used:

1. Chi-square test (https://en.wikipedia.org/wiki/Chi-squared_test)
2. Pearsonr (https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression)
3. G-squared (https://en.wikipedia.org/wiki/G-test)
4. Log-likelihood (https://en.wikipedia.org/wiki/G-test)
5. Freeman-Tukey (Read, Campbell B. "Freeman–Tukey chi-squared goodness-of-fit statistics." Statistics & Probability Letters 18.4 (1993): 271-278.)
6. Modified log-likelihood
7. Neyman (https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)
8. Cressie-Read (Cressie, Noel, and Timothy R. C. Read. "Multinomial goodness-of-fit tests." Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)
9. Power divergence (Cressie, Noel, and Timothy R. C. Read. "Multinomial goodness-of-fit tests." Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)
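To make the conditional independence testing concrete, here is a minimal sketch of a chi-square CI test on discrete data (an illustration, not pgmpy's implementation). The statistic for X ⟂ Y | Z sums one 2-way chi-square statistic of X vs Y per observed value of Z, with the degrees of freedom summed the same way; a real test would then convert the statistic to a p-value via the chi-square survival function.

```python
import numpy as np


def chi_square_ci(x, y, z):
    """Chi-square statistic and degrees of freedom for H0: x independent of y given z.

    A sketch for 1-D integer-coded arrays; real implementations handle
    multiple conditioning variables and convert the statistic to a p-value.
    """
    stat, dof = 0.0, 0
    for z_val in np.unique(z):
        xs, ys = x[z == z_val], y[z == z_val]
        x_levels, y_levels = np.unique(xs), np.unique(ys)
        # Observed contingency table of x vs y inside this stratum of z.
        table = np.array(
            [[np.sum((xs == xv) & (ys == yv)) for yv in y_levels] for xv in x_levels],
            dtype=float,
        )
        # Expected counts under independence within the stratum.
        expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / table.sum()
        stat += np.sum((table - expected) ** 2 / np.where(expected > 0, expected, 1))
        dof += (len(x_levels) - 1) * (len(y_levels) - 1)
    return stat, dof
```

On data where X and Y are both noisy copies of Z, the conditional statistic stays small (X ⟂ Y | Z holds), while conditioning on a constant reduces the test to the marginal one, which rejects.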
For Hill-Climb and Exhaustive Search, the following scoring methods can be used:

1. K2 Score
2. BDeu Score
3. BIC Score
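These scores are decomposable: the score of a network is the sum of per-node local scores, which is what makes local search moves cheap to evaluate. As a rough illustration (a sketch, not pgmpy's implementation), a BIC-style local score for one discrete node is its log-likelihood given its parents minus a penalty of (log n / 2) times the number of free parameters:

```python
import numpy as np
from collections import Counter


def bic_local_score(child, parents):
    """BIC-style local score of one node: log-likelihood - (log n / 2) * free params.

    child: list of observed states; parents: list of such lists (may be empty).
    """
    n = len(child)
    r = len(set(child))  # number of child states
    # Each sample's parent configuration as a tuple (empty tuple if no parents).
    configs = list(zip(*parents)) if parents else [()] * n
    joint = Counter(zip(configs, child))
    marginal = Counter(configs)
    # Maximum-likelihood log-likelihood: sum of c * log P(child | parent config).
    loglik = sum(c * np.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    n_params = (r - 1) * len(set(configs))
    return loglik - 0.5 * np.log(n) * n_params
```

With enough data, a genuine parent raises the score despite the extra-parameter penalty, which is the trade-off the search algorithms exploit.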
Generate some data¶
[1]:
from itertools import combinations
import networkx as nx
from sklearn.metrics import f1_score
from pgmpy.estimators import PC, HillClimbSearch, ExhaustiveSearch
from pgmpy.estimators import K2Score
from pgmpy.utils import get_example_model
from pgmpy.sampling import BayesianModelSampling
[2]:
model = get_example_model("alarm")
samples = BayesianModelSampling(model).forward_sample(size=int(1e3))
samples.head()
Generating for node: CVP: 100%|██████████| 37/37 [00:00<00:00, 544.13it/s]
[2]:
| | HISTORY | CVP | PCWP | HYPOVOLEMIA | LVEDVOLUME | LVFAILURE | STROKEVOLUME | ERRLOWOUTPUT | HRBP | HREKG | ... | MINVOLSET | VENTMACH | VENTTUBE | VENTLUNG | VENTALV | ARTCO2 | CATECHOL | HR | CO | BP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TRUE | LOW | LOW | FALSE | LOW | TRUE | LOW | FALSE | HIGH | NORMAL | ... | NORMAL | NORMAL | LOW | ZERO | ZERO | HIGH | HIGH | HIGH | LOW | LOW |
| 1 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | NORMAL | FALSE | HIGH | HIGH | ... | NORMAL | NORMAL | LOW | ZERO | LOW | HIGH | HIGH | HIGH | HIGH | HIGH |
| 2 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | NORMAL | FALSE | HIGH | HIGH | ... | LOW | LOW | ZERO | ZERO | ZERO | HIGH | HIGH | HIGH | HIGH | HIGH |
| 3 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | HIGH | FALSE | HIGH | HIGH | ... | HIGH | HIGH | HIGH | LOW | HIGH | LOW | HIGH | HIGH | HIGH | HIGH |
| 4 | FALSE | NORMAL | NORMAL | FALSE | NORMAL | FALSE | NORMAL | FALSE | HIGH | HIGH | ... | NORMAL | NORMAL | ZERO | HIGH | LOW | HIGH | HIGH | HIGH | HIGH | LOW |
5 rows × 37 columns
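Forward sampling works by visiting nodes in topological order and drawing each node from its CPD given the already-sampled values of its parents. A minimal sketch on a hypothetical two-node network A → B (the CPD values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical CPDs for a two-node network A -> B.
p_a = np.array([0.3, 0.7])              # P(A)
p_b_given_a = np.array([[0.9, 0.1],     # P(B | A=0)
                        [0.2, 0.8]])    # P(B | A=1)

# Topological order: roots first, then children conditioned on sampled parents.
a = rng.choice(2, size=n, p=p_a)
b = np.array([rng.choice(2, p=p_b_given_a[ai]) for ai in a])
```

The empirical frequencies of the samples approach the CPD entries as the sample size grows, which is why larger `size` values generally make structure learning more reliable.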
[3]:
# Function to evaluate the learned model structures.
import numpy as np


def get_f1_score(estimated_model, true_model):
    nodes = estimated_model.nodes()
    est_adj = nx.to_numpy_array(
        estimated_model.to_undirected(), nodelist=nodes, weight=None
    )
    true_adj = nx.to_numpy_array(
        true_model.to_undirected(), nodelist=nodes, weight=None
    )
    f1 = f1_score(np.ravel(true_adj), np.ravel(est_adj))
    print("F1-score for the model skeleton: ", f1)
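To see what this function measures, the skeleton F1 can be worked out by hand on toy adjacency matrices, using plain numpy in place of sklearn:

```python
import numpy as np

# True skeleton: the chain A - B - C.
true_adj = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
# Estimated skeleton: A - B recovered, B - C missed, A - C spurious.
est_adj = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]])

t, e = true_adj.ravel(), est_adj.ravel()
tp = np.sum((t == 1) & (e == 1))   # edges present in both
fp = np.sum((t == 0) & (e == 1))   # spurious edges
fn = np.sum((t == 1) & (e == 0))   # missed edges
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# One true edge recovered, one missed, one spurious -> precision = recall = f1 = 0.5.
```

Because each undirected edge appears twice in the symmetric adjacency matrix, the doubling cancels out in precision and recall.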
Learn the model structure using PC¶
[4]:
est = PC(data=samples)
estimated_model = est.estimate(variant="stable", max_cond_vars=4)
get_f1_score(estimated_model, model)
Working for n conditional variables: 4: 100%|██████████| 4/4 [00:11<00:00, 2.89s/it]
F1-score for the model skeleton: 0.7777777777777779
[5]:
est = PC(data=samples)
estimated_model = est.estimate(variant="orig", max_cond_vars=4)
get_f1_score(estimated_model, model)
Working for n conditional variables: 4: 100%|██████████| 4/4 [00:11<00:00, 2.88s/it]
F1-score for the model skeleton: 0.7777777777777779
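For intuition about what PC does under the hood, here is a heavily simplified sketch of its skeleton phase, with a hand-written independence oracle standing in for the CI test (pgmpy's implementation is more careful about neighbour sets, ordering, and the stable/parallel variants):

```python
from itertools import combinations


def pc_skeleton(nodes, independent, max_cond_vars=3):
    """Learn an undirected skeleton from a CI oracle `independent(x, y, s)`.

    Start from the complete graph; delete edge x - y as soon as some
    conditioning set s (drawn from x's neighbours, |s| growing from 0)
    makes x and y independent.
    """
    edges = {frozenset(pair) for pair in combinations(nodes, 2)}
    for size in range(max_cond_vars + 1):
        for edge in list(edges):
            x, y = sorted(edge)
            # Candidate conditioning variables: current neighbours of x, minus y.
            neighbours = {n for e in edges if x in e for n in e} - {x, y}
            for s in combinations(sorted(neighbours), size):
                if independent(x, y, set(s)):
                    edges.discard(edge)
                    break
    return edges
```

On the chain A → B → C, the only conditional independence is A ⟂ C | {B}, so the sketch removes exactly the A - C edge and keeps the true skeleton.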
Learn the model structure using Hill-Climb Search¶
[6]:
scoring_method = K2Score(data=samples)
est = HillClimbSearch(data=samples)
estimated_model = est.estimate(
    scoring_method=scoring_method, max_indegree=4, max_iter=int(1e4)
)
get_f1_score(estimated_model, model)
1%| | 55/10000 [00:16<50:30, 3.28it/s]
F1-score for the model skeleton: 0.84
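The hill-climb loop itself is simple to sketch: try all single-edge additions, deletions and reversals, keep the acyclic candidate that most improves the score, and stop at a local optimum. Below, a pluggable `score` function stands in for K2/BDeu/BIC (an illustration, not pgmpy's implementation, which also supports tabu lists and `max_indegree`):

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    # Kahn-style check: repeatedly peel off nodes with no incoming edge.
    remaining, es = set(nodes), set(edges)
    while remaining:
        roots = {n for n in remaining if not any(v == n for _, v in es)}
        if not roots:
            return False  # every remaining node has an incoming edge -> cycle
        remaining -= roots
        es = {(u, v) for u, v in es if u not in roots}
    return True


def hill_climb(nodes, score):
    """Greedy search over DAGs; `score` maps an edge set to a number (higher is better)."""
    edges = set()
    best_score = score(edges)
    while True:
        best_move = None
        for u, v in permutations(nodes, 2):
            if (u, v) in edges:
                moves = [edges - {(u, v)},                # delete u -> v
                         edges - {(u, v)} | {(v, u)}]     # reverse u -> v
            else:
                moves = [edges | {(u, v)}]                # add u -> v
            for cand in moves:
                if is_acyclic(nodes, cand) and score(cand) > best_score:
                    best_move, best_score = cand, score(cand)
        if best_move is None:
            return edges  # local optimum: no single-edge move improves the score
        edges = best_move
```

With a toy score that rewards edges of a target structure and penalises all others, the search recovers the target exactly; with a real decomposable score, only the local scores of the touched nodes need recomputing per move.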