{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structure Learning in Bayesian Networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we show examples of using the structure learning algorithms in pgmpy. Currently, pgmpy implements 3 main algorithms:\n", "1. PC (with stable and parallel variants)\n", "2. Hill-Climb Search\n", "3. Exhaustive Search\n", "\n", "For PC, the following conditional independence tests can be used:\n", "1. Chi-Square test (https://en.wikipedia.org/wiki/Chi-squared_test)\n", "2. Pearsonr (https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression)\n", "3. G-squared (https://en.wikipedia.org/wiki/G-test)\n", "4. Log-likelihood (https://en.wikipedia.org/wiki/G-test)\n", "5. Freeman-Tukey (Read, Campbell B. \"Freeman-Tukey chi-squared goodness-of-fit statistics.\" Statistics & Probability Letters 18.4 (1993): 271-278.)\n", "6. Modified Log-likelihood\n", "7. Neyman (https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)\n", "8. Cressie-Read (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)\n", "9. Power Divergence (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)\n", "\n", "For Hill-Climb and Exhaustive Search, the following scoring methods can be used:\n", "1. K2 Score\n", "2. BDeu Score\n", "3. BIC Score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate some data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from itertools import combinations\n", "\n", "import networkx as nx\n", "from sklearn.metrics import f1_score\n", "\n", "from pgmpy.estimators import PC, HillClimbSearch, ExhaustiveSearch\n", "from pgmpy.estimators import K2Score\n", "from pgmpy.utils import get_example_model\n", "from pgmpy.sampling import BayesianModelSampling" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Generating for node: CVP: 100%|██████████| 37/37 [00:00<00:00, 544.13it/s]\n" ] }, { "data": { "text/html": [ "
\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>HISTORY</th><th>CVP</th><th>PCWP</th><th>HYPOVOLEMIA</th><th>LVEDVOLUME</th><th>LVFAILURE</th><th>STROKEVOLUME</th><th>ERRLOWOUTPUT</th><th>HRBP</th><th>HREKG</th><th>...</th><th>MINVOLSET</th><th>VENTMACH</th><th>VENTTUBE</th><th>VENTLUNG</th><th>VENTALV</th><th>ARTCO2</th><th>CATECHOL</th><th>HR</th><th>CO</th><th>BP</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>TRUE</td><td>LOW</td><td>LOW</td><td>FALSE</td><td>LOW</td><td>TRUE</td><td>LOW</td><td>FALSE</td><td>HIGH</td><td>NORMAL</td><td>...</td><td>NORMAL</td><td>NORMAL</td><td>LOW</td><td>ZERO</td><td>ZERO</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>LOW</td><td>LOW</td></tr>\n",
"    <tr><th>1</th><td>FALSE</td><td>NORMAL</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>HIGH</td><td>HIGH</td><td>...</td><td>NORMAL</td><td>NORMAL</td><td>LOW</td><td>ZERO</td><td>LOW</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>HIGH</td></tr>\n",
"    <tr><th>2</th><td>FALSE</td><td>NORMAL</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>HIGH</td><td>HIGH</td><td>...</td><td>LOW</td><td>LOW</td><td>ZERO</td><td>ZERO</td><td>ZERO</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>HIGH</td></tr>\n",
"    <tr><th>3</th><td>FALSE</td><td>NORMAL</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>HIGH</td><td>FALSE</td><td>HIGH</td><td>HIGH</td><td>...</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>LOW</td><td>HIGH</td><td>LOW</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>HIGH</td></tr>\n",
"    <tr><th>4</th><td>FALSE</td><td>NORMAL</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>NORMAL</td><td>FALSE</td><td>HIGH</td><td>HIGH</td><td>...</td><td>NORMAL</td><td>NORMAL</td><td>ZERO</td><td>HIGH</td><td>LOW</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>HIGH</td><td>LOW</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 37 columns</p>\n", "