Mmhc Estimator
- class pgmpy.estimators.MmhcEstimator(data, **kwargs)
- estimate(scoring_method=None, tabu_length=10, significance_level=0.01)
Estimates a BayesianNetwork for the data set, using MMHC (max-min hill climbing). First estimates a graph skeleton using MMPC and then orients the edges using a score-based local search (hill climbing).
- Parameters:
significance_level (float, default: 0.01) – The significance level to use for conditional independence tests in the data set. See the mmpc method below.
scoring_method (instance of a Scoring method (default: BDeuScore)) – The method to use for scoring during Hill Climb Search. Can be an instance of any of the scoring methods implemented in pgmpy.
tabu_length (int, default: 10) – If provided, the last tabu_length graph modifications cannot be reversed during the search procedure. This serves to enforce a wider exploration of the search space (see the second example below).
- Returns:
Estimated model – The estimated model without the parameterization.
- Return type:
pgmpy.base.DAG
References
Tsamardinos et al., The max-min hill-climbing Bayesian network structure learning algorithm (2005), Algorithm 3 http://www.dsl-lab.org/supplements/mmhc_paper/paper_online.pdf
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import MmhcEstimator
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(2500, 4)), columns=list('XYZW'))
>>> data['sum'] = data.sum(axis=1)
>>> est = MmhcEstimator(data)
>>> model = est.estimate()
>>> print(model.edges())
[('Z', 'sum'), ('X', 'sum'), ('W', 'sum'), ('Y', 'sum')]
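Both search parameters can also be set explicitly. A minimal sketch, assuming the default BDeuScore named above is importable from pgmpy.estimators and accepts an equivalent_sample_size argument (the exact import path and signature may differ between pgmpy versions):
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import MmhcEstimator, BDeuScore
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(2500, 4)), columns=list('XYZW'))
>>> data['sum'] = data.sum(axis=1)
>>> est = MmhcEstimator(data)
>>> # Score candidate structures with BDeu (equivalent sample size 10) and
>>> # keep only the last 5 modifications on the tabu list:
>>> model = est.estimate(scoring_method=BDeuScore(data, equivalent_sample_size=10), tabu_length=5)
>>> sorted(model.edges())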
- mmpc(significance_level=0.01)
Estimates a graph skeleton (UndirectedGraph) for the data set, using the MMPC (max-min parents-and-children) algorithm.
- Parameters:
significance_level (float, default=0.01) –
The significance level to use for conditional independence tests in the data set.
significance_level is the desired Type 1 error probability: the probability of falsely rejecting the null hypothesis that two variables are independent when they in fact are. The lower the significance_level, the less likely the tests are to accept dependencies, resulting in a sparser graph (see the second example below).
- Returns:
skeleton (pgmpy.base.UndirectedGraph) – An estimate for the undirected graph skeleton of the BN underlying the data.
separating_sets (dict) – A dict containing, for each pair of nodes that are not directly connected, a separating set (“witnessing set”) of variables that renders them conditionally independent (needed for edge orientation).
References
Tsamardinos et al., The max-min hill-climbing Bayesian network structure learning algorithm (2005), Algorithm 1 & 2 http://www.dsl-lab.org/supplements/mmhc_paper/paper_online.pdf
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import MmhcEstimator
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(5000, 5)), columns=list('ABCDE'))
>>> data['F'] = data['A'] + data['B'] + data['C']
>>> est = MmhcEstimator(data)
>>> skel, sep_sets = est.mmpc()
>>> skel.edges()
[('A', 'F'), ('B', 'F'), ('C', 'F')]
>>> # all independencies are unconditional:
>>> sep_sets
{('D', 'A'): (), ('C', 'A'): (), ('C', 'E'): (), ('E', 'F'): (), ('B', 'D'): (), ('B', 'E'): (), ('D', 'F'): (), ('D', 'E'): (), ('A', 'E'): (), ('B', 'A'): (), ('B', 'C'): (), ('C', 'D'): ()}
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(5000, 3)), columns=list('XYZ'))
>>> data['X'] += data['Z']
>>> data['Y'] += data['Z']
>>> est = MmhcEstimator(data)
>>> skel, sep_sets = est.mmpc()
>>> skel.edges()
[('X', 'Z'), ('Y', 'Z')]
>>> # X, Y dependent, but conditionally independent given Z:
>>> sep_sets
{('X', 'Y'): ('Z',)}
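The effect of significance_level on the skeleton can be checked directly. A minimal sketch, assuming mmpc returns the skeleton and separating sets as documented above:
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import MmhcEstimator
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(5000, 3)), columns=list('XYZ'))
>>> data['W'] = data['X'] + data['Y']
>>> est = MmhcEstimator(data)
>>> # Stricter level: each conditional independence test needs a p-value
>>> # below 0.001 (rather than the default 0.01) for a dependency to be
>>> # kept, so the resulting skeleton is typically sparser.
>>> skel_strict, sep_strict = est.mmpc(significance_level=0.001)
>>> sorted(skel_strict.edges())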