MMHC Estimator

class pgmpy.estimators.MmhcEstimator(data, **kwargs)[source]
estimate(scoring_method=None, tabu_length=10, significance_level=0.01)[source]

Estimates a BayesianNetwork for the data set, using MMHC. First estimates a graph skeleton using MMPC and then orients the edges using score-based local search (hill climbing).
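The two phases can also be run by hand. The snippet below is a minimal sketch of that decomposition, assuming a pgmpy version in which HillClimbSearch.estimate accepts a white_list of admissible edges and mmpc returns the documented (skeleton, separating sets) pair; exact signatures vary between releases.

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import HillClimbSearch, MmhcEstimator
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(2500, 4)), columns=list('XYZW'))
>>> data['sum'] = data.sum(axis=1)
>>> est = MmhcEstimator(data)
>>> skel, sep_sets = est.mmpc()  # phase 1: undirected skeleton via MMPC
>>> hc = HillClimbSearch(data)
>>> # phase 2: hill climbing, restricted to orientations of the skeleton's edges
>>> model = hc.estimate(white_list=skel.to_directed().edges())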

Parameters
  • significance_level (float, default: 0.01) – The significance level to use for conditional independence tests in the data set. See the mmpc method below.

  • scoring_method (instance of a Scoring method (default: BDeuScore)) – The method to use for scoring during Hill Climb Search. Can be an instance of any of the scoring methods implemented in pgmpy.

  • tabu_length (int, default: 10) – The last tabu_length graph modifications cannot be reversed during the search procedure. This serves to enforce a wider exploration of the search space.

Returns

  • model (BayesianNetwork instance, not yet parameterized)

References

Tsamardinos et al., The max-min hill-climbing Bayesian network structure learning algorithm (2005), Algorithm 3 http://www.dsl-lab.org/supplements/mmhc_paper/paper_online.pdf

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import MmhcEstimator
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(2500, 4)), columns=list('XYZW'))
>>> data['sum'] = data.sum(axis=1)
>>> est = MmhcEstimator(data)
>>> model = est.estimate()
>>> print(model.edges())
[('Z', 'sum'), ('X', 'sum'), ('W', 'sum'), ('Y', 'sum')]
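The scoring method and tabu length can also be set explicitly. A usage sketch, assuming BDeuScore (the default named above) is importable from pgmpy.estimators and that equivalent_sample_size is its prior-strength parameter; both are version-dependent assumptions:

>>> from pgmpy.estimators import BDeuScore
>>> bdeu = BDeuScore(data, equivalent_sample_size=10)  # assumed prior-strength parameter
>>> model = est.estimate(scoring_method=bdeu, tabu_length=10)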
mmpc(significance_level=0.01)[source]

Estimates a graph skeleton (UndirectedGraph) for the data set, using the MMPC (max-min parents-and-children) algorithm.

Parameters

significance_level (float, default=0.01) –

The significance level to use for conditional independence tests in the data set.

significance_level is the desired Type 1 error probability of falsely rejecting the null hypothesis that variables are independent, given that they are. The lower the significance_level, the less likely we are to accept dependencies, resulting in a sparser graph (see the sketch at the end of this section).

Returns

  • skeleton (UndirectedGraph) – An estimate for the undirected graph skeleton of the BN underlying the data.

  • separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent. (needed for edge orientation)

References

Tsamardinos et al., The max-min hill-climbing Bayesian network structure learning algorithm (2005), Algorithm 1 & 2 http://www.dsl-lab.org/supplements/mmhc_paper/paper_online.pdf

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import MmhcEstimator
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(5000, 5)), columns=list('ABCDE'))
>>> data['F'] = data['A'] + data['B'] + data['C']
>>> est = MmhcEstimator(data)
>>> skel, sep_sets = est.mmpc()
>>> skel.edges()
[('A', 'F'), ('B', 'F'), ('C', 'F')]
>>> # all independencies are unconditional:
>>> sep_sets
{('D', 'A'): (), ('C', 'A'): (), ('C', 'E'): (), ('E', 'F'): (), ('B', 'D'): (),
 ('B', 'E'): (), ('D', 'F'): (), ('D', 'E'): (), ('A', 'E'): (), ('B', 'A'): (),
 ('B', 'C'): (), ('C', 'D'): ()}
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(5000, 3)), columns=list('XYZ'))
>>> data['X'] += data['Z']
>>> data['Y'] += data['Z']
>>> est = MmhcEstimator(data)
>>> skel, sep_sets = est.mmpc()
>>> skel.edges()
[('X', 'Z'), ('Y', 'Z')]
>>> # X, Y dependent, but conditionally independent given Z:
>>> sep_sets
{('X', 'Y'): ('Z',)}
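To see the effect of significance_level on sparsity (as noted under Parameters above), skeletons estimated at different levels can be compared. A sketch reusing est from the example above; the two-value unpacking follows the documented return signature, and the exact edge counts depend on the random sample:

>>> sparse_skel, _ = est.mmpc(significance_level=0.001)  # stricter test, fewer accepted dependencies
>>> dense_skel, _ = est.mmpc(significance_level=0.05)    # looser test, more accepted dependencies
>>> len(sparse_skel.edges()) <= len(dense_skel.edges())  # expected tendency, not guaranteed for every sample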