Greedy Equivalence Search (GES)

class pgmpy.estimators.GES(data, use_cache=True, **kwargs)[source]

Implementation of the Greedy Equivalence Search (GES) causal discovery / structure learning algorithm.

GES is a score-based causal discovery / structure learning algorithm that works in three phases:
  1. Forward phase: New edges are added such that the model score improves.

  2. Backward phase: Edges are removed from the model such that the model score improves.

  3. Edge flipping phase: Edge orientations are flipped such that model score improves.
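
The three phases above can be sketched in plain Python. This is a simplified illustration, not pgmpy's implementation: it searches over individual DAGs rather than equivalence classes, and the names `creates_cycle`, `best_move`, `run_phase`, `greedy_three_phase`, and `toy_score` are hypothetical. The score is assumed to be a callable that takes a set of directed edges and returns a number to maximize. Whether the improvement threshold is compared strictly or not is also an assumption of this sketch.

```python
from itertools import permutations


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge = (u, v) would create a directed cycle."""
    u, v = new_edge
    # Adding u -> v closes a cycle exactly when u is already reachable from v.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(b for a, b in edges if a == node)
    return False


def best_move(edges, candidates, score, min_improvement):
    """Return the candidate with the largest gain of at least min_improvement."""
    current = score(edges)
    best, best_gain = None, None
    for cand in candidates:
        gain = score(cand) - current
        if gain >= min_improvement and (best_gain is None or gain > best_gain):
            best, best_gain = cand, gain
    return best


def run_phase(edges, propose, score, min_improvement):
    """Apply the best proposed move repeatedly until none improves enough."""
    while True:
        nxt = best_move(edges, propose(edges), score, min_improvement)
        if nxt is None:
            return edges
        edges = nxt


def greedy_three_phase(nodes, score, min_improvement=1e-6):
    def additions(edges):  # forward phase: add one new acyclic edge
        return [edges | {(u, v)} for u, v in permutations(nodes, 2)
                if (u, v) not in edges and (v, u) not in edges
                and not creates_cycle(edges, (u, v))]

    def removals(edges):  # backward phase: delete one edge
        return [edges - {e} for e in sorted(edges)]

    def flips(edges):  # flipping phase: reverse one edge if still acyclic
        out = []
        for u, v in sorted(edges):
            rest = edges - {(u, v)}
            if not creates_cycle(rest, (v, u)):
                out.append(rest | {(v, u)})
        return out

    edges = frozenset()
    for propose in (additions, removals, flips):
        edges = run_phase(edges, propose, score, min_improvement)
    return set(edges)


# Toy score (hypothetical): reward edges of a known target graph, penalize others.
target = {("A", "B"), ("B", "C")}


def toy_score(edges):
    return sum(1.0 if e in target else -1.0 for e in edges)


learned = greedy_three_phase(["A", "B", "C"], toy_score)
```

With this toy score the forward phase alone recovers the target edges, and the backward and flipping phases find no further improvement.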

Parameters:
  • data (pandas DataFrame object) – dataframe object where each column represents one variable. (If some values in the data are missing, the data cells should be set to numpy.nan. Note that pandas converts each column containing numpy.nan values to dtype float.)

  • use_cache (boolean) – If True, caches scores for faster computation. Note: Caching only works for decomposable scoring methods and can give wrong results with custom scoring methods.
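
Why caching relies on decomposability can be illustrated with a small sketch. It assumes a decomposable score, that is, one that is a sum of per-node terms depending only on the node and its parent set. The names `local_score`, `total_score`, and `evaluations` are hypothetical, and the local score here is a placeholder rather than any of pgmpy's scoring methods.

```python
from functools import lru_cache

evaluations = []  # records every local score that is actually computed


@lru_cache(maxsize=None)
def local_score(node, parents):
    """Placeholder for an expensive, data-dependent local score."""
    evaluations.append((node, parents))
    return -float(len(parents))


def total_score(parent_sets):
    """Decomposable score: one local term per node, summed."""
    return sum(local_score(node, frozenset(parents))
               for node, parents in sorted(parent_sets.items()))


before = {"A": set(), "B": {"A"}, "C": set()}
after = {"A": set(), "B": {"A"}, "C": {"B"}}  # adds the edge B -> C

total_score(before)  # computes three local terms
total_score(after)   # only C's term is recomputed; A and B hit the cache
```

A score that depends on the whole graph cannot be keyed by (node, parents) in this way, which is why a cache of this kind could return stale values for a non-decomposable custom score.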

References

Chickering, David Maxwell. “Optimal structure identification with greedy search.” Journal of machine learning research 3.Nov (2002): 507-554.

estimate(scoring_method='bic-d', min_improvement=1e-06, debug=False)[source]

Estimates the DAG from the data.

Parameters:
  • scoring_method (str or StructureScore instance) – The score to be optimized during structure estimation. Supported structure scores: k2, bdeu, bds, bic-d, aic-d, ll-g, aic-g, bic-g, ll-cg, aic-cg, bic-cg. Also accepts a custom score, but it should be an instance of StructureScore.

  • min_improvement (float) – An operation (edge addition, removal, or flipping) is performed only if the model score improves by at least min_improvement.
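
To give a sense of what a discrete BIC-style score measures, the following is a minimal pure-Python sketch of a local BIC term: the maximized log-likelihood of a node given its parents, minus a complexity penalty of 0.5 · log(N) per free parameter. The function name `local_bic_discrete` and the row format are hypothetical, and counting parent configurations from the states observed in the data is an assumption of this sketch. It is not pgmpy's bic-d implementation.

```python
from collections import Counter
from math import log


def local_bic_discrete(rows, node, parents):
    """Local BIC of `node` given `parents`.

    rows: list of dicts mapping variable name -> discrete value (no missing values).
    """
    n = len(rows)
    # Counts of (parent configuration, node state) and of parent configurations.
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)

    # Maximized log-likelihood: sum of N_jk * log(N_jk / N_j).
    log_lik = sum(c * log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())

    # Free parameters: (#parent configurations) * (#node states - 1).
    n_configs = 1
    for p in parents:
        n_configs *= len({r[p] for r in rows})
    n_states = len({r[node] for r in rows})
    penalty = 0.5 * log(n) * n_configs * (n_states - 1)
    return log_lik - penalty


# Toy data in which B always equals A.
rows = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4

with_parent = local_bic_discrete(rows, "B", ["A"])
without_parent = local_bic_discrete(rows, "B", [])
gain = with_parent - without_parent  # compared against min_improvement in a search
```

Here adding A as a parent of B raises the local score, so a search using a threshold such as min_improvement=1e-06 would accept the edge.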

Returns:

Estimated model – A DAG at a (local) score maximum.

Return type:

pgmpy.base.DAG

Examples

>>> # Simulate some sample data from a known model to learn the model structure from
>>> from pgmpy.utils import get_example_model
>>> model = get_example_model('alarm')
>>> df = model.simulate(int(1e3))
>>> # Learn the model structure using GES algorithm from `df`
>>> from pgmpy.estimators import GES
>>> est = GES(df)
>>> dag = est.estimate(scoring_method='bic-d')
>>> len(dag.nodes())
37
>>> len(dag.edges())
45