GES

class pgmpy.causal_discovery.GES(scoring_method: str | BaseStructureScore | None = None, return_type: str = 'pdag', min_improvement: float = 1e-06)

Bases: _ScoreMixin, _BaseCausalDiscovery

Score-based causal discovery using Greedy Equivalence Search (GES).

This class implements the GES algorithm [1] for causal discovery. Given a tabular dataset, the algorithm estimates the causal structure among the variables in the data as a Directed Acyclic Graph (DAG) or Partially Directed Acyclic Graph (PDAG).

GES works in three phases:

Forward phase: Edges are added to improve the model score.
Backward phase: Edges are removed to improve the model score.
Edge turning phase: Edge orientations are flipped to improve the score.
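The control flow of the three phases can be illustrated with a deliberately simplified sketch. It searches over plain sets of directed edges rather than over equivalence classes as GES does, and every name in it (`toy_search`, `run_phase`, `toy_score`, `TARGET`) is hypothetical and not part of pgmpy. The toy score simply rewards edges in a fixed target set, standing in for a real structure score.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Return True if the directed edge set has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    queue.append(v)
    return visited == len(nodes)


def add_moves(nodes, edges):
    """Forward phase candidates: add one edge between non-adjacent nodes."""
    for u, v in permutations(nodes, 2):
        if (u, v) not in edges and (v, u) not in edges:
            yield edges | {(u, v)}


def remove_moves(nodes, edges):
    """Backward phase candidates: remove one existing edge."""
    for e in edges:
        yield edges - {e}


def flip_moves(nodes, edges):
    """Turning phase candidates: reverse one existing edge."""
    for u, v in edges:
        yield (edges - {(u, v)}) | {(v, u)}


def run_phase(nodes, edges, score, moves, min_improvement):
    """Apply the best move repeatedly until none exceeds the threshold."""
    while True:
        candidates = [c for c in moves(nodes, edges) if is_acyclic(nodes, c)]
        if not candidates:
            return edges
        best = max(candidates, key=score)
        if score(best) - score(edges) <= min_improvement:
            return edges
        edges = best


def toy_search(nodes, score, min_improvement=1e-6):
    """Run the forward, backward, and turning phases once, in that order."""
    edges = frozenset()
    for moves in (add_moves, remove_moves, flip_moves):
        edges = run_phase(nodes, edges, score, moves, min_improvement)
    return edges


# Hypothetical score: +1 for each edge in TARGET, -1 for any other edge.
TARGET = {("A", "B"), ("B", "C")}


def toy_score(edges):
    return sum(1.0 if e in TARGET else -1.0 for e in edges)


learned = toy_search(["A", "B", "C"], toy_score)
print(sorted(learned))  # [('A', 'B'), ('B', 'C')]
```

Each phase is a greedy loop that stops at a local maximum for its own move type. In this toy case the forward phase already reaches the best edge set, so the later phases find nothing to change.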

Parameters:

scoring_method (str or BaseStructureScore instance, default=None)

The score to be optimized during structure estimation. Supported structure scores:

Discrete data: 'k2', 'bdeu', 'bds', 'bic-d', 'aic-d'
Continuous data: 'll-g', 'aic-g', 'bic-g'
Mixed data: 'll-cg', 'aic-cg', 'bic-cg'

If None, the appropriate scoring method is automatically selected based on the data type. Also accepts a custom score instance that inherits from BaseStructureScore.
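To give a sense of what a structure score measures, here is a minimal sketch of the textbook BIC for one discrete node given its parents: the maximized log-likelihood minus a penalty of half the log sample size per free parameter. The function `bic_local_score` and the toy data are assumptions made for illustration. pgmpy's 'bic-d' score may differ in details such as how states and parent configurations are counted.

```python
import math
from collections import Counter


def bic_local_score(rows, child, parents):
    """Textbook BIC contribution of `child` given `parents` (discrete data)."""
    n = len(rows)
    # Cardinalities are taken from the observed values (an assumption here).
    child_card = len({r[child] for r in rows})
    parent_cards = [len({r[p] for r in rows}) for p in parents]
    n_parent_configs = math.prod(parent_cards)  # 1 when there are no parents

    # Counts of (parent configuration, child state) and of parent configuration.
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)

    # Maximized log-likelihood of the conditional distribution.
    log_lik = sum(
        count * math.log(count / parent_counts[pa])
        for (pa, _), count in joint.items()
    )
    # Free parameters: (child states - 1) per parent configuration.
    n_params = n_parent_configs * (child_card - 1)
    return log_lik - 0.5 * math.log(n) * n_params


# Toy data in which B always equals A.
rows = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
with_parent = bic_local_score(rows, "B", ["A"])
without_parent = bic_local_score(rows, "B", [])
print(with_parent > without_parent)  # True
```

Because B is fully determined by A in this data, the gain in likelihood outweighs the penalty for the extra parameters, so a score-based search would prefer the structure in which A is a parent of B (or an equivalent one).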

return_type (str, default='pdag')

The type of graph to return. Options are:

'dag': Returns a directed acyclic graph (DAG).
'pdag': Returns a partially directed acyclic graph (PDAG).

min_improvement (float, default=1e-6)

The minimum score improvement required to perform an operation (edge addition, removal, or flipping). Operations with smaller improvements are not performed.
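One plausible motivation for a small positive threshold (an assumption, not something this page states) is floating-point round-off: two mathematically equal scores can differ by a tiny amount, and a threshold of exactly zero would treat that noise as an improvement. The helper `accept` below is hypothetical and only mimics the comparison.

```python
def accept(delta, min_improvement=1e-6):
    """Return True if a score change is large enough to act on (hypothetical)."""
    return delta > min_improvement


# The same total computed two ways; mathematically the difference is zero.
score_before = 0.3
score_after = 0.1 + 0.2
delta = score_after - score_before  # tiny positive value caused by round-off

print(delta > 0)      # True
print(accept(delta))  # False
print(accept(0.5))    # True
```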

Attributes:

causal_graph_ (DAG or PDAG): The learned causal graph at a (local) score maximum.
adjacency_matrix_ (pd.DataFrame): Adjacency matrix representation of the learned causal graph.
n_features_in_ (int): The number of features in the data used to learn the causal graph.
feature_names_in_ (np.ndarray): The feature names in the data used to learn the causal graph.

References

[1] Chickering, David Maxwell. “Optimal structure identification with greedy search.” Journal of Machine Learning Research 3 (Nov 2002): 507-554.

Examples

Simulate some data to use for causal discovery:

>>> import numpy as np
>>> from pgmpy.example_models import load_model
>>> np.random.seed(42)
>>> model = load_model("bnlearn/alarm")
>>> df = model.simulate(n_samples=1000, seed=42)

Use the GES algorithm to learn the causal structure from data:

>>> from pgmpy.causal_discovery import GES
>>> ges = GES(scoring_method="bic-d")
>>> ges.fit(df)
GES(scoring_method='bic-d')
>>> ges.causal_graph_
<pgmpy.base.PDAG.PDAG object at 0x...>
>>> ges.n_features_in_
37

delete(u: Any, v: Any, H: set[Any], current_model: PDAG) → PDAG: Perform delete(u - v) or delete(u -> v) with conditioning set H.

insert(u: Any, v: Any, T: Iterable[Any], current_model: PDAG) → PDAG: Perform insert(u -> v) with conditioning set T.
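The effect of these two operators can be sketched on a toy PDAG stored as a set of directed edges (tuples) and a set of undirected edges (frozensets). The sketch follows the Insert and Delete definitions in Chickering's paper [1]. The functions `toy_insert` and `toy_delete` are hypothetical, do not use pgmpy's PDAG class, and skip the validity checks and the conversion back to a completed PDAG that a full implementation needs.

```python
def toy_insert(directed, undirected, x, y, T):
    """Insert(x, y, T): add x -> y and orient t - y as t -> y for each t in T."""
    directed = set(directed)
    undirected = set(undirected)
    directed.add((x, y))
    for t in T:
        # Each t must be an undirected neighbor of y; KeyError otherwise.
        undirected.remove(frozenset((t, y)))
        directed.add((t, y))
    return directed, undirected


def toy_delete(directed, undirected, x, y, H):
    """Delete(x, y, H): drop the x/y edge, then orient y - h and x - h away from y and x."""
    directed = set(directed)
    undirected = set(undirected)
    directed.discard((x, y))
    undirected.discard(frozenset((x, y)))
    for h in H:
        for src in (y, x):
            edge = frozenset((src, h))
            if edge in undirected:
                undirected.remove(edge)
                directed.add((src, h))
    return directed, undirected


# Start from B - C with A isolated, then insert A -> C with T = {B}.
ins_directed, ins_undirected = toy_insert(
    set(), {frozenset(("B", "C"))}, "A", "C", T={"B"}
)
print(sorted(ins_directed))  # [('A', 'C'), ('B', 'C')]

# Start from an undirected triangle, then delete A - B with H = {C}.
triangle = {frozenset(pair) for pair in [("A", "B"), ("B", "C"), ("A", "C")]}
del_directed, del_undirected = toy_delete(set(), triangle, "A", "B", H={"C"})
print(sorted(del_directed))  # [('A', 'C'), ('B', 'C')]
```

In both cases the result is the v-structure A -> C <- B, which shows how the choice of T or H determines which undirected edges become oriented.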

set_score_request(*, metric: bool | None | str = '$UNCHANGED$', true_graph: bool | None | str = '$UNCHANGED$') → GES

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

metric (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for metric parameter in score.
true_graph (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for true_graph parameter in score.

Returns:

self (object): The updated object.

turn(u: Any, v: Any, C: Iterable[Any], current_model: PDAG) → PDAG: Perform turn operation (reverse or orient edge between u and v) with set C.