HillClimbSearch#

class pgmpy.causal_discovery.HillClimbSearch(scoring_method: str | BaseStructureScore | None = None, start_dag: DAG | None = None, tabu_length: int = 100, max_indegree: int | None = None, expert_knowledge: ExpertKnowledge | None = None, return_type: str = 'pdag', epsilon: float = 0.0001, max_iter: int = 1000000, show_progress: bool = True)[source]#

Bases: _ScoreMixin, _BaseCausalDiscovery

Score-based causal discovery using hill climbing optimization.

This class implements the HillClimbSearch algorithm [1] for causal discovery. Given a tabular dataset, the algorithm estimates the causal structure among the variables in the data as a Directed Acyclic Graph (DAG). The algorithm works by iteratively making local modifications to the graph structure (adding, removing, or reversing edges) and keeping changes that improve the score until a local maximum is reached.

The algorithm is a greedy local search method that:

1. Starts from an initial graph (empty by default, or based on provided expert knowledge).

2. Evaluates all possible single-edge modifications (add, delete, reverse).

3. Applies the modification with the highest score improvement.

4. Repeats until no improvement can be made.

A tabu list is used to prevent the algorithm from immediately undoing recent changes, which helps avoid getting stuck in local optima.
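The loop described above can be sketched in plain Python. This is an illustrative toy, not pgmpy's implementation: the operation encoding and score function are invented for the example, and acyclicity checking is omitted for brevity.

```python
from collections import deque
from itertools import permutations

def hill_climb(variables, score, tabu_length=100, epsilon=1e-4, max_iter=1000):
    """Toy greedy edge search: add/delete/reverse one edge per iteration.
    NOTE: acyclicity checks are omitted; this is a sketch, not pgmpy's code."""
    edges = set()                     # start from an empty graph
    tabu = deque(maxlen=tabu_length)  # inverses of recently applied operations

    def candidates():
        # Enumerate every legal single-edge modification of the current graph.
        for u, v in permutations(variables, 2):
            if (u, v) in edges:
                yield ("delete", (u, v))
                yield ("reverse", (u, v))
            elif (v, u) not in edges:
                yield ("add", (u, v))

    def apply_op(op, edge, graph):
        g = set(graph)
        if op == "add":
            g.add(edge)
        elif op == "delete":
            g.discard(edge)
        else:  # reverse
            g.discard(edge)
            g.add((edge[1], edge[0]))
        return g

    for _ in range(max_iter):
        current = score(edges)
        # The tabu list blocks moves that would undo a recent change.
        moves = [(op, e) for op, e in candidates() if (op, e) not in tabu]
        if not moves:
            break
        best = max(moves, key=lambda m: score(apply_op(m[0], m[1], edges)))
        if score(apply_op(best[0], best[1], edges)) - current < epsilon:
            break  # local maximum reached
        edges = apply_op(best[0], best[1], edges)
        # Remember the inverse of the applied move so it cannot be undone.
        inverse = {"add": "delete", "delete": "add", "reverse": "reverse"}
        undo_edge = best[1] if best[0] != "reverse" else (best[1][1], best[1][0])
        tabu.append((inverse[best[0]], undo_edge))
    return edges
```

With a decomposable score, a real implementation only recomputes the local score of the nodes touched by each candidate move rather than rescoring the whole graph.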

Parameters:
scoring_method : str or BaseStructureScore instance, default=None

The score to be optimized during structure estimation. Supported structure scores:

  • Discrete data: ‘k2’, ‘bdeu’, ‘bds’, ‘bic-d’, ‘aic-d’

  • Continuous data: ‘ll-g’, ‘aic-g’, ‘bic-g’

  • Mixed data: ‘ll-cg’, ‘aic-cg’, ‘bic-cg’

If None, the appropriate scoring method is automatically selected based on the data type. Also accepts a custom score instance that inherits from BaseStructureScore.
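As an illustration of what a decomposable score such as BIC measures, here is a hand-rolled per-node BIC for discrete data: the fit of a child given its parents, minus a complexity penalty. This is a simplified sketch for intuition, not pgmpy's scoring code.

```python
import math
from collections import Counter

def local_bic(data, child, parents):
    """BIC contribution of one node: log-likelihood of `child` given
    `parents`, minus 0.5 * (#free parameters) * log(N).
    `data` is a list of dict rows; a toy stand-in for a DataFrame."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood: sum of count * log(conditional prob).
    ll = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    child_card = len({row[child] for row in data})
    n_params = len(marg) * (child_card - 1)  # free parameters in the CPT
    return ll - 0.5 * n_params * math.log(n)
```

Adding a parent that truly predicts the child raises the log-likelihood by more than the penalty grows, so the score of that edge addition is positive; irrelevant parents are penalized.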

start_dag : DAG instance, default=None

The starting point for the local search. By default, a completely disconnected network (no edges) is used. If provided, the DAG must contain exactly the same variables as in the data.

tabu_length : int, default=100

The number of recent graph modifications to store in the tabu list. These modifications cannot be reversed during the search procedure. This serves to enforce a wider exploration of the search space.

max_indegree : int or None, default=None

If provided, the procedure only searches among models where all nodes have at most max_indegree parents. This can significantly reduce the search space and computation time for large graphs.

expert_knowledge : ExpertKnowledge instance, default=None

Expert knowledge to be used with the algorithm. Expert knowledge allows specification of:

  • Required edges that must be present in the final graph

  • Forbidden edges that cannot be present in the final graph

  • Temporal ordering of nodes
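These constraints act as a filter on candidate edge operations during the search. The helper below is a toy sketch of that filtering logic (the function name and argument layout are illustrative, not pgmpy's ExpertKnowledge API):

```python
def allowed(op, edge, forbidden, required, order):
    """Toy filter for candidate operations under expert constraints.
    `order` maps each variable to a temporal tier; edges may only go
    from earlier to later tiers. Illustrative only, not pgmpy's API."""
    u, v = edge
    if op == "add":
        # Cannot add a forbidden edge or one that violates temporal order.
        return edge not in forbidden and order[u] <= order[v]
    if op == "delete":
        # Required edges must stay in the graph.
        return edge not in required
    # reverse: old edge must not be required, and the new direction
    # must be neither forbidden nor temporally backwards.
    return edge not in required and (v, u) not in forbidden and order[v] <= order[u]
```

The search then only scores operations for which this check passes, so the constraints prune the space rather than being enforced after the fact.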

return_type : str, default='pdag'

The type of graph to return. Options are:

  • 'dag': Returns a directed acyclic graph (DAG).

  • 'pdag': Returns a partially directed acyclic graph (PDAG) where edges that could not be oriented are left undirected.

epsilon : float, default=1e-4

Defines the exit condition. If the improvement in score is less than epsilon, the algorithm terminates and returns the learned model.

max_iter : int, default=1e6

The maximum number of iterations allowed. The algorithm terminates and returns the learned model when the number of iterations exceeds max_iter.

show_progress : bool, default=True

If True, shows a progress bar while learning the causal structure.

Attributes:
causal_graph_ : DAG

The learned causal graph as a DAG at a (local) score maximum.

adjacency_matrix_ : pd.DataFrame

Adjacency matrix representation of the learned causal graph.

n_features_in_ : int

The number of features in the data used to learn the causal graph.

feature_names_in_ : np.ndarray

The feature names in the data used to learn the causal graph.

References

[1]

Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009, Section 18.4.3 (page 811ff)

Examples

Simulate some data to use for causal discovery:

>>> from pgmpy.example_models import load_model
>>> model = load_model("bnlearn/alarm")
>>> df = model.simulate(n_samples=1000, seed=42)

Use the HillClimbSearch algorithm to learn the causal structure from data:

>>> from pgmpy.causal_discovery import HillClimbSearch
>>> hc = HillClimbSearch(scoring_method="bic-d")
>>> hc.fit(df)
>>> hc.causal_graph_.edges()

Use expert knowledge to constrain the search:

>>> from pgmpy.causal_discovery import ExpertKnowledge
>>> expert = ExpertKnowledge(forbidden_edges=[("HISTORY", "CVP")])
>>> hc = HillClimbSearch(scoring_method="bic-d", expert_knowledge=expert)
>>> hc.fit(df)
set_score_request(*, metric: bool | None | str = '$UNCHANGED$', true_graph: bool | None | str = '$UNCHANGED$') → HillClimbSearch#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
metric : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for metric parameter in score.

true_graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for true_graph parameter in score.

Returns:
self : object

The updated object.