HillClimbSearch#
- class pgmpy.causal_discovery.HillClimbSearch(scoring_method: str | BaseStructureScore | None = None, start_dag: DAG | None = None, tabu_length: int = 100, max_indegree: int | None = None, expert_knowledge: ExpertKnowledge | None = None, return_type: str = 'pdag', epsilon: float = 0.0001, max_iter: int = 1000000, show_progress: bool = True)[source]#
Bases: _ScoreMixin, _BaseCausalDiscovery
Score-based causal discovery using hill climbing optimization.
This class implements the HillClimbSearch algorithm [1] for causal discovery. Given a tabular dataset, the algorithm estimates the causal structure among the variables in the data as a Directed Acyclic Graph (DAG). The algorithm works by iteratively making local modifications to the graph structure (adding, removing, or reversing edges) and keeping changes that improve the score until a local maximum is reached.
The algorithm is a greedy local search method that:
1. Starts from an initial graph (empty by default, or based on provided expert knowledge).
2. Evaluates all possible single-edge modifications (add, delete, reverse).
3. Applies the modification with the highest score improvement.
4. Repeats until no improvement can be made.
A tabu list is used to prevent the algorithm from immediately undoing recent changes, which helps avoid getting stuck in local optima.
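The greedy loop described above can be sketched in plain Python. This is a toy illustration of the search procedure only, not pgmpy's implementation: `hill_climb`, `creates_cycle`, and the toy score below are invented for this example, and the tabu list and expert-knowledge constraints are omitted for brevity.

```python
from itertools import permutations

def creates_cycle(edges, new_edge):
    """Return True if adding new_edge to the DAG given by `edges` creates a cycle."""
    u, v = new_edge
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    # DFS from v: if u is reachable, adding u -> v would close a cycle.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, []))
    return False

def hill_climb(variables, score, epsilon=1e-4, max_iter=1000):
    """Greedy local search over single-edge additions, deletions, and reversals."""
    edges = set()  # start from the empty (fully disconnected) graph
    for _ in range(max_iter):
        best_delta, best_edges = 0.0, None
        for u, v in permutations(variables, 2):
            if (u, v) in edges:
                candidates = [edges - {(u, v)}]                      # delete
                if not creates_cycle(edges - {(u, v)}, (v, u)):
                    candidates.append(edges - {(u, v)} | {(v, u)})   # reverse
            elif (v, u) not in edges and not creates_cycle(edges, (u, v)):
                candidates = [edges | {(u, v)}]                      # add
            else:
                candidates = []
            for cand in candidates:
                delta = score(cand) - score(edges)
                if delta > best_delta:
                    best_delta, best_edges = delta, cand
        # Exit condition: no modification improves the score by at least epsilon.
        if best_edges is None or best_delta < epsilon:
            break
        edges = best_edges
    return edges

# Toy score: reward edges from a hypothetical "true" graph, penalize edge count.
TRUE = {("A", "B"), ("B", "C")}
toy_score = lambda edges: len(edges & TRUE) - 0.1 * len(edges)

print(sorted(hill_climb(["A", "B", "C"], toy_score)))  # [('A', 'B'), ('B', 'C')]
```

In the real estimator, the score is a decomposable structure score over the data (so only the local score of affected nodes is recomputed per modification), and the tabu list additionally blocks the inverse of recent operations.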
- Parameters:
- scoring_methodstr or BaseStructureScore instance, default=None
The score to be optimized during structure estimation. Supported structure scores:
Discrete data: ‘k2’, ‘bdeu’, ‘bds’, ‘bic-d’, ‘aic-d’
Continuous data: ‘ll-g’, ‘aic-g’, ‘bic-g’
Mixed data: ‘ll-cg’, ‘aic-cg’, ‘bic-cg’
If None, the appropriate scoring method is automatically selected based on the data type. Also accepts a custom score instance that inherits from BaseStructureScore.
- start_dagDAG instance, default=None
The starting point for the local search. By default, a completely disconnected network (no edges) is used. If provided, the DAG must contain exactly the same variables as in the data.
- tabu_lengthint, default=100
The number of recent graph modifications to store in the tabu list. These modifications cannot be reversed during the search procedure. This serves to enforce a wider exploration of the search space.
- max_indegreeint or None, default=None
If provided, the procedure only searches among models where all nodes have at most max_indegree parents. This can significantly reduce the search space and computation time for large graphs.
- expert_knowledgeExpertKnowledge instance, default=None
Expert knowledge to be used with the algorithm. Expert knowledge allows specification of:
Required edges that must be present in the final graph
Forbidden edges that cannot be present in the final graph
Temporal ordering of nodes
- return_typestr, default=’pdag’
The type of graph to return. Options are: - ‘dag’: Returns a directed acyclic graph (DAG). - ‘pdag’: Returns a partially directed acyclic graph (PDAG) where edges that
could not be oriented are left undirected.
- epsilonfloat, default=1e-4
Defines the exit condition. If the improvement in score is less than epsilon, the algorithm terminates and returns the learned model.
- max_iterint, default=1e6
The maximum number of iterations allowed. The algorithm terminates and returns the learned model when the number of iterations exceeds max_iter.
- show_progressbool, default=True
If True, shows a progress bar while learning the causal structure.
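To give a rough intuition for what the decomposable scores listed under scoring_method compute, the sketch below implements a BIC-style local score for a single discrete node by hand. This is an illustration only: `local_bic` and its data layout are invented for the example; pgmpy's actual scores are implemented as BaseStructureScore subclasses.

```python
import math
from collections import Counter

def local_bic(data, node, parents):
    """BIC contribution of one node: log-likelihood minus a complexity penalty.

    `data` is a list of dicts mapping variable name -> observed value.
    """
    n = len(data)
    # Counts of (parent configuration, node value) and of parent configurations.
    joint = Counter(tuple(row[p] for p in parents) + (row[node],) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    # Log-likelihood of the data under the maximum-likelihood conditional table.
    ll = sum(c * math.log(c / marginal[key[:-1]]) for key, c in joint.items())
    # Penalty: 0.5 * log(n) per free parameter.
    node_card = len({row[node] for row in data})
    num_params = max(len(marginal), 1) * (node_card - 1)
    return ll - 0.5 * math.log(n) * num_params

# B copies A perfectly, so A should be a beneficial parent for B.
data = [{"A": a, "B": a} for a in (0, 1)] * 10
print(local_bic(data, "B", ["A"]) > local_bic(data, "B", []))  # True
```

Because the total score is a sum of such per-node terms, each candidate edge modification only requires rescoring the nodes whose parent sets changed.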
- Attributes:
- causal_graph_DAG
The learned causal graph as a DAG at a (local) score maximum.
- adjacency_matrix_pd.DataFrame
Adjacency matrix representation of the learned causal graph.
- n_features_in_int
The number of features in the data used to learn the causal graph.
- feature_names_in_np.ndarray
The feature names in the data used to learn the causal graph.
References
[1] Koller & Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009, Section 18.4.3 (page 811ff).
Examples
Simulate some data to use for causal discovery:
>>> from pgmpy.example_models import load_model
>>> model = load_model("bnlearn/alarm")
>>> df = model.simulate(n_samples=1000, seed=42)
Use the HillClimbSearch algorithm to learn the causal structure from data:
>>> from pgmpy.causal_discovery import HillClimbSearch
>>> hc = HillClimbSearch(scoring_method="bic-d")
>>> hc.fit(df)
>>> hc.causal_graph_.edges()
Use expert knowledge to constrain the search:
>>> from pgmpy.causal_discovery import ExpertKnowledge
>>> expert = ExpertKnowledge(forbidden_edges=[("HISTORY", "CVP")])
>>> hc = HillClimbSearch(scoring_method="bic-d", expert_knowledge=expert)
>>> hc.fit(df)
- set_score_request(*, metric: bool | None | str = '$UNCHANGED$', true_graph: bool | None | str = '$UNCHANGED$') → HillClimbSearch#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- metricstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the metric parameter in score.
- true_graphstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the true_graph parameter in score.
- Returns:
- selfobject
The updated object.