Hill Climb Search
- class pgmpy.estimators.HillClimbSearch(data, use_cache=True, **kwargs)
- estimate(scoring_method='k2score', start_dag=None, fixed_edges={}, tabu_length=100, max_indegree=None, black_list=None, white_list=None, epsilon=0.0001, max_iter=1000000.0, show_progress=True)
Performs a local hill climb search to estimate the DAG structure with the best score according to the supplied scoring method. Starts at the model start_dag and proceeds by step-by-step network modifications until a local maximum is reached. Only the network structure is estimated; no parametrization is performed.
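The search procedure described above can be pictured with a small, library-independent sketch. This is not pgmpy's implementation: the helper names (`is_acyclic`, `legal_operations`, `reverse_of`, `hill_climb`) and the toy score are assumptions made for illustration. It only shows the general idea of greedily applying the best single-edge change (add, remove, or flip) while a bounded tabu list forbids undoing recent changes.

```python
from collections import deque
from itertools import permutations


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
    return visited == len(nodes)


def legal_operations(nodes, edges, tabu):
    """Yield (operation, resulting_edges) for every allowed single-edge change."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            if ("-", (u, v)) not in tabu:
                yield ("-", (u, v)), edges - {(u, v)}
            flipped = (edges - {(u, v)}) | {(v, u)}
            if ("flip", (u, v)) not in tabu and is_acyclic(nodes, flipped):
                yield ("flip", (u, v)), flipped
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if ("+", (u, v)) not in tabu and is_acyclic(nodes, added):
                yield ("+", (u, v)), added


def reverse_of(op):
    """Return the operation that would undo `op`."""
    kind, (u, v) = op
    if kind == "+":
        return ("-", (u, v))
    if kind == "-":
        return ("+", (u, v))
    return ("flip", (v, u))


def hill_climb(nodes, score, start_edges=(), tabu_length=100,
               epsilon=1e-4, max_iter=1000):
    """Greedy local search; stops at a local maximum or after max_iter steps."""
    edges = frozenset(start_edges)
    tabu = deque(maxlen=tabu_length)  # oldest entries drop out automatically
    for _ in range(max_iter):
        current = score(edges)
        best_delta, best_op, best_edges = 0.0, None, None
        for op, candidate in legal_operations(nodes, edges, tabu):
            delta = score(candidate) - current
            if delta > best_delta:
                best_delta, best_op, best_edges = delta, op, candidate
        if best_op is None or best_delta < epsilon:
            break  # no change improves the score by at least epsilon
        edges = best_edges
        tabu.append(reverse_of(best_op))  # forbid undoing this step for a while
    return edges


# Toy score (hypothetical): +1 for each "true" edge, -1 for any other edge.
NODES = ["A", "B", "C"]
TRUE_EDGES = {("A", "C"), ("B", "C")}


def toy_score(edges):
    return sum(1 if e in TRUE_EDGES else -1 for e in edges)


learned = hill_climb(NODES, toy_score)
print(sorted(learned))
```

With this toy score the search starts from the empty graph and adds one edge per step until no single change improves the score. A real structure score (K2, BDeu, BIC) would replace `toy_score`, and the result is in general only a local maximum.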
- Parameters:
scoring_method (str or StructureScore instance) – The score to be optimized during structure estimation. Supported structure scores: k2score, bdeuscore, bdsscore, bicscore. A custom score is also accepted, provided it is an instance of StructureScore.
start_dag (DAG instance) – The starting point for the local search. By default a completely disconnected network is used.
fixed_edges (iterable) – A list of edges that will always be present in the final learned model. The algorithm adds these edges at the start and never changes them.
tabu_length (int) – If provided, the last tabu_length graph modifications cannot be reversed during the search procedure. This serves to enforce a wider exploration of the search space. Default value: 100.
max_indegree (int or None) – If provided and not None, the procedure only searches among models where all nodes have at most max_indegree parents. Defaults to None.
black_list (list or None) – If a list of edges is provided as black_list, they are excluded from the search and the resulting model will not contain any of those edges. Default: None
white_list (list or None) – If a list of edges is provided as white_list, the search is limited to those edges. The resulting model will then only contain edges that are in white_list. Default: None
epsilon (float (default: 1e-4)) – Defines the exit condition. If the improvement in score is less than epsilon, the learned model is returned.
max_iter (int (default: 1e6)) – The maximum number of iterations allowed. Returns the learned model when the number of iterations is greater than max_iter.
- Returns:
Estimated model – A DAG at a (local) score maximum.
- Return type:
DAG instance
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import HillClimbSearch, BicScore
>>> # create data sample with 9 random variables:
... data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
>>> # add 10th dependent variable
... data['J'] = data['A'] * data['B']
>>> est = HillClimbSearch(data)
>>> best_model = est.estimate(scoring_method=BicScore(data))
>>> sorted(best_model.nodes())
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
>>> best_model.edges()
OutEdgeView([('B', 'J'), ('A', 'J')])
>>> # search a model with restriction on the number of parents:
>>> est.estimate(max_indegree=1).edges()
OutEdgeView([('J', 'A'), ('B', 'J')])
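The structural constraints in the parameter list (max_indegree, black_list, white_list) can be thought of as a filter over candidate edge additions. The following is a minimal sketch under that assumption, not pgmpy's code: the function `allowed_additions` and the `parents` mapping are hypothetical names, and the full acyclicity check is deliberately left out (only direct two-node cycles are excluded).

```python
def allowed_additions(nodes, parents, max_indegree=None,
                      black_list=None, white_list=None):
    """List edges (u, v) that may still be added under the constraints.

    `parents` maps each node to the set of its current parents.
    Assumption: a complete search would also reject edges that close
    a longer directed cycle; that check is omitted here for brevity.
    """
    black = set(black_list or [])
    white = None if white_list is None else set(white_list)
    candidates = []
    for u in nodes:
        for v in nodes:
            if u == v or u in parents[v]:
                continue  # self-loop, or edge already present
            if v in parents[u]:
                continue  # reverse edge exists; adding would create a 2-cycle
            if (u, v) in black:
                continue  # explicitly forbidden edge
            if white is not None and (u, v) not in white:
                continue  # search restricted to white-listed edges
            if max_indegree is not None and len(parents[v]) >= max_indegree:
                continue  # v already has the maximum number of parents
            candidates.append((u, v))
    return candidates


# Current graph: a single edge B -> J.
nodes = ["A", "B", "J"]
parents = {"A": set(), "B": set(), "J": {"B"}}

limited = allowed_additions(nodes, parents, max_indegree=1,
                            black_list=[("J", "A")])
print(limited)  # [('A', 'B'), ('B', 'A')]
```

Here `('A', 'J')` is rejected because J already has one parent, and `('J', 'A')` is rejected by the black list. Passing `white_list=[("A", "B")]` instead would leave only that single edge as a candidate.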
Structure Score
BDeu Score
Bic Score
K2 Score
BDsScore
- class pgmpy.estimators.BDsScore(data, equivalent_sample_size=10, **kwargs)