PC
- class pgmpy.causal_discovery.PC(variant: str = 'parallel', ci_test: str | Callable | None = None, return_type: str = 'pdag', significance_level: float = 0.01, max_cond_vars: int = 5, expert_knowledge: ExpertKnowledge | None = None, enforce_expert_knowledge: bool = False, n_jobs: int = -1, show_progress: bool = True)
Bases: _ConstraintMixin, _BaseCausalDiscovery
The PC algorithm for causal discovery / structure learning.
This class implements the PC algorithm [1] for causal discovery. Given a tabular dataset, the PC algorithm estimates the causal structure among the variables as a Directed Acyclic Graph (DAG) or Partially Directed Acyclic Graph (PDAG). The algorithm identifies (conditional) dependencies in the data set using statistical independence tests and estimates a DAG pattern that satisfies the identified dependencies.
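As a rough illustration of the skeleton phase described above, the sketch below starts from a complete undirected graph and removes an edge whenever a conditional independence check succeeds. It is not pgmpy's implementation: `pc_skeleton`, `is_independent`, and `chain_oracle` are hypothetical names, and the hand-written oracle stands in for a statistical test on data. Neighbourhoods are frozen for each conditioning-set size, in the spirit of the "stable" variant.

```python
from itertools import combinations


def pc_skeleton(nodes, is_independent, max_cond_vars=5):
    """Skeleton search in the style of PC-stable (illustrative sketch only).

    is_independent(x, y, cond) is a hypothetical callback returning True when
    x and y are judged independent given the tuple of variables `cond`.
    """
    # Start from the complete undirected graph.
    adj = {n: set(nodes) - {n} for n in nodes}
    sepsets = {}
    for size in range(max_cond_vars + 1):
        # Freeze the neighbourhoods for this level so that the outcome does
        # not depend on the order in which pairs are tested.
        frozen = {n: set(adj[n]) for n in nodes}
        for x, y in combinations(nodes, 2):
            if y not in adj[x]:
                continue
            # Conditioning sets are drawn from the neighbours of x, then of y.
            for pool in (frozen[x] - {y}, frozen[y] - {x}):
                found = next(
                    (c for c in combinations(sorted(pool), size)
                     if is_independent(x, y, c)),
                    None,
                )
                if found is not None:
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepsets[frozenset((x, y))] = set(found)
                    break
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepsets


def chain_oracle(x, y, cond):
    """Toy oracle for the chain A -> B -> C: only A and C separate, given B."""
    return {x, y} == {"A", "C"} and "B" in cond


edges, sepsets = pc_skeleton(["A", "B", "C"], chain_oracle)
print(sorted(tuple(sorted(e)) for e in edges))  # [('A', 'B'), ('B', 'C')]
print(sepsets[frozenset(("A", "C"))])           # {'B'}
```

With a real dataset, the oracle would be replaced by a test that compares a p-value against a significance threshold, and the recorded separating sets would feed the later orientation step.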
When used with expert knowledge, the following flowchart can help you figure out the expected results based on different choices of parameters and the structure learned from the data.
                      ┌──────────────────┐    No     ┌─────────────┐
                      │ Expert Knowledge ├──────────►│ Normal PC   │
                      │ specified?       │           │ run         │
                      └────────┬─────────┘           └─────────────┘
                               │
                           Yes │
                               │
                               ▼
                      ┌──────────────────┐
                      │ Enforce expert   │
                      │ knowledge?       │
                      └────────┬─────────┘
                               │
                        Yes    │    No
               ┌───────────────┴────────────────┐
               │                                │
               ▼                                ▼
┌──────────────────────────────┐   ┌─────────────────────────┐
│                              │   │                         │
│ 1) Forbidden edges are       │   │ Conflicts with learned  │
│    removed from the skeleton │   │ structure (opposite     │
│                              │   │ edge orientations)?     │
│ 2) Required edges will be    │   │                         │
│    present in the final      │   └────────────┬────────────┘
│    model (but direction is   │                │
│    not guaranteed)           │    ┌───────────┴─────────┐
│                              │Yes │                     │ No
└──────────────────────────────┘    │                     │
                                    ▼                     ▼
                          ┌───────────────────┐  ┌──────────────────┐
                          │ Conflicting edges │  │ Expert knowledge │
                          │ are ignored       │  │ applied fully    │
                          └───────────────────┘  └──────────────────┘
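The decision logic of the flowchart can be written down as a small function, together with a sketch of what the strict branch means at the skeleton level. Both helpers (`expected_outcome` and `enforce_on_skeleton`) are hypothetical names introduced for illustration and do not mirror pgmpy's internals; edges are modelled as plain `frozenset` pairs.

```python
def expected_outcome(knowledge_given, enforce, conflicts_with_learned):
    """Walk the flowchart and return the outcome it names."""
    if not knowledge_given:
        return "normal PC run"
    if enforce:
        return "forbidden edges removed; required edges present (direction not guaranteed)"
    if conflicts_with_learned:
        return "conflicting edges are ignored"
    return "expert knowledge applied fully"


def enforce_on_skeleton(skeleton, required_edges, forbidden_edges):
    """Strict branch, skeleton level: drop forbidden pairs, add required pairs."""
    result = set(skeleton)
    for u, v in forbidden_edges:
        result.discard(frozenset((u, v)))
    for u, v in required_edges:
        # Only adjacency is guaranteed here; the direction is decided later.
        result.add(frozenset((u, v)))
    return result


skeleton = {frozenset(("A", "B")), frozenset(("B", "C"))}
enforced = enforce_on_skeleton(
    skeleton, required_edges=[("A", "C")], forbidden_edges=[("A", "B")]
)
print(sorted(tuple(sorted(edge)) for edge in enforced))  # [('A', 'C'), ('B', 'C')]
print(expected_outcome(True, False, True))               # conflicting edges are ignored
```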
- Parameters:
- variant: str, default="parallel"
The variant of the PC algorithm to run.
- "orig": The original PC algorithm. Might not give the same results in different runs but performs fewer independence tests than "stable".
- "stable": Gives the same result in every run but needs to perform more statistical independence tests.
- "parallel": Parallel version of PC Stable. Can run on multiple cores with the same result on each run. The parallel version is faster only on datasets with a large number of variables or samples. For smaller datasets, it might be slower due to the overhead of managing multiple processes.
- ci_test: str or callable, default=None
The conditional independence (CI) test to use for finding (conditional) independences in the data. This can be any of the CI tests implemented in pgmpy.ci_tests or a custom function that follows the signature of the built-in CI tests.
If None, the appropriate CI test will be chosen based on the data type.
- return_type: str, default="pdag"
The type of structure to return. Can be one of: pdag, cpdag, dag.
If return_type=pdag or return_type=cpdag: a partially directed structure is returned.
If return_type=dag, a fully directed structure is returned. This DAG is one of the possible orientations of the PDAG learned by the PC algorithm.
- significance_level: float, default=0.01
The p-value threshold to use for the statistical independence tests. If the p-value of a test is greater than significance_level, then the variables are considered independent.
- max_cond_vars: int, default=5
The maximum number of conditioning variables to consider while performing conditional independence tests.
- expert_knowledge: pgmpy.estimators.ExpertKnowledge, optional
Expert knowledge to be used in the causal graph construction. This needs to be an instance of pgmpy.estimators.ExpertKnowledge. Users can specify knowledge in the form of required/forbidden edges, temporal information, or restrictions on the search space.
- enforce_expert_knowledge: bool, default=False
If True, the expert knowledge will be strictly enforced. This implies the following:
For every edge (u, v) specified in forbidden_edges, there will be no edge between u and v.
For every edge (u, v) specified in required_edges, one of the following would be present in the final model: u -> v, u <- v, or u - v (if CPDAG is returned).
If False, the algorithm attempts to make the edge orientations as specified by expert knowledge after learning the skeleton. This implies the following:
For every edge (u, v) specified in forbidden_edges, the final graph will have either u <- v or no edge, except if u -> v is part of a collider structure in the learned skeleton.
For every edge (u, v) specified in required_edges, the final graph will have either u -> v or no edge, except if u <- v is part of a collider structure in the learned skeleton.
- n_jobs: int, default=-1
The number of jobs to run in parallel. This is only used when variant="parallel".
- show_progress: bool, default=True
If True, shows a progress bar while learning the causal structure.
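The decision rule attached to `significance_level` (a pair is treated as independent when the p-value exceeds the threshold) can be illustrated with a hand-rolled Pearson chi-square test on a 2x2 contingency table. This is a sketch under stated assumptions rather than the test pgmpy runs: `chi_square_2x2_p_value` and `is_independent` are hypothetical helpers, no continuity correction is applied, and all expected counts are assumed to be positive.

```python
import math


def chi_square_2x2_p_value(table):
    """Pearson chi-square p-value for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function has the
    # closed form erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def is_independent(p_value, significance_level=0.01):
    """Variables count as independent when the p-value exceeds the threshold."""
    return p_value > significance_level


associated = [[30, 10], [10, 30]]  # counts concentrated on the diagonal
balanced = [[20, 20], [20, 20]]    # counts spread evenly

print(is_independent(chi_square_2x2_p_value(associated)))  # False
print(is_independent(chi_square_2x2_p_value(balanced)))    # True
```

A smaller `significance_level` makes it easier for a test to report independence, so edges are removed more readily and the learned graph tends to be sparser.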
- Attributes:
- causal_graph_: DAG or pgmpy.base.PDAG
The learned causal graph.
If return_type="pdag", this will be a PDAG instance.
If return_type="dag", this will be a DAG instance.
- adjacency_matrix_: pd.DataFrame
Adjacency matrix representation of the learned causal graph, i.e. causal_graph_.
- skeleton_: UndirectedGraph
An estimate for the undirected graph skeleton of the DAG underlying the data.
- separating_sets_: dict
A dict containing, for each pair of not directly connected nodes, a separating set ("witnessing set") of variables that makes them conditionally independent. It is needed for the edge orientation procedures.
- n_features_in_: int
The number of features in the data used to learn the causal graph.
- feature_names_in_: np.ndarray
The feature names in the data used to learn the causal graph.
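To show why separating sets matter for edge orientation, here is a minimal sketch of the standard collider rule: for an unshielded triple x - z - y (x and y not adjacent), the edges are oriented as x -> z <- y when z is absent from the separating set of x and y. The function `orient_colliders` and the data layout (edges as `frozenset` pairs, separating sets keyed by `frozenset` pairs) are assumptions made for this example and are not the layout of the `skeleton_` or `separating_sets_` attributes.

```python
from itertools import combinations


def orient_colliders(skeleton, sepsets):
    """Return directed edges (parent, child) implied by unshielded colliders."""
    neighbours = {}
    for edge in skeleton:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    directed = set()
    for z, nbrs in neighbours.items():
        for x, y in combinations(sorted(nbrs), 2):
            pair = frozenset((x, y))
            if pair in skeleton:
                continue  # x and y are adjacent, so the triple is shielded
            # Assumption: a missing entry is treated as an empty separating set.
            if z not in sepsets.get(pair, set()):
                directed.add((x, z))
                directed.add((y, z))
    return directed


# A - C - B with A and B separated by the empty set: C is a collider.
collider_skeleton = {frozenset(("A", "C")), frozenset(("B", "C"))}
collider_sepsets = {frozenset(("A", "B")): set()}
print(sorted(orient_colliders(collider_skeleton, collider_sepsets)))
# [('A', 'C'), ('B', 'C')]

# A - B - C with A and C separated by {B}: no collider is oriented.
chain_skeleton = {frozenset(("A", "B")), frozenset(("B", "C"))}
chain_sepsets = {frozenset(("A", "C")): {"B"}}
print(orient_colliders(chain_skeleton, chain_sepsets))  # set()
```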
References
[1] Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search. doi:10.7551/mitpress/1754.001.0001
[2] Neapolitan, Learning Bayesian Networks, Section 10.1.2 for the PC algorithm (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
[3] Original PC: P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, 2nd ed. Cambridge, MA: MIT Press, 2000.
[4] Stable PC: D. Colombo and M. H. Maathuis, "A modification of the PC algorithm yielding order-independent skeletons," ArXiv e-prints, Nov. 2012.
[5] Parallel PC: Le, Thuc, et al. "A fast PC algorithm for high dimensional causal discovery with multi-core PCs." IEEE/ACM Transactions on Computational Biology and Bioinformatics (2016).
[6] Expert Knowledge: Meek, Christopher. "Causal inference and causal explanation with background knowledge." arXiv preprint arXiv:1302.4972 (2013).
Examples
Simulate some data to use for causal discovery:
>>> from pgmpy.example_models import load_model
>>> model = load_model("bnlearn/alarm")
>>> df = model.simulate(n_samples=1000, seed=42)
Use the PC algorithm to learn the causal structure from data:
>>> from pgmpy.causal_discovery import PC
>>> pc = PC(variant="parallel", ci_test="chi_square", significance_level=0.01)
>>> pc.fit(df)
PC(ci_test='chi_square')
>>> pc.causal_graph_
<pgmpy.base.PDAG.PDAG object at 0x...>
>>> pc.n_features_in_
37
Specify expert knowledge:
- set_fit_request(*, independencies: bool | None | str = '$UNCHANGED$') -> PC
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- independencies: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the independencies parameter in fit.
- Returns:
- self: object
The updated object.
- set_score_request(*, metric: bool | None | str = '$UNCHANGED$', true_graph: bool | None | str = '$UNCHANGED$') -> PC
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- metric: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the metric parameter in score.
- true_graph: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the true_graph parameter in score.
- Returns:
- self: object
The updated object.