PC

class pgmpy.causal_discovery.PC(variant: str = 'parallel', ci_test: str | Callable | None = None, return_type: str = 'pdag', significance_level: float = 0.01, max_cond_vars: int = 5, expert_knowledge: ExpertKnowledge | None = None, enforce_expert_knowledge: bool = False, n_jobs: int = -1, show_progress: bool = True)

Bases: _ConstraintMixin, _BaseCausalDiscovery

The PC algorithm for causal discovery / structure learning.

This class implements the PC algorithm [1] for causal discovery. Given a tabular dataset, the PC algorithm estimates the causal structure among the variables in the data as a Directed Acyclic Graph (DAG) or Partially Directed Acyclic Graph (PDAG). The algorithm identifies (conditional) dependencies in the dataset using statistical independence tests and estimates a DAG pattern that satisfies the identified dependencies.
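As a rough illustration of the skeleton phase described above, the sketch below starts from a complete undirected graph and removes an edge as soon as a CI query reports independence given some subset of the neighbours. It is not pgmpy's implementation: the names `learn_skeleton`, `find_separating_set`, and the hand-written `ci_oracle` (standing in for a statistical test on data from the chain A -> B -> C) are assumptions made for this example.

```python
from itertools import combinations


def find_separating_set(u, v, neighbors, size, is_independent):
    """Look for a conditioning set of the given size that separates u and v."""
    for side in (neighbors[u] - {v}, neighbors[v] - {u}):
        for cond in combinations(sorted(side), size):
            if is_independent(u, v, set(cond)):
                return set(cond)
    return None


def learn_skeleton(nodes, is_independent, max_cond_vars=5):
    """Toy PC-style skeleton search over a complete starting graph."""
    neighbors = {n: set(nodes) - {n} for n in nodes}
    separating_sets = {}
    for size in range(max_cond_vars + 1):
        for u, v in combinations(nodes, 2):
            if v not in neighbors[u]:
                continue  # edge was already removed
            cond = find_separating_set(u, v, neighbors, size, is_independent)
            if cond is not None:
                neighbors[u].discard(v)
                neighbors[v].discard(u)
                separating_sets[frozenset((u, v))] = cond
    edges = {frozenset((u, v)) for u in nodes for v in neighbors[u]}
    return edges, separating_sets


def ci_oracle(u, v, cond):
    """Hypothetical oracle for A -> B -> C: A and C are independent given B."""
    return {u, v} == {"A", "C"} and "B" in cond


skeleton, sepsets = learn_skeleton(["A", "B", "C"], ci_oracle)
```

With this oracle, the edge A - C is dropped once B is tried as a conditioning set, and {B} is recorded as the separating set for that pair.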

When used with expert knowledge, the following decision flow can help you figure out the expected results based on different choices of parameters and the structure learned from the data.

Expert knowledge specified?

  • No: normal PC run.

  • Yes: Enforce expert knowledge?

      • Yes:

          1) Forbidden edges are removed from the skeleton.

          2) Required edges will be present in the final model (but direction is not guaranteed).

      • No: Conflicts with learned structure (opposite edge orientations)?

          • Yes: Conflicting edges are ignored.

          • No: Expert knowledge applied fully.
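The decision flow above can also be read as a small function. This is only a restatement of the documented outcomes; `expert_knowledge_outcome` and its argument names are hypothetical and are not part of pgmpy.

```python
def expert_knowledge_outcome(has_expert_knowledge, enforce, conflicts_with_learned):
    """Return the documented outcome for one combination of choices."""
    if not has_expert_knowledge:
        return "normal PC run"
    if enforce:
        # Forbidden edges leave the skeleton; required edges are kept,
        # but their direction is not guaranteed.
        return "knowledge enforced on skeleton"
    if conflicts_with_learned:
        return "conflicting edges are ignored"
    return "expert knowledge applied fully"


outcome = expert_knowledge_outcome(
    has_expert_knowledge=True, enforce=False, conflicts_with_learned=True
)
```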

Parameters:
variant : str, default="parallel"

The variant of the PC algorithm to run.

  • "orig": The original PC algorithm. Might not give the same results in different runs, but performs fewer independence tests than "stable".

  • "stable": Gives the same result in every run, but needs to perform more statistical independence tests.

  • "parallel": Parallel version of PC Stable. Can run on multiple cores with the same result on each run. The parallel version is faster only on datasets with a large number of variables or samples. For smaller datasets, it might be slower due to the overhead of managing multiple processes.

ci_test : str or callable, default=None

The conditional independence (CI) test to use for finding (conditional) independences in the data. This can be any of the CI tests implemented in pgmpy.ci_tests or a custom function that follows the signature of the built-in CI tests.

If None, the appropriate CI test will be chosen based on the data type.

return_type : str, default="pdag"

The type of structure to return. Can be one of: pdag, cpdag, dag.

  • If return_type=pdag or return_type=cpdag: a partially directed structure is returned.

  • If return_type=dag, a fully directed structure is returned. This DAG is one of the possible orientations of the PDAG learned by the PC algorithm.

significance_level : float, default=0.01

The p-value threshold to use for the statistical independence tests. If the p-value of a test is greater than significance_level, then the variables are considered independent.
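The decision rule above can be sketched with a simple stand-in test. The example below uses a Fisher z-transformed Pearson correlation test with an empty conditioning set, written with the standard library only; it is not one of the pgmpy.ci_tests implementations, and the helper names and toy data are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(x, y):
    """Two-sided p-value for the null hypothesis of zero correlation."""
    z = math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def is_independent(p_value, significance_level=0.01):
    """Documented rule: independent if the p-value exceeds the threshold."""
    return p_value > significance_level


x = list(range(20))
y_dependent = [2 * v + (-1) ** v for v in x]  # nearly linear in x
y_unrelated = [(-1) ** v for v in x]          # alternating sign, weak correlation

p_dependent = fisher_z_p_value(x, y_dependent)
p_unrelated = fisher_z_p_value(x, y_unrelated)
```

A smaller significance_level makes the rule declare independence more readily, which tends to remove more edges from the skeleton.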

max_cond_vars : int, default=5

The maximum number of conditional variables to consider while performing conditional independence tests.
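To see why this cap matters, the sketch below enumerates the candidate conditioning sets for one pair of variables. The generator `conditioning_sets` is a hypothetical helper, not a pgmpy function, and the actual set of candidates in a run depends on the current neighbours in the skeleton.

```python
from itertools import combinations


def conditioning_sets(candidates, max_cond_vars=5):
    """Yield every subset of `candidates` with at most `max_cond_vars` elements."""
    for size in range(min(max_cond_vars, len(candidates)) + 1):
        yield from combinations(candidates, size)


others = ["C", "D", "E", "F"]  # variables other than the tested pair
capped = list(conditioning_sets(others, max_cond_vars=2))
uncapped = list(conditioning_sets(others, max_cond_vars=len(others)))
```

With four candidates, a cap of 2 leaves 11 conditioning sets to test instead of all 16 subsets; the gap grows quickly as the number of candidates increases.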

expert_knowledge : pgmpy.estimators.ExpertKnowledge, optional

Expert knowledge to be used in the causal graph construction. This needs to be an instance of pgmpy.estimators.ExpertKnowledge. Users can specify knowledge in the form of required/forbidden edges, temporal information, or restrictions on the search space.

enforce_expert_knowledge : bool, default=False

If True, the expert knowledge will be strictly enforced. This implies the following:

  • For every edge (u, v) specified in forbidden_edges, there will be no edge between u and v.

  • For every edge (u, v) specified in required_edges, one of the following would be present in the final model: u -> v, u <- v, or u - v (if CPDAG is returned).

If False, the algorithm attempts to make the edge orientations as specified by expert knowledge after learning the skeleton. This implies the following:

  • For every edge (u, v) specified in forbidden_edges, the final graph would have either u <- v or no edge, except if u -> v is part of a collider structure in the learned skeleton.

  • For every edge (u, v) specified in required_edges, the final graph would have either u -> v or no edge, except if u <- v is part of a collider structure in the learned skeleton.
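The strict guarantees listed for enforce_expert_knowledge=True can be expressed as a small check over a learned graph. The sketch below assumes a plain representation (directed edges as tuples, undirected edges as frozensets) rather than pgmpy's graph classes, and `enforced_knowledge_violations` is a hypothetical helper.

```python
def adjacent(u, v, directed, undirected):
    """True if u and v are joined by an edge of any orientation."""
    return (
        (u, v) in directed
        or (v, u) in directed
        or frozenset((u, v)) in undirected
    )


def enforced_knowledge_violations(directed, undirected, forbidden_edges, required_edges):
    """List the edges that break the guarantees of strict enforcement."""
    problems = []
    for u, v in forbidden_edges:
        if adjacent(u, v, directed, undirected):
            problems.append(("forbidden", u, v))  # no edge at all is allowed
    for u, v in required_edges:
        if not adjacent(u, v, directed, undirected):
            problems.append(("required", u, v))  # some edge must be present
    return problems


directed = {("A", "B")}
undirected = {frozenset(("B", "C"))}

ok = enforced_knowledge_violations(
    directed, undirected, forbidden_edges=[("A", "C")], required_edges=[("C", "B")]
)
broken = enforced_knowledge_violations(
    directed, undirected, forbidden_edges=[("B", "A")], required_edges=[]
)
```

Note that the required edge (C, B) is satisfied by the undirected edge B - C, since enforcement guarantees presence but not direction.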

n_jobs : int, default=-1

The number of jobs to run in parallel. This is only used when variant="parallel".

show_progress : bool, default=True

If True, shows a progress bar while learning the causal structure.

Attributes:
causal_graph_ : pgmpy.base.DAG or pgmpy.base.PDAG

The learned causal graph.

  • If return_type="pdag", this will be a PDAG instance.

  • If return_type="dag", this will be a DAG instance.

adjacency_matrix_ : pd.DataFrame

Adjacency matrix representation of the learned causal graph, i.e. causal_graph_.

skeleton_ : UndirectedGraph

An estimate for the undirected graph skeleton of the DAG underlying the data.

separating_sets_ : dict

A dict containing, for each pair of nodes that are not directly connected, a separating set (“witnessing set”) of variables that makes them conditionally independent. These sets are needed by the edge orientation procedures.
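The role of separating sets in edge orientation can be sketched as follows: for an unshielded triple x - z - y, the edges are oriented as a collider x -> z <- y when z is not in the separating set of x and y. The helper `orient_colliders` and the frozenset-keyed dict layout are assumptions for this example, not pgmpy's internal representation.

```python
from itertools import combinations


def orient_colliders(skeleton_edges, separating_sets):
    """Return the directed edges implied by collider (v-structure) detection."""
    neighbors = {}
    for edge in skeleton_edges:
        u, v = tuple(edge)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    directed = set()
    for z, adj in neighbors.items():
        for x, y in combinations(sorted(adj), 2):
            if y in neighbors[x]:
                continue  # x and y are adjacent, so the triple is shielded
            if z not in separating_sets[frozenset((x, y))]:
                directed.update({(x, z), (y, z)})
    return directed


skeleton = {frozenset(("A", "C")), frozenset(("B", "C"))}

# A and B are marginally independent: C is not in their separating set.
collider = orient_colliders(skeleton, {frozenset(("A", "B")): set()})

# A and B are independent only given C: no collider at C.
no_collider = orient_colliders(skeleton, {frozenset(("A", "B")): {"C"}})
```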

n_features_in_ : int

The number of features in the data used to learn the causal graph.

feature_names_in_ : np.ndarray

The feature names in the data used to learn the causal graph.

References

[1] Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search. doi:10.7551/mitpress/1754.001.0001

[2] Neapolitan, Learning Bayesian Networks, Section 10.1.2 for the PC algorithm (page 550), http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf

[3] Original PC: P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, 2nd ed. Cambridge, MA: MIT Press, 2000.

[4] Stable PC: D. Colombo and M. H. Maathuis, “A modification of the PC algorithm yielding order-independent skeletons,” ArXiv e-prints, Nov. 2012.

[5] Parallel PC: Le, Thuc, et al. “A fast PC algorithm for high dimensional causal discovery with multi-core PCs.” IEEE/ACM transactions on computational biology and bioinformatics (2016).

[6] Expert Knowledge: Meek, Christopher. “Causal inference and causal explanation with background knowledge.” arXiv preprint arXiv:1302.4972 (2013).

Examples

Simulate some data to use for causal discovery:

>>> from pgmpy.example_models import load_model
>>> model = load_model("bnlearn/alarm")
>>> df = model.simulate(n_samples=1000, seed=42)

Use the PC algorithm to learn the causal structure from data:

>>> from pgmpy.causal_discovery import PC
>>> pc = PC(variant="parallel", ci_test="chi_square", significance_level=0.01)
>>> pc.fit(df)
PC(ci_test='chi_square')
>>> pc.causal_graph_
<pgmpy.base.PDAG.PDAG object at 0x...>
>>> pc.n_features_in_
37

Specify expert knowledge:

set_fit_request(*, independencies: bool | None | str = '$UNCHANGED$') → PC

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
independencies : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for independencies parameter in fit.

Returns:
self : object

The updated object.

set_score_request(*, metric: bool | None | str = '$UNCHANGED$', true_graph: bool | None | str = '$UNCHANGED$') → PC

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
metric : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for metric parameter in score.

true_graph : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for true_graph parameter in score.

Returns:
self : object

The updated object.