GCM

class pgmpy.ci_tests.GCM(data: DataFrame, estimator=None)

Bases: _BaseCITest

Generalized Covariance Measure (GCM) [1] test for conditional independence.

Regress \(X\) on \([1, Z]\) and \(Y\) on \([1, Z]\) using the estimator, let \(r_X\) and \(r_Y\) denote the resulting residuals, and define \(U_i = r_{X, i} r_{Y, i}\). The test statistic is

\[T = \frac{1}{\sqrt{n}} \frac{\sum_{i=1}^n U_i}{\operatorname{std}(U_1, \ldots, U_n)},\]

where \(n\) is the sample size. Under the null hypothesis \(X \perp Y \mid Z\), this statistic is asymptotically standard normal.
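The formula above can be sketched directly in NumPy. This is an illustrative sketch, not pgmpy's implementation: the function name gcm_statistic is hypothetical, the residuals come from ordinary least squares on \([1, Z]\), and the standard deviation is assumed to be the population version (ddof=0), which the text above does not specify.

```python
import numpy as np


def gcm_statistic(x, y, z):
    """Hypothetical helper: GCM statistic T from OLS residuals on [1, Z]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Accept a single conditioning variable (1-D) or several (2-D, one per column).
    z = np.asarray(z, dtype=float).reshape(n, -1)

    # Design matrix [1, Z]: an intercept column followed by the conditioning variables.
    design = np.column_stack([np.ones(n), z])

    # Least-squares fits of X and Y on [1, Z], then the residuals r_X and r_Y.
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
    r_x = x - design @ beta_x
    r_y = y - design @ beta_y

    # U_i = r_{X,i} * r_{Y,i}; T = sum(U) / (sqrt(n) * std(U)).
    u = r_x * r_y
    # Assumption: population standard deviation (ddof=0).
    return float(u.sum() / (np.sqrt(n) * u.std()))


rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)

# X and Y depend on each other only through Z, so |T| should be moderate.
t_indep = gcm_statistic(x, y, z)
# Here Y is almost a copy of X, so T should be far from zero.
t_dep = gcm_statistic(x, x + 0.1 * rng.normal(size=500), z)
print(t_indep, t_dep)
```

In the first call the null hypothesis holds by construction; in the second it is violated, and the statistic grows roughly like \(\sqrt{n}\).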

Parameters:

data (pandas.DataFrame): The dataset in which to test the independence condition.
estimator (optional, default=None): Any regressor with fit and predict methods, used to compute the residuals. If None, LinearRegression() is used.

Attributes:

statistic_ (float): The GCM test statistic. Set after calling the test.
p_value_ (float): The p-value for the test. Set after calling the test.

References

[1]

Rajen D. Shah and Jonas Peters. “The Hardness of Conditional Independence Testing and the Generalised Covariance Measure”. The Annals of Statistics 48(3), 2020.

run_test(X: str, Y: str, Z: list)

Compute GCM statistic and p-value.

Sets self.statistic_ (t-statistic) and self.p_value_.
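Because the statistic is asymptotically standard normal under the null hypothesis, a p-value can be read off the normal distribution. The sketch below assumes a two-sided test, which this page does not state explicitly; the function name two_sided_p_value is hypothetical and is not part of pgmpy.

```python
import math


def two_sided_p_value(statistic):
    """Hypothetical helper: two-sided p-value of a standard-normal statistic."""
    # 2 * (1 - Phi(|T|)) equals erfc(|T| / sqrt(2)) for the standard normal CDF Phi.
    return math.erfc(abs(statistic) / math.sqrt(2.0))


p_at_zero = two_sided_p_value(0.0)
p_at_196 = two_sided_p_value(1.96)
print(p_at_zero, p_at_196)
```

A statistic of 0 gives a p-value of 1, and a statistic near 1.96 gives a p-value close to the conventional 0.05 threshold.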