GCM

class pgmpy.ci_tests.GCM(data: DataFrame, estimator=None)

Bases: _BaseCITest

Generalized Covariance Measure (GCM) test for conditional independence [1].

Regress \(X\) and \(Y\) separately on \([1, Z]\) (the conditioning variables \(Z\) plus an intercept) using the chosen estimator, let \(r_X\) and \(r_Y\) denote the resulting residuals, and define \(U_i = r_{X, i} r_{Y, i}\) for each sample \(i\). The test statistic is

\[T = \frac{1}{\sqrt{n}} \frac{\sum_{i=1}^n U_i}{\operatorname{std}(U_1, \ldots, U_n)},\]

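where \(n\) is the sample size. Under the null hypothesis \(X \perp Y \mid Z\), and under suitable conditions on how well the regressions of \(X\) and \(Y\) on \(Z\) are estimated (see [1]), this statistic is asymptotically standard normal.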
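To make the definition concrete, here is a minimal NumPy sketch that computes \(T\) with ordinary least squares on \([1, Z]\) standing in for the estimator. The function name `gcm_statistic`, the use of the population standard deviation (`ddof=0`), and the two-sided p-value \(2(1 - \Phi(|T|))\) are assumptions of this sketch, not a description of pgmpy's internals.

```python
import math

import numpy as np


def gcm_statistic(x, y, z):
    """Hypothetical sketch of the GCM statistic using OLS on [1, Z].

    x, y: arrays of shape (n,); z: array of shape (n,) or (n, d).
    Returns (T, two-sided p-value from the N(0, 1) approximation).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    n = len(x)
    # Design matrix [1, Z]: an intercept column followed by the conditioning variables.
    design = np.column_stack([np.ones(n), z])
    # Residuals of X and Y after regressing each on [1, Z].
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
    r_x = x - design @ beta_x
    r_y = y - design @ beta_y
    # U_i = r_{X,i} * r_{Y,i};  T = sum(U) / (sqrt(n) * std(U)), std with ddof=0.
    u = r_x * r_y
    t = u.sum() / (math.sqrt(n) * u.std())
    # Two-sided p-value: P(|N(0, 1)| >= |T|) = erfc(|T| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

Because every \(U_i\) is a product of the two residuals, rescaling \(X\) or \(Y\) by a positive constant leaves \(T\) unchanged, while flipping the sign of one variable flips the sign of \(T\).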

Parameters:
data: pandas.DataFrame

The dataset in which to test the independence condition.

estimator: optional (default=None)

Any regressor with fit and predict methods, used to compute the residuals. If None, LinearRegression() is used.
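Since the estimator only needs `fit` and `predict`, its choice determines which dependence on \(Z\) is removed from the residuals. The sketch below uses a hypothetical `PolynomialRegressor` (built on NumPy's `polyfit`, one-dimensional \(Z\) only) to illustrate this: when \(X\) and \(Y\) both depend on \(Z^2\), a degree-1 fit (linear regression on \([1, Z]\)) leaves \(Z^2\) in both residuals and inflates \(T\), whereas a degree-2 fit removes it. The helpers `residuals` and `gcm_t` are illustrative assumptions, not pgmpy functions.

```python
import math

import numpy as np


class PolynomialRegressor:
    """Hypothetical 1-D polynomial regressor exposing fit/predict.

    degree=1 corresponds to linear regression on [1, Z].
    """

    def __init__(self, degree=1):
        self.degree = degree
        self.coef_ = None

    def fit(self, z, target):
        # Least-squares polynomial fit; coefficients are stored highest power first.
        self.coef_ = np.polyfit(z, target, self.degree)
        return self

    def predict(self, z):
        return np.polyval(self.coef_, z)


def residuals(estimator, z, target):
    # Duck typing: any object with fit(Z, y) and predict(Z) can be used here.
    estimator.fit(z, target)
    return target - estimator.predict(z)


def gcm_t(x, y, z, estimator):
    # Residualize X and Y on Z with the same estimator, then form T.
    r_x = residuals(estimator, z, x)
    r_y = residuals(estimator, z, y)
    u = r_x * r_y
    return u.sum() / (math.sqrt(len(u)) * u.std())


# X and Y are conditionally independent given Z, but both depend on Z**2.
rng = np.random.default_rng(1)
n = 1000
z = rng.uniform(-2, 2, size=n)
x = z**2 + 0.5 * rng.normal(size=n)
y = z**2 + 0.5 * rng.normal(size=n)

# Linear fit misses Z**2, so the residuals stay correlated and |T| is large.
print(gcm_t(x, y, z, PolynomialRegressor(degree=1)))
# Quadratic fit removes the shared Z**2 term, so T behaves like N(0, 1).
print(gcm_t(x, y, z, PolynomialRegressor(degree=2)))
```

In other words, a misspecified regression can make the test reject a true conditional independence, which is why a more flexible regressor can be passed in place of the linear default.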

Attributes:
statistic_: float

The GCM test statistic \(T\). Set after calling run_test.

p_value_: float

The p-value of the test. Set after calling run_test.

References

[1]

Rajen D. Shah and Jonas Peters. “The Hardness of Conditional Independence Testing and the Generalised Covariance Measure.” The Annals of Statistics, 2020.

run_test(X: str, Y: str, Z: list)

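Compute the GCM statistic and p-value for testing \(X \perp Y \mid Z\), where X and Y are column names in data and Z is a list of column names to condition on.

Sets self.statistic_ (the test statistic \(T\)) and self.p_value_.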
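The calling pattern documented above (construct with the data and an optional estimator, call run_test, then read statistic_ and p_value_) can be mimicked by the following self-contained sketch. `GCMSketch` and `_LeastSquares` are hypothetical names: the OLS class stands in for the default `LinearRegression()`, the two-sided normal p-value is an assumption, and none of this is pgmpy's actual implementation. In the sketch, any mapping from column names to 1-D values works as `data`, including a pandas DataFrame.

```python
import math

import numpy as np


class _LeastSquares:
    """Hypothetical stand-in for the default linear regressor: OLS on [1, Z]."""

    def fit(self, z, target):
        design = np.column_stack([np.ones(len(target)), z])
        self.coef_, *_ = np.linalg.lstsq(design, target, rcond=None)
        return self

    def predict(self, z):
        design = np.column_stack([np.ones(z.shape[0]), z])
        return design @ self.coef_


class GCMSketch:
    """Hypothetical class mirroring the documented interface (not pgmpy code)."""

    def __init__(self, data, estimator=None):
        self.data = data  # mapping of column name -> 1-D values, e.g. a DataFrame
        self.estimator = estimator if estimator is not None else _LeastSquares()

    def _column(self, name):
        return np.asarray(self.data[name], dtype=float)

    def run_test(self, X, Y, Z):
        x, y = self._column(X), self._column(Y)
        n = len(x)
        # Conditioning set as an (n, len(Z)) matrix; an empty Z keeps only the intercept.
        z = np.column_stack([self._column(c) for c in Z]) if Z else np.empty((n, 0))
        self.estimator.fit(z, x)
        r_x = x - self.estimator.predict(z)
        self.estimator.fit(z, y)
        r_y = y - self.estimator.predict(z)
        u = r_x * r_y
        self.statistic_ = u.sum() / (math.sqrt(n) * u.std())
        # Two-sided p-value from the standard normal approximation.
        self.p_value_ = math.erfc(abs(self.statistic_) / math.sqrt(2))


# X and Y are dependent only through Z.
rng = np.random.default_rng(2)
n = 1500
z = rng.normal(size=n)
data = {"Z": z, "X": 2 * z + rng.normal(size=n), "Y": -z + rng.normal(size=n)}

gcm = GCMSketch(data)
gcm.run_test("X", "Y", ["Z"])  # conditionally independent: T behaves like N(0, 1)
print(gcm.statistic_, gcm.p_value_)
gcm.run_test("X", "Y", [])  # marginally dependent through Z: |T| is large
print(gcm.statistic_, gcm.p_value_)
```

Calling `fit` and `predict` separately, rather than chaining them, keeps the sketch working with regressors whose `fit` does not return the estimator itself.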