Pearsonr

class pgmpy.ci_tests.Pearsonr(data: DataFrame)

Bases: _BaseCITest

Partial Correlation test for conditional independence.

If \(Z = \emptyset\), compute Pearson’s correlation coefficient \(r_{XY}\) and its two-sided p-value.
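As a rough illustration of the unconditional case, the coefficient and two-sided p-value can be obtained with `scipy.stats.pearsonr`. This is only a sketch: whether the class calls that function internally is not stated here, and the synthetic data and variable names are assumptions made for the example. It also shows that the p-value matches the t-based formula with \(n - 2\) degrees of freedom.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration): Y depends linearly on X.
rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)

# Pearson's r and its two-sided p-value.
r, p = stats.pearsonr(x, y)

# The same p-value from the t statistic with n - 2 degrees of freedom.
t_stat = r * np.sqrt((n - 2) / (1.0 - r**2))
p_from_t = 2.0 * stats.t.sf(abs(t_stat), n - 2)
```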

If \(Z \neq \emptyset\), regress \(X\) and \(Y\) on \([1, Z]\) using least squares, compute the residuals \(r_X\) and \(r_Y\), and define the partial correlation \(\rho_{XY \mid Z}\) as the Pearson correlation between those residuals. The resulting test statistic is

\[t = \rho_{XY \mid Z} \sqrt{\frac{n - |Z| - 2}{1 - \rho_{XY \mid Z}^2}},\]

where \(n\) is the sample size and \(|Z|\) is the number of conditioning variables. Under the null hypothesis \(X \perp Y \mid Z\), this statistic follows Student’s t distribution with \(n - |Z| - 2\) degrees of freedom.
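The residual-based procedure described above can be sketched in plain NumPy and SciPy. This is a minimal illustration under stated assumptions, not pgmpy's implementation: `partial_corr_test` is a hypothetical helper name, and the synthetic data is constructed so that \(X\) and \(Y\) are related only through \(Z\).

```python
import numpy as np
from scipy import stats


def partial_corr_test(x, y, z):
    """Hypothetical helper (not pgmpy's code): residual-based partial correlation test."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a single conditioning variable as one column
    n, k = z.shape

    # Design matrix [1, Z]: an intercept column followed by the conditioning variables.
    design = np.column_stack([np.ones(n), z])

    # Least-squares fits of X and Y on [1, Z]; keep the residuals.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]

    # Partial correlation: Pearson correlation between the two residual vectors.
    rho = np.corrcoef(res_x, res_y)[0, 1]

    # t statistic and two-sided p-value with n - |Z| - 2 degrees of freedom.
    dof = n - k - 2
    t_stat = rho * np.sqrt(dof / (1.0 - rho**2))
    p_value = 2.0 * stats.t.sf(abs(t_stat), dof)
    return rho, t_stat, p_value, dof


# Synthetic data (an assumption for illustration): X and Y share the common cause Z.
rng = np.random.default_rng(0)
z = rng.standard_normal(1000)
x = z + rng.standard_normal(1000)
y = z + rng.standard_normal(1000)

rho, t_stat, p_value, dof = partial_corr_test(x, y, z)
marginal_r = np.corrcoef(x, y)[0, 1]  # correlation without conditioning on Z
```

In this setup the marginal correlation between `x` and `y` is clearly non-zero, while the partial correlation given `z` should be close to zero, which is the situation the conditional test is meant to detect.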

Parameters:

data (pandas.DataFrame): The dataset in which to test the independence condition.

Attributes:

statistic_ (float): Pearson’s correlation coefficient (or the partial correlation when Z is non-empty), ranging from -1 to 1. Set after calling the test.
p_value_ (float): The p-value for the test. Set after calling the test.

References

[1] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

[2] https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.ci_tests import Pearsonr
>>> rng = np.random.default_rng(seed=42)
>>> data = pd.DataFrame(data=rng.standard_normal(size=(1000, 3)), columns=["X", "Y", "Z"])
>>> test = Pearsonr(data=data)
>>> test(X="X", Y="Y", Z=["Z"], significance_level=0.05)
np.True_
>>> round(test.statistic_, 2)
np.float64(0.01)
>>> round(test.p_value_, 2)
np.float64(0.87)
>>> test.dof_
997

run_test(X: str, Y: str, Z: list)

Compute the Pearson (or partial) correlation coefficient and its p-value.

Sets self.statistic_ (Pearson’s r, or the partial correlation when Z is non-empty) and self.p_value_.