Pearsonr
- class pgmpy.ci_tests.Pearsonr(data: DataFrame)
Bases: _BaseCITest
Partial correlation test for conditional independence.
If \(Z = \emptyset\), compute Pearson’s correlation coefficient \(r_{XY}\) and its two-sided p-value.
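For comparison outside pgmpy, the empty-conditioning-set case can be sketched with SciPy's `scipy.stats.pearsonr`, which returns the correlation coefficient and, by default, a two-sided p-value. This is a standalone illustration on synthetic data; it does not claim to be the code path `Pearsonr` uses internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
# y depends linearly on x, so the true correlation is about 0.29
y = 0.3 * x + rng.standard_normal(500)

# With Z empty, the test reduces to Pearson's r and its two-sided p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p-value = {p_value:.3g}")
```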
If \(Z \neq \emptyset\), regress \(X\) and \(Y\) on \([1, Z]\) using least squares, compute the residuals \(r_X\) and \(r_Y\), and define the partial correlation as the Pearson correlation between those residuals. The resulting test statistic is
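The residual-based definition can be sketched in NumPy as follows. `partial_corr` is a hypothetical helper written for illustration (not part of pgmpy): it builds the design matrix \([1, Z]\), fits both least-squares regressions with `np.linalg.lstsq`, and correlates the residuals. The synthetic data give \(X\) and \(Y\) a common cause \(Z\), so their marginal correlation is large while the partial correlation should be close to zero.

```python
import numpy as np

def partial_corr(x, y, Z):
    """Hypothetical helper: Pearson correlation between the residuals of
    x and y after least-squares regression on [1, Z]."""
    n = len(x)
    # Design matrix: an intercept column followed by the conditioning variables.
    A = np.column_stack([np.ones(n), Z])
    # Residuals of each least-squares fit.
    r_x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
    r_y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return np.corrcoef(r_x, r_y)[0, 1]

rng = np.random.default_rng(1)
n = 2000
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)  # z is a common cause of x and y
y = z + rng.standard_normal(n)

print(np.corrcoef(x, y)[0, 1])                # marginal correlation, near 0.5
print(partial_corr(x, y, z.reshape(-1, 1)))   # near 0 once z is regressed out
```

An algebraically equivalent route reads the partial correlation off the inverse covariance matrix \(P\) of \((X, Y, Z)\) as \(-P_{XY} / \sqrt{P_{XX} P_{YY}}\); the residual form is shown here because it matches the description above.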
\[t = \rho_{XY \mid Z} \sqrt{\frac{n - |Z| - 2}{1 - \rho_{XY \mid Z}^2}},\]
where \(\rho_{XY \mid Z}\) is the partial correlation, \(n\) is the sample size, and \(|Z|\) is the number of conditioning variables. Under the null hypothesis \(X \perp Y \mid Z\), and assuming the variables are jointly Gaussian, this statistic follows Student’s t distribution with \(n - |Z| - 2\) degrees of freedom.
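The statistic and its two-sided p-value can be computed from \(\rho_{XY \mid Z}\), \(n\), and \(|Z|\) as in the sketch below. `partial_corr_t_test` is a hypothetical helper; it uses `scipy.stats.t.sf` (the survival function of Student's t) and doubles the upper-tail probability beyond \(|t|\). For \(n = 1000\) and \(|Z| = 1\) it gives 997 degrees of freedom, the same value as `dof_` in the Examples section.

```python
import numpy as np
from scipy import stats

def partial_corr_t_test(rho, n, k):
    """Hypothetical helper: t-test for a (partial) correlation `rho`
    estimated from `n` samples with `k` conditioning variables."""
    dof = n - k - 2
    t = rho * np.sqrt(dof / (1.0 - rho**2))
    # Two-sided p-value: probability mass in both tails beyond |t|.
    p_value = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p_value

t, dof, p_value = partial_corr_t_test(rho=0.01, n=1000, k=1)
print(dof)  # 997
```

With `k=0` the same helper applies to the unconditional case, where it uses \(n - 2\) degrees of freedom.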
- Parameters:
- data: pandas.DataFrame
The dataset in which to test the independence condition.
- Attributes:
- statistic_: float
Pearson’s correlation coefficient (or partial correlation when Z is non-empty), ranging from -1 to 1. Set after calling the test.
- p_value_: float
The p-value for the test. Set after calling the test.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from pgmpy.ci_tests import Pearsonr
>>> rng = np.random.default_rng(seed=42)
>>> data = pd.DataFrame(data=rng.standard_normal(size=(1000, 3)), columns=["X", "Y", "Z"])
>>> test = Pearsonr(data=data)
>>> test(X="X", Y="Y", Z=["Z"], significance_level=0.05)
np.True_
>>> round(test.statistic_, 2)
np.float64(0.01)
>>> round(test.p_value_, 2)
np.float64(0.87)
>>> test.dof_
997
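To see the test distinguish marginal dependence from conditional independence, here is a self-contained NumPy/SciPy sketch of the full procedure described above. `pearson_ci_test` is hypothetical and is not pgmpy's implementation. It accepts any mapping from column names to 1-D arrays (a dict here; a `pandas.DataFrame` also works), and it assumes the return convention suggested by the example above: `True` when the p-value is at least `significance_level`, i.e. independence is not rejected. The data use a common cause, so \(X\) and \(Y\) are marginally dependent but independent given \(Z\).

```python
import numpy as np
from scipy import stats

def pearson_ci_test(data, X, Y, Z, significance_level=0.05):
    """Hypothetical sketch of a partial-correlation CI test.

    Returns (independent, rho, p_value); `independent` is True when the
    p-value is at least `significance_level`.
    """
    x = np.asarray(data[X], dtype=float)
    y = np.asarray(data[Y], dtype=float)
    n = len(x)
    if Z:
        # Regress x and y on [1, Z] and keep only the residuals.
        cols = [np.asarray(data[col], dtype=float) for col in Z]
        A = np.column_stack([np.ones(n)] + cols)
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    rho = np.corrcoef(x, y)[0, 1]
    dof = n - len(Z) - 2
    t = rho * np.sqrt(dof / (1.0 - rho**2))
    p_value = 2 * stats.t.sf(abs(t), dof)  # two-sided
    return p_value >= significance_level, rho, p_value

rng = np.random.default_rng(7)
n = 2000
z = rng.standard_normal(n)
data = {
    "X": z + rng.standard_normal(n),  # Z drives both X and Y
    "Y": z + rng.standard_normal(n),
    "Z": z,
}

# Marginally, X and Y are strongly correlated, so independence is rejected.
print(pearson_ci_test(data, "X", "Y", []))
# Given Z, the partial correlation is near zero; the test usually returns True.
print(pearson_ci_test(data, "X", "Y", ["Z"]))
```

Because the conditional p-value is uniformly distributed under the null hypothesis, roughly 5% of random seeds would still return `False` at `significance_level=0.05`.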