# Conditional Independence Tests for PC algorithm

`pgmpy.estimators.CITests.chi_square(X, Y, Z, data, boolean=True, **kwargs)`

Chi-square conditional independence test. Tests the null hypothesis that X is independent from Y given Zs.

This is done by comparing the observed frequencies with the frequencies expected if X and Y were conditionally independent, using a chi-square deviance statistic. The expected frequencies given independence are `P(X,Y,Zs) = P(X|Zs)*P(Y|Zs)*P(Zs)`. The latter term can be computed as `P(X,Zs)*P(Y,Zs)/P(Zs)`.
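The comparison of observed and expected frequencies can be sketched with the standard library alone. This is a minimal illustration and not pgmpy's implementation: the helper name `chi_square_statistic` is hypothetical, and the data is assumed to be a list of `(x, y, z)` tuples in which `z` encodes the joint state of all conditioning variables.

```python
from collections import Counter

def chi_square_statistic(rows):
    """Pearson chi-square statistic for 'X independent of Y given Z'.

    rows: iterable of (x, y, z) tuples of hashable values (assumed layout).
    """
    rows = list(rows)
    n_xyz = Counter(rows)                        # observed counts N(x, y, z)
    n_xz = Counter((x, z) for x, _, z in rows)   # marginal counts N(x, z)
    n_yz = Counter((y, z) for _, y, z in rows)   # marginal counts N(y, z)
    n_z = Counter(z for _, _, z in rows)         # marginal counts N(z)

    stat = 0.0
    for (x, z), count_xz in n_xz.items():
        for (y, z_other), count_yz in n_yz.items():
            if z_other != z:
                continue
            # Expected count under conditional independence:
            # N * P(x, z) * P(y, z) / P(z) = N(x, z) * N(y, z) / N(z)
            expected = count_xz * count_yz / n_z[z]
            observed = n_xyz.get((x, y, z), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

# X is an exact copy of Y here, so the two are strongly dependent.
dependent = [(0, 0, 0)] * 50 + [(1, 1, 0)] * 50
# Every (x, y) combination is equally frequent, so observed == expected.
balanced = [(x, y, 0) for x in (0, 1) for y in (0, 1)] * 25

stat_dependent = chi_square_statistic(dependent)
stat_balanced = chi_square_statistic(balanced)
```

A large statistic, as for `dependent`, speaks against the null hypothesis; a statistic near zero, as for `balanced`, is what conditional independence would produce.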

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list, array-like) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.
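How a statistic and its degrees of freedom turn into a p_value, and how the p_value turns into the boolean result, can be sketched as follows. This is an illustration under stated assumptions, not pgmpy's code: `chi2_sf` and `is_independent` are hypothetical helper names, the upper-tail probability is computed with a standard recurrence for integer degrees of freedom, and the `>=` comparison follows the description of the `boolean` parameter.

```python
import math

def chi2_sf(statistic, dof):
    """Upper-tail probability P(chi2_dof >= statistic) for integer dof >= 1."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    # Closed forms for the two smallest cases.
    if dof % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(half))   # dof = 1
    else:
        k, sf = 2, math.exp(-half)              # dof = 2
    # Step up two degrees of freedom at a time using
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1), with a = k / 2.
    while k < dof:
        a = k / 2.0
        sf += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        k += 2
    return sf

def is_independent(p_value, significance_level):
    """True when the null hypothesis of independence is not rejected."""
    return p_value >= significance_level

p_small = chi2_sf(100.0, 1)   # very large statistic, tiny p_value
p_large = chi2_sf(0.2, 1)     # small statistic, large p_value
```

With `significance_level=0.05`, `is_independent(p_small, 0.05)` rejects independence while `is_independent(p_large, 0.05)` does not.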

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import chi_square
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> chi_square(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.cressie_read(X, Y, Z, data, boolean=True, **kwargs)`

Cressie-Read statistic for conditional independence. Tests the null hypothesis that X is independent of Y given Zs.

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

References

 Cressie, Noel, and Timothy RC Read. “Multinomial goodness‐of‐fit tests.” Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import cressie_read
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> cressie_read(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> cressie_read(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> cressie_read(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.freeman_tuckey(X, Y, Z, data, boolean=True, **kwargs)`

Freeman-Tukey test for conditional independence. Tests the null hypothesis that X is independent of Y given Zs.

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

References

 Read, Campbell B. “Freeman—Tukey chi-squared goodness-of-fit statistics.” Statistics & probability letters 18.4 (1993): 271-278.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import freeman_tuckey
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> freeman_tuckey(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> freeman_tuckey(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> freeman_tuckey(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.g_sq(X, Y, Z, data, boolean=True, **kwargs)`

G squared test for conditional independence. Also commonly known as G-test, likelihood-ratio or maximum likelihood statistical significance test. Tests the null hypothesis that X is independent of Y given Zs.
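The G statistic compares the same observed and expected counts as the chi-square test, but on a logarithmic scale. The snippet below is a small sketch of the textbook formula, not pgmpy's implementation; the function names and the example counts are made up for illustration.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def pearson_statistic(observed, expected):
    """Pearson chi-square: sum((O - E)**2 / E), shown for comparison."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical cell counts of a 2x2 table and their expected values
# under independence (both sum to 200).
observed = [48, 52, 51, 49]
expected = [50.0, 50.0, 50.0, 50.0]

g = g_statistic(observed, expected)
chi2 = pearson_statistic(observed, expected)
```

When observed and expected counts are close, as here, `g` and `chi2` are nearly identical; they diverge more as the deviations grow or the counts get small.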

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import g_sq
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> g_sq(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> g_sq(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> g_sq(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.independence_match(X, Y, Z, independencies, **kwargs)`

Checks if X ⟂ Y | Z is in independencies. This method is implemented to provide a uniform API when the independencies are provided instead of data.

Parameters
• X (str) – The first variable for testing the independence condition X ⟂ Y | Z

• Y (str) – The second variable for testing the independence condition X ⟂ Y | Z

• Z (list/array-like) – A list of conditioning variables for testing the condition X ⟂ Y | Z

• data (pandas.DataFrame) – The dataset in which to test the independence condition.

Returns

p-value

Return type

float (Fixed to 0 since it is always confident)
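The idea of answering an independence query by lookup rather than by a statistical test can be sketched in plain Python. This is only an illustration: storing the known independencies as a set of tuples is an assumption made here, and `make_assertion` and `independence_lookup` are hypothetical names, not part of pgmpy.

```python
def make_assertion(x, y, z=()):
    """Order-insensitive key for the statement 'x is independent of y given z'."""
    return (frozenset((x, y)), frozenset(z))

# Hypothetical set of independencies known in advance.
known = {
    make_assertion("A", "C"),
    make_assertion("A", "B", ["D"]),
}

def independence_lookup(x, y, z, independencies):
    """Return True if the statement is among the known independencies."""
    return make_assertion(x, y, z) in independencies

found_symmetric = independence_lookup("C", "A", [], known)
found_conditional = independence_lookup("A", "B", ["D"], known)
not_found = independence_lookup("A", "B", ["D", "E"], known)
```

Because the answer comes from a lookup, no data and no significance level are involved; the call signature only mirrors the data-based tests so that both can be used interchangeably.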

`pgmpy.estimators.CITests.log_likelihood(X, Y, Z, data, boolean=True, **kwargs)`

Log likelihood ratio test for conditional independence. Also commonly known as G-test, G-squared test or maximum likelihood statistical significance test. Tests the null hypothesis that X is independent of Y given Zs.

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import log_likelihood
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> log_likelihood(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> log_likelihood(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> log_likelihood(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.modified_log_likelihood(X, Y, Z, data, boolean=True, **kwargs)`

Modified log likelihood ratio test for conditional independence. Tests the null hypothesis that X is independent of Y given Zs.

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import modified_log_likelihood
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> modified_log_likelihood(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> modified_log_likelihood(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> modified_log_likelihood(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.neyman(X, Y, Z, data, boolean=True, **kwargs)`

Neyman’s test for conditional independence. Tests the null hypothesis that X is independent of Y given Zs.

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import neyman
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> neyman(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> neyman(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> neyman(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```

`pgmpy.estimators.CITests.pearsonr(X, Y, Z, data, boolean=True, **kwargs)`

Computes the Pearson correlation coefficient and p-value for testing non-correlation. Should be used only on continuous data. When Z is non-empty, uses linear regression and computes the Pearson coefficient on the residuals.
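The residual-based approach can be sketched with the standard library. This is an illustration under simplifying assumptions, not pgmpy's implementation: it handles a single conditioning variable, the helper names `pearson` and `residuals` are hypothetical, and the data is constructed so that X and Y are related only through Z.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def residuals(target, predictor):
    """Residuals of the least-squares line target ~ intercept + slope * predictor."""
    n = len(target)
    mean_t, mean_p = sum(target) / n, sum(predictor) / n
    slope = sum(
        (p - mean_p) * (t - mean_t) for p, t in zip(predictor, target)
    ) / sum((p - mean_p) ** 2 for p in predictor)
    intercept = mean_t - slope * mean_p
    return [t - (intercept + slope * p) for t, p in zip(target, predictor)]

# Z drives both X and Y; the noise terms are built to be uncorrelated
# with each other and with Z.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noise_x = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
noise_y = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
x = [2.0 * zi + e for zi, e in zip(z, noise_x)]
y = [-1.0 * zi + e for zi, e in zip(z, noise_y)]

raw_corr = pearson(x, y)                                   # X vs Y, ignoring Z
partial_corr = pearson(residuals(x, z), residuals(y, z))   # X vs Y given Z
```

Here `raw_corr` is strongly negative because both variables follow Z, while `partial_corr` vanishes once the linear effect of Z has been regressed out of each variable.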

Parameters
• X (str) – The first variable for testing the independence condition X ⟂ Y | Z

• Y (str) – The second variable for testing the independence condition X ⟂ Y | Z

• Z (list/array-like) – A list of conditioning variables for testing the condition X ⟂ Y | Z

• data (pandas.DataFrame) – The dataset in which to test the independence condition.

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the Pearson correlation coefficient and the p_value of the test.

Returns

• Pearson’s correlation coefficient (float)

• p-value (float)

`pgmpy.estimators.CITests.power_divergence(X, Y, Z, data, boolean=True, lambda_='cressie-read', **kwargs)`

Computes the Cressie-Read power divergence statistic. The null hypothesis for the test is that X is independent of Y given Z. Many of the frequency-comparison-based statistics (e.g. chi-square, G-test) belong to the power divergence family and are special cases of this test.

Parameters
• X (int, string, hashable object) – A variable name contained in the data set

• Y (int, string, hashable object) – A variable name contained in the data set, different from X

• Z (list, array-like) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []

• data (pandas.DataFrame) – The dataset on which to test the independence condition.

• lambda_ (float or string) –

The lambda parameter for the power_divergence statistic. Some values of lambda_ result in other well-known tests:

| lambda_ (string) | Value | Test |
|---|---|---|
| "pearson" | 1 | Chi-squared test |
| "log-likelihood" | 0 | G-test or log-likelihood |
| "freeman-tuckey" | -1/2 | Freeman-Tukey statistic |
| "mod-log-likelihood" | -1 | Modified log-likelihood |
| "neyman" | -2 | Neyman's statistic |
| "cressie-read" | 2/3 | The value recommended in the paper |

• boolean (bool) –

If boolean=True, an additional argument significance_level must be specified. Returns True if the p_value of the test is greater than or equal to significance_level, and False otherwise.

If boolean=False, returns the test statistic, the p_value, and the degrees of freedom of the test.
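How the named lambda_ values select members of the power divergence family can be sketched as follows. This is the textbook Cressie-Read formula written with the standard library, not pgmpy's implementation; `power_divergence_statistic` and `LAMBDA_BY_NAME` are hypothetical names, and all observed counts are assumed to be strictly positive.

```python
import math

# Named members of the family and their lambda values, as listed above.
LAMBDA_BY_NAME = {
    "pearson": 1.0,
    "log-likelihood": 0.0,
    "freeman-tuckey": -0.5,
    "mod-log-likelihood": -1.0,
    "neyman": -2.0,
    "cressie-read": 2.0 / 3.0,
}

def power_divergence_statistic(observed, expected, lambda_="cressie-read"):
    """2 / (lam * (lam + 1)) * sum(O * ((O / E)**lam - 1)), with limits at 0 and -1."""
    lam = LAMBDA_BY_NAME[lambda_] if isinstance(lambda_, str) else float(lambda_)
    pairs = list(zip(observed, expected))
    if lam == 0.0:
        # Limit lam -> 0: the G statistic, 2 * sum(O * ln(O / E)).
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs)
    if lam == -1.0:
        # Limit lam -> -1: modified log-likelihood, 2 * sum(E * ln(E / O)).
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)
    return 2.0 / (lam * (lam + 1.0)) * sum(
        o * ((o / e) ** lam - 1.0) for o, e in pairs
    )

# Hypothetical observed and expected cell counts (both sum to 200).
observed = [48, 52, 51, 49]
expected = [50.0, 50.0, 50.0, 50.0]

stats = {
    name: power_divergence_statistic(observed, expected, name)
    for name in LAMBDA_BY_NAME
}
```

For counts this close to their expected values, all six statistics nearly coincide; the choice of lambda_ matters more for sparse tables or large deviations.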

Returns

• If boolean = False, returns 3 values:

chi: float

The chi-square test statistic.

p_value: float

The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.

dof: int

The degrees of freedom of the test.

• If boolean = True, returns:

independent: boolean

True if the p_value of the test is greater than or equal to significance_level, False otherwise.

References

 Cressie, Noel, and Timothy RC Read. “Multinomial goodness‐of‐fit tests.” Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.

Examples

```
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import chi_square
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> chi_square(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
```