PC (Constraint-Based Estimator)¶
- class pgmpy.estimators.PC(data=None, independencies=None, **kwargs)[source]¶
- build_skeleton(ci_test='chi_square', max_cond_vars=5, significance_level=0.01, variant='stable', n_jobs=-1, show_progress=True, **kwargs)[source]¶
Estimates a graph skeleton (UndirectedGraph) from a set of independencies using (the first part of) the PC algorithm. The independencies can either be provided as an instance of the Independencies-class or by passing a decision function that decides any conditional independency assertion. Returns a tuple (skeleton, separating_sets).
If an Independencies-instance is passed, the contained IndependenceAssertions have to admit a faithful BN representation. This is the case if they are obtained as a set of d-separations of some Bayesian network or if the independence assertions are closed under the semi-graphoid axioms. Otherwise the procedure may fail to identify the correct structure.
- Returns
skeleton (UndirectedGraph) – An estimate for the undirected graph skeleton of the BN underlying the data.
separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent (needed for edge orientation procedures).
References
- [1] Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550)
http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
- [2] Koller & Friedman, Probabilistic Graphical Models - Principles and Techniques, 2009
Section 3.4.2.1 (page 85), Algorithm 3.3
Examples
>>> from pgmpy.estimators import PC
>>> from pgmpy.base import DAG
>>> from pgmpy.independencies import Independencies
>>> # build skeleton from list of independencies:
... ind = Independencies(['B', 'C'], ['A', ['B', 'C'], 'D'])
>>> # we need to compute closure, otherwise this set of independencies doesn't
... # admit a faithful representation:
... ind = ind.closure()
>>> skel, sep_sets = PC(independencies=ind).build_skeleton("ABCD", ind)
>>> print(skel.edges())
[('A', 'D'), ('B', 'D'), ('C', 'D')]
>>> # build skeleton from d-separations of DAG:
... model = DAG([('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'E')])
>>> skel, sep_sets = PC.build_skeleton(model.nodes(), model.get_independencies())
>>> print(skel.edges())
[('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'E')]
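The skeleton phase described above can be sketched in plain Python. This is an illustrative sketch only, not pgmpy's implementation: `pc_skeleton` and the `is_independent` callback are hypothetical names, and the callback stands in for the "decision function that decides any conditional independency assertion". Neighbourhoods are frozen per conditioning-set size, which is the idea behind the "stable" variant.

```python
from itertools import combinations


def pc_skeleton(nodes, is_independent, max_cond_vars=5):
    """Skeleton phase of PC with a conditional-independence oracle.

    is_independent(x, y, cond) is a hypothetical callback returning True
    when x is independent of y given the tuple of variables cond.
    """
    nodes = list(nodes)
    # Start from the complete undirected graph.
    adj = {n: set(nodes) - {n} for n in nodes}
    sep_sets = {}
    size = 0
    while size <= max_cond_vars and any(len(adj[n]) - 1 >= size for n in nodes):
        # Freeze neighbourhoods so the result does not depend on node order.
        frozen = {n: set(adj[n]) for n in nodes}
        for x in nodes:
            for y in sorted(frozen[x]):
                if y not in adj[x]:
                    continue  # edge was already removed at this level
                for cond in combinations(sorted(frozen[x] - {y}), size):
                    if is_independent(x, y, cond):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sep_sets[frozenset((x, y))] = set(cond)
                        break
        size += 1
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sep_sets


# Toy oracle for the chain A - B - C, where only A and C are independent given B.
def chain_oracle(x, y, cond):
    return {x, y} == {"A", "C"} and "B" in cond


edges, sep_sets = pc_skeleton("ABC", chain_oracle)
```

With the toy oracle, the edge between A and C is removed and B is recorded as its separating set, while the other two edges survive.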
- estimate(variant='stable', ci_test='chi_square', max_cond_vars=5, return_type='dag', significance_level=0.01, n_jobs=-1, show_progress=True, **kwargs)[source]¶
Estimates a DAG/PDAG from the given dataset using the PC algorithm, a constraint-based structure learning algorithm [1]. The independencies in the dataset are identified by running statistical independence tests. This method returns a DAG/PDAG structure that is faithful to the independencies implied by the dataset.
- Parameters
variant (str (one of "orig", "stable", "parallel")) –
The variant of PC algorithm to run.
- "orig": The original PC algorithm. Might not give the same
results in different runs but does fewer independence tests than stable.
- "stable": Gives the same result in every run but needs to
do more statistical independence tests.
- "parallel": Parallel version of PC Stable. Can run on multiple
cores with the same result on each run.
ci_test (str or fun) –
The statistical test to use for testing conditional independence in the dataset. If a str, the value should be one of:
- "independence_match": If using this option, an additional parameter
independencies must be specified.
- "chi_square": Uses the Chi-Square independence test. This works
only for discrete datasets.
- "pearsonr": Uses the partial correlation based on the Pearson
correlation coefficient to test independence. This works only for continuous datasets.
max_cond_vars (int) – The maximum number of conditioning variables allowed in a statistical test.
return_type (str (one of "dag", "cpdag", "pdag", "skeleton")) –
The type of structure to return.
- If return_type="pdag" or return_type="cpdag", a partially directed structure is returned.
- If return_type="dag", a fully directed structure is returned if it
is possible to orient all the edges.
- If return_type="skeleton", returns an undirected graph along
with the separating sets.
significance_level (float (default: 0.01)) –
The statistical tests use this value to compare with the p-value of the test to decide whether the tested variables are independent or not. Different tests can treat this parameter differently:
- Chi-Square: If p-value > significance_level, it assumes that the
independence condition is satisfied in the data.
- pearsonr: If p-value > significance_level, it assumes that the
independence condition is satisfied in the data.
- Returns
model – The estimated model structure, can be a partially directed graph (PDAG) or a fully directed graph (DAG), or (Undirected Graph, separating sets) depending on the value of return_type argument.
- Return type
DAG-instance, PDAG-instance, or (networkx.UndirectedGraph, dict)
References
- [1] Original PC: P. Spirtes, C. Glymour, and R. Scheines, Causation,
Prediction, and Search, 2nd ed. Cambridge, MA: MIT Press, 2000.
- [2] Stable PC: D. Colombo and M. H. Maathuis, “A modification of the PC algorithm
yielding order-independent skeletons,” ArXiv e-prints, Nov. 2012.
- [3] Parallel PC: Le, Thuc, et al. “A fast PC algorithm for high dimensional causal
discovery with multi-core PCs.” IEEE/ACM transactions on computational biology and bioinformatics (2016).
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import PC
>>> data = pd.DataFrame(np.random.randint(0, 5, size=(2500, 3)), columns=list('XYZ'))
>>> data['sum'] = data.sum(axis=1)
>>> print(data)
      X  Y  Z  sum
0     3  0  1    4
1     1  4  3    8
2     0  0  3    3
3     0  2  3    5
4     2  1  1    4
...  .. .. ..  ...
2495  2  3  0    5
2496  1  1  2    4
2497  0  4  2    6
2498  0  0  0    0
2499  2  4  0    6

[2500 rows x 4 columns]
>>> c = PC(data)
>>> model = c.estimate()
>>> print(model.edges())
[('Z', 'sum'), ('X', 'sum'), ('Y', 'sum')]
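The role of significance_level in estimate can be illustrated with a small sketch. The helper names and the p-values below are hypothetical and not part of pgmpy; the comparison uses "greater than or equal to", as stated in the CI test descriptions further down. A pair of variables is treated as independent, and its edge dropped, when the test fails to reject the null hypothesis.

```python
def is_independent(p_value, significance_level=0.01):
    """Treat the pair as independent when the test does not reject H0."""
    return p_value >= significance_level


def removed_pairs(p_values, significance_level=0.01):
    """Return the pairs whose edge would be dropped at this significance level."""
    return sorted(
        pair for pair, p in p_values.items() if is_independent(p, significance_level)
    )


# Hypothetical p-values for three variable pairs (made up for illustration).
p_values = {("X", "Y"): 0.40, ("X", "sum"): 1e-9, ("Y", "Z"): 0.03}

strict = removed_pairs(p_values, significance_level=0.01)
loose = removed_pairs(p_values, significance_level=0.05)
```

A larger significance_level rejects independence more readily, so fewer edges are removed and the estimated graph tends to be denser.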
- static skeleton_to_pdag(skeleton, separating_sets)[source]¶
Orients the edges of a graph skeleton based on information from separating_sets to form a DAG pattern (PDAG).
- Parameters
skeleton (UndirectedGraph) – An undirected graph skeleton as e.g. produced by the build_skeleton method.
separating_sets (dict) – A dict containing, for each pair of not directly connected nodes, a separating set (“witnessing set”) of variables that makes them conditionally independent (needed for edge orientation).
- Returns
pdag – An estimate for the DAG pattern of the BN underlying the data. The graph might contain some nodes with both-way edges (X->Y and Y->X). Any completion (removing one of the both-way edges for each such pair) results in an I-equivalent Bayesian network DAG.
- Return type
References
Neapolitan, Learning Bayesian Networks, Section 10.1.2, Algorithm 10.2 (page 550) http://www.cs.technion.ac.il/~dang/books/Learning%20Bayesian%20Networks(Neapolitan,%20Richard).pdf
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators import PC
>>> data = pd.DataFrame(np.random.randint(0, 4, size=(5000, 3)), columns=list('ABD'))
>>> data['C'] = data['A'] - data['B']
>>> data['D'] += data['A']
>>> c = PC(data)
>>> pdag = c.skeleton_to_pdag(*c.build_skeleton())
>>> pdag.edges() # edges: A->C, B->C, A--D (not directed)
[('B', 'C'), ('A', 'C'), ('A', 'D'), ('D', 'A')]
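The notion of "completion" used for the returned PDAG can be made concrete with a short sketch that works on a plain edge list such as the one printed above. The function name `completions` is hypothetical and not a pgmpy API. The sketch only enumerates the orientation choices; it does not check that a completion is acyclic.

```python
from itertools import product


def completions(edges):
    """Yield every edge list obtained by keeping one direction of each both-way edge."""
    edges = set(edges)
    # Edges present in only one direction are already oriented.
    directed = sorted(e for e in edges if (e[1], e[0]) not in edges)
    # Each both-way pair is stored once, in sorted order.
    undirected = sorted(
        {tuple(sorted(e)) for e in edges if (e[1], e[0]) in edges}
    )
    for choice in product((False, True), repeat=len(undirected)):
        oriented = [
            (v, u) if flip else (u, v) for (u, v), flip in zip(undirected, choice)
        ]
        yield directed + oriented


pdag_edges = [("B", "C"), ("A", "C"), ("A", "D"), ("D", "A")]
all_completions = list(completions(pdag_edges))
```

For the edge list from the example, the only both-way pair is A and D, so there are two completions: one with A->D and one with D->A.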
Conditional Independence Tests for PC algorithm¶
- pgmpy.estimators.CITests.chi_square(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Chi-square conditional independence test. Tests the null hypothesis that X is independent from Y given Zs.
This is done by comparing the observed frequencies with the expected frequencies if X,Y were conditionally independent, using a chi-square deviance statistic. The expected frequencies given independence are P(X,Y,Zs) = P(X|Zs)*P(Y|Zs)*P(Zs). The latter term can be computed as P(X,Zs)*P(Y,Zs)/P(Zs).
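As a minimal sketch of the computation described above, the expected count of a cell (x, y, z) under conditional independence is N(x,z)*N(y,z)/N(z), which is P(X,Zs)*P(Y,Zs)/P(Zs) scaled by the sample size. The function names are hypothetical, the code uses only the standard library, and it computes the plain Pearson statistic without the p-value or degrees of freedom.

```python
from collections import Counter


def expected_counts(rows):
    """Expected cell counts under X independent of Y given Z.

    rows is an iterable of (x, y, z) tuples; z may itself be a tuple
    when there are several conditioning variables.
    """
    rows = list(rows)
    n_xz = Counter((x, z) for x, _, z in rows)
    n_yz = Counter((y, z) for _, y, z in rows)
    n_z = Counter(z for _, _, z in rows)
    return {
        (x, y, z): n_xz[(x, z)] * n_yz[(y, zy)] / n_z[z]
        for (x, z) in n_xz
        for (y, zy) in n_yz
        if z == zy
    }


def chi_square_stat(rows):
    """Sum of (observed - expected)^2 / expected over all cells."""
    rows = list(rows)
    observed = Counter(rows)
    expected = expected_counts(rows)
    return sum((observed[cell] - e) ** 2 / e for cell, e in expected.items())


independent_rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
dependent_rows = [(0, 0, 0), (0, 0, 0), (1, 1, 0), (1, 1, 0)]
```

In the first toy sample every cell is observed exactly as often as expected, so the statistic is zero; in the second, X and Y always agree and the statistic is large relative to the sample size.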
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list, array-like) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] https://en.wikipedia.org/wiki/Chi-squared_test
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import chi_square
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> chi_square(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.cressie_read(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Cressie Read statistic for conditional independence[1]. Tests the null hypothesis that X is independent of Y given Zs.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] Cressie, Noel, and Timothy RC Read. “Multinomial goodness‐of‐fit tests.” Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import cressie_read
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> cressie_read(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> cressie_read(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> cressie_read(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.freeman_tuckey(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Freeman-Tukey test for conditional independence [1]. Tests the null hypothesis that X is independent of Y given Zs.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] Read, Campbell B. “Freeman—Tukey chi-squared goodness-of-fit statistics.” Statistics & probability letters 18.4 (1993): 271-278.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import freeman_tuckey
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> freeman_tuckey(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> freeman_tuckey(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> freeman_tuckey(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.g_sq(X, Y, Z, data, boolean=True, **kwargs)[source]¶
G squared test for conditional independence. Also commonly known as G-test, likelihood-ratio or maximum likelihood statistical significance test. Tests the null hypothesis that X is independent of Y given Zs.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] https://en.wikipedia.org/wiki/G-test
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import g_sq
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> g_sq(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> g_sq(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> g_sq(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.independence_match(X, Y, Z, independencies, **kwargs)[source]¶
Checks if X ⟂ Y | Z is in independencies. This method is implemented to provide a uniform API when the independencies are provided instead of data.
- Parameters
X (str) – The first variable for testing the independence condition X ⟂ Y | Z
Y (str) – The second variable for testing the independence condition X ⟂ Y | Z
Z (list/array-like) – A list of conditioning variables for testing the condition X ⟂ Y | Z
data (pandas.DataFrame) – The dataset in which to test the independence condition.
- Returns
p-value
- Return type
float (Fixed to 0 since it is always confident)
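The idea of matching a query against a known set of independence assertions can be sketched without pgmpy. Everything here is an assumption made for illustration: the assertions are stored as plain (pair, conditioning set) keys rather than as pgmpy's Independencies objects, the helper names are hypothetical, and the sketch returns a plain boolean for simplicity.

```python
def make_key(x, y, z=()):
    """Normalise X ⟂ Y | Z so that the order of X, Y and of Z does not matter."""
    return (frozenset((x, y)), frozenset(z))


def independence_match_sketch(x, y, z, independencies):
    """Return True if the assertion x ⟂ y | z is in the given set of keys."""
    return make_key(x, y, z) in independencies


# Hypothetical set of known assertions: A ⟂ C | B only.
known = {make_key("A", "C", ["B"])}

symmetric_hit = independence_match_sketch("C", "A", ["B"], known)
unconditional_miss = independence_match_sketch("A", "C", [], known)
```

Because no statistics are involved, the lookup is exact: an assertion is either listed or it is not.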
- pgmpy.estimators.CITests.log_likelihood(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Log likelihood ratio test for conditional independence. Also commonly known as G-test, G-squared test or maximum likelihood statistical significance test. Tests the null hypothesis that X is independent of Y given Zs.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] https://en.wikipedia.org/wiki/G-test
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import log_likelihood
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> log_likelihood(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> log_likelihood(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> log_likelihood(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.modified_log_likelihood(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Modified log likelihood ratio test for conditional independence. Tests the null hypothesis that X is independent of Y given Zs.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import modified_log_likelihood
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> modified_log_likelihood(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> modified_log_likelihood(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> modified_log_likelihood(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.neyman(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Neyman’s test for conditional independence[1]. Tests the null hypothesis that X is independent of Y given Zs.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list (array-like)) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import neyman
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> neyman(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> neyman(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> neyman(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
- pgmpy.estimators.CITests.pearsonr(X, Y, Z, data, boolean=True, **kwargs)[source]¶
Computes the Pearson correlation coefficient and p-value for testing non-correlation. Should be used only on continuous data. When Z is non-empty, uses linear regression and computes the Pearson coefficient on the residuals.
- Parameters
X (str) – The first variable for testing the independence condition X ⟂ Y | Z
Y (str) – The second variable for testing the independence condition X ⟂ Y | Z
Z (list/array-like) – A list of conditioning variables for testing the condition X ⟂ Y | Z
data (pandas.DataFrame) – The dataset in which to test the independence condition.
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the Pearson correlation coefficient and p_value
of the test.
- Returns
Pearson’s correlation coefficient (float)
p-value (float)
References
[1] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
[2] https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression
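The regression-on-residuals approach mentioned for pearsonr can be sketched for a single conditioning variable using only the standard library. The function names are hypothetical and this is not pgmpy's code; it regresses X and Y on Z separately, then correlates the two residual series. The p-value computation is left out.

```python
import math


def residuals(y, z):
    """Residuals of the least-squares line y ~ a + b*z."""
    n = len(y)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    slope = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / sum(
        (zi - mean_z) ** 2 for zi in z
    )
    intercept = mean_y - slope * mean_z
    return [yi - (intercept + slope * zi) for yi, zi in zip(y, z)]


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    return pearson(residuals(x, z), residuals(y, z))


# Toy data: x and y both depend on z, plus noise terms that are unrelated.
z = [-1, -1, 1, 1]
x = [2 * zi + e for zi, e in zip(z, [1, -1, 1, -1])]
y = [-3 * zi + e for zi, e in zip(z, [1, -1, -1, 1])]

raw = pearson(x, y)
partial = partial_corr(x, y, z)
```

In the toy data the raw correlation between x and y is strongly negative because both are driven by z, while the partial correlation given z vanishes.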
- pgmpy.estimators.CITests.power_divergence(X, Y, Z, data, boolean=True, lambda_='cressie-read', **kwargs)[source]¶
Computes the Cressie-Read power divergence statistic [1]. The null hypothesis for the test is that X is independent of Y given Z. Many of the frequency-comparison-based statistics (e.g. chi-square, G-test) belong to the power divergence family and are special cases of this test.
- Parameters
X (int, string, hashable object) – A variable name contained in the data set
Y (int, string, hashable object) – A variable name contained in the data set, different from X
Z (list, array-like) – A list of variable names contained in the data set, different from X and Y. This is the separating set that (potentially) makes X and Y independent. Default: []
data (pandas.DataFrame) – The dataset on which to test the independence condition.
lambda_ (float or string) –
The lambda parameter for the power_divergence statistic. Some values of lambda_ result in other well-known tests:
lambda_ (string)        lambda_ (value)   Test
"pearson"               1                 Chi-squared test
"log-likelihood"        0                 G-test or log-likelihood
"freeman-tuckey"        -1/2              Freeman-Tuckey statistic
"mod-log-likelihood"    -1                Modified log-likelihood
"neyman"                -2                Neyman's statistic
"cressie-read"          2/3               The value recommended in the paper [1]
boolean (bool) –
- If boolean=True, an additional argument significance_level must
be specified. If the p_value of the test is greater than or equal to significance_level, returns True. Otherwise returns False.
- If boolean=False, returns the chi2 statistic, the p_value, and the degrees of freedom of the test.
- Returns
If boolean = False, returns 3 values –
- chi: float
The chi-square test statistic.
- p_value: float
The p_value, i.e. the probability of observing the computed chi-square statistic (or an even higher value), given the null hypothesis that X ⟂ Y | Zs.
- dof: int
The degrees of freedom of the test.
If boolean = True, returns –
- independent: boolean
If the p_value of the test is greater than or equal to significance_level, returns True. Else returns False.
References
[1] Cressie, Noel, and Timothy RC Read. “Multinomial goodness‐of‐fit tests.” Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from pgmpy.estimators.CITests import chi_square
>>> data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
>>> data['E'] = data['A'] + data['B'] + data['C']
>>> chi_square(X='A', Y='C', Z=[], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D'], data=data, boolean=True, significance_level=0.05)
True
>>> chi_square(X='A', Y='B', Z=['D', 'E'], data=data, boolean=True, significance_level=0.05)
False
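How the lambda_ values listed for power_divergence map to different statistics can be illustrated with a standard-library sketch of the Cressie-Read family, 2 / (λ(λ+1)) * Σ O * ((O/E)^λ - 1), with the usual limits at λ = 0 and λ = -1. The function name and the name-to-value mapping are written for this illustration (the strings follow the parameter description above), and the sketch assumes all observed and expected counts are positive and sum to the same total.

```python
import math

# String aliases as listed in the lambda_ parameter description.
LAMBDA_BY_NAME = {
    "pearson": 1,
    "log-likelihood": 0,
    "freeman-tuckey": -1 / 2,
    "mod-log-likelihood": -1,
    "neyman": -2,
    "cressie-read": 2 / 3,
}


def power_divergence_stat(observed, expected, lambda_="cressie-read"):
    """Cressie-Read power divergence statistic for positive counts."""
    lam = LAMBDA_BY_NAME[lambda_] if isinstance(lambda_, str) else lambda_
    pairs = list(zip(observed, expected))
    if lam == 0:
        # Limit for lambda -> 0: the G-test statistic.
        return 2 * sum(o * math.log(o / e) for o, e in pairs)
    if lam == -1:
        # Limit for lambda -> -1: the modified log-likelihood statistic.
        return 2 * sum(e * math.log(e / o) for o, e in pairs)
    return 2 / (lam * (lam + 1)) * sum(o * ((o / e) ** lam - 1) for o, e in pairs)


observed = [10, 20, 30]
expected = [20, 20, 20]

pearson_stat = power_divergence_stat(observed, expected, "pearson")
g_stat = power_divergence_stat(observed, expected, "log-likelihood")
```

With λ = 1 the expression reduces to the familiar Σ (O - E)² / E whenever the observed and expected totals agree, which is why the chi-square test appears as a special case.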